# Does Lightroom speed scale with number of cores?



## dhazeghi (Mar 18, 2014)

Good morning,

As part of my research before I upgrade my computer, I'm trying to figure out what CPUs would make sense in terms of maximizing performance for rendering-related tasks. What I've noticed from various benchmark results (e.g. from Lloyd Chambers) is that Lightroom doesn't seem to scale very well with the number of CPU cores.

Specifically, the gain going from 2 to 4 cores seems fairly high - ~1.5x or so, but going from 4 to 6 cores shows only around a 1.1x speedup and another 1.1x going from 6 to 8 cores.  I'm wondering if these results are representative?  In particular, I'd have expected at least a 1.3x speedup going from 4 to 6 cores.

The reason I'm interested is that my current machine takes 6-7 seconds on average (and occasionally much longer) to render 1:1 previews for a 24MP RAW file, and this sort of performance spills over to doing edits in the Develop module, as well as exporting. My computer is far from new (Core i7 860 - 4x2.93GHz cores, 12GB RAM, 240GB SSD), but I was hoping to find something that would be at least 2.5x faster.

Thanks for any thoughts or comments.


----------



## DaveS (Mar 18, 2014)

Hello,

My understanding is that 4 cores is probably the sweet spot for Lightroom. CPU clock frequency is probably more important, given similar technology within a generation or two of processors. (4th-gen Core i processors do more work per clock tick than Core 2 Duos did, and those in turn did more work at a slower clock speed than the Pentium 4 could.)

For comparison, I took a folder of a dozen 16 MPixel images (yah, I know, they are smaller) and deleted the 1:1 previews. Then I asked Lightroom to build 1:1 previews for the folder, which took 20 seconds or so, so a little under 2 seconds per photo. I'm on a Core i7-4770 (4x3.4GHz cores, with hyperthreading enabled), with 24GB of RAM and a 240GB SSD (although the catalog and images live on a spinning mirrored disk). CPU load on the box averaged between 35 and 40%.

The last time I tested when my catalog (and previews) were on the SSD, it was only a second or so difference in total time.


----------



## dhazeghi (Mar 18, 2014)

DaveS said:


> For comparison, I took a folder of a dozen 16 MPixel images (yah, I know, they are smaller) and deleted the 1:1 previews. Then I asked Lightroom to build 1:1 previews for the folder, which took 20 seconds or so, so a little under 2 seconds per photo. I'm on a Core i7-4770 (4x3.4GHz cores, with hyperthreading enabled), with 24GB of RAM and a 240GB SSD (although the catalog and images live on a spinning mirrored disk). CPU load on the box averaged between 35 and 40%.
> 
> The last time I tested when my catalog (and previews) were on the SSD, it was only a second or so difference in total time.



Thanks for the comments and the results! That sounds about like what I would have expected. I would be pretty happy with under 2 seconds/image, though of course faster is always better!


----------



## Jknights (Mar 19, 2014)

My 8-core Mac Pro does not outperform my 4-core MacBook Pro until you get to doing complex editing in Photoshop. In LR they both seem to perform well enough, and the Mac Pro has no advantage except on disk I/O.


----------



## dhazeghi (Mar 19, 2014)

Jknights said:


> My 8-core Mac Pro does not outperform my 4-core MacBook Pro until you get to doing complex editing in Photoshop. In LR they both seem to perform well enough, and the Mac Pro has no advantage except on disk I/O.



Thanks for the feedback.

I don't suppose there's any chance this might change in the next major Lightroom release?  It seems a shame to leave all those extra cores essentially idle.


----------



## dhazeghi (Mar 20, 2014)

Well, I've just done my own test, with 1-4 cores enabled on my system (no TurboBoost, so CPU frequencies are constant, no HyperThreading either).  Also tested on 4 cores with HT and TB enabled/disabled.  Results of importing 25 RAW images and rendering 1:1 previews, in seconds:

1 core: 260
2 core: 144
3 core: 116
4 core: 104
4 + HT: 111
4 + HT + TB: 99
4 + TB: 94

In theory, you can get up to a 2x gain from doubling the number of cores.  From 1 to 2 cores, there's a 1.81x speedup, which is fairly close.  From 2 to 4 cores though, the speedup is only 1.38x.  So it seems LR is not doing nearly as effective a job of using the additional cores above 2 cores.  Judging from what other folks have said, the improvements above 4 cores are even smaller.
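To make the arithmetic behind those figures explicit, here is a minimal Python sketch that turns the timings above into speedup, parallel efficiency, and the serial fraction implied by each measurement (the Karp-Flatt metric). The function name `scaling_table` is just an illustrative choice, and the metric assumes the only things limiting scaling are serial work and overhead, which may not hold for Lightroom.

```python
# Measured wall-clock times (seconds) from the test above, keyed by enabled cores.
times = {1: 260, 2: 144, 3: 116, 4: 104}

def scaling_table(times):
    """Return rows of (cores, speedup vs 1 core, efficiency, implied serial fraction)."""
    t1 = times[1]
    rows = []
    for n in sorted(times):
        speedup = t1 / times[n]
        efficiency = speedup / n  # 1.0 would mean perfect scaling
        # Karp-Flatt metric: the serial fraction implied by the measured speedup.
        serial = (1 / speedup - 1 / n) / (1 - 1 / n) if n > 1 else None
        rows.append((n, speedup, efficiency, serial))
    return rows

for n, s, e, f in scaling_table(times):
    frac = "n/a" if f is None else f"{f:.2f}"
    print(f"{n} cores: speedup {s:.2f}x, efficiency {e:.2f}, implied serial fraction {frac}")
```

With these numbers the implied serial fraction rises from about 0.11 at 2 cores to 0.20 at 4 cores. A fixed serial portion would give a roughly constant value, so the rise hints that overhead or contention grows as cores are added.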


----------



## Cowboy (Mar 20, 2014)

One thing we all have to keep in mind is that, depending on the application's design, there is a logical limit on the number of functions that can be performed concurrently. To clarify, some functions cannot be broken down into multiple component parts and performed concurrently. Others can be divided into 2 or more running tasks, but if they share the same files there may be increased overhead, and other things such as disk reading and writing will become a bottleneck. Increasing the performance of any computer application is an art in itself, and requires that you investigate all aspects of the system and weed out the bottlenecks. That said, a fast multicore processor and a solid state disk drive are a good place to start, but make sure that disk drive is on a controller that can take advantage of the increased disk speed.
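The limit described above is usually expressed as Amdahl's law: if some fraction of the work cannot be split up, the speedup from extra cores is capped no matter how many are added. A minimal sketch follows; the serial fraction of 0.2 is purely an illustrative assumption, not a measured Lightroom value.

```python
def amdahl_speedup(cores, serial_fraction):
    """Upper bound on speedup when serial_fraction of the work cannot be parallelized."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)

# serial_fraction = 0.2 is an assumed value chosen only for illustration.
for cores in (2, 4, 6, 8):
    print(f"{cores} cores: at most {amdahl_speedup(cores, 0.2):.2f}x")
```

Under that assumption, each additional pair of cores buys less than the pair before it, and the speedup can never exceed 1 / 0.2 = 5x.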


----------



## dhazeghi (Mar 21, 2014)

Cowboy said:


> One thing we all have to keep in mind is that, depending on the application's design, there is a logical limit on the number of functions that can be performed concurrently. To clarify, some functions cannot be broken down into multiple component parts and performed concurrently. Others can be divided into 2 or more running tasks, but if they share the same files there may be increased overhead, and other things such as disk reading and writing will become a bottleneck. Increasing the performance of any computer application is an art in itself, and requires that you investigate all aspects of the system and weed out the bottlenecks. That said, a fast multicore processor and a solid state disk drive are a good place to start, but make sure that disk drive is on a controller that can take advantage of the increased disk speed.



Agreed - it's certainly possible to have bottlenecks other than the CPU. But in the case of Lightroom at least, the difference between importing using a decent HDD (100MB/s) and a RAM disk (6400MB/s) is negligible in my testing. Same story for exporting. In terms of parallelizable tasks, I'd think importing would be about as good as it gets - if you have n images and m cores, create m threads, each responsible for rendering n / m images. The only serial part of the process is updating the LR database, and in practice that's really a tiny fraction of the runtime compared to rendering the previews.
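The n-images-across-m-workers idea can be sketched roughly as follows. This is not how Lightroom works internally (that isn't public in this thread); `render_preview` is a hypothetical stand-in for the expensive rendering step, and `split_evenly` and `render_all` are names made up for the example.

```python
from concurrent.futures import ThreadPoolExecutor

def split_evenly(items, m):
    """Split items into m contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(items), m)
    chunks, start = [], 0
    for i in range(m):
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks

def render_preview(path):
    """Hypothetical stand-in for the expensive per-image rendering step."""
    return f"preview:{path}"

def render_all(paths, m):
    """Render each chunk on its own worker, then gather results in input order."""
    chunks = split_evenly(paths, m)
    with ThreadPoolExecutor(max_workers=m) as pool:
        per_chunk = pool.map(lambda chunk: [render_preview(p) for p in chunk], chunks)
        # The one serial step (e.g. a single database update) would go here.
        return [preview for chunk in per_chunk for preview in chunk]

images = [f"IMG_{i:04d}.raw" for i in range(25)]
previews = render_all(images, 4)
print(len(previews), "previews rendered")
```

In CPython, threads would not speed up CPU-bound rendering because of the global interpreter lock, so a real version would more likely use `ProcessPoolExecutor` or native threads. The sketch only shows the work split and where the serial step sits.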


----------



## Jknights (Mar 21, 2014)

dhazeghi said:


> Well, I've just done my own test, with 1-4 cores enabled on my system (no TurboBoost, so CPU frequencies are constant, no HyperThreading either).  Also tested on 4 cores with HT and TB enabled/disabled.  Results of importing 25 RAW images and rendering 1:1 previews, in seconds:
> 
> 1 core: 260
> 2 core: 144
> ...



I'd agree. In my (untimed) tests with my MacBook Pro (4 cores) vs. Mac Pro (8 cores) there is a performance difference, but it is small, and I'd attribute it to the better I/O bus rather than the cores, though I may be horribly wrong. The 2008 MacBook Air (4 cores) is slightly slower than the MacBook Pro, but that is bus speed, I guess.


----------



## dhazeghi (Mar 23, 2014)

Jknights said:


> I'd agree. In my (untimed) tests with my MacBook Pro (4 cores) vs. Mac Pro (8 cores) there is a performance difference, but it is small, and I'd attribute it to the better I/O bus rather than the cores, though I may be horribly wrong. The 2008 MacBook Air (4 cores) is slightly slower than the MacBook Pro, but that is bus speed, I guess.



One thing I noticed when doing my tests from 1-4 cores is that the utilization of each core drops as more are enabled. So even though the extra cores are doing some work, on average each one is doing less. It may be the case that the I/O bus is the limiting factor, but it may also be the case that each thread is just bottlenecked by something else. If you pull up the CPU History window in Activity Monitor on your machines, you can compare the utilization on both.


----------

