
performance tuning - Mma 10: Half the parallel power (Macs)?


Here is a comparison of the parallel kernels launched by Mathematica v9 and v10 on the same machine, a current (2014) "R2-D2" Mac Pro ...



[Update: Valerio has commented that the same issue arises on the MacBook Air.]



Under v9:


$ProcessorCount


12



Issuing:


 LaunchKernels[]


... launches 12 kernels, and actually uses them. Notice that ParallelTable[] runs about 12 times as fast as Table[] for this construct:


In[5]:= Table[Pause[1]; f[i], {i, 12}] // AbsoluteTiming


Out[5]= {12.003106, {f[1], f[2], f[3], f[4], f[5], f[6], f[7], f[8], f[9], f[10], f[11], f[12]}}



In[6]:= ParallelTable[Pause[1]; f[i], {i, 12}] // AbsoluteTiming


Out[6]= {1.010648, {f[1], f[2], f[3], f[4], f[5], f[6], f[7], f[8], f[9], f[10], f[11], f[12]}}




So, for the same operation under v9, the parallel version runs about 12 times as fast as the single-kernel version.



Under v10, on the same machine:


$ProcessorCount


6



... down from 12, even though I am running on the identical machine. My Mac Pro actually has 6 physical cores, each running 2 hardware threads. Under v9 that yielded 12 parallel kernels, but under v10 it yields only 6 ... ON THE SAME MACHINE. And this has real effects: it effectively halves the maximum potential parallel power of my Mac:


 LaunchKernels[]


... launches 6 kernels (not 12 kernels as under v9).


Compare the performance:


 In[3]:= ParallelTable[Pause[1]; f[i], {i, 12}] // AbsoluteTiming


Out[3]= {2.009933, {f[1], f[2], f[3], f[4], f[5], f[6], f[7], f[8], f[9], f[10], f[11], f[12]}}



So, under the new v10, I am getting half the parallel performance here and half the kernels that I got under v9. Even more perplexing is that this worked fine in an earlier pre-release version of v10.
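The two ParallelTable timings match what a simple scheduling model predicts. This model is an idealization. It assumes N equal, independent tasks of duration t, spread over k kernels with negligible overhead. Under those assumptions, the wall time is roughly

$$T(k) \approx \left\lceil \frac{N}{k} \right\rceil t, \qquad N = 12,\ t = 1\ \text{s}: \quad T(1) \approx 12\ \text{s},\quad T(6) \approx 2\ \text{s},\quad T(12) \approx 1\ \text{s}$$

These predictions agree with the measured 12.00 s, 2.01 s and 1.01 s. Note that Pause[1] idles rather than computes. This particular test therefore measures how many kernels are running, not how much extra CPU throughput the hyperthreaded kernels provide.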


I am very confused. Does anyone have any ideas how I can get my missing kernels back? Or why a decision may have been made to hobble the performance of the Mac Pro under v10?




Just noticed that if I go to:



  • Evaluation Menu -> Parallel Kernel Configuration


... the default setting is:



  • Number of kernels to use: Automatic (which Mma resolves to 6)


If I change this to:




  • Manual setting


and set it to 12 ... then it seems to use 12.
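The same override can also be applied from code for the current session. Below is a minimal sketch using the documented CloseKernels, LaunchKernels and $KernelCount. It does not change the saved Parallel Kernel Configuration preference. Whether all 12 kernels actually start also depends on the subkernel limit of your license.

```mathematica
(* Close any running subkernels, then launch 12 explicitly instead of
   relying on the Automatic count (which v10 resolves to 6 here).
   The number that can be launched is capped by the license. *)
CloseKernels[];
LaunchKernels[12];

(* Number of parallel subkernels currently running *)
$KernelCount

(* Repeat the earlier test; if all 12 kernels launched, this should
   again take roughly 1 second *)
ParallelTable[Pause[1]; f[i], {i, 12}] // AbsoluteTiming
```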


But I am still confused: if Mathematica 10 can actually run 12 kernels on this machine, why would Wolfram set it to use only half of them by default, when v9 used all of them by default?



Szabolcs suggests below that Mathematica may gain no practical benefit from running more kernels than there are physical cores, even if the processor supports virtual (hyperthreaded) cores ... so there is no real difference. In reply, here is a quick timing test of a real-world application (kernel density estimation) from the mathStatica benchmarking test suite. The task is to plot 12 kernel density estimates, corresponding to 12 different bandwidths:


bandwidths = {.2, .35, .45, .55, .65, 1, 1.5, 2, 2.2, 2.5, 3, 3.2};

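The mathStatica benchmark code is not reproduced in the post. The sketch below is therefore only a rough, hypothetical analogue of the task's structure: 12 independent density estimates, one per bandwidth. It makes two assumptions. First, the sample is simulated. Second, the built-in SmoothKernelDistribution stands in for whatever estimator mathStatica actually uses.

```mathematica
(* Hypothetical stand-in for the mathStatica benchmark: simulated data,
   and SmoothKernelDistribution in place of mathStatica's estimator *)
data = RandomVariate[NormalDistribution[0, 2], 2000];
bandwidths = {.2, .35, .45, .55, .65, 1, 1.5, 2, 2.2, 2.5, 3, 3.2};

(* Send the data to every subkernel *)
DistributeDefinitions[data];

(* Each bandwidth is an independent task, so the 12 plots can be
   spread over however many kernels are running *)
{time, plots} = AbsoluteTiming[
   ParallelTable[
    With[{d = SmoothKernelDistribution[data, h]},
     Plot[PDF[d, x], {x, -8, 8}, PlotLabel -> h]],
    {h, bandwidths}]];

time  (* wall-clock seconds *)
GraphicsGrid[Partition[plots, 4]]
```

Running this once with 6 kernels and once with 12 gives a like-for-like comparison on your own machine. Absolute times will not match the mathStatica figures.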



Here are the results running under:



  • v9 (default: 12 kernels): 3.38 seconds

  • v10 (default: 6 kernels): 9.53 seconds

  • v10 (manual override to 12 kernels): 7.46 seconds
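Expressed as ratios, the three timings give:

$$\frac{9.53}{7.46} \approx 1.28, \qquad \frac{9.53}{3.38} \approx 2.82, \qquad \frac{7.46}{3.38} \approx 2.21$$

The first ratio shows that overriding v10 to 12 kernels makes this task run about 1.28 times as fast as the v10 default. The third shows that v9 with 12 kernels still runs about 2.2 times as fast as v10 with the same kernel count.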


I don't know what has changed to cause such a performance hit under v10 ... but even so, that is not the point. The point is that the v10 default kernel setting fails to take advantage of the power of the Mac Pro ... and results in worse performance in a typical parallel-processing application.
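A Pause-based test cannot show whether hyperthreaded kernels add real compute throughput, because the kernels sit idle. One way to probe Szabolcs's point directly is to time a CPU-bound task with 6 and then 12 kernels. The sketch below does this. The helper names busy and timeWith are hypothetical, the workload size is arbitrary, and results will vary by machine.

```mathematica
(* CPU-bound stand-in task: unlike Pause[1], this keeps a core busy.
   busy and timeWith are hypothetical helper names. *)
busy[n_] := Module[{s = 0.}, Do[s += Sin[N[i]], {i, n}]; s];

(* Restart with k subkernels, then time 12 copies of the task *)
timeWith[k_] := (
   CloseKernels[];
   LaunchKernels[k];
   DistributeDefinitions[busy];
   First@AbsoluteTiming[ParallelTable[busy[10^6], {12}]]);

timings = {timeWith[6], timeWith[12]}
```

With 6 kernels, the 12 tasks run in two rounds. With 12 kernels on 6 hyperthreaded cores, they run in one round, with two tasks sharing each core. The 12-kernel time should therefore land somewhere between half the 6-kernel time (if hyperthreading helps a lot) and roughly the same time (if it helps little).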



Update: 1 August 2014


I have now had the opportunity to run the full mathStatica (primarily symbolic) benchmark suite under both:




  • the default v10 parallel setting (6 kernels)

  • the manual override v10 setting (12 kernels)


Here are the results:


[Image: mathStatica benchmark timings under both settings]


The results fall into 2 categories:




  • For problems with more than 6 separate components: using 12 kernels is ALWAYS unambiguously faster, and significantly so.





  • For problems with 6 or fewer separate components: for instance, Examples 7 and 9 can only be broken down into 2 symbolic components, so the benefits of parallelism max out at 2 kernels. In these cases, the automatic 6-kernel setting is sometimes marginally faster than the 12-kernel setting (presumably due to running overheads, etc.) ... but the difference is tiny and essentially unnoticeable.
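The two categories can be summarized with an idealized bound. The bound is an assumption: it ignores overheads and uneven component sizes. Suppose a problem splits into m independent components and k kernels are available. Then the speedup S over a single kernel is at most

$$S(k) \le \min(m, k)$$

For Examples 7 and 9, m = 2, so the bound is 2 whether 6 or 12 kernels are running. In that case, the extra kernels can at best add overhead. For m > 6, the bound rises from 6 to min(m, 12) when moving from 6 to 12 kernels.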




In summary: for problems that CAN benefit from more than 6 kernels, the default Mma 10 (automatic) setting of 6 kernels on a Mac Pro appears to be sub-optimal, and fails to take advantage of the full capability of the machine. This problem is new to v10, and does not occur under v9.



