
performance tuning - Why won't Parallelize speed up my code?


What can prevent parallelized Mathematica code from running at full performance?



Answer




This is a general guide on debugging issues with parallelization performance.


1. Measuring performance


The proper way to measure the timing of parallelized calculations is AbsoluteTiming, which measures wall time. Timing measures CPU time on the main kernel only and won't give a correct result when used with parallel calculations.
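The difference can be seen with a small sketch. The task here is an illustrative stand-in (Pause simulates work done on the subkernels), and the variable names are assumptions rather than part of the original answer.

```mathematica
(* Each element waits 0.5 s on a subkernel, so the main kernel does almost no CPU work *)
{cpuTime, r1} = Timing[ParallelTable[Pause[0.5]; i, {i, 4}]];
{wallTime, r2} = AbsoluteTiming[ParallelTable[Pause[0.5]; i, {i, 4}]];

(* cpuTime counts main-kernel CPU time only; wallTime is the elapsed time actually waited *)
{cpuTime, wallTime}
```

The first call may also include the time needed to launch the subkernels if none are running yet, so it is worth timing a second run before drawing conclusions.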


2. How to parallelize effectively?


Simply wrapping code in Parallelize will not magically speed up most code snippets. It won't work at all on most built-in functions such as NIntegrate. Here's some info on what can be parallelized automatically.


It is better to formulate the problem in terms of more specific constructs such as ParallelTable, ParallelMap, ParallelCombine, ParallelDo, etc. and take full control.


Try to use functional code with no side effects for easy and effective parallelization. Read more about this here and here.


Using procedural code might require synchronization using SetSharedFunction and SetSharedVariable. Both of these force code to be evaluated on the main kernel only and can cause a significant performance hit. Every time a variable or function marked with SetSharedVariable or SetSharedFunction is evaluated on a parallel kernel, it triggers a costly callback to the main kernel.
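As a minimal sketch of the contrast between the two styles (the variable names results and squares are illustrative), the same list of squares can be built procedurally with a shared variable or functionally with ParallelTable:

```mathematica
(* Procedural: every AppendTo on the shared variable is a callback to the main kernel *)
results = {};
SetSharedVariable[results];
ParallelDo[AppendTo[results, i^2], {i, 1, 20}];
Sort[results]   (* the order of arrival is not guaranteed, hence Sort *)

(* Functional: each subkernel just returns its values; no shared state is involved *)
squares = ParallelTable[i^2, {i, 1, 20}]
```

Both give the same values, but the functional form avoids the per-element round trip to the main kernel.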


Do not parallelize code that is already parallelized internally, as this is likely to reduce performance. Certain functions such as LinearSolve use multi-threaded code internally. Some others, such as NIntegrate, will make use of multiple cores in certain cases only. Use a process monitor to check whether your code already makes use of multiple cores without explicit parallelization.


3. Common issues causing slowdowns



3.1 Communication overhead


There is an overhead to parallelization. The data being operated on needs to be broken into pieces, sent to subkernels, processed there, then the result needs to be sent back. The subkernels are separate processes. The interprocess communication involved can take a considerable amount of time, especially when the subkernels are running on a remote machine. So it is important to minimize the communication with subkernels:




  • Try to communicate less often. This is controlled by the Method option of Parallelize and related functions.


    Method -> "CoarsestGrained" minimizes communication.


    Method -> "FinestGrained" breaks the data into as many pieces as possible. This is useful when the pieces can take vastly different times to process.




  • Try to send as little data back and forth as possible. Sending large expressions takes a longer time. If the subkernels generate large data, see if you can reduce it before returning it to the main kernel.





  • Different types of data can take a hugely different amount of time to transfer. Prefer packed arrays whenever you can.




  • A common mistake is to send a huge array along with each parallel evaluation. Try to send the array once, then index into it as necessary in the evaluations.
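The points in this list can be combined in one sketch. The array, the chunk size of 1000, and the variable names are illustrative assumptions. The idea is to send the array once, return only a reduced result from each evaluation, and ask for coarse batches.

```mathematica
data = RandomReal[1, 10^6];          (* large numeric array; illustrative data *)
Developer`PackedArrayQ[data]         (* check that the array is packed before sending it *)

DistributeDefinitions[data];         (* send the array to every subkernel once *)

(* Each evaluation indexes into the subkernel's local copy and returns a single number *)
sums = ParallelTable[
   Total[data[[i ;; i + 999]]], {i, 1, 10^6, 1000},
   Method -> "CoarsestGrained"];     (* few large batches, so less communication *)
```

If the individual pieces took very different times to process, Method -> "FinestGrained" would be the setting to try instead.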




Up until version 9, Mathematica launched twice as many subkernels as there were physical cores when the CPU had HyperThreading ($ProcessorCount). This typically increases communication overhead, but does not always improve computation performance. Sometimes it's better to use only as many subkernels as there are physical cores. (The optimal number differs from case to case.)
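To experiment with the number of subkernels, one possible approach is to close the running kernels and launch a specific number. The value 2 below is purely illustrative; substitute the number of physical cores on your machine.

```mathematica
CloseKernels[];      (* shut down any subkernels that are currently running *)
LaunchKernels[2];    (* launch exactly 2 local subkernels; 2 is an illustrative value *)
$KernelCount         (* number of subkernels now available *)
```

Timing the same computation with AbsoluteTiming for a few different kernel counts is a simple way to find the best setting for a given problem.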


3.2 Improperly distributed definitions



Since subkernels are completely separate Mathematica processes, all the definitions that are used in the parallel calculation need to be distributed (sent) to subkernels. This must be done manually in version 7, while it's mostly automatic in later versions.


In some cases it does not happen automatically. The incorrectly parallelized code will then still return the correct result, but will run slowly.


Example: ParallelEvaluate does not automatically distribute definitions. The following code returns the expected result:


f[] := RandomReal[]
ParallelEvaluate[f[]]

What happens is that f[] is evaluated on each subkernel, and the results are returned as a list. Since f has no associated definition on subkernels, f[] is returned unevaluated by each subkernel, and the main kernel receives the list {f[], f[], ..., f[]}. This list is then further evaluated on the main kernel to a list of random numbers. Notice that all the calculation will happen on the main kernel. This computation doesn't really run in parallel. The solution is to use DistributeDefinitions[f].
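One way to check where the evaluation actually happens is to return $KernelID, which is 0 on the main kernel and positive on subkernels. The function g below is a hypothetical diagnostic helper, not part of the original example.

```mathematica
g[] := $KernelID

(* No definition on the subkernels yet: g[] comes back unevaluated and is evaluated on the main kernel *)
before = ParallelEvaluate[g[]]

DistributeDefinitions[g];

(* Now g[] is evaluated where it should be, on each subkernel *)
after = ParallelEvaluate[g[]]
```

Under the behavior described above, before should contain only zeros, while after should contain the IDs of the subkernels.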


3.3 Make sure packages are loaded in subkernels


This is closely related to the previous point. Functions from packages loaded into the main kernel are not automatically distributed to subkernels. If you use any packages in the parallel code, make sure they are loaded into the subkernels using ParallelNeeds.


Warning: In certain cases the parallelized code appears to work even without loading the packages in the subkernels, but will be much slower. What actually happens is completely analogous to the example from the previous point: functions are returned unevaluated from the subkernels, and will subsequently get evaluated on the main kernel.



Loading custom packages: To load a custom package from the current directory of the main kernel, make sure that the current directory of the subkernels is the same as the current directory of the main kernel:


With[{d = Directory[]}, ParallelEvaluate[SetDirectory[d]]]

If you set a custom $Path in init.m, it won't take effect in subkernels. To make subkernels use the same $Path as the main kernel, use


With[{p = $Path}, ParallelEvaluate[$Path = p]];

3.4 Known bugs affecting parallel performance




  • Packed arrays get temporarily unpacked when sent back to the main kernel (reference). This affects performance when large packed arrays are sent back. See the linked reference for a workaround.





  • There are certain functions which lose performance when evaluated on subkernels (ref1, ref2).


    Some functions known to be affected: Rule, InterpolatingFunction.


    Workaround: re-evaluate the affected expression as expression = expression on the subkernels. This is described in the last entry under the Possible Issues for DistributeDefinitions.
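A minimal sketch of this workaround for an InterpolatingFunction might look as follows. The name ifun and the sample data are illustrative assumptions.

```mathematica
ifun = Interpolation[Table[{x, Sin[x]}, {x, 0., 10., 0.1}]];
DistributeDefinitions[ifun];

(* Re-evaluate the definition on each subkernel, as expression = expression *)
ParallelEvaluate[ifun = ifun;];

vals = ParallelTable[ifun[x], {x, 0., 10., 0.5}];
```

Whether this makes a measurable difference depends on the version and on the function involved, so compare timings with and without the re-evaluation step.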



