I always assumed that distributing a computation is faster, but that isn't necessarily true. When I do Sum[i, {i, 10^4}] I get an answer much faster than when I do ParallelSum[i, {i, 10^4}]. Why is this the case? Is there a rule, even a rough one, for when I should compute in parallel and when I should stick to a single core?
Answer
Sum, like Integrate, does some symbolic processing. For instance, your sum with an indefinite end point n returns a closed-form formula:
Sum[i, {i, n}]
(* 1/2 n (1 + n) *)
ParallelSum will do the actual summation, one term at a time.
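You can check this on your own machine by timing the two forms from the question directly. (The timings are only illustrative; note also that the first parallel computation launches the subkernels, so run it twice if you want to separate the launch cost from the summation itself.)
AbsoluteTiming[Sum[i, {i, 10^4}]]          (* uses the closed form: essentially instant *)
AbsoluteTiming[ParallelSum[i, {i, 10^4}]]  (* distributes the range and adds terms one at a time *)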
There is overhead in parallelization. Often a significant bottleneck is the amount of data that has to be transferred between the master kernel and the subkernels. Another slowdown is the time it takes to set up the parallel computation. Neither of these is the real issue here, though: no matter how big I make n, I can't seem to make Sum take more than 0.1 seconds on a fresh kernel. After the first run, the symbolic result is cached and the result (for any n) is returned about 400 times faster.
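Here is a quick way to observe the caching; the exact speed-up ratio will vary by machine and version. (Quit[] goes in its own cell, as elsewhere in this answer.)
Quit[]

First@AbsoluteTiming[Sum[i, {i, 10^9}]]   (* first call: derives the closed form *)
First@AbsoluteTiming[Sum[i, {i, 10^12}]]  (* later calls: cached, much faster even for a different n *)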
To answer the second question, I do not know of either a precise or a rough calculation that will tell you when parallelization results in a speed-up. Consider a generic sum, where n0 below is an actual integer, such as 10000, and not a Symbol:
Sum[f[i], {i, n0}]
One consideration is how long it takes to calculate f[i]: the longer it takes, the more likely it is that ParallelSum will be faster. Another is how big n0 is: the bigger n0 is, the more likely it is worth the time it takes to set up the parallel computation.
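As a back-of-the-envelope sketch (my own rule of thumb, nothing built in), one can compare the measured serial work against the measured overhead of a trivial parallel computation. The names perTerm, overhead, and n0, the stand-in summand, and the safety factor of 10 are all assumptions for illustration:
f[i_?NumericQ] := Sqrt[i];  (* hypothetical stand-in; substitute your actual summand *)
perTerm = First@AbsoluteTiming[Do[f[i], {i, 1000}]]/1000;  (* average cost of one term *)
overhead = First@AbsoluteTiming[ParallelSum[0., {i, 8}]];  (* fixed cost of a near-empty parallel run *)
n0 = 10000;
n0 perTerm > 10 overhead  (* True suggests ParallelSum is plausibly worth trying *)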
Examples
One way to prevent symbolic processing is to define one's own function using ?NumericQ.
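For instance (h is just an illustrative name), the pattern test keeps the summand from evaluating symbolically, which forces Sum to add the terms one at a time:
h[i_?NumericQ] := i^2;

Sum[h[i], {i, n}]  (* returns unevaluated: n is symbolic, so h[i] never fires *)
Sum[h[i], {i, 5}]  (* 55: with a numeric range each term is computed directly *)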
Slow function
Here the computation time is simulated with Pause[0.001]. Even on a small number of terms, ParallelSum is faster. (Timings were made on a 4-core/8-virtual-core 2.7 GHz i7 MacBook Pro.) It's important to start with fresh kernels, since some results and some of the parallelization set-up are cached.
Quit[]

f[i_?NumericQ] := (Pause[0.001]; i);  (* each term takes ~1 ms to compute *)
Table[{Sum[1. f[i], {i, 2^n}] // AbsoluteTiming // First,
   CloseKernels[]; LaunchKernels[];  (* relaunch so the set-up cost is charged to every run *)
   ParallelSum[1. f[i], {i, 2^n}] // AbsoluteTiming // First},
  {n, 6, 15}] // Grid
(*  Sum         ParallelSum
    0.075313    0.037028
    0.151674    0.049317
    0.299712    0.049672
    0.589223    0.111681
    1.179922    0.179192
    2.336402    0.500043
    4.795604    0.833306
    9.600580    1.740492
   19.218265    2.986417
   38.453306    5.214645  *)
Number of terms
Here it takes a fairly large number of terms before ParallelSum begins to run faster.
Quit[]

g[i_?NumericQ] := i;  (* a trivially cheap summand *)
Table[{Sum[1. g[i], {i, 2^n}] // AbsoluteTiming // First,
   CloseKernels[]; LaunchKernels[];
   ParallelSum[1. g[i], {i, 2^n}] // AbsoluteTiming // First},
  {n, 11, 20}] // Grid
(*  Sum         ParallelSum
    0.002350    0.032552
    0.004389    0.114484
    0.008307    0.044456
    0.016554    0.049290
    0.033395    0.064034
    0.067941    0.089265
    0.133811    0.112625
    0.275909    0.158116
    0.554793    0.407610
    1.123326    0.504677  *)
In short, I think a certain amount of testing is necessary to figure out each case precisely. For a one-time computation, that testing may or may not be worth your while, and an educated guess might be sufficient. For a program in which the computation will be run repeatedly, it is probably worth working out.