I always assumed that distributing a computation is faster, but that isn't necessarily true. When I do Sum[i, {i, 10^4}] I get an answer much faster than when I do ParallelSum[i, {i, 10^4}]. Why is this the case? Is there a rule, even a rough one, for when I should compute in parallel and when I should stick to a single core?
Answer
Sum, like Integrate, does some symbolic processing. For instance, your sum with an indefinite end point n returns a closed-form formula:
Sum[i, {i, n}]
(* 1/2 n (1 + n) *)
ParallelSum will do the actual summation, one term at a time.
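You can check this on your own machine by timing the two forms from the question directly. (The timings are only illustrative; note also that the first parallel computation launches the subkernels, so run it twice if you want to separate the launch cost from the summation itself.)
AbsoluteTiming[Sum[i, {i, 10^4}]]          (* uses the closed form: essentially instant *)
AbsoluteTiming[ParallelSum[i, {i, 10^4}]]  (* distributes the range and adds terms one at a time *)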
There is overhead in parallelization. Often a significant bottleneck is the amount of data that has to be transferred between the master kernel and the subkernels. Another slowdown is the time it takes to set up the parallel computation. Neither of these is the real issue here, though: no matter how big I make n, I can't seem to make Sum take more than 0.1 seconds on a fresh kernel. After the first run, the symbolic result is cached and the result (for any n) is returned about 400 times faster.
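Here is a quick way to observe the caching; the exact speed-up ratio will vary by machine and version. (Quit[] goes in its own cell, as elsewhere in this answer.)
Quit[]

First@AbsoluteTiming[Sum[i, {i, 10^9}]]   (* first call: derives the closed form *)
First@AbsoluteTiming[Sum[i, {i, 10^12}]]  (* later calls: cached, much faster even for a different n *)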
To answer the second question, I do not know of either a precise or a rough calculation that will tell you when parallelization results in a speed-up. Consider a generic sum, where n0 below is an actual integer, such as 10000, and not a Symbol:
Sum[f[i], {i, n0}]
One consideration is how long it takes to calculate f[i]: the longer it takes, the more likely it is that ParallelSum will be faster. Another is how big n0 is: the bigger n0 is, the more likely it is worth the time it takes to set up the parallel computation.
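As a back-of-the-envelope sketch (my own rule of thumb, nothing built in), one can compare the measured serial work against the measured overhead of a trivial parallel computation. The names perTerm, overhead, and n0, the stand-in summand, and the safety factor of 10 are all assumptions for illustration:
f[i_?NumericQ] := Sqrt[i];  (* hypothetical stand-in; substitute your actual summand *)
perTerm = First@AbsoluteTiming[Do[f[i], {i, 1000}]]/1000;  (* average cost of one term *)
overhead = First@AbsoluteTiming[ParallelSum[0., {i, 8}]];  (* fixed cost of a near-empty parallel run *)
n0 = 10000;
n0 perTerm > 10 overhead  (* True suggests ParallelSum is plausibly worth trying *)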
Examples
One way to prevent symbolic processing is to define one's own function using ?NumericQ.
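For instance (h is just an illustrative name), the pattern test keeps the summand from evaluating symbolically, which forces Sum to add the terms one at a time:
h[i_?NumericQ] := i^2;

Sum[h[i], {i, n}]  (* returns unevaluated: n is symbolic, so h[i] never fires *)
Sum[h[i], {i, 5}]  (* 55: with a numeric range each term is computed directly *)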
Slow function
Here the computation time is simulated with Pause[0.001]. Even on a small number of terms, ParallelSum is faster. (Timings were made on a 4-core/8-virtual-core 2.7 GHz i7 MacBook Pro.) It's important to start with fresh kernels, since some results and some of the parallelization set-up are cached.
Quit[]

f[i_?NumericQ] := (Pause[0.001]; i);  (* each term takes ~1 ms to compute *)
Table[{Sum[1. f[i], {i, 2^n}] // AbsoluteTiming // First,
   CloseKernels[]; LaunchKernels[];  (* relaunch so the set-up cost is charged to every run *)
   ParallelSum[1. f[i], {i, 2^n}] // AbsoluteTiming // First},
  {n, 6, 15}] // Grid
(*  Sum         ParallelSum
    0.075313    0.037028
    0.151674    0.049317
    0.299712    0.049672
    0.589223    0.111681
    1.179922    0.179192
    2.336402    0.500043
    4.795604    0.833306
    9.600580    1.740492
   19.218265    2.986417
   38.453306    5.214645  *)
Number of terms
Here it takes a fairly large number of terms before ParallelSum begins to run faster.
Quit[]

g[i_?NumericQ] := i;  (* a trivially cheap summand *)
Table[{Sum[1. g[i], {i, 2^n}] // AbsoluteTiming // First,
   CloseKernels[]; LaunchKernels[];
   ParallelSum[1. g[i], {i, 2^n}] // AbsoluteTiming // First},
  {n, 11, 20}] // Grid
(*  Sum         ParallelSum
    0.002350    0.032552
    0.004389    0.114484
    0.008307    0.044456
    0.016554    0.049290
    0.033395    0.064034
    0.067941    0.089265
    0.133811    0.112625
    0.275909    0.158116
    0.554793    0.407610
    1.123326    0.504677  *)
In short, I think a certain amount of testing is necessary to figure out each case precisely. For a one-time computation, that testing may or may not be worth your while, and an educated guess might be sufficient. For a program in which the computation will be run repeatedly, it is probably worth working out.