Skip to main content

performance tuning - ParallelTable 70 times slower on 16 cores than Table on single core


Consider these code(sorry it's messy), why the ParallelTable version is 70 times slower than the Table version?


Quit[]
Clear["`*"]
$Version
(*
==> "8.0 for Linux x86 (64-bit) (February 23, 2011)"
*)
Kernels[]

(*
==> {}
*)
a = RandomReal[{0., 1.}, {401, 300000}];
b = RandomReal[{0., 1.}, {401, 300000}];
Developer`PackedArrayQ /@ {a, b}
(*
==> {True, True}
*)


ht = 2*0.375*^-9;
DFT[A_, ht_] :=
RotateRight[
ht/Sqrt[2 \[Pi]]*
Fourier[RotateLeft[A, Length[A]/2 - 1],
FourierParameters -> {1, 1}], Length[A]/2 - 1];

SmoothDFT[A_, ht_, n_] :=
DFT[Table[0., {(n - 1)*Length[A]/2}]~Join~A~Join~
Table[0., {(n - 1)*Length[A]/2}], ht];


SelectbyWRange[A_, {WMin_, WMax_}, {TakeWMin_, TakeWMax_}] :=
Module[{lthA, nMax, nMin}, lthA = Length[A];
nMin = Round[-((-WMax + lthA WMin)/(
WMax - WMin)) - ((1 - lthA) TakeWMin)/(WMax - WMin)];
nMax = Round[-((-WMax + lthA WMin)/(
WMax - WMin)) - ((1 - lthA) TakeWMax)/(WMax - WMin)];
Transpose[{Table[
TakeWMin + n *(TakeWMax - TakeWMin)/(nMax - nMin), {n, 0,
nMax - nMin}], Take[A, {nMin, nMax}]}]

]

Smtx1 =
Table[SelectbyWRange[-Im[
SmoothDFT[a[[n]], ht, 2]*
Conjugate[SmoothDFT[b[[n]], ht, 2]]], {-834., 834.}, {19.5,
20.5}], {n, 1, 2}]; // AbsoluteTiming

(*
==> {0.404922, Null}

*)
Kernels[]
(*
==> {}
*)
LaunchKernels[]
(*
==> {KernelObject[1, "local"], KernelObject[2, "local"],
KernelObject[3, "local"], KernelObject[4, "local"],
KernelObject[5, "local"], KernelObject[6, "local"],

KernelObject[7, "local"], KernelObject[8, "local"],
KernelObject[9, "local"], KernelObject[10, "local"],
KernelObject[11, "local"], KernelObject[12, "local"],
KernelObject[13, "local"], KernelObject[14, "local"],
KernelObject[15, "local"], KernelObject[16, "local"]}
*)
Smtx2 =
ParallelTable[
SelectbyWRange[-Im[
SmoothDFT[a[[n]], ht, 2]*

Conjugate[SmoothDFT[b[[n]], ht, 2]]], {-834., 834.}, {19.5,
20.5}], {n, 1, 2}]; // AbsoluteTiming

(*
==> {25.459674, Null}
*)

Note that I have 16 cores on the node and the table have only 2 elements, i.e. {n,1,2}, even if I change to {n,1,16}, the ParallelTable version is still 10 times slower than the Table version. If it is the overhead, why it has a such huge overhead? Thanks.


These are the screenshots:


enter image description here enter image description here



Update


1.As OleksandrR point out that there is no DistributeDefinitions, but in the documentation of ParallelTable it promises to automatically distribute the calculations (version 9):



ParallelTable is a parallel version of Table which automatically distributes different evaluations of expr among different kernels and processors.


The default value is DistributedContexts:>$DistributedContexts with $DistributedContexts:=$Context, which distributes definitions of all symbols in the current context, but does not distribute definitions of symbols from packages.



2.OleksandrR also gave an excellent analysis in this post, pointing out that the performance problem is the MemberQ function.


Indeed if we turn on the unpack warning, we can see it unpacks the array in call to MemberQ


On["Packing"];
Smtx2 = AbsoluteTiming[ParallelTable[

SelectbyWRange[-Im[SmoothDFT[a[[n]], ht, 2]*
Conjugate[SmoothDFT[b[[n]], ht, 2]]], {-834., 834.}, {19.5, 20.5}], {n, 1, 2}];]


Developer`FromPackedArray::unpack: Unpacking array in call to MemberQ. >>



(*{25.557433, Null}*)

However a second evaluation is much faster even the array still unpacks


ClearSystemCache[]

Smtx2 = AbsoluteTiming[ParallelTable[
SelectbyWRange[-Im[SmoothDFT[a[[n]], ht, 2]*
Conjugate[SmoothDFT[b[[n]], ht, 2]]], {-834., 834.}, {19.5, 20.5}], {n, 1, 2}];]


Developer`FromPackedArray::unpack: Unpacking array in call to MemberQ. >>



(*{0.156995, Null}*)

If we use the temporary fix of MemberQ proposed by Szabolcs in the same post, then the unpack warning is gone, but the evaluation is still slow.



Quit[]
On["Packing"];

(*need to reevaluate all the definition code above*)

memberQ[list_, form_] := Or @@ (MatchQ[#, form] & /@ list)
ClearAll[fix]
SetAttributes[fix, HoldAll]
fix[expr_] := Block[{MemberQ = memberQ}, expr]


Smtx2 = fix@
AbsoluteTiming[ParallelTable[SelectbyWRange[-Im[
SmoothDFT[a[[n]], ht, 2]*
Conjugate[SmoothDFT[b[[n]], ht, 2]]], {-834., 834.}, {19.5,
20.5}], {n, 1, 2}];]
(*{25.192359,Null}*)

So how to fix this?




Comments

Popular posts from this blog

front end - keyboard shortcut to invoke Insert new matrix

I frequently need to type in some matrices, and the menu command Insert > Table/Matrix > New... allows matrices with lines drawn between columns and rows, which is very helpful. I would like to make a keyboard shortcut for it, but cannot find the relevant frontend token command (4209405) for it. Since the FullForm[] and InputForm[] of matrices with lines drawn between rows and columns is the same as those without lines, it's hard to do this via 3rd party system-wide text expanders (e.g. autohotkey or atext on mac). How does one assign a keyboard shortcut for the menu item Insert > Table/Matrix > New... , preferably using only mathematica? Thanks! Answer In the MenuSetup.tr (for linux located in the $InstallationDirectory/SystemFiles/FrontEnd/TextResources/X/ directory), I changed the line MenuItem["&New...", "CreateGridBoxDialog"] to read MenuItem["&New...", "CreateGridBoxDialog", MenuKey["m", Modifiers-...

How to thread a list

I have data in format data = {{a1, a2}, {b1, b2}, {c1, c2}, {d1, d2}} Tableform: I want to thread it to : tdata = {{{a1, b1}, {a2, b2}}, {{a1, c1}, {a2, c2}}, {{a1, d1}, {a2, d2}}} Tableform: And I would like to do better then pseudofunction[n_] := Transpose[{data2[[1]], data2[[n]]}]; SetAttributes[pseudofunction, Listable]; Range[2, 4] // pseudofunction Here is my benchmark data, where data3 is normal sample of real data. data3 = Drop[ExcelWorkBook[[Column1 ;; Column4]], None, 1]; data2 = {a #, b #, c #, d #} & /@ Range[1, 10^5]; data = RandomReal[{0, 1}, {10^6, 4}]; Here is my benchmark code kptnw[list_] := Transpose[{Table[First@#, {Length@# - 1}], Rest@#}, {3, 1, 2}] &@list kptnw2[list_] := Transpose[{ConstantArray[First@#, Length@# - 1], Rest@#}, {3, 1, 2}] &@list OleksandrR[list_] := Flatten[Outer[List, List@First[list], Rest[list], 1], {{2}, {1, 4}}] paradox2[list_] := Partition[Riffle[list[[1]], #], 2] & /@ Drop[list, 1] RM[list_] := FoldList[Transpose[{First@li...

plotting - How to draw lines between specified dots on ListPlot?

I would like to create a plot where I have unconnected dots and some connected. So far, I have figured out how to draw the dots. My code is the following: ListPlot[{{1, 1}, {2, 2}, {3, 3}, {4, 4}, {1, 4}, {2, 5}, {3, 6}, {4, 7}, {1, 7}, {2, 8}, {3, 9}, {4, 10}, {1, 10}, {2, 11}, {3, 12}, {4,13}, {2.5, 7}}, Ticks -> {{1, 2, 3, 4}, None}, AxesStyle -> Thin, TicksStyle -> Directive[Black, Bold, 12], Mesh -> Full] I have thought using ListLinePlot command, but I don't know how to specify to the command to draw only selected lines between the dots. Do have any suggestions/hints on how to do that? Thank you. Answer One possibility would be to use Epilog with Line : ListPlot[ {{1, 1}, {2, 2}, {3, 3}, {4, 4}, {1, 4}, {2, 5}, {3, 6}, {4, 7}, {1, 7}, {2, 8}, {3, 9}, {4, 10}, {1, 10}, {2, 11}, {3, 12}, {4, 13}, {2.5, 7}}, Ticks -> {{1, 2, 3, 4}, None}, AxesStyle -> Thin, TicksStyle -> Directive[Black, Bold, 12], Mesh -> Full, Epilog -> { Line[ ...