Skip to main content

performance tuning - Why is CompilationTarget -> C slower than directly writing with C?



Probably a hard question, but I think it's better to cry out loud.


I've hesitated for a while about whether I should post this in StackOverflow with a c tag or not, but finally decide to keep it here.


This question can be viewed as a follow up of Has this implementation of FDM touched the speed limit of Mathematica?. In the answer under that post, Daniel managed to implement a compiled Mathematica function that's almost as fast (to be more precise, 3/4 as fast) as the one directly implementing with C++, with the help of devectorization,CompilationTarget -> "C", RuntimeOptions -> "Speed" and Compile`GetElement. Since then, this combination has been tested in various samples, and turns out to be quite effective in speeding up CompiledFunction that involves a lot of array element accessing. I do benefit a lot from this technique, nevertheless in the mean time, another question never disappear in my mind, that is:


Why is the CompiledFunction created with the combination above still slower than the one directly writing with C++?


To make the question more clear and answerable, let's use a simpler example. In the answers under this post about Laplacian of a matrix, I create the following function with the technique above:


cLa = Hold@Compile[{{z, _Real, 2}}, 
Module[{d1, d2}, {d1, d2} = Dimensions@z;
Table[z[[i + 1, j]] + z[[i, j + 1]] + z[[i - 1, j]] + z[[i, j - 1]] -
4 z[[i, j]], {i, 2, d1 - 1}, {j, 2, d2 - 1}]], CompilationTarget -> C,
RuntimeOptions -> "Speed"] /. Part -> Compile`GetElement // ReleaseHold;


and Shutao create one with LibraryLink (which is almost equivalent to writing code directly with C):


src = "
#include \"WolframLibrary.h\"

DLLEXPORT int laplacian(WolframLibraryData libData, mint Argc, MArgument *Args, \
MArgument Res) {
MTensor tensor_A, tensor_B;
mreal *a, *b;
mint const *A_dims;

mint n;
int err;
mint dims[2];
mint i, j, idx;
tensor_A = MArgument_getMTensor(Args[0]);
a = libData->MTensor_getRealData(tensor_A);
A_dims = libData->MTensor_getDimensions(tensor_A);
n = A_dims[0];
dims[0] = dims[1] = n - 2;
err = libData->MTensor_new(MType_Real, 2, dims, &tensor_B);

b = libData->MTensor_getRealData(tensor_B);
for (i = 1; i <= n - 2; i++) {
for (j = 1; j <= n - 2; j++) {
idx = n*i + j;
b[idx+1-2*i-n] = a[idx-n] + a[idx-1] + a[idx+n] + a[idx+1] - 4*a[idx];
}
}
MArgument_setMTensor(Res, tensor_B);
return LIBRARY_NO_ERROR;
}

";
Needs["CCompilerDriver`"]
lib = CreateLibrary[src, "laplacian"];

lapShutao = LibraryFunctionLoad[lib, "laplacian", {{Real, 2}}, {Real, 2}];

and the following is the benchmark by anderstood:


enter image description here


Why cLa is slower than lapShutao?


Do we really touch the speed limit of Mathematica this time?



Answer(s) addressing the reason for the inferiority of cLa or improving the speed of cLa are both welcomed.





…OK, the example above turns out to be special, as mentioned in the comment below, cLa will be as fast as lapShutao if we extract the LibraryFunction inside it:


cLaCore = cLa[[-1]];

mat = With[{n = 5000}, RandomReal[1, {n, n}]];

cLaCore@mat; // AbsoluteTiming
(* {0.269556, Null} *)


lapShutao@mat; // AbsoluteTiming
(* {0.269062, Null} *)

However, the effect of this trick is remarkable only if the output is memory consuming.


Since I've chosen such a big title for my question, I somewhat feel responsible to add a more general example. The following is the fastest 1D FDTD implementation in Mathematica so far:


fdtd1d = ReleaseHold@
With[{ie = 200, cg = Compile`GetElement},
Hold@Compile[{{steps, _Integer}},
Module[{ez = Table[0., {ie + 1}], hy = Table[0., {ie}]},

Do[
Do[ez[[j]] += hy[[j]] - hy[[j - 1]], {j, 2, ie}];
ez[[1]] = Sin[n/10.];
Do[hy[[j]] += ez[[j + 1]] - ez[[j]], {j, 1, ie}], {n, steps}]; ez],
"CompilationTarget" -> "C", "RuntimeOptions" -> "Speed"] /. Part -> cg /.
HoldPattern@(h : Set | AddTo)[cg@a__, b_] :> h[Part@a, b]];

fdtdcore = fdtd1d[[-1]];

and the following is an implemenation via LibraryLink (which is almost equivalent to writing code directly with C):



str = "#include \"WolframLibrary.h\"
#include

DLLEXPORT int fdtd1d(WolframLibraryData libData, mint Argc, MArgument *Args, MArgument \
Res){
MTensor tensor_ez;
double *ez;
int i,t;
const int ie=200,steps=MArgument_getInteger(Args[0]);
const mint dimez=ie+1;

double hy[ie];

libData->MTensor_new(MType_Real, 1, &dimez, &tensor_ez);
ez = libData->MTensor_getRealData(tensor_ez);

for(i=0;i for(i=0;i
for(t=1;t<=steps;t++){
for(i=1;i
ez[0]=sin(t/10.);
for(i=0;i }

MArgument_setMTensor(Res, tensor_ez);
return 0;}
";

fdtdlib = CreateLibrary[str, "fdtd"];
fdtdc = LibraryFunctionLoad[fdtdlib, "fdtd1d", {Integer}, {Real, 1}];


test = fdtdcore[10^6]; // AbsoluteTiming
(* {0.551254, Null} *)
testc = fdtdc[10^6]; // AbsoluteTiming
(* {0.261192, Null} *)

As one can see, the algorithms in both pieces of code are the same, but fdtdc is twice as fast as fdtdcore. (Well, the speed difference is larger than two years ago, the reason might be I'm no longer on a 32 bit machine. )


My C compiler is TDM-GCC 4.9.2, with "SystemCompileOptions"->"-Ofast" set in Mathematica.




Comments

Popular posts from this blog

functions - Get leading series expansion term?

Given a function f[x] , I would like to have a function leadingSeries that returns just the leading term in the series around x=0 . For example: leadingSeries[(1/x + 2)/(4 + 1/x^2 + x)] x and leadingSeries[(1/x + 2 + (1 - 1/x^3)/4)/(4 + x)] -(1/(16 x^3)) Is there such a function in Mathematica? Or maybe one can implement it efficiently? EDIT I finally went with the following implementation, based on Carl Woll 's answer: lds[ex_,x_]:=( (ex/.x->(x+O[x]^2))/.SeriesData[U_,Z_,L_List,Mi_,Ma_,De_]:>SeriesData[U,Z,{L[[1]]},Mi,Mi+1,De]//Quiet//Normal) The advantage is, that this one also properly works with functions whose leading term is a constant: lds[Exp[x],x] 1 Answer Update 1 Updated to eliminate SeriesData and to not return additional terms Perhaps you could use: leadingSeries[expr_, x_] := Normal[expr /. x->(x+O[x]^2) /. a_List :> Take[a, 1]] Then for your examples: leadingSeries[(1/x + 2)/(4 + 1/x^2 + x), x] leadingSeries[Exp[x], x] leadingSeries[(1/x + 2 + (1 - 1/x...

How to thread a list

I have data in format data = {{a1, a2}, {b1, b2}, {c1, c2}, {d1, d2}} Tableform: I want to thread it to : tdata = {{{a1, b1}, {a2, b2}}, {{a1, c1}, {a2, c2}}, {{a1, d1}, {a2, d2}}} Tableform: And I would like to do better then pseudofunction[n_] := Transpose[{data2[[1]], data2[[n]]}]; SetAttributes[pseudofunction, Listable]; Range[2, 4] // pseudofunction Here is my benchmark data, where data3 is normal sample of real data. data3 = Drop[ExcelWorkBook[[Column1 ;; Column4]], None, 1]; data2 = {a #, b #, c #, d #} & /@ Range[1, 10^5]; data = RandomReal[{0, 1}, {10^6, 4}]; Here is my benchmark code kptnw[list_] := Transpose[{Table[First@#, {Length@# - 1}], Rest@#}, {3, 1, 2}] &@list kptnw2[list_] := Transpose[{ConstantArray[First@#, Length@# - 1], Rest@#}, {3, 1, 2}] &@list OleksandrR[list_] := Flatten[Outer[List, List@First[list], Rest[list], 1], {{2}, {1, 4}}] paradox2[list_] := Partition[Riffle[list[[1]], #], 2] & /@ Drop[list, 1] RM[list_] := FoldList[Transpose[{First@li...

front end - keyboard shortcut to invoke Insert new matrix

I frequently need to type in some matrices, and the menu command Insert > Table/Matrix > New... allows matrices with lines drawn between columns and rows, which is very helpful. I would like to make a keyboard shortcut for it, but cannot find the relevant frontend token command (4209405) for it. Since the FullForm[] and InputForm[] of matrices with lines drawn between rows and columns is the same as those without lines, it's hard to do this via 3rd party system-wide text expanders (e.g. autohotkey or atext on mac). How does one assign a keyboard shortcut for the menu item Insert > Table/Matrix > New... , preferably using only mathematica? Thanks! Answer In the MenuSetup.tr (for linux located in the $InstallationDirectory/SystemFiles/FrontEnd/TextResources/X/ directory), I changed the line MenuItem["&New...", "CreateGridBoxDialog"] to read MenuItem["&New...", "CreateGridBoxDialog", MenuKey["m", Modifiers-...