
performance tuning - Why is CompilationTarget -> C slower than directly writing with C?



This is probably a hard question, but I think it's better to ask it out loud.


I hesitated for a while about whether I should post this on Stack Overflow with a c tag, but finally decided to keep it here.


This question can be viewed as a follow-up to Has this implementation of FDM touched the speed limit of Mathematica?. In the answer under that post, Daniel managed to implement a compiled Mathematica function that is almost as fast (to be more precise, 3/4 as fast) as one written directly in C++, with the help of devectorization, CompilationTarget -> "C", RuntimeOptions -> "Speed" and Compile`GetElement. Since then, this combination has been tested on various samples and turns out to be quite effective at speeding up CompiledFunctions that involve a lot of array element accessing. I have benefited a lot from this technique; nevertheless, in the meantime another question has never left my mind:


Why is the CompiledFunction created with the combination above still slower than the one written directly in C++?


To make the question clearer and more answerable, let's use a simpler example. In the answers under this post about the Laplacian of a matrix, I created the following function with the technique above:


cLa = Hold@Compile[{{z, _Real, 2}},
     Module[{d1, d2}, {d1, d2} = Dimensions@z;
      Table[z[[i + 1, j]] + z[[i, j + 1]] + z[[i - 1, j]] + z[[i, j - 1]] -
        4 z[[i, j]], {i, 2, d1 - 1}, {j, 2, d2 - 1}]], CompilationTarget -> "C",
     RuntimeOptions -> "Speed"] /. Part -> Compile`GetElement // ReleaseHold;


and Shutao created one with LibraryLink (which is almost equivalent to writing the code directly in C):


src = "
#include \"WolframLibrary.h\"

DLLEXPORT int laplacian(WolframLibraryData libData, mint Argc, MArgument *Args, \
MArgument Res) {
MTensor tensor_A, tensor_B;
mreal *a, *b;
mint const *A_dims;

mint n;
int err;
mint dims[2];
mint i, j, idx;
tensor_A = MArgument_getMTensor(Args[0]);
a = libData->MTensor_getRealData(tensor_A);
A_dims = libData->MTensor_getDimensions(tensor_A);
n = A_dims[0];
dims[0] = dims[1] = n - 2;
err = libData->MTensor_new(MType_Real, 2, dims, &tensor_B);

b = libData->MTensor_getRealData(tensor_B);
for (i = 1; i <= n - 2; i++) {
for (j = 1; j <= n - 2; j++) {
idx = n*i + j;
b[idx+1-2*i-n] = a[idx-n] + a[idx-1] + a[idx+n] + a[idx+1] - 4*a[idx];
}
}
MArgument_setMTensor(Res, tensor_B);
return LIBRARY_NO_ERROR;
}

";
Needs["CCompilerDriver`"]
lib = CreateLibrary[src, "laplacian"];

lapShutao = LibraryFunctionLoad[lib, "laplacian", {{Real, 2}}, {Real, 2}];

and the following is the benchmark by anderstood:


[Benchmark plot by anderstood comparing the timings of cLa and lapShutao; lapShutao is the faster of the two.]
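(Since the plot itself is not reproduced here, the following is a minimal sketch of how such a benchmark could be rerun; the matrix sizes are my own choice and no timings are claimed.)

timings = Table[
   With[{mat = RandomReal[1, {n, n}]},
    {n, First@AbsoluteTiming[cLa@mat;], First@AbsoluteTiming[lapShutao@mat;]}],
   {n, {1000, 2000, 4000}}];
TableForm[timings, TableHeadings -> {None, {"n", "cLa (s)", "lapShutao (s)"}}]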


Why is cLa slower than lapShutao?


Have we really touched the speed limit of Mathematica this time?



Answers addressing the reason for the inferiority of cLa, or improving the speed of cLa, are both welcome.





…OK, the example above turns out to be special. As mentioned in the comments below, cLa will be as fast as lapShutao if we extract the LibraryFunction inside it:


cLaCore = cLa[[-1]];

mat = With[{n = 5000}, RandomReal[1, {n, n}]];

cLaCore@mat; // AbsoluteTiming
(* {0.269556, Null} *)


lapShutao@mat; // AbsoluteTiming
(* {0.269062, Null} *)

However, the effect of this trick is significant only when the output consumes a lot of memory.
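(A hedged sketch, my addition: one way to probe this remark is to time the full CompiledFunction against the extracted LibraryFunction on a small matrix as well, where, per the statement above, the gap should mostly disappear. No specific numbers are claimed.)

small = RandomReal[1, {200, 200}];
AbsoluteTiming[Do[cLa@small, {10^3}]]     (* full CompiledFunction *)
AbsoluteTiming[Do[cLaCore@small, {10^3}]] (* extracted LibraryFunction *)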


Since I've chosen such a broad title for my question, I feel somewhat obliged to add a more general example. The following is the fastest 1D FDTD implementation in Mathematica so far:


fdtd1d = ReleaseHold@
With[{ie = 200, cg = Compile`GetElement},
Hold@Compile[{{steps, _Integer}},
Module[{ez = Table[0., {ie + 1}], hy = Table[0., {ie}]},

Do[
Do[ez[[j]] += hy[[j]] - hy[[j - 1]], {j, 2, ie}];
ez[[1]] = Sin[n/10.];
Do[hy[[j]] += ez[[j + 1]] - ez[[j]], {j, 1, ie}], {n, steps}]; ez],
"CompilationTarget" -> "C", "RuntimeOptions" -> "Speed"] /. Part -> cg /.
HoldPattern@(h : Set | AddTo)[cg@a__, b_] :> h[Part@a, b]];

fdtdcore = fdtd1d[[-1]];

and the following is an implementation via LibraryLink (which is almost equivalent to writing the code directly in C):



str = "#include \"WolframLibrary.h\"
#include

DLLEXPORT int fdtd1d(WolframLibraryData libData, mint Argc, MArgument *Args, MArgument \
Res){
MTensor tensor_ez;
double *ez;
int i,t;
const int ie=200,steps=MArgument_getInteger(Args[0]);
const mint dimez=ie+1;

double hy[ie];

libData->MTensor_new(MType_Real, 1, &dimez, &tensor_ez);
ez = libData->MTensor_getRealData(tensor_ez);

for(i=0;i for(i=0;i
for(t=1;t<=steps;t++){
for(i=1;i
ez[0]=sin(t/10.);
for(i=0;i }

MArgument_setMTensor(Res, tensor_ez);
return 0;}
";

fdtdlib = CreateLibrary[str, "fdtd"];
fdtdc = LibraryFunctionLoad[fdtdlib, "fdtd1d", {Integer}, {Real, 1}];


test = fdtdcore[10^6]; // AbsoluteTiming
(* {0.551254, Null} *)
testc = fdtdc[10^6]; // AbsoluteTiming
(* {0.261192, Null} *)

As one can see, the algorithms in both pieces of code are the same, but fdtdc is twice as fast as fdtdcore. (Well, the speed difference is larger than it was two years ago; the reason might be that I'm no longer on a 32-bit machine.)
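(As a sanity check, my addition rather than part of the original post: the two implementations should return the same field, up to floating-point rounding, which can be confirmed directly.)

Max@Abs[test - testc]
(* expected to be 0. or at machine-precision level *)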


My C compiler is TDM-GCC 4.9.2, with "SystemCompileOptions"->"-Ofast" set in Mathematica.
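(For completeness, a hedged sketch of how such a flag can be passed explicitly when building the LibraryLink versions above: "SystemCompileOptions" is an option that CreateLibrary accepts, while the library name "fdtdOfast" is hypothetical and whether the Compile-to-C path inherits a session-wide setting depends on how the session is configured.)

Needs["CCompilerDriver`"]
fdtdlibOfast = CreateLibrary[str, "fdtdOfast", "SystemCompileOptions" -> "-Ofast"]; (* hypothetical library name *)
fdtdcOfast = LibraryFunctionLoad[fdtdlibOfast, "fdtd1d", {Integer}, {Real, 1}];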




