Still, I'll use the implementation of the 1D FDTD method (you can simply understand it as a kind of explicit finite difference scheme for the Maxwell's equation) as the example. Just for completeness, here is the 1D Maxwell's equation:
$$\mu \frac{\partial H_y}{\partial t}=\frac{\partial E_z}{\partial x}$$ $$\epsilon \frac{\partial E_z}{\partial t}=\frac{\partial H_y}{\partial x}$$
and the corresponding finite difference equation:
$$H_y^{q+\frac{1}{2}}\left[m+\frac{1}{2}\right]\text{==}H_y^{q-\frac{1}{2}}\left[m+\frac{1}{2}\right]+\frac{\Delta _t}{\mu \Delta _x}\left(E_z^q[m+1]-E_z^q[m]\right)$$ $$E_z^{q+1}[m]==E_z^q[m]+\frac{\Delta _t}{\epsilon \Delta _x}\left(H_y^{q+\frac{1}{2}}\left[m+\frac{1}{2}\right]-H_y^{q+\frac{1}{2}}\left[m-\frac{1}{2}\right]\right)$$
The toy code I've repeatedly used in several posts implementing the difference scheme is:
ie = 200;
ez = ConstantArray[0., {ie + 1}];
hy = ConstantArray[0., {ie}];
fdtd1d = Compile[{{steps}},
Module[{ie = ie, ez = ez, hy = hy},
Do[ez[[2 ;; ie]] += (hy[[2 ;; ie]] - hy[[1 ;; ie - 1]]);
ez[[1]] = Sin[n/10];
hy[[1 ;; ie]] += (ez[[2 ;; ie + 1]] - ez[[1 ;; ie]]), {n,
steps}]; ez]];
result = fdtd1d[10000]; // AbsoluteTiming
Notice that constants like $\mu$, $\Delta _t$ are omitted for simplicity.
Personally I think it's a typical example for the implementation of finite difference method (FDM), so here's the question: has this piece of code (at least almost) touched the speed limit of Mathematica? In fact several months ago, I found that if the code is rewrited with Julia, it'll be 4 times faster.
Indeed, I know this might be the case that one should use the best-suited tool for a specific job, but since I've already gained some stupid pride on using Mathematica and am unwilling to spend time to learn a new programming language (Wolfram is almost my first programming language, I used to learn some VB, but already gave it back to my teacher when started to use Mathematica), I still want to make sure if the Mathematica version of the code can be faster.
If it's the limitation, I'd like to know why there's such a big difference.
Any help would be appreciated.
Answer
Okay, this is a bit of an embarassment.
Here is a very small modification of the original code. I simply made explicit option settings, made a denominator to Sin explicitly real, that kind of thing. My tests show the same timing as the original, give or take an iota.
ie = 200;
ez = ConstantArray[0., {ie + 1}];
hy = ConstantArray[0., {ie}];
fdtd1d = Compile[{{steps}},
Module[{ie = ie, ez = ez, hy = hy},
Do[ez[[2 ;; ie]] += (hy[[2 ;; ie]] - hy[[1 ;; ie - 1]]);
ez[[1]] = Sin[n/10.];
hy[[1 ;; ie]] += (ez[[2 ;; ie + 1]] - ez[[1 ;; ie]]), {n,
steps}]; ez],
CompilationOptions -> {"InlineExternalDefinitions" -> True},
"CompilationTarget" -> "C", "RuntimeOptions" -> "Speed"];
I'll beef up the example a factor of 10.
result = fdtd1d[100000]; // AbsoluteTiming
(* Out[172]= {0.555320, Null} *)
Now we remove the spans and replace with inner loops. No vectorization, that is.
fdtd1d2 = Compile[{{steps}}, Module[{ie = ie, ez = ez, hy = hy},
Do[
Do[ez[[j]] += (hy[[j]] - hy[[j - 1]]), {j, 2, ie}];
Do[ez[[1]] = Sin[n/10.];
hy[[j - 1]] += (ez[[j]] - ez[[j - 1]]), {j, 2, ie + 1}], {n,
steps}]; ez],
CompilationOptions -> {"InlineExternalDefinitions" -> True},
"CompilationTarget" -> "C", "RuntimeOptions" -> "Speed"];
result2 = fdtd1d2[100000]; // AbsoluteTiming
result2 == result
(* Out[174]= {0.179435, Null}
Out[175]= True *)
So that's a factor of 3. I guess I need to report this as a suggestion to revisit vector operations in Compile.
--- edit ---
It is of course slightly faster to take the assignment of ez[[1]] =... out of the second inner loop (sound of hand smacking forehead). Also it turns out to be slightly faster to reduce the index arithmetic in the second loop. I also took the assignment of the constant, ie, out of the Module variables and made it really constant using With; this does not seem to affect timing one way or the other.
fdtd1d3 = With[{ie = ie}, Compile[{{steps, _Integer}},
Module[
{ez = ez, hy = hy},
Do[
ez[[1]] = Sin[n/10.];
Do[ez[[j]] += (hy[[j]] - hy[[j - 1]]), {j, 2, ie}];
Do[
hy[[j]] += (ez[[j + 1]] - ez[[j]]), {j, 1, ie}],
{n, steps}]; ez],
CompilationOptions -> {"InlineExternalDefinitions" -> True},
"CompilationTarget" -> "C", "RuntimeOptions" -> "Speed"]];
result3 = fdtd1d3[100000]; // AbsoluteTiming
result3 === result2
(* Out[107]= {0.135636, Null}
Out[108]= True *)
So that's a modest improvement.
--- end edit ---
--- edit #2 ---
Per suggestion by @s0rce, we'll use Compile`GetElement instead of Part.
fdtd1d4 = Compile[{{steps, _Integer}},
Module[
{ie = ie, ez = ez, hy = hy},
Do[
ez[[1]] = Sin[n/10.];
Do[ez[[j]] += (Compile`GetElement[hy, j] -
Compile`GetElement[hy, j - 1]), {j, 2, ie}];
Do[
hy[[j]] += (Compile`GetElement[ez, j + 1] -
Compile`GetElement[ez, j]), {j, 1, ie}],
{n, steps}]; ez],
CompilationOptions -> {"InlineExternalDefinitions" -> True},
"CompilationTarget" -> "C", "RuntimeOptions" -> "Speed"];
result4 = fdtd1d4[100000]; // AbsoluteTiming
result4 === result2
(* Out[122]= {0.076532, Null}
Out[123]= True *)
Now that's progress.
--- end edit #2 ---
Comments
Post a Comment