Background: I am trying to port some Matlab code (a fluid simulation using the SPH method) to Mathematica, but the speed difference is huge.
Matlab code:
function s = initializeDensity2(s)
nTotal = s.params.nTotal; %# particles
h = s.params.h;
h2Sq = (2*h)^2;
for ind1 = 1:nTotal %loop over all receiving particles; one at a time
%particle i is the receiving particle; the host particle
%particle j is the sending particle
xi = s.particles.pos(ind1,1);
yi = s.particles.pos(ind1,2);
xj = s.particles.pos(:,1); %all others
yj = s.particles.pos(:,2); %all others
mj = s.particles.mass; %all others
rSq = (xi-xj).^2+(yi-yj).^2;
%Boolean mask returns values where r^2 < (2h)^2
mask1 = rSq < h2Sq;
rSq = rSq(mask1);
mTemp = mj(mask1);
densityTemp = mTemp.*liuQuartic(sqrt(rSq),h);
s.particles.density(ind1) = sum(densityTemp);
end
And the corresponding Mathematica code:
Needs["HierarchicalClustering`"]
computeDistance[pos_] :=
DistanceMatrix[pos, DistanceFunction -> EuclideanDistance];
initializeDensity[distance_] :=
uniMass*Total/@(liuQuartic[#,h]&/@Pick[distance,Boole[Map[#<2h&,distance,{2}]],1])
initializeDensity[computeDistance[totalPos]]
The data are the coordinates of 1119 points, in the form {{x1,y1},{x2,y2},...}, stored in s.particles.pos and totalPos respectively. liuQuartic is just a polynomial function. The complete Matlab code does far more than this, yet it runs about 160 complete time steps in 60 seconds (roughly 0.4 seconds per step), whereas the Mathematica code listed above alone takes about 3 seconds to run. I don't know why there is such a huge speed difference. Any thoughts are appreciated. Thanks.
Edit:
liuQuartic is defined as
liuQuartic[r_,h_]:=15/(7Pi*h^2) (2/3-(9r^2)/(8h^2)+(19r^3)/(24h^3)-(5r^4)/(32h^4))
and example data can be obtained by
h=2*10^-3;conWidth=0.4;conHeight=0.16;totalStep=6000;uniDensity=1000;uniMass=1000*Pi*h^2;refDensity=1400;gamma=7;vf=0.07;eta=0.01;cs=vf/eta;B=refDensity*cs^2/gamma;gravity=-9.8;mu=0.02;beta=0.15;dt=0.00005;epsilon=0.5;
iniFreePts=Block[{},Table[{-conWidth/3+i,1.95h+j},{i,10h,conWidth/3-2h,1.5h},{j,0,0.05,1.5h}]//Flatten[#,1]&];
leftWallIniPts=Block[{x,y},y=Table[i,{i,conHeight/2-0.5h,0.2h,-0.5h}];x=ConstantArray[-conWidth/3,Length[y]];Thread[List[x,y]]];
botWallIniPts=Block[{x,y},x=Table[i,{i,-conWidth/3,-0.4h,h}];y=ConstantArray[0,Length[x]];Thread[List[x,y]]];
incWallIniPts=Block[{x,y},Table[{i,0.2125i},{i,0,(2conWidth)/3,h}]];
rightWallIniPts=Block[{x,y},y=Table[i,{i,Last[incWallIniPts][[2]]+h,conHeight/2,h}];x=ConstantArray[Last[incWallIniPts][[1]],Length[y]];Thread[List[x,y]]];
topWallIniPts=Block[{x,y},x=Table[i,{i,-conWidth/3+0.7h,(2conWidth)/3-0.7h,h}];y=ConstantArray[conHeight/2,Length[x]];Thread[List[x,y]]];
freePos = iniFreePts;
wallPos = leftWallIniPts~Join~botWallIniPts~Join~incWallIniPts~Join~rightWallIniPts~Join~topWallIniPts;
totalPos = freePos~Join~wallPos;
where conWidth=0.4, conHeight=0.16 and h=0.002.
Answer
Modify the order of the calculation slightly to avoid a ragged array, then make use of Listable and Compile:
computeDistance[pos_] := DistanceMatrix[pos, DistanceFunction -> EuclideanDistance]
liuQuartic = {r, h} \[Function]
15/(7 Pi*h^2) (2/3 - (9 r^2)/(8 h^2) + (19 r^3)/(24 h^3) - (5 r^4)/(32 h^4));
initializeDensity =
With[{l = liuQuartic, m = uniMass},
Compile[{{d, _Real, 2}, {h, _Real}}, m Total@Transpose[l[d, h] UnitStep[2 h - d]]]];
new = initializeDensity[computeDistance[N@totalPos], h]; // AbsoluteTiming
Tested with your newly added sample data, my code ran in 0.390000 s, while the original code took 4.851600 s and ybeltukov's code took 0.813200 s on my machine.
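The idea of avoiding the ragged array can be checked on a toy example. The sketch below is only an illustration: toyKernel, hToy and dToy are made-up names and values, not part of the code above. It compares the Pick-based selection, which produces rows of different lengths, with the masked form that keeps the full rectangular matrix and multiplies by UnitStep.

```mathematica
(* Illustrative toy kernel and data; these names are not from the original code *)
toyKernel[r_, h_] := 1 - r/(2 h);
hToy = 1.;
dToy = {{0., 1.5, 3.}, {1.5, 0., 2.5}, {3., 2.5, 0.}};

(* Ragged approach: pick the entries with d < 2h in each row, then sum each row *)
ragged = Total /@ (toyKernel[#, hToy] & /@
     Pick[dToy, Map[# < 2 hToy &, dToy, {2}]]);

(* Masked approach: evaluate the kernel on the whole matrix and
   zero out the entries with d > 2h, so the array stays rectangular *)
masked = Total[toyKernel[dToy, hToy] UnitStep[2 hToy - dToy], {2}];

ragged == masked  (* True *)
```

Note that UnitStep[0] is 1, so the masked form also keeps entries with d exactly equal to 2h. For liuQuartic this makes no difference, because the bracketed polynomial evaluates to 2/3 - 9/2 + 19/3 - 5/2 = 0 at r = 2h.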
If you have a C compiler installed, the following code
computeDistance[pos_] := DistanceMatrix[pos, DistanceFunction -> EuclideanDistance]
liuQuartic = {r, h} \[Function]
15/(7 Pi*h^2) (2/3 - (9 r^2)/(8 h^2) + (19 r^3)/(24 h^3) - (5 r^4)/(32 h^4));
initializeDensity =
With[{l = liuQuartic, m = uniMass, g = Compile`GetElement},
Compile[{{d, _Real, 2}, {h, _Real}},
Module[{b1, b2}, {b1, b2} = Dimensions@d;
m Table[Sum[If[2 h > g[d, i, j], l[g[d, i, j], h], 0.], {j, b2}], {i, b1}]],
CompilationTarget -> "C", RuntimeOptions -> "Speed"]];
will give you another 2X speedup. Note that a C compiler is necessary; see this post for some more details.
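To find out whether Mathematica can see a C compiler before relying on CompilationTarget -> "C", something along the following lines should work. It is a minimal sketch: the names compilers, cTargetAvailable, target and sq are illustrative, and the fallback to the default "WVM" target is my assumption about what you would want when no compiler is found.

```mathematica
Needs["CCompilerDriver`"]

(* An empty list means no C compiler was detected *)
compilers = CCompilers[];
cTargetAvailable = compilers =!= {};

(* Fall back to the Wolfram virtual machine when no compiler is available *)
target = If[cTargetAvailable, "C", "WVM"];

(* Tiny compiled function, used only to confirm that compilation succeeds *)
sq = Compile[{{x, _Real}}, x^2, CompilationTarget -> target];
sq[3.]  (* 9. *)
```

If cTargetAvailable is False, the version compiled to C above will not give the extra speedup, since compilation falls back to the virtual machine.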