There is an edited section at the bottom to clarify some questions which have come up in the comments.
I have some code which generates (according to two coordinates) two $m\times n$ arrays and then performs element wise multiplication of the two arrays to then get the average.
part[da_, db_, array_] := Mean@Flatten@(array[[span[db], span[da]]]* array[[span[-db], span[-da]]]);
span[x_] := Switch[Sign[x], -1, 1 ;; x - 1, 0, 1 ;; -1, 1, x + 1 ;; -1];
SeedRandom[1]
array = Developer`ToPackedArray[RandomChoice[{-1, 1}, {1000, 1000}]];
p = {2,3};
q = {-6,8};
diff=p-q;
d1=diff[[1]];
d2=diff[[2]];
test = part[d1,d2,array]// AbsoluteTiming
This gives me
{0.0647411, -(919/493520)}
I need to run this code for a lot of different points - in the order of 500 million. Since all I am doing in the function "part" is generating
a1=array[span[db], span[da]] ]
and
a2=array[span[-db], span[-da]] ]
and element wise multiplying them
a1*a2
I really feel that I could use the power of my graphics card (GTX 1070) and 4 CPU cores (i7 7700k) to really get a massive speed boost.
As a current bench mark (on my slower 2 core laptop), I get the following for 1000 different points
pointsList = Partition[RandomInteger[1000, {2000, 2}], 2];
compute = ParallelTable[
diff = pointsList [[i, 2]] - pointsList [[i, 1]];
N@part[diff[[1]], diff[[2]], array],
{i, Length[pointsList]}]; // AbsoluteTiming
This gives me
{18.8524, Null}
which would roughly make it ~ 110 days for all 500 million points. I honestly believe this could be done in a matter of hours using a CUDA implementation. Since I have never used CUDA in mathematica and will be teaching myself - I could in principal take some time and figure this out myself. However, due to the fact that I need the results of these computations for a paper I am writing, It would really be great to use a working code now, and figure it out later.
EDIT
To give better clarity to my question here is the full code with explanation of what is happening.
I have 14 sets of arrays. Each is 2048 x 2048, with entries which can be real values. For the sake of explanation, this is what I call
SeedRandom[1]
array = Developer`ToPackedArray[RandomChoice[{-1, 1}, {2048,2048}]];
Next I feed a list of points. This list also does not change. I generate my list according to the following:
solns[r_] := DeleteCases[{ToRules[Reduce[(x)^2 + (y)^2 == r, {x, y}, Integers]]}[[All, All, 2]], n_?(#[[2]] < 0 &)] /. {n_ /; n < 0, 0} ->Sequence[];
makedata[{x_, y_}] := ({x^2 + y^2, x, y})
makeRdata[m_] := Module[{allR, allSolns},
allR = DeleteCases[Table[If[SquaresR[2, i] != 0, i, 0], {i, m}], 0];
allSolns = Flatten[Table[DeleteCases[solns[allR[[i]]], n_?(#[[1]] < 0 &[[2]] == 0 &)], {i, Length[allR]}], 1];
Table[makedata[allSolns[[i]]], {i, Length[allSolns]}]
];
max = 2048^2+2048^2;
pntsList = makeRdata[max];
Once I have now created all the points I will be using (if you plan on test running this code, I suggest using max = 100^2+100^2), I feed them into part, to create a nested list called t1.
t1 = ParallelTable[{N@Sqrt@dat[[j, 1]], N@part[dat[[j, 2]], dat[[j, 3]],array]}, {j, Length[dat]}] // AbsoluteTiming
Once the nested list t1 has been generated, I will write it to a file, and then re run the same code, but this time for a new array - still exactly the same size. For this new array, pntsList does not change, and so it will only have to be generated once.
In summary,I create all the points to be fed into part before hand. Once I start feeding them into part the array does not change. It is all the same array with the same dimensions. Only after all the points have been proccessed and the result written to a file. Do I rerun the code for a new array - still the same size- and I feed into it the exact same points. I repeat this for 14 different arrays.
Answer
I would use ListCorrelate and rely on Mathematica's internal parallel implementation. Here is your code:
part[da_, db_, array_] := Mean @ Flatten @ (array[[span[db],span[da]]]*array[[span[-db],span[-da]]]);
span[x_]:=Switch[Sign[x],-1,1;;x-1,0,1;;-1,1,x+1;;-1];
Let's use a smaller array example:
array = Developer`ToPackedArray @ RandomChoice[{-1, 1}, {100, 100}];
Here is a table of all possible indices:
r1 = Table[part[da, db, array], {da, -99, 99}, {db, -99, 99}]; //AbsoluteTiming
{0.768523, Null}
And, here is an equivalent ListCorrelate implementation:
count[dim_] := Outer[
Times,
Join[Range[dim], Reverse[Range[dim-1]]],
Join[Range[dim], Reverse[Range[dim-1]]]
]
r2 = ListCorrelate[array, array, {-1, 1}, 0] / count[100] //Transpose; //AbsoluteTiming
{0.025539, Null}
They are equal:
r1 === r2
True
Now, let's consider a larger array:
array = Developer`ToPackedArray @ RandomChoice[{-1, 1}, {2048, 2048}];
I won't bother timing the OP code, but here is the timing for ListCorrelate:
ListCorrelate[array, array, {-1, 1}, 0] / count[2048] //Transpose; //AbsoluteTiming
{11.6008, Null}
This is slow because rational numbers can't be packed. If an answer using reals is acceptable, you could use:
ListCorrelate[array, array, {-1, 1}, 0] / N[count[2048]] //Transpose; //AbsoluteTiming
{1.15444, Null}
So, you should be able to do all 14 array objects in less than 170 seconds using rationals, or 15 seconds using reals.
Comments
Post a Comment