Suppose I have to compute $A_i\cdot B_i^{-1}\cdot C_i$, where $A_i, B_i, C_i$ are conformable matrices (each $50\times 50$).
For that, I have three lists, each of length 10,000: each element of list $A$ is a matrix $A_i$, and similarly list $B$ holds the $B_i^{-1}$ and list $C$ the $C_i$.
- I thought of using ParallelTable. Or should I go for a loop? Is there a faster way to create a list where each component is $A_i\cdot B_i^{-1}\cdot C_i$? Is there a way to vectorize this?
- What if I do one huge loop in which I build the lists $A$, $B$, and $C$ and, as I build them, also calculate $A_i\cdot B_i^{-1}\cdot C_i$?
If this question is too broad, or really needs a working example, tell me, don't just vote to close. Thanks ;)
Answer
A few possibilities:
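n = 10000; (* number of matrix triples *)
m = 50; (* each matrix is m x m *)
a = RandomReal[{-1, 1}, {n, m, m}];
b = RandomReal[{-1, 1}, {n, m, m}];
c = RandomReal[{-1, 1}, {n, m, m}];
(* MapThread over the three lists; LinearSolve[b, c] yields Inverse[b].c without forming the inverse *)
result = MapThread[#1.LinearSolve[#2, #3] &, {a, b, c}]; // AbsoluteTiming // First
(* sequential Table over the index *)
result2 = Table[a[[i]].LinearSolve[b[[i]], c[[i]]], {i, 1, Length[a]}]; // AbsoluteTiming // First
(* copy the data to the subkernels, timed separately *)
DistributeDefinitions[a, b, c]; // AbsoluteTiming // First
(* parallel Table on the already distributed data *)
result3 = ParallelTable[a[[i]].LinearSolve[b[[i]], c[[i]]], {i, 1, Length[a]}]; // AbsoluteTiming // First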
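On my Haswell quad-core CPU, this produces the following timings (in seconds, in the order of the four timed expressions):
- MapThread: 1.29189
- Table: 3.87935
- DistributeDefinitions: 6.70797
- ParallelTable: 0.454736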
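This highlights that moving all the data between several cores is the major bottleneck here: ParallelTable itself finishes in under half a second, but DistributeDefinitions needs almost seven seconds to copy a, b, and c to the subkernels. So, in total, MapThread seems to be the better option. In fact, this makes me wonder whether MapThread is parallelized (the documentation does not state that).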
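Since the DistributeDefinitions timing above dominates, one thing that might be worth trying is the idea from the second bullet of the question: build each triple on the subkernel that uses it, so that the input lists never have to be copied. The following is only a sketch and has not been timed here. The name result4, the local names ai, bi, ci, and the choice of Method -> "CoarsestGrained" are assumptions for illustration, and RandomReal stands in for however the matrices are really constructed.

```mathematica
n = 10000; (* number of triples, as in the answer *)
m = 50;    (* matrix dimension, as in the answer *)

(* Each subkernel creates its own a_i, b_i, c_i, so only the m x m
   products travel back to the main kernel. *)
result4 = ParallelTable[
    Module[{ai, bi, ci},
      ai = RandomReal[{-1, 1}, {m, m}];
      bi = RandomReal[{-1, 1}, {m, m}];
      ci = RandomReal[{-1, 1}, {m, m}];
      ai.LinearSolve[bi, ci]
    ],
    {i, n},
    Method -> "CoarsestGrained"
  ]; // AbsoluteTiming // First
```

The results still have to be sent back to the main kernel, so whether this beats MapThread depends on how expensive the real matrix construction is compared with that transfer.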
I am currently not aware of a better way to parallelize this. Unfortunately, Compile with the options RuntimeAttributes -> {Listable} and Parallelization -> True does not work, because LinearSolve cannot be compiled and thus needs calls to MainEvaluate, which makes parallelization impossible.
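As a side note, all the snippets above use LinearSolve[b, c] in place of Inverse[b].c. The two are mathematically the same product $B_i^{-1}\cdot C_i$, and a quick check on a small batch can confirm that they agree numerically. This is a minimal sketch: the batch size nTest and the names aTest, bTest, cTest, viaSolve, and viaInverse are made up for illustration and are not part of the original code.

```mathematica
nTest = 100; (* small batch, chosen only to keep the check fast *)
m = 50;
aTest = RandomReal[{-1, 1}, {nTest, m, m}];
bTest = RandomReal[{-1, 1}, {nTest, m, m}];
cTest = RandomReal[{-1, 1}, {nTest, m, m}];

(* Solve b.x == c directly, then multiply by a *)
viaSolve = MapThread[#1.LinearSolve[#2, #3] &, {aTest, bTest, cTest}];

(* Form the inverse explicitly, then multiply *)
viaInverse = MapThread[#1.Inverse[#2].#3 &, {aTest, bTest, cTest}];

(* Largest elementwise deviation between the two results *)
Max[Abs[viaSolve - viaInverse]]
```

The deviation should be tiny compared with the size of the entries. Its exact value depends on how well conditioned the random matrices happen to be.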