A friend of mine introduced me to array broadcasting in the Python NumPy package, which is very convenient (and also highly efficient).
The idea is illustrated nicely in this picture:
Basically, the method first compares the shapes of the two arrays; if a dimension does not match, it "broadcasts" along that dimension to produce arrays of the same shape.
Here is an excerpt from the General Broadcasting Rules in the documentation of NumPy:
When operating on two arrays, NumPy compares their shapes element-wise. It starts with the trailing dimensions, and works its way forward. Two dimensions are compatible when
they are equal, or
one of them is 1
If these conditions are not met, a ValueError: frames are not aligned exception is thrown, indicating that the arrays have incompatible shapes. The size of the resulting array is the maximum size along each dimension of the input arrays. Arrays do not need to have the same number of dimensions.
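For example (this is the shape-alignment example from the same documentation page), trailing dimensions are compared first and missing dimensions are filled in from the left:
A      (4d array):  8 x 1 x 6 x 1
B      (3d array):      7 x 1 x 5
Result (4d array):  8 x 7 x 6 x 5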
This is different from the built-in auto-threading in Mathematica. For example, Mathematica does not do this:
{1, 2} + {{1, 2}, {2, 3}, {3, 4}}
I know that there is a duplicate question, but there is no strong reason why Mathematica couldn't support such a technique. At least, I think it wouldn't contradict Mathematica's existing list operations: we just need to check the shapes first and then "broadcast", which seems quite natural. And broadcasting could yield an efficiency boost, because we wouldn't need to transpose twice.
How could this technique be implemented in Mathematica?
Edit
I just ran a comparison between Python and Mathematica for adding a vector to a matrix. Python's NumPy is faster. The matrix is random:
data=RandomReal[{0,1},{40000000,2}];
For Mathematica:
Transpose[{1., 2.} + Transpose[data]]; // AbsoluteTiming
This takes 1.8 sec.
For Python:
import numpy as np
import time
a=np.random.rand(40000000,2)
b=np.array([1.,2.])
start=time.time()
a+b
end=time.time()
print(end - start)
This takes 1.08 sec.
I think that in Mathematica the time is wasted in Transpose, because Transpose[data] alone takes 0.6 sec.
Answer
NumPy broadcasting lets you perform element-wise operations on arrays in an efficient way, as long as the dimensions of those arrays are considered "compatible" in some sense.
Mathematica also has such a mechanism. Some Mathematica functions are Listable and also allow you to perform element-wise operations on nested lists with dimensions "compatible" in some sense. Built-in listable functions are optimized for packed arrays and, similarly to NumPy's broadcasting, will give you "C-level" efficiency.
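As a minimal illustration, Plus is Listable, so adding a length-3 vector to a 3×2 matrix threads over the leading dimension:
{10, 20, 30} + {{1, 2}, {3, 4}, {5, 6}}
(* {{11, 12}, {23, 24}, {35, 36}} *)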
In addition to that, Mathematica allows you to Compile functions with the Listable runtime attribute (RuntimeAttributes -> {Listable}), which gives you some additional control over the "compatibility" of arrays. Listable compiled functions can also be easily parallelized.
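As a minimal sketch (the name cSquare is just for illustration), a compiled function with the Listable attribute and Parallelization -> True maps over the leading dimensions of its argument, possibly in parallel:
cSquare = Compile[{{x, _Real, 1}}, x^2,
   RuntimeAttributes -> {Listable}, Parallelization -> True];
cSquare[RandomReal[1, {1000, 10}]] // Dimensions
(* {1000, 10} *)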
There are two important differences between how NumPy's broadcasting and Mathematica's listability (compiled and not) determine if arrays are "compatible":
- order in which dimensions are compared,
- what happens when certain dimensions are equal to 1.
Leading vs Trailing Dimensions
Broadcasting
NumPy starts with trailing dimensions, Mathematica with leading ones. So NumPy can, for example, add arrays with dimensions {8,5,7,4} and {7,4} out of the box:
import numpy as np
(np.zeros((8,5,7,4))+np.ones((7,4))).shape
# (8, 5, 7, 4)
In Mathematica this would lead to an error:
Array[0 &, {8, 5, 7, 4}] + Array[1 &, {7, 4}];
(* Thread::tdlen: Objects of unequal length in ... cannot be combined. *)
To use listability we can transpose one of the arrays to put the "compatible" dimensions at the front and, after the addition, transpose back:
Transpose[
Transpose[Array[0 &, {8, 5, 7, 4}], {3, 4, 1, 2}] +
Array[1 &, {7, 4}], {3, 4, 1, 2}
] // Dimensions
(* {8, 5, 7, 4} *)
Listability
In contrast, Mathematica can, out of the box, add arrays with dimensions {4,7,5,8} and {4,7}:
Array[0 &, {4, 7, 5, 8}] + Array[1 &, {4, 7}] // Dimensions
(* {4, 7, 5, 8} *)
which would lead to an error in NumPy:
import numpy as np
(np.zeros((4,7,5,8))+np.ones((4,7)))
# Traceback (most recent call last):
# File "", line 1, in
# ValueError: operands could not be broadcast together with shapes (4,7,5,8) (4,7)
Similarly, to use broadcasting, we could transpose our arrays:
import numpy as np
(np.zeros((4,7,5,8)).transpose(2,3,0,1)+np.ones((4,7))).transpose(2,3,0,1).shape
# (4, 7, 5, 8)
I don't know if this is the "correct" way to do it in NumPy. As far as I know, in contrast to Mathematica, NumPy does not copy an array on transposition; it returns a view of the array, i.e. an object with information on how data from the base array should be accessed. So I think that those transpositions are much cheaper than in Mathematica.
I doubt that it's possible to replicate NumPy's efficiency, on arrays which are "listability incompatible", using only top-level Mathematica code.
As noted in a comment by @LLlAMnYP, the design decision to start from leading dimensions makes more sense in Mathematica, since listability applies not only to full arrays but to arbitrary nested lists, as in the example below.
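For instance, listability threads over ragged nested lists, for which trailing dimensions are not even well defined:
{10, 20} + {{1, 2, 3}, {4}}
(* {{11, 12, 13}, {24}} *)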
Compiled Listability
Since compiled functions accept only full arrays of a specified rank, compilation allows you to "split" the ranks of full arrays into two parts. The last dimensions, given by the ranks in the argument list of Compile, will be handled inside the body of the compiled function, and the remaining leading dimensions will be handled by the Listable attribute of the compiled function.
For tests, let's compile a simple listable function accepting two rank-2 arrays of reals:
cPlus22 = Compile[{{x, _Real, 2}, {y, _Real, 2}}, x + y, RuntimeAttributes -> {Listable}]
Now the last two dimensions need to be equal, since they are handled by Plus inside the body of the compiled function. The remaining dimensions will be handled by the ordinary listability rules, starting with the leading ones:
cPlus22[Array[0 &, {4, 7, 5, 8}], Array[1 &, {5, 8}]] // Dimensions
(* {4, 7, 5, 8} *)
cPlus22[Array[0 &, {4, 7, 5, 8}], Array[1 &, {4, 5, 8}]] // Dimensions
(* {4, 7, 5, 8} *)
cPlus22[Array[0 &, {4, 7, 5, 8}], Array[1 &, {4, 7, 5, 8}]] // Dimensions
(* {4, 7, 5, 8} *)
cPlus22[Array[0 &, {4, 7, 5, 8}], Array[1 &, {4, 7, 3, 5, 8}]] // Dimensions
(* {4, 7, 3, 5, 8} *)
Treating Dimensions equal to 1
Broadcasting
When comparing consecutive dimensions, NumPy's broadcasting treats them as "compatible" if they are equal or one of them is 1. Mathematica's listability treats dimensions as "compatible" only if they are equal.
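So, for example, a dimension equal to 1 is not broadcasted by Mathematica's listability, even though the corresponding shapes would be compatible in NumPy:
Array[0 &, {3, 1}] + Array[1 &, {3, 4}];
(* Thread::tdlen: Objects of unequal length in ... cannot be combined. *)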
In NumPy we can do
import numpy as np
(np.zeros((1,8,1,3,7,1))+np.ones((2,1,5,3,1,4))).shape
# (2, 8, 5, 3, 7, 4)
which gives a generalized outer product.
Outer
Mathematica has a built-in function for this kind of task: Outer (as noted in a comment by @Sjoerd), which is "C-level efficient" when given the Plus, Times, or List functions and packed arrays. But Outer has its own rules for dimension "compatibility". To replicate NumPy's broadcasting conventions, all pairwise equal dimensions should be moved to the end, and dimensions equal to one, which are supposed to be broadcasted, should be removed. This in general requires taking Parts of arrays and transpositions (which in Mathematica force copying).
(a = Transpose[Array[0 &, {1, 8, 1, 3, 7, 1}][[1, All, 1, All, All, 1]], {1, 3, 2}]) // Dimensions
(* {8, 7, 3} *)
(b = Transpose[Array[1 &, {2, 1, 5, 3, 1, 4}][[All, 1, All, All, 1]], {1, 2, 4, 3}]) // Dimensions
(* {2, 5, 4, 3} *)
Transpose[Outer[Plus, a, b, 2, 3], {2, 5, 1, 3, 6, 4}] // Dimensions
(* {2, 8, 5, 3, 7, 4} *)
Compiled Listability
Using different ranks in the argument list of Compile results in a kind of outer product too. "Excessive" trailing dimensions of the higher-rank array don't have to be compatible with any dimensions of the lower-rank array, since they will end up appended at the end of the dimensions of the result.
cPlus02 = Compile[{x, {y, _Real, 2}}, x + y, RuntimeAttributes -> {Listable}];
cPlus02[Array[0 &, {4, 7, 5, 8}], Array[1 &, {3, 9}]] // Dimensions
(* {4, 7, 5, 8, 3, 9} *)
cPlus02[Array[0 &, {4, 7, 5, 8}], Array[1 &, {4, 3, 9}]] // Dimensions
(* {4, 7, 5, 8, 3, 9} *)
cPlus02[Array[0 &, {4, 7, 5, 8}], Array[1 &, {4, 7, 3, 9}]] // Dimensions
(* {4, 7, 5, 8, 3, 9} *)
cPlus02[Array[0 &, {4, 7, 5, 8}], Array[1 &, {4, 7, 5, 3, 9}]] // Dimensions
(* {4, 7, 5, 8, 3, 9} *)
cPlus02[Array[0 &, {4, 7, 5, 8}], Array[1 &, {4, 7, 5, 8, 3, 9}]] // Dimensions
(* {4, 7, 5, 8, 3, 9} *)
cPlus02[Array[0 &, {4, 7, 5, 8}], Array[1 &, {4, 7, 5, 8, 2, 3, 9}]] // Dimensions
(* {4, 7, 5, 8, 2, 3, 9} *)
To emulate broadcasting in this case, dimensions equal to 1 should be removed, the dimensions to be broadcasted from one array should be moved to the beginning, and those from the other array to the end. The compiled function should have one argument with rank equal to the number of compatible dimensions; the array with the broadcasted dimensions at the beginning should be passed as this argument. The other argument should have rank equal to the rank of the array with the broadcasted dimensions at the end.
(a = Transpose[Array[0 &, {1, 8, 1, 3, 7, 1}][[1, All, 1, All, All, 1]], {1, 3, 2}]) // Dimensions
(* {8, 7, 3} *)
(b = Transpose[Array[1 &, {2, 1, 5, 3, 1, 4}][[All, 1, All, All, 1]], {2, 3, 1, 4}]) // Dimensions
(* {3, 2, 5, 4} *)
cPlus14 = Compile[{{x, _Real, 1}, {y, _Real, 4}}, x + y, RuntimeAttributes -> {Listable}];
Transpose[cPlus14[a, b], {2, 5, 4, 1, 3, 6}] // Dimensions
(* {2, 8, 5, 3, 7, 4} *)
Since compatible dimensions don't have to be handled inside the body of the compiled function, but can be handled by the Listable attribute, different orderings are possible. Each compatible dimension can be moved from the middle of the dimensions of the first array to the beginning, and the rank of both arguments of the compiled function can be decreased by one for each such dimension.
(a = Transpose[Array[0 &, {1, 8, 1, 3, 7, 1}][[1, All, 1, All, All, 1]], {2, 1, 3}]) // Dimensions
(* {3, 8, 7} *)
(b = Transpose[Array[1 &, {2, 1, 5, 3, 1, 4}][[All, 1, All, All, 1]], {2, 3, 1, 4}]) // Dimensions
(* {3, 2, 5, 4} *)
cPlus03 = Compile[{x, {y, _Real, 3}}, x + y, RuntimeAttributes -> {Listable}];
Transpose[cPlus03[a, b], {4, 2, 5, 1, 3, 6}] // Dimensions
(* {2, 8, 5, 3, 7, 4} *)
Below I present three approaches to broadcasting in Mathematica, with different generality and efficiency:
1. Top-level procedural code: straightforward and completely general (works for an arbitrary number of lists and an arbitrary function), but slow.
2. LibraryLink static function: very fast; currently works for addition of an arbitrary number of real arrays with arbitrary dimensions.
3. LibraryLink JIT-compiled function: the fastest of the presented solutions and quite general (works for an arbitrary compilable function and an arbitrary number of arbitrary packable arrays with arbitrary dimensions), but it's compiled separately for each function and each "type" of arguments.
1. Top-level Procedural
This implementation uses the dimensions of the input arrays to construct a proper Table expression that creates the resulting array in one call, extracting the proper elements from the input arrays.
A helper function that constructs the Table expression:
ClearAll[broadcastingTable]
broadcastingTable[h_, f_, arrays_, dims_, maxDims_] :=
Module[{inactive, tableVars = Table[Unique["i"], Length[maxDims]]},
Prepend[
inactive[h] @@ Transpose[{tableVars, maxDims}],
inactive[f] @@ MapThread[
inactive[Part][#1, Sequence @@ #2] &,
{
arrays,
MapThread[
If[#1 === 1, 1, #2] &,
{#, PadLeft[tableVars, Length[#]]}
] & /@ dims
}
]
] /. inactive[x_] :> x
]
An example table expression (with the head replaced by Hold) for three arrays with dimensions {4, 1, 5}, {7, 4, 3, 1} and {1, 5} looks like this:
broadcastingTable[Hold, Plus,
{arr1, arr2, arr3},
{{4, 1, 5}, {7, 4, 3, 1}, {1, 5}},
{7, 4, 3, 5}
]
(* Hold[arr1[[i4, 1, i6]] + arr2[[i3, i4, i5, 1]] + arr3[[1, i6]], {i3, 7}, {i4, 4}, {i5, 3}, {i6, 5}] *)
And now the final function:
ClearAll[broadcasted]
broadcasted::incompDims = "Objects with dimensions `1` can't be broadcasted.";
broadcasted[f_, lists__] :=
Module[{listOfLists, dims, dimColumns},
listOfLists = {lists};
dims = Dimensions /@ listOfLists;
dimColumns = Transpose@PadLeft[dims, Automatic, 1];
broadcastingTable[Table, f, listOfLists, dims, Max /@ dimColumns] /;
If[MemberQ[dimColumns, dimCol_ /; ! SameQ @@ DeleteCases[dimCol, 1]],
Message[broadcasted::incompDims, dims];
False
(* else *),
True
]
]
It works for any function and for any lists, not necessarily full arrays:
broadcasted[f, {a, {b, c}}, {{1}, {2}}]
(* {{f[a, 1], f[{b, c}, 1]}, {f[a, 2], f[{b, c}, 2]}} *)
For full arrays it gives the same results as NumPy:
broadcasted[Plus, Array[a, {2}], Array[b, {10, 2}]] // Dimensions
(* {10, 2} *)
broadcasted[Plus, Array[a, {3, 4, 1, 5, 1}], Array[b, {3, 1, 2, 1, 3}]] // Dimensions
(* {3, 4, 2, 5, 3} *)
broadcasted[Plus, Array[a, {10, 1, 5, 3}], Array[b, {2, 1, 3}], Array[# &, {5, 1}]] // Dimensions
(* {10, 2, 5, 3} *)
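It also handles the example from the question:
broadcasted[Plus, {1, 2}, {{1, 2}, {2, 3}, {3, 4}}]
(* {{2, 4}, {3, 5}, {4, 6}} *)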
If the dimensions are not broadcastable, a message is printed and the function remains unevaluated:
broadcasted[Plus, Array[a, {3}], Array[b, {4, 2}]]
(* During evaluation of In[]:= broadcasted::incompDims: Objects with dimensions {{3},{4,2}} can't be broadcasted. *)
(* broadcasted[Plus,
{a[1], a[2], a[3]},
{{b[1, 1], b[1, 2]}, {b[2, 1], b[2, 2]}, {b[3, 1], b[3, 2]}, {b[4, 1], b[4, 2]}}
] *)
2. LibraryLink static
Here is a LibraryLink function that handles an arbitrary number of arrays of reals with arbitrary dimensions.
/* broadcasting.c */
#include "WolframLibrary.h"
DLLEXPORT mint WolframLibrary_getVersion() {
return WolframLibraryVersion;
}
DLLEXPORT int WolframLibrary_initialize(WolframLibraryData libData) {
return LIBRARY_NO_ERROR;
}
DLLEXPORT void WolframLibrary_uninitialize(WolframLibraryData libData) {}
DLLEXPORT int plusBroadcastedReal(
WolframLibraryData libData, mint Argc, MArgument *Args, MArgument Res
) {
switch (Argc) {
case 0:
/* At least one argument is needed. */
return LIBRARY_FUNCTION_ERROR;
case 1:
/* If one argument is given just return it. */
MArgument_setMTensor(Res, MArgument_getMTensor(Args[0]));
return LIBRARY_NO_ERROR;
}
mint i, j;
/* ranks[i] is rank of i-th argument tensor. */
mint ranks[Argc];
/* dims[i][j] is j-th dimension of i-th argument tensor. */
const mint *(dims[Argc]);
/* data[i][j] is j-th element of i-th argument tensor. */
double *(data[Argc]);
/* Rank of result tensor. */
mint resultRank = 1;
for (i = 0; i < Argc; i++) {
MTensor tmpT = MArgument_getMTensor(Args[i]);
if (libData->MTensor_getType(tmpT) != MType_Real) {
return LIBRARY_TYPE_ERROR;
}
ranks[i] = libData->MTensor_getRank(tmpT);
dims[i] = libData->MTensor_getDimensions(tmpT);
data[i] = libData->MTensor_getRealData(tmpT);
if (resultRank < ranks[i]) {
resultRank = ranks[i];
}
}
/*
* Array of dimensions of argument tensors, with rows,
* for tensors with ranks lower than rank of result,
* filled with 1s from the beginning.
*/
mint extendedDims[Argc][resultRank];
/*
* Array of strides of argument tensors, with rows,
* for tensors with ranks lower than rank of result,
* filled with product of all tensor dimensions from the beginning.
*/
mint strides[Argc][resultRank];
/* Array of indices enumerating element of argument tensors. */
mint indices[Argc];
for (i = 0; i < Argc; i++) {
mint rankDiff = resultRank - ranks[i];
extendedDims[i][resultRank - 1] = dims[i][ranks[i] - 1];
strides[i][resultRank - 1] = extendedDims[i][resultRank - 1];
for (j = resultRank - 2; j >= rankDiff; j--) {
extendedDims[i][j] = dims[i][j - rankDiff];
strides[i][j] = strides[i][j + 1] * extendedDims[i][j];
}
for (j = rankDiff - 1; j >= 0; j--) {
extendedDims[i][j] = 1;
strides[i][j] = strides[i][rankDiff];
}
indices[i] = 0;
}
/* Dimensions of result tensor. */
mint resultDims[resultRank];
/*
* jumps[i][j] is jump of index of i-th argument tensor when index in j-th
* dimension of result tensor is incremented.
*/
mint jumps[Argc][resultRank];
/* Total number of elements in result tensor. */
mint resultElementsNumber = 1;
/* Array of indices enumerating elements of result tensor one index per dimension. */
mint resultIndices[resultRank];
for (i = resultRank - 1; i >= 0; i--) {
resultDims[i] = 1;
for (j= 0; j < Argc; j++) {
if (extendedDims[j][i] == 1) {
/*
* i-th dimension of j-th argument tensor is 1,
* so it should be broadcasted.
*/
jumps[j][i] = 1 - strides[j][i];
} else if (resultDims[i] == 1 || resultDims[i] == extendedDims[j][i]) {
/*
* i-th dimension of j-th argument tensor is not 1,
* but it's equal to all non-1 i-th dimensions of previous argument tensors,
* so i-th dimension of j-th argument tensor should be i-th dimension
* of result and it shouldn't be broadcasted.
*/
resultDims[i] = extendedDims[j][i];
jumps[j][i] = 1;
} else {
/*
* i-th dimension of j-th argument tensor is not 1,
* i-th dimension of at least one of previous argument tensors was not 1
* and those dimensions are not equal, so tensors are not broadcastable.
*/
libData->Message("plusBroadcastedDims");
return LIBRARY_DIMENSION_ERROR;
}
}
resultElementsNumber *= resultDims[i];
resultIndices[i] = 0;
}
/* Returned tensor. */
MTensor resultT;
libData->MTensor_new(MType_Real, resultRank, resultDims, &resultT);
/* Actual data of returned tensor. */
double *result;
result = libData->MTensor_getRealData(resultT);
/*
* We use single loop over all elements of result array.
* resultIndices array is updated inside loop and contains indices
* corresponding to current result element as if it was accessed using one
* index per dimension, i.e. result[i] is like
* result[resultIndices[0]][resultIndices[1]]...[resultIndices[resultRank-1]]
* for multidimensional array.
*/
for (i = 0; i < resultElementsNumber; i++) {
mint k = resultRank - 1;
resultIndices[k]++;
while (resultIndices[k] >= resultDims[k] && k >= 1) {
resultIndices[k] = 0;
k--;
resultIndices[k]++;
}
/*
* If result would be accessed using one index per dimension,
* then current value of k would correspond to dimension which
* index was incremented in this iteration.
*/
/* At this point we know that we have at least two argument tensors. */
result[i] = data[0][indices[0]] + data[1][indices[1]];
indices[0] += jumps[0][k];
indices[1] += jumps[1][k];
for (j = 2; j < Argc; j++) {
result[i] += data[j][indices[j]];
indices[j] += jumps[j][k];
}
}
MArgument_setMTensor(Res, resultT);
return LIBRARY_NO_ERROR;
}
Save the above code in a broadcasting.c file in the same directory as the current notebook, or paste it as a string, instead of {"broadcasting.c"}, as the first argument of CreateLibrary in the code below. Pass appropriate optimization flags for your compiler in "CompileOptions"; the ones below are for GCC.
Needs["CCompilerDriver`"]
SetDirectory[NotebookDirectory[]];
broadcastingLib =
  CreateLibrary[
    {"broadcasting.c"}, "broadcasting"
    (* , "CompileOptions" -> "-Wall -march=native -O3" *)
  ];
LibraryFunction::plusBroadcastedDims =
"Given arrays could not be broadcasted together.";
A helper function that loads the appropriate library function for a given number of array arguments:
ClearAll[loadPlusBroadcastedReal]
loadPlusBroadcastedReal[argc_] := loadPlusBroadcastedReal[argc] =
Quiet[
LibraryFunctionLoad[
broadcastingLib,
"plusBroadcastedReal",
ConstantArray[{Real, _, "Constant"}, argc],
{Real, _}
],
LibraryFunction::overload
]
Now the final function, which accepts an arbitrary number of arrays with arbitrary dimensions, loads the necessary library function, and uses it:
ClearAll[plusBroadcastedReal]
plusBroadcastedReal[arrays__] :=
loadPlusBroadcastedReal[Length@{arrays}][arrays]
It works as expected:
plusBroadcastedReal[{1., 2.}, {{3., 4.}, {5., 6.}, {7., 8.}}]
(* {{4., 6.}, {6., 8.}, {8., 10.}} *)
If the given arrays have incompatible dimensions, then an error is generated:
plusBroadcastedReal[RandomReal[{0, 1}, {4}], RandomReal[{0, 1}, {2, 3}]]
(* During evaluation of In[]:= LibraryFunction::plusBroadcastedDims: Given arrays could not be broadcasted together. >> *)
(* During evaluation of In[]:= LibraryFunction::dimerr: An error caused by inconsistent dimensions or exceeding array bounds was encountered evaluating the function plusBroadcastedReal. >> *)
(* LibraryFunctionError["LIBRARY_DIMENSION_ERROR", 3] *)
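As a usage sketch for the benchmark from the question (no timings claimed here; they depend on hardware and compiler flags), the vector can be added to the matrix directly, without any transpositions:
data = RandomReal[{0, 1}, {40000000, 2}];
AbsoluteTiming[plusBroadcastedReal[{1., 2.}, data];]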
The full post exceeded the maximum allowed size, so it's continued in a second answer.