I want to compile the following function which takes a matrix variable.
ClusterFind =
Compile[{{Inte, _Real}, {Matrix}},
Map[If[Last@# == Inte, Take[#, 4]] &, Matrix], CompilationTarget -> "C",
RuntimeAttributes -> {Listable}, Parallelization -> True];
Here Matrix is an $m\times n$ matrix of real, integer, or complex numbers. I would also like to know whether it is possible to compile without knowing the dimensions of the matrix beforehand. Can Matrix in the above example be a packed array?
UPDATE
Thanks for the suggestions. Of the versions I tried, I found the following compiled version to be the fastest.
ClusterFind =
Compile[{{Inte, _Real}, {Matrix, _Complex, 2}},
Take[Select[Matrix, (Last@# == Inte) &], All, 4],
CompilationTarget -> "C"];
BR
Answer
Not only can Matrix be a packed array, it must be one. However, it will be packed for you if necessary before the compiled code is called.
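As a small illustration of the packing point, and of the question about dimensions, here is a minimal sketch. The function rowCount and the test matrices are hypothetical names introduced only for this example; the idea is that the rank and element type of the argument are fixed when compiling, while the dimensions are left open.

```mathematica
(* Hypothetical helper, not from the posts above: the argument's rank (2) and
   element type are fixed at compile time, but its dimensions are not. *)
rowCount = Compile[{{m, _Real, 2}}, Length[m]];

unpacked = {{1., 2.}, {3., 4.}};   (* a list typed in by hand is not packed *)
Developer`PackedArrayQ[unpacked]                             (* False *)
Developer`PackedArrayQ[Developer`ToPackedArray[unpacked]]    (* True *)

rowCount[unpacked]                 (* packed automatically on the call; gives 2 *)
rowCount[RandomReal[1, {5, 7}]]    (* same function, different dimensions; gives 5 *)
```

Packing large inputs once with Developer`ToPackedArray before repeated calls should avoid paying the conversion cost on every call.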
The following code is substantially faster than that given by acl above and does not require any post-processing of the output. It is still sub-optimal, though, in that it requires CopyTensor calls and uses a rather larger working set than one would think necessary. Perhaps these limitations can be lifted, but after a brief survey of possible implementations I didn't find a better way than this (though note that I didn't try anything with Internal`Bag).
clusterFind = Compile[{{inte, _Real, 0}, {matrix, _Complex, 2}},
Module[{tmp = matrix[[All, {1, 2, 3, 4, -1}]]},
Select[tmp, Last[#] == inte &][[All, ;; -2]]
], RuntimeAttributes -> Listable, Parallelization -> True
];
An example of the improved timings:
range = {0, 10};
data = RandomInteger[range, {1*^5, 100}];
acl's version:
Timing[
cf4[RandomInteger[range], data];
]
producing: {2.297, Null}
My version:
Timing[
clusterFind[RandomInteger[range], data];
]
which gives {0.047, Null}.
Note that the above timings are not for C-compiled versions of the two functions. Compilation to C does not help very much here, because there is little computational work to be done and most of the time is spent copying or extracting parts of tensors. Also, I should mention that RuntimeAttributes -> Listable and Parallelization -> True do not really buy you anything here unless you are operating on a list of matrices.
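To make the last remark concrete, here is a rough sketch of the case where the Listable attribute does come into play: passing a list of matrices (a rank-3 array) where a rank-2 argument is declared. The definition is repeated so the snippet stands alone; the name matrices and its dimensions are made up for illustration, and no timings are claimed.

```mathematica
(* Same definition as above, repeated so this snippet is self-contained *)
clusterFind = Compile[{{inte, _Real, 0}, {matrix, _Complex, 2}},
   Module[{tmp = matrix[[All, {1, 2, 3, 4, -1}]]},
    Select[tmp, Last[#] == inte &][[All, ;; -2]]
    ], RuntimeAttributes -> Listable, Parallelization -> True
   ];

(* Illustrative data: 8 matrices of 10^4 rows and 100 columns each *)
matrices = RandomInteger[{0, 10}, {8, 10^4, 100}];

(* The second argument has rank 3 but rank 2 is declared, so the call
   threads over the 8 matrices and returns one result per matrix *)
results = clusterFind[5, matrices];
Length[results]
```

With a single matrix as input there is nothing to thread over, so the two options have no work to distribute.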