I have an application where I need to drop an arbitrary list of columns from a ragged array (where, of course, even the shortest rows contain all the specified columns).
For example, given the ragged list
{{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}, {1, 2, 3, 4, 5}, {1, 2, 3, 4, 5, 6, 7, 8, 9}, {1, 2, 3, 4, 5, 6, 7, 8}}
and the column selection {1, 3, 5}, the resulting array after dropping these columns is
{{2, 4, 6, 7, 8, 9, 10}, {2, 4}, {2, 4, 6, 7, 8, 9}, {2, 4, 6, 7, 8}}
The closest question I could find here was "How to use “Drop” function to drop matrix' rows and columns in an arbitrary way?", but its (nice) solutions are for well-formed (rectangular) arrays, whereas I'm working with ragged arrays ranging from 1000 to 100000 rows and 20 to 2000 columns.
I've tried schemes that pad each row with a unique marker to the length of the longest row, drop the columns, then remove the padding, and found them slow.
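The padding scheme described above might look roughly like the following. This is a hypothetical reconstruction, not the asker's actual code: padDrop and marker are illustrative names, and the symbol localized by Module serves as a padding value that cannot collide with real data.

```mathematica
(* Hypothetical sketch of the pad / drop / unpad scheme; padDrop is an illustrative name *)
padDrop[array_, cols_] := Module[{marker, maxLen, keep, padded},
  maxLen = Max[Length /@ array];
  (* pad every row on the right with a unique local symbol up to the longest row *)
  padded = PadRight[array, {Length[array], maxLen}, marker];
  (* the columns to keep, in their original order *)
  keep = Complement[Range[maxLen], cols];
  (* rectangular Part extraction, then strip the padding back out of every row *)
  DeleteCases[padded[[All, keep]], marker, {2}]
]

padDrop[{Range[10], Range[5], Range[9], Range[8]}, {1, 3, 5}]
(* {{2, 4, 6, 7, 8, 9, 10}, {2, 4}, {2, 4, 6, 7, 8, 9}, {2, 4, 6, 7, 8}} *)
```

Padding materializes a full rectangular array of maxLen columns and then scans it again with DeleteCases, which is plausibly part of why this approach is slow when row lengths vary as widely as 20 to 2000.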
I'm currently using:
colDropper[array_, cols_] := Module[{s = Split[Sort@cols, #2 == #1 + 1 &]},
Fold[Drop[#, {}, #2] &, array, (DeleteDuplicates /@ s[[All, {1, -1}]]) -
Most@Accumulate@Prepend[Length /@ s, 0]]]
which does the job and performs... OK, but there has to be a way that combines elegance and speed. Unfortunately, as with most data I work with, the values here are usually not machine-precision, so compiling seems to be out...
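colDropper's index bookkeeping is compact, so it may help to evaluate its pieces separately. The column list {7, 1, 2, 3} below is chosen only for illustration; the expressions are the ones inside colDropper.

```mathematica
(* Trace of colDropper's index arithmetic for an illustrative column list *)
cols = {7, 1, 2, 3};
(* runs of consecutive columns after sorting *)
s = Split[Sort@cols, #2 == #1 + 1 &]              (* {{1, 2, 3}, {7}} *)
(* each run as a Drop spec: {first, last}, or {n} for a single column *)
spans = DeleteDuplicates /@ s[[All, {1, -1}]]      (* {{1, 3}, {7}} *)
(* how many columns earlier runs have already removed *)
shift = Most@Accumulate@Prepend[Length /@ s, 0]    (* {0, 3} *)
(* specs shifted so each one is valid after the preceding drops *)
spans - shift                                      (* {{1, 3}, {4}} *)
```

Folding Drop over {{1, 3}, {4}} first removes columns 1 to 3, after which the original column 7 sits at position 4. Grouping consecutive columns into runs keeps the number of passes over the array equal to the number of runs rather than the number of columns.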
Answer
Here is a simple method that seems to be somewhat faster than yours on unpackable data (data that cannot be stored as packed arrays, such as the strings used in the test below):
colDrop[array_, drop_] :=
Module[{m = array}, m[[All, drop]] = Sequence[]; m]
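The trick is that Set has the SequenceHold attribute, so each selected part is overwritten with a literal Sequence[] placeholder; when m is evaluated on return, those placeholders splice away to nothing. A quick check on the small ragged example from the question (colDrop is repeated so the block runs on its own):

```mathematica
(* colDrop as defined above; each targeted element becomes Sequence[] and vanishes on evaluation *)
colDrop[array_, drop_] := Module[{m = array}, m[[All, drop]] = Sequence[]; m]

ragged = {Range[10], Range[5], Range[9], Range[8]};
colDrop[ragged, {1, 3, 5}]
(* {{2, 4, 6, 7, 8, 9, 10}, {2, 4}, {2, 4, 6, 7, 8, 9}, {2, 4, 6, 7, 8}} *)
```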
Test:
data = Range /@ RandomInteger[{15, 50}, 500000];
data = Map[FromCharacterCode, data + 37, {2}];
colDropper[data, {1, 3, 5, 8, 10, 11}] // Timing // First
colDrop[data, {1, 3, 5, 8, 10, 11}] // Timing // First
1.217
0.733
Also worth consideration is a rather direct application of Delete:
colDrop2[array_, drop_] :=
 (* Outer returns each row wrapped in a length-1 list; [[All, 1]] unwraps it *)
 Outer[Delete, array, {List /@ drop}, 1][[All, 1]]
colDrop2[data, {1, 3, 5, 8, 10, 11}] // Timing // First
0.952
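The building block here is that Delete accepts a list of positions, and List /@ drop turns plain column numbers into such positions. The sketch below also shows the shape Outer produces at level 1: because the second list, {positions}, has a single element, each row's result comes back inside its own one-element list.

```mathematica
(* Delete with a list of single-index positions removes several elements in one call *)
positions = List /@ {1, 3, 5}                  (* {{1}, {3}, {5}} *)
Delete[Range[10], positions]                   (* {2, 4, 6, 7, 8, 9, 10} *)

(* Outer at level 1 pairs every row with the one element of {positions} *)
Outer[Delete, {Range[10], Range[5]}, {positions}, 1]
(* {{{2, 4, 6, 7, 8, 9, 10}}, {{2, 4}}} *)
```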
Update
The timings above were measured in Mathematica 7. In version 10.1.0 the picture is different:
colDropper[data, {1, 3, 5, 8, 10, 11}] // RepeatedTiming // First
colDrop[data, {1, 3, 5, 8, 10, 11}] // RepeatedTiming // First
colDrop2[data, {1, 3, 5, 8, 10, 11}] // RepeatedTiming // First
0.911
0.752
1.020
colDrop still holds the edge, but the margin is narrower; colDrop2 is now actually slower than the original colDropper.
Version 10 also offers another option, operator forms:
Delete[List /@ {1, 3, 5, 8, 10, 11}] /@ data // RepeatedTiming // First
0.9402
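In version 10 and later, Delete called with only a position list stays unevaluated as an operator, and applying it to an expression is equivalent to the two-argument form. A small illustration on the question's example; dropper is an illustrative name, not something from the post:

```mathematica
(* Requires Mathematica 10+: Delete[pos] is an operator form awaiting its expression *)
ragged = {Range[10], Range[5], Range[9], Range[8]};
dropper = Delete[List /@ {1, 3, 5}];       (* hypothetical name for the reusable operator *)

dropper[Range[10]] === Delete[Range[10], {{1}, {3}, {5}}]   (* True *)
dropper /@ ragged
(* {{2, 4, 6, 7, 8, 9, 10}, {2, 4}, {2, 4, 6, 7, 8, 9}, {2, 4, 6, 7, 8}} *)
```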
That is slower than the original colDropper but faster than the equivalent non-operator form:
Delete[#, {{1}, {3}, {5}, {8}, {10}, {11}}] & /@ data // RepeatedTiming // First
1.032
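Before trusting timings like these, it is worth confirming that the methods return identical results. The sketch below uses a smaller random string data set built the same way as the benchmark data (the seed and size of 1000 rows are arbitrary choices), and repeats colDrop so it runs on its own:

```mathematica
(* Consistency check across three of the methods above on unpackable (string) data *)
colDrop[array_, drop_] := Module[{m = array}, m[[All, drop]] = Sequence[]; m]

SeedRandom[1];
sample = Map[FromCharacterCode, (Range /@ RandomInteger[{15, 50}, 1000]) + 37, {2}];
cols = {1, 3, 5, 8, 10, 11};

r1 = colDrop[sample, cols];
r2 = Delete[List /@ cols] /@ sample;       (* operator form, version 10+ *)
r3 = Delete[#, List /@ cols] & /@ sample;  (* explicit two-argument form *)

r1 === r2 === r3                           (* True when all three agree *)
```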