I have a stream of data like this:
0001100111100000111111001110000001111111111000000111000111110000...
(I can represent it as a list, as in {0,0,0,1,1,...}; I guess that's easier to work with.)
Now I want to count how many sequences of two "1"s, three "1"s, etc. there are (the lengths of the zero runs are not important; they're just separators), to show them in a histogram. I have no problem doing this procedurally, but functional programming remains difficult for me. While I don't mind pausing for a cup of coffee (there are 4.8 million data points), I guess functional programming will make this orders of magnitude faster. How do I do this with functional programming?
Note
"0011100" counts only as one sequence of length 3; the two sub-sequences of length 2 should not be taken into account.
Answer
If your data is in list form (converting from a string will swamp the advantage), this should be quite a bit faster: 5-50+ times faster than the existing answers. Timings are from the loungbook, so I'd expect everything to run 10+ times faster on a workstation:
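tOnes = Module[{p = Append[Pick[Range@Length@#, #, 1], 0], sa},
    (* p: positions of the 1s, followed by a sentinel 0 *)
    If[p === {0}, {},
     (* sa: indices where the step between neighbouring positions is not 1, i.e. where a run ends *)
     sa = SparseArray[Subtract[Rest@p, Most@p], Automatic, 1]["AdjacencyLists"];
     (* the gaps between run ends are the run lengths *)
     Tally[Differences[Prepend[sa, 0]]]]] &;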
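To see why this works, it can help to trace the intermediate values on a short list. The sketch below only repeats the steps of tOnes outside the pure function; the names small, pos, steps and ends are mine, chosen for illustration, and it relies on the same "AdjacencyLists" property that the code above uses.

```mathematica
small = {0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 0};

(* positions of the 1s, plus a trailing 0 as a sentinel *)
pos = Append[Pick[Range@Length@small, small, 1], 0]
(* {4, 5, 8, 9, 10, 11, 0} *)

(* step between neighbouring positions: 1 inside a run, anything else at a run's end *)
steps = Subtract[Rest@pos, Most@pos]
(* {1, 3, 1, 1, 1, -11} *)

(* with 1 as the background value, only the run ends are stored *)
ends = SparseArray[steps, Automatic, 1]["AdjacencyLists"]
(* {2, 6} *)

(* differences between successive run ends give the run lengths *)
Tally[Differences[Prepend[ends, 0]]]
(* {{2, 1}, {4, 1}} *)
```

The sentinel 0 guarantees that the last step is negative, so the final run is always closed even when the data ends in a 1.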
Comparable in speed, and arguably prettier:
tOnes2 = With[{d = Join[{0}, #, {0}]},
Tally[Differences@DeleteDuplicates@Pick[Accumulate@d, d, 0]]] &;
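The second version can be traced the same way. This is a sketch on a made-up twelve-element list; small, d, acc and atZeros are illustrative names of mine, not part of the answer.

```mathematica
small = {0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 0};

(* pad with zeros so that every run of 1s has a 0 on both sides *)
d = Join[{0}, small, {0}];

(* running count of 1s seen so far *)
acc = Accumulate@d
(* {0, 0, 0, 0, 1, 2, 2, 2, 3, 4, 5, 6, 6, 6} *)

(* sample the running count only at the zeros *)
atZeros = Pick[acc, d, 0]
(* {0, 0, 0, 0, 2, 2, 6, 6} *)

(* the running count never decreases, so removing duplicates leaves one value per run boundary *)
Tally[Differences@DeleteDuplicates@atZeros]
(* {{2, 1}, {4, 1}} *)
```

Each jump in the deduplicated counts is the number of 1s between two neighbouring groups of zeros, which is exactly one run length.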
Comparison:
(* make some data & string/digit equivalents for string/Mr.W solutions *)
data = RandomInteger[{0, 1}, 4000000];
strng = StringJoin[ToString /@ data];
mwdata = FromDigits[data];
ClearSystemCache[]
(* eldo *)
eldotim = First@Timing[
    eldo = Tally@Select[StringLength /@ StringSplit[strng, "0"], # > 0 &];];
(* Mr. W *)
mwtim = First@Timing[
    mwr = Tally[Length /@ Split[IntegerDigits@mwdata][[;; ;; 2]]];];
(* 2012rcampion *)
rctim = First@Timing[
    lengths = Cases[Split[data], l : {1, ___} :> Length[l]];
    tally = Tally[lengths];
    ];
(* kguler *)
kgtim = First@Timing[
    tally2 = Tally@StringLength@StringCases[strng, "1" ..];];
(* Me *)
me1tim = First@Timing[me = tOnes@data;];
me2tim = First@Timing[me2 = tOnes2@data;];
Transpose[{{"Mr.W", "eldo", "2012rcampion", "kguler", "Me1", "Me2"},
   {mwtim, eldotim, rctim, kgtim, me1tim, me2tim}}] // TableForm
(* Check *)
me == me2 == tally == eldo == tally2 == mwr
(* True *)
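Since the question asks for a histogram, here is one possible way to plot such a tally. It is only a sketch: the seed, the 1000-element sample, the names sample, runTally and sorted, and the choice of ListPlot with Filling are my assumptions, not part of the timings above.

```mathematica
SeedRandom[1];
sample = RandomInteger[{0, 1}, 1000];

(* same steps as tOnes2, applied directly to the sample *)
runTally = With[{d = Join[{0}, sample, {0}]},
   Tally[Differences@DeleteDuplicates@Pick[Accumulate@d, d, 0]]];

(* Tally returns lengths in order of first appearance, so sort by run length *)
sorted = SortBy[runTally, First];

(* each {length, count} pair is drawn at its actual run length *)
ListPlot[sorted, Filling -> Axis, PlotRange -> All,
 AxesLabel -> {"run length", "count"}]
```

Plotting the pairs directly keeps run lengths that never occur as visible gaps on the horizontal axis, which a bar chart of the counts alone would hide.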