
How can I tally continuous sequences in a list?


I have a stream of data like this:


0001100111100000111111001110000001111111111000000111000111110000...

(I can also represent it as a list, as in {0,0,0,1,1,...}; I guess that's easier to work with.)


Now I want to count how many sequences of two "1"s, three "1"s, etc. there are, so that I can show them in a histogram. The lengths of the zero runs are not important; they're just separators. I have no problem doing this procedurally, but functional programming remains difficult for me. While I don't mind pausing for a cup of coffee (there are 4.8 million data points), I guess that a functional approach will be orders of magnitude faster. How do I do this with functional programming?


Note
"0011100" counts only as one sequence of length 3; the two sub-sequences of length 2 should not be taken into account.



Answer




If your data is in list form (converting from a string will swamp the advantage), this should be quite a bit faster than the existing answers: 5 to 50+ times faster in the timings below. Those timings are from the loungbook, so I'd expect every solution to run 10+ times faster on a workstation:


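tOnes = Module[{p = Append[Pick[Range@Length@#, #, 1], 0], sa},
   (* p: positions of the 1s, with a 0 sentinel appended *)
   If[p === {0}, {},
    (* a gap other than 1 between consecutive positions marks the end of a run *)
    sa = SparseArray[Subtract[Rest@p, Most@p], Automatic, 1]["AdjacencyLists"];
    (* differences between run-end indices are the run lengths *)
    Tally[Differences[Prepend[sa, 0]]]]] &;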
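
To make the trick in tOnes easier to follow, here is a step-by-step trace on a short list. This is only an illustrative sketch: the sample list and the intermediate names (sample, pos, gaps, ends, runLengths) are made up for the example and are not part of the answer's code.

```mathematica
(* hypothetical sample: runs of 1s with lengths 3, 2 and 3 *)
sample = {0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0};

(* positions of the 1s, plus a 0 sentinel at the end *)
pos = Append[Pick[Range@Length@sample, sample, 1], 0]
(* {3, 4, 5, 8, 9, 11, 12, 13, 0} *)

(* gaps between consecutive positions; anything other than 1 ends a run *)
gaps = Subtract[Rest@pos, Most@pos]
(* {1, 1, 3, 1, 2, 1, 1, -13} *)

(* with default value 1, the sparse array stores only the non-1 gaps,
   so "AdjacencyLists" returns the indices where runs end *)
ends = SparseArray[gaps, Automatic, 1]["AdjacencyLists"]
(* {3, 5, 8} *)

(* each index counts the 1s seen so far, so differences are run lengths *)
runLengths = Differences[Prepend[ends, 0]]
(* {3, 2, 3} *)

Tally[runLengths]
(* {{3, 2}, {2, 1}} *)
```

The sentinel guarantees that the final gap is negative, so the last run is always closed even when the data ends in a 1.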

Comparable in speed, and arguably prettier:


tOnes2 = With[{d = Join[{0}, #, {0}]},
   (* running count of 1s, sampled at the 0s; each distinct value is a run boundary *)
   Tally[Differences@DeleteDuplicates@Pick[Accumulate@d, d, 0]]] &;
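
The same kind of trace may help with tOnes2. Again, this is a sketch under assumptions: the sample list and the names sample2, padded, running, atZeros, bounds and lens are hypothetical and chosen only for illustration.

```mathematica
(* hypothetical sample: runs of 1s with lengths 3, 2 and 3 *)
sample2 = {0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0};

(* pad with a 0 on both sides so every run is bounded by zeros *)
padded = Join[{0}, sample2, {0}];

(* running count of 1s *)
running = Accumulate[padded]
(* {0, 0, 0, 1, 2, 3, 3, 3, 4, 5, 5, 6, 7, 8, 8, 8} *)

(* keep the running count only where the data is 0 *)
atZeros = Pick[running, padded, 0]
(* {0, 0, 0, 3, 3, 5, 8, 8} *)

(* consecutive zeros repeat the same count; drop the repeats *)
bounds = DeleteDuplicates[atZeros]
(* {0, 3, 5, 8} *)

(* the count grows by exactly one run length between boundaries *)
lens = Differences[bounds]
(* {3, 2, 3} *)

Tally[lens]
(* {{3, 2}, {2, 1}} *)
```

Because the running count never decreases, DeleteDuplicates only ever removes repeats that come from adjacent zeros, so no run is lost.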


Comparison:


(* make some data & string/digit equivalents for string/Mr.W solutions *)
data = RandomInteger[{0, 1}, 4000000];
strng = StringJoin[ToString /@ data];
mwdata = FromDigits[data];
ClearSystemCache[]

(* eldo *)
eldotim = First@Timing[
    eldo = Tally@Select[StringLength /@ StringSplit[strng, "0"], # > 0 &];];

(* Mr. W *)
mwtim = First@Timing[
    mwr = Tally[Length /@ Split[IntegerDigits@mwdata][[;; ;; 2]]];];

(* 2012rcampion *)
rctim = First@Timing[
    lengths = Cases[Split[data], l : {1, ___} :> Length[l]];
    tally = Tally[lengths];];

(* kguler *)
kgtim = First@Timing[
    tally2 = Tally@StringLength@StringCases[strng, "1" ..];];

(* Me *)
me1tim = First@Timing[me = tOnes@data;];

me2tim = First@Timing[me2 = tOnes2@data;];

Transpose[{{"Mr.W", "eldo", "2012rcampion", "kguler", "Me1", "Me2"},
{mwtim, eldotim, rctim, kgtim, me1tim, me2tim}}] // TableForm

(* Check *)
me == me2 == tally == eldo == tally2 == mwr



(* True *)


