Skip to main content

list manipulation - Counting the population of integers


Suppose that myData is a list of sublists. Each sublist has a length of one or greater and contains any number of replicates of the integers 1, 2, 3, and 4. I would like to create a function myFun that counts the number of each integer in the sublist.


As an example, suppose that I have the following myData:


myData = {{2, 3, 3, 1, 1, 3, 2, 2, 4, 1, 1, 1, 2, 2, 2, 2, 2, 1, 1, 1},
{2, 2, 3, 1, 1, 3, 2, 2, 3, 1, 1, 1, 2, 2, 1, 2, 2, 1, 1, 1},

{1, 2, 2, 1, 1, 3, 2, 1, 3, 1, 1, 2, 2, 2, 2, 3, 1, 1},
{2, 3, 3, 1, 1, 2, 3, 2, 3, 1, 2, 1, 2, 1, 2, 2, 2, 1, 1, 1}};

I would like myFun /@ myData to return the output:



{{{1, 8}, {2, 8}, {3, 3}, {4, 1}},
{{1, 9}, {2, 8}, {3, 3}, {4, 0}},
{{1, 8}, {2, 7}, {3, 3}, {4, 0}},
{{1, 8}, {2, 8}, {3, 4}, {4, 0}}}


In other words, sublist 1 has eight 1s, eight 2s, three 3s, and one 4, and similarly for the other three sublists. The key feature here is that all of 1, 2, 3, and 4 are listed, even if one or more of them have zero population.


Using the following (taking advantage of Tally) gets me close:


myFun[sublist_] := SortBy[Tally[sublist], First]
myFun /@ myData


{{{1, 8}, {2, 8}, {3, 3}, {4, 1}},
{{1, 9}, {2, 8}, {3, 3}},
{{1, 8}, {2, 7}, {3, 3}},
{{1, 8}, {2, 8}, {3, 4}}}


But Tally does not list members with zero population, so the above does not satisfy the key feature.


On the other hand, the following (using Count) seems to accomplish my goal:


myFun[sublist_] := Map[{#, Count[sublist, #]} &, {1, 2, 3, 4}]
myFun /@ myData


{{{1, 8}, {2, 8}, {3, 3}, {4, 1}},
{{1, 9}, {2, 8}, {3, 3}, {4, 0}},
{{1, 8}, {2, 7}, {3, 3}, {4, 0}},

{{1, 8}, {2, 8}, {3, 4}, {4, 0}}}

But, is there a more succinct, more elegant way to do this? Thanks for your time!



Answer



Though it is not typically as fast as Tally I am fond of Sow and Reap for this kind of problem:


countBy[dat_, bins_] :=
{bins, Tr /@ Reap[Sow[1, #], bins, Tr@#2 &][[2]]}\[Transpose] & /@ dat

countBy[myData, {1, 2, 3, 4}] // List // MatrixForm


Mathematica graphics


Since Tally is probably better in the long run despite my enjoyment of Sow and Reap here is a way to use that efficiently:


countBy2[dat_, bins_] :=
{bins, Replace[bins, Dispatch @ Append[Rule @@@ Tally @ #, _ -> 0], {1}]}\[Transpose] & /@ dat



Performance


I commented that while a method that jVincent proposed (also in a comment) was short and elegant that it was "not at all efficient computationally." He rebuts this in an answer, to which I will now reply. I was of course not speaking about the absolute performance on this tiny example but rather about algorithmic complexity and the way the method scales to a larger problem. The issue with using Count is that each list must be scanned as many times as there are unique elements. When the list is long and contains a large number of such elements this becomes a very slow process.


I shall use the faster Tally method (countBy3) rather than the playful Sow and Reap method while comparing performance. I will use the following functions:


SetAttributes[timeAvg, HoldFirst]


timeAvg[func_] := Do[If[# > 0.3, Return[#/5^i]] & @@ Timing@Do[func, {5^i}], {i, 0, 15}]

myFun[sublist_] := SortBy[Tally[sublist], First]
myFun[sublist_, elems_] :=
Replace[myFun[sublist~Join~elems], {el : Alternatives @@ elems, n_} :> {el, n - 1}, 1];
leonid[dat_, bins_] := myFun[#, bins] & /@ dat

jV[dat_, bins_] :=
Outer[{#2, Count[#1, #2]} &, dat, bins, 1]


First, using an array of fixed size but varying the number bins to tally:


times =
Table[
With[{dat = RandomInteger[{1, n}, {1000, 1000}]},
timeAvg[#[dat, Range@n]] & /@ {jV, leonid, countBy2}
],
{n, 3^Range@6}
]



{{0.1028, 0.01312, 0.007608},
{0.2776, 0.01996, 0.012104},
{0.811, 0.0418, 0.02372},
{2.371, 0.1216, 0.05616},
{7.161, 0.593, 0.2246},
{21.45, 3.573, 0.468}}

ListPlot[times\[Transpose], PlotRange -> All, AxesOrigin -> {1, -1}, Joined -> True]


Mathematica graphics


The blue line is the timing of jVincent's Count method, the purple line Leonid's, and the yellow line mine. Clearly Count does not scale well with regard to an increasing number of unique elements.


Here is a run-off between Leonid's code an my own (not that he wrote it with peak efficiency in mind), using much longer sublists to allow for more unique elements:


times2 =
Table[
With[{dat = RandomInteger[{1, n}, {20, 25000}]},
timeAvg[#[dat, Range@n]] & /@ {leonid, countBy2}
],
{n, 3^Range@9}
]



{{0.0011728, 0.0009232},
{0.0012976, 0.0009984},
{0.0017712, 0.001248},
{0.003368, 0.001872},
{0.011352, 0.003744},
{0.0688, 0.009608},
{0.561, 0.03304},
{4.914, 0.156},

{43.462, 0.39}}

ListPlot[times2\[Transpose], PlotRange -> All, AxesOrigin -> {1, -1}, Joined -> True, 
PlotStyle -> ColorData[1] /@ {2, 3}]

Mathematica graphics


So once again at the extremes Leonid's code "blows up" while mine does not.


That is what I meant by "not at all efficient computationally."


Comments

Popular posts from this blog

plotting - Plot 4D data with color as 4th dimension

I have a list of 4D data (x position, y position, amplitude, wavelength). I want to plot x, y, and amplitude on a 3D plot and have the color of the points correspond to the wavelength. I have seen many examples using functions to define color but my wavelength cannot be expressed by an analytic function. Is there a simple way to do this? Answer Here a another possible way to visualize 4D data: data = Flatten[Table[{x, y, x^2 + y^2, Sin[x - y]}, {x, -Pi, Pi,Pi/10}, {y,-Pi,Pi, Pi/10}], 1]; You can use the function Point along with VertexColors . Now the points are places using the first three elements and the color is determined by the fourth. In this case I used Hue, but you can use whatever you prefer. Graphics3D[ Point[data[[All, 1 ;; 3]], VertexColors -> Hue /@ data[[All, 4]]], Axes -> True, BoxRatios -> {1, 1, 1/GoldenRatio}]

plotting - Mathematica: 3D plot based on combined 2D graphs

I have several sigmoidal fits to 3 different datasets, with mean fit predictions plus the 95% confidence limits (not symmetrical around the mean) and the actual data. I would now like to show these different 2D plots projected in 3D as in but then using proper perspective. In the link here they give some solutions to combine the plots using isometric perspective, but I would like to use proper 3 point perspective. Any thoughts? Also any way to show the mean points per time point for each series plus or minus the standard error on the mean would be cool too, either using points+vertical bars, or using spheres plus tubes. Below are some test data and the fit function I am using. Note that I am working on a logit(proportion) scale and that the final vertical scale is Log10(percentage). (* some test data *) data = Table[Null, {i, 4}]; data[[1]] = {{1, -5.8}, {2, -5.4}, {3, -0.8}, {4, -0.2}, {5, 4.6}, {1, -6.4}, {2, -5.6}, {3, -0.7}, {4, 0.04}, {5, 1.0}, {1, -6.8}, {2, -4.7}, {3, -1....

functions - Get leading series expansion term?

Given a function f[x] , I would like to have a function leadingSeries that returns just the leading term in the series around x=0 . For example: leadingSeries[(1/x + 2)/(4 + 1/x^2 + x)] x and leadingSeries[(1/x + 2 + (1 - 1/x^3)/4)/(4 + x)] -(1/(16 x^3)) Is there such a function in Mathematica? Or maybe one can implement it efficiently? EDIT I finally went with the following implementation, based on Carl Woll 's answer: lds[ex_,x_]:=( (ex/.x->(x+O[x]^2))/.SeriesData[U_,Z_,L_List,Mi_,Ma_,De_]:>SeriesData[U,Z,{L[[1]]},Mi,Mi+1,De]//Quiet//Normal) The advantage is, that this one also properly works with functions whose leading term is a constant: lds[Exp[x],x] 1 Answer Update 1 Updated to eliminate SeriesData and to not return additional terms Perhaps you could use: leadingSeries[expr_, x_] := Normal[expr /. x->(x+O[x]^2) /. a_List :> Take[a, 1]] Then for your examples: leadingSeries[(1/x + 2)/(4 + 1/x^2 + x), x] leadingSeries[Exp[x], x] leadingSeries[(1/x + 2 + (1 - 1/x...