Skip to main content

list manipulation - Counting the population of integers


Suppose that myData is a list of sublists. Each sublist has a length of one or greater and contains any number of replicates of the integers 1, 2, 3, and 4. I would like to create a function myFun that counts the number of each integer in the sublist.


As an example, suppose that I have the following myData:


myData = {{2, 3, 3, 1, 1, 3, 2, 2, 4, 1, 1, 1, 2, 2, 2, 2, 2, 1, 1, 1},
{2, 2, 3, 1, 1, 3, 2, 2, 3, 1, 1, 1, 2, 2, 1, 2, 2, 1, 1, 1},

{1, 2, 2, 1, 1, 3, 2, 1, 3, 1, 1, 2, 2, 2, 2, 3, 1, 1},
{2, 3, 3, 1, 1, 2, 3, 2, 3, 1, 2, 1, 2, 1, 2, 2, 2, 1, 1, 1}};

I would like myFun /@ myData to return the output:



{{{1, 8}, {2, 8}, {3, 3}, {4, 1}},
{{1, 9}, {2, 8}, {3, 3}, {4, 0}},
{{1, 8}, {2, 7}, {3, 3}, {4, 0}},
{{1, 8}, {2, 8}, {3, 4}, {4, 0}}}


In other words, sublist 1 has eight 1s, eight 2s, three 3s, and one 4, and similarly for the other three sublists. The key feature here is that all of 1, 2, 3, and 4 are listed, even if one or more of them have zero population.


Using the following (taking advantage of Tally) gets me close:


myFun[sublist_] := SortBy[Tally[sublist], First]
myFun /@ myData


{{{1, 8}, {2, 8}, {3, 3}, {4, 1}},
{{1, 9}, {2, 8}, {3, 3}},
{{1, 8}, {2, 7}, {3, 3}},
{{1, 8}, {2, 8}, {3, 4}}}


But Tally does not list members with zero population, so the above does not satisfy the key feature.


On the other hand, the following (using Count) seems to accomplish my goal:


myFun[sublist_] := Map[{#, Count[sublist, #]} &, {1, 2, 3, 4}]
myFun /@ myData


{{{1, 8}, {2, 8}, {3, 3}, {4, 1}},
{{1, 9}, {2, 8}, {3, 3}, {4, 0}},
{{1, 8}, {2, 7}, {3, 3}, {4, 0}},

{{1, 8}, {2, 8}, {3, 4}, {4, 0}}}

But, is there a more succinct, more elegant way to do this? Thanks for your time!



Answer



Though it is not typically as fast as Tally I am fond of Sow and Reap for this kind of problem:


countBy[dat_, bins_] :=
{bins, Tr /@ Reap[Sow[1, #], bins, Tr@#2 &][[2]]}\[Transpose] & /@ dat

countBy[myData, {1, 2, 3, 4}] // List // MatrixForm


Mathematica graphics


Since Tally is probably better in the long run despite my enjoyment of Sow and Reap here is a way to use that efficiently:


countBy2[dat_, bins_] :=
{bins, Replace[bins, Dispatch @ Append[Rule @@@ Tally @ #, _ -> 0], {1}]}\[Transpose] & /@ dat



Performance


I commented that while a method that jVincent proposed (also in a comment) was short and elegant that it was "not at all efficient computationally." He rebuts this in an answer, to which I will now reply. I was of course not speaking about the absolute performance on this tiny example but rather about algorithmic complexity and the way the method scales to a larger problem. The issue with using Count is that each list must be scanned as many times as there are unique elements. When the list is long and contains a large number of such elements this becomes a very slow process.


I shall use the faster Tally method (countBy3) rather than the playful Sow and Reap method while comparing performance. I will use the following functions:


SetAttributes[timeAvg, HoldFirst]


timeAvg[func_] := Do[If[# > 0.3, Return[#/5^i]] & @@ Timing@Do[func, {5^i}], {i, 0, 15}]

myFun[sublist_] := SortBy[Tally[sublist], First]
myFun[sublist_, elems_] :=
Replace[myFun[sublist~Join~elems], {el : Alternatives @@ elems, n_} :> {el, n - 1}, 1];
leonid[dat_, bins_] := myFun[#, bins] & /@ dat

jV[dat_, bins_] :=
Outer[{#2, Count[#1, #2]} &, dat, bins, 1]


First, using an array of fixed size but varying the number bins to tally:


times =
Table[
With[{dat = RandomInteger[{1, n}, {1000, 1000}]},
timeAvg[#[dat, Range@n]] & /@ {jV, leonid, countBy2}
],
{n, 3^Range@6}
]



{{0.1028, 0.01312, 0.007608},
{0.2776, 0.01996, 0.012104},
{0.811, 0.0418, 0.02372},
{2.371, 0.1216, 0.05616},
{7.161, 0.593, 0.2246},
{21.45, 3.573, 0.468}}

ListPlot[times\[Transpose], PlotRange -> All, AxesOrigin -> {1, -1}, Joined -> True]


Mathematica graphics


The blue line is the timing of jVincent's Count method, the purple line Leonid's, and the yellow line mine. Clearly Count does not scale well with regard to an increasing number of unique elements.


Here is a run-off between Leonid's code an my own (not that he wrote it with peak efficiency in mind), using much longer sublists to allow for more unique elements:


times2 =
Table[
With[{dat = RandomInteger[{1, n}, {20, 25000}]},
timeAvg[#[dat, Range@n]] & /@ {leonid, countBy2}
],
{n, 3^Range@9}
]



{{0.0011728, 0.0009232},
{0.0012976, 0.0009984},
{0.0017712, 0.001248},
{0.003368, 0.001872},
{0.011352, 0.003744},
{0.0688, 0.009608},
{0.561, 0.03304},
{4.914, 0.156},

{43.462, 0.39}}

ListPlot[times2\[Transpose], PlotRange -> All, AxesOrigin -> {1, -1}, Joined -> True, 
PlotStyle -> ColorData[1] /@ {2, 3}]

Mathematica graphics


So once again at the extremes Leonid's code "blows up" while mine does not.


That is what I meant by "not at all efficient computationally."


Comments

Popular posts from this blog

functions - Get leading series expansion term?

Given a function f[x] , I would like to have a function leadingSeries that returns just the leading term in the series around x=0 . For example: leadingSeries[(1/x + 2)/(4 + 1/x^2 + x)] x and leadingSeries[(1/x + 2 + (1 - 1/x^3)/4)/(4 + x)] -(1/(16 x^3)) Is there such a function in Mathematica? Or maybe one can implement it efficiently? EDIT I finally went with the following implementation, based on Carl Woll 's answer: lds[ex_,x_]:=( (ex/.x->(x+O[x]^2))/.SeriesData[U_,Z_,L_List,Mi_,Ma_,De_]:>SeriesData[U,Z,{L[[1]]},Mi,Mi+1,De]//Quiet//Normal) The advantage is, that this one also properly works with functions whose leading term is a constant: lds[Exp[x],x] 1 Answer Update 1 Updated to eliminate SeriesData and to not return additional terms Perhaps you could use: leadingSeries[expr_, x_] := Normal[expr /. x->(x+O[x]^2) /. a_List :> Take[a, 1]] Then for your examples: leadingSeries[(1/x + 2)/(4 + 1/x^2 + x), x] leadingSeries[Exp[x], x] leadingSeries[(1/x + 2 + (1 - 1/x...

How to thread a list

I have data in format data = {{a1, a2}, {b1, b2}, {c1, c2}, {d1, d2}} Tableform: I want to thread it to : tdata = {{{a1, b1}, {a2, b2}}, {{a1, c1}, {a2, c2}}, {{a1, d1}, {a2, d2}}} Tableform: And I would like to do better then pseudofunction[n_] := Transpose[{data2[[1]], data2[[n]]}]; SetAttributes[pseudofunction, Listable]; Range[2, 4] // pseudofunction Here is my benchmark data, where data3 is normal sample of real data. data3 = Drop[ExcelWorkBook[[Column1 ;; Column4]], None, 1]; data2 = {a #, b #, c #, d #} & /@ Range[1, 10^5]; data = RandomReal[{0, 1}, {10^6, 4}]; Here is my benchmark code kptnw[list_] := Transpose[{Table[First@#, {Length@# - 1}], Rest@#}, {3, 1, 2}] &@list kptnw2[list_] := Transpose[{ConstantArray[First@#, Length@# - 1], Rest@#}, {3, 1, 2}] &@list OleksandrR[list_] := Flatten[Outer[List, List@First[list], Rest[list], 1], {{2}, {1, 4}}] paradox2[list_] := Partition[Riffle[list[[1]], #], 2] & /@ Drop[list, 1] RM[list_] := FoldList[Transpose[{First@li...

front end - keyboard shortcut to invoke Insert new matrix

I frequently need to type in some matrices, and the menu command Insert > Table/Matrix > New... allows matrices with lines drawn between columns and rows, which is very helpful. I would like to make a keyboard shortcut for it, but cannot find the relevant frontend token command (4209405) for it. Since the FullForm[] and InputForm[] of matrices with lines drawn between rows and columns is the same as those without lines, it's hard to do this via 3rd party system-wide text expanders (e.g. autohotkey or atext on mac). How does one assign a keyboard shortcut for the menu item Insert > Table/Matrix > New... , preferably using only mathematica? Thanks! Answer In the MenuSetup.tr (for linux located in the $InstallationDirectory/SystemFiles/FrontEnd/TextResources/X/ directory), I changed the line MenuItem["&New...", "CreateGridBoxDialog"] to read MenuItem["&New...", "CreateGridBoxDialog", MenuKey["m", Modifiers-...