I'm looking for a way to bin
list1={{"1A",1},{"2A",2},{"170A",170},{"3A",3},{"90A",90},{"80A",80},{"2A",2},{"110A",110},{"222A",222},{"200A",200},{"215A",215},{"30A",30}}
into
bins={{0,20,100,∞}}
using the second element of each sublist as the binning criterion.
Answer
I think this should work for you:
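binBy1[dat_, bins_, fn_] :=
 With[{intv = Interval /@ Partition[bins, 2, 1]},  (* consecutive edges -> closed Interval objects *)
  dat //
   GroupBy[IntervalMemberQ[intv, fn@#] &] //     (* key: True/False membership in each interval *)
   KeyMap[Pick[intv, #][[1, 1]] &] //             (* key: {min, max} of the first matching interval *)
   KeySort                                         (* order the bins by their edges *)
  ]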
Use:
binBy1[list1, {0, 20, 100, ∞}, Last]
<|{0, 20} -> {{"1A", 1}, {"2A", 2}, {"3A", 3}, {"2A", 2}},
{20, 100} -> {{"90A", 90}, {"80A", 80}, {"30A", 30}},
{100, ∞} -> {{"170A", 170}, {"110A", 110}, {"222A", 222},
{"200A", 200}, {"215A", 215}}|>
If you just want the groups, in bin order and with {} for empty bins:
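binBy2[dat_, bins_, fn_] :=
 With[{intv = Interval /@ Partition[bins, 2, 1]},  (* consecutive edges -> closed Interval objects *)
  dat //
   GroupBy[IntervalMemberQ[intv, fn@#] &] //     (* key: True/False membership in each interval *)
   KeyMap[Pick[intv, #][[1]] &] //                (* key: the first matching Interval itself *)
   Lookup[#, intv, {}] &                          (* groups in interval order; {} for empty bins *)
  ]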
binBy2[{ {"90A", 90}, {"3A", 3}}, {-50, 0, 20, 100, ∞}, Last]
{{}, {{"3A", 3}}, {{"90A", 90}}, {}}
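One detail worth knowing about binBy1 and binBy2: Interval objects are closed, so a value sitting exactly on an edge is a member of two neighboring intervals, and the Pick[...][[1]] step assigns it to the lower bin. A value beyond the outermost edges matches no interval, Pick returns {}, and taking [[1]] then fails, so the edges need to cover all of the data, as the ∞ in the question's bins does. The snippet below shows the membership lists these functions group by; the edges {0, 20, 100} are chosen only for illustration.

```mathematica
(* closed intervals from illustrative edges {0, 20, 100}, built as in binBy1/binBy2 *)
intv = Interval /@ Partition[{0, 20, 100}, 2, 1];

(* 20 sits on the shared edge, so it is a member of both intervals *)
IntervalMemberQ[intv, 20]
(* {True, True} *)

(* Pick keeps every match; taking the first one puts 20 in the lower bin *)
First @ Pick[intv, IntervalMemberQ[intv, 20]]
(* Interval[{0, 20}] *)

(* 150 lies outside every interval, so Pick would return {} and [[1]] would fail *)
IntervalMemberQ[intv, 150]
(* {False, False} *)
```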
Performance
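This ends up less clean than the code above, which you already feel is complicated, but for performance Interpolation can be far superior to IntervalMemberQ as I used it above: binBy1 and binBy2 test every datum against every interval, whereas a zero-order InterpolatingFunction is a step function that maps each value straight to a bin index.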
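The core difference can be timed in isolation. The sketch below uses hypothetical evenly spaced edges and random values, and times the per-value IntervalMemberQ test used by binBy1/binBy2 against evaluating a zero-order InterpolatingFunction built over the same edges. The actual numbers will depend on your machine and version, so none are shown.

```mathematica
(* hypothetical setup: 101 evenly spaced edges and 10^4 random values inside them *)
edges = Range[0, 1000, 10];
xs = RandomReal[{1, 999}, 10^4];

(* IntervalMemberQ approach: every value is tested against all 100 intervals *)
intv = Interval /@ Partition[edges, 2, 1];
tInterval = First @ AbsoluteTiming[IntervalMemberQ[intv, #] & /@ xs;];

(* zero-order InterpolatingFunction over the same edges:
   a step function from a value to an edge index, one evaluation per value *)
ifn = Interpolation[Transpose[{edges, Range @ Length @ edges}], InterpolationOrder -> 0];
tInterp = First @ AbsoluteTiming[ifn /@ xs;];

{tInterval, tInterp}
```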
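binsToIFn[bins_List] :=
 Interpolation[
  Transpose[{Join[{$MinMachineNumber}, bins, {$MaxMachineNumber}],  (* edges, padded at both ends *)
    Range[0, Length@bins + 1]}],                                    (* one bin index per edge *)
  InterpolationOrder -> 0]                                          (* piecewise-constant step function *)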
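binBy3[dat_, bins_, fn_] :=
 With[{IFn = binsToIFn @ bins},
  dat //
   GroupBy[IFn @* fn] //                       (* key: step-function value IFn[fn[x]], the bin index *)
   KeyMap[Round] //                            (* make the keys exact integers to match Range *)
   Lookup[#, Range[Length@bins + 1], {}] &     (* groups in bin order; {} for empty bins *)
  ]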
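Note that with this function $MinMachineNumber and $MaxMachineNumber are automatically used as the outermost bin edges, so they may be omitted from the list; the list itself must contain only finite numbers, since Interpolation cannot take ∞ as a data point. Be aware that $MinMachineNumber is the smallest positive machine number (about 2.2*10^-308), not a large negative one, so as written binBy3 is only suitable for positive keys.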
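A minimal sketch of a check for keys outside the range that binsToIFn interpolates over, which you could run before calling binBy3. outOfRange is a hypothetical helper name, not something defined elsewhere in this answer.

```mathematica
(* hypothetical helper: rows whose key fn[row] lies outside
   [$MinMachineNumber, $MaxMachineNumber], the domain binsToIFn covers *)
outOfRange[dat_, fn_] :=
 Select[dat, ! ($MinMachineNumber <= fn[#] <= $MaxMachineNumber) &]

(* $MinMachineNumber is a tiny positive number, so keys 0 and -5 are flagged *)
outOfRange[{{"a", -5}, {"b", 0}, {"c", 7}}, Last]
(* {{"a", -5}, {"b", 0}} *)
```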
Timings compared to my first two functions on a large problem:
bins = Union @ RandomInteger[999, 300];
bins = Join[{-10}, bins, {1200}];
big = RandomReal[999, {50000, 2}];
binBy1[big, bins, Last] // Length // Timing
binBy2[big, bins, Last] // Length // Timing
binBy3[big, bins, Last] // Length // Timing
{5.63164, 269}
{5.60044, 269}
{0.109201, 271}
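binBy3 reports 271 groups rather than 269 because it always returns Length@bins + 1 groups, counting the outer bins added by binsToIFn, whereas binBy2 returns Length@bins - 1 groups, one per interval, and binBy1 returns only the non-empty ones.
Coolwater's function on my machine: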
binBy[big, {bins}] // Length // Timing
{9.36006, 269}