Suppose if I have following list
{
{10,b,30},
{100,a,40},
{1000,b,10},
{1000,b,70},
{100,b,20},
{10,b,70}
}
How to find rows that have max value in 3rd column, in this case
(*{{1000,b,70},{10,b,70}}*)
Answer
With:
dat = {{10, b, 30}, {100, a, 40}, {1000, b, 10}, {1000, b, 70}, {100, b, 20}, {10, b, 70}};
Perhaps most directly:
Cases[dat, {_, _, Max@dat[[All, 3]]}]
More approaches:
Last @ SplitBy[SortBy[dat, {#[[3]] &}], #[[3]] &]Pick[dat, #, Max@#] &@dat[[All, 3]]Reap[Fold[(If[#2[[3]] >= #, Sow@#2]; #2[[3]]) &, dat]][[2, 1]]
Of these Pick appears to be concise and efficient, so it is my recommendation.
Edit: Position and Extract are three times as efficient as Pick on some data. Using Transpose is slightly more efficient on packed rectangular data.
dat ~Extract~ Position[#, Max@#] & @ dat[[All, 3]]dat ~Extract~ Position[#, Max@#] & @ Part[dat\[Transpose], 3]
Here are some timings performed in version 7:
SetAttributes[timeAvg, HoldFirst]
timeAvg[func_] := Do[If[# > 0.3, Return[#/5^i]] & @@ Timing@Do[func, {5^i}], {i, 0, 15}]
SeedRandom[1]
dat = RandomInteger[99999, {500000, 3}];
Cases[dat, {_, _, Max@dat[[All, 3]]}] // timeAvg
Last@SplitBy[SortBy[dat, {#[[3]] &}], #[[3]] &] // timeAvg
Pick[dat, #, Max@#] &@dat[[All, 3]] // timeAvg
Reap[Fold[(If[#2[[3]] >= #, Sow@#2]; #2[[3]]) &, dat]][[2, 1]] // timeAvg
dat ~Extract~ Position[#, Max@#] &@dat[[All, 3]] // timeAvg
dat ~Extract~ Position[#, Max@#] &@Part[dat\[Transpose], 3] // timeAvg
0.1278
0.764
0.0904
0.904
0.02996
0.02496
(In actuality I restarted the Kernel between each individual timing line as otherwise each run gets slower, unfairly biasing the test toward the earlier lines.)
These can be further optimized by using faster position functions for numeric data.
Michael E2 recommended compiling (probably faster in versions after 7):
pos = Compile[{{list, _Real, 1}, {pat, _Real}}, Position[list, pat]];
dat ~Extract~ pos[#, Max@#] & @ Part[dat\[Transpose], 3] // timeAvg
0.01372
My favorite method is SparseArray properties:
spos = SparseArray[Unitize[#], Automatic, 1]["AdjacencyLists"] &;
dat[[spos[# - Max@#]]] & @ Part[dat\[Transpose], 3] // timeAvg
0.002872
This is now about 30X faster than Pick, my original recommendation.
Comments
Post a Comment