filtering - Restoring the 1-to-1 correspondence between elements in two lists, where one list is used as a guide to prune elements from the other
When we use one list to select items from another with something like Pick, code such as the following breaks the 1-to-1 correspondence between items in the two lists:
dataArray = {"A","B","C","D","E","F","G"};
testArray = {0.223,0.3,1.2,0.44,4,0.24449,1.01};
dataArray = Pick[dataArray, # >= 1 & /@ testArray];
output = {"C", "E", "G"}
Without having to make a copy of anything, how do we safely prune the matching items from the other list (here testArray) so that the previous 1-to-1 correspondence between elements of testArray and elements of dataArray is restored? For example, if "C" in dataArray corresponds to 1.2 in testArray (based on its index) before the Pick pruning step, it should still do so afterwards.
Answer
There are surely many ways to approach this problem. Which is best likely depends on your data. I will illustrate three variants.
Paired data (a la decorate-and-sort)
We can do as Kuba did and merge the two lists into one to keep the elements together (starting from the original, unpruned dataArray):
Select[{dataArray, testArray}\[Transpose], #[[2]] >= 1 &]
{{"C", 1.2}, {"E", 4}, {"G", 1.01}}
You can finish with a second Transpose to separate the data into two lists.
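As a minimal sketch of that final step, assuming both lists still have their original seven elements (the names pairs, keptData and keptTest are only illustrative):

```mathematica
dataArray = {"A", "B", "C", "D", "E", "F", "G"};
testArray = {0.223, 0.3, 1.2, 0.44, 4, 0.24449, 1.01};

(* pair up, filter on the second element of each pair, then unpair *)
pairs = Select[Transpose[{dataArray, testArray}], #[[2]] >= 1 &];
{keptData, keptTest} = Transpose[pairs]
(* {{"C", "E", "G"}, {1.2, 4, 1.01}} *)
```

Note that if no pair passes the test there is nothing to split, so the destructuring assignment would not work as written; that case would need separate handling.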
Reused mask
A typically faster method is to simply construct the mask once and then reuse it in Pick as needed:
mask = UnitStep[testArray - 1];
Pick[#, mask, 1] & /@ {dataArray, testArray}
{{"C", "E", "G"}, {1.2, 4, 1.01}}
Note that I converted your test to a vectorized numeric form for better performance.
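To make the conversion concrete, here is a small side-by-side sketch (slowMask is just an illustrative name). UnitStep[x] is 1 for x >= 0 and 0 otherwise, so UnitStep[testArray - 1] marks exactly the elements that satisfy # >= 1:

```mathematica
testArray = {0.223, 0.3, 1.2, 0.44, 4, 0.24449, 1.01};

(* element-by-element test, as in the question: a list of True/False *)
slowMask = # >= 1 & /@ testArray
(* {False, False, True, False, True, False, True} *)

(* vectorized numeric form: a list of 0/1 *)
mask = UnitStep[testArray - 1]
(* {0, 0, 1, 0, 1, 0, 1} *)
```

Because the numeric mask contains 1 rather than True, Pick needs the third argument 1 to say which value means "keep".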
Index-based filtering
Perhaps the top-performing method for reapplying a filter (especially in version 7, before Pick was better optimized) is to create a list of the positions you wish to keep, then extract them using Part or Extract. Faster than Position, when applicable, is SparseArray with its undocumented "AdjacencyLists" property:
fastpos = SparseArray[#]["AdjacencyLists"] &;
idx = fastpos @ UnitStep[testArray - 1]
{3, 5, 7}
#[[idx]] & /@ {dataArray, testArray}
{{"C", "E", "G"}, {1.2, 4, 1.01}}
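If you would rather avoid an undocumented property, the same index list can be built with Position. This is only a sketch of the documented alternative, and it is likely slower on large lists:

```mathematica
testArray = {0.223, 0.3, 1.2, 0.44, 4, 0.24449, 1.01};

(* Position returns {{3}, {5}, {7}}; Flatten turns that into a plain index list *)
idx = Flatten @ Position[UnitStep[testArray - 1], 1]
(* {3, 5, 7} *)
```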
You can also process multiple lists at once with the help of All, like this:
{dataArray, testArray}[[All, idx]]
{{"C", "E", "G"}, {1.2, 4, 1.01}}
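To actually prune both lists while keeping them aligned, the result can be assigned straight back to the two symbols. A minimal sketch, assuming both lists start at their original length and using the documented Position form for the indices:

```mathematica
dataArray = {"A", "B", "C", "D", "E", "F", "G"};
testArray = {0.223, 0.3, 1.2, 0.44, 4, 0.24449, 1.01};

idx = Flatten @ Position[UnitStep[testArray - 1], 1];

(* overwrite both lists in one step so they stay index-aligned *)
{dataArray, testArray} = {dataArray, testArray}[[All, idx]];

dataArray  (* {"C", "E", "G"} *)
testArray  (* {1.2, 4, 1.01} *)
```

This avoids keeping a separately named copy around, although whether intermediate copies are made internally is not something this sketch controls.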