I have data that is given as a list of ordered pairs mixed with scalars. The pairs can contain infinite bounds. My goal is to convert the data into a matrix of indices for use in later computations.
data = {{1, ∞}, {-∞, 2}, 3, {2, 2}, {2, 3}};
This gives me all of the unique values present in data, sorted in increasing numerical order.
udata = Sort[DeleteDuplicates[Flatten@data], Less]
==> {-∞, 1, 2, 3, ∞}
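The explicit Less ordering matters. The default canonical ordering used by Sort and Union is not guaranteed to put -∞ first and ∞ last among the finite numbers, while Less compares the infinities numerically. Here is a minimal check on a small made-up list, with Infinity written out in place of the ∞ character. The names vals and sorted are only for illustration.

```mathematica
(* Hypothetical sample values, including duplicates and both infinities *)
vals = {3, Infinity, 1, -Infinity, 2, 3};
(* Less compares -Infinity and Infinity numerically against the finite values *)
sorted = Sort[DeleteDuplicates[vals], Less]
(* {-∞, 1, 2, 3, ∞} *)
```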
Now I use Dispatch to create replacement rules that map each unique value to its position in udata.
dsptch = Dispatch[Thread[udata -> Range[Length[udata]]]];
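An Association can hold the same value-to-index mapping as the Dispatch table and can be applied directly to a key. This is a minimal sketch, assuming Mathematica 10 or later (where AssociationThread exists). The name index is hypothetical, and this has not been timed against the Dispatch version.

```mathematica
udata = {-Infinity, 1, 2, 3, Infinity};
(* Map each unique value to its position in the sorted list *)
index = AssociationThread[udata -> Range[Length[udata]]];
(* Applying the association to a key returns the stored index *)
looked = {index[-Infinity], index[2], index[Infinity]}
(* {1, 3, 5} *)
```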
Finally, I replace the values with their indices and expand each scalar a into the pair {a, a}. This results in the matrix of indices I'm after.
Replace[data /. dsptch, a_Integer :> {a, a}, 1]
==> {{2, 5}, {1, 3}, {4, 4}, {3, 3}, {3, 4}}
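The same steps can also be run in a different order. This version expands the scalars first, so every later step works on a plain n×2 matrix, and then applies an Association at level 2 instead of using ReplaceAll. It is a minimal sketch, assuming Mathematica 10 or later. The names pairs, index and out are hypothetical, and the sketch has not been benchmarked against the approach above.

```mathematica
data = {{1, Infinity}, {-Infinity, 2}, 3, {2, 2}, {2, 3}};
(* Turn every non-list element a at level 1 into the pair {a, a} *)
pairs = Replace[data, a : Except[_List] :> {a, a}, {1}];
(* Unique values in numeric order, as in the question *)
udata = Sort[DeleteDuplicates[Flatten[pairs]], Less];
index = AssociationThread[udata -> Range[Length[udata]]];
(* Apply the association to every entry at level 2 of the matrix *)
out = Map[index, pairs, {2}]
(* {{2, 5}, {1, 3}, {4, 4}, {3, 3}, {3, 4}} *)
```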
NOTES:
The number of unique values is generally small compared to the length of data, but this doesn't have to be the case.
Any real numbers are possible. The data I've shown simply gives a sense of the structural possibilities.
Question: Is there a way to create the final matrix of indices that is much faster than what I'm doing here?
Edit: To test how potential solutions scale, I recommend using the following data. It is fairly representative of a real-life case.
inf = {#, ∞} & /@ RandomChoice[Range[1000], 3*10^5];
neginf = {-∞, #} & /@ RandomChoice[Range[1000], 10^5];
int = Sort /@ RandomChoice[Range[1000], {10^5, 2}];
num = RandomChoice[Range[1000], 5*10^5];
testData = RandomSample[Join[inf, neginf, int, num]];
Answer
A modest improvement comes from replacing Replace[...] with Transpose@Thread:
(udata = Sort[DeleteDuplicates[Flatten@testData], Less];
dsptch = Dispatch[Thread[udata -> Range[Length[udata]]]];
out1 = Replace[testData /. dsptch, a_Integer :> {a, a}, 1];) // AbsoluteTiming
(* {2.1282128, Null} *)
(udata = Sort[DeleteDuplicates[Flatten@testData], Less];
dsptch = Dispatch[Thread[udata -> Range[Length[udata]]]];
out2 = Transpose@Thread[testData /. dsptch];) // AbsoluteTiming
(* {1.9421942, Null} *)
out1==out2
(* True *)
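The Thread trick works because Thread, applied to a List whose elements are length-2 lists and scalars, splits the lists into two columns and repeats each scalar in both columns. Transpose then turns the columns back into rows. The small mixed list below is a hand-made example of what data /. dsptch produces, and the name mixed is only for illustration. The trick assumes every list element has length 2. If there were no lists at all, Thread would return the list unchanged and Transpose would fail.

```mathematica
(* A mixed list of index pairs and scalar indices, as produced by data /. dsptch *)
mixed = {{2, 5}, {1, 3}, 4, {3, 3}, {3, 4}};
(* Thread splits the pairs into two columns and repeats the scalar 4 in both *)
cols = Thread[mixed]
(* {{2, 1, 4, 3, 3}, {5, 3, 4, 3, 4}} *)
(* Transpose turns the two columns back into rows, so 4 becomes {4, 4} *)
rows = Transpose[cols]
(* {{2, 5}, {1, 3}, {4, 4}, {3, 3}, {3, 4}} *)
```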