In the example below, Association
has a different behavior from Dispatch
:
{1, 2, 3} /. Association@{1 -> "test",2 -> "test" , _Integer -> Null}
{1, 2, 3} /. Dispatch@{1 -> "test" ,2 -> "test" , _Integer -> Null}
{"test", "test", 3}
{"test", "test", Null}
The pattern _Integer -> Null
is not applied on the first case.
The question is: There is some way to efficiently make Association
behave like Dispatch
? Or Dispatch
is the best solution for this case?
PS: this is a toy code, my original list if much bigger and used a lot o times, so a hash table is necessary.
Answer
General
If you are not going to change your list of rules after you construct them, Dispatch
is pretty good. From the user's viewpoint, the main difference is that it is cheap to add new key-value pairs to associations, or remove existing ones, constructing new associations. Not so with Dispatch
- once you obtained the Dispatch
-ed set of rules, you can't really change the rule set efficiently. OTOH, you can apply Dispatch
-ed rules to any expression (and at any level) efficiently, while Association
s are more limited here - you can only extract the values for keys. So, these constructs generally have different sets of use cases, which however do have a significant overlap.
The case of duplicate keys
As Istvan noted in comments, there are important differences in semantics between rule lists / Dispatch
and Association
s, in the way they treat duplicate keys.
The first one is that, while in rule application (with or without Dispatch
), the rule is that "the first one wins" (since rules are applied left to right), for Association
s the rule is that "the last one wins", since the key-value pair which occurs later in the rule list, overwrites the earlier entries with the same key.
So, in particiular:
ClearAll[a];
a /. {a -> 0, a -> 1}
Lookup[{a -> 0, a -> 1}, a]
Lookup[<|a -> 0, a -> 1|> , a]
(*
0
0
1
*)
The second difference is that in fact, while lists of rules, and also Dispatch
, actually do store all rules, even if normally only the first matching one is used, Associations
never store more than one rule for a given key: all earlier key-value pairs are eliminated at association construction time, and only the last entry remains.
This may matter in some cases. Sometime we may want to get the results of all possible rule applications, e.g. with ReplaceList
:
ReplaceList[a, {a -> 0, a -> 1}]
(* {0, 1} *)
This will also work for Dispatch
- ed rules, but not for Association
s, for reasons I just outlined.
Using Lookup
to emulate Dispatch
with Association
You can somewhat emulate the action of Dispatch
-ed rules on a list of elements, by using Lookup
with Association
s. Lookup
can take a list of keys. It has also optional argument for a default value. So, you can do
Lookup[Association@{1 -> "test", 2 -> "test"}, {1, 2, 3}, Null]
(* {"test", "test", Null} *)
We can now make a quick comparison for larger volumes of data:
assocLrg = AssociationThread[Range[1000000] -> Range[1000000] + 1];
dispatchedLarge =
Dispatch[Append[Thread[Range[1000000] -> Range[1000000] + 1], _Integer -> Null]];
The space they occupy is the same:
ByteCount[dispatchedLarge]
(* 116390576 *)
ByteCount[assocLrg]
(* 116390368 *)
It might be an indication that Dispatch
has been reimplemented to use Association
s under the hood, or it might not. In any case, their key extraction speeds are comparable:
Range[100000] /. dispatchedLarge; // AbsoluteTiming
(* {0.073587, Null} *)
Lookup[assocLrg, Range[100000], Null]; // AbsoluteTiming
(* {0.041016, Null} *)
It may look as if Dispatch
is much slower, but let's not forget that ReplaceAll
is a pretty imprecise operation. We now use Replace
with level 1:
Replace[Range[100000], dispatchedLarge, {1}]; // AbsoluteTiming
(* {0.047408, Null} *)
and observe the timing in the same range as for associations, if a tiny bit slower.
In the case where you don't need to change the set of key-value pairs after it has been constructed, it is probably more a matter of personal preferences which one to use, at least at the time of this writing.
Summary
Association
s and Dispatch
ed rules are not the same constructs, although their use cases do have a significant overlap. For such uses, they are more or less speed - equivalent, as have the same memory efficiency as well.
They also have significant differences, both in semantics and in their sets of use cases, so one can't fully replace one with the other in all cases. As always, which one to use depends on the problem at hand. However, many cases where in the past Dispatch
was the only way to get efficient solutions, are done conceptually cleaner with Association
s.
Comments
Post a Comment