rule - Difference between Association and Dispatch

In the example below, Association has a different behavior from Dispatch:

{1, 2, 3} /. Association@{1 -> "test",2 -> "test" , _Integer -> Null}
{1, 2, 3} /. Dispatch@{1 -> "test" ,2 -> "test" , _Integer -> Null}

{"test", "test", 3}
{"test", "test", Null}

The pattern _Integer -> Null is not applied on the first case.

The question is: There is some way to efficiently make Association behave like Dispatch? Or Dispatch is the best solution for this case?

PS: this is a toy code, my original list if much bigger and used a lot o times, so a hash table is necessary.

Answer

General

If you are not going to change your list of rules after you construct them, Dispatch is pretty good. From the user's viewpoint, the main difference is that it is cheap to add new key-value pairs to associations, or remove existing ones, constructing new associations. Not so with Dispatch - once you obtained the Dispatch-ed set of rules, you can't really change the rule set efficiently. OTOH, you can apply Dispatch-ed rules to any expression (and at any level) efficiently, while Associations are more limited here - you can only extract the values for keys. So, these constructs generally have different sets of use cases, which however do have a significant overlap.

The case of duplicate keys

As Istvan noted in comments, there are important differences in semantics between rule lists / Dispatch and Associations, in the way they treat duplicate keys.

The first one is that, while in rule application (with or without Dispatch), the rule is that "the first one wins" (since rules are applied left to right), for Associations the rule is that "the last one wins", since the key-value pair which occurs later in the rule list, overwrites the earlier entries with the same key.

So, in particiular:

ClearAll[a];
a /. {a -> 0, a -> 1}
Lookup[{a -> 0, a -> 1}, a]
Lookup[<|a -> 0, a -> 1|> , a]

(*
    0
    0
    1
*)

The second difference is that in fact, while lists of rules, and also Dispatch, actually do store all rules, even if normally only the first matching one is used, Associations never store more than one rule for a given key: all earlier key-value pairs are eliminated at association construction time, and only the last entry remains.

This may matter in some cases. Sometime we may want to get the results of all possible rule applications, e.g. with ReplaceList:

ReplaceList[a, {a -> 0, a -> 1}]

(* {0, 1} *)

This will also work for Dispatch- ed rules, but not for Associations, for reasons I just outlined.

Using `Lookup` to emulate `Dispatch` with `Association`

You can somewhat emulate the action of Dispatch-ed rules on a list of elements, by using Lookup with Associations. Lookup can take a list of keys. It has also optional argument for a default value. So, you can do

Lookup[Association@{1 -> "test", 2 -> "test"}, {1, 2, 3}, Null]

(* {"test", "test", Null}  *)

We can now make a quick comparison for larger volumes of data:

assocLrg = AssociationThread[Range[1000000] -> Range[1000000] + 1];
dispatchedLarge = 
   Dispatch[Append[Thread[Range[1000000] -> Range[1000000] + 1], _Integer -> Null]];

The space they occupy is the same:

ByteCount[dispatchedLarge]

(* 116390576 *)

ByteCount[assocLrg]

(* 116390368 *)

It might be an indication that Dispatch has been reimplemented to use Associations under the hood, or it might not. In any case, their key extraction speeds are comparable:

Range[100000] /. dispatchedLarge; // AbsoluteTiming


(* {0.073587, Null} *)

Lookup[assocLrg, Range[100000], Null]; // AbsoluteTiming

(* {0.041016, Null} *)

It may look as if Dispatch is much slower, but let's not forget that ReplaceAll is a pretty imprecise operation. We now use Replace with level 1:

Replace[Range[100000], dispatchedLarge, {1}]; // AbsoluteTiming


(* {0.047408, Null} *)

and observe the timing in the same range as for associations, if a tiny bit slower.

In the case where you don't need to change the set of key-value pairs after it has been constructed, it is probably more a matter of personal preferences which one to use, at least at the time of this writing.

Summary

Associations and Dispatched rules are not the same constructs, although their use cases do have a significant overlap. For such uses, they are more or less speed - equivalent, as have the same memory efficiency as well.

They also have significant differences, both in semantics and in their sets of use cases, so one can't fully replace one with the other in all cases. As always, which one to use depends on the problem at hand. However, many cases where in the past Dispatch was the only way to get efficient solutions, are done conceptually cleaner with Associations.

Blog

Search This Blog