I cannot see any reason for this behavior besides a bug, but I've been wrong before so I put it before the community:
ds = Dataset[{
<|"ID" -> "Alpha", "v1" -> 1, "v2" -> 0|>,
<|"ID" -> "Beta" , "v1" -> 1, "v2" -> 1|>,
<|"ID" -> "Alpha", "v1" -> 1, "v2" -> 0|>}];
ds[Select[#"ID" == "Alpha" &], {"ID"}]
% /. {}
The internal structure changed from:
Dataset[{<|"ID" -> "Alpha"|>, <|"ID" -> "Alpha"|>},
TypeSystem`Vector[TypeSystem`Struct[{"ID"}, {TypeSystem`Atom[String]}],
TypeSystem`AnyLength], <|
"Origin" ->
HoldComplete[
Query[Select[#ID == "Alpha" &], {"ID"}][Dataset`DatasetHandle[43190226294398]]],
"ID" -> 96078453577381|>]
To:
Dataset[{<|"ID" -> "Alpha"|>, <|"ID" -> "Alpha"|>},
TypeSystem`Vector[TypeSystem`Assoc[TypeSystem`Atom[String], TypeSystem`Atom[String], 1],
2], <|"Origin" -> HoldComplete[Dataset`DatasetHandle[96078453577381] /. {}],
"ID" -> 3384469395109|>]
Why the change?
Why does this change affect the output formatting?
Answer
In the first case, the type of the dataset was deduced via the type deduction mechanism applied at Dataset construction. In the second case, it was inferred using a set of type inference rules. The fact that this happens even with an inert replacement does not matter: as long as ReplaceAll is used, the type of the resulting Dataset is inferred rather than deduced.
In some cases, type inference isn't yet able to do as good a job as type deduction, which results in more generic types. This is what we see here: the deduced type is a Vector of Struct with the single key "ID", while the inferred type is a Vector of the more generic Assoc. It may still be considered a bug (Taliesin Beynon is the one to confirm this). As to the output formatting, it is tied to the type of the Dataset. In the first case, it uses specialized formatting rules for type Struct; in the second case, it uses generic formatting for associations.
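If the generic type gets in the way, one possible workaround (a sketch only, relying on the explanation above that deduction runs at Dataset construction) is to strip the Dataset wrapper with Normal and construct a new Dataset from the plain data. The names replaced and rebuilt are just illustrative, and how InputForm displays the internal structure may differ between versions.

```wolfram
(* Same data as in the question *)
ds = Dataset[{
   <|"ID" -> "Alpha", "v1" -> 1, "v2" -> 0|>,
   <|"ID" -> "Beta", "v1" -> 1, "v2" -> 1|>,
   <|"ID" -> "Alpha", "v1" -> 1, "v2" -> 0|>}];

(* The inert replacement goes through ReplaceAll, so the type is inferred *)
replaced = ds[Select[#"ID" == "Alpha" &], {"ID"}] /. {};

(* Normal returns the underlying list of associations; constructing a
   Dataset from it runs type deduction again from scratch *)
rebuilt = Dataset[Normal[replaced]]

(* Compare the internal structures of the two datasets *)
InputForm[replaced]
InputForm[rebuilt]
```

Under these assumptions, rebuilt should carry a deduced type again and therefore format like the original query result. This only hides the symptom; it does not change how ReplaceAll infers types.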