Given
ds = Dataset[{"a b", "c-d"} ]
StringSplit with multiple split characters is broken with Dataset (10.1 regression?):
ds[All, StringSplit[#, {" ", "-"}] &]
though a single split character works:
ds[All, StringSplit[#, " "] &] // Normal
{{"a", "b"}, {"c-d"}}
As does the plain non-Dataset version with multiple split characters, of course (giving {{"a", "b"}, {"c", "d"}}):
ds // Normal // Map[StringSplit[#, {" ", "-"}] &]
Answer
This issue is due to the same type-inferencing problem described here.
Using printSignatures from the referenced answer, we can see that the type inferencer will only accept a single string as the second argument, not a list:
printSignatures[StringSplit]
(*
{Vector[Atom[String], n_]}
{Atom[String]}
{Atom[String], Atom[String]}
{Vector[Atom[String], n_], Atom[String]}
*)
No signature in this list has a list of strings in the second position, so a list of delimiters such as {" ", "-"} is rejected.
The referenced answer shows how to dodge the type inferencer. We can use similar work-arounds here: either by using Query directly on the raw data...
ds // Normal // Query[Dataset, StringSplit[#, {" ", "-"}] &]
... or by disguising the StringSplit operator:
ds[All, StringSplit&[][#, {" ", "-"}] &]
Notice how the second work-around loses useful type information in this case, causing the dataset visualization to fall back to a cruder form. We can restore the missing type information by inserting a terminal Dataset ascending operator into the query:
ds[Dataset, StringSplit&[][#, {" ", "-"}] &]
This last operation causes the proper type information to be deduced from the final output data (using TypeSystem`DeduceType), restoring the proper visualization.
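For readers wondering why the disguise works at evaluation time, here is a small check that can be run outside of any Dataset. It only shows that StringSplit&[] evaluates to the plain StringSplit symbol. That the type inferencer fails to recognise the disguised head is taken from the discussion above rather than demonstrated here.

```wolfram
(* StringSplit& is Function[StringSplit]; applied to no arguments it
   simply returns its body, the symbol StringSplit *)
SameQ[StringSplit &[], StringSplit]
(* True *)

(* so at evaluation time the disguised call behaves like an ordinary
   StringSplit call with a list of delimiters *)
StringSplit &[]["c-d", {" ", "-"}]
(* {"c", "d"} *)
```

The disguise therefore changes nothing about the result. It only changes the unevaluated form of the operator that appears in the query.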