Given System` names of 2 or more characters:
systemNames = Names["System`*"] // Select[StringLength[#] > 1 &] ;
Is there a more compact way to split at upper case letters that avoids the Partition and subsequent StringJoin?
EDIT: this now matches digits as well as upper case letters, but still has the limitations outlined below:
systemNames //
Map[StringSplit[#,
x : c_ /; (UpperCaseQ[c] || DigitQ[c]) :> x] & /* (Partition[#, 2] & )] //
Map[Map[StringJoin]]
Also, what's the easiest way to also split at $ and at the final upper case char, e.g. N? I couldn't find an easy way with Alternatives and longer ___.
Answer
You can do all of it directly with StringCases!
StringCases[systemNames, hump : (CharacterRange["A", "Z"] | DigitCharacter) | "$" ~~ restOfCamel : CharacterRange["a", "z"] ...]
(*
{{A,A,S,Triangle},{Abelian,Group},{Abort},{Abort,Kernels},{Abort,Protect},
{Above},{Abs}, ...5182... ,{$,User,Base,Directory},
{$,User,Documents,Directory},{$,User,Name},{$,Version},{$,Version,Number},
{$,Wolfram,I,D},{$,Wolfram,U,U,I,D}}
*)
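To see what the pattern does without scanning thousands of results, here is a minimal sketch on a three-name sample. The names sampleNames and sampleHumps are introduced only for this illustration; the pattern itself is copied unchanged from above.

```wolfram
(* Hypothetical sample; all three names also appear in the full output above *)
sampleNames = {"AbortKernels", "$UserName", "AASTriangle"};

(* Same pattern as above: one capital, digit, or "$", then any run of lowercase letters *)
sampleHumps = StringCases[sampleNames,
  hump : (CharacterRange["A", "Z"] | DigitCharacter) | "$" ~~
   restOfCamel : CharacterRange["a", "z"] ...]
(* {{"Abort", "Kernels"}, {"$", "User", "Name"}, {"A", "A", "S", "Triangle"}} *)
```

Each match is one "hump" character followed by zero or more lowercase letters, which is why "$" and every letter of "AAS" come out as separate pieces.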
And, out of pure envy, a version that wrongly groups abbreviations as suggested by @WReach!
StringCases[systemNames, hump : CharacterRange["A", "Z"] .. | "$" ~~ restOfCamel : CharacterRange["a", "z"] ...]
(*
{{AASTriangle},{Abelian,Group},{Abort},{Abort,Kernels},{Abort,Protect},{Above},
{Abs},{Absolute}, ...5185... ,{$,User,Base,Directory},
{$,User,Documents,Directory},{$,User,Name},{$,Version},{$,Version,Number},
{$,Wolfram,ID},{$,Wolfram,UUID}}
*)
Just 2 dots of difference! Finally something that was easier to do with ordinary patterns instead of regexes! Noooot.
Finally
A version with ordinary patterns that does the marvels that @WReach's look-ahead regexes can do. Two separate pattern "levels" must be used, though: a string pattern to split each name, then a list pattern to rejoin runs of single capitals.
StringCases[systemNames, hump : (CharacterRange["A", "Z"] | DigitCharacter) | "$" ~~ restOfCamel : CharacterRange["a", "z"] ...]
Replace[%, {pre___, Longest[d1__?UpperCaseQ], post___} :> {pre, StringJoin@d1, post}, {1}]
(*
{{"AAS", "Triangle"}, {"Abelian", "Group"}, {"Abort"}, {"Abort", "Kernels"}, {"Abort", "Protect"},
{"Above"}, {"Abs"}, {"Absolute"}, ...5184... , {"$", "User", "Base", "Directory"},
{"$", "User", "Documents","Directory"}, {"$", "User", "Name"}, {"$", "Version"}, {"$", "Version", "Number"},
{"$", "Wolfram", "ID"}, {"$", "Wolfram", "UUID"}}
*)
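The second "level" can be tried on its own. The sketch below feeds it a few hand-written split lists (splitNames and joined are names made up for this illustration) shaped like the output of the StringCases step.

```wolfram
(* Hypothetical input, shaped like the per-name lists produced by StringCases *)
splitNames = {{"A", "A", "S", "Triangle"}, {"$", "Wolfram", "I", "D"}, {"Abort", "Kernels"}};

(* At level 1, join the first run of all-uppercase elements in each list *)
joined = Replace[splitNames,
  {pre___, Longest[d1__?UpperCaseQ], post___} :> {pre, StringJoin@d1, post}, {1}]
(* {{"AAS", "Triangle"}, {"$", "Wolfram", "ID"}, {"Abort", "Kernels"}} *)
```

UpperCaseQ is False for "$" and for mixed-case pieces such as "Triangle", so only the single capitals are merged; a list without any all-uppercase element, like {"Abort", "Kernels"}, is left as it is.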
And a quick fix for the {Te,X} issue:
Replace[%, {pre___, "Te", "X", post___} :> {pre, "TeX", post}, {1}]