Given a string of alphanumeric characters, how can it be split simply and quickly at the center of each continuous run of letters? Are there elegant and fast solutions out there in the "computational universe"?
The splitter should create "syllables", each with exactly one digit as its nucleus, so that in the end there is only one digit per substring. When letters sit between two digits, they should be shared between the bordering digits (here I simulated a half share, giving the extra letter to the right-hand digit when the number of letters is odd), and leading and trailing letter sequences should simply be attached to the closest digit.
"xxx00xxx000x0xx0xxxx000xx0xx" (* original string *)
"xxx0 | 0x | xx0 | 0 | 0 | x0x | x0xx | xx0 | 0 | 0x | x0xx" (* intermediate *)
{"xxx0", "0x", "xx0", "0", "0", "x0x", "x0xx", "xx0", "0", "0x", "x0xx"} (* end *)
Note that the input string never contains spaces.
Answer
Here is a faster version of István's function:
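split[s_String] :=
 StringReplace[s, {
    (* leading and trailing letter runs stay attached to the nearest digit *)
    StartOfString ~~ l : LetterCharacter .. :> l,
    l : LetterCharacter .. ~~ EndOfString :> l,
    (* interior letter run: insert a space so the right part gets the extra letter *)
    l : LetterCharacter .. :>
     StringInsert[l, " ", 1 + Quotient[StringLength@l, 2]],
    (* run of two or more digits: put a space between every pair of digits *)
    d : Repeated[DigitCharacter, {2, ∞}] :>
     StringJoin @ Riffle[Characters@d, " "]
    }] // StringSplit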
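To see why the space lands where it does, the two replacement expressions can be evaluated on their own. This is only an illustrative sketch using literal sample strings ("xxx", "x", "000") chosen here; the built-in functions are the same ones used in split.

```mathematica
(* interior letter run of odd length: the extra letter ends up right of the space *)
StringInsert["xxx", " ", 1 + Quotient[StringLength["xxx"], 2]]
(* "x xx" *)

(* single letter between two digits: the space goes in front, so the letter joins the right digit *)
StringInsert["x", " ", 1 + Quotient[StringLength["x"], 2]]
(* " x" *)

(* run of digits: every digit becomes its own space-separated token *)
StringJoin @ Riffle[Characters["000"], " "]
(* "0 0 0" *)
```

Because the input contains no spaces, the inserted spaces are the only places where the final StringSplit cuts.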
Timings:
str = StringJoin @@ (RandomInteger[{0, 1}, {500000}] /. {0 -> "0", 1 -> "x"});
First@AbsoluteTiming[istvan = splitIstvan@str;]
First@AbsoluteTiming[mrwizard = split@str;]
istvan === mrwizard
0.7710441
0.4260243
True
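As an independent cross-check of the splitting rule itself, the cut points can also be computed directly from the digit positions: for neighbouring digits at positions a < b, the left piece ends at a + Quotient[b - a - 1, 2]. The following is a minimal sketch under that reading of the question; the name splitByPositions is hypothetical and not part of the original answer, it assumes the input contains only letters and digits, and it has not been timed.

```mathematica
(* splitByPositions is a hypothetical helper, not from the original answer *)
splitByPositions[s_String] := Module[{pos, cuts},
  (* positions of all digits in the string *)
  pos = First /@ StringPosition[s, DigitCharacter];
  If[Length[pos] < 2,
   (* zero or one digit: nothing to split *)
   {s},
   (* last character of each piece except the final one *)
   cuts = MapThread[#1 + Quotient[#2 - #1 - 1, 2] &, {Most[pos], Rest[pos]}];
   (* take the pieces {start, end} delimited by the cut positions *)
   StringTake[s, Transpose[{Prepend[cuts + 1, 1], Append[cuts, StringLength[s]]}]]
   ]
  ]

splitByPositions["xxx00xxx000x0xx0xxxx000xx0xx"]
(* {"xxx0", "0x", "xx0", "0", "0", "x0x", "x0xx", "xx0", "0", "0x", "x0xx"} *)
```

On the example string from the question this reproduces the expected list. Edge cases may differ from split: for instance, this sketch returns {""} for an empty string.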