syntax - Convenient string manipulation

With Mathematica I always feel that strings are "second-class citizens." Compared to a language such as Perl, one must juggle a lot of code to accomplish the same task.

The available functionality is not bad, but the syntax is uncomfortable. While there are a few shorthand forms such as <> for StringJoin and ~~ for StringExpression, most of the string functionality lacks such syntax and relies on clumsy names such as StringReplace, StringDrop, StringReverse, Characters, CharacterRange, FromCharacterCode, and RegularExpression.

In Mathematica strings are handled like mathematical objects, allowing 5 "a" + "b" where "a" and "b" act as symbols. This is a feature that I would not change, even if doing so would not break stacks of code. Nevertheless it precludes certain terse string syntax wherein the expression 5 "a" + "b" would be rendered "aaaaab" for example.

What is the best way to make string manipulation more convenient in Mathematica?

Ideas that come to mind, either alone or in combination, are:

1. Overload existing functions to work on strings, e.g. Take, Replace, Reverse.
   - This was the original topic of my question to which Sasha replied. It was seen as inadvisable.

2. Use shortened names for string functions, e.g. StringReplace >> StrRpl, Characters >> Chrs, RegularExpression >> RegEx

3. Create new infix syntax for string functions, and possibly new string operations.

4. Create a new container for strings, e.g. str["string"], and then definitions for various functions. (This was suggested by Leonid Shifrin.)

5. A variation of (4), expand strings (automatically?) to characters, e.g. "string" >> str["s","t","r","i","n","g"] so that the characters can be seen by Part, Take, etc.

6. Call another language such as Perl from within Mathematica to handle string processing.

7. Create new string functions that conglomerate frequently used sequences of operations.
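To make the container idea (4) more concrete, here is a minimal sketch of what it might look like. It assumes the definitions are attached to str as UpValues, so that Take, Length, etc. themselves are left alone; the choice of functions and the decision to return a wrapped str[...] are only illustrative, not a worked-out design.

```mathematica
ClearAll[str];

(* UpValues live on str, so the built-in functions stay untouched *)
str /: Length[str[s_String]] := StringLength[s];
str /: Take[str[s_String], spec_] := str[StringTake[s, spec]];
str /: Drop[str[s_String], spec_] := str[StringDrop[s, spec]];
str /: Reverse[str[s_String]] := str[StringReverse[s]];

Length[str["hello"]]     (* 5 *)
Take[str["hello"], 2]    (* str["he"] *)
Drop[str["hello"], 2]    (* str["llo"] *)
Reverse[str["hello"]]    (* str["olleh"] *)
```

The obvious cost is that every string has to be wrapped and unwrapped by hand, which is part of why I am asking for better options.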

Answer

I suggest an approach based on creating lexical and/or dynamic environments (custom scoping constructs, if you wish), inside which the rules of our "universe" are altered. I will illustrate with a dynamic environment:

ClearAll[withStringManipulations];
SetAttributes[withStringManipulations, HoldAll];
withStringManipulations[code_] :=
  Internal`InheritedBlock[{Take, Drop, Position, Join, Append, 
        Prepend, Length, Part, Plus},

   Unprotect[Take, Drop, Position, Join, Append, Prepend, Length, Part, Plus];
   Take[s_String, pos_] := StringTake[s, pos];
   Drop[s_String, pos_] := StringDrop[s, pos];
   HoldPattern[Part[s_String, n_]] := StringTake[s, {n, n}];
   Join[ss__String] := StringJoin[ss];
   Append[s_String, ss_String] := StringJoin[s, ss];
   Prepend[s_String, ss_String] := StringJoin[ss, s];
   Length[s_String] := StringLength[s];
   Plus = 
    Function[Null, 

      If[MatchQ[{##}, {__String}],
        StringJoin[##],
        (* else *)
        Module[{result, ov = OwnValues[Plus]},
          Unprotect[Plus];
          OwnValues[Plus] = {};
          result = Plus[##];
          OwnValues[Plus] = ov;
          Protect[Plus];
          result]]];

   Protect[Take, Drop, Position, Join, Append, Prepend, Length, Part, Plus];
   code
];

This is not a complete set of things you can do, just an example. Because I used Internal`InheritedBlock, the global versions of Part and the other functions are never modified, so this is safe in the sense that it has no system-wide effects. With Plus, I had to go through some pain: it has the Orderless attribute, which I did not want to alter, but I also wanted to avoid having the arguments sorted when they are strings.
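The isolation can be checked directly with a much smaller case. The following sketch uses only Length and relies on Internal`InheritedBlock (an undocumented function) behaving as it does in the environment above: the new rule is visible inside the block, the built-in behavior on lists is inherited, and everything is restored on exit.

```mathematica
(* Inside the block: the extra rule for strings is active,
   and the inherited built-in behavior still applies to lists *)
Internal`InheritedBlock[{Length},
  Unprotect[Length];
  Length[s_String] := StringLength[s];
  {Length["abc"], Length[{1, 2, 3}]}
]                                      (* {3, 3} *)

(* Outside the block: the global Length is unchanged *)
Length["abc"]                          (* 0, since a string is atomic *)
MemberQ[Attributes[Length], Protected] (* True *)
```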

Some examples:

In[31]:= withStringManipulations["a"+"b"+"c"]
Out[31]= abc

In[32]:= withStringManipulations[1+2+3]
Out[32]= 6

In[34]:= withStringManipulations[With[{s = "abc"},Table[s[[i]],{i,Length[s]}]]]//InputForm
Out[34]//InputForm=
 {"a", "b", "c"}

In[37]:= withStringManipulations[Append["abc","d"]]
Out[37]= abcd

As I said, this is just an example to illustrate the idea. Anyone interested can create their own environments by setting their own rules. This is IMO a very cheap and powerful way to reuse the system functions' syntax to one's liking, without endangering the system.

Be aware, however, that the above environment is dynamic in terms of scoping: the redefined functions (Part etc.) behave differently in all code executed inside it, not only in the code you wrote yourself. It is therefore not suitable, for example, for creating higher-order functions that accept arbitrary user code, unless the user knows exactly what the consequences will be, and in practice you as a package writer cannot depend on the user much. It is also possible to create lexical environments, where the changes affect only the code literally present inside the environment.
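A lexical environment could be sketched roughly as follows. This is only one possible way to do it, under the assumption that a plain symbol substitution in the held code is enough: the names withLexicalStrings, strTake, strLength and helper are made up for this illustration, and only Take and Length are covered.

```mathematica
ClearAll[strTake, strLength, withLexicalStrings, helper];

(* String-aware stand-ins that fall back to the built-ins otherwise *)
strTake[s_String, pos_] := StringTake[s, pos];
strTake[args___] := Take[args];

strLength[s_String] := StringLength[s];
strLength[args___] := Length[args];

(* Substitute the symbols only in the code literally passed in *)
SetAttributes[withLexicalStrings, HoldAll];
withLexicalStrings[code_] :=
  ReleaseHold[Hold[code] /. {Take -> strTake, Length -> strLength}];

(* helper is defined outside the environment, so it is not rewritten *)
helper[s_] := Length[s];

withLexicalStrings[{Length["abc"], helper["abc"]}]   (* {3, 0} *)
withLexicalStrings[Take["abcdef", 3]]                (* "abc" *)
```

The first result shows the difference from the dynamic version: the literal Length["abc"] is rewritten, while the Length called inside helper keeps its ordinary meaning.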
