syntax - Convenient string manipulation


With Mathematica I always feel that strings are "second class citizens." Compared to a language such as Perl, one must juggle a lot of code to accomplish the same task.


The available functionality is not bad but the syntax is uncomfortable. While there are a few shorthand forms such as <> for StringJoin and ~~ for StringExpression, most of the string functionality lacks such syntax, and uses clumsy names like: StringReplace, StringDrop, StringReverse, Characters, CharacterRange, FromCharacterCode, and RegularExpression.


In Mathematica strings are handled like mathematical objects, allowing 5 "a" + "b" where "a" and "b" act as symbols. This is a feature that I would not change, even if doing so would not break stacks of code. Nevertheless it precludes certain terse string syntax wherein the expression 5 "a" + "b" would be rendered "aaaaab" for example.
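
To make the point concrete, here is a small sketch of the current behavior next to the explicit call that the terse syntax would stand for. It uses only built-in functions; reading 5 "a" + "b" as "aaaaab" is the hypothetical syntax described above, not something Mathematica does.

```mathematica
(* Strings behave like symbols under arithmetic: like terms are collected *)
"a" + "a"   (* evaluates to 2 "a" *)

(* The explicit form of what a terse 5 "a" + "b" might mean for strings *)
StringJoin[ConstantArray["a", 5], "b"]   (* evaluates to "aaaaab" *)
```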




What is the best way to make string manipulation more convenient in Mathematica?


Ideas that come to mind, either alone or in combination, are:




  1. Overload existing functions to work on strings, e.g. Take, Replace, Reverse.




    • This was the original topic of my question to which Sasha replied. It was seen as inadvisable.




  2. Use shortened names for string functions, e.g. StringReplace >> StrRpl, Characters >> Chrs, RegularExpression >> RegEx




  3. Create new infix syntax for string functions, and possibly new string operations.





  4. Create a new container for strings, e.g. str["string"], and then definitions for various functions. (This was suggested by Leonid Shifrin.)




  5. A variation of (4), expand strings (automatically?) to characters, e.g. "string" >> str["s","t","r","i","n","g"] so that the characters can be seen by Part, Take, etc.




  6. Call another language such as Perl from within Mathematica to handle string processing.





  7. Create new string functions that conglomerate frequently used sequences of operations.
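
As a rough illustration of idea (4), the container approach could be sketched with UpValues attached to a wrapper. The name str comes from the list above; the particular set of definitions is only an assumption about what such a container might support, not a worked-out design.

```mathematica
ClearAll[str];

(* UpValues live on str, so the built-in functions themselves are not modified *)
str /: Take[str[s_String], pos_] := str[StringTake[s, pos]];
str /: Length[str[s_String]] := StringLength[s];
str /: Reverse[str[s_String]] := str[StringReverse[s]];
str /: Join[a_str, b_str] := str[First[a] <> First[b]];

Take[str["abcdef"], 3]             (* str["abc"] *)
Length[str["abc"]]                 (* 3 *)
Reverse[str["abc"]]                (* str["cba"] *)
Join[str["ab"], str["cd"]]         (* str["abcd"] *)
```

Because the definitions are attached to str rather than to Take, Length and so on, ordinary lists and bare strings are unaffected.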





Answer



I suggest an approach based on creating lexical and/or dynamic environments (custom scoping constructs, if you wish), inside which the rules of our "universe" are altered. I will illustrate with a dynamic environment:


ClearAll[withStringManipulations];
SetAttributes[withStringManipulations, HoldAll];
withStringManipulations[code_] :=
  Internal`InheritedBlock[
    {Take, Drop, Position, Join, Append, Prepend, Length, Part, Plus},

    Unprotect[Take, Drop, Position, Join, Append, Prepend, Length, Part, Plus];

    (* String-specific definitions; they exist only inside this block *)
    Take[s_String, pos_] := StringTake[s, pos];
    Drop[s_String, pos_] := StringDrop[s, pos];
    HoldPattern[Part[s_String, n_]] := StringTake[s, {n, n}];
    Join[ss__String] := StringJoin[ss];
    Append[s_String, ss_String] := StringJoin[s, ss];
    Prepend[s_String, ss_String] := StringJoin[ss, s];
    Length[s_String] := StringLength[s];

    (* Plus is intercepted through an OwnValue, so that string arguments
       are joined in the order given rather than sorted first *)
    Plus =
      Function[Null,
        If[MatchQ[{##}, {__String}],
          StringJoin[##],
          (* else: temporarily remove the OwnValue and call the built-in Plus *)
          Module[{result, ov = OwnValues[Plus]},
            Unprotect[Plus];
            OwnValues[Plus] = {};
            result = Plus[##];
            OwnValues[Plus] = ov;
            Protect[Plus];
            result]]];

    Protect[Take, Drop, Position, Join, Append, Prepend, Length, Part, Plus];
    code
  ];

This is not a complete set of things you can do, just an example. Because I used Internal`InheritedBlock, the global versions of functions such as Part are never modified, so this is safe in the sense that it does not have system-wide effects. With Plus, I had to go through some pain, since it has the Orderless attribute and I did not want to alter that, but wanted to avoid sorting when the arguments are strings.
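
A minimal sketch of why Internal`InheritedBlock keeps this contained, using a hypothetical symbol f rather than a system function: definitions added inside the block coexist with the existing ones and disappear again when the block exits.

```mathematica
ClearAll[f];
f[x_Integer] := x^2;

(* Inside the block f keeps its existing definition and gains a new one *)
inside = Internal`InheritedBlock[{f},
   f[s_String] := StringLength[s];
   {f[3], f["abc"]}];

inside          (* {9, 3} *)

(* Outside the block the string definition is gone again *)
f["abc"]        (* stays unevaluated: f["abc"] *)
DownValues[f]   (* only the original integer rule remains *)
```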


Some examples:


In[31]:= withStringManipulations["a"+"b"+"c"]
Out[31]= abc

In[32]:= withStringManipulations[1+2+3]
Out[32]= 6

In[34]:= withStringManipulations[With[{s = "abc"},Table[s[[i]],{i,Length[s]}]]]//InputForm
Out[34]//InputForm=
{"a", "b", "c"}

In[37]:= withStringManipulations[Append["abc","d"]]
Out[37]= abcd

As I said, this is just an example to illustrate the idea. Anyone interested can create their own environments by setting their own rules. This is IMO a very cheap and powerful way to reuse the system functions' syntax to one's liking, without endangering the system.



Be aware, however, that the above environment is dynamic (in terms of scoping). It is therefore not suitable, for example, for creating higher-order functions that accept arbitrary user code, since these functions (Part etc.) will also behave differently in that code. This is acceptable only if the user knows exactly what the consequences will be, and in practice you, as a package writer, cannot depend on the user much. It is also possible to create lexical environments, where the changes only affect the code literally present inside the environment.
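
One way such a lexical environment might look is sketched below. The names withLexicalStringManipulations, strLength, strTake and userLength are hypothetical, and only two functions are covered. The idea is to rewrite the held code before it runs, so that only symbols literally present in it are affected.

```mathematica
ClearAll[strLength, strTake, withLexicalStringManipulations, userLength];

(* Hypothetical string-aware wrappers that fall back to the built-ins *)
strLength[s_String] := StringLength[s];
strLength[args___] := Length[args];
strTake[s_String, pos_] := StringTake[s, pos];
strTake[args___] := Take[args];

(* Replace the symbols in the held code, then evaluate the rewritten code *)
SetAttributes[withLexicalStringManipulations, HoldAll];
withLexicalStringManipulations[code_] :=
  ReleaseHold[Hold[code] /. {Length -> strLength, Take -> strTake}];

(* A function defined outside the environment *)
userLength[x_] := Length[x];

lexicalResult = withLexicalStringManipulations[
   {Length["abc"], Take["abcdef", 2], userLength["abc"]}]
(* {3, "ab", 0} *)
```

The last element is 0 because the Length inside userLength is not literally present in the code passed to the environment, so it keeps its usual meaning (a string is an atom of length 0). In the dynamic environment above, the same call would see the altered Length.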

