
syntax - Convenient string manipulation


In Mathematica, I always feel that strings are "second-class citizens." Compared to a language such as Perl, one must juggle a lot of code to accomplish the same text-processing tasks.


The available functionality is not bad, but the syntax is uncomfortable. While there are a few shorthand forms, such as <> for StringJoin and ~~ for StringExpression, most of the string functionality lacks such syntax and uses clumsy names like StringReplace, StringDrop, StringReverse, Characters, CharacterRange, FromCharacterCode, and RegularExpression.
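For reference, a minimal illustration of the two shorthand forms mentioned above, next to the long-named RegularExpression route. The sample string and the variable names are arbitrary; LetterCharacter and DigitCharacter are built-in string-pattern objects.

```mathematica
(* <> is the infix form of StringJoin *)
joined = "foo" <> "bar";

(* ~~ is the infix form of StringExpression; together with the built-in
   character classes it describes a pattern without a RegularExpression *)
viaStringExpression = StringMatchQ["abc123", LetterCharacter .. ~~ DigitCharacter ..];

(* the same test spelled with the long-named RegularExpression *)
viaRegex = StringMatchQ["abc123", RegularExpression["[a-zA-Z]+[0-9]+"]];

{joined, viaStringExpression, viaRegex}
(* {"foobar", True, True} *)
```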


In Mathematica, strings are handled like mathematical objects: in 5 "a" + "b", the strings "a" and "b" act as symbols and the expression stays symbolic. This is a feature I would not change, even if changing it would not break stacks of existing code. Nevertheless, it precludes certain terse string syntax, for example one in which the expression 5 "a" + "b" would be rendered as "aaaaab".
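To make the point concrete, here is a small sketch: because strings are atoms, 5 "a" + "b" stays a symbolic Plus expression, and the "aaaaab" reading has to be requested with explicit string functions. StringRepeat assumes Mathematica 10.1 or later; the StringJoin/ConstantArray line is an alternative for older versions. Variable names are only illustrative.

```mathematica
(* strings are atoms, so arithmetic on them stays symbolic *)
symbolic = 5 "a" + "b";
Head[symbolic]
(* Plus *)

(* the terse reading "aaaaab" has to be spelled out explicitly *)
repeated = StringRepeat["a", 5] <> "b";               (* needs version 10.1+ *)
repeatedOld = StringJoin[ConstantArray["a", 5], "b"]; (* older versions *)
{repeated, repeatedOld}
(* {"aaaaab", "aaaaab"} *)
```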




What is the best way to make string manipulation more convenient in Mathematica?


Ideas that come to mind, either alone or in combination, are:

  1. Overload existing functions to work on strings, e.g. Take, Replace, Reverse.
     • This was the original topic of my question, to which Sasha replied; it was seen as inadvisable.
  2. Use shortened names for string functions, e.g. StringReplace → StrRpl, Characters → Chrs, RegularExpression → RegEx.
  3. Create new infix syntax for string functions, and possibly new string operations.
  4. Create a new container for strings, e.g. str["string"], and then definitions for various functions. (This was suggested by Leonid Shifrin.)
  5. A variation of (4): expand strings (automatically?) to characters, e.g. "string" → str["s","t","r","i","n","g"], so that the characters can be seen by Part, Take, etc.
  6. Call another language such as Perl from within Mathematica to handle string processing.
  7. Create new string functions that conglomerate frequently used sequences of operations.
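A minimal sketch of idea 4, assuming a wrapper head named str (the name and the choice of functions are illustrative, not necessarily what Leonid Shifrin had in mind). The rules are attached to str as UpValues via TagSetDelayed (/:), so none of the System` functions needs to be unprotected, and the rules fire only when an argument is wrapped in str.

```mathematica
ClearAll[str];

(* UpValues: these rules are stored on str, not on Length, Part, etc. *)
str /: Length[str[s_String]] := StringLength[s];
str /: Part[str[s_String], n_Integer] := StringTake[s, {n}];
str /: Take[str[s_String], spec_] := str[StringTake[s, spec]];
str /: Reverse[str[s_String]] := str[StringReverse[s]];
str /: Join[str[a_String], str[b_String]] := str[a <> b];

word = str["string"];
{Length[word], word[[2]], Take[word, 3], Reverse[word], Join[word, str["s"]]}
(* {6, "t", str["str"], str["gnirts"], str["strings"]} *)
```

Anything not covered by these rules falls through to the ordinary behavior of the built-in functions, and outside of str-wrapped values nothing changes.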





Answer



I suggest an approach based on creating lexical and/or dynamic environments (custom scoping constructs, if you wish), inside which the rules of our "universe" are altered. I will illustrate with a dynamic environment:


ClearAll[withStringManipulations];
SetAttributes[withStringManipulations, HoldAll];
withStringManipulations[code_] :=
  Internal`InheritedBlock[{Take, Drop, Position, Join, Append,
      Prepend, Length, Part, Plus},
    Unprotect[Take, Drop, Position, Join, Append, Prepend, Length, Part, Plus];
    Take[s_String, pos_] := StringTake[s, pos];
    Drop[s_String, pos_] := StringDrop[s, pos];
    HoldPattern[Part[s_String, n_]] := StringTake[s, {n, n}];
    Join[ss__String] := StringJoin[ss];
    Append[s_String, ss_String] := StringJoin[s, ss];
    Prepend[s_String, ss_String] := StringJoin[ss, s];
    Length[s_String] := StringLength[s];
    (* Plus gets an OwnValue, so Orderless never sorts string arguments *)
    Plus =
      Function[Null,
        If[MatchQ[{##}, {__String}],
          StringJoin[##],
          (* else: temporarily drop the OwnValue and call the real Plus *)
          Module[{result, ov = OwnValues[Plus]},
            Unprotect[Plus];
            OwnValues[Plus] = {};
            result = Plus[##];
            OwnValues[Plus] = ov;
            Protect[Plus];
            result]]];
    Protect[Take, Drop, Position, Join, Append, Prepend, Length, Part, Plus];
    code
  ];

This is not a complete set of things you can do, just an example. Because I used Internal`InheritedBlock, the global definitions of Part and the other functions are never modified, so this is safe in the sense that it has no system-wide effects. Plus took some extra work: it has the Orderless attribute, which I did not want to alter, but I wanted to avoid sorting when the arguments are strings.
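To see why Plus needs the OwnValues detour, compare it with the naive approach of adding a DownValue for strings. Because Plus is Orderless, its arguments are sorted before any DownValue is tried, so the original order is already lost when the rule fires. A minimal sketch (Internal`InheritedBlock is undocumented, as in the code above; the variable name is illustrative):

```mathematica
naive = Internal`InheritedBlock[{Plus},
   Unprotect[Plus];
   (* a DownValue is tried only after Orderless has sorted the arguments *)
   Plus[s__String] := StringJoin[s];
   "b" + "a"];
naive
(* "ab", not "ba": Plus["b", "a"] was reordered to Plus["a", "b"] first *)
```

With the OwnValue approach, the head Plus evaluates to a plain Function before any attributes are applied, so withStringManipulations["b" + "a"] should keep the order and give "ba".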


Some examples:


In[31]:= withStringManipulations["a"+"b"+"c"]
Out[31]= abc

In[32]:= withStringManipulations[1+2+3]
Out[32]= 6

In[34]:= withStringManipulations[With[{s = "abc"},Table[s[[i]],{i,Length[s]}]]]//InputForm
Out[34]//InputForm=
{"a", "b", "c"}

In[37]:= withStringManipulations[Append["abc","d"]]
Out[37]= abcd

As I said, this is just an example to illustrate the idea. Anyone interested can create their own environments by setting their own rules. IMO this is a very cheap and powerful way to adapt the syntax of the system functions to one's liking without endangering the system.
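As a sketch of such a custom environment, here is one that teaches Reverse about strings. The name withStringReverse is hypothetical; the structure (HoldAll plus Internal`InheritedBlock) mirrors withStringManipulations.

```mathematica
ClearAll[withStringReverse];
SetAttributes[withStringReverse, HoldAll];
withStringReverse[code_] :=
  Internal`InheritedBlock[{Reverse},
    Unprotect[Reverse];
    (* user DownValues are tried before the built-in rules *)
    Reverse[s_String] := StringReverse[s];
    Protect[Reverse];
    code];

inside = withStringReverse[{Reverse["abc"], Reverse[{1, 2, 3}]}]
(* {"cba", {3, 2, 1}}: lists still go through the built-in rule *)
```

Outside the environment, Reverse["abc"] still issues Reverse::normal and returns unevaluated, since the global Reverse never gains the string rule.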



Be aware, however, that the above environment is dynamic (in terms of scoping), so it is not suitable, for example, for creating higher-order functions that accept arbitrary user code: the redefined functions (Part etc.) will also behave differently inside that code. This is only acceptable if the user knows exactly what the consequences are, and in practice you as a package writer cannot rely on the user much. It is also possible to create lexical environments, where the changes affect only the code literally present inside the environment.
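A minimal sketch of a lexical variant, assuming hypothetical dispatcher functions lexLength and lexReverse. Instead of changing Length and Reverse, the environment rewrites those symbols only where they literally occur in the held code, so functions defined elsewhere (here the hypothetical lengthElsewhere) keep the global behavior. This is one possible technique, not necessarily the one the answer had in mind.

```mathematica
ClearAll[lexLength, lexReverse, withStringSyntaxLexical, lengthElsewhere];

(* dispatchers: string case first, otherwise defer to the built-in *)
lexLength[s_String] := StringLength[s];
lexLength[args___] := Length[args];
lexReverse[s_String] := StringReverse[s];
lexReverse[args___] := Reverse[args];

(* rewrite only the symbols that literally occur in the held code *)
SetAttributes[withStringSyntaxLexical, HoldAll];
withStringSyntaxLexical[code_] :=
  ReleaseHold[Hold[code] /. {Length -> lexLength, Reverse -> lexReverse}];

(* calls Length, but its body is not literally inside the environment *)
lengthElsewhere[s_] := Length[s];

result = withStringSyntaxLexical[
  {Length["abc"], Reverse["abc"], lengthElsewhere["abc"]}]
(* {3, "cba", 0}: lengthElsewhere still uses the global Length, which gives 0 for an atom *)
```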

