boxes - Tokenize Mathematica input in a simple way


Background


I usually give detailed descriptions when I ask a question, which sometimes keeps users from posting answers because they may think their answer is too simple. Therefore, I initially chose to just throw the direct question out there and collect the ideas from all answers.



Although it seems that we cannot get a simple tokenizer by using functions like TreeForm, MakeBoxes, MakeExpression, ..., I want to give some background information now:


What really bothers me is that here on Mathematica.SE we have a highlighter for Mathematica code which is far from perfect but does a reasonable job. If, on the other hand, I want to include a snippet of code in a LaTeX document, I'm stuck with a black-and-white PDF export from Mathematica or with the Mathematica 5.2 support of the listings package.


Therefore, I hacked together a simple parser for the HTML output of our google-prettify plugin. This seems to work reasonably well, and with a little adjustment one could include styled Mathematica code in a LaTeX document. It should be noted that I don't intend to export formulas or code with sophisticated styling. I want to stick with the good old ASCII-style code that is used in most packages.


Before I used the HTML output, I again had a long look at Leonid's formatter, but in its current state it suffers from the same issues, since it relies on MakeBoxes as well, and there are other issues too. Leonid pointed out that he wants to reimplement it completely.


On the other hand, we have functions like SyntaxLength, SyntaxQ, MakeExpression, MakeBoxes (and their To counterparts), all kinds of Forms, we can keep expressions unevaluated, and so on. Therefore, I was asking myself whether we can do the tokenizing much more easily with Mathematica than is possible with the JavaScript from google-prettify.


Question


Is it possible to implement a reliable tokenizer which takes a valid input-string of Mathematica code and returns a list of tokens without implementing the rules of the Mathematica-language itself?


Although tokens usually don't contain whitespace characters, for the purpose of testing it would be nice if all characters were preserved in the tokenized version.


In particular, I want


input == StringJoin@@Tokenize[input]


to return True.
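The round-trip requirement above can be written as a small predicate. This is only a sketch: `roundTripQ` and `charTokenize` are hypothetical names that do not appear elsewhere in this post, and the character-splitting tokenizer is deliberately naive. It is there only to show that the check itself works.

```mathematica
(* Hypothetical helper: True if the tokenizer returns a flat list of strings
   whose concatenation is exactly the original input. *)
roundTripQ[tokenizer_, input_String] :=
  With[{tokens = tokenizer[input]},
    MatchQ[tokens, {___String}] && input === StringJoin[tokens]]

(* Naive stand-in tokenizer that splits into single characters. It satisfies
   the round-trip property trivially but is useless as a real tokenizer. *)
charTokenize[s_String] := Characters[s]

roundTripQ[charTokenize, "a_:>a/2<=3"]
(* True *)
```

A tokenizer that drops whitespace or rewrites operators (for example `<=` into \[LessEqual]) fails this check, which is exactly what makes it useful for testing.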


Take for instance this function


Tokenize[str_String /; SyntaxQ[str]] := 
 With[{expr = MakeExpression[str, StandardForm]},
  Most[Drop[Flatten[MakeBoxes[expr] /. {
       RowBox -> List, SuperscriptBox[a_, b_] :> {a, "^", b},
       "\[Rule]" :> "->"}], 2]]
  ];


Tokenize[
"Plot3D[{x^2+y^2,-x^2-y^2},{x,-2,2},{y,-2,2},RegionFunction->Function[{x,y,z},x^2+y^2<=4]]"
]
(*
{"Plot3D", "[", "{", "x", "^", "2", "+", "y", "^", "2", ",",
"-", "x", "^", "2", "-", "y", "^", "2", "}", ",", "{", "x",
",", "-", "2", ",", "2", "}", ",", "{", "y", ",", "-", "2",
",", "2", "}", ",", "RegionFunction", "->",
"Function", "[", "{", "x", ",", "y", ",", "z", "}", ",", "x",
"^", "2", "+", "y", "^", "2", "<=", "4", "]", "]"}

*)

Although the output looks good here, inside Mathematica we get \[LessEqual] instead of <= (due to StandardForm, I assume). Furthermore, all the different kinds of boxes need to be handled and, I'm afraid, many more things.
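One way to paper over the \[LessEqual] problem is to map such operator characters back to ASCII after tokenizing. This is a minimal sketch under assumptions: `asciiOperatorRules` and `toAsciiTokens` are hypothetical names, and the rule list is an illustrative, incomplete selection rather than a full inventory of what MakeBoxes can emit.

```mathematica
(* Assumed, incomplete mapping from named operator characters
   back to their ASCII input forms. *)
asciiOperatorRules = {
   "\[LessEqual]" -> "<=",
   "\[GreaterEqual]" -> ">=",
   "\[NotEqual]" -> "!=",
   "\[Equal]" -> "==",
   "\[Rule]" -> "->",
   "\[RuleDelayed]" -> ":>"
   };

(* Apply the mapping to each token of a flat token list (level 1 only). *)
toAsciiTokens[tokens : {___String}] :=
  Replace[tokens, asciiOperatorRules, {1}]

toAsciiTokens[{"x", "\[LessEqual]", "4"}]
(* {"x", "<=", "4"} *)
```

Such a mapping cannot tell whether the original input string contained `<=` or the named character itself, so it restores the round-trip property only for pure ASCII input.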


Is there any chance to get this working really correctly?


Test examples:


In some of these cases I'm not sure whether my given output is the correct one. E.g. the handling of linebreaks may be system-dependent, a_ seems to stay together in the box-representation (which would be ok), ...


"a\nb" (* {"a","\n","b"} *)
"a_:>a/2<=3" (* {"a_",":>","a","/","2","<=","3"} *)
"1`3+1.00`3" (* I'm not sure how this should be tokenized but my intention should be clear *)
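The examples with a stated expected output can be collected into a small check that any candidate tokenizer can be run against. Again this is only a sketch: `testCases` and `checkTokenizer` are hypothetical names, and the third example is left out because its intended tokenization is not specified.

```mathematica
(* Expected tokenizations copied from the test examples above. *)
testCases = {
   "a\nb" -> {"a", "\n", "b"},
   "a_:>a/2<=3" -> {"a_", ":>", "a", "/", "2", "<=", "3"}
   };

(* For each test case, return {input, whether the candidate's
   tokens equal the expected tokens}. *)
checkTokenizer[tokenizer_] :=
  Table[
   {First[case], tokenizer[First[case]] === Last[case]},
   {case, testCases}]

(* Example with a naive character splitter: the first case passes,
   the second fails because "a_" and ":>" are split apart. *)
checkTokenizer[Characters]
(* {{"a\nb", True}, {"a_:>a/2<=3", False}} *)
```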

Answer




tokenize[str_] := Module[{exp,
   nb = CreateDocument[{ExpressionCell@
       InputForm@MakeExpression[str, StandardForm]},
     Visible -> False]},
  SelectionMove[nb, Next, Cell];
  exp = Flatten[
    NotebookRead[nb][[1, 1]] /. {RowBox -> List,
      i_String /; StringMatchQ[i, Whitespace ..] :> Sequence[]}];
  NotebookClose[nb];
  exp[[3 ;; -2]]
  ]

Haven't tested this much. Does this give the output you expect?


tokenize["Plot3D[{x^2+y^2,-x^2-y^2},{x,-2,2},{y,-2,2},\
RegionFunction->Function[{x,y,z},x^2+y^2<=4]]"]

(*
{"Plot3D", "[", "{", "x", "^", "2", "+", "y", "^", "2", ",",
"-", "x", "^", "2", "-", "y", "^", "2", "}", ",", "{", "x",
",", "-", "2", ",", "2", "}", ",", "{", "y", ",", "-", "2",
",", "2", "}", ",", "RegionFunction", "->",
"Function", "[", "{", "x", ",", "y", ",", "z", "}", ",", "x",
"^", "2", "+", "y", "^", "2", "<=", "4", "]", "]", "]"}
*)

EDIT


Thanks to @JohnFultz's recent introduction of the following undocumented front-end function, this becomes straightforward:


fultzTokenize[t_String] := Cases[MathLink`CallFrontEnd[
   FrontEnd`UndocumentedTestFEParserPacket[t, False]], _String, Infinity]
