Is there a way that NotebookFind can be used to match string pattern expressions rather than just strings?
The documentation for NotebookFind states that only a string, box expression, or complete cell can be used as the search term, so my question is really whether pattern matching can be achieved by writing some additional code that wraps or replaces NotebookFind.
One obvious strategy would be to convert the notebook to a text representation using NotebookGet and then perform the pattern matching search on the text representation, but this is not ideal for my intended application because I would like any match that is found to be highlighted (by selecting it), much like NotebookFind already does.
Eventually I would like to build a replacement for Mathematica's built-in Search and Replace functionality. Two key enhancements that I hope to provide are:
the ability to search and replace across all open notebooks in the front-end or all notebooks in a selected directory (which is not too difficult to accomplish) and
the ability to search and replace using string pattern expressions.
I realize that Workbench already offers these features. My goal is to enable users who prefer the notebook interface (rather than the .m editor promoted by Workbench) to continue developing complex multi-notebook packages from within the front-end.
Edit:
Celtschk proposes a strategy below in the comments that may provide a partial solution. One of the issues that is still not clear, however, is how to deal with surrounding context in a pattern match when returning to NotebookFind.
Perhaps the following example will help clarify the potential problem. Without digressing into the theory of formal grammars, let's say that we want our string pattern language to be powerful enough to express not just wildcard patterns but also surrounding context. Imagine in particular that we want to find each occurrence of the string pattern "foo?" in some notebook that is enclosed by a pair of parentheses (not necessarily immediately surrounding the "foo?" pattern). We can do that easily using standard Mathematica string pattern expressions by operating on the string representation of the notebook.
Let's now assume that there is one occurrence of "foo1" and two occurrences of "foo2" in the notebook, the latter of which is not surrounded at any distance by a pair of parentheses. How would we then exclude the second "foo2" from being found when we return to NotebookFind to search for "foo1" and "foo2"?
Of course we could have matched the entire string plus surrounding context (which in this case would include the surrounding pair of parentheses) when searching the string representation for parentheses-enclosed instances of "foo?" -- but this is not really what we want, and in certain instances could be quite inconvenient in a tool designed to assist the user in refactoring a large body of Mathematica code.
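To make the problem concrete, here is a minimal sketch of the string-side search, assuming the notebook's text has already been extracted into a single string. The variable names and the sample text are hypothetical and only mirror the example above.

```mathematica
(* Hypothetical stand-in for the text extracted from a notebook *)
text = "(a foo1 b) and (c foo2 d) but foo2 outside";

(* "foo" followed by any one character, somewhere between "(" and ")" *)
enclosed = StringCases[text,
   "(" ~~ Shortest[___] ~~ m : ("foo" ~~ _) ~~ Shortest[___] ~~ ")" :> m];
(* enclosed is {"foo1", "foo2"} *)
```

The result says which strings were matched in context, but not where. Passing "foo2" back to NotebookFind would still stop at the occurrence outside the parentheses.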
Answer
Ok so this is going to be a long one. This is definitely not a general-purpose implementation, but it shows the general idea that one could use.
So you basically want to be able to type out NotebookFind["(.*?foo\d.*?)"], which would match, for example, "(something something foo4 dark side)". However, you only want it to highlight foo4, and not the rest. The way to do this is to first search through the notebook and find where the pattern matches the entire string, then search only for the particular realized sub-expression "foo4" and figure out which of the potentially many search results for foo4 coincides with the match for the entire pattern.
For the purpose of this implementation I'll assume that you have a RegularExpression pattern in which the part you want to highlight is the first matched subpattern (which means you enclose it in parentheses in the search string). So the above pattern would be: RegularExpression["[(].*?(foo[\\d]+).*?[)]"]. We then:
- search through strings in the notebook expression for cases where this matches,
- then extract the subexpression matched,
- then count how many times the subexpression matches without the full pattern matching,
- then call NotebookFind[] enough times to land on the correct match.
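The counting step can be illustrated on a single flat string. This is a simplified sketch, not the implementation that follows: it assumes the text is one string rather than a notebook expression, and the names text, hits, spans, insideQ and wanted are hypothetical.

```mathematica
(* Hypothetical flat text standing in for a notebook's string content *)
text = "foo2 first, then (a foo2 b), and foo2 again";

(* Every occurrence of the sub-pattern, as {start, end} positions *)
hits = StringPosition[text, RegularExpression["foo\\d+"]];

(* Every parenthesized span that could supply the surrounding context *)
spans = StringPosition[text, RegularExpression["[(][^()]*[)]"]];

(* True when a hit lies strictly inside one of the spans *)
insideQ[{s_, e_}] := AnyTrue[spans, #[[1]] < s && e < #[[2]] &];

(* Occurrence numbers of the hits that have the required context *)
wanted = Flatten[Position[hits, _?insideQ, {1}, Heads -> False]];
(* wanted is {2}: the second "foo2" is the one to select *)
```

An occurrence number of 2 means that NotebookFind would have to be called twice from the top of the notebook to land on the match that has the right context.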
So here goes for the actual code. It doesn't work for matching notebook level expressions and only searches through strings.
This function just creates a pattern for the actual substitution based on the search pattern.
StringPatternWrapper[stringpattern_]:=
(a_String/;StringMatchQ[a,stringpattern]):>StringCases[a,stringpattern:>"$1"]
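As a rough illustration of what the resulting rule does, it can be applied to a single string with Replace. The definition is repeated from above so that the snippet stands alone; the sample strings and the name rule are hypothetical.

```mathematica
(* Same definition as in the answer, repeated so the example is self-contained *)
StringPatternWrapper[stringpattern_] :=
  (a_String /; StringMatchQ[a, stringpattern]) :>
   StringCases[a, stringpattern :> "$1"]

rule = StringPatternWrapper[
   RegularExpression[".*?[(].*?(foo[\\d]+).*?[)].*?"]];

(* A string containing a full match is replaced by its captured groups *)
matched = Replace["before (x foo4 y) after", rule];
(* matched should be {"foo4"} *)

(* A string without the surrounding parentheses is left untouched *)
unmatched = Replace["no parentheses around foo4", rule];
```

The leading and trailing ".*?" are needed because StringMatchQ tests the whole string, not a substring.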
This function finds the positions and cases of the matched pattern. The pattern provided to this function should first be passed through StringPatternWrapper[].
findPostionAndExactMatch[nbexp_,pattern_]:=
{Position[nbexp,pattern[[1]],∞],
Cases[nbexp,pattern,∞]}//ridiculousFormatingFunction
where ridiculousFormatingFunction is a messy function for reformatting the output.
ridiculousFormatingFunction[list_] :=
Map[(a\[Function]Map[{a[[1]],#}&,a[[2]]]),Transpose[list]]//
(Flatten[Table[{#[[1,1]],#[[1,2]],n},{n,1,#[[2]]}]&/@Tally[Flatten[#,1]],1])&
And then a function for finding all the matches of the matched subexpression:
findAllExactMatches[nbexp_,exact_] :=
Flatten[Map[Table[#, {Length@StringCases[nbexp[[Sequence@@#]],exact]}]&,
Position[nbexp,a_String/;StringMatchQ[a,exact],∞]],1]
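A small sketch of what this function returns, using a hypothetical two-cell notebook expression. The definition is repeated here (with Infinity spelled out) so that the snippet is self-contained.

```mathematica
(* Same definition as in the answer, repeated for a self-contained example *)
findAllExactMatches[nbexp_, exact_] :=
  Flatten[Map[
    Table[#, {Length@StringCases[nbexp[[Sequence @@ #]], exact]}] &,
    Position[nbexp, a_String /; StringMatchQ[a, exact], Infinity]], 1]

(* Hypothetical notebook expression: the first cell contains two matches *)
nbexp = Notebook[{Cell["foo2 foo2", "Text"], Cell["x foo2", "Text"]}];

positions = findAllExactMatches[nbexp, RegularExpression[".*?foo2.*?"]];
(* positions is {{1, 1, 1}, {1, 1, 1}, {1, 2, 1}} *)
```

Each string's position appears once per match inside it, so the index of an entry in this list corresponds to the number of NotebookFind calls needed to reach that match.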
Because some strings might contain more than one match, we need to adjust the numbers:
repeatNumberForMatch[match_,nbexp_] :=
First@Position[
findAllExactMatches[nbexp,RegularExpression[".*?"<>match[[2]]<>".*?"]],match[[1]]
][[match[[3]]]]
And finally we have a nice little function that returns a list of all the matches, which expressions they appear in, and how many times you need to call NotebookFind to reach each one.
matchTable[nbexp_,pattern_] := Prepend[#,repeatNumberForMatch[#,nbexp]]&/@
findPostionAndExactMatch[nbexp,pattern]
Here is an output example from matchTable using the example notebook provided below:
matchTable[NotebookGet[nb],
StringPatternWrapper[
RegularExpression[".*?" <> "[(].*?(foo[\\d]+).*?[)]" <> ".*?"]]] //
Prepend[#, {"Repeat find number", "Indices", "Exact Match",
"Number inside string"}] & // Grid
Here is a short usage example:
notebookFindN[nb_,find_,n_]:=
(SelectionMove[nb,Before,Notebook];Do[NotebookFind[nb,find,Next],{n}])
clickerUI[nb_,pattern_]:=
Button[#[[3]],notebookFindN[nb,#[[3]],#[[1]]]]&/@matchTable[NotebookGet[nb],pattern]
And some test code and a test notebook:
nb = {
Cell["This is a direct match for the realised sub-pattern foo1 but not the full", "Text"],
Cell["This is another match identical to the realised one foo2, and still not the full, however this one needs to be skiped when using NotebookFind", "Text"],
Cell["And finally ( we have a full match for foo2 the pattern ) (foo2) <- That's another one, and so is that -> (foo4)", "Text"],
Cell["Some times a single string can have more then one entry foo2 foo2, so we need to count how many and which ones we are looking for, which makes the code slightly messy.", "Text"],
Cell["And finally ( we have one last full match for foo2 ) enclosed in parenthesis", "Text"]
} // CreateDocument;
clickerUI[nb,
StringPatternWrapper@
RegularExpression[".*?" <> "[(].*?(foo[\\d]+).*?[)]" <>".*?"]
] // Row
Hope this can be of some help. Personally, I'd like to have code that could search equally well through strings and notebook-level expressions; however, I think this requires a better structuring of the method.