In C#, regular expressions support balancing groups. For example, we can get
{ aa>, <a>}
from
"xx aa> yy<a>"
using the regular expression
<((?<gro><)|(?<-gro>>)|[^<>])*(?(gro)(?!))>
A tool such as RegexBuddy can syntax-color this expression.
How can this be done in Mathematica?
I would prefer to use a regular expression for this, but Mathematica seems to support only basic regular-expression features, not advanced ones such as dynamic regular expressions and balancing groups.
PS: Another example: how to get {[ab*[c]d], (b(x99))} from "dd9[ab*[c]d]esiddx(45x(b(x99))"?
Answer
First Case
str = "xx aa> yy<a>";
StringCases[str,
RegularExpression["(?P<a><([^<>]|(?P>a))*>)"]
]
(* {" aa>", "<a>"} *)
This works as follows:
- (?P<a> ...) gives the name a to the pattern <([^<>]|(?P>a))*>.
- The string or substring matching this pattern must start with < and end with >.
- Between these characters, the pattern ([^<>]|(?P>a)) can be repeated 0 or more times.
- The first alternative of this subpattern matches any single character other than < or >. If a < is met while reading the string, the pattern a is called by (?P>a), and we start again at bullet 2 with the substring starting with this character.
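To see the recursion described above at work, here is a small check on a nested input. The string test is a hypothetical example chosen for illustration and does not come from the question; the pattern is the one from the answer.

```mathematica
(* Hypothetical test string, chosen only to exercise the nesting. *)
test = "a<b<c>>d<e>";

(* Group a matches "<", then any mix of non-bracket characters and
   recursive calls to a, then ">". *)
result = StringCases[test, RegularExpression["(?P<a><([^<>]|(?P>a))*>)"]]
(* {"<b<c>>", "<e>"} *)
```

The inner <c> is consumed by the recursive call, so it is part of the first match and is not reported separately.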
Second Case
str2 = "dd9[ab*[c]d]esiddx(45x(b(x99))";
StringCases[str2,
RegularExpression["(?P<a>(\\[|\\()([^\\[\\]\\(\\)]|(?P>a))*(\\]|\\)))"]
]
(* {"[ab*[c]d]", "(b(x99))"} *)
This works as above. Here, instead of < at the beginning of the (sub)string, we allow for [ or ( with (\\[|\\(). The other modifications are in line with this change.
Note that this regular expression may not be satisfactory for cases such as
str3 = "dd9[ab*[c]d)esiddx(45x(b(x99))";
(* The square bracket after d is replaced by a parenthesis. *)
StringCases[str3,
RegularExpression["(?P<a>(\\[|\\()([^\\[\\]\\(\\)]|(?P>a))*(\\]|\\)))"]
]
(* {"[ab*[c]d)", "(b(x99))"} *)
The first element starts with a [ and ends with ). This can be avoided by naming the opening bracket and adding a conditional test on that name:
StringCases[str3,
RegularExpression["(?P<a>((?P<b>\\[)|\\()([^\\[\\]\\(\\)]|(?P>a))*(?(b)\\]|\\)))"]
]
(* {"[c]", "(b(x99))"} *)
The starting [ is referred to as b. The pattern (?(b)\\]|\\)) tells us that if b had a match, then the character to match should be ], or otherwise ).
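As a possible variation on the same idea, the bracket types can be tied together without a conditional by giving each type its own alternative inside the named group. This is only a sketch and not part of the original answer: the name typed and the second test string are assumptions introduced here for illustration.

```mathematica
(* str3 as in the answer: the "]" after d is replaced by ")". *)
str3 = "dd9[ab*[c]d)esiddx(45x(b(x99))";

(* Assumption: one alternative per bracket type, so each opener is
   paired with its own closer at every nesting level. *)
typed = RegularExpression[
   "(?P<a>\\[([^\\[\\]\\(\\)]|(?P>a))*\\]|\\(([^\\[\\]\\(\\)]|(?P>a))*\\))"];

r1 = StringCases[str3, typed]
(* {"[c]", "(b(x99))"} *)

(* Hypothetical mixed nesting: parentheses inside square brackets. *)
r2 = StringCases["x[a(b)c]y", typed]
(* {"[a(b)c]"} *)
```

The pattern is longer because the body is written once per bracket type, but the closing character then never depends on the state of a capture group.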