I'm looking for robust code to solve the "Longest Common Substring" problem:
Find the longest string (or strings) that is a substring (or are substrings) of two or more strings.
I can just code it up from that description, but I'd thought I'd ask here, first, in case someone knows of an implementation either distributed with Mathematica or available from an open source. I found a hint here that a solution might be part of the (huge) Combinatorica package, but a quick search of the documentation did not disclose it.
Answer
Mathematica supports two related functions, LongestCommonSequence[] and LongestCommonSubsequence[]. The first one finds the longest (contiguous or non-contiguous) sequence common to the two strings given as arguments to it:
LongestCommonSequence["AAABBBBCCCCC", "CCCBBBAAABABA"]
"AAABB"
while the second function is constrained to give the longest contiguous sequence:
LongestCommonSubsequence["AAABBBBCCCCC", "CCCBBBAAABABA"]
"AAAB"
These functions became available only in version seven; if you need to do this in an earlier version, István's routine is useful.
Comments
Post a Comment