I would like to know how I can remove accents from a string. For example, how can I transform "string test áéíóú" into "string test aeiou"? I have to normalize some text to make comparisons, and this would be very helpful.
Answer
To remove accents from a string I use this function:
removeAccent[string_] := Module[{accentMap, l1, l2},
  (* accented characters and their plain counterparts, matched position by position *)
  l1 = Characters["ŠŽšžŸÀÁÂÃÄÅÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖÙÚÛÜÝàáâãäåçèéêëìíîïðñòóôõöùúûüýÿ"];
  l2 = Characters["SZszYAAAAAACEEEEIIIIDNOOOOOUUUUYaaaaaaceeeeiiiidnooooouuuuyy"];
  (* one replacement rule per character pair *)
  accentMap = Thread[l1 -> l2];
  StringReplace[string, accentMap]
]
So, if you apply it as removeAccent["string test áéíóú"], you get "string test aeiou". Characters that are not in the list are left unchanged.
Update
As of version 10.1 there is a native function for this: RemoveDiacritics
RemoveDiacritics["string test áéíóú"] gives "string test aeiou"
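Since the question is about normalizing text for comparisons, here is a minimal sketch of how RemoveDiacritics could be used for that. The names normalizeText and sameIgnoringAccentsQ are hypothetical helpers introduced only for this illustration, and lowercasing is an extra assumption that you can drop if case should matter.

```mathematica
(* Hypothetical helper: strip diacritics, then lowercase (assumes case should be ignored) *)
normalizeText[s_String] := ToLowerCase[RemoveDiacritics[s]]

(* Hypothetical helper: compare two strings after normalization *)
sameIgnoringAccentsQ[a_String, b_String] := normalizeText[a] === normalizeText[b]

sameIgnoringAccentsQ["String Test ÁÉÍÓÚ", "string test aeiou"]
```

The last call should return True, because both arguments normalize to "string test aeiou".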
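Timing comparison using the new RepeatedTiming (times in seconds; only the timing part of each result is shown):
RepeatedTiming[removeAccent["string test áéíóú"]]
RepeatedTiming[RemoveDiacritics["string test áéíóú"]]
> 0.000057
> 0.000015
RemoveDiacritics wins: on this input it is roughly four times faster.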