
string manipulation - How to remove accents from text?


I would like to know how I can remove accents from a string. For example, how can I transform "string test áéíóú" into "string test aeiou"? I have to normalize some text to make comparisons, and this would be very helpful.



Answer




To remove accents from a string I use this function:


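removeAccent[string_] := Module[{accentMap, l1, l2},
  (* accented characters and their plain counterparts, matched position by position *)
  l1 = Characters["ŠŽšžŸÀÁÂÃÄÅÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖÙÚÛÜÝàáâãäåçèéêëìíîïðñòóôõöùúûüýÿ"];
  l2 = Characters["SZszYAAAAAACEEEEIIIIDNOOOOOUUUUYaaaaaaceeeeiiiidnooooouuuuyy"];
  (* build the rules "Š" -> "S", "Ž" -> "Z", ... and apply them to the input *)
  accentMap = Thread[l1 -> l2];
  StringReplace[string, accentMap]
]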

So, if you apply it as removeAccent["string test áéíóú"], you get "string test aeiou". Characters that are not in the table are left unchanged.


Update



As of version 10.1 there is a built-in function for this: RemoveDiacritics


RemoveDiacritics["string test áéíóú"] returns "string test aeiou"


Timing comparison using the new RepeatedTiming.


RepeatedTiming[removeAccent["string test áéíóú"]]
RepeatedTiming[RemoveDiacritics["string test áéíóú"]]


> 0.000057
> 0.000015


RemoveDiacritics wins: it is roughly 4 times faster on this input.
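For the comparison use case mentioned in the question, here is a minimal sketch of how the built-in could be wrapped. The name sameIgnoringAccents is a hypothetical helper, not a built-in, and folding case with ToLowerCase is an extra assumption about what "normalize" should mean here.

```wolfram
(* sameIgnoringAccents is a hypothetical helper name, not part of the language *)
(* Normalize a string: strip diacritics, then fold to lower case (assumed desirable) *)
normalizeText[s_String] := ToLowerCase[RemoveDiacritics[s]]

(* Two strings count as equal when their normalized forms are identical *)
sameIgnoringAccents[a_String, b_String] := normalizeText[a] === normalizeText[b]

sameIgnoringAccents["String Test ÁÉÍÓÚ", "string test aeiou"]
```

If case should stay significant, drop the ToLowerCase call and compare the RemoveDiacritics results directly.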

