I would like to know how I can remove accents from a string. For example, how can I transform "string test áéíóú" into "string test aeiou"? I have to normalize some text to make comparisons, and this would be very helpful.
Answer
To remove accents from a string I use this function:
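removeAccent[string_] := Module[{accentMap, l1, l2},
  (* accented characters and, position by position, their unaccented replacements *)
  l1 = Characters["ŠŽšžŸÀÁÂÃÄÅÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖÙÚÛÜÝàáâãäåçèéêëìíîïðñòóôõöùúûüýÿ"];
  l2 = Characters["SZszYAAAAAACEEEEIIIIDNOOOOOUUUUYaaaaaaceeeeiiiidnooooouuuuyy"];
  (* build one replacement rule per character pair *)
  accentMap = Thread[l1 -> l2];
  (* characters that are not in l1 are left unchanged *)
  StringReplace[string, accentMap]
]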
So, if you apply it as removeAccent["string test áéíóú"]
you get: "string test aeiou"
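The lookup table above only covers the characters listed in l1. As a rough alternative sketch, one could decompose the string with CharacterNormalize and drop the combining marks. The name stripCombiningMarks is hypothetical and not part of the original answer, and the sketch assumes a version that has CharacterNormalize and that the accents of interest decompose into marks in the range U+0300 to U+036F:

```mathematica
(* Hypothetical helper, not from the original answer. *)
(* Assumes CharacterNormalize is available and that the accents to remove *)
(* become combining marks in the range 768..879 (U+0300..U+036F) under NFD. *)
stripCombiningMarks[s_String] :=
  FromCharacterCode[
    DeleteCases[
      ToCharacterCode[CharacterNormalize[s, "NFD"]],
      c_ /; 768 <= c <= 879]]

stripCombiningMarks["string test áéíóú"]
```

Characters without a canonical decomposition, such as Ð, would pass through unchanged with this approach, whereas removeAccent maps them explicitly.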
Update
As of version 10.1 there is a built-in function for this, RemoveDiacritics:
RemoveDiacritics["string test áéíóú"]
which returns "string test aeiou"
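For the comparison use case mentioned in the question, a minimal sketch might wrap RemoveDiacritics in a small helper. The name sameIgnoringAccents is hypothetical, and also ignoring letter case via ToLowerCase is an assumption about what "normalize" should mean here:

```mathematica
(* Hypothetical helper: compares two strings after removing diacritics. *)
(* Lowercasing is an added assumption; drop ToLowerCase to keep case significant. *)
sameIgnoringAccents[a_String, b_String] :=
  ToLowerCase[RemoveDiacritics[a]] === ToLowerCase[RemoveDiacritics[b]]

sameIgnoringAccents["String Test ÁÉÍÓÚ", "string test aeiou"]
```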
Timing comparison using the new RepeatedTiming:
RepeatedTiming[removeAccent["string test áéíóú"]]
RepeatedTiming[RemoveDiacritics["string test áéíóú"]]
> 0.000057
> 0.000015
RemoveDiacritics wins, running roughly four times faster on this input.