Alternative to String.replace()

3

Something has been bothering me.

I have a method that removes or replaces certain characters in a string. It looks something like this:

public static String replaceCharSet(String texto) {
    texto = texto.replace("&", "E");
    texto = texto.replace("$", "S");
    texto = texto.replace("á", "a");
    ................
    return texto;
}

This pattern is repeated over many lines. Besides the performance cost, I suspect it may be causing a memory leak.

Is there a more elegant or functional way to do this?

Here is the full list of characters I need to replace:

"&", "E"
"$", "S"
"ç", "c"
"Ç", "C"
"á", "a"
"Á", "A"
"à", "a"
"À", "A"
"ã", "a"
"Ã", "A"
"â", "a"
"Â", "A"
"ä", "a"
"Ä", "A"
"é", "e"
"É", "E"
"è", "e"
"È", "E"
"ê", "e"
"Ê", "E"
"ë", "e"
"Ë", "E"
"í", "i"
"Í", "I"
"ì", "i"
"Ì", "I"
"î", "i"
"Î", "I"
"ï", "i"
"Ï", "I"
"ó", "o"
"Ó", "O"
"ò", "o"
"Ò", "O"
"õ", "o"
"Õ", "O"
"ô", "o"
"Ô", "O"
"ö", "o"
"Ö", "O"
"ú", "u"
"Ú", "U"
"ù", "u"
"Ù", "U"
"û", "u"
"Û", "U"
"ü", "u"
"Ü", "U"
"º", "o"
"ª", "a"
"-", " "
".", " "
  

I use Java 8 and cannot migrate to another version at the moment. This is old company code that I want to improve.

    
asked by anonymous 07.11.2017 / 14:27

1 answer

4

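Since you basically need to replace accented characters with their unaccented equivalents, the Normalizer class looks like a good choice. It decomposes characters according to the Unicode normalization forms, and the result varies with the form you choose. With NFKD, for example, "á" becomes a plain "a" followed by a separate combining accent, which can then be stripped out.

Since there are four exceptions, I used a separate replace for each one, because normalization will not convert $ to S or & to E (nor - and . to spaces). You can organize them as an enum in your class.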

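import java.text.Normalizer;

public class t {

    public static void main(String[] args) {
        String entrada = "olá mundo? é ª º 123 ? $ & * ., x";

        // The four exceptions that normalization does not handle
        entrada = entrada.replace('$', 'S')
                         .replace('&', 'E')
                         .replace('-', ' ')
                         .replace('.', ' ');

        // NFKD splits each accented letter into a base letter plus combining marks
        String saida = Normalizer.normalize(entrada, Normalizer.Form.NFKD);
        // Remove the combining marks, leaving only the base letters
        System.out.println(saida.replaceAll("\\p{InCombiningDiacriticalMarks}+", ""));
    }
}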

Output:

ola mundo? e a o 123 ? S E *  , x
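
To plug this into the method from the question, the same steps could be wrapped roughly as follows. This is only a sketch that works on Java 8: the class name CharSetCleaner, the constant DIACRITICS and the local variable names are made up for illustration, and precompiling the regex is an optional tweak that avoids recompiling the pattern on every call.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

// Hypothetical helper class; the name is an assumption, not part of the original code.
class CharSetCleaner {

    // Compiled once and reused; matches the combining accents left behind by NFKD.
    private static final Pattern DIACRITICS =
            Pattern.compile("\\p{InCombiningDiacriticalMarks}+");

    public static String replaceCharSet(String texto) {
        // The four mappings that normalization does not cover.
        String semExcecoes = texto.replace('$', 'S')
                                  .replace('&', 'E')
                                  .replace('-', ' ')
                                  .replace('.', ' ');
        // Decompose accented letters (and the ordinals ª, º) into base letter + marks.
        String decomposto = Normalizer.normalize(semExcecoes, Normalizer.Form.NFKD);
        // Drop the combining marks, leaving only the base letters.
        return DIACRITICS.matcher(decomposto).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(replaceCharSet("olá mundo? é ª º 123 ? $ & * ., x"));
    }
}
```

On the memory concern: String is immutable, so each replace call returns a new String and the intermediate ones become eligible for garbage collection once nothing references them. That costs some temporary allocation, but by itself it is not a leak.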

Based on:

Easy way to remove UTF-8 accents from a string?

Unicode Normalization Forms

Unicode Normalization

    
07.11.2017 / 14:52