Remove connectors from a string with a regular expression

6

How can I remove the connectors "e", "do", "da", "do", "das", "de", "di", "du" from a sentence without changing the rest of the name?

Example name: Daniela de Andrade. I want to remove only the standalone "de", without removing the "de" inside "Andrade". I am using the replaceAll function in Java.

    String retiraConector = "^\\s e $\\s";

    nome = nome.replaceAll(retiraConector, " ");
    
asked by anonymous 23.11.2014 / 23:20

2 answers

9

You can use the following pattern:

String padrao = "(\\w)(\\s+)(e|do|da|do|das|de|di|du)(\\s+)(\\w)";

This pattern is divided into five groups, in this order:

any letter or digit + one or more spaces + connector + one or more spaces + any letter or digit

Note: groups are formed through parentheses.

To make the replacement use:

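public static void main(String[] args) {
    // Backslashes are doubled because this is a Java string literal.
    String padrao = "(\\w)(\\s+)(e|do|da|do|das|de|di|du)(\\s+)(\\w)";
    String nome = "Daniela de Andrade";
    // Keep only groups 1 and 5, separated by a single space.
    System.out.println(nome.replaceAll(padrao, "$1 $5"));
}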

The result is as follows:

Daniela Andrade

When you call replaceAll, the pattern matches at Daniel[a de A]ndrade, and the match is replaced by groups 1 and 5 separated by a single space. Those groups are the "a" at the end of Daniela and the "A" at the start of Andrade.
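To see what each of the five groups captures, a small sketch like the one below can help. The class name GroupDemo and the helper describeFirstMatch are made up for illustration and are not part of the original answer; only the pattern itself comes from it.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // Pattern from the answer; backslashes are doubled for the Java string literal.
    static final Pattern PADRAO =
            Pattern.compile("(\\w)(\\s+)(e|do|da|do|das|de|di|du)(\\s+)(\\w)");

    // Hypothetical helper: lists the captured groups of the first match, if any.
    static String describeFirstMatch(String nome) {
        Matcher m = PADRAO.matcher(nome);
        if (!m.find()) {
            return "no match";
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 1; i <= m.groupCount(); i++) {
            sb.append(i).append("=[").append(m.group(i)).append("] ");
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // prints: 1=[a] 2=[ ] 3=[de] 4=[ ] 5=[A]
        System.out.println(describeFirstMatch("Daniela de Andrade"));
    }
}
```

Groups 2, 3 and 4 (the spaces and the connector) are the ones dropped by the replacement "$1 $5".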

Review

To make the match case-insensitive, you can prefix the expression with (?i), for example:

String padrao = "(?i)(\\w)(\\s+)(e|do|da|do|das|de|di|du)(\\s+)(\\w)";

The substitution is done the same way as above.
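As a rough sketch of the case-insensitive variant, the same pattern can also be compiled once with the Pattern.CASE_INSENSITIVE flag, which for these ASCII connectors should behave like the inline (?i) prefix. The class name CaseInsensitiveDemo and the method removeConnectors are illustrative names, not something from the answer.

```java
import java.util.regex.Pattern;

class CaseInsensitiveDemo {
    // Same five-group pattern, compiled with the CASE_INSENSITIVE flag
    // instead of the inline (?i) prefix.
    static final Pattern PADRAO = Pattern.compile(
            "(\\w)(\\s+)(e|do|da|do|das|de|di|du)(\\s+)(\\w)",
            Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: keeps groups 1 and 5 and drops the connector between them.
    static String removeConnectors(String nome) {
        return PADRAO.matcher(nome).replaceAll("$1 $5");
    }

    public static void main(String[] args) {
        System.out.println(removeConnectors("Daniela DE Andrade"));    // Daniela Andrade
        System.out.println(removeConnectors("Jose da Silva e Souza")); // Jose Silva Souza
    }
}
```

Compiling the pattern once is mostly a convenience when many names are processed; String.replaceAll with the (?i) form gives the same result for these inputs.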

    
24.11.2014 / 01:30
1

Building on the answer from @Mateus Alexandre:

You can use the following pattern as well:

String padrao = "\\s(e|d(a|e|i|o|u)s?)\\s";

This pattern contains two groups: group 1 is the whole connector and group 2 is the vowel that follows the "d". The overall order is:

space + connector + space

You only need to change:

System.out.println(nome.replaceAll(padrao, "$1 $5"));

to:

System.out.println(nome.replaceAll(padrao, " "));
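Put together, this variant might look like the following minimal sketch. The class name SpaceDelimitedDemo and the method removeConnectors are hypothetical names chosen for the example.

```java
class SpaceDelimitedDemo {
    // Pattern from this answer: a connector with one whitespace character on each side.
    // Backslashes are doubled for the Java string literal.
    static final String PADRAO = "\\s(e|d(a|e|i|o|u)s?)\\s";

    // Hypothetical helper: replaces " connector " with a single space.
    static String removeConnectors(String nome) {
        return nome.replaceAll(PADRAO, " ");
    }

    public static void main(String[] args) {
        System.out.println(removeConnectors("Daniela de Andrade")); // Daniela Andrade
        System.out.println(removeConnectors("Maria das Dores"));    // Maria Dores
    }
}
```

Because the connector must have whitespace on both sides, the "de" inside "Andrade" is left alone. Note that d(a|e|i|o|u)s? is broader than the original list: it also accepts forms such as "dos" or "dis". As written it is case-sensitive, so an uppercase "DE" would need the (?i) prefix.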
    
24.11.2014 / 16:41