Error replacing occurrence in string using replaceAll


I am extracting the text of several lines from a PDF. At the beginning of each line there is a description of the font size and family used in that line, and I need to remove this information.

At first I used replace, as follows:

String myText = line.replace(fontConfiguration, "");

With these example strings:

String line = "[ABCDEE+Georgia,BoldItalic-9.0]Relação de poemas";
String fontConfiguration = "[ABCDEE+Georgia,BoldItalic-9.0]";

This replaces it perfectly. However, there are still occurrences of fontConfiguration in the text, so I switched to replaceAll.

My question is: why do I get this exception when I use replaceAll?

This is an example that produces the error:

String line = "[ABCDEE+Calibri-11.04]1 ";
String fontConfiguration = "[ABCDEE+Calibri-11.04]";
String myText = line.replaceAll(fontConfiguration, "");

Exception :

  

Method threw 'java.util.regex.PatternSyntaxException' exception.
java.util.regex.PatternSyntaxException: Illegal character range near index 16
[ABCDEE+Calibri-11.04]
                ^

    
asked by anonymous 11.10.2015 / 03:53

2 answers


Both replace(CharSequence target, CharSequence replacement) and replaceAll(String regex, String replacement) perform the replacement through regular-expression pattern matching. The question that remains is: if both use regular expressions, why does only replaceAll(String regex, String replacement) fail for the same input? Notice how these methods are implemented:

public String replace(CharSequence target, CharSequence replacement) {
    return Pattern.compile(target.toString(), Pattern.LITERAL).matcher(this)
            .replaceAll(Matcher.quoteReplacement(replacement.toString()));
}
public String replaceAll(String regex, String replacement) {
    return Pattern.compile(regex).matcher(this).replaceAll(replacement);
}

The difference, as can be seen from their code, is how the Pattern is created. replace(CharSequence target, CharSequence replacement) uses Pattern.LITERAL, which roughly means that the input is treated as ordinary characters and not as a regular expression. For example, if replace(CharSequence target, CharSequence replacement) were implemented like this:

public String replace(CharSequence target, CharSequence replacement) {
    return Pattern.compile(target.toString()).matcher(this)
            .replaceAll(Matcher.quoteReplacement(replacement.toString()));
}

We would also have problems with [ABCDEE+Calibri-11.04] as the target, since it is not a valid regular expression and it would no longer be treated as a literal string but as a normal regular expression pattern.

It's worth noting that neither method is wrong in how it handles its input or uses regular expressions; they simply serve different purposes.
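As a rough illustration of that difference, the sketch below compiles the same string once as a regular expression and once with Pattern.LITERAL. The class and method names (LiteralVsRegex, compilesAsRegex, removeLiteral) are made up for this example and are not part of the JDK.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical demo class; only Pattern and PatternSyntaxException come from the JDK.
class LiteralVsRegex {

    // Returns true if the text is accepted as a regular expression.
    static boolean compilesAsRegex(String text) {
        try {
            Pattern.compile(text);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    // Removes every occurrence of target, treating it as plain characters,
    // which is how the replace() implementation quoted above builds its Pattern.
    static String removeLiteral(String line, String target) {
        return Pattern.compile(target, Pattern.LITERAL).matcher(line).replaceAll("");
    }

    public static void main(String[] args) {
        String fontConfiguration = "[ABCDEE+Calibri-11.04]";
        String line = "[ABCDEE+Calibri-11.04]1 ";

        System.out.println(compilesAsRegex(fontConfiguration)); // false
        System.out.println(removeLiteral(line, fontConfiguration)); // prints "1 "
    }
}
```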

The suggestion, then, is to pass a valid expression to replaceAll(String regex, String replacement), such as \[.+\], which matches any run of one or more characters that starts with [ and ends with ]. Something like this:

final String[] lines = new String[] {"[ABCDEE+Calibri-11.04]1 ", "[ABCDEE+Georgia,BoldItalic-9.0]Relação de poemas"};
// In Java source the backslashes must be doubled: "\\[" reaches the regex engine as \[
Arrays.stream(lines).forEach(line -> System.out.println(line.replaceAll("\\[.+\\]", "")));

This prints:

1 
Relação de poemas
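One caveat, since the question mentions several occurrences in the text: .+ is greedy, so on a line with more than one bracketed block it matches from the first [ to the last ]. The sketch below compares it with a negated character class, \[[^\]]+\], which stops at the first closing bracket. The class name, method names and the sample line are invented for illustration.

```java
// Hypothetical demo class comparing a greedy pattern with a per-block pattern.
class BracketStripper {

    // Greedy: matches from the first '[' up to the LAST ']' on the line.
    static String stripGreedy(String line) {
        return line.replaceAll("\\[.+\\]", "");
    }

    // Negated class: each match ends at the first ']' after its '['.
    static String stripPerBlock(String line) {
        return line.replaceAll("\\[[^\\]]+\\]", "");
    }

    public static void main(String[] args) {
        // Made-up line with two font blocks, not taken from the question.
        String line = "[ABCDEE+Calibri-11.04]first [ABCDEE+Georgia-9.0]second";

        System.out.println(stripGreedy(line));   // second
        System.out.println(stripPerBlock(line)); // first second
    }
}
```

With a single block at the start of each line, as in the examples above, both patterns give the same result.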
    
22.10.2015 / 18:32

Looking at the signature of String.replaceAll():

public String replaceAll(String regex, String replacement)

The Illegal character range error occurs because the function interprets its first parameter as a regular expression (documentation). The brackets define a character class, inside the class the hyphen defines a range of characters, and the range i-1 is invalid because i comes after 1 in character order.
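The range rule can be checked in isolation. The following is a small sketch with an invented helper (RangeCheck.isValidRegex) that compiles a few character classes and reports whether the regex engine accepts them.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical helper class for probing which character classes are accepted.
class RangeCheck {

    // Returns true when the expression compiles, false when it is rejected.
    static boolean isValidRegex(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidRegex("[i-1]"));   // false: range runs backwards
        System.out.println(isValidRegex("[1-i]"));   // true: '1' comes before 'i'
        System.out.println(isValidRegex("[i\\-1]")); // true: escaped hyphen is a plain character
        System.out.println(isValidRegex("[ABCDEE+Calibri-11.04]")); // false: contains i-1
    }
}
```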

To solve this, the brackets must be escaped with a backslash (\). Other characters with special meaning, such as + and . , also need to be escaped. In Java source code each of those backslashes is written twice inside the string literal:

String line = "[ABCDEE+Calibri-11.04]1 ";
String fontConfiguration = "\\[ABCDEE\\+Calibri-11\\.04\\]";
String myText = line.replaceAll(fontConfiguration, "");
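When the font string is only known at run time, escaping it by hand is not practical. One option is Pattern.quote, which wraps the text so that the regex engine treats it literally. Note also that String.replace(CharSequence, CharSequence) already replaces every occurrence, not just the first. The sketch below shows both; the class name, method names and the sample line are invented for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; Pattern.quote and String.replace are standard JDK methods.
class QuoteDemo {

    // Uses replaceAll, but quotes the text first so it is matched literally.
    static String removeAllQuoted(String line, String literal) {
        return line.replaceAll(Pattern.quote(literal), "");
    }

    // Plain replace: no regex syntax involved, and all occurrences are removed.
    static String removeAllPlain(String line, String literal) {
        return line.replace(literal, "");
    }

    public static void main(String[] args) {
        String fontConfiguration = "[ABCDEE+Calibri-11.04]";
        // Made-up line with the same font block twice.
        String line = "[ABCDEE+Calibri-11.04]1 [ABCDEE+Calibri-11.04]2";

        System.out.println(removeAllQuoted(line, fontConfiguration)); // 1 2
        System.out.println(removeAllPlain(line, fontConfiguration));  // 1 2
    }
}
```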
    
22.10.2015 / 17:52