How do I include the literal parentheses '(' and ')' among the alternatives in a regex character class in Java?

2

I'm writing code that captures text using regular expressions (regex). The text contains parentheses.

The problem is that parentheses are used in regular expressions to define groups, and I want to use them as literals.

I have already tried \( as an escape, but the Eclipse IDE rejects it, saying that only a few other characters can be escaped (the traditional Java escape sequences).

I have also tried \\( : it compiles and runs, but soon fails with an error, and when I inspect the value it has actually been "translated" to \( instead of a literal (.

  

"First, who is 1st placed, second who is second (who is Rubinho)"

([A-Za-z0-9çãàáâéêíóôõúÂÃÁÀÉÊÍÓÔÕÚÇ\"\'!?$%:;,º°ª]+)

I want to add the parentheses to this list of characters.

    
asked by anonymous 21.12.2016 / 03:06

1 answer

3

Escaping a special character in a Java String that holds a regex takes two steps:

  • Escape the characters that are special for Java.
  • Escape the characters that are special for the regular expression, which may include escaping the escape character itself.

    Example: escaping parentheses

    The parenthesis is not a special character for Java, but it is for the regular expression, so it must be preceded by the escape character \ (second step).

    Since the \ character is special in Java, it must itself be escaped, becoming \\ (first step).

    Result:

    String regex1 = "\\(";
    String regex2 = "\\)";
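
    To see the two layers in action, here is a minimal sketch (the class name EscapeLayers and its members are made up for this illustration). It checks that the Java literal "\\(" is a two-character string, a backslash followed by a parenthesis, which is what the regex engine needs in order to match a literal (. Pattern.quote from the standard library is shown as another way to get the same effect without writing the backslashes by hand.

```java
import java.util.regex.Pattern;

public class EscapeLayers {
    // The Java literal "\\(" holds two characters: '\' and '('.
    static final String OPEN = "\\(";
    static final String CLOSE = "\\)";

    // True when the whole input is exactly the two characters "()".
    static boolean isEmptyPair(String input) {
        return Pattern.matches(OPEN + CLOSE, input);
    }

    // Same check, letting Pattern.quote do the escaping.
    static boolean isEmptyPairQuoted(String input) {
        return Pattern.matches(Pattern.quote("()"), input);
    }

    public static void main(String[] args) {
        System.out.println(OPEN.length());           // 2
        System.out.println(OPEN);                    // \(  (what the regex engine receives)
        System.out.println(isEmptyPair("()"));       // true
        System.out.println(isEmptyPairQuoted("()")); // true
    }
}
```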
    

    Example: escaping quotation marks

    Double quotes are special for Java, so they must be escaped with \ (first step), but they are not special for regular expressions.

    So the result is:

    String regex3 = "\"";
    

    Single quotes are not special most of the time (I can't recall right now whether they have a special meaning in some regular expression implementation), so they do not need to be escaped, at least for the most common uses.

    String regex4 = "'";
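
    As a quick check of the quote rules, the sketch below (class and method names are hypothetical) puts both kinds of quote inside a character class. Only the double quote carries a backslash, and that backslash is consumed by the Java compiler, so the regex engine never sees it.

```java
import java.util.regex.Pattern;

public class QuoteEscapes {
    // The double quote is escaped for Java only; the single quote needs nothing.
    static final Pattern QUOTES = Pattern.compile("[\"']+");

    // True when the input consists only of double and/or single quotes.
    static boolean onlyQuotes(String input) {
        return QUOTES.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println("\"".length());       // 1
        System.out.println(onlyQuotes("\"'\"")); // true
        System.out.println(onlyQuotes("a\""));   // false
    }
}
```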
    

    Putting it all together

    To capture text in parentheses, you need the following elements:

  • A character class that matches everything that may appear inside the parentheses. In this case:

    [A-Za-z0-9çãàáâéêíóôõúÂÃÁÀÉÊÍÓÔÕÚÇ"'!?$%:;,º°ª ]
    
  • A quantifier: +

  • Delimiters for the capturing group: ( and ). They go inside the limit characters if you do not want the parentheses of the original text included in the captured group.
  • The limit characters around the group, which in this case are the literal parentheses: \( and \)

    Converting each element to a Java string, we can build the final expression step by step:

  • Escaped class in double quotation marks:

    "[A-Za-z0-9çãàáâéêíóôõúÂÃÁÀÉÊÍÓÔÕÚÇ\"'!?$%:;,º°ª ]"
    
  • Quantifier:

    "[A-Za-z0-9çãàáâéêíóôõúÂÃÁÀÉÊÍÓÔÕÚÇ\"'!?$%:;,º°ª ]+"
    
  • Delimiters:

    "([A-Za-z0-9çãàáâéêíóôõúÂÃÁÀÉÊÍÓÔÕÚÇ\"'!?$%:;,º°ª ]+)"
    
  • Limit characters, with the escape character itself escaped:

    "\\(([A-Za-z0-9çãàáâéêíóôõúÂÃÁÀÉÊÍÓÔÕÚÇ\"'!?$%:;,º°ª ]+)\\)"
    
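    Sample code:

    // imports needed at the top of the file
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    String s = "Em primeiro lugar, quem é 1º colocado, em segundo quem é segundo (que é Rubinho)";
    // \\( and \\) match the literal parentheses; the inner ( ) capture the text between them
    Matcher matcher = Pattern.compile("\\(([A-Za-z0-9çãàáâéêíóôõúÂÃÁÀÉÊÍÓÔÕÚÇ\"'!?$%:;,º°ª ]+)\\)").matcher(s);
    if (matcher.find()) {
        System.out.println(matcher.group(1));
    }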
    

    Result:

      

    que é Rubinho

    Alternative

    Instead of trying to list every character that may appear inside the parentheses, how about excluding only those that cannot?

    For example, the class [^()] is negated: it matches any character except the parentheses.
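
    Inside a character class, parentheses lose their grouping meaning in java.util.regex, which is why [^()] needs no backslashes. The same holds for a non-negated class, so the parentheses could also simply be appended to the list from the question. A minimal sketch, using a shortened class and hypothetical names:

```java
import java.util.regex.Pattern;

public class ParensInClass {
    // Shortened version of the question's class, with ( and ) added unescaped.
    static final Pattern WORD = Pattern.compile("[A-Za-z0-9()]+");

    // True when every character of the input belongs to the class.
    static boolean matchesWhole(String input) {
        return WORD.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesWhole("(Rubinho)")); // true
        System.out.println(matchesWhole("Rubinho!"));  // false
    }
}
```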

    Applying all the steps of the previous section and changing only the character class (the first item), we arrive at the following example, which produces the same result:

    String s = "Em primeiro lugar, quem é 1º colocado, em segundo quem é segundo (que é Rubinho)";
    Matcher matcher = Pattern.compile("\\(([^()]+)\\)").matcher(s);
    if (matcher.find()) {
        System.out.println(matcher.group(1));
    }
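
    If the text may contain more than one parenthesized segment, the same pattern can be applied in a loop. The sketch below is only an illustration: the class ParenthesizedText, the method extractAll and the sample input are made up, and it assumes the parentheses are not nested.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ParenthesizedText {
    // Literal "(", a captured run of non-parenthesis characters, literal ")".
    private static final Pattern IN_PARENS = Pattern.compile("\\(([^()]+)\\)");

    // Collects the content of every non-empty, non-nested parenthesized segment.
    static List<String> extractAll(String text) {
        List<String> found = new ArrayList<>();
        Matcher matcher = IN_PARENS.matcher(text);
        while (matcher.find()) {
            found.add(matcher.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        // Prints [Rubinho, someone else]
        System.out.println(extractAll("first (Rubinho), second (someone else)"));
    }
}
```

    Because the quantifier is +, an empty pair "()" produces no match.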
    
        
    21.12.2016 / 04:37