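Escaping a special character in a Java String that holds a regular expression involves two steps:
"Escape" the characters that are special to Java (inside a string literal).
"Escape" the characters that are special to the regular expression. Because the regex escape character \ is itself special to Java, this usually means escaping the escape character too.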
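As a minimal sketch of the two steps (the class and constant names are illustrative, not from the answer), consider a literal dot and a literal backslash. The regex \. becomes the Java literal "\\.", and the regex \\, which escapes the escape character, becomes "\\\\":

```java
import java.util.regex.Pattern;

public class TwoSteps {
    // Regex \.  (a literal dot)       -> Java literal "\\."
    static final String DOT = "\\.";
    // Regex \\  (a literal backslash) -> Java literal "\\\\"
    static final String BACKSLASH = "\\\\";

    public static void main(String[] args) {
        System.out.println(DOT);                              // \.
        System.out.println(BACKSLASH);                        // \\
        System.out.println(Pattern.matches(DOT, "."));        // true
        System.out.println(Pattern.matches(DOT, "x"));        // false
        // "\\" in Java source is a single backslash character.
        System.out.println(Pattern.matches(BACKSLASH, "\\")); // true
    }
}
```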
Example: escaping parentheses
The parenthesis is not a special character for Java, but it is for the regular expression, so it must be preceded by the escape character \ (regex escaping).
Since \ is itself special in Java, it must in turn be escaped and becomes \\ (Java escaping).
Result:
String regex1 = "\\(";
String regex2 = "\\)";
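To check what those literals actually contain, the sketch below (the class name is just for illustration) confirms that "\\(" is the two-character regex \( and that it matches a literal parenthesis. It also uses Pattern.quote, a standard java.util.regex method not mentioned in the answer, which builds a literal-match regex without manual escaping:

```java
import java.util.regex.Pattern;

public class EscapeParens {
    // The Java literal "\\(" holds two characters: a backslash and '('.
    static final String REGEX1 = "\\(";
    static final String REGEX2 = "\\)";

    public static void main(String[] args) {
        System.out.println(REGEX1 + " has length " + REGEX1.length()); // \( has length 2

        // At the regex level, \( and \) match literal parentheses.
        System.out.println(Pattern.matches(REGEX1, "(")); // true
        System.out.println(Pattern.matches(REGEX2, ")")); // true

        // Pattern.quote builds a regex that matches its argument literally.
        System.out.println(Pattern.matches(Pattern.quote("("), "(")); // true
    }
}
```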
Example: escaping quotation marks
Double quotes are special for Java, so they must be escaped with \ (Java escaping), but they are not special for regular expressions.
So the result is:
String regex3 = "\"";
Single quotes are not special most of the time (I can't recall offhand whether any regular expression implementation gives them a special meaning), so they need no escaping, at least in the most common uses.
String regex4 = "'";
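A small sketch (illustrative class name) showing that \" is only a Java escape: the regex engine receives a single " character, and the single quote passes through untouched:

```java
import java.util.regex.Pattern;

public class EscapeQuotes {
    // \" is a Java escape only; the regex engine receives a plain ".
    static final String REGEX3 = "\"";
    // A single quote needs no escape in a String literal or in the regex.
    static final String REGEX4 = "'";

    public static void main(String[] args) {
        System.out.println(REGEX3.length());               // 1
        System.out.println(Pattern.matches(REGEX3, "\"")); // true
        System.out.println(Pattern.matches(REGEX4, "'"));  // true
    }
}
```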
Putting it all together
To capture text in parentheses, you need the following elements:
A character class matching everything that may appear inside the parentheses. In this case:
[A-Za-z0-9çãàáâéêíóôõúÂÃÁÀÉÊÍÓÔÕÚÇ"'!?$%:;,º°ª ]
A quantifier: +
Delimiters for the group of characters to be captured: ( and ). They go inside the limit characters, so that the parentheses of the original text are not included in the captured group.
The limit characters for the group, which here are the parentheses themselves, escaped for the regex: \( and \)
Converting each element to a Java string, we can build the final expression step by step:
Character class, with the double quote escaped:
"[A-Za-z0-9çãàáâéêíóôõúÂÃÁÀÉÊÍÓÔÕÚÇ\"'!?$%:;,º°ª ]"
Adding the quantifier:
"[A-Za-z0-9çãàáâéêíóôõúÂÃÁÀÉÊÍÓÔÕÚÇ\"'!?$%:;,º°ª ]+"
Adding the group delimiters:
"([A-Za-z0-9çãàáâéêíóôõúÂÃÁÀÉÊÍÓÔÕÚÇ\"'!?$%:;,º°ª ]+)"
Adding the limit characters, with the escape character escaped:
"\\(([A-Za-z0-9çãàáâéêíóôõúÂÃÁÀÉÊÍÓÔÕÚÇ\"'!?$%:;,º°ª ]+)\\)"
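The same four steps can be sketched as string concatenation (the class and constant names are illustrative), which makes it easy to see that only the limit characters need the doubled backslash:

```java
import java.util.regex.Pattern;

public class BuildRegex {
    // Step 1: character class (the double quote is escaped for Java only).
    static final String CHAR_CLASS = "[A-Za-z0-9çãàáâéêíóôõúÂÃÁÀÉÊÍÓÔÕÚÇ\"'!?$%:;,º°ª ]";
    // Step 2: quantifier.
    static final String QUANTIFIED = CHAR_CLASS + "+";
    // Step 3: group delimiters around what should be captured.
    static final String GROUPED = "(" + QUANTIFIED + ")";
    // Step 4: escaped limit characters; "\\(" reaches the regex engine as \(.
    static final String FINAL = "\\(" + GROUPED + "\\)";

    public static void main(String[] args) {
        System.out.println(FINAL);
        System.out.println(Pattern.compile(FINAL).matcher("(que é Rubinho)").matches()); // true
    }
}
```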
Sample code:
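import java.util.regex.Matcher;
import java.util.regex.Pattern;
String s = "Em primeiro lugar, quem é 1º colocado, em segundo quem é segundo (que é Rubinho)";
// "\\(" and "\\)" reach the regex engine as \( and \)
Matcher matcher = Pattern.compile("\\(([A-Za-z0-9çãàáâéêíóôõúÂÃÁÀÉÊÍÓÔÕÚÇ\"'!?$%:;,º°ª ]+)\\)").matcher(s);
if (matcher.find()) {
    System.out.println(matcher.group(1)); // text inside the parentheses
}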
Result:
que é Rubinho
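The sample uses if (matcher.find()), so it only reads the first match. If the text may contain several parenthesized parts, a while loop collects all of them; this is a minimal sketch with an illustrative class and method name:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExtractAll {
    static final Pattern IN_PARENS =
        Pattern.compile("\\(([A-Za-z0-9çãàáâéêíóôõúÂÃÁÀÉÊÍÓÔÕÚÇ\"'!?$%:;,º°ª ]+)\\)");

    // Returns group 1 of every match, in order of appearance.
    static List<String> extractAll(String s) {
        List<String> found = new ArrayList<>();
        Matcher m = IN_PARENS.matcher(s);
        while (m.find()) { // each call resumes after the previous match
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        String s = "Em primeiro lugar, quem é 1º colocado, em segundo quem é segundo (que é Rubinho)";
        System.out.println(extractAll(s));           // [que é Rubinho]
        System.out.println(extractAll("(a) e (b)")); // [a, b]
    }
}
```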
Alternative
Instead of trying to list every character that may appear inside the parentheses, how about excluding only those that cannot?
For example, the class [^()] negates the parentheses and matches everything except them.
Applying all the steps from the previous section and changing only the character class (the first element), we arrive at the following example, which produces the same result:
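import java.util.regex.Matcher;
import java.util.regex.Pattern;
String s = "Em primeiro lugar, quem é 1º colocado, em segundo quem é segundo (que é Rubinho)";
// [^()]+ : one or more characters that are not parentheses
Matcher matcher = Pattern.compile("\\(([^()]+)\\)").matcher(s);
if (matcher.find()) {
    System.out.println(matcher.group(1));
}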
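The two classes only agree while the text stays within the explicit list. This sketch (illustrative names; the sample sentence is made up) shows a case where they differ: the hyphen is not in the explicit class, so that pattern finds nothing, while [^()] still captures the text:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CompareClasses {
    static final Pattern EXPLICIT =
        Pattern.compile("\\(([A-Za-z0-9çãàáâéêíóôõúÂÃÁÀÉÊÍÓÔÕÚÇ\"'!?$%:;,º°ª ]+)\\)");
    static final Pattern NEGATED = Pattern.compile("\\(([^()]+)\\)");

    // Returns group 1 of the first match, or null when there is none.
    static String firstGroup(Pattern p, String s) {
        Matcher m = p.matcher(s);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String s = "Em segundo (que é Rubinho - 2º)";
        // '-' is not in the explicit class, so that pattern finds nothing.
        System.out.println(firstGroup(EXPLICIT, s)); // null
        // [^()] accepts any character except parentheses.
        System.out.println(firstGroup(NEGATED, s));  // que é Rubinho - 2º
    }
}
```

Neither pattern handles nested parentheses: for "(a (b) c)" the negated version returns only the innermost "b".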