Actually, the two regexes you indicated do not return the same result. I tested them with JDK 1.7.0_80, and you can also see them working (differently) here and here.
I created a very simple method to test a regex:
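```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public void testregex(String input, String regex) {
    Matcher matcher = Pattern.compile(regex).matcher(input);
    if (matcher.find()) {
        System.out.println(matcher.group(1)); // print the first capturing group
    }
}
```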
Then I tested the same input with both regexes (note that the `\` must be escaped inside a Java string literal, so it is written as `\\`):
```java
String input = "Detalhamento de Serviços nº: 999-99999-9999";
testregex(input, "Detalhamento de Serviços.+(\\d+-\\d+-\\d+)");
testregex(input, "Detalhamento de Serviços\\D+(\\d+-\\d+-\\d+)");
```
The result was:
```
9-99999-9999
999-99999-9999
```
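If you want to reproduce this outside an existing class, here is a minimal self-contained sketch of the same test. The class name `RegexDemo` and the helper `firstGroup` (which returns group 1 instead of printing it, so the result can be checked) are hypothetical names I am introducing; the input and the two regexes are the same ones used above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexDemo {
    // Hypothetical helper: returns capturing group 1 of the first match, or null if nothing matches
    static String firstGroup(String input, String regex) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "Detalhamento de Serviços nº: 999-99999-9999";
        // Greedy .+ can swallow the leading digits of the number
        System.out.println(firstGroup(input, "Detalhamento de Serviços.+(\\d+-\\d+-\\d+)"));
        // \D+ cannot consume digits, so the whole number ends up in group 1
        System.out.println(firstGroup(input, "Detalhamento de Serviços\\D+(\\d+-\\d+-\\d+)"));
    }
}
```

Running `main` prints the same two lines as the result shown above.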
This happens because the quantifiers `+` and `*` are "greedy": they try to match as many characters as possible. In the first case, `.+` also consumes the first two `9` digits, because the rest of the string (`9-99999-9999`) still satisfies the last part of the regex (`\d+-\d+-\d+`).
In the second case, the first two `9`s are not consumed, because `\D` only matches non-digit characters.
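To see the greediness directly, a small sketch (assuming the same input) can wrap the prefix in its own capturing group and show exactly what `.+` and `\D+` consumed. The class `GreedyPrefixDemo` and the method `prefixAndNumber` are hypothetical names used only for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyPrefixDemo {
    // Hypothetical helper: group 1 is what the quantified prefix consumed, group 2 is the captured number
    static String[] prefixAndNumber(String input, String prefixPattern) {
        Pattern pattern = Pattern.compile("Detalhamento de Serviços(" + prefixPattern + ")(\\d+-\\d+-\\d+)");
        Matcher matcher = pattern.matcher(input);
        if (!matcher.find()) {
            return null;
        }
        return new String[] { matcher.group(1), matcher.group(2) };
    }

    public static void main(String[] args) {
        String input = "Detalhamento de Serviços nº: 999-99999-9999";
        for (String prefix : new String[] { ".+", "\\D+" }) {
            String[] parts = prefixAndNumber(input, prefix);
            // Brackets make the leading space of the consumed prefix visible
            System.out.println(prefix + " consumed [" + parts[0] + "], number = " + parts[1]);
        }
    }
}
```

With `.+` the consumed prefix ends in `: 99`, while with `\D+` it stops right before the first digit.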
So, some possible solutions are:
- Use `\D`: this guarantees that, even though the quantifier is greedy, it will not consume a digit by mistake.
- Add `?` immediately after the `+` quantifier, which makes it lazy (it cancels the greedy behavior). The regex becomes `Detalhamento de Serviços.+?(\d+-\d+-\d+)`; note the `.+?`, which removes the greediness.
- Fix the number of digits using `{}`. For example, if the digits are always grouped as 3-5-4, you can use `Detalhamento de Serviços.+?(\d{3}-\d{5}-\d{4})`. If the number of digits varies, use the `{min,max}` syntax: for example, for a minimum of 2 and a maximum of 3 digits, use `{2,3}` (together with the lazy quantifier or `\D` to be safe). Adapt it to your needs.
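The alternatives above can be compared side by side in a short sketch. It assumes the same input string; the class `RegexFixesDemo`, the constant `PREFIX`, and the helper `firstGroup` are hypothetical names, and the `{2,3}` patterns are only an illustration of the `{min,max}` case, run once with a greedy prefix and once with a lazy one.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexFixesDemo {
    static final String PREFIX = "Detalhamento de Serviços";

    // Hypothetical helper: returns capturing group 1 of the first match, or null if nothing matches
    static String firstGroup(String input, String regex) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "Detalhamento de Serviços nº: 999-99999-9999";
        // LinkedHashMap keeps the patterns in insertion order when printing
        Map<String, String> regexes = new LinkedHashMap<>();
        regexes.put("non-digit prefix", PREFIX + "\\D+(\\d+-\\d+-\\d+)");
        regexes.put("lazy prefix", PREFIX + ".+?(\\d+-\\d+-\\d+)");
        regexes.put("fixed counts", PREFIX + ".+?(\\d{3}-\\d{5}-\\d{4})");
        regexes.put("{2,3}, greedy", PREFIX + ".+(\\d{2,3}-\\d{5}-\\d{4})");
        regexes.put("{2,3}, lazy", PREFIX + ".+?(\\d{2,3}-\\d{5}-\\d{4})");
        for (Map.Entry<String, String> entry : regexes.entrySet()) {
            System.out.println(entry.getKey() + ": " + firstGroup(input, entry.getValue()));
        }
    }
}
```

Only the greedy `{2,3}` variant loses a digit (it captures `99-99999-9999`), because `.+` gives back just enough characters to leave two digits for `\d{2,3}`; making the prefix lazy, or using `\D+`, avoids that.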