Regular expression does not capture the desired text snippet


I need my program to capture a specific item from a text, but it isn't working: instead of just that item, it captures everything that comes after it.

Here is the code I'm using:

String html = "ItemPago12.569,00DeducoesPagas36.567,52ItensQnt6DeducoesRetidas21.354,11";
Pattern conteudo = Pattern.compile("ItemPago([^<]+)Deducoes");
Matcher match = conteudo.matcher(html);
match.find();

System.out.println(match.group(1));


I need to get what lies between ItemPago and Deducoes. I would like examples and explanations of how to use this correctly. Thank you.

    
asked by anonymous 31.03.2018 / 23:22

1 answer


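Regex quantifiers have three possible behaviors: greedy, reluctant (also called lazy), and possessive. Your pattern uses a greedy quantifier: [^<]+ first consumes as much as it can (here, the entire rest of the string, since it contains no <) and then backtracks only until Deducoes can match, which happens at the last occurrence of Deducoes. What you want is reluctant behavior. You can use .*?, where .* means "any sequence of characters" (excluding line terminators, by default) and the trailing ? makes the quantifier reluctant.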
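To make the three behaviors concrete, here is a small sketch that runs the same pattern with each kind of quantifier against the string from the question. The class name QuantifierModes and the helper capture are illustrative only; the helper returns null when find() fails so that no exception is thrown.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierModes {
    static final String HTML =
        "ItemPago12.569,00DeducoesPagas36.567,52ItensQnt6DeducoesRetidas21.354,11";

    // Hypothetical helper: returns group 1 of the first match, or null when find() fails.
    static String capture(String regex) {
        Matcher m = Pattern.compile(regex).matcher(HTML);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Greedy: .* takes the rest of the input, then backtracks to the LAST "Deducoes".
        System.out.println("greedy:     " + capture("ItemPago(.*)Deducoes"));
        // Reluctant: .*? grows one character at a time and stops at the FIRST "Deducoes".
        System.out.println("reluctant:  " + capture("ItemPago(.*?)Deducoes"));
        // Possessive: .*+ takes the rest of the input and never gives it back,
        // so "Deducoes" can no longer match and find() returns false.
        System.out.println("possessive: " + capture("ItemPago(.*+)Deducoes"));
    }
}
```

This prints 12.569,00DeducoesPagas36.567,52ItensQnt6 for the greedy version, 12.569,00 for the reluctant one, and null for the possessive one, because a possessive quantifier never backtracks.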

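A reluctant quantifier tells the regex engine to consume as few characters as possible: it tries the shortest candidate first and extends it only when the rest of the pattern fails to match, so the capture ends at the first Deducoes after ItemPago.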
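The same idea also applies to the character class from the question: appending ? to [^<]+ makes it reluctant, so the question's pattern needs only a one-character change. The sketch below (class and method names are illustrative) compares the two versions on the question's input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OriginalPatternFixed {
    static final String HTML =
        "ItemPago12.569,00DeducoesPagas36.567,52ItensQnt6DeducoesRetidas21.354,11";

    // Hypothetical helper: group 1 of the first match, or null if nothing matches.
    static String firstGroup(String regex) {
        Matcher m = Pattern.compile(regex).matcher(HTML);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // The question's pattern: [^<]+ is greedy and, since HTML contains no '<',
        // runs to the end of the input before backtracking to the last "Deducoes".
        System.out.println(firstGroup("ItemPago([^<]+)Deducoes"));
        // Same character class made reluctant by appending '?'.
        System.out.println(firstGroup("ItemPago([^<]+?)Deducoes"));
    }
}
```

The first line prints 12.569,00DeducoesPagas36.567,52ItensQnt6 and the second prints 12.569,00. Unlike .*?, the +? form still requires at least one character between the two delimiters.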

Here is the complete code:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Ideone {
    public static void main(String[] args) {
        String html = "ItemPago12.569,00DeducoesPagas36.567,52ItensQnt6DeducoesRetidas21.354,11";
        // .*? is reluctant: the capture stops at the FIRST "Deducoes" after "ItemPago"
        Pattern conteudo = Pattern.compile("ItemPago(.*?)Deducoes");
        Matcher match = conteudo.matcher(html);

        // Read the group only if find() succeeded; otherwise group(1) throws IllegalStateException
        if (match.find()) {
            System.out.println(match.group(1));
        }
    }
}

Here's the output:

12.569,00
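If you need this extraction in several places, it can be wrapped in a small helper. The sketch below is one possible approach, not a standard API: the method name between is hypothetical, Pattern.quote makes the delimiters match literally, and the Pattern.DOTALL flag is an assumption for input (such as real HTML) that may contain line breaks between the delimiters.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Between {
    // Hypothetical helper: the text between the first `start` and the next `end`,
    // or Optional.empty() when either delimiter is missing.
    static Optional<String> between(String text, String start, String end) {
        // Pattern.quote treats the delimiters literally, so characters such as
        // '.', '(' or '$' inside them are not interpreted as regex syntax.
        // DOTALL (assumed useful for multi-line input) lets '.' match line breaks too.
        Pattern p = Pattern.compile(
            Pattern.quote(start) + "(.*?)" + Pattern.quote(end), Pattern.DOTALL);
        Matcher m = p.matcher(text);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String html = "ItemPago12.569,00DeducoesPagas36.567,52ItensQnt6DeducoesRetidas21.354,11";
        System.out.println(between(html, "ItemPago", "Deducoes").orElse("not found"));
        System.out.println(between(html, "ItensQnt", "Deducoes").orElse("not found"));
        System.out.println(between(html, "Missing", "Deducoes").orElse("not found"));
    }
}
```

This prints 12.569,00, then 6, then not found. Returning an Optional forces the caller to handle the no-match case instead of calling group(1) on a failed match.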


    
31.03.2018 / 23:54