Regular expression to find numbers in the middle of words

2

I'm working on a project that uses regular expressions to find certain patterns. There is one specific string from which I need to extract two numbers. It looks like this:

Agência: 0000 Conta: 00000-0

I need to extract the numbers in the middle of this string. Can anyone help me?

    
asked by anonymous 27.02.2018 / 19:28

2 answers

10

The regular expression is:

(?:Agência: [0-9]{4} Conta: [0-9]{5}-[0-9X])|(?:Conta: [0-9]{5}-[0-9X] Agência: [0-9]{4})

Based on this other answer of mine:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TesteRegex {

    private static final Pattern AGENCIA_CONTA = Pattern.compile(
            "(?:Agência: [0-9]{4} Conta: [0-9]{5}-[0-9X])|" +
            "(?:Conta: [0-9]{5}-[0-9X] Agência: [0-9]{4})");

    public static void main(String[] args) {
        String texto = ""
                + "Banana abacaxi pêra Agência: 5720 Conta: 43821-X abacate "
                + "melancia Agência: 3481 Conta: 53895-0. verde azul "
                + "amarelo Agência: 6666 Conta: 66667-NÃO É ESSA "
                + "Agência: 9123 Conta: 44578-2 "
                + "laranja Conta: 43210-7 Agência: 6589 verde "
                + "rosa lilás Conta: 77777-7 Não vai dar Agência: 4444";

        Matcher m = AGENCIA_CONTA.matcher(texto);
        while (m.find()) {
            String achou = texto.substring(m.start(), m.end());
            System.out.println("Achou nas posições " + m.start() + "-" + m.end() + ": "
                    + achou);
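            String agencia, conta;
            if (achou.startsWith("Agência:")) {
                // "Agência: " takes 9 characters, so the 4 digits sit at 9..12
                // and the 7-character account ("00000-0") sits at 21..27.
                agencia = achou.substring(9, 13);
                conta = achou.substring(21, 28);
            } else {
                // "Conta: " takes 7 characters, so the account sits at 7..13
                // and the 4 agency digits sit at 24..27.
                agencia = achou.substring(24, 28);
                conta = achou.substring(7, 14);
            }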
            System.out.println("Os valores encontrados são: " + agencia + " e " + conta + ".");
        }
    }
}

Here's the output:

Achou nas posições 20-48: Agência: 5720 Conta: 43821-X
Os valores encontrados são: 5720 e 43821-X.
Achou nas posições 66-94: Agência: 3481 Conta: 53895-0
Os valores encontrados são: 3481 e 53895-0.
Achou nas posições 153-181: Agência: 9123 Conta: 44578-2
Os valores encontrados são: 9123 e 44578-2.
Achou nas posições 190-218: Conta: 43210-7 Agência: 6589
Os valores encontrados são: 6589 e 43210-7.

See it working on ideone.

Explanation of regex, beginning with the general structure:

  • (?: ... ) - Non-capturing group.
  • aaa|bbb - Choose between aaa and bbb. The first alternative that matches is used.
  • (?: ... )|(?: ... ) - Choose between two non-capturing groups.
  • Agência: [0-9]{4} Conta: [0-9]{5}-[0-9X] - First group.
  • Conta: [0-9]{5}-[0-9X] Agência: [0-9]{4} - Second group.

Explanation of codes in groups:

  • [0-9]{4} - Four occurrences of digits between 0 and 9. This is the agency number.
  • [0-9]{5} - Five occurrences of digits between 0 and 9. This is part of the account number.
  • - - The hyphen. This is part of the account number.
  • [0-9X] - A digit from 0 to 9 or an X. This is part of the account number.

The rest (including spaces) is literal text that is matched exactly as written.

So the regex looks for agency before account or account before agency, accepting both forms. With the if, I identify which form was found and use substring to extract the agency and account digits.

When there is other text between the agency and the account, or when the number that follows is incomplete, the text will not be recognized.
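
As a hedged sketch of the same idea, the two orders can also be read through named capturing groups instead of fixed substring offsets. This is a variation, not the code from the answer: the class AgenciaContaGroups, the method extractAll and the group names ag1, ct1, ag2 and ct2 are hypothetical. Java does not allow a group name to be reused within one pattern, hence the numbered names. The ê is written as \u00ea only to keep the source file ASCII.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AgenciaContaGroups {

    // Same two alternatives as the regex above, with named capturing groups
    // around the agency and account numbers. \u00ea is the letter ê.
    private static final Pattern AGENCIA_CONTA = Pattern.compile(
            "Ag\u00eancia: (?<ag1>[0-9]{4}) Conta: (?<ct1>[0-9]{5}-[0-9X])|"
          + "Conta: (?<ct2>[0-9]{5}-[0-9X]) Ag\u00eancia: (?<ag2>[0-9]{4})");

    // Returns one {agency, account} pair per match, in order of appearance.
    static List<String[]> extractAll(String texto) {
        List<String[]> pares = new ArrayList<>();
        Matcher m = AGENCIA_CONTA.matcher(texto);
        while (m.find()) {
            // Only the groups of the alternative that matched are non-null.
            boolean agenciaPrimeiro = m.group("ag1") != null;
            String agencia = agenciaPrimeiro ? m.group("ag1") : m.group("ag2");
            String conta = agenciaPrimeiro ? m.group("ct1") : m.group("ct2");
            pares.add(new String[] { agencia, conta });
        }
        return pares;
    }

    public static void main(String[] args) {
        String texto = "abacate Ag\u00eancia: 5720 Conta: 43821-X melancia "
                + "Conta: 43210-7 Ag\u00eancia: 6589 verde "
                + "Conta: 77777-7 texto no meio Ag\u00eancia: 4444";
        for (String[] par : extractAll(texto)) {
            System.out.println(par[0] + " e " + par[1]);
        }
    }
}
```

The last pair in the sample text has other words between the account and the agency, so it is skipped, which matches the behaviour described above.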

    
27.02.2018 / 20:24
-2

With this regex, you can retrieve these values through the Groups property.

\p{L}+:\s*(?<Agencia>\d{4})\s*\p{L}+\:\s*(?<Conta>\d{5}\-\d+)
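
In Java, named groups are read with Matcher.group(String). The following is a minimal sketch, assuming the regex above is used unchanged and only escaped for a Java string literal; the class NamedGroupsSketch and the method extract are hypothetical names. Note that, as written, the pattern accepts only digits after the hyphen, so an account ending in X (such as 43821-X) would not match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NamedGroupsSketch {

    // The regex from this answer; every backslash is doubled for Java.
    private static final Pattern AGENCIA_CONTA = Pattern.compile(
            "\\p{L}+:\\s*(?<Agencia>\\d{4})\\s*\\p{L}+\\:\\s*(?<Conta>\\d{5}\\-\\d+)");

    // Returns {agency, account} for the first match, or null if none is found.
    static String[] extract(String texto) {
        Matcher m = AGENCIA_CONTA.matcher(texto);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group("Agencia"), m.group("Conta") };
    }

    public static void main(String[] args) {
        // \u00ea is the letter ê, written as an escape to keep the file ASCII.
        String[] par = extract("Ag\u00eancia: 0000 Conta: 00000-0");
        if (par != null) {
            System.out.println("Agencia=" + par[0] + ", Conta=" + par[1]);
        }
    }
}
```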
    
27.02.2018 / 19:41