Blank space in Regular Expression

With the following regular expression, (^DOC)*[0-9], I can capture all the numbers after the "DOC" sequence. However, when I test it on this text:

TEXT TEXT TEXT TEXT DOCUMENT: 240010 9/24/2014

It returns "24001024092014": the date comes along with it. How do I capture the number sequence and stop as soon as a space is found, so that the date is not included? I would like to capture only the document number.

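Here is the Java code:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Teste {

    public static void main(String args[]){

        // Variable name (Portuguese): "capture only numbers after the word DOC".
        // The backslash is doubled because "\d" is not a valid Java string escape.
        String CAPTURAR_SOMENTE_NUMEROS_APOS_PALAVRA_DOC = "(^DOC)*\\d+ ";

        Pattern pattern = Pattern.compile(CAPTURAR_SOMENTE_NUMEROS_APOS_PALAVRA_DOC);

        Matcher matcher = pattern.matcher("TEXTO TEXTO TEXTO TEXTO DOCUMENTOLEGAL:240010 24/09/2014 ");

        // Print every substring that matches the pattern, one after another.
        while(matcher.find()){
            System.out.print(matcher.group());
        }

    }
}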
asked by anonymous 26.09.2014 / 16:09

3 answers

The Matcher.find method searches for occurrences of a regular expression in a string: each call advances to the next substring that matches the regex, and that substring is then available through group(). If you want to extract only the number that comes right after DOC, there are two ways: capture groups and lookarounds.
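The lookaround route is named here but not shown, so here is a minimal sketch of it, assuming the input uses the DOCUMENTO: label from the examples in this thread. The lookbehind (?<=DOCUMENTO:) requires that text to appear immediately before the digits without making it part of the match, so each find() call yields only the number. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookbehindExample {

    // Returns every run of digits that comes immediately after "DOCUMENTO:".
    // The "DOCUMENTO:" label is an assumption taken from the examples in this thread.
    static List<String> documentNumbers(String text) {
        // (?<=DOCUMENTO:) is a lookbehind: it must match just before the digits,
        // but it is not included in the text returned by group().
        Pattern pattern = Pattern.compile("(?<=DOCUMENTO:)\\d+");
        Matcher matcher = pattern.matcher(text);
        List<String> result = new ArrayList<>();
        while (matcher.find()) {      // each call advances to the next match
            result.add(matcher.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "TEXTO TEXTO TEXTO TEXTO DOCUMENTO:240010 24/09/2014";
        System.out.println(documentNumbers(text)); // prints [240010]
    }
}
```

Because \d+ stops at the first non-digit, the space before the date ends the match, and the date digits are never preceded by "DOCUMENTO:", so they are not returned.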

Capture Groups

The capture-group approach is the simplest, as demonstrated in Rodrigo Rigotti's answer: you write a pattern for the whole text you want to match, and put the parts that interest you inside parentheses. Simple example:

DOCUMENTO:([0-9]+)
DOCUMENTO:(\d+)

This matches the string DOCUMENTO: followed by any sequence of digits, and nothing else. The sequence of digits, being inside a capture group, can be accessed through the
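A minimal sketch of the capture-group approach with java.util.regex, assuming the same DOCUMENTO: label: group(0) is the whole match, while group(1) holds only what the first pair of parentheses captured, so the label is left out. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CaptureGroupExample {

    // Returns the digits captured by group 1, or null when there is no match.
    static String documentNumber(String text) {
        // The whole pattern must match "DOCUMENTO:" followed by digits,
        // but only the digits are inside the capture group.
        Matcher matcher = Pattern.compile("DOCUMENTO:(\\d+)").matcher(text);
        if (matcher.find()) {
            // group(0) would be "DOCUMENTO:240010"; group(1) is just "240010".
            return matcher.group(1);
        }
        return null;
    }

    public static void main(String[] args) {
        String text = "TEXTO TEXTO TEXTO TEXTO DOCUMENTO:240010 24/09/2014";
        System.out.println(documentNumber(text)); // prints 240010
    }
}
```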

26.09.2014 / 17:42

One suggestion:

DOCUMENTO:(\d+)\s*(\d+)\/(\d+)\/(\d+)

Implementation example:

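import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches
{
    public static void main(String args[]) {

        String line = "TEXTO TEXTO TEXTO TEXTO DOCUMENTO:240010 24/09/2014";
        // Group 1: document number; groups 2-4: day, month and year of the date.
        // Backslashes are doubled because this is a Java string literal ("\/" is not needed).
        String pattern = "DOCUMENTO:(\\d+)\\s*(\\d+)/(\\d+)/(\\d+)";

        Pattern r = Pattern.compile(pattern);

        Matcher m = r.matcher(line);
        if (m.find()) {
            // group(0) is the whole match; the capture groups start at index 1.
            System.out.println(m.group(1));                                       // document number
            System.out.println(m.group(2) + "/" + m.group(3) + "/" + m.group(4)); // date
        } else {
            System.out.println("Sem resultados."); // "No results."
        }
    }
}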
26.09.2014 / 16:25

// Remove everything except digits and spaces, then split on whitespace.
String[] vetor = texto.replaceAll("[^0-9 ]", "").trim().split("\\s+");

The resulting array will have two positions, vetor[0] = 240010 and vetor[1] = 24092014; just take vetor[0], which is the part you are interested in.
26.09.2014 / 16:32