Regular expression to remove everything except the business name (razão social)

I'm trying to create a regular expression that removes everything in a string that is not part of the business name, but I'm having trouble keeping the symbols that appear in the middle of the name.

Input:

201700000000111 01/02/2017 11.111.111/0001-74 ADAMA BRASIL S/A ATIVA 0,00 160,00 160,00 0,00 0,00 0,00 0,00 0,00
201700000000122 01/02/2017 22.222.222/0002-75 AGRITEX COMERCIAL AGRÍCOLA LTDA (QUERÊNCIA) ATIVA 2,79 170,00 170,00 0,00 0,00 0,00 4,74 0,00
201700000000133 07/02/2017 33.333.333/0001-76 CREMONESE WANDSCHEER & CIA LTDA - ME ATIVA 0,00 50,00 50,00 0,00 0,00 0,00 0,00 0,00
201700000000144 23/02/2017 44.444.444/0001-77 G3 SEMENTES LTDA ATIVA 0,00 230,00 230,00 0,00 0,00 0,00 0,00 0,00

Required output:

ADAMA BRASIL S/A ATIVA
AGRITEX COMERCIAL AGRÍCOLA LTDA (QUERÊNCIA) ATIVA
CREMONESE WANDSCHEER & CIA LTDA - ME ATIVA

Currently I have the following, but it does not give the result I need. I'm using Java, but answers in other languages are fine; I can adapt the code.

s.replaceAll("[^A-zÀ-ú\\s]", "").trim();
    
asked by anonymous 17.04.2017 / 17:34

2 answers

If every line always follows this pattern, just anchor on the boundaries.

  • Left: preceded by a CNPJ, which ends in \d{4}-\d{2}
  • Right: followed by a monetary value: \d+,\d{2}

Solution

  • Pattern: .*\d{4}-\d{2} (.*?) \d+,\d{2}.*
  • Replace: $1
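
In Java, the pattern and replacement above could be applied one line at a time with String.replaceAll. This is only a minimal sketch, assuming each record sits on its own line; the class and method names (BoundaryExtractor, extractName) are made up for illustration.

```java
class BoundaryExtractor {
    // Greedy prefix up to the CNPJ tail, lazy capture of the name,
    // then the first monetary value and the rest of the line.
    static final String PATTERN = ".*\\d{4}-\\d{2} (.*?) \\d+,\\d{2}.*";

    // Hypothetical helper: keeps only the text captured between the two boundaries.
    static String extractName(String line) {
        return line.replaceAll(PATTERN, "$1");
    }

    public static void main(String[] args) {
        String[] lines = {
            "201700000000111 01/02/2017 11.111.111/0001-74 ADAMA BRASIL S/A ATIVA 0,00 160,00 160,00 0,00 0,00 0,00 0,00 0,00",
            "201700000000133 07/02/2017 33.333.333/0001-76 CREMONESE WANDSCHEER & CIA LTDA - ME ATIVA 0,00 50,00 50,00 0,00 0,00 0,00 0,00 0,00"
        };
        for (String line : lines) {
            System.out.println(extractName(line));
        }
    }
}
```

Note that a line that does not match the pattern comes back unchanged, so it may be worth checking with String.matches first if the input can contain other kinds of lines.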

See it working on regex101

    
18.04.2017 / 13:53

Good afternoon. I believe you can match the whole sequence of words with a single expression.

I tested it here:

Rubular

Once the regular expression is settled, you can escape it for Java with freeformatter.

With this expression I get the expected output like this:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {

    public static void main(String[] args) {

        String input = "201700000000111 01/02/2017 11.111.111/0001-74 ADAMA BRASIL S/A ATIVA 0,00 160,00 160,00 0,00 0,00 0,00 0,00 0,00"
                + System.lineSeparator()
                + "201700000000122 01/02/2017 22.222.222/0002-75 AGRITEX COMERCIAL AGRÍCOLA LTDA (QUERÊNCIA) ATIVA 2,79 170,00 170,00 0,00 0,00 0,00 4,74 0,00"
                + System.lineSeparator()
                + "201700000000133 07/02/2017 33.333.333/0001-76 CREMONESE WANDSCHEER & CIA LTDA - ME ATIVA 0,00 50,00 50,00 0,00 0,00 0,00 0,00 0,00"
                + System.lineSeparator()
                + "201700000000204 23/02/2017 23.972.199/0001-15 G3 SEMENTES LTDA ATIVA 0,00 230,00 230,00 0,00 0,00 0,00 0,00 0,00";

        // Backslashes are doubled because this is a Java string literal.
        // Group 1 is the CNPJ; group 2 is the name that follows it.
        String regex = "\\b(\\d{2}\\.\\d{3}\\.\\d{3}\\/\\d{4}\\-\\d{2})\\b([A-zÀ-ú-1-9\\s\\\\\\/&\\-\\(|)]{5,}.*[a-zA-Z])\\b";
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(input);

        while (matcher.find()) {
            String cnpj = matcher.group(1).trim(); // captured but not printed here
            String nome = matcher.group(2).trim(); // trim() drops the space after the CNPJ
            System.out.println(nome);
        }

    }
}

Now explaining my regular expression:

\b(\d{2}\.\d{3}\.\d{3}\/\d{4}\-\d{2})\b([A-zÀ-ú-1-9\s\\/&\-\(|)]{5,}.*[a-zA-Z])\b

The \b before and after each group is a word boundary, so a non-word character such as a space may come before and after the match. The name starts with characters from the set inside [], which must occur 5 or more times in sequence; .*[a-zA-Z] then extends the match up to the last letter on the line. You can add more characters inside [] as needed. Another important point is the use of (): everything inside parentheses is a capture group, and I used two. The first group is the CNPJ pattern and the second group is the pattern for the name.

Calling group(1) returns the CNPJ, and calling group(2) returns the name.

See it working on ideone

I hope this helps. Cheers!
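
The two capture groups described above can also be used together, for example to keep each name keyed by its CNPJ. The sketch below is only an illustration: it assumes a simplified pattern (the CNPJ, then a lazy capture up to the first monetary value) instead of the expression from this answer, and the names CnpjNameIndex and index are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CnpjNameIndex {
    // Group 1: the CNPJ. Group 2: the name, captured lazily up to the first monetary value.
    // Assumption: this simplified pattern is not the expression from the answer.
    static final Pattern RECORD = Pattern.compile(
            "(\\d{2}\\.\\d{3}\\.\\d{3}/\\d{4}-\\d{2}) (.*?) \\d+,\\d{2}");

    // Hypothetical helper: maps each CNPJ found in the input to the name that follows it.
    static Map<String, String> index(String input) {
        Map<String, String> byCnpj = new LinkedHashMap<>();
        Matcher matcher = RECORD.matcher(input);
        while (matcher.find()) {
            byCnpj.put(matcher.group(1), matcher.group(2));
        }
        return byCnpj;
    }

    public static void main(String[] args) {
        String input = "201700000000111 01/02/2017 11.111.111/0001-74 ADAMA BRASIL S/A ATIVA 0,00 160,00 160,00 0,00 0,00 0,00 0,00 0,00"
                + "\n"
                + "201700000000144 23/02/2017 44.444.444/0001-77 G3 SEMENTES LTDA ATIVA 0,00 230,00 230,00 0,00 0,00 0,00 0,00 0,00";
        index(input).forEach((cnpj, nome) -> System.out.println(cnpj + " -> " + nome));
    }
}
```

A LinkedHashMap keeps the records in the order they appear in the input; if the same CNPJ appears twice, the later name overwrites the earlier one.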

    
17.04.2017 / 18:27