<END>Av. Dr. Walter Belian, nº 2.230, Distrito Industrial, João Pessoa-PB, com CNPJ nº 07.526.557/0013-43 e Inscrição Estadual nº 16.218.7157; (NR) II - Sergipe, localizada na Rodovia BR-101, s/nº, km 133, Distrito Industrial, Estância-SE, com CNPJ nº 07.526.577/0012-62 e Inscrição Estadual nº 27.142.202-5; (NR) III - Camaçari, localizada na Rua João Úrsulo, nº 1.620, Polo Petroquímico, Camaçari-BA</END>, com CNPJ nº 07.526.557/0015-05e Inscrição Estadual nº 110.250.399;
Above is the text, and below is the regex I use to capture information from it. Keep in mind that the source text is semi-structured and contains some repetition. For context, the regex is meant to capture addresses.
, (established | localized | localized) (in | no | em) ([^ (Municipality | State)] ([0-9A-Za-zçãàáâéêíóôõúÂÃÁÀÉÊÍÓÔÕÚÇ \ Q () < - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
I want to capture each address in the document and wrap each one in <END> and </END> tags. Only the part delimited by those tags counts as an address.
Everything else is "normal text": it should not be captured, but it must not be discarded either. For the given example, the result should look like this:
<END>Av. Dr. Walter Belian, nº 2.230, Distrito Industrial, João Pessoa-PB</END>, com CNPJ nº 07.526.557/0013-43 e Inscrição Estadual nº 16.218.7157; (NR) II - Sergipe, localizada na <END>Rodovia BR-101, s/nº, km 133, Distrito Industrial, Estância-SE</END>, com CNPJ nº 07.526.577/0012-62 e Inscrição Estadual nº 27.142.202-5; (NR) III - Camaçari, localizada na <END>Rua João Úrsulo, nº 1.620, Polo Petroquímico, Camaçari-BA</END>, com CNPJ nº 07.526.557/0015-05 e Inscrição Estadual nº 110.250.399;
However, as the text at the top shows, what I actually get is all the addresses captured at once, as a single match. I reached for regex because that is how I was capturing other things, but if there is another way to fix this, that is fine too.
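A single match that swallows every address usually points to a greedy quantifier running through to the last possible end point. One possible direction is sketched below in Python. It assumes that every address starts with a street-type word (`Av.`, `Avenida`, `Rua`, `Rodovia`) and ends with a `City-UF` suffix, which holds for the sample but may not hold for the whole document. The names `ADDRESS` and `tag_addresses` are hypothetical, chosen only for this illustration.

```python
import re

# Assumption: an address starts with a street-type word and ends at the
# first "-UF" suffix (hyphen plus two uppercase letters, e.g. "-PB").
# The lazy ".+?" stops at the FIRST such suffix instead of the last one,
# so each address becomes its own match.
ADDRESS = re.compile(r"(?:Av\.|Avenida|Rua|Rodovia)\s.+?-[A-Z]{2}\b")


def tag_addresses(text: str) -> str:
    """Wrap each address in <END>...</END>, keeping all other text."""
    # Drop tags left over from an earlier, too-wide capture.
    plain = re.sub(r"</?END>", "", text)
    # re.sub only rewrites the matched spans; the rest is left untouched.
    return ADDRESS.sub(lambda m: f"<END>{m.group(0)}</END>", plain)


sample = (
    "<END>Av. Dr. Walter Belian, nº 2.230, Distrito Industrial, João Pessoa-PB, "
    "com CNPJ nº 07.526.557/0013-43; (NR) II - Sergipe, localizada na "
    "Rodovia BR-101, s/nº, km 133, Distrito Industrial, Estância-SE</END>, "
    "com CNPJ nº 07.526.577/0012-62;"
)
print(tag_addresses(sample))
```

Because the substitution rewrites only the matched spans, the "normal text" between addresses is preserved as-is. If some addresses start with other words (for example `Travessa` or `Praça`), the alternation at the start of the pattern would need to be extended.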