I need to extract data from a text, and I'm trying to do this with grep. But the way regular expressions are used with this command is quite different from what I'm used to in Ruby or JavaScript, and I haven't been able to get what I need. In the following text:
Judicial Court of the Regional Labor Court of the 1st Region
ELECTRONIC JOURNAL OF JUSTICE OF JUDICIAL WORK
# 1697/2015
FEDERATIVE REPUBLIC OF BRAZIL
Date of release: Wednesday, April 1, 2015.
Regional Labor Court of the 1st Region
I need to get only the number that can be seen on the third line. This number will then be used to make a request to a webservice. I tried with grep as follows:
pdftotext Diario_1697_1_1_4_2015.pdf -f 1 -l 1 - | grep -o /Nº(\d+\/\d+)/
I take the first page of a PDF file, convert it to text, and pipe it to grep to extract the information. But that does not work at all. Can anyone tell me the correct way to do this with grep or some other bash command?
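For reference, here is a minimal sketch of what I am aiming for, not a confirmed solution. It assumes the number is the only digits/digits token on the page, and it uses a hard-coded sample in place of the pdftotext output. The variable names `sample` and `number` are just placeholders for this illustration.

```shell
# Sample text standing in for the pdftotext output (first page of the PDF).
sample='Judicial Court of the Regional Labor Court of the 1st Region
ELECTRONIC JOURNAL OF JUSTICE OF JUDICIAL WORK
# 1697/2015
FEDERATIVE REPUBLIC OF BRAZIL
Date of release: Wednesday, April 1, 2015.
Regional Labor Court of the 1st Region'

# -o prints only the matched part of each line.
# -E selects extended regular expressions, so + needs no backslash.
# [0-9] replaces \d, which grep's basic and extended syntaxes do not define.
# The pattern is a quoted shell argument, without Ruby/JavaScript-style /.../ delimiters.
# head -n 1 keeps only the first match (an assumption, in case more appear).
number=$(printf '%s\n' "$sample" | grep -oE '[0-9]+/[0-9]+' | head -n 1)

echo "$number"
```

In the real pipeline, the `printf` part would be replaced by the pdftotext command shown above, with the rest of the pipe unchanged.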