Regular expressions with 'grep'

I need to extract data from a text, and I'm trying to do it with grep. But the way regular expressions are used with this command is quite different from what I'm used to in Ruby or JavaScript, and I can't get it to do what I need. Given the following text:

Judicial Court of the Regional Labor Court of the 1st Region
ELECTRONIC JOURNAL OF LABOR JUSTICE
Nº 1697/2015
FEDERATIVE REPUBLIC OF BRAZIL
Date of release: Wednesday, April 1, 2015.
Regional Labor Court of the 1st Region

I need to extract only the number shown on the third line. That number will then be used to make a request to a web service. I tried grep as follows:

pdftotext Diario_1697_1_1_4_2015.pdf -f 1 -l 1 - | grep -o /Nº(\d+\/\d+)/

I take the first page of a PDF file, convert it to text, and pipe it to grep to extract the information. But that does not work at all. Can anyone tell me the correct way to do this with grep or some other bash command?

asked by anonymous 02.04.2015 / 23:13

1 answer

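First, grep is an ordinary command, and its arguments are plain strings like any others; there is no /.../ regex literal as in Ruby or JavaScript. Instead of delimiting the regex with /, wrap it in single quotes (or double quotes, if you are careful about the shell expanding $variables inside them). Left unquoted, the pattern is interpreted by the shell first: the parentheses cause a syntax error, and the backslashes are removed before grep ever sees them. Also, since / is no longer a delimiter, it does not need to be escaped with \.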
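
A small sketch of what the shell does to the pattern before grep runs. It uses printf '%s\n' to print each argument exactly as a command would receive it, and assumes bash (or another POSIX shell) plus a grep that supports -o, such as GNU or BSD grep. The basic-regex pattern at the end is only an illustration of passing the regex as a quoted string.

```shell
#!/usr/bin/env bash
# printf '%s\n' prints each argument exactly as the called command receives it.

# Unquoted: the shell strips the backslashes before the command runs.
# (The parentheses from the question are left out; unquoted they are a bash syntax error.)
printf '%s\n' Nº\d+\/\d+
# prints: Nºd+/d+

# Single-quoted: every character is passed through unchanged.
printf '%s\n' 'Nº\d+/\d+'
# prints: Nº\d+/\d+

# The regex is just an ordinary quoted argument, with no /.../ delimiters.
# In grep's default (basic) syntax, [0-9][0-9]* means "one or more digits".
printf 'Nº1697/2015\n' | grep -o 'Nº[0-9][0-9]*/[0-9][0-9]*'
# prints: Nº1697/2015
```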

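Second, grep's default regex syntax (POSIX basic regular expressions) is different from what you are used to and quite limited. For example, it treats + as a literal character rather than a quantifier (GNU grep accepts \+ as an extension), and it has no \d shorthand for digits. You can switch to Perl-compatible syntax with the -P flag (a GNU grep feature):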

grep -P -o 'Nº\d+/\d+'

or use POSIX extended regular expressions with grep -E (egrep is an older, now obsolescent spelling of the same thing):

grep -E -o 'Nº[[:digit:]]+/[[:digit:]]+'
grep -E -o 'Nº[0-9]+/[0-9]+'
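
The commands above print the Nº prefix together with the number. If only 1697/2015 is wanted for the web service request, one option is to strip the prefix afterwards. The sketch below assumes the text arrives on standard input; extract_number is a hypothetical helper name, and allowing optional whitespace after Nº is an assumption about how pdftotext lays out that line.

```shell
#!/usr/bin/env bash
# extract_number is a hypothetical helper, not part of grep or pdftotext.
extract_number() {
  # grep -E finds e.g. "Nº1697/2015" (optionally with spaces after Nº),
  # head keeps only the first match, and sed strips the "Nº" prefix.
  grep -E -o 'Nº[[:space:]]*[0-9]+/[0-9]+' | head -n 1 | sed 's/^Nº[[:space:]]*//'
}

# With GNU grep built with PCRE support, \K discards what was matched so far,
# so -P alone can print just the number:
#   grep -P -o 'Nº\s*\K\d+/\d+'

# Usage with the pipeline from the question (needs pdftotext and the PDF):
#   number=$(pdftotext Diario_1697_1_1_4_2015.pdf -f 1 -l 1 - | extract_number)

printf 'FEDERATIVE REPUBLIC OF BRAZIL\nNº1697/2015\n' | extract_number
# prints: 1697/2015
```

The grep -E plus sed version avoids -P, so it should also work with grep builds that lack PCRE support, such as some BSD greps.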
03.04.2015 / 00:01