Command to replace characters recursively

I need a command that replaces a specific pattern in each line of a file as many times as necessary until the pattern is no longer found.

For example, in a CSV file whose fields are separated by semicolons (;), null fields contain no characters at all, as in the following file, which represents a contact list with 3 records:

Nome;Sobrenome;Telefone1;Telefone2;Email
Joao;Silva;9999-8888;9292-9292;[email protected]
Maria;Souza;8899-0011;;[email protected]
Carlos;Oliveira;;;

The first line is the file header. The Maria Souza contact has a null Telefone2, and the Carlos Oliveira contact has null Telefone1, Telefone2, and Email fields.

I want to add \N where the field is null.

On Linux, I tried the following command:

$ sed -e 's/;;/;\\N;/g' -e 's/;$/;\\N/' arquivo.csv > novo-arquivo.csv

The result is correct for the Maria Souza record, but not for Carlos Oliveira. After sed finds the first ;; and replaces it (Carlos;Oliveira;\N;;), it resumes the search after the replaced text, so the semicolon it has just consumed cannot start another match. The single ; that is left over only matches the next pattern, ;$, and the result is:

Carlos;Oliveira;\N;;\N

One null field is still left unfilled.
I would like a solution that works on both Unix and Windows.
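
The behaviour described above can be reproduced one step at a time. This is only a sketch: it assumes a POSIX shell and a sed in which \\ in the replacement produces a literal backslash, and the variable names are illustrative.

```shell
# The problematic record: three empty fields at the end of the line.
line='Carlos;Oliveira;;;'

# First expression alone: /g never rescans text it has already consumed,
# so only the first ";;" pair is replaced and one lone ";" is left over.
one_pass=$(printf '%s\n' "$line" | sed -e 's/;;/;\\N;/g')
printf '%s\n' "$one_pass"    # Carlos;Oliveira;\N;;

# Second expression: fills only the empty field at the end of the line.
both=$(printf '%s\n' "$one_pass" | sed -e 's/;$/;\\N/')
printf '%s\n' "$both"        # Carlos;Oliveira;\N;;\N
```

The empty field in the middle survives because both of its neighbouring semicolons were already used up by other matches.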

    
asked by anonymous 16.12.2013 / 16:25

3 answers

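Use Perl, which supports lookahead. A lookahead checks what follows a semicolon without consuming it, so every semicolon is tested on its own and consecutive null fields are all filled in a single pass: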

 perl -p -e 's/;(?=;|$)/;\\N/g' arquivo.csv > novo-arquivo.csv

Incidentally, if you want to change the file itself (without redirecting to another file), simply pass the -i (in-place) option:

 perl -p -i -e 's/;(?=;|$)/;\\N/g' arquivo.csv
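
As a quick check, the lookahead substitution can be run on two records from the question. This is a sketch under a few assumptions: perl is on the PATH, fill_nulls_perl is a hypothetical helper name, the e-mail address is a placeholder, and the backslash is doubled so that Perl emits a literal \N.

```shell
# Hypothetical helper: reads CSV lines on stdin, writes filled lines to stdout.
fill_nulls_perl() {
  # ;(?=;|$) matches a ";" that is followed by another ";" or by end of line.
  # The lookahead does not consume that next character, so it can match too.
  perl -p -e 's/;(?=;|$)/;\\N/g'
}

printf 'Maria;Souza;8899-0011;;maria@example.com\nCarlos;Oliveira;;;\n' | fill_nulls_perl
# Maria;Souza;8899-0011;\N;maria@example.com
# Carlos;Oliveira;\N;\N;\N
```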
    
16.12.2013 / 16:30

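The sed command on Linux supports labels and branching, which let you repeat a substitution until it no longer matches.

For this example, it can be used as follows. The t command jumps back to the label whenever the previous substitution changed the line: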

$ sed -e ':loop' -e 's/;;/;\\N;/g' -e 't loop' -e 's/;$/;\\N/' arquivo.csv > novo-arquivo.csv

Keep in mind that if the file was generated on Windows and you run the command on Linux, you should first convert it from DOS to Unix format, because the line endings differ: the carriage return left at the end of each line keeps ;$ from matching. The same applies in the opposite direction.

You can use the dos2unix and unix2dos commands for this.
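
If dos2unix is not available, the carriage returns can also be stripped in the same pipeline. A minimal sketch, assuming a POSIX shell with tr and sed; fill_nulls is a hypothetical helper name, and the output uses Unix line endings:

```shell
# Hypothetical helper: reads CSV lines on stdin, writes filled lines to stdout.
fill_nulls() {
  # tr removes every carriage return, so ";$" can match at the end of a line.
  # The loop then repeats s/;;/.../ until no ";;" pair is left.
  tr -d '\r' |
    sed -e ':loop' -e 's/;;/;\\N;/g' -e 't loop' -e 's/;$/;\\N/'
}

# Input with Windows (CRLF) line endings.
printf 'Maria;Souza;8899-0011;;\r\nCarlos;Oliveira;;;\r\n' | fill_nulls
# Maria;Souza;8899-0011;\N;\N
# Carlos;Oliveira;\N;\N;\N
```

Note that tr -d '\r' deletes carriage returns anywhere in the input, not only at line ends, which is assumed to be acceptable for this kind of file.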

    
16.12.2013 / 16:25

I'm used to Java development environments on both Linux and Windows, so I would use an Ant task for cross-platform file manipulation like this.

Ant is a powerful and versatile tool for automating builds, package assembly, and file processing. Note that Ant is not a programming language, as some people think, but a way of declaring tasks to be executed.

Installing Ant

Download the binary package from the Apache Ant website, unzip it to a folder, and add its bin directory to your operating system's PATH.

Example in Windows:

set path=%path%;c:\caminho\apache-ant-1.9.3\bin

Writing the Ant Build File

The following Ant project applies the regular-expression replacement to each line of a given file:

<project name="MyProject" default="replace" basedir=".">
    <target name="replace">
        <replaceregexp
                file="${file}"
                byline="true"
                match=";(?=;|$)"
                replace=";\\N"
                flags="gs" />
    </target>
</project>

Running the Project

Ant automatically looks for a file named build.xml in the current directory. So, if the project above is saved as build.xml and file.txt is the file to be processed, the following command performs the substitution:

ant -Dfile=file.txt

If the build file has a different name, pass it with the -f option:

ant -f /caminho/meu-build.xml -Dfile=file.txt

Learning more about Ant

The Ant manual covers all of this in detail.

    
30.12.2013 / 13:29