Read a line from a txt file and insert ";"

2

I have a txt file whose lines have the following data:

0 02 020 0201 020110 Z DEMONSTRAR COMPETÊNCIAS PESSOAIS 1 Primar pela correção de atitudes

In this form I cannot import the data into either Excel or MySQL, because the words do not have the same number of characters from one line of the txt file to the next.

Using Delphi, Lazarus or Java, how do I read each line and insert the ";" character at the spaces so that it looks like this:

0 ;02 ;020 ;0201 ;020110 ;Z ;DEMONSTRAR COMPETÊNCIAS PESSOAIS ;1 ;Primar pela correção de atitudes

Each item corresponds to a table column.

    
asked by anonymous 27.01.2017 / 17:58

3 answers

2

I created a routine in Delphi exclusively for this file.

Uses System.Character;


procedure TForm1.Button1Click(Sender: TObject);
Var
   str :  string;
   linhacsv : string;
   oldFile, NewFile : TextFile;
   n : Integer;
begin
  AssignFile( newFile, 'c:\pasta\CBO2002 - PerfilOcupacional.csv');
  Rewrite( newFile );

  AssignFile( oldFile, 'c:\pasta\CBO2002 - PerfilOcupacional.txt');
  Reset( oldFile );

  readln( oldFile, str ); // skip the header
  readln( oldFile, str ); // and the line after it

  while not Eof( oldFile ) do
  begin
    linhacsv := '';
    readln( oldFile, str );
    for n := 1 to length( str ) do
    begin
      // A space becomes ';' when it touches a digit on either side, or when it
      // is the fixed-position space at column 23 (right after the letter code).
      // The bounds checks avoid reading str[0] or past the end of the line.
      if ( str[n] = ' ' ) and
         ( ( ( n > 1 ) and IsNumber( str[n-1] ) ) or
           ( ( n < length( str ) ) and IsNumber( str[n+1] ) ) or
           ( n = 23 ) ) then
        linhacsv := linhacsv + ';'
      else
        linhacsv := linhacsv + str[n];
    end;
    writeln( newFile, linhacsv );
  end;
  CloseFile( newFile );
  CloseFile( oldFile );

end;
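
For readers who want to try the rule on the sample line without Delphi, here is a minimal Python sketch of the same idea: a space becomes ";" when it touches a digit or sits at column 23. The function name `spaces_to_semicolons` and the `fixed_column` parameter are made up for illustration, and the sketch assumes the real file always has the letter code at column 22, as the sample line does.

```python
def spaces_to_semicolons(line, fixed_column=23):
    """Replace column-separating spaces with ';' (hypothetical helper)."""
    out = []
    for i, ch in enumerate(line):
        if ch == " ":
            prev_digit = i > 0 and line[i - 1].isdigit()
            next_digit = i + 1 < len(line) and line[i + 1].isdigit()
            # fixed_column is 1-based, like the string index in the Delphi loop
            if prev_digit or next_digit or i + 1 == fixed_column:
                out.append(";")
                continue
        out.append(ch)
    return "".join(out)


sample = ("0 02 020 0201 020110 Z DEMONSTRAR COMPETÊNCIAS PESSOAIS "
          "1 Primar pela correção de atitudes")
print(spaces_to_semicolons(sample))
```

Spaces inside the two text columns are left alone because neither neighbour is a digit; a description that itself contains a number would break this rule.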
    
30.01.2017 / 13:10
6

Depends on the Data Pattern

You can use a regular expression based on the sample data you posted, but it is hard to know whether it will work for every row, because the data does not follow a consistent pattern.

Files like this are normally either delimited by a character or laid out with a fixed number of characters per column. Yours follows neither convention.
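
To make the distinction concrete, here is a small Python sketch of both layouts next to the line from the question. The strings `delimited` and `fixed` and the column widths are invented for illustration; only `mixed` comes from the question.

```python
# Delimited: one separator character marks every column boundary.
delimited = "0;02;Primar pela correção"
delimited_fields = delimited.split(";")
print(delimited_fields)

# Fixed-width: every column occupies a known character range.
# The widths (1, 2 and 20 characters) are hypothetical.
fixed = "002Primar pela correção"
widths = [1, 2, 20]
fixed_fields, start = [], 0
for width in widths:
    fixed_fields.append(fixed[start:start + width].strip())
    start += width
print(fixed_fields)

# The line from the question: spaces separate the columns but also occur
# inside the text columns, so a plain split yields more pieces than columns.
mixed = ("0 02 020 0201 020110 Z DEMONSTRAR COMPETÊNCIAS PESSOAIS "
         "1 Primar pela correção de atitudes")
pieces = mixed.split(" ")
print(len(pieces), "pieces for 9 columns")
```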

I made an example in Java using an expression that works for your example line:

String REGEX = "\\s([\\dZ]+)\\s";
String INPUT = "0 02 020 0201 020110 Z DEMONSTRAR COMPETÊNCIAS PESSOAIS 1 Primar pela correção de atitudes";
String REPLACE = " ;$1 ;";

Pattern p = Pattern.compile(REGEX);
Matcher m = p.matcher(INPUT); 
INPUT = m.replaceAll(REPLACE);

System.out.println(INPUT);

Required imports in Java:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

Result:

0 ;02 ;020 ;0201 ;020110 ;Z ;DEMONSTRAR COMPETÊNCIAS PESSOAIS ;1 ;Primar pela correção de atitudes

You can use RegExr to test with more sample lines and adapt the expression to your needs. Notepad++ also supports find and replace with regular expressions, in case you only need to do this once for the file.
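
A quick way to check the expression against more lines is to count the resulting fields. The sketch below is a Python equivalent of the Java snippet above; the helper name `add_separators` and the second sample line (with letter code "A") are hypothetical and only there to show what a failing line looks like.

```python
import re

# Python equivalent of the Java pattern above.
PATTERN = re.compile(r"\s([\dZ]+)\s")
EXPECTED_FIELDS = 9


def add_separators(line):
    """Apply the same replacement as the Java example (hypothetical helper)."""
    return PATTERN.sub(r" ;\1 ;", line)


samples = [
    # Line from the question.
    "0 02 020 0201 020110 Z DEMONSTRAR COMPETÊNCIAS PESSOAIS 1 Primar pela correção de atitudes",
    # Invented line: a letter code other than Z is not covered by [\dZ].
    "0 02 020 0201 020110 A ATENDER CLIENTES 1 Informar horários",
]

for sample in samples:
    converted = add_separators(sample)
    count = len(converted.split(";"))
    status = "ok" if count == EXPECTED_FIELDS else "unexpected field count"
    print(count, status, converted)
```

Any line flagged with an unexpected field count points to a case the expression needs to be adapted for.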

A Python converter

Anthony's answer, which implements the file parser in Java, has since been posted, and I believe it is the best answer to the problem. Since I had already downloaded the file and suggested in a comment that you split the data into two parts, I decided to leave a Python example of that suggestion.

import re

line_count = 1

with open(r'C:\temp\CBO2002 - PerfilOcupacional.csv', 'w') as w:
    with open(r'C:\temp\CBO2002 - PerfilOcupacional.txt') as r:
        for line in r:
            if (line_count == 1):
                # parse the header
                header = re.sub(r"([\w_]+)\s*", r"\1;", line)
                w.write(header + '\n')

            elif (line_count > 2):
                # skip line 2 and split the rest into
                # two groups, each with a well-defined pattern
                side_a = line[0:22]
                side_b = line[23:]

                # parse each group
                parse_side_a = re.sub(r"(\d)\s([\d|\w])", r"\1;\2", side_a)
                parse_side_b = re.sub(r"([^\d]+)\s(\d+)\s(.+)", r"\1;\2;\3", side_b)

                # join the two groups (group B already ends with the line break)
                line_out = parse_side_a + ';' + parse_side_b
                w.write(line_out)

            line_count += 1
    
27.01.2017 / 19:15
2

Building on the replace-with-regular-expressions idea suggested in Pagotti's answer, this example processes the complete file line by line, matching each line against a specific regular expression. Compiling it requires Java 8 or later:

import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Parser {
    public static void main(String[] args) {
        final Pattern patternLinha =
                Pattern.compile("^(\\d) (\\d{2}) (\\d{3}) (\\d{4}) (\\d{6}) ([A-Z]) (.+?) (\\d{1,2}) (.+)$");

        final Path entrada = Paths.get(args[0]);
        final Path saida = Paths.get(args[1]);
        final Charset cs = Charset.forName(args[2]);
        // Turn the literal text \r and \n from the command line into real line-break characters.
        final String quebraDeLinha = args[3].replace("\\r", "\r").replace("\\n", "\n");

        try (BufferedWriter bw = Files.newBufferedWriter(saida, cs)) {
            Files.lines(entrada, cs).map(linha -> {
                final Matcher matcher = patternLinha.matcher(linha);
                if (matcher.matches()) {
                    return matcher.replaceFirst("$1 ;$2 ;$3 ;$4 ;$5 ;$6 ;$7 ;$8 ;$9");
                } else {
                    throw new RuntimeException("Invalid format for line: " + linha);
                }

            }).forEach(linhaTransformada -> {
                try {
                    bw.write(linhaTransformada);
                    bw.write(quebraDeLinha);
                } catch (IOException e) {
                    System.err.println("Error writing line to output file: " + saida.toAbsolutePath());
                    e.printStackTrace();
                }
            });
        } catch (IOException e) {
            System.err.println("Error reading from input file: " + entrada.toAbsolutePath());
            e.printStackTrace();
        }
    }
}

Example usage:

java Parser arquivoEntrada.txt arquivoSaida.txt ISO-8859-1 \r\n

Since the question contains neither code nor a sample file, there is no way to be sure that this answer works for all the data. That would require knowing the formal structure of the content, as well as particulars of the file such as charset, line-break type, etc. That said, I tried to make everything easily configurable: by changing the pattern and the command-line arguments you can make fine adjustments.
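
Because the real structure is unknown, it can help to find out which lines the pattern rejects before converting everything. The following Python sketch is a variation on the approach above, not a port of it: it uses the same expression but collects non-matching lines instead of throwing on the first one. The function name `convert` and the second sample line are hypothetical.

```python
import re

# Same structure as the Java pattern above; adjust it if the real file differs.
LINE_PATTERN = re.compile(
    r"^(\d) (\d{2}) (\d{3}) (\d{4}) (\d{6}) ([A-Z]) (.+?) (\d{1,2}) (.+)$"
)


def convert(lines):
    """Return (converted, rejected); rejected holds (line_number, text) pairs."""
    converted, rejected = [], []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\r\n")
        match = LINE_PATTERN.match(text)
        if match:
            converted.append(";".join(match.groups()))
        else:
            rejected.append((number, text))
    return converted, rejected


sample_lines = [
    "0 02 020 0201 020110 Z DEMONSTRAR COMPETÊNCIAS PESSOAIS 1 Primar pela correção de atitudes\r\n",
    "a header or otherwise malformed line\r\n",  # invented, to show a rejection
]
converted, rejected = convert(sample_lines)
print(converted)
print(rejected)
```

With a real file, the lines could come from `open(path, encoding="ISO-8859-1", newline="")`, where the encoding is whatever the file actually uses; the list of rejected lines then shows where the pattern needs adjusting.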

    
27.01.2017 / 23:26