Separate data from a txt file

I have a relatively large data file that I extracted from a time clock (an employee attendance machine). Each record comes as a single run of digits, like this:

00003000527005792012635570932000219305130720170713
00003000527005792012635570932000219305130720170713

I would like to separate this data into columns so that I can export to Excel:

00003000527005792 - clock serial number | 012635570932 - PIS number | 000219305 - NSR | 13 - day | 07 - month | 2017 - year | 07 - hour | 13 - minute

Looking like this:

00003000527005792 012635570932 000219305 13 07 2017 07 13

Well, so far I've been able to read the data using this code:

arquivo = open('DATA.txt', 'r')
for linha in arquivo:
    print(linha)
arquivo.close()

How can I apply slicing in this context? I can do it for a single string, but I do not know how to apply it to every line of the file.

    
asked by anonymous 12.01.2018 / 13:31

2 answers

This assumes that every line always has the same format, with the same number of digits in each field.

I would not separate the fields with spaces, because the column names themselves contain spaces. In this case I'll use ";" as the separator.

You can do this:

cols = ['numero de serie do relógio', 'Numero do PIS', 'NSR', 'Dia', 'Mês', 'Ano', 'Hora', 'Minuto']
novos_dados = ''
with open('DATA.txt') as f:
    for l in f:  # the slicing is done on the next line, once for each line of the file
        # the last slice stops at 50 so the trailing newline is not included in the minute field
        novos_dados += '{};{};{};{};{};{};{};{}\n'.format(l[:17], l[17:29], l[29:38], l[38:40], l[40:42], l[42:46], l[46:48], l[48:50])
content = '{}\n{}'.format(';'.join(cols), novos_dados)

# write content to a CSV file; the with block makes sure the file is closed
with open('novos_dados.csv', 'w') as out:
    out.write(content)

A .csv file normally opens in Excel by default. When importing novos_dados.csv, choose ";" as the delimiter.
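As a variation on the slicing above, the slice boundaries can be computed from a list of field widths, and the standard `csv` module can do the writing. This is only a sketch under the same assumption of fixed-width lines. The names `FIELDS`, `OFFSETS`, `split_fixed_width` and `convert`, as well as the English column names, are illustrative and not part of the original code.

```python
import csv
from itertools import accumulate

# Field widths taken from the layout described in the question.
# The English field names are illustrative.
FIELDS = [
    ('serial_number', 17), ('PIS', 12), ('NSR', 9), ('day', 2),
    ('month', 2), ('year', 4), ('hour', 2), ('minute', 2),
]
# Cumulative offsets: [0, 17, 29, 38, 40, 42, 46, 48, 50]
OFFSETS = [0] + list(accumulate(width for _, width in FIELDS))


def split_fixed_width(line):
    """Cut one record into its fields using the precomputed offsets."""
    return [line[a:b] for a, b in zip(OFFSETS, OFFSETS[1:])]


def convert(src='DATA.txt', dst='novos_dados.csv'):
    """Read fixed-width records from src and write them to dst, ';'-separated."""
    with open(src) as fin, open(dst, 'w', newline='') as fout:
        writer = csv.writer(fout, delimiter=';')
        writer.writerow([name for name, _ in FIELDS])
        for line in fin:
            line = line.strip()
            if line:  # ignore blank lines
                writer.writerow(split_fixed_width(line))
```

Calling `convert()` would read DATA.txt and produce novos_dados.csv. Keeping the widths in one list means a change in the record layout only has to be made in one place.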

    
12.01.2018 / 13:51

One alternative is to use regex like this:

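import re

# one named group per fixed-width field
padrao = re.compile(
    r'(?P<serial_number>\d{17})(?P<PIS>\d{12})(?P<NSR>\d{9})'
    r'(?P<day>\d{2})(?P<month>\d{2})(?P<year>\d{4})(?P<hour>\d{2})(?P<minute>\d{2})'
)
with open('DATA.txt') as fp:
    # lines that do not match (for example blank lines) are skipped instead of failing on None
    data = [m.groupdict() for m in map(padrao.match, fp) if m]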
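Since the regex approach yields one dictionary per line, the result maps naturally onto `csv.DictWriter` if the goal is still a file that Excel can open. The following is a minimal sketch, not part of the original answer. The helper names `parse_lines` and `write_csv` are hypothetical, and the ";" delimiter is an assumption carried over from the other answer.

```python
import csv
import re

# Same layout as in the question: one named group per fixed-width field.
PATTERN = re.compile(
    r'(?P<serial_number>\d{17})(?P<PIS>\d{12})(?P<NSR>\d{9})'
    r'(?P<day>\d{2})(?P<month>\d{2})(?P<year>\d{4})'
    r'(?P<hour>\d{2})(?P<minute>\d{2})'
)


def parse_lines(lines):
    """Return one dict per line that matches the fixed-width layout."""
    return [m.groupdict() for m in map(PATTERN.match, lines) if m]


def write_csv(rows, path):
    """Write the parsed rows to path as a ';'-separated CSV with a header."""
    # Order the column names by group number so they follow the record layout.
    names = sorted(PATTERN.groupindex, key=PATTERN.groupindex.get)
    with open(path, 'w', newline='') as f:
        writer = csv.DictWriter(f, fieldnames=names, delimiter=';')
        writer.writeheader()
        writer.writerows(rows)
```

A typical use would be to open DATA.txt, pass the file object to `parse_lines`, and hand the result to `write_csv` with the desired output path.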
    
12.01.2018 / 15:15