Python too slow

Question

Can anyone help me? I'm reading a file, making some changes, and then saving it to another folder, but this takes 2 hours. The file has 15 million lines. Is there a different, more efficient way to do this?

import pandas as pd

# READ THE FILE FROM THE STAGING FOLDER
arq5 = pd.read_csv(r'C:\Users\Usuário\staging\arquivo5.txt',delimiter='\t',encoding='cp1252',engine='python')


# MAKE CHANGES TO THE FILE (DROP UNWANTED COLUMNS)
columns = ['PERIODO', 'CRM', 'CAT', 'MERCADO', 'MERCADO_PX', 'CDGLABORATORIO', 'CDGPRODUTO', 'PX']
arq5.drop(columns, inplace=True, axis=1)

# SAVE FILE 5 AS CSV IN THE ALPHA FOLDER
arq5.to_csv(r'C:\Users\Usuário\alpha\arquivo5.txt', index=False)

python pandas

asked by anonymous 17.10.2018 / 19:52

1 answer

score 3 · Accepted Answer

pandas loads the entire file into memory before doing anything with it, which can be slow for very large files like this one.
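If you would rather keep pandas, it can also avoid holding the whole file in memory: read_csv accepts chunksize, which returns an iterator of smaller DataFrames, and usecols, which skips the unwanted columns while parsing. The sketch below is an alternative under stated assumptions, not this answer's method: the helper name copy_without_columns, the chunk size and the sample data are illustrative; dtype=str is an assumption made so that codes with leading zeros are written back unchanged; and the question's engine='python' is left out because pandas' default C parser is generally faster.

```python
import io

import pandas as pd

# Columns removed in the question (copied from the original code).
COLUNAS_REMOVER = ['PERIODO', 'CRM', 'CAT', 'MERCADO',
                   'MERCADO_PX', 'CDGLABORATORIO', 'CDGPRODUTO', 'PX']


def copy_without_columns(src, dst, remove=COLUNAS_REMOVER, chunksize=100_000):
    """Copy a tab-delimited file from src to dst chunk by chunk, minus `remove`.

    src and dst are open text-mode file objects (hypothetical helper);
    only `chunksize` rows are held in memory at a time.
    """
    reader = pd.read_csv(
        src,
        sep='\t',
        usecols=lambda col: col not in remove,  # dropped columns are never kept
        dtype=str,              # assumption: keep values as text (leading zeros)
        keep_default_na=False,  # keep empty fields as '' instead of NaN
        chunksize=chunksize,    # iterate over DataFrames of this many rows
    )
    for i, chunk in enumerate(reader):
        # Write the header with the first chunk only; later chunks just append rows.
        chunk.to_csv(dst, sep='\t', index=False, header=(i == 0))


# Tiny in-memory example with hypothetical data (not from the question).
sample = io.StringIO("PERIODO\tNOME\tPX\tVALOR\n201801\tA\t1\t007\n201802\tB\t2\t1.50\n")
result = io.StringIO()
copy_without_columns(sample, result, chunksize=1)
print(result.getvalue())
```

To run it on the real file, open both paths with encoding='cp1252' and newline='' (as the answer's code does) and pass the two file objects to copy_without_columns.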

Try not to load the entire file at once. The code below does the same job as yours without pandas and without loading the whole file into memory: it reads the source file line by line, drops the unwanted columns, and writes each line straight to the destination. (Unlike your to_csv call, which defaults to commas, it keeps the tab delimiter in the output.)

import csv

colunas_remover = ['PERIODO', 'CRM', 'CAT', 'MERCADO', 
    'MERCADO_PX', 'CDGLABORATORIO', 'CDGPRODUTO', 'PX']
nome_arquivo = r'C:\Users\Usuário\staging\arquivo5.txt'
destino = r'C:\Users\Usuário\alpha\arquivo5.txt'

# READ THE FILE, WRITING THE RESULT TO THE OTHER FOLDER AS WE GO
with open(nome_arquivo, encoding='cp1252', newline='') as f:
    cf = csv.DictReader(f, delimiter='\t')
    with open(destino, 'w', encoding='cp1252', newline='') as fw:
        colunas_manter = [c for c in cf.fieldnames if c not in colunas_remover]
        cw = csv.DictWriter(fw, colunas_manter, delimiter='\t',
            extrasaction='ignore')  # ignore fields that are not in colunas_manter
        cw.writeheader()
        cw.writerows(cf)
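To check what the DictReader/DictWriter pair does without touching the real 15-million-line file, here is a minimal sketch that wraps the same pattern in a function (the name drop_columns and the sample data are hypothetical) and runs it on a small in-memory input. Because DictWriter.writerows pulls rows from the reader one at a time, memory use does not grow with the size of the file.

```python
import csv
import io


def drop_columns(src, dst, remove, delimiter='\t'):
    """Stream delimited rows from src to dst, leaving out the columns in `remove`."""
    reader = csv.DictReader(src, delimiter=delimiter)
    keep = [c for c in reader.fieldnames if c not in remove]
    # extrasaction='ignore': keys in each row dict that are not in `keep`
    # are silently skipped instead of raising ValueError.
    writer = csv.DictWriter(dst, keep, delimiter=delimiter, extrasaction='ignore')
    writer.writeheader()
    writer.writerows(reader)  # consumes the reader row by row


# Hypothetical sample data; the real code opens files with newline=''.
sample = io.StringIO("PERIODO\tNOME\tPX\tVALOR\n201801\tA\t1\t007\n")
result = io.StringIO()
drop_columns(sample, result, {'PERIODO', 'PX'})
print(repr(result.getvalue()))  # 'NOME\tVALOR\r\nA\t007\r\nA'[:-1] is not shown; see note
```

Note that the csv writer ends lines with '\r\n' by default, so the printed value is 'NOME\tVALOR\r\nA\t007\r\n'; pass lineterminator='\n' to DictWriter if whatever reads the output expects Unix line endings.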