Read a CSV file using the "|" character as the delimiter - Python


I tried to create a DataFrame with the pandas library from a file that is sent to me in the following format:

--------------------------------
|Indice|Preço|Quantidade|Cidade|
--------------------------------
|1|1000|2|São Paulo|
.
.
.

I used the read_csv method with the delimiter "|" and got the following error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe7 in position 3: invalid continuation byte
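The byte named in the message can be reproduced in isolation. This is a minimal sketch, assuming the file was saved as ISO-8859-1 (Windows-1252 uses the same byte, 0xe7, for "ç"); the word "Preço" is taken from the header above:

```python
# "ç" is stored as the single byte 0xe7 in ISO-8859-1 (Latin-1).
raw = "Preço".encode("iso-8859-1")
print(raw)  # b'Pre\xe7o'

# Decoding those bytes as UTF-8 fails: 0xe7 starts a multi-byte UTF-8
# sequence, but the next byte ("o") is not a valid continuation byte.
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    utf8_error = str(exc)
    print(utf8_error)

# Decoding with the encoding the bytes were actually written in works.
decoded = raw.decode("iso-8859-1")
print(decoded)  # Preço
```

If the file really is Latin-1, passing that encoding to read_csv removes the error.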

I tried some other encodings, but I could not figure out how to split the data correctly. Currently I use Excel to split the columns and delete the dashed lines (------).

Thank you in advance for the help you can give me.

    
asked by anonymous 08.05.2017 / 00:38

1 answer


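I believe the error is caused by the accented characters, such as the "ã" in "São Paulo": the file is not UTF-8 encoded. In text fields, the value should be enclosed in double quotes ("), but you can also try setting the encoding in the function call. Here is an example: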

import pandas as pd
df = pd.read_csv('arquivo.csv', encoding='iso-8859-1', delimiter='|')
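Setting the encoding fixes only the decoding: the dashed lines are still parsed as rows (the very first one would be taken as the header), and the leading and trailing pipes produce empty columns that pandas names "Unnamed: ...". A minimal sketch that handles both, assuming the file is ISO-8859-1 and that every separator line starts with "-"; read_pipe_table is a hypothetical helper name:

```python
import io

import pandas as pd


def read_pipe_table(path, encoding="iso-8859-1"):
    """Read a '|'-delimited file whose rows look like |a|b|c| and that
    contains dashed separator lines (------)."""
    # Decode with the file's real encoding (assumed to be ISO-8859-1 here).
    with open(path, encoding=encoding) as f:
        # Keep only non-blank lines that are not dashed separators.
        lines = [
            line for line in f
            if line.strip() and not line.lstrip().startswith("-")
        ]

    df = pd.read_csv(io.StringIO("".join(lines)), sep="|")
    # The leading and trailing '|' create empty, unnamed columns at both
    # ends ("Unnamed: 0", "Unnamed: 5", ...); drop them.
    df = df.loc[:, ~df.columns.str.startswith("Unnamed")]
    return df
```

With the sample above, read_pipe_table('arquivo.csv') should return the columns Indice, Preço, Quantidade and Cidade, with no manual editing in Excel.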

Function documentation: link

Related issue: link

Note: if you open the file in Excel to remove the dashed lines, I imagine that is a manual step rather than an automated routine. Try Notepad++, which can edit text in multiple files at once, to replace those pipes with commas. You can also record macros to edit the text automatically.
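The same clean-up can also be scripted instead of done by hand. This is a minimal sketch using only the standard library csv module; the function name convert_pipe_file, the ISO-8859-1 source encoding and the rule "lines starting with - are separators" are assumptions:

```python
import csv


def convert_pipe_file(src, dst, src_encoding="iso-8859-1"):
    """Rewrite a |a|b|c| file with dashed separator lines as a plain
    comma-separated UTF-8 file. Returns the number of rows written."""
    rows = 0
    with open(src, encoding=src_encoding, newline="") as fin, \
         open(dst, "w", encoding="utf-8", newline="") as fout:
        writer = csv.writer(fout)
        for line in fin:
            line = line.strip()
            # Skip blank lines and the ------ separator lines.
            if not line or line.startswith("-"):
                continue
            # Remove exactly one outer pipe on each side, so an empty
            # last field is not lost, then split on the inner pipes.
            if line.startswith("|"):
                line = line[1:]
            if line.endswith("|"):
                line = line[:-1]
            writer.writerow(line.split("|"))
            rows += 1
    return rows
```

Calling convert_pipe_file('arquivo.csv', 'arquivo_limpo.csv') (the output name is hypothetical) writes a comma-separated UTF-8 copy that pd.read_csv can load with its defaults. csv.writer adds double quotes only where a field needs them, for example a city name containing a comma.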

    
08.05.2017 / 01:18