Importing data using pandas in Python


Good afternoon, guys!

I'm trying to import a CSV file using the pandas package in Python:

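    import pandas as pd

    # Column names to assign, since the file's own header line is skipped below
    names_col = ['AnoInfracao',
                 'TrimestreInfracao',
                 'CodigoInfracao',
                 'DescricaoAbreviadaInfracao',
                 'Gravidade',
                 'DescricaoTipoVeiculo',
                 'DescricaoEspecie',
                 'UF',
                 'Municipio',
                 'BR',
                 'KM',
                 'NacionalidadeVeiculo']

    # Raw string (r"...") so the backslashes in the Windows path are not
    # interpreted as escape sequences
    data = pd.read_csv(
        r"C:\Pasta\pasta1\Documents\PRF_DADOS_ABERTOS_INFRACOES_2015_T4\PRF_DADOS_ABERTOS_INFRACOES_2015_T4.csv",
        delimiter=';',
        header=None,       # do not take column names from the file...
        names=names_col,   # ...use these names instead
        skiprows=1,        # skip the file's first line (its header)
        dtype={'AnoInfracao': 'category'})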

The command runs without errors, and when I view the data the column names are correct, but every data row shows only NaN:

 AnoInfracao  TrimestreInfracao  CodigoInfracao  DescricaoAbreviadaInfracao
0         NaN                NaN             NaN                         NaN   
1         NaN                NaN             NaN                         NaN   
2         NaN                NaN             NaN                         NaN   
3         NaN                NaN             NaN                         NaN   
4         NaN                NaN             NaN                         NaN 

Does the pandas package only import numeric values? This file contains columns of quantitative and qualitative data.

Does anyone have an idea what it might be?

The data can be downloaded from this link: Infraction Data Registered by the PRF

Thank you!

Léo

asked by anonymous 13.12.2016 / 19:10

1 answer


When I tried to run your code, I first got an error related to dtype={'AnoInfracao':'category'}, so I removed that argument. After that, the call failed with this traceback:

    File "pandas\parser.pyx", line 805, in pandas.parser.TextReader.read (pandas\parser.c:8748)
    File "pandas\parser.pyx", line 827, in pandas.parser.TextReader._read_low_memory (pandas\parser.c:9003)
    File "pandas\parser.pyx", line 881, in pandas.parser.TextReader._read_rows (pandas\parser.c:9731)
    File "pandas\parser.pyx", line 868, in pandas.parser.TextReader._tokenize_rows (pandas\parser.c:9602)
    File "pandas\parser.pyx", line 1865, in pandas.parser.raise_parser_error (pandas\parser.c:23325)
    pandas.io.common.CParserError: Error tokenizing data. C error: EOF inside string starting at line 35
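The message "EOF inside string" means the C parser reached the end of the file while still inside a quoted field, which usually points to a quote character (`"` by default) that is never closed. The sketch below reproduces that on a small hypothetical sample and shows one workaround, `quoting=csv.QUOTE_NONE`, which makes pandas treat quotes as ordinary characters. That a stray quote is what breaks the PRF file is an assumption, not something confirmed here; recent pandas versions also raise this as `pandas.errors.ParserError` rather than `CParserError`.

```python
import csv
import io

import pandas as pd

# Hypothetical sample: the first data row opens a quote that is never closed
SAMPLE = 'A;B\n1;"abc\n2;x\n'


def read_default(text):
    # Default quoting: '"' starts a quoted field, so the parser reads to EOF
    return pd.read_csv(io.StringIO(text), sep=';')


def read_quotes_literal(text):
    # QUOTE_NONE: '"' is kept as an ordinary character inside the field
    return pd.read_csv(io.StringIO(text), sep=';', quoting=csv.QUOTE_NONE)


try:
    read_default(SAMPLE)
except pd.errors.ParserError as exc:
    print("default parsing failed:", exc)

df = read_quotes_literal(SAMPLE)
print(df)
```

With `QUOTE_NONE` the stray quote simply stays in the text of column `B`, so you can locate and clean the offending rows afterwards.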

I opened the .csv file in Excel and noticed that it is badly formatted: besides the row with the column names, there is a blank line, and the data only begins on line 4, if I'm not mistaken.
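`read_csv` has `skiprows` and `header` for extra lines at the top of a file, but counting them by hand is error-prone when blank lines are involved. A minimal alternative sketch: find the header line by its first column name and parse only from there. The helper name `read_after_preamble` and the sample layout are hypothetical, chosen only to mimic a title line plus a blank line before the real header.

```python
import io

import pandas as pd

# Hypothetical layout: a title line, a blank line, then the real header
RAW = (
    "Hypothetical report title\n"
    "\n"
    "AnoInfracao;UF;KM\n"
    "2015;SP;12\n"
    "2015;MG;7\n"
)


def read_after_preamble(raw_text, first_col, sep=";"):
    """Parse raw_text starting at the first line that begins with first_col."""
    lines = raw_text.splitlines()
    start = next((i for i, line in enumerate(lines) if line.startswith(first_col)), None)
    if start is None:
        raise ValueError(f"no header line starting with {first_col!r}")
    body = "\n".join(lines[start:])
    # Blank lines after the header are skipped by read_csv (skip_blank_lines=True)
    return pd.read_csv(io.StringIO(body), sep=sep)


df = read_after_preamble(RAW, "AnoInfracao")
print(df)
```

For the real file you would first read its text with `open(path, encoding=...)`; which encoding the PRF export uses (for example `'latin-1'`) is an assumption you would need to check.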

If you work through the errors step by step, you may reach a solution. As for your questions:

  • pandas does not import only numeric values; text columns are read as well.
  • I think the problem is that the .csv file is badly formatted.
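To illustrate the first point, here is a small sketch on hypothetical rows that mix numeric and text columns; pandas infers a numeric dtype for the former and a text dtype for the latter, and `dtype=` can force a type such as `'category'`, as the question does for `AnoInfracao`.

```python
import io

import pandas as pd

# Hypothetical rows mixing text (UF, Gravidade) and numbers (AnoInfracao, KM)
SAMPLE = "AnoInfracao;UF;Gravidade;KM\n2015;SP;Grave;12\n2015;RJ;Leve;30\n"

# Let pandas infer each column's type
df = pd.read_csv(io.StringIO(SAMPLE), sep=";")
print(df.dtypes)

# Force a column to a categorical type, as the question does with AnoInfracao
df_cat = pd.read_csv(io.StringIO(SAMPLE), sep=";", dtype={"UF": "category"})
print(df_cat["UF"].dtype)
```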
13.12.2016 / 21:34