How to avoid the SettingWithCopyWarning ("copy of a slice from a DataFrame") when using unidecode with pandas?

Using Python 3 and pandas, I read CSV files into DataFrames. In some columns I need to remove the accents (the text is in Portuguese), which I do with unidecode.

But for some files a warning message appears:

import pandas as pd
import unidecode

def f(text):
    # Transliterate accented characters to their closest ASCII equivalents
    return unidecode.unidecode(text)

candidatos_2014 = pd.read_csv(
    "candidatos_2014.csv",
    sep=',',
    encoding='utf-8',
    converters={'cpf': str, 'sequencial': str},  # keep these columns as text
)

candidatos_2014.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 26245 entries, 0 to 26244
Data columns (total 9 columns):
Unnamed: 0         26245 non-null int64
uf                 26245 non-null object
cargo              26245 non-null object
nome_completo      26245 non-null object
sequencial         26245 non-null object
cpf                26245 non-null object
nome_urna          26245 non-null object
partido_eleicao    26245 non-null object
situacao           26245 non-null object
dtypes: int64(1), object(8)
memory usage: 1.8+ MB

eleitos = candidatos_2014[candidatos_2014['situacao'].isin(['ELEITO POR QP', 'ELEITO POR MÉDIA', 'ELEITO'])]

eleitos_d_2014 = eleitos[(eleitos['cargo'] == 'DEPUTADO FEDERAL')]

eleitos_d_2014["nome_completo"] = eleitos_d_2014["nome_completo"].apply(f)

/home/reinaldo/Documentos/Code/seguranca/lib/python3.6/site-packages/ipykernel_launcher.py:5: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  """

eleitos_d_2014.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 513 entries, 144 to 26209
Data columns (total 9 columns):
Unnamed: 0         513 non-null int64
uf                 513 non-null object
cargo              513 non-null object
nome_completo      513 non-null object
sequencial         513 non-null object
cpf                513 non-null object
nome_urna          513 non-null object
partido_eleicao    513 non-null object
situacao           513 non-null object
dtypes: int64(1), object(8)
memory usage: 40.1+ KB

The accents seem to have been removed, but is there a risk that some rows were not changed correctly? How can I avoid this warning, and how should I use .loc here?

    
asked by anonymous 02.03.2018 / 15:03

1 answer

I solved the same problem (in fact, with the same database) without using unidecode.

import pandas as pd

candidatosal2014 = pd.read_csv("candidatos_alagoas_2014.csv", encoding="latin1", delimiter=";", header=None, usecols=[9, 10, 14, 43, 44])

candidatosal2014[10] = candidatosal2014[10].str.normalize('NFKD').str.encode('ascii', errors='ignore').str.decode('utf-8')
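
To make the chain above easier to follow, here is a small sketch on made-up values (the Series `nomes` and its contents are illustrative, not taken from the real file). NFKD normalization splits each accented letter into a base letter plus a combining mark, and encoding to ASCII with errors='ignore' then drops the marks.

```python
import pandas as pd

# Hypothetical sample values, not rows from the real CSV
nomes = pd.Series(["JOSÉ DA CONCEIÇÃO", "ANTÔNIO MÉDIA"])

sem_acento = (
    nomes.str.normalize("NFKD")            # "É" becomes "E" + combining accent
    .str.encode("ascii", errors="ignore")  # combining marks are not ASCII, so they are dropped
    .str.decode("utf-8")                   # bytes back to str
)
print(sem_acento.tolist())
```

Note that any character with no ASCII base letter is silently removed by this approach, whereas unidecode tries to transliterate it.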



# Column 43 codes: 1 = elected, 2 = elected by parliamentary quotient, 3 = elected by average
display(candidatosal2014.loc[candidatosal2014[43].isin([1, 2, 3])])
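
As for the warning in the question itself: it appears because eleitos_d_2014 is the result of filtering another DataFrame, so pandas cannot tell whether the assignment is meant to reach the original data. A minimal sketch of one common way around it, assuming made-up rows that only mimic the question's column names: build the subset with a single .loc mask, take an explicit .copy(), and then assign through .loc.

```python
import pandas as pd

# Made-up rows that only mimic the question's column names
candidatos = pd.DataFrame({
    "nome_completo": ["JOSÉ SILVA", "MARIA CONCEIÇÃO", "JOÃO SOUZA"],
    "cargo": ["DEPUTADO FEDERAL", "DEPUTADO FEDERAL", "SENADOR"],
    "situacao": ["ELEITO POR QP", "NÃO ELEITO", "ELEITO"],
})

eleito = candidatos["situacao"].isin(["ELEITO POR QP", "ELEITO POR MÉDIA", "ELEITO"])
deputado = candidatos["cargo"] == "DEPUTADO FEDERAL"

# One boolean mask through .loc, then an explicit copy: the result is an
# independent DataFrame, so assigning to it does not raise the warning
eleitos_d = candidatos.loc[eleito & deputado].copy()

eleitos_d.loc[:, "nome_completo"] = (
    eleitos_d["nome_completo"]
    .str.normalize("NFKD")
    .str.encode("ascii", errors="ignore")
    .str.decode("utf-8")
)
print(eleitos_d)
```

The original frame is left untouched here. If unidecode is preferred, the assigned value could just as well be eleitos_d["nome_completo"].apply(unidecode.unidecode).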
    
13.03.2018 / 07:25