How to select codes with different sizes in pandas?


In Python 3, with pandas, I have this dataframe with several codes in the columns "CPF_CNPJ_doador" and "CPF_CNPJ_doador_originario":

cand_doacoes = pd.read_csv("doacoes_csv.csv",sep=';',encoding = 'latin_1',  decimal = ",")

cand_doacoes.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 427489 entries, 0 to 427488
Data columns (total 12 columns):
UF                                427489 non-null object
Partido                           427489 non-null object
Cargo                             427489 non-null object
Nome_candidato                    427489 non-null object
CPF_candidato                     427489 non-null int64
CPF_CNPJ_doador                   426681 non-null float64
Nome_doador                       427489 non-null object
Nome_doador_Receita               427489 non-null object
Valor                             427489 non-null float64
CPF_CNPJ_doador_originario        427489 non-null object
Nome_doador_originario            427489 non-null object
Nome_doador_originario_Receita    427489 non-null object
dtypes: float64(2), int64(1), object(9)
memory usage: 39.1+ MB

The codes in the columns "CPF_CNPJ_doador" and "CPF_CNPJ_doador_originario" are always integers and of different sizes: 14 digits, 13 digits, 11 digits or 10 digits.

I need to create a dataframe with only 14- and 13-digit codes. Please, does anyone know how I can select only the 14- and 13-digit codes in the "CPF_CNPJ_doador" column in the dataframe "cand_doacoes"? Do I need to convert to string?

    
asked by anonymous 21.11.2017 / 13:09

1 answer


Hello,

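According to the dataframe information above, the "CPF_CNPJ_doador" column is float64 and has some missing values (426681 non-null out of 427489 entries). You do not need to store the codes as strings, but be careful: calling str() directly on a float gives something like '12345678901234.0', so the trailing ".0" would be counted in the length. Convert each value to an integer first and skip the missing ones.

You can filter your dataframe with a boolean mask, keeping only the rows that interest you. For this, you can apply a function that checks the number of digits of each code in the desired column.

Here I use apply with an anonymous lambda function that does the check for each value.

cand_doacoes[cand_doacoes['CPF_CNPJ_doador'].apply(lambda x: pd.notna(x) and len(str(int(x))) in (13, 14))]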
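With roughly 427,000 rows, a vectorized filter may be faster than apply. Below is a minimal sketch of two ways to do the same selection, assuming the codes are whole numbers stored as floats. The small sample dataframe and its values are made up for illustration and only stand in for cand_doacoes.

```python
import pandas as pd

# Hypothetical sample standing in for cand_doacoes; the values are invented.
cand_doacoes = pd.DataFrame({
    "CPF_CNPJ_doador": [
        12345678000195.0,  # 14 digits
        1234567000195.0,   # 13 digits
        12345678901.0,     # 11 digits
        1234567890.0,      # 10 digits
        float("nan"),      # missing value
    ]
})

col = cand_doacoes["CPF_CNPJ_doador"]

# Option 1: numeric range. A positive integer has 13 or 14 digits
# exactly when 10**12 <= x < 10**14. Comparisons against NaN are
# False, so rows with missing codes are dropped automatically.
by_range = cand_doacoes[(col >= 10**12) & (col < 10**14)]

# Option 2: string length. Drop missing values, cast to int64 so that
# str() does not append ".0", then count the characters.
digits = col.dropna().astype("int64").astype(str).str.len()
by_length = cand_doacoes.loc[digits[digits.isin([13, 14])].index]

print(by_range)
print(by_length)
```

Both options should select the same rows. The numeric range avoids any string conversion, while the string version is closer to the "convert to string" idea from the question.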
    
12.07.2018 / 02:41