In Python 3 and Pandas I have two dataframes
eleitos_d_doadores_d.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 16090 entries, 16 to 26412
Data columns (total 23 columns):
uf_x 16090 non-null object
partido_eleicao_x 16090 non-null object
cargo_x 16090 non-null object
nome_completo_x 16090 non-null object
cpf 16090 non-null object
cpf_cnpj_doador 16090 non-null object
nome_doador 16090 non-null object
valor 16090 non-null object
tipo_receita 16090 non-null object
fonte_recurso 16090 non-null object
especie_recurso 16090 non-null object
cpf_cnpj_doador_originario 16090 non-null object
nome_doador_originario 16090 non-null object
tipo_doador_originario 16090 non-null object
Unnamed: 0 16090 non-null int64
uf_y 16090 non-null object
cargo_y 16090 non-null object
nome_completo_y 16090 non-null object
nome_urna 16090 non-null object
partido_eleicao_y 16090 non-null object
situacao 16090 non-null object
dtypes: int64(1), object(22)
memory usage: 2.9+ MB
segura.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 35581 entries, 0 to 35580
Data columns (total 3 columns):
cnpj 35581 non-null object
nome_empresa 35581 non-null object
estado 35581 non-null object
dtypes: object(3)
memory usage: 834.0+ KB
I need a merge of this type, to eliminate duplicates
segura_doadores_diretos = pd.merge(eleitos_d_doadores_d, segura.drop_duplicates('cnpj'), left_on='cpf_cnpj_doador', right_on='cnpj')
The columns "cnpj" and "cpf_cnpj_doador" are codes. But I need to compare only the first eight digits of each column. Please, is it possible to do this? Without having to create new columns with the excerpts