In pandas, how do I merge two dataframes on only part of two columns?

Question

In Python 3 with pandas I have two dataframes:

eleitos_d_doadores_d.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 16090 entries, 16 to 26412
Data columns (total 23 columns):
uf_x                           16090 non-null object
partido_eleicao_x              16090 non-null object
cargo_x                        16090 non-null object
nome_completo_x                16090 non-null object
cpf                            16090 non-null object
cpf_cnpj_doador                16090 non-null object
nome_doador                    16090 non-null object
valor                          16090 non-null object
tipo_receita                   16090 non-null object
fonte_recurso                  16090 non-null object
especie_recurso                16090 non-null object
cpf_cnpj_doador_originario     16090 non-null object
nome_doador_originario         16090 non-null object
tipo_doador_originario         16090 non-null object
Unnamed: 0                     16090 non-null int64
uf_y                           16090 non-null object
cargo_y                        16090 non-null object
nome_completo_y                16090 non-null object
nome_urna                      16090 non-null object
partido_eleicao_y              16090 non-null object
situacao                       16090 non-null object
dtypes: int64(1), object(22)
memory usage: 2.9+ MB

segura.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 35581 entries, 0 to 35580
Data columns (total 3 columns):
cnpj            35581 non-null object
nome_empresa    35581 non-null object
estado          35581 non-null object
dtypes: object(3)
memory usage: 834.0+ KB

I need a merge like this one, which drops the duplicate "cnpj" rows first:

segura_doadores_diretos = pd.merge(eleitos_d_doadores_d, segura.drop_duplicates('cnpj'), left_on='cpf_cnpj_doador', right_on='cnpj')

The columns "cnpj" and "cpf_cnpj_doador" hold identification codes, but I need to compare only the first eight digits of each. Is it possible to do this without creating new columns that hold the excerpts?

python pandas

asked by anonymous 21.02.2018 / 18:03

1 answer

score 2 · Accepted Answer

Looking at the pandas documentation, I don't believe this is possible without creating a temporary column, because the merge keys are normally given as column labels.

I think you really do have to create temporary columns for the merge, and then just delete them afterwards.

segura['CNPJ_8_digitos'] = segura.cnpj.str[:8]
eleitos_d_doadores_d['cpf_cnpj_doador_8_digitos'] = eleitos_d_doadores_d.cpf_cnpj_doador.str[:8]

segura_doadores_diretos = pd.merge(eleitos_d_doadores_d, segura.drop_duplicates('cnpj'), left_on='cpf_cnpj_doador_8_digitos', right_on='CNPJ_8_digitos')
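For anyone who wants to try the idea end to end, here is a minimal sketch of the temporary-column approach, including the "then just delete" step. The two small frames and their values are invented stand-ins for the real data, and the helper column name chave8 is hypothetical. It uses assign so the original frames are left untouched, which is a design choice of this sketch rather than something the answer above requires.

```python
import pandas as pd

# Toy stand-ins for the frames in the question (all values are invented).
eleitos_d_doadores_d = pd.DataFrame({
    "nome_doador": ["A", "B", "C"],
    "cpf_cnpj_doador": ["12345678000199", "12345678000270", "99999999000100"],
})
segura = pd.DataFrame({
    "cnpj": ["12345678000100", "12345678000100", "55555555000111"],
    "nome_empresa": ["Seguradora X", "Seguradora X", "Seguradora Y"],
})

# Build the 8-digit key on copies, so neither original frame gains a column.
# "chave8" is a hypothetical helper name used only for this sketch.
left = eleitos_d_doadores_d.assign(
    chave8=eleitos_d_doadores_d["cpf_cnpj_doador"].str[:8]
)
right = segura.drop_duplicates("cnpj").assign(
    chave8=lambda d: d["cnpj"].str[:8]
)

# Inner merge on the helper key, then drop it from the result.
segura_doadores_diretos = pd.merge(left, right, on="chave8").drop(columns="chave8")
print(segura_doadores_diretos)
```

One thing to watch: drop_duplicates('cnpj') only removes rows whose full code repeats. If several distinct cnpj values in segura share the same first eight digits, each matching donor row will be paired with all of them, so deduplicating on the 8-digit key instead may be closer to what you want.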