Compare fields in two data sets

Question

Consider two datasets read from CSV files with pandas. Each dataset has a single field, CPF Favorecido, containing millions of records, and each dataset covers one month. I need to find out which records (CPF numbers) are in one dataset but not in the other.

The code looks like this:

import pandas
# Read only the CPF column from each month's tab-separated file.
atual = pandas.read_csv(arquivo_atual, header=0, delimiter='\t', quotechar='"', usecols=['CPF Favorecido'])
seguinte = pandas.read_csv(arquivo_seguinte, header=0, delimiter='\t', quotechar='"', usecols=['CPF Favorecido'])

I only need the count of CPFs that appear in the atual file but not in the seguinte file, and vice versa.

Is there a function that counts these records? Or do I need to build a loop and do the comparison one by one?

python csv pandas

asked by anonymous 24.05.2016 / 19:11

1 answer

score 1 · Accepted Answer

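The way I know to do this with pandas is shown below. isin marks the rows whose CPF also occurs in the other dataset, ~ inverts that mask, where blanks out the rows that fail the condition, and count counts the non-null rows that remain: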

atual.where(~atual['CPF Favorecido'].isin(seguinte['CPF Favorecido'])).count()
seguinte.where(~seguinte['CPF Favorecido'].isin(atual['CPF Favorecido'])).count()
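
One thing to watch: the expressions above count rows, so a CPF that occurs several times within one month is counted several times. If the files can contain repeated CPFs (an assumption, since the question does not say), a count of distinct CPFs may be what is wanted. The sketch below shows both counts on tiny made-up frames standing in for the real files. The names COL, rows_only_in_atual, only_in_atual and only_in_seguinte are hypothetical and do not come from the question.

```python
import pandas as pd

COL = "CPF Favorecido"

# Tiny stand-in frames; in the question these come from pandas.read_csv.
atual = pd.DataFrame({COL: ["111", "222", "333", "333"]})
seguinte = pd.DataFrame({COL: ["222", "444"]})

# Row-based count: summing the boolean mask counts every row whose CPF
# is absent from the other month, duplicates included.
rows_only_in_atual = int((~atual[COL].isin(seguinte[COL])).sum())

# Distinct count: reduce each month to its unique CPFs first.
cpfs_atual = atual[COL].drop_duplicates()
cpfs_seguinte = seguinte[COL].drop_duplicates()
only_in_atual = int((~cpfs_atual.isin(cpfs_seguinte)).sum())
only_in_seguinte = int((~cpfs_seguinte.isin(cpfs_atual)).sum())

print(rows_only_in_atual, only_in_atual, only_in_seguinte)  # 3 2 1
```

Summing the mask returns a plain integer, whereas where(...).count() returns a Series with one entry per column. Either form avoids an explicit loop over the records.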