Consider two datasets read from CSV files with Pandas. Each dataset has a single field, CPF Favorecido, and contains millions of records. Each dataset corresponds to one month.
I need to find out which records (CPF numbers) are in one dataset but not in the other.
The code looks like this:
import pandas

atual = pandas.read_csv(arquivo_atual, header=0, delimiter='\t', quotechar='"', usecols=['CPF Favorecido'])
seguinte = pandas.read_csv(arquivo_seguinte, header=0, delimiter='\t', quotechar='"', usecols=['CPF Favorecido'])
I only need the count of the CPFs that appear in the atual file but not in the seguinte file, and vice versa.
Is there a function that counts these records? Or do I need to build a loop and do the comparison one by one?
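
To make the goal concrete, here is a minimal sketch of the kind of result I am after, using Python set differences on the column values instead of an explicit loop. The helper name count_exclusive and the small sample frames are hypothetical and only stand in for the real files; it counts distinct CPFs, which is an assumption about what "count" should mean here.

```python
import pandas as pd


def count_exclusive(atual, seguinte, col="CPF Favorecido"):
    """Return (CPFs only in atual, CPFs only in seguinte) as distinct counts.

    Hypothetical helper: the name and the choice to ignore duplicates
    and missing values are assumptions, not part of the original code.
    """
    cpfs_atual = set(atual[col].dropna().unique())
    cpfs_seguinte = set(seguinte[col].dropna().unique())
    # Set difference keeps the values present on the left side only.
    return len(cpfs_atual - cpfs_seguinte), len(cpfs_seguinte - cpfs_atual)


# Tiny made-up frames standing in for the two monthly files.
atual = pd.DataFrame({"CPF Favorecido": ["111", "222", "333", "333"]})
seguinte = pd.DataFrame({"CPF Favorecido": ["222", "333", "444", "555"]})

only_atual, only_seguinte = count_exclusive(atual, seguinte)
print(only_atual, only_seguinte)  # prints "1 2"
```

If repeated CPFs should be counted once per row rather than once per distinct value, something like (~atual['CPF Favorecido'].isin(seguinte['CPF Favorecido'])).sum() might be closer to what is needed.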