Python / Pandas - How to create a data frame that contains the original line and the duplicate line


I have a DataFrame with two rows where Country (País) = India. I was able to create one DataFrame without the duplicates, which keeps only one India row, and another that contains only the duplicate row. What I need is a DataFrame that contains both rows with Country = India. How can I do this?

import pandas as pd
import numpy as np
data = {
    'País': ['Bélgica', 'Índia', 'Brasil', 'Índia'],
    'Capital': ['Bruxelas', 'Nova Delhi', 'Brasília', 'Nova Delhi'],
    'População': [123465, 456789, 987654, 456789]
}
df = pd.DataFrame(data)
# build a DF without the duplicate rows
drop_df = df.drop_duplicates()
# build a DF with only the duplicate rows
dfdrop = df[df.duplicated()]

How can I generate a DataFrame with only the two rows for India?

    
asked by anonymous 29.06.2017 / 16:26

2 answers


(TL;DR) Building the DataFrame from the data:

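import pandas as pd
import numpy as np
from collections import OrderedDict
# a list of (column, values) pairs keeps the column order even on Python
# versions before 3.7, where plain dict order is not guaranteed
data = OrderedDict([
    ('País', ['Bélgica', 'Índia', 'Brasil', 'Índia']),
    ('Capital', ['Bruxelas', 'Nova Delhi', 'Brasília', 'Nova Delhi']),
    ('População', [123465, 456789, 987654, 456789]),
])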

df = pd.DataFrame(data)

Displaying the original DataFrame:

df

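Output:

      País     Capital  População
0  Bélgica    Bruxelas     123465
1    Índia  Nova Delhi     456789
2   Brasil    Brasília     987654
3    Índia  Nova Delhi     456789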

Removing duplicates:

df_clean = df.drop_duplicates()
df_clean

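Output:

      País     Capital  População
0  Bélgica    Bruxelas     123465
1    Índia  Nova Delhi     456789
2   Brasil    Brasília     987654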
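By default, drop_duplicates compares entire rows, so a row is dropped only when it matches an earlier row in every column; that is why just the second India row disappears in this data. A minimal sketch on the same sample data contrasting that default with the subset and keep parameters (the name df_last_per_country is purely illustrative):

```python
import pandas as pd

data = {
    'País': ['Bélgica', 'Índia', 'Brasil', 'Índia'],
    'Capital': ['Bruxelas', 'Nova Delhi', 'Brasília', 'Nova Delhi'],
    'População': [123465, 456789, 987654, 456789],
}
df = pd.DataFrame(data)

# Default: compare all columns and keep the first occurrence of each row
df_clean = df.drop_duplicates()

# Hypothetical variant: compare only 'País' and keep the last occurrence
df_last_per_country = df.drop_duplicates(subset='País', keep='last')

print(df_clean.index.tolist())             # [0, 1, 2]
print(df_last_per_country.index.tolist())  # [0, 2, 3]
```

The surviving rows keep their original index labels and order; only the dropped rows are removed.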

Selecting the duplicates:

paises = df.País
# keep every row whose country appears among the repeated names
df_duplicates = df[paises.isin(paises[paises.duplicated()])]
df_duplicates

Output:

    País     Capital  População
1  Índia  Nova Delhi     456789
3  Índia  Nova Delhi     456789

See the code running in a Jupyter notebook.
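The duplicate selection in this answer works in two steps: paises[paises.duplicated()] yields the country names that occur again, and isin then keeps every row whose country is in that set, including the first occurrence. A minimal sketch on the same sample data comparing it with two other standard pandas ways to get the same rows, Series.duplicated(keep=False) and a plain equality mask when the country is known in advance (the variable names are illustrative only):

```python
import pandas as pd

data = {
    'País': ['Bélgica', 'Índia', 'Brasil', 'Índia'],
    'Capital': ['Bruxelas', 'Nova Delhi', 'Brasília', 'Nova Delhi'],
    'População': [123465, 456789, 987654, 456789],
}
df = pd.DataFrame(data)
paises = df['País']

# Step 1: names flagged as repeats (by default only the later 'Índia' row)
repeated_names = paises[paises.duplicated()]
# Step 2: keep every row whose country is among the repeated names
via_isin = df[paises.isin(repeated_names)]

# One call: keep=False flags every occurrence of a duplicated value
via_keep_false = df[paises.duplicated(keep=False)]

# If the country is known beforehand, an equality mask is enough
via_mask = df[paises == 'Índia']

print(via_isin.index.tolist())  # [1, 3]
print(via_isin.equals(via_keep_false), via_isin.equals(via_mask))  # True True
```

The isin and keep=False versions work for any repeated country without naming it; the equality mask only fits when you already know which value you want.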

    
30.06.2017 / 14:26

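Looking up the duplicated method, it has a keep parameter:

- keep='first' (default): marks every duplicate except the first occurrence, so only the second India row is flagged
- keep='last': marks every duplicate except the last occurrence, so only the first India row is flagged
- keep=False: marks all duplicates, so both India rows are flagged

So to create the DataFrame with both duplicate rows, just run:

dfdrop = df[df.duplicated('País', keep=False)]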
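To see exactly which rows each keep value flags, here is a minimal sketch on the question's sample data (the dictionary name flags is just for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    'País': ['Bélgica', 'Índia', 'Brasil', 'Índia'],
    'Capital': ['Bruxelas', 'Nova Delhi', 'Brasília', 'Nova Delhi'],
    'População': [123465, 456789, 987654, 456789],
})

# Row labels flagged as duplicates on the 'País' column for each keep value
flags = {
    keep: df[df.duplicated('País', keep=keep)].index.tolist()
    for keep in ('first', 'last', False)
}
print(flags)  # {'first': [3], 'last': [1], False: [1, 3]}

# Both India rows, as asked in the question
dfdrop = df[df.duplicated('País', keep=False)]
```

Passing 'País' as the subset compares only that column; leaving it out compares whole rows, which gives the same result here because the two India rows are identical.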

    
13.07.2017 / 17:41