Python / Pandas - How to create a data frame that contains the original line and the duplicate line

Question

I have a data frame that contains two rows with Country = India. I was able to create one data frame without duplicates (with only one India row) and another containing only the duplicate row. Now I need a data frame that contains both rows where Country = India. How can I do this?

import pandas as pd
import numpy as np
data = {
    'País': ['Bélgica', 'Índia', 'Brasil', 'Índia'],
    'Capital': ['Bruxelas', 'Nova Delhi', 'Brasília', 'Nova Delhi'],
    'População': [123465, 456789, 987654, 456789]
}
df = pd.DataFrame(data)
# build a DF excluding the duplicate rows
drop_df = df.drop_duplicates()
# build a data frame containing only the duplicates
dfdrop = df[df.duplicated()]

How can I generate a DF containing only the two rows where the country is India?

python pandas

asked by anonymous 29.06.2017 / 16:26

2 answers

score 1 · Answer 1

(TL;DR) Building the dataframe from the data:

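import pandas as pd
import numpy as np
from collections import OrderedDict
# A list of (key, value) pairs keeps the column order even on Python < 3.7,
# where a dict literal passed to OrderedDict could lose it
data = OrderedDict([
    ('País', ['Bélgica', 'Índia', 'Brasil', 'Índia']),
    ('Capital', ['Bruxelas', 'Nova Delhi', 'Brasília', 'Nova Delhi']),
    ('População', [123465, 456789, 987654, 456789]),
])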

df = pd.DataFrame(data)

Displaying the original dataframe:

df

Output:

Removing duplicates:

df_clean = df.drop_duplicates()
df_clean

Output:

Selecting duplicates:

paises = df.País
df_duplicates = df[paises.isin(paises[paises.duplicated()])]
df_duplicates

Output:

View the code running in a Jupyter notebook.
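For reference, the steps in this answer can be combined into one self-contained script. This is a minimal sketch assuming Python 3.7+ (so a plain dict keeps the column order and `OrderedDict` is not required); the intermediate name `repeated` is only for illustration.

```python
import pandas as pd

data = {
    'País': ['Bélgica', 'Índia', 'Brasil', 'Índia'],
    'Capital': ['Bruxelas', 'Nova Delhi', 'Brasília', 'Nova Delhi'],
    'População': [123465, 456789, 987654, 456789],
}
df = pd.DataFrame(data)

# Drop fully duplicated rows: the second 'Índia' row (index 3) is removed
df_clean = df.drop_duplicates()

# Countries that appear again after their first occurrence (here only 'Índia')
paises = df['País']
repeated = paises[paises.duplicated()]

# Keep every row whose country is in that set, including the first occurrence
df_duplicates = df[paises.isin(repeated)]

print(df_clean.index.tolist())       # [0, 1, 2]
print(df_duplicates.index.tolist())  # [1, 3]
```

The `isin` step is what brings back the first occurrence: `duplicated()` alone flags only the later copy, so the answer uses it to find *which* countries repeat and then selects all rows for those countries.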

score 0 · Answer 2

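Looking at the documentation for the duplicated method, it has a keep parameter:

- keep='first' (default): marks every occurrence except the first as a duplicate (here, only the second India row)
- keep='last': marks every occurrence except the last as a duplicate (here, only the first India row)
- keep=False: marks all occurrences as duplicates (both India rows)

So to create the dataframe with both duplicate rows, just run:

dfdrop = df[df.duplicated('País', keep=False)]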
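A minimal sketch comparing the three `keep` options on the question's data, assuming the column is named 'País' as in the question's dictionary; the mask names `first`, `last` and `both` are illustrative only. When the country is known in advance, a plain boolean filter on the column returns the same two rows.

```python
import pandas as pd

df = pd.DataFrame({
    'País': ['Bélgica', 'Índia', 'Brasil', 'Índia'],
    'Capital': ['Bruxelas', 'Nova Delhi', 'Brasília', 'Nova Delhi'],
    'População': [123465, 456789, 987654, 456789],
})

# How each keep option marks the two 'Índia' rows (indices 1 and 3)
first = df.duplicated('País', keep='first')
last = df.duplicated('País', keep='last')
both = df.duplicated('País', keep=False)

print(first.tolist())  # [False, False, False, True]
print(last.tolist())   # [False, True, False, False]
print(both.tolist())   # [False, True, False, True]

# keep=False selects both 'Índia' rows
df_india = df[both]

# Alternative when the value is known: filter the column directly
df_by_value = df[df['País'] == 'Índia']

print(df_india.index.tolist())  # [1, 3]
```

The `keep=False` mask generalizes to any repeated country, while the direct filter only works when you already know which value to look for.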