Grouping of groupby pandas table

3

I have a table in this format

User   Material   Nota
1       youtube    5
2       facebook   4
3       facebook   3
4       facebook   5
6       youtube    2
7       orkut      3
8       twitter    4

and would like to group so that in the final table shows only sites rated by more than 1 different user

Material   User    Nota
youtube     1      5
            6      2
facebook    2      4 
            3      3
            4      5

My code and my logic

import pandas as pd

df = pd.read_csv("tabela1.csv",delimiter=",")


tabela2= pd.DataFrame({'count' : df.groupby(["Material","User","Nota"]).size()})


del tabela2['count']

tabela2.to_csv('tabela_2.csv')
tabela2 = pd.read_csv("tabela_2.csv",delimiter=",")

But so it lists those that have been evaluated by 1 user, I wonder if there is something in grupby that lists only the different ones to group?

What I thought to do both is 1 fixed in the Material column and in this case counting how many times each material appears after deleting the materials that appear less than twice, to treat materials with less than 1 evaluation, however I believe that for a very large base this will be very costly in time

    
asked by anonymous 17.07.2018 / 17:10

1 answer

1

You can use value_counts to find out which of the values in the Material column have more than one entry, and then filter your DataFrame by these values:

import pandas as pd
import io

dados = io.StringIO("""
User,Material,Nota
1,youtube,5
2,facebook,4
3,facebook,3
4,facebook,5
6,youtube,2
7,orkut,3
8,twitter,4""")

df = pd.read_csv(dados, index_col='User')

# Criamos uma série com a contagem de 
# ocorrências pra cada valor da coluna Material
counts = df['Material'].value_counts()
print(counts)  # facebook 3, youtube 2, orkut 1...

# Filtramos para que a série só inclua os 
# materiais com mais de uma contagem
counts = counts[counts > 1]
print(counts.index)  # Index(['facebook', 'youtube'], dtype='object')

# Filtramos o dataframe original para que 
# contenha somente entradas que estejam na série
df = df[df['Material'].isin(counts.index)]
print(df)

# Se quisermos, podemos ordená-los também, como
# no exemplo dado na questão
df = df.sort_values('Material')
print(df)

Final result:

User  Material     Nota
2     facebook     4
3     facebook     3
4     facebook     5
1      youtube     5
6      youtube     2
    
18.07.2018 / 04:46