Grouping table

3

I have this table and would like to set up a list or another table

     users_ID_User                                           Material  Rating
0            A            "Hey Pogo!...from citation to fulltext"       5
1            B            "Hey Pogo!...from citation to fulltext"       4
2            C            "Hey Pogo!...from citation to fulltext"       5
3            D            "Hey Pogo!...from citation to fulltext"       4
4            D              "Julius Caesar" Speech Analysis StAIR       4
5            A              "Julius Caesar" Speech Analysis StAIR       4
6            B              "Julius Caesar" Speech Analysis StAIR       4
7            E              "Julius Caesar" Speech Analysis StAIR       4
8            E  "Virgin Birth" method promises ethical stem cells       5
9             A  "Virgin Birth" method promises ethical stem cells       3
10             D                   .NET Overview, VB.NET and C#.NET       5
11            C                   .NET Overview, VB.NET and C#.NET       4
12            D  1 Project Management and Software Engineering ...       4
13             A  1 Project Management and Software Engineering ...       4
14             S  1 Project Management and Software Engineering ...       4
15             B  1 Project Management and Software Engineering ...       4
16            A  1 Project Management and Software Engineering ...       4
17           C                       10 Best Foods 10 Worst Foods       1
18             D                       10 Best Foods 10 Worst Foods       2
19           E                       10 Best Foods 10 Worst Foods       4
20           E                           100 preguntas de química       0
21           F                           100 preguntas de química       0
22           G                           100 preguntas de química       0
23           A                           100 preguntas de química       2
24           B                           100 preguntas de química       0
25           C                           100 preguntas de química       4
26           E                           100 preguntas de química       5
27           A                           100 preguntas de química       4
28           F                           1000 Quick Writing Ideas       2
29           A                           1000 Quick Writing Ideas       0

I'm using pandas, I've tried:

df_3 = pd.read_csv("tabela_final_1.csv",delimiter=",")
del df_3['Unnamed: 0']
df_3 = df_3.drop_duplicates(['users_ID_User','Material'])

df_3 = df_3.pivot('users_ID_User','Material','Rating')

It did not work as I would like, I'm trying to mount.

Usuario   Hey Pogo !...       Julius Caesar ...  Virgin Birth ... 
A                     5           4
B                     4           4
C                     6           NaN
D                     4           NaN
E                     NaN          5                    5

and at the end order by the line that has the highest rating ...

In the case shown would be first A, B, E ......

I would like a logic to mount this table, I'm kind of confused!

    
asked by anonymous 28.08.2018 / 02:44

2 answers

3

A simple strategy is to create a rating count column greater than 0, sort the table using that new column as a criterion, and then dispense it.

For this, the code below resolves

import pandas as pd

# do exemplo do autor:
# df = pd.read_csv("tabela_final_1.csv")
# del df['Unnamed: 0']
# df = df.drop_duplicates(['users_ID_User','Material'])
# df = df.pivot('users_ID_User','Material','Rating')

# Criando a coluna de quantidade de avaliações:
# - 'df > 0' resulta em Trues para valores > 0
# - '.sum(axis=1)' soma os Trues de cada linha
df["num_ratings"] = (df > 0).sum(axis=1)

# Reordenando a tabela.
df = df.sort_values("num_ratings", ascending=False)

# Excluindo a coluna auxiliar das contagens.
df = df.drop("num_ratings", axis=1)
    
28.08.2018 / 05:54
2

One solution I found was to get the unique values of the columns 'users_ID_User' and 'Material', generate a new data through a dictionary and insert that data with an append into another dataframe. (In my example I used your dataframe until index 9)

Take the unique values:

lista_valores_unicos_ID = list(set(df['users_ID_User'].values))
>>> print(lista_valores_unicos_ID)
['B', 'A', 'C', 'E', 'D']

lista_valores_unicos_material = list(set(df['Material'].values))
>>> print(lista_valores_unicos_material)
['"Julius Caesar" Speech Analysis StAIR', '"Virgin Birth" method promises ethical stem cells', 'Hey Pogo!...from citation to fulltext']

Now I create an empty dataframe with the desired output template:

data_resposta = {'users_ID_User':[], 'Rating':[]}
for elemento in lista_valores_unicos_material:
    data_resposta.update({elemento:[]})

df_resultado = pd.DataFrame(data=data_resposta)

To make it easier, I created a dataframe with the already calculated Ratings, so I can just get these values:

df_groupby = df.groupby('users_ID_User').agg({'Rating':'sum',})
>>> print(df_groupby)
               Rating
users_ID_User        
A                  12
B                   8
C                   5
D                   8
E                   9

#Pego eles assim
>>> print(df_groupby.loc['A'].values[0])
12

I can get the occurrence of each sentence in the 'Material' column of a specific user like this:

df_unico_A = df.loc[df['users_ID_User'] == 'A']['Material'].value_counts().to_frame()
>>> print(df_unico_A)
                                                   Material
"Julius Caesar" Speech Analysis StAIR                     1
"Virgin Birth" method promises ethical stem cells         1
Hey Pogo!...from citation to fulltext                     1

#Iterando por linha
>>> for index, row in df_unico_A.iterrows():
        print(index, ':', row['Material'])

"Julius Caesar" Speech Analysis StAIR : 1
"Virgin Birth" method promises ethical stem cells : 1
Hey Pogo!...from citation to fulltext : 1

Now iterate for each user of the lista_valores_unicos_ID list by picking up the occurrences with the above logic, placing it in a dictionary, and inserting the data in df_resultado with an append:

for id in lista_valores_unicos_ID:

    nova_data = {'users_ID_User':id, 'Rating':df_groupby.loc[id].values[0]}
    for elemento in lista_valores_unicos_material:
        #Cada frase é inicializada com uma contagem 0
        nova_data.update({elemento:0})

    df_unicos = df.loc[df['users_ID_User'] == id]['Material'].value_counts().to_frame()

    for index, row in df_unicos.iterrows():
        nova_data[index] = row['Material']

    #for i in (0,len(df_unicos)):
    #    nova_data[df_unicos.index[i]] = df_unicos.values[i]

    df_resultado = df_resultado.append(nova_data, ignore_index=True)

Now the df_resultado is the expected output, but there is no order:

df_resultado = df_resultado.sort_values('Rating', ascending=False )

Now its output should be something like: (I cut the sentences manually and put '...', but did not change the values obtained)

>>> print(df_resultado )
   "Julius...   "Virgin...   Hey Pogo!...   Rating   users_ID_User  
1         1.0          1.0            1.0     12.0               A  
3         1.0          1.0            0.0      9.0               E  
0         1.0          0.0            1.0      8.0               B  
4         1.0          0.0            1.0      8.0               D  
2         0.0          0.0            1.0      5.0               C  
    
28.08.2018 / 04:18