Stacked data. How to work with this in pandas? [closed]

1

I have a table that is structured with "stacked" data, that is, all the information of a customer occupy some first lines. After completing the information of this client, the next client occupies the next lines, and so on. I'm seeing how I can work this on pandas. In the header of each block of data of a given customer, there is some identification information, including its ID, which is denominated as Matricula1, Matricula2, Matricula3 ... MatriculaN. One idea I had was to create a column, copy the Matricula data to it and repeat the enrollment field until the next enrollment. For example, in the case of the image below, repeat Enrollment1 to line B25. On line B26, the enrollment changes to become a Matricula2, and then repeat this value until the enrollment of another customer. How can I do this? Thankful.

    
asked by anonymous 12.12.2018 / 21:46

1 answer

0
___ erkimt ___ Stacked data. How to work with this in pandas? [closed] ______ qstntxt ___

I have a table that is structured with "stacked" data, that is, all the information of a customer occupy some first lines. After completing the information of this client, the next client occupies the next lines, and so on. I'm seeing how I can work this on pandas. In the header of each block of data of a given customer, there is some identification information, including its ID, which is denominated as Matricula1, Matricula2, Matricula3 ... MatriculaN. One idea I had was to create a column, copy the Matricula data to it and repeat the enrollment field until the next enrollment. For example, in the case of the image below, repeat Enrollment1 to line B25. On line B26, the enrollment changes to become a Matricula2, and then repeat this value until the enrollment of another customer. How can I do this? Thankful.

    
_________azszpr349920

Use Groupby

df = pd.DataFrame({'Time': ['Alpha', 'Alpha', 'Beta', 'Beta', 'Gama', 'Delta', 
                            'Gama', 'Gama', 'Alpha', 'Delta', 'Delta', 'Alpha'],
        'Rank': [2, 1, 3, 2, 3, 1, 4, 1, 2, 4, 1, 2],
        'Ano': [2014,2015,2014,2015,2014,2015,2016,2017,2016,2014,2015,2017],
        'Pontos':[976,689,963,773,845,712,866,999,684,721,794,700]})

print(df)
     Time  Rank   Ano  Pontos
0   Alpha     2  2014     976
1   Alpha     1  2015     689
2    Beta     3  2014     963
3    Beta     2  2015     773
4    Gama     3  2014     845
5   Delta     1  2015     712
6    Gama     4  2016     866
7    Gama     1  2017     999
8   Alpha     2  2016     684
9   Delta     4  2014     721
10  Delta     1  2015     794
11  Alpha     2  2017     700

Grouping by the desired column (accept multiple):

dfg = df.groupby('Time')

Iterating over groups:

for name, group in dfg:
  print(name, group, sep='\n')
Alpha
     Time  Rank   Ano  Pontos
0   Alpha     2  2014     976
1   Alpha     1  2015     689
8   Alpha     2  2016     684
11  Alpha     2  2017     700
Beta
   Time  Rank   Ano  Pontos
2  Beta     3  2014     963
3  Beta     2  2015     773
Delta
     Time  Rank   Ano  Pontos
5   Delta     1  2015     712
9   Delta     4  2014     721
10  Delta     1  2015     794
Gama
   Time  Rank   Ano  Pontos
4  Gama     3  2014     845
6  Gama     4  2016     866
7  Gama     1  2017     999

Selecting a group:

print (dfg.get_group('Alpha'))
     Time  Rank   Ano  Pontos
0   Alpha     2  2014     976
1   Alpha     1  2015     689
8   Alpha     2  2016     684
11  Alpha     2  2017     700

Aggregations:

print('Media dos pontos de cada time',dfg.Pontos.agg(np.mean), sep='\n')
Media dos pontos de cada time
Time
Alpha    762.250000
Beta     868.000000
Delta    742.333333
Gama     903.333333
Name: Pontos, dtype: float64

print('Somatória dos pontos de cada time',dfg.Pontos.agg(np.sum), sep='\n')
Somatória dos pontos de cada time
Time
Alpha    3049
Beta     1736
Delta    2227
Gama     2710
Name: Pontos, dtype: int64

Filtering:

print('Times que estão presentes 4+ vezes no conjunto de dados:',\  
       dfg.filter(lambda n: len(n) >= 4),  sep='\n')
Times que estão presentes 4+ vezes no conjunto de dados:
     Time  Rank   Ano  Pontos
0   Alpha     2  2014     976
1   Alpha     1  2015     689
8   Alpha     2  2016     684
11  Alpha     2  2017     700

The imagination is the limit to what u can do with pd.groupby %: -)

See the repl.it

    
___
13.12.2018 / 17:19