___ erkimt ___ Stacked data. How to work with this in pandas? [closed] ______ qstntxt ___
I have a table that is structured with "stacked" data, that is, all the information of a customer occupy some first lines. After completing the information of this client, the next client occupies the next lines, and so on. I'm seeing how I can work this on pandas. In the header of each block of data of a given customer, there is some identification information, including its ID, which is denominated as Matricula1, Matricula2, Matricula3 ... MatriculaN. One idea I had was to create a column, copy the Matricula data to it and repeat the enrollment field until the next enrollment. For example, in the case of the image below, repeat Enrollment1 to line B25. On line B26, the enrollment changes to become a Matricula2, and then repeat this value until the enrollment of another customer. How can I do this? Thankful.
_________azszpr349920
Use Groupby
df = pd.DataFrame({'Time': ['Alpha', 'Alpha', 'Beta', 'Beta', 'Gama', 'Delta',
'Gama', 'Gama', 'Alpha', 'Delta', 'Delta', 'Alpha'],
'Rank': [2, 1, 3, 2, 3, 1, 4, 1, 2, 4, 1, 2],
'Ano': [2014,2015,2014,2015,2014,2015,2016,2017,2016,2014,2015,2017],
'Pontos':[976,689,963,773,845,712,866,999,684,721,794,700]})
print(df)
Time Rank Ano Pontos
0 Alpha 2 2014 976
1 Alpha 1 2015 689
2 Beta 3 2014 963
3 Beta 2 2015 773
4 Gama 3 2014 845
5 Delta 1 2015 712
6 Gama 4 2016 866
7 Gama 1 2017 999
8 Alpha 2 2016 684
9 Delta 4 2014 721
10 Delta 1 2015 794
11 Alpha 2 2017 700
Grouping by the desired column (accept multiple):
dfg = df.groupby('Time')
Iterating over groups:
for name, group in dfg:
print(name, group, sep='\n')
Alpha
Time Rank Ano Pontos
0 Alpha 2 2014 976
1 Alpha 1 2015 689
8 Alpha 2 2016 684
11 Alpha 2 2017 700
Beta
Time Rank Ano Pontos
2 Beta 3 2014 963
3 Beta 2 2015 773
Delta
Time Rank Ano Pontos
5 Delta 1 2015 712
9 Delta 4 2014 721
10 Delta 1 2015 794
Gama
Time Rank Ano Pontos
4 Gama 3 2014 845
6 Gama 4 2016 866
7 Gama 1 2017 999
Selecting a group:
print (dfg.get_group('Alpha'))
Time Rank Ano Pontos
0 Alpha 2 2014 976
1 Alpha 1 2015 689
8 Alpha 2 2016 684
11 Alpha 2 2017 700
Aggregations:
print('Media dos pontos de cada time',dfg.Pontos.agg(np.mean), sep='\n')
Media dos pontos de cada time
Time
Alpha 762.250000
Beta 868.000000
Delta 742.333333
Gama 903.333333
Name: Pontos, dtype: float64
print('Somatória dos pontos de cada time',dfg.Pontos.agg(np.sum), sep='\n')
Somatória dos pontos de cada time
Time
Alpha 3049
Beta 1736
Delta 2227
Gama 2710
Name: Pontos, dtype: int64
Filtering:
print('Times que estão presentes 4+ vezes no conjunto de dados:',\
dfg.filter(lambda n: len(n) >= 4), sep='\n')
Times que estão presentes 4+ vezes no conjunto de dados:
Time Rank Ano Pontos
0 Alpha 2 2014 976
1 Alpha 1 2015 689
8 Alpha 2 2016 684
11 Alpha 2 2017 700
The imagination is the limit to what u can do with pd.groupby
%: -)
See the repl.it