Mounting table from a csv, grouped per week, with python and pandas

2

I'm using pandas and I open the following table using the code

tst = pd.read_csv('Iteracao.csv',delimiter=",")

I'mtryingtogroupasfollows,whereweek1istheweekofthedate2017-02-10/2017-02-16,followingthelogicfortheotherweek,theindexdoesnothavetobeweek,itmaybethedatebutI'mtryingtogroupitlikethis.

My attempt was to use the following logic, I saved in a column

semana = df['Semana'].dt.to_period('W-THU')
semana = np.unique(semana)

I only get once each value afterwards I would try to go through a for iterrows to check if the week (i) equals the row row Week row ['Week'] dps try to save in another table.

for index, row in tst.iterrows():
    for i in range(s):      
        n = semana[i]
        if row['Semana'] == n :
            print('1')
            tp['Nome']=row['Nome completo']
            tp['count']=row['count']
            print(tp)

The answer need not be code but just logical for me to follow. Thanks

    
asked by anonymous 20.08.2017 / 04:16

1 answer

1

TL; DR
I noticed that you have applied this response and exported the resulting table to a csv and now you want to mount the table end from reading it, right? As you have left images and not data in this question, I will use the data from that answer , (exported to csv test1.csv ) to present the solution. I swear I tried to answer your request not to put the code but an explanation, but it came in the end even I could not understand and I was implement to see and I ended up doing everything different :-). So I'll reply with code.

Reading the file:

import io
import pandas as pd
from collections import OrderedDict

# Lendo o arquivo para uma lista de linhas
f = open('test1.csv', mode='r')
lines = f.readlines()

# Exculuindo o cabeçalho
del lines[0]

Initial result:

lines
['2017-03-03/2017-03-09,Filho,8\n',
 '2017-03-03/2017-03-09,Gabriel,2\n',
 '2017-03-03/2017-03-09,Jao,10\n',
 '2017-03-03/2017-03-09,Otavio,6\n',
 '2017-03-03/2017-03-09,Talita,9\n',
 '2017-03-10/2017-03-16,Guilherme,1\n',
 '2017-03-10/2017-03-16,Talita,7\n']

Creating dictionary and auxiliary variables

semanas, nomes = [], []
d1 = OrderedDict()
for l in lines:
    line = l.rstrip().split(',')
    if line[0] not in semanas:
        semanas.append(line[0])
        d1[line[0]] = {}
    d1[line[0]][line[1]] = line[2]

nomes=[]
[nomes.append(k) for v in d1.values() for k in v.keys() if k not in nomes] 

Tabbing data to feed the dataframe

data = [] 
for nome in nomes:
    n = []
    for semana in semanas:
        if nome in d1[semana]:
            n.append(d1[semana][nome])
        else:
            n.append(0)
    data.append(n) 

Resolved intermediate (dictionary, names, weeks, and data):

d1
{'2017-03-03/2017-03-09': {'Filho': '8',
  'Gabriel': '2',
  'Jao': '10',
  'Otavio': '6',
  'Talita': '9'},
 '2017-03-10/2017-03-16': {'Guilherme': '1', 'Talita': '7'}}

nomes
['Filho', 'Gabriel', 'Jao', 'Otavio', 'Talita', 'Guilherme']

semanas
['2017-03-03/2017-03-09', '2017-03-10/2017-03-16']

data
[['8', 0], ['2', 0], ['10', 0], ['6', 0], ['9', '7'], [0, '1']]

Constructing and formatting the final table.

columns = []
for i in range(len(semanas)):
    columns.append('Week'+str(i))    
df = pd.DataFrame(data, index=nomes, columns=columns)

End result (image):

If you want, you can name the index, I usually do not because it creates a blank line for the naming only, instead of setting the white space of the index column, I still can not do that.

View the code in a Jupyter Notebook, here.

    
20.08.2017 / 19:15