Create list with column contents


Hello. I have a pandas DataFrame (pandas.core.frame.DataFrame) in Python 3 with these columns:

Estado            150 non-null object
Cargo             150 non-null object
Nome_candidato    150 non-null object
CPF               150 non-null int64
Nome_urna         150 non-null object
Partido           150 non-null object
Situacao          150 non-null object
Avaliacao         130 non-null object
Projeto           150 non-null object
Link              72 non-null object
Autor_1           150 non-null object
Autor_1_limpo     150 non-null object
Autor_2           6 non-null object
Autor_2_limpo     6 non-null object
Autor_3           1 non-null object
Autor_3_limpo     1 non-null object
Autor_4           1 non-null object
Tipo              150 non-null object
Fonte             150 non-null object

I want to create a list containing only the contents of the Projeto column. I tried this:

projetos_eleitos = []
for i in autor1:
    valor_projeto = i.Projeto 
    projetos_eleitos.append([valor_projeto])

But it raises this error:

AttributeError                            Traceback (most recent call last)
<ipython-input-32-c6845dcc0293> in <module>()
      1 for i in autor1:
----> 2     valor_projeto = i.Projeto
      3     projetos_eleitos.append([valor_projeto])
      4 

AttributeError: 'str' object has no attribute 'Projeto'

Does anyone know what is causing this error?

    
asked by anonymous 06.10.2017 / 16:12

2 answers


If you are using pandas, you do not need a for loop.

If you just want to turn the Projeto column into a list, use df.Projeto.tolist().

For example:

import pandas as pd

df = pd.DataFrame({'Autor': ['João', 'João', 'Maria', 'Maria', 'Joana'],
                   'Projeto': ['P'+str(i+1) for i in range(5)]})
print(df)

#    Autor Projeto
# 0   João      P1
# 1   João      P2
# 2  Maria      P3
# 3  Maria      P4
# 4  Joana      P5

print(df.Projeto.tolist())

['P1', 'P2', 'P3', 'P4', 'P5']
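
As for the AttributeError in the question: iterating over a DataFrame with a plain for loop yields its column labels rather than its rows, so each i is a string. Below is a minimal sketch of this, using a small made-up stand-in for the asker's autor1 (the two columns and their values are assumptions for illustration only):

```python
import pandas as pd

# Hypothetical stand-in for the asker's DataFrame (the real one has 150 rows).
autor1 = pd.DataFrame({'Autor_1': ['João', 'Maria'],
                       'Projeto': ['P1', 'P2']})

# Iterating over a DataFrame yields its column labels, not its rows.
rotulos = [i for i in autor1]
print(rotulos)

# Each label is a plain str, so i.Projeto raises AttributeError.
print(all(isinstance(i, str) for i in rotulos))

# Selecting the column and converting it gives the flat list directly.
projetos_eleitos = autor1['Projeto'].tolist()
print(projetos_eleitos)
```

Bracket access (autor1['Projeto']) behaves like autor1.Projeto here, and it also works for column names that are not valid Python identifiers.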

Now, if you want to group the projects by author, you can generate a list of lists:

df.groupby('Autor').apply(lambda grupo: grupo.Projeto.tolist()).tolist()

This returns [['P5'], ['P1', 'P2'], ['P3', 'P4']]

One drawback of this approach is that you lose track of which author each list of projects belongs to. In that case, one option is to build a dictionary instead:

df.groupby('Autor').apply(lambda grupo: grupo.Projeto.tolist()).to_dict()

This produces {'Joana': ['P5'], 'João': ['P1', 'P2'], 'Maria': ['P3', 'P4']}
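
A possible variation, shown only as a sketch: select the Projeto column on the grouped object first and then apply list to each group. The name projetos_por_autor is made up for this example; the DataFrame is the same toy one used above.

```python
import pandas as pd

df = pd.DataFrame({'Autor': ['João', 'João', 'Maria', 'Maria', 'Joana'],
                   'Projeto': ['P' + str(i + 1) for i in range(5)]})

# Select the column first, then collect each group's values into a list.
projetos_por_autor = df.groupby('Autor')['Projeto'].apply(list).to_dict()
print(projetos_por_autor)
```

This should give the same mapping of author to projects, while handing only the Projeto values of each group to the function instead of the whole sub-DataFrame.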

    
06.10.2017 / 19:50

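Maybe this helps. To loop over the rows, use iterrows() and read the Projeto column from each row:

projetos_eleitos = []

# iterrows() yields (index, row) pairs; each row is a Series
for _, linha in autor1.iterrows():
    projetos_eleitos.append(linha['Projeto'])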
    
06.10.2017 / 18:07