How to extract all td names in order?

I need to extract all the names of people on this site:

Camara.gov.br

I wrote this code in Python 3:

from urllib.request import urlopen
from bs4 import BeautifulSoup
import urllib.request, urllib.parse, urllib.error

emendas = urlopen("http://www.camara.gov.br/proposicoesWeb/prop_emendas?idProposicao=2122076&subst=0")

bsObje =  BeautifulSoup(emendas, "lxml")

tabelas = bsObje.findAll("tbody", {"class":"coresAlternadas"})

deputados = []

for linha in tabelas:
    deputados.append(linha.select('td')[3].text.strip())

print(deputados)
Result -> ['Laura Carneiro', 'André Figueiredo']

It did not work: I only get two names instead of the whole list. Does anyone know how I can get all the names in order?

    
asked by anonymous 19.09.2017 / 13:42

2 answers

What order do you want? Alphabetical or in the order in which they were found?

Below I'll cover both scenarios:

from urllib.request import urlopen

from bs4 import BeautifulSoup as bs

req = urlopen('http://www.camara.gov.br/proposicoesWeb/prop_emendas?idProposicao=2122076&subst=0')
soup = bs(req.read(), 'html.parser')

# Each <tbody class="coresAlternadas"> holds several rows, so walk every <tr>
tables_ele = soup.find_all('tbody', {'class': 'coresAlternadas'})
deputados = []
for table_ele in tables_ele:
    for row in table_ele.find_all('tr'):
        cols = row.find_all('td')
        if len(cols) > 3:  # skip rows that have no fourth cell
            deputados.append(cols[3].text.strip())

print(deputados)  # in the order found in the table
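
To see why the code in the question returned only two names, here is a small offline sketch. The HTML is a made-up stand-in for the real page (I am assuming two tbody blocks with several rows each, and "Deputado B" is a placeholder name), so treat it as an illustration rather than the site's actual markup:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page: two <tbody> blocks, two rows each.
HTML = """
<table>
  <tbody class="coresAlternadas">
    <tr><td>1</td><td>x</td><td>y</td><td> Laura Carneiro </td></tr>
    <tr><td>2</td><td>x</td><td>y</td><td>Deputado B</td></tr>
  </tbody>
  <tbody class="coresAlternadas">
    <tr><td>3</td><td>x</td><td>y</td><td>André Figueiredo</td></tr>
    <tr><td>4</td><td>x</td><td>y</td><td>Laura Carneiro</td></tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")
bodies = soup.find_all("tbody", {"class": "coresAlternadas"})

# Question's approach: index 3 over all <td> of a <tbody>,
# which is the fourth cell of its first row only.
first_rows_only = [body.select("td")[3].text.strip() for body in bodies]

# Row-by-row approach: the fourth <td> of every <tr>, in document order.
all_rows = [
    row.find_all("td")[3].text.strip()
    for body in bodies
    for row in body.find_all("tr")
]

print(first_rows_only)  # ['Laura Carneiro', 'André Figueiredo']
print(all_rows)  # ['Laura Carneiro', 'Deputado B', 'André Figueiredo', 'Laura Carneiro']
```

The first list has one name per tbody, which matches the two-name result in the question; the second has one name per row.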

Then, to sort alphabetically, you can do:

...
deputados = sorted(deputados)

To remove duplicates (there are many) and sort alphabetically, you can convert the list to a set and then sort it:

...
deputados = sorted(set(deputados))
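
Keep in mind that a set does not preserve the order in which the names were found. If you want to drop duplicates but keep the table order, one option is dict.fromkeys, which keeps insertion order on Python 3.7+. A minimal sketch, with a made-up sample list standing in for the scraped one ("Deputado B" is a placeholder):

```python
# Hypothetical sample standing in for the scraped list (contains a duplicate).
deputados = ["Laura Carneiro", "Deputado B", "André Figueiredo", "Laura Carneiro"]

# dict keys are unique and keep first-seen order, so this removes
# duplicates without reordering the remaining names.
unique_in_order = list(dict.fromkeys(deputados))

print(unique_in_order)  # ['Laura Carneiro', 'Deputado B', 'André Figueiredo']
```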
    
19.09.2017 / 14:14

From what I saw in the table, the names themselves are not sorted there.

So you can capture them as usual and, once you have the full list, sort it in Python: either with the sorted() function, which returns a new list, or with the list's .sort() method, which sorts in place and returns None, so its result should not be assigned back to the variable.

  

Using sorted()

>>> deputados = sorted(deputados)
>>> deputados
['André Figueiredo', 'Laura Carneiro']
  

Using .sort()

>>> deputados.sort()
>>> deputados
['André Figueiredo', 'Laura Carneiro']
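
Two details that may matter with a longer list, shown as a rough sketch. The sample names other than 'André Figueiredo' are made up, and the accent-stripping key is just one possible approach (the helper name sort_key is mine): the default sort compares code points, so a name starting with 'Á' ends up after 'Z'.

```python
import unicodedata


def sort_key(name):
    """Hypothetical helper: compare names ignoring accents and case."""
    # NFKD splits 'Á' into 'A' plus a combining accent, which is then dropped.
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


names = ["Zeca Exemplo", "Átila Exemplo", "André Figueiredo"]

# .sort() works in place and returns None, so do not assign its result.
in_place = list(names)
returned = in_place.sort()

plain = sorted(names)  # code-point order: 'Á' sorts after 'Z'
accent_aware = sorted(names, key=sort_key)

print(returned)  # None
print(plain)  # ['André Figueiredo', 'Zeca Exemplo', 'Átila Exemplo']
print(accent_aware)  # ['André Figueiredo', 'Átila Exemplo', 'Zeca Exemplo']
```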
    
19.09.2017 / 14:06