How do I count how many candidates are listed on this page? Python 3.6

2

A simple thing: I need to count how many candidates are in the table on this page (for example: link). In this example there are 110 names, but I need to get that number programmatically, and I have to do it for a huge number of pages with the same structure. Here's what I've tried:

from bs4 import BeautifulSoup
import requests
import string
import re
import urllib
r = requests.get('http://www.ufjf.br/cdara/sisu-2/sisu-2017-1a-edicao/lista-de-espera-sisu-3/?id_curso=01GV&id_grupo=70')
soup = BeautifulSoup(r.text, "html.parser")
contador = 0
for node in soup.findAll(".XXX-XX<"):
  contador = contador+1
print(contador)  

But it does not find these characters, even though they are there (in the CPF column, for example). How can I do this?

    
asked by anonymous 02.03.2017 / 00:21

3 answers

0

If you only need to know how many candidates there are:

import requests
from bs4 import BeautifulSoup as bs

req = requests.get('http://www.ufjf.br/cdara/sisu-2/sisu-2017-1a-edicao/lista-de-espera-sisu-3/?id_curso=01GV&id_grupo=70')
soup = bs(req.text, 'html.parser')
rows = soup.select('#sisu tr')
print(len(rows[1:])) # 110

BeautifulSoup Installation

Note that I use rows[1:] because the first row (<tr>) contains the column names, which I assume should not count as a candidate.
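Since the question mentions many pages with the same structure, the row-counting idea above (count the <tr> elements of the table with id sisu, minus the header row) can be wrapped in a reusable function. The following is only a sketch that uses the standard library's html.parser instead of BeautifulSoup, so it runs without extra installs. The names RowCounter, count_candidates and SAMPLE, and the sample HTML itself, are illustrative assumptions and are not taken from the real page.

```python
from html.parser import HTMLParser


class RowCounter(HTMLParser):
    """Counts <tr> elements inside the table whose id equals table_id."""

    def __init__(self, table_id):
        super().__init__()
        self.table_id = table_id
        self.depth = 0  # greater than 0 while we are inside the target table
        self.rows = 0

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            if self.depth > 0:
                self.depth += 1  # nested table inside the target table
            elif dict(attrs).get("id") == self.table_id:
                self.depth = 1   # entering the target table
        elif tag == "tr" and self.depth > 0:
            self.rows += 1

    def handle_endtag(self, tag):
        if tag == "table" and self.depth > 0:
            self.depth -= 1


def count_candidates(html, table_id="sisu"):
    """Return the number of data rows, ignoring the header row."""
    parser = RowCounter(table_id)
    parser.feed(html)
    parser.close()
    return max(parser.rows - 1, 0)


# Made-up HTML that only imitates the assumed structure of the page.
SAMPLE = """
<table id="other"><tr><td>ignored</td></tr></table>
<table id="sisu">
  <tr><th>CPF</th><th>Name</th></tr>
  <tr><td>000.000.XXX-XX</td><td>Candidate A</td></tr>
  <tr><td>111.111.XXX-XX</td><td>Candidate B</td></tr>
</table>
"""

print(count_candidates(SAMPLE))  # 2
```

For real pages, the text of each response (for example req.text from requests) could be passed to count_candidates inside a loop over the URLs.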

    
02.03.2017 / 11:10
1
print(len(re.findall('XXX-XX', str(soup))))
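The attempt in the question fails because findAll treats its first argument as a tag name, not as text to search for, so it never matches the CPF text. A regular expression over the page source works, but it counts every occurrence of the pattern, so the result equals the number of candidates only if the masked CPF appears exactly once per row. Below is a small sketch on made-up HTML; SAMPLE, count_masked_cpfs and the CPF values are assumptions for illustration, not data from the real page.

```python
import re

# Made-up rows imitating the assumed layout of the CPF column.
SAMPLE = (
    "<tr><td>000.000.XXX-XX</td><td>Candidate A</td></tr>"
    "<tr><td>111.111.XXX-XX</td><td>Candidate B</td></tr>"
)


def count_masked_cpfs(html):
    """Count occurrences of the masked CPF suffix right before a closing tag."""
    # The dot is escaped so that it matches a literal "." only.
    return len(re.findall(r"\.XXX-XX<", html))


print(count_masked_cpfs(SAMPLE))  # 2
```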
    
02.03.2017 / 03:32
0
import re
import urllib.request

request = urllib.request.Request("websiteURL")

response = urllib.request.urlopen(request)

# In Python 3, read() returns bytes; decode them to text (assumes the page is UTF-8)
responseContent = response.read().decode("utf-8", errors="replace")

# From what I noticed, you only need the content of the first column of each row, hence this regular expression with findall
match = re.findall(r'<tr[^>]*>\s*<td[^>]*>(.*?)</td>', responseContent)

# Once you have the codes you can do whatever you like with them: count, print...
for code in match:
    print(code)

I think the code works; if you run into an error you should be able to fix it easily, because the logic you need is here.

    
02.03.2017 / 04:06