Web scraping with Python on the CEF website, which now loads results via JavaScript [closed]

1

CEF (Caixa Econômica Federal) changed the way it displays lottery results on its site. Before, the results were all delivered in the HTML, and I could get them fairly easily by web scraping with BeautifulSoup. Now the results are loaded by JavaScript running in the browser. I searched the web but could not understand how the process works. If anyone can help, I would be grateful.

    
asked by anonymous 05.06.2018 / 22:50

3 answers

0

Caixa's own website offers a download of all results in HTML format. You can download them from this page: http://loterias.caixa.gov.br/wps/portal/loterias/landing/megasena . If this is for learning purposes, though, you have two alternatives. One is to query the endpoint that the page's JavaScript calls:

http://loterias.caixa.gov.br/wps/portal/loterias/landing/megasena/!ut/p/a1/04_Sj9CPykssy0xPLMnMz0vMAfGjzOLNDH0MPAzcDbwMPI0sDBxNXAOMwrzCjA0sjIEKIoEKnN0dPUzMfQwMDEwsjAw8XZw8XMwtfQ0MPM2I02-AAzgaENIfrh-FqsQ9wNnUwNHfxcnSwBgIDUyhCvA5EawAjxsKckMjDDI9FQE-F4ca/dl5/d5/L2dBISEvZ0FBIS9nQSEh/pw/Z7_HGK818G0KO6H80AU71KG7J0072/res/id=buscaResultado/c=cacheLevelPage/=/?timestampAjax=1528262624920

The only parameter is the timestamp at the end (timestampAjax).
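
A minimal sketch of building that request URL, assuming the timestampAjax value is the current time in milliseconds since the Unix epoch (the sample value 1528262624920 is consistent with that, but the site does not document it). The helper name build_results_url is hypothetical, and the endpoint argument is the full URL above without its query string:

```python
import time
from urllib.parse import urlencode


def build_results_url(endpoint, now=None):
    """Append a fresh timestampAjax query parameter to the endpoint URL.

    endpoint: the AJAX endpoint URL without its query string.
    now: seconds since the epoch; defaults to the current time.
    """
    if now is None:
        now = time.time()
    # Assumption: the site expects milliseconds since the Unix epoch.
    millis = int(now * 1000)
    return endpoint + "?" + urlencode({"timestampAjax": millis})


# Hypothetical usage (requires the requests package and network access):
#   import requests
#   response = requests.get(build_results_url(full_endpoint_url))
#   print(response.text)
```

Whether the server actually checks the timestamp or only uses it to defeat caching is not known; the sketch simply reproduces the shape of the URL.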

The other alternative is to use the Selenium library to render the page in a real browser and then pass the rendered HTML to BeautifulSoup, for example.
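
A minimal sketch of the Selenium route, assuming Selenium and a Firefox driver (geckodriver) are installed; any other WebDriver should work the same way. The helper name fetch_rendered_html is hypothetical, and it accepts an already-created driver so the choice of browser stays with the caller:

```python
def fetch_rendered_html(url, driver=None):
    """Open url in a real browser and return the HTML after scripts have run."""
    if driver is None:
        # Imported lazily so a caller can pass in its own driver object.
        from selenium import webdriver
        driver = webdriver.Firefox()  # assumption: Firefox + geckodriver available
    try:
        driver.get(url)
        # page_source is the serialized DOM at this moment; content loaded by a
        # later AJAX call may need an explicit wait before it is read.
        return driver.page_source
    finally:
        # Always close the browser, even if loading the page failed.
        driver.quit()


# Hypothetical usage, handing the rendered HTML to BeautifulSoup:
#   from bs4 import BeautifulSoup
#   html = fetch_rendered_html("http://loterias.caixa.gov.br/wps/portal/loterias/landing/megasena")
#   soup = BeautifulSoup(html, "html.parser")
```

If the results are still missing from the returned HTML, the page probably had not finished its AJAX request yet, and waiting for the results element to appear before reading page_source would be the next thing to try.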

    
06.06.2018 / 04:32
0

The page http://loterias.caixa.gov.br/wps/portal/loterias still includes the latest lottery results in its HTML, and you can extract them as follows:

import requests
from bs4 import BeautifulSoup

# Download the lotteries portal page
req = requests.get( "http://loterias.caixa.gov.br/wps/portal/loterias" )

soup = BeautifulSoup( req.content, "html.parser" )

# The Mega Sena numbers are the <li> items of this <ul>
ul = soup.find_all( "ul", class_="resultado-loteria mega-sena" )

for li in ul[0].find_all( "li" ):
    print( li.text )

Here is a function that retrieves the Mega Sena results using BeautifulSoup:

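import requests
from bs4 import BeautifulSoup

def obterDezenasMegaSena():
    """Return the latest Mega Sena numbers as a list of ints, or None on failure."""
    try:
        req = requests.get( "http://loterias.caixa.gov.br/wps/portal/loterias" )
        soup = BeautifulSoup( req.content, "html.parser" )
        ul = soup.find_all( "ul", class_="resultado-loteria mega-sena" )
        return [ int(li.text) for li in ul[0].find_all( "li" ) ]
    except ( requests.RequestException, IndexError, ValueError ):
        # Network error, list not found in the page, or a non-numeric item
        return None

print( obterDezenasMegaSena() )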

Output:

[3, 6, 11, 27, 28, 46]

The same can be done to extract the numbers drawn in Quina:

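import requests
from bs4 import BeautifulSoup

def obterDezenasQuina():
    """Return the latest Quina numbers as a list of ints, or None on failure."""
    try:
        req = requests.get( "http://loterias.caixa.gov.br/wps/portal/loterias" )
        soup = BeautifulSoup( req.content, "html.parser" )
        ul = soup.find_all( "ul", class_="resultado-loteria quina" )
        return [ int(li.text) for li in ul[0].find_all( "li" ) ]
    except ( requests.RequestException, IndexError, ValueError ):
        # Network error, list not found in the page, or a non-numeric item
        return None

print( obterDezenasQuina() )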

Output:

[21, 25, 40, 66, 67]
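
Since the two functions differ only in the CSS class of the list, one possible refactoring is a single parser that takes the class as a parameter and works on HTML that has already been downloaded. The name extrair_dezenas is hypothetical, and the sample markup below is invented to mirror the class names used above, not copied from the site:

```python
from bs4 import BeautifulSoup


def extrair_dezenas(html, classe):
    """Return the numbers listed in <ul class="resultado-loteria CLASSE">, or None."""
    soup = BeautifulSoup(html, "html.parser")
    # A class_ value containing a space is matched against the whole attribute.
    ul = soup.find("ul", class_="resultado-loteria " + classe)
    if ul is None:
        return None
    return [int(li.get_text(strip=True)) for li in ul.find_all("li")]


# Invented sample markup, for illustration only.
exemplo = '<ul class="resultado-loteria quina"><li>21</li><li>25</li><li>40</li></ul>'
print(extrair_dezenas(exemplo, "quina"))      # [21, 25, 40]
print(extrair_dezenas(exemplo, "mega-sena"))  # None
```

Keeping the download out of the parser also makes it easy to check the parsing against a saved copy of the page.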
    
06.06.2018 / 18:44
0

You can use the http://www.loteriaseresultados.com.br website to extract the full details of every CEF lottery draw using BeautifulSoup:

import requests
from bs4 import BeautifulSoup

def obterPremiacaoMegaSena( soup, premio ):
    # Find the table row whose <th> starts with the prize name; "x and" skips <th> tags without text
    td = soup.find( 'th', string=lambda x: x and x.startswith(premio) ).find_parent('tr').find_all("td")
    if td[1].text == "-":
        # No winners in this prize tier
        return { "Tipo" : premio, "QtdGanhadores" : u"0", "ValorPremio" : u"0,00" }
    else:
        return { "Tipo" : premio, "QtdGanhadores" : td[0].text.split(' ')[0], "ValorPremio" : td[1].text.split(' ')[1] }


def obterResultadoMegaSena( nconcurso ):
    """Return the numbers and prizes of draw nconcurso, or None on failure."""
    try:
        req = requests.get( "http://www.loteriaseresultados.com.br/megasena/concurso/" + str(nconcurso) )
        soup = BeautifulSoup( req.content, "html.parser" )
        dezenas = [ int(dezena.text) for dezena in soup.find_all( "div", class_="bola bg-success" ) ]
        sena = obterPremiacaoMegaSena( soup, "SENA" )
        quina = obterPremiacaoMegaSena( soup, "QUINA" )
        quadra = obterPremiacaoMegaSena( soup, "QUADRA" )
        return { "Concurso" : nconcurso, "DezenasSorteadas" : dezenas, "Premiacao" : [ sena, quadra, quina ] }
    except ( requests.RequestException, AttributeError, IndexError, ValueError ):
        # Network error or a page layout different from the one expected
        return None

print( obterResultadoMegaSena( 2047 ) )

Output:

{
  'Concurso': 2047,
  'DezenasSorteadas': [1, 18, 19, 29, 44, 54],
  'Premiacao': [ {
                   'ValorPremio': u'0,00',
                   'QtdGanhadores': u'0',
                   'Tipo': 'SENA'
                 },
                 { 
                   'ValorPremio': u'1.002,65',
                   'QtdGanhadores': u'2.390',
                   'Tipo': 'QUADRA'
                 },
                 { 
                   'ValorPremio': u'55.914,69',
                   'QtdGanhadores': u'30',
                   'Tipo': 'QUINA'
                 }
               ]
}
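
The winner counts and prize values come back as strings in Brazilian number format ("." as the thousands separator, "," as the decimal mark). A minimal sketch of converting them for arithmetic; the helper names parse_quantidade and parse_valor are hypothetical, and the total assumes ValorPremio is the amount paid per winner:

```python
from decimal import Decimal


def parse_quantidade(texto):
    """Convert a count such as '2.390' to the integer 2390."""
    return int(texto.replace(".", ""))


def parse_valor(texto):
    """Convert an amount such as '55.914,69' to Decimal('55914.69')."""
    return Decimal(texto.replace(".", "").replace(",", "."))


quadra = {"Tipo": "QUADRA", "QtdGanhadores": "2.390", "ValorPremio": "1.002,65"}
# Assumption: ValorPremio is per winner, so the product is the tier's total payout.
total = parse_quantidade(quadra["QtdGanhadores"]) * parse_valor(quadra["ValorPremio"])
print(total)  # 2396333.50
```

Decimal is used instead of float so that currency amounts keep their exact two decimal places.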
    
07.06.2018 / 17:14