Web Crawler searching for specific text on the page

I'm writing a web crawler to fetch the value of a currency.

I wrote the following code in Python:

#coding: utf-8

from urllib2 import urlopen

conteudo = urlopen('http://dolarhoje.com/bitcoin').read()

procurar1 = '<span class="symbol">'
posicao1 = int(conteudo.index(procurar1) + len(procurar1))
moeda1 = conteudo[posicao1 : posicao1 + 3]

procurar2 = '<span class="symbol">'
posicao2 = int(conteudo.index(procurar2) + len(procurar2))
moeda2 = conteudo[posicao2 : posicao2 + 3]

procurar3 = '<input type="text" id="nacional" value="'
posicao3 = int(conteudo.index(procurar3) + len(procurar3))
valor = conteudo[posicao3 : posicao3 + 8]

print(moeda1 + ' 1,00 ' + 'vale ' + moeda2 + ' ' + valor)
print ('\n')

I know that with procurar1 = '<span class="symbol">', the call conteudo.index(procurar1) returns the first occurrence, but I want to get the second occurrence.

Running the code prints: ฿ 1,00 vale ฿ 25086,77

Expected: ฿ 1,00 vale R$ 25086,77

That is, I want both the first and the second currency symbol, with the second one taken from the second match in the page source.

How can I do this?

    
asked by anonymous 03.11.2017 / 20:43

1 answer

You can do this more easily with the MechanicalSoup library.

To use it, install it in your environment: pip install MechanicalSoup

Getting the values you want is then straightforward:

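import mechanicalsoup


# Open the page and get its parsed HTML (a BeautifulSoup object)
browser = mechanicalsoup.StatefulBrowser()
browser.open("http://dolarhoje.com/bitcoin")

page = browser.get_current_page()

# All elements with class "symbol" and all <input> elements, in page order
symbols = page.select(".symbol")
inputs = page.find_all("input")

# Pair each currency symbol with the value of the matching input field
moeda1 = { 'symbol': symbols[0].text, 'value': inputs[0].attrs['value'] }
moeda2 = { 'symbol': symbols[1].text, 'value': inputs[1].attrs['value'] }

print(moeda1)
print(moeda2)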
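
If you would rather keep the plain string search from the question, note that str.index accepts an optional start position, so you can resume the search after the previous match to reach the second occurrence. Below is a minimal sketch of that idea. The html string is a made-up stand-in for the page (the real markup may differ), and find_nth and text_between are hypothetical helper names, not part of any library:

```python
# -*- coding: utf-8 -*-
# Hypothetical sample markup standing in for the downloaded page;
# the real page source may be laid out differently.
html = (
    '<span class="symbol">฿</span>'
    '<span class="symbol">R$</span>'
    '<input type="text" id="nacional" value="25086,77">'
)


def find_nth(text, marker, n):
    """Return the index just past the n-th (1-based) occurrence of marker."""
    pos = -1
    for _ in range(n):
        # Resume the search one character after the previous match;
        # str.index raises ValueError if there is no further match.
        pos = text.index(marker, pos + 1)
    return pos + len(marker)


def text_between(text, start, end_marker):
    """Return the text from start up to the next end_marker."""
    return text[start:text.index(end_marker, start)]


marker = '<span class="symbol">'
moeda1 = text_between(html, find_nth(html, marker, 1), '</span>')
moeda2 = text_between(html, find_nth(html, marker, 2), '</span>')

input_marker = '<input type="text" id="nacional" value="'
valor = text_between(html, find_nth(html, input_marker, 1), '"')

resultado = moeda1 + ' 1,00 vale ' + moeda2 + ' ' + valor
print(resultado)
```

Reading up to the closing tag or quote, instead of slicing a fixed number of characters, also avoids cutting symbols or values of different lengths. This approach still breaks if the page changes its markup, which is why an HTML parser is usually the safer choice.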
    
04.11.2017 / 01:01