Web Crawler searching for specific text on the page

Question

I'm writing a web crawler to fetch the current value of a currency.

I wrote the following code in Python:

#coding: utf-8

from urllib2 import urlopen

conteudo = urlopen('http://dolarhoje.com/bitcoin').read()

procurar1 = '<span class="symbol">'
posicao1 = int(conteudo.index(procurar1) + len(procurar1))
moeda1 = conteudo[posicao1 : posicao1 + 3]

procurar2 = '<span class="symbol">'
posicao2 = int(conteudo.index(procurar2) + len(procurar2))
moeda2 = conteudo[posicao2 : posicao2 + 3]

procurar3 = '<input type="text" id="nacional" value="'
posicao3 = int(conteudo.index(procurar3) + len(procurar3))
valor = conteudo[posicao3 : posicao3 + 8]

print(moeda1 + ' 1,00 ' + 'vale ' + moeda2 + ' ' + valor)
print ('\n')

I know that when I set procurar1 = '<span class="symbol">' and call conteudo.index(procurar1), it returns the first occurrence, but I would like to get the second occurrence.

Running the code prints: ฿ 1,00 vale ฿ 25086,77

Expected: ฿ 1,00 vale R$ 25086,77

That is, I want both the first and the second currency symbol, where the second one comes from the second match of the same tag in the page source.

How can I do this?

python web-crawler crawling

asked by anonymous 03.11.2017 / 20:43

1 answer


score 1 · Answer 1

You can do this more easily with the MechanicalSoup library.

To use it, just install it in your environment: pip install MechanicalSoup

Getting the values you want is then straightforward:

import mechanicalsoup


browser = mechanicalsoup.StatefulBrowser()
browser.open("http://dolarhoje.com/bitcoin")

page = browser.get_current_page()

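# every element with the CSS class "symbol", in document order
symbols = page.select(".symbol")
# every <input> element on the page, in document order
inputs = page.find_all("input")

# pair each symbol with the input at the same position
moeda1 = {'symbol': symbols[0].text, 'value': inputs[0].attrs['value']}
moeda2 = {'symbol': symbols[1].text, 'value': inputs[1].attrs['value']}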

print(moeda1)
print(moeda2)
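
If you would rather keep the plain string search from the question, str.index also accepts a start position, so you can resume the search just past the previous match to reach the second occurrence. Below is a minimal sketch of that idea in Python 3. The helper nth_index and the HTML snippet are made up for illustration; the real page markup may differ, so treat the closing </span> lookup as an assumption.

```python
def nth_index(text, needle, n):
    """Return the position just past the n-th (1-based) occurrence of needle.

    Hypothetical helper, not part of any library.
    Raises ValueError if text holds fewer than n occurrences.
    """
    pos = -1
    for _ in range(n):
        # The second argument of str.index is the offset to start searching from,
        # so pos + 1 skips the match found on the previous pass.
        pos = text.index(needle, pos + 1)
    return pos + len(needle)


# Made-up markup loosely modelled on the question; NOT the real page source.
conteudo = (
    '<span class="symbol">฿</span> 1,00 '
    '<span class="symbol">R$</span> 25086,77'
)

procurar = '<span class="symbol">'
fim = '</span>'

# First symbol: text between the first opening tag and its closing tag.
inicio1 = nth_index(conteudo, procurar, 1)
moeda1 = conteudo[inicio1:conteudo.index(fim, inicio1)]

# Second symbol: same search, but for the second occurrence.
inicio2 = nth_index(conteudo, procurar, 2)
moeda2 = conteudo[inicio2:conteudo.index(fim, inicio2)]

print(moeda1, moeda2)
```

Slicing up to the closing tag instead of taking a fixed three characters avoids picking up stray markup when the symbols have different lengths.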