My program takes URLs from a text file, visits each one, and checks whether its HTML contains a certain text. Would it be possible to read all the lines of the file and, instead of making one request at a time, make them all at once?
Asynchronously?
If so, one way to do this is to use grequests to make the requests. To install, type in the terminal:
pip install grequests
You can use it like this (adapted from the documentation):
# -*- coding: utf-8 -*-
import grequests

urls = [
    'https://www.python.org/',
    'https://pypi.python.org/pypi/grequests',
    'http://pt.stackoverflow.com/'
]

requisicoes = (grequests.get(url) for url in urls)
mp = grequests.map(requisicoes)

for request in mp:
    print("{}: {}".format(request.status_code, request.url))
    # print(request.content)
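Tying this back to the question, checking whether each page's HTML contains a given text can be done with a small helper applied to the list that grequests.map returns (the helper name contem_texto and the searched text 'Python' are my own illustration, not part of grequests). Note that grequests.map puts None in the result list for requests that failed, so the helper guards against that:

```python
def contem_texto(resposta, texto):
    # True if the response body contains `texto`.
    # grequests.map returns None for requests that failed,
    # so treat None as "not found".
    return resposta is not None and texto in resposta.text

# Combined with the `mp` list from the example above:
# for request in mp:
#     if contem_texto(request, 'Python'):
#         print("encontrado em: {}".format(request.url))
```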
To use cookies with grequests, do it like this:
sessao = grequests.Session()
cookies = { 'foo': 'bar', 'baz': 'bar' }
requisicao = sessao.get('http://httpbin.org/cookies', cookies = cookies)

print("{}: {}".format(requisicao.status_code, requisicao.url))
print(requisicao.text)
The logic is the same for requests.
If you are using Python >= 3.2, the concurrent.futures module may be useful for doing the task asynchronously. The example below uses requests to make the requests.
# -*- coding: utf-8 -*-
from concurrent.futures import ThreadPoolExecutor, as_completed
import requests

def get(url, timeout):
    return requests.get(url, timeout = timeout)

def requestUrls(urls, timeout = 5):
    with ThreadPoolExecutor(max_workers = 5) as executor:
        agenda = { executor.submit(get, url, timeout): url for url in urls }
        for tarefa in as_completed(agenda):
            try:
                conteudo = tarefa.result()
            except Exception as e:
                print("Could not make the request! \n{}".format(e))
            else:
                yield conteudo
The number of threads is set by max_workers; if it is omitted or None, the default is derived from the number of processors on the machine. (Source)
Use it like this:
urls = [
    'https://www.python.org/',
    'https://pypi.python.org/pypi/requests',
    'http://pt.stackoverflow.com/',
]

requisicoes = requestUrls(urls) # timeout is optional, the default is 5

for requisicao in requisicoes:
    codigo = requisicao.status_code
    url = requisicao.url
    conteudo = requisicao.content
    print("{}: {}".format(codigo, url))
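Finally, since the question starts from a text file of URLs, the list can be read from the file before being handed to requestUrls. A minimal sketch (the file name 'urls.txt' and the searched text are assumptions for illustration):

```python
def ler_urls(caminho):
    # Read one URL per line, skipping blank lines and stray whitespace.
    with open(caminho) as arquivo:
        return [linha.strip() for linha in arquivo if linha.strip()]

# urls = ler_urls('urls.txt')
# for requisicao in requestUrls(urls):
#     if 'algum texto' in requisicao.text:
#         print("encontrado em: {}".format(requisicao.url))
```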