How to send multiple requests at the same time

3

My program reads URLs from a text file, requests each one, and checks whether the returned HTML contains a certain text. Is it possible to read all the lines of the file and, instead of making one request at a time, send them all at once?

    
asked by anonymous 18.09.2016 / 20:58

1 answer

3

Do you mean asynchronously?

If so, one way to do this is to use grequests to make the requests. To install it, run in the terminal:

pip install grequests

You can use it like this (adapted from the documentation):

# -*- coding: utf-8 -*-
import grequests 

urls = [
    'https://www.python.org/',
    'https://pypi.python.org/pypi/grequests',
    'http://pt.stackoverflow.com/'
]

requisicoes = (grequests.get(url) for url in urls)
mp = grequests.map(requisicoes)

# grequests.map returns None in place of any request that failed
for response in mp:
    if response is not None:
        print("{}: {}".format(response.status_code, response.url))
        # print(response.content)

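To send cookies, pass them in the cookies argument. Note that grequests.Session is the Session class from requests, re-exported by grequests, so the call below runs synchronously: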

sessao = grequests.Session()
cookies = { 'foo': 'bar', 'baz': 'bar' }

requisicao = sessao.get('http://httpbin.org/cookies', cookies = cookies)

print ("{}: {}".format(requisicao.status_code, requisicao.url))
print (requisicao.text)

The logic is the same for requests. To send cookies in an asynchronous batch, the same cookies argument can be passed to grequests.get.

Python 3.2+

If you are using Python >= 3.2, the concurrent.futures module can be useful for running tasks asynchronously. The example below uses requests to make the requests.

# -*- coding: utf-8 -*-
from concurrent.futures import ThreadPoolExecutor, as_completed
import requests

def get(url, timeout):
    return requests.get(url, timeout = timeout)

def requestUrls(urls, timeout = 5):
    with ThreadPoolExecutor(max_workers = 5) as executor:
        agenda = { executor.submit(get, url, timeout): url for url in urls }

        for tarefa in as_completed(agenda):     
            try:
                conteudo = tarefa.result()
            except Exception as e:
                print("Could not complete the request! \n{}".format(e))
            else:
                yield conteudo

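The number of threads is set by max_workers. If it is omitted or None, the default depends on the Python version: from Python 3.5 it is the number of processors on the machine multiplied by 5, and from Python 3.8 it is min(32, os.cpu_count() + 4). The plain "number of processors" default applies to ProcessPoolExecutor, not to ThreadPoolExecutor. Source: Python documentation for concurrent.futures.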
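To see what max_workers limits in practice, here is a minimal sketch that counts how many tasks run at the same moment. The name measure_peak and its parameters are hypothetical, invented for this illustration, and time.sleep stands in for a slow network request.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def measure_peak(max_workers, n_tasks=10, delay=0.05):
    """Run n_tasks sleeping tasks and return the highest number seen running at once."""
    lock = threading.Lock()
    running = 0
    peak = 0

    def tarefa():
        nonlocal running, peak
        with lock:
            running += 1
            peak = max(peak, running)
        time.sleep(delay)  # stands in for a slow network request
        with lock:
            running -= 1

    # Leaving the with block waits for every submitted task to finish.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        for _ in range(n_tasks):
            executor.submit(tarefa)
    return peak
```

The value returned never exceeds max_workers, however many tasks are submitted. For the requestUrls function above, this means at most 5 requests are in flight at a time.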

Use requestUrls like this:

urls = [
    'https://www.python.org/',
    'https://pypi.python.org/pypi/requests',
    'http://pt.stackoverflow.com/',
]

requisicoes = requestUrls(urls)  # timeout is optional; the default is 5

for requisicao in requisicoes:
    codigo = requisicao.status_code
    url = requisicao.url
    conteudo = requisicao.content

    print ("{}: {}".format(codigo, url))
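Going back to the original question (URLs in a text file, checking each page for a certain text), the same thread-pool idea could be sketched as follows. The names read_urls and check_urls, and the fetch parameter, are assumptions made for this example and not part of any library. fetch is any callable that receives a URL and returns the page HTML as a string.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def read_urls(path):
    """Return the non-empty, stripped lines of a text file (one URL per line)."""
    with open(path, encoding="utf-8") as arquivo:
        return [linha.strip() for linha in arquivo if linha.strip()]


def check_urls(urls, texto, fetch, max_workers=5):
    """Map each URL to True/False (texto found or not), or None if fetch failed.

    fetch is a caller-supplied callable: it takes a URL and returns the HTML as str.
    """
    resultados = {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Submit every URL up front so the requests overlap in time.
        agenda = {executor.submit(fetch, url): url for url in urls}
        for tarefa in as_completed(agenda):
            url = agenda[tarefa]
            try:
                resultados[url] = texto in tarefa.result()
            except Exception:
                resultados[url] = None  # the request failed
    return resultados
```

With requests, fetch could be something like lambda url: requests.get(url, timeout=5).text, and the call would be check_urls(read_urls('urls.txt'), 'some text', fetch), where urls.txt is a placeholder file name. Because the results are stored in a dict, duplicate URLs in the file collapse into a single entry.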
    
18.09.2016 / 23:36