Implement queues to manage competition between spiders in Scrapyd

Question

Is there any way for Scrapyd to create queues of spiders so that, when I submit many spiders (with different functions), I can prioritize some of them or limit how many run concurrently? Currently, all the spiders I submit run in the order determined by the Scrapyd server.

python web-crawler scrapy

asked by anonymous 09.01.2015 / 21:08

1 answer

score 1 · Accepted Answer

Well, if you only need simple priorities, one option is Scrapyd's priority parameter. It is not documented, but it is implemented in the Scrapyd source code; it is basically a simple priority queue on top of SQLite.
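To illustrate the idea of "a priority queue on top of SQLite", here is a minimal sketch using only Python's standard sqlite3 module. It is not Scrapyd's actual code: the class name SimplePriorityQueue, the table layout, and the tie-breaking rule (equal priorities come out in insertion order) are all hypothetical choices made for this example.

```python
import sqlite3


class SimplePriorityQueue:
    """Hypothetical SQLite-backed priority queue (illustrative, not Scrapyd's code)."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS queue ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "priority REAL NOT NULL, "
            "message TEXT NOT NULL)"
        )

    def put(self, message, priority=0.0):
        # 'with self.conn' commits the transaction on success.
        with self.conn:
            self.conn.execute(
                "INSERT INTO queue (priority, message) VALUES (?, ?)",
                (priority, message),
            )

    def pop(self):
        """Remove and return the highest-priority message, or None if empty."""
        with self.conn:
            # Higher priority first; ties broken by insertion order (lowest id).
            row = self.conn.execute(
                "SELECT id, message FROM queue "
                "ORDER BY priority DESC, id ASC LIMIT 1"
            ).fetchone()
            if row is None:
                return None
            self.conn.execute("DELETE FROM queue WHERE id = ?", (row[0],))
            return row[1]

    def __len__(self):
        return self.conn.execute("SELECT COUNT(*) FROM queue").fetchone()[0]
```

Using a database table rather than an in-memory heap means pending jobs survive a restart when a file path is passed instead of ":memory:".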

To use it, just pass the argument priority=NUMBER when calling the /schedule.json API. The default value is 0; use a higher value for higher priority.
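A minimal sketch of such a call with Python's standard library, assuming Scrapyd is listening on its default address http://localhost:6800. The project and spider names used below ("myproject", "urgent_spider") are hypothetical placeholders. The function only builds the POST request; passing it to urllib.request.urlopen would actually submit the job.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed Scrapyd address (the default port is 6800); adjust for your server.
SCRAPYD_URL = "http://localhost:6800"


def build_schedule_request(project, spider, priority=0, base_url=SCRAPYD_URL):
    """Build (but do not send) a POST to /schedule.json with a job priority."""
    form = {"project": project, "spider": spider, "priority": priority}
    data = urlencode(form).encode("ascii")
    return Request(f"{base_url}/schedule.json", data=data, method="POST")


# Usage (hypothetical names): a higher priority value is picked up sooner.
req = build_schedule_request("myproject", "urgent_spider", priority=10)
# To actually schedule the job: urllib.request.urlopen(req)
```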

If you need a more complex queueing scheme, you may have to deploy your own solution, or use Scrapy Cloud from Scrapinghub [*] and crawl using the queues from the Hub Crawl Frontier.
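As one hypothetical starting point for "your own solution", the core of a client-side dispatcher could be a function that decides which pending spiders may be submitted without exceeding a per-group concurrency limit. The grouping of spiders, the function name pick_jobs_to_schedule, and the limits are all assumptions for illustration; in practice the running counts might be derived from Scrapyd's listjobs.json endpoint, and the chosen jobs submitted via /schedule.json.

```python
def pick_jobs_to_schedule(pending, running, limits):
    """Select pending jobs that fit under per-group concurrency limits.

    pending: list of (group, spider) tuples, already in priority order
    running: dict mapping group -> number of jobs currently running
    limits:  dict mapping group -> max concurrent jobs
             (groups without an entry are unrestricted)
    """
    counts = dict(running)  # copy so the caller's dict is not mutated
    chosen = []
    for group, spider in pending:
        limit = limits.get(group)
        if limit is not None and counts.get(group, 0) >= limit:
            continue  # group is at capacity; leave this job pending
        chosen.append((group, spider))
        counts[group] = counts.get(group, 0) + 1
    return chosen


# Usage with hypothetical groups: at most 2 "news" and 1 "shop" job at once.
pending = [("news", "a"), ("news", "b"), ("shop", "c"), ("shop", "d"), ("misc", "e")]
selected = pick_jobs_to_schedule(pending, {"news": 1}, {"news": 2, "shop": 1})
```

Here "news" already has one running job, so only one more fits; "shop" admits one; "misc" has no limit.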

[*] Full disclosure: I work at Scrapinghub.