Well, if you need simple priorities, one option is to use scrapyd's priority parameter (it is not documented, but it is implemented here; it's basically a simple priority queue on top of SQLite).
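To illustrate the idea of "a priority queue on top of SQLite", here is a minimal sketch using Python's standard sqlite3 module. It is not scrapyd's actual code. The class name SqlitePriorityQueue, the table layout, and the FIFO tie-break among equal priorities are assumptions made for this example.

```python
import json
import sqlite3


class SqlitePriorityQueue:
    """Illustrative priority queue stored in a SQLite table (not scrapyd's code)."""

    def __init__(self, database=":memory:"):
        self.conn = sqlite3.connect(database)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS queue ("
            "id INTEGER PRIMARY KEY, priority REAL, message TEXT)"
        )
        self.conn.commit()

    def put(self, message, priority=0.0):
        # Store the message as JSON next to its priority (default 0).
        self.conn.execute(
            "INSERT INTO queue (priority, message) VALUES (?, ?)",
            (priority, json.dumps(message)),
        )
        self.conn.commit()

    def pop(self):
        # Highest priority first; equal priorities come out oldest (lowest id) first.
        row = self.conn.execute(
            "SELECT id, message FROM queue ORDER BY priority DESC, id ASC LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        self.conn.execute("DELETE FROM queue WHERE id = ?", (row[0],))
        self.conn.commit()
        return json.loads(row[1])

    def __len__(self):
        return self.conn.execute("SELECT COUNT(*) FROM queue").fetchone()[0]
```

For example, after q.put({"spider": "a"}), q.put({"spider": "b"}, priority=10) and q.put({"spider": "c"}), the first q.pop() returns {"spider": "b"}, because higher values are served first.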
To use the priority parameter, just pass the argument priority=NUMBER when calling the /schedule.json API. The default value is 0; use a higher value for higher priority.
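As a sketch of such a call, the snippet below builds the POST request to /schedule.json with only the Python standard library. The project and spider names are hypothetical, and the URL assumes scrapyd is running locally on its default port 6800. Building the request separately from sending it keeps the form data easy to inspect.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_schedule_request(project, spider, priority=0,
                           base_url="http://localhost:6800", **spider_args):
    """Build (but do not send) a POST request to scrapyd's /schedule.json."""
    # Extra keyword arguments are passed along as spider arguments.
    data = {"project": project, "spider": spider, "priority": priority}
    data.update(spider_args)
    return Request(
        base_url.rstrip("/") + "/schedule.json",
        data=urlencode(data).encode("utf-8"),
        method="POST",
    )


def schedule(project, spider, priority=0,
             base_url="http://localhost:6800", **spider_args):
    """Send the request and return scrapyd's JSON response as a dict."""
    request = build_schedule_request(project, spider, priority, base_url, **spider_args)
    with urlopen(request) as response:
        return json.load(response)
```

With a scrapyd instance running, schedule("myproject", "myspider", priority=5) would queue that job ahead of any pending jobs left at the default priority of 0.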
If you need a more complex queueing scheme, you may have to deploy your own solution, or use Scrapy Cloud from Scrapinghub [*] and crawl using the queues from the Hub Crawl Frontier.
[*] For full transparency: I work at Scrapinghub.