Multiple pipelines to handle different spiders in Scrapy


How do you handle pipelines.py when you have different spiders?

Example: I have one spider that scrapes blog posts from a blog, and another that saves the JPEG banner images found on each page. Both spiders work, but they currently share the same pipeline to persist their items.

asked by anonymous 09.01.2015 / 20:54

1 answer


A common pattern in pipelines (and in spider middlewares as well) is to use spider attributes to decide what to do:

class MyPipeline:
    def process_item(self, item, spider):
        # Only process items from spiders that opt in via this attribute
        if getattr(spider, 'my_pipeline_enabled', False):
            ...  # do the actual work here
        # Pipelines must return the item (or raise DropItem) so that
        # later pipelines still receive it
        return item

This way, even though the pipeline is enabled for the entire project, the my_pipeline_enabled attribute lets you turn it on only for the spiders you want.
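On the spider side, all that is needed is the attribute itself. A minimal sketch (the spider name and URL are placeholders, not from the question):

import scrapy

class BlogSpider(scrapy.Spider):
    name = 'blog'
    start_urls = ['http://example.com/blog']

    # Opt in to MyPipeline; spiders without this attribute are skipped
    my_pipeline_enabled = True

    def parse(self, response):
        ...  # extract blog posts here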

You can also extend this code to take a setting into account, if necessary.
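For example, a sketch that reads a project-wide default from the settings via from_crawler (a real Scrapy hook), with the spider attribute still taking precedence; the setting name MY_PIPELINE_ENABLED_DEFAULT is hypothetical, not a built-in Scrapy setting:

class MyPipeline:
    def __init__(self, default_enabled=False):
        self.default_enabled = default_enabled

    @classmethod
    def from_crawler(cls, crawler):
        # Read a project-wide fallback from the settings
        default = crawler.settings.getbool('MY_PIPELINE_ENABLED_DEFAULT', False)
        return cls(default)

    def process_item(self, item, spider):
        # The spider attribute wins; the setting is only the fallback
        if getattr(spider, 'my_pipeline_enabled', self.default_enabled):
            ...  # do the actual work here
        return item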

In Scrapy 0.25+ (not yet released; for now you have to install from the Git repository), there is also the alternative of defining settings on the spider itself, which take precedence over the project-wide settings.
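That feature later shipped as the custom_settings class attribute. A sketch of per-spider pipeline selection with it, assuming that attribute and a placeholder pipeline path:

import scrapy

class BannerSpider(scrapy.Spider):
    name = 'banners'

    # Per-spider settings override the project settings, so this spider
    # runs only the image pipeline (the module path is a placeholder)
    custom_settings = {
        'ITEM_PIPELINES': {
            'myproject.pipelines.ImageBannerPipeline': 300,
        },
    }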

answered 09.01.2015 / 22:03