Crawling URLs by incrementing a number in the address

Question

I need to build a crawler for a book site. I can already extract the data I need from a single page, and I have also managed to crawl the entire domain, but I would like to do it in a more orderly and logical way.

I'd like to start by extracting from one URL and have the code walk through the following pages by incrementing the number at the end of that URL.

class QuotesSpider(CrawlSpider):
    name = "adororomance2"
    start_urls = [
        'http://www.adororomances.com.br/arromances.php?cod=1',
    ]

So, after collecting the data, I'd like the spider to go to the URL that is the same as the one in start_urls but ends in '.php?cod=2', call back the same method to extract the data from the new URL, and continue like this until it reaches a page where no book title is found, at which point it should stop.
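To illustrate the increment step described above, here is a minimal sketch that uses only the standard library. The function name `next_url` and its `param` argument are hypothetical names chosen for this example; they are not part of Scrapy.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def next_url(url, param="cod"):
    """Return `url` with the integer query parameter `param` increased by 1."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)          # e.g. {'cod': ['1']}
    current = int(query[param][0])         # raises KeyError if the parameter is missing
    query[param] = [str(current + 1)]
    # Rebuild the URL, changing only the query string.
    return urlunsplit(parts._replace(query=urlencode(query, doseq=True)))


print(next_url("http://www.adororomances.com.br/arromances.php?cod=1"))
# prints http://www.adororomances.com.br/arromances.php?cod=2
```

Parsing the query string instead of slicing the last character keeps the helper working once the number has more than one digit (for example, going from `cod=9` to `cod=10`).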

What I have tried so far, which did not work:

def parse(self, response):
        for livro in response.xpath('//*[@id="page_livro_coluna"]'):
            yield {
                    'titulo':
                    livro.xpath(
                        '//*[@id="page_livro_coluna"]/div[1]/h1/text()').extract_first(),
                    'autor(a)':
                    livro.xpath(
                        '//*[@id="page_livro_coluna"]/div[2]/span/a/span/h2/text()').extract_first(),
                    'titulo original':
                    livro.xpath(
                        '//*[@id="page_livro_coluna"]/div[3]/text()').extract_first(),
                    'coleção':
                    livro.xpath(
                        '//*[@id="page_livro_coluna"]/div[4]/h3/a/text()').extract_first(),
                    'publicação':
                    livro.xpath(
                        '//*[@id="page_livro_coluna"]/div[4]/div[1]/span[1]/text()').extract_first(),
                    'ano':
                    livro.xpath(
                        '//*[@id="page_livro_coluna"]/div[4]/div[1]/span[2]/text()').extract_first(),
                    'série':
                    livro.xpath(
                        '//*[@id="page_livro_coluna"]/div[4]/div[2]/a/span/text()').extract_first(),
                    'descrição':
                    livro.xpath(
                        'normalize-space(//*[@id="description"]/text())')
                      .extract_first(),

                      }

        i = 2
        next_page = '''
                http://www.adororomances.com.br/arromances.php?cod=
                ''' +  %i  
        if titulo is not '':
            i = i + 1
            yield response.follow(next_page, callback=self.parse)

python-3.x scrapy

asked by anonymous 29.06.2018 / 01:08

0 answers
