Questions tagged as 'scrapy'

1
answer

Extract information from Lattes

Introduction: Since 1999, Brazilian researchers have had a website where they can post information about their academic careers. These records are known as Currículos Lattes. I want to download a few thousand of these resumes and write, along...
asked by 18.04.2018 / 17:32
0
answers

How to extract information from an HTTP header with Python?

We know that in the HTTP protocol, the end of the header is indicated by "\r\n\r\n". Example: For some reason, the client may not send the "\r\n\r\n" to the server (this could be an attack, for example): Suppose I have a netw...
asked by 01.09.2017 / 17:15
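
A minimal sketch of the header-terminator check described in the excerpt above, assuming the raw bytes have already been read from the socket. The function name split_http_message and the sample requests are hypothetical and are not taken from the original question.

```python
# Hypothetical helper: split a raw HTTP message at the blank line that ends the header.
HEADER_END = b"\r\n\r\n"


def split_http_message(raw: bytes):
    """Return (header_lines, body), or None if the terminator has not arrived."""
    head, sep, body = raw.partition(HEADER_END)
    if not sep:
        # No "\r\n\r\n" yet: the header is incomplete (slow or malicious client).
        return None
    # Decode the header section leniently; every byte maps to a character in ISO-8859-1.
    header_lines = head.decode("iso-8859-1").split("\r\n")
    return header_lines, body


# Sample data (assumed for illustration only).
complete = b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n"
incomplete = b"GET / HTTP/1.1\r\nHost: example.com\r\n"

result_complete = split_http_message(complete)
result_incomplete = split_http_message(incomplete)
```

A server that receives None here would normally keep reading, but only up to a size limit and a timeout, so that a client that never sends the terminator cannot hold the connection open indefinitely.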
2
answers

How to calculate an optimal value for the Scrapyd variable CONCURRENT_REQUESTS?

One of the settings that comes standard with Scrapyd is the number of concurrent requests (the default is 16): CONCURRENT_REQUESTS = 16. What would be the best methodology to calculate an optimal value for this variable? The goal is to get the b...
asked by 09.01.2015 / 21:01
4
answers

Tool to generate XPath

Hello, I'm writing a spider to capture some web data with XPath, but creating the XPath expressions is a lot of work. Does anyone know of a way to train XPath? For example: I click 5 times on a link and some tool generates the XPath. Any tips are welco...
asked by 24.02.2015 / 19:07
1
answer

Error with requests in Scrapy

I have a CSV file with some URLs that need to be accessed. http://www.icarros.com.br/Audi, Audi http://www.icarros.com.br/Fiat, Fiat http://www.icarros.com.br/Chevrolet, Chevrolet I have a spider to make all the requests. import scrapy i...
asked by 09.09.2016 / 15:19
1
answer

Is there any way to disable the Scrapyd web interface?

Is there any way to disable the Scrapyd web interface? I would like to monitor the server only through the API.
asked by 11.09.2015 / 15:09
1
answer

How to manage the execution and failure of spiders?

I'm developing a module to get information about the spiders running on the company's system. Here is the template where we save the start of operations and the job. I would like to validate whether the jobs completed correctly and fill in the rest of the...
asked by 12.01.2015 / 17:48
1
answer

Scrapy cannot select a form using XPath

Hello, I'm using Scrapy to build a crawler that collects competitive-exam questions and the like from the site gabarite.com.br. I can get the description of the question and the correct alternative, but I cannot get the other alternatives. Whenever I run in termina...
asked by 16.06.2017 / 22:20
1
answer

Multiple pipelines to handle different spiders in Scrapy

How to handle pipelines.py when we have different spiders? Example: I have one spider that gets posts from a blog and another that saves the JPEG banner images found on each page. Both spiders work, but I use the same pipeline...
asked by 09.01.2015 / 20:54
1
answer

Information contained in two pages with Scrapy

I'm not a Python programmer, but I'm trying to work with Scrapy. The example above is what I need; it runs in a Chrome extension. To explain, I need the post and all the information available. In the case of the Post, the...
asked by 14.05.2016 / 00:39