Questions tagged as 'web-crawler'

1
answer

Scrapy cannot select a form using XPath

Hello, I'm using Scrapy to build a crawler that collects competitive-exam questions and the like from the site gabarite.com.br. I can get the question's description and the correct alternative, but I cannot extract the alternatives. Whenever I run it in the termina...
asked by 16.06.2017 / 22:20
1
answer

Multiple pipelines to handle different spiders in Scrapy

How should pipelines.py be handled when we have different spiders? Example: I have one spider that collects posts from a blog and another that saves the JPEG banner images found on each page. Both spiders work, but I use the same pipeline...
asked by 09.01.2015 / 20:54
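
One common way to approach the situation described in this question is to let a single pipeline class check which spider produced each item. The sketch below is only an illustration under stated assumptions: the spider names `blog` and `banners`, the class name `PerSpiderPipeline`, and the item fields are all hypothetical, and a stand-in object replaces the real spider so the example runs without Scrapy installed. It relies on the fact that a Scrapy item pipeline is an ordinary class with a `process_item(self, item, spider)` method.

```python
from types import SimpleNamespace


class PerSpiderPipeline:
    """Route items to a handler chosen by the name of the spider that produced them."""

    def __init__(self):
        # Hypothetical spider names mapped to handler methods.
        self.handlers = {
            "blog": self.handle_post,
            "banners": self.handle_banner,
        }

    def process_item(self, item, spider):
        handler = self.handlers.get(spider.name)
        if handler is None:
            return item  # unknown spider: pass the item through unchanged
        return handler(item)

    def handle_post(self, item):
        # Example post processing: trim whitespace around the title.
        item["title"] = item["title"].strip()
        return item

    def handle_banner(self, item):
        # Example banner processing: flag whether the URL looks like a JPEG.
        item["is_jpeg"] = item["url"].lower().endswith((".jpg", ".jpeg"))
        return item


# Stand-ins for real spiders; only the `name` attribute is used here.
blog_spider = SimpleNamespace(name="blog")
banner_spider = SimpleNamespace(name="banners")

pipeline = PerSpiderPipeline()
post = pipeline.process_item({"title": "  Hello  "}, blog_spider)
banner = pipeline.process_item({"url": "http://example.com/a.JPG"}, banner_spider)
print(post, banner)
```

An alternative worth considering is to give each spider its own `ITEM_PIPELINES` setting through the `custom_settings` class attribute, so that each spider only runs the pipelines it needs.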
2
answers

What HTTP methods can a crawler crawl?

A conceptual question (or not): which HTTP methods cannot be "crawled" (or interpreted) by a crawler? POST, GET, PUT, PATCH, DELETE. Can anyone with knowledge of the subject answer?
asked by 23.03.2016 / 21:19
1
answer

Web Crawler (Spider) with AJAX in JSF using Node.js or the jsoup API in Java

I have the task of creating an interface optimized for a touch monitor, taking data from a website (link). This site lists bus lines and looks up their schedules using an AJAX autocomplete. Because it is a government agency,...
asked by 06.06.2016 / 15:04
1
answer

Information contained in two pages with Scrapy

I'm not a Python programmer, but I'm trying to work with Scrapy. The example above is what I need; it runs in a Chrome extension. To explain, I need the post and all the information available. In the case of the post, the...
asked by 14.05.2016 / 00:39
2
answers

How to protect my Scrapyd server from unauthenticated calls?

Let's say I have the following configuration in scrapy.cfg for Scrapyd: [deploy] url = http://example.com/api/scrapyd/ username = user password = secret project = projectX The Scrapyd documentation mentions the username and password optio...
asked by 09.01.2015 / 21:15
1
answer

Web Crawler searching for specific text on the page

Well, I'm building a web crawler to fetch the value of a currency. I wrote the following code in Python: #coding: utf-8 from urllib2 import urlopen conteudo = urlopen('http://dolarhoje.com/bitcoin').read() procurar1 = '<span class="symbol"&g...
asked by 03.11.2017 / 20:43
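
The idea in this question, locating a marker string in downloaded HTML and reading the value that follows it, can be sketched as below. The HTML fragment, the marker strings, and the helper name `text_between` are all hypothetical; the original code is truncated, so this is not a reconstruction of it. The sketch works on a local string instead of fetching the page.

```python
def text_between(content, start_marker, end_marker):
    """Return the text between the first start_marker and the next end_marker, or None."""
    start = content.find(start_marker)
    if start == -1:
        return None  # start marker not present
    start += len(start_marker)
    end = content.find(end_marker, start)
    if end == -1:
        return None  # end marker not present after the start marker
    return content[start:end].strip()


# Hypothetical page fragment; a real page would be downloaded first.
sample_html = '<p>Price: <span class="value"> 1234,56 </span></p>'
value = text_between(sample_html, '<span class="value">', "</span>")
print(value)
```

In Python 3, `urllib2.urlopen` from the excerpt corresponds to `urllib.request.urlopen`, whose `read()` returns bytes that need decoding before a string search like this one. Plain string search is fragile when the page markup changes, so an HTML parser may be a sturdier choice.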
1
answer

Creating a crawler in PHP [closed]

I'm new to the subject and would like to know where I can find more information on creating a crawler to download data and images from some sites. I have searched a lot, but so far I have not found anything very detailed! Thank you for the an...
asked by 12.04.2016 / 15:53
1
answer

Implement queues to manage concurrency between spiders in Scrapyd

Is there any way for Scrapyd to create queues of spiders so that, when I send many spiders (with different functions), I can prioritize / limit the concurrency between them? Today, all the spiders I send run in the order set by the Scrapyd serve...
asked by 09.01.2015 / 21:08
0
answers

Problem with a multithreaded crawler using jsoup

Hello, I'm developing a multithreaded crawler in which each job (thread) handles X sites and parses certain content with the jsoup library. The sites are all accessible. The problem is that the final result is never the same. That is, when I run the c...
asked by 08.12.2016 / 22:26