Questions tagged as 'web-crawler'

1
answer

In which programming language does a crawler / scraper traverse the DOM fastest?

I developed a script that uses PHP's DOMDocument class to crawl a third-party site. The script's speed does not meet the expected goal, so I would like to know in which programming language a script for the same purpose...
asked by 23.11.2017 / 17:55
2
answers

Developing a WebCrawler in Python [closed]

Is there an open-source web crawler project, written in Python, that I could study? I have been researching for some time but have not found anything ready-made. My goal is to study it in order to create an open-source project with the following features:...
asked by 03.11.2015 / 06:23
1
answer

Problems with parameter restrict_xpaths in a crawler

I have no Python experience, but I decided to try building something with Scrapy as a test. I'm trying to collect the existing articles on a given page, namely in a DIV element with the ID devBody. My goal is to get the title of t...
asked by 10.03.2016 / 20:15
1
answer

Redirecting page indexing

I have a site with several pages, some of which are not being indexed by Google. The pages that Google does not index cannot be accessed unless a certain option is chosen first. For example:...
asked by 30.10.2018 / 20:46
1
answer

How to collect text that has no HTML class to reference - Python Crawler

I have the following situation: I want to collect the "Crawler Text" below. How do I navigate to it without a class or id? <td>Texto para crawler</td>
asked by 15.04.2017 / 00:09
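One possible approach to the situation above is to select the cell by tag name and position rather than by class or id. The following is only a minimal sketch using Python's standard `html.parser` module; the class name `CellTextCollector`, the helper `td_texts`, and the table markup wrapped around the `<td>` are assumptions for illustration and do not come from the question.

```python
from html.parser import HTMLParser


class CellTextCollector(HTMLParser):
    """Collects the text of every <td> element, in document order."""

    def __init__(self):
        super().__init__()
        self.cells = []       # text of each completed <td>
        self._in_td = False   # True while the parser is inside a <td>
        self._parts = []      # text fragments of the current <td>

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_td = True
            self._parts = []

    def handle_data(self, data):
        if self._in_td:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._in_td:
            self.cells.append("".join(self._parts).strip())
            self._in_td = False


def td_texts(html):
    """Return the text of all <td> cells found in the given HTML string."""
    parser = CellTextCollector()
    parser.feed(html)
    parser.close()
    return parser.cells


# Hypothetical page fragment; only the first <td> comes from the question.
sample = "<table><tr><td>Texto para crawler</td><td>other</td></tr></table>"
print(td_texts(sample)[0])  # Texto para crawler
```

Selecting by position is fragile if the page layout changes, so on a real page it may be safer to anchor on a nearby element that does have a stable attribute.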
1
answer

How to scrape a page that relies on JavaScript, using Python?

I need to scrape a page, but the page's entry screen has a button (apparently JavaScript) that gives access to the full content of the page. Using the traditional libs (urllib2, requests, BeautifulSoup) I cannot "pull" the content I nee...
asked by 09.03.2017 / 15:46
1
answer

PHP crawlers for external sites: PHPCrawl API

Good evening, everyone. I am new to the subject and I am trying to build a search engine (indexer) for external sites with PHP. I found an API that provides a crawler, but it seems to search only within one specific site, the API...
asked by 09.12.2015 / 15:14
1
answer

Crawler for when http_status_code is different from 200

I'm building a mini crawler in PHP, using a library called "PHPCrawl" for the crawling and the library "simple_html_dom_parser" for the HTML parsing. The problem is: simple_html_dom cannot parse when http_status_code is different from...
asked by 10.02.2015 / 14:54
1
answer

index(find) + len(find) ValueError: substring not found - Python Crawler

Hi everyone, I need help with Python code that looks up a result on the internet. Python 3.6. The first one, for Bitcoin, worked; the second one fails with an error. from urllib import request url = request.urlopen("https://dolarhoje.com/bitc...
asked by 30.12.2018 / 19:05
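The error in this title is what `str.index` raises when the searched marker does not occur in the text, which can happen when a second page uses different markup from the first. A minimal sketch of a guarded extraction follows; the helper name `extract_between` and the sample strings are hypothetical, since the question's actual markers are not visible in the excerpt.

```python
def extract_between(text, start_marker, end_marker):
    """Return the text between two markers, or None if either is missing."""
    start = text.find(start_marker)    # find returns -1 instead of raising
    if start == -1:
        return None
    start += len(start_marker)         # skip past the marker itself
    end = text.find(end_marker, start)
    if end == -1:
        return None
    return text[start:end]


# Hypothetical page fragment, not taken from the question.
page = '<span id="price">123</span>'

print(extract_between(page, '<span id="price">', '</span>'))  # 123
print(extract_between(page, '<span id="other">', '</span>'))  # None

# The unguarded version fails the way the title describes.
try:
    page.index('<span id="other">')
except ValueError as error:
    print(error)  # substring not found
```

Returning `None` lets the caller decide how to handle a page whose markup does not match, instead of stopping the whole crawl.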
0
answers

Problem with crawler

I'm trying to make a simple crawler that fetches the temperature, just for study. I'm using simple_html_dom to read the page, but the file_get_html function for the link above produces some errors, and I don't understand why. The code I...
asked by 27.07.2018 / 06:57