Questions tagged as 'web-scraping'

1
answer

Extracting data from a calendar with Python and BeautifulSoup (on an Ubuntu-like Linux)

Friends, I would like to get data from a calendar: link. The first step would be to have the program choose the time zone (-3:00 Buenos Aires) and click Submit Time Zone. After clicking Submit Time Zone, select the city (Rio de Janeir...
asked by 22.02.2017 / 12:27
1
answer

How to handle errors during web scraping?

Hello everyone. During web scraping, I began to encounter some errors that occur while making requests. I have now identified the 4 most common types of errors: Error in curl::curl_fetch_memory(url, handle = handle) :...
asked by 17.12.2017 / 17:34
1
answer

POST function of the httr package returns NA

I'm trying to write a script in R to send a POST to the site: link, but I'm not having any success. The goal is to extract the data table generated after the data update. Everything seems to be fine, but the POST function (or even the GET) of th...
asked by 02.04.2017 / 05:36
1
answer

At which stage should the data be edited?

I am currently extracting data from a website, with data in English, through web scraping. For example, if you want to translate the names or values of the fields into Portuguese, or expand abbreviations, the most appropriate approach is:...
asked by 22.01.2017 / 21:35
0
answers

Simultaneous threads (parallel processing) in R and serialized writing in SQLite

Hello everyone. I'm trying to develop code that parses HTML files in R and then writes the extracted data to a SQLite database in a serialized way. In order to perform parallel processin...
asked by 21.10.2018 / 21:22
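
One common way to combine parallel parsing with serialized database writes is to let the workers do only the parsing and keep every SQLite write in a single thread. The sketch below illustrates that pattern in Python rather than R, using only the standard library. The `parse_title` helper, the `pages` table, and the sample HTML strings are hypothetical stand-ins and do not come from the question.

```python
import re
import sqlite3
from concurrent.futures import ThreadPoolExecutor


def parse_title(html):
    """Hypothetical parser: return the text of the first <title> tag, or None."""
    match = re.search(r"<title>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
    return match.group(1).strip() if match else None


def parse_and_store(documents, connection, max_workers=4):
    """Parse documents in parallel, then write the rows from this thread only."""
    connection.execute("CREATE TABLE IF NOT EXISTS pages (title TEXT)")
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order; workers never touch the database.
        titles = list(pool.map(parse_title, documents))
    with connection:  # one transaction, committed on success
        connection.executemany(
            "INSERT INTO pages (title) VALUES (?)", [(t,) for t in titles]
        )
    return len(titles)


# Invented sample input, held in memory so the sketch runs without files.
sample_documents = [
    "<html><title>First page</title></html>",
    "<html><title> Second page </title></html>",
    "<html><body>No title here</body></html>",
]
db = sqlite3.connect(":memory:")
stored = parse_and_store(sample_documents, db)
rows = db.execute("SELECT title FROM pages ORDER BY rowid").fetchall()
```

Because only one thread ever writes, the database never sees concurrent writers. For CPU-heavy parsing, a process pool would likely be the closer analogue to parallel workers in R.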
0
answers

Scraping with R - xpathSApply returning a list of 0

I'm learning to read XML data in R. I would like to extract information about Brazilian football (name of the championship, game principal, result, etc.) from this site: link, using the XML package. My code looks like this: [1] fileUr...
asked by 02.11.2017 / 13:01
0
answers

Error submitting form

Good afternoon, I have code that works for some forms on the web and I'm trying to reuse it on this site: link. However, my code cannot find the form. Here are the code and the response: #library's require(RCurl) require(XML) require...
asked by 28.03.2017 / 20:57
2
answers

Organize data flow by string pattern

Friends, I'm working on a scraping project. At some point, I get a table on the screen in the form of a giant string, something like this: list = ('0004434-48.2010 \n UNION \n (30 business days) 07/07/2017 \n 13/07/2017 \n 0008767-77.2013...
asked by 17.07.2017 / 18:16
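
A minimal sketch of one way to organize such a string, assuming the separators in the excerpt are newline characters and that every record begins with an identifier shaped like `0004434-48.2010`. The regular expression, the `group_records` helper, and the fields after the second identifier are assumptions made for illustration; the excerpt is truncated, so the real data may differ.

```python
import re

# Assumed shape of a record identifier, e.g. "0004434-48.2010".
RECORD_ID = re.compile(r"^\d{7}-\d{2}\.\d{4}$")


def group_records(raw):
    """Split a newline-separated string into lists, one per record."""
    records = []
    for field in (part.strip() for part in raw.split("\n")):
        if not field:
            continue  # skip empty fragments between separators
        if RECORD_ID.match(field) or not records:
            records.append([field])  # a new record starts here
        else:
            records[-1].append(field)
    return records


# The first record mirrors the excerpt; the second one's fields are invented.
sample = (
    "0004434-48.2010 \n UNION \n (30 business days) 07/07/2017 \n 13/07/2017 \n "
    "0008767-77.2013 \n EXAMPLE PARTY \n 01/08/2017"
)
records = group_records(sample)
```

Each inner list can then be turned into a row of a table, for example by passing `records` to a CSV writer.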
1
answer

How to get the headlines for the Olympics on CNN's website using Python and BeautifulSoup?

I would like an example of how to get the headlines for the Olympics at link using BeautifulSoup.
asked by 07.08.2016 / 16:07
1
answer

How to avoid ConnectionError on large scrapes?

In Python 3, I have a program to do web-scraping of tables on websites. There are 5,299 pages, and each page has a table. Using XHR, I found the JSON generated for each page. But I always get a connection error after the program scrapes a few pa...
asked by 13.04.2018 / 12:02
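
A frequent mitigation for intermittent connection errors on long scrapes is to retry the failed request after a growing pause. The sketch below shows that idea with only the standard library. The `fetch_with_retry` and `flaky_fetch` names, the URL, and the retry settings are hypothetical, and the code catches Python's built-in `ConnectionError`; if the program uses a library such as requests, its own exception class (`requests.exceptions.ConnectionError`) would be the one to catch.

```python
import time


def fetch_with_retry(fetch, url, retries=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url), retrying on ConnectionError with exponential backoff."""
    for attempt in range(retries + 1):
        try:
            return fetch(url)
        except ConnectionError:
            if attempt == retries:
                raise  # out of retries: let the caller see the error
            sleep(base_delay * 2 ** attempt)  # pause doubles on each failure


# Demonstration with a hypothetical fetcher that fails twice, then succeeds.
calls = {"count": 0}
delays = []


def flaky_fetch(url):
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated dropped connection")
    return {"url": url, "rows": []}


# Passing delays.append as the sleep function records the pauses without waiting.
result = fetch_with_retry(
    flaky_fetch, "https://example.invalid/page/1", sleep=delays.append
)
```

In a real run the default `time.sleep` applies the pauses. Adding a short fixed delay between pages may also reduce how often the server drops the connection.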