Questions tagged as 'web-scraping'

1
answer

Web scraping a specific URL with BeautifulSoup

from bs4 import BeautifulSoup import requests import re url = 'http://www.bhaktiyogapura.com/2017/03/calendario-vaisnava-marco-de-2017/' header = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) ' 'AppleWebKit/537...
asked on 11.03.2017 / 13:14
3
answers

Regular expression in Python 3.6 for extracting a whole phrase

I need to extract only the phrases of the form ADMINISTRATION - JUIZ DE FORA - NIGHT - SISU - GROUP B, for example. That is, I need to get only the course name, city, shift, SISU, and group name from the following string: string = "...
asked on 28.02.2017 / 17:33
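
A minimal sketch of one way to approach the question above, assuming every entry follows the "COURSE - CITY - SHIFT - SISU - GROUP X" layout quoted in the excerpt. The original string is truncated, so the sample text, the name `ENTRY_PATTERN`, and the helper `extract_entries` are all hypothetical.

```python
import re

# Hypothetical input: the real string in the question is truncated, so this
# sample only imitates the layout described in the excerpt.
SAMPLE_TEXT = (
    "Approved list: NURSING - CITY ONE - MORNING - SISU - GROUP A (score 712); "
    "LAW - CITY TWO - NIGHT - SISU - GROUP B (score 698)"
)

# Assumption: course and city are runs of ASCII capitals and spaces, the shift
# is one capitalised word, and the group is a single letter. Accented
# Portuguese text would need a wider character class than [A-Z].
ENTRY_PATTERN = re.compile(
    r"[A-Z][A-Z ]*? - [A-Z][A-Z ]*? - [A-Z]+ - SISU - GROUP [A-Z]"
)


def extract_entries(text):
    """Return every 'COURSE - CITY - SHIFT - SISU - GROUP X' phrase in text."""
    return [match.group(0) for match in ENTRY_PATTERN.finditer(text)]


entries = extract_entries(SAMPLE_TEXT)
for entry in entries:
    print(entry)
```

The lazy quantifiers (`*?`) keep each field from running past its own " - " separator, so multi-word course or city names are still captured whole.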
1
answer

Problem with VBA integration and Internet Explorer

I'm trying to use VBA to collect data directly from the internet. I have seen several examples of using the InternetExplorer Object, as below: Dim IE as Object Set IE = New InternetExplorer IE.navigate "http://www.minhapagina.com.br" html =...
asked on 29.07.2015 / 00:50
1
answer

Scraping parameters from a POST method with Scrapy in Python

I need to collect information from a site using spiders in Scrapy with Python. However, the site uses the POST method, and I am learning the language while developing the project. I found a template for POST but I'm not able to run it correctl...
asked on 07.05.2018 / 13:34
1
answer

BeautifulSoup - Real href links

I was studying web scraping with Python and started using the bs4 library (BeautifulSoup). When I started to get the a tags and their href attribute, I realized that I could not access the link if the href had something li...
asked on 07.11.2017 / 08:58
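
The excerpt above is cut off, so the exact form of the problematic href is unknown. Assuming the issue is relative links (an assumption, not something the excerpt confirms), a minimal sketch of resolving them against the page URL with the standard library's `urljoin` could look like this. It uses `html.parser` instead of bs4 to stay dependency-free; `LinkCollector`, `BASE_URL`, and `SAMPLE_HTML` are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # urljoin leaves absolute URLs untouched and resolves relative ones.
            self.links.append(urljoin(self.base_url, href))


# Hypothetical page address and markup, for illustration only.
BASE_URL = "http://example.com/blog/post.html"
SAMPLE_HTML = (
    '<a href="/about">About</a> '
    '<a href="next.html">Next</a> '
    '<a href="http://other.example/x">External</a>'
)

collector = LinkCollector(BASE_URL)
collector.feed(SAMPLE_HTML)
resolved_links = collector.links
print(resolved_links)
```

The same `urljoin(page_url, href)` call applies unchanged to href values read through bs4.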
1
answer

How to collect text when it has no reference HTML class - Python Crawler

I have the following situation: I want to collect the text "Texto para crawler" below. How do I navigate there without a class or id? <td>Texto para crawler</td>
asked on 15.04.2017 / 00:09
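
A minimal sketch for the situation above: when a cell has no class or id, it can still be selected by tag name and position. This version uses only the standard library's `html.parser`; the surrounding table markup and the names `CellTextCollector` and `SAMPLE_HTML` are assumptions, since the excerpt shows only the single `<td>`.

```python
from html.parser import HTMLParser


class CellTextCollector(HTMLParser):
    """Collect the text content of every <td> element, in document order."""

    def __init__(self):
        super().__init__()
        self._in_cell = False
        self.cells = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_cell = True
            self.cells.append("")  # start a new cell buffer

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.cells[-1] += data


# Hypothetical markup: only the second <td> comes from the question.
SAMPLE_HTML = "<table><tr><td>Label</td><td>Texto para crawler</td></tr></table>"

parser = CellTextCollector()
parser.feed(SAMPLE_HTML)
cell_texts = [cell.strip() for cell in parser.cells]
print(cell_texts[1])
```

With bs4, `soup.find_all("td")` selects by tag name in the same way, and the wanted cell is then picked by its index in the returned list.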
1
answer

How to scrape a page that uses JavaScript with Python?

I need to scrape a page, but its entry page has a button (apparently JavaScript) that gives access to the full content of the page. Using the traditional libs (urllib2, requests, BeautifulSoup) I cannot "pull" the content I nee...
asked on 09.03.2017 / 15:46
3
answers

Web scraping with Python on the CEF website, which runs JavaScript [closed]

CEF changed the way it displays lottery results on its site. Before, the results all arrived in the HTML and I could scrape them relatively easily with BeautifulSoup, but now they are rendered in the browser by JavaScript. I...
asked on 05.06.2018 / 22:50
1
answer

Web Crawler with Django view.py

I am building a simple web crawler with Django 2.0. I want to capture only the "title" class of the news items and then pass the result to a simple HTML template with "return render"; my view.py is below. At the moment I'm using "return HttpResponse". How can I get the data...
asked on 31.05.2018 / 03:08
0
answers

R - Download data from the Hidroweb portal

The National Water Agency (ANA) makes historical series of data from various monitoring stations available for download on the Hidroweb portal. I would like to automate the download of these historical series, however ANA has...
asked on 11.03.2018 / 02:41