How do I scrape a page that relies on JavaScript using Python?

1

I need to scrape a page, but its landing view has a button (apparently driven by JavaScript) that must be clicked to reveal the full content. With the usual libraries (urllib2, requests, BeautifulSoup) I cannot "pull" the content I need. Has anyone dealt with something like this?

    
asked by anonymous 09.03.2017 / 15:46

1 answer

1

I usually use Selenium for web scraping on sites that rely heavily on JavaScript. I mostly use Selenium with Java, but it works in Python too. Below is a simple but functional example.

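from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

# Selenium 4 style: the driver path is passed through a Service object
driver = webdriver.Chrome(service=Service('/path/to/chromedriver'))
driver.get("http://www.python.org")
assert "Python" in driver.title
# find_element_by_name was removed in Selenium 4; use find_element(By.NAME, ...)
elem = driver.find_element(By.NAME, "q")
elem.clear()
elem.send_keys("pycon")
elem.send_keys(Keys.RETURN)  # submit the search form
assert "No results found." not in driver.page_source
driver.quit()  # close the browser and stop the chromedriver process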

Note that to drive Chrome you need the ChromeDriver executable, which can be downloaded from the ChromeDriver site. The Python documentation for Selenium WebDriver is available on the Selenium documentation site.
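For the case in the question (a button that reveals the content), something along these lines might work. This is only a sketch: the function names `fetch_rendered_html` and `extract_paragraphs` and the CSS selectors passed to them are hypothetical, and it assumes Selenium 4 with a chromedriver that Selenium can locate on its own. The idea is to click the button, wait for the content to appear, and then parse `driver.page_source` with an ordinary HTML parser (the standard library one here, though BeautifulSoup would do as well).

```python
from html.parser import HTMLParser


class ParagraphText(HTMLParser):
    """Collects the text found inside <p> elements."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._in_p = False

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self._in_p = False

    def handle_data(self, data):
        # Text of nested tags such as <b> is also delivered here
        if self._in_p:
            self.paragraphs[-1] += data


def extract_paragraphs(html):
    """Return the stripped, non-empty text of each <p> in the HTML string."""
    parser = ParagraphText()
    parser.feed(html)
    parser.close()
    return [p.strip() for p in parser.paragraphs if p.strip()]


def fetch_rendered_html(url, button_css, content_css, timeout=10):
    """Open the page, click the button, wait for the content, return the HTML.

    button_css and content_css are hypothetical selectors that must be
    adapted to the real page.
    """
    # Imported here so the parsing helper works without Selenium installed
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    # Assumption: chromedriver can be found without an explicit path
    driver = webdriver.Chrome()
    try:
        driver.get(url)
        wait = WebDriverWait(driver, timeout)
        # Wait until the button can be clicked, then click it
        wait.until(
            EC.element_to_be_clickable((By.CSS_SELECTOR, button_css))
        ).click()
        # Wait until the revealed content is present in the DOM
        wait.until(
            EC.presence_of_element_located((By.CSS_SELECTOR, content_css))
        )
        return driver.page_source
    finally:
        driver.quit()
```

Usage would look like `extract_paragraphs(fetch_rendered_html(url, "#show-more", "#content"))`, where both selectors are placeholders. The explicit waits matter because the content is inserted by JavaScript after the click, so reading `page_source` immediately may return the page before it has changed.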

    
09.03.2017 / 20:04