Scraping data using Robobrowser


I'm trying to scrape a form, attach a file, and submit it, using RoboBrowser.

To open the page I do:

browser.open('url')
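For context, browser here would be a RoboBrowser instance; a minimal setup sketch ('url' is a placeholder, as in the snippets below):

from robobrowser import RoboBrowser

# 'html.parser' is Python's built-in parser; lxml also works if installed
browser = RoboBrowser(parser='html.parser')
browser.open('url')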

To get the form I do:

form = browser.get_form(id='id_form')

To insert the data into the form I do:

form['data_dia'] = '25'  # for example

To submit the form I do:

browser.submit_form(form, form['btnEnviar'])

or just

browser.submit_form(form)

But this is not working: the form is not being submitted. When I fetch all the inputs on the page, I find that the submit button is not among those returned by RoboBrowser.

Doing,

todos_inputs = browser.find_all('input')

for t in todos_inputs:
    print(t)

I do not get the input tag with id 'btnEnviar', even though in the page's HTML it sits inside the form. The other form inputs do show up, such as the day, month, and year fields.
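As an aside, besides scanning every input tag you can also inspect what RoboBrowser actually parsed into the form object; a sketch, assuming the form from above (fields and submit_fields are the form's parsed field collections):

form = browser.get_form(id='id_form')

# If 'btnEnviar' does not appear here, it was not in the downloaded HTML
print(list(form.fields.keys()))
print(list(form.submit_fields.keys()))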

I did not post the HTML code because the page requires a login and password to access.

The problem is that RoboBrowser is not able to scrape all the information in the HTML, only part of it, so I cannot submit the form. Is there a solution for this? Or is there another way to fill out and submit a form with tools other than RoboBrowser and BeautifulSoup?

asked by anonymous 12.12.2018 / 16:40

1 answer


RoboBrowser is a module that combines requests, to download pages, with BeautifulSoup, to parse them.

Your problem is that the button you want to click probably does not even exist on the page! It is very likely that the pages of this site, like many others on the internet, arrive incomplete, without all of their elements; the missing elements are then inserted into the page by JavaScript code that runs in your browser after it loads.

So when you inspect the page's code in your browser, the JavaScript has already executed and filled in the elements dynamically, which is why you find the button there. Since BeautifulSoup does not execute JavaScript, the button simply does not exist in the tree it parsed in memory when your script runs.
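A quick way to confirm this diagnosis: download the raw HTML yourself, exactly as the server sends it, and check whether the button's id appears in it at all. A sketch; 'url' is a placeholder, and you would need to reuse your logged-in cookies since the page requires authentication:

import requests

resp = requests.get('url')  # placeholder URL; pass cookies/auth as needed
print("'btnEnviar' in raw HTML:", 'btnEnviar' in resp.text)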

This is very common in today's highly dynamic web pages. That leaves you with two options:

  • Analyze the page's JavaScript and find out where it creates the button, and what the button does when clicked. You can read and follow the JavaScript code manually until you find a way to mimic what it does: which request it sends, which parameters it passes, and so on. Then write Python code that simulates those actions (see the first sketch after this list). It is not an easy task, but the resulting code would be quite efficient, since it would be pure Python without having to open a real browser, which is the second option:

  • Use a real browser, which does run JavaScript. The Selenium library lets you open and control a real browser window from your script (see the second sketch after this list). Since the page opens in a browser, the JavaScript runs and you can click the button. The downside is that opening a browser is heavy and slow, and it loads many unnecessary elements and images along the way, so it is not as efficient as talking to the server directly.
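For the first option, a minimal sketch of what mimicking the button usually looks like: watch the browser's Network tab while clicking the button, then reproduce the request it fires with requests. The endpoint and the extra field names below are hypothetical placeholders; only 'data_dia' comes from the question.

import requests

session = requests.Session()
# ... authenticate first, so the session carries the login cookies ...

# Hypothetical endpoint and payload, discovered by watching the
# Network tab while clicking the button
payload = {
    'data_dia': '25',
    'data_mes': '12',    # hypothetical field name
    'data_ano': '2018',  # hypothetical field name
}
resp = session.post('https://example.com/enviar', data=payload)
print(resp.status_code)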
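For the second option, a minimal Selenium sketch (assuming the selenium package and a matching browser driver are installed; the URL is a placeholder and the element ids are taken from the question):

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()  # or webdriver.Chrome()
driver.get('url')  # placeholder URL

# A real browser runs the page's JavaScript, so the button exists here
driver.find_element(By.ID, 'data_dia').send_keys('25')
driver.find_element(By.ID, 'btnEnviar').click()

driver.quit()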

answered 12.12.2018 / 16:51