How to get the Gmail page source using Python 3

I am accessing my email with this code, which I found and adapted:

import requests
from bs4 import BeautifulSoup

# Credentials to submit; the form's hidden fields are added inside login().
form_data = {'Email': '[email protected]', 'Passwd': 'senhaexemplo'}
post = "https://accounts.google.com/signin/challenge/sl/password"

def login():
    with requests.Session() as s:
        # Load the sign-in page and copy its remaining inputs into the payload.
        soup = BeautifulSoup(s.get("https://mail.google.com").text, "html.parser")
        for inp in soup.select("#gaia_loginform input[name]"):
            if inp["name"] not in form_data:
                form_data[inp["name"]] = inp.get("value", "")
        s.post(post, data=form_data)
        # Request the inbox page with the same session cookies.
        html = s.get("https://mail.google.com/mail/u/0/#inbox").text
        print(html)

login()

My goal is to fetch the emails and print them on the screen, with subject and content. I know how to do this by targeting certain tags in the HTML, but for that I need the page's source code. When I look at the result of print(html), it does not contain any tags; everything comes back compressed, something like this:

{\"1\":\"be_35\",\"53908043\":0},{\"1\":\"be_36\",\"53908043\":0},{\"1\":\"be_30\",\"53908043\":0},{\"1\":\"be_31\",\"53908043\":0},{\"1\":\"be_169\",\"53908043\":0},{\"1\":\"su_ltz\"},{\"1\":\"ic_sspvcd\"},{\"1\":\"bu_wdtfsm\"},{\"1\":\"be_26\",\"53908043\":0},{\"1\":\"be_29\",\"53908043\":0},{\"1\":\"be_280\",\"53908043\":0},{\"1\":\"be_281\",\"53908043\":0},{\"1\":\"30\",\"53908046\":0},{\"1\":\"31\",\"53908043\":0},{\"1\":\"32\",\"53908046\":0},{\"1\":\"33\",\"53908046\":0},{\"1\":\"be_277\",\"53908043\":0},{\"1\":\"34\",\"53908045\":\"\"},{\"1\":\"be_278\",\"53908043\":0},{\"1\":\"35\",\"53908046\":0},{\"1\":\"be_275\",\"53908043\":0},{\"1\":\"be_276\",\"53908043\":0},{\"1\":\"be_273\",\"53908043\":1},{\"1\":\"38\",\"83947487\":{}},{\"1\":\"se_192\",\"53908045\":\"en,es,pt,ja,fr\"},{\"1\":\"be_274\",\"53908043\":0},{\"1\":\"39\",\"53908046\":0}

How can I get the actual page source?

    
asked by anonymous 01.10.2018 / 02:54

1 answer

Not to rain on your parade, but sites that rely on AJAX do not return their content in the initial HTML; they generate it dynamically with JavaScript after the page loads. You would need a radically different approach, such as PhantomJS, which loads all of the page's auxiliary files, runs the JavaScript code, and then lets you parse the resulting DOM and extract the content.
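To make the point concrete, here is a minimal sketch using only the standard library. The STATIC_HTML string is a made-up stand-in, not Gmail's real response: it assumes a page whose server-side HTML holds an empty container plus data embedded in a script block, which is roughly what the output in the question looks like. The InboxProbe class and the "inbox" id are hypothetical names chosen for the illustration.

```python
from html.parser import HTMLParser

# Hypothetical stand-in for what a server might send for a JavaScript-driven
# page: an empty container plus data embedded in a <script> block.
STATIC_HTML = """
<html>
  <body>
    <div id="inbox"></div>
    <script>var DATA = "[{\\"1\\":\\"be_35\\",\\"53908043\\":0}]";</script>
  </body>
</html>
"""


class InboxProbe(HTMLParser):
    """Collects text found inside the inbox container and inside scripts."""

    def __init__(self):
        super().__init__()
        self.in_inbox = False
        self.in_script = False
        self.inbox_text = []
        self.script_text = []

    def handle_starttag(self, tag, attrs):
        if tag == "div" and dict(attrs).get("id") == "inbox":
            self.in_inbox = True
        elif tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "div":
            self.in_inbox = False
        elif tag == "script":
            self.in_script = False

    def handle_data(self, data):
        if self.in_script:
            self.script_text.append(data)
        elif self.in_inbox and data.strip():
            self.inbox_text.append(data.strip())


probe = InboxProbe()
probe.feed(STATIC_HTML)
probe.close()

# The container is empty in the static source; the data lives in the script.
print("inbox text:", probe.inbox_text)
print("script data:", probe.script_text)
```

The container yields no text because nothing has run the script that would fill it. A plain HTTP client such as requests only ever sees this static form, which is why a tool that executes JavaScript is needed to obtain the rendered DOM.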

    
01.10.2018 / 04:27