How to get the Olympics headlines from CNN's website using Python and BeautifulSoup?

1

I would like an example of how to get the Olympics headlines from the page at link, using BeautifulSoup.

    
asked by anonymous 07.08.2016 / 16:07

1 answer

3

The key is to inspect the HTML returned by the GET request and identify the elements you want. In this case we want every <span> with the class cd__headline-text, which I assume is what you mean by 'headlines'. You can do it like this:

from bs4 import BeautifulSoup
import requests

req = requests.get('http://edition.cnn.com/sport/olympics')
soup = BeautifulSoup(req.text, 'html.parser')  # req.text = the returned HTML
# search the HTML for the spans described above; the result is a list of every matching element
manchetes_html = soup.find_all('span', {'class': 'cd__headline-text'})
manchetes = ''  # our future string of headlines
for manchete in manchetes_html:
    manchetes += '{}\n'.format(manchete.text)
print(manchetes)
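If you want to check the parsing step without hitting the network, here is a minimal sketch of the same idea run against a hard-coded snippet. The `extract_headlines` helper and the `SAMPLE_HTML` markup are made up for illustration; only the `cd__headline-text` class comes from the page discussed above, and the live page's markup may differ or change over time.

```python
from bs4 import BeautifulSoup


def extract_headlines(html):
    """Return the stripped text of every <span class="cd__headline-text">."""
    soup = BeautifulSoup(html, "html.parser")
    spans = soup.find_all("span", class_="cd__headline-text")
    return [span.get_text(strip=True) for span in spans]


# Hypothetical markup, invented for this example (not copied from CNN).
SAMPLE_HTML = """
<div>
  <h3><a href="/a"><span class="cd__headline-text">First headline</span></a></h3>
  <h3><a href="/b"><span class="cd__headline-text"> Second headline </span></a></h3>
  <span class="other">Not a headline</span>
</div>
"""

headlines = extract_headlines(SAMPLE_HTML)
print("\n".join(headlines))
```

Returning a list instead of building one big string makes it easier to filter or count the headlines later; to use it on the real page you would pass `req.text` from the request above to the helper.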


    
07.08.2016 / 16:32