How to get the Olympics headlines from CNN's website using Python and BeautifulSoup?

Question

I would like an example of how to get the Olympics headlines from the linked CNN page using BeautifulSoup.

python web-scraping

asked by anonymous 07.08.2016 / 16:07

1 answer

score 3 · Accepted Answer

The key is to inspect the HTML returned by the GET request and identify the elements you want. In this case we want every <span> that has the class cd__headline-text; I assume that is what you mean by "headlines". You can do this:

from bs4 import BeautifulSoup
import requests

req = requests.get('http://edition.cnn.com/sport/olympics')
soup = BeautifulSoup(req.text, 'html.parser')  # req.text = the returned HTML
# Search the HTML for the spans described above; the result is a list of every matching element
manchetes_html = soup.find_all('span', {'class': 'cd__headline-text'})
manchetes = ''  # will hold the headlines, one per line
for manchete in manchetes_html:
    manchetes += '{}\n'.format(manchete.text)
print(manchetes)
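
To try the extraction step without hitting the network, here is a minimal sketch that runs the same search against a small hand-written HTML sample. The sample markup and the helper name extract_headlines are assumptions made up for illustration; only the span tag and the cd__headline-text class come from the answer above.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, shaped like the spans the answer targets.
SAMPLE_HTML = """
<div>
  <span class="cd__headline-text">First headline</span>
  <span class="cd__headline-text other">  Second headline </span>
  <span class="unrelated">Not a headline</span>
</div>
"""


def extract_headlines(html, css_class='cd__headline-text'):
    """Return the stripped text of every <span> carrying css_class."""
    soup = BeautifulSoup(html, 'html.parser')
    # class_ matches any element whose class list contains css_class
    spans = soup.find_all('span', class_=css_class)
    return [span.get_text(strip=True) for span in spans]


headlines = extract_headlines(SAMPLE_HTML)
print('\n'.join(headlines))
```

With the live page you would pass req.text from the snippet above instead of SAMPLE_HTML. Returning a list rather than one long string makes it easier to count, filter, or store the headlines afterwards. If the list comes back empty, the page's markup probably no longer uses that class name.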
