Extracting data from a calendar with Python and Beautiful Soup (on Ubuntu-like Linux)

Friends,

I would like to extract data from a calendar:

link

The first step would be to have the program choose the time zone (-3:00 Buenos Aires) and click Submit Time Zone.

After clicking on Submit Time Zone, select the city (Rio de Janeiro) and click Get Calendar.

Only after these steps do I actually have access to the calendar and can start thinking about extracting the information.

I would like to get the event of the day.

For example, today is the 22nd, so it would print:

22 Apr 2017: Ekādaśī, K, 06:09, Śatabhiṣā

+ ŚUDDHA EKĀDAŚĪ VRATA: FASTING FOR Varūthinī EKADASI

I thought about using Python and Beautiful Soup, but I am open to suggestions.

Question: How can I make the program reach the calendar (after selecting the time zone and city automatically)?

I could not get beyond this:

from bs4 import BeautifulSoup
import requests

url = 'http://www.purebhakti.com/component/panjika'
header = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                        'AppleWebKit/537.36 (KHTML, like Gecko) '
                        'Chrome/51.0.2704.103 Safari/537.36'}



req = requests.get(url,headers= header)

html = req.text

soup = BeautifulSoup(html,'html.parser')
    
asked by anonymous 22.02.2017 / 12:27

1 answer

Try this:

import time

import requests
from bs4 import BeautifulSoup as bs

url_post = 'http://www.purebhakti.com/component/panjika'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'}
# Form fields for the time zone and city selection
payload = {'action': 2, 'timezone': 23, 'location': 'Rio de Janeiro, Brazil        043W15 22S54     -3.00'}

# Send the selection as a POST instead of clicking through the page
req = requests.post(url_post, headers=headers, data=payload)
soup = bs(req.text, 'html.parser')
eles = soup.select('tr td')
# Cells with a class attribute hold the date (inside <b>); the others hold the event
dates = (' '.join(d.select('b')[0].text.strip().split()) for d in eles if d.has_attr('class'))
events = (' '.join(d.text.split()) for d in eles if not d.has_attr('class'))
calendar = dict(zip(dates, events))

# Build today's key in the same 'DD Mon YYYY' format (date taken in UTC)
data_hoje = time.strftime("%d %b %Y", time.gmtime())
print(calendar.get(data_hoje, 'no event for today'))

Output of the last print (today, Feb 22, 2017):

  

Ekādaśī, K, 05:46, Purvāṣāḍhā + ŚUDDHA EKĀDAŚĪ VRATA: FASTING FOR Vijaya EKADASI
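One thing to watch: the code above builds the lookup key with time.gmtime(), so the date is the UTC date, and %b depends on the system locale. A minimal sketch of an alternative, assuming the calendar uses a fixed -3:00 offset and zero-padded 'DD Mon YYYY' keys (both are assumptions, and calendar_key, CALENDAR_TZ and MONTHS are names made up for this example):

```python
from datetime import datetime, timedelta, timezone

# Assumption: the calendar was requested for the -3:00 zone, so "today" is
# computed in that zone rather than in UTC.
CALENDAR_TZ = timezone(timedelta(hours=-3))

# English month abbreviations, spelled out so the key does not depend on the
# system locale the way strftime('%b') does.
MONTHS = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']


def calendar_key(moment):
    """Format an aware datetime as 'DD Mon YYYY' in the calendar's time zone."""
    local = moment.astimezone(CALENDAR_TZ)
    return '{:02d} {} {}'.format(local.day, MONTHS[local.month - 1], local.year)


if __name__ == '__main__':
    # Key for the current moment, converted from UTC to the calendar's zone
    print(calendar_key(datetime.now(timezone.utc)))
```

The result can be used in place of data_hoje when looking up the dictionary. If the page prints single-digit days without the leading zero, the format string would need to change accordingly.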

We need to pay close attention to the HTML elements we want. In this case we want the <td> elements: a <td> that has a class attribute holds a date, and a <td> without one holds the event (the corresponding value).

So the keys of our dictionary are the text inside the <b>, which in turn is inside a <td> that has the class attribute, and the values are the text of the <td> elements without a class.
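To see the date/event pairing without hitting the site, here is a small sketch that runs on a static snippet. The markup in SAMPLE_HTML is invented for illustration (the real page's class names and layout may differ), and build_calendar is a hypothetical helper. It is also a variant of the approach above: instead of zipping two generators, it walks the cells in order and attaches each event to the date cell seen just before it.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, invented for illustration only. The second row uses
# placeholder text, not real calendar data.
SAMPLE_HTML = """
<table>
  <tr><td class="date"><b>22 Feb 2017</b></td>
      <td>Ekādaśī, K, 05:46, Purvāṣāḍhā</td></tr>
  <tr><td class="date"><b>23   Feb 2017</b></td>
      <td>placeholder   event text</td></tr>
</table>
"""


def build_calendar(html):
    """Map each date cell's <b> text to the text of the cell that follows it."""
    soup = BeautifulSoup(html, 'html.parser')
    calendar = {}
    current_date = None
    for cell in soup.select('tr td'):
        if cell.has_attr('class'):
            # Date cell: the key is the whitespace-normalized text of its <b>
            bold = cell.find('b')
            if bold is not None:
                current_date = ' '.join(bold.get_text().split())
        elif current_date is not None:
            # Event cell: store it under the most recent date, then reset
            calendar[current_date] = ' '.join(cell.get_text().split())
            current_date = None
    return calendar


if __name__ == '__main__':
    calendar = build_calendar(SAMPLE_HTML)
    print(calendar.get('22 Feb 2017', 'no event for today'))
```

Walking the cells in order keeps dates and events aligned even if some cell without a class turns out not to be an event, which the zip version would silently mispair.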

    
22.02.2017 / 13:43