How do I get the data from a specific web page?

0

I want to create a script that gets data such as bugs, issues, and so on from the Spring Framework page. Unfortunately I have no code to show, because I really have no idea how to get this data.

My question is: how can I get the data shown on the page and generate a JSON file from it, preferably using Python or JavaScript?

    
asked by anonymous 17.09.2017 / 22:47

1 answer

2

It would be better to contact the site and ask whether it offers an API for this data. Scraping the site with a parser can generate many requests, and the site may respond by blocking your IP address. If scraping is the only option, in Python you can use the BeautifulSoup library. A very simple example:

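from bs4 import BeautifulSoup
import urllib.request

# Download the project's issues panel; the with-block closes the connection.
with urllib.request.urlopen(
        "https://jira.spring.io/browse/spr/?selectedTab=com.atlassian.jira.jira-projects-plugin:issues-panel") as fp:
    html = fp.read().decode("utf8")

soup = BeautifulSoup(html, 'html.parser')
# The status summary lives in the element with id="fragstatussummary".
table = soup.find(id='fragstatussummary')
if table is None:
    raise SystemExit('Status summary not found; the page layout may have changed.')

nome = table.h3  # heading of the summary block
print(nome.contents[0])
for linha in table.find_all('tr'):
    name = linha.a  # link holding the status name
    count = linha.find('td', class_='cell-type-collapsed')  # cell holding the issue count
    # Skip rows that lack either the link or the count cell.
    if name is not None and count is not None:
        print('{}: {}'.format(name.contents[0], count.contents[0]))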

Results:

    Status Summary
    Open: 1542
    In Progress: 12
    Reopened: 49
    Resolved: 4740
    Closed: 9299
    Waiting for Feedback: 45
    Investigating: 56
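
Since the question asks for a JSON file, here is a minimal sketch of that last step using only the standard `json` module. It assumes you collect each name and count as a pair of strings in a list instead of printing them. The helper names `build_summary` and `save_json`, the output file name `status_summary.json`, and the layout of the resulting dictionary are my own choices for illustration, not something the site or BeautifulSoup prescribes. The sample rows are copied from the results above.

```python
import json


def build_summary(title, rows):
    """Turn scraped (name, count) string pairs into a JSON-friendly dict."""
    statuses = {}
    for name, count in rows:
        # Scraped text may carry stray whitespace, and counts arrive as strings.
        statuses[name.strip()] = int(count.strip())
    return {"title": title.strip(), "statuses": statuses}


def save_json(data, path):
    """Write the dict to disk as indented UTF-8 JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2, ensure_ascii=False)


# Sample values taken from the results above; in the real script these
# pairs would be appended inside the loop instead of being printed.
rows = [("Open", "1542"), ("In Progress", "12"), ("Reopened", "49")]
summary = build_summary("Status Summary", rows)
save_json(summary, "status_summary.json")
```

Note that `int()` will raise a `ValueError` if a count ever contains something other than digits (a thousands separator, for example), so you may want to handle that case if the page format differs from what is shown here.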

    
18.09.2017 / 00:53