Requests, BeautifulSoup Tables

Question

I have a website from which I want to extract specific data in a table.

I want to extract all the information from the rows that contain the "PROLONG" element.

My difficulty is that every table cell has the same class, class="field". How do I extract only the data related to the "PROLONG" element? This is the code I have so far:

import requests
from bs4 import BeautifulSoup

url = requests.get('http://www.praticagem-rj.com.br/')

soup = BeautifulSoup(url.text, "lxml")

list_return = soup.find_all('td', class_='field')

for list_dados in list_return:
    print(list_dados.next_element)

html python-3.x python-requests beautifulsoup

asked by anonymous 10.05.2018 / 01:21

1 answer

score 2 · Answer 1

The markup on this site is a mess. Your code walks through the page cell by cell, across all rows of the table, so consecutive items in list_return may or may not belong to the same table row.

What you need, then, is to take just the rows of interest and, within them, look for the ones that contain "PROLONG".
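To make the difference concrete, here is a minimal offline sketch. The markup in SAMPLE_HTML is a hypothetical stand-in, not a copy of the real page, and "html.parser" is used only so the snippet runs without lxml installed. It shows that searching for cells directly gives one flat list, while grouping by tr first keeps each row's cells together.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the real page's structure may differ.
SAMPLE_HTML = """
<table>
  <tr><td class="field">SHIP A</td><td class="field">PROLONG</td></tr>
  <tr><td class="field">SHIP B</td><td class="field">ARRIVAL</td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# One flat list of cells: nothing marks where one row ends and the next begins.
flat_cells = [
    td.get_text(strip=True) for td in soup.find_all("td", class_="field")
]
print(flat_cells)

# Grouping by <tr> first keeps the cells of each row together.
rows = [
    [td.get_text(strip=True) for td in tr.find_all("td", class_="field")]
    for tr in soup.find_all("tr")
]
print(rows)
```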

import requests
from bs4 import BeautifulSoup

url = requests.get('http://www.praticagem-rj.com.br/')

soup = BeautifulSoup(url.text, "lxml")
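# Select only the table rows, identified by their inline style
list_return = soup.find_all('tr', style="background: #e4e4e4;")
for tr in list_return:
    # Look inside this row for a cell whose text is exactly 'PROLONG'
    temp = tr.find('td', class_='field', text='PROLONG')
    if temp is not None:
        print(tr)
        print("-----")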

From the 'soup' object, I separated all rows belonging to the table into 'list_return'. The loop then checks each row for a cell whose text is 'PROLONG'. Rows without such a cell return None from the search; rows that have one are printed.
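If you want the cell values rather than the raw tag, the same idea can be sketched as a small function. This is only an illustration under assumptions: SAMPLE_HTML is hypothetical markup, rows_with_value is a made-up helper name, and the comparison is done on stripped cell text, so a cell padded with spaces still matches (an exact text='PROLONG' search would likely skip it).

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the real page's structure may differ.
SAMPLE_HTML = """
<table>
  <tr style="background: #e4e4e4;"><td class="field">SHIP A</td><td class="field">PROLONG</td></tr>
  <tr style="background: #e4e4e4;"><td class="field">SHIP B</td><td class="field">ARRIVAL</td></tr>
  <tr style="background: #e4e4e4;"><td class="field">SHIP C</td><td class="field"> PROLONG </td></tr>
</table>
"""


def rows_with_value(html, value):
    """Return the cell texts of every row that has a cell equal to `value`."""
    soup = BeautifulSoup(html, "html.parser")
    matches = []
    for tr in soup.find_all("tr"):
        # Collect the whitespace-stripped text of each cell in this row.
        cells = [
            td.get_text(strip=True) for td in tr.find_all("td", class_="field")
        ]
        if value in cells:
            matches.append(cells)
    return matches


prolong_rows = rows_with_value(SAMPLE_HTML, "PROLONG")
for cells in prolong_rows:
    print(cells)
```

Against the live site you would pass the downloaded page text in place of SAMPLE_HTML.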

PS: The key to finding things with BeautifulSoup is knowing what to look for, using the "inspect element" function in the browser. This is how I discovered how to isolate the table rows using style="background: #e4e4e4;"
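One caveat with matching on style: a plain string filter compares the whole attribute value, so a row written with different spacing would be missed. A possible workaround, sketched here on hypothetical markup, is to pass a compiled regular expression, which BeautifulSoup matches anywhere inside the attribute value.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical sample markup with the same color written two different ways.
SAMPLE_HTML = """
<table>
  <tr style="background: #e4e4e4;"><td class="field">row 1</td></tr>
  <tr style="background:#e4e4e4"><td class="field">row 2</td></tr>
  <tr style="background: #ffffff;"><td class="field">row 3</td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Exact string: only the row whose style is written exactly this way.
exact = soup.find_all("tr", style="background: #e4e4e4;")

# Regular expression: any row whose style mentions the color.
loose = soup.find_all("tr", style=re.compile(r"#e4e4e4", re.IGNORECASE))

print(len(exact), len(loose))
```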