How to collect text when it has no reference HTML class - Python Crawler

Question

I have the following situation:

I want to collect the text inside the td below ("Texto para crawler"). How do I navigate to it when it has no class or id?

<td>Texto para crawler</td>

python-3.x web-crawler web-scraping scraping

asked by anonymous 15.04.2017 / 00:09

1 answer

score 1 · Accepted Answer

You can use the BeautifulSoup library together with its .find_all() method to extract all td tags from a page without specifying any class, name, or id, for example.

Code:

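from bs4 import BeautifulSoup
import requests

# Page to scrape
url = 'https://en.wikipedia.org/wiki/Web_scraping'

# Download the page and keep its HTML as a string
html_page = requests.get(url)
html_source = html_page.text

# Parse the HTML with Python's built-in parser
soup = BeautifulSoup(html_source, 'html.parser')

# Find every <td> tag, regardless of class, name, or id
td_tags = soup.find_all('td')

# Print each tag (markup included), followed by a blank line
for td in td_tags:
    print(td, '\n')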

Output:

<td class="mbox-image"><div style="width:52px"><img alt="Globe icon." data-file-height="290" data-file-width="350" height="40" src="//upload.wikimedia.org/wikipedia/commons/thumb/b/bd/Ambox_globe_content.svg/48px-Ambox_globe_content.svg.png" srcset="//upload.wikimedia.org/wikipedia/commons/thumb/b/bd/Ambox_globe_content.svg/73px-Ambox_globe_content.svg.png 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/b/bd/Ambox_globe_content.svg/97px-Ambox_globe_content.svg.png 2x" width="48"/></div></td> 

<td class="mbox-text"><span class="mbox-text-span">The examples and perspective in this article <b>deal primarily with the United States and do not represent a <a href="/wiki/Wikipedia:WikiProject_Countering_systemic_bias" title="Wikipedia:WikiProject Countering systemic bias">worldwide view</a> of the subject</b>. <span class="hide-when-compact">You may <a class="external text" href="//en.wikipedia.org/w/index.php?title=Web_scraping&amp;action=edit">improve this article</a>, discuss the issue on the <a href="/wiki/Talk:Web_scraping" title="Talk:Web scraping">talk page</a>, or <a href="/wiki/Wikipedia:Article_wizard" title="Wikipedia:Article wizard">create a new article</a>, as appropriate.</span> <small><i>(October 2015)</i></small> <small class="hide-when-compact"><i>(<a href="/wiki/Help:Maintenance_template_removal" title="Help:Maintenance template removal">Learn how and when to remove this template message</a>)</i></small></span></td>
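
The output above prints whole td tags, markup included. If only the text is needed, as in the question, a minimal sketch along the following lines may help. It assumes a small inline HTML string (hypothetical, standing in for the real page, so no network request is made) and uses get_text() to pull the text out of each td. The second list keeps only the cells that have neither a class nor an id, which is one possible way to narrow the results and not necessarily what the original page requires.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the real page (an assumption).
html_source = """
<table>
  <tr><td>Texto para crawler</td></tr>
  <tr><td class="other"> Another cell </td></tr>
</table>
"""

soup = BeautifulSoup(html_source, 'html.parser')

# get_text(strip=True) returns only the text inside each tag, without markup
# and without leading or trailing whitespace.
td_texts = [td.get_text(strip=True) for td in soup.find_all('td')]

# Keep only the cells that carry neither a class nor an id attribute.
plain_td_texts = [
    td.get_text(strip=True)
    for td in soup.find_all('td')
    if not td.has_attr('class') and not td.has_attr('id')
]

print(td_texts)
print(plain_td_texts)
```

On a real page, html_source would come from requests.get(url).text as in the answer's code. When many td tags match, the surrounding structure (for example the parent table or row) usually has to be used to pick out the right one.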