Remove a comment tag and its contents in BeautifulSoup 4

1

How do I remove the comment tag along with its contents with bs4?

<div class="foo">
A Arara é um animal voador.
<!-- 
<p>Animais
Nome: Arara
Idade: 12 anos e 9 meses
Tempo de Vida: 15 anos
-->

</div>
    
asked by anonymous 27.12.2018 / 17:56

3 answers

2

Based on the answers to the question "Beautifulsoup 4: Remove comment tag and its content", you can use extract() to remove an item from the tree. To tell whether an item is a comment, check whether it is an instance of bs4.Comment.

from bs4 import BeautifulSoup, Comment

html = """<div class="foo">
A Arara é um animal voador.
<!-- 
<p>Animais
Nome: Arara
Idade: 12 anos e nove meses
Tempo de Vida: 15 anos
-->

</div>"""

soup = BeautifulSoup(html, 'html.parser')

div = soup.find('div', class_='foo')
# string= is the current name of the older text= argument (bs4 4.4.0 and later)
for element in div(string=lambda it: isinstance(it, Comment)):
    element.extract()

print(soup.prettify())

The output will be:

<div class="foo">
 A Arara é um animal voador.
</div>
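
As a minimal sketch of the same idea, the loop can be wrapped in a small helper so that it applies to any tag or to the whole soup. The name strip_comments and the sample HTML are assumptions made for illustration; they are not part of bs4 or of the question.

```python
from bs4 import BeautifulSoup, Comment


def strip_comments(node):
    """Remove every comment under `node` in place; return how many were removed."""
    # find_all returns a list, so the tree can be modified safely while looping
    comments = node.find_all(string=lambda s: isinstance(s, Comment))
    for comment in comments:
        comment.extract()
    return len(comments)


# Hypothetical sample: one comment inside the div, one outside it
html = '<div class="foo">kept<!-- dropped --></div><p>x<!-- still here --></p>'
soup = BeautifulSoup(html, "html.parser")

# Only the div is searched, so the comment inside <p> is left alone
removed = strip_comments(soup.find("div", class_="foo"))
result = str(soup)
print(removed)
print(result)
```

Passing soup instead of the div would remove the comments from the entire document.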
    
27.12.2018 / 18:02
1

I found a simpler solution based on the answer to the question "How to find all comments with Beautiful Soup".

First, import BeautifulSoup and the Comment class.

from bs4 import BeautifulSoup, Comment

Second, use the code below to extract the comments:

# soup is an existing BeautifulSoup object; find_all is the current name of findAll
for comment in soup.find_all(string=lambda text: isinstance(text, Comment)):
    comment.extract()
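
The snippet above assumes that soup already exists. A self-contained sketch, using a shortened copy of the HTML from the question plus one extra comment added here for illustration, might look like this:

```python
from bs4 import BeautifulSoup, Comment

# Shortened HTML from the question; the trailing comment is an added assumption
html = """<div class="foo">
A Arara é um animal voador.
<!-- 
<p>Animais
Nome: Arara
-->
</div>
<!-- a second comment outside the div -->"""

soup = BeautifulSoup(html, "html.parser")

# Searching from the root matches comments anywhere in the document;
# extract() detaches each comment from the tree and returns it
removed = [c.extract() for c in soup.find_all(string=lambda s: isinstance(s, Comment))]

# A second search should find no comments left
remaining = soup.find_all(string=lambda s: isinstance(s, Comment))
print(len(removed), len(remaining))
```

Because extract() returns the detached node, the removed list still holds the comment text in case it needs to be inspected or logged.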
    
27.12.2018 / 18:43
0

If you only want the text content of the foo div:

div = soup.find('div', class_='foo')
print(div.text)

Result

A Arara é um animal voador.
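
Reading the text this way leaves the comment in the tree; it is only skipped when the text is collected. A short sketch, assuming the same div with a shortened comment and using get_text(strip=True) to trim the surrounding whitespace:

```python
from bs4 import BeautifulSoup

# Shortened copy of the HTML from the question
html = """<div class="foo">
A Arara é um animal voador.
<!-- 
<p>Animais
Nome: Arara
-->

</div>"""

soup = BeautifulSoup(html, "html.parser")
div = soup.find("div", class_="foo")

# get_text(strip=True) skips comment nodes and strips each text fragment
text = div.get_text(strip=True)
print(text)

# The comment is still part of the tree; only the text view leaves it out
comment_still_present = "<!--" in str(div)
print(comment_still_present)
```

If the comment must also disappear from the serialized HTML, it still has to be removed with extract().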

    
27.12.2018 / 18:21