How do I remove the comment tag along with its contents using bs4?
<div class="foo">
The Macaw is a flying animal.
<!--
<p>Animals
Name: Macaw
Age: 12 years and 9 months
Lifespan: 15 years
-->
</div>
Based on the answers to the question "Beautifulsoup 4: Remove comment tag and its content", you can use the extract() method to remove an element from the tree. To tell whether an element is a comment, check whether it is an instance of bs4.Comment.
from bs4 import BeautifulSoup, Comment
html = """<div class="foo">
The Macaw is a flying animal.
<!--
<p>Animals
Name: Macaw
Age: 12 years and 9 months
Lifespan: 15 years
-->
</div>"""
soup = BeautifulSoup(html, 'html.parser')
div = soup.find('div', class_='foo')
# Detach every string inside the div that is a Comment
for element in div.find_all(string=lambda it: isinstance(it, Comment)):
    element.extract()
print(soup.prettify())
The output will be:
<div class="foo">
 The Macaw is a flying animal.
</div>
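The same idea can be wrapped in a reusable function that cleans the whole document rather than a single div. This is a minimal sketch, assuming bs4 4.4 or later (for the string= argument of find_all()) and the built-in html.parser backend; strip_comments is a hypothetical helper name, not part of BeautifulSoup. It also relies on the fact that extract() returns the element it detached, so the removed comments can be kept for inspection.

```python
from bs4 import BeautifulSoup, Comment


# strip_comments is a hypothetical helper name, not part of bs4
def strip_comments(html: str) -> tuple[str, list[str]]:
    """Remove every HTML comment from the whole document.

    Returns the cleaned markup and the text of each removed comment.
    """
    soup = BeautifulSoup(html, "html.parser")
    removed = []
    # find_all() builds its full result list before the loop starts,
    # so detaching nodes inside the loop is safe
    for comment in soup.find_all(string=lambda s: isinstance(s, Comment)):
        # extract() detaches the node from the tree and returns it
        removed.append(str(comment.extract()))
    return str(soup), removed


html = """<div class="foo">
The Macaw is a flying animal.
<!--
<p>Animals
Name: Macaw
Age: 12 years and 9 months
Lifespan: 15 years
-->
</div>"""

cleaned, removed = strip_comments(html)
print(cleaned)
print(len(removed))  # 1
```

Calling find_all() on soup instead of on the div widens the scope to every comment in the document; call it on a specific tag to restrict it, as the answer does.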
I found a simpler solution based on the answer to the question "How to find all comments with Beautiful Soup".
First, import BeautifulSoup together with the Comment class:
from bs4 import BeautifulSoup, Comment
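Second, parse the document and use the code below to extract the comments:
# html is the markup from the question
soup = BeautifulSoup(html, 'html.parser')
for comment in soup.find_all(string=lambda text: isinstance(text, Comment)):
    comment.extract()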
If you only want the content of the foo div:
div = soup.find('div', class_='foo')
print(div.get_text(strip=True))
Result:
The Macaw is a flying animal.
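These steps can be folded into one helper that returns the div's text with its comments removed. This is a minimal sketch, assuming bs4 4.4 or later and the html.parser backend; div_text is a hypothetical name, and returning an empty string when no matching div exists is an arbitrary choice of this sketch, not something the answer specifies.

```python
from bs4 import BeautifulSoup, Comment


# div_text is a hypothetical helper name, not part of bs4
def div_text(html: str, css_class: str) -> str:
    """Return the stripped text of the first <div> with the given class,
    after removing the HTML comments inside it."""
    soup = BeautifulSoup(html, "html.parser")
    div = soup.find("div", class_=css_class)
    if div is None:
        # Assumption of this sketch: a missing div yields an empty string
        return ""
    # Only comments inside this div are removed; the rest of the tree is untouched
    for comment in div.find_all(string=lambda s: isinstance(s, Comment)):
        comment.extract()
    # strip=True trims each string and drops the whitespace-only ones
    return div.get_text(strip=True)


html = """<div class="foo">
The Macaw is a flying animal.
<!--
<p>Animals
Name: Macaw
Age: 12 years and 9 months
Lifespan: 15 years
-->
</div>"""

print(div_text(html, "foo"))  # The Macaw is a flying animal.
```

Using get_text(strip=True) rather than .text avoids the leading and trailing newlines that surround the sentence in the original markup.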