How to remove HTML tags from text in Python?

9

In PHP we have a function called strip_tags that removes HTML tags from a given text.

Example:

$text = "meu nome é <strong>Wallace</strong>";

strip_tags($text); // 'meu nome é Wallace'

How can I remove tags from text in Python?

    
asked by anonymous 23.03.2017 / 17:30

2 answers

10

An example with a regular expression would look like this:

import re

text = 'meu nome é <strong>Wallace</strong>'
text = re.sub('<[^>]+?>', '', text)
print(text)

The function re.sub() takes a regular expression as its first parameter, a replacement as its second, and the string to search as its third. It finds every snippet of the string that matches the expression and replaces it with the replacement; here the replacement is an empty string, so the tags are simply removed.
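To get something closer to PHP's strip_tags, the same re.sub() call can be wrapped in a small helper. This is only a minimal sketch: the name strip_tags and the precompiled TAG_PATTERN are illustrative choices, not part of any Python library.

```python
import re

# Hypothetical helper, named after PHP's strip_tags for familiarity.
# The pattern matches "<", then one or more characters that are not ">", then ">".
TAG_PATTERN = re.compile(r'<[^>]+>')


def strip_tags(text):
    """Remove every substring that looks like an HTML tag."""
    return TAG_PATTERN.sub('', text)


print(strip_tags('meu nome é <strong>Wallace</strong>'))  # meu nome é Wallace
```

Keep in mind that a regular expression does not really parse HTML. A ">" inside an attribute value, or plain text such as "1 < 2 and 3 > 2", will confuse this pattern, so it is best suited to simple, well-behaved input.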

    
23.03.2017 / 17:39
10

There are several ways, but I don't think there is a better one than BeautifulSoup:

>>> from bs4 import BeautifulSoup as bs
>>> bs('<p>hey<span> brrh </span>lolol', 'html.parser').text
'hey brrh lolol'
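If installing a third-party package is not an option, a similar parser-based approach can be sketched with the standard library's html.parser module. Note that this is a swapped-in alternative, not BeautifulSoup itself, and the names _TextExtractor and strip_tags are hypothetical, chosen only for this illustration.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only the text nodes and ignores the tags themselves."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        # Called for every run of text found between tags.
        self.parts.append(data)


def strip_tags(html):
    """Hypothetical helper: return the text content of an HTML snippet."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered after the last tag
    return ''.join(parser.parts)


print(strip_tags('<p>hey<span> brrh </span>lolol'))  # hey brrh lolol
```

Like the BeautifulSoup call above, this tolerates unclosed tags. It also keeps the contents of script and style elements as text, so those would need extra handling if they matter.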
  

Note: to install it on Python 3.5, use pip:

pip install --upgrade beautifulsoup4

In-depth reading about BeautifulSoup

    
23.03.2017 / 17:35