In PHP we have a function called strip_tags that removes HTML tags from a given text.
Example:
$text = "my name is <strong>Wallace</strong>";
strip_tags($text); // 'my name is Wallace'
How can I remove HTML tags from a text in Python?
An example with regex would look like this:
import re

text = 'my name is <strong>Wallace</strong>'
# Replace every "<...>" sequence with an empty string
text = re.sub(r'<[^>]+>', '', text)
print(text)  # my name is Wallace
The re.sub() function takes a regular expression as its first parameter, searches the string given as the third parameter for substrings that match it, and replaces each match with the value given as the second parameter.
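The same idea can be wrapped in a small reusable helper. This is a minimal sketch, assuming that a tag is anything between "<" and the next ">"; the name strip_tags is hypothetical and only mirrors the PHP function. It also adds one extra step not in the snippet above: decoding entities such as &amp; with html.unescape after the tags are gone.

```python
import html
import re

# "<", then one or more characters that are not ">", then ">".
# Rough heuristic: it does not understand HTML comments, CDATA,
# or a ">" inside a quoted attribute value such as title="a>b".
TAG_RE = re.compile(r"<[^>]+>")


def strip_tags(text):
    """Hypothetical helper: drop tag-like sequences, then decode entities."""
    # re.sub(pattern, replacement, string) with a precompiled pattern:
    # every match in `text` is replaced by the empty string.
    without_tags = TAG_RE.sub("", text)
    # Decode only after stripping, so "&lt;b&gt;" stays text and is not
    # turned into a tag that would then be removed.
    return html.unescape(without_tags)


print(strip_tags("my name is <strong>Wallace</strong>"))
print(strip_tags("Tom &amp; Jerry <em>forever</em>"))
```

Because it is a regular expression and not a parser, this works for simple, well-formed snippets but can produce garbage on unusual markup.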
There are several ways, but I don't think there is a better one than BeautifulSoup:
>>> from bs4 import BeautifulSoup as bs
>>> bs('<p>hey<span> brrh </span>lolol', 'html.parser').text
'hey brrh lolol'
Note: to install it on Python 3.5, use pip: pip install --upgrade beautifulsoup4
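If installing a third-party package is not an option, the standard library's html.parser.HTMLParser can do a similar job. This is a swapped-in technique, not BeautifulSoup itself, and a minimal sketch: the class TextExtractor and the function html_to_text are hypothetical names. The parser calls handle_data for each run of text between tags, and we simply keep those runs.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the text content of an HTML fragment, dropping all tags."""

    def __init__(self):
        # convert_charrefs=True (the default) decodes entities such as
        # &amp; before the text reaches handle_data.
        super().__init__(convert_charrefs=True)
        self._parts = []

    def handle_data(self, data):
        # Called for each run of text between tags.
        self._parts.append(data)

    def get_text(self):
        return "".join(self._parts)


def html_to_text(markup):
    parser = TextExtractor()
    parser.feed(markup)
    # close() flushes any text still buffered at the end of the input.
    parser.close()
    return parser.get_text()


print(html_to_text("<p>hey<span> brrh </span>lolol"))
```

One caveat of this sketch: the contents of <script> and <style> elements are passed to handle_data too, so they end up in the output unless you add logic to skip them.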
In-depth reading about BeautifulSoup