Decode HTML entities in a string in Python


I'm using Python 3 to access a web API. The responses come back as JSON, and my problem is that one of the strings is encoded with HTML entities (specifically, the accented characters).

For example:

"orienta&ccedil;&atilde;o-a-objetos"

Is there a parser that returns the strings with the HTML entities resolved?

    
asked by anonymous 02.12.2014 / 19:59

2 answers


I found this one in the standard library, for Python 3.4+:

>>> import html
>>> html.unescape('orienta&ccedil;&atilde;o-a-objetos')
'orientação-a-objetos'
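Since the strings come from a JSON response, one way to apply html.unescape is to walk the decoded JSON and unescape every string value. This is a minimal sketch: the response body and its "slug" field are hypothetical, and it assumes only values (not keys) need decoding.

```python
import html
import json


def unescape_strings(value):
    """Recursively apply html.unescape to every str value in decoded JSON."""
    if isinstance(value, str):
        return html.unescape(value)
    if isinstance(value, list):
        return [unescape_strings(item) for item in value]
    if isinstance(value, dict):
        # Keys are left as-is; only the values are decoded.
        return {key: unescape_strings(item) for key, item in value.items()}
    return value  # numbers, booleans and None pass through unchanged


# Hypothetical API response body; the "slug" field is made up for illustration.
body = '{"slug": "orienta&ccedil;&atilde;o-a-objetos", "id": 7}'
data = unescape_strings(json.loads(body))
print(data["slug"])  # orientação-a-objetos
```

Decoding after json.loads (rather than on the raw body) avoids turning an escaped quote like &quot; into a real quote that would break the JSON syntax.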

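For earlier Python 3 versions (before 3.4), you can use the parser's unescape method. Note that HTMLParser.unescape was deprecated in 3.4 and removed in Python 3.9, so this only works on older interpreters: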

>>> import html.parser
>>> h = html.parser.HTMLParser()
>>> h.unescape('orienta&ccedil;&atilde;o-a-objetos')
'orientação-a-objetos'
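If the same code has to run on both older and newer Python 3 interpreters, a common pattern is to try the public function first and fall back to the parser method only when it is missing. A sketch using nothing beyond the standard library:

```python
try:
    # Python 3.4+: public, documented function
    from html import unescape
except ImportError:
    # Older Python 3: fall back to the parser method (removed in 3.9,
    # but this branch is only reached where html.unescape does not exist)
    from html.parser import HTMLParser
    unescape = HTMLParser().unescape

print(unescape("orienta&ccedil;&atilde;o-a-objetos"))  # orientação-a-objetos
```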
    
02.12.2014 / 19:59

You can also use BeautifulSoup (the bs4 package; the older BeautifulSoup 3 only runs on Python 2). Besides converting HTML entities into Unicode characters, it also lets you work with individual HTML elements, if the input string contains any.

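from bs4 import BeautifulSoup  # pip install beautifulsoup4
s = 'orienta&ccedil;&atilde;o-a-objetos'
t = BeautifulSoup(s, 'html.parser')  # entities are decoded while parsing
print(t.get_text())  # orientação-a-objetos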
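If installing bs4 is not an option, a similar effect (tags dropped, entities decoded) can be had with the standard library's html.parser.HTMLParser, which decodes character references in text data when convert_charrefs=True (Python 3.4+). This is a swapped-in technique, not BeautifulSoup: the TextExtractor class and html_to_text helper are hypothetical names, and unlike bs4 it gives no access to individual elements and does not skip <script>/<style> contents.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text data; entities are decoded because convert_charrefs=True."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        # Called for each run of text between tags (already unescaped).
        self.parts.append(data)


def html_to_text(markup):
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts)


print(html_to_text("<b>orienta&ccedil;&atilde;o</b>-a-objetos"))  # orientação-a-objetos
```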
    
25.02.2018 / 12:32