Badly formatted XML

I'm trying to read some XML files with Python's ElementTree, but when I parse one of them I get this error:
  

xml.etree.ElementTree.ParseError: not well-formed (invalid token)

This is the line that gives the error:

xml = ET.parse('./dados_apis/gamesdb/xml/infos_games/31758.xml')

This is the XML file: link

In Python I read it from disk, since the file has already been saved locally.

Does anyone know how I can solve this problem? Apparently it's some special character, but the XML declaration at the top of the file says the encoding is UTF-8.
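One way to confirm which character the parser rejects is to catch the exception: ElementTree's `ParseError` has a `position` attribute holding the line and column where parsing stopped. The sketch below uses a hypothetical helper, `locate_parse_error` (not part of ElementTree), that returns that position together with the raw bytes of the offending line, so a byte that is not valid UTF-8 shows up as an escape such as `\x92` instead of being silently decoded.

```python
import xml.etree.ElementTree as ET


def locate_parse_error(path):
    """Parse `path`; on failure return (line, column, raw_line_bytes).

    Returns None if the file parses cleanly. Hypothetical helper written
    for this example only.
    """
    try:
        ET.parse(path)
        return None
    except ET.ParseError as err:
        line, column = err.position  # line numbers reported by expat start at 1
        with open(path, "rb") as f:  # read raw bytes so nothing gets decoded
            raw_lines = f.read().splitlines()
        # Guard against errors reported past the last line (e.g. an empty file).
        raw = raw_lines[line - 1] if line <= len(raw_lines) else b""
        return line, column, raw


# Usage with the path from the question:
# print(locate_parse_error('./dados_apis/gamesdb/xml/infos_games/31758.xml'))
```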

    
asked by anonymous 06.06.2018 / 13:50

1 answer


The XML at the link is perfectly valid!

However, look at the <Overview> tag: its text contains a particular kind of apostrophe after the word Drake:

<Overview>Uncharted: The Nathan Drake Collection combines the three
PlayStation 3 blockbuster Nathan Drake adventures in one package.
Included are the single-player campaigns for Uncharted: Drake’s Fortune,
Uncharted 2: Among Thieves, and Uncharted 3: Drake’s Deception. Thanks to the
power of PlayStation 4 hardware, all three games have been upgraded to run at
1080p and 60fps with better lighting, textures, and models. Also added are a
range of improvements and additions including Photo Mode and new trophies
</Overview>

This character is a Right Single Quotation Mark (U+2019), and it may have been saved in an encoding other than UTF-8, which would cause this error.
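The effect can be reproduced with a small sketch. It assumes the file ended up in Windows-1252 (`cp1252`), one common encoding in which `’` is the single byte `0x92`; that byte is not valid UTF-8, so the parser rejects it even though the declaration says UTF-8. The document below is a cut-down stand-in for the real file, not its actual contents.

```python
import xml.etree.ElementTree as ET

# Cut-down document: the declaration says UTF-8 and the text contains
# U+2019 RIGHT SINGLE QUOTATION MARK, as in the <Overview> above.
text = '<?xml version="1.0" encoding="UTF-8"?><Overview>Drake\u2019s Fortune</Overview>'

utf8_bytes = text.encode("utf-8")     # the apostrophe becomes the bytes E2 80 99
cp1252_bytes = text.encode("cp1252")  # the apostrophe becomes the single byte 92

# Bytes that match the declaration parse normally.
print(ET.fromstring(utf8_bytes).text)

# Bytes in a different encoding fail, even though the characters are identical.
try:
    ET.fromstring(cp1252_bytes)
except ET.ParseError as err:
    print(err)
```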

Another possibility is that copying the XML content from the web into a local file changed the file's encoding, causing the same error.
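Whichever of the two happened, one workaround is to check the bytes before parsing. The following is a minimal sketch, assuming the declaration says UTF-8 and guessing Windows-1252 as the encoding the file was actually saved in; `parse_with_fallback` is a hypothetical helper, not part of ElementTree. Valid UTF-8 is parsed unchanged; anything else is decoded with the fallback and re-encoded as UTF-8 so the bytes agree with the declaration.

```python
import xml.etree.ElementTree as ET


def parse_with_fallback(path, fallback="cp1252"):
    """Parse an XML file that declares UTF-8 but may have been saved differently.

    Hypothetical helper: `fallback` (Windows-1252 by default) is only a guess
    at the real encoding of the file.
    """
    with open(path, "rb") as f:
        raw = f.read()
    try:
        raw.decode("utf-8")  # already valid UTF-8: parse the bytes unchanged
        data = raw
    except UnicodeDecodeError:
        # Reinterpret with the guessed encoding, then re-encode as UTF-8
        # so the bytes match the encoding="UTF-8" declaration.
        data = raw.decode(fallback).encode("utf-8")
    # Wrap the root so callers get the same kind of object ET.parse returns.
    return ET.ElementTree(ET.fromstring(data))
```

Usage: `tree = parse_with_fallback('./dados_apis/gamesdb/xml/infos_games/31758.xml')`, then `tree.getroot()` as with `ET.parse`. If the guessed encoding is wrong, `decode` may raise `UnicodeDecodeError` or produce the wrong characters, so the parsed text is worth checking.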

    
06.06.2018 / 16:27