How to create and read an XML with Python?

7

How do I create and read an XML file with the DOM component in Python?

How do I read an XML file with Python's cElementTree component?

    
asked by anonymous 31.01.2014 / 17:46

5 answers

7

Python has two built-in ways of handling XML files: xml.etree.ElementTree and xml.dom.minidom. In addition, there are external libraries that can greatly simplify working with XML, such as BeautifulSoup, pyquery and xmltodict (as well as native implementations with a compatible API, such as lxml). In other words, there is no shortage of options; the question is which one best fits your needs.

ElementTree

According to the documentation, ElementTree is "recommended for those who do not have previous experience working with the DOM". It represents an XML file and its elements as Python objects with their own API, and lets you modify them and convert them back to XML. It also supports a subset of XPath, which you can use in queries.
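As a rough illustration of the XPath subset, here is a minimal sketch. The sample document is hypothetical (loosely modeled on the documentation's country_data.xml) and is embedded as a string so the snippet runs on its own.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, loosely modeled on the documentation's country_data.xml.
SAMPLE = """
<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
        <neighbor name="Malaysia" direction="N"/>
    </country>
</data>
"""

root = ET.fromstring(SAMPLE)

# All <neighbor> descendants whose direction attribute equals "E".
east = [n.get('name') for n in root.findall(".//neighbor[@direction='E']")]

# Countries that have a <year> child whose text is "2011".
in_2011 = [c.get('name') for c in root.findall("./country[year='2011']")]

# findtext returns the text of the first match (or None if nothing matches).
first_rank = root.findtext("./country/rank")

print(east, in_2011, first_rank)
```

Only a subset of XPath is available this way; for full XPath 1.0 you would probably need lxml.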

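Note: The cElementTree that you mentioned in the question is simply a C implementation of the ElementTree API (that is, once imported, usage is the same). Since Python 3.3, xml.etree.ElementTree uses the C implementation automatically whenever it is available, and the separate xml.etree.cElementTree module was removed in Python 3.9.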

Pros: simple and "pythonic" to use, XPath support. Cons: none. Example:

>>> import xml.etree.ElementTree as ET
>>> tree = ET.parse('country_data.xml')
>>> root = tree.getroot()

>>> [(x.tag, x.attrib) for x in root] # List the child elements: tag and attributes
[('country', {'name':'Liechtenstein'}), (...), (...)]

>>> root[0][1].text # Access a sub-element by index, get its text
'2008'

>>> [x.attrib for x in root.iter('neighbor')] # List descendant elements: attributes
[{'name': 'Austria', 'direction': 'E'}, {...}, {...}, ...]

>>> atualizar = next(root.iter('rank')) # First <rank> descendant (iter is a method and returns an iterator)
>>> atualizar.text = "1"
>>> atualizar.set('updated', 'yes')
>>> tree.write('output.xml') # write() belongs to the ElementTree object, not to the root element

>>> a = ET.Element('a')
>>> b = ET.SubElement(a, 'b')
>>> c = ET.SubElement(a, 'c')
>>> d = ET.SubElement(c, 'd')
>>> ET.dump(a) # Prints the element to stdout
<a><b /><c><d /></c></a>

minidom

A minimal DOM implementation, with an API similar to the one found in other languages, such as JavaScript. It suits those who are already familiar with DOM manipulation in plain JavaScript (i.e. without external libraries) and want to manipulate XML in Python with similar code.

Pros: JavaScript-like API. Cons: quite verbose. Example: see @utluiz's answer.

lxml

A "pythonic" binding for the C libraries libxml2 and libxslt. It is efficient and feature-rich, with a simple API that is compatible with ElementTree.

Pros: performance. Cons: none. Example: mostly identical to ElementTree (just change import xml... to import lxml...).
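A minimal sketch of what that API compatibility looks like in practice, assuming lxml may or may not be installed: the import falls back to the standard library, and the rest of the code is the same either way. The sample XML is made up for illustration.

```python
try:
    from lxml import etree  # third-party; assumed installed via `pip install lxml`
except ImportError:
    import xml.etree.ElementTree as etree  # standard-library fallback

# Hypothetical sample document.
root = etree.fromstring("<a><b>one</b><c><d>two</d></c></a>")

# The same calls work with both implementations.
child_tags = [child.tag for child in root]
d_text = root.find("./c/d").text

# Creating and modifying elements also uses the shared API.
etree.SubElement(root, "e").set("added", "yes")
new_attrib = root.find("e").get("added")

print(child_tags, d_text, new_attrib)
```

Serialized output can differ slightly between the two (for example in how empty elements are written), so it is safer not to compare raw strings across implementations.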

BeautifulSoup

Its main use is parsing and manipulating HTML, but it supports XML as well. Its main strength is robustness when the input files are not necessarily well-formed.

Pros: robustness. Cons: A little more verbose when modifying / creating. Example:

>>> from bs4 import BeautifulSoup
>>> root = BeautifulSoup(open('country_data.xml'), 'xml') # The 'xml' parser requires lxml

>>> [(x.name, x.attrs) for x in root.children] # List the child elements: name and attributes
[('country', {'name':'Liechtenstein'}), (...), (...)]

>>> root.contents[0].contents[7].string # Access a sub-element by index, get its text
'2008'

>>> [x.attrs for x in root.find_all('neighbor')] # List descendant elements: attributes
[{'name': 'Austria', 'direction': 'E'}, {...}, {...}, ...]

>>> atualizar = root.rank # Shortcut for root.find_all('rank')[0]
>>> atualizar.string = "1"
>>> with open('output.xml', 'w') as f: # Open in write mode
...     f.write(str(root))

>>> soup = BeautifulSoup("<a />", "xml") # new_tag() needs a BeautifulSoup instance, not the class
>>> a = soup.a
>>> a.append(soup.new_tag("b"))
>>> c = soup.new_tag("c")
>>> a.append(c)
>>> c.append(soup.new_tag("d"))
>>> str(a)
'<a><b/><c><d/></c></a>'

pyquery

A library that attempts to mimic jQuery in a Python environment. It suits those who are already familiar with that framework and want to manipulate XML in Python with similar code. It depends on lxml.

Pros: jQuery! Cons: weak documentation (for cases it does not cover, you have to fall back to lxml). Example:

>>> from pyquery import PyQuery as pq
>>> root = pq(filename='country_data.xml')

>>> root.children().map(lambda x: (x.tag, x.attrib)) # List the child elements: tag and attributes
[('country', {'name':'Liechtenstein'}), (...), (...)]

>>> root.children(":eq(0)").children(":eq(7)").text() # Access a sub-element by index, get its text
'2008'

>>> root.find('neighbor').map(lambda x: x.attrib) # List descendant elements: attributes

>>> atualizar = root.find('rank:eq(0)').text('1')
>>> with open('output.xml', 'w') as f: # Open in write mode
...     f.write(str(root))

>>> print(pq('a')
...   .append('b')
...   .append(pq('c').append('d')))
"<a><b /><c><d /></c></a>"

xmltodict

Converts an XML file into a single dict, which can be accessed and manipulated simply through its keys and values. The dict can also be converted back into XML. Namespaces are supported through an extra parameter when parsing.

Pros: a very simple API that is uniform across operations. Cons: weak documentation. Example: see @avelino's answer.

    
01.02.2014 / 10:08
5

You can use the xml.dom.minidom library.

I made the following implementation in Python 3.3 to read an XML:

from xml.dom import minidom

xml ="""<raiz>
    <itens>
        <item name="item1">Item 1</item>
        <item name="item2">Item 2</item>
        <item name="item3">Item 3</item>
    </itens>
</raiz> 
"""

# read from a file
#xmldoc = minidom.parse('itens.xml')

# read from a string
xmldoc = minidom.parseString(xml)

itemlist = xmldoc.getElementsByTagName('item')
print('Number of items:', len(itemlist))
for s in itemlist:
    print(s.attributes['name'].value, ' =', s.firstChild.nodeValue)

And to create an XML:

# create the document
doc = minidom.Document()

# create the root element and add it to the document
raiz = doc.createElement('raiz')
doc.appendChild(raiz)

# create the <itens> container and add it to the root
itens = doc.createElement('itens')
raiz.appendChild(itens)

# create each <item> with its attribute and text
for i in range(3):
    item = doc.createElement('item')
    item.setAttribute('name', 'item' + str(i+1))
    itens.appendChild(item)
    item.appendChild(doc.createTextNode('Item ' + str(i + 1)))

print(raiz.toprettyxml())
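To tie creation and reading together, here is a minimal round-trip sketch: it builds a small document, saves it to a file and parses it back. The temporary directory and the file name itens.xml are assumptions made only for this illustration.

```python
import os
import tempfile
from xml.dom import minidom

# Build a tiny document: <raiz><item name="item1">Item 1</item></raiz>
doc = minidom.Document()
raiz = doc.createElement('raiz')
doc.appendChild(raiz)
item = doc.createElement('item')
item.setAttribute('name', 'item1')
item.appendChild(doc.createTextNode('Item 1'))
raiz.appendChild(item)

# Write it to a file in a temporary directory (hypothetical location).
path = os.path.join(tempfile.mkdtemp(), 'itens.xml')
with open(path, 'w', encoding='utf-8') as f:
    doc.writexml(f, encoding='utf-8')  # also emits the XML declaration

# Read the file back and inspect the first <item>.
loaded = minidom.parse(path)
first = loaded.getElementsByTagName('item')[0]
name = first.getAttribute('name')
text = first.firstChild.nodeValue
print(name, '=', text)
```

writexml is used without indentation here so that the text node remains the element's first child when the file is read back.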

Note that the minidom documentation advises against using it to process XML from untrusted sources, because the module is vulnerable to maliciously constructed data.

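Regarding cElementTree, I do not have it installed for testing, but usage seems pretty straightforward, as in the documentation example (on Python 3.3 and later, xml.etree.ElementTree offers the same API, so the import below can be replaced with import xml.etree.ElementTree as cElementTree):

import cElementTree

for event, elem in cElementTree.iterparse(file):  # file: a filename or an open file object
    if elem.tag == "record":
        # ... process record element ...
        elem.clear()  # discard the element's contents to free memory

Basically:

  • cElementTree.iterparse(file) reads the file incrementally
  • the loop body runs once per parsing event; by default only "end" events are reported, that is, one for each element whose closing tag has been read
  • the if tests whether the event was caused by a particular tag, allowing it to be processed as needed.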
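A self-contained variant of the loop above, as a sketch: it uses the standard-library xml.etree.ElementTree (which exposes the same iterparse function) and an in-memory sample instead of a real file. The sample data and the record/id names are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical input; a real use case would pass a filename or an open file.
data = io.BytesIO(
    b"<log><record id='1'>a</record><record id='2'>b</record><other/></log>"
)

seen = []
# events=("end",) is the default: each element is reported once it is complete.
for event, elem in ET.iterparse(data, events=("end",)):
    if elem.tag == "record":
        seen.append((elem.get("id"), elem.text))  # "process" the record
        elem.clear()  # drop the element's contents to keep memory usage low

print(seen)
```

Because each record is cleared right after it is processed, this pattern can handle files that are too large to load as a full tree.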

There are several examples here .

    
31.01.2014 / 19:32
2

Some time ago I recorded a video for PyCourses in which I explain how to work with the main libraries for reading and writing XML in Python.

    
31.01.2014 / 19:40
2

There are several ways to read XML with Python. One of the simplest is xmltodict, which converts the XML structure into a dict (Python dictionary):

link

See an example:

>>> import xmltodict
>>> doc = xmltodict.parse("""
... <mydocument has="an attribute">
...   <and>
...     <many>elements</many>
...     <many>more elements</many>
...   </and>
...   <plus a="complex">
...     element as well
...   </plus>
... </mydocument>
... """)
>>>
>>> doc['mydocument']['@has']
'an attribute'
>>> doc['mydocument']['and']['many']
['elements', 'more elements']
>>> doc['mydocument']['plus']['@a']
'complex'
>>> doc['mydocument']['plus']['#text']
'element as well'
    
01.02.2014 / 06:23
-1

I recommend using BeautifulSoup. It is very simple to use and very fast with very large files.

    
31.01.2014 / 17:54