Reading an XML file and printing specific fields using the Python language

I have the following XML file (actually it's just a piece of the file):

<!DOCTYPE sysstat PUBLIC "DTD v2.19 sysstat //EN"
        "http://pagesperso-orange.fr/sebastien.godard/sysstat-2.19.dtd">
        <sysstat>
            <sysdata-version>2.19</sysdata-version>
            <host nodename="ServerLabDoS">
                <sysname>Linux</sysname>
                <release>3.16.0-4-686-pae</release>
                <machine>i686</machine>
                <number-of-cpus>1</number-of-cpus>
                <file-date>2017-04-10</file-date>
                <file-utc-time>10:39:04</file-utc-time>
                <statistics>
                    <timestamp date="2017-04-10" time="07:50:12" utc="0" interval="119">
                        <memory per="second" unit="kB">
                            <memfree>1140168</memfree>
                            <memused>131440</memused>
                            <memused-percent>10.34</memused-percent>
                            <buffers>10928</buffers>
                            <cached>51716</cached>
                            <commit>510544</commit>
                            <commit-percent>28.87</commit-percent>
                            <active>56880</active>
                            <inactive>29832</inactive>
                            <dirty>44</dirty>
                        </memory>
                        <network per="second">
                            <net-dev iface="lo" rxpck="0.00" txpck="0.00" rxkB="0.00" txkB="0.00" rxcmp="0.00" txcmp="0.00" rxmcst="0.00" ifutil-percent="0.00"/>
                            <net-dev iface="eth0" rxpck="12.58" txpck="11.50" rxkB="11.95" txkB="0.85" rxcmp="0.00" txcmp="0.00" rxmcst="0.00" ifutil-percent="0.00"/>
                        </network>
                    </timestamp>
                    <timestamp date="2017-04-10" time="07:52:01" utc="0" interval="107">
                        <memory per="second" unit="kB">
                            <memfree>1140444</memfree>
                            <memused>131164</memused>
                            <memused-percent>10.31</memused-percent>
                            <buffers>11288</buffers>
                            <cached>51932</cached>
                            <commit>509260</commit>
                            <commit-percent>28.80</commit-percent>
                            <active>57024</active>
                            <inactive>29840</inactive>
                            <dirty>28</dirty>
                        </memory>
                        <network per="second">
                            <net-dev iface="lo" rxpck="0.00" txpck="0.00" rxkB="0.00" txkB="0.00" rxcmp="0.00" txcmp="0.00" rxmcst="0.00" ifutil-percent="0.00"/>
                            <net-dev iface="eth0" rxpck="13.89" txpck="12.69" rxkB="13.71" txkB="0.93" rxcmp="0.00" txcmp="0.00" rxmcst="0.00" ifutil-percent="0.00"/>
                        </network>
                    </timestamp>
                    <timestamp date="2017-04-10" time="07:54:01" utc="0" interval="119">
                        <memory per="second" unit="kB">
                            <memfree>1139716</memfree>
                            <memused>131892</memused>
                            <memused-percent>10.37</memused-percent>
                            <buffers>11664</buffers>
                            <cached>52192</cached>
                            <commit>509148</commit>
                            <commit-percent>28.79</commit-percent>
                            <active>57384</active>
                            <inactive>29948</inactive>
                            <dirty>76</dirty>
                        </memory>
                        <network per="second">
                            <net-dev iface="lo" rxpck="0.00" txpck="0.00" rxkB="0.00" txkB="0.00" rxcmp="0.00" txcmp="0.00" rxmcst="0.00" ifutil-percent="0.00"/>
                            <net-dev iface="eth0" rxpck="13.35" txpck="12.40" rxkB="13.68" txkB="0.91" rxcmp="0.00" txcmp="0.00" rxmcst="0.00" ifutil-percent="0.00"/>
                        </network>
                    </timestamp>
</statistics>
        </host>
    </sysstat>

My goal is this: given a timestamp interval, for example between date="2017-04-10" time="07:50:12" and date="2017-04-10" time="07:52:01", print memused and rxpck using Python.

I started with this code:

from xml.dom import minidom

doc = minidom.parse("arq.xml")

# doc.getElementsByTagName returns NodeList
timestamp = doc.getElementsByTagName("timestamp")[0]
print(timestamp.firstChild.data)

But I can't get any further than this. Could someone help?

Suppose the XML covers a day with several different timestamps. What I want is to print these values for every time contained in the XML file.

Example of an XML like this: link

    
asked by anonymous 02.08.2017 / 02:01

1 answer

First you need to define the date and time bounds that your script has to work with. For this, use the datetime module:

from datetime import datetime

begin = datetime(2017, 4, 10, 7, 50, 12)
end = datetime(2017, 4, 10, 7, 52, 1)
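
If the bounds arrive as text (for example typed in the same form as the XML attributes), they can be parsed with the same format string that is used for the timestamps later on. This is only a minimal sketch; parse_stamp and STAMP_FORMAT are names made up for illustration and are not part of the original script.

```python
from datetime import datetime

# Matches the date="..." and time="..." attributes once joined with a space.
STAMP_FORMAT = '%Y-%m-%d %H:%M:%S'

def parse_stamp(date, time):
    """Hypothetical helper: combine a date string and a time string into a datetime."""
    return datetime.strptime('%s %s' % (date, time), STAMP_FORMAT)

begin = parse_stamp('2017-04-10', '07:50:12')
end = parse_stamp('2017-04-10', '07:52:01')
print(begin, end)
```

The resulting objects compare with < and >= exactly like the ones built with the datetime(...) constructor.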

Then iterate over all the timestamp tags, read their date and time, and skip the ones that fall outside the defined range. Read the date and time attributes with the getAttribute() method and parse the resulting string with datetime.strptime(). Note that the check below includes begin but excludes end:

for timestamp in doc.getElementsByTagName('timestamp'):
    date = timestamp.getAttribute('date')
    time = timestamp.getAttribute('time')
    dt = datetime.strptime('%s %s' % (date, time), '%Y-%m-%d %H:%M:%S')
    if dt < begin or dt >= end:
        continue

Now just get the memused tag, iterate over all network interfaces (the net-dev tags) and read the desired attributes (rxpck and maybe iface):

    # still inside the "for timestamp in ..." loop shown above
    memused = timestamp.getElementsByTagName('memused')[0].firstChild.data
    for netdev in timestamp.getElementsByTagName('net-dev'):
        iface = netdev.getAttribute('iface')
        rxpck = netdev.getAttribute('rxpck')
        print('date:%s time:%s memused:%s iface:%s rxpck:%s' % (date, time, memused, iface, rxpck))
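
The range check uses dt >= end, so a timestamp that falls exactly on end is skipped. If you would rather include it, one possible variation is sketched below. It wraps the same steps in a function and runs on a trimmed copy of the data from the question, inlined so that it works without arq.xml. The names collect_rows, include_end and SAMPLE are assumptions made for this sketch.

```python
from datetime import datetime
from xml.dom import minidom

# Trimmed copy of the question's data, inlined so the sketch runs without arq.xml.
SAMPLE = """<sysstat><host nodename="ServerLabDoS"><statistics>
<timestamp date="2017-04-10" time="07:50:12" utc="0" interval="119">
  <memory per="second" unit="kB"><memused>131440</memused></memory>
  <network per="second">
    <net-dev iface="lo" rxpck="0.00"/>
    <net-dev iface="eth0" rxpck="12.58"/>
  </network>
</timestamp>
<timestamp date="2017-04-10" time="07:52:01" utc="0" interval="107">
  <memory per="second" unit="kB"><memused>131164</memused></memory>
  <network per="second">
    <net-dev iface="lo" rxpck="0.00"/>
    <net-dev iface="eth0" rxpck="13.89"/>
  </network>
</timestamp>
</statistics></host></sysstat>"""

def collect_rows(doc, begin, end, include_end=False):
    """Hypothetical helper: return (date, time, memused, iface, rxpck) tuples in range."""
    rows = []
    for timestamp in doc.getElementsByTagName('timestamp'):
        date = timestamp.getAttribute('date')
        time = timestamp.getAttribute('time')
        dt = datetime.strptime('%s %s' % (date, time), '%Y-%m-%d %H:%M:%S')
        # Half-open range by default; include_end=True also keeps dt == end.
        too_late = (dt > end) if include_end else (dt >= end)
        if dt < begin or too_late:
            continue
        memused = timestamp.getElementsByTagName('memused')[0].firstChild.data
        for netdev in timestamp.getElementsByTagName('net-dev'):
            rows.append((date, time, memused,
                         netdev.getAttribute('iface'),
                         netdev.getAttribute('rxpck')))
    return rows

doc = minidom.parseString(SAMPLE)
begin = datetime(2017, 4, 10, 7, 50, 12)
end = datetime(2017, 4, 10, 7, 52, 1)

half_open = collect_rows(doc, begin, end)                 # only the 07:50:12 entry
closed = collect_rows(doc, begin, end, include_end=True)  # both entries
for row in closed:
    print('date:%s time:%s memused:%s iface:%s rxpck:%s' % row)
```

Returning the rows instead of printing them inside the loop keeps the filtering separate from the output format, which also makes it easier to test.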

Here is the full code for easy testing:

#!/usr/bin/env python

from xml.dom import minidom
from datetime import datetime

doc = minidom.parse('arq.xml')

begin = datetime(2017, 4, 10, 7, 50, 12)
end = datetime(2017, 4, 10, 7, 52, 1)

for timestamp in doc.getElementsByTagName('timestamp'):
    date = timestamp.getAttribute('date')
    time = timestamp.getAttribute('time')
    dt = datetime.strptime('%s %s' % (date, time), '%Y-%m-%d %H:%M:%S')
    if dt < begin or dt >= end:
        continue
    memused = timestamp.getElementsByTagName('memused')[0].firstChild.data
    for netdev in timestamp.getElementsByTagName('net-dev'):
        iface = netdev.getAttribute('iface')
        rxpck = netdev.getAttribute('rxpck')
        print('date:%s time:%s memused:%s iface:%s rxpck:%s' % (date, time, memused, iface, rxpck))
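
To print the values for every timestamp in the file, as the question asks at the end, simply drop the range check. The sketch below does that with xml.etree.ElementTree instead of minidom. This is a different standard-library module, not the one used above, shown only as an alternative. The SAMPLE string is a trimmed, inlined stand-in for arq.xml.

```python
import xml.etree.ElementTree as ET

# Trimmed stand-in for arq.xml; for a real file use ET.parse('arq.xml').getroot().
SAMPLE = """<sysstat><host nodename="ServerLabDoS"><statistics>
<timestamp date="2017-04-10" time="07:50:12" utc="0" interval="119">
  <memory per="second" unit="kB"><memused>131440</memused></memory>
  <network per="second"><net-dev iface="eth0" rxpck="12.58"/></network>
</timestamp>
<timestamp date="2017-04-10" time="07:52:01" utc="0" interval="107">
  <memory per="second" unit="kB"><memused>131164</memused></memory>
  <network per="second"><net-dev iface="eth0" rxpck="13.89"/></network>
</timestamp>
</statistics></host></sysstat>"""

root = ET.fromstring(SAMPLE)

lines = []
for timestamp in root.iter('timestamp'):
    # findtext returns the text content of the first matching descendant path.
    memused = timestamp.findtext('memory/memused')
    for netdev in timestamp.iter('net-dev'):
        lines.append('date:%s time:%s memused:%s iface:%s rxpck:%s' % (
            timestamp.get('date'), timestamp.get('time'), memused,
            netdev.get('iface'), netdev.get('rxpck')))

print('\n'.join(lines))
```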

For more information about manipulating XML files in Python, I recommend reading the documentation (in English only, unfortunately) or the answers to this question.

    
04.08.2017 / 02:38