Converting a text file in XML with Python

1

Imagine that I have the following command:

echo -e "'date'\n\n'free'\n\n'vmstat'\n" >> free_vmstat_output.txt

The generated file (the output of the above command) would be (in TEXT mode):

Fri Mar 31 22:19:55 -03 2017

             total       used       free     shared    buffers     cached
Mem:      16387080    7085084    9301996     386628     147468    3340724
-/+ buffers/cache:    3596892   12790188
Swap:     13670396          0   13670396

procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0      0 9302192 147468 3340724    0    0    27    93  173   89  5  1 93  1  0

How to convert it to XML, ie need to read the file free_vmstat_output.txt, what is the output of the command and turn it into XML?

Fri Mar 31 22:19:55 -03 2017 -> seria uma tag data

Each element below would be a tag and would enter its values:

total       used       free     shared    buffers     cached

I just started the following code snippet:

#coding:utf-8

from xml.dom.minidom import Document

doc = Document()
root = doc.createElement('InfoMemoria')
    
asked by anonymous 01.04.2017 / 08:54

1 answer

2

There is no magic there - is a complex data structure, which you want to transform into another complex structure -

You have to read the source file, line by line, separate the information elements of each line using the Python string tools ( linha.strip().split("") should give you a list with each field of headers or data.)

And for each value you want, you have to create a new node, in the right place, all this in order to have an XML in which the information related to vms is something like: <procs><r>0</r><b><0></procs><memory><swpd>0</swpd> <free>9302192</free>...</memory>

Some things would be more annoying than creating an xml like this, among which, CONSUME an xml like that.

So - let's go back a few steps back: first: if you're going to sweat Python to handle system-related data, why use a complex command line to paste as text file the output of individual commands that have the information you want?

Do everything in Python - including the call to external commands free and vmstat - For timestam, you obviously do not need to call an external process, just use Python datetime

Second: Think about the program that will consume this information, and generate a structured data file that can be: easier to read on the other side, easier to generate with your Python program, more readable when looked, and significantly lower! XML loses in all these queries for almost any thing - even it is more verbose and less legible than the collage as original text file. For a structured and easily readable format, you can think of a JSON file, for example instead of XML:

{"timestamp: ...
 "free": {"mem": {'total': ..., 'used': ...,  ...},
          "swap": {"total", ..., "used": ...}},
 "vmstat": ...
}

Or, if you want to generate several of these to monitor how the machine will be used at some point in time, it will make much more sense to put everything into a database instead of generating XML or JSON packages. (ok if the XML or JSON can be to be passed to a remote process that does this).

Well, anyway, you create a distinct function in the Python program to run the external command, parse the data, and return to the structured information - as a Python dictionary is legal - even if you choose to use the XML - the dictionary will be a good intermediate data structure.

Example of a function that calls "free" and parse the data into a dictionary structure:

import subprocess

def get_free():
    data = subprocess.check_output("free").decode("utf-8").split("\n")
    memory = data[1].split()
    swap = data[2].split()
    result = {'memory': {'total': int(memory[1]), used: int(memory[2])},
              'swap': {'total': int(swap[1]), used: int(swap[2])}}
    return result

You can create a similar function to read vmstat data, combine dictionaries with a timestamp, and use direct "json.dump" to have all data readable by machine and ready to be transmitted --- or , create another function that creates your XML based on the already structured data.

    
02.04.2017 / 18:33