How to extract specific data from a text file with Python?

I have a text file of 49633 lines (file.txt) in the following format:

 -e  Tue Mar 28 20:17:01 -03 2017 

              total       used       free     shared    buffers     cached
Mem:        239956     126484     113472       4904      10292      52280
-/+ buffers/cache:      63912     176044
Swap:       496636          0     496636 

 procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0      0 113460  10292  52308    0    0  1706    67  532  828 15 10 74  1  0

-e  Tue Mar 28 20:18:01 -03 2017 

              total       used       free     shared    buffers     cached
Mem:        239956     132808     107148       4904      10796      54872
-/+ buffers/cache:      67140     172816
Swap:       496636          0     496636 

 procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0      0 107656  10796  54872    0    0   654    29  219  353  6  4 90  0  0

-e  Tue Mar 28 20:19:01 -03 2017 

              total       used       free     shared    buffers     cached
Mem:        239956     132136     107820       4904      10824      54892
-/+ buffers/cache:      66420     173536
Swap:       496636          0     496636 

 procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0      0 107776  10824  54892    0    0   400    19  147  243  3  2 94  0  0

I would like to extract the id value of the CPU field given a time interval. For example:

inicio = Mar 28 20:17:01

fim = Mar 28 20:19:01

print:

data                  id           
Mar 28 20:17:01,      74
Mar 28 20:18:01,      90
Mar 28 20:19:01,      94

I have tried, but could not get beyond this one line of code:

#!/usr/bin/env python


F = open("arquivo.txt", "r")

Could someone help?

    
asked by anonymous 05.08.2017 / 22:55

1 answer

I'm still learning Python, but here's what you need:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

Saida = ""
# "with" closes the file automatically, even if an error occurs
with open('arquivo.txt', 'r') as arq:
    for linha in arq:
        Array = linha.split()
        # Date line: -e Tue Mar 28 20:17:01 -03 2017
        if len(Array) == 7 and Array[0] == '-e':
            Saida += Array[2] + ' ' + Array[3] + ' ' + Array[4]
        # vmstat data line: "id" is the 15th column; skip the header row
        elif len(Array) > 14 and Array[14] != 'id':
            Saida += ',      ' + Array[14] + "\n"

print("data                  id")
print(Saida)

If you want to test it, here is a link.
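
The code above relies on id always being the 15th column. As a minimal sketch, assuming the vmstat header row always contains both us and id, the position can be read from the header instead. The function name extract_ids and the inline sample are hypothetical, used here only for illustration:

```python
def extract_ids(lines):
    """Yield (date_text, idle) pairs from an iterable of log lines."""
    date_text = None
    id_col = None
    for line in lines:
        fields = line.split()
        if len(fields) == 7 and fields[0] == '-e':
            # "-e Tue Mar 28 20:17:01 -03 2017" -> "Mar 28 20:17:01"
            date_text = ' '.join(fields[2:5])
        elif 'id' in fields and 'us' in fields:
            # vmstat column header: remember where the "id" column sits
            id_col = fields.index('id')
        elif (date_text and id_col is not None
              and len(fields) > id_col and fields[0].isdigit()):
            # First numeric row after a date line is the vmstat data row
            yield date_text, int(fields[id_col])
            date_text = None


# One block copied from the question, used as sample input
sample = """\
-e  Tue Mar 28 20:17:01 -03 2017

              total       used       free     shared    buffers     cached
Mem:        239956     126484     113472       4904      10292      52280
-/+ buffers/cache:      63912     176044
Swap:       496636          0     496636

 procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0      0 113460  10292  52308    0    0  1706    67  532  828 15 10 74  1  0
""".splitlines()

for date_text, idle in extract_ids(sample):
    print('%s,      %d' % (date_text, idle))
```

Because the function takes any iterable of lines, an open file object can be passed in place of the sample list.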

Update: I improved the code. Your question says the file has about 49633 lines, so I assume it covers multiple dates as well. I therefore wrote a function that returns only the results within a date range.

#!/usr/bin/env python
# -*- coding: utf-8 -*-

from datetime import datetime

# day   month  year  time
#  28    03    2017  20:17:01
#  %d    %m    %Y    %H:%M:%S
def retornaResultado(DataInicio, DataFinal):
    # Compare datetime objects directly; no timestamp conversion is needed
    Inicio = datetime.strptime(DataInicio, '%d %m %Y %H:%M:%S')
    Final = datetime.strptime(DataFinal, '%d %m %Y %H:%M:%S')
    Saida = ""
    NoIntervalo = False  # True while the current block is inside the range
    with open('arquivo.txt', 'r') as arq:
        for linha in arq:
            Array = linha.split()
            # Date line: -e Tue Mar 28 20:17:01 -03 2017
            if len(Array) == 7 and Array[0] == '-e':
                # %b follows the locale, so this assumes English month names
                DataArquivo = datetime.strptime(
                    Array[3] + ' ' + Array[2] + ' ' + Array[6] + ' ' + Array[4],
                    '%d %b %Y %H:%M:%S')
                NoIntervalo = Inicio <= DataArquivo <= Final
                if NoIntervalo:
                    Saida += Array[2] + ' ' + Array[3] + ' ' + Array[4]
            # vmstat data line: "id" is the 15th column; skip the header row
            elif NoIntervalo and len(Array) > 14 and Array[14] != 'id':
                Saida += ',      ' + Array[14] + "\n"
    print("data                  id")
    print(Saida)

# Example usage
retornaResultado('28 03 2017 20:17:01', '28 03 2017 20:19:01')

How to use: call the function retornaResultado, passing two dates in the format DAY MONTH YEAR TIME, for example 28 03 2017 20:17:01.
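
The function compares the date on each -e line with the two arguments, so both have to end up as comparable datetime values. The sketch below shows one way to do that conversion without relying on the locale for month names. The names MONTHS, parse_log_date and parse_argument are hypothetical, and the -03 offset on the log line is simply ignored, which assumes the arguments are given in the same local time as the log:

```python
from datetime import datetime

# English month abbreviations, spelled out so parsing does not depend on the locale
MONTHS = {name: number for number, name in enumerate(
    'Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec'.split(), start=1)}


def parse_log_date(line):
    """Return a datetime for a '-e Tue Mar 28 20:17:01 -03 2017' line, else None."""
    fields = line.split()
    if len(fields) != 7 or fields[0] != '-e':
        return None
    hour, minute, second = (int(part) for part in fields[4].split(':'))
    # fields: -e, weekday, month, day, time, UTC offset (ignored), year
    return datetime(int(fields[6]), MONTHS[fields[2]], int(fields[3]),
                    hour, minute, second)


def parse_argument(text):
    """Parse a 'DAY MONTH YEAR HH:MM:SS' argument such as '28 03 2017 20:17:01'."""
    return datetime.strptime(text, '%d %m %Y %H:%M:%S')


log_date = parse_log_date(' -e  Tue Mar 28 20:17:01 -03 2017 ')
print(log_date == parse_argument('28 03 2017 20:17:01'))  # prints True
```

Lines that are not date lines, such as the Mem: row, return None, so the caller can tell the two cases apart.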

You do not need to know the exact times when calling the function. For example, pass 00:00:00 as the start time and 23:59:59 as the end time to get every result for that day.
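
As a small illustration of that whole-day range, the two arguments can be built from a single date. The helper name day_bounds is hypothetical; it only produces the strings, which would then be passed to retornaResultado:

```python
from datetime import datetime, time


def day_bounds(day_text):
    """Turn 'DAY MONTH YEAR' into the start and end arguments for one full day."""
    day = datetime.strptime(day_text, '%d %m %Y').date()
    start = datetime.combine(day, time.min)         # 00:00:00
    end = datetime.combine(day, time(23, 59, 59))   # 23:59:59
    fmt = '%d %m %Y %H:%M:%S'
    return start.strftime(fmt), end.strftime(fmt)


inicio, fim = day_bounds('28 03 2017')
print(inicio)  # 28 03 2017 00:00:00
print(fim)     # 28 03 2017 23:59:59
# These two strings are what retornaResultado(inicio, fim) would receive.
```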

    
06.08.2017 / 00:32