Handling dates in AWK scripts

3

CONTEXT: I have a CSV file from which I want to extract and process some columns. I realized that AWK would be the perfect tool for this and everything went well until I had to deal with timestamps - for example 2008-07-31T21: 42: 52.667

PROBLEM 1: I need to calculate the number of days that have passed between a base date (say 2008-07-31 ) and all timestamps in the first column of the input file.

DETAIL: I know I can do differences operations if I can use BASH's date command, because with the following command I get the number of seconds passed since the date system base date:

date -d"2008-07-31" +%s #RESULTADO: 1217473200s

In this way my problem can be reduced to the following:

PROBLEM 2: How to execute a bash command from within AWK?

    
asked by anonymous 07.08.2014 / 17:26

2 answers

3

OK. Here is an answer to the PROBLEM 2 , which already solved my problem in general, but maybe have some other legal solution.

I can execute a bash command in GAWK using the following construct:

STRING_COMANDO | getline RESULTADO_VAR_NOME

So I wrote the following script to get the first column of a file - which has a timestamp - and compute the difference of the base date in seconds.

#!/usr/bin/gawk -f
BEGIN {
  base_date="2008-07-31"
  #Comando para obter a quantidade de segundos da data base
  "date -d\""base_date"\" +%s" | getline first_day
  print "BASE: " base_date " -> " first_day
  #Variáveis utilizadas para evitar execuções BASH repitidas
  #Só ajudou pois sei que meu arquivo só tem datas em sequência crescente
  now_date="2008-07-31"
  now_day=first_day
}
{
  #Crio uma variável temp = [DATA,HORA]
  split($1,temp,"T")
  #Só invoco o BASH novamente se a data mudou
  if(temp[1] != now_date){
    now_date=temp[1]
    "date -d\""now_date"\" +%s" | getline now_day
  } 
  print now_date", " now_day", "now_day-first_day
}
    
07.08.2014 / 17:26
1

Regarding the arithmetic of dates I present below a hypothesis with perl

Assuming that the F file looks like this:

timestamp | legume | preço
2008-07-31T21:42:52.667 | batatas | 30
2008-08-31T21:42:52.667 | cebolas | 40

For demonstration I will add a first column with the days until Christmas

perl -MDate::Simple=date -pe '
  print /(\d{4}-\d\d-\d\d)(?=T)/ ? date("2016-12-25")-date($1) : "","|"' F

Gives:

|timestamp | legume | preço
3069|2008-07-31T21:42:52.667 | batatas | 30
3038|2008-08-31T21:42:52.667 | cebolas | 40
    
07.11.2016 / 21:37