How to make dates difference in R?

4
dataset <- structure(list(PLACA = structure(c(5L, 5L, 5L, 4L, 1L, 2L, 3L, 
7L, 6L, 8L), .Label = c("DSF9652", "EFR9618", "EQW6597", "ERB1522", 
"EWM3539", "LOC1949", "LQQ5554", "OQT5917"), class = "factor"), 
    COD_REV = c(113195L, 113196L, 113197L, 113303L, 80719L, 80720L, 
    80722L, 113318L, 80788L, 113386L), DATA = structure(1:10, .Label = c("2016-01-14 12:13:00.000", 
    "2016-01-18 18:48:00.000", "2016-01-18 19:00:00.000", "2016-01-25 11:46:00.000", 
    "2016-01-25 19:20:00.000", "2016-01-25 19:28:00.000", "2016-01-25 19:33:00.000", 
    "2016-01-25 20:56:00.000", "2016-01-26 21:28:00.000", "2016-01-27 13:50:00.000"
    ), class = "factor"), KM_ATUAL = c(52100L, 52100L, 52100L, 
    110676L, 62300L, 31144L, 165022L, 41021L, 155646L, 55030L
    ), KM_MEDIA = c(0L, 42L, 40L, 20L, 17L, 18L, 120L, 100L, 
    10L, 38L)), .Names = c("PLACA", "COD_REV", "DATA", "KM_ATUAL", 
"KM_MEDIA"), row.names = c(NA, -10L), class = "data.frame")

I have the above dataset and would like to group the boards to see how many visits the same customer did. So I need to calculate the difference between the dates and km_atual of visits, to compare with the field KM_media_dia and see the difference between these values. I can not figure out the difference between the dates. This was my attempt so far:

library(tidyverse)
# Carregando os datasets
dataset <- read_csv2("dados_atuais.csv")

dataset_revisao_km <- dataset %>%
  # selecionar apenas colunas importantes
  select(CPF, PLACA, COD_REV, DATA, KM_ATUAL) %>%
  arrange(DATA) %>%
  group_by(PLACA) %>%
  mutate(ORDEM_REVISAO = row_number()) %>%
  # manter apenas placas com mais de uma revisao
  filter(n() > 1) %>%
  mutate(DIFERENCA_KM = KM_ATUAL - lag(KM_ATUAL)) %>%
  # filtrar fora a primeira revisao da placa
  filter(ORDEM_REVISAO > 1) 
    
asked by anonymous 12.04.2018 / 14:21

1 answer

3

The R requires data with dates to be correctly specified so that it can perform calculations that may be required. One of the best ways to do this is with the lubridate package:

library(lubridate)

dataset$DATA <- ymd_hms(dataset$DATA)

Note that I just replaced the DATA column with its equivalent in ymd_hms (YearMonthDay_HourMinuteSecond), as it was in the original dataset. From there it was only to calculate the difference in days between the equal plates, using the function difftime :

dataset %>%
  group_by(PLACA) %>%
  filter(n() > 1) %>%
  mutate(DiferencaDias=difftime(DATA, lag(DATA), units="days")) %>%
  na.omit()
# A tibble: 2 x 6
# Groups:   PLACA [1]
  PLACA   COD_REV DATA                KM_ATUAL KM_MEDIA Diferenca          
  <fct>     <int> <dttm>                 <int>    <int> <time>             
1 EWM3539  113196 2016-01-18 18:48:00    52100       42 4.27430555555556   
2 EWM3539  113197 2016-01-18 19:00:00    52100       40 0.00833333333333333  

Note that in the data set only the EWM3539 board appears more than once. As she appears 3 times, it does not make sense to speak on her first visit, since there is no difference of days. Therefore, we remove this information through na.omit .

    
12.04.2018 / 17:50