How to calculate the difference between two dates in a column, grouped by category, to generate a new dataset in R

0

Below is an example of the original dataset and of the new dataset I want to generate:

    
asked by anonymous 29.04.2018 / 16:13

1 answer

4

First, I would like to point out that questions should ideally include a reproducible example. In your case, you should have provided the data as a data.frame, which I ended up having to type myself ;-). To better understand how to ask a question with a reproducible example, please read this help page: Creating a Minimal, Complete, and Verifiable Example

In the first part, I simply create a data.frame identical to the one shown in your image.

## Creating the example as a data.frame
dados <- data.frame(
  Processo = c(201701, 201701, 201702, 201702, 201702, 201703, 201703, 201704, 201704, 201704),
  Grupo = c('A', 'A', 'B', 'B', 'B', 'C', 'C', 'A', 'A', 'A'),
  Data = c('01/02/2017', '15/02/2017', '20/03/2017', '18/04/2017', '01/07/2017', '15/02/2017', '20/02/2017', '01/03/2017', NA, '05/06/2017')
)

One important thing to know about R is that when you read a dataset containing dates, R initially treats those dates as strings. You need to convert these strings to R's Date class before you can add or subtract dates:

## Converting to Date
dados$Data <- as.Date(dados$Data, format = '%d/%m/%Y')

Note that I supplied a format argument that tells R how the day, month, and year are represented. I used an uppercase Y because the year is written with 4 digits.
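To illustrate the format argument, here is a minimal sketch in base R. The variable names (inicio, inicio2, fim, intervalo) are hypothetical and chosen only for this example; the dates come from the first process in the sample data.

```r
# %Y expects a 4-digit year, %y a 2-digit year; both calls parse the same day.
inicio  <- as.Date("01/02/2017", format = "%d/%m/%Y")
inicio2 <- as.Date("01/02/17",   format = "%d/%m/%y")
fim     <- as.Date("15/02/2017", format = "%d/%m/%Y")

inicio == inicio2        # TRUE

# Subtracting two Date objects gives a difftime measured in days.
intervalo <- fim - inicio
print(intervalo)         # Time difference of 14 days

# as.numeric() drops the difftime class and leaves a plain number.
as.numeric(intervalo)    # 14
```

If the format string does not match the text (for example, %Y applied to a 2-digit year), the parsed date will not be the one you expect, so it is worth checking a few converted values by eye.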

Finally, use dplyr to group the rows and then calculate the difference between the latest and the earliest date in each group. Note that I used the na.rm = TRUE option so that NA values are ignored.

## Loading the dplyr package
library(dplyr)

## Grouping and calculating the difference between the dates with dplyr
dados %>%
  group_by(Processo, Grupo) %>%
  arrange(desc(Data)) %>%
  summarise(Total_Dias = max(Data, na.rm = TRUE) - min(Data, na.rm = TRUE))

The result is exactly the final table you posted:

# A tibble: 4 x 3
# Groups:   Processo [?]
  Processo Grupo Total_Dias
     <dbl> <fct> <time>    
1  201701. A     14        
2  201702. B     103       
3  201703. C     5         
4  201704. A     96 
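As a cross-check that does not depend on dplyr, the same totals can be sketched in base R. This is an alternative under stated assumptions, not the method used above: it groups by Processo only, which is enough here because Grupo is constant within each Processo in the sample data. The names por_processo and total_dias are hypothetical.

```r
# Rebuild the sample data, converting the date strings on the way in.
dados <- data.frame(
  Processo = c(201701, 201701, 201702, 201702, 201702,
               201703, 201703, 201704, 201704, 201704),
  Data = as.Date(
    c("01/02/2017", "15/02/2017", "20/03/2017", "18/04/2017", "01/07/2017",
      "15/02/2017", "20/02/2017", "01/03/2017", NA, "05/06/2017"),
    format = "%d/%m/%Y"
  )
)

# Split the dates into one vector per process.
por_processo <- split(dados$Data, dados$Processo)

# Days between the earliest and latest date in each process, ignoring NA.
total_dias <- sapply(
  por_processo,
  function(x) as.numeric(max(x, na.rm = TRUE) - min(x, na.rm = TRUE))
)
total_dias
# 201701 201702 201703 201704
#     14    103      5     96

# Without na.rm = TRUE, the process that contains an NA yields NA.
max(por_processo[["201704"]])   # NA
```

The wrapper as.numeric() turns each difference into a plain number of days, which may be more convenient than a difftime column if the result is used in further arithmetic.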
    
30.04.2018 / 10:05