R: generating time series from the start and end dates of service periods


I'm starting to study time series analysis. I have some data sets structured as follows: each row represents one customer service, with the start date of the service (earliest 2012-01-01), the end date (latest 2017-05-01), and the neighborhood where the activity took place.

> mse_df
# A tibble: 484 × 3
       mse_in    mse_fim           Bairro
       <date>     <date>           <fctr>
1  2015-11-03 2016-08-11         Pachecos
2  2013-03-18 2014-10-02       Bela Vista
3  2012-08-08 2015-09-24          Brejaru
4  2014-02-24 2014-12-17            Madri
5  2015-03-30 2015-04-29  Jardim Eldorado
6  2012-07-30 2013-09-19          Brejaru
7  2016-05-24 2017-05-19      Frei Damiao
8  2012-08-13 2015-02-09 Ponte do Imaruim
9  2012-08-08 2014-07-23 Ponte do Imaruim
10 2012-07-30 2012-10-10     Caminho Novo
# ... with 474 more rows

I would like to run a time series analysis of the services, with the data divided into 12 periods per year, and a second time series analysis, also with 12 periods per year, but this time broken down by location. However, I do not know how to generate the time series data frames to start the analysis.

    
asked by anonymous 23.05.2017 / 22:59

1 answer


Replying here, since I don't have enough reputation to comment ^_^.

I don't quite understand what you mean, but here is how I would create a data frame with the daily time series of services started, services ended, and the net balance between the two:

library(tidyverse)
library(magrittr)
library(lubridate)


df <- tribble(
~mse_in,    ~mse_fim,         ~Bairro,
'2015-11-03', '2016-08-11', 'Pachecos',
'2013-03-18', '2014-10-02', 'Bela Vista',
'2012-08-08', '2015-09-24', 'Brejaru',
'2014-02-24', '2014-12-17', 'Madri',
'2015-03-30', '2015-04-29', 'Jardim Eldorado',
'2012-07-30', '2013-09-19', 'Brejaru',
'2016-05-24', '2017-05-19', 'Frei Damiao',
'2012-08-13', '2015-02-09', 'Ponte do Imaruim',
'2012-08-08', '2014-07-23', 'Ponte do Imaruim',
'2012-07-30', '2012-10-10', 'Caminho Novo')

df %<>% 
  mutate(mse_in = lubridate::ymd(mse_in),
         mse_fim = lubridate::ymd(mse_fim))

número_pedidos_in <- 
  df %>% 
  group_by(mse_in) %>% 
  summarise(.in = n())

número_pedidos_out <- 
  df %>% 
  group_by(mse_fim) %>% 
  summarise(.out = n())

dias_com_movimento <- número_pedidos_in %>% 
  full_join(número_pedidos_out, by = c('mse_in' = 'mse_fim')) %>%
  rename(data = mse_in) %>% 
  # funs() is deprecated in dplyr; a formula lambda does the same NA -> 0 fill
  mutate_if(is.numeric, ~ if_else(is.na(.), 0L, .)) %>% 
  mutate(saldo = .in - .out)

range_dias <- range(dias_com_movimento$data)

full_df <-
  tibble(data = seq(as.Date(range_dias[[1]]),
                    range_dias[[2]], 'days')) %>%
  left_join(dias_com_movimento, by = 'data') %>%
  replace_na(list(.in = 0, .out = 0, saldo = 0))

The last data frame (full_df) looks like this:

# A tibble: 1,755 × 4
         data   .in  .out saldo
       <date> <dbl> <dbl> <dbl>
1  2012-07-30     2     0     2
2  2012-07-31     0     0     0
3  2012-08-01     0     0     0
4  2012-08-02     0     0     0
5  2012-08-03     0     0     0
6  2012-08-04     0     0     0
7  2012-08-05     0     0     0
8  2012-08-06     0     0     0
9  2012-08-07     0     0     0
10 2012-08-08     2     0     2
# ... with 1,745 more rows
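
Since the question asks for 12 periods per year, the same idea can probably be applied per calendar month instead of per day. The block below is only a sketch under a few assumptions: it uses four rows copied from the question's sample rather than the full data, the names mes, iniciados, encerrados, ativos, mensal and serie are made up for illustration, and "ativos" is taken to mean the running balance of services started minus services ended.

```r
library(dplyr)
library(tidyr)
library(lubridate)

# Four rows taken from the sample shown in the question (not the full data set)
df <- tibble(
  mse_in  = ymd(c('2012-08-08', '2012-07-30', '2012-08-08', '2012-07-30')),
  mse_fim = ymd(c('2015-09-24', '2013-09-19', '2014-07-23', '2012-10-10'))
)

# Count services started and ended in each calendar month
inicio <- df %>% count(mes = floor_date(mse_in, 'month'), name = 'iniciados')
fim    <- df %>% count(mes = floor_date(mse_fim, 'month'), name = 'encerrados')

# One row per month over the whole range, so months without movement appear as 0
meses <- tibble(mes = seq(min(inicio$mes), max(fim$mes), by = 'month'))

mensal <- meses %>%
  left_join(inicio, by = 'mes') %>%
  left_join(fim, by = 'mes') %>%
  replace_na(list(iniciados = 0L, encerrados = 0L)) %>%
  mutate(saldo  = iniciados - encerrados,
         ativos = cumsum(saldo))  # running balance: started minus ended so far

# Monthly ts object: frequency = 12 gives the 12 periods per year
serie <- ts(mensal$iniciados,
            start = c(year(min(mensal$mes)), month(min(mensal$mes))),
            frequency = 12)
```

Any of the columns of mensal could be passed to ts() in the same way; iniciados is used here only as an example.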

Is this what you're looking for?
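
For the per-location part of the question, one possible sketch is to count, for every neighborhood and every month, how many services overlap that month. This rests on assumptions that are not in the question: a service counts as active in a month if any part of it falls in that month, only three sample rows are used, and the names meses, por_bairro, largo and series are hypothetical.

```r
library(dplyr)
library(tidyr)
library(lubridate)

# Three rows taken from the sample shown in the question (not the full data set)
df <- tibble(
  mse_in  = ymd(c('2012-08-08', '2012-07-30', '2012-07-30')),
  mse_fim = ymd(c('2015-09-24', '2013-09-19', '2012-10-10')),
  Bairro  = c('Brejaru', 'Brejaru', 'Caminho Novo')
)

# Every month between the first start and the last end
meses <- tibble(mes = seq(floor_date(min(df$mse_in), 'month'),
                          floor_date(max(df$mse_fim), 'month'),
                          by = 'month'))

# Pair every month with every service, then count the services that
# overlap each month, separately for each neighborhood
por_bairro <- expand_grid(meses, df) %>%
  group_by(Bairro, mes) %>%
  summarise(ativos = sum(floor_date(mse_in, 'month') <= mes &
                           mes <= floor_date(mse_fim, 'month')),
            .groups = 'drop')

# One column per neighborhood, then a multivariate monthly ts
largo <- por_bairro %>%
  arrange(mes) %>%
  pivot_wider(names_from = Bairro, values_from = ativos)

series <- ts(as.matrix(select(largo, -mes)),
             start = c(year(min(largo$mes)), month(min(largo$mes))),
             frequency = 12)
```

expand_grid() is used instead of crossing() because crossing() drops duplicated rows, which would undercount two services with identical dates in the same neighborhood.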

    
24.05.2017 / 02:33