Improvement of code

Question

Improvement of code

Navigation

#1 by (5 votes)

5

Good afternoon. I have the following data structure:

structure(list(CIDADE = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L), .Label = c("A", "B"), class = "factor"), MES = c(1L, 2L, 
3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 1L, 2L, 3L, 4L, 5L, 
6L, 7L, 8L, 9L, 10L, 11L, 12L), CLIENTES = c(29L, 26L, 25L, 29L, 
30L, 27L, 25L, 25L, 25L, 25L, 27L, 27L, 27L, 26L, 27L, 28L, 28L, 
25L, 26L, 26L, 25L, 30L, 30L, 29L), REMANEJADOS = c(3L, 2L, 1L, 
1L, 4L, 4L, 2L, 3L, 1L, 1L, 3L, 4L, 5L, 3L, 4L, 2L, 5L, 5L, 5L, 
4L, 5L, 2L, 4L, 3L), REMANEJAMENTOS = c(4L, 3L, 4L, 3L, 5L, 1L, 
4L, 4L, 2L, 5L, 5L, 1L, 5L, 4L, 2L, 5L, 5L, 5L, 2L, 1L, 5L, 2L, 
3L, 3L), TRANSFERENCIAS = c(2L, 4L, 2L, 1L, 3L, 3L, 5L, 4L, 2L, 
3L, 4L, 5L, 1L, 4L, 1L, 4L, 1L, 4L, 4L, 1L, 1L, 1L, 2L, 4L), 
EVASOES = c(1L, 2L, 2L, 5L, 1L, 2L, 1L, 1L, 2L, 2L, 5L, 2L, 
5L, 4L, 3L, 2L, 5L, 2L, 4L, 1L, 2L, 4L, 3L, 5L), HORAS = c(80, 
80, 80, 80, 80, 80, 80, 80, 80, 80, 80, 80, 80, 80, 80, 80, 
80, 80, 80, 80, 80, 80, 80, 80), TURMAS = c(2L, 1L, 1L, 1L, 
2L, 2L, 2L, 3L, 3L, 1L, 2L, 1L, 2L, 3L, 1L, 3L, 1L, 1L, 1L, 
1L, 1L, 1L, 2L, 3L)), .Names = c("CIDADE", "MES", "CLIENTES", 
"REMANEJADOS", "REMANEJAMENTOS", "TRANSFERENCIAS", "EVASOES", 
"HORAS", "TURMAS"), row.names = c(NA, -24L), class = "data.frame")

I calculated the accumulation of some variables that are necessary

dados <- dados %>% 
  group_by(CIDADE) %>% 
  mutate(CliAcu = cumsum(CLIENTES),
         RmdAcu = cumsum(REMANEJADOS),
         RmtAcu = cumsum(REMANEJAMENTOS),
         TrsAcu = cumsum(TRANSFERENCIAS),
         EvsAcu = cumsum(EVASOES),
         TurAcu = cumsum(TURMAS))

Now I need to calculate a formula that takes into account these accumulated variables ( CliAcu , RmdAcu , RmtAcu , TrsAcu , EvsAcu , TurAcu ) and the HORAS variable according to following relationship:

Formula = (CliAcu + RmdAcu - RmtAcu[mês_anterior] - TrsAcu[mês_anterior] - EvsAcu[mês_anterior])*HORAS/TurAcu

I have the following code that does what I need:

if(dim(dados)[1] != 0){
  dados$Valor[1] <- (dados$CliAcu[1]+dados$RmdAcu[1]-0-0-0)*
                         dados$HORAS[1]/dados$TurAcu[1]
  if(nrow(dados) != 1){
    for(i in 2:nrow(dados)){
      if(dados$MES[i] == 1){
        dados$Valor[i] <- (dados$CliAcu[i]+dados$RmdAcu[i]-0-0-0)*
                               dados$HORAS[i]/dados$TurAcu[i]
      } else{
        if(dados$MES[i] <= dados$MES[i-1]){
          dados$Valor[i] <- (dados$CliAcu[i]+dados$RmdAcu[i]-0-0-0)*
                                 dados$HORAS[i]/dados$TurAcu[i]
        } else{
          dados$Valor[i] <- (dados$CliAcu[i]+dados$RmdAcu[i]-
                               dados$EvsAcu[i-1]-dados$RmtAcu[i-1]-
                               dados$TrsAcu[i-1])*
                                 dados$HORAS[i]/dados$TurAcu[i]
        }
      }
    }
  }
}

As an observation, I am decreasing 0 three times in 3 counts because in the 1st month of the observations the subtractions do not interest me (this 1st month can be any of the months of the year).

Someone has some suggestions for improvement of this last code, using the package dplyr for example.

r dplyr

asked by anonymous 24.07.2017 / 19:24

1 answer

My project Laravel does not go up on the server Arguments barplot

score 5 · Accepted Answer

To deal with "forward" or "behind" values of an observation, given the ordering of another variable, dplyr has the arrange functions to sort, and lag and lead to access the previous or later value of a line in data.frame / tibble . There is an option to find the x value above, but check the documentation here . There is also a vignette just about the subject of pretty cool window functions.

In your case, I think it's possible that your problem is resolved with the following code:

dados %>%
    group_by(CIDADE) %>% 
    arrange(MES) %>% 
    transmute(
        MES,
        calculo = (CliAcu + RmdAcu - lag(RmtAcu, default = 0) - lag(TrsAcu, default = 0) - lag(EvsAcu, default = 0))*HORAS/TurAcu
        )

Assuming you want the value per city. The result of the previous computation is:

# A tibble: 24 x 3
# Groups:   CIDADE [2]
    CIDADE   MES   calculo
    <fctr> <int>     <dbl>
1      A     1        NA
2      B     1        NA
3      A     2  760.0000
4      B     2  976.0000
5      A     3  810.0000
6      B     3 1146.6667
7      A     4  968.8889
8      B     4  952.0000
9      A     5  966.6667
10     B     5 1054.5455
# ... with 14 more rows