Apply function in data groups

Question


3

I need to split the data into groups and run calculations across two or three grouping variables (dimensions).

I found the tapply function, which solves the problem: with it I get what I need using mean, sum, and so on.

But now I have realized that I need to homogenize the data within the selected groups. So instead of a function like mean or sum, I need to write a homogenization function and pass it to tapply. I think something is wrong with my homogenization function, but I cannot figure out what.

I have tried dplyr, data.table, and aggregate, following the idea in the linked question "How to consolidate (aggregate or group) the values in a database?", but they all give errors.

Below is the code I have:

   bairro <- c("B_FLORESTA", "B_PINHEIRAO", "B_PINHEIRAO", "B_PINHEIRINHO",
                "B_LUTHER KING", "B_LUTHER KING", "B_VILA NOVA", "B_VILA NOVA",
                "B_NOVA PETROPOLIS", "B_VILA NOVA", "B_INTERIOR", "B_ALVORADA",
                "B_SADIA", "B_SADIA", "B_SADIA", "B_SADIA", "B_SADIA", "B_SADIA",
                "B_SADIA", "B_JUPTER", "B_JUPTER", "B_FLORESTA", "B_ITALIA",
                "B_ITALIA", "B_ITALIA", "B_ITALIA")

    tipo <-   c("CASA", "CASA", "COMERCIAIS", "CASA", "CASA", "COMERCIAIS",
                "APARTAMENTO", "APARTAMENTO", "APARTAMENTO", "APARTAMENTO",
                "SITIO", "APARTAMENTO", "CASA", "CASA", "CASA", "CASA",
                "TERRENO", "TERRENO", "CASA", "CASA", "CASA", "CASA",
                "CASA", "CASA", "CASA", "CASA")

    valor <-  c(1167, 2500, 1125, 2286, 400, 400, 1500, 1500, 300, 1500, 555,
                973, 2500, 2556, 2500, 2556, 600, 850, 2338, 1857, 1857, 2000,
                2000, 2063, 2000, 2063)

    data <-   c("2015_07", "2015_07", "2015_07", "2015_07", "2015_07", "2015_07",
                "2015_07", "2015_07", "2015_08", "2015_08", "2015_08", "2015_08",
                "2015_08", "2015_08", "2015_08", "2015_08", "2015_08", "2015_08",
                "2015_09", "2015_09", "2015_09", "2015_09", "2015_09", "2015_09",
                "2015_09", "2015_09")

    dados <- data.frame(bairro, tipo, valor, data)

    x <- tapply(dados$valor, list(dados$tipo, dados$data, dados$bairro), median)

## OK, this is final result 1.

So far so good, but now I need to homogenize, and this is where my problem is. Here is one of the functions I wrote for this:

homo <- function (a){
        a <- a[order(a$valor),] # sort by valor
        n <- nrow(a)
        a
        for(i in 1:n){
          a$sobra[i] = round(((a$valor[i+1] / a$valor[i])*100)-100, dig = 2)
        }

        a <- subset (a, a$sobra < 50)   # cutoff: < 50
        return (a)
      }

When I pass the "homo" function to tapply, it throws an error:

tapply(dados$valor, list(dados$tipo, dados$data, dados$bairro), homo)

Can anyone help me?

r plyr dplyr

asked by anonymous 09.10.2015 / 18:55

2 answers

1

The problem is that tapply passes a plain vector ( dados$valor , split by group) to homo() , while inside the function you treat it as a data.frame / list (calling a$valor , among others).

Below is a homo() function that works on a vector, but I do not know if it gives the result you wanted (I could not work out what you mean by homogenizing):

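homo <- function (a){
        a <- sort(a) # sort the values in ascending order (order() would return indices)
        n <- length(a)
        sobra <- rep(NA_real_, n) # the largest value has no successor, so it stays NA
        for(i in seq_len(max(n - 1, 0))){
          # percentage increase from a[i] to the next larger value
          sobra[i] = round(((a[i+1] / a[i])*100)-100, digits = 2)
        }

        a <- subset(a, sobra < 50)   # cutoff: < 50; NA entries are dropped
        return(a)
      }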

Besides the error of treating the vector as a list, there is a second problem in the original for(i in 1:n) : on the last iteration it reads a non-existent position ( n+1 ), so the loop has to stop at n-1 .
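
The point about what tapply hands to the function can be checked with a minimal sketch. The data frame `toy` and its column `grupo` are hypothetical, made up for illustration and not part of the asker's data:

```r
# Toy data (hypothetical, not the asker's data set)
toy <- data.frame(
  grupo = c("A", "A", "A", "B", "B"),
  valor = c(100, 120, 300, 50, 60)
)

# tapply() splits toy$valor by toy$grupo and passes each piece to FUN
# as a plain numeric vector, so a$valor is not available inside FUN.
classes <- tapply(toy$valor, toy$grupo, function(a) class(a))

# To work with whole rows instead, split the data frame itself;
# each piece is then a data.frame that still has a valor column.
pieces <- split(toy, toy$grupo)
row_counts <- sapply(pieces, nrow)

print(classes)
print(row_counts)
```

If the function really needs several columns at once, splitting the data frame (or using a grouped dplyr pipeline) is probably a better fit than tapply, which only ever sees one vector.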

09.10.2015 / 20:10


score 0 · Accepted Answer

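With the help of @Pierre Lafortune, here is the answer (the column is called valor in the example data above; my real data uses pvalor):

  library(dplyr)
  dados %>% group_by(tipo, data, bairro) %>%
            arrange(valor) %>%                   # sort so lead() sees the next larger value
            mutate(sobra = round(((lead(valor) / valor)*100)-100, digits = 2)) %>%
            filter(sobra < 50) %>%               # cutoff: < 50; rows with NA sobra are dropped
            summarise(valor = mean(valor))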
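
For readers without dplyr, the same steps can be sketched in base R. This is only an illustration under assumptions: `homo_mean`, `toy`, and `grupo` are hypothetical names, and a single grouping column is used for brevity where the answer groups by three.

```r
# Toy data (hypothetical values, one grouping column for brevity)
toy <- data.frame(
  grupo = c("A", "A", "A", "B", "B"),
  valor = c(300, 100, 120, 50, 60)
)

# Hypothetical helper: mean of the values whose percentage jump to the
# next larger value is below the cutoff. The largest value has no
# successor (NA) and is dropped, mirroring lead() followed by filter().
homo_mean <- function(v, cutoff = 50) {
  v <- sort(v)
  sobra <- round((c(v[-1], NA) / v) * 100 - 100, digits = 2)
  mean(v[!is.na(sobra) & sobra < cutoff])
}

res <- tapply(toy$valor, toy$grupo, homo_mean)
print(res)
```

On this toy data, group A keeps only 100 (the jump from 120 to 300 is 150%, and 300 has no successor) and group B keeps only 50, so the group means are 100 and 50. A group with a single value has nothing left after the filter, and the helper returns NaN for it.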