Equivalent of Excel's SOMASES (SUMIFS) in R

11

I'm learning R and looking for a function equivalent to Excel's SOMASES (SUMIFS in English-language Excel).

I have the following data frame:

Day.of.Week Hour    Quantidade
5            21         5044
3            12         5024
1            11         4725
0            16         4643
1            15         4601
4            14         4566
2            20         4509
1            10         4502
3            13         4477
6            11         4449

I need the equivalent of SOMASES to build a table like this, with one row per hour and one column per day of the week:

Hour 0  1  2  3  4  5  6
0    -  -  -  -  -  -  -
1    -  -  -  -  -  -  -
2    -  -  -  -  -  -  -
3    -  -  -  -  -  -  -
4    -  -  -  -  -  -  -
5    -  -  -  -  -  -  -
6    -  -  -  -  -  -  -
7    -  -  -  -  -  -  -
8    -  -  -  -  -  -  -
9    -  -  -  -  -  -  -
10   -  -  -  -  -  -  -
...

Each cell would hold the sum of Quantidade for that day of the week and hour.

    
asked by anonymous 29.03.2016 / 22:35

4 answers

9

This can be done in several ways, as the other answers show. One way that needs no additional packages is tapply:

> tapply(df$Quantidade, list(df$Day.of.Week, df$Hour), sum)
    10   11   12   13   14   15   16   20   21
0   NA   NA   NA   NA   NA   NA 4643   NA   NA
1 4502 4725   NA   NA   NA 4601   NA   NA   NA
2   NA   NA   NA   NA   NA   NA   NA 4509   NA
3   NA   NA 5024 4477   NA   NA   NA   NA   NA
4   NA   NA   NA   NA 4566   NA   NA   NA   NA
5   NA   NA   NA   NA   NA   NA   NA   NA 5044
6   NA 4449   NA   NA   NA   NA   NA   NA   NA

Which variable ends up in the rows and which in the columns depends on the order of the list passed as the second argument: the first element gives the rows, the second the columns. If the order is reversed, the result is transposed:

> tapply(df$Quantidade, list(df$Hour, df$Day.of.Week), sum)
      0    1    2    3    4    5    6
10   NA 4502   NA   NA   NA   NA   NA
11   NA 4725   NA   NA   NA   NA 4449
12   NA   NA   NA 5024   NA   NA   NA
13   NA   NA   NA 4477   NA   NA   NA
14   NA   NA   NA   NA 4566   NA   NA
15   NA 4601   NA   NA   NA   NA   NA
16 4643   NA   NA   NA   NA   NA   NA
20   NA   NA 4509   NA   NA   NA   NA
21   NA   NA   NA   NA   NA 5044   NA
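If the table should list every hour from 0 to 23 and show 0 instead of NA, as in the layout sketched in the question, one possible base R variant is xtabs on factors with explicit levels. This is only a sketch: the 0:23 and 0:6 ranges are assumptions about the full data, and the data frame is rebuilt here from the question's sample.

```r
# Sample data copied from the question
df <- data.frame(Day.of.Week = c(5, 3, 1, 0, 1, 4, 2, 1, 3, 6),
                 Hour = c(21, 12, 11, 16, 15, 14, 20, 10, 13, 11),
                 Quantidade = c(5044, 5024, 4725, 4643, 4601, 4566, 4509, 4502, 4477, 4449))

# Declaring the full range of levels keeps hours and days that have no data
# (0:23 and 0:6 are assumed ranges)
df$Hour <- factor(df$Hour, levels = 0:23)
df$Day.of.Week <- factor(df$Day.of.Week, levels = 0:6)

# xtabs sums the left-hand side for each Hour/Day combination;
# combinations without data come out as 0 rather than NA
tab <- xtabs(Quantidade ~ Hour + Day.of.Week, data = df)

# Looking up one cell is roughly what a single SOMASES call would return
tab["11", "1"]
```

The result is a table object; as.data.frame.matrix(tab) converts it to a data.frame if one is needed.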
    
30.03.2016 / 02:49
6

To do this you can use two functions: aggregate, to sum the "duplicates" so that each day/hour combination appears only once, and acast from the reshape2 package, to build the matrix in the layout you are asking for.

Your data.frame:

df <- data.frame(Day.of.Week = c(5,3,1,0,1,4,2,1,3,6), 
                 Hour = c(21,12,11,16,15,14,20,10,13,11), 
                 Quantidade = c(5044,5024,4725,4643,4601,4566,4509,4502,4477,4449))

Code:

require(reshape2)

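# Sum Quantidade over every combination of the remaining columns (Day.of.Week, Hour)
df <- aggregate(Quantidade ~ ., df, sum)
# Reshape to a matrix: hours in rows, days of the week in columns
somases <- acast(df, Hour ~ Day.of.Week, value.var = "Quantidade")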

Output:

> somases
      0    1    2    3    4    5    6
10   NA 4502   NA   NA   NA   NA   NA
11   NA 4725   NA   NA   NA   NA 4449
12   NA   NA   NA 5024   NA   NA   NA
13   NA   NA   NA 4477   NA   NA   NA
14   NA   NA   NA   NA 4566   NA   NA
15   NA 4601   NA   NA   NA   NA   NA
16 4643   NA   NA   NA   NA   NA   NA
20   NA   NA 4509   NA   NA   NA   NA
21   NA   NA   NA   NA   NA 5044   NA

Here the NAs mark day/hour combinations that have no data.
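As far as I can tell, the separate aggregate step can also be folded into acast itself through its fun.aggregate and fill arguments. A minimal sketch on the same sample data, assuming reshape2 is installed:

```r
library(reshape2)

# Sample data copied from the question
df <- data.frame(Day.of.Week = c(5, 3, 1, 0, 1, 4, 2, 1, 3, 6),
                 Hour = c(21, 12, 11, 16, 15, 14, 20, 10, 13, 11),
                 Quantidade = c(5044, 5024, 4725, 4643, 4601, 4566, 4509, 4502, 4477, 4449))

# fun.aggregate sums repeated day/hour pairs while reshaping;
# fill sets the value used for combinations that have no data
somases <- acast(df, Hour ~ Day.of.Week, value.var = "Quantidade",
                 fun.aggregate = sum, fill = 0)
```

With fill = 0 the empty cells hold 0 instead of NA; drop the argument to keep NA.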

There is a related question in English.

    
29.03.2016 / 23:58
5

My preferred way is the following, using the dplyr and tidyr packages, which are designed for transforming data.frames:

library(dplyr)
library(tidyr)
df %>% group_by(Day.of.Week, Hour) %>%
  summarise(Quantidade = sum(Quantidade)) %>%
  spread(Day.of.Week, Quantidade, fill = 0)

Source: local data frame [9 x 8]

   Hour     0     1     2     3     4     5     6
  (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl)
1    10     0  4502     0     0     0     0     0
2    11     0  4725     0     0     0     0  4449
3    12     0     0     0  5024     0     0     0
4    13     0     0     0  4477     0     0     0
5    14     0     0     0     0  4566     0     0
6    15     0  4601     0     0     0     0     0
7    16  4643     0     0     0     0     0     0
8    20     0     0  4509     0     0     0     0
9    21     0     0     0     0     0  5044     0

If you do not have the packages installed, run install.packages(c("dplyr", "tidyr")) first.

For me the advantages of doing so are:

  • The result is still a data.frame. In the answers by @Molx and André Mutao the result is a matrix whose dimensions are unnamed.

  • You do not need to write a line for each day of the week. In @carlosfigueira's answer the days of the week have to be hard-coded.

  • You can use the fill argument of the spread function to fill empty cells automatically with a chosen value. In this case, I believe it should be 0 rather than NA.
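In more recent tidyr releases spread is documented as superseded by pivot_wider. A rough equivalent of the pipeline above might look like the sketch below; it assumes dplyr 1.0 or later (for the .groups argument) and a tidyr version in which values_fill accepts a single value, and it rebuilds the sample data from the question.

```r
library(dplyr)
library(tidyr)

# Sample data copied from the question
df <- data.frame(Day.of.Week = c(5, 3, 1, 0, 1, 4, 2, 1, 3, 6),
                 Hour = c(21, 12, 11, 16, 15, 14, 20, 10, 13, 11),
                 Quantidade = c(5044, 5024, 4725, 4643, 4601, 4566, 4509, 4502, 4477, 4449))

wide <- df %>%
  group_by(Day.of.Week, Hour) %>%
  summarise(Quantidade = sum(Quantidade), .groups = "drop") %>%
  # Sort by day first so the new columns appear in the order 0..6
  arrange(Day.of.Week) %>%
  pivot_wider(names_from = Day.of.Week, values_from = Quantidade,
              values_fill = 0) %>%
  # Then put the rows in hour order
  arrange(Hour)
```

The day columns are named "0" to "6", so they need backticks when accessed by name, for example wide$`1`.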

30.03.2016 / 20:21
4

You can use a data manipulation library such as dplyr to add up the amounts by day of the week and hour, and then rearrange the result into the format you want. The code below shows an example of how this can be done, using randomly generated data.

dow <- sample(0:6, 1000, replace = TRUE)
h <- sample(0:23, 1000, replace = TRUE)
q <- floor(runif(1000, 100, 10000))
df <- data.frame(Day.of.Week = dow, Hour = h, Quantidade = q)

library(dplyr)
somas <- df %>%
    group_by(Day.of.Week, Hour) %>%
    summarize(soma = sum(Quantidade))

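# One data frame per day of the week. The data.frame() call below assumes
# every day has a row for every hour; if a day/hour pair is missing from
# the data, the columns have different lengths and the call fails.
perDay <- split(somas, somas$Day.of.Week)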
result <- data.frame(Hour = perDay[[1]]$Hour,
                     Su = perDay[[1]]$soma,
                     Mo = perDay[[2]]$soma,
                     Tu = perDay[[3]]$soma,
                     We = perDay[[4]]$soma,
                     Th = perDay[[5]]$soma,
                     Fr = perDay[[6]]$soma,
                     Sa = perDay[[7]]$soma)
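With random data there is no guarantee that every day/hour pair occurs. One way to make the layout robust, sketched here in base R only, is to merge the sums onto a complete grid before building the columns. The smaller sample size, the seed, and the mapping of day 0 to Sunday (taken from the column names above) are assumptions for illustration.

```r
set.seed(1)  # arbitrary seed, only to make the sketch reproducible
df <- data.frame(Day.of.Week = sample(0:6, 50, replace = TRUE),
                 Hour = sample(0:23, 50, replace = TRUE),
                 Quantidade = floor(runif(50, 100, 10000)))

# Sum per day/hour pair that actually occurs in the data
somas <- aggregate(Quantidade ~ Day.of.Week + Hour, data = df, FUN = sum)

# Every day/hour pair, including those absent from the data
grid <- expand.grid(Day.of.Week = 0:6, Hour = 0:23)
full <- merge(grid, somas, all.x = TRUE)
full$Quantidade[is.na(full$Quantidade)] <- 0

# Order by day, then hour, so each block of 24 values is one day
full <- full[order(full$Day.of.Week, full$Hour), ]
days <- matrix(full$Quantidade, nrow = 24,
               dimnames = list(NULL, c("Su", "Mo", "Tu", "We", "Th", "Fr", "Sa")))
result <- data.frame(Hour = 0:23, days)
```

Because the grid always has 24 rows per day, result has one row per hour regardless of which combinations the data happens to contain.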
    
29.03.2016 / 23:16