Create column with sum and percentage of the maximum of other columns

7

I am a beginner in the R language and wanted to know how I would create a mutate function that would create a new column in my table and at the same time store in that new column the sum of the values contained in the other 6 columns of my table as in example below, I also wanted to create another column named proportion on the side of the Sum_total column that would take the largest number present between columns A and F and divide it by the value contained in the Sum_total column. For example, in the first row of the proportion column I would have 100 (highest value between columns A and F) / 150 (value contained in the column TotalSum).

Important details: The columns of A: F are of character type and I have a series of NA's in these columns that I would like to disregard when making my sum.

A   B   C   D   E   F   Soma_total Proporcao
100 50  NA  NA  NA  NA  150        (100/150)
49  51  1   NA  NA  NA  101        (51/101)
30  20  5   1   NA  NA  56         (30/56)
11  10  2   NA  NA  NA  23         (11/23)
7   3   5   1   1   1   18         (7/18)
0   10  NA  NA  NA  NA  10         (10/10)
1   2   NA  NA  NA  NA  3          (2/3)
5   6   2   NA  NA  NA  13         (6/13)
7   3   1   3   1   NA  15         (7/15)
12  3   1   2   NA  NA  18         (12/18) 
    
asked by anonymous 13.02.2017 / 14:35

4 answers

7

recreating the table in code (for everyone to be able to reproduce the example):

dt_so_bruna <- 
structure(list(A = c(100, 49, 30, 11, 7, 0, 1, 5, 7, 12), B = c(50, 
51, 20, 10, 3, 10, 2, 6, 3, 3), C = c(0, 1, 5, 2, 5, 0, 0, 2, 
1, 1), D = c(0, 0, 1, 0, 1, 0, 0, 0, 3, 2), E = c(0, 0, 0, 0, 
1, 0, 0, 0, 1, 0), F = c(0, 0, 0, 0, 1, 0, 0, 0, 0, 0)), .Names = c("A", 
"B", "C", "D", "E", "F"), row.names = c(NA, 10L), class = "data.frame")

solution with two lines of mutate ():

   dt_so_bruna %>%
   mutate(Soma        = A + B + C + D + E + F)  %>%
   mutate(Proporcao_A = pmax(A, B, C, D, E, F, na.rm = TRUE) / Soma)

         A  B C D E F Soma Proporcao_A
    1  100 50 0 0 0 0  150   0.6666667
    2   49 51 1 0 0 0  101   0.4851485
    3   30 20 5 1 0 0   56   0.5357143
    4   11 10 2 0 0 0   23   0.4782609
    5    7  3 5 1 1 1   18   0.3888889
    6    0 10 0 0 0 0   10   0.0000000
    7    1  2 0 0 0 0    3   0.3333333
    8    5  6 2 0 0 0   13   0.3846154
    9    7  3 1 3 1 0   15   0.4666667
    10  12  3 1 2 0 0   18   0.6666667

Assumi que seus NAs eram 0
    
13.02.2017 / 14:57
5

An alternative:

library(dplyr)


dados3 <- dados %>%
  mutate_at(c("A", "B", "C", "D", "E", "F"), as.numeric) %>%
  mutate(Soma_total_2 = data_frame(A, B, C, D, E, F) %>% rowSums(na.rm = TRUE),
         Maximo = pmax(A, B, C, D, E, F, na.rm = TRUE), 
         Proporcao_2 = sprintf("(%s/%s)", Maximo, Soma_total_2))

New functions there are

  • mutate_at - applies the same transformation to specified columns.
  • rowSums - Sums values of each row in an array or data.frame.
  • pmax - Get the maximum of each row in an array or data.frame.
  • sprintf - format texts.

Hugs!

    
13.02.2017 / 15:10
4

Less verbal response (you do not need to type the name of all columns), but with more code

library(tibble)
library(dplyr)
library(tidyr)

d %>% 
  # adiciona um "id" com os nomes das linhas
  rownames_to_column('id') %>% 
  # empilha as colunas A até F. ao invés de -id poderia ser A:F
  gather(key, val, -id) %>% 
  # convert val em numeric
  mutate(val = as.numeric(val)) %>% 
  # agrupa pelo id
  group_by(id) %>%
  # cria soma total e proporcao (desconsiderando NAs)
  mutate(Soma_total = sum(val, na.rm = TRUE),
         Proporcao = max(val, na.rm = TRUE) / Soma_total) %>% 
  # desagrupa
  ungroup() %>% 
  # joga A:F nas colunas novamente
  spread(key, val) %>% 
  # reordena colunas e retira o "id"
  select(A:F, Soma_total, Proporcao)

Result:

# A tibble: 10 × 8
       A     B     C     D     E     F Soma_total Proporcao
*  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>      <dbl>     <dbl>
1    100    50    NA    NA    NA    NA        150 0.6666667
2     12     3     1     2    NA    NA         18 0.6666667
3     49    51     1    NA    NA    NA        101 0.5049505
4     30    20     5     1    NA    NA         56 0.5357143
5     11    10     2    NA    NA    NA         23 0.4782609
6      7     3     5     1     1     1         18 0.3888889
7      0    10    NA    NA    NA    NA         10 1.0000000
8      1     2    NA    NA    NA    NA          3 0.6666667
9      5     6     2    NA    NA    NA         13 0.4615385
10     7     3     1     3     1    NA         15 0.4666667
    
13.02.2017 / 15:10
0

You can do everything easily with the basic R functions as well. In your case you are applying a function by lines.

Using the dt_so_bruna base created by Dan:

dt_so_bruna$soma <- rowSums(dt_so_bruna, na.rm = TRUE)
dt_so_bruna$prop <- apply(dt_so_bruna[-7], 1, max, na.rm = TRUE)/dt_so_bruna$soma
dt_so_bruna

     A  B C D E F soma      prop
1  100 50 0 0 0 0  150 0.6666667
2   49 51 1 0 0 0  101 0.5049505
3   30 20 5 1 0 0   56 0.5357143
4   11 10 2 0 0 0   23 0.4782609
5    7  3 5 1 1 1   18 0.3888889
6    0 10 0 0 0 0   10 1.0000000
7    1  2 0 0 0 0    3 0.6666667
8    5  6 2 0 0 0   13 0.4615385
9    7  3 1 3 1 0   15 0.4666667
10  12  3 1 2 0 0   18 0.6666667
    
25.02.2017 / 17:44