Direct (and beautiful) solution for storing base using dplyr

4

I have the following default basis:

df <- data.frame(
  lead_15 = c(1,0,0,0,0,1,0,0,1,0,0,0,0,0,1),
  lead_30 = c(0,0,0,1,0,0,1,1,0,1,0,0,0,1,0),
  lead_60 = c(0,1,0,0,1,0,0,0,0,0,1,1,0,0,0),

  inib_15 = c(1,0,0,0,0,0,0,0,1,0,0,0,0,0,0),
  inib_30 = c(0,0,0,1,0,0,1,0,0,0,0,0,0,1,0),
  inib_60 = c(0,0,0,0,1,0,0,0,0,0,1,1,0,0,0),

  motivo_15 = c("A","","","","","","","","D","","","","","",""),
  motivo_30 = c("","","","B","","","A","","","","","","","B",""),
  motivo_60 = c("","","","","C","","","","","","B","D","","","")
)

I want a solution where there is a line for each lead (3 lines) where the first column is the sum of the respective lead, the sum of the respective inhib and a column for each reason (A, B, C, D) the amount of these reasons.

LEAD    | QTD | INIB | A | B | C | D |
--------|-----|------|---|---|---|---|
lead_15 |  4  |  2   | 1 | 0 | 0 | 1 |
--------|-----|------|---|---|---|---|
lead_30 |  5  |  3   | 1 | 2 | 0 | 1 |
--------|-----|------|---|---|---|---|
lead_60 |  4  |  3   | 0 | 1 | 1 | 1 |

It's a relatively simple problem that I can solve but with many pieces of code and separate accounts. I wanted to ask here because I know there can be a direct solution using dplyr .

    
asked by anonymous 16.02.2017 / 20:31

2 answers

2

I could not think of a way to do it in just one expression. But I do not think the following organization is bad.

motivo <- df %>%
  select(starts_with("motivo")) %>%
  gather(key, motivo) %>%
  separate(key, c('x', 'grupo')) %>%
  filter(motivo != "") %>%
  group_by(grupo, motivo) %>%
  summarise(n = n()) %>%
  spread(motivo, n, fill = 0)

inib <- df %>%
  select(starts_with("inib")) %>%
  gather(key, inib) %>%
  filter(inib != 0) %>%
  separate(key, c('x', 'grupo')) %>%
  group_by(grupo) %>%
  summarise(inib = sum(inib))

qtd <- df %>%
  select(starts_with("lead")) %>%
  gather(key, lead) %>%
  separate(key, c('x', 'grupo')) %>%
  group_by(grupo) %>%
  summarise(inib = sum(lead))

final <- left_join(qtd, inib) %>% left_join(motivo)

Of course, if you know motives will always be "A," "B," "C," and "D," @Fernando's solution is better. This solution assumes that the number of reasons can be variable depending on the base, as well as the number of lead types.

    
17.02.2017 / 00:30
3

I do not know what exactly you meant by direct solution, but you follow a solution using dplyr and tidyr on a (long) line.

df %>% mutate_at(vars(starts_with("motivo")), 
                 funs(A = if_else(. == "A", 1, 0), 
                      B = if_else(. == "B", 1, 0), 
                      C = if_else(. == "C", 1, 0), 
                      D = if_else(. == "D", 1, 0))) %>%
  select(-matches("motivo_\d{2}$")) %>%
  gather %>% mutate(key = gsub("(.+)(_)(\d{2})_(.$)", "\1\4_\3", key)) %>%
  separate(key, c("tipo", "grupo")) %>%
  group_by(tipo, grupo) %>% summarise(value = sum(value)) %>% spread(tipo, value)

# A tibble: 3 x 7
grupo  inib  lead motivoA motivoB motivoC motivoD
* <chr> <dbl> <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
1    15     2     4       1       0       0       1
2    30     3     5       1       2       0       0
3    60     3     4       0       1       1       1
    
16.02.2017 / 23:47