Identify cases that meet a condition in more than one column in R


I have a data frame with 20 students and I need to identify the students who attended stage 43 in two or more years.

aluno <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20)
etapa_2012 <- c(42, 43, 44, 43, 42, 43, 44, 45, 42, 43, 44, 45, 42, 43, 44, 44, 42, 43, 44, 45)
etapa_2013 <- c(43, 44, 45, 43, 43, 44, 45, 45, 43, 43, 45, 45, 43, 44, 45, 44, 43, 44, 45, 45)
etapa_2014 <- c(44, 45, 45, 43, 44, 45, 45, 45, 44, 43, 45, 45, 44, 45, 45, 45, 44, 45, 45, 45)
etapa_2015 <- c(45, 45, 45, 44, 45, 45, 45, 44, 43, 45, 45, 45, 45, 45, 45, 44, 43, 45, 45, NA)
fluxo <- data.frame(aluno, etapa_2012, etapa_2013, etapa_2014, etapa_2015)

But so far I have only managed to add a new column identifying the students who attended stage 43 at all.

fluxo$dois_ou_mais <- ifelse(fluxo$etapa_2012 == 43 | fluxo$etapa_2013 == 43 | fluxo$etapa_2014 == 43 | fluxo$etapa_2015 == 43, 1, 0)
fluxo

With this, every student who has stage 43 in at least one year is marked.

I would like to get a result where only students 4, 9, 10 and 17 are marked in the dois_ou_mais column, since they have stage 43 in more than one year.

    
asked by anonymous 04.04.2018 / 19:00

2 answers


Use the comparison:

fluxo[, 2:5]==43

This tests every value in columns 2 through 5 for equality with 43 and returns a logical matrix of TRUE and FALSE values.

head(fluxo[, 2:5]==43)
     etapa_2012 etapa_2013 etapa_2014 etapa_2015
[1,]      FALSE       TRUE      FALSE      FALSE
[2,]       TRUE      FALSE      FALSE      FALSE
[3,]      FALSE      FALSE      FALSE      FALSE
[4,]       TRUE       TRUE       TRUE      FALSE
[5,]      FALSE       TRUE      FALSE      FALSE
[6,]       TRUE      FALSE      FALSE      FALSE

In R, TRUE counts as 1 and FALSE as 0 in arithmetic, so summing each row gives the number of TRUE values in it:

apply(head(fluxo[, 2:5]==43), 1, sum)
[1] 1 1 0 3 1 1

To find out who attended more than once, regardless of exactly how many times, test whether that count is greater than 1:

as.numeric(apply(head(fluxo[, 2:5]==43), 1, sum)>1)
[1] 0 0 0 1 0 0

Remove the head() call from the commands above to apply the solution to the full data frame.
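
For illustration, the same steps on the full data frame might look like the sketch below. It makes two additions that are not in the commands above: the year columns are selected by name (through the hypothetical helper vector anos) so that extra columns cannot interfere, and na.rm = TRUE is passed to sum so that a missing year counts as not attended. The name vezes_43 is also only illustrative.

```r
# Data from the question (aluno written as 1:20 for brevity)
aluno <- 1:20
etapa_2012 <- c(42, 43, 44, 43, 42, 43, 44, 45, 42, 43, 44, 45, 42, 43, 44, 44, 42, 43, 44, 45)
etapa_2013 <- c(43, 44, 45, 43, 43, 44, 45, 45, 43, 43, 45, 45, 43, 44, 45, 44, 43, 44, 45, 45)
etapa_2014 <- c(44, 45, 45, 43, 44, 45, 45, 45, 44, 43, 45, 45, 44, 45, 45, 45, 44, 45, 45, 45)
etapa_2015 <- c(45, 45, 45, 44, 45, 45, 45, 44, 43, 45, 45, 45, 45, 45, 45, 44, 43, 45, 45, NA)
fluxo <- data.frame(aluno, etapa_2012, etapa_2013, etapa_2014, etapa_2015)

# Year columns, selected by name (illustrative helper, not in the original answer)
anos <- c("etapa_2012", "etapa_2013", "etapa_2014", "etapa_2015")

# Number of years each student spent in stage 43; na.rm = TRUE skips missing years
vezes_43 <- apply(fluxo[, anos] == 43, 1, sum, na.rm = TRUE)

# 1 if the student was in stage 43 in more than one year, 0 otherwise
fluxo$dois_ou_mais <- as.numeric(vezes_43 > 1)

# Students flagged by the new column
fluxo$aluno[fluxo$dois_ou_mais == 1]
# [1]  4  9 10 17
```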

    
04.04.2018 / 19:14

Use the rowSums function to get the table as you requested:

fluxo$dois_ou_mais <- as.numeric(rowSums(fluxo[,-1] == 43, na.rm = TRUE) > 1)
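
The na.rm = TRUE argument matters for this data because student 20 has no value for 2015. As a rough sketch, the comparison below shows the row counts with and without it; the names etapas, sem_na_rm and com_na_rm are illustrative and do not come from the answer.

```r
# Data from the question (aluno written as 1:20 for brevity)
fluxo <- data.frame(
  aluno = 1:20,
  etapa_2012 = c(42, 43, 44, 43, 42, 43, 44, 45, 42, 43, 44, 45, 42, 43, 44, 44, 42, 43, 44, 45),
  etapa_2013 = c(43, 44, 45, 43, 43, 44, 45, 45, 43, 43, 45, 45, 43, 44, 45, 44, 43, 44, 45, 45),
  etapa_2014 = c(44, 45, 45, 43, 44, 45, 45, 45, 44, 43, 45, 45, 44, 45, 45, 45, 44, 45, 45, 45),
  etapa_2015 = c(45, 45, 45, 44, 45, 45, 45, 44, 43, 45, 45, 45, 45, 45, 45, 44, 43, 45, 45, NA)
)

# Logical matrix; the missing 2015 value of student 20 stays NA
etapas <- fluxo[, -1] == 43

sem_na_rm <- rowSums(etapas)                # an NA in a row makes the whole sum NA
com_na_rm <- rowSums(etapas, na.rm = TRUE)  # NA values are skipped

sem_na_rm[20]  # NA
com_na_rm[20]  # 0

# Same flag as in the answer, built from the NA-safe counts
fluxo$dois_ou_mais <- as.numeric(com_na_rm > 1)
fluxo[fluxo$dois_ou_mais == 1, ]  # rows for students 4, 9, 10 and 17
```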

But if you only want the list of students, without modifying the original table, I would rather use tidyr and dplyr:

library(tidyr)
library(dplyr)
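fluxo %>%
  gather(key = ano, value = etapa, -aluno) %>%  # long format: one row per student and year
  filter(etapa == 43) %>%                       # keep only the years spent in stage 43
  group_by(aluno) %>%
  summarise(N = n()) %>%                        # number of such years per student
  filter(N > 1)                                 # students with two or more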
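
In tidyr 1.0.0 and later, gather() is superseded by pivot_longer(). Assuming such a version is installed, the same pipeline could be sketched as follows. The selector starts_with("etapa_") assumes every year column shares that prefix, and the result name alunos_43 is hypothetical.

```r
library(tidyr)
library(dplyr)

# Data from the question (aluno written as 1:20 for brevity)
fluxo <- data.frame(
  aluno = 1:20,
  etapa_2012 = c(42, 43, 44, 43, 42, 43, 44, 45, 42, 43, 44, 45, 42, 43, 44, 44, 42, 43, 44, 45),
  etapa_2013 = c(43, 44, 45, 43, 43, 44, 45, 45, 43, 43, 45, 45, 43, 44, 45, 44, 43, 44, 45, 45),
  etapa_2014 = c(44, 45, 45, 43, 44, 45, 45, 45, 44, 43, 45, 45, 44, 45, 45, 45, 44, 45, 45, 45),
  etapa_2015 = c(45, 45, 45, 44, 45, 45, 45, 44, 43, 45, 45, 45, 45, 45, 45, 44, 43, 45, 45, NA)
)

alunos_43 <- fluxo %>%
  # Long format: one row per student and year
  pivot_longer(cols = starts_with("etapa_"), names_to = "ano", values_to = "etapa") %>%
  # Keep the years spent in stage 43; filter() also drops rows where etapa is NA
  filter(etapa == 43) %>%
  group_by(aluno) %>%
  summarise(N = n()) %>%
  # Students with stage 43 in more than one year
  filter(N > 1)

alunos_43  # students 4, 9, 10 and 17, with N = 3, 2, 3 and 2
```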
    
04.04.2018 / 22:20