How to remove columns from a data frame?

I have a data frame with 275 variables and would like to remove the variables that do not contribute significantly (those that have a non-zero value fewer than 10 times). Can anyone help me?

    
asked by anonymous 05.12.2016 / 18:22

1 answer

One possible way to do this is with the select_if function from the dplyr package.

First define a function that counts the number of zeros:

# Counts how many values in x are equal to zero (NAs are ignored)
contar_zeros <- function(x){
  sum(x == 0, na.rm = TRUE)
}

Now consider this data frame:

library(dplyr)

# tibble() replaces the deprecated data_frame()
df <- tibble(
  x = 0,
  y = 1:10,
  z = c(rep(0,5), 6:10)
)
df
# A tibble: 10 × 3
       x     y     z
   <dbl> <int> <dbl>
1      0     1     0
2      0     2     0
3      0     3     0
4      0     4     0
5      0     5     0
6      0     6     6
7      0     7     7
8      0     8     8
9      0     9     9
10     0    10    10
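
Before dropping anything, it can help to look at the zero count of every column at once. The following is a minimal base R sketch; the name `zeros_por_coluna` is made up for illustration, and the example data is rebuilt with `data.frame()` so the snippet runs on its own.

```r
# Rebuild the example data with base R so the snippet is self-contained
df <- data.frame(
  x = 0,
  y = 1:10,
  z = c(rep(0, 5), 6:10)
)

# Number of zeros in each column (NAs are ignored)
zeros_por_coluna <- sapply(df, function(col) sum(col == 0, na.rm = TRUE))
zeros_por_coluna
# x has 10 zeros, y has 0, z has 5
```

With 275 variables, sorting this vector (for example with `sort(zeros_por_coluna)`) gives a quick overview of which columns are mostly zeros.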

Using select_if:

df_sem_colunas <- select_if(df, function(col) contar_zeros(col) < 10)
df_sem_colunas
# A tibble: 10 × 2
       y     z
   <int> <dbl>
1      1     0
2      2     0
3      3     0
4      4     0
5      5     0
6      6     6
7      7     7
8      8     8
9      9     9
10    10    10
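
The question phrases the criterion in terms of non-zero values, so a variant that counts those directly may be closer to what was asked. The sketch below assumes dplyr 1.0.0 or later, where `select(where(...))` is the recommended replacement for the superseded `select_if`. The helper `contar_nao_zeros` and the name `df_filtrado` are hypothetical, and the threshold of 10 is taken from the question. It also assumes the columns are numeric.

```r
library(dplyr)

# Same example data as above
df <- tibble(
  x = 0,
  y = 1:10,
  z = c(rep(0, 5), 6:10)
)

# Hypothetical helper: counts the non-zero values in a column, ignoring NAs
contar_nao_zeros <- function(x) {
  sum(x != 0, na.rm = TRUE)
}

# Keep only the columns that have at least 10 non-zero values
df_filtrado <- select(df, where(function(col) contar_nao_zeros(col) >= 10))
names(df_filtrado)
# only "y" remains: x has 0 non-zero values and z has 5
```

Note that this is stricter than the zero-count filter: in the 10-row example, z is kept by `contar_zeros(col) < 10` but dropped here, because it has only 5 non-zero values.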
    
05.12.2016 / 19:22