How to remove rows that have missing values?

8

I have a dataset with some missing values (NAs), but only in one variable (one column), and I need to remove every row where that value is missing.

    
asked by anonymous 18.09.2015 / 17:11

4 answers

3

In my opinion, the subset function solves this directly and clearly. Use it together with is.na applied to the variable of interest.

> data.frame(x=1:12, y=rnorm(12), z=c(TRUE, TRUE, NA))
    x           y    z
1   1  1.02572367 TRUE
2   2  0.03988014 TRUE
3   3 -0.33269252   NA
4   4  0.05357787 TRUE
5   5 -0.05166907 TRUE
6   6 -0.68981171   NA
7   7  1.14728375 TRUE
8   8 -0.76820827 TRUE
9   9 -0.45425148   NA
10 10 -0.27369393 TRUE
11 11 -0.12687725 TRUE
12 12 -0.38773276   NA

> df <- data.frame(x=1:12, y=rnorm(12), z=c(TRUE, TRUE, NA))
> subset(df, !is.na(z))
    x          y    z
1   1 -0.2223889 TRUE
2   2 -0.7398008 TRUE
4   4 -1.6382330 TRUE
5   5  1.2596270 TRUE
7   7  1.0555701 TRUE
8   8 -1.5904792 TRUE
10 10 -0.0942284 TRUE
11 11 -0.3278851 TRUE

You can also add more conditions to the filter.

> subset(df, !is.na(z) & x %% 2 == 0)
    x          y    z
2   2 -0.7398008 TRUE
4   4 -1.6382330 TRUE
8   8 -1.5904792 TRUE
10 10 -0.0942284 TRUE
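
One detail that may be worth knowing when choosing between subset and bracket indexing: subset treats a condition that evaluates to NA as FALSE, while logical bracket indexing returns a row of NAs for it. A minimal sketch, using a small hypothetical data frame with fixed values (not the random data above) so the result is reproducible:

```r
# Hypothetical example data with fixed values (an assumption for illustration)
df <- data.frame(x = 1:6, z = c(TRUE, TRUE, NA, TRUE, TRUE, NA))

# Keep only the rows where z is not missing
kept <- subset(df, !is.na(z))
print(kept)

# Using z itself as the condition: subset() drops the rows where z is NA,
# while bracket indexing returns an all-NA row for each of them
by_subset  <- subset(df, z)
by_bracket <- df[df$z, ]
print(nrow(by_subset))   # 4
print(nrow(by_bracket))  # 6
```

This is why the explicit !is.na(...) test matters more with df[...] than with subset.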
    
18.09.2015 / 23:35
5

To remove rows with missing data in R, you can use the function complete.cases().

For example, for a dataset x:

y <- x[complete.cases(x),]
str(y)

complete.cases(x) returns a logical vector with TRUE for rows that have no missing values and FALSE for rows that have at least one NA in any column.
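
Since the question concerns a single column, complete.cases can also be given just that column. A minimal sketch, assuming a hypothetical data frame x with columns a and b (these names are not from the question):

```r
# Hypothetical data: column b is the variable of interest
x <- data.frame(a = c(1, NA, 3, 4), b = c("p", "q", NA, "s"))

# All columns considered: drops every row with at least one NA
all_cols <- x[complete.cases(x), ]

# Only column b considered: rows with NA in other columns are kept
only_b <- x[complete.cases(x$b), ]   # same rows as x[!is.na(x$b), ]

print(all_cols)
print(only_b)
```

Here all_cols keeps rows 1 and 4, while only_b keeps rows 1, 2 and 4.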

    
18.09.2015 / 17:56
4

Consider the following data frame:

> dados <- data.frame(
+     var1 = c(NA, 1),
+     var2 = c(1, NA)
+   )
>   
>   dados
  var1 var2
1   NA    1
2    1   NA

You can exclude all rows that have at least one missing value using na.omit:

> na.omit(dados)
[1] var1 var2
<0 rows> (or 0-length row.names)

Or remove only the rows where a particular variable is missing (NA):

> dados[!is.na(dados$var1),]
  var1 var2
2    1   NA
> dados[!is.na(dados$var2),]
  var1 var2
1   NA    1

To check whether a vector element is NA in R, we use the function is.na:

> is.na(NA)
[1] TRUE
> is.na(1)
[1] FALSE

To actually remove the missing values from the data.frame, you need to overwrite the object:

dados <- na.omit(dados)
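
If it helps to check which rows were dropped before overwriting, na.omit records them in the "na.action" attribute of its result. A minimal sketch, assuming a hypothetical three-row data frame (the names clean and dropped are only illustrative):

```r
# Hypothetical data: rows 1 and 2 each contain one NA
dados <- data.frame(var1 = c(NA, 1, 3), var2 = c(1, NA, 3))

# na.omit() returns a copy without the incomplete rows
clean <- na.omit(dados)

# The indices of the removed rows are stored in the "na.action" attribute
dropped <- attr(clean, "na.action")

print(clean)
print(as.integer(dropped))  # 1 2
```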
    
18.09.2015 / 22:16
4

You can also use the filter function from dplyr:

Creating sample data (based on Daniel's data):

dados <- data.frame(var1 = c(NA, 1, 3), var2 = c(1, NA, 3))

Loading dplyr:

library(dplyr)

Remove the rows with NAs only in column var1:

dados %>% filter(!is.na(var1))

Remove the rows with NAs only in column var2:

dados %>% filter(!is.na(var2))

To remove all rows with NAs from the data frame, use na.omit from base R. It fits easily into a pipe chain:

# remove all NAs
dados %>% na.omit
    
19.09.2015 / 15:08