I have a dataset with some missing values (NAs) in just one variable (a column), and I need to remove every row that has a missing value.
The subset function resolves this directly and clearly, in my opinion. It can be used in conjunction with the is.na function applied to the variable of interest.
> data.frame(x=1:12, y=rnorm(12), z=c(TRUE, TRUE, NA))
x y z
1 1 1.02572367 TRUE
2 2 0.03988014 TRUE
3 3 -0.33269252 NA
4 4 0.05357787 TRUE
5 5 -0.05166907 TRUE
6 6 -0.68981171 NA
7 7 1.14728375 TRUE
8 8 -0.76820827 TRUE
9 9 -0.45425148 NA
10 10 -0.27369393 TRUE
11 11 -0.12687725 TRUE
12 12 -0.38773276 NA
> df <- data.frame(x=1:12, y=rnorm(12), z=c(TRUE, TRUE, NA))
> subset(df, !is.na(z))
x y z
1 1 -0.2223889 TRUE
2 2 -0.7398008 TRUE
4 4 -1.6382330 TRUE
5 5 1.2596270 TRUE
7 7 1.0555701 TRUE
8 8 -1.5904792 TRUE
10 10 -0.0942284 TRUE
11 11 -0.3278851 TRUE
And you can also include more rules in the filter.
> subset(df, !is.na(z) & x %% 2 == 0)
x y z
2 2 -0.7398008 TRUE
4 4 -1.6382330 TRUE
8 8 -1.5904792 TRUE
10 10 -0.0942284 TRUE
To remove rows with missing data in R, you can use the function complete.cases().
For example, for a dataset x:
y <- x[complete.cases(x),]
str(y)
complete.cases(x) returns a logical vector that is TRUE for rows with no missing values and FALSE for rows with at least one NA.
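As a minimal sketch of how that logical vector behaves, the example below uses a small hypothetical data frame (the names x, a, and b are made up for illustration). It also shows one way to restrict the check to a single column, which is closer to what the question asks for.

```r
# Hypothetical sample data: NA in column 'a' (row 2) and in column 'b' (row 3)
x <- data.frame(a = c(1, NA, 3, 4), b = c("u", "v", NA, "w"))

# TRUE only for rows with no NA in any column
keep_all <- complete.cases(x)

# Check only column 'a', so an NA in 'b' does not drop the row
keep_a <- complete.cases(x[, "a", drop = FALSE])

y_all <- x[keep_all, ]  # keeps rows 1 and 4
y_a   <- x[keep_a, ]    # keeps rows 1, 3 and 4
```

Passing a one-column data frame keeps the call uniform; complete.cases(x$a) should give the same result for a single variable.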
Consider the following data.frame:
> dados <- data.frame(
+ var1 = c(NA, 1),
+ var2 = c(1, NA)
+ )
>
> dados
var1 var2
1 NA 1
2 1 NA
You can exclude all rows that have at least one missing value using na.omit:
> na.omit(dados)
[1] var1 var2
<0 rows> (or 0-length row.names)
Or delete only the rows that are missing (NA) in a particular variable:
> dados[!is.na(dados$var1),]
var1 var2
2 1 NA
> dados[!is.na(dados$var2),]
var1 var2
1 NA 1
To check if a vector element is NA in R, we use the function is.na:
> is.na(NA)
[1] TRUE
> is.na(1)
[1] FALSE
To actually remove the missing values from the data.frame, you need to overwrite it:
dados <- na.omit(dados)
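The point about overwriting can be checked with a short sketch. The data frame name example_df is hypothetical and is used only so the result does not depend on the two-row dados above.

```r
# Hypothetical data: only row 3 is complete
example_df <- data.frame(var1 = c(NA, 1, 3), var2 = c(1, NA, 3))

# Calling na.omit without assigning leaves the original untouched
na.omit(example_df)
n_before <- nrow(example_df)  # still 3

# Assigning the result back is what actually drops the rows
example_df <- na.omit(example_df)
n_after <- nrow(example_df)   # now 1
```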
You can also use the filter function from dplyr:
Creating sample data (based on Daniel's data):
dados <- data.frame(var1 = c(NA, 1, 3), var2 = c(1, NA, 3))
Loading dplyr:
library(dplyr)
Remove NAs only from column var1:
dados %>% filter(!is.na(var1))
Remove NAs only from column var2:
dados %>% filter(!is.na(var2))
To remove all NAs from the data.frame, use na.omit. It fits easily into the pipe chain:
# remove all NAs
dados %>% na.omit