How do I count the NAs in each variable?

5

Suppose I'm working with the following data frame:

df <- data.frame(v  = c(1, 2, NA, 4, NA, 6, 7, 8, 9, 10),
                 v2 = c(11, NA, NA, 14, NA, 16, NA, NA, 19, NA),
                 v3 = c(21, 22, 23, 24, 25, 26, 27, 28, 29, 30),
                 v4 = c("a", "b", "c", NA, NA, NA, "g", "h", NA, NA))

I need to know how many NAs each variable contains. In the example: v = 2, v2 = 6, v3 = 0, v4 = 5.

I could run the command below for each variable:

sum(is.na(df$v))

But with a large data frame this is not practical.

Another option is summary(df), but it returns many other statistics as well, which makes it hard to pick out the number of NAs in each variable.

Is there a way to return only the number of NAs in each variable of the data frame?

    
asked by anonymous 27.02.2014 / 11:14

3 answers

4

Use sapply to apply a function to each column of the data frame:

df
    v v2 v3   v4
1   1 11 21    a
2   2 NA 22    b
3  NA NA 23    c
4   4 14 24 <NA>
5  NA NA 25 <NA>
6   6 16 26 <NA>
7   7 NA 27    g
8   8 NA 28    h
9   9 19 29 <NA>
10 10 NA 30 <NA>

sapply(df, function(x) sum(is.na(x)))
 v v2 v3 v4 
 2  6  0  5 
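
As a possible alternative to the sapply call above, the same counts can also be obtained with base R's colSums, since is.na applied to a data frame returns a logical matrix. This is only a sketch using the example data from the question; the name na_counts is an assumption introduced here for illustration.

```r
# Same example data as in the question
df <- data.frame(
  v  = c(1, 2, NA, 4, NA, 6, 7, 8, 9, 10),
  v2 = c(11, NA, NA, 14, NA, 16, NA, NA, 19, NA),
  v3 = c(21, 22, 23, 24, 25, 26, 27, 28, 29, 30),
  v4 = c("a", "b", "c", NA, NA, NA, "g", "h", NA, NA)
)

# is.na(df) is a logical matrix; colSums counts the TRUE values in each column
na_counts <- colSums(is.na(df))   # na_counts is a hypothetical name
na_counts

# Keep only the columns that actually contain missing values
na_counts[na_counts > 0]
```

The final line is handy on wide data frames, where most columns may have no missing values at all.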
    
27.02.2014 / 15:08
3

You can use the colwise function from plyr to turn a function into one that operates on each column of a data frame:

Defining the function:

library(plyr)
quantos.na <- colwise(function(x) sum(is.na(x)))

Applying the function:

quantos.na(df)
  v v2 v3 v4
1 2  6  0  5
    
01.04.2014 / 18:01
1

Try this:

table(data$VAR1, useNA = "always")

The result appears like this:

    1     2     3     4     5     6     7  <NA> 
10484   518  4389  3639   272   522   836 18291 

The argument useNA = "always" makes table() include a count for missing values, which it omits by default.
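
To relate this to the data frame from the question, here is a minimal sketch that tabulates one column and then pulls out only the NA count. The names tab and na_only are assumptions introduced for illustration, and v2 is just one column chosen as an example.

```r
# Same example data as in the question
df <- data.frame(
  v  = c(1, 2, NA, 4, NA, 6, 7, 8, 9, 10),
  v2 = c(11, NA, NA, 14, NA, 16, NA, NA, 19, NA),
  v3 = c(21, 22, 23, 24, 25, 26, 27, 28, 29, 30),
  v4 = c("a", "b", "c", NA, NA, NA, "g", "h", NA, NA)
)

# Frequency table for a single column, with an extra <NA> entry
tab <- table(df$v2, useNA = "always")   # tab is a hypothetical name
tab

# The <NA> entry is the one whose name is missing; extract only that count
na_only <- tab[is.na(names(tab))]       # na_only is a hypothetical name
na_only
```

Note that this works one variable at a time, so for a whole data frame it would still need to be wrapped in something like sapply.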

    
28.07.2017 / 15:15