How to calculate the percentage of NA in a data frame in R?

Question



Hello,

I'm working with a large data frame (1000 variables and 60,000 rows), and I need to calculate the percentage of NA and blank values for each variable separately.

What is the best way to do this in R?

r

asked by anonymous 08.09.2018 / 23:58

2 answers


score 3 · Answer 1

To count the NAs per column, you can use the colSums() function (this counts NA values only, not empty strings):

# total number of rows
n <- nrow(df)

# percentage of NA per column
round(colSums(is.na(df))*100/n, 2)
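To check that the colSums() version behaves as expected, here is a minimal sketch on a small made-up data frame (`toy` and its values are hypothetical, not from the question). It also shows an equivalent form: since the mean of a logical vector is the share of TRUE values, colMeans(is.na(toy)) * 100 gives the same percentages without computing n separately.

```r
# Hypothetical example data; column names and values are made up
toy <- data.frame(x = c(1, NA, 3, NA),
                  y = c("a", "b", NA, "d"),
                  z = c(NA, NA, NA, NA))

n <- nrow(toy)

# percentage of NA per column, as in the answer
pct_sum <- round(colSums(is.na(toy)) * 100 / n, 2)

# equivalent: the mean of a logical vector is the proportion of TRUE values
pct_mean <- round(colMeans(is.na(toy)) * 100, 2)

print(pct_sum)  # x: 50, y: 25, z: 100
```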

Or you can also use the apply() function:

# function to count NAs
sum_NA <- function(dados){
  sum(is.na(dados))
}

# total number of rows
n <- nrow(df)

# apply the function to each column
round(apply(df, 2, sum_NA)*100/n, 2)
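One thing worth knowing about apply(): it first converts the data frame to a matrix, so a data frame with mixed column types becomes a character matrix, and that copy may be expensive at 60,000 rows by 1000 columns. A sketch of an alternative, swapping in vapply() (not used in the answer itself), which iterates over the columns directly; the data frame `toy` is hypothetical:

```r
# Hypothetical example data with mixed column types
toy <- data.frame(num = c(1, NA, 3, 4),
                  chr = c("a", NA, NA, "d"),
                  stringsAsFactors = FALSE)

# vapply() walks over the columns (list elements) of the data frame,
# so no matrix conversion happens; numeric(1) declares that each call
# must return exactly one number
pct_na <- vapply(toy, function(col) mean(is.na(col)) * 100, numeric(1))

print(round(pct_na, 2))  # num: 25, chr: 50
```

Declaring the return type with numeric(1) makes vapply() fail loudly if a column ever produces something unexpected, instead of silently returning a list as sapply() can.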

score 0 · Answer 2

One way to do this is to write a loop that goes through your data frame column by column.

I created a data frame as an example:

df <- data.frame(A=c(NA,2,'',1),B=c('',4,4,2),C=c(5,'','',''),D=c(7,7,5,4),E=c('','',NA,NA),F=c(9,9,0,6))

Notice that some of the columns contain blank values and NAs.

for (i in seq_len(ncol(df))) {
  col <- df[[i]]
  print(sum(is.na(col) | col == "") / length(col) * 100)
}

This loop walks through each column and calculates the percentage you need. For my example data frame, it prints the following results:

[1] 50
[1] 25
[1] 75
[1] 0
[1] 100
[1] 0
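The loop above only prints each percentage. If you want to keep the results for later use, a minimal sketch is to store them in a preallocated named vector instead; this reuses the example data frame from this answer, and the names `pct` and `col` are my own, chosen for illustration:

```r
# The example data frame from this answer
df <- data.frame(A = c(NA, 2, '', 1), B = c('', 4, 4, 2), C = c(5, '', '', ''),
                 D = c(7, 7, 5, 4), E = c('', '', NA, NA), F = c(9, 9, 0, 6))

# Preallocate a named numeric vector with one slot per column
pct <- setNames(numeric(ncol(df)), names(df))

for (i in seq_len(ncol(df))) {
  col <- df[[i]]
  # NA == "" evaluates to NA, but TRUE | NA is TRUE, so NAs are still counted
  pct[i] <- sum(is.na(col) | col == "") / length(col) * 100
}

print(pct)  # A: 50, B: 25, C: 75, D: 0, E: 100, F: 0
```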

Want something simpler and possibly faster? Try:

print(colMeans(is.na(df) | df == "")*100)

This gives the following output:

  A   B   C   D   E   F 
 50  25  75   0 100   0

Note that is.na() is an R function that finds all NAs; combining it with OR (|) and == "" also catches all empty strings. I think this last option is faster because it avoids an explicit loop and relies only on vectorized functions implemented natively in R.
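The == "" comparison only matches strictly empty strings, so a value made of spaces or tabs (for example "  ") is not counted as blank. If your data contains such values, one possible extension is to strip whitespace with trimws() before comparing. The helper name `pct_missing_or_blank` and the data frame `toy` below are hypothetical, for illustration only:

```r
# Hypothetical helper: percentage of values per column that are NA,
# empty (""), or contain only whitespace (e.g. "  " or "\t")
pct_missing_or_blank <- function(data) {
  vapply(data, function(col) {
    # as.character() lets this work on numeric and factor columns too;
    # trimws() removes leading and trailing whitespace
    blank <- trimws(as.character(col)) == ""
    # blank is NA where col is NA, but is.na(col) | NA is still TRUE
    mean(is.na(col) | blank) * 100
  }, numeric(1))
}

toy <- data.frame(a = c("x", "  ", "", NA),
                  b = c(1, 2, NA, 4),
                  c = c("\t", " y ", "z", "w"),
                  stringsAsFactors = FALSE)

print(pct_missing_or_blank(toy))  # a: 75, b: 25, c: 25
```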