Delete rows containing NA in a data frame

7

I have a data frame, and its fourth column contains several NA cells. I would like to know how to delete every row that has an NA. I tried the commands below on dataframe1, but the NA rows keep showing up:

r <- with(dataframe1, which(dataframe1[4]==NA, arr.ind=TRUE))
newd <- dataframe1[-r, ]

The structure of my data is:

dput(head(dataframe1, 10))

structure(list(Sigla = c("AC", "AC", "AC", "AC", "AC", "AC", 
"AC", "AC", "AC", "AC"), Código = c(1200013L, 1200054L, 1200104L, 
1200138L, 1200179L, 1200203L, 1200252L, 1200302L, 1200328L, 1200336L
), MunicÃ.pio = c("Acrelândia", "Assis Brasil", "Brasiléia", 
"Bujari", "Capixaba", "Cruzeiro do Sul", "Epitaciolândia", "Feijó", 
"Jordão", "Mâncio Lima"), 'numero de homicidios' = c(4L, NA, 
1L, NA, 1L, 1L, NA, 1L, NA, 1L), 'media escolaridade' = c(3.268, 
3.72, 3.788, 2.816, 2.417, 4.108, 3.681, 1.948, 1.038, 3.537), 
    rendimento = c(1042.3834261349, 429.2221666106, 2243.2492197717, 
    786.6815828794, 603.835515482, 9363.3159742031, 1503.420009265, 
    1737.0793588989, 130.7838314018, 1040.2388777272), populacao = c(7935L, 
    3490L, 17013L, 5826L, 5206L, 67441L, 11028L, 26722L, 4454L, 
    11095L)), .Names = c("Sigla", "Código", "MunicÃ.pio", "numero de homicidios", 
"media escolaridade", "rendimento", "populacao"), row.names = c(NA, 
10L), class = "data.frame")
    
asked by anonymous 28.05.2014 / 00:51

2 answers

7

There are two solutions. If you want to drop every row that contains an NA anywhere in the data.frame, you can use the na.omit function.

For example, take a data.frame with two columns, both of which contain NAs.

### Building an example data.frame ###
set.seed(1)
df <- data.frame(x=rnorm(100), y = rnorm(100))
df[sample(1:100,20),1] <- NA
df[sample(1:100,20),2] <- NA

The na.omit function removes every row that has at least one NA:

df2 <- na.omit(df)
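
As a quick check, here is a minimal sketch on a small hand-made data.frame. The name small and its values are made up for illustration. It suggests that na.omit keeps only the fully observed rows:

```r
# Hypothetical toy data, not the data from the question
small <- data.frame(x = c(1, NA, 3, 4), y = c("a", "b", NA, "d"))

# na.omit drops every row that has an NA in any column
small_clean <- na.omit(small)

nrow(small_clean)    # 2 rows remain (the first and the fourth)
anyNA(small_clean)   # FALSE
```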

But if you want to drop only the rows that have NA in a specific column, you can use the is.na function to subset the data.frame. The is.na function returns TRUE where the value is NA, so you negate its result with ! when subsetting.

For example, the following command removes only the rows that have NA in x:

df3 <- df[!is.na(df$x),]
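
This is probably also why the attempt in the question does not work: comparing anything with NA using == gives NA rather than TRUE, so which() finds no rows to drop. The sketch below illustrates this and applies is.na to the fourth column by position. The data.frame d is a hypothetical stand-in, not the asker's data:

```r
# Comparing with NA yields NA, never TRUE, so which() returns no indices
NA == NA                      # NA
which(c(1, NA, 3) == NA)      # integer(0)

# Hypothetical stand-in with NAs in the fourth column
d <- data.frame(a = 1:4,
                b = letters[1:4],
                c = c(0.5, 1.5, 2.5, 3.5),
                n = c(4L, NA, 1L, NA))

# Keep only the rows whose fourth column is not NA
d_clean <- d[!is.na(d[[4]]), ]
d_clean$a                     # 1 3
```

Selecting the column with d[[4]] returns a plain vector, which is what is.na needs to produce one TRUE/FALSE per row.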
    
28.05.2014 / 01:09
2

Another option is to use the complete.cases function:

df2 <- df[complete.cases(df),]

The complete.cases function returns a logical vector with one TRUE or FALSE per row: TRUE when the row has no NA in any variable. Used as in the example above, it keeps only the rows that are complete.
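
complete.cases can also be given just some of the columns, which may help when only certain variables matter. A minimal sketch, assuming a made-up data.frame called toy:

```r
# Hypothetical toy data for illustration
toy <- data.frame(x = c(1, NA, 3), y = c(NA, 2, 3), z = c(7, 8, 9))

# One logical value per row, considering all columns
complete.cases(toy)                          # FALSE FALSE TRUE

# Only columns x and z decide which rows are kept
keep <- complete.cases(toy[, c("x", "z")])
keep                                         # TRUE FALSE TRUE
toy_xz <- toy[keep, ]
```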

    
24.01.2015 / 02:06