How to filter a data frame?


I have a data frame with 5597 rows and 7 columns. I would like to filter it so that only the rows whose second column contains "AC" remain. I tried the command dr=subset(df, df[2]=="AC"), where df is my data frame and 2 is the column where "AC" appears. Unfortunately, the command did not work. Is there anything I can do to improve the code?

    
asked by anonymous 28.05.2014 / 19:44

1 answer


In principle your code is correct and should return the subset of the data. What may have happened is some other problem, which could only be diagnosed with your specific data.

To illustrate with a sample data frame:

set.seed(1)
df <- data.frame(valor= rnorm(100), categoria = rep(c("AB", "AC"), 50), stringsAsFactors=FALSE)
dr <- subset(df, df[2]=="AC")

Note that dr contains only rows whose second column is "AC":

unique(dr[2])
  categoria
2        AC

head(dr)
        valor categoria
2   0.1836433        AC
4   1.5952808        AC
6  -0.8204684        AC
8   0.7383247        AC
10 -0.3053884        AC
12  0.3898432        AC
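If the same command returns no rows (or fewer than expected) on your own data, one possible cause, and this is only a guess since we cannot see your file, is that the stored values are not exactly "AC": they may carry trailing spaces or differ in case. The sketch below uses a small hypothetical data frame, df2, to show the symptom and one way to normalize the column with trimws() and toupper() before comparing.

```r
# Hypothetical data: "AC " has a trailing space and "ac" is lowercase
df2 <- data.frame(valor = 1:4,
                  categoria = c("AC", "AC ", "ac", "AB"),
                  stringsAsFactors = FALSE)

# Exact comparison: only the first row matches
n_exact <- nrow(subset(df2, df2[2] == "AC"))
print(n_exact)    # 1

# Look at the distinct values actually stored in the column
print(unique(df2$categoria))

# Strip surrounding whitespace and convert to upper case before comparing
clean <- toupper(trimws(df2$categoria))
dr2 <- subset(df2, clean == "AC")
print(nrow(dr2))  # 3
```

If the data come from read.csv(), its strip.white = TRUE argument removes leading and trailing white space from unquoted character fields while the file is read.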

There are several other ways to filter a data frame. One of them is R's [ operator. For example:

dr <- df[df[2]=="AC", ]

or

dr <- df[df$categoria=="AC", ]
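These forms select the same rows in the example above, but they are not interchangeable in every case. The sketch below, using the same sample data plus a small hypothetical data frame df3 with a missing value, shows two details: df[2] is a one-column data frame, so df[2] == "AC" yields a one-column logical matrix, while df[[2]] and df$categoria yield a plain logical vector; and when the condition is NA, subset() drops the row whereas [ returns a row filled with NA, unless the condition is wrapped in which().

```r
set.seed(1)
df <- data.frame(valor = rnorm(100),
                 categoria = rep(c("AB", "AC"), 50),
                 stringsAsFactors = FALSE)

# df[2] is a one-column data frame, so the comparison returns a 100 x 1 logical matrix
m <- df[2] == "AC"
print(dim(m))            # 100 1

# df[[2]] (like df$categoria) extracts the column itself, so this is a plain vector
v <- df[[2]] == "AC"
print(is.null(dim(v)))   # TRUE

# Both conditions select the same rows
print(identical(df[m, ], df[v, ]))

# Hypothetical data with a missing category in the second row
df3 <- data.frame(valor = c(10, 20, 30),
                  categoria = c("AC", NA, "AB"),
                  stringsAsFactors = FALSE)

by_subset  <- subset(df3, categoria == "AC")        # NA condition treated as FALSE
by_bracket <- df3[df3$categoria == "AC", ]          # NA condition yields an all-NA row
by_which   <- df3[which(df3$categoria == "AC"), ]   # which() discards NAs

print(nrow(by_subset))   # 1
print(nrow(by_bracket))  # 2
print(nrow(by_which))    # 1
```

In practice, df$categoria == "AC" (or df[[2]] == "AC") is the more usual way to write the condition, and which() is worth adding to a [ filter whenever the column may contain NA.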

There are also packages dedicated to data manipulation. An excellent one is dplyr, which is quite fast and has an intuitive syntax (for example, the function for filtering rows is called filter()).

In dplyr it would look like this:

library(dplyr)
dr <- df %>% filter(categoria == "AC")

If you're going to work with data a lot, it's worth a look.

    
29.05.2014 / 02:42