In principle your code is correct and should produce the subset you want. If it does not, the cause is most likely some other problem that can only be diagnosed with your specific data.
Here is an example with a sample data frame:
set.seed(1)
df <- data.frame(valor= rnorm(100), categoria = rep(c("AB", "AC"), 50), stringsAsFactors=FALSE)
dr <- subset(df, df[2]=="AC")
Note that dr has only rows whose second column is "AC":
unique(dr[2])
categoria
2 AC
head(dr)
valor categoria
2 0.1836433 AC
4 1.5952808 AC
6 -0.8204684 AC
8 0.7383247 AC
10 -0.3053884 AC
12 0.3898432 AC
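If the same call returns zero rows on your own data, one thing worth checking is whether the stored labels are exactly what you expect. The sketch below is only an illustration under an assumed cause: `df_bad` is a hypothetical copy of the sample data with a trailing space added to every label, which is not something known about your data.

```r
set.seed(1)
df <- data.frame(valor = rnorm(100), categoria = rep(c("AB", "AC"), 50),
                 stringsAsFactors = FALSE)

# Hypothetical problem: every label carries a trailing space
df_bad <- df
df_bad$categoria <- paste0(df_bad$categoria, " ")

# The comparison is exact, so "AC " does not match "AC"
n_bad <- nrow(subset(df_bad, categoria == "AC"))

# Inspect the distinct values actually stored (including NA, if any)
table(df_bad$categoria, useNA = "ifany")

# Strip leading/trailing whitespace and filter again
df_bad$categoria <- trimws(df_bad$categoria)
n_fixed <- nrow(subset(df_bad, categoria == "AC"))
```

Here `n_bad` is 0 and `n_fixed` is 50. Other causes (different capitalization, a different column position) would need similar checks on the real data.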
There are several other ways to filter a data frame. One of them is to use R's [ operator. Example:
dr <- df[df[2]=="AC", ]
or
dr <- df[df$categoria=="AC", ]
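One difference between `[` and `subset` shows up when the column contains missing values: a logical index that is `NA` makes `[` return a row filled with `NA`, while `subset` drops it. A minimal sketch, using a small hypothetical data frame `df_na` that is not part of the example above:

```r
# Hypothetical data with one missing label
df_na <- data.frame(valor = c(1, 2, 3),
                    categoria = c("AC", NA, "AB"),
                    stringsAsFactors = FALSE)

# The comparison yields TRUE, NA, FALSE; the NA produces an all-NA row
by_bracket <- df_na[df_na$categoria == "AC", ]

# subset() treats NA in the condition as FALSE
by_subset <- subset(df_na, categoria == "AC")

# which() keeps only the positions that are TRUE, so [ behaves like subset()
by_which <- df_na[which(df_na$categoria == "AC"), ]
```

With this data `by_bracket` has two rows (the match plus an `NA` row), while `by_subset` and `by_which` each have one.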
There are also packages dedicated to data manipulation. An excellent one is dplyr, because it is quite fast and has an intuitive syntax (for example, the command to filter is called "filter").
In dplyr it would look like this:
library(dplyr)
dr <- df%>%filter(categoria=="AC")
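As a further sketch, `filter` accepts several conditions separated by commas, which are combined with a logical AND. The second condition (`valor > 0`) is an assumption added purely for illustration, and the code assumes the dplyr package is installed:

```r
library(dplyr)

set.seed(1)
df <- data.frame(valor = rnorm(100), categoria = rep(c("AB", "AC"), 50),
                 stringsAsFactors = FALSE)

# Rows where categoria is "AC" AND valor is positive
dr_pos <- df %>% filter(categoria == "AC", valor > 0)

# Base R equivalent, for comparison
base_pos <- df[df$categoria == "AC" & df$valor > 0, ]
```

Both objects contain the same rows. The dplyr result has its row names renumbered from 1, whereas the base R result keeps the original row numbers.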
If you're going to work with data sets a lot, dplyr is worth a look.