I imported a table into R as a dataset to work with. However, I need to do some calculations using only a few of its columns.
How do I select only these columns for the calculations?
There are several ways to select columns from a `data.frame` in R. Let's use the `mtcars` data.frame as an example. To find out which columns exist, you can look at the `names` or `colnames` of the data.frame:
names(mtcars)
[1] "mpg" "cyl" "disp" "hp" "drat" "wt" "qsec" "vs" "am" "gear" "carb"
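As a quick check (a small sketch; `cols` is just an illustrative variable name), `names()` and `colnames()` give the same result for a data.frame, so either works for listing its columns:

```r
# mtcars ships with base R (package "datasets"), so nothing needs to be loaded
cols <- colnames(mtcars)

# For a data.frame, names() and colnames() return the same character vector
identical(names(mtcars), cols)
#> [1] TRUE

# Number of columns available to choose from
length(cols)
#> [1] 11
```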
To select a single column, for example `mpg`, you can use `$mpg`, single brackets as with a matrix (`[, "mpg"]`), or double brackets as with a list (`[["mpg"]]`):
mtcars$mpg
mtcars[, "mpg"]
mtcars[["mpg"]]
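To confirm that these three forms are interchangeable, a minimal sketch (`v1`, `v2`, `v3` are illustrative names) can compare their results directly:

```r
v1 <- mtcars$mpg
v2 <- mtcars[, "mpg"]
v3 <- mtcars[["mpg"]]

# All three give the same plain numeric vector, with no data.frame wrapper
identical(v1, v2) && identical(v2, v3)
#> [1] TRUE
is.vector(v1)
#> [1] TRUE
length(v1)  # one value per row of mtcars
#> [1] 32
```

Because the result is a plain vector, it can go straight into calculations such as `mean(v1)`.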
These three forms all return a vector. You can also get a data.frame containing only the `mpg` column (note the difference: you get a data.frame, not a vector). For this, use single brackets as with a list:
mtcars["mpg"]
Or use matrix-style indexing with the `drop = FALSE` argument:
mtcars[ ,"mpg", drop = FALSE]
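The difference between the vector and data.frame results is easy to inspect with `class()` and `dim()`; a short sketch with illustrative variable names:

```r
as_df  <- mtcars["mpg"]                    # list-style single brackets
as_df2 <- mtcars[, "mpg", drop = FALSE]    # matrix-style, keeping the data.frame
as_vec <- mtcars[, "mpg"]                  # matrix-style, default drop = TRUE

class(as_df)
#> [1] "data.frame"
class(as_vec)
#> [1] "numeric"
dim(as_df)       # 32 rows, 1 column
#> [1] 32  1
identical(as_df, as_df2)
#> [1] TRUE
```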
If you want to select more than one column, you can use single brackets either matrix-style or list-style:
mtcars[, c("mpg", "cyl")] # selects two columns (matrix-style)
mtcars[c("mpg", "cyl")]   # selects two columns (list-style)
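A small sketch checking that both forms give the same data.frame, and that columns can also be picked by position (in `mtcars`, positions 1 and 2 happen to be `mpg` and `cyl`):

```r
by_matrix <- mtcars[, c("mpg", "cyl")]
by_list   <- mtcars[c("mpg", "cyl")]

identical(by_matrix, by_list)
#> [1] TRUE
dim(by_list)
#> [1] 32  2

# Selecting by position gives the same columns here
identical(mtcars[1:2], by_list)
#> [1] TRUE
```

Selecting by name is usually safer than by position, since positions change if the table's columns are reordered.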
Note that the matrix-style form now returns a `data.frame`, since you are selecting more than one column. There are also convenience functions for this, such as the `subset` function that Rafael mentioned. It returns a data.frame rather than a vector, even if you select only the `mpg` column:
subset(mtcars, select = c("mpg","cyl"))
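A brief sketch of `subset` in use: selecting one column still yields a data.frame, and `select` also accepts unquoted names, including a range of adjacent columns (`one_col` and `range_cols` are illustrative names):

```r
one_col <- subset(mtcars, select = mpg)
class(one_col)
#> [1] "data.frame"

# mpg:disp selects every column from mpg through disp
range_cols <- subset(mtcars, select = mpg:disp)
names(range_cols)
#> [1] "mpg"  "cyl"  "disp"
```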
Each data manipulation package also has its own way of selecting columns. For example, `dplyr` has the `select` function, which is very similar to `subset`:
library(dplyr) # provides select() and the %>% pipe
mtcars %>% select(mpg, cyl)
It would be easier to help if you included your data (or part of it) so we could work with it. Take a look at the `dput` function for this purpose. It would also help if you included the code you have written or tried to write.
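A minimal sketch of how `dput` can be used to share a small sample. The choice of `mtcars`, two columns, and three rows is only for illustration; `dput` prints R code that recreates the object, which others can paste and run:

```r
# Keep only a few rows of the columns that matter for the question
small <- head(mtcars[c("mpg", "cyl")], 3)

# Print R code that rebuilds `small`; this is what you would paste into a post
dput(small)

# Check that the printed code really recreates the same data.frame
code    <- paste(capture.output(dput(small)), collapse = "\n")
rebuilt <- eval(parse(text = code))
all.equal(small, rebuilt)
#> [1] TRUE
```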
As for your question, the `subset` function from base R or the `select` function from the `dplyr` package should help you.