There are several ways to run multiple regressions by category in R. I'll show you how to do it with base R functions and with the `dplyr` package. As an example, we will use the `mtcars` dataset.
Suppose you want to run the regression `mpg ~ disp + hp` for each level of the variable `cyl` in `mtcars` (there are 3 categories: 4, 6, and 8 cylinders).
First, you can use the `split()` function to build a list of three different data frames, one for each category:

```r
data.frame_por_categoria <- split(mtcars, mtcars$cyl)
```
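To check what `split()` produced, it may help to look at the names and sizes of the list elements. This is a minimal sketch using only base R; `split()` names each element after the corresponding level of `cyl`.

```r
# Split mtcars into one data.frame per number of cylinders
data.frame_por_categoria <- split(mtcars, mtcars$cyl)

# The list elements are named after the levels of cyl
names(data.frame_por_categoria)
# → "4" "6" "8"

# Number of cars in each group (named vector: 4 = 11, 6 = 7, 8 = 14)
sapply(data.frame_por_categoria, nrow)
```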
Now just use `lapply()` to fit the regression to each data frame:

```r
modelos <- lapply(data.frame_por_categoria, function(x) lm(mpg ~ disp + hp, data = x))
```
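Because `modelos` is an ordinary list, the same apply pattern can pull results out of every model at once. A minimal sketch, assuming you want the coefficients side by side with one column per level of `cyl` (the name `coeficientes` is just for illustration):

```r
data.frame_por_categoria <- split(mtcars, mtcars$cyl)
modelos <- lapply(data.frame_por_categoria, function(x) lm(mpg ~ disp + hp, data = x))

# coef() returns a named vector per model; sapply() binds them into a matrix
# with one row per coefficient and one column per cylinder group
coeficientes <- sapply(modelos, coef)
coeficientes

# Likewise, the R-squared of each model, named by cylinder group
sapply(modelos, function(m) summary(m)$r.squared)
```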
The result, `modelos`, is a list containing all three regressions. To access the first model:

```r
modelos[[1]]
```

```
Call:
lm(formula = mpg ~ disp + hp, data = x)

Coefficients:
(Intercept)         disp           hp  
   43.04006     -0.11954     -0.04609  
```
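Since `split()` names the list elements after the levels of `cyl`, `modelos[["4"]]` refers to the same model as `modelos[[1]]`. As a sanity check, here is a sketch that refits the 4-cylinder regression on a manual subset and compares the coefficients (the names `modelo_4` and `modelo_direto` are illustrative):

```r
data.frame_por_categoria <- split(mtcars, mtcars$cyl)
modelos <- lapply(data.frame_por_categoria, function(x) lm(mpg ~ disp + hp, data = x))

# Access by name instead of position: "4" is the first level of cyl
modelo_4 <- modelos[["4"]]

# Fit the same regression directly on the 4-cylinder subset
modelo_direto <- lm(mpg ~ disp + hp, data = subset(mtcars, cyl == 4))

# Both approaches use the same rows, so the coefficients match
all.equal(coef(modelo_4), coef(modelo_direto))
# → TRUE
```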
You can also do the same with the `dplyr` package. Group by category and then use the `do()` function to run the regression, putting a dot (`.`) where the data frame would go:
```r
library(dplyr)
resultado <- mtcars %>% group_by(cyl) %>% do(modelo = lm(mpg ~ disp + hp, data = .))
```
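Note that `do()` is marked as superseded as of `dplyr` 1.0.0. On such versions, one alternative for per-group models is `nest_by()` followed by `mutate()` on the resulting rowwise data frame. A minimal sketch, assuming `dplyr` >= 1.0.0 (the name `resultado_nest` is illustrative):

```r
library(dplyr)

resultado_nest <- mtcars %>%
  # one row per cyl level, with that group's rows stored in a list-column `data`
  nest_by(cyl) %>%
  # the result is rowwise, so `data` here is a single data frame;
  # the lm object is wrapped in list() so it fits in a list-column
  mutate(modelo = list(lm(mpg ~ disp + hp, data = data)))

# First model (cyl == 4)
resultado_nest$modelo[[1]]
```

The design is the same as with `do()`: a data frame with one row per category and the fitted models in a list-column, so `resultado_nest$modelo[[1]]` is used the same way as `resultado$modelo[[1]]`.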
The result, `resultado`, is a data frame with a column named `modelo`, and each element of that column is one regression. To access the first model:
```r
resultado$modelo[[1]]
```

```
Call:
lm(formula = mpg ~ disp + hp, data = .)

Coefficients:
(Intercept)         disp           hp  
   43.04006     -0.11954     -0.04609  
```
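Because `resultado` keeps the `cyl` column next to the models, each model can be paired with its category. A minimal sketch, assuming you want one row of coefficients per level of `cyl` (the name `coeficientes` is illustrative):

```r
library(dplyr)

resultado <- mtcars %>% group_by(cyl) %>% do(modelo = lm(mpg ~ disp + hp, data = .))

# sapply() gives one column per model; t() turns that into one row per model.
# check.names = FALSE keeps the "(Intercept)" column name as-is.
coeficientes <- data.frame(
  cyl = resultado$cyl,
  t(sapply(resultado$modelo, coef)),
  check.names = FALSE
)
coeficientes
```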