I have a table with several columns of factors that vary over time. With multiple regression I can evaluate the influence of a group of factors on the variation of one of them. How can I do this in R?
You can run a regression in R using the lm function. Taking the mtcars dataset that already ships with R as an example:
regressao <- lm(mpg ~ cyl, data = mtcars)
First we pass the regression formula mpg ~ cyl to the lm function, and then the dataset with data = mtcars. The formula mpg ~ cyl means that we are regressing the variable mpg (miles per gallon) on the variable cyl (number of cylinders). It is equivalent to the equation mpg = b0 + b1 * cyl + e, in which you are estimating the parameters b0 (the intercept) and b1 (the slope). The regression result is saved in the object regressao.
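As a small sketch of how the estimated parameters relate to the equation, the values of b0 and b1 can be pulled out of the fitted object with coef. The variable names coeficientes, b0 and b1 below are only illustrative, chosen to mirror the equation in the text.

```r
# Refit the simple regression so this snippet runs on its own
regressao <- lm(mpg ~ cyl, data = mtcars)

# coef() returns a named numeric vector with the estimated parameters
coeficientes <- coef(regressao)
b0 <- coeficientes[["(Intercept)"]]  # intercept (constant)
b1 <- coeficientes[["cyl"]]          # slope associated with cyl

# Fitted mpg for a 6-cylinder car, computed by hand from the equation
b0 + b1 * 6
```

The hand-computed value should match what predict(regressao, newdata = data.frame(cyl = 6)) returns.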
Calling summary on the object shows the main results of the regression:
summary(regressao)
Call:
lm(formula = mpg ~ cyl, data = mtcars)
Residuals:
    Min      1Q  Median      3Q     Max
-4.9814 -2.1185  0.2217  1.0717  7.5186
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  37.8846     2.0738   18.27  < 2e-16 ***
cyl          -2.8758     0.3224   -8.92 6.11e-10 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 3.206 on 30 degrees of freedom
Multiple R-squared: 0.7262, Adjusted R-squared: 0.7171
F-statistic: 79.56 on 1 and 30 DF, p-value: 6.113e-10
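If you need these numbers in later computations rather than just printed, the object returned by summary can be stored and its components accessed by name. This is a minimal sketch; the name resumo is only an illustrative choice.

```r
# Refit the simple regression so this snippet runs on its own
regressao <- lm(mpg ~ cyl, data = mtcars)

# Store the summary instead of only printing it
resumo <- summary(regressao)

# Coefficient table as a numeric matrix
# (columns: Estimate, Std. Error, t value, Pr(>|t|))
resumo$coefficients

# R-squared and adjusted R-squared
resumo$r.squared
resumo$adj.r.squared

# Residual standard error
resumo$sigma
```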
To perform a multiple regression, simply add more variables after the ~. More specifically, the element to the left of ~ is the dependent variable (your y) and all variables to the right of ~ are explanatory variables (the Xs). For example:
regressao_multipla <- lm(mpg ~ cyl + disp + wt + hp, data = mtcars)
Here we run a regression with 4 explanatory variables: cyl, disp, wt and hp, all from the data.frame mtcars. To see the main results, use summary again:
summary(regressao_multipla)
Call:
lm(formula = mpg ~ cyl + disp + wt + hp, data = mtcars)
Residuals:
    Min      1Q  Median      3Q     Max
-4.0562 -1.4636 -0.4281  1.2854  5.8269
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 40.82854    2.75747  14.807 1.76e-14 ***
cyl         -1.29332    0.65588  -1.972 0.058947 .
disp         0.01160    0.01173   0.989 0.331386
wt          -3.85390    1.01547  -3.795 0.000759 ***
hp          -0.02054    0.01215  -1.691 0.102379
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 2.513 on 27 degrees of freedom
Multiple R-squared: 0.8486, Adjusted R-squared: 0.8262
F-statistic: 37.84 on 4 and 27 DF, p-value: 1.061e-10
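Since the question is about the influence of a group of factors, one possible approach (an addition to the answer, not something it prescribes) is to compare the two nested models with anova, which runs an F-test on the variables added as a group, here disp, wt and hp. The name comparacao is only illustrative.

```r
# Refit both models so this snippet runs on its own
regressao <- lm(mpg ~ cyl, data = mtcars)
regressao_multipla <- lm(mpg ~ cyl + disp + wt + hp, data = mtcars)

# F-test of the smaller model against the larger one:
# tests whether disp, wt and hp jointly improve the fit
comparacao <- anova(regressao, regressao_multipla)
comparacao

# Degrees of freedom used by the added group of variables
comparacao$Df[2]
```

A small p-value in the Pr(>F) column would suggest that the added group of variables explains variation in mpg beyond what cyl alone captures.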
There are several other functions for working with regressions in R. The object that the lm function returns is of class lm, so to get an idea of the methods available for that class you can run methods(class = "lm").
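As a brief sketch, a few of the functions that work on lm objects are shown here. The data frame carro_novo is a hypothetical car with made-up values, used only to illustrate predict.

```r
# Refit the multiple regression so this snippet runs on its own
regressao_multipla <- lm(mpg ~ cyl + disp + wt + hp, data = mtcars)

# 95% confidence intervals for each coefficient
intervalos <- confint(regressao_multipla)
intervalos

# Predicted mpg for a hypothetical car (values invented for illustration)
carro_novo <- data.frame(cyl = 4, disp = 120, wt = 2.5, hp = 100)
predict(regressao_multipla, newdata = carro_novo)

# Residuals and fitted values, one per row of mtcars
head(residuals(regressao_multipla))
head(fitted(regressao_multipla))
```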