How to select all data.frame variables at once for a regression?

5

Consider the following data.frame:

set.seed(1)    
dados <- data.frame(y=rnorm(100), x1=rnorm(100), x2=rnorm(100), x3=rnorm(100), x4=rnorm(100))

If I want to regress y on x1, ..., xn, I can do it as follows:

modelo <- lm(y~x1+x2+x3+x4, data=dados)

Since there are only 4 variables here, writing them all out is not tedious. But suppose there were 100 variables, from x1 to x100. Is there an easy way to include all of them in the regression?

    
asked by anonymous 19.02.2014 / 14:37

3 answers

5

In this context (the formula argument of lm), the . means "all columns not otherwise in the formula".

So the regression of y on all the other columns of the data.frame can be obtained as follows:

modelo <- lm(y~., data=dados)

Reference: ?formula
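
As a quick check of the point above, here is a minimal sketch that reuses the dados data.frame from the question. It compares the dot formula with the fully written-out one, and also shows that a column can be left out with a minus sign. The object names modelo_ponto, modelo_explicito and modelo_sem_x4 are made up for this illustration.

```r
# Same data as in the question
set.seed(1)
dados <- data.frame(y = rnorm(100), x1 = rnorm(100), x2 = rnorm(100),
                    x3 = rnorm(100), x4 = rnorm(100))

# Dot formula versus the formula written out by hand
modelo_ponto     <- lm(y ~ ., data = dados)
modelo_explicito <- lm(y ~ x1 + x2 + x3 + x4, data = dados)

# TRUE when both fits give the same coefficients
iguais <- isTRUE(all.equal(coef(modelo_ponto), coef(modelo_explicito)))
print(iguais)

# All columns except x4: the dot combined with a minus sign
modelo_sem_x4 <- lm(y ~ . - x4, data = dados)
print(names(coef(modelo_sem_x4)))
```

The minus form is handy when the data.frame also holds columns that should not be predictors, such as an identifier.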

    
19.02.2014 / 16:50
2

The dot is particularly useful when you want to include interaction effects. For example, suppose you want to fit a model with all variables and all interactions of up to 2 variables. How could that be done?

## Example data set
## (the toy columns are identical, so most coefficients come out NA;
## the point is how the formula expands)
exemplo <- data.frame(x1 = 1:3, x2 = 1:3, x3 = 1:3, x4 = 1:3)

## Model with all interactions of up to 2 variables
lm(x1 ~ (.)^2, data = exemplo)

## Model with all interactions of up to 3 variables
lm(x1 ~ (.)^3, data = exemplo)
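
To see which terms the (.)^2 and (.)^3 formulas actually generate, one option is to expand them with terms() before fitting anything. This is only a sketch for inspecting the expansion. The names termos2 and termos3 are made up for this illustration.

```r
# Toy data from the answer; only the column names matter here
exemplo <- data.frame(x1 = 1:3, x2 = 1:3, x3 = 1:3, x4 = 1:3)

# terms() resolves the dot against the data and lists the resulting terms
termos2 <- attr(terms(x1 ~ (.)^2, data = exemplo), "term.labels")
termos3 <- attr(terms(x1 ~ (.)^3, data = exemplo), "term.labels")

print(termos2)  # main effects plus two-way interactions
print(termos3)  # the same, plus the three-way interaction
```

With three predictors, the second-order formula yields three main effects and three pairwise interactions, and the third-order formula adds the single three-way term.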
    
18.09.2014 / 07:01
1

Alternatively, if dados is your data frame and the response y is its first column (as in your case),

    modelo <- lm(formula=dados)

also works.
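
When the response is not the first column, or only some columns should enter the model, a different technique is to build the formula from the column names with reformulate() from the stats package. This is a sketch and not part of the answer above. The names preditores, f and modelo_ref are made up for this illustration.

```r
# Same data as in the question
set.seed(1)
dados <- data.frame(y = rnorm(100), x1 = rnorm(100), x2 = rnorm(100),
                    x3 = rnorm(100), x4 = rnorm(100))

# Every column except the response becomes a predictor
preditores <- setdiff(names(dados), "y")

# reformulate() builds the formula y ~ x1 + x2 + x3 + x4 from strings
f <- reformulate(preditores, response = "y")
print(f)

modelo_ref <- lm(f, data = dados)
print(names(coef(modelo_ref)))
```

Because the predictors are an ordinary character vector, they can be filtered first, for example with grep(), when the data.frame has many columns.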

    
23.03.2014 / 19:20