How to include a variable raised to the power n in a regression

4

Suppose I have the following data:

x<-rnorm(100,1,10000)
y<-rnorm(100,1,10000)+2*x+x^2

If I use the lm function as follows:

model1<-lm(y~x+x^2)

R does not understand that the squared term of x should be included among the independent variables. It simply ignores the term and fits the same model as the code below:

model2<-lm(y~x)
    
asked by anonymous 18.02.2014 / 12:53

3 answers

3

Use model1 <- lm(y ~ x + I(x^2)).

The problem is that characters such as +, -, *, and ^ have special meanings inside a formula. There, ^ denotes crossing of terms, so x^2 reduces to just x. The I() function makes its argument (x^2) be interpreted literally as exponentiation.
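To see the difference, here is a minimal sketch that compares the coefficient names of the two fits. The seed and the names fit_plain and fit_I are my own choices for illustration, not part of the question.

```r
set.seed(1)  # assumed seed, only to make the sketch reproducible
x <- rnorm(100, 1, 100)
y <- rnorm(100, 0, 10) + 2 * x + x^2

# Inside a formula, x^2 is a crossing of x with itself and collapses to x
fit_plain <- lm(y ~ x + x^2)
# I() protects the expression, so x^2 is computed arithmetically
fit_I <- lm(y ~ x + I(x^2))

names(coef(fit_plain))  # "(Intercept)" "x"
names(coef(fit_I))      # "(Intercept)" "x" "I(x^2)"
```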

    
18.02.2014 / 13:11
4

Another way to fit this regression is to use the poly function:

x<-rnorm(100,1,10000)
y<-rnorm(100,1,10000)+2*x+x^2
model1<-lm(y~poly(x,degree=2,raw=TRUE))
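As a rough check, the sketch below suggests that poly with raw=TRUE estimates the same coefficients as the I(x^2) form, while the default raw=FALSE uses orthogonal polynomials: the coefficients differ but the fitted values should agree. The seed, the smaller standard deviations, and the names fit_raw, fit_orth and fit_I are assumptions made for illustration.

```r
set.seed(1)  # assumed seed, only to make the sketch reproducible
x <- rnorm(100, 1, 100)
y <- rnorm(100, 0, 10) + 2 * x + x^2

fit_raw  <- lm(y ~ poly(x, degree = 2, raw = TRUE))  # raw powers x, x^2
fit_orth <- lm(y ~ poly(x, degree = 2))              # orthogonal polynomials (default)
fit_I    <- lm(y ~ x + I(x^2))

# Same coefficients as the I() version, only the names differ
same_coef <- all.equal(unname(coef(fit_raw)), unname(coef(fit_I)))

# Different coefficients, but the same fitted curve
same_fit <- all.equal(fitted(fit_orth), fitted(fit_I))
```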
    
18.02.2014 / 22:45
3

Whenever you want to use an arithmetic transformation of a variable inside a formula, you can wrap it in the function I().

x<-rnorm(100,1,100)
y<-rnorm(100,0,10)+2*x+x^2

mod <- lm(y~x+I(x^2))

The advantage of using I() over creating a new variable that holds the x^2 values is that you do not need to supply the x^2 values to make predictions; you only supply x.

predict(mod, data.frame(x=1:3))
        1         2         3 
 2.211883  7.209663 14.207509 
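For comparison, here is a sketch of the alternative with a hand-made squared column. The names x2, mod_I, mod_manual and new_x, and the seed, are hypothetical and only for illustration. With the manual column, the new data has to carry both x and x2.

```r
set.seed(1)  # assumed seed, only to make the sketch reproducible
x <- rnorm(100, 1, 100)
y <- rnorm(100, 0, 10) + 2 * x + x^2
x2 <- x^2  # manually created squared variable

mod_I      <- lm(y ~ x + I(x^2))
mod_manual <- lm(y ~ x + x2)

new_x <- 1:3
# With I(), only x is needed
pred_I <- predict(mod_I, data.frame(x = new_x))
# With the manual column, x2 must be supplied too
pred_manual <- predict(mod_manual, data.frame(x = new_x, x2 = new_x^2))

same_pred <- all.equal(pred_I, pred_manual)
```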
    
18.02.2014 / 13:10