Second degree polynomial regression in R: How to get X given Y?

Question

Second degree polynomial regression in R: How to get X given Y?

Navigation

#1 by (1 votes)
#2 by (1 votes)

5

R Gurus,

I have the following data frame (Df) that establishes the relationship between the X and Y variables:

     X     Y
 1  25  2457524
 2  25  2391693
 3  25  2450828
 4  25  2391252
 5  25  2444638
 6  25  2360293
 7  50  4693194
 8  50  4844527
 9  50  4835596
10  50  4878092
11  50  4809226
12  50  4722253
13  75  7142763
14  75  7182769
15  75  7135550
16  75  7173920
17  75  7216871
18  75  7076359
19  100 9496553
20  100 9537788
21  100 9405825
22  100 9439201
23  100 9609870
24  100 9707734
25  125 12031958
26  125 12027037
27  125 11935594
28  125 11930086
29  125 12154132
30  125 12096462
31  150 14298064
32  150 14396607
33  150 13964716
34  150 14221039
35  150 14283992
36  150 14042220

(Note that we have 7 levels for variable X with 6 points on each level)

If we fit a 2nd degree polynomial model for this data, we will obtain the following model:

Model<-lm(formula = Y ~ X + I(X^2))
print(Model)

Call:
lm(formula = Y ~ X + I(X^2))

Coefficients:
   (Intercept)     X       I(X^2)  
     -26588.12  97310.61   -14.02

The graphic representation of this model, which looks more like a straight line, is as follows:

Ifwewanttousethemodeltopredictthevaluesof"Y" from the values of the "X" variable, just execute this line of code:

>predicted.intervals <- predict(Model,data.frame(x=X),interval='confidence',
+ level=0.95)

>predicted.intervals
        fit      lwr      upr
   1   2397413  2315346  2479481
   2   2397413  2315346  2479481
   3   2397413  2315346  2479481
   4   2397413  2315346  2479481
   5   2397413  2315346  2479481
   6   2397413  2315346  2479481
   7   4803887  4753705  4854070
   8   4803887  4753705  4854070
   9   4803887  4753705  4854070
   10  4803887  4753705  4854070
   11  4803887  4753705  4854070
   12  4803887  4753705  4854070
   13  7192834  7137649  7248019
   14  7192834  7137649  7248019
   15  7192834  7137649  7248019
   16  7192834  7137649  7248019
   17  7192834  7137649  7248019
   18  7192834  7137649  7248019
   19  9564252  9509067  9619438
   20  9564252  9509067  9619438
   21  9564252  9509067  9619438
   22  9564252  9509067  9619438
   23  9564252  9509067  9619438
   24  9564252  9509067  9619438
   25 11918144 11867961 11968326
   26 11918144 11867961 11968326
   27 11918144 11867961 11968326
   28 11918144 11867961 11968326
   29 11918144 11867961 11968326
   30 11918144 11867961 11968326
   31 14254507 14172440 14336574
   32 14254507 14172440 14336574
   33 14254507 14172440 14336574
   34 14254507 14172440 14336574
   35 14254507 14172440 14336574
   36 14254507 14172440 14336574

The question you do not want to shut up:

What would be the line (s) of code to do the inverse prediction, ie in this model, predict "X" from the data of variable "Y"? Searching in google already tried several packages and specific functions but unfortunately I did not succeed (perhaps for lack of familiarity with the functions). Could any of you help me unravel this mystery? Big hug to all.

r

asked by anonymous 17.11.2016 / 20:36

2 answers

How to swap an apply within a for by a double apply? JSF is specification or framework?

score 1 · Answer 1

I do not know of any function that is ready to do this, however this problem can be treated as an optimization problem.

We want to find x in a given interval that will minimize a function. The function I want to minimize is:

objetivo <- function(x, k, model){
  df <- data.frame(x = x)
  (k - predict(model, df))^2
}

Given a y = k value, I want to find a x value that most closely matches the prediction of the template of this k value. In the background this means: find me x that when I apply the template I will get closer to the y that I am looking for .

Example:

Suppose the following data:

x <- runif(100)
y <- 10 + x + x^2 + rnorm(n = 100, mean = 0, sd = 0.1)
plot(x, y)

model<-lm(formula=y~x+I(x^2))Call:lm(formula=y~x+I(x^2))Residuals:Min1QMedian3QMax-0.208921-0.0645060.0015370.0611070.276347Coefficients:EstimateStd.ErrortvaluePr(>|t|)(Intercept)9.988180.03736267.335<2e-16***x1.059120.156976.7471.10e-09***I(x^2)0.958220.142296.7341.17e-09***---Signif.codes:0‘***’0.001‘**’0.01‘*’0.05‘.’0.1‘’1Residualstandarderror:0.09685on97degreesoffreedomMultipleR-squared:0.9728,AdjustedR-squared:0.9723F-statistic:1735on2and97DF,p-value:<2.2e-16

Now,usingtheoptimfunctionwecangetwhatwewant.

v<-optim(0,objetivo,k=11,model=model,method="Brent", lower = min(x), upper = max(x))
v$par
[1] 0.6240072

So, we get the value of x which is most likely given y = k = 11 .

Note : The optim function by default uses the Nelder-Mead method, however it releases a warning when I used it in this example. So I switched to the "Brent" method that requires minimum and maximum values for the estimated value of x . This may be good, since by estimating polynomials, it is possible that there are other values that minimize this function.

score 1 · Answer 2

Adapting the previous answers with data more like the one of the question and understanding that the result of x must be in the 7 categories ("levels") of described values:

v <- c(25,50,75,100,125)
valores <- sort(rep.int(v,6))

set.seed(13)
x <- sample(valores, 36, replace = T)
y <- -26500 + 97000*x + -14*x^2 + rnorm(n = 36, mean = 0, sd = 100000)
plot(x, y, pch = 21)

The Model can look like this:

model <-lm(formula = y ~ x + I(x^2))

As well as the objective function to find the predicted value:

objetivo <- function(x, k, model){
  df <- data.frame(x = x)
  (k - predict(model, df))^2
}

This way by looping with 'optim':

    aux_1 <- numeric(length(y))

    for(i in seq_along(y)){ 
    aux_1[i] <- optim(0, objetivo, k = y[i], model = model, method = "Brent",
 lower = min(x), upper = max(x))$par
    }

    # Tomando os resultados mais próximos:

    result <- round(aux_1,0)
    result
[1] 100  51  50  27 125  25  76 100 124  25  99 124 125  76  75  50  50  74
[19] 125 100  25  74 100  77  26 102  25  74  49  98  75  76 125 125  75  25

If you want only the values present in (25, 50, 75, 100, 125) you can do the function below:

extract <- function(x,y){
y <- sort(y, decreasing = T)

for(i in seq_along(y)){
x[x/y[i] < 1.25 & x/y[i] >= 0.98] = y[i]}
return(x)
}

result <- extract(result,v)
result
[1] 100  50  50  25 125  25  75 100 125  25 100 125 125  75  75  50  50  75
[19] 125 100  25  75 100  75  25 100  25  75  50 100  75  75 125 125  75  25

For values of 'x' closer to each other it may be necessary to further refine the function that looks for the nearest values.

EDIT_1 (11/27/17, 23:35): I was generating an aux_2 vector that was unnecessary in the final published version.

EDIT_2 (06/15/18, 20:20): I have refined the vector of values to be more similar to the data.frame shown as an example and added the good suggestions of Rui Barradas's comment below. Also, since this response lacked approximation to be only with vector values (25, 50, 75, 100, 125), I created the 'extract' function. However, it is a specific function for the values of this example, you can adjust them for more general cases.