How to make a prediction interval for a restricted group?

5

Considering the model with only these two explanatory variables, give a 95% prediction interval for an individual in the first quartile (1st Qu.) of X1 and in the second category of X2.

I know the generic code, but I cannot restrict it to the requested group. The code I used:
pr.p <- predict(model, interval = "prediction", level = 0.95)

Sample data:

glucose	insulin	FIDADE
89	94	1
78	88	1
118	230	1
126	235	1
97	140	1
158	245	1
88	54	1
145	130	2
126	22	2
187	392	2
130	79	2
187	200	2
128	110	2
166	175	3
143	146	3
150	342	3
136	110	3
134	60	4
173	265	4
195	145	4
145	165	4

Thanks for any help!

    
asked by anonymous 05.01.2019 / 14:11

1 answer

4

To predict with a model fitted by lm, you need a data frame holding the values of the regressors at the points you want. The code below builds a subset of the data containing the rows in which insulin is at or below its 1st quartile and FIDADE equals 2.

Assuming the fitted model is this:

model <- lm(glucose ~ insulin + FIDADE, data = dados)

You can get a prediction interval with:

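# First quartile of insulin
qq <- quantile(dados$insulin, probs = 0.25)
# Rows with insulin at or below the first quartile
i1 <- with(dados, qq >= insulin)
# Rows in category 2 of FIDADE
i2 <- with(dados, FIDADE == 2)
# Keep only the regressors for the rows that satisfy both conditions
new <- dados[i1 & i2, c("insulin", "FIDADE")]
predict(model, newdata = new, interval = "prediction", level = 0.95)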
#        fit     lwr      upr
#9  108.6813 60.2474 157.1153
#11 118.9752 72.0415 165.9090
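
The subset above returns one interval per observed row in the group. If the question is instead read as asking for a single individual whose insulin equals the first quartile itself, a minimal sketch could look like the following. The names q1, ponto and pi_q1 are hypothetical, the data are rebuilt inline only so the snippet runs on its own, and FIDADE is kept numeric as in the model above.

```r
# Same values as the data set in this answer, rebuilt so the snippet is self-contained.
dados <- data.frame(
  glucose = c(89, 78, 118, 126, 97, 158, 88, 145, 126, 187, 130,
              187, 128, 166, 143, 150, 136, 134, 173, 195, 145),
  insulin = c(94, 88, 230, 235, 140, 245, 54, 130, 22, 392, 79,
              200, 110, 175, 146, 342, 110, 60, 265, 145, 165),
  FIDADE  = rep(1:4, times = c(7, 6, 4, 4))
)
model <- lm(glucose ~ insulin + FIDADE, data = dados)

# First quartile of insulin as a plain number (unname drops the "25%" label).
q1 <- unname(quantile(dados$insulin, probs = 0.25))

# One hypothetical individual: insulin at the first quartile, FIDADE equal to 2.
ponto <- data.frame(insulin = q1, FIDADE = 2)

# 95% prediction interval for that single point.
pi_q1 <- predict(model, newdata = ponto, interval = "prediction", level = 0.95)
print(pi_q1)
```

Which of the two readings is intended depends on the exercise; the only difference is what goes into newdata.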

Edit.

Given the request in the comment to simulate a 20% increase in the range of the insulin variable, the only difficulty seems to be building a data set in which the range of insulin is 20% wider within each category. (At least that is the reading that makes the most sense to me.)

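# Observed range (min, max) of insulin within each FIDADE category
rng <- with(dados, tapply(insulin, FIDADE, FUN = range))
rng <- lapply(rng, function(r){
  d <- diff(r)  # width of the observed range
  # New endpoints: min - 0.1*d and max + 0.1*d, so the new width is 1.2*d
  c(max(r) - 1.1*d, min(r + 1.1*d))
})
# Repeat each category label once per endpoint (twice per category)
tmp <- unlist(lapply(names(rng), function(n) rep(as.integer(n), length(rng[[n]]))))
# Row names combine the category and the endpoint index (e.g. "11", "12")
nova_ampl <- data.frame(insulin = unlist(rng), FIDADE = tmp)
rm(rng, tmp)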

Now just pass this data frame in the newdata argument.

predict(model, newdata = nova_ampl, interval = "prediction", level = 0.95)
#         fit       lwr      upr
#11  94.76547  45.69869 143.8323
#12 136.15787  87.45688 184.8589
#21 101.99931  52.22080 151.7778
#22 182.18353 128.06123 236.3058
#31 136.62942  89.30538 183.9535
#32 186.90710 135.84374 237.9705
#41 144.33280  93.69015 194.9755
#42 188.75920 138.68448 238.8339
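
As a sanity check on the arithmetic, the widened endpoints can be recomputed in the equivalent form min - 0.1*d and max + 0.1*d, and the new width compared with the old one. This is only a sketch: the names orig, wide and ratio are hypothetical, and the data are rebuilt inline so the snippet runs by itself.

```r
# Same values as the data set in this answer, rebuilt so the snippet is self-contained.
dados <- data.frame(
  glucose = c(89, 78, 118, 126, 97, 158, 88, 145, 126, 187, 130,
              187, 128, 166, 143, 150, 136, 134, 173, 195, 145),
  insulin = c(94, 88, 230, 235, 140, 245, 54, 130, 22, 392, 79,
              200, 110, 175, 146, 342, 110, 60, 265, 145, 165),
  FIDADE  = rep(1:4, times = c(7, 6, 4, 4))
)

# Observed range of insulin within each category.
orig <- with(dados, tapply(insulin, FIDADE, FUN = range))

# Widened range: each end moves outward by 10% of the observed width.
wide <- lapply(orig, function(r) {
  d <- diff(r)
  c(min(r) - 0.1 * d, max(r) + 0.1 * d)
})

# Ratio of new width to old width, one value per category.
ratio <- sapply(wide, diff) / sapply(orig, diff)
print(ratio)
```

If the construction is right, every entry of ratio equals 1.2 up to floating-point error, which is what "20% greater range" means here.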

Data in dput format.

dados <-
structure(list(glucose = c(89L, 78L, 118L, 126L, 97L, 
158L, 88L, 145L, 126L, 187L, 130L, 187L, 128L, 166L, 
143L, 150L, 136L, 134L, 173L, 195L, 145L), 
insulin = c(94L, 88L, 230L, 235L, 140L, 245L, 
54L, 130L, 22L, 392L, 79L, 200L, 110L, 175L, 146L, 
342L, 110L, 60L, 265L, 145L, 165L), 
FIDADE = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 
2L, 2L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L)), 
class = "data.frame", row.names = c(NA, -21L))
    
05.01.2019 / 14:32