Sum of squares of regression for models in R


The models presented differ by only one additional coefficient (f), which multiplies the independent variable (x). This makes it possible to compute the increase in the regression sum of squares obtained by including a coefficient f with a value other than zero, and thus to test the mean square associated with this inclusion, which has one degree of freedom:

F = (SS_modelBC.4 - SS_modelLL.3) / MSR

The larger this mean square is relative to the residual mean square (MSR), the more significant the inclusion of f, and the hypothesis f = 0 is rejected.
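For two genuinely nested models, this extra-sum-of-squares F statistic can be computed by hand. A minimal base-R sketch using nls() on simulated data as a stand-in for the drc fits (the data, parameter names, and starting values are hypothetical; with drm objects the residual sums of squares also come from deviance()):

```r
# Extra-sum-of-squares F test for two *nested* nonlinear models,
# sketched with base R's nls() on simulated data.
set.seed(42)
x <- rep(c(0.1, 0.5, 1, 2, 5, 10), each = 3)
y <- 1 / (1 + exp(1.5 * (log(x) - log(2)))) + rnorm(length(x), sd = 0.05)
d <- data.frame(x, y)

# Reduced model: log-logistic mean function (f fixed at 0)
m_red  <- nls(y ~ d0 / (1 + exp(b * (log(x) - log(e0)))),
              data = d, start = list(b = 1, d0 = 1, e0 = 2))
# Full model: adds the hormesis coefficient f multiplying x
m_full <- nls(y ~ (d0 + f * x) / (1 + exp(b * (log(x) - log(e0)))),
              data = d, start = list(b = 1, d0 = 1, e0 = 2, f = 0))

SQ_red  <- deviance(m_red)    # residual sum of squares, reduced model
SQ_full <- deviance(m_full)   # residual sum of squares, full model
df_res  <- df.residual(m_full)
MSR     <- SQ_full / df_res   # residual mean square of the full model

Fstat <- (SQ_red - SQ_full) / 1 / MSR   # 1 df: the single extra parameter f
pval  <- pf(Fstat, 1, df_res, lower.tail = FALSE)
c(F = Fstat, p = pval)
```

A large F (small p) would indicate that the extra coefficient f improves the fit significantly.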

I have the idea, but I cannot implement it in an "easy and simple" way in R. Can anyone help me?

# dose-response curve package
library("drc")
# "hormesis" dose-response model BC.4: f(x) = 0 + (d - 0 + f*x) / (1 + exp(b*(log(x) - log(e))))
lett.BC4 <- drm(weight ~ conc, data = lettuce, fct = BC.4())
# "standard" dose-response model LL.3: f(x) = 0 + (d - 0) / (1 + exp(b*(log(x) - log(e))))
lett.LL3 <- drm(weight ~ conc, data = lettuce, fct = LL.3())

plot(lett.BC4, col = 2, lty = 2)
plot(lett.LL3, add=TRUE) 
    
asked by anonymous 26.02.2018 / 14:33

1 answer


I once commented here on Stack Overflow about variable selection; take a look at the link. The problem of variable selection is similar to the problem of model selection: we are trying to choose the simplest model that explains our data (in statistics, we always want the simplest possible model that describes our data).

But to perform a test like the one you want, based on sums of squares, the models being compared must be nested. The trouble is, your models are not nested. It makes no sense to perform a hypothesis test of the type

  • H_0: models lett.LL3 and lett.BC4 are equal

  • H_1: models lett.LL3 and lett.BC4 are not equal

because they are not simpler and more complex versions of the same model. The nonlinear functions defined by the fct = BC.4() and LL.3() arguments are different. Therefore, from the theoretical point of view of Nonlinear Models (see Bates and Watts, Nonlinear Regression Analysis (1988), pp. 103-104), the test you are trying to apply makes no sense. It can be carried out numerically, because the sums of squares of each model can be calculated, but such a test has no theoretical support.

What can be done is to compare two nested models. For example,

lett.BC5 <- drm(weight ~ conc, data = lettuce, fct = BC.5())
lett.BC4 <- drm(weight ~ conc, data = lettuce, fct = BC.4())

The only difference between the non-linear functions specified by fct = BC.5() and fct = BC.4() is that BC.5() has one more parameter:

summary(lett.BC5)

Model fitted: Brain-Cousens (hormesis) (5 parms)

Parameter estimates:

              Estimate Std. Error t-value   p-value    
b:(Intercept) 1.502065   0.352231  4.2644  0.002097 ** 
c:(Intercept) 0.280173   0.248569  1.1271  0.288836    
d:(Intercept) 0.963030   0.078186 12.3171 6.164e-07 ***
e:(Intercept) 1.120457   0.612908  1.8281  0.100799    
f:(Intercept) 0.988182   0.776136  1.2732  0.234846    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error:

 0.1149117 (9 degrees of freedom)

summary(lett.BC4)

Model fitted: Brain-Cousens (hormesis) with lower limit fixed at 0 (4 parms)

Parameter estimates:

              Estimate Std. Error t-value   p-value    
b:(Intercept) 1.282812   0.049346 25.9964 1.632e-10 ***
d:(Intercept) 0.967302   0.077123 12.5423 1.926e-07 ***
e:(Intercept) 0.847633   0.436093  1.9437   0.08059 .  
f:(Intercept) 1.620703   0.979711  1.6543   0.12908    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error:

 0.1117922 (10 degrees of freedom)

In this way, you can compare the lett.BC5 and lett.BC4 models by their sums of squares, using the hypothesis test defined above:

anova(lett.BC5, lett.BC4)
1st model
 fct:      BC.4()
2nd model
 fct:      BC.5()

ANOVA table

          ModelDf     RSS Df F value p value
1st model      10 0.12498                   
2nd model       9 0.11884  1  0.4644  0.5127    

(see ?anova.drc for more information)

Since the p-value is greater than 0.05, we can say that the models are not different from each other, and we therefore opt for lett.BC4, which is simpler.
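As a sanity check, the F and p values in the ANOVA table above can be reproduced by hand from the printed residual sums of squares, using only base R (no drc needed):

```r
# Residual sums of squares and residual df, as printed by anova() above
SQ_BC4 <- 0.12498   # reduced model (BC.4), 10 residual df
SQ_BC5 <- 0.11884   # full model (BC.5),    9 residual df

# F = (difference in RSS / 1 df) / (residual mean square of the full model)
Fstat <- ((SQ_BC4 - SQ_BC5) / 1) / (SQ_BC5 / 9)
pval  <- pf(Fstat, 1, 9, lower.tail = FALSE)
round(c(F = Fstat, p = pval), 4)
```

Up to the rounding of the printed RSS values, this recovers the F value of 0.4644 and the p-value of 0.5127 from the table.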

Note that I did not answer the main question. You may be interested in comparing the LL and BC function families and deciding which family of functions best fits your data. Unfortunately, I do not know of a statistical method, such as a hypothesis test, that solves this problem. I offer the following two suggestions on how to decide between LL and BC:

1) Choose the best possible model within each of the LL and BC families, using the methodology above. With the best model of each family chosen, analyze the residuals of the two models found and, based on this residual analysis, see which model violates fewer assumptions.

2) Make a conscious choice. Look in the literature of your field to see whether models from the LL (log-logistic) or BC (Brain-Cousens modified log-logistic) families are the most used, and why. Or, since you are making a parametric fit to the data, say that you will use one of these two options because of its interpretability, or because your data behave in a way that resembles one of them. Or test some other function, such as Weibull, because your results may turn out even better.
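The residual analysis in suggestion 1 can be sketched in base R. Here an nls() fit to simulated data stands in for the drc fits (the data and starting values are hypothetical; drm objects also provide residuals() and fitted(), so the same plots apply directly to lett.BC4 or lett.LL3):

```r
# Residual diagnostics for a fitted nonlinear model, base R only.
set.seed(1)
x <- rep(c(0.1, 0.5, 1, 2, 5, 10), each = 3)
y <- 1 / (1 + exp(1.5 * (log(x) - log(2)))) + rnorm(length(x), sd = 0.05)
fit <- nls(y ~ d0 / (1 + exp(b * (log(x) - log(e0)))),
           start = list(b = 1, d0 = 1, e0 = 2))

r <- residuals(fit)
plot(fitted(fit), r, main = "Residuals vs fitted")  # look for patterns
abline(h = 0, lty = 2)
qqnorm(r); qqline(r)        # check approximate normality of the residuals
shapiro.test(r)             # formal normality test
```

The model whose residuals show less structure against the fitted values and depart less from normality violates fewer assumptions.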

    
05.03.2018 / 17:45