Weighted linear regression using the inverse of the variance as a weighting factor

I have the following data set that establishes a relationship between two variables "X" and "Y":

df <- data.frame(
  X = c(25, 25, 25, 25, 25, 25, 50, 50, 50, 50, 50, 50,
        75, 75, 75, 75, 75, 75, 100, 100, 100, 100, 100, 100,
        125, 125, 125, 125, 125, 125, 150, 150, 150, 150, 150, 150),
  Y = c(2457524, 2391693, 2450828, 2391252, 2444638, 2360293,
        4693194, 4844527, 4835596, 4878092, 4809226, 4722253,
        7142763, 7182769, 7135550, 7173920, 7216871, 7076359,
        9496553, 9537788, 9405825, 9439201, 9609870, 9707734,
        12031958, 12027037, 11935594, 11930086, 12154132, 12096462,
        14298064, 14396607, 13964716, 14221039, 14283992, 14042220))

Consider the following problem:

"Fit a weighted linear model using the lm function and, as the weighting factor, the inverse of the variance of Y at each level of X." That is, each observation should be weighted by the inverse of the variance of Y within its level of X. In this case, how can the weighting be specified? Is there a specific function that should be passed to the weights argument?

Technical detail: only a fit with the lm function is acceptable. Fitting by any other method (gls, glm, etc.) is not.
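For reference, the requested fit can be written as a weighted least squares problem. The notation below is an assumption introduced for illustration and is not part of the original post: i indexes the observations, w_i is the weight of observation i, and s^2_{X_i} is the sample variance of Y among the observations that share the X value of observation i. With a weights argument, lm minimizes the weighted sum of squared residuals:

```latex
\[
(\hat\beta_0, \hat\beta_1)
  = \arg\min_{\beta_0,\,\beta_1} \sum_{i=1}^{n} w_i \,\bigl(Y_i - \beta_0 - \beta_1 X_i\bigr)^2,
\qquad
w_i = \frac{1}{s^2_{X_i}} .
\]
```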

    
asked by anonymous 19.11.2016 / 01:02

1 answer

To solve this, simply build the desired weight vector yourself and pass it to weights. Here I named that vector pesos:

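# Variance of Y within each level of X (aggregate() returns the levels in ascending order)
variancias_condicionais <- aggregate(df$Y, list(df$X), var)$x
# Number of observations at each level of X
quantidade_X <- as.numeric(table(df$X))
# Repeat each inverse variance once per observation; this assumes the rows of df are sorted by X
pesos <- rep(1/variancias_condicionais, quantidade_X)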

ajuste <- lm(Y ~ X, data=df, weights=pesos)
summary(ajuste)

Call:
lm(formula = Y ~ X, data = df, weights = pesos)

Weighted Residuals:
     Min       1Q   Median       3Q      Max 
-2.17331 -0.71861 -0.08895  0.84733  2.42540 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)    28185      22538   1.251     0.22    
X              95300        330 288.777   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.097 on 34 degrees of freedom
Multiple R-squared:  0.9996,    Adjusted R-squared:  0.9996 
F-statistic: 8.339e+04 on 1 and 34 DF,  p-value: < 2.2e-16
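
The rep() construction above lines the weights up with the rows only because df is already sorted by X. As a possible alternative (a sketch, not the only way to do it), base R's ave() returns the within-level variance for every row directly, so the weights stay aligned whatever the row order. The names pesos_ave, ajuste_ave, df_shuffled, pesos_shuffled and ajuste_shuffled are introduced here for illustration only:

```r
# Same data as in the question, with X built by rep() for brevity.
df <- data.frame(
  X = rep(seq(25, 150, by = 25), each = 6),
  Y = c(2457524, 2391693, 2450828, 2391252, 2444638, 2360293,
        4693194, 4844527, 4835596, 4878092, 4809226, 4722253,
        7142763, 7182769, 7135550, 7173920, 7216871, 7076359,
        9496553, 9537788, 9405825, 9439201, 9609870, 9707734,
        12031958, 12027037, 11935594, 11930086, 12154132, 12096462,
        14298064, 14396607, 13964716, 14221039, 14283992, 14042220)
)

# ave() returns, for every row, the variance of Y within that row's X level,
# so each weight is attached to its own observation.
pesos_ave <- 1 / ave(df$Y, df$X, FUN = var)
ajuste_ave <- lm(Y ~ X, data = df, weights = pesos_ave)

# Shuffle the rows and refit to check that row order does not matter.
set.seed(1)
df_shuffled <- df[sample(nrow(df)), ]
pesos_shuffled <- 1 / ave(df_shuffled$Y, df_shuffled$X, FUN = var)
ajuste_shuffled <- lm(Y ~ X, data = df_shuffled, weights = pesos_shuffled)

# Compare the coefficients of the two fits.
all.equal(coef(ajuste_ave), coef(ajuste_shuffled))
```

On the sorted data from the question, pesos_ave holds the same values as pesos, so summary(ajuste_ave) reports the same fit as above.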
    
19.11.2016 / 01:55