Poorly calculated coefficients in linear regression in R due to NAs


This is my dataframe:

structure(list(Year = c(1979L, 1979L, 1979L, 1980L, 1980L, 1980L, 
1981L, 1981L, 1981L, 1982L, 1982L, 1982L, 1983L, 1983L, 1983L, 
1984L, 1984L, 1984L, 1985L, 1985L, 1985L, 1986L, 1986L, 1986L, 
1987L, 1987L, 1987L, 1988L, 1988L, 1988L), Month = c(10L, 11L, 
12L, 10L, 11L, 12L, 10L, 11L, 12L, 10L, 11L, 12L, 10L, 11L, 12L, 
10L, 11L, 12L, 10L, 11L, 12L, 10L, 11L, 12L, 10L, 11L, 12L, 10L, 
11L, 12L), Y.1 = c(8.00983263923528, 2.41267858341867, -0.701122343112104, 
-3.93438481559836, 1.61989462202274, -0.0837521649979607, -1.18856075379809, 
-5.79109166398385, -6.02656788564288, 3.57285443621284, 5.28086890954826, 
4.61968948421691, 1.6450358083769, 2.09679639676383, 3.13330926488653, 
7.03433470051535, 8.82984898471047, 6.35665464823924, -2.06916023327692, 
-6.80818412035661, -2.55840141236052, 5.93892137387166, 3.73139295521127, 
-2.43756307587375, -7.88332536927916, -11.1612368255376, -14.9073451470428, 
-3.39210451580797, -9.45264055248482, -6.71777033430725), X.1 = c(0.308656857874223, 
1.04586629806642, 0.861945545932596, 0.375970358978561, -0.347308458564966, 
-0.29159098146565, 0.658969566870815, 0.777325096646653, 0.819638059706351, 
0.14348380776068, 0.320980128297688, 0.422457840273038, 0.0753279027397413, 
-0.00412826834750302, -0.0306969460488249, 0.202590024491522, 
0.144588970489035, 0.299274727728394, 0.924086583854944, 0.903017497665926, 
0.964001122879932, 1.26678884737668, 1.24568369535494, 1.17738738727233, 
0.855877205956479, 0.778924677659654, 0.601219806786069, 0.967781164852632, 
1.10343758488876, 1.02401236754546), Y.2 = c("NA", "NA", "NA", 
"5.33565675549722", "-0.477469962261498", "0.743881752912509", 
"0.946947439972276", "5.26357788348063", "6.20317011981397", 
"-3.44416166730468", "-4.98209173294852", "-4.17799392953961", 
"-1.60319913629998", "-2.07841411022162", "-3.07277915798255", 
"-6.81314462908097", "-8.99190729955144", "-6.41231440381122", 
"2.93695557772259", "7.71262044640592", "3.48797284502131", "-5.06072963216373", 
"-2.74288427337241", "3.50049327959275", "8.56226731314113", 
"12.0144762810381", "15.6527185635863", "4.17084966096979", "10.4311905060596", 
"7.6861205071862"), X.2 = c(0.288003451, 0.873662015, 0.874190316, 
0.36027826, -0.120926336, -0.276130722, 0.633675698, 0.849582846, 
0.778756432, 0.20203225, 0.221280623, 0.467109312, 0.07783831, 
-0.008749708, -0.023401276, 0.196393036, 0.18439037, 0.294919158, 
0.908446718, 0.922729322, 0.962361556, 0.74, 0.74, 0.77, 2.36, 
2.79, 1.76, 1.26, 1.48, 1.21)), class = "data.frame", row.names = c(NA, 

When I run the regression below, R drops the first 3 rows of Y.1 and the first 3 rows of X.1 because the data contain some NAs. It should drop the last 3 rows of X.1 instead:

summary(volcker.ini %>% lm(Y.1~X.1,data = .))

How can I make this adjustment in the above code?

asked by anonymous 30.08.2018 / 21:33

1 answer


There must be something non-standard in your R session.
As can be read in help("lm"), in the Arguments section:

na.action
a function which indicates what should happen when the data contain NAs. The default is set by the na.action setting of options, and is na.fail if that is unset. The 'factory-fresh' default is na.omit. Another possible value is NULL, no action. Value na.exclude can be useful.

This means that lm will omit the rows containing NA values unless the na.action option has been changed. The current value can be checked with

getOption("na.action")
#[1] "na.omit"

If you get something else, just run the following command.

options(na.action = "na.omit")
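As a minimal sketch of that check, the snippet below sets the option, fits a model on data with one missing value, and then restores whatever setting was there before. The data frame df is a hypothetical toy example, not the data from the question.

```r
# Set the option and keep the previous value so it can be restored
old <- options(na.action = "na.omit")
getOption("na.action")

# Hypothetical toy data with one missing response value
df <- data.frame(x = c(1, 2, 3, 4, 5),
                 y = c(2.1, NA, 6.2, 7.9, 10.1))

fit <- lm(y ~ x, data = df)
nobs(fit)  # number of rows actually used in the fit: 4

# Put the previous setting back
options(old)
```

With na.omit in effect, only the row that actually contains an NA is dropped, so 4 of the 5 rows enter the fit.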

On my system that is the value, and I never change it. When I ran your code, everything worked as expected.


summary(volcker.ini %>% lm(Y.1 ~ X.1,data = .))
#lm(formula = Y.1 ~ X.1, data = .)
#     Min       1Q   Median       3Q      Max 
#-14.1342  -4.0814   0.0258   4.5236  10.2769 
#            Estimate Std. Error t value Pr(>|t|)  
#(Intercept)    2.447      1.675   1.461   0.1552  
#X.1           -5.356      2.259  -2.371   0.0249 *
#Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#Residual standard error: 5.613 on 28 degrees of freedom
#Multiple R-squared:  0.1672,   Adjusted R-squared:  0.1375 
#F-statistic: 5.621 on 1 and 28 DF,  p-value: 0.02486

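The quote above says that na.exclude can be useful: it fits the same model as na.omit, but pads the residuals and fitted values with NA so that they line up with the rows of the original data. See help("na.exclude"), and if it suits your case, the modified code is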

summary(volcker.ini %>% lm(Y.1 ~ X.1,data = ., na.action = na.exclude))
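To illustrate how the two actions differ, here is a small sketch on a hypothetical toy data frame (df and the fit_* names are made up for this example). The coefficients should be identical; only the length of the residuals changes.

```r
# Hypothetical toy data with one missing response value
df <- data.frame(x = c(1, 2, 3, 4, 5),
                 y = c(2.1, NA, 6.2, 7.9, 10.1))

fit_omit <- lm(y ~ x, data = df, na.action = na.omit)
fit_excl <- lm(y ~ x, data = df, na.action = na.exclude)

# Both fits use the same 4 complete rows, so the estimates agree
all.equal(coef(fit_omit), coef(fit_excl))

length(residuals(fit_omit))  # 4: the incomplete row is simply gone
length(residuals(fit_excl))  # 5: an NA is kept in position 2
```

The padded version is convenient when the residuals or fitted values have to be put back into the original data frame as a new column.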

Also, why not split this statement in two, one to store the result of lm and another to call summary?

modelo <- volcker.ini %>% lm(Y.1 ~ X.1,data = ., na.action = na.exclude)
summary(modelo)

Later you may want coef(modelo) or other values, such as the residuals.
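A short sketch of why storing the fitted model pays off, again on hypothetical toy data (df, fit and fit_summary are illustrative names, not objects from the question):

```r
# Hypothetical toy data, for illustration only
df <- data.frame(x = c(1, 2, 3, 4, 5, 6),
                 y = c(3.1, NA, 7.2, 9.0, 10.8, 13.1))

# Fit once, then reuse the stored object as often as needed
fit <- lm(y ~ x, data = df, na.action = na.exclude)
fit_summary <- summary(fit)

coef(fit)              # intercept and slope
residuals(fit)         # one value per original row, NA where y is missing
fit_summary$r.squared  # a single statistic taken from the summary
```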

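Finally, so that it does not happen again, check whether there is a file named .RData in your working directory (this is not an extension, it is the full name of the file) and, if there is, remove it. R loads that file at startup and restores the objects saved from an earlier session.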
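One way to look for that file from inside R is sketched below. It assumes the file would sit in the current working directory; the removal is left commented out so nothing is deleted by accident.

```r
# Path a saved workspace would have in the current working directory
rdata <- file.path(getwd(), ".RData")

if (file.exists(rdata)) {
  message("Found a saved workspace: ", rdata)
  # file.remove(rdata)  # uncomment to delete it once you are sure
} else {
  message("No .RData file in ", getwd())
}
```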

30.08.2018 / 23:10