Automatically identify influential points in a regression

5

Whenever we fit a linear regression, we need to check whether the model's assumptions hold. One of the best ways to do this is with diagnostic plots. See the example below:

ajuste <- lm(Petal.Width ~ Petal.Length, data=iris)

library(ggfortify)
autoplot(ajuste)

The autoplot function produces four diagnostic plots. Some of the points in these plots are labelled as deviating from the model's assumptions. For example, in the Q-Q plot above, points 115, 135 and 142 are flagged as departing from what we would expect if the residuals were normally distributed.

Is there any way to do this automatically in R? How can I take the output of autoplot (or of R's native plot function) and identify, for each plot, which points violate the model's assumptions?

asked by anonymous 27.10.2017 / 14:27

1 answer

3

In fact, the autoplot.lm function of the ggfortify package does not apply any statistical rule to flag these points.

As can be seen here, it just takes the number passed to the argument label.n (which is 3 by default) and labels on the plot the n points with the largest absolute residuals.
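The same selection can be sketched in base R, without ggfortify. This is only an illustration: it assumes an unweighted lm fit, so the raw residuals are what gets ranked, and n_labels is a hypothetical name that mirrors label.n.

```r
# Sketch: pick the n observations with the largest absolute residuals,
# mirroring what label.n does in the plot (assumes an unweighted lm).
ajuste <- lm(Petal.Width ~ Petal.Length, data = iris)

n_labels <- 3                      # hypothetical stand-in for label.n
res <- residuals(ajuste)           # raw residuals, one per observation

# Row positions sorted by decreasing absolute residual, keeping the first n
largest <- order(abs(res), decreasing = TRUE)[seq_len(n_labels)]

data.frame(index = largest, residual = unname(res[largest]))
```

Changing n_labels changes how many points come back, which makes it clear that the labels reflect a ranking rather than a test of any assumption.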

The autoplot function returns an S4 object of class ggmultiplot. This object has a slot called plots, which stores the 4 plots that appear when the object is printed. The second element of this slot is the Q-Q plot.

Since every ggplot object is a list with 9 elements, we can access the first one (data), which contains the data, and then do the necessary calculations.

The code below shows the 3 points with the largest absolute residuals:

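ajuste <- lm(Petal.Width ~ Petal.Length, data = iris)
library(ggfortify)
library(dplyr)  # provides %>%, top_n() and select()
objeto_ggplot <- autoplot(ajuste, label.n = 10)

# The second plot is the Q-Q plot; its data element holds the residuals
objeto_ggplot@plots[[2]]$data %>%
  top_n(3, abs(.wresid)) %>%
  select(Petal.Width, Petal.Length, .index)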

  Petal.Width Petal.Length .index
1         2.4          5.1    115
2         1.4          5.6    135
3         2.3          5.1    142
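
If an explicit rule is wanted instead of a fixed number of labels, one option is to compute the usual influence measures with base R and apply cutoffs. The sketch below is not what autoplot does. The cutoffs (absolute standardized residual above 2, leverage above 2p/n, Cook's distance above 4/n) are common rules of thumb assumed here for illustration, and the column names are hypothetical.

```r
# Sketch: flag observations using conventional rule-of-thumb cutoffs.
# The cutoffs below are assumptions, not something ggfortify applies.
ajuste <- lm(Petal.Width ~ Petal.Length, data = iris)

n <- nobs(ajuste)            # number of observations
p <- length(coef(ajuste))    # number of estimated coefficients

diagnostics <- data.frame(
  index        = seq_len(n),
  std_residual = rstandard(ajuste),       # internally standardized residuals
  leverage     = hatvalues(ajuste),       # diagonal of the hat matrix
  cook         = cooks.distance(ajuste)   # Cook's distance
)

diagnostics$large_residual <- abs(diagnostics$std_residual) > 2
diagnostics$high_leverage  <- diagnostics$leverage > 2 * p / n
diagnostics$high_cook      <- diagnostics$cook > 4 / n

# Observations that trip at least one of the rules
subset(diagnostics, large_residual | high_leverage | high_cook)
```

The stats function influence.measures(ajuste) offers a similar summary with its own built-in criteria, if a ready-made table is preferred.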
    
28.10.2017 / 20:30