Marginal effect for GLM (logit) with categorical variables.

I have the following regression:

Call:
glm(formula = IN_FIN_REEMB_FIES ~ CO_CATEGORIA_ADMINISTRATIVA + 
CO_COR_RACA_ALUNO + IN_RESERVA_VAGAS + IN_RESERVA_ENSINO_PUBLICO + 
CO_TURNO_ALUNO + IN_RESERVA_RENDA_FAMILIAR + IN_FIN_NAOREEMB_PROUNI_PARCIAL + 
TP_PROCEDE_EDUC_PUBLICA + IN_SEXO_ALUNO + NU_IDADE_ALUNO, 
family = binomial, data = nor1)

Deviance Residuals: 
Min       1Q   Median       3Q      Max  
-1.3535  -0.7779  -0.6651  -0.4668   2.8123  

Coefficients:
                                 Estimate Std. Error z value Pr(>|z|)    
(Intercept)                    -0.3802707  0.0112696 -33.743  < 2e-16 ***
CO_CATEGORIA_ADMINISTRATIVA5   -0.5030216  0.0063270 -79.504  < 2e-16 ***
CO_COR_RACA_ALUNO1             -0.0452001  0.0081445  -5.550 2.86e-08 ***
CO_COR_RACA_ALUNO2              0.2784830  0.0144464  19.277  < 2e-16 ***
CO_COR_RACA_ALUNO3              0.1527192  0.0067474  22.634  < 2e-16 ***
CO_COR_RACA_ALUNO4              0.2388296  0.0210065  11.369  < 2e-16 ***
CO_COR_RACA_ALUNO5              0.0357580  0.0516736   0.692  0.48894    
IN_RESERVA_VAGAS                0.7116092  0.0474497  14.997  < 2e-16 ***
IN_RESERVA_ENSINO_PUBLICO      -3.1344320  0.1919678 -16.328  < 2e-16 ***
CO_TURNO_ALUNO                 -0.0157669  0.0055981  -2.816  0.00486 ** 
IN_RESERVA_RENDA_FAMILIAR      -9.8526964 14.0635939  -0.701  0.48356    
IN_FIN_NAOREEMB_PROUNI_PARCIAL -0.3676032  0.0206161 -17.831  < 2e-16 ***
TP_PROCEDE_EDUC_PUBLICA1        0.1709269  0.0054290  31.484  < 2e-16 ***
IN_SEXO_ALUNO                   0.1497493  0.0055368  27.046  < 2e-16 ***
NU_IDADE_ALUNO                 -0.0328465  0.0003694 -88.911  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 876350  on 808601  degrees of freedom
Residual deviance: 857364  on 808587  degrees of freedom
AIC: 857394

Number of Fisher Scoring iterations: 9

I need to build scenarios such as: CO_COR_RACA_ALUNO = 1, TP_PROCEDE_EDUC_PUBLICA1 = 1, NU_ALL_ITY

asked by anonymous 14.01.2016 / 19:45

1 answer

Rafael, one alternative is the following:

  • Calculate the predicted probability for each individual in your sample, using your model and the predict function.
  • Keep only the cases in your sample that match your scenario of interest.
  • Calculate the mean of the predicted probabilities in the filtered data.

Since you did not provide your data, I'll use the example dataset loaded below.

# fit the model
mydata <- read.csv("http://www.ats.ucla.edu/stat/data/binary.csv")
mydata$rank <- factor(mydata$rank)
mylogit <- glm(admit ~ gre + gpa + rank, data = mydata, family = "binomial")

Here three predictors are used to model whether a student is admitted to a program (the variable admit):

  • gre: Graduate Record Exam score
  • gpa: grade point average
  • rank: prestige of the institution the student comes from, on a scale of 1 to 4

Now suppose, for example, that you want the probability of admission for students who come from an institution with rank = 1 and have gre > 500, regardless of gpa. You can compute it in R like this:

# predict the probability of the response for every row
mydata$prediction <- predict(mylogit, type = "response")

# keep only the cases that match your scenario
library(dplyr)
aux <- mydata %>% filter(rank == 1, gre > 500)
mean(aux$prediction)
[1] 0.577473
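
To make clear what predict(..., type = "response") returns, the sketch below rebuilds the predicted probability by hand: the linear predictor is the model matrix times the estimated coefficients, and the inverse logit (plogis) maps it to a probability. The data are simulated so that the example runs without downloading anything; the data, the coefficients used to generate them, and the names sim and fit are assumptions for illustration, not the admissions file used above.

```r
# Simulated stand-in for the admissions data (assumption: not the real file)
set.seed(1)
n <- 500
sim <- data.frame(
  gre  = round(rnorm(n, mean = 580, sd = 115)),
  gpa  = round(runif(n, min = 2.3, max = 4), 2),
  rank = factor(sample(1:4, n, replace = TRUE))
)
# Made-up "true" linear predictor, used only to generate a 0/1 outcome
eta <- -4 + 0.002 * sim$gre + 0.8 * sim$gpa - 0.5 * (as.integer(sim$rank) - 1)
sim$admit <- rbinom(n, size = 1, prob = plogis(eta))

fit <- glm(admit ~ gre + gpa + rank, data = sim, family = "binomial")

# Linear predictor by hand: design matrix times estimated coefficients
lin_pred <- as.vector(model.matrix(fit) %*% coef(fit))

# The inverse logit turns the linear predictor into a probability
p_manual  <- plogis(lin_pred)
p_predict <- as.vector(predict(fit, type = "response"))

# Compare the hand-computed probabilities with those from predict()
all.equal(p_manual, p_predict)
```

The two vectors should agree up to numerical tolerance, which is why averaging the prediction column over a filtered subset gives an average predicted probability for that scenario.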

If instead you want the scenario with rank = 1 and gre < 500:

aux <- mydata %>% filter(rank == 1, gre < 500)
mean(aux$prediction)
[1] 0.3751226
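
Keep in mind that an average over a filtered subsample also reflects how gre and gpa happen to be distributed among the rank 1 students. A different technique, not the one used above, is to keep every row, set the categorical variable to each level in turn, and average the predictions; the difference between two such averages is one common way to express the average marginal effect of a categorical variable. Below is a minimal sketch on simulated data. The data, the names sim and fit, and the helper avg_pred_at_rank are all hypothetical and exist only for this illustration.

```r
# Simulated stand-in for the admissions data (assumption: not the real file)
set.seed(1)
n <- 500
sim <- data.frame(
  gre  = round(rnorm(n, mean = 580, sd = 115)),
  gpa  = round(runif(n, min = 2.3, max = 4), 2),
  rank = factor(sample(1:4, n, replace = TRUE))
)
eta <- -4 + 0.002 * sim$gre + 0.8 * sim$gpa - 0.5 * (as.integer(sim$rank) - 1)
sim$admit <- rbinom(n, size = 1, prob = plogis(eta))

fit <- glm(admit ~ gre + gpa + rank, data = sim, family = "binomial")

# Hypothetical helper: average predicted probability if every row had `level`
avg_pred_at_rank <- function(model, data, level) {
  counterfactual <- data
  # Overwrite rank for all rows, keeping the original factor levels
  counterfactual$rank <- factor(rep(level, nrow(data)),
                                levels = levels(data$rank))
  mean(predict(model, newdata = counterfactual, type = "response"))
}

p_rank1 <- avg_pred_at_rank(fit, sim, "1")
p_rank2 <- avg_pred_at_rank(fit, sim, "2")

# Average change in predicted probability when moving from rank 1 to rank 2
p_rank2 - p_rank1
```

With your own model, the same idea would apply to a factor such as CO_COR_RACA_ALUNO: replace the column in a copy of nor1 with one level, predict, average, and compare across levels.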
    
15.01.2016 / 11:45