“mlm” is the name we will give to the “High School and Beyond” dataset.
Variables relevant for the exercises:
## [1] 7185 26
## [1] "minority" "female" "ses" "mathach" "size" "sector"
## [7] "pracad" "disclim" "himinty" "schoolid" "mean" "sd"
## [13] "sdalt" "junk" "sdalt2" "num" "se" "sealt"
## [19] "sealt2" "t2" "t2alt" "pickone" "mmses" "mnses"
## [25] "xb" "resid"
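library(dplyr)  # provides %>% and select()
# Keep only the variables used in the exercises
mlm <- mlm %>% select(minority, female, ses, mathach, size, sector, pracad, disclim, himinty, mnses, schoolid) %>% as.data.frame()
dim(mlm)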
## [1] 7185 11
## minority female ses mathach size sector pracad disclim himinty
## 1 0 1 -1.528 5.876 842 0 0.35 1.597 0
## 2 0 1 -0.588 19.708 842 0 0.35 1.597 0
## 3 0 0 -0.528 20.349 842 0 0.35 1.597 0
## 4 0 0 -0.668 8.781 842 0 0.35 1.597 0
## 5 0 0 -0.158 17.898 842 0 0.35 1.597 0
## 6 0 0 0.022 4.583 842 0 0.35 1.597 0
## mnses schoolid
## 1 -0.434383 1224
## 2 -0.434383 1224
## 3 -0.434383 1224
## 4 -0.434383 1224
## 5 -0.434383 1224
## 6 -0.434383 1224
## minority female ses mathach
## Min. :0.0000 Min. :0.0000 Min. :-3.758000 Min. :-2.832
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:-0.538000 1st Qu.: 7.275
## Median :0.0000 Median :1.0000 Median : 0.002000 Median :13.131
## Mean :0.2747 Mean :0.5282 Mean : 0.000143 Mean :12.748
## 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.: 0.602000 3rd Qu.:18.317
## Max. :1.0000 Max. :1.0000 Max. : 2.692000 Max. :24.993
## size sector pracad disclim
## Min. : 100 Min. :0.0000 Min. :0.0000 Min. :-2.4160
## 1st Qu.: 565 1st Qu.:0.0000 1st Qu.:0.3200 1st Qu.:-0.8170
## Median :1016 Median :0.0000 Median :0.5300 Median :-0.2310
## Mean :1057 Mean :0.4931 Mean :0.5345 Mean :-0.1319
## 3rd Qu.:1436 3rd Qu.:1.0000 3rd Qu.:0.7000 3rd Qu.: 0.4600
## Max. :2713 Max. :1.0000 Max. :1.0000 Max. : 2.7560
## himinty mnses schoolid
## Min. :0.00 Min. :-1.1939460 Min. :1224
## 1st Qu.:0.00 1st Qu.:-0.3230000 1st Qu.:3020
## Median :0.00 Median : 0.0320000 Median :5192
## Mean :0.28 Mean : 0.0001434 Mean :5278
## 3rd Qu.:1.00 3rd Qu.: 0.3269123 3rd Qu.:7342
## Max. :1.00 Max. : 0.8249825 Max. :9586
stargazer offers three basic output formats, chosen with its type argument: text, html, or latex (the default). For exploratory work, use text to view the table directly in the R console; to report it later through knitr as HTML, switch to html; to export to PDF, switch to latex. General recommendation: keep text until the final report of results. It is easy to read in the console and, when working in R Markdown, does not require compiling the document to see the result.
##
## Descriptivos generales
## ===================================================================
## Statistic N Mean St. Dev. Min Pctl(25) Pctl(75) Max
## -------------------------------------------------------------------
## minority 7,185 0.275 0.446 0 0 1 1
## female 7,185 0.528 0.499 0 0 1 1
## ses 7,185 0.0001 0.779 -3.758 -0.538 0.602 2.692
## mathach 7,185 12.748 6.878 -2.832 7.275 18.317 24.993
## size 7,185 1,056.862 604.172 100 565 1,436 2,713
## sector 7,185 0.493 0.500 0 0 1 1
## pracad 7,185 0.534 0.251 0.000 0.320 0.700 1.000
## disclim 7,185 -0.132 0.944 -2.416 -0.817 0.460 2.756
## himinty 7,185 0.280 0.449 0 0 1 1
## mnses 7,185 0.0001 0.414 -1.194 -0.323 0.327 0.825
## schoolid 7,185 5,277.898 2,499.578 1,224 3,020 7,342 9,586
## -------------------------------------------------------------------
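A minimal sketch of stargazer's type argument, assuming the stargazer package is installed. The built-in mtcars data frame stands in for mlm, and the object names (df, txt, html, tex, html_file) are illustrative only. The same call returns plain text, HTML, or LaTeX depending on type, and the optional out argument also writes the result to a file.

```r
# Assumes stargazer is installed: install.packages("stargazer")
library(stargazer)

# mtcars is only a stand-in for the mlm data
df <- mtcars[, c("mpg", "wt", "hp")]

# type = "text": plain-text table printed in the console (exploration)
txt <- stargazer(df, type = "text", title = "Descriptives")

# type = "html": HTML code for knitr/R Markdown HTML reports,
# additionally written to a temporary file through `out`
html_file <- tempfile(fileext = ".html")
html <- stargazer(df, type = "html", title = "Descriptives", out = html_file)

# type = "latex" (the default): LaTeX code for PDF output
tex <- stargazer(df, title = "Descriptives")
```

stargazer prints the table and also returns it invisibly as a character vector, which is why it can be stored in txt, html, and tex.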
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -2.832 7.275 13.131 12.748 18.317 24.993
## mathach ses sector size
## mathach 1.00 0.36 0.20 -0.05
## ses 0.36 1.00 0.19 -0.07
## sector 0.20 0.19 1.00 -0.42
## size -0.05 -0.07 -0.42 1.00
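# Nested OLS models for math achievement (mathach), adding one predictor at a time
reg1 <- lm(mathach ~ 1, data = mlm)                                 # null model: intercept only
reg2 <- lm(mathach ~ ses, data = mlm)                               # + socioeconomic status
reg3 <- lm(mathach ~ ses + female, data = mlm)                      # + gender
reg4 <- lm(mathach ~ ses + female + sector, data = mlm)             # + school sector
reg5 <- lm(mathach ~ ses + female + sector + minority, data = mlm)  # + minority status
summary(reg5)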
##
## Call:
## lm(formula = mathach ~ ses + female + sector + minority, data = mlm)
##
## Residuals:
## Min 1Q Median 3Q Max
## -20.2286 -4.5076 0.2104 4.7472 17.8078
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 13.24158 0.13386 98.924 <2e-16 ***
## ses 2.36392 0.09946 23.768 <2e-16 ***
## female -1.42166 0.14608 -9.732 <2e-16 ***
## sector 2.25492 0.14906 15.127 <2e-16 ***
## minority -3.11239 0.17029 -18.277 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.166 on 7180 degrees of freedom
## Multiple R-squared: 0.1969, Adjusted R-squared: 0.1965
## F-statistic: 440.1 on 4 and 7180 DF, p-value: < 2.2e-16
##
## Regresión datos individuales
## ===============================================
## Dependent variable:
## ---------------------------
## mathach
## -----------------------------------------------
## ses 2.364***
## (0.099)
##
## female -1.422***
## (0.146)
##
## sector 2.255***
## (0.149)
##
## minority -3.112***
## (0.170)
##
## Constant 13.242***
## (0.134)
##
## -----------------------------------------------
## Observations 7,185
## R2 0.197
## Adjusted R2 0.196
## Residual Std. Error 6.166 (df = 7180)
## F Statistic 440.111*** (df = 4; 7180)
## ===============================================
## Note: *p<0.1; **p<0.05; ***p<0.01
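The models reg1 to reg5 are nested: each one adds a predictor to the previous one. A sketch of how nested models can be compared, on simulated data with hypothetical variables y, x1, and x2 rather than the HSB data. R² never decreases as predictors are added, and anova() gives a sequential F-test of whether each addition improves the fit.

```r
# Simulated data with hypothetical variables (not the HSB data)
set.seed(123)
n  <- 500
x1 <- rnorm(n)
x2 <- rbinom(n, 1, 0.5)
y  <- 10 + 2 * x1 - 1.5 * x2 + rnorm(n, sd = 3)
dat <- data.frame(y, x1, x2)

m0 <- lm(y ~ 1, data = dat)        # null model (intercept only)
m1 <- lm(y ~ x1, data = dat)       # adds x1
m2 <- lm(y ~ x1 + x2, data = dat)  # adds x2

# R-squared of each model: 0 for the null model, non-decreasing afterwards
r2 <- sapply(list(m0, m1, m2), function(m) summary(m)$r.squared)
print(round(r2, 3))

# Sequential F-tests: m0 vs m1, then m1 vs m2
comp <- anova(m0, m1, m2)
print(comp)
```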
## StudRes Hat CookD
## 869 2.6047540 0.0013831364 0.0018779316
## 1629 -2.2407411 0.0020743123 0.0020861529
## 3523 -3.2842968 0.0007451964 0.0016066368
## 5321 -1.4359364 0.0035373357 0.0014636979
## 6033 -0.1116449 0.0042083659 0.0000105369
## 7136 -3.2099858 0.0007110209 0.0014644182
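The table above lists, for a few observations, studentized residuals (StudRes), hat values (Hat, i.e. leverage), and Cook's distances (CookD). Its layout resembles the data frame returned by car::influencePlot(), although the code that produced it is not shown. A base-R sketch of the same three measures, on simulated data with one deliberately planted outlier; the cut-offs are common rules of thumb, not formal tests.

```r
# Simulated data (not the HSB data) with a planted outlier in observation 1
set.seed(42)
n <- 200
x <- rnorm(n)
y <- 1 + 0.5 * x + rnorm(n)
y[1] <- y[1] + 8
fit <- lm(y ~ x)

diag_tab <- data.frame(
  StudRes = rstudent(fit),        # studentized (deleted) residuals
  Hat     = hatvalues(fit),       # leverage of each observation
  CookD   = cooks.distance(fit)   # influence on the fitted coefficients
)

# Rules of thumb: |StudRes| > 2, Hat > 2 * p / n, CookD > 4 / n
p <- length(coef(fit))
flag <- abs(diag_tab$StudRes) > 2 |
        diag_tab$Hat > 2 * p / n |
        diag_tab$CookD > 4 / n
print(diag_tab[flag, ])
```

The hat values always sum to the number of estimated coefficients p, which is why 2p/n (twice the average leverage) is a common threshold.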
Creating a factor: from numeric to factor (Option 1)
table(mlm$sector)  # see the levels (values) and their frequencies
str(mlm$sector)    # inspect the variable type (numeric or factor)
mlm$sectorf <- as.factor(mlm$sector)  # convert from numeric to factor in a new variable
str(mlm$sectorf)     # check the type of the new variable
levels(mlm$sectorf)  # see the levels (labels) of the new variable
# Option 2: convert and label in one step
mlm$sectorf2 <- factor(mlm$sector, levels = c(0, 1), labels = c("Public", "Private"))
levels(mlm$sectorf2)
table(mlm$sectorf2, mlm$sector)  # check that the coding is correct
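Whether a 0/1 variable such as sector enters a regression as a number or as a two-level factor, the fitted model is the same; only the coefficient name changes, because R takes the first factor level as the reference category. A sketch on simulated data, where sector, score, and sector_f are hypothetical names:

```r
# Simulated data: a 0/1 sector dummy and an outcome
set.seed(1)
sector <- rbinom(100, 1, 0.5)
score  <- 12 + 2 * sector + rnorm(100)
sector_f <- factor(sector, levels = c(0, 1), labels = c("Public", "Private"))

fit_num <- lm(score ~ sector)    # numeric 0/1 predictor
fit_fac <- lm(score ~ sector_f)  # factor; "Public" (first level) is the reference

print(coef(fit_num))  # slope named "sector"
print(coef(fit_fac))  # same value, named "sector_fPrivate"
```

To use a different reference category, relevel() can be applied to the factor before fitting, e.g. relevel(sector_f, ref = "Private").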
Group mean (e.g., mean SES per school)
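# School 1224 in more detail
id_1224 <- subset(mlm, schoolid == 1224)  # students in school 1224
id_1224
dim(id_1224)          # number of students (rows) and variables
summary(id_1224$ses)  # SES distribution within the school; its Mean is the group mean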
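Rather than subsetting one school at a time, base R's ave() attaches each school's mean to every student row, and subtracting it gives group-mean-centered values. A sketch on a small hypothetical data frame (not the HSB data). In mlm, the variable mnses appears to hold this school mean of ses, which could be checked by comparing it with ave(mlm$ses, mlm$schoolid).

```r
# Hypothetical student-level data: 3 schools
d <- data.frame(
  schoolid = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
  ses      = c(-0.5, 0.1, 0.4, 1.0, 0.6, -1.2, -0.8, 0.0, 0.4)
)

# Group mean: for each row, the mean ses of that row's school
d$ses_mean <- ave(d$ses, d$schoolid, FUN = mean)

# Group-mean centering: each student's deviation from the school mean
d$ses_cwc <- d$ses - d$ses_mean
print(d)
```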
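Create a share (e.g., proportion of women per school): the school mean of a 0/1 variable is the proportion, which multiplied by 100 gives a percentage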
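Because the mean of a 0/1 variable is the proportion of ones, averaging female by school yields the share of women in each school. A base-R sketch with aggregate() on a small hypothetical data frame (not the HSB data):

```r
# Hypothetical student-level data with a 0/1 indicator
d <- data.frame(
  schoolid = c(1, 1, 1, 1, 2, 2, 2),
  female   = c(1, 0, 1, 1, 0, 0, 1),
  mathach  = c(10, 14, 12, 8, 15, 17, 13)
)

# Mean of each variable by school; for female this is the proportion of women
agg <- aggregate(cbind(female, mathach) ~ schoolid, data = d, FUN = mean)
agg$pct_female <- 100 * agg$female  # proportion expressed as a percentage
print(agg)
```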
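# Aggregate to school level: mean of every variable by school
# (for 0/1 variables such as female, the mean is the proportion per school)
agg_mlm <- mlm %>%
  select(-c(sectorf, sectorf2)) %>%         # drop the factors first: mean() of a factor returns NA
  group_by(schoolid) %>%
  summarise(across(everything(), mean)) %>% # mean of each remaining column
  as.data.frame()
stargazer(agg_mlm, type = "text")
# Descriptives
dim(agg_mlm)
summary(agg_mlm)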
reg5_agg<- lm(mathach~ses+female+sector+minority, data=agg_mlm)
stargazer(reg5,reg5_agg, title = "Comparación de modelos",column.labels=c("Individual","Agregado"), type ='text')
##
## Comparación de modelos
## =====================================================================
## Dependent variable:
## -------------------------------------------------
## mathach
## Individual Agregado
## (1) (2)
## ---------------------------------------------------------------------
## ses 2.364*** 4.204***
## (0.099) (0.418)
##
## female -1.422*** -1.997***
## (0.146) (0.532)
##
## sector 2.255*** 1.635***
## (0.149) (0.302)
##
## minority -3.112*** -2.343***
## (0.170) (0.534)
##
## Constant 13.242*** 13.613***
## (0.134) (0.347)
##
## ---------------------------------------------------------------------
## Observations 7,185 160
## R2 0.197 0.711
## Adjusted R2 0.196 0.703
## Residual Std. Error 6.166 (df = 7180) 1.699 (df = 155)
## F Statistic 440.111*** (df = 4; 7180) 95.124*** (df = 4; 155)
## =====================================================================
## Note: *p<0.1; **p<0.05; ***p<0.01
What happens to the regression with aggregated (school-level) data compared with the regression with individual data?
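A simulation sketch that may help frame the question, using simulated groups and hypothetical names rather than the HSB data. Averaging within groups removes the within-group variation, which tends to raise R². When the group mean of a predictor has its own (contextual) effect, the slope estimated from group means also differs from the individual-level slope.

```r
set.seed(2024)
J   <- 100                            # number of hypothetical groups ("schools")
n_j <- 40                             # individuals per group
g   <- rep(seq_len(J), each = n_j)
x_mean <- rnorm(J)                    # group-level mean of x
x <- x_mean[g] + rnorm(J * n_j)       # individual x = group mean + deviation
# Within-group slope 2, plus a contextual effect of the group mean (3)
y <- 2 * x + 3 * x_mean[g] + rnorm(J * n_j, sd = 5)

ind <- lm(y ~ x)                      # individual-level regression
agg_d <- data.frame(x = tapply(x, g, mean), y = tapply(y, g, mean))
agg <- lm(y ~ x, data = agg_d)        # regression on group means

cat("Individual: slope =", round(coef(ind)[2], 2),
    " R2 =", round(summary(ind)$r.squared, 2), "\n")
cat("Aggregated: slope =", round(coef(agg)[2], 2),
    " R2 =", round(summary(agg)$r.squared, 2), "\n")
```

In this setup the group-mean regression recovers roughly the within slope plus the contextual effect (close to 5), while the individual-level slope mixes the two (about 3.5 in expectation), up to sampling error.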