2 Reading the HSB data

  • In Stata format, from a website

“mlm” is the name we will give to the “High School and Beyond” (HSB) data set.
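
The original import call is not shown, so the following is only a minimal sketch of reading a Stata file into a data frame named mlm. It assumes the foreign package; the toy data frame and the temporary file are hypothetical stand-ins for the real HSB file, whose web address is not given here.

```r
# Sketch: read a Stata (.dta) file into a data frame called `mlm`.
library(foreign)  # recommended package shipped with R: read.dta() / write.dta()

# Hypothetical stand-in for the HSB file: a tiny toy data set saved as .dta.
toy <- data.frame(
  schoolid = c(1224, 1224, 1224),
  mathach  = c(5.876, 19.708, 20.349),
  ses      = c(-1.528, -0.588, -0.528)
)
path <- tempfile(fileext = ".dta")
write.dta(toy, path)

# read.dta() takes a file name or a URL as a character string, so for the
# real data `path` would be replaced by the address of the course file.
mlm <- read.dta(path)

dim(mlm)    # rows and columns
names(mlm)  # variable names
```

read.dta() only handles older Stata formats (versions 5 to 12); for files saved by newer Stata versions, haven::read_dta() is a common alternative.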

Variables relevant to the exercises:

  • Level 1:
    • minority, an indicator for student ethnicity (1 = minority, 0 = other)
    • female, an indicator for student gender (1 = female, 0 = male)
    • ses, a standardized scale constructed from variables measuring parental education, occupation, and income
    • mathach, a measure of mathematics achievement
  • Level 2:
    • size (school enrollment)
    • sector (1 = Catholic, 0 = public)
    • pracad (proportion of students in the academic track)
    • disclim (a scale measuring disciplinary climate)
    • himinty (1 = more than 40% minority enrollment, 0 = less than 40%)
    • mnses (mean of the SES values for the students in this school who are included in the level-1 file)
  • Cluster variable: schoolid

3 Exploration and description

## [1] 7185   26
##  [1] "minority" "female"   "ses"      "mathach"  "size"     "sector"  
##  [7] "pracad"   "disclim"  "himinty"  "schoolid" "mean"     "sd"      
## [13] "sdalt"    "junk"     "sdalt2"   "num"      "se"       "sealt"   
## [19] "sealt2"   "t2"       "t2alt"    "pickone"  "mmses"    "mnses"   
## [25] "xb"       "resid"

3.1 Selecting the variables of interest
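
A possible way to keep only the 11 variables of interest and inspect the result is sketched below. The data frame mlm_full and its values are hypothetical (a six-row toy with two auxiliary columns to drop); only the variable names come from the listing above.

```r
# Toy stand-in for the full data frame (hypothetical values).
set.seed(1)
n <- 6
mlm_full <- data.frame(
  minority = rbinom(n, 1, 0.3),
  female   = rbinom(n, 1, 0.5),
  ses      = round(rnorm(n), 3),
  mathach  = round(rnorm(n, 12.7, 6.9), 3),
  size     = 842,
  sector   = 0,
  pracad   = 0.35,
  disclim  = 1.597,
  himinty  = 0,
  schoolid = 1224,
  junk     = runif(n),  # auxiliary column, to be dropped
  mnses    = -0.434,
  xb       = rnorm(n)   # auxiliary column, to be dropped
)

# Keep only the variables used in the exercises, in a fixed order.
keep <- c("minority", "female", "ses", "mathach", "size", "sector",
          "pracad", "disclim", "himinty", "mnses", "schoolid")
mlm <- mlm_full[, keep]

dim(mlm)      # number of rows and of retained columns
head(mlm)     # first rows
summary(mlm)  # quartiles and means per variable
```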

## [1] 7185   11
##   minority female    ses mathach size sector pracad disclim himinty
## 1        0      1 -1.528   5.876  842      0   0.35   1.597       0
## 2        0      1 -0.588  19.708  842      0   0.35   1.597       0
## 3        0      0 -0.528  20.349  842      0   0.35   1.597       0
## 4        0      0 -0.668   8.781  842      0   0.35   1.597       0
## 5        0      0 -0.158  17.898  842      0   0.35   1.597       0
## 6        0      0  0.022   4.583  842      0   0.35   1.597       0
##       mnses schoolid
## 1 -0.434383     1224
## 2 -0.434383     1224
## 3 -0.434383     1224
## 4 -0.434383     1224
## 5 -0.434383     1224
## 6 -0.434383     1224
##     minority          female            ses               mathach      
##  Min.   :0.0000   Min.   :0.0000   Min.   :-3.758000   Min.   :-2.832  
##  1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:-0.538000   1st Qu.: 7.275  
##  Median :0.0000   Median :1.0000   Median : 0.002000   Median :13.131  
##  Mean   :0.2747   Mean   :0.5282   Mean   : 0.000143   Mean   :12.748  
##  3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.: 0.602000   3rd Qu.:18.317  
##  Max.   :1.0000   Max.   :1.0000   Max.   : 2.692000   Max.   :24.993  
##       size          sector           pracad          disclim       
##  Min.   : 100   Min.   :0.0000   Min.   :0.0000   Min.   :-2.4160  
##  1st Qu.: 565   1st Qu.:0.0000   1st Qu.:0.3200   1st Qu.:-0.8170  
##  Median :1016   Median :0.0000   Median :0.5300   Median :-0.2310  
##  Mean   :1057   Mean   :0.4931   Mean   :0.5345   Mean   :-0.1319  
##  3rd Qu.:1436   3rd Qu.:1.0000   3rd Qu.:0.7000   3rd Qu.: 0.4600  
##  Max.   :2713   Max.   :1.0000   Max.   :1.0000   Max.   : 2.7560  
##     himinty         mnses               schoolid   
##  Min.   :0.00   Min.   :-1.1939460   Min.   :1224  
##  1st Qu.:0.00   1st Qu.:-0.3230000   1st Qu.:3020  
##  Median :0.00   Median : 0.0320000   Median :5192  
##  Mean   :0.28   Mean   : 0.0001434   Mean   :5278  
##  3rd Qu.:1.00   3rd Qu.: 0.3269123   3rd Qu.:7342  
##  Max.   :1.00   Max.   : 0.8249825   Max.   :9586

3.2 Descriptive table with stargazer

stargazer has three basic output options: text, html, or latex (the default). To inspect the content directly as plain text in the R console for exploratory purposes, use text. To report later via knitr to HTML, switch to html; to export to PDF, switch to latex. General recommendation: keep text until the final report of results. It makes the output easy to view in the console and, when working with R Markdown, does not require compiling the document to see the result.
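
As a rough illustration of the type argument, the sketch below builds a descriptive table for a small toy data frame (hypothetical values). It assumes the stargazer package is installed; the title string is taken from the output shown in this section.

```r
library(stargazer)  # assumed installed: install.packages("stargazer")

# Toy data frame (hypothetical values) standing in for `mlm`.
toy <- data.frame(
  mathach = c(5.876, 19.708, 20.349, 8.781),
  ses     = c(-1.528, -0.588, -0.528, -0.668)
)

# A data frame passed to stargazer() yields a summary-statistics table.
# type = "text" prints plain text to the console; use "html" or "latex"
# for the final report.
stargazer(toy, type = "text", title = "Descriptivos generales")
```

In an R Markdown chunk, html or latex output usually also needs the chunk option results = 'asis' so that knitr passes the markup through unchanged.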

## 
## Descriptivos generales
## ===================================================================
## Statistic   N     Mean    St. Dev.   Min   Pctl(25) Pctl(75)  Max  
## -------------------------------------------------------------------
## minority  7,185   0.275     0.446     0       0        1       1   
## female    7,185   0.528     0.499     0       0        1       1   
## ses       7,185  0.0001     0.779   -3.758  -0.538   0.602   2.692 
## mathach   7,185  12.748     6.878   -2.832  7.275    18.317  24.993
## size      7,185 1,056.862  604.172   100     565     1,436   2,713 
## sector    7,185   0.493     0.500     0       0        1       1   
## pracad    7,185   0.534     0.251   0.000   0.320    0.700   1.000 
## disclim   7,185  -0.132     0.944   -2.416  -0.817   0.460   2.756 
## himinty   7,185   0.280     0.449     0       0        1       1   
## mnses     7,185  0.0001     0.414   -1.194  -0.323   0.327   0.825 
## schoolid  7,185 5,277.898 2,499.578 1,224   3,020    7,342   9,586 
## -------------------------------------------------------------------
  • and with html…
Descriptivos generales
Statistic  N      Mean       St. Dev.   Min     Pctl(25)  Pctl(75)  Max
minority   7,185  0.275      0.446      0       0         1         1
female     7,185  0.528      0.499      0       0         1         1
ses        7,185  0.0001     0.779      -3.758  -0.538    0.602     2.692
mathach    7,185  12.748     6.878      -2.832  7.275     18.317    24.993
size       7,185  1,056.862  604.172    100     565       1,436     2,713
sector     7,185  0.493      0.500      0       0         1         1
pracad     7,185  0.534      0.251      0.000   0.320     0.700     1.000
disclim    7,185  -0.132     0.944      -2.416  -0.817    0.460     2.756
himinty    7,185  0.280      0.449      0       0         1         1
mnses      7,185  0.0001     0.414      -1.194  -0.323    0.327     0.825
schoolid   7,185  5,277.898  2,499.578  1,224   3,020     7,342     9,586

3.2.1 Missing data: creating a new data set without missing values (listwise deletion)
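
Listwise deletion can be sketched with base R's na.omit(), which drops every row that has at least one missing value. The toy data frame and the name mlm_complete are hypothetical; the original call is not shown.

```r
# Toy data with two incomplete rows (hypothetical values).
toy <- data.frame(
  mathach = c(5.876, NA, 20.349, 8.781),
  ses     = c(-1.528, -0.588, NA, -0.668)
)

colSums(is.na(toy))          # missing values per variable

mlm_complete <- na.omit(toy) # keep only rows with no missing values
dim(mlm_complete)            # rows and columns after deletion
```

In the descriptive table above every variable has N = 7,185, so on the selected HSB variables this step would not be expected to remove any rows.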

3.3 Visual exploration of the data
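
The plots themselves are not reproduced here. One plausible base-R sketch of this step is a numeric summary of mathach followed by a histogram and a scatterplot against ses; the simulated data and its coefficients are purely illustrative assumptions.

```r
# Simulated stand-in for `mlm` (hypothetical data-generating values).
set.seed(123)
toy <- data.frame(ses = rnorm(200))
toy$mathach <- 12.7 + 3 * toy$ses + rnorm(200, sd = 6)

summary(toy$mathach)  # five-number summary plus mean

# Distribution of the outcome.
h <- hist(toy$mathach, main = "Mathematics achievement", xlab = "mathach")

# Bivariate relation with a fitted least-squares line.
plot(toy$ses, toy$mathach, xlab = "ses", ylab = "mathach")
abline(lm(mathach ~ ses, data = toy))
```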

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  -2.832   7.275  13.131  12.748  18.317  24.993

3.3.1 Correlation matrix for a subset of variables
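
A correlation matrix like the one below can be obtained with cor() on the chosen columns and rounded to two decimals. The sketch uses simulated data with hypothetical coefficients, so its numbers will not match the HSB results.

```r
# Simulated stand-in for `mlm` (hypothetical data-generating values).
set.seed(2024)
n <- 300
toy <- data.frame(ses = rnorm(n), sector = rbinom(n, 1, 0.5))
toy$size    <- round(1400 - 600 * toy$sector + rnorm(n, sd = 400))
toy$mathach <- 12 + 2.5 * toy$ses + 2 * toy$sector + rnorm(n, sd = 6)

# Pearson correlations among the selected variables.
vars   <- c("mathach", "ses", "sector", "size")
cormat <- cor(toy[, vars], use = "complete.obs")
round(cormat, 2)
```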

##         mathach   ses sector  size
## mathach    1.00  0.36   0.20 -0.05
## ses        0.36  1.00   0.19 -0.07
## sector     0.20  0.19   1.00 -0.42
## size      -0.05 -0.07  -0.42  1.00

3.4 Estimating regressions
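
The model formula below is the one reported in the Call line of the output; everything else in this sketch (the simulated data, the object name reg1, the generating coefficients) is an assumption made so the example runs on its own.

```r
# Simulated stand-in for `mlm` (hypothetical data-generating values).
set.seed(42)
n <- 500
toy <- data.frame(
  ses      = rnorm(n),
  female   = rbinom(n, 1, 0.5),
  sector   = rbinom(n, 1, 0.5),
  minority = rbinom(n, 1, 0.3)
)
toy$mathach <- 13 + 2.4 * toy$ses - 1.4 * toy$female +
  2.3 * toy$sector - 3.1 * toy$minority + rnorm(n, sd = 6)

# OLS on individual-level data, ignoring the clustering in schools.
reg1 <- lm(mathach ~ ses + female + sector + minority, data = toy)
summary(reg1)

# With stargazer installed, a formatted table could be produced with
# stargazer(reg1, type = "text", title = "Regresión datos individuales").
```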

## 
## Call:
## lm(formula = mathach ~ ses + female + sector + minority, data = mlm)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -20.2286  -4.5076   0.2104   4.7472  17.8078 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 13.24158    0.13386  98.924   <2e-16 ***
## ses          2.36392    0.09946  23.768   <2e-16 ***
## female      -1.42166    0.14608  -9.732   <2e-16 ***
## sector       2.25492    0.14906  15.127   <2e-16 ***
## minority    -3.11239    0.17029 -18.277   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.166 on 7180 degrees of freedom
## Multiple R-squared:  0.1969, Adjusted R-squared:  0.1965 
## F-statistic: 440.1 on 4 and 7180 DF,  p-value: < 2.2e-16
## 
## Regresión datos individuales
## ===============================================
##                         Dependent variable:    
##                     ---------------------------
##                               mathach          
## -----------------------------------------------
## ses                          2.364***          
##                               (0.099)          
##                                                
## female                       -1.422***         
##                               (0.146)          
##                                                
## sector                       2.255***          
##                               (0.149)          
##                                                
## minority                     -3.112***         
##                               (0.170)          
##                                                
## Constant                     13.242***         
##                               (0.134)          
##                                                
## -----------------------------------------------
## Observations                   7,185           
## R2                             0.197           
## Adjusted R2                    0.196           
## Residual Std. Error      6.166 (df = 7180)     
## F Statistic          440.111*** (df = 4; 7180) 
## ===============================================
## Note:               *p<0.1; **p<0.05; ***p<0.01

3.5 Residual diagnostics (library(car))
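
The column names StudRes, Hat and CookD in the output are consistent with car::influencePlot(), although the original call is not shown. The sketch below computes the same three quantities with base R only, on simulated data (hypothetical values), so it does not depend on car.

```r
# Simulated stand-in for `mlm` and the fitted model (hypothetical values).
set.seed(42)
n <- 500
toy <- data.frame(
  ses      = rnorm(n),
  female   = rbinom(n, 1, 0.5),
  sector   = rbinom(n, 1, 0.5),
  minority = rbinom(n, 1, 0.3)
)
toy$mathach <- 13 + 2.4 * toy$ses + rnorm(n, sd = 6)
reg1 <- lm(mathach ~ ses + female + sector + minority, data = toy)

# Per-observation influence measures.
infl <- data.frame(
  StudRes = rstudent(reg1),        # studentized residuals
  Hat     = hatvalues(reg1),       # leverage
  CookD   = cooks.distance(reg1)   # Cook's distance
)

# Observations with the largest Cook's distance.
head(infl[order(-infl$CookD), ])
```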

##         StudRes          Hat        CookD
## 869   2.6047540 0.0013831364 0.0018779316
## 1629 -2.2407411 0.0020743123 0.0020861529
## 3523 -3.2842968 0.0007451964 0.0016066368
## 5321 -1.4359364 0.0035373357 0.0014636979
## 6033 -0.1116449 0.0042083659 0.0000105369
## 7136 -3.2099858 0.0007110209 0.0014644182

4 Aggregated data
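
Aggregating to the school level means replacing each school's students with one row of school means. A minimal sketch with base R's aggregate() follows; the toy data (20 schools of 10 students) and the object name agg are hypothetical.

```r
# Simulated two-level data (hypothetical values).
set.seed(7)
n_schools <- 20
n_per     <- 10
toy <- data.frame(schoolid = rep(1:n_schools, each = n_per))
# sector is a school-level variable: constant within each school.
toy$sector   <- rep(rep(c(0, 1), length.out = n_schools), each = n_per)
toy$ses      <- rnorm(nrow(toy))
toy$female   <- rbinom(nrow(toy), 1, 0.5)
toy$minority <- rbinom(nrow(toy), 1, 0.3)
toy$mathach  <- 13 + 2.4 * toy$ses + rnorm(nrow(toy), sd = 6)

# One row per school: the mean of each variable within schoolid.
agg <- aggregate(cbind(mathach, ses, female, minority, sector) ~ schoolid,
                 data = toy, FUN = mean)
dim(agg)
head(agg)
```

After aggregation, female and minority become school proportions rather than individual indicators, which changes how their coefficients should be read.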

4.1 Comparing regressions
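
To make the comparison concrete, the sketch below fits the same formula to individual-level and school-level versions of a simulated data set and puts the coefficients side by side. All data and generating values are hypothetical; the column labels follow the table in this section.

```r
# Simulated two-level data with a school effect (hypothetical values).
set.seed(7)
n_schools <- 20
n_per     <- 10
toy <- data.frame(schoolid = rep(1:n_schools, each = n_per))
toy$sector   <- rep(rep(c(0, 1), length.out = n_schools), each = n_per)
toy$ses      <- rnorm(nrow(toy))
toy$female   <- rbinom(nrow(toy), 1, 0.5)
toy$minority <- rbinom(nrow(toy), 1, 0.3)
u <- rnorm(n_schools, sd = 2)  # school-specific intercept shift
toy$mathach  <- 13 + 2.4 * toy$ses + u[toy$schoolid] +
  rnorm(nrow(toy), sd = 6)

f <- mathach ~ ses + female + sector + minority

reg_ind <- lm(f, data = toy)                          # students as units
agg     <- aggregate(. ~ schoolid, data = toy, FUN = mean)
reg_agg <- lm(f, data = agg)                          # schools as units

# Coefficients and sample sizes side by side.
round(cbind(Individual = coef(reg_ind), Agregado = coef(reg_agg)), 3)
c(Individual = nobs(reg_ind), Agregado = nobs(reg_agg))
```

With stargazer installed, stargazer(reg_ind, reg_agg, type = "text", column.labels = c("Individual", "Agregado")) would give a two-column table in the same layout.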

## 
## Comparación de modelos
## =====================================================================
##                                    Dependent variable:               
##                     -------------------------------------------------
##                                          mathach                     
##                            Individual                Agregado        
##                                (1)                      (2)          
## ---------------------------------------------------------------------
## ses                         2.364***                 4.204***        
##                              (0.099)                  (0.418)        
##                                                                      
## female                      -1.422***                -1.997***       
##                              (0.146)                  (0.532)        
##                                                                      
## sector                      2.255***                 1.635***        
##                              (0.149)                  (0.302)        
##                                                                      
## minority                    -3.112***                -2.343***       
##                              (0.170)                  (0.534)        
##                                                                      
## Constant                    13.242***                13.613***       
##                              (0.134)                  (0.347)        
##                                                                      
## ---------------------------------------------------------------------
## Observations                  7,185                     160          
## R2                            0.197                    0.711         
## Adjusted R2                   0.196                    0.703         
## Residual Std. Error     6.166 (df = 7180)        1.699 (df = 155)    
## F Statistic         440.111*** (df = 4; 7180) 95.124*** (df = 4; 155)
## =====================================================================
## Note:                                     *p<0.1; **p<0.05; ***p<0.01

What happens in the regression on aggregated data compared with the one on individual data?