Chapter 17 Replicating ADH

17.1 Aggregate Facts

TBD

17.2 Descriptive Stats

Install the ones you do not have yet.

library("readr")
library("tibble")
library("dplyr")
library("Hmisc")

## Loading required package: lattice

## Loading required package: Formula

## 
## Attaching package: 'Hmisc'

## The following objects are masked from 'package:dplyr':
## 
##     src, summarize

## The following objects are masked from 'package:base':
## 
##     format.pval, units

17.2.1 Load Data

Like always, we are going to load the data and save it as a tibble

df = read_csv("data/adh_data.csv") %>% as_tibble

## Rows: 1444 Columns: 208

## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr   (1): city
## dbl (207): czone, statefip, yr, t2, timepwt48, reg_midatl, reg_encen, reg_wn...

## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

17.2.2 Compute Simple Grouped Mean

Find which years (yr) are reflected in the data.

unique(df$yr)

## [1] 1990 2000

Compute the average number of chinese imports per worker (l_tradeusch_pw & d_tradeusch_pw) for each “year”.

df_yr = group_by(df, yr)

df_yr %>% summarise(l_tradeusch_pw_avg = mean(l_tradeusch_pw),
          d_tradeusch_pw_avg           = mean(d_tradeusch_pw)
          )

## # A tibble: 2 × 3
##      yr l_tradeusch_pw_avg d_tradeusch_pw_avg
##   <dbl>              <dbl>              <dbl>
## 1  1990              0.364               1.18
## 2  2000              1.12                2.64

17.2.3 Computed Weighted Group Means and Standard Deviations

For the rest of the exercise, weight the mean by population count per region instead (l_popcount) and compare it with the numbers in the table.

Repeat step 2 with weights.

df_yr %>% summarise(l_tradeusch_pw = weighted.mean(l_tradeusch_pw),
                    d_tradeusch_pw = weighted.mean(d_tradeusch_pw))

## # A tibble: 2 × 3
##      yr l_tradeusch_pw d_tradeusch_pw
##   <dbl>          <dbl>          <dbl>
## 1  1990          0.364           1.18
## 2  2000          1.12            2.64

Now also compute the weighted standard deviations for both variables. Hint: Use the Hmisc package and find the relevant function.

df_yr %>% summarise(l_tradeusch_pw_avg = weighted.mean(l_tradeusch_pw),
                    d_tradeusch_pw_avg = weighted.mean(d_tradeusch_pw),
                    l_tradeusch_pw_sd  = sqrt(wtd.var(l_tradeusch_pw, l_popcount)),
                    d_tradeusch_pw_sd  = sqrt(wtd.var(d_tradeusch_pw, l_popcount))
                    )

## # A tibble: 2 × 5
##      yr l_tradeusch_pw_avg d_tradeusch_pw_avg l_tradeusch_pw_sd d_tradeusch_pw_…
##   <dbl>              <dbl>              <dbl>             <dbl>            <dbl>
## 1  1990              0.364               1.18             0.325            0.992
## 2  2000              1.12                2.64             0.897            2.01

Now compute the mean and standard deviation of the average household wage and salary (l_avg_hhincwage_pc_pw, d_avg_hhincwage_pc_pw)

df_yr %>% summarise(l_avg_hhincwage_pc_pw_avg = weighted.mean(l_avg_hhincwage_pc_pw),
                    d_avg_hhincwage_pc_pw_avg = weighted.mean(d_avg_hhincwage_pc_pw),
                    l_avg_hhincwage_pc_pw_sd  = sqrt(wtd.var(l_avg_hhincwage_pc_pw, l_popcount)),
                    d_avg_hhincwage_pc_pw_sd  = sqrt(wtd.var(d_avg_hhincwage_pc_pw, l_popcount))
)

## # A tibble: 2 × 5
##      yr l_avg_hhincwage_p… d_avg_hhincwage_… l_avg_hhincwage_… d_avg_hhincwage_…
##   <dbl>              <dbl>             <dbl>             <dbl>             <dbl>
## 1  1990             18093.             3618.             4697.             1568.
## 2  2000             21711.             1077.             5445.             2621.

And once more for share not in labor force (l_sh_nilf, d_sh_nilf)

df_yr %>% summarise(l_sh_nilf_avg = weighted.mean(l_sh_nilf, , l_popcount),
                    d_sh_nilf_avg = weighted.mean(d_sh_nilf, , l_popcount),
                    l_sh_nilf_sd  = sqrt(wtd.var(l_sh_nilf, l_popcount)),
                    d_sh_nilf_sd  = sqrt(wtd.var(d_sh_nilf, l_popcount))
)

## # A tibble: 2 × 5
##      yr l_sh_nilf_avg d_sh_nilf_avg l_sh_nilf_sd d_sh_nilf_sd
##   <dbl>         <dbl>         <dbl>        <dbl>        <dbl>
## 1  1990          26.7        -0.344         4.33         2.55
## 2  2000          26.4        -1.51          4.39         2.56

17.3 Regression

Let’s first load the necessary packages to read data and do fancy regressions:

library("readr")
library("tibble")
library("dplyr")
library("sandwich")
library("lmtest")
library("lfe")

And let’s load the data like we always do:

df = read_csv("data/adh_data.csv")

## Rows: 1444 Columns: 208

## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr   (1): city
## dbl (207): czone, statefip, yr, t2, timepwt48, reg_midatl, reg_encen, reg_wn...

## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

17.3.1 OLS regression

The core of the paper is looking at what happened to laborer’s when theres an increase in us imports from china. Let’s try and replicate part of Table 9 - namely the estimate from panel A column 2.

Their y variable is relchg_avg_hhincwage_pc_pw. The important x variable is decadal trade between the us and china d_tradeusch_pw.

Run that simple regression

lm_1 = lm(relchg_avg_hhincwage_pc_pw ~ d_tradeusch_pw, data = df)
summary(lm_1)

## 
## Call:
## lm(formula = relchg_avg_hhincwage_pc_pw ~ d_tradeusch_pw, data = df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -28.789  -8.411  -0.663   7.715  49.684 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     16.0720     0.3889   41.33   <2e-16 ***
## d_tradeusch_pw  -1.6466     0.1212  -13.59   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 11.89 on 1442 degrees of freedom
## Multiple R-squared:  0.1135, Adjusted R-squared:  0.1129 
## F-statistic: 184.6 on 1 and 1442 DF,  p-value: < 2.2e-16

Now add heteroskedasticity robust standard (HC1). Hint: Use the sandwich and lmtest packages

coeftest(lm_1, vcov = vcovHC(lm_1, type="HC1"))

## 
## t test of coefficients:
## 
##                Estimate Std. Error t value  Pr(>|t|)    
## (Intercept)    16.07198    0.57211 28.0923 < 2.2e-16 ***
## d_tradeusch_pw -1.64663    0.28496 -5.7785 9.219e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Now we will start to add extra x variables.

Start by adding t2 - a dummy variable for whether observation is in the second decade. Fit again with HC1 robust standard errors.

lm_2 = lm(relchg_avg_hhincwage_pc_pw ~ d_tradeusch_pw + t2, data = df)
coeftest(lm_2, vcov = vcovHC(lm_2, type="HC1"))

## 
## t test of coefficients:
## 
##                 Estimate Std. Error  t value  Pr(>|t|)    
## (Intercept)     21.59096    0.39889  54.1271 < 2.2e-16 ***
## d_tradeusch_pw  -0.88316    0.16804  -5.2555 1.698e-07 ***
## t2             -13.94769    0.57869 -24.1023 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

17.3.2 Clustering

Let us now use clustertered standard errors instead. ADH cluster by statefip. Hint: use the felm command from the lfe package

Run the basic regression with clustering

felm_1 = felm(relchg_avg_hhincwage_pc_pw ~ d_tradeusch_pw + t2 | 0 | 0 | statefip, data = df)
summary(felm_1)

## 
## Call:
##    felm(formula = relchg_avg_hhincwage_pc_pw ~ d_tradeusch_pw +      t2 | 0 | 0 | statefip, data = df) 
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -32.707  -6.259  -0.799   5.024  43.128 
## 
## Coefficients:
##                Estimate Cluster s.e. t value Pr(>|t|)    
## (Intercept)     21.5910       0.9841  21.941  < 2e-16 ***
## d_tradeusch_pw  -0.8832       0.2544  -3.472 0.000532 ***
## t2             -13.9477       1.8146  -7.686 2.79e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9.831 on 1441 degrees of freedom
## Multiple R-squared(full model): 0.3944   Adjusted R-squared: 0.3936 
## Multiple R-squared(proj model): 0.3944   Adjusted R-squared: 0.3936 
## F-statistic(full model, *iid*):469.3 on 2 and 1441 DF, p-value: < 2.2e-16 
## F-statistic(proj model):  68.5 on 2 and 47 DF, p-value: 1.179e-14

Add the following controls to your last regression:
- l_shind_manuf_cbp
- l_sh_popedu_c
- l_sh_popfborn
- l_sh_empl_f
- l_sh_routine33
- l_task_outsource

felm_2 = felm(relchg_avg_hhincwage_pc_pw ~ d_tradeusch_pw + t2 +
                l_shind_manuf_cbp + l_sh_popedu_c + l_sh_popfborn +
                l_sh_empl_f + l_sh_routine33 + l_task_outsource
                | 0 | 0 | statefip, data = df)
summary(felm_2)

## 
## Call:
##    felm(formula = relchg_avg_hhincwage_pc_pw ~ d_tradeusch_pw +      t2 + l_shind_manuf_cbp + l_sh_popedu_c + l_sh_popfborn +      l_sh_empl_f + l_sh_routine33 + l_task_outsource | 0 | 0 |      statefip, data = df) 
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -37.329  -5.869  -0.534   5.229  40.819 
## 
## Coefficients:
##                    Estimate Cluster s.e. t value Pr(>|t|)    
## (Intercept)        43.38967      8.70748   4.983 7.02e-07 ***
## d_tradeusch_pw     -0.37033      0.13283  -2.788  0.00537 ** 
## t2                -13.79402      1.76456  -7.817 1.04e-14 ***
## l_shind_manuf_cbp  -0.18211      0.03835  -4.749 2.25e-06 ***
## l_sh_popedu_c      -0.13418      0.05494  -2.442  0.01471 *  
## l_sh_popfborn      -0.13180      0.08678  -1.519  0.12903    
## l_sh_empl_f         0.31270      0.07987   3.915 9.46e-05 ***
## l_sh_routine33     -1.04448      0.23945  -4.362 1.38e-05 ***
## l_task_outsource    4.21857      1.72489   2.446  0.01458 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9.218 on 1435 degrees of freedom
## Multiple R-squared(full model): 0.4699   Adjusted R-squared: 0.4669 
## Multiple R-squared(proj model): 0.4699   Adjusted R-squared: 0.4669 
## F-statistic(full model, *iid*):  159 on 8 and 1435 DF, p-value: < 2.2e-16 
## F-statistic(proj model): 28.75 on 8 and 47 DF, p-value: 1.276e-15

Add region fixed effects to your regression.
- First find all variables in the dataset that start with reg_
- Add these to your last regression

names(select(df, starts_with("reg_")))

## [1] "reg_midatl" "reg_encen"  "reg_wncen"  "reg_satl"   "reg_escen" 
## [6] "reg_wscen"  "reg_mount"  "reg_pacif"

felm_3 = felm(relchg_avg_hhincwage_pc_pw ~ d_tradeusch_pw + t2 +
                l_shind_manuf_cbp + l_sh_popedu_c + l_sh_popfborn +
                l_sh_empl_f + l_sh_routine33 + l_task_outsource +
                reg_midatl + reg_encen + reg_wncen + reg_satl +
                reg_escen + reg_wscen + reg_mount + reg_pacif
              | 0 | 0 | statefip, data = df)
summary(felm_3)

## 
## Call:
##    felm(formula = relchg_avg_hhincwage_pc_pw ~ d_tradeusch_pw +      t2 + l_shind_manuf_cbp + l_sh_popedu_c + l_sh_popfborn +      l_sh_empl_f + l_sh_routine33 + l_task_outsource + reg_midatl +      reg_encen + reg_wncen + reg_satl + reg_escen + reg_wscen +      reg_mount + reg_pacif | 0 | 0 | statefip, data = df) 
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -36.256  -5.408  -0.643   4.993  40.596 
## 
## Coefficients:
##                    Estimate Cluster s.e. t value Pr(>|t|)    
## (Intercept)        37.81934     10.64339   3.553 0.000393 ***
## d_tradeusch_pw     -0.41327      0.12965  -3.188 0.001465 ** 
## t2                -13.67579      1.73663  -7.875 6.70e-15 ***
## l_shind_manuf_cbp  -0.15404      0.03266  -4.716 2.64e-06 ***
## l_sh_popedu_c      -0.09418      0.05679  -1.658 0.097449 .  
## l_sh_popfborn      -0.09372      0.07883  -1.189 0.234674    
## l_sh_empl_f         0.18554      0.09601   1.933 0.053492 .  
## l_sh_routine33     -0.79294      0.26449  -2.998 0.002764 ** 
## l_task_outsource    4.26729      1.87809   2.272 0.023226 *  
## reg_midatl          2.06339      1.97681   1.044 0.296757    
## reg_encen           2.10566      2.13354   0.987 0.323844    
## reg_wncen           6.91844      2.14751   3.222 0.001303 ** 
## reg_satl            0.87864      1.89366   0.464 0.642723    
## reg_escen           4.30400      2.16206   1.991 0.046705 *  
## reg_wscen           5.05509      2.07366   2.438 0.014900 *  
## reg_mount           4.48765      2.04204   2.198 0.028136 *  
## reg_pacif          -0.17986      1.91287  -0.094 0.925100    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8.997 on 1427 degrees of freedom
## Multiple R-squared(full model): 0.4977   Adjusted R-squared: 0.4921 
## Multiple R-squared(proj model): 0.4977   Adjusted R-squared: 0.4921 
## F-statistic(full model, *iid*):88.38 on 16 and 1427 DF, p-value: < 2.2e-16 
## F-statistic(proj model): 21.68 on 16 and 47 DF, p-value: < 2.2e-16

17.3.3 Instrument Variables

Instrument d_tradeusch_pw with d_tradeotch_pw_lag in your last regression

felm_4 = felm(relchg_avg_hhincwage_pc_pw ~ 1 + t2 +
                l_shind_manuf_cbp + l_sh_popedu_c + l_sh_popfborn +
                l_sh_empl_f + l_sh_routine33 + l_task_outsource +
                reg_midatl + reg_encen + reg_wncen + reg_satl +
                reg_escen + reg_wscen + reg_mount + reg_pacif
              | 0 | (d_tradeusch_pw ~ d_tradeotch_pw_lag) | statefip,
              data = df)
summary(felm_4)

## 
## Call:
##    felm(formula = relchg_avg_hhincwage_pc_pw ~ 1 + t2 + l_shind_manuf_cbp +      l_sh_popedu_c + l_sh_popfborn + l_sh_empl_f + l_sh_routine33 +      l_task_outsource + reg_midatl + reg_encen + reg_wncen + reg_satl +      reg_escen + reg_wscen + reg_mount + reg_pacif | 0 | (d_tradeusch_pw ~      d_tradeotch_pw_lag) | statefip, data = df) 
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -36.066  -5.524  -0.555   4.996  42.042 
## 
## Coefficients:
##                        Estimate Cluster s.e. t value Pr(>|t|)    
## (Intercept)            35.87405     10.64255   3.371 0.000769 ***
## t2                    -12.28244      1.97304  -6.225 6.31e-10 ***
## l_shind_manuf_cbp      -0.08113      0.04139  -1.960 0.050177 .  
## l_sh_popedu_c          -0.08755      0.05778  -1.515 0.129955    
## l_sh_popfborn          -0.08900      0.08150  -1.092 0.275021    
## l_sh_empl_f             0.18856      0.09896   1.905 0.056935 .  
## l_sh_routine33         -0.75086      0.26165  -2.870 0.004168 ** 
## l_task_outsource        4.20832      1.86424   2.257 0.024135 *  
## reg_midatl              1.97306      1.93354   1.020 0.307693    
## reg_encen               1.62608      2.22012   0.732 0.464027    
## reg_wncen               6.46027      2.19861   2.938 0.003353 ** 
## reg_satl                0.41119      1.94593   0.211 0.832677    
## reg_escen               5.02788      2.28463   2.201 0.027914 *  
## reg_wscen               4.62044      2.12783   2.171 0.030063 *  
## reg_mount               3.89093      2.09533   1.857 0.063523 .  
## reg_pacif              -0.94648      2.00055  -0.473 0.636206    
## `d_tradeusch_pw(fit)`  -1.27575      0.43200  -2.953 0.003197 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9.186 on 1427 degrees of freedom
## Multiple R-squared(full model): 0.4765   Adjusted R-squared: 0.4706 
## Multiple R-squared(proj model): 0.4765   Adjusted R-squared: 0.4706 
## F-statistic(full model, *iid*):86.45 on 16 and 1427 DF, p-value: < 2.2e-16 
## F-statistic(proj model): 18.74 on 16 and 47 DF, p-value: 3.486e-15 
## F-statistic(endog. vars):8.721 on 1 and 47 DF, p-value: 0.004898

Weight your regression by timepwt48

The felm function is a bit picky on the order of the weights. Let us first try to define weights at the end after the data argument like so:

felm_5 = felm(relchg_avg_hhincwage_pc_pw ~ 1 + t2 +
                l_shind_manuf_cbp + l_sh_popedu_c + l_sh_popfborn +
                l_sh_empl_f + l_sh_routine33 + l_task_outsource +
                reg_midatl + reg_encen + reg_wncen + reg_satl +
                reg_escen + reg_wscen + reg_mount + reg_pacif
              | 0 | (d_tradeusch_pw ~ d_tradeotch_pw_lag) | statefip, 
              data = df,
              weights = timepwt48)

## Error in eval(mf[[wpos]], pf): object 'timepwt48' not found

summary(felm_5)

## Error in h(simpleError(msg, call)): error in evaluating the argument 'object' in selecting a method for function 'summary': object 'felm_5' not found

Felm didn’t find timepwt48 because it only assumes that columns are in df before you define data = df. We can solve this in two ways.

A good rule is to have data = df as the last argument.

felm_5 = felm(relchg_avg_hhincwage_pc_pw ~ 1 + t2 +
                l_shind_manuf_cbp + l_sh_popedu_c + l_sh_popfborn +
                l_sh_empl_f + l_sh_routine33 + l_task_outsource +
                reg_midatl + reg_encen + reg_wncen + reg_satl +
                reg_escen + reg_wscen + reg_mount + reg_pacif
              | 0 | (d_tradeusch_pw ~ d_tradeotch_pw_lag) | statefip, 
              weights = timepwt48,
              data = df)

## Error in eval(mf[[wpos]], pf): object 'timepwt48' not found

summary(felm_5)

## Error in h(simpleError(msg, call)): error in evaluating the argument 'object' in selecting a method for function 'summary': object 'felm_5' not found

Alternatively, you can define weights after data = df, but then you have to define the weights as df$timepwft48 like so:

felm_5 = felm(relchg_avg_hhincwage_pc_pw ~ 1 + t2 +
                l_shind_manuf_cbp + l_sh_popedu_c + l_sh_popfborn +
                l_sh_empl_f + l_sh_routine33 + l_task_outsource +
                reg_midatl + reg_encen + reg_wncen + reg_satl +
                reg_escen + reg_wscen + reg_mount + reg_pacif
              | 0 | (d_tradeusch_pw ~ d_tradeotch_pw_lag) | statefip, 
              data = df,
              weights = df$timepwt48)
summary(felm_5)

## 
## Call:
##    felm(formula = relchg_avg_hhincwage_pc_pw ~ 1 + t2 + l_shind_manuf_cbp +      l_sh_popedu_c + l_sh_popfborn + l_sh_empl_f + l_sh_routine33 +      l_task_outsource + reg_midatl + reg_encen + reg_wncen + reg_satl +      reg_escen + reg_wscen + reg_mount + reg_pacif | 0 | (d_tradeusch_pw ~      d_tradeotch_pw_lag) | statefip, data = df, weights = df$timepwt48) 
## 
## Weighted Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.2404 -0.1084 -0.0033  0.1114  2.8633 
## 
## Coefficients:
##                       Estimate Cluster s.e. t value Pr(>|t|)    
## (Intercept)           60.67937      9.14264   6.637 4.54e-11 ***
## t2                    -9.05462      2.65665  -3.408 0.000672 ***
## l_shind_manuf_cbp      0.06791      0.08505   0.798 0.424746    
## l_sh_popedu_c          0.10292      0.11146   0.923 0.355954    
## l_sh_popfborn          0.07652      0.08050   0.950 0.342020    
## l_sh_empl_f           -0.21015      0.16952  -1.240 0.215297    
## l_sh_routine33        -1.01014      0.22888  -4.413 1.09e-05 ***
## l_task_outsource       5.56661      1.48578   3.747 0.000186 ***
## reg_midatl            -0.56489      1.55722  -0.363 0.716840    
## reg_encen             -2.61723      1.97713  -1.324 0.185797    
## reg_wncen              1.93904      1.63154   1.188 0.234846    
## reg_satl              -2.73867      1.49871  -1.827 0.067855 .  
## reg_escen              0.60288      1.53128   0.394 0.693856    
## reg_wscen             -1.68621      1.73888  -0.970 0.332354    
## reg_mount             -2.36081      1.40184  -1.684 0.092385 .  
## reg_pacif             -6.27918      2.02160  -3.106 0.001933 ** 
## `d_tradeusch_pw(fit)` -2.14156      0.59462  -3.602 0.000327 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2791 on 1427 degrees of freedom
## Multiple R-squared(full model): 0.4278   Adjusted R-squared: 0.4214 
## Multiple R-squared(proj model): 0.4278   Adjusted R-squared: 0.4214 
## F-statistic(full model, *iid*):77.51 on 16 and 1427 DF, p-value: < 2.2e-16 
## F-statistic(proj model): 44.53 on 16 and 47 DF, p-value: < 2.2e-16 
## F-statistic(endog. vars):12.97 on 1 and 47 DF, p-value: 0.0007602

And now we have the numbers reported in Column 2 of Panel A of Table 9 of the paper.