Chapter 14 Replicating ADH
14.1 Motivation
TBD
14.2 US Aggregate Data
We want to look at some aggregate facts about the US economy and its trade patterns to get a sense of how key macroeconomic indicators evolved around China’s WTO accession. In particular we will show:
- Expansion of Chinese trade: essentially all of US trade growth since the 1990s comes from the expansion of Chinese imports.
- A fall in real interest rates around the time Chinese trade expanded.
- An expansion of the trade deficit during this period.
14.2.1 Load Necessary Packages
Install the ones you do not have yet.
library("fredr")
library("purrr")
library("dplyr")
library("readr")
library("tidyr")
library("ggplot2")
library("magrittr")
library("lubridate")
library("PerformanceAnalytics")
14.2.2 Get Data from Federal Reserve
We need the following variables from FRED:
= c("GDP", "IMP0015", "IMPCH", "EXP0015", "GS1", "CPILFESL") codes
We also need to tell FRED our API key to authenticate ourselves. For this course, you can use the following commands:
api_key = "YOUR KEY"
fredr_set_key(api_key)
- Use the `fredr_series_observations` command on a single variable to get the data for that variable
- Now, find the option to pull only data starting at `1990-01-01`. Make sure you format this number as a date.
- What happens if you use the previous command on `codes` instead of a single variable name?
- Solve it by applying the correct map function which returns a dataframe by row-binding. Save your dataset as `df_raw` (one possible approach is sketched below).
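As a rough sketch of how these steps might fit together (assuming the `codes` vector and API key from above; `observation_start` is the option that restricts the sample to dates from 1990 onward):

library(fredr)
library(purrr)

df_raw <- map_dfr(codes,                        # map over the codes and row-bind the results
                  ~ fredr_series_observations(series_id = .x,
                                              observation_start = as.Date("1990-01-01")))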
14.2.3 Data Transformations
For the rest of this part, take `df_raw` and save the transformed output as `df`:
- Now split your data into columns
- Rename your newly created columns as such: `gdp <- GDP`, `imp_ch <- IMPCH`, `imp_all <- IMP0015`, `exp_all <- EXP0015`, `t_bill <- GS1`, `cpi <- CPILFESL`
- `gdp` is coded in billions of USD while exports and imports are in millions; multiply `gdp` by `1000` to have all values in millions
- Create additional variables for our date using the `lubridate` package. We want columns `year`, `quarter`, `month`, `day` that each contain only that part of the `date` column. Find the correct functions in `lubridate` to achieve this
- Sort your data by `date` (these transformations are sketched below)
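A rough sketch of these transformations, assuming `df_raw` has the usual `fredr` columns (`date`, `series_id`, `value`):

library(dplyr)
library(tidyr)
library(lubridate)

df <- df_raw %>%
  select(date, series_id, value) %>%
  pivot_wider(names_from = series_id, values_from = value) %>%   # one column per FRED series
  rename(gdp = GDP, imp_ch = IMPCH, imp_all = IMP0015,
         exp_all = EXP0015, t_bill = GS1, cpi = CPILFESL) %>%
  mutate(gdp     = gdp * 1000,        # billions -> millions so all value series share units
         year    = year(date),
         quarter = quarter(date),
         month   = month(date),
         day     = day(date)) %>%
  arrange(date)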
We see that gdp is coded quarterly while the imports and exports are monthly. We need the data grouped annually and quarterly, respectively, for the next two parts:
14.2.4 Data Grouping
- Group `df` quarterly into a dataframe called `df_quarter`. In this dataset you want
  - `start_date` as the minimum of `date`
  - `gdp`, `imp_all`, `imp_ch`, `exp_all` all aggregated as sums (how do you deal with NAs?)
  - `cpi`, `t_bill` aggregated as averages
- Group `df` annually into a dataframe called `df_year`. In this dataset you want
  - `gdp`, `imp_all`, `imp_ch` aggregated as sums (how do you deal with NAs?)

Both groupings are sketched below.
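One way these groupings might look, as a sketch; `na.rm = TRUE` is one answer to the NA question, and `exp_all` is also kept in `df_year` because the trade-deficit exercise below needs exports:

library(dplyr)

df_quarter <- df %>%
  group_by(year, quarter) %>%
  summarise(start_date = min(date),
            across(c(gdp, imp_all, imp_ch, exp_all), ~ sum(.x, na.rm = TRUE)),
            across(c(cpi, t_bill), ~ mean(.x, na.rm = TRUE)))   # result stays grouped by year

df_year <- df %>%
  group_by(year) %>%
  summarise(across(c(gdp, imp_all, imp_ch, exp_all), ~ sum(.x, na.rm = TRUE)),
            .groups = "drop")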
14.2.5 Fact 1: Increase of Imports from China and the Rest of the World
For this exercise, we are going to use `df_year`.
- Drop data from 2020
- We want to create the following three variables:
  - `global_share = 100 * imp_all / gdp`
  - `china_share = 100 * imp_ch / gdp`
  - `nonchina_share = global_share - china_share`
- Create the following graph (sketched below):
  - years between `1991` and `2008`
  - x-axis: years
  - y-axis: `china_share` and `nonchina_share` (add two lines to your plot with different colors)
  - a vertical line at `x == 2008`
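One way to build this figure, as a sketch (the colors are just placeholders):

library(dplyr)
library(ggplot2)

df_year %>%
  filter(year >= 1991, year <= 2008) %>%           # also drops 2020
  mutate(global_share   = 100 * imp_all / gdp,
         china_share    = 100 * imp_ch / gdp,
         nonchina_share = global_share - china_share) %>%
  ggplot(aes(x = year)) +
  geom_line(aes(y = china_share), color = "firebrick") +
  geom_line(aes(y = nonchina_share), color = "steelblue") +
  geom_vline(xintercept = 2008, linetype = "dashed") +
  labs(x = "Year", y = "Import share of GDP (%)")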
14.2.6 Fact 2: Increasing Trade Deficit of the US
For this we will work with `df_year` again.
- Drop data from 2020
- We want to create a variable `trade_deficit` (imports minus exports)
- And a variable `trade_deficit_share` for the trade deficit as a share of GDP
- Plot `trade_deficit_share` over time. Pick a nice color
- Add a vertical line to the plot at `year == 2008` (see the sketch after this list)
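A sketch, under the assumption that `exp_all` was carried into `df_year` (as in the grouping sketch above):

library(dplyr)
library(ggplot2)

df_year %>%
  filter(year != 2020) %>%
  mutate(trade_deficit       = imp_all - exp_all,
         trade_deficit_share = 100 * trade_deficit / gdp) %>%
  ggplot(aes(x = year, y = trade_deficit_share)) +
  geom_line(color = "darkorange") +
  geom_vline(xintercept = 2008, linetype = "dashed") +
  labs(x = "Year", y = "Trade deficit (% of GDP)")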
14.2.7 Fact 3: 400 Basis Point Fall in Real Interest Rates Leading into the China Expansion
For this we will work with `df_quarter` again.
- We need to calculate the inflation rate from the consumer price index (`cpi`) column
  - find out how to calculate this
  - look at the first lines of your dataset. What’s the problem?
  - ungroup the dataset first before you repeat the previous step.
  - look at the first lines of your dataset again.
  - you now have the quarterly inflation rate
- Try to apply the `Return.annualized` function from the `PerformanceAnalytics` package to the newly created `infl` column inside `mutate`
  - what’s the problem?
- We will solve this problem in two ways:
  - Use the function inside `map_dbl` instead (not in a `mutate`)
    - assign the output to a new column of `df_quarter` called `annum_return`
  - Use `Vectorize(Return.annualized)` inside `mutate` instead and save the output to `annum_return2`
    - What does the function do?
- Create a new variable called `real_r` as the difference between `t_bill` and one of your `annum_return` columns
- Plot `real_r` against date
- Add a vertical line at the date `2002-01-01`
  - Hint: You have to parse the date first as a `date` and then convert it into a `numerical` value

The overall flow is sketched below.
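A rough sketch of the overall flow. To keep it short, the quarterly rate is annualized by hand with the compounding formula `(1 + r)^4 - 1` instead of reproducing the `map_dbl` / `Vectorize(Return.annualized)` mechanics the exercise asks for, and the `100 *` rescaling is an assumption about units (the one-year Treasury rate is in percent while the compounded return is a decimal):

library(dplyr)
library(ggplot2)

df_quarter <- df_quarter %>%
  ungroup() %>%                                        # lags within groups would reset each year
  arrange(start_date) %>%
  mutate(infl         = cpi / lag(cpi) - 1,            # quarterly inflation rate from the CPI level
         annum_return = (1 + infl)^4 - 1,              # hand-rolled stand-in for Return.annualized
         real_r       = t_bill - 100 * annum_return)   # assumed units: percent minus percent

ggplot(df_quarter, aes(x = start_date, y = real_r)) +
  geom_line() +
  geom_vline(xintercept = as.numeric(as.Date("2002-01-01")),   # parse as a date, then make it numeric
             linetype = "dashed") +
  labs(x = "Date", y = "Real interest rate (%)")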
14.3 Descriptive Stats in ADH using Weighted Statistics
We are now going to work with the ‘micro’ data directly from ADH. Luckily, some of our coding friends at NYU Stern have done a tonne of the heavy lifting for us and merged all of ADH’s essential data together into one file.
The trickiest thing to understand is the timing of the data and the variable names.
Here is some info:
14.3.1 Details about the timing are as follows.
The start of the period is 1991 and the end is 2007. This is divided into two sub-periods. The first period is 1991-2000, so it covers 9 years. ADH convert values to a “comparable decadal scale” (see Footnote 22), so values for this period are multiplied by 10/9. The same issue arises for the second period, 2000-2007, whose values are again converted to decadal scales by multiplying by 10/7.
Appendix Table 2 reports the income variable and the decadal adjustments. In the summary statistics we care about, the ADH data are adjusted in the way described above.
That is, variables starting with ‘l’ are in levels whereas variables starting with ‘d’ are the decadal equivalents.
As necessary, we will tell you which variable to use, so that \(*_somevariable\) means to choose the appropriate level or decadal equivalent. We leave you to figure out which of the \(l\) or \(d\) variables to use. Do ask us if you are confused.
14.3.2 Understanding the Essence of the Paper and What Comes Next.
Read Section 1 of ADH so that you build an understanding of their main measure, ‘IPW’, and what the paper is about. This will help you understand the context behind the remaining exercises in this notebook and those to follow.
Your first task will be to compute some descriptive statistics from the data. To be more precise, you will replicate some of the key numbers in Appendix Table 2 of ADH. (On a side note, at least one of us thinks this table should be in the main text!)
14.3.3 Load Necessary Packages
Install the ones you do not have yet.
library("readr")
library("tibble")
library("dplyr")
library("Hmisc")
14.3.4 Load Data
Like always, we are going to load the data and save it as a tibble
= read_csv("data/adh_data.csv") %>% as_tibble() df
14.3.5 Compute Simple Grouped Mean
- Find which years (`yr`) are reflected in the data.
- Compute the average Chinese imports per worker (`l_tradeusch_pw` & `d_tradeusch_pw`) for each “year”. A minimal sketch follows below.
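A minimal sketch of the grouped means, using the column names given above:

library(dplyr)

df %>% distinct(yr)                             # which years appear in the data?

df %>%
  group_by(yr) %>%
  summarise(mean_l = mean(l_tradeusch_pw, na.rm = TRUE),
            mean_d = mean(d_tradeusch_pw, na.rm = TRUE))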
14.3.6 Compute Weighted Group Means and Standard Deviations
For the rest of the exercise, weight the mean by population count per region instead (`l_popcount`) and compare it with the numbers in the table.
- Repeat step 2 with weights.
- Now also compute the weighted standard deviations for both variables. Hint: Use the `Hmisc` package and find the relevant function.
- Now compute the mean and standard deviation of the average household wage and salary (`l_avg_hhincwage_pc_pw`, `d_avg_hhincwage_pc_pw`)
- And once more for the share not in the labor force (`l_sh_nilf`, `d_sh_nilf`)

The weighted statistics are sketched below.
How well do your numbers line up with those reported in the paper?
14.4 Regression Analysis
Let’s first load the necessary packages to read data and do fancy regressions:
library("readr")
library("tibble")
library("sandwich")
library("lmtest")
And let’s load the data like we always do:
= read_csv("data/adh_data.csv") df
14.4.1 OLS regression
The core of the paper is looking at what happened to laborers when there is an increase in US imports from China. Let’s try and replicate part of Table 9, namely the estimate from panel A column 2.
Their y variable is `relchg_avg_hhincwage_pc_pw`.
The important x variable is the decadal trade between the US and China, `d_tradeusch_pw`.
- Run that simple regression
- Now add heteroskedasticity-robust standard errors (HC1). Hint: Use the `sandwich` and `lmtest` packages
Now we will start to add extra x variables.
- Start by adding `t2`, a dummy variable for whether the observation is in the second decade. Fit again with HC1 robust standard errors (these steps are sketched below).
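A sketch of these OLS steps with HC1 robust standard errors (the object names are placeholders):

library(sandwich)
library(lmtest)

fit1 <- lm(relchg_avg_hhincwage_pc_pw ~ d_tradeusch_pw, data = df)
coeftest(fit1, vcov = vcovHC(fit1, type = "HC1"))     # HC1 heteroskedasticity-robust SEs

fit2 <- lm(relchg_avg_hhincwage_pc_pw ~ d_tradeusch_pw + t2, data = df)
coeftest(fit2, vcov = vcovHC(fit2, type = "HC1"))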
14.4.2 Clustering
Let us now use clustered standard errors instead. ADH cluster by `statefip`.
Hint: use the `felm` function from the `lfe` package.
- Run the basic regression with clustering
- Add the following controls to your last regression: `l_shind_manuf_cbp`, `l_sh_popedu_c`, `l_sh_popfborn`, `l_sh_empl_f`, `l_sh_routine33`, `l_task_outsource`
- Add region fixed effects to your regression.
  - First find all variables in the dataset that start with `reg_`
  - Add these to your last regression

The clustered regression is sketched below.
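A sketch of the clustered specification using `lfe::felm()`, whose formula takes four parts separated by `|` (covariates | fixed effects | IV | cluster):

library(lfe)

fit_cl <- felm(relchg_avg_hhincwage_pc_pw ~ d_tradeusch_pw + t2 | 0 | 0 | statefip,
               data = df)                       # clustered by state
summary(fit_cl)

The controls and the `reg_*` dummies from the bullets above would simply be appended to the first part of the formula.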
14.4.3 Instrumental Variables
- Instrument `d_tradeusch_pw` with `d_tradeotch_pw_lag` in your last regression
- Weight your regression by `timepwt48`

One possible specification is sketched below.
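A sketch of the instrumented, weighted specification in `felm()` syntax; the control set is written out from the previous section, and appending the `reg_*` dummies is left as before:

library(lfe)

fit_iv <- felm(relchg_avg_hhincwage_pc_pw ~ t2 + l_shind_manuf_cbp + l_sh_popedu_c +
                 l_sh_popfborn + l_sh_empl_f + l_sh_routine33 + l_task_outsource |
                 0 |
                 (d_tradeusch_pw ~ d_tradeotch_pw_lag) |     # endogenous variable ~ instrument
                 statefip,                                   # cluster
               data = df, weights = df$timepwt48)
summary(fit_iv)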