This repository contains the code that replicates "How Many Jobs Can be Done at Home?" by Simon Mongey, Laura Pilossoph, and Alex Weinberg. The replication package downloads the data available online and reproduces the results in our paper.
The following data on occupation work-from-home ability and physical-proximity may be useful for your research:
We use publicly available data from:
- O*NET, OES, and ATUS files
  - Downloaded in the process of running the replication.
- CPS files:
  - 2019 ASEC is downloaded from IPUMS and can be found here: `clean_CPS2019/input/asec19_raw.dta`
  - The basic monthly CPS is downloaded from IPUMS
    - Sample is February and March from 2010 - 2020
    - We keep a subset of variables (see end of this note)
    - The replication code then loads in the files `unemployment_figs/input/basic_monthly_CPS_raw_YYYY.dta`
- PSID files:
  - We downloaded FAM2017ER from PSID
  - We keep a subset of variables (see end of this note)
  - The replication code then loads in the file `clean_PSID/input/htm_psid2017.dta`
- The Safegraph data we use in the paper is not publicly available.
- Download this repository by clicking the green `Clone or download` button above. Unzip into the directory of your choice.
- From the terminal, navigate to the folder containing this README.
- Type `make` and hit `enter` to execute the code from top to bottom. The `Makefile` will execute all the `stata` and `shell` scripts.
Note:
- After running `make` for the first time, subsequent runs do not execute the whole code again from top to bottom; they rerun only the parts of the code that have changed and the steps that depend on them.
- For example, if after running `make` for the first time you wish to change sample selection in the CPS and PSID from individuals aged 25-65 to individuals aged 20-65, you could change that line of code in `/unemployment_figures/unemp_figs.do`, run `make` again, and only the code from this change onwards would be rerun.
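The incremental behavior described in the note can be pictured as a walk through the task dependency graph. The Python sketch below is only an illustration of that idea, not how `make` is implemented (`make` compares file timestamps against the rules in the `Makefile`). The function name `tasks_to_rerun` and every task name and edge in `toy_graph` are hypothetical and are not read from this repository's `Makefile`.

```python
from collections import deque


def tasks_to_rerun(depends_on, changed):
    """Return the changed tasks plus every task downstream of them."""
    # Invert the graph: for each task, record which tasks consume its output.
    consumers = {}
    for task, inputs in depends_on.items():
        for upstream in inputs:
            consumers.setdefault(upstream, set()).add(task)

    # Breadth-first walk from the changed tasks to everything that depends on them.
    stale = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for downstream in consumers.get(current, ()):
            if downstream not in stale:
                stale.add(downstream)
                queue.append(downstream)
    return stale


# Hypothetical graph: each task maps to the tasks whose output it reads.
toy_graph = {
    "clean_data": [],
    "build_measures": ["clean_data"],
    "make_figures": ["build_measures"],
    "make_tables": ["clean_data"],
}

print(sorted(tasks_to_rerun(toy_graph, ["build_measures"])))
```

In this toy graph, editing `build_measures` marks only that task and `make_figures` for rerunning, while `clean_data` and `make_tables` are left alone.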
The best way to replicate the paper is to run `make`.
It is also possible to replicate the paper by running the scripts found in the folders in the following order:
download_data/src
clean_ONET/src
clean_OES/src
clean_PSID/src
make_LWFH_PP_measures/src
clean_CPS2019/src
main_figs/src
unemployment_figs/src
atus_figs/src
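For readers who prefer not to use `make`, the folder order above could be turned into a run plan along the following lines. This is a minimal Python sketch under several assumptions that the README does not confirm: that scripts end in `.do` or `.sh`, that scripts within one folder can run in alphabetical order, and that Stata is callable in batch mode as `stata -b do <file>` (the executable name differs across installations, hence the `stata_exe` parameter). The helper `plan_commands` is hypothetical and only builds the list of commands; it does not execute anything.

```python
from pathlib import Path

# Folder order copied from the list above.
TASK_ORDER = [
    "download_data/src",
    "clean_ONET/src",
    "clean_OES/src",
    "clean_PSID/src",
    "make_LWFH_PP_measures/src",
    "clean_CPS2019/src",
    "main_figs/src",
    "unemployment_figs/src",
    "atus_figs/src",
]


def plan_commands(repo_root, stata_exe="stata"):
    """Return (working_dir, argv) pairs in the order the folders are listed."""
    plan = []
    for folder in TASK_ORDER:
        src = Path(repo_root) / folder
        if not src.is_dir():
            continue  # skip folders that are absent from this checkout
        # Assumption: scripts inside one folder run in alphabetical order.
        for script in sorted(src.iterdir()):
            if script.suffix == ".do":
                plan.append((src, [stata_exe, "-b", "do", script.name]))
            elif script.suffix == ".sh":
                plan.append((src, ["sh", script.name]))
    return plan


if __name__ == "__main__":
    for workdir, argv in plan_commands("."):
        print(workdir, " ".join(argv))
```

Each pair could then be passed to `subprocess.run(argv, cwd=workdir, check=True)` so that a failing script stops the sequence before downstream folders run.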
The workflow is organized as a series of tasks.
Each task folder contains three folders: `input`, `src`, `output`.
A task's output is used as an input by one or more downstream tasks.
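The task layout described above is easy to check programmatically. The short Python sketch below reports which of the three subfolders are absent from a given task folder; the helper name `missing_subfolders` is hypothetical and is not part of the replication package.

```python
from pathlib import Path

# The three subfolders each task folder is described as containing.
EXPECTED_SUBFOLDERS = ("input", "src", "output")


def missing_subfolders(task_dir):
    """List which of input/src/output are absent from one task folder."""
    task_dir = Path(task_dir)
    return [name for name in EXPECTED_SUBFOLDERS if not (task_dir / name).is_dir()]
```

For example, `missing_subfolders("clean_PSID")` returns an empty list when the task folder is complete.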
- Jonathan Dingel's note on the automated task-based organization of this code.
- The replication package for Dingel and Neiman 2020 has helpful notes on running `make` and `stata`.
Sample is February & March from 2010 - 2020.
Variables
- age
- occ
- educ
- empstat
- compwt
- month
- year
- marst
- earnweek
- uhrswork1
- bpl
- citizen
- race
- sex
- labforce
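The sample rule and variable list above can be expressed as a small filter. The Python sketch below is an illustration only: it assumes each record is a dictionary keyed by the lowercase variable names listed above, and that `month` and `year` are stored as integers with February coded 2 and March coded 3. The names `CPS_VARIABLES`, `in_sample`, and `select_extract` are hypothetical and do not appear in the replication code.

```python
# Variable names copied from the list above.
CPS_VARIABLES = [
    "age", "occ", "educ", "empstat", "compwt", "month", "year", "marst",
    "earnweek", "uhrswork1", "bpl", "citizen", "race", "sex", "labforce",
]


def in_sample(row):
    """True for February or March observations from 2010 through 2020."""
    # Assumption: month is an integer code (2 = February, 3 = March).
    return row["month"] in (2, 3) and 2010 <= row["year"] <= 2020


def select_extract(rows):
    """Keep in-sample rows and drop every variable not listed above."""
    return [
        {name: row[name] for name in CPS_VARIABLES}
        for row in rows
        if in_sample(row)
    ]
```

Records from other months or years are dropped, and any extra columns in the raw extract are discarded.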
From the PSID’s Main Study Family File 2017.
- ER66002
- ER66009
- ER66017
- ER71331
- ER71333
- ER71335
- ER71347
- ER71349
- ER71351
- ER71353
- ER71305
- ER71377
- ER71363
- ER71361
- ER71365
- ER71277
- ER71427
- ER71429
- ER71433
- ER71435
- ER71437
- ER71439
- ER71443
- ER71445
- ER71447
- ER71449
- ER71451
- ER71453
- ER71455
- ER71457
- ER71459
- ER71461
- ER71463
- ER71465
- ER71467
- ER71469
- ER71471
- ER71473
- ER71475
- ER71481
- ER71483
- ER71485
- ER71570
- ER71512
- ER66195
- ER66196
- ER66197