This project visualizes the interaction between unemployment and average travel distance by incorporating spatial patterns. The visualization is served by a Dash web application that displays a map, project details, and data sources.
Before running the project, make sure you have Python 3.9+ installed and set up a virtual environment:
git clone https://github.com/ouslan/mov
cd mov
Create a .env file in the root directory with the following content:
CENSUS_API_KEY=YOUR_API_KEY
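As a quick sanity check before a long run, the key can be read back from the file in Python. This is a minimal sketch using only the standard library; the helper names `read_env_file` and `get_census_api_key` are hypothetical and not part of the project, and how the project itself loads the .env file is not shown here.

```python
from pathlib import Path


def read_env_file(path=".env"):
    """Parse simple KEY=VALUE lines from a .env file into a dict."""
    values = {}
    for raw_line in Path(path).read_text().splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything without an equals sign.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_census_api_key(path=".env"):
    """Return CENSUS_API_KEY, failing early if it is missing or still the placeholder."""
    key = read_env_file(path).get("CENSUS_API_KEY", "")
    if not key or key == "YOUR_API_KEY":
        raise RuntimeError("Set CENSUS_API_KEY in the .env file before running the project.")
    return key
```

Calling `get_census_api_key()` from the project root raises a `RuntimeError` if the placeholder value was never replaced, which is cheaper to discover up front than partway through a data download.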
You can create a new Conda environment using the provided environment.yml file:
conda env create -f environment.yml
Alternatively, you can install the required libraries with pip:
pip install -r requirements.txt
Important
This project uses polars as one of its dependencies, which requires Rust to be installed on your system. You can install Rust from the official Rust website.
To run the Dash application, use the following command:
docker-compose up
python main.py
Important
Note that this replication takes a while to run. On a high-end computer with 68GB of RAM and 12 threads, it takes around 22 hours to download the data and run the project.
Warning
The data used for this project is very large, so be sure to have enough disk space to download it. It requires at least 120GB of free space.
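The free-space requirement can be checked before starting the download. The sketch below uses `shutil.disk_usage` from the standard library; the helper name `has_enough_space` is hypothetical, and the 120GB default is taken from the warning above, interpreted here as 120 × 10^9 bytes.

```python
import shutil


def has_enough_space(path=".", required_gb=120):
    """Return True if the filesystem holding `path` has at least `required_gb` GB free."""
    free_bytes = shutil.disk_usage(path).free
    # 1 GB is treated as 10**9 bytes here; use 2**30 if you prefer GiB.
    return free_bytes >= required_gb * 10**9
```

Running `has_enough_space()` from the directory where the data will be stored gives a yes/no answer before committing to a run of many hours.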
Caution
Because the road shapefiles are recalculated from counties to PUMAs, and this is done in parallel to save time, the project may be very resource intensive to run. You can use the following package to generate all the data of the project:
You can also run the website locally using Docker. To build and start the Docker containers, run:
docker-compose up --build
This will host the Dash application at http://localhost:7050 and the documentation at http://localhost:8005.
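Once the containers are up, one way to confirm that both services are listening is a plain TCP check from Python. This sketch is not part of the project: the helper name `is_port_open` is hypothetical, and the ports are the ones stated above (7050 for the Dash application, 8005 for the documentation).

```python
import socket


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolved hosts.
        return False


# Example usage (uncomment after running docker-compose):
# for name, port in {"dash": 7050, "docs": 8005}.items():
#     print(name, "up" if is_port_open("localhost", port) else "down")
```

A successful connection only shows that something is listening on the port; opening the URL in a browser is still the real test that the application loaded.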
- app.py: Main application file that sets up the Dash app, defines the layout, callbacks, and runs the server.
- src/data/data_pull.py: Contains the DataPull class that handles data retrieval.
- src/data/data_process.py: Contains the DataClean class that handles data loading, cleaning, and processing.
- src/graphs/data_graph.py: Contains the DataGraph class for processing and visualizing data.
The data for this project comes from several sources:
- TIGER2019: Shapes for the census PUMAs and for states, as well as historical roads.
- Public Use Microdata Areas (PUMAs): Contains most control variables.
The project uses two regression models:
- OLS Regression: Estimates coefficients of the MOVS dataset to move data from state level to county level.
- Panel Spatial Regression with Fixed Effects: Incorporates spatial interaction between neighboring counties. The model used is: