In this repo, we merge the project-level ARPA data that the Treasury Department has released and extract projects whose descriptions mention the criminal justice system.
The Treasury Department requires large cities to report their ARPA spending every quarter, while smaller jurisdictions can report once a year. You can find the Excel spreadsheets on the Treasury's website, under the "public reporting" section.
There are two notebooks that are currently relevant.
merge_data.ipynb merges project-level data from the different data releases.
project-level-analysis.ipynb reads in the merged data and runs a text analysis on the project descriptions, extracting projects that contain keywords about the criminal justice system. You can check classfications for the keywords we're extracting, or edit them directly in the notebook.
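As a rough illustration of what the two notebooks do, here is a minimal sketch, not the notebooks' actual code. The column names (project_id, description), the keyword list, and both helper functions are hypothetical and chosen only for this example. It also assumes a later release supersedes an earlier one for the same project. Check the notebooks for the real logic.

```python
import re

import pandas as pd

# Hypothetical keywords; the real list lives in the repo, not here.
KEYWORDS = ["police", "jail", "court", "prison"]


def merge_releases(frames):
    """Stack several data releases and keep the latest row per project.

    Assumes `frames` is ordered oldest to newest and that each frame has
    a `project_id` column (a made-up name for this sketch).
    """
    combined = pd.concat(frames, ignore_index=True)
    # keep="last" retains the row from the most recent release.
    deduped = combined.drop_duplicates(subset="project_id", keep="last")
    return deduped.reset_index(drop=True)


def flag_criminal_justice(df, keywords=KEYWORDS):
    """Return rows whose description contains any keyword as a whole word."""
    pattern = r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b"
    mask = df["description"].str.contains(pattern, case=False, na=False, regex=True)
    return df[mask]


# Two toy "releases" standing in for the Treasury spreadsheets.
q1 = pd.DataFrame({
    "project_id": [1, 2],
    "description": ["Police overtime pay", "Road repair"],
})
q2 = pd.DataFrame({
    "project_id": [2, 3],
    "description": ["Road and bridge repair", "County jail HVAC upgrade"],
})

merged = merge_releases([q1, q2])
matches = flag_criminal_justice(merged)
print(matches["project_id"].tolist())
```

Matching on whole words avoids some false positives (for example, "court" inside "courtyard"), but simple keyword matching will still need manual review.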
In addition, the legacy_notebooks directory has a number of notebooks that we used for previous analyses, including scrapers and a text analysis of the interim report.
Requirements: Python (3.6+) and pandas.
The data files are stored with git-lfs, so you might need to install it before cloning or pulling the repo.