Phase One Overview & Terms
The goal of this project is to scrape the previous day’s New Criminal Filings at 6 AM daily and send a .csv of the information and a roundup summary to people at PBF. Every Sunday at 6:15 AM, a second .csv and roundup are emailed as a weekly summary. An example format can be seen at https://www.phillybailfund.org/weekly-bail-reports
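The schedule above can be sketched as date arithmetic. This is a minimal sketch, not the project's implementation: the function names are hypothetical, and the assumption that the weekly summary covers the seven days ending the day before the Sunday run is ours, since the source does not define the weekly window.

```python
from datetime import date, timedelta
from typing import Tuple


def daily_target(run_date: date) -> date:
    """Return the filing date covered by the daily run (the previous day)."""
    return run_date - timedelta(days=1)


def is_weekly_run(run_date: date) -> bool:
    """Return True on Sundays, when the weekly summary is also sent."""
    return run_date.weekday() == 6  # date.weekday(): Monday is 0, Sunday is 6


def weekly_window(run_date: date) -> Tuple[date, date]:
    """Return (start, end) of the weekly summary, inclusive.

    Assumption: the window is the seven days ending the day before the run.
    """
    end = run_date - timedelta(days=1)
    start = end - timedelta(days=6)
    return start, end
```

The actual 6 AM and 6:15 AM triggers would come from whatever scheduler the project uses; the functions only decide which dates a given run should cover.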
The goal of this project is to programmatically download dockets, extract data from them, and enter that data into the database. The technological problem to solve is that dockets cannot easily be downloaded programmatically. Some people have worked around this using a headless/embedded browser. The data we need from the dockets is noted in the Data Dictionary on the project’s GitHub repository.
Once we can download dockets, there will be a one-time need to download and parse everything going back to January 1, 2020. There is also an ongoing need to scrape the previous day’s dockets, which will get correlated with New Criminal Filing data and entered into the database.
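The one-time backfill amounts to walking every calendar day from January 1, 2020 up to a chosen end date. A minimal sketch, assuming the backfill is driven one filing date at a time (the names `BACKFILL_START` and `backfill_dates` are hypothetical):

```python
from datetime import date, timedelta
from typing import Iterator

# Start of the one-time backfill, as stated in the overview.
BACKFILL_START = date(2020, 1, 1)


def backfill_dates(end: date, start: date = BACKFILL_START) -> Iterator[date]:
    """Yield every date from start through end, inclusive, in order."""
    current = start
    while current <= end:
        yield current
        current += timedelta(days=1)
```

Once the backfill has caught up, the ongoing daily scrape only needs the single previous day.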
The database/data lake is the destination for the New Criminal Filing, Docket, and Court Summary data. Each row corresponds to one docket number. The Data Dictionary notes which data needs to end up in the database. The database is the backend to the dashboard.
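One way to picture "one row per docket number" is a table keyed on the docket number, where data from different sources fills in different columns of the same row. The sketch below uses SQLite purely for illustration; the table name, the column names, and the helper functions are hypothetical, and the real columns are the ones listed in the Data Dictionary.

```python
import sqlite3

# Hypothetical columns; the Data Dictionary defines the real ones.
ALLOWED_COLUMNS = {"defendant_name", "main_charge", "magistrate"}


def create_schema(conn: sqlite3.Connection) -> None:
    """Create a table with exactly one row per docket number."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS cases (
               docket_number  TEXT PRIMARY KEY,
               defendant_name TEXT,
               main_charge    TEXT,
               magistrate     TEXT
           )"""
    )


def upsert_case(conn: sqlite3.Connection, docket_number: str, **fields: str) -> None:
    """Insert the docket row if missing, then update only the given columns."""
    unknown = set(fields) - ALLOWED_COLUMNS
    if unknown:
        raise ValueError(f"unknown columns: {sorted(unknown)}")
    conn.execute(
        "INSERT OR IGNORE INTO cases (docket_number) VALUES (?)", (docket_number,)
    )
    for column, value in fields.items():
        # Column names are checked against ALLOWED_COLUMNS above, so
        # interpolating them into the statement is safe here.
        conn.execute(
            f"UPDATE cases SET {column} = ? WHERE docket_number = ?",
            (value, docket_number),
        )
```

With this shape, the New Criminal Filing scrape can write the name and charge first, and the docket parser can later add the magistrate to the same row without overwriting the earlier values.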
The end goal of the data collection/storage is to drive an interactive dashboard embedded on the Philly Bail Fund Squarespace site. In general it should be something like https://data.philadao.com/Bail_Report.html, but the data team has a lot of leeway in terms of telling a story with the data. PBF is especially interested in examining whether some magistrates set higher bail when other factors are held constant. This requires the data from the Dockets.
New Criminal Filings: Every day, all new criminal cases filed in the city are posted to this website: https://www.courts.phila.gov/NewCriminalFilings/date/. These contain general information about the case, like the person’s name, the main charge, whether bail was set, and the docket number.
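As an illustration of the fields named above and of the .csv that gets emailed, here is a minimal sketch of a filing record and a CSV serializer. The class name, field names, and `filings_to_csv` are hypothetical; the authoritative field list is the Data Dictionary, and scraping the site itself is not shown.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import Iterable


@dataclass
class NewCriminalFiling:
    """The general case information listed for a New Criminal Filing."""
    docket_number: str
    defendant_name: str
    main_charge: str
    bail_set: bool


def filings_to_csv(filings: Iterable[NewCriminalFiling]) -> str:
    """Serialize filings to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=[f.name for f in fields(NewCriminalFiling)],
        lineterminator="\n",
    )
    writer.writeheader()
    for filing in filings:
        writer.writerow(asdict(filing))
    return buffer.getvalue()
```

The returned text can be written to a file or attached to the daily email as the .csv.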
Docket: Dockets are PDFs containing detailed information about a given case. They are publicly available at this website: https://ujsportal.pacourts.us/DocketSheets/MC.aspx. Dockets contain information that the New Criminal Filings website does not, like the name of the magistrate who set the bail and the defendant’s race. A docket is updated as the case progresses to reflect the next court hearing, as well as the case outcome when it concludes. The URL for each docket contains a unique hash, so dockets are difficult to scrape.
Court Summary: Court Summaries are also available from the docket website. They contain a defendant’s entire PA court history, so they’ll have information about a defendant’s current case as well as any other ongoing or concluded ones. They also have unique hashed URLs.