Scrapes, stores and displays Pittsburgh Bureau of Police incident data.
Every morning (usually), the Pittsburgh police department publishes a PDF of the previous day's incidents and arrests here: http://communitysafety.pittsburghpa.gov/Blotter.aspx. Trouble is, they're posted in tough-to-read PDFs.
Openblotter scrapes these PDFs, inserts relevant information into a PostgreSQL database and serves it all up on a spiffy map.
- Apache or other web server
- Python enabled on that web server
- A PostgreSQL database
These instructions assume you're using an httpd/Apache web server program.
- Download the repository and store it in `/var/www/html`.
- Run `sql/initialize.sql` in PostgreSQL to set up the `incident` and `incidentdescription` tables using the schema shown below.
- Add your PostgreSQL login credentials to `py/contants.py`.
- Install the required libraries:
  - `sudo pip install psycopg2` (Don't forget psycopg's dependencies, `python-dev` and `libpq-dev`. Check notes here.)
  - `sudo pip install pdfminer`
- Set up a cronjob to run `py/parser.py` at regular intervals. (Example: `00 09,11,13,18 * * * /usr/bin/python /var/www/html/blotter/py/parser.py`)
- Profit!
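As a rough sketch of the credentials step, a constants module might hold the connection settings and hand them to psycopg2. The setting names below (`DB_NAME`, `DB_USER`, `DB_PASSWORD`, `DB_HOST`) and both helper functions are assumptions for illustration, not the actual contents of `py/contants.py`.

```python
# Hypothetical settings; the real names in py/contants.py may differ.
DB_NAME = "blotter"
DB_USER = "postgres"
DB_PASSWORD = "change-me"
DB_HOST = "localhost"


def connection_kwargs():
    """Collect the settings into keyword arguments for psycopg2.connect()."""
    return {
        "dbname": DB_NAME,
        "user": DB_USER,
        "password": DB_PASSWORD,
        "host": DB_HOST,
    }


def connect():
    """Open a PostgreSQL connection (requires psycopg2 to be installed)."""
    import psycopg2  # imported lazily so this module loads without the driver

    return psycopg2.connect(**connection_kwargs())
```

Keeping the credentials in one module means the parser and the web-facing scripts can share them without repeating passwords in several files.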
Openblotter maintains an error log of misread (and therefore excluded) entries at `txt/errors.txt`.
Each pre-scraped PDF is stored as `pdf/YYYYMMDD.pdf`.
Each just-converted text file is stored as `txt/YYYYMMDD.txt`.
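The naming convention above can be sketched in a few lines of Python. The helper name `blotter_paths` is hypothetical, and whether the date in the filename is the incident date or the publication date is not stated here, so treat the choice of `day` as an assumption.

```python
from datetime import date
import os


def blotter_paths(day, base_dir=""):
    """Return (pdf_path, txt_path) for one day's blotter files."""
    stamp = day.strftime("%Y%m%d")  # YYYYMMDD, zero-padded
    pdf_path = os.path.join(base_dir, "pdf", stamp + ".pdf")
    txt_path = os.path.join(base_dir, "txt", stamp + ".txt")
    return pdf_path, txt_path


pdf_path, txt_path = blotter_paths(date(2015, 3, 9))
print(pdf_path)
print(txt_path)
```

Because the stamp is zero-padded and ordered year, month, day, sorting the filenames alphabetically also sorts them chronologically.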
Openblotter's schema includes two tables: `incident`, which stores metadata (time, location, neighborhood) about a given event, and `incidentdescription`, which lists the various crimes associated with each event.

The `incident` table has the following fields:
Field | Type | Purpose |
---|---|---|
incidentid | serial integer | Unique ID associated with each incident |
incidenttype | character | Type of incident: `Arrest` or `Offense 2.0` |
incidentnumber | integer | ID assigned to incident by police |
incidentdate | date | Date of incident |
incidenttime | time without time zone | Time of incident |
address | character | Address of incident (as reported by police, not geocoded) |
neighborhood | character | Neighborhood of incident (as reported by police, not geocoded) |
lat | numeric | Latitude of incident (geocoded from address and neighborhood) |
lng | numeric | Longitude of incident (geocoded from address and neighborhood) |
Zone | character | Police zone responding to incident (not always the same as the zone where the incident took place) |
age | smallint | Age of suspect (if `incidenttype` is `Arrest`) |
Gender | character | Gender of suspect (if `incidenttype` is `Arrest`) |
geom | geometry(Point, 4326) | Geometry of incident, derived from `lat`/`lng` |
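Coordinate order is easy to get wrong when building `geom`: PostGIS points take x (longitude) first, then y (latitude). A minimal sketch, assuming the point is handed to PostgreSQL as an EWKT string; the helper name `to_ewkt` is hypothetical, and the project may derive `geom` differently, for example with `ST_SetSRID(ST_MakePoint(lng, lat), 4326)`.

```python
def to_ewkt(lat, lng, srid=4326):
    """Format a latitude/longitude pair as EWKT, longitude first."""
    return "SRID={0};POINT({1} {2})".format(srid, lng, lat)


# Approximate coordinates for downtown Pittsburgh, used only as an example.
print(to_ewkt(40.4406, -79.9959))
```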

The `incidentdescription` table has the following fields:
Field | Type | Purpose |
---|---|---|
incidentdescriptionid | serial integer | Unique ID associated with each incident charge |
incidentid | integer | Unique ID associated with each incident; links to `incident` table |
section | character | Section of the criminal statute for this charge |
description | character | Text description of charge |
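To show how the two tables relate, here is a small sketch that lists the charges for one incident by joining on `incidentid`. SQLite stands in for PostgreSQL so the example runs without a database server, only a subset of the columns is reproduced, and the rows are invented placeholders rather than real blotter data. The function name `charges_for` is hypothetical.

```python
import sqlite3

# SQLite stands in for PostgreSQL here; the column subset and the
# sample rows are made up purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE incident (
        incidentid INTEGER PRIMARY KEY,
        incidenttype TEXT,
        incidentdate TEXT,
        neighborhood TEXT
    );
    CREATE TABLE incidentdescription (
        incidentdescriptionid INTEGER PRIMARY KEY,
        incidentid INTEGER REFERENCES incident (incidentid),
        section TEXT,
        description TEXT
    );
""")
conn.execute(
    "INSERT INTO incident VALUES (1, 'Arrest', '2015-03-09', 'Example Neighborhood')"
)
conn.executemany(
    "INSERT INTO incidentdescription (incidentid, section, description) "
    "VALUES (?, ?, ?)",
    [(1, "0000", "Example charge A"), (1, "0001", "Example charge B")],
)


def charges_for(conn, incidentid):
    """Return (section, description) pairs for a single incident."""
    rows = conn.execute(
        """
        SELECT d.section, d.description
        FROM incident AS i
        JOIN incidentdescription AS d ON d.incidentid = i.incidentid
        WHERE i.incidentid = ?
        ORDER BY d.incidentdescriptionid
        """,
        (incidentid,),
    )
    return rows.fetchall()


print(charges_for(conn, 1))
```

Against the real PostgreSQL database the same join would run through psycopg2, which uses `%s` placeholders instead of SQLite's `?`.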