Scrapes, stores and displays Pittsburgh Bureau of Police incident data.
Openblotter scrapes the Pittsburgh Bureau of Police logs every morning and displays the criminal incidents they report. It strives to provide the best possible analysis of the city's data, but it is limited by the accuracy and completeness of the incidents the police release. Any decision based on this data should be confirmed using additional sources. As the city says, "The City of Pittsburgh has provided this information as a service. The City assumes no responsibility for the use of information posted on this site." Blame Tim Condello, Mark Howe, Andrew McGill, Andy Somerville and Open Pittsburgh for creating this application.
Every morning (usually), the Pittsburgh police department publishes a PDF of the previous day's incidents and arrests here: http://communitysafety.pittsburghpa.gov/Blotter.aspx. Trouble is, those PDFs are tough to read.
Openblotter scrapes these PDFs, inserts relevant information into a PostgreSQL database and serves it all up on a spiffy map.
- Apache or other web server
- Python enabled on that web server
- A PostgreSQL database
These instructions assume you're using an httpd/Apache web server program.
- Download the repository and store it in `/var/www/html`.
- Run `sql/initialize.sql` in PostgreSQL to set up the `incident` and `incidentdescription` tables using the schema shown below.
- Add your PostgreSQL login credentials to `py/contants.py`.
- Install the required libraries:
  - `sudo pip install psycopg2` (Don't forget psycopg2's dependencies, `python-dev` and `libpq-dev`; see the psycopg2 installation notes.)
  - `sudo pip install pdfminer`
- Set up a cron job to run `py/parser.py` at regular intervals. (Example: `00 09,11,13,18 * * * /usr/bin/python /var/www/html/blotter/py/parser.py`)
- Profit!
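
The example crontab entry runs the parser at minute `00` of hours 9, 11, 13 and 18, every day. Running several times a day presumably catches PDFs that are posted late, though that rationale is an assumption. To check what a schedule like this expands to, here is a minimal standalone sketch; `expand_field` and `daily_run_times` are hypothetical helpers, not part of Openblotter, and they only understand `*`, plain numbers and comma-separated lists (no ranges or steps).

```python
from __future__ import annotations

# Hypothetical helpers (not part of Openblotter): expand the minute and hour
# fields of a simple daily crontab line. Only "*", single numbers and
# comma-separated lists are supported; ranges ("1-5") and steps ("*/2") are not.


def expand_field(field: str, low: int, high: int) -> list[int]:
    """Return the sorted values that a simple cron field matches."""
    if field == "*":
        return list(range(low, high + 1))
    values = sorted({int(part) for part in field.split(",")})
    for value in values:
        if not low <= value <= high:
            raise ValueError(f"{value} is outside {low}-{high}")
    return values


def daily_run_times(cron_line: str) -> list[str]:
    """List the HH:MM times at which a daily ('* * *') cron line fires."""
    minute, hour, day, month, weekday = cron_line.split()[:5]
    if (day, month, weekday) != ("*", "*", "*"):
        raise ValueError("this sketch only handles daily schedules")
    minutes = expand_field(minute, 0, 59)
    hours = expand_field(hour, 0, 23)
    return [f"{h:02d}:{m:02d}" for h in hours for m in minutes]


if __name__ == "__main__":
    line = "00 09,11,13,18 * * * /usr/bin/python /var/www/html/blotter/py/parser.py"
    print(daily_run_times(line))  # prints ['09:00', '11:00', '13:00', '18:00']
```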
Openblotter maintains an error log of misread (and therefore excluded) entries at `txt/errors.txt`.
Each downloaded PDF is stored as `pdf/YYYYMMDD.pdf`.
Each text file converted from it is stored as `txt/YYYYMMDD.txt`.
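
A minimal sketch of that file layout, assuming a project root directory and Python 3: `blotter_paths` and `log_misread` are hypothetical names, and the tab-separated `YYYYMMDD<TAB>entry` line format for `txt/errors.txt` is an assumption, not Openblotter's actual log format.

```python
from __future__ import annotations

import datetime
from pathlib import Path

# Hypothetical helpers illustrating the layout described above. The pdf/ and
# txt/ directories and the YYYYMMDD naming come from the README; the function
# names and the tab-separated error-line format are assumptions.


def blotter_paths(root: Path, day: datetime.date) -> tuple[Path, Path]:
    """Return (pdf_path, txt_path) for one day's blotter."""
    stamp = day.strftime("%Y%m%d")  # e.g. 2015-03-07 -> "20150307"
    return root / "pdf" / f"{stamp}.pdf", root / "txt" / f"{stamp}.txt"


def log_misread(root: Path, day: datetime.date, raw_entry: str) -> None:
    """Append a misread entry to txt/errors.txt instead of inserting it."""
    errors = root / "txt" / "errors.txt"
    errors.parent.mkdir(parents=True, exist_ok=True)
    with errors.open("a", encoding="utf-8") as fh:
        fh.write(f"{day:%Y%m%d}\t{raw_entry.strip()}\n")
```

Appending (mode `"a"`) keeps the log cumulative across runs, which matches the idea of a single running error log.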
Openblotter's schema includes two tables: `incident`, which stores metadata (time, location, neighborhood) about a given event, and `incidentdescription`, which lists the various crimes associated with each event.
Field | Type | Purpose |
---|---|---|
incidentid | serial integer | Unique ID associated with each incident |
incidenttype | character | Type of incident: `Arrest` or `Offense 2.0` |
incidentnumber | integer | ID assigned to incident by police |
incidentdate | date | Date of incident |
incidenttime | time without time zone | Time of incident |
address | character | Address of incident (as reported by police, not geocoded) |
neighborhood | character | Neighborhood of incident (as reported by police, not geocoded) |
lat | numeric | Latitude of incident (geocoded from address and neighborhood) |
lng | numeric | Longitude of incident (geocoded from address and neighborhood) |
Zone | character | Police zone that responded to the incident (not always the same as the zone where the incident took place) |
age | smallint | Age of suspect (if `incidenttype` is `Arrest`) |
Gender | character | Gender of suspect (if `incidenttype` is `Arrest`) |
geom | geometry(Point, 4326) | Geometry of incident, derived from `lat`/`lng` |
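
Because `geom` is derived from `lat`/`lng`, the point can be built in the database with PostGIS's `ST_SetSRID(ST_MakePoint(x, y), 4326)`, where `x` is longitude and `y` is latitude; swapping the two is a common mistake. The sketch below is a hypothetical helper (not taken from Openblotter's `parser.py`) that returns a query and parameter tuple in psycopg2's `%s` placeholder style for a subset of the columns above.

```python
from __future__ import annotations

import datetime
from decimal import Decimal

# Hypothetical helper, not part of Openblotter. Column names come from the
# incident table; zone, age and gender are left out to keep the sketch short.
INSERT_INCIDENT = """
INSERT INTO incident
    (incidenttype, incidentnumber, incidentdate, incidenttime,
     address, neighborhood, lat, lng, geom)
VALUES
    (%s, %s, %s, %s, %s, %s, %s, %s,
     ST_SetSRID(ST_MakePoint(%s, %s), 4326))  -- x = longitude, y = latitude
RETURNING incidentid
"""


def incident_insert(incidenttype: str, incidentnumber: int,
                    incidentdate: datetime.date, incidenttime: datetime.time,
                    address: str, neighborhood: str,
                    lat: Decimal, lng: Decimal) -> tuple[str, tuple]:
    """Return (sql, params); geom is computed by PostGIS from lng/lat."""
    params = (incidenttype, incidentnumber, incidentdate, incidenttime,
              address, neighborhood, lat, lng,
              lng, lat)  # ST_MakePoint takes longitude before latitude
    return INSERT_INCIDENT, params
```

With a psycopg2 cursor `cur`, `cur.execute(*incident_insert(...))` followed by `cur.fetchone()` would return the new serial `incidentid`, which is what the matching `incidentdescription` rows need.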

The `incidentdescription` table stores one row per charge:

Field | Type | Purpose |
---|---|---|
incidentdescriptionid | serial integer | Unique ID associated with each incident charge |
incidentid | integer | ID of the incident this charge belongs to; links to the `incident` table |
section | character | Section of the criminal statute under which this charge falls |
description | character | Text description of charge |
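
To illustrate how the two tables relate (one `incident` row, many `incidentdescription` rows joined on `incidentid`), here is a runnable stand-in that uses Python's built-in `sqlite3` instead of PostgreSQL. It is a sketch under loud assumptions: types are approximated, `geom` and most metadata columns are dropped, `charges_by_incident` is a hypothetical name, and the sample rows are placeholders rather than real blotter data.

```python
from __future__ import annotations

import sqlite3

# Stand-in for the PostgreSQL schema above, using sqlite3 so it runs anywhere.
# This is NOT sql/initialize.sql: types are approximated and columns trimmed.
SCHEMA = """
CREATE TABLE incident (
    incidentid     INTEGER PRIMARY KEY,  -- plays the role of the serial ID
    incidenttype   TEXT,
    incidentnumber INTEGER,
    address        TEXT
);
CREATE TABLE incidentdescription (
    incidentdescriptionid INTEGER PRIMARY KEY,
    incidentid  INTEGER REFERENCES incident (incidentid),
    section     TEXT,
    description TEXT
);
"""


def charges_by_incident(conn: sqlite3.Connection) -> dict[int, list[str]]:
    """Map each police incident number to its charge descriptions.

    Uses an inner join, so incidents with no charges are omitted.
    """
    rows = conn.execute(
        """
        SELECT i.incidentnumber, d.description
        FROM incident AS i
        JOIN incidentdescription AS d ON d.incidentid = i.incidentid
        ORDER BY i.incidentnumber, d.incidentdescriptionid
        """
    )
    result: dict[int, list[str]] = {}
    for number, description in rows:
        result.setdefault(number, []).append(description)
    return result


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO incident VALUES (1, 'Arrest', 101, 'Example St')")
    conn.executemany(
        "INSERT INTO incidentdescription (incidentid, section, description) "
        "VALUES (?, ?, ?)",
        [(1, "sec-a", "charge A"), (1, "sec-b", "charge B")],
    )
    print(charges_by_incident(conn))  # prints {101: ['charge A', 'charge B']}
```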