mhowe0422/pgh-crime-blotter

 
 

openblotter

Scrapes, stores and displays Pittsburgh Bureau of Police incident data.

Background

Every morning (usually), the Pittsburgh Bureau of Police publishes a PDF of the previous day's incidents and arrests here: http://communitysafety.pittsburghpa.gov/Blotter.aspx. Trouble is, those PDFs are tough to read.

Openblotter scrapes these PDFs, inserts relevant information into a PostgreSQL database and serves it all up on a spiffy map.

Setup

Requirements

  • Apache or other web server
  • Python enabled on that web server
  • A PostgreSQL database

Python dependencies

  • psycopg2
  • pdfminer

Installation

These instructions assume you're using an Apache (httpd) web server.

  1. Download the repository and store it in /var/www/html/blotter.
  2. Run sql/initialize.sql in PostgreSQL to set up incident and incidentdescription tables using the schema shown below.
  3. Add your PostgreSQL login credentials to py/contants.py.
  4. Install the required libraries:
  • sudo pip install psycopg2 (Don't forget psycopg2's dependencies, python-dev and libpq-dev. Check notes here.)
  • sudo pip install pdfminer
  5. Set up a cron job to run py/parser.py at regular intervals. (Example: 00 09,11,13,18 * * * /usr/bin/python /var/www/html/blotter/py/parser.py)
  6. Profit!
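
To make the example cron schedule easier to read, here is a minimal sketch that expands its minute and hour fields into daily run times. The helper name `expand_cron_times` is hypothetical and not part of openblotter, and it only handles literal numbers and comma-separated lists, not the full cron syntax.

```python
def expand_cron_times(expression):
    """Return the daily HH:MM run times for a simple cron expression.

    Only literal numbers and comma-separated lists are supported in the
    minute and hour fields; ranges, steps and '*' in those two fields are not.
    """
    minute_field, hour_field = expression.split()[:2]
    minutes = sorted(int(m) for m in minute_field.split(","))
    hours = sorted(int(h) for h in hour_field.split(","))
    return [f"{h:02d}:{m:02d}" for h in hours for m in minutes]


# The schedule portion of the example cron line above.
SCHEDULE = "00 09,11,13,18 * * *"
print(expand_cron_times(SCHEDULE))
```

With the example schedule, the parser would run four times a day, which leaves room for the blotter being posted late.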

Errors and logs

Openblotter logs misread entries, which are therefore left out of the database, to txt/errors.txt.

Each pre-scraped PDF is stored as pdf/YYYYMMDD.pdf.

Each just-converted text file is stored as txt/YYYYMMDD.txt.
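
The naming convention above can be sketched as a small helper. The function name `blotter_paths` is hypothetical, and this README does not say whether the date in the filename is the incident date or the download date, so the `day` argument here is simply whichever date the file is named for.

```python
from datetime import date
from pathlib import PurePosixPath


def blotter_paths(day):
    """Return the (PDF, text) paths for one blotter, named YYYYMMDD."""
    stamp = day.strftime("%Y%m%d")
    pdf_path = PurePosixPath("pdf") / f"{stamp}.pdf"
    txt_path = PurePosixPath("txt") / f"{stamp}.txt"
    return pdf_path, txt_path


# Arbitrary example date, chosen only to show the zero-padded format.
pdf_path, txt_path = blotter_paths(date(2014, 3, 5))
print(pdf_path, txt_path)
```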

Database schema

Openblotter's schema includes two tables: incident, which stores metadata (time, location, neighborhood) about a given event, and incidentdescription, which lists the various crimes associated with each event.

incident

| Field | Type | Purpose |
| --- | --- | --- |
| incidentid | serial integer | Unique ID associated with each incident |
| incidenttype | character | Type of incident: `Arrest` or `Offense 2.0` |
| incidentnumber | integer | ID assigned to incident by police |
| incidentdate | date | Date of incident |
| incidenttime | time without time zone | Time of incident |
| address | character | Address of incident (as reported by police, not geocoded) |
| neighborhood | character | Neighborhood of incident (as reported by police, not geocoded) |
| lat | numeric | Latitude of incident (geocoded from address and neighborhood) |
| lng | numeric | Longitude of incident (geocoded from address and neighborhood) |
| Zone | character | Police zone responding to incident (not always the same as the zone where the incident took place) |
| age | smallint | Age of suspect (if `incidenttype` is `Arrest`) |
| Gender | character | Gender of suspect (if `incidenttype` is `Arrest`) |
| geom | geometry(Point, 4326) | Geometry of incident, derived from `lat`/`lng` |
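
One detail worth spelling out for the `geom` column: PostGIS points take x (longitude) before y (latitude). The sketch below builds an EWKT string for SRID 4326 from a `lat`/`lng` pair. The helper `point_ewkt` is hypothetical, the coordinates are illustrative only, and this README does not say how openblotter itself populates `geom`; in SQL, `ST_SetSRID(ST_MakePoint(lng, lat), 4326)` would be one way to get the same point.

```python
def point_ewkt(lat, lng):
    """Build an EWKT point in SRID 4326.

    Note the order: longitude (x) comes first, then latitude (y).
    """
    return f"SRID=4326;POINT({lng} {lat})"


# Illustrative coordinates only, not taken from the blotter data.
print(point_ewkt(40.4406, -79.9959))
```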

incidentdescription

| Field | Type | Purpose |
| --- | --- | --- |
| incidentdescriptionid | serial integer | Unique ID associated with each incident charge |
| incidentid | integer | Unique ID associated with each incident; links to `incident` table |
| section | character | Section of this charge's criminal statute |
| description | character | Text description of charge |
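
The one-to-many link between the two tables can be sketched as a join on `incidentid`. To keep the example self-contained, it uses an in-memory SQLite database as a stand-in for PostgreSQL, with only a few of the columns and simplified types; the rows are made-up placeholders and the helper `charges_for` is hypothetical. With psycopg2, the placeholder style would be `%s` rather than `?`.

```python
import sqlite3

# In-memory stand-in for the PostgreSQL database (simplified columns and types).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE incident (
        incidentid INTEGER PRIMARY KEY,
        incidenttype TEXT,
        neighborhood TEXT
    );
    CREATE TABLE incidentdescription (
        incidentdescriptionid INTEGER PRIMARY KEY,
        incidentid INTEGER REFERENCES incident (incidentid),
        section TEXT,
        description TEXT
    );
""")

# Placeholder data: one incident with two associated charges.
conn.execute("INSERT INTO incident VALUES (1, 'Arrest', 'Example Neighborhood')")
conn.executemany(
    "INSERT INTO incidentdescription (incidentid, section, description) "
    "VALUES (?, ?, ?)",
    [(1, "0000", "Example charge A"), (1, "0001", "Example charge B")],
)


def charges_for(conn, incidentid):
    """Return (section, description) rows for every charge on one incident."""
    return conn.execute(
        """
        SELECT d.section, d.description
        FROM incident AS i
        JOIN incidentdescription AS d ON d.incidentid = i.incidentid
        WHERE i.incidentid = ?
        ORDER BY d.incidentdescriptionid
        """,
        (incidentid,),
    ).fetchall()


print(charges_for(conn, 1))
```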
