ETL scripts for loading US Congressional data from govtrack.us into Neo4j

gregoryfoster/legis-graph

 
 

Legislative Graph

A set of scripts to easily download and import US legislative data into a Neo4j database. This is a work in progress; please submit an issue for any errors or feature requests.

Data Model

The data model incorporates a small amount of the data available from GovTrack. Please submit an issue to request any changes or updates. We're really interested in how the community might want to use this data, so please let us know!

Also, this file has more detailed information about the data model.

Quickstart

This Cypher script will load data from the 114th Congress. You can use the LazyWebCypher tool with this link.

Load Data

We're currently working to streamline the data loading process, but for now you can follow these steps to load data.

Install requirements

pip3 install -r requirements.txt

Download / update data

Sync a particular congress by its number. For instance, for the 112th Congress, replace <num> with 112.

./sync.sh <num>
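
To sync several congresses in one go, the sync step can be wrapped in a small Python helper. This is only a sketch: it assumes sync.sh sits in the current directory and takes the congress number as its single argument, as shown above. The helper names (build_sync_command, sync_congresses) are hypothetical and not part of this repository.

```python
import subprocess


def build_sync_command(congress, script="./sync.sh"):
    """Return the argv list for syncing one congress, e.g. ['./sync.sh', '112']."""
    number = int(congress)
    if number < 1:
        raise ValueError("congress number must be a positive integer")
    return [script, str(number)]


def sync_congresses(numbers, dry_run=True):
    """Build (and, unless dry_run is set, execute) one sync command per congress."""
    commands = [build_sync_command(n) for n in numbers]
    if not dry_run:
        for cmd in commands:
            # check=True stops at the first failing sync instead of continuing silently
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Dry run: only print the commands that would be executed
    for cmd in sync_congresses([112, 113, 114]):
        print(" ".join(cmd))
```

Pass dry_run=False to actually invoke the script once per congress.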

Parse data

Use the parse scripts to parse the raw data into CSV files that can be easily loaded into Neo4j.

$ python3 parse_legislators.py
...
$ python3 parse_bills.py
...
$ python3 parse_votes.py
...
$ python3 parse_committees.py
...
$ python3 parse_committee_members.py
...

The scripts require Python 3.
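
To illustrate the kind of file the parse step produces, here is a minimal sketch that serializes legislator records to CSV with a header row, which is the shape Neo4j's LOAD CSV WITH HEADERS expects. The column names firstName and lastName are taken from the queries later in this README. The function name legislators_to_csv is hypothetical, and the real parse scripts may emit different or additional columns.

```python
import csv
import io


def legislators_to_csv(legislators, fieldnames=("firstName", "lastName")):
    """Serialize a list of dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(fieldnames), lineterminator="\n")
    writer.writeheader()
    for row in legislators:
        # Missing keys become empty cells; keys outside fieldnames are dropped
        writer.writerow({name: row.get(name, "") for name in fieldnames})
    return buffer.getvalue()


if __name__ == "__main__":
    print(legislators_to_csv([{"firstName": "Steve", "lastName": "Daines"}]), end="")
```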

Insert into Neo4j

See the steps documented here for configuring neo4j-shell and pointing Neo4j to the CSV files generated in the previous step.

$ path/to/neo4j/bin/neo4j-shell < import.cypher

Queries

Once the data is loaded in Neo4j we can use queries written in Cypher to discover interesting things about Congress.

General Queries

Find Legislators (limited to the first 100 results):

MATCH (n:Legislator) RETURN n LIMIT 100

Find Steve Daines:

MATCH (n:Legislator {firstName: "Steve", lastName: "Daines"}) RETURN n

What Bills did Steve Daines sponsor?

MATCH (n:Legislator {firstName: "Steve", lastName: "Daines"})<-[:SPONSORED_BY]-(b:Bill) RETURN b

On which Bills did Steve Daines vote Yea?

MATCH (n:Legislator {firstName: "Steve", lastName: "Daines"})-[v:VOTED_ON]->(b:Bill)
WHERE v.vote = "Yea"
RETURN b
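
Queries like the ones above can also be run from Python. The following is a sketch, not part of this repository: it assumes the official neo4j Python driver is installed, that the server supports $-style query parameters, and that the connection URI and credentials you pass in are valid for your setup. The names SPONSORED_BILLS, sponsored_bills_params and fetch_sponsored_bills are hypothetical.

```python
# Parameterized version of the "sponsored bills" query shown above
SPONSORED_BILLS = (
    "MATCH (n:Legislator {firstName: $first, lastName: $last})"
    "<-[:SPONSORED_BY]-(b:Bill) RETURN b"
)


def sponsored_bills_params(first_name, last_name):
    """Build the parameter map for SPONSORED_BILLS."""
    return {"first": first_name, "last": last_name}


def fetch_sponsored_bills(uri, user, password, first_name, last_name):
    """Run the query against a Neo4j server and return the matching Bill nodes."""
    # Imported lazily so this module still loads when the driver is not installed
    from neo4j import GraphDatabase

    driver = GraphDatabase.driver(uri, auth=(user, password))
    try:
        with driver.session() as session:
            result = session.run(
                SPONSORED_BILLS, sponsored_bills_params(first_name, last_name)
            )
            # Consume the result while the session is still open
            return [record["b"] for record in result]
    finally:
        driver.close()
```

Passing the names as parameters rather than splicing them into the query string avoids quoting problems and lets the server reuse the query plan.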

More advanced queries

Find the number of bills proposed during each congress in the database.

MATCH (c:Congress)<-[:PROPOSED_DURING]-(b:Bill)
RETURN c.number AS congress, count(b) AS numProposed

Find the number of bills enacted in each congress in the database, along with the average number of sponsors per enacted bill in that congress.

MATCH (c:Congress)<-[:PROPOSED_DURING]-(b:Bill)-[:SPONSORED_BY]->(l:Legislator)
WHERE b.enacted = 'True'
WITH c, b, count(l) AS numSponsors
RETURN c.number AS congress, count(b) AS numPassed, avg(numSponsors) AS avgSponsors
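
The query above aggregates in two stages: the WITH clause groups by congress and bill to count sponsors per bill, then the RETURN clause groups by congress to count bills and average those per-bill counts. The same logic can be sketched in plain Python. The rows below are made-up toy data, not real results, and the function name enacted_stats is hypothetical.

```python
from collections import defaultdict


def enacted_stats(rows):
    """rows: iterable of (congress, bill_id, legislator_id), one per sponsorship
    of an enacted bill. Returns {congress: (num_bills, avg_sponsors)}."""
    # Stage 1 (WITH c, b, count(l) AS numSponsors): group by congress and bill
    sponsors_per_bill = defaultdict(int)
    for congress, bill_id, _legislator in rows:
        sponsors_per_bill[(congress, bill_id)] += 1

    # Stage 2 (RETURN ..., count(b), avg(numSponsors)): group by congress only
    counts_by_congress = defaultdict(list)
    for (congress, _bill_id), num_sponsors in sponsors_per_bill.items():
        counts_by_congress[congress].append(num_sponsors)

    return {
        congress: (len(counts), sum(counts) / len(counts))
        for congress, counts in counts_by_congress.items()
    }


if __name__ == "__main__":
    toy_rows = [  # made-up sponsorship rows for illustration only
        (113, "bill-a", "leg-1"),
        (113, "bill-a", "leg-2"),
        (113, "bill-b", "leg-1"),
        (114, "bill-c", "leg-3"),
    ]
    print(enacted_stats(toy_rows))
```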

Articles

Authors

Terms

The software in this repository is provided "AS-IS" without warranties or guarantees of any kind. Data used by this software is provided by GovTrack.us and should be used under the terms specified by GovTrack.us here.
