Skip to content

Federal Election Commision campaign finance data: Pentaho ETL Job and Transformations

License

Notifications You must be signed in to change notification settings

MaxterBattle/fec-etl

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Introduction

The United States Federal Election Commission (FEC) releases data on contributions to federal election campaigns and committees (including political action committees, party committees and campaign committees). I noticed the bulk downloads were not user-friendly for those who want to quickly pop it into a database...

  • confusing filenames
  • pipe-delimited data
  • comma-delimited headers (in separate files)
  • no quoting or escapes
  • inconsistent date strings
  • orphan child records
  • fields with opaque codes & no lookup tables

I used free software to extract, transform and load (ETL) the bulk FEC data into a database. This repository contains transformations, SQL to create tables and keys, CSV files for lookup tables and sample SQL queries.

I could more easily have written some command-line scripts, but I thought this would be a good learning experience and would allow non-programmers to import the bulk data on their own.

This repo is a respectful gift to anyone who might want to quickly pop FEC data into a database and do their own queries. I'd like to use SSIS to pull it into SQL Server soon, but that may be a separate repo/project.

Contributions/improvements welcome.

Basic requirements

You will need...

  • Pentaho Data Integration (Community Edition is fine)
  • MySQL (or another relational database)
  • Java Virtual Machine (to run Pentaho Data Integration)
  • Files from FEC

Files in this repo

  • Pentaho Data Integration (Kettle) Jobs and transformations
    • .KTR & .KJB
  • Lookup table CSV files (not provided by FEC)
    • /txt-import/lookup/incumbent.csv
    • /txt-import/lookup/party.csv

Setup

Directory structure

The .KTR files (Pentaho transformations) use a relative file path to find the text imports, so leave them in the root directory.

sql/ - SQL to create and modify tables.

txt-import/ - place unzipped files from the FEC here.

txt-import/lookup/ - contains CSV files to enhance FEC data

Pentaho Data Integration

Pentaho Data Integration (PDI) is an ETL tool from Hitachi Vantara. It's available in a free Community Edition and a commercial Enterprise Edition and runs on multiple operating systems, including Windows, Mac, Linux. You can download PDI from...

Java

You'll need the Java Virtual Machine to run Pentaho. There is a version included in the Pentaho installer, or you can use your own.

MySQL (or other database)

I created the Pentaho transformations using MySQL 8.0 on MacOS. Any of these transformations can be configured to use different databases. A couple use MySQL bulk loader, but even that would be easy enough to switch out.

MySQL installation instructions are beyond the scope of this document. Once you have MySQL (or your database of choice) up and running, do the following...

  • Create a database schema named: campaign_finance
  • Create the user: fec_data
  • Set fec_data's password to 'watchThem$L1ke@Hawk' (or edit the MySQL shared connection in pentaho-kettle-ETL-job.kjb to use any password you want).
  • Give user fec_data appropriate permissions in the schema.
    • SELECT
    • INSERT
    • UPDATE
    • DELETE
    • LOCK TABLES
    • SHOW VIEW
    • DROP
  • Some of the SQL scripts need further DDL privileges. Give these to fec_data or use root.
    • CREATE
    • ALTER
    • REFERENCES
    • INDEX
  • Use the file /sql/ddl-create.sql to create your tables files to create tables
  • If you're brave, you can create indexes and foreign keys before importing data. If you prefer less hassle, you can try to create them afterwards. Use these files when you're ready:
    • sql/foreign-keys.sql
    • sql/indexes.sql

FEC files

All files are available for downloaded from the FEC: https://www.fec.gov/data/browse-data/?tab=bulk-data. There are many years available as zip files grouped by 2-year federal election cycle.

Download and extract the files for the time period of interest. Place the extracted .TXT files in txt-import/

When I have time, I'd like to add unzipping to the Pentaho job.

Files and Entities - Details

From the FEC

After unzipping, there are 10 FEC files, each of which maps to a database table.

To run the transformations, place unzipped files in the txt-import/ directory.

File [Table]

  • ccl.txt [candidate_committee]
  • cm.txt [committee_master]
  • weblYY.txt [fed_current_campaign_summary]
  • cn.txt [candidate_master]
  • itcont.txt [contrib_indiv_2_cmte]
  • itoth.txt [trans_among_cmte]
  • itpas2.txt [contrib_cmte_2_cand]
  • oppexp.txt [operating_expenditure]
  • weballYY.txt [candidate_all ]
  • webkYY.txt [pac_party_summary]

Some of these files will overwrite older data... primarily the summaries. Others will not. Details are in sql/indexes.sql. I may modify the tables and imports to keep all data, but for my purposes, it wasn't necessary.

REPLACE (overwrite) records with duplicate keys...

  • candidate_master
  • candidate_all
  • committee_master
  • pac_party_summary

Project Files

These CSV files contain the lookup tables.

  • txt-import/lookup/party.csv [party]
  • txt-import/lookup/incumbent.csv [ici_status]

To Do

  • Load MySQL user credentials from config file.
  • Add unzipping to the PDI job.

Descriptions of zip file downloads from Federal Elections Commission

Descriptions here are directly from the FEC. The US federal government can't copyright anything, so I reproduced them here.

web-allYY.zip

cnYY.zip

cclYY.zip

weblYY.zip

cmYY.zip

  • table: committee_master
  • https://www.fec.gov/campaign-finance-data/committee-master-file-description/
  • The committee master file contains one record for each committee registered with the Federal Election Commission. This includes federal political action committees and party committees, campaign committees for presidential, house and senate candidates, as well as groups or organizations who are spending money for or against candidates for federal office.

webkYY.zip

indivYY.zip

pas2YY.zip

othYY.zip

  • table: trans_among_cmte
  • https://www.fec.gov/campaign-finance-data/any-transaction-one-committee-another-file-description/
  • The itemized records (miscellaneous transactions) file contains all transactions (contributions, transfers, etc. among federal committees). It contains all data in the itemized committee contributions file plus PAC contributions to party committees, party transfers from state committee to state committee, and party transfers from national committee to state committee. This file only includes federal transfers not soft money transactions.

oppexpYY.zip

About

Federal Election Commision campaign finance data: Pentaho ETL Job and Transformations

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • TSQL 100.0%