A workspace to run some data manipulations.
Back up the entire database of PolicyFlow v2.0.0.
Dump: ./backup/diffusion2017vis_v2.sql
Update the policy cluster model using Google word2vec; the pre-trained model can be downloaded here.
Script:
- update policy: ./scripts/migrate_0912_update_policy_cluster.sql
- update policy_similarity: ./scripts/migrate_0912_policy_similarity.sql
- update policy_text: ./scripts/migrate_0912_update_policy_text.sql
Dump: ./backup/diffusion2017vis_20170912.sql
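The migration scripts themselves are not reproduced in these notes, so the following is only a minimal sketch of one common way to score policy similarity with word embeddings: average the word vectors of each policy text and compare the averages by cosine similarity. The function names, the toy two-dimensional vectors, and the averaging strategy are all assumptions made for illustration; they are not taken from the actual scripts, and real word2vec vectors have far more dimensions.

```python
import math


def mean_vector(tokens, embeddings):
    """Average the vectors of all tokens that have an embedding (hypothetical helper)."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return None  # no known words, so no vector can be built
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy embeddings, invented for this example only.
toy_embeddings = {
    "tax": [1.0, 0.0],
    "credit": [0.0, 1.0],
    "income": [1.0, 0.0],
}

vec_a = mean_vector(["tax", "credit"], toy_embeddings)
vec_b = mean_vector(["income"], toy_embeddings)
similarity = cosine_similarity(vec_a, vec_b)
```

In a sketch like this, one row of policy_similarity would hold a pair of policies together with their score; whether the actual table is laid out that way is not stated here.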
backup.
Dump: ./backup/diffusion2017vis_20170829.sql
Create the policy_similarity table.
Script:
- ddl: ./scripts/migrate_0825_add_policy_similarity_table.sql
Update improved policy LDA clusters: add policy_lda_3.
Script:
- update policy: ./scripts/migrate_0817_update_policy_ldas.sql
Update policy LDA clusters: 6 clusters for the first layer, 10 clusters for the second layer. Terms for each cluster are listed in ./raw/lda_term/0/lda_term_*.txt.
Script:
- update policy: ./scripts/migrate_0812_update_policy_ldas.sql
- computation: ./scripts/textLDA.py
Update major topics and add LDA labels to policy.
All 773 policies have been updated against the newest data. According to the output log, 464 policies received a new major topic and the rest are labeled 98: Unknown. Of these Unknowns, 290 policies appear in the new data set but carry no major topic there, and 19 policies come from the old data set and have no new major topic assigned.
Policies from {Agriculture(2), Defense(1), Foreign Trade(1), Immigration(2), Public Lands(2), Technology(1)} are removed because there are too few of them to compute a network.
Script:
- update schema: ./scripts/migrate_0807_add_lda_n_update_subject.sql
- update subject id: ./scripts/migrate.py -o u
Dump: ./backup/diffusion2017vis_20170807.sql
Topics in the new data are as follows.
>>> df.majortopic.unique().describe()
counts freqs
categories
Macroeconomics 1 0.05
Civil Rights 1 0.05
Health 1 0.05
Agriculture 1 0.05
Labor 1 0.05
Education 1 0.05
Environment 1 0.05
Energy 1 0.05
Immigration 1 0.05
Transportation 1 0.05
Law and Crime 1 0.05
Social Welfare 1 0.05
Housing 1 0.05
Domestic Commerce 1 0.05
Defense 1 0.05
Technology 1 0.05
Foreign Trade 1 0.05
Government Operations 1 0.05
Public Lands 1 0.05
NaN 1 0.05
code book:
{
"Macroeconomics": { "id": 1, "valid": 1 },
"Civil Rights": { "id": 2, "valid": 1 },
"Health": { "id": 3, "valid": 1 },
"Agriculture": { "id": 4, "valid": 1 },
"Labor": { "id": 5, "valid": 1 },
"Education": { "id": 6, "valid": 1 },
"Environment": { "id": 7, "valid": 1 },
"Energy": { "id": 8, "valid": 1 },
"Immigration": { "id": 9, "valid": 1 },
"Transportation": { "id": 10, "valid": 1 },
"Law and Crime": { "id": 12, "valid": 1 },
"Social Welfare": { "id": 13, "valid": 1 },
"Housing": { "id": 14, "valid": 1 },
"Domestic Commerce": { "id": 15, "valid": 1 },
"Defense": { "id": 16, "valid": 1 },
"Technology": { "id": 17, "valid": 1 },
"Foreign Trade": { "id": 18, "valid": 1 },
"International Affairs": { "id": 19, "valid": 1 },
"Government Operations": { "id": 20, "valid": 1 },
"Public Lands": { "id": 21, "valid": 1 },
"Arts and Entertainment": { "id": 23, "valid": 0 },
"Government Administration": { "id": 24, "valid": 0 },
"Weather": { "id": 26, "valid": 0 },
"Fires": { "id": 27, "valid": 0 },
"Sports": { "id": 29, "valid": 0 },
"Death Notices": { "id": 30, "valid": 0 },
"Religion": { "id": 31, "valid": 0 },
"Other": { "id": 99, "valid": 0 },
"Unknown": { "id": 98, "valid": 1 }
}
updated: 464, unknown: 290, raw: 19
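As an illustration of how the code book above can be applied, here is a minimal sketch that maps a major topic name to its id and falls back to 98 (Unknown) when the topic is missing, NaN, or not in the code book. The function name `topic_to_id` and the shortened `CODE_BOOK` dictionary are assumptions for this example; the real logic lives in ./scripts/migrate.py and may differ.

```python
import math

# A shortened copy of the code book above (name -> id); extend it with the full list as needed.
CODE_BOOK = {
    "Macroeconomics": 1,
    "Civil Rights": 2,
    "Health": 3,
    "Law and Crime": 12,
    "Unknown": 98,
}


def topic_to_id(major_topic):
    """Return the code-book id for a topic name, or 98 (Unknown) as the fallback."""
    unknown = CODE_BOOK["Unknown"]
    if major_topic is None:
        return unknown
    # pandas represents missing values as float NaN, which is never a valid topic name.
    if isinstance(major_topic, float) and math.isnan(major_topic):
        return unknown
    return CODE_BOOK.get(major_topic, unknown)


example_ids = [topic_to_id(t) for t in ["Health", float("nan"), "Not A Topic"]]
```

Falling back to a single Unknown id keeps every policy labeled, which matches the counts reported above (updated plus unknown plus raw covers all policies).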
Create table policy_text for policy description text.
Script: ./script/migrate_0801_add_policy_text_table.sql
Dump: same as the previous one
In case policy descriptions need to be displayed later, add a policy_description column to TABLE policy, with values identical to policy_name for now.
Script: ./scripts/migrate_0708_add_policy_description.sql
Dump: run the script on previous dump file
In total, there are 755 policies in this dataset. All adoptions by any of {'DC', 'GU', 'PR', 'VI'} are removed before insertion into the database. 151 policies are affected by this rule. In particular, policy healthcareconsentact1982 is removed since it contains only 'VI'.
584 new policies were added, 170 overlapping old policies were found, and 12601 cascaded records were inserted.
After appending the new dataset, the total number of policies becomes 773, with 18696 adoptions from all 50 states.
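The filtering rule described above can be sketched as follows. This is a minimal illustration, not the code in ./scripts/migrate.py: the function name `filter_adoptions`, the input shape (a dictionary from policy id to a list of `(state, year)` pairs), and the sample policy ids and years are all assumptions.

```python
# Codes excluded from the database: DC and the territories.
NON_STATE_CODES = {"DC", "GU", "PR", "VI"}


def filter_adoptions(adoptions):
    """Drop adoptions by non-state codes, then drop policies left with no adoptions.

    adoptions: dict mapping policy id -> list of (state, year) tuples (assumed shape).
    """
    kept = {}
    for policy_id, rows in adoptions.items():
        state_rows = [(state, year) for state, year in rows if state not in NON_STATE_CODES]
        if state_rows:  # a policy adopted only by excluded codes disappears entirely
            kept[policy_id] = state_rows
    return kept


# Hypothetical sample data, invented for this example.
sample = {
    "policy_only_vi": [("VI", 1982)],
    "policy_mixed": [("NY", 1990), ("PR", 1991), ("CA", 1992)],
}
filtered = filter_adoptions(sample)
```

A policy such as healthcareconsentact1982, whose only adoption is from 'VI', would be dropped by the second step in the same way as `policy_only_vi` here.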
Some stats on len(description):
mode: 33, median: 46, mean: 53, max: 222
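Statistics like the ones above can be reproduced with the standard library. The sketch below assumes the descriptions are available as a plain list of strings; the function name `description_length_stats` and the sample strings are invented for illustration.

```python
import statistics


def description_length_stats(descriptions):
    """Return mode, median, mean and max of the description lengths."""
    lengths = [len(d) for d in descriptions]
    return {
        "mode": statistics.mode(lengths),
        "median": statistics.median(lengths),
        "mean": statistics.mean(lengths),
        "max": max(lengths),
    }


# Hypothetical sample input, not real policy descriptions.
stats = description_length_stats(["ab", "ab", "abcd", "abcdefgh"])
```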
Script:
- update database schema: ./scripts/migrate_0707_add_policies.sql
- append new data to database: ./scripts/migrate.py
Dump: ./backup/diffusion2017vis_20170707.sql
The initial version; please refer to the v1.0 specification documentation.
189 policies, 6196 adoptions.
Script & Dump: ./backup/diffusion2017vis_20170706.sql
Convert shapefiles to TopoJSON. To begin with, make sure you're under /data.
- create directories and download the shapefile cb_2016_us_state_5m.zip from the Census Bureau:
# make directory for raw and output data
mkdir external && mkdir out && cd external
# download shapefiles
wget http://www2.census.gov/geo/tiger/GENZ2016/shp/cb_2016_us_state_5m.zip
# extract the package to `cb_2016_us_state_5m`
unzip cb_2016_us_state_5m.zip -d cb_2016_us_state_5m
- install tools globally:
npm install shapefile -g
npm install topojson -g
npm install ndjson-cli -g
- convert shape files to topojson files
# convert shapefiles to topojson
# .shx and .dbf files should be in the same directory with .shp file
shp2json -n cb_2016_us_state_5m/cb_2016_us_state_5m.shp | ndjson-reduce 'p.features.push({type: "Feature", properties: {id: d.properties.STUSPS, gid: d.properties.GEOID}, geometry: d.geometry}), p' '{type: "FeatureCollection", features: []}' | geo2topo states=- > states.topo.json
# simplify raw file and write it to `/src/data/`
toposimplify -P 0.1 states.topo.json -o ../../src/data/states.p1.topo.json
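To make the ndjson-reduce step above easier to follow, here is a rough Python equivalent of what that expression does: it reads newline-delimited GeoJSON features, keeps only STUSPS (as id) and GEOID (as gid) from the properties, and collects everything into one FeatureCollection. This is an illustrative sketch only; the function name `reduce_features` and the sample feature are assumptions, and the geo2topo and toposimplify steps are not covered.

```python
import json


def reduce_features(ndjson_lines):
    """Build a GeoJSON FeatureCollection from newline-delimited feature strings."""
    collection = {"type": "FeatureCollection", "features": []}
    for line in ndjson_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        d = json.loads(line)
        collection["features"].append({
            "type": "Feature",
            # Keep only the two properties used by the visualization.
            "properties": {
                "id": d["properties"]["STUSPS"],
                "gid": d["properties"]["GEOID"],
            },
            "geometry": d["geometry"],
        })
    return collection


# Hypothetical one-feature input with a toy geometry.
sample_line = json.dumps({
    "type": "Feature",
    "properties": {"STUSPS": "NY", "GEOID": "36", "NAME": "New York"},
    "geometry": {"type": "Point", "coordinates": [0, 0]},
})
result = reduce_features([sample_line, ""])
```

Dropping the unused properties before conversion keeps the resulting TopoJSON file small.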
Shapefiles can also be found at Census.gov › Geography › Maps & Data › Cartographic Boundary Shapefiles, and plenty of pregenerated map data can be found at Mapzen, jgoodall, mbostock for v3, and us-atlas for v4.
Census Bureau-designated regions and divisions from Wikipedia, a state-wise abbreviation list, and a list containing abbreviations with a PDF from the Census Bureau.
GeoJSON spec, and RFC 7946.
This section serves as notes on generating a static network from all policies and retrieving state attributes. In the current version, the attributes are stored in the database, while the static network is no longer used. You may refer to these notes for the process of dynamically generating a network according to users' requests.
Inferring Networks of Diffusion and Influence, and its implementation with an R interface.
Persistent Policy Pathways: Inferring Diffusion Networks in the American States, and their implementation as well as data.
The details of all policies and their categories are introduced by Boehmke and Skinner in the appendix of State Policy Innovativeness Revisited.