
improve data download feature #818

Merged: 9 commits merged into master on Nov 3, 2022
Conversation

@smcalilly commented Oct 26, 2022

Overview

For #816 and #807

Includes:

  • additional make recipes for filtering data, based on the list in the #816 comment
  • updated the download template with the text from the #816 comment
  • removed the old download buttons
  • added nav link to download page
  • blank README for the demo (per Tom's instruction in an email)
  • new import docket with country name
  • updated the download data import to use country name

Notes

The data archive is a flat directory: the recipes write to <country_name>_<entity_type>.csv instead of creating a new directory per country. I couldn't figure out how to cleanly create a directory based on the dynamic country name values. Let me know if you see a way to do this.

I had to write a custom processing script to blank out the values in two specific columns, based on a discussion in our Slack #computer-programming channel. For context: csvgrep can only remove the columns entirely, and while csvsql can change the column values, it completely removes the header row from a file that has no data rows, producing a blank file, which isn't something we want.
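For illustration, here's a minimal sketch of the approach (not the actual blank_columns.py — the --entity argument name and the stdin/stdout plumbing are assumptions based on the snippet further down): streaming the file through csv.DictReader/DictWriter writes the header row unconditionally, so a file with zero data rows keeps its header.

```python
# Minimal sketch, not the actual blank_columns.py: blank two admin columns
# while always preserving the header row. Reads stdin, writes stdout.
import argparse
import csv
import sys

parser = argparse.ArgumentParser()
parser.add_argument('--entity', required=True)  # e.g. 'person'
args = parser.parse_args()

reader = csv.DictReader(sys.stdin)
writer = csv.DictWriter(sys.stdout, fieldnames=reader.fieldnames)
writer.writeheader()  # written even when there are no data rows

for row in reader:
    comment_key = f'{args.entity}:comments:admin'
    if row.get(comment_key):
        row[comment_key] = ''
    owner_key = f'{args.entity}:owner:admin'
    if row.get(owner_key):
        row[owner_key] = ''
    writer.writerow(row)
```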

Testing Instructions

  • get the AWS S3 credentials from the wwic-data-archive-staging token in LastPass
  • cp .env.s3.example .env.s3 and add your AWS access tokens
  • for testing purposes, you may need to remove the myanmar row from the import docket, since that spreadsheet's naming convention will break the import (an issue already flagged over email, to be fixed)
  • to create an archive locally, run docker-compose --env-file .env.s3 run --rm app make data_archive
  • visit localhost:8000/en/download/ to download the data

@smcalilly changed the title from "[wip] improve data download feature" to "improve data download feature" Oct 31, 2022
@smcalilly marked this pull request as ready for review October 31, 2022 19:59
@smcalilly requested a review from hancush October 31, 2022 20:00
@hancush left a comment
Hey, @smcalilly, great refactor! Once you respond to my super minor comments, let's get this up on staging for review.

Review threads (all resolved): data/processors/blank_columns.py (3), docket.mk (2)
@smcalilly (author) commented Nov 3, 2022

@hancush I made that Python change, thanks for that.

When I added the individual sources file to the pipeline, it was sometimes missing a column, which caused the blank_columns.py script to error. So I added some explicit checks for that:

```python
for row in reader:
    # row.get() returns None when the column is absent, so files that
    # lack a column are skipped instead of raising a KeyError
    comment_key = f'{args.entity}:comments:admin'
    comments = row.get(comment_key)
    if comments:
        row.update({
            comment_key: ''
        })

    owner_key = f'{args.entity}:owner:admin'
    owner = row.get(owner_key)
    if owner:
        row.update({
            owner_key: ''
        })
```

Not the cleanest but it works.
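(A slightly tighter, behavior-equivalent form would loop over the two keys — just a possible cleanup:)

```python
for row in reader:
    for key in (f'{args.entity}:comments:admin', f'{args.entity}:owner:admin'):
        if row.get(key):  # skip columns absent from this file
            row[key] = ''
```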

@smcalilly requested a review from hancush November 3, 2022 19:13
@smcalilly merged commit b05e83a into master Nov 3, 2022