Add fixture with sample analysis results #962

Merged · 3 commits merged into develop from feature/kjh/sample-analysis-results-fixture on Dec 20, 2024

Conversation

@KlaasH (Contributor) commented Dec 10, 2024

Overview

This edits the setup script (technically the update script, which does most of the setup) to download and import a fixture containing a few neighborhoods and analysis results. Specifically, it includes neighborhoods and results for:

  • Grand Rapids, MI
  • Mackinac Island, MI
  • Pella, IA
  • Presque Isle, ME
  • Thunder Bay, ON, Canada
  • Groningen, Netherlands

These particular ones were chosen mostly arbitrarily by me, with the goal of keeping them smallish but not all tiny, and including some US and some non-US places. (And Mackinac Island made the cut because it's the #1 low-stress connectivity champion, owing to the fact that cars are banned from the island.)

I also added one neighborhood, Montpelier, VT, for which I didn't import the analysis results, because it seems useful to have a neighborhood with no existing analysis ready to go, so we can run an analysis and not have to wonder whether what we see afterwards came from that run or an earlier one.

To load the fixture, run ./scripts/update --load-data. I made it optional because it really only needs to be done on initial setup, and re-importing it on every subsequent update would waste time (I also put the import_crash_data step in the same conditional block, for the same reason).
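For reference, the optional step in ./scripts/update presumably looks something like this (a sketch under assumptions, not the actual script; the fixture URL matches the one in the Notes below, but the flag parsing and the exact import_crash_data invocation are guesses):

# Hypothetical sketch of the --load-data handling in scripts/update
if [ "$1" = "--load-data" ]; then
    # Fetch the shared fixture and load it into the database
    aws s3 cp "s3://test-pfb-inputs/sample_analysis_results.json.gz" .
    ./scripts/django-manage loaddata sample_analysis_results.json.gz
    # Crash data import is also slow and only needed once, so it lives
    # in the same conditional block
    ./scripts/django-manage import_crash_data
fi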

Resolves #959

Demo

[Demo screenshot of the imported data]

Notes

I assembled the data by copying it from production (a rough command sketch follows the list), i.e.:

  • Downloading the boundary files from s3://production-pfb-storage-us-east-1/neighborhood_boundaries/root/
  • Finding the job UUIDs of the latest job for each neighborhood via the web interface, then downloading the analysis output files for that job from s3://production-pfb-storage-us-east-1/results/
  • Bundling up the files for each job into a .zip file
  • Adding the neighborhoods through the Neighborhoods admin page
  • Importing the analysis results through the Jobs admin page
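In command form, those download-and-bundle steps were roughly the following (a sketch; <boundary-file>, <job-uuid>, and <neighborhood> are placeholders, and the exact filenames varied per place):

# Boundary file for one neighborhood
aws --profile pfb s3 cp "s3://production-pfb-storage-us-east-1/neighborhood_boundaries/root/<boundary-file>" .
# All output files for one analysis job
aws --profile pfb s3 sync "s3://production-pfb-storage-us-east-1/results/<job-uuid>/" "<job-uuid>/"
# Bundle the job's files for import via the Jobs admin page
zip -r "<neighborhood>-results.zip" "<job-uuid>/"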

Then I exported the fixture and uploaded it to S3:

# Export the neighborhood and analysis-result models as a gzipped JSON
# fixture, then upload it to the shared inputs bucket
./scripts/django-manage dumpdata --indent 2 -o sample_analysis_results.json.gz \
    pfb_analysis.AnalysisJob pfb_analysis.Neighborhood pfb_analysis.NeighborhoodWaysResults \
    pfb_analysis.CensusBlocksResults
aws --profile pfb s3 cp sample_analysis_results.json.gz "s3://test-pfb-inputs/"
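To sanity-check the upload afterwards, a plain listing works (standard AWS CLI usage, not a step from this PR):

aws --profile pfb s3 ls "s3://test-pfb-inputs/sample_analysis_results.json.gz"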

Testing Instructions

  • Delete your existing dev instance with docker compose down -v (this is necessary because you need your organization_id to match what the fixture expects)
  • Run ./scripts/update --load-data to rebuild your instance and import the data
  • Run ./scripts/server and go to http://localhost:9301/#/. You should see the imported results in the "Leading Cities/Towns" panel on the landing page, and on the list if you click "All places"

Checklist

  • Add entry to CHANGELOG.md

@aaronxsu (Member) left a comment

Thanks for the notes on how you assembled the data.

> Finding the job UUIDs of the latest job for each neighborhood via the web interface, then downloading the analysis output files for that job from s3://production-pfb-storage-us-east-1/results/

I clicked on one of the UUIDs in that path, and it has a bunch of CSV files and JSON files. Are they all analysis output files? Did you zip them all?

> Adding the neighborhoods through the Neighborhoods admin page

Is there a shared admin account for local dev? Or should we create one by running the management command createsuperuser?

After running through the testing steps, the following places' detail pages gave me "No results available". Are they intended to show no results?

  • Grand Rapids (the screenshot below is from this one)
  • Mackinac Island
  • Pella
  • Presque Isle

[Screenshot: "No results available" on a place detail page]

@aaronxsu (Member) commented Dec 16, 2024

Another thing I found: downloading the files in the red box shown below doesn't seem to work, i.e. the files don't exist in the S3 location. Some S3 requests also failed, as shown in the network tab. I noticed that they all look for files in the None-pfb-storage-us-east-1 bucket; I assume the first part of the bucket name should be the environment, like prod or dev or something. Should I pass an env var in local dev? Or are these locations constructed at the API level? Or do they come from a column in the DB?

[Screenshots: failed file downloads and network-tab requests pointing at the None-pfb-storage-us-east-1 bucket]

@aaronxsu mentioned this pull request on Dec 16, 2024
@KlaasH force-pushed the feature/kjh/sample-analysis-results-fixture branch 2 times, most recently from 3f5e376 to d23c897 on December 17, 2024
@KlaasH (Contributor, Author) commented Dec 17, 2024

I've added some README changes to this branch, some of which should hopefully help address these questions for future users/developers as well.

Re your specific questions:

> I clicked on one of the UUIDs in that path, and it has a bunch of CSV files and JSON files. Are they all analysis output files? Did you zip them all?

Yes, all the files under a given UUID in the results/ directory on the storage bucket are the output files for that analysis job. So to make the import files, I downloaded them all and zipped them up.

> Is there a shared admin account for local dev?

Yes, there's an admin account that gets created by the initial migrations: [email protected] / root

> the following places' detail page gave me "No results available"

It seems like the non-US places partly worked for you and the US places hardly worked at all. I think part of the answer is in the S3 bucket names of those failing links, which start with "None": the bucket names are built using the DEV_USER variable, which you don't have set. The new list of environment variables to set that I put in the README changes should help with that.
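Concretely, that means setting something like this before running the scripts (a sketch; DEV_USER is the variable named above, the example value is a placeholder, and the resulting bucket name is inferred from the "None-..." failures):

# In .env or exported in your shell
DEV_USER=yourname
# With that set, the per-developer bucket name should resolve to
# yourname-pfb-storage-us-east-1 instead of None-pfb-storage-us-east-1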

But this is also making me realize that it may have seemed to work better for me than it will for anyone else, because I'm using the same buckets each time. Those files get written to S3 when you do the import, so the khoekema ones exist, but anyone else loading the fixture won't have them, so all of those files will be missing.

I need to give that part some thought. I don't think it makes sense to try to use a shared dev bucket, since that could cause some crazy bugs from cross-user dependencies/data changes. But I might need to add a command to the data-loading section of the update script to sync the files that go with the fixture from a shared source into the developer's S3 bucket.

@KlaasH (Contributor, Author) commented Dec 19, 2024

Ok, I gathered up the results files that the app expects to find on S3 and put them in the test-pfb-inputs bucket as well, and added a line to the update command to copy them into your personal development S3 bucket after it imports the fixture.
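The copy step is presumably along these lines (a sketch; the prefix layout inside test-pfb-inputs and the dev bucket naming are assumptions):

# Sync the shared results files into the developer's own bucket
aws s3 sync "s3://test-pfb-inputs/results/" "s3://${DEV_USER}-pfb-storage-us-east-1/results/"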

So that should take care of the downloadable files being missing and the US cities not working (they were failing because the service that loads place and job info from the API was trying to use some of the S3 files, but only for US cities).

I also realized that the dev buckets need a CORS policy, so I added that step to the README (along with the existing command to set the bucket access policy).
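For reference, applying a CORS policy to a dev bucket looks roughly like this (a sketch; the actual policy and command are in the README, and the allowed origin assumes the front end runs at http://localhost:9301 as in the testing instructions):

# cors.json: allow the local front end to read objects from the dev bucket
cat > cors.json <<'EOF'
{
  "CORSRules": [
    {
      "AllowedOrigins": ["http://localhost:9301"],
      "AllowedMethods": ["GET", "HEAD"],
      "AllowedHeaders": ["*"]
    }
  ]
}
EOF
aws s3api put-bucket-cors --bucket "${DEV_USER}-pfb-storage-us-east-1" \
    --cors-configuration file://cors.json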

Can you run through the setup process again, following the modified README on this branch, and see if it works better this time? I think it should.

@KlaasH requested a review from @aaronxsu on December 19, 2024
@aaronxsu (Member) left a comment

I ran through the updated steps in the README and it worked great. I also tested the failing cases mentioned above, and they are now fixed.

I'd add the .env file to .gitignore before merging.

Commits

Adds some lines to the 'update' script to download a data fixture containing
a few neighborhoods and analysis results and import it into the database.

Because the import is a bit slow and only needs to be done on initial setup,
this also modifies the 'update' script to take a '--load-data' argument, so
that it will only load the fixture (and load crash data, which also takes
some time and doesn't need to be repeated) on demand, not by default.

The fixture itself is on S3, in the 'test-pfb-inputs' bucket, and the script
automatically downloads it.

Also in S3: the supporting files, like downloadable geometries and GeoJSON
files of destination locations, that get saved to S3 and used by the front end
in various ways. Those have also been added to the 'test-pfb-inputs' bucket
and 'update' has an `aws s3 sync` command to copy them to the user's
development S3 bucket.

We already hard-coded a user UUID in this migration, possibly to facilitate
moving fixtures between development environments. This sets the organization
UUID to a fixed value as well, to facilitate transferring neighborhoods in
a fixture.

Adds a few details and some additional organization to make the README
more accurate and usable, especially for people using docker-on-host rather
than Vagrant.
@KlaasH force-pushed the feature/kjh/sample-analysis-results-fixture branch from 96a708f to 0ed0693 on December 20, 2024
@KlaasH merged commit 921c076 into develop on Dec 20, 2024 (1 check passed)
@KlaasH deleted the feature/kjh/sample-analysis-results-fixture branch on December 20, 2024
@KlaasH (Contributor, Author) commented Dec 20, 2024

I realized shortly after merging this that I should have copied the neighborhood boundary files over as well. They're not used by the front end (as far as I know) once the neighborhood is created, but they're used by the analysis. So I pushed a commit to develop to copy those files over to the dev S3 bucket as well (f564783).
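If you set your instance up before that commit landed, the equivalent manual step would be roughly the following (a sketch; the source prefix in test-pfb-inputs and the dev bucket naming are assumptions):

aws s3 sync "s3://test-pfb-inputs/neighborhood_boundaries/" \
    "s3://${DEV_USER}-pfb-storage-us-east-1/neighborhood_boundaries/"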

Linked issue: Create fixture to populate local dev database (#959)