Add fixture with sample analysis results #962

Merged · 3 commits merged into develop from feature/kjh/sample-analysis-results-fixture on Dec 20, 2024

Conversation

@KlaasH (Contributor) commented Dec 10, 2024

Overview

This edits the setup script (technically the update script, which does most of the setup) to download and import a fixture containing a few neighborhoods and analysis results. Specifically, it includes neighborhoods and results for:

  • Grand Rapids, MI
  • Mackinac Island, MI
  • Pella, IA
  • Presque Isle, ME
  • Thunder Bay, ON, Canada
  • Groningen, Netherlands

These particular ones were chosen mostly arbitrarily by me, with the goal of keeping them smallish but not all tiny, and including some US and some non-US places. (And Mackinac Island made the cut because it's the #1 low-stress connectivity champion, owing to the fact that cars are banned from the island.)

I also added one neighborhood, Montpelier, VT, for which I didn't import the analysis results, because it seems useful to have a neighborhood with no existing analysis ready to go, so we can run an analysis and not have to wonder whether what we see afterwards came from that run or an earlier one.

To load the fixture, run ./scripts/update --load-data. I made it optional because it really only needs to be done on initial setup, and re-importing it on every subsequent update would waste time (I also put the import_crash_data step in the same conditional block, for the same reason).
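For reference, the optional step in ./scripts/update presumably looks something like this (a sketch under assumptions, not the actual script; the fixture URL matches the one in the Notes below, but the flag parsing and the exact import_crash_data invocation are guesses):

# Hypothetical sketch of the --load-data handling in scripts/update
if [ "$1" = "--load-data" ]; then
    # Fetch the shared fixture and load it into the database
    aws s3 cp "s3://test-pfb-inputs/sample_analysis_results.json.gz" .
    ./scripts/django-manage loaddata sample_analysis_results.json.gz
    # Crash data import is also slow and only needed once, so it lives
    # in the same conditional block
    ./scripts/django-manage import_crash_data
fi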

Resolves #959

Demo

[Demo screenshot of the imported data]

Notes

I assembled the data by copying it from production (a rough command sketch follows the list), i.e.:

  • Downloading the boundary files from s3://production-pfb-storage-us-east-1/neighborhood_boundaries/root/
  • Finding the job UUIDs of the latest job for each neighborhood via the web interface, then downloading the analysis output files for that job from s3://production-pfb-storage-us-east-1/results/
  • Bundling up the files for each job into a .zip file
  • Adding the neighborhoods through the Neighborhoods admin page
  • Importing the analysis results through the Jobs admin page
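In command form, those download-and-bundle steps were roughly the following (a sketch; <boundary-file>, <job-uuid>, and <neighborhood> are placeholders, and the exact filenames varied per place):

# Boundary file for one neighborhood
aws --profile pfb s3 cp "s3://production-pfb-storage-us-east-1/neighborhood_boundaries/root/<boundary-file>" .
# All output files for one analysis job
aws --profile pfb s3 sync "s3://production-pfb-storage-us-east-1/results/<job-uuid>/" "<job-uuid>/"
# Bundle the job's files for import via the Jobs admin page
zip -r "<neighborhood>-results.zip" "<job-uuid>/"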

Then I exported the fixture and uploaded it to S3:

# Export the neighborhood and analysis-result models as a gzipped JSON
# fixture, then upload it to the shared inputs bucket
./scripts/django-manage dumpdata --indent 2 -o sample_analysis_results.json.gz \
    pfb_analysis.AnalysisJob pfb_analysis.Neighborhood pfb_analysis.NeighborhoodWaysResults \
    pfb_analysis.CensusBlocksResults
aws --profile pfb s3 cp sample_analysis_results.json.gz "s3://test-pfb-inputs/"
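To sanity-check the upload afterwards, a plain listing works (standard AWS CLI usage, not a step from this PR):

aws --profile pfb s3 ls "s3://test-pfb-inputs/sample_analysis_results.json.gz"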

Testing Instructions

  • Delete your existing dev instance with docker compose down -v (this is necessary because you need your organization_id to match what the fixture expects)
  • Run ./scripts/update --load-data to rebuild your instance and import the data
  • Run ./scripts/server and go to http://localhost:9301/#/. You should see the imported results in the "Leading Cities/Towns" panel on the landing page, and on the list if you click "All places"

Checklist

  • Add entry to CHANGELOG.md

@aaronxsu (Member) left a comment

Thanks for the notes on how you assembled the data.

> Finding the job UUIDs of the latest job for each neighborhood via the web interface, then downloading the analysis output files for that job from s3://production-pfb-storage-us-east-1/results/

I clicked on one of the UUIDs in that path, and it has a bunch of CSV files and JSON files. Are they all analysis output files? Did you zip them all?

> Adding the neighborhoods through the Neighborhoods admin page

Is there a shared admin account for local dev? Or should we create one by running the management command createsuperuser?

After running through the testing steps, the following places' detail pages gave me "No results available". Are they intended to show no results?

  • Grand Rapids (the screenshot below is from this one)
  • Mackinac Island
  • Pella
  • Presque Isle

[Screenshot: "No results available" on a place detail page]

@aaronxsu (Member) commented Dec 16, 2024

Another thing I found: downloading the files in the red box shown below doesn't seem to work, i.e. the files don't exist in the S3 location. Some S3 requests also failed, as shown in the network tab. I noticed that they all look for files in the None-pfb-storage-us-east-1 bucket; I assume the first part of the bucket name should be the environment, like prod or dev or something. Should I pass an env var in local dev? Or are these locations constructed at the API level? Or do they come from a column in the DB?

[Screenshots: failed file downloads and network-tab requests pointing at the None-pfb-storage-us-east-1 bucket]

@aaronxsu mentioned this pull request on Dec 16, 2024
@KlaasH force-pushed the feature/kjh/sample-analysis-results-fixture branch 2 times, most recently from 3f5e376 to d23c897 on December 17, 2024
@KlaasH (Contributor, Author) commented Dec 17, 2024

I've added some README changes to this branch, some of which should hopefully help address these questions for future users/developers as well.

Re your specific questions:

> I clicked on one of the UUIDs in that path, and it has a bunch of CSV files and JSON files. Are they all analysis output files? Did you zip them all?

Yes, all the files under a given UUID in the results/ directory on the storage bucket are the output files for that analysis job. So to make the import files, I downloaded them all and zipped them up.

> Is there a shared admin account for local dev?

Yes, there's an admin account that gets created by the initial migrations: [email protected] / root

> the following places' detail page gave me "No results available"

It seems like the non-US places partly worked for you and the US places hardly worked at all. I think part of the answer is in the S3 bucket names of those failing links, which start with "None": the bucket names are built using the DEV_USER variable, which you don't have set. The new list of environment variables to set that I put in the README changes should help with that.
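Concretely, that means setting something like this before running the scripts (a sketch; DEV_USER is the variable named above, the example value is a placeholder, and the resulting bucket name is inferred from the "None-..." failures):

# In .env or exported in your shell
DEV_USER=yourname
# With that set, the per-developer bucket name should resolve to
# yourname-pfb-storage-us-east-1 instead of None-pfb-storage-us-east-1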

But this is also making me realize that it may have seemed to work better for me than it will for anyone else, because I'm using the same buckets each time. Those files get written to S3 when you do the import, so the khoekema ones exist, but anyone else loading the fixture won't have them, so all of those files will be missing.

I need to give that part some thought. I don't think it makes sense to try to use a shared dev bucket, since that could cause some crazy bugs from cross-user dependencies/data changes. But I might need to add a command to the data-loading section of the update script to sync the files that go with the fixture from a shared source into the developer's S3 bucket.

@KlaasH (Contributor, Author) commented Dec 19, 2024

Ok, I gathered up the results files that the app expects to find on S3 and put them in the test-pfb-inputs bucket as well, and added a line to the update command to copy them into your personal development S3 bucket after it imports the fixture.
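The copy step is presumably along these lines (a sketch; the prefix layout inside test-pfb-inputs and the dev bucket naming are assumptions):

# Sync the shared results files into the developer's own bucket
aws s3 sync "s3://test-pfb-inputs/results/" "s3://${DEV_USER}-pfb-storage-us-east-1/results/"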

So that should take care of the downloadable files being missing and the US cities not working (they were failing because the service that loads place and job info from the API was trying to use some of the S3 files, but only for US cities).

I also realized that the dev buckets need a CORS policy, so I added that step to the README (along with the existing command to set the bucket access policy).
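For reference, applying a CORS policy to a dev bucket looks roughly like this (a sketch; the actual policy and command are in the README, and the allowed origin assumes the front end runs at http://localhost:9301 as in the testing instructions):

# cors.json: allow the local front end to read objects from the dev bucket
cat > cors.json <<'EOF'
{
  "CORSRules": [
    {
      "AllowedOrigins": ["http://localhost:9301"],
      "AllowedMethods": ["GET", "HEAD"],
      "AllowedHeaders": ["*"]
    }
  ]
}
EOF
aws s3api put-bucket-cors --bucket "${DEV_USER}-pfb-storage-us-east-1" \
    --cors-configuration file://cors.json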

Can you run through the setup process again, following the modified README on this branch, and see if it works better this time? I think it should.

@KlaasH requested a review from @aaronxsu on December 19, 2024
@aaronxsu (Member) left a comment

I ran through the updated steps in the README and it worked great. I also tested the failing cases mentioned above, and they are now fixed.

I'd add the .env file to .gitignore before merging.

Commits

Adds some lines to the 'update' script to download a data fixture containing
a few neighborhoods and analysis results and import it into the database.

Because the import is a bit slow and only needs to be done on initial setup,
this also modifies the 'update' script to take a '--load-data' argument, so
that it will only load the fixture (and load crash data, which also takes
some time and doesn't need to be repeated) on demand, not by default.

The fixture itself is on S3, in the 'test-pfb-inputs' bucket, and the script
automatically downloads it.

Also in S3: the supporting files, like downloadable geometries and GeoJSON
files of destination locations, that get saved to S3 and used by the front end
in various ways. Those have also been added to the 'test-pfb-inputs' bucket
and 'update' has an `aws s3 sync` command to copy them to the user's
development S3 bucket.

We already hard-coded a user UUID in this migration, possibly to facilitate
moving fixtures between development environments. This sets the organization
UUID to a fixed value as well, to facilitate transferring neighborhoods in
a fixture.

Adds a few details and some additional organization to make the README
more accurate and usable, especially for people using docker-on-host rather
than Vagrant.
@KlaasH force-pushed the feature/kjh/sample-analysis-results-fixture branch from 96a708f to 0ed0693 on December 20, 2024
@KlaasH merged commit 921c076 into develop on Dec 20, 2024 (1 check passed)
@KlaasH deleted the feature/kjh/sample-analysis-results-fixture branch on December 20, 2024
@KlaasH (Contributor, Author) commented Dec 20, 2024

I realized shortly after merging this that I should have copied the neighborhood boundary files over as well. They're not used by the front end (as far as I know) once the neighborhood is created, but they're used by the analysis. So I pushed a commit to develop to copy those files over to the dev S3 bucket as well (f564783).
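If you set your instance up before that commit landed, the equivalent manual step would be roughly the following (a sketch; the source prefix in test-pfb-inputs and the dev bucket naming are assumptions):

aws s3 sync "s3://test-pfb-inputs/neighborhood_boundaries/" \
    "s3://${DEV_USER}-pfb-storage-us-east-1/neighborhood_boundaries/"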

Linked issue: Create fixture to populate local dev database (#959)