-
Notifications
You must be signed in to change notification settings - Fork 75
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Check population density around stops #870
Conversation
* before they are transformed to WGS84: the origin of a common French coordinate system is in the Sahara. | ||
*/ | ||
private void validateStopPopulationDensity () { | ||
BooleanAsciiGrid popGrid = BooleanAsciiGrid.forEarthPopulation(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this population grid loaded and used anywhere else? I'm wondering if we should be concerned about resource management or additional time delays here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it was only used by a GTFS validator that was not included in r5 when we moved the gtfs-lib code in, so is not currently used elsewhere. This is a 4.3 kb text file containing ones and zeros which gets loaded into a BitSet, so should amount to a few hundred bytes in memory. There would be some cost of reading that file and creating that object, but this is in the context of loading and validating GTFS which involves loading large volumes of numbers from text files into a database. So I'd say it's a one-off cost per feed that's a similar action to what GTFS ingestion already requires, but several orders of magnitude smaller. In principle a single instance of the grid could be cached, but at a few hundred bytes it's so tiny that I don't think it will have any impact.
As we discussed in a recent call, I have two minor concerns about possible support questions that may arise from this newly enabled validation. I tried to address both in my commit here.
|
bb50f3f
to
bd0df44
Compare
bd0df44
to
07ebb7f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@abyrd I added some changes we discussed and disabled auto-merge so you can take a look before merging.
Thanks for making the changes we discussed. A couple of notes for future reference: This PR does not introduce any tests, but the boolean ASCII grid was already present before the PR and has an automated test. It just wasn't wired up as a validator for uploaded GTFS feeds. That test uses some hard-coded coordinates rather than reading and validating a full GTFS feed. I had already tested that the SuspectStopLocationError introduced by this PR was produced by feeds with strange stop placement and that the error was visible in the UI. The final commit on this PR just changes the message, not anything else about the error generation process, so it should still work. We probably still want to test this out manually when preparing the next release. |
While working on duplicate ID checks I noticed we were running tests on this validator but not applying it to the stops.
Results seem good after testing out on several feeds. Null island ferry results in warnings, most other feeds produce no warnings. Some places like Montreal get warnings on particular routes that have stops in mountainous parkland (Parc des Sept Chutes) but this makes sense.
I'm creating this as a draft. This is not a critical validation rule, and the population density thresholds might benefit from some tweaking.
I also noticed that we don't return the affected entity ID in
com.conveyal.analysis.models.Bundle.GtfsErrorSummary
, so we end up generating a lot of custom messages that contain the entity ID, rather than providing structured data that would allow the UI to display the ID together with the filename, line number, and field name.