
Limit the number of shapes trip_distance_exceeds_shape_distance supports #1589

Closed
3 tasks
davidgamez opened this issue Sep 27, 2023 · 5 comments

Comments

@davidgamez
Member

davidgamez commented Sep 27, 2023

Description

As part of the investigation of #1587, we came across a feed with ~40M shapes to validate. The validator's current implementation needs to be optimized to handle large feeds; see #1358. The recently merged #1553 introduced shape validation that increases the validator's resource usage. As a temporary solution, until #1358 provides a better way to manage large files, we propose limiting the number of shapes that the trip_distance_exceeds_shape_distance validator supports.

Tasks:

  • Investigate the approximate maximum number of shapes that trip_distance_exceeds_shape_distance can support.
  • Implement logic to skip the validator when the shape count is above the limit, and log why the validator is being skipped.
  • Add the validator to the list of validators that did not run due to the limit.
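The skip-and-log behavior described in these tasks could look roughly like the following Java sketch. It is only an illustration under assumptions: `ShapeLimitGuard`, `MAX_SHAPES`, and the 10M threshold are hypothetical placeholders, not part of the validator's code, and the real limit would come from the first task's investigation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Logger;

// Hypothetical sketch: ShapeLimitGuard and MAX_SHAPES are illustrative names,
// not part of the gtfs-validator codebase.
final class ShapeLimitGuard {
  private static final Logger LOGGER = Logger.getLogger(ShapeLimitGuard.class.getName());

  // Placeholder limit (assumption); the real value would come from the
  // investigation in the first task.
  static final long MAX_SHAPES = 10_000_000L;

  // Validators skipped because of the limit, kept so they can be reported later.
  private final List<String> skippedValidators = new ArrayList<>();

  /** Returns true if the validator may run; otherwise logs the reason and records the skip. */
  boolean shouldRun(String validatorName, long shapeCount) {
    if (shapeCount <= MAX_SHAPES) {
      return true;
    }
    LOGGER.warning(String.format(
        "Skipping %s: the feed has %d shapes, above the supported limit of %d.",
        validatorName, shapeCount, MAX_SHAPES));
    skippedValidators.add(validatorName);
    return false;
  }

  /** Returns an unmodifiable snapshot of the validators that did not run. */
  List<String> getSkippedValidators() {
    return List.copyOf(skippedValidators);
  }

  public static void main(String[] args) {
    ShapeLimitGuard guard = new ShapeLimitGuard();
    // A feed with ~40M shapes exceeds the placeholder limit, so the validator is skipped.
    if (guard.shouldRun("TripAndShapeDistanceValidator", 40_000_000L)) {
      System.out.println("Running TripAndShapeDistanceValidator");
    }
    System.out.println("Skipped: " + guard.getSkippedValidators());
  }
}
```

Keeping the skip decision and the skipped-validator list in one place means the reason is logged exactly once and the report of validators that did not run stays consistent with what was actually skipped.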
@emmambd
Contributor

emmambd commented Mar 13, 2024

@cka-y Is this needed after the work done in #1676?

@cka-y
Contributor

cka-y commented Mar 13, 2024

@davidgamez, could you please share the feed containing the ~40M shapes so I can give a well-informed answer to @emmambd's question?

@emmambd
Contributor

emmambd commented Mar 13, 2024

emmambd added this to the Next milestone Mar 13, 2024
@cka-y
Contributor

cka-y commented Mar 13, 2024

From the master branch, here’s what I found regarding TripAndShapeDistanceValidator performance for the example feed:

  • With the validator, validation takes about 199 seconds.
  • Without it, it's around 183 seconds.

The difference is roughly 16 seconds, which seems reasonable for the ~40M shapes involved. As far as I know, we don't currently have data on much larger shape sets, but so far no further adjustments seem necessary.
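In relative terms, the measurements above work out as follows. This treats ~40M as the number of shapes the validator processes, so the per-shape figure is only an approximation:

```math
\Delta t = 199\,\text{s} - 183\,\text{s} = 16\,\text{s}, \qquad
\frac{\Delta t}{183\,\text{s}} \approx 8.7\%, \qquad
\frac{16\,\text{s}}{4 \times 10^{7}\ \text{shapes}} = 4 \times 10^{-7}\,\text{s} = 0.4\,\mu\text{s per shape}
```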

Thoughts?

@emmambd
Contributor

emmambd commented Mar 21, 2024

Closing, given the significant improvements shown in the latest pre-release analytics: #1703

emmambd closed this as completed Mar 21, 2024
4 participants