Autosplit large queries #254

Open
Mashin6 opened this issue Nov 30, 2021 · 3 comments

@Mashin6
Contributor

Mashin6 commented Nov 30, 2021

Currently, if a query times out or runs out of memory, the request is terminated. Given that the amount of data in OSM is growing, it might be worth thinking about how to help users in these situations.
One possibility: if a query fails, the bbox is split through its midpoint into four equal rectangles, and each is submitted one after another. If any of them fails, it is split again recursively until the pieces are small enough for the query to retrieve data successfully.

Similarly as here:
https://github.com/ZeLonewolf/osm-overpass-scripts/blob/7bf5ff10948f50ee5f1e33b08a9d0852274094fb/get_tag_density_map.sh#L139
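
The recursive splitting proposed above could be sketched roughly as follows. This is an illustrative Python sketch only, not code from osmdata or from the linked script: `fetch` is a hypothetical callable standing in for whatever submits the query for one bbox, `QueryTooLarge` is a made-up exception standing in for a timeout or out-of-memory failure, and `max_depth` is an assumed safeguard so that a persistently failing query cannot recurse forever.

```python
from typing import Callable, List, Tuple

# Assumed bbox layout: (min_lon, min_lat, max_lon, max_lat)
BBox = Tuple[float, float, float, float]


class QueryTooLarge(Exception):
    """Hypothetical error standing in for a timeout or out-of-memory failure."""


def split_in_four(bbox: BBox) -> List[BBox]:
    """Split a bbox through its midpoint into four equal rectangles."""
    min_lon, min_lat, max_lon, max_lat = bbox
    mid_lon = (min_lon + max_lon) / 2
    mid_lat = (min_lat + max_lat) / 2
    return [
        (min_lon, min_lat, mid_lon, mid_lat),  # south-west
        (mid_lon, min_lat, max_lon, mid_lat),  # south-east
        (min_lon, mid_lat, mid_lon, max_lat),  # north-west
        (mid_lon, mid_lat, max_lon, max_lat),  # north-east
    ]


def fetch_recursive(
    bbox: BBox,
    fetch: Callable[[BBox], list],
    max_depth: int = 4,
) -> list:
    """Try the whole bbox first; on failure, split and retry each quarter."""
    try:
        return fetch(bbox)
    except QueryTooLarge:
        if max_depth == 0:
            raise  # give up rather than keep hitting the server
        results: list = []
        # Quarters are submitted one after another, not in parallel.
        for part in split_in_four(bbox):
            results.extend(fetch_recursive(part, fetch, max_depth - 1))
        return results
```

Results from neighbouring quarters may still contain duplicates at the shared borders, so a merge step is needed afterwards.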

Notes:

  • It might be good to ask the user before proceeding with bbox splitting.
  • It needs a way to handle possible duplicates retrieved at the bbox borders.
@mpadge
Member

mpadge commented Nov 30, 2021

Thanks @Mashin6, I've actually got code elsewhere that does exactly that. The handling of duplicates is already built in via the c operator; you just have to expand each bbox part so that there is some non-zero overlap, and then combining them all with c will automatically de-duplicate everything. The real issue here (in my opinion at least) is that enabling this would extend an open invitation for people to really start abusing the overpass server. So my interim suggestion would be, at least initially, to add an extra vignette on the general procedure. Would you be interested in doing that?
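
The expand-then-combine idea can be illustrated with a small sketch. It is written in Python purely for illustration and is not how osmdata's c operator is implemented: `expand_bbox`, `merge_unique`, and the `fraction` parameter are hypothetical names, and the de-duplication here simply assumes each element is a dict carrying the `type` and `id` fields found in Overpass JSON output.

```python
from typing import Dict, Iterable, List, Tuple

# Assumed bbox layout: (min_lon, min_lat, max_lon, max_lat)
BBox = Tuple[float, float, float, float]


def expand_bbox(bbox: BBox, fraction: float = 0.05) -> BBox:
    """Grow a bbox on every side so neighbouring parts overlap slightly."""
    min_lon, min_lat, max_lon, max_lat = bbox
    pad_lon = (max_lon - min_lon) * fraction
    pad_lat = (max_lat - min_lat) * fraction
    return (min_lon - pad_lon, min_lat - pad_lat,
            max_lon + pad_lon, max_lat + pad_lat)


def merge_unique(parts: Iterable[List[dict]]) -> List[dict]:
    """Combine per-bbox results, keeping one copy of each (type, id) pair."""
    seen: Dict[Tuple[str, int], dict] = {}
    for elements in parts:
        for element in elements:
            # The first occurrence wins; later copies from overlaps are dropped.
            seen.setdefault((element["type"], element["id"]), element)
    return list(seen.values())
```

Because the expanded parts overlap, an element near a border is returned by more than one query, and the merge keeps a single copy of it.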

@Mashin6
Contributor Author

Mashin6 commented Dec 4, 2021

I've never written a vignette, but I could give it a try.

By general procedure, do you mean how to split the bbox, run several separate queries, and then manually merge the results into one?

A solution to the extensive server load could be restricting this type of querying to kumi OP only. They don't really have any restrictions on usage.
As a warning, kumi has broken attic data, so it is not suitable for queries that contain datetime = or datetime2 =

mpadge added a commit that referenced this issue Jan 3, 2022
@mpadge
Member

mpadge commented Jan 3, 2022

@Mashin6 yes, a vignette on splitting a bbox, running separate queries, and merging results is exactly what I mean. I've started a generic vignette template in #262 - feel free to extend it however you like. If you do contribute, please make sure you add your name both to the author field of that vignette and to the DESCRIPTION file as ctb. Thanks!

mpadge added a commit that referenced this issue Jan 23, 2022
add query-split vignette for #254 - Thanks @Mashin6 for this really useful contribution. I just tweaked the text a bit, mostly to more explicitly state what each bit of code actually does.