Speed up trim_osmdata #178

Open

mpadge opened this issue Jul 12, 2019 · 5 comments
mpadge (Member) commented Jul 12, 2019

I finally had call to trim a huge data set: New York City within the official boundary polygon, which meant around 700,000 vertices submitted to sp::point.in.polygon or sf::st_within. The latter especially does not scale well at all, and took something like half an hour. I should just bundle Clipper like I already have in moveability and use that instead. That should make it entirely scalable.
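
For reference, the slow step is essentially a per-vertex point-in-polygon test, roughly like the following sketch (boundary and dat are placeholder objects here, not code from the package):

library (sf)

# boundary: an sf polygon of the official city boundary
# dat: an osmdata_sf() result with ~700,000 points in dat$osm_points
inside <- sf::st_within (dat$osm_points, boundary, sparse = FALSE) [, 1]
trimmed <- dat$osm_points [inside, ]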

FlxPo commented Nov 2, 2021

I'm using osmium extract to operate directly on .osm files, based on a boundary stored in a GeoJSON file. Could it work for you? The performance is quite good.

I'm using it through system calls to the Windows Subsystem for Linux from R, so it might be tricky to integrate into a stand-alone R package.
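
From R, the call looks roughly like this (paths and file names are placeholders only, and osmium-tool has to be installed inside WSL):

# shell out to osmium running inside WSL
system2 ("wsl", args = c ("osmium", "extract",
                          "--polygon", "boundary.geojson",
                          "new-york.osm",
                          "-o", "new-york-trimmed.osm"))

On Linux or macOS the wsl prefix can be dropped and osmium called directly.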

mpadge (Member Author) commented Nov 2, 2021

Yeah, for that kind of operation, osmium is by far the best. On my TODO list is wrapping its source code as an R package. I'll get to it one day ... until then, the command line suffices.

Mashin6 (Contributor) commented Nov 30, 2021

Another option is to do the trimming on the server side (which also means less data to download).

Possibility 1:

  • Run a Nominatim query and retrieve the osm_id.
    getbb("New York City", format_out = "data.frame") returns osm_id: 175905.
  • Retrieve the Overpass pre-calculated area. You need to adjust the id: for relations, id + 3600000000; for ways, id + 2400000000. See the Overpass area filters documentation.
    In this case: area(id:3600175905)->.a;
  • Filter results by area:
    node[natural=tree](area.a);

Full query:

[out:json][timeout:250];
area(id:3600175905)->.a;
node[natural=tree](area.a);
out body;
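
A rough sketch of running that query from R (assuming osmdata_sf() accepts a raw query string; if not, the same string can be sent to the Overpass API directly):

library (osmdata)

# Nominatim lookup; the first hit should be the city relation (check osm_type)
bb <- getbb ("New York City", format_out = "data.frame")
osm_id <- as.numeric (bb$osm_id [1])                        # 175905
area_id <- format (osm_id + 3600000000, scientific = FALSE) # relation -> area id

qry <- paste0 ("[out:json][timeout:250];\n",
               "area(id:", area_id, ")->.a;\n",
               "node[natural=tree](area.a);\n",
               "out body;")
trees <- osmdata_sf (q = qry)

The same approach works with the map_to_area form of Possibility 2 below, which avoids the id arithmetic altogether.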




Possibility 2:

  • Run the Nominatim query as before.
  • Retrieve the respective way or relation based on the osm_id.
    In this case: rel(id:175905);
  • Convert it to an area:
    map_to_area->.a;
  • Filter by area:
    node[natural=tree](area.a);

Full query:

rel(id:175905);
map_to_area->.a;
node[natural=tree](area.a);
out body;

mpadge (Member Author) commented Nov 30, 2021

Yes indeed that would be useful @Mashin6, and better in all ways. One way to achieve it might be to introduce yet another trim function that gets piped before the main call, so we'd have a workflow like:

opq(...) |>
    add_osm_feature(...) |>
    overpass_trim(...) |>
    osmdata_<whatever>()

There'd still be a use case for both forms, because area polygons don't always exist, and the current trim_osmdata() function is intended (among other things) to enable data to be trimmed to entirely arbitrary polygons.

If you'd be interested in contributing more directly, please feel free to start a pull request to develop this further. Note also that #252 will require some kind of initial function to determine or validate an OSM area for a given Nominatim query - just to check that the string corresponds to a single OSM relation ID. That would then also be used here.

Mashin6 (Contributor) commented Dec 4, 2021

I agree. Having an option to trim locally by a custom polygon is a useful feature. I will start a new issue for the server-side trimming.
