Feature/xyz routes #88

Open
wants to merge 14 commits into
base: main
from

Conversation

iacopoff

So, the service is quite simple, however there are a few design choices that may not be ideal:

  • Data projection: The service's tiling system uses morecantile, so any of the available CRS projections can be passed to the tiling function. The input data projection should match that CRS; if there is a mismatch, the service does not try to re-project the data, as it expects users to take care of that. If that sounds OK, then I think there should be a check that raises an error if the CRSs don't match.
  • Data validation: For simplicity, the service requires that the data's spatial dimensions are named 'x' and 'y'. The function that checks this is defined in the router function. Is there a better way (or place) to implement input data validation?
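
To make the two points above concrete, here is a minimal sketch of what such a check could look like. It is not the code in this PR: the function name `validate_tile_input` is hypothetical, and it assumes CRSs can be compared as plain authority strings such as "EPSG:3857" (a real implementation would more likely compare CRS objects).

```python
from typing import Iterable, Optional

REQUIRED_DIMS = ("x", "y")  # spatial dimension names the service currently requires


def validate_tile_input(dims: Iterable[str], data_crs: Optional[str], tile_crs: str) -> None:
    """Raise ValueError when the data cannot be tiled without user intervention.

    Hypothetical helper: name and signature are assumptions for illustration.
    """
    present = set(dims)
    missing = [name for name in REQUIRED_DIMS if name not in present]
    if missing:
        raise ValueError(f"missing spatial dimension(s): {missing}")
    if data_crs is None:
        raise ValueError("data has no CRS; cannot compare it with the tiling CRS")
    # Assumption: CRSs are compared as normalised authority strings.
    if data_crs.strip().upper() != tile_crs.strip().upper():
        raise ValueError(
            f"data CRS {data_crs!r} does not match tiling CRS {tile_crs!r}; "
            "reproject the data before publishing it"
        )
```

Keeping the check in a standalone function like this would let it be called from the router or from wherever the dataset is registered, which is one possible answer to the "better place" question.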

Contributor

@benbovy benbovy left a comment


Sorry for the wait @iacopoff. I've played with the example notebooks and it looks very cool!

I left some minor comments.

The notebooks are very helpful, but I'm wondering if this repository is the right place for them. We should try to keep the management of the docs and the dependencies simple. Xpublish is front-end agnostic and FastAPI already provides good tools for documenting the API endpoints.

@iacopoff
Author

iacopoff commented Aug 5, 2021

Hi @benbovy, I think I opened this PR too early: I am rethinking the design as I get to understand xpublish better, and I am also thinking about how to accommodate the future wmts router. However, it is still very helpful to get your comments at this stage.

Regarding the notebooks, I can definitely delete them. The docstrings will be enough.

I agree with you that a sort of factory class, such as the TiTiler router factory, would probably solve the problem of passing options to the individual routers or adding data validation logic.

There are 2 sets of parameters that are required by tiling and image production:

  1. Tiling requires a CRS, unless we default to the PseudoMercator.
  2. Image creation requires colour-mapping parameters. For the simpler xyz router datashader is enough. For the wmts router I would like to be able to create a colour bar as well, to add to the getCapabilities.xml if possible. That means I need to use, for example, matplotlib.
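
To illustrate why tiling needs a CRS (point 1), here is a small worked sketch of the bounds of an XYZ tile on the default Web Mercator grid. In the service this computation is delegated to morecantile; the hand-written function below is only an illustration, and its name `xyz_tile_bounds` is an assumption.

```python
import math
from typing import Tuple

EARTH_RADIUS_M = 6378137.0  # sphere radius used by Web Mercator (EPSG:3857)
HALF_WORLD_M = math.pi * EARTH_RADIUS_M  # half the width of the Web Mercator square


def xyz_tile_bounds(z: int, x: int, y: int) -> Tuple[float, float, float, float]:
    """Return (left, bottom, right, top) in EPSG:3857 metres for an XYZ tile."""
    n = 2 ** z  # number of tiles along each axis at zoom level z
    if not (0 <= x < n and 0 <= y < n):
        raise ValueError(f"tile {z}/{x}/{y} is outside the grid")
    size = 2 * HALF_WORLD_M / n  # tile edge length in metres
    left = -HALF_WORLD_M + x * size
    top = HALF_WORLD_M - y * size  # XYZ rows count down from the northern edge
    return left, top - size, left + size, top
```

The data can only be sliced with these bounds if its coordinates are expressed in the same CRS, which is why a mismatch has to be either rejected or handled by reprojection.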

Regarding point 2, I am not sure whether you think that is outside the scope of this project.

Thanks!

@benbovy benbovy mentioned this pull request Aug 6, 2021
xyz_router = XYZRouter()


@xyz_router.get("/tiles/{layer}/{time}/{z}/{x}/{y}")
Contributor


Perhaps we could make fewer assumptions here about the structure of the dataset. I'd rather see something like /tiles/{var}/{z}/{x}/{y} and allow setting time or any other extra dimension(s) as a query parameter.

Allowing flexible image formats /tiles/{var}/{z}/{x}/{y}.{format} would be great.

(If later xarray supports multiscale datasets pydata/xarray#4118 pydata/xarray#5376 it will be nice to have /tiles/{var}/{z}/{x}/{y}@{scale}x too)
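
As a rough illustration of the URL layout suggested above, the following standard-library sketch splits a request URL into the path parameters and the extra-dimension selectors carried by the query string. It is not a FastAPI route and not part of this PR; the function name `parse_tile_url` and the returned dictionary layout are assumptions.

```python
import re
from typing import Any, Dict
from urllib.parse import parse_qsl, urlsplit

# Matches /tiles/{var}/{z}/{x}/{y}.{format}
TILE_PATH = re.compile(
    r"^/tiles/(?P<var>[^/]+)/(?P<z>\d+)/(?P<x>\d+)/(?P<y>\d+)\.(?P<format>[A-Za-z0-9]+)$"
)


def parse_tile_url(url: str) -> Dict[str, Any]:
    """Split a tile URL into path parameters and extra-dimension selectors."""
    parts = urlsplit(url)
    match = TILE_PATH.match(parts.path)
    if match is None:
        raise ValueError(f"not a tile path: {parts.path!r}")
    fields = match.groupdict()
    return {
        "var": fields["var"],
        "z": int(fields["z"]),
        "x": int(fields["x"]),
        "y": int(fields["y"]),
        "format": fields["format"].lower(),
        # Every query parameter is treated as a selector on an extra dimension.
        "selectors": dict(parse_qsl(parts.query)),
    }
```

With this layout, `time` stops being special: a request such as `/tiles/temperature/3/4/5.png?time=2021-08-05` carries it as one selector among any others the dataset may need.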

Author


@benbovy

Perhaps we could make fewer assumptions here about the structure of the dataset. I'd rather see something like /tiles/{var}/{z}/{x}/{y} and allow setting time or any other extra dimension(s) as a query parameter.

Yes, it makes sense to be minimalist in the assumptions about the data structure.
In general the get_tile operation should be used by every tile service (XYZ, WMTS...). Different tile services can then implement their own logic for how other dimensions (time included) are managed.

These are some references about time dimension in WMTS:

Author


@benbovy which other image formats do you think it should support?

@benbovy
Contributor

benbovy commented Aug 6, 2021

I started adding router factory classes in #89.

Regarding data validation, I think that it would be better to expose the x and y dimensions as query parameters. Using a router factory class, it will be possible to override the default values (e.g., lon and lat).
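
A minimal sketch of that idea, for illustration only: a factory-like class holds the default dimension names, a per-request query can override them, and a differently configured instance changes the defaults. The class name `TileDimsFactory` and the query keys `x_dim` / `y_dim` are hypothetical and are not taken from #89.

```python
from dataclasses import dataclass
from typing import Collection, Mapping, Tuple


@dataclass
class TileDimsFactory:
    """Hypothetical factory holding the default spatial dimension names."""

    x_dim: str = "x"
    y_dim: str = "y"

    def resolve(self, query: Mapping[str, str], dataset_dims: Collection[str]) -> Tuple[str, str]:
        """Return the (x, y) dimension names to use for one request."""
        # Query parameters win over the factory defaults.
        x_name = query.get("x_dim", self.x_dim)
        y_name = query.get("y_dim", self.y_dim)
        for name in (x_name, y_name):
            if name not in dataset_dims:
                raise ValueError(f"dimension {name!r} not found in {sorted(dataset_dims)}")
        return x_name, y_name
```

For example, `TileDimsFactory(x_dim="lon", y_dim="lat")` would serve datasets with geographic dimension names without renaming them first.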

Regarding the color mapping parameters, I'm not sure why matplotlib would be required unless we want to support by default all colormaps available in matplotlib (I'd be OK with that). There is also colorcet, on which datashader depends.

@davidbrochart

It looks very promising @iacopoff! In fact, I'd like to base xarray-leaflet on xpublish when this PR gets in, as there is a lot of overlap.
In xarray-leaflet we use the Jupyter Server to serve the tiles, but I'm thinking this is not a great idea. One of the drawbacks is that the server runs in a different process from the kernel, which means we have to poll to check if the tile files have been generated (by the kernel) before the server can send them. I changed that in xtrude, where I use aiohttp to run a server inside the kernel, in an async task.
@benbovy I'm not sure how xpublish works: is it like a server that runs in a blocking way, or does it allow running async code concurrently? In the latter case, that would work with xarray-leaflet.
I recently added support for a colorbar in xarray-leaflet, based on matplotlib, but I think it is a bit overkill for just using colormaps, and colorcet looks great.
It looks like you're also using rioxarray to do the reprojection, but on the whole data array. In xarray-leaflet we reproject each tile individually. I was wondering if you expect the data array to be chunked, otherwise it could take up a lot of memory/CPU. Or maybe it is done lazily on the tile slices?
Also, in xarray-leaflet we have hooks at different levels of the data pipeline to allow for data transformation. The default transformation coarsens the data in order to get approximately the same resolution as the tile, and thus have an efficient reprojection. This is illustrated in this static map example. But you can customize the transformations as shown in this example of a dynamic map (you might want to run this one locally as sometimes you get a very slow machine in Binder, and dynamic maps are more CPU intensive). Would it make sense to have support for that in this PR?
Anyways, great work and looking forward to it!
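
To give a feel for the default coarsening described above, here is a back-of-the-envelope sketch of how a coarsening factor could be derived from the zoom level. It is not xarray-leaflet's actual implementation: the function names, the 256-pixel tile size and the use of the Web Mercator resolution at the equator are all assumptions made for illustration.

```python
import math

WORLD_WIDTH_M = 2 * math.pi * 6378137.0  # width of the Web Mercator square in metres
TILE_SIZE_PX = 256  # assumed tile size


def tile_resolution(z: int) -> float:
    """Metres per pixel of a tile at zoom level z (at the equator)."""
    return WORLD_WIDTH_M / (TILE_SIZE_PX * 2 ** z)


def coarsen_factor(data_resolution_m: float, z: int) -> int:
    """Number of data pixels to aggregate per axis so that the coarsened
    data has roughly the resolution of a tile at zoom level z."""
    if data_resolution_m <= 0:
        raise ValueError("data resolution must be positive")
    # Never upsample: at high zoom levels the data is used as-is.
    return max(1, math.floor(tile_resolution(z) / data_resolution_m))
```

The resulting factor could then be handed to something like xarray's `DataArray.coarsen(x=factor, y=factor, boundary="trim").mean()` before reprojecting, so that only a tile-sized amount of data goes through the expensive step.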

@benbovy
Contributor

benbovy commented Aug 8, 2021

Thanks for chiming in @davidbrochart.

I agree that there is quite some overlap with xarray-leaflet, at least for everything between an xarray dataset (it may be chunked data) and the created image tiles (projection / transform, mapping options, etc.). It would be great if somehow we could join efforts on this.

Since xpublish is front-end agnostic it might make sense to implement this functionality here. In fact, I think we could probably pick up many things already implemented in xarray-leaflet since it is in a more advanced stage of development regarding the generation of the tiles.

However, I'm not sure how easy or hard it would be for xarray-leaflet to rely on xpublish for serving the tiles:

  • A fastapi application like xpublish usually runs via uvicorn in a blocking way, although it seems possible to run it in a separate thread (see "Uvicorn cannot be shutdown programmatically", encode/uvicorn#742 (comment)). I haven't found anything yet about running it as an async task. Fastapi's path operation functions may be run asynchronously, but I don't have enough experience with combining multi-threading and async programming in Python to know whether they work well together.
  • Xpublish a priori serves a static collection of xarray Datasets, but we could probably imagine passing a mutable dictionary to xpublish.Rest. If we are running the server in a separate thread I guess it should be OK since the dictionary should never be updated in the server thread.
  • In Xpublish tiles are both generated and served on the server side. I'm not sure what would be the best approach for hooks like in xarray-leaflet. Serializing custom transform functions and passing them through the REST API? Or execute the transformations in the main kernel thread and give the transformed dataset to xpublish?

@benbovy
Contributor

benbovy commented Aug 8, 2021

Also, in xarray-leaflet we have hooks at different levels of the data pipeline to allow for data transformation. The default transformation coarsens the data in order to get approximately the same resolution as the tile, and thus have an efficient reprojection. [...] Would it make sense to have support for that in this PR?

Yes I think it makes a lot of sense (at least for the default transformation), in this PR or in a follow-up PR.

@benbovy
Contributor

benbovy commented Aug 8, 2021

Xpublish a priori serves a static collection of xarray Datasets, but we could probably imagine passing a mutable dictionary to xpublish.Rest.

Alternatively, we could create one server per dataset. Not sure it makes sense to have multiple (many!) threads with each their own event loop, though.

@iacopoff
Author

iacopoff commented Aug 12, 2021

Hi @davidbrochart, thanks for your inputs!

It looks like you're also using rioxarray to do the reprojection, but on the whole data array. In xarray-leaflet we reproject each tile individually. I was wondering if you expect the data array to be chunked, otherwise it could take up a lot of memory/CPU. Or maybe it is done lazily on the tile slices?

The xyz service so far assumes that the dataset projection equals the map projection; this is required for the tiling to work. In short, the user should take care of the reprojection outside xpublish. I thought this was a good way to reduce dependencies and keep the code simple. Regarding chunking, the rioxarray reproject method persists the data in memory, so either you save the reprojected data to disk and then read it back in chunks, or it may indeed not fit in memory if the dataset is large.
At the moment, if your dataset is chunked, xpublish will persist the data only once the image is created by datashader's shade, that is, on the individual tiles. Do you think the reprojection functionality should be included in xpublish at the tiling level?

Also, in xarray-leaflet we have hooks at different levels of the data pipeline to allow for data transformation. The default transformation coarsens the data in order to get approximately the same resolution as the tile, and thus have an efficient reprojection.

This is a nice functionality. I think Datashader also supports some of the transformations you are probably referring to, by sampling the raster (see Datashader's raster). I would probably keep this development for the next PR, but it is good to think about it now.

I recently added support for a colorbar in xarray-leaflet, based on matplotlib, but I think it is a bit overkill for just using colormaps, and colorcet looks great.

Also to reply to @benbovy, I think that for a simple xyz service we don't need a colorbar. However, if we are going to develop a WMTS, then it would be good to add a colorbar and legend, and it seems that datashader alone can't do that (see "How can I get legends and colorbars for my Datashader plot?"). So this may still be something for a later PR?

@benbovy thanks for developing the router factory! I will have a look at it.
Regarding the server I think it makes sense to make it non-blocking. From my side I think this may need a bit more research on how to do it properly, but I guess it fits well within the scope of this PR?

@benbovy
Contributor

benbovy commented Aug 12, 2021

Regarding the server I think it makes sense to make it non-blocking. From my side I think this may need a bit more research on how to do it properly, but I guess it fits well within the scope of this PR?

This could be done in another PR I think. I'm not sure this would be really useful in most use cases, but if it makes sense for xarray-leaflet and/or other use cases we may add a run_in_thread option (defaulting to False) to xpublish.Rest.serve() and implement the solution that I mentioned in my previous comment.

@iacopoff
Author

iacopoff commented Sep 9, 2021

Hi @benbovy, I have drafted the xyz-router-factory following #89. I think it works well particularly for setting optional parameters as you suggested.

Now, I guess this PR will eventually go through after #89, right?

@davidbrochart

A fastapi application like xpublish usually runs via uvicorn in a blocking way, although it seems possible to run it in a separate thread (see "Uvicorn cannot be shutdown programmatically", encode/uvicorn#742 (comment)). I haven't found anything yet about running it as an async task.

It is possible to run a FastAPI application as an async task using uvicorn, since uvicorn.Server.serve() is async (see encode/uvicorn#541 (comment)).
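
The pattern can be sketched with the standard library alone. The example below deliberately swaps uvicorn for a plain `asyncio.start_server`, so it is an illustration of "server as a task in the running event loop" rather than of uvicorn itself; with uvicorn, the awaited coroutine would be `uvicorn.Server(uvicorn.Config(app)).serve()`. The handler, the fixed response body and the function names are assumptions.

```python
import asyncio


async def handle(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    """Answer any request with a tiny fixed HTTP response (stand-in for a tile)."""
    # Consume the request line and headers up to the blank line.
    while (await reader.readline()) not in (b"\r\n", b"\n", b""):
        pass
    body = b"tile"
    head = f"HTTP/1.1 200 OK\r\nContent-Length: {len(body)}\r\nConnection: close\r\n\r\n"
    writer.write(head.encode("ascii") + body)
    await writer.drain()
    writer.close()
    await writer.wait_closed()


async def fetch(port: int) -> bytes:
    """Client code running in the same event loop as the server."""
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(b"GET /tiles/var/0/0/0 HTTP/1.1\r\nHost: localhost\r\n\r\n")
    await writer.drain()
    data = await reader.read()  # read until the server closes the connection
    writer.close()
    await writer.wait_closed()
    return data


async def main() -> bytes:
    server = await asyncio.start_server(handle, "127.0.0.1", 0)  # port 0: OS picks one
    port = server.sockets[0].getsockname()[1]
    # The server runs as a background task; the caller is not blocked.
    serving = asyncio.create_task(server.serve_forever())
    try:
        return await fetch(port)
    finally:
        serving.cancel()
        await asyncio.gather(serving, return_exceptions=True)
```

In a script this would be driven by `asyncio.run(main())`; in a Jupyter kernel, where an event loop is already running, one would `await main()` or create the task directly.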

In Xpublish tiles are both generated and served on the server side. I'm not sure what would be the best approach for hooks like in xarray-leaflet. Serializing custom transform functions and passing them through the REST API? Or execute the transformations in the main kernel thread and give the transformed dataset to xpublish?

Since the server will run in the Jupyter kernel as an async task, passing the transformation functions to the FastAPI app shouldn't be a problem, right?

@iacopoff
Author

@davidbrochart, I was experimenting a bit, trying to find ways of running uvicorn as a non-blocking server, and I came up with two options:

I am not familiar enough with async programming in Python, maybe you could tell if the second option is what you would need to make it work in xarray-leaflet.

I think it is possible to pass the transformation functions, they could be defined as class parameters, like in the xyz router factory class.

@davidbrochart

maybe you could tell if the second option is what you would need to make it work in xarray-leaflet.

Yes, the second option would be best.

@iacopoff
Author

iacopoff commented Dec 1, 2021

Hi @benbovy and @davidbrochart, in the last few days I have progressed a bit on the xyz router development, as some time in the future I will need this feature for another project.
However, I have been working in another branch that benefits from the refactoring in #89.

I think this current PR should be closed in favour of xyz-router-factory.
Let me know what your thoughts are on this matter.

There are two main improvements in this xyz service:

  • Transformers, which are callbacks that can be passed as parameters when instantiating the XYZFactory class.
  • Renders that take care of the tiles' colour mapping. There are two, one based on datashader (default) and the other based on matplotlib.
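
As a rough picture of how these two pieces fit together, here is a toy pipeline in plain Python. It is not the XYZFactory API: the class `TilePipeline`, its parameters and the grayscale renderer are hypothetical stand-ins, and a nested list plays the role of the tile data.

```python
from typing import Callable, List, Optional, Sequence

Grid = List[List[float]]
Transformer = Callable[[Grid], Grid]
Renderer = Callable[[Grid], List[List[int]]]


def grayscale_render(grid: Grid) -> List[List[int]]:
    """Map values linearly onto 0..255 (stand-in for a datashader or matplotlib renderer)."""
    flat = [value for row in grid for value in row]
    low, high = min(flat), max(flat)
    span = (high - low) or 1.0  # avoid dividing by zero on constant tiles
    return [[round(255 * (value - low) / span) for value in row] for row in grid]


class TilePipeline:
    """Hypothetical factory-like object configured with callbacks at instantiation."""

    def __init__(self, transformers: Sequence[Transformer] = (), render: Optional[Renderer] = None):
        self.transformers = list(transformers)
        self.render = render if render is not None else grayscale_render

    def make_tile(self, grid: Grid) -> List[List[int]]:
        # Apply the user-supplied transformers in order, then colour-map the result.
        for transform in self.transformers:
            grid = transform(grid)
        return self.render(grid)
```

Because the callbacks are ordinary Python callables passed at construction time, nothing needs to be serialized or sent through the REST API.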

I have created a repo with some notebooks that show the usage of transformers (where I basically follow @davidbrochart's dynamic.ipynb ) and renders.

Any feedback is much appreciated!

thanks

@davidbrochart

@iacopoff
Author

iacopoff commented Dec 2, 2021

@davidbrochart oops, it was set to private, I have changed it to public :)

@davidbrochart

I looked at the xyz_server-tiff.ipynb and xyz_client-tiff.ipynb notebooks, this looks good! You've basically reimplemented xarray-leaflet.
I'm also interested in using xpublish as a jupyverse plugin. Since both use FastAPI, this is a good match. That would allow us to create a JupyterLab extension for Zarr visualization, in 2D using Leaflet or in 3D using deck.gl (the equivalent of xarray-leaflet and xtrude).

@benbovy
Contributor

benbovy commented Dec 6, 2021

@iacopoff this looks great indeed! I should find some time to finish #89, so that we can then merge your work on an XYZ router into the main branch!

@davidbrochart jupyverse looks very interesting.

@iacopoff
Author

Hey @benbovy, any chance that you will have a look at #89?

@benbovy
Contributor

benbovy commented Feb 21, 2022

Hey @iacopoff, sorry for my long absence here! Yeah I should definitely find some time to finish #89. I need to check but I think it is mostly done.
