geedim is a Python package that supports downloading EE images with automatic tiling to bypass file size limits. I've been wanting to improve the download system in wxee for a while (see #19), and using geedim might be a good way to do that with the added bonus of removing most of the low-level thread and tempfile management that causes a lot of headaches. Ideally, I would replace the entire image downloading system with geedim, both for to_tif and for to_xarray.
It will be quite a bit of work just to figure out how feasible this is, so I'm going to start keeping track of and checking off potential incompatibilities below as I figure them out.
Possible Issues
Parallelizing - geedim uses threads to download tiles of large images whereas wxee uses threads to download images within collections. I'll need to figure out the feasibility of parallelizing on both dimensions or else download speed would tank on large collections of small images, which is the primary focus of wxee.
Download progress - geedim tracks progress of image tiles whereas I need to track progress of images in collections (or both would be fine). I give separate progress bars for retrieving data (requesting the download URLs) and the download itself because the URL request can take a lot of time, and I don't think this will be possible with geedim.
Tempfiles - I don't believe geedim supports tempfile outputs, but that's typically what you want when converting to xarray. I don't want to have to manage files manually, so I'll need to think more about how this will work. Maybe just create temp directories and download into them?
File-per-band - geedim automatically sets filePerBand=False for all downloads. I'll need to do some rewriting to load xarray objects from multi-band images, but that may improve performance on the IO side by reading/writing fewer files.
Masking - wxee takes a nodata argument and replaces masked values with that value. After downloading, it sets that value in the image metadata or xarray.Dataset. geedim takes a different approach: it adds a "FILL_MASK" band to the image before downloading. The advantage of the geedim approach is that you don't need to choose between exporting everything as a float or risking assigning nodata to real values. The cost is downloading more data from EE, and once you actually get the image into xarray and mask it there's no advantage, since xarray will promote everything to float64 anyway to accommodate NaN values. I'll probably live with the geedim approach by applying and removing the mask band after downloading, but I should do some experiments to see how that affects performance (and to make sure I fully understand the geedim approach).
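For the parallelizing and tempfile items above, here is a rough sketch of what threading on both dimensions might look like, with everything written into a temp directory. It uses only the standard library. `fetch_tile`, `download_image`, and `download_collection` are hypothetical names, and `fetch_tile` is a stand-in that makes no network call; none of this is geedim's or wxee's actual API.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def fetch_tile(image_id: str, tile_index: int) -> bytes:
    """Hypothetical stand-in for one tile request; no network call is made."""
    return f"{image_id}:{tile_index}".encode()


def download_image(image_id, n_tiles, out_dir, tile_pool):
    """Fetch all tiles of one image on the shared tile pool, then write one file."""
    tiles = list(tile_pool.map(lambda i: fetch_tile(image_id, i), range(n_tiles)))
    path = out_dir / f"{image_id}.bin"
    path.write_bytes(b"|".join(tiles))
    return path


def download_collection(image_ids, n_tiles=4, image_workers=4, tile_workers=8):
    """Download every image into a temp directory and return the file contents."""
    with tempfile.TemporaryDirectory() as tmp:
        out_dir = Path(tmp)
        # Two separate pools: image tasks block while waiting on their tiles,
        # so tiles need their own workers to avoid starving each other.
        with ThreadPoolExecutor(image_workers) as image_pool, \
                ThreadPoolExecutor(tile_workers) as tile_pool:
            paths = list(
                image_pool.map(
                    lambda image_id: download_image(image_id, n_tiles, out_dir, tile_pool),
                    image_ids,
                )
            )
        # Read the results before the temp directory is removed on exit.
        return {path.stem: path.read_bytes() for path in paths}
```

Keeping the tile pool separate from the image pool means a collection of many small images still saturates the image workers, while a single large image can still spread its tiles across threads. Whether geedim's own tile threading can be shared this way is an open question.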
Solved Issues
Setting filenames - The geedim.MaskedImage class exposes and caches EE properties, so building filenames from metadata is straightforward. The only consideration is that we need to persist that MaskedImage instance throughout the download process to avoid having to retrieve properties multiple times.
Worth noting that ee.Image.getDownloadURL now supports a format parameter that can be used to download images directly instead of to an intermediate ZIP. That would further simplify and speed up IO if downloading was done with filePerBand=False.
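As a minimal sketch of the direct download mentioned above: the helper below builds a parameter dictionary with `format` set to `"GEO_TIFF"` and passes it to `getDownloadURL`. The helper names `build_geotiff_params` and `get_geotiff_url` and the default `crs` are assumptions for illustration; `image` is expected to be an `ee.Image` from an already initialized Earth Engine session. As far as I understand the Earth Engine docs, `filePerBand` is ignored when the format is `"GEO_TIFF"`, so it is left out here.

```python
def build_geotiff_params(name, region, scale, crs="EPSG:4326"):
    """Build getDownloadURL parameters that request a single GeoTIFF, not a ZIP."""
    return {
        "name": name,
        "region": region,  # an ee.Geometry, GeoJSON, or coordinate list
        "scale": scale,  # pixel size in units of the CRS
        "crs": crs,  # assumed default, adjust as needed
        "format": "GEO_TIFF",  # the default is "ZIPPED_GEO_TIFF"
    }


def get_geotiff_url(image, name, region, scale, crs="EPSG:4326"):
    """Request a direct GeoTIFF download URL for an ee.Image."""
    return image.getDownloadURL(build_geotiff_params(name, region, scale, crs))
```

The returned URL can then be streamed straight to a .tif file, which skips the unzip step entirely.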
I've decided not to pursue this further. Directly integrating wxee and geedim involves too many complications to be worth tackling. I also have reservations about bypassing the Earth Engine file size limits, since those are obviously in place for a reason.