Why did you decide to download the images? #322

Open
3enedix opened this issue Sep 7, 2022 · 4 comments
Labels
question Further information is requested

Comments

@3enedix

3enedix commented Sep 7, 2022

Hi all,

first of all, thanks for your incredible work, this toolbox is exactly what I need.

Okay, almost exactly. I would like to extract sandy, muddy and mangrove shorelines worldwide over a period of approximately 30 years. To avoid buying loads of hard drives and running the code hundreds of times, I was hoping that a shoreline could also be extracted from an image held only temporarily in a variable, and that the code could then loop on to the next timestep/place.

Is there a special reason why you decided to download the images to local storage? Do you think it would be possible to extract shorelines without downloading the images?

I am quite new to GEE and would appreciate every hint!

(I hope this is the right place for this question and that it hasn't already been answered in the other 213 issues...)
Best wishes, Bene

@kvos
Owner

kvos commented Sep 8, 2022

Hi @CharliesWelt, it's a good question and this is the right channel to discuss it.

The CoastSat package uses GEE to filter the image collections, select the bands of interest, crop the images to the region of interest and then download them as .tif files. The analysis is then done locally with Python libraries such as scikit-image, scikit-learn, shapely and GDAL.
The advantage of this workflow is that we have full control over the image (pixel by pixel) and can extract the shoreline at sub-pixel resolution, optimise the thresholding algorithm, discard bad images, quality-control the shorelines, and much more.
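To make "thresholding" and "sub-pixel resolution" concrete, here is a minimal sketch in plain Python. It is not CoastSat's code (CoastSat works on full 2-D images with scikit-image); it only shows the idea on a single 1-D transect. It assumes a water index where water is positive and land negative (e.g. MNDWI), hypothetical index values, and 15 m pixels. Otsu's method picks the split that maximises between-class variance, and the shoreline position is then linearly interpolated between the two pixels that bracket the threshold, which is what makes it finer than one pixel.

```python
from typing import Optional, Sequence


def otsu_threshold(values: Sequence[float]) -> float:
    """Otsu's method on the raw values: choose the split that maximises
    the between-class variance w0 * w1 * (mu0 - mu1)**2."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    best_t, best_var = xs[0], -1.0
    left_sum = 0.0
    for i in range(n - 1):
        left_sum += xs[i]
        if xs[i] == xs[i + 1]:
            continue  # no threshold fits between two equal values
        w0, w1 = i + 1, n - i - 1
        mu0 = left_sum / w0
        mu1 = (total - left_sum) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var = var_between
            best_t = (xs[i] + xs[i + 1]) / 2  # midpoint between the two classes
    return best_t


def subpixel_crossing(profile: Sequence[float], threshold: float,
                      pixel_size: float = 1.0) -> Optional[float]:
    """Distance along a transect where the profile first crosses the
    threshold, linearly interpolated between the two bracketing pixels."""
    for i in range(len(profile) - 1):
        a, b = profile[i], profile[i + 1]
        if a != b and (a - threshold) * (b - threshold) <= 0:
            frac = (threshold - a) / (b - a)
            return (i + frac) * pixel_size
    return None


# Hypothetical water-index values along one cross-shore transect,
# one value per pixel (land negative, water positive).
transect = [-0.6, -0.5, -0.4, 0.2, 0.5, 0.6]
t = otsu_threshold(transect)
distance_m = subpixel_crossing(transect, t, pixel_size=15.0)
print(t, distance_m)  # threshold near -0.1, crossing roughly 37.5 m along the transect
```

The crossing lands halfway between the third and fourth pixels, a position a per-pixel classification alone could not report.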

Others have developed a different approach in which everything is done on the GEE server; see, for example, the global-scale work by Luijendijk et al. 2019 using yearly composites (which sounds very similar to what you are proposing). You can process images directly in the cloud with the GEE API, but with more limited functionality and less control over the individual pixels of the image. Also keep in mind that the GEE code is not open-source, so you can't inspect the source code to know exactly what each function is doing.
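For readers unfamiliar with composites: a yearly composite reduces all images of one year to a single image by taking a per-pixel statistic, which suppresses clouds and transient noise. On GEE this kind of reduction runs server-side (for example `ee.ImageCollection(...).filterDate(start, end).median()`). The pure-Python sketch below only illustrates the concept. It assumes images are small 2-D grids with `None` for cloud-masked pixels, and it uses the median as an example reducer; the reducer used in the cited study is not described in this thread.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, List, Optional, Sequence, Tuple

Grid = List[List[Optional[float]]]


def yearly_median_composites(images: Sequence[Tuple[str, Grid]]) -> Dict[int, Grid]:
    """Group images by year ('YYYY-MM-DD' dates) and take the per-pixel
    median of the unmasked values; pixels masked in every image stay None."""
    by_year: Dict[int, List[Grid]] = defaultdict(list)
    for date, grid in images:
        by_year[int(date[:4])].append(grid)

    composites: Dict[int, Grid] = {}
    for year, grids in sorted(by_year.items()):
        rows, cols = len(grids[0]), len(grids[0][0])
        comp: Grid = []
        for r in range(rows):
            row: List[Optional[float]] = []
            for c in range(cols):
                # Ignore masked (e.g. cloudy) observations of this pixel.
                vals = [g[r][c] for g in grids if g[r][c] is not None]
                row.append(median(vals) if vals else None)
            comp.append(row)
        composites[year] = comp
    return composites


# Hypothetical 1x2 images; None marks a cloud-masked pixel.
result = yearly_median_composites([
    ("2020-01-05", [[1, None]]),
    ("2020-06-01", [[3, 4]]),
    ("2020-09-01", [[2, 6]]),
    ("2021-03-01", [[None, None]]),
])
print(result)
```

The trade-off mentioned above is visible here: once images are reduced to one per year, the individual acquisitions, and the pixel-level control over them, are gone.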

I personally use loads of hard drives, as you mentioned, to generate shoreline time-series over large spatial scales; see for example the CoastSat website. I like to keep a copy of the images in case I need to reprocess the datasets, but you could very well delete the images after extracting the shorelines to minimise disk usage. In my experience the bottleneck, timewise, is the image download, as the shoreline extraction itself is very fast (as long as you break the coast down into small polygons; ~25-30 km² seems to be the optimum).
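The "delete the images after extracting the shoreline" option could look roughly like the sketch below. It is not part of CoastSat: `download_image` and `extract_shoreline` are hypothetical callables standing in for whatever download and extraction steps you use. Each image is written to its own temporary folder, the shoreline is extracted, and the folder (with the .tif) is removed before the next image, so only the small shoreline results accumulate on disk.

```python
import tempfile
from pathlib import Path
from typing import Callable, Dict, Iterable, List, Tuple

Point = Tuple[float, float]


def shorelines_without_archive(
    image_ids: Iterable[str],
    download_image: Callable[[str, Path], Path],  # hypothetical: saves a .tif, returns its path
    extract_shoreline: Callable[[Path], List[Point]],  # hypothetical: .tif -> shoreline points
) -> Dict[str, List[Point]]:
    """Download, process and immediately discard each image in turn."""
    results: Dict[str, List[Point]] = {}
    for image_id in image_ids:
        # The temporary folder and everything in it is deleted when the block exits.
        with tempfile.TemporaryDirectory() as tmp:
            tif_path = download_image(image_id, Path(tmp))
            results[image_id] = extract_shoreline(tif_path)
        # tif_path no longer exists here; only the shoreline is kept.
    return results
```

The cost of this design is the one Kilian points out: if you later want to reprocess (new threshold, new classifier), every image has to be downloaded again.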

Good luck with your project,
Kilian

@dbuscombe-usgs

This is a nice discussion, and since I have thought about some of these issues, I would also like to chime in with the following reasons why a local workflow generally makes sense:

  1. cloud computing is costly and may be a barrier to uptake (it is for me, for example; I have no institutionally provided access to cloud computing). That said, you could run CoastSat on a cloud provider, either from a terminal with an X server for graphics or through JupyterHub, so that you are not downloading images to your personal machine
  2. cloud computing would make more sense if the processing routines in CoastSat were computationally demanding, but they are not particularly so. Download times would not necessarily improve on a cloud provider unless you were working directly in GEE
  3. image classifiers are not perfect; if you wish to develop your own classifier, that is easiest in a local workflow because it may require iteration. Also, if better classifiers are developed in the future, you can simply point them at the imagery you have already downloaded
  4. it would be nice to simply download the shorelines and other results from the cloud computer, but where there are errors it is often instructive to visualize the images. Also, cloud computers are typically VMs with a finite lifetime, so you would eventually have to download everything anyway if you wanted to archive your entire project
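Point 3 (pointing a new classifier at imagery already on disk) amounts to walking the local archive and applying the classifier to each file. A minimal sketch, assuming the images are stored as .tif files somewhere under one root folder; `classify` is a hypothetical callable, and the demo uses a stand-in that just returns the file size rather than a real classifier.

```python
import tempfile
from pathlib import Path
from typing import Callable, Dict, TypeVar

R = TypeVar("R")


def reprocess_archive(root: Path, classify: Callable[[Path], R],
                      pattern: str = "*.tif") -> Dict[str, R]:
    """Apply a (new) classifier to every image already stored under root."""
    results: Dict[str, R] = {}
    # rglob walks all sub-folders, e.g. one folder per site or satellite.
    for image_path in sorted(root.rglob(pattern)):
        results[image_path.relative_to(root).as_posix()] = classify(image_path)
    return results


# Demo on a throwaway archive; the lambda is a stand-in for a real classifier.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "site_A").mkdir()
    (root / "site_A" / "2020-01-01.tif").write_bytes(b"x")
    (root / "site_A" / "notes.txt").write_text("not an image")
    (root / "site_B").mkdir()
    (root / "site_B" / "2021-06-30.tif").write_bytes(b"xy")
    demo = reprocess_archive(root, lambda p: p.stat().st_size)

print(demo)
```

Because the archive is local, re-running with a different `classify` costs no downloads, which is the point being made about iterating on classifiers.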

@3enedix
Author

3enedix commented Sep 13, 2022

Hi Kilian and Dan,

thanks for explaining your thoughts! I see a lot of good points (especially the reproducibility argument), and I need to think more about the others and learn more about GEE. So far I had naively assumed that since one can 'store' an image in a variable (with the Python API), it should be possible to manipulate that variable with functions from other toolboxes. But apparently that's wrong... I'll keep learning.
And I agree that using another cloud server would not help, as it would still require downloading the images, just to that server instead.

Thank you!

@kvos kvos added the good first issue Good for newcomers label Sep 30, 2022
@kvos kvos added the question Further information is requested label Oct 23, 2022
@kvos kvos removed the good first issue Good for newcomers label Nov 21, 2022
@kvos kvos added the good first issue Good for newcomers label Feb 16, 2023
@kvos kvos removed the good first issue Good for newcomers label Apr 9, 2024
@neon-ninja

neon-ninja commented Jul 24, 2024

timewise, the bottleneck is on the image downloads

Is it possible to download multiple images in parallel? Perhaps with https://tqdm.github.io/docs/contrib.concurrent/?

Edit, to answer my own question: yes, as long as you parallelise by site.
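A minimal sketch of parallelising by site with the standard library's `ThreadPoolExecutor` (tqdm's `thread_map` from the linked `tqdm.contrib.concurrent` page wraps the same idea and adds a progress bar). Threads suffice because downloading is I/O-bound. `download_site` is a hypothetical callable; in CoastSat it might wrap `SDS_download.retrieve_images(inputs)` with a per-site `inputs` dictionary, but check that against your version. Splitting work by site presumably works because each job then writes only to its own site folder. Keep `max_workers` modest, since GEE may throttle many concurrent requests.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable, Tuple, TypeVar

T = TypeVar("T")


def download_sites_in_parallel(
    sites: Iterable[str],
    download_site: Callable[[str], T],  # hypothetical: downloads every image for one site
    max_workers: int = 4,
) -> Tuple[Dict[str, T], Dict[str, Exception]]:
    """Run one download job per site concurrently.

    Assumes each job writes only to its own site folder, so workers never
    touch the same files.
    """
    results: Dict[str, T] = {}
    errors: Dict[str, Exception] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(download_site, site): site for site in sites}
        for future in as_completed(futures):
            site = futures[future]
            try:
                results[site] = future.result()
            except Exception as exc:  # one failing site should not stop the rest
                errors[site] = exc
    return results, errors
```

Collecting failures per site, rather than letting the first exception abort everything, makes it easy to retry only the sites that failed.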

4 participants