
Feature Request: Automatic Good/Bad Image Filter #171

Open
6 tasks
2320sharon opened this issue Aug 15, 2023 · 3 comments
Assignees
Labels
enhancement New feature or request Optional An optional feature that's not necessary to be built. V2 for version 2 of coastseg

Comments

@2320sharon
Collaborator

2320sharon commented Aug 15, 2023

Good Bad Image Filter

Description:
Users need a way to automatically discard images that are not usable.

Proposed Solution:
Using rioxarray, we can create a dataset of all the downloaded images. Then, by comparing each image against the time-averaged image, the RMSE (Root Mean Squared Error) and PSNR (Peak Signal-to-Noise Ratio) for each image can be determined. Good images are characterized by a low RMSE (indicating that the pixel values don't differ much from the time-averaged image) and a high PSNR (a logarithmic measure of similarity to the time-averaged image, with a higher value indicating a better-quality image).
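A rough sketch of the proposed scoring (plain NumPy is used here in place of rioxarray so the example stays self-contained; `rmse_psnr` and the synthetic image stack are illustrative, not CoastSeg code):

```python
import numpy as np

def rmse_psnr(image, reference, max_value=255.0):
    """Score an image against a reference (e.g. the time-averaged image).

    Returns (RMSE, PSNR). Low RMSE / high PSNR suggests a "good" image.
    """
    image = np.asarray(image, dtype=np.float64)
    reference = np.asarray(reference, dtype=np.float64)
    mse = np.mean((image - reference) ** 2)
    rmse = float(np.sqrt(mse))
    if mse == 0:
        return rmse, float("inf")  # identical images: infinite PSNR
    psnr = 10.0 * np.log10(max_value ** 2 / mse)
    return rmse, float(psnr)

# Build a stack of synthetic images, time-average them, and score each one.
rng = np.random.default_rng(0)
stack = rng.integers(0, 256, size=(5, 64, 64)).astype(np.float64)
time_avg = stack.mean(axis=0)
scores = [rmse_psnr(img, time_avg) for img in stack]
```

With real data the stack would be built with `rioxarray` from the downloaded GeoTIFFs; the scoring itself is the same elementwise arithmetic.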

Benefits:

  • Automatically sorts out good and bad imagery, enhancing the user experience and ensuring quality.
  • By automatically sorting out the bad imagery, better-quality shorelines can be extracted.
  • By automatically sorting out the bad imagery, fewer images need to be segmented, thus saving time.

Drawbacks:

  • Adds xarray as a dependency.
  • Adds rioxarray as a dependency.
  • Could slow down the process of extracting shorelines.

Additional Context:

Checklist:

  • Implement the RMSE and PSNR calculations using rioxarray.
  • Add the new dependencies to pyproject.toml.
  • Decide where to implement automatic good/bad filtering.
  • Build a prototype on a separate branch
  • Test the filter on a sample set of images.
  • Update documentation to explain the new feature and its dependencies.

Peak Signal to Noise Ratio Explanation

PSNR stands for Peak Signal-to-Noise Ratio. It's a metric used primarily in the fields of image and video processing to measure the quality of a reconstructed or compressed image (or video) as compared to the original one. Essentially, it quantifies how much the reconstructed image differs from the original image. The higher the PSNR, the closer the reconstructed image is to the original, and hence the better the quality.
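In formula form, for images with maximum possible pixel value MAX (255 for 8-bit imagery), PSNR is defined via the mean squared error between the reconstructed image $I$ and the original $K$ (both $m \times n$):

```math
\mathrm{MSE} = \frac{1}{mn}\sum_{i=0}^{m-1}\sum_{j=0}^{n-1}\bigl(I(i,j)-K(i,j)\bigr)^2,
\qquad
\mathrm{PSNR} = 10\,\log_{10}\!\left(\frac{\mathrm{MAX}^2}{\mathrm{MSE}}\right)
```

Note RMSE is just $\sqrt{\mathrm{MSE}}$, so the two proposed metrics are monotonic transforms of the same quantity.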

@2320sharon 2320sharon added the enhancement New feature or request label Aug 15, 2023
@2320sharon 2320sharon self-assigned this Aug 15, 2023
@dbuscombe-usgs
Member

dbuscombe-usgs commented Aug 17, 2023

I had some comments that relate to this issue here #168 (comment)

In summary, I think I prefer a less aggressive approach to filtering images than what you (we) propose here. Even though I did suggest it as a potential solution, on reflection, and having looked at the PSNR and RMSE scores of a lot of imagery from different locations this week, I think it would be very hard to determine a threshold that works well. I think we would end up throwing out a lot of good images and keeping a lot of bad images, unless we tweaked the PSNR (or RMSE) threshold quite a bit. Additionally, it would not be useful for short time-series, because it relies on a stable average image that would ideally be drawn from at least several tens of relatively good-quality images. That can sometimes be hard to come by, for example when limited to Landsat only, or to short time periods.

Instead, I think I would prefer the following:

  1. a simpler criterion for filtering out the worst images, such as the % black pixel filter set to a high threshold. Or we can continue to research how best to do this... ultimately I would like to use ML for this problem and have some ideas we can discuss
  2. the method I outlined in the filter_good_labels.py script in this zipped folder https://github.com/Doodleverse/CoastSeg/files/12297423/new_shoreline_detect_workflow.zip , which works by comparing each model output to the average of model outputs (I elaborate more here: Exploring ideas for new shoreline extraction routines for application on label outputs of 4-class segmentation models #168 (comment)). It does need several good outputs, but those outputs are much lower-dimensional, so stable averages require many fewer samples overall.
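Option 1 above could be sketched as follows (the function names and the 0.9 cutoff are illustrative placeholders, not CoastSeg's actual implementation):

```python
import numpy as np

def black_pixel_fraction(rgb, threshold=0):
    """Fraction of pixels that are black (all bands <= threshold)."""
    return float(np.mean(np.all(rgb <= threshold, axis=-1)))

def is_bad_image(rgb, max_black_fraction=0.9):
    # Set the cutoff high so only the worst (mostly no-data) images are rejected.
    return black_pixel_fraction(rgb) > max_black_fraction
```

A high `max_black_fraction` keeps the filter conservative, matching the "less aggressive" approach argued for above.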

The downside is that we have to compute the label for a lot of bad images, but we can devote some time to speeding up the model calls, which is Doodleverse/doodleverse_utils#31

Also, it still adds xarray and rioxarray as dependencies.

We can discuss.

@dbuscombe-usgs dbuscombe-usgs added V2 for version 2 of coastseg and removed V2 for version 2 of coastseg labels Aug 17, 2023
@2320sharon
Collaborator Author

2320sharon commented Aug 17, 2023

Hi Dan, I forgot to update this issue after I tested the filter_good_labels.py script. I was planning on using the logic outlined in that script to perform the good/bad model-output filtering. I'll make sure to update this issue tomorrow.

@dbuscombe-usgs
Member

However, I do think there are some good ideas in the original workflow #154 (comment) for filtering out glitchy images. I use the term 'glitch' to refer to sensor errors; they typically involve a completely different colorspace.

some examples from a site I was looking at today (I am pulling lots of examples of different types of noise together to form the basis of a ML training data set - yes a new attempt!)

Attached example images: 2003-04-06-21-40-27_RGB_L7, 2011-02-14-21-51-18_RGB_L7, 2020-02-23-21-36-02_RGB_L7

I think I will work on researching a new type of filter that uses ideas in the original workflow #154 (comment) for filtering out glitchy images. Then, after some testing, we can see whether it should be included in CoastSeg. So I am proposing we still implement this idea, but as a low-key filter that detects the really rare glitches. I would do this by seeing what the dominant colors are and throwing images out if those colors fall in a certain range.
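A hypothetical version of that dominant-color check (the coarse binning and the "bad" color range are placeholders that would need tuning against real glitch examples like those attached above):

```python
import numpy as np

def dominant_color(rgb, bins=8):
    """Most frequent coarse RGB color, returned as the bin-center value."""
    width = 256 // bins
    quantized = (np.asarray(rgb) // width).reshape(-1, 3)
    colors, counts = np.unique(quantized, axis=0, return_counts=True)
    return colors[counts.argmax()] * width + width // 2

# Placeholder range for a magenta-ish Landsat 7 glitch colorspace (illustrative only).
BAD_RANGES = [(np.array([96, 0, 96]), np.array([255, 64, 255]))]

def looks_like_glitch(rgb, bad_ranges=BAD_RANGES):
    dom = dominant_color(rgb)
    return any(np.all(dom >= lo) and np.all(dom <= hi) for lo, hi in bad_ranges)
```

Coarse quantization keeps the histogram small and makes the dominant-color estimate robust to pixel noise; real "bad" ranges would come from the ML training-set effort described above.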

Another idea I had to adapt this workflow was to throw out images smaller than the requested ROI. Here, xarray with dask would be useful to speed up reading the shape of each image. That's a common thing - yesterday I had 178 partial images out of a total of 920, or about one in five!
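A minimal sketch of that size check (the function name and 2% tolerance are hypothetical; in practice the shapes could be read lazily with xarray + dask, e.g. opening each file with a `chunks` argument so pixel data is never loaded):

```python
def is_partial(image_shape, roi_shape, tolerance=0.02):
    """True if an image covers less than the requested ROI.

    image_shape and roi_shape are (rows, cols); a small tolerance
    absorbs off-by-a-few-pixel differences in gridding.
    """
    rows, cols = image_shape[:2]
    exp_rows, exp_cols = roi_shape
    return rows < exp_rows * (1 - tolerance) or cols < exp_cols * (1 - tolerance)
```

Since only the array shape is needed, this filter is nearly free compared to scoring pixel values.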

2320sharon added a commit that referenced this issue Aug 17, 2023
@2320sharon 2320sharon added the V2 for version 2 of coastseg label Aug 24, 2023
@2320sharon 2320sharon moved this from Todo to In Progress in CoastSeg Project Oct 3, 2023
@2320sharon 2320sharon moved this from In Progress to Stuck in CoastSeg Project Dec 6, 2023
@2320sharon 2320sharon added the Optional An optional feature that's not necessary to be built. label Dec 22, 2023
2320sharon added a commit that referenced this issue May 9, 2024