Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clip/Mask data below threshold value #638

Open
siddharth0248 opened this issue Oct 29, 2024 · 7 comments
Open

Clip/Mask data below threshold value #638

siddharth0248 opened this issue Oct 29, 2024 · 7 comments
Assignees

Comments

@siddharth0248
Copy link

siddharth0248 commented Oct 29, 2024

Some data providers request clipping data below a specific threshold value so that the colormap starts at this threshold and extends to an upper limit. For example, in TM5 Var, we clipped the values below 0.48 and then set the min-max range in the metadata to 0.48 and 12. If we hadn’t clipped the data, we would have seen many more pixels, and we also want to avoid designating them as "no-data."
Currently, my approach involves masking out these values during the data transformation using the provided script. Could this functionality be incorporated into rio-tiler to enhance sustainability?

import rasterio
import numpy as np

url_original = "s3://ghgc-data-store/tm54dvar-ch4flux-monthgrid-v1/methane_emis_total_201601.tif"
url_masked = "s3://ghgc-data-store/tm54dvar-ch4flux-mask-monthgrid-v1/methane_emis_total_201601.tif"

with rasterio.open(url_original) as src:
    metadata = src.meta.copy()
    no_data_value = -9999  
    metadata.update({
        'dtype': 'float32',
        'nodata': no_data_value
    })

    # Create an array to hold the clipped data for all bands
    clipped_data = np.empty((metadata['count'], metadata['height'], metadata['width']), dtype='float32')

    # Iterate through each band
    for i in range(metadata['count']):
        band_data = src.read(i + 1)

        # Create a mask for values below the threshold
        mask = band_data >= threshold

        # Clip pixels and assign no-data value
        clipped_data[i] = np.where(mask, band_data, no_data_value)

    with rasterio.open(output_tiff, 'w', **metadata) as dst:
        dst.write(clipped_data)
@siddharth0248
Copy link
Author

@j08lue

@siddharth0248 siddharth0248 self-assigned this Oct 29, 2024
@j08lue
Copy link
Collaborator

j08lue commented Oct 29, 2024

So, this is a simple threshold operation - all values below threshold are set to nodata.

The nodata value is also replaced/set (to a hard-coded value of -9999) in the process, though, but it looks like it had that value, at least in this file.

As we discussed, there is still no dedicated API functionality for conditional masking like this, but I expect nonetheless that it can be done with the current API, without modifying the data.

We should explore the use of an expression passed to TiTiler. These are just numexpr strings. Not sure they support where, though. Here is a URL to experiment with: https://earth.gov/ghgcenter/api/raster/cog/preview?url=s3://ghgc-data-store/tm54dvar-ch4flux-monthgrid-v1/methane_emis_total_201601.tif&colormap_name=purd&rescale=0.48%2C24

Alternatively, we could register a custom algorithm in TiTiler that does the masking - i.e. if we created a new algorithm called mask_below, or so, the API call could look like ?algorithm=mask_below&algorithm_params=10

@j08lue
Copy link
Collaborator

j08lue commented Oct 29, 2024

@siddharth0248, can you please confirm

  1. What is the value of threshold for the TM5 data?
  2. Can you please share an unmasked file? Or are files in the tm54dvar-ch4flux-monthgrid-v1 collection unmasked?

@j08lue
Copy link
Collaborator

j08lue commented Oct 29, 2024

@j08lue
Copy link
Collaborator

j08lue commented Oct 29, 2024

We should make sure we instruct users who e.g. load the data into Python or QGIS from the same source also apply this masking. Otherwise, they will get different values e.g. when calculating zonal statistics.

That is the disadvantage of applying the masking on the fly and not fixing the files - slightly worse interoperability.

@j08lue
Copy link
Collaborator

j08lue commented Nov 14, 2024

To configure this, put this expression into the sourceParams for the dataset, like here:

sourceParams:
assets: total
colormap_name: purd
rescale:
- 0.48
- 24

This block would become

 sourceParams: 
   assets: total 
   colormap_name: purd 
   rescale: 
     - 0.48 
     - 24 
   expression: "where(b1<10, -9999, b1)"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants