Discussion: how to handle nodata #31

alexgleith · 2023-11-26T23:58:35Z

We currently have a mix of ways of handling nodata annotation in xarrays.

Some code is using rio-xarray and some is using the convention of storing the nodata value on xr.atts["nodata".

We should ensure we consume and produce the nodata metadata the same, so we don't need multiple ways of checking, asserting, setting, and consuming nodata values.

The text was updated successfully, but these errors were encountered:

jessjaco · 2023-11-28T00:34:12Z

Yeah it looks like odc looks for a nodata attribute and rioxarray looks for a .rio.nodata value so I'm inclined to support each. I will check if rioxarray respects the .nodata attribute.

I do want to clean up the loaders, they're kind of a mess right now. But part of that is pending this convo

jessjaco · 2024-01-08T18:31:19Z

I'm trying to work through the current handling of nodata values in the OdcLoaderMixin, and understand how odc.stac.load handles things at the same time.

It looks like odc.stac.load sets the nodata attribute at the variable level, not the dataset level. This makes sense, as different variables can have different nodata values (see qa_pixel vs the other landsat bands). Which makes this assignment in the mixin redundant, I think:

dep-tools/dep_tools/loaders.py

Lines 182 to 183 in e2ecdce

    
           if self.nodata is not None: 
        
               xr.attrs["nodata"] = self.nodata

But I think this may be used in places, so we should check before removing.

So if the metadata exists or the nodata argument is passed, then each variable will have a .nodata attribute. I don't know what happens if neither occurs.

I would like to maintain compatibility with rioxarray, which describes its approach to nodata here. They too keep track of nodata at the DataArray / variable level. Additionally, the .rio.nodata accessor will read .attrs['nodata'] so I think we will be ok maintaining that as the goto. My only concern (and something worth testing) is if .attrs['nodata'] is set and .rio.write_nodata is called with another value, does .attrs['nodata'] get overwritten, or are they at odds?

jessjaco · 2024-01-08T18:35:32Z

Another thing to consider is xarray. Officially, xarray only supports nan as the nodata value.

An additional concern is propagation of .attrs['nodata'] (or other attributes). Even innocuous operations like da * 3 will erase the attribute (see https://docs.xarray.dev/en/stable/user-guide/computation.html#missing-values). At they link, they do suggest changing one of the global xarray options (keep_attrs) to override this behavior, though we have to decide if we want to override a default option as a matter of course! Also see rioxarray's discussion of this.

alexgleith · 2024-01-08T22:27:45Z

There's a mix of compatibility in our current implementation!

I know that odc-stac handles nodata per-band, and you can pass in information to help (this kind of config stuff). We should just hardcode that for S-2 and Landsat probably, and then once it's done, we can just assume it's there.

Xarray is hard to influence... they do a few things that break the work we're doing. Some operations lose the CRS, for example. Being consistent with keep_attrs is a good idea...

Maybe it's worth coming up with a consolidated picture between us, and then checking in with others like Robbi and Kirill and influencing in the right place to encourage broader consistency! That's a longer term goal, but will benefit many.

the .rio.nodata accessor will read .attrs['nodata'] so I think we will be ok maintaining that as the goto

This sounds good to me.

jessjaco · 2024-01-08T22:51:24Z

Right, I intend to take out a lot of our nodata handling, because odc.stac.load (and stackstac, if I ever update that interface) already handle it in a fairly robust manner. In fact, I am planning to take out of a lot of our wrappers around odc.stac.load, because they don't really help.

The only operation I'd like to keep is if data are loaded as a float type (potentially non-nan) float values are converted to nan. I may make that optional. Otherwise it's up to us to support propagation of non-nan nodata attribtues. I'll do my best, but I don't really work with non-float xarray objects, until I'm at or very close to writing.

alexgleith · 2024-01-08T22:58:46Z

In fact, I am planning to take out of a lot of our wrappers around odc.stac.load

Good idea :-)

I don't really work with non-float xarray objects

I think we have to in some situations, unfortunately. I really don't want to write intermediate files, and as such, want to keep everything in memory and saving half of a big number of memory makes a significant difference! The GeoMAD for Sentinel-2 for example keeps everything as uint16 and still uses something like 150 GB of memory. The Coastlines process I'm running in Vietnam is similar, at the most extreme case (hilariously with 1 year of data for S-2 and ~35 years with Landsat!)

So, I'm pretty firm on us needing both an int and a float path.

jessjaco · 2024-01-08T23:37:47Z

Well it's a matter of approach too. I almost never bring big float arrays into memory, just chunks. (That was the biggest reason I ended up modifying a lot of the existing coastlines code)

alexgleith changed the title ~~Discussion: how to handle Nodata~~ Discussion: how to handle nodata Nov 26, 2023

jessjaco mentioned this issue Feb 12, 2024

0.3.0 candidate #50

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Discussion: how to handle nodata #31

Discussion: how to handle nodata #31

alexgleith commented Nov 26, 2023

jessjaco commented Nov 28, 2023

jessjaco commented Jan 8, 2024

jessjaco commented Jan 8, 2024 •

edited

Loading

alexgleith commented Jan 8, 2024

jessjaco commented Jan 8, 2024

alexgleith commented Jan 8, 2024

jessjaco commented Jan 8, 2024

Discussion: how to handle nodata #31

Discussion: how to handle nodata #31

Comments

alexgleith commented Nov 26, 2023

jessjaco commented Nov 28, 2023

jessjaco commented Jan 8, 2024

jessjaco commented Jan 8, 2024 • edited Loading

alexgleith commented Jan 8, 2024

jessjaco commented Jan 8, 2024

alexgleith commented Jan 8, 2024

jessjaco commented Jan 8, 2024

jessjaco commented Jan 8, 2024 •

edited

Loading