Managing fill values within the parameter function framework #177
I would actually argue that function authors are responsible for managing fill values. That's why we have an external interface to retrieve a parameter's fill value. Coverage model should be responsible for storing and regurgitating data, not making assumptions on what the user wants to do with it. The example listed above actually demonstrates why I believe that to be crucial. Observations 1, 5, 9, 10, 14, 15, and 20 each have a fill for either temperature or pressure but not both. |
I think it's unreasonable to ask them that.
I would agree it's probably the user's responsibility, and the user is me. I will absolutely deal with it if necessary, but I'm just going to end up using the same handler and code everywhere I use the coverage model, so I would prefer to put that code in the coverage model to reduce duplication. However, in the interest of not building a monolithic data model, I agree that if the coverage model is purely something that stores and reads data, that code doesn't belong there. Unfortunately, it has been overloaded with the responsibility of also managing data processing. |
So how do I manage that? Remove every index that has fill values? In the case of your array, all values (time, temperature, and pressure) would have to be masked at every index where any of the parameters has a fill value. Valid data gets masked only because a parameter function might not process it correctly. Are you suggesting yet another argument flag for the get_parameter_values method? On Jun 2, 2014, at 10:49 AM, Luke Campbell wrote:
|
Something like this maybe:

```python
import numpy as np

# ParameterFunctionType comes from the coverage model; both functions are
# meant to be coverage methods, hence the explicit self.

def get_wrapper(self, parameter, time_bounds, fill_value_mask=True):
    context = self.get_parameter_context(parameter)
    ptype = context.param_type
    if not isinstance(ptype, ParameterFunctionType):
        return None
    args, mask = self.build_arg_map(parameter, time_bounds, fill_value_mask)
    # Get the base data type and shape
    value_encoding = ptype.value_encoding  # I think it's here
    shape = self.get_values('time', time_bounds).shape
    # Start with an array of fill values; only valid indexes get overwritten
    retval = np.empty(shape, dtype=value_encoding)
    retval[:] = context.fill_value
    if fill_value_mask:
        retval[mask] = ptype.function._callable(*args)
    else:
        retval[:] = ptype.function._callable(*args)
    return retval

def build_arg_map(self, parameter, time_bounds, fill_value_mask=True):
    ptype = self.get_parameter_context(parameter).param_type
    arg_list = ptype.function.arg_list
    # Assumes param_map maps each argument name to a parameter name
    arg_map = ptype.function.param_map
    args = []
    mask = None
    # Build positional arguments in the order the function declares them
    for arg_name in arg_list:
        param_name = arg_map[arg_name]
        context = self.get_parameter_context(param_name)
        array_value = self.get_values(param_name, time_bounds)
        # True where this input holds real data rather than its fill value
        valid = ~np.isclose(array_value, context.fill_value, equal_nan=True)
        mask = valid if mask is None else (mask & valid)
        args.append(array_value)
    if fill_value_mask:
        # Drop every index where any input is a fill
        args = [a[mask] for a in args]
    return args, mask
```
|
Unless I'm missing something, the above appears to be just how to do the masking. My question/statement is more along these lines: |
With what I've pasted, it doesn't throw away indices. It first creates an in-memory array made up entirely of fill values, then uses the mask to fill in the valid indexes. The key here is that we never call the function with an array that contains missing values, so the function doesn't have to treat missing values in any special way. In a simple scenario, we have a parameter function sum:

```python
def sum(x, y):
    return x + y
```

In our coverage, if x is [-9999, 1, 2, 3] and y is [10, 11, 12, 13], the full return dictionary would be:
Behind the scenes, sum only ever got called like this:
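A minimal standalone sketch of the behavior described above, assuming -9999 is the fill value for x, y and the output alike. `evaluate_masked` is a hypothetical helper, not a coverage model API, and `param_sum` stands in for the thread's `sum` so the builtin isn't shadowed:

```python
import numpy as np

FILL_VALUE = -9999  # assumption: one fill value shared by x, y and the output

calls = []  # records what the parameter function actually receives

def param_sum(x, y):
    """Stand-in for the thread's parameter function ``sum``."""
    calls.append((x.tolist(), y.tolist()))
    return x + y

def evaluate_masked(func, *inputs, fill_value=FILL_VALUE):
    """Hypothetical helper: call func only where every input is valid."""
    valid = np.ones(inputs[0].shape, dtype=bool)
    for arr in inputs:
        valid &= arr != fill_value
    # Every index starts as a fill; valid indexes are overwritten
    out = np.full(inputs[0].shape, fill_value, dtype=inputs[0].dtype)
    out[valid] = func(*(arr[valid] for arr in inputs))
    return out

x = np.array([-9999, 1, 2, 3])
y = np.array([10, 11, 12, 13])
result = {'x': x, 'y': y, 'sum': evaluate_masked(param_sum, x, y)}
print(result['sum'].tolist())  # [-9999, 12, 14, 16]
print(calls)                   # [([1, 2, 3], [11, 12, 13])]
```

The function is called once, on the three valid indexes only, and the fill at index 0 is carried through to the output.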
|
Okay, I see now what you are requesting. Really, this is just: if any of the args has a fill value at an index, insert the function parameter's fill value at that index in the function parameter's array. I'm still not a parameter function expert. Can you create some tests that exercise the faulty functionality for both mathematical expressions and Python function types so I can figure out the best approach? It doesn't need to be right away; I won't get to this for a few days. |
Yeah, I'll make some use-case tests. |
I will add a new method, get_valid_indexes(), to NumpyDictParameterData, the object returned from Coverage.get_parameter_data(). NumpyDictParameterData.get_data() will continue to return a dictionary of data with fill values. get_valid_indexes() will return a dictionary of numpy boolean arrays indicating whether the value at each index of the data arrays is valid or a fill. It should be up to the parameter function to use this information to perform calculations at valid indexes only. The array returned from the parameter function should be the same size as the referenced parameter's array, and it determines where fill values are placed. Example usage:

```python
ndpd = cov.get_parameter_values(['cond'], fill_indexes=True, ...)
data = ndpd.get_data()['cond']
valid_indexes = ndpd.get_valid_indexes()['cond']
# Start with every element set to the parameter's fill value
calculated_cond_array = np.empty(data.size, dtype=data.dtype)
calculated_cond_array[:] = fill_value
# Compute only at indexes where the input is valid
calculated_cond_array[valid_indexes] = data[valid_indexes]*2+1
```
|
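To make the proposed interface concrete without a coverage, here is a sketch using a hypothetical stand-in class (`SketchParameterData`, not the real NumpyDictParameterData) that derives the boolean arrays from per-parameter fill values, assumed here to be -9999.0:

```python
import numpy as np

class SketchParameterData:
    """Hypothetical stand-in for NumpyDictParameterData (not the real class)."""

    def __init__(self, data, fill_values):
        self._data = data                # name -> numpy array, fills included
        self._fill_values = fill_values  # name -> fill value of that parameter

    def get_data(self):
        return self._data

    def get_valid_indexes(self):
        # True where the element is real data, False where it is a fill
        return {name: ~np.isclose(arr, self._fill_values[name], equal_nan=True)
                for name, arr in self._data.items()}

fill_value = -9999.0
ndpd = SketchParameterData({'cond': np.array([1.0, -9999.0, 3.0])},
                           {'cond': fill_value})
data = ndpd.get_data()['cond']
valid_indexes = ndpd.get_valid_indexes()['cond']
calculated = np.full(data.size, fill_value, dtype=data.dtype)
calculated[valid_indexes] = data[valid_indexes] * 2 + 1
print(calculated.tolist())  # [3.0, -9999.0, 7.0]
```

Using `np.isclose(..., equal_nan=True)` also covers parameters whose fill value is NaN, where a plain `!=` comparison would never match.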
This looks good, could you push something like this so I can test it out? |
In order to be precise in the fill value mapping, I need to know which indexes returned from a function parameter evaluation are fills. This means that the interface needs to be changed all the way through so the actual function being called can create and return the mapping. |
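One way such a changed interface could look, sketched under the assumption that a parameter function returns a `(values, valid)` pair instead of a bare array. Both `ratio_with_mask` and `apply_function_mask` are hypothetical, and a zero divisor is used only as an example of the function deciding on its own that an index should be a fill:

```python
import numpy as np

FILL_VALUE = -9999.0  # assumed fill value for the output parameter

def ratio_with_mask(a, b):
    """Hypothetical parameter function returning (values, valid).

    It receives only non-fill inputs but flags some output indexes
    (here: a zero divisor) as fills itself.
    """
    valid = b != 0
    values = np.zeros(a.shape, dtype=np.float64)
    values[valid] = a[valid] / b[valid]
    return values, valid

def apply_function_mask(func, *inputs, out_fill=FILL_VALUE):
    """Place out_fill wherever the function reported an invalid index."""
    values, valid = func(*inputs)
    out = np.full(values.shape, out_fill, dtype=values.dtype)
    out[valid] = values[valid]
    return out

result = apply_function_mask(ratio_with_mask,
                             np.array([1.0, 2.0, 3.0]),
                             np.array([1.0, 0.0, 2.0]))
print(result.tolist())  # [1.0, -9999.0, 1.5]
```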
I was under the impression that we would filter out the indexes that are fills before we execute the function(s). Parameter functions shouldn't produce fill values on their own, I think. |
I’m talking about the abstract case where a parameter function receives non-fill data but determines that an index should be a fill. |
That's correct, to my knowledge. A fill value in its purest sense represents a lack of information from a sensor. In the parameter function's case, by contrast, the function always has information about its domain; where it doesn't, we use NaN, not as a fill value but to represent the absence of a real value, such as a division by zero or an input that does not map to a real result (finite domain problems like tangent). |
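A small NumPy sketch of that distinction, assuming a sensor fill value of -9999 and using `np.sqrt` as an example of a function with no real result for some inputs: the sensor gap stays a fill, while the input without a real result comes back as NaN.

```python
import numpy as np

FILL_VALUE = -9999.0  # assumed sensor fill value

x = np.array([4.0, 0.0, -1.0, FILL_VALUE])
valid = x != FILL_VALUE  # only the last element is missing sensor data

out = np.full(x.shape, FILL_VALUE)
with np.errstate(invalid='ignore'):
    # sqrt has no real result for -1.0, so that index becomes NaN
    out[valid] = np.sqrt(x[valid])
print(out.tolist())  # [2.0, 0.0, nan, -9999.0]
```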
Before evaluating a parameter function, you'd like all fill values removed. I interpret this as: Is that your expectation? |
I believe that's accurate, yes. |
Hiding data is a restriction on a parameter function's ability to make decisions. I believe you're okay with this, but I want to make sure we are all on the same page. |
Just to clarify, this behavior should be optional I think. |
I can make it optional to hide data. |
I don't think either is an option. Most parameter functions that I'm aware of can't or don't accept callbacks to the coverage model, and they don't explicitly accept masked arrays or another parameter identifying the fill value. Some might, but most don't, I think. If the option to mask fill values is not enabled, I would say we pass the arrays as-is into the parameter functions. I can't think of another option, but I'm open to suggestions. |
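A sketch of what the optional behavior could look like, assuming a single evaluator with a `mask_fills` flag (a hypothetical name, not an existing coverage model argument): with the flag on, fills are removed before the call and restored afterwards; with it off, the arrays go to the function as-is.

```python
import numpy as np

def evaluate(func, inputs, fill_values, out_fill, mask_fills=True):
    """Hypothetical evaluator; 'mask_fills' is an illustrative flag."""
    if not mask_fills:
        # Option disabled: hand the arrays to the function unchanged
        return func(*inputs)
    valid = np.ones(inputs[0].shape, dtype=bool)
    for arr, fv in zip(inputs, fill_values):
        valid &= ~np.isclose(arr, fv, equal_nan=True)
    out = np.full(inputs[0].shape, out_fill, dtype=np.result_type(*inputs))
    out[valid] = func(*(arr[valid] for arr in inputs))
    return out

x = np.array([-9999.0, 1.0, 2.0])
y = np.array([10.0, 11.0, 12.0])
masked = evaluate(np.add, [x, y], [-9999.0, -9999.0], -9999.0).tolist()
raw = evaluate(np.add, [x, y], [-9999.0, -9999.0], -9999.0,
               mask_fills=False).tolist()
print(masked)  # [-9999.0, 12.0, 14.0]
print(raw)     # [-9989.0, 12.0, 14.0]
```

With masking off, the fill at index 0 is added like an ordinary number and yields -9989.0, a value that no longer looks like a fill.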
I think maybe the use case got lost somewhere in this thread. If we have a sensor that is observing u and v vectors and we have a function that determines wind direction given the u and v vectors, the function is not meant to understand the data model or even the concept of flagged values. Its only intended purpose, architecturally, is to compute wind direction given u and v.
Now suppose the velocity sensor's u-component on the instrument fails or goes offline for whatever reason. The u vector will start publishing fill values (or nothing at all, which we then represent as fill values). Essentially my goal is this:

```python
import numpy as np

u = np.array([1, 0, 2, -9999, -9999], dtype=np.float32)
v = np.array([0.5, 1, 2, 1, 0], dtype=np.float32)
# Start with every element set to the fill value
wind_direction_var = np.ones(u.shape[0]) * -9999
# True only where both components hold real data
not_fill = (u != -9999) & (v != -9999)
# wind_direction is the parameter function; it only sees valid data
wind_direction_var[not_fill] = wind_direction(u[not_fill], v[not_fill])
```

It's not my goal to hide data; I'm just trying not to run a function with missing information, which would produce an array containing incorrect values. Instead of the result being the intended
without dealing with the fill values, the result would be
These are real values, which would not be desired, I think. |
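A runnable version of the wind example, assuming a hypothetical `wind_direction` that returns the direction the wind blows toward in degrees clockwise from north (the convention is an assumption; the thread doesn't define the function). It computes both the masked and the unmasked result:

```python
import numpy as np

FILL_VALUE = -9999.0  # fill value used in the thread's example

def wind_direction(u, v):
    """Hypothetical parameter function: direction the wind blows toward,
    in degrees clockwise from north (assumed convention)."""
    return np.degrees(np.arctan2(u, v)) % 360

u = np.array([1, 0, 2, -9999, -9999], dtype=np.float32)
v = np.array([0.5, 1, 2, 1, 0], dtype=np.float32)

# Masked evaluation: only indexes where both inputs are real data are computed
not_fill = (u != FILL_VALUE) & (v != FILL_VALUE)
masked = np.full(u.shape, FILL_VALUE, dtype=np.float32)
masked[not_fill] = wind_direction(u[not_fill], v[not_fill])

# Unmasked evaluation: the fills are treated as ordinary numbers
unmasked = wind_direction(u, v)
```

The masked result ends in two fill values, while the unmasked one reports roughly 270 degrees at those indexes, numbers that look like valid directions.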
The intent hasn't been lost. I understand your use case. I am writing a library that should be able to support multiple use cases. I am just trying to define the library's restrictions when it comes to parameter functions. |
For now, parameter functions are pretty simple, they can't interface with the coverage model and they don't deal with fill values directly (the only exceptions I'm aware of are QC, but that's because I'm the author and they're not used through the coverage model). |
The functions and/or the function authors aren't responsible for managing fill values, so if fill values are used as inputs, incorrect outputs will be produced. The coverage clients aren't responsible either.
It would be good if we can add in correct fill value support within the coverage model.
http://nbviewer.ipython.org/github/lukecampbell/notebooks/blob/master/Masked%20Arrays.ipynb
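The linked notebook is about NumPy masked arrays. As a rough sketch of how they could carry fill information, assuming -9999 is the fill value for both inputs: `np.ma.masked_values` masks the fills, arithmetic propagates the combined mask, and `filled` writes the fill value back out.

```python
import numpy as np

FILL_VALUE = -9999  # assumed fill value shared by both inputs

x = np.ma.masked_values(np.array([-9999.0, 1.0, 2.0, 3.0]), FILL_VALUE)
y = np.ma.masked_values(np.array([10.0, 11.0, -9999.0, 13.0]), FILL_VALUE)

# Arithmetic on masked arrays propagates the union of the input masks
total = x + y
out = total.filled(FILL_VALUE)
print(out.tolist())  # [-9999.0, 12.0, -9999.0, 16.0]
```

One caveat: a parameter function that calls `np.asarray` on its inputs gets the underlying data back without the mask, so masked arrays alone don't protect functions that are unaware of them.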