Data ranges
For the numerical feature types like REAL, we could add some descriptive statistics (min/max/avg/std) to increase the expressiveness of the schema. This way, we can:
- Use them for data validation at inference time. For example, a transformer can perform the task of feature data validation on received data points: when a feature value is not within the range defined by the min/max values, it can log the error accordingly, e.g. increase an outlier counter/metric.
- Use the training data distribution information to compare it against the calculated distributions of batches of inference requests, e.g. with a KL-based distance, and increase a skew/drift detection counter/metric (see the sketch after this list).
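A minimal sketch of how such statistics could be carried in the schema and consumed by a validating transformer. The `NumericalFeatureStats` and `ValidatingTransformer` names, and the histogram-based drift check, are hypothetical illustrations, not an existing API:

```python
from dataclasses import dataclass
from typing import Optional, Sequence

import numpy as np


@dataclass
class NumericalFeatureStats:
    """Descriptive statistics captured at training time (hypothetical schema extension)."""
    name: str
    min: float
    max: float
    mean: float
    std: float
    # Optional histogram of the training distribution, used for drift detection.
    bin_edges: Optional[np.ndarray] = None
    bin_probs: Optional[np.ndarray] = None


class ValidatingTransformer:
    """Sketch of a transformer that validates incoming values against the schema stats."""

    def __init__(self, stats: NumericalFeatureStats):
        self.stats = stats
        self.outlier_count = 0
        self.drift_score = 0.0

    def validate(self, value: float) -> None:
        # Range check: values outside [min, max] are counted as outliers.
        if not (self.stats.min <= value <= self.stats.max):
            self.outlier_count += 1

    def update_drift(self, batch: Sequence[float]) -> None:
        # Compare the batch distribution against the training histogram
        # using a KL-based distance.
        if self.stats.bin_edges is None or self.stats.bin_probs is None:
            return
        counts, _ = np.histogram(batch, bins=self.stats.bin_edges)
        q = counts / max(counts.sum(), 1)
        p = self.stats.bin_probs
        eps = 1e-12  # avoid log(0)
        self.drift_score = float(np.sum(p * np.log((p + eps) / (q + eps))))
```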
Similarly to the numerical features, store the training distribution of the category_map (i.e. the observed frequency of each category), so the same drift checks can be applied to categorical features.
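A small illustration, assuming the training-time category frequencies are stored alongside the category_map; the names and values below are made up:

```python
import numpy as np

# Hypothetical example: training-time frequencies stored next to the category_map.
category_map = {0: "red", 1: "green", 2: "blue"}
category_probs = np.array([0.5, 0.3, 0.2])  # frequencies observed during training


def categorical_drift(batch_labels,
                      n_categories=len(category_map),
                      train_probs=category_probs):
    """KL-based distance between the training category distribution and a request batch."""
    counts = np.bincount(batch_labels, minlength=n_categories).astype(float)
    q = counts / max(counts.sum(), 1.0)
    eps = 1e-12  # avoid log(0)
    return float(np.sum(train_probs * np.log((train_probs + eps) / (q + eps))))


# e.g. categorical_drift([0, 0, 1, 2, 2, 2]) stays small when the batch
# matches the training mix, and grows as the category frequencies shift.
```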
Data presence
For all feature types, define an attribute that specifies whether a feature is mandatory for inference. For example, if a feature has no missing values at training time, we would most probably want to require it in the inference request. When a mandatory feature is missing from a request, a transformer performing the data validation task can flag the error and increase an anomaly detection counter/metric.
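A rough sketch of how a required flag and the presence check could look; the `feature_schema` layout and the `check_presence` helper are hypothetical:

```python
# Hypothetical "required" flag on each feature plus a transformer-side presence check.
feature_schema = {
    "age":    {"type": "REAL", "required": True},   # no missing values at training time
    "income": {"type": "REAL", "required": False},  # had missing values, imputed downstream
}

anomaly_count = 0


def check_presence(request: dict) -> list:
    """Return the missing mandatory features and bump the anomaly counter if any."""
    global anomaly_count
    missing = [
        name for name, spec in feature_schema.items()
        if spec["required"] and request.get(name) is None
    ]
    if missing:
        anomaly_count += len(missing)
    return missing


# e.g. check_presence({"income": 42000.0}) -> ["age"], and anomaly_count is incremented.
```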