In the current forecasting notebook, we assumed that the maximum number of days of data we are guaranteed to have at runtime is 6. However, after talking to Ceph subject matter experts, it seems that there might be some flexibility there.
It may be possible to have aggregated values describing time series behavior over a longer period of time, instead of having raw data. For example, consider SMART 5 values for device A. Instead of storing a vector
[100, 100, 100, 99, 95, 96]
representing the raw values from the last 6 days, we could instead store a vector
[(99.5, 0.24), (100, 0), (100, 0), (99.5, 0.2), (99.25, 0.1), (98.33, 0.56)]
where the first tuple is the (mean, std) of SMART 5 over the last 6 days, the next tuple is the (mean, std) over days 7-12 before now, and so on. This way we can describe the last 36 days of behavior using 12 discrete values.
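A minimal sketch of how such a vector could be computed from raw daily readings, using only the Python standard library. The function name `window_mean_std`, the 36-day `history` list, and the use of the population standard deviation are assumptions made for illustration; the issue does not specify which std variant or window ordering should be used.

```python
from statistics import mean, pstdev


def window_mean_std(values, window=6):
    """Summarize a daily series as (mean, std) tuples, one per window.

    `values` is ordered oldest to newest. The result is ordered with the
    most recent window first.
    """
    if len(values) % window != 0:
        raise ValueError("series length must be a multiple of the window size")
    # Split the series into consecutive, non-overlapping chunks (oldest first).
    chunks = [values[i:i + window] for i in range(0, len(values), window)]
    # Reverse so the first tuple describes the most recent window.
    # pstdev is the population standard deviation (an assumption).
    return [(mean(c), pstdev(c)) for c in reversed(chunks)]


# Hypothetical 36 days of SMART 5 readings, oldest first. The last 6 entries
# are the raw vector from the example; the earlier 30 are made up.
history = [100] * 30 + [100, 100, 100, 99, 95, 96]

features = window_mean_std(history)               # 6 (mean, std) tuples
flat = [x for pair in features for x in pair]     # 12 values as model input
```

Here `flat` is the 12-value input a forecasting model would see in place of the 6 raw values.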
As a data scientist, I want to explore whether a forecasting model can predict future values using such aggregated features as input, instead of raw values.
Acceptance criteria:
EDA notebook exploring possible models with the above setup
Compare performance of models created with the above setup vs current setup
Compare performance of models created using different types of aggregated features - e.g. mean, std, min, max, entropy.
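To make the last criterion concrete, here is a rough sketch of how different aggregate types could be swapped in and out per window, so that the same model can be trained on each feature set and compared. The names `AGGREGATORS`, `shannon_entropy`, and `aggregate` are hypothetical, and defining entropy as the Shannon entropy of the empirical value distribution within a window is an assumption; other definitions may suit SMART data better.

```python
from collections import Counter
from math import log2
from statistics import mean, pstdev


def shannon_entropy(chunk):
    """Shannon entropy in bits of the empirical value distribution in chunk."""
    counts = Counter(chunk)
    n = len(chunk)
    # sum of p * log2(1 / p) over the distinct values observed in the window
    return sum((c / n) * log2(n / c) for c in counts.values())


# Candidate per-window aggregates; pstdev (population std) is an assumption.
AGGREGATORS = {
    "mean": mean,
    "std": pstdev,
    "min": min,
    "max": max,
    "entropy": shannon_entropy,
}


def aggregate(values, window, names):
    """Return one feature row per non-overlapping window, oldest window first."""
    if len(values) % window != 0:
        raise ValueError("series length must be a multiple of the window size")
    chunks = [values[i:i + window] for i in range(0, len(values), window)]
    return [[AGGREGATORS[name](c) for name in names] for c in chunks]


# Build alternative feature sets from the same raw readings.
raw = [100, 100, 100, 99, 95, 96]
mean_std = aggregate(raw, 6, ["mean", "std"])
min_max_entropy = aggregate(raw, 6, ["min", "max", "entropy"])
```

Each feature set would then be fed to the same forecasting model and scored with the same metric as the current raw-value setup.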