Replies: 4 comments 3 replies
-
Some types of components that this might motivate:
-
@fkiraly In case you are referring, with "Recursive reduction forecasts", to a multi-step forecast problem that is transformed into multiple single-step forecasts, then the following is the "standard" way of doing this in time series forecasting: DeepAR: Probabilistic forecasting with autoregressive recurrent networks
-
@fkiraly Hmm, ok, then I am afraid I do not properly understand what you mean by "Recursive reduction forecasts using probabilistic tabular regressors". Can you please be more specific on this, ideally with some references? Thanks!
-
@fkiraly From what I read in `RecursiveTabularRegressionForecaster`, isn't this just an auto-regressive model? The reason I linked the DeepAR paper is that it also unrolls the RNN/LSTM using one-step-ahead forecasts, where the probabilistic forecasts are created by sampling from the predicted distribution.
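A minimal sketch of that sampling-based unrolling, in case it helps the discussion; this is not DeepAR itself and not an skpro/sktime API. The `one_step_dist` callable is a hypothetical stand-in for a fitted one-step probabilistic model, and only the recursion pattern is the point:

```python
import numpy as np
from scipy import stats


def sample_recursive_forecast(one_step_dist, y_hist, horizon, window=10, n_paths=1000, rng=None):
    """Monte Carlo unrolling of a one-step-ahead probabilistic model.

    one_step_dist: callable mapping the last `window` values to a scipy
        frozen distribution for the next value (hypothetical stand-in for
        a fitted probabilistic one-step model).
    Returns an (n_paths, horizon) array of sampled trajectories; empirical
    quantiles over the paths give the multi-step probabilistic forecast.
    """
    rng = np.random.default_rng(rng)
    paths = np.empty((n_paths, horizon))
    for i in range(n_paths):
        hist = list(y_hist[-window:])
        for h in range(horizon):
            dist = one_step_dist(np.asarray(hist))   # predictive law of the next value
            y_next = dist.rvs(random_state=rng)      # sample instead of plugging in the mean
            paths[i, h] = y_next
            hist = hist[1:] + [y_next]               # slide the window, feeding the sample back
    return paths


# toy "model": next value ~ N(mean of window, 1)
toy = lambda w: stats.norm(loc=w.mean(), scale=1.0)
paths = sample_recursive_forecast(toy, np.arange(20, dtype=float), horizon=5)
q10, q90 = np.quantile(paths, [0.1, 0.9], axis=0)   # intervals widen with the horizon
```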
-
Thread to discuss: recursive reduction forecasts using probabilistic tabular regressors, and what this might mean for `skpro` scope, e.g., distribution or regressor compositors.

Methodologically, the question arises when trying to generalize recursive reduction to probabilistic regressors - it is not as straightforward as direct reduction, since, unlike in direct reduction, the second and further steps of the recursive forecasts take distributional predictions as inputs in a very naive generalization.

More formally, let us look at vanilla recursive reduction, with window length $L$. Let $y_1, \dots, y_n$ be a sequence, and let `r` be a tabular regressor (non-proba). Parameters of the algorithm are $L$ and `r`.

- `fit` fits `r`, via `r.fit`, to feature-label pairs formed from sliding windows of length $L$ (features) and the value following each window (label).
- `predict` makes predictions as follows: use `r.predict` to predict $\widehat{y}_{n+1}$ from the last window, treat $\widehat{y}_{n+1}$ as if it were observed, and repeat, one step at a time.

Now, if `r` is probabilistic, the $\widehat{y}_k, k > n$, will be probabilistic predictions, in a naive generalization. Simply continuing to use the predictive mean will be (logically) inconsistent in producing probabilistic estimates for $\widehat{y}_k$, since the uncertainty of previous predictions would be "lost".

More mathematically, let $f$ be `r`'s guess at the law of a conditional random variable, via `r.predict`, once the model has been fitted. Then, $f(y_{N-L+1}, \dots, y_{N}) = d_{N+1}$, with $d_{N+1}$ a predictive distribution, which is `r`'s guess at the law of $y_{N+1} \mid y_N, \dots, y_1$. Next, we want to make a guess for the law of $y_{N+2} \mid y_N, \dots, y_1$, and if we naively use $f(y_{N-L+2}, \dots, y_N, \widehat{y}_{N+1}) = d_{N+2}$, then by construction this is a guess for the law of $y_{N+2} \mid \widehat{y}_{N+1}, y_N, \dots, y_1$, i.e., it will be "narrower" since $\widehat{y}_{N+1}$ is not marginalized over. The same problem occurs with every further step, and it only gets worse as we condition on more and more point predictions.

I can think of a number of ways to get around that, although it is not clear what is the best to do here:

- `r` can take distributions as input, i.e., `r.predict` accepts distributional features
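For reference, a minimal sketch of the vanilla (non-probabilistic) recursive reduction described above, written against plain sklearn rather than the actual `RecursiveTabularRegressionForecaster` internals, so the function names and details here are illustrative only:

```python
import numpy as np
from sklearn.linear_model import LinearRegression


def fit_recursive(y, L, r=None):
    """Fit a tabular regressor r on sliding-window (features, label) pairs.

    Features: (y_{t-L+1}, ..., y_t), label: y_{t+1}, for all valid t.
    """
    r = r if r is not None else LinearRegression()
    X = np.array([y[t - L:t] for t in range(L, len(y))])   # windows of length L
    z = np.array([y[t] for t in range(L, len(y))])          # value following each window
    r.fit(X, z)
    return r


def predict_recursive(r, y, L, horizon):
    """Predict y_{n+1}, ..., y_{n+horizon} by repeatedly calling r.predict
    and feeding each point prediction back in as if it were an observation."""
    window = list(y[-L:])
    preds = []
    for _ in range(horizon):
        y_hat = r.predict(np.asarray(window).reshape(1, -1))[0]
        preds.append(y_hat)
        window = window[1:] + [y_hat]   # the plug-in step: prediction treated as data
    return np.array(preds)


y = np.sin(np.linspace(0, 20, 200))
r = fit_recursive(y, L=12)
y_future = predict_recursive(r, y, L=12, horizon=10)

# If r were probabilistic, naively plugging the predictive *mean* into `window`
# above would condition each d_{N+k} on point values of earlier predictions,
# i.e., yield distributions that are too narrow, as discussed in the post.
```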