Replies: 4 comments 3 replies
-
Some types of components that this might motivate:
-
@fkiraly In case you are referring, with "Recursive reduction forecasts", to a multi-step forecast problem that is transformed into multiple single-step forecasts, then the following is the "standard" way of doing this in time series forecasting: DeepAR: Probabilistic forecasting with autoregressive recurrent networks
-
@fkiraly Hmm, ok, then I am afraid I do not properly understand what you mean by "Recursive reduction forecasts using probabilistic tabular regressors". Can you please be more specific on this, ideally with some references? Thanks!
-
@fkiraly From what I read in `RecursiveTabularRegressionForecaster`, isn't this just an auto-regressive model? The reason I linked the DeepAR paper is that it also unrolls the RNN/LSTM using one-step-ahead forecasts, where the probabilistic forecasts are created by sampling from the predicted distribution.
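A minimal sketch of that sampling-based unrolling, in case it helps the discussion; this is not DeepAR itself and not an skpro/sktime API. The `one_step_dist` callable is a hypothetical stand-in for a fitted one-step probabilistic model, and only the recursion pattern is the point:

```python
import numpy as np
from scipy import stats


def sample_recursive_forecast(one_step_dist, y_hist, horizon, window=10, n_paths=1000, rng=None):
    """Monte Carlo unrolling of a one-step-ahead probabilistic model.

    one_step_dist: callable mapping the last `window` values to a scipy
        frozen distribution for the next value (hypothetical stand-in for
        a fitted probabilistic one-step model).
    Returns an (n_paths, horizon) array of sampled trajectories; empirical
    quantiles over the paths give the multi-step probabilistic forecast.
    """
    rng = np.random.default_rng(rng)
    paths = np.empty((n_paths, horizon))
    for i in range(n_paths):
        hist = list(y_hist[-window:])
        for h in range(horizon):
            dist = one_step_dist(np.asarray(hist))   # predictive law of the next value
            y_next = dist.rvs(random_state=rng)      # sample instead of plugging in the mean
            paths[i, h] = y_next
            hist = hist[1:] + [y_next]               # slide the window, feeding the sample back
    return paths


# toy "model": next value ~ N(mean of window, 1)
toy = lambda w: stats.norm(loc=w.mean(), scale=1.0)
paths = sample_recursive_forecast(toy, np.arange(20, dtype=float), horizon=5)
q10, q90 = np.quantile(paths, [0.1, 0.9], axis=0)   # intervals widen with the horizon
```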
-
Thread to discuss: recursive reduction forecasts using probabilistic tabular regressors, and what this might mean for `skpro` scope, e.g., distribution or regressor compositors.

Methodologically, the question arises when trying to generalize recursive reduction to probabilistic regressors - it is not as straightforward as direct reduction, since, unlike in direct reduction, the second and further steps of the recursive forecasts take distributional predictions as inputs in a very naive generalization.

More formally, let us look at vanilla recursive reduction, with window length $L$. Let $y_1, \dots, y_n$ be a sequence, and let `r` be a tabular regressor (non-proba). Parameters of the algorithm are $L$ and `r`.

- `fit` fits `r`, via `r.fit`, to feature-label pairs formed from sliding windows of length $L$ (features) and the value following each window (label).
- `predict` makes predictions as follows: use `r.predict` to predict $\widehat{y}_{n+1}$ from the last window, treat $\widehat{y}_{n+1}$ as if it were observed, and repeat, one step at a time.

Now, if `r` is probabilistic, the $\widehat{y}_k, k > n$, will be probabilistic predictions, in a naive generalization. Simply continuing to use the predictive mean will be (logically) inconsistent in producing probabilistic estimates for $\widehat{y}_k$, since the uncertainty of previous predictions would be "lost".

More mathematically, let $f$ be `r`'s guess at the law of a conditional random variable, via `r.predict`, once the model has been fitted. Then, $f(y_{N-L+1}, \dots, y_{N}) = d_{N+1}$, with $d_{N+1}$ a predictive distribution, which is `r`'s guess at the law of $y_{N+1} \mid y_N, \dots, y_1$. Next, we want to make a guess for the law of $y_{N+2} \mid y_N, \dots, y_1$, and if we naively use $f(y_{N-L+2}, \dots, y_N, \widehat{y}_{N+1}) = d_{N+2}$, then by construction this is a guess for the law of $y_{N+2} \mid \widehat{y}_{N+1}, y_N, \dots, y_1$, i.e., it will be "narrower" since $\widehat{y}_{N+1}$ is not marginalized over. The same problem occurs with every further step, and it only gets worse as we condition on more and more point predictions.

I can think of a number of ways to get around that, although it is not clear what is the best to do here:

- `r` can take distributions as input, i.e., `r.predict` accepts distributional features
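For reference, a minimal sketch of the vanilla (non-probabilistic) recursive reduction described above, written against plain sklearn rather than the actual `RecursiveTabularRegressionForecaster` internals, so the function names and details here are illustrative only:

```python
import numpy as np
from sklearn.linear_model import LinearRegression


def fit_recursive(y, L, r=None):
    """Fit a tabular regressor r on sliding-window (features, label) pairs.

    Features: (y_{t-L+1}, ..., y_t), label: y_{t+1}, for all valid t.
    """
    r = r if r is not None else LinearRegression()
    X = np.array([y[t - L:t] for t in range(L, len(y))])   # windows of length L
    z = np.array([y[t] for t in range(L, len(y))])          # value following each window
    r.fit(X, z)
    return r


def predict_recursive(r, y, L, horizon):
    """Predict y_{n+1}, ..., y_{n+horizon} by repeatedly calling r.predict
    and feeding each point prediction back in as if it were an observation."""
    window = list(y[-L:])
    preds = []
    for _ in range(horizon):
        y_hat = r.predict(np.asarray(window).reshape(1, -1))[0]
        preds.append(y_hat)
        window = window[1:] + [y_hat]   # the plug-in step: prediction treated as data
    return np.array(preds)


y = np.sin(np.linspace(0, 20, 200))
r = fit_recursive(y, L=12)
y_future = predict_recursive(r, y, L=12, horizon=10)

# If r were probabilistic, naively plugging the predictive *mean* into `window`
# above would condition each d_{N+k} on point values of earlier predictions,
# i.e., yield distributions that are too narrow, as discussed in the post.
```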