Improve the univariate statsforecast function in EvaDB #1081

Closed
6 tasks done
xzdandy opened this issue Sep 8, 2023 · 5 comments · Fixed by #1094

Comments

@xzdandy
Collaborator

xzdandy commented Sep 8, 2023

Search before asking

  • I have searched the EvaDB issues and found no similar feature requests.

Description

  • The univariate statsforecast function trains and predicts on the exact same input relation, so there is no need for a separate training procedure. Currently, SELECT Forecast(12) FROM AirData; does not make sense.
  • The time series column is not handled properly. statsforecast requires a specific format for the time series column; see https://nixtla.github.io/statsforecast/docs/getting-started/getting_started_short.html
  • The univariate statsforecast function expects a fixed schema for the input dataframe. Column renaming is not handled properly at the moment.
  • Update documentation with all available parameters.
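
A minimal sketch of the renaming that the third item calls for, assuming the fixed schema is the `unique_id`, `ds`, `y` column triple described in the statsforecast getting-started guide linked above. The helper name `to_statsforecast_rows`, the sample table, and its column names and values are hypothetical; this is not EvaDB's implementation, and with pandas the same mapping would typically go through `DataFrame.rename(columns=...)`.

```python
# Column names statsforecast expects in its long-format input frame.
STATSFORECAST_COLUMNS = ("unique_id", "ds", "y")


def to_statsforecast_rows(rows, id_col, time_col, value_col):
    """Map user-chosen column names onto unique_id / ds / y (hypothetical helper)."""
    mapping = dict(zip((id_col, time_col, value_col), STATSFORECAST_COLUMNS))
    renamed = []
    for row in rows:
        missing = [col for col in mapping if col not in row]
        if missing:
            raise KeyError(f"input row is missing columns: {missing}")
        # Keep only the three mapped columns, under their new names.
        renamed.append({new: row[old] for old, new in mapping.items()})
    return renamed


# Hypothetical table with custom column names and made-up values.
sales = [
    {"type": "house", "saledate": "30/09/2007", "price": 100.0, "bedrooms": 3},
    {"type": "house", "saledate": "31/12/2007", "price": 110.0, "bedrooms": 3},
]

renamed = to_statsforecast_rows(
    sales, id_col="type", time_col="saledate", value_col="price"
)
print(renamed[0])
```

Columns outside the mapping (here `bedrooms`) are dropped, which mirrors the univariate case where only one value column is forecast.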

Use case

No response

Are you willing to submit a PR?

  • Yes I'd like to help by submitting a PR!
@americast
Member

> The univariate statsforecast function trains and predicts on the exact same input relation, so there is no need for a separate training procedure. Currently, SELECT Forecast(12) FROM AirData; does not make sense.

I believe we can simply do SELECT Forecast(12);. The FROM part is a little redundant, but I am not sure if that is in line with SQL syntax.

@xzdandy I'll take care of 2, you can assign it to me. Thanks!

@xzdandy
Collaborator Author

xzdandy commented Sep 8, 2023

> > The univariate statsforecast function trains and predicts on the exact same input relation, so there is no need for a separate training procedure. Currently, SELECT Forecast(12) FROM AirData; does not make sense.

> I believe we can simply do SELECT Forecast(12);. The FROM part is a little redundant, but I am not sure if that is in line with SQL syntax.

> @xzdandy I'll take care of 2, you can assign it to me. Thanks!

Thanks @americast! The SELECT Forecast(12); syntax is not supported yet, but we can add it. I will handle item 3 first, since it currently prevents me from forecasting on tables with custom column names.

The time data type can be tricky. For example, I am using the House Property Sales Time Series data set, where the saledate column holds values such as 30/09/2007, which is not the default pandas date format. We need to support some kind of date type and conversion here. Do you have any ideas?

@americast
Member

> I will handle item 3 first, since it currently prevents me from forecasting on tables with custom column names.

@xzdandy I added some support for customized column names in #969. It's handled by the id and time variables. Are they not working for you?

@xzdandy
Collaborator Author

xzdandy commented Sep 8, 2023

> > I will handle item 3 first, since it currently prevents me from forecasting on tables with custom column names.

> @xzdandy I added some support for customized column names in #969. It's handled by the id and time variables. Are they not working for you?

It is not working, for two reasons. 1) The change is applied to aggregated_batch instead of data. This can be easily fixed. 2) The output object of the UDF is not bound correctly, so in projection we look for a non-existent column.

@xzdandy
Collaborator Author

xzdandy commented Sep 9, 2023

The warning message reads: /home/zxu330/eva/evadb-venv-test/lib/python3.10/site-packages/statsforecast/core.py:691: UserWarning: Parsing dates in %d/%m/%Y format when dayfirst=False (the default) was specified. Pass dayfirst=True or specify a format to silence this warning. So it seems we can specify a time format for parsing. We can explore this option.
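
To illustrate why an explicit format helps, here is a small sketch using only the standard library. It assumes the day-first `%d/%m/%Y` layout seen in the saledate example above; the helper name `parse_sale_date` is hypothetical and is not part of EvaDB or statsforecast. In pandas, the comparable options would be the `format` and `dayfirst` arguments of `pd.to_datetime`, as the warning suggests.

```python
from datetime import datetime


def parse_sale_date(text, fmt="%d/%m/%Y"):
    """Parse a date string with an explicit format (hypothetical helper)."""
    return datetime.strptime(text, fmt).date()


# The value quoted in the thread parses cleanly once the format is stated.
sale_date = parse_sale_date("30/09/2007")
print(sale_date.isoformat())

# Without an explicit format, a string like this one is ambiguous:
ambiguous = "03/04/2007"
day_first = parse_sale_date(ambiguous)                 # read as 3 April 2007
month_first = parse_sale_date(ambiguous, "%m/%d/%Y")   # read as 4 March 2007
print(day_first, month_first)

# A month-first format fails outright on day values above 12.
try:
    parse_sale_date("30/09/2007", "%m/%d/%Y")
except ValueError as err:
    print("month-first parsing failed:", err)
```

Stating the format up front removes the guesswork and turns a silent misreading of ambiguous dates into either a correct parse or an explicit error.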

xzdandy added a commit that referenced this issue Sep 10, 2023
Addressing item 3 in #1081

* [x] In `evadb/executor/create_function_executor.py`, we rename the
input relation to the [fixed
schema](https://nixtla.github.io/statsforecast/docs/getting-started/getting_started_short.html)
required by statsforecast
* [x] Rename the output column so it is synced with the binder. A temporary
fix. We will reconsider the design in #1017
* [x] Update test cases to test the column rename feature.
jiashenC pushed a commit that referenced this issue Sep 10, 2023
- Addressing `Update documentation with all available parameters.` in
#1081.
- Adding documentation for:
   * MODEL
   * ID
   * TIME
   * PREDICT
   * FREQUENCY
americast added a commit that referenced this issue Sep 12, 2023
xzdandy added a commit that referenced this issue Sep 12, 2023
Address the change from `SELECT Forecast(12) FROM AirData;` to `SELECT
Forecast(12);` in #1081

- [x] update parser, binder, optimizer, and executor to allow project
without children.
- [x] update forecasting test cases and documentation.
- [x] add unit test and short integration test for `SELECT expr;`.
- [x] add documentation that we support `SELECT expr;`.