Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DROP FUNCTION needs to clean up the dumped model file for functions created through training framework. #1218

Open
1 of 2 tasks
xzdandy opened this issue Sep 26, 2023 · 3 comments
Labels
Bug 🐞 EVA is not working as expected

Comments

@xzdandy
Copy link
Collaborator

xzdandy commented Sep 26, 2023

Search before asking

  • I have searched the EvaDB issues and found no similar bug report.

Bug

When we created a function via training framework (e.g., ludwig, statsforecast, sklearn), the model file will be dumped to the strorage layer. When we drop the function, those dumped file also needs to be cleaned up.

Environment

No response

Are you willing to submit a PR?

  • Yes I'd like to help by submitting a PR!
@xzdandy xzdandy added the Bug 🐞 EVA is not working as expected label Sep 26, 2023
@aayushacharya
Copy link

aayushacharya commented Jan 17, 2024

Hi, may I work on this fix?

@aayushacharya
Copy link

Hi, I am working on this issue. Dropping the dumped model file created through Sklearn, XGBoost and Ludwig seems straightforward. For Forecasting however, it seems that the model directory is named after the hex digest of SHA256 hash of the dataframe, to avoid retraining the model for same data. So if I train two forecasting functions: Forecast and AnotherForecast on the same data using the following queries(probably not a practical use case), they will use the same model path.

CREATE FUNCTION IF NOT EXISTS Forecast FROM
(SELECT ds,y FROM AirData)
TYPE Forecasting
HORIZON 12
PREDICT 'y';
CREATE FUNCTION IF NOT EXISTS AnotherForecast FROM
(SELECT ds,y FROM AirData)
TYPE Forecasting
HORIZON 12
PREDICT 'y';

This would mean dropping Forecast would drop the model for AnotherForecast as well. Isn’t it better to isolate the model for each function like the other training frameworks? Or should the correct drop implementation for forecasting be such that for dropping the function Forecast, the entire function catalog of forecasting functions should be searched to see if they share the same model path and cleanup accordingly?

@aayushacharya
Copy link

aayushacharya commented Jan 25, 2024

I have created PR #1442 with the latter approach in the comment above. Please provide feedback if any changes need to be made.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug 🐞 EVA is not working as expected
Projects
Development

No branches or pull requests

2 participants