forked from georgia-tech-db/evadb
Vector Store & Data Source updates #2
Merged
Bump version to v0.3.8+dev

---------

Co-authored-by: Jiashen Cao <[email protected]>
Add support for `neuralforecast`. Fixes #1112.

```sql
DROP TABLE IF EXISTS AirData;

CREATE TABLE AirData (
    unique_id TEXT(30),
    ds TEXT(30),
    y INTEGER);

LOAD CSV 'data/forecasting/air-passengers.csv' INTO AirData;

DROP FUNCTION IF EXISTS Forecast;

CREATE FUNCTION Forecast FROM
    (SELECT unique_id, ds, y FROM AirData)
TYPE Forecasting
PREDICT 'y'
HORIZON 12
LIBRARY 'neuralforecast';

SELECT Forecast(12);
```

One issue: unlike `statsforecast`, `neuralforecast` needs `horizon` as a parameter at training time. A better way to call the UDF would therefore be simply `SELECT Forecast();`, which is currently unsupported. @xzdandy Please let me know your thoughts.

Remaining tasks:
- [x] Incorporate `neuralforecast`
- [x] Fix `HORIZON` redundancy (UPDATE: Being fixed in #1121)
- [x] Reuse model with lower horizon no
- [x] Add support for ~multivariate forecasting~ exogenous variables
- [x] Add tests
- [x] Add docs

---------

Co-authored-by: xzdandy <[email protected]>
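The horizon issue described above can be illustrated with a toy sketch. It assumes a hypothetical `FixedHorizonForecaster` whose horizon is fixed before training (as the description says is the case for `neuralforecast`) and a hypothetical `forecast` helper that defaults to the trained horizon and reuses the model for shorter horizons by truncating its output. The naive last-value model is a stand-in, not EvaDB's or `neuralforecast`'s implementation.

```python
from typing import List, Optional


class FixedHorizonForecaster:
    """Hypothetical model whose horizon must be chosen before training."""

    def __init__(self, horizon: int) -> None:
        if horizon <= 0:
            raise ValueError("horizon must be positive")
        self.horizon = horizon
        self._last_value: Optional[float] = None

    def fit(self, y: List[float]) -> "FixedHorizonForecaster":
        # Toy "training": remember the last observation (naive forecast).
        if not y:
            raise ValueError("cannot fit on an empty series")
        self._last_value = y[-1]
        return self

    def predict(self) -> List[float]:
        # Always returns exactly `self.horizon` steps, as fixed at training time.
        if self._last_value is None:
            raise RuntimeError("model is not fitted")
        return [self._last_value] * self.horizon


def forecast(model: FixedHorizonForecaster, horizon: Optional[int] = None) -> List[float]:
    """Default to the trained horizon; serve shorter horizons by truncation."""
    if horizon is None:
        horizon = model.horizon  # analogous to a bare `SELECT Forecast();`
    if horizon > model.horizon:
        raise ValueError(
            f"requested horizon {horizon} exceeds trained horizon {model.horizon}"
        )
    return model.predict()[:horizon]


model = FixedHorizonForecaster(horizon=12).fit([1.0, 2.0, 3.0])
print(len(forecast(model)))  # 12
print(forecast(model, 3))  # [3.0, 3.0, 3.0]
```

Rejecting horizons longer than the trained one, rather than silently retraining, is a design choice of this sketch only.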
- [x] GitHub data source integration
- [x] Batching support for the native storage engine. We cannot do batching in the storage engine because it does not work with `LIMIT`, so the change was reverted.
- [x] Full `NamedUser` table support
- [x] Enable CircleCI local PR cache for testmondata
- [x] Native storage engine `read` refactor
- [x] Test cases
- [x] GitHub data source documentation
This is the first step toward automatic index updates on insertion. It replaces the old index-creation path, which read data directly from the storage engine; index creation now reads data from its child plans (SeqScan and Storage).
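A minimal sketch of the idea, assuming a Volcano-style executor in which each plan node yields batches of rows: the hypothetical `CreateIndexExecutor` consumes whatever its child plan produces instead of opening the storage engine itself. The class names mirror the plan names in the description, but their bodies and the dict used as an "index" are illustrative assumptions, not EvaDB's code.

```python
from typing import Dict, Iterator, List, Tuple

Row = Tuple[int, List[float]]  # (row id, feature vector)


class StorageScan:
    """Hypothetical leaf plan that reads table rows in fixed-size batches."""

    def __init__(self, rows: List[Row], batch_size: int = 2) -> None:
        self.rows = rows
        self.batch_size = batch_size

    def execute(self) -> Iterator[List[Row]]:
        for i in range(0, len(self.rows), self.batch_size):
            yield self.rows[i : i + self.batch_size]


class SeqScan:
    """Hypothetical pass-through plan that projects the indexed column."""

    def __init__(self, child: StorageScan) -> None:
        self.child = child

    def execute(self) -> Iterator[List[Row]]:
        for batch in self.child.execute():
            yield [(row_id, list(vec)) for row_id, vec in batch]


class CreateIndexExecutor:
    """Builds the index from its child plan's output, not from storage directly."""

    def __init__(self, child: SeqScan) -> None:
        self.child = child

    def execute(self) -> Dict[int, List[float]]:
        index: Dict[int, List[float]] = {}
        for batch in self.child.execute():
            for row_id, vec in batch:
                index[row_id] = vec  # stand-in for adding a vector to a vector store
        return index


rows = [(0, [0.1, 0.2]), (1, [0.3, 0.4]), (2, [0.5, 0.6])]
plan = CreateIndexExecutor(SeqScan(StorageScan(rows)))
print(sorted(plan.execute()))  # [0, 1, 2]
```

Because the executor depends only on its child's `execute()` output, a different child (for example, one carrying newly inserted rows) could feed the same path, which is presumably why this is framed as a first step toward updates on insertion.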
Added documentation for vector stores including usage examples, dependencies and other requirements.
The feature is split across multiple PRs. This PR can be merged after #1244.
- [x] Remove empty `evadb.db` file
- [x] Move `test_github_datasource.py` to long integration tests. Fix #1251
- [x] Fix the failing `test/integration_tests/long/test_create_table_executor.py::CreateTableTest::test_should_create_table_from_select`.
- [x] Update documentation with links
Remove table-name prefixes from the column names of the DataFrame returned by `df()`. Users can then easily reload CSV files generated from EvaDB with `to_csv()` at a later time (useful for long-running or expensive queries). Example:

```python
# `cursor` is an EvaDB cursor; `repo_name` holds the repository name.
select_query = cursor.query(
    f"SELECT * FROM {repo_name}_StargazerList;"
).df()
select_query.to_csv("stargazers_list.csv", index=False)

# Later
cursor.query(
    f"""
    CREATE TABLE IF NOT EXISTS {repo_name}_StargazerList(
        github_username TEXT(1000));
    """
).df()
cursor.query(
    f"LOAD CSV 'stargazers_list.csv' INTO {repo_name}_StargazerList;"
).df()
```

Do we need the table names for any use cases? For example, to disambiguate duplicate column names from two different functions, such as `object_detector_1.labels` and `object_detector_2.labels`?

---------

Co-authored-by: Andy Xu <[email protected]>
Co-authored-by: Andy Xu <[email protected]>
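The open question above concerns name collisions. One possible policy, sketched here as an assumption rather than what this PR implements, is to strip the `table.` prefix only when the bare column name is unique. `strip_table_prefixes` is a hypothetical helper that operates on a list of column names such as `list(df.columns)`.

```python
from collections import Counter
from typing import List


def strip_table_prefixes(columns: List[str]) -> List[str]:
    """Drop `table.` prefixes, keeping them only where bare names would collide.

    Hypothetical helper: the collision rule is an assumption, not EvaDB behavior.
    """
    # Everything after the last dot is the bare column name.
    bare = [col.rsplit(".", 1)[-1] for col in columns]
    counts = Counter(bare)
    return [b if counts[b] == 1 else col for b, col in zip(bare, columns)]


print(strip_table_prefixes(["object_detector_1.labels", "object_detector_2.labels", "video.id"]))
# ['object_detector_1.labels', 'object_detector_2.labels', 'id']
```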
Users can now create a table with just `FLOAT` without providing the dimensions.

Earlier:

```sql
CREATE TABLE ETTM1 (
    date TEXT(30),
    hufl FLOAT(5,7),
    hull FLOAT(5,7),
    mufl FLOAT(5,7),
    mull FLOAT(5,7),
    lufl FLOAT(5,7),
    lull FLOAT(5,7),
    ot FLOAT(5,7));
```

Now:

```sql
CREATE TABLE ETTM1 (
    date TEXT,
    hufl FLOAT,
    hull FLOAT,
    mufl FLOAT,
    mull FLOAT,
    lufl FLOAT,
    lull FLOAT,
    ot FLOAT);
```

Fixes #1260.

---------

Co-authored-by: Andy Xu <[email protected]>
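To illustrate what optional dimensions mean for a parser, here is a small regex-based sketch. `parse_column_type` is hypothetical; EvaDB's actual SQL grammar is not shown in this PR and may handle type arguments differently.

```python
import re
from typing import Optional, Tuple

# Hypothetical rule: a type name, optionally followed by "(n)" or "(n, m)".
_TYPE_RE = re.compile(r"^\s*([A-Za-z]+)\s*(?:\(\s*(\d+)\s*(?:,\s*(\d+)\s*)?\))?\s*$")


def parse_column_type(spec: str) -> Tuple[str, Optional[Tuple[int, ...]]]:
    """Return (type name, dimensions), with None when dimensions are omitted."""
    match = _TYPE_RE.match(spec)
    if match is None:
        raise ValueError(f"invalid column type: {spec!r}")
    name, first, second = match.groups()
    if first is None:
        return name.upper(), None  # e.g. a bare `FLOAT` or `TEXT`
    dims = (int(first),) if second is None else (int(first), int(second))
    return name.upper(), dims


print(parse_column_type("FLOAT"))  # ('FLOAT', None)
print(parse_column_type("FLOAT(5,7)"))  # ('FLOAT', (5, 7))
print(parse_column_type("TEXT(30)"))  # ('TEXT', (30,))
```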
…the query. (#1267)

- [x] Add basic functionality

Below is an example error message:

```
evadb.binder.binder_utils.BinderError: Cannnot find column name2. Did you mean name? The available columns are ['avatar_url', 'bio', 'blog', 'collaborators', 'company', 'contributions', 'disk_usage', 'email', 'events_url', 'followers', 'followers_url', 'following', 'following_url', 'gists_url', 'gravatar_id', 'hireable', 'html_url', 'id', 'invitation_teams_url', 'location', 'login', 'name', 'node_id', 'organizations_url', 'owned_private_repos', 'private_gists', 'public_gists', 'public_repos', 'received_events_url', 'repos_url', 'role', 'site_admin', 'starred_url', 'subscriptions_url', 'team_count', 'total_private_repos', 'twitter_username', 'type', 'url'].
```

**Limitation**: To keep the output clean, we only fuzzy-match column names and skip aliases.

- [x] Add test cases.
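A "Did you mean ...?" hint like the one above can be produced with fuzzy string matching. The sketch below uses Python's standard `difflib.get_close_matches` as a swapped-in technique; the PR does not say which matcher EvaDB uses, and `unknown_column_message` is a hypothetical helper. Like the limitation described above, it matches only column names, not aliases.

```python
import difflib
from typing import List


def unknown_column_message(column: str, available: List[str]) -> str:
    """Build an error message with a fuzzy-matched suggestion (column names only)."""
    # difflib is an assumed stand-in for EvaDB's actual matching logic.
    matches = difflib.get_close_matches(column, available, n=1, cutoff=0.6)
    hint = f" Did you mean {matches[0]}?" if matches else ""
    return (
        f"Cannot find column {column}.{hint} "
        f"The available columns are {sorted(available)}."
    )


cols = ["login", "name", "node_id", "email"]
print(unknown_column_message("name2", cols))
# Cannot find column name2. Did you mean name? The available columns are ['email', 'login', 'name', 'node_id'].
```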
Fixed #1268. Moved instrumentation to after the entire execution completes.

<img width="1840" alt="image" src="https://github.com/georgia-tech-db/evadb/assets/12206234/3d770689-deff-4408-bb64-9df320a95fa1">
<img width="1832" alt="image" src="https://github.com/georgia-tech-db/evadb/assets/12206234/969ed19b-0985-4feb-ac87-045542a7b485">
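Why the placement matters can be seen in a small sketch that assumes execution is lazy (generator-based), which is an assumption about the executor rather than something stated in the PR: a timer stopped before the results are consumed measures almost none of the work. `execute_plan` and `timed_query` are hypothetical.

```python
import time
from typing import Iterator, List, Tuple


def execute_plan(num_batches: int, delay_s: float) -> Iterator[int]:
    """Hypothetical lazy executor: work happens only as batches are pulled."""
    for batch_id in range(num_batches):
        time.sleep(delay_s)  # simulated per-batch cost
        yield batch_id


def timed_query(num_batches: int, delay_s: float) -> Tuple[List[int], float, float]:
    start = time.perf_counter()
    batches = execute_plan(num_batches, delay_s)
    # Stopping the timer here is misleading: the generator exists but has not run.
    too_early = time.perf_counter() - start
    results = list(batches)  # drive the entire execution to completion
    # Instrumentation placed after the whole execution captures the real cost.
    total = time.perf_counter() - start
    return results, too_early, total


results, too_early, total = timed_query(3, 0.01)
print(results, f"early={too_early:.4f}s", f"total={total:.4f}s")
```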
Updated the steps for creating a new AI function with EvaDB.

---------

Co-authored-by: Andy Xu <[email protected]>
Co-authored-by: Andy Xu <[email protected]>
Reopens #1111.

---------

Co-authored-by: sudoboi <[email protected]>
Co-authored-by: Abhijith S Raj <[email protected]>
Fix `text_summarization`, which used `DROP UDF` instead of `DROP FUNCTION`.
SHOW DATABASES #1252
Test default values of `chunk_size` and `chunk_overlap`
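For readers unfamiliar with these two parameters, the sketch below shows one common character-based interpretation of `chunk_size` and `chunk_overlap`. It is an illustrative assumption: `chunk_text` is hypothetical and does not reproduce EvaDB's chunking code or its default values.

```python
from typing import List


def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Split text into windows of `chunk_size` characters that share `chunk_overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start : start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1))  # ['abcd', 'defg', 'ghij']
```

A test of default values would then call the chunker without these arguments and compare against the documented defaults, whatever they are.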
Co-authored-by: Jineet Desai <[email protected]>
Co-authored-by: Andy Xu <[email protected]>