Vector Store & Data Source updates #2

Merged

a0x8o merged 69 commits into alexxx-db:master from georgia-tech-db:master

Oct 30, 2023

a0x8o commented Oct 24, 2023

No description provided.

github-actions bot and others added 30 commits

September 30, 2023 02:36


          Bump Version to v0.3.8+dev (#1241)

567ab49

Bump Version to v0.3.8+dev

---------

Co-authored-by: Jiashen Cao <[email protected]>


          Update README.md

2022f2c


          Update README.md

b250207


          Update README.md

497b10d


          Update README.md

08f1433


          Update README.md

f5a7c92


          Add support for Neuralforecast (#1115)

e8a181c

Adding support for `neuralforecast`. Fixes #1112.

```sql
DROP TABLE IF EXISTS AirData;

CREATE TABLE AirData (
    unique_id TEXT(30),
    ds TEXT(30),
    y INTEGER);

LOAD CSV 'data/forecasting/air-passengers.csv' INTO AirData;

DROP FUNCTION IF EXISTS Forecast;

CREATE FUNCTION Forecast FROM
(SELECT unique_id, ds, y FROM AirData)
TYPE Forecasting
PREDICT 'y'
HORIZON 12
LIBRARY 'neuralforecast';

SELECT Forecast(12);
```
One quick issue here is that `neuralforecast` needs `horizon` as a
parameter while training, unlike `statsforecast`. Thus, a better way to
call the UDF would be simply `SELECT Forecast();`, which is currently
unsupported. @xzdandy Please let me know your thoughts.

List of stuff yet to be done:

- [x] Incorporate `neuralforecast`
- [x] Fix `HORIZON` redundancy (UPDATE: Being fixed in #1121)
- [x] Reuse model with lower horizon no
- [x] Add support for ~multivariate forecasting~ exogenous variables
- [x] Add tests
- [x] Add docs
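
One way to picture the "reuse model with lower horizon" item: a model trained for `HORIZON 12` already produces the first `h` steps for any `h <= 12`, so a smaller request could be served by truncating the trained output rather than retraining. The sketch below is illustrative only. `TrainedForecaster` and its `predict` method are hypothetical names, not EvaDB or `neuralforecast` APIs, and the forecast values are made up.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TrainedForecaster:
    """Hypothetical stand-in for a model trained with a fixed horizon."""

    horizon: int
    full_forecast: List[float]  # one value per future step, len == horizon

    def predict(self, requested_horizon: int) -> List[float]:
        if requested_horizon <= 0:
            raise ValueError("requested horizon must be positive")
        if requested_horizon > self.horizon:
            # Beyond the trained horizon the model would have to be retrained.
            raise ValueError(
                f"requested horizon {requested_horizon} exceeds "
                f"trained horizon {self.horizon}"
            )
        # A request within the trained horizon is served by truncation.
        return self.full_forecast[:requested_horizon]


# Placeholder forecast values for a model trained with HORIZON 12.
model = TrainedForecaster(horizon=12, full_forecast=[float(i) for i in range(1, 13)])
first_three = model.predict(3)
```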

---------

Co-authored-by: xzdandy <[email protected]>


          GitHub Data Source Integration (#1233)

495ce7d

- [x] GitHub Data Source Integration
- [x] Batching support for native storage engine. We cannot do batching
in the storage engine because it does not work with limit. Reverted the change.
- [x] Full NamedUser table support
- [x] Enable circle ci local PR cache for testmondata
- [x] Native storage engine `read` refactor
- [x] Testcases
- [x] Github data source documentation


          feat: create index from projection (#1244)

277161e

The first step toward automatic index updates on insertion.

This replaces the old way of creating an index, which read data directly
from the storage engine.

Index creation now reads data from its child plans: SeqScan and Storage.
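
The change can be pictured as a pull-based pipeline in which the index builder consumes batches from its child operator instead of calling the storage engine itself. This is a minimal sketch under that assumption. `storage_scan`, `seq_scan`, and `build_index` are hypothetical helpers, not EvaDB class names, and a plain dict stands in for the vector index.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

Row = Tuple[int, List[float]]  # (row id, feature vector)


def storage_scan(table: List[Row], batch_size: int) -> Iterator[List[Row]]:
    """Hypothetical storage operator: yields the table in batches."""
    for start in range(0, len(table), batch_size):
        yield table[start:start + batch_size]


def seq_scan(child: Iterable[List[Row]]) -> Iterator[List[Row]]:
    """Hypothetical scan operator: forwards batches from its child."""
    for batch in child:
        yield batch


def build_index(child: Iterable[List[Row]]) -> Dict[int, List[float]]:
    """Build the index from whatever the child plan produces."""
    index: Dict[int, List[float]] = {}
    for batch in child:
        for row_id, vector in batch:
            index[row_id] = vector
    return index


table = [(1, [0.1, 0.2]), (2, [0.3, 0.4]), (3, [0.5, 0.6])]
index = build_index(seq_scan(storage_scan(table, batch_size=2)))
```

Because `build_index` depends only on the batches its child yields, the same path could in principle be fed newly inserted rows.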


          Documentation on vector stores + vector benchmark (#1245)

e59092d

Added documentation for vector stores including usage examples,
dependencies and other requirements.


          feat: insertion update index (#1246)

a3f66ab

This feature is split into multiple PRs.

This PR can be merged after #1244.


          Collection of fixes for the staging branch (#1253)

379f018

- [x] Remove empty evadb.db file
- [x] Move `test_github_datasource.py` to long integration tests. Fix
#1251
- [x] Fix the failing
`test/integration_tests/long/test_create_table_executor.py::CreateTableTest::test_should_create_table_from_select`.
- [x] Update documentation with links


          updates

17103d3


          updates

f36980a


          updates

751d97c


          updates

b9a3c7d


          ran spellchecker


          docs: fix classification link

f0116f1


          Remove table names from column names for df() call (#1256)

d6cb3a5

Removing table names from the `dataframe` during `df()` call. The users
can then easily load CSV files generated using `EvaDB` with the
`to_csv()` call at a later time (for long-running or expensive queries).

Example:

```
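import evadb

cursor = evadb.connect().cursor()
repo_name = "evadb"  # assumed placeholder; the original snippet leaves repo_name undefined

# Run the (potentially expensive) query once and save the result as CSV.
select_query = cursor.query(
    f"SELECT * FROM {repo_name}_StargazerList;"
).df()

select_query.to_csv("stargazers_list.csv", index=False)

# Later: recreate the table and reload the saved CSV.
cursor.query(
    f"""
    CREATE TABLE IF NOT EXISTS {repo_name}_StargazerList(
    github_username TEXT(1000));
    """
).df()

cursor.query(f"LOAD CSV 'stargazers_list.csv' INTO {repo_name}_StargazerList;").df()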

```

Do we need the table names for any use cases? For example, for duplicate
column names from two different functions - `object_detector_1.labels`
and `object_detector_2.labels`?
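
The duplicate-name concern can be made concrete with a small sketch. It assumes column names have the form `table.column` and uses a hypothetical helper, `strip_table_names`, that keeps the qualified name only where the short names would collide. This is one possible policy, not the code from this PR.

```python
from collections import Counter
from typing import List


def strip_table_names(columns: List[str]) -> List[str]:
    """Drop the `table.` prefix, keeping it only where names would collide."""
    short = [name.rsplit(".", 1)[-1] for name in columns]
    counts = Counter(short)
    # Keep the fully qualified name when the short name is ambiguous.
    return [
        full if counts[bare] > 1 else bare
        for full, bare in zip(columns, short)
    ]


unique = strip_table_names(["stargazerlist.github_username"])
clashing = strip_table_names(
    ["object_detector_1.labels", "object_detector_2.labels"]
)
```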

---------

Co-authored-by: Andy Xu <[email protected]>


          Remove dimensions from TEXT and FLOAT (#1261)

aeb9a3b

Users can now create a table with just `FLOAT` without providing the
dimensions.

Earlier:
```sql
CREATE TABLE ETTM1 (
        date TEXT(30),
        hufl FLOAT(5,7),
        hull FLOAT(5,7),
        mufl FLOAT(5,7),
        mull FLOAT(5,7),
        lufl FLOAT(5,7),
        lull FLOAT(5,7),
        ot FLOAT(5,7));
```

Now:
```sql
CREATE TABLE ETTM1 (
        date TEXT,
        hufl FLOAT,
        hull FLOAT,
        mufl FLOAT,
        mull FLOAT,
        lufl FLOAT,
        lull FLOAT,
        ot FLOAT);
```

Fixes #1260.

---------

Co-authored-by: Andy Xu <[email protected]>


          docs: updated outdated reference to SHOW UDF

7171a23


          Merge branch 'staging' of github.com:georgia-tech-db/eva into staging

9252b94


          docs: update getting started AI query

40802b4


          docs: update references to UDFs

14830dc


          docs: update data sources

ee14b29


          docs: updates

8379a40


          docs: updates

98c6897


          docs: updates

8b21cbe


          Improve the error message when there is a typo in the column name in …

c95e6aa

…the query. (#1267)

- [x] Add basic functionality

Below is the example error message:

```
evadb.binder.binder_utils.BinderError: Cannnot find column name2. Did you mean name? The available columns are ['avatar_url', 'bio', 'blog', 'collaborators', 'company', 'contributions', 'disk_usage', 'email', 'events_url', 'followers', 'followers_url', 'following', 'following_url', 'gists_url', 'gravatar_id', 'hireable', 'html_url', 'id', 'invitation_teams_url', 'location', 'login', 'name', 'node_id', 'organizations_url', 'owned_private_repos', 'private_gists', 'public_gists', 'public_repos', 'received_events_url', 'repos_url', 'role', 'site_admin', 'starred_url', 'subscriptions_url', 'team_count', 'total_private_repos', 'twitter_username', 'type', 'url'].
```

**Limitation**: To keep the output clean, we only fuzzy-match against
column names and skip aliases.

- [x] Add testcases.
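
A "did you mean" suggestion of this kind can be sketched with the standard library's `difflib.get_close_matches`. Whether the PR uses `difflib` is an assumption, and `column_not_found_message` is a hypothetical helper, not the binder's actual function. Only column names are passed in, mirroring the limitation that aliases are skipped.

```python
import difflib
from typing import List


def column_not_found_message(name: str, available: List[str]) -> str:
    """Build an error message with an optional closest-match suggestion."""
    # n=1 asks for the single best candidate above difflib's default cutoff.
    matches = difflib.get_close_matches(name, available, n=1)
    hint = f" Did you mean {matches[0]}?" if matches else ""
    return (
        f"Cannot find column {name}.{hint} "
        f"The available columns are {sorted(available)}."
    )


message = column_not_found_message("name2", ["login", "name", "email", "node_id"])
```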


          docs: updates

7007b74

jarulraj and others added 29 commits

October 8, 2023 19:25


          docs: updated feature list

bbdeab1


          docs: updated images

08db5eb


          docs: updates

06f8899


          docs: updates

913548e


          docs: updates

dc79a1c


          docs: updates

ad7bb30


          docs: updates

a53a7d4


          docs: updates

5f27824


          docs: updates

c952858


          docs: updates

d3b1be8


          docs: updates

163fc2a


          docs: updates

3ea2f8a


          fix: Catalog init introduces significant overhead (#1270)

18bc547

Fixed #1268

Moved instrumentation after the entire execution. 

<img width="1840" alt="image"
src="https://github.com/georgia-tech-db/evadb/assets/12206234/3d770689-deff-4408-bb64-9df320a95fa1">
<img width="1832" alt="image"
src="https://github.com/georgia-tech-db/evadb/assets/12206234/969ed19b-0985-4feb-ac87-045542a7b485">


          SHOW command for retrieving configurations (#1264)

a64d24b


          Fix Notebook and Ray testcases at staging (#1274)

a3b6c0c

Fix #1271, Fix #1265, Fix #1266

~~Not able to fix 11-similarity-search-for-motif-mining.ipynb due to
#1275~~


          Update custom-ai-function.rst (#1273)

65c6cb9

Updated the steps to create a new AI function with EvaDB.

---------

Co-authored-by: Andy Xu <[email protected]>


          Clickhouse integration (#1281)

6a38d27


          Added basic functionalities of REST apis (#1234)

0bcd5df


          Update custom-ai-function.rst (#1285)

6a0cd76

Co-authored-by: Andy Xu <[email protected]>


          Add stable diffusion integration (#1240)

bf02232

Reopens #1111.

---------

Co-authored-by: sudoboi <[email protected]>
Co-authored-by: Abhijith S Raj <[email protected]>


          fix: text_summarization uses drop udf (#1290)

61af3ed

`text_summarization` was using `DROP UDF` instead of `DROP FUNCTION`.


          feat: function_metadata supports boolean and float (#1296)

036e203

Fixes #1288


          feat: add support for show databases (#1295)

e21092c

SHOW DATABASES #1252


          fix: make the table/function catalog insert operation atomic (#1293)

7d51925

Fixes: #1282
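
As an illustration of what "atomic" means here: a table's catalog entry and its column entries should either all be written or not at all. This is a minimal sketch using the standard-library `sqlite3` module. The `table_catalog` and `column_catalog` schemas and the `insert_table_entry` helper are hypothetical and do not reflect EvaDB's catalog code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_catalog (name TEXT PRIMARY KEY)")
conn.execute(
    "CREATE TABLE column_catalog (table_name TEXT, name TEXT, "
    "UNIQUE (table_name, name))"
)


def insert_table_entry(conn, table_name, column_names):
    """Insert a table and its columns in a single transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("INSERT INTO table_catalog VALUES (?)", (table_name,))
            conn.executemany(
                "INSERT INTO column_catalog VALUES (?, ?)",
                [(table_name, col) for col in column_names],
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = insert_table_entry(conn, "AirData", ["unique_id", "ds", "y"])
# The duplicate column violates the UNIQUE constraint, so nothing is kept.
failed = insert_table_entry(conn, "Broken", ["a", "a"])
table_names = [row[0] for row in conn.execute("SELECT name FROM table_catalog")]
```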


          fix: improve testcase (#1294)

b114304

Test default values of `chunk_size` and `chunk_overlap`


          Starting the change for XGBoost integration into EvaDB. (#1232)

201f901

Co-authored-by: Jineet Desai <[email protected]>
Co-authored-by: Andy Xu <[email protected]>


          Add Documentation for UDF Unit Testing and Mocking (and minor Stable …

b8dd206

…Diffusion Fix) (#1301)


          Reenable batch for release (#1302)

f192a10


          v0.3.8 - new release (#1303)

c3b45b6

a0x8o merged commit c3b45b6 into alexxx-db:master

1 check passed
