
[Bug]: SQLite: Error: Unknown Error #2836

Open
luanmuniz opened this issue Nov 21, 2024 · 9 comments · May be fixed by #2849
Labels
bug Something isn't working

Comments

@luanmuniz

Describe the bug

I'm trying to load an SQLite database that's around 100MB.

I seem to be hitting this line when trying to access a table in the database that's bigger than 32 MB:
https://github.com/evidence-dev/evidence/blob/main/packages/lib/sdk/src/plugins/datasources/wrapSimpleConnector.js#L52

Reading the code, I can't see a workaround for this.
Note: all the .sql files query the same database file.

Steps to Reproduce

  • Try to load a SQLite database that's bigger than 32 MB
  • Try to access a table that's bigger than 32 MB
  • Run npm run sources

Logs

> evidence sources

✔ Loading plugins & sources
-----
  [Processing] articles_connection
  article_categories ✔ Finished, wrote 108594 rows.
  article_classification ✔ Finished, wrote 64421 rows.
  articles-database ⚠ No results returned.
  articles ✖ Error: Unknown Error
  categories ✔ Finished, wrote 155 rows.
-----
  Evaluated sources, saving manifest
  ✅ Done!


With debug enabled, I get the following output (note the 32 MB line):

$ NODE_OPTIONS="--max-old-space-size=4096" npm run sources -- --debug;

> [email protected] sources
> evidence sources --debug

Evidence running with debug logging
✔ Loading plugins & sources
-----
  [Processing] articles_connection
  article_categories ◢ Processing...[DEBUG]:  Building parquet file article_categories.parquet
[DEBUG]:  Reading rows from a generator object
  article_categories ◥ Processing...[DEBUG]:  Measure: "buildMultipartParquet" {
  duration: 2868.07013,
  meta: { 'batch number': 0 },
  parents: [ 'buildMultipartParquet' ]
}
[DEBUG]:  Flushing batch 0 with 108594 rows
[DEBUG]:  Flushing batch 0 with 108594 rows
  article_categories ◢ Processing...[DEBUG]:  Measure: "flush" {
  duration: 418.50450400000045,
  meta: { 'batch number': 0 },
  parents: [ 'buildMultipartParquet' ]
}
[DEBUG]:  Flushed batch 0 with 108594 rows
  article_categories ◣ Processing...[DEBUG]:  Measure: "buildMultipartParquet" {
  duration: 3679.8906589999997,
  meta: { 'output filename': 'article_categories.parquet' },
  parents: []
}
  article_categories ✔ Finished, wrote 108594 rows.
  article_classification ◢ Processing...[DEBUG]:  Building parquet file article_classification.parquet
[DEBUG]:  Reading rows from a generator object
  article_classification ◢ Processing...[DEBUG]:  Measure: "buildMultipartParquet" {
  duration: 2006.8502520000002,
  meta: { 'batch number': 0 },
  parents: [ 'buildMultipartParquet' ]
}
[DEBUG]:  Flushing batch 0 with 64421 rows
[DEBUG]:  Flushing batch 0 with 64421 rows
  article_classification ◣ Processing...[DEBUG]:  Measure: "flush" {
  duration: 806.4016789999996,
  meta: { 'batch number': 0 },
  parents: [ 'buildMultipartParquet' ]
}
[DEBUG]:  Flushed batch 0 with 64421 rows
  article_classification ◤ Processing...[DEBUG]:  Measure: "buildMultipartParquet" {
  duration: 3263.2427499999994,
  meta: { 'output filename': 'article_classification.parquet' },
  parents: []
}
  article_classification ✔ Finished, wrote 64421 rows.
Will not eagerly load files larger than 32 Megabytes.
  articles-database ⚠ No results returned.
  articles ✖ Error: Unknown Error
  categories ◢ Processing...[DEBUG]:  Building parquet file categories.parquet
[DEBUG]:  Reading rows from a generator object
[DEBUG]:  Measure: "buildMultipartParquet" {
  duration: 4.869833999999173,
  meta: { 'batch number': 0 },
  parents: [ 'buildMultipartParquet' ]
}
[DEBUG]:  Flushing batch 0 with 155 rows
[DEBUG]:  Flushing batch 0 with 155 rows
[DEBUG]:  Measure: "flush" {
  duration: 3.3371879999995144,
  meta: { 'batch number': 0 },
  parents: [ 'buildMultipartParquet' ]
}
[DEBUG]:  Flushed batch 0 with 155 rows
[DEBUG]:  Measure: "buildMultipartParquet" {
  duration: 14.027275999998892,
  meta: { 'output filename': 'categories.parquet' },
  parents: []
}
  categories ✔ Finished, wrote 155 rows.
-----
  Evaluated sources, saving manifest
  Updating schema 'articles_connection'
  | Schema exists already
  | 4 queries found
  |   article_categories
  |   article_classification
  |   articles-database
  |   categories
  | 0 queries are new
  | 3 queries already exists
  |   static/data/articles_connection/article_categories/article_categories.parquet
  |   static/data/articles_connection/article_classification/article_classification.parquet
  |   static/data/articles_connection/categories/categories.parquet
  | 3 queries to be rendered
  |   static/data/articles_connection/article_categories/article_categories.parquet
  |   static/data/articles_connection/article_classification/article_classification.parquet
  |   static/data/articles_connection/categories/categories.parquet
  ✅ Done!

System Info

System:
    OS: macOS 15.1.1
    CPU: (12) x64 Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
    Memory: 217.51 MB / 16.00 GB
    Shell: 3.2.57 - /bin/bash
  Binaries:
    Node: 22.11.0 - ~/.nvm/versions/node/v22.11.0/bin/node
    npm: 10.5.0 - ~/.nvm/versions/node/v22.11.0/bin/npm
    pnpm: 8.7.5 - ~/Library/pnpm/pnpm
    Watchman: 2023.07.03.00 - /usr/local/bin/watchman
  npmPackages:
    @evidence-dev/bigquery: ^2.0.8 => 2.0.8
    @evidence-dev/core-components: ^4.8.13 => 4.8.13
    @evidence-dev/csv: ^1.0.13 => 1.0.13
    @evidence-dev/databricks: ^1.0.7 => 1.0.7
    @evidence-dev/duckdb: ^1.0.12 => 1.0.12
    @evidence-dev/evidence: ^39.1.17 => 39.1.17
    @evidence-dev/motherduck: ^1.0.3 => 1.0.3
    @evidence-dev/mssql: ^1.1.1 => 1.1.1
    @evidence-dev/mysql: ^1.1.3 => 1.1.3
    @evidence-dev/postgres: ^1.0.6 => 1.0.6
    @evidence-dev/snowflake: ^1.2.1 => 1.2.1
    @evidence-dev/sqlite: ^2.0.6 => 2.0.6
    @evidence-dev/trino: ^1.0.8 => 1.0.8

Severity

blocking all usage of Evidence

Additional Information, or Workarounds

This is my connection.yaml file:

name: articles_connection
type: sqlite
options:
  filename: articles-database.db
  readonly: true

This is the articles.sql

select * from articles limit 1;
@luanmuniz luanmuniz added bug Something isn't working to-review Evidence team to review labels Nov 21, 2024
@archiewood
Member

archiewood commented Nov 21, 2024

Can you also confirm what columns and column types are in the articles and articles-database tables in your SQLite file?

@archiewood archiewood removed the to-review Evidence team to review label Nov 21, 2024
@luanmuniz
Author

@archiewood This is the query that creates the articles table:

CREATE TABLE IF NOT EXISTS articles (
	id TEXT PRIMARY KEY,
	title TEXT NOT NULL,
	published TEXT NOT NULL,
	abstract TEXT NOT NULL,
	conclusion TEXT,
	link TEXT UNIQUE NOT NULL,
	input_token INTEGER,
	output_token INTEGER,
	created_at TEXT DEFAULT CURRENT_TIMESTAMP,
	updated_at TEXT DEFAULT CURRENT_TIMESTAMP
);

There is no articles-database table. But this is the structure of the folder:

$ cd sources/articles_connection/
$ ls -l
total 122832
-rw-r--r--  1 user  staff        33 Nov 21 19:06 article_categories.sql
-rw-r--r--  1 user  staff        37 Nov 21 19:06 article_classification.sql
-rw-r--r--@ 4 user  staff  61861888 Nov 21 13:40 articles-database.db
-rw-r--r--  1 user  staff        23 Nov 21 19:53 articles.sql
-rw-r--r--  1 user  staff        25 Nov 21 19:05 categories.sql
-rw-r--r--  1 user  staff        98 Nov 21 19:50 connection.yaml

@archiewood
Member

archiewood commented Nov 22, 2024

I think this is the expected behaviour regarding the loading of the files. We don't need to read the SQLite file's contents as text; we just need to query it.

However, the error is happening when running the query in articles.sql, and the error message given back is not helpful!

@luanmuniz
Author

luanmuniz commented Nov 23, 2024

@archiewood I see. Indeed, the 32MB log probably comes from trying to load the .db file, not the query! It makes total sense!

But indeed my biggest problem is the articles.sql file. Please let me know what I can do to help provide more relevant data! I'm very interested in solving this problem.

@archiewood
Member

I assume this same query runs successfully against sqlite in some other client?

@luanmuniz
Author

@archiewood Yes, it does:

SQLite version 3.42.0 2023-05-16 12:36:15
Enter ".help" for usage hints.
sqlite> select * from articles limit 1;
1653afcd-92ea-4468-8891-99c9fcc7275e|Randomized Autoregressive Visual Generation|2024-11-01|This paper presents Randomized AutoRegressive modeling (RAR) for visual generation, which sets a new state-of-the-art performance on the image generation task while maintaining full compatibility with language modeling frameworks. The proposed RAR is simple: during a standard autoregressive training process with a next-token prediction objective, the input sequence-typically ordered in raster form-is randomly permuted into different factorization orders with a probability r, where r starts at 1 and linearly decays to 0 over the course of training. This annealing training strategy enables the model to learn to maximize the expected likelihood over all factorization orders and thus effectively improve the model's capability of modeling bidirectional contexts. Importantly, RAR preserves the integrity of the autoregressive modeling framework, ensuring full compatibility with language modeling while significantly improving performance in image generation. On the ImageNet-256 benchmark, RAR achieves an FID score of 1.48, not only surpassing prior state-of-the-art autoregressive image generators but also outperforming leading diffusion-based and masked transformer-based methods. Code and models will be made available at https://github.com/bytedance/1d-tokenizer||https://arxiv.org/abs/2411.00776|2024-11-04 17:28:49|2024-11-07 09:26:35||
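Another quick way to sanity-check the query outside both Evidence and the sqlite3 CLI is Python's standard-library sqlite3 module. This is just a cross-check, unrelated to Evidence's connector; the helper name `first_article` is illustrative, and the read-only URI open mirrors the `readonly: true` option in connection.yaml:

```python
import sqlite3

def first_article(db_path: str):
    """Fetch one row from the articles table, opening the database
    read-only (matching the readonly: true option in connection.yaml)."""
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    try:
        return conn.execute("select * from articles limit 1").fetchone()
    finally:
        conn.close()
```

If this returns a row against the same `.db` file, the database and query are fine and the problem sits in the connector layer.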

@luanmuniz luanmuniz linked a pull request Nov 24, 2024 that will close this issue
4 tasks
@luanmuniz
Author

@archiewood Seems like I managed to debug it and find a solution:
#2849

Please modify as you see fit!
If possible, can you let me know when this will be released? This is blocking a page that I'm trying to build, and I would love to release my code as soon as possible.

Thank you!

@archiewood
Member

Hi @luanmuniz - thanks for the PR.

We'll look at this next week. Next release is scheduled for Thursday 28th.

If you want to get unblocked faster, you could release your version as a community plugin.

https://docs.evidence.dev/plugins/create-source-plugin/

Since you have written all the code already, I imagine it will just be a bit of copy-pasting.

You can then install your plugin in Evidence and drive on!

@archiewood
Member

There should be decent instructions in the template:

https://github.com/evidence-dev/datasource-template

If you have any questions or issues with the template, let me know!

@archiewood archiewood changed the title [Bug]: SQLite: Will not eagerly load files larger than 32 Megabytes. [Bug]: SQLite: Error: Unknown Error Nov 28, 2024