Computational output: moving from save-to-disk to DuckDB #225

Open
SamuelBrand1 opened this issue May 16, 2024 · 3 comments

@SamuelBrand1
Collaborator

At the moment, the analysis workflow relies on saving to disk both to serialise results and to checkpoint them (via DrWatson.produce_or_load). As with discussion #221, this is fine for low-to-moderate computational workloads, but it is not obviously scalable and it depends on a local file structure.

A better alternative (IMO) is to open a connection to a DuckDB instance using the Julia front-end and stream results to it; this also makes it easier to run the post-processing as results arrive.
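
A minimal sketch of what streaming results to DuckDB could look like, assuming the DuckDB.jl package (with its DBInterface methods) and DataFrames.jl. The `results` table, its columns, and `run_model` are hypothetical stand-ins for the real analysis, not part of the existing workflow. Each run's output is inserted as soon as it is computed, and the same connection can be queried mid-run for post-processing.

```julia
using DuckDB, DataFrames

# Hypothetical stand-in for one model fit; returns a point estimate.
run_model(param) = 2.0 * param

# File-backed database so results persist after the Julia session ends.
# The path is illustrative only.
db_path = joinpath(mktempdir(), "results.duckdb")
con = DBInterface.connect(DuckDB.DB, db_path)

# Hypothetical results schema: one row per completed run.
DBInterface.execute(con, """
    CREATE TABLE IF NOT EXISTS results (
        run_id BIGINT, param DOUBLE, estimate DOUBLE
    )""")

# Prepare the INSERT once and reuse it for every run.
stmt = DBInterface.prepare(con, "INSERT INTO results VALUES (?, ?, ?)")

param_values = [0.5, 1.0, 1.5]
for (run_id, param) in enumerate(param_values)
    estimate = run_model(param)
    # Stream this run's result to the database as soon as it is available.
    DBInterface.execute(stmt, (run_id, param, estimate))
    # Post-processing can query the table while runs are still in progress.
    n_done = DataFrame(DBInterface.execute(con, "SELECT count(*) AS n FROM results")).n[1]
    println("completed $n_done of $(length(param_values)) runs")
end

# Summarise the streamed results and pull them back into a DataFrame.
mean_df = DataFrame(DBInterface.execute(con,
    "SELECT avg(estimate) AS mean_estimate FROM results"))

DBInterface.close!(con)
```

Because the database lives in a single file rather than a directory tree of saved outputs, a separate process could in principle open it read-only for post-processing; whether that suits concurrent writers would need checking against DuckDB's concurrency model.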

@seabbs
Collaborator

seabbs commented May 16, 2024

Does DuckDB not have an interface to DataFrames.jl or similar to avoid needing to use the SQL syntax?

@SamuelBrand1
Collaborator Author

> Does DuckDB not have an interface to DataFrames.jl or similar to avoid needing to use the SQL syntax?

It looks like the DataFrames interface goes through the Appender API, which adds data row by row; I'm not sure about loading results back from the DB.
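
A rough sketch of both directions, assuming DuckDB.jl's `Appender` (`DuckDB.append`, `DuckDB.end_row`, `DuckDB.close`) and DataFrames.jl. The `draws` table and its columns are hypothetical. Writing goes row by row through the Appender; reading back works because a DuckDB query result is a Tables.jl source, so it converts straight into a `DataFrame`. Only the table creation and the final `SELECT` are SQL.

```julia
using DuckDB, DataFrames

con = DBInterface.connect(DuckDB.DB, ":memory:")
# Hypothetical table; column order must match the DataFrame below.
DBInterface.execute(con,
    "CREATE TABLE draws (chain BIGINT, iteration BIGINT, value DOUBLE)")

# Illustrative results frame.
df = DataFrame(chain = [1, 1, 2, 2], iteration = [1, 2, 1, 2],
               value = [0.1, 0.2, 0.3, 0.4])

# Write: append each value of each row, then mark the end of the row.
appender = DuckDB.Appender(con, "draws")
for row in eachrow(df)
    for x in row
        DuckDB.append(appender, x)
    end
    DuckDB.end_row(appender)
end
DuckDB.close(appender)  # flushes buffered rows into the table

# Read back: the query result converts directly into a DataFrame.
loaded = DataFrame(DBInterface.execute(con,
    "SELECT * FROM draws ORDER BY chain, iteration"))
```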

@seabbs
Collaborator

seabbs commented Jul 22, 2024

Status of this idea?
