Explore DuckDB as an alternative approach to Phenologs pipeline #3

bryanlaraway · 2024-05-20T21:48:17Z

The current and previous (MS thesis work) implementations of the phenologs pipeline are computationally intensive and take too long to run locally, even with parallel processing. Can the pipeline be sped up with DuckDB, accomplishing much of the logic via table joins instead of nested Python for-loops?

bryanlaraway · 2024-05-20T23:21:33Z

Noting my first memory/processing issue: it occurred when running the 'star join' that generates the cross-product table of every cross-species phenotype combination. This might be a laptop limitation, so I need to test on a more powerful desktop machine for comparison.

Update: On a local machine with 128GB of RAM, the table creation script ran fine, finishing within a minute or two. Still need to see how well the downstream operations run.
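A rough back-of-the-envelope sketch of why the cross-product table can outgrow a laptop. The species names, phenotype counts, and bytes-per-row figure below are made-up placeholders, not numbers from the actual pipeline, and the estimate ignores any compression or spilling to disk the database may do.

```python
from itertools import combinations


def cross_product_rows(phenotype_counts):
    """Rows needed to pair every phenotype of one species with every phenotype of another."""
    return sum(
        phenotype_counts[a] * phenotype_counts[b]
        for a, b in combinations(sorted(phenotype_counts), 2)
    )


def estimated_gib(row_count, bytes_per_row):
    """Naive uncompressed size in GiB for a table of row_count rows."""
    return row_count * bytes_per_row / 2**30


# Hypothetical phenotype counts per species, for illustration only.
counts = {"human": 10_000, "mouse": 10_000, "zebrafish": 5_000}
total_rows = cross_product_rows(counts)
print(f"{total_rows:,} rows, ~{estimated_gib(total_rows, 64):.1f} GiB at 64 bytes/row")
```

Plugging in the real per-species phenotype counts should give a quick sense of whether a given machine is likely to hold the table in memory.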

bryanlaraway self-assigned this May 29, 2024
