Processing of Data Slow #893

jhnath21 · 2024-11-27T14:55:39Z

As datasets have grown with new chemistry and instrumentation, and as the reference databases have grown as well, data processing has become very slow. We have tried various thread counts (96, 128, and 192), but this has not helped. Also, the larger the datasets become, the more memory the processing computer needs.

Is there a way to speed up the analysis that we have missed, and a way to avoid needing large amounts of RAM with these larger datasets? For example, a file containing >20M reads takes 4+ days to process, whereas in the past it would take only ~6 hrs (~5M reads/hr with just 16 threads). Currently we can't use a server with less than 512 GB of RAM.

salzberg · 2024-11-27T15:14:05Z

You can use KrakenUniq with the new low-memory option, which lets you run on a server with any amount of memory, even just 16 GB. There's a time penalty, but it's not bad. See our short paper about it: https://pubmed.ncbi.nlm.nih.gov/37602140/
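
For readers wondering why a low-memory mode trades time for RAM, here is a minimal Python sketch of the general idea of processing a database in memory-sized chunks. It is not KrakenUniq's code or API: the function names (`iter_chunks`, `classify_chunked`), the toy k-mer table, and the majority-vote rule are all hypothetical and only illustrate the trade-off.

```python
from collections import Counter
from typing import Dict, Iterator, List, Optional, Tuple


def iter_chunks(db_items: List[Tuple[str, int]],
                chunk_size: int) -> Iterator[Dict[str, int]]:
    """Yield the k-mer -> taxon table in pieces of at most chunk_size entries.

    In a real tool the pieces would be read from disk one at a time; a list
    is used here only to keep the sketch self-contained.
    """
    for start in range(0, len(db_items), chunk_size):
        yield dict(db_items[start:start + chunk_size])


def kmers(read: str, k: int) -> List[str]:
    """Return all overlapping k-mers of a read."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def classify_chunked(reads: List[str],
                     db_items: List[Tuple[str, int]],
                     k: int,
                     chunk_size: int) -> List[Optional[int]]:
    """Assign each read the taxon with the most k-mer hits (hypothetical rule).

    Only one chunk of the table is held in memory at a time, so peak memory
    depends on chunk_size, while every read is scanned once per chunk.
    """
    hits = [Counter() for _ in reads]
    for chunk in iter_chunks(db_items, chunk_size):
        for i, read in enumerate(reads):
            for kmer in kmers(read, k):
                taxon = chunk.get(kmer)
                if taxon is not None:
                    hits[i][taxon] += 1
    # Merge step: pick the best-supported taxon, or None if nothing matched.
    return [c.most_common(1)[0][0] if c else None for c in hits]


# Toy data: four 3-mers mapped to two made-up taxon IDs.
toy_db = [("ACG", 1), ("CGT", 1), ("TTT", 2), ("TTA", 2)]
toy_reads = ["ACGT", "TTTA", "GGGG"]
result = classify_chunked(toy_reads, toy_db, k=3, chunk_size=2)
```

In this sketch a smaller `chunk_size` lowers peak memory but means more passes over the reads, which is one way a time penalty like the one mentioned above can arise.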
