
Diamond Clustering - Failed to allocate sufficient memory. Please refer to the manual for instructions on memory usage. #842

Open

nhngoc02 opened this issue Dec 4, 2024 · 2 comments

nhngoc02 commented Dec 4, 2024

Hi, I'm trying to run diamond clustering on a file with 9 million sequences and got this error.

I have 128 GB of RAM and 1.8 TB of disk storage. The database.dmnd database I used is 1.2 GB.

The command I used:

diamond cluster -d database.dmnd -o output.tsv --approx-id 80 --tmpdir /data -M 64G

My log file contains:

diamond v2.1.10.164 (C) Max Planck Society for the Advancement of Science, Benjamin Buchfink, University of Tuebingen
Documentation, support and updates available at http://www.diamondsearch.org
Please cite: http://dx.doi.org/10.1038/s41592-021-01101-x Nature Methods (2021)

#CPU threads: 32
Scoring parameters: (Matrix=BLOSUM62 Lambda=0.267 K=0.041 Penalties=11/1)
Opening the input file...  [0.995s]
Input database: database.dmnd (8585030 sequences, 993623733 letters)
Temporary directory: /data
#Target sequences to report alignments for: unlimited
Database: database.dmnd (type: Diamond database, sequences: 8585030, letters: 993623733)
Block size = 12800000000
Opening the input file...  [0s]
Opening the output file...  [0s]
Seeking in database...  [0s]
Loading query sequences...  [0.638s]
Length sorting queries...  [0.601s]
Algorithm: Double-indexed
Building query histograms...  [3.384s]
Seeking in database...  [0s]
Seeking in database...  [0.004s]
Initializing temporary storage...  [0.003s]
Building reference histograms...  [1.399s]
Allocating buffers...  [0s]
Processing query block 1, reference block 1/1, shape 1/1.
Building reference seed array...  [1.48s]
Building query seed array...  [1.426s]
Computing hash join...  [0.503s]
Masking low complexity seeds...  [0.04s]
Building kmer ranking...  [0.008s]
Searching alignments...  [0.952s]
Deallocating memory...  [0s]
Deallocating buffers...  [0.004s]
Clearing query masking...  [0.069s]
Computing alignments...  [35.532s]
Deallocating reference...  [0s]
Loading reference sequences...  [0s]
Deallocating buffers...  [0s]
Deallocating queries...  [0.003s]
Closing the output file...  [0s]
Closing the database...  [0s]
Cleaning up...  [0s]
Total time = 46.054s
Reported 52104517 pairwise alignments, 52104517 HSPs.
3412693 queries aligned.
Finished search. #Edges: 97532173
Allocating buffers...  [0s]
Loading edges...  [0.391s]
Sorting edges...  [0.441s]
Computing edge counts...  [0.105s]
Computing vertex cover...  [1.334s]
Computing reassignment...  [0.104s]
Clustering round 1 complete. #Input sequences: 8585030 #Clusters: 1010961 #Letters: 115151078 Time: 48s
Temporary directory: /data
#Target sequences to report alignments for: unlimited
Database: database.dmnd (type: Diamond database, sequences: 8585030, letters: 993623733)
Block size = 3200000000
Opening the input file...  [0s]
Opening the output file...  [0s]
Seeking in database...  [0s]
Loading query sequences...  [0.268s]
Algorithm: Double-indexed
Building query histograms...  [0.287s]
Seeking in database...  [0s]
Initializing temporary storage...  [0s]
Building reference histograms...  [0.036s]
Allocating buffers...  [0s]
Processing query block 1, reference block 1/1, shape 1/1.
Building reference seed array...  [0.073s]
Building query seed array...  [0.074s]
Computing hash join...  [0.266s]
Masking low complexity seeds...  [0.07s]
Searching alignments...  [1729.5s]
Deallocating memory...  [0s]
Deallocating buffers...  [0.002s]
Clearing query masking...  [0.014s]
Computing alignments... Failed to allocate sufficient memory. Please refer to the manual for instructions on memory usage.

Could anyone help me resolve this issue?

Thanks in advance!

bbuchfink (Owner) commented

There are some issues causing increased memory use that will be fixed in the next release. For now, one thing you could try is using --bin 256 (or possibly higher).
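
For reference, a possible invocation with the suggested flag appended to the original command (the placement is illustrative; per the suggestion above, a higher value such as 512 could be tried if the error persists):

diamond cluster -d database.dmnd -o output.tsv --approx-id 80 --tmpdir /data -M 64G --bin 256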

bbuchfink (Owner) commented

Another option would be --cluster-steps faster_lin fast_lin; that should be sufficient for an 80% identity cutoff.
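
Applied to the original command, that might look like this (assuming the options are simply appended):

diamond cluster -d database.dmnd -o output.tsv --approx-id 80 --tmpdir /data -M 64G --cluster-steps faster_lin fast_lin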
