Hi,
I am trying to assemble a genome of ~2.5 Gb with ~15X of Nanopore data and ~65X of 100 bp short-read data.
I did one ABySS run and got a pretty bad result, and then learned that I had used the wrong k-mer size, too far from the optimum (95). I then found KmerGenie, a tool for finding the optimal k-mer size; on my short-read data it reported 63, while MaSuRCA reported 67. For HASLR, I did not tune anything the first time and used the default parameters; with k=43 I got an assembly of 672 MB, but the BUSCO result was very bad: 13.3% complete [S:13.3%, D:0.0%], 3.4% fragmented, and 83.3% missing.
The configuration I used was:
haslr.py -t 48 --minia-kmer 43 -x nanopore, plus the Nanopore dataset in FASTA format and the short-read files _1 and _2 in fq.gz
I gave the run 600 GB of memory and 48 cores.
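To compare an assembly against the expected ~2.5 Gb genome size, it can help to look at the number of contigs, the total number of bases, and the N50 rather than the file size alone. The following is a minimal sketch, assuming an uncompressed FASTA file; `fasta_lengths` and `assembly_stats` are hypothetical helper names, not part of HASLR or Minia.

```python
def fasta_lengths(path):
    """Yield the length of each sequence in an uncompressed FASTA file."""
    length = None  # None until the first header line is seen
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                if length is not None:
                    yield length
                length = 0
            elif length is not None:
                length += len(line.strip())
    if length is not None:
        yield length


def assembly_stats(path):
    """Return contig count, total bases, and N50 for a FASTA file."""
    lengths = sorted(fasta_lengths(path), reverse=True)
    total = sum(lengths)
    running, n50 = 0, 0
    for n in lengths:
        running += n
        # N50: length of the contig at which half of the total is reached
        if running * 2 >= total:
            n50 = n
            break
    return {"contigs": len(lengths), "total_bp": total, "n50": n50}
```

Calling `assembly_stats("asm.final.fa")` on a zero-byte file returns zero for all three values, so the same check also works for an empty result.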
Since HASLR uses Minia, I looked in the Minia manual for advice on parameter optimization and found this:
kmer-size The k-mer length is the length of the nodes in the de Bruijn graph. It strongly depends on the input dataset. For proper assembly, we recommend that you use the Minia-pipeline that runs Minia multiple times, with an iterative multi-k algorithm. That way, you won't need to choose k. If you insist on running with a single k value, the KmerGenie software can automatically find the best k for your dataset.
The configuration for the second run was the same as for the first one; I only changed the k-mer size from 43 to 63.
So I did that and used the k-mer size found by KmerGenie, and the final file asm.final.fa was zero bytes!
Could you help me understand why I ended up with an empty FASTA file?
The log file of the first run showed no problem:
checking /users/PHS0338/jpac1984/.conda/envs/assembly-Y/bin/haslr_assemble: ok
checking /users/PHS0338/jpac1984/.conda/envs/assembly-Y/bin/minia_nooverlap: ok
checking /users/PHS0338/jpac1984/.conda/envs/assembly-Y/bin/fastutils: ok
checking /users/PHS0338/jpac1984/.conda/envs/assembly-Y/bin/minia: ok
checking /users/PHS0338/jpac1984/.conda/envs/assembly-Y/bin/minimap2: ok
number of threads: 48
output directory: /fs/scratch/PHS0338/appz/haslr/ONTq_NoSplit_RAPL-k43
subsampling 25x long reads to /fs/scratch/PHS0338/appz/haslr/ONTq_NoSplit_RAPL-k43/lr25x.fasta... done
assembling short reads using Minia... done
removing overlaps in short read assembly... done
removing short sequences in short read assembly... done
aligning long reads to short read assembly using minimap2... done
assembling long reads using HASLR... done
The log of the second run looks pretty much the same:
checking /users/PHS0338/jpac1984/.conda/envs/assembly-Y/bin/haslr_assemble: ok
checking /users/PHS0338/jpac1984/.conda/envs/assembly-Y/bin/minia_nooverlap: ok
checking /users/PHS0338/jpac1984/.conda/envs/assembly-Y/bin/fastutils: ok
checking /users/PHS0338/jpac1984/.conda/envs/assembly-Y/bin/minia: ok
checking /users/PHS0338/jpac1984/.conda/envs/assembly-Y/bin/minimap2: ok
number of threads: 48
output directory: /fs/scratch/PHS0338/appz/haslr/ONTq_NoSplit_RAPL-k63
subsampling 25x long reads to /fs/scratch/PHS0338/appz/haslr/ONTq_NoSplit_RAPL-k63/lr25x.fasta... done
assembling short reads using Minia... done
removing overlaps in short read assembly... done
removing short sequences in short read assembly... done
aligning long reads to short read assembly using minimap2... done
assembling long reads using HASLR... done
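Since both logs report every step as done, one way to narrow down where the second run went wrong might be to look for zero-byte intermediate files in the output directory. The following is a minimal sketch, assuming the intermediate files are still present under the output directory; `find_empty_files` is a hypothetical helper, not part of HASLR, and it makes no assumption about HASLR's file names.

```python
import os


def find_empty_files(root):
    """Return zero-byte regular files under root, oldest first."""
    empty = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and os.path.getsize(path) == 0:
                empty.append((os.path.getmtime(path), path))
    # Sort by modification time so the earliest empty file comes first;
    # that file is a likely candidate for the step that produced no output.
    return [path for _mtime, path in sorted(empty)]
```

For example, `find_empty_files("/fs/scratch/PHS0338/appz/haslr/ONTq_NoSplit_RAPL-k63")` would list every empty file from the k=63 run. If the short-read assembly or the alignment file is already empty, the problem lies upstream of the final HASLR step.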
Thanks.