Hello,
I'm trying to run some older data through the wf-artic pipeline. I used Guppy to basecall this data with the model dna_r9.4.1_450bps_sup.cfg.
Epi2me version:
When I run my nextflow command, I get an error saying that the basecalling model used cannot be automatically determined. (Why is that, and where should this information be found?)
I found a suggestion in the GitHub issues to use --override_basecaller_cfg. However, dna_r9.4.1_450bps_sup is not an option.
Can you please let me know which model I should use?
Here is the nextflow command that I am trying to use:
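(My actual command didn't paste in above, so as a stand-in, here is a hypothetical sketch of the kind of invocation I mean; the paths, sample sheet, and scheme version are placeholders, not my real values.)

```shell
# Hypothetical sketch -- all paths and the scheme version are placeholders
nextflow run epi2me-labs/wf-artic \
    --fastq ./fastq_pass \
    --sample_sheet ./sample_sheet.csv \
    --scheme_name SARS-CoV-2 \
    --scheme_version Midnight-ONT/V3 \
    --override_basecaller_cfg <model> \
    -profile singularity
```

Here `<model>` is whichever basecaller configuration the workflow will accept, which is exactly what I'm unsure about.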
I tried running this with the hac model instead. It seemed to be working, but it ultimately failed and no output files were written. Is this because I used the wrong model?
Here is the start of the error messages:
By the way, I was able to run the test data mostly successfully (without the override flag). Although it failed to output any artic, nextclade, or pangolin analyses, I really only want the read QC and the primer-trimmed, sorted BAM files, which I could see in the test output.
Thanks!
Hi Matt,
Thanks! That (almost) worked ;-) I do see the trimmed and sorted BAM files now, but there are still some issues:
The run ended without producing a report. With the demo data I could produce the report showing coverage etc. (although it failed at the variant steps). I was using --update-data false, since I had issues when running the default with the pangolin update, but that doesn't really seem to solve the problem.
The trimmed and sorted BAM files are quite sparse compared to the results I got when trimming with iVar. Can you share the parameters that are used to filter and trim the reads?
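To put a number on "sparse", I compared primary mapped read counts between the two BAMs with samtools (filenames here are placeholders for my actual files):

```shell
# Placeholder filenames; summarise each BAM
samtools flagstat ivar_trimmed.sorted.bam
samtools flagstat wf_artic.primertrimmed.sorted.bam

# Or count only primary, mapped reads directly
# (-F 0x904 excludes unmapped, secondary, and supplementary alignments)
samtools view -c -F 0x904 wf_artic.primertrimmed.sorted.bam
```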
Although the pipeline ran further than before, I noticed in one of the sample log files that it is still complaining about the basecaller model:
ValueError: Model [email protected]:consensus is not a known model or existant file.
From this I realized that we are only keeping 200 reads, which probably accounts for the small size of the BAM files!
I'm actually working on wastewater samples, so would like to keep all the data. Is there any way to do this?
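Skimming the workflow options, I wonder whether this is the artic coverage normalisation step; wf-artic appears to expose a --normalise parameter (default 200, if I'm reading it right), and I'd guess that setting it to 0 disables the downsampling. This is unverified on my part, but it's what I'd try next (paths again are placeholders):

```shell
# Unverified guess: --normalise 0 may turn off coverage normalisation
# (the default appears to be 200); paths are placeholders
nextflow run epi2me-labs/wf-artic \
    --fastq ./fastq_pass \
    --normalise 0 \
    -profile singularity
```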
Just for some background: we typically use Illumina sequencing for this, with Freyja analysis, but we were interested in capturing some longer amplicons for phasing. I started out trimming with iVar, which feeds into Freyja, but I was concerned about the primer trimming because of the tagmentation approach in the Midnight kit. It wasn't clear to me whether iVar could properly interpret reads that may have had primers cut off by tagmentation, so I thought it would be safer to use an ONT tool for this. But it would be helpful if I could keep all the reads. (By the way, I usually don't do much of the data processing, so I'm a bit of a novice at this. Thanks for your patience!)
Ask away!