Ensemble mode issue #68

Open
JunhakAhn opened this issue Dec 16, 2020 · 1 comment

JunhakAhn commented Dec 16, 2020

Hi. I tried to run NeuSomatic in ensemble mode.
Since there is no 'SomaticSeq.Wrapper.sh' at 'https://github.com/bioinform/somaticseq/blob/master/SomaticSeq.Wrapper.sh',
I ran 'somaticseq_parallel.py' using the recommended command from 'https://github.com/bioinform/somaticseq/' and got 'Ensemble.sSNV.tsv'.

However, when I ran NeuSomatic's 'preprocess.py' in ensemble mode with 'Ensemble.sSNV.tsv', I got this exception:


extract_ensemble The following features are missing from ensemble file: ['nBAM_Z_Ranksums_EndPos', 'tBAM_Z_Ranksums_MQ', 'nBAM_Z_Ranksums_MQ', 'nBAM_Z_Ranksums_BQ', 'tBAM_Z_Ranksums_EndPos', 'tBAM_Z_Ranksums_BQ']

  File "preprocess.py", line 435, in <module>
    args.scan_alignments_binary)
  File "preprocess.py", line 241, in preprocess
    ensemble_bed = extract_ensemble(work, ensemble_tsv)
  File "neusomatic/python/generate_dataset.py", line 1296, in extract_ensemble
    raise Exception

The problem seems to be that 'Ensemble.sSNV.tsv' does not contain the features listed above, including 'nBAM_Z_Ranksums_EndPos', ...

Did I do something wrong? Or how can I get an ensemble SNV file with all the features needed to run NeuSomatic in ensemble mode?

Thanks,
Ahn

@msahraeian
Contributor

@JunhakAhn Thanks for your interest in NeuSomatic.

  1. I guess you are running somaticseq_parallel.py in single-sample (tumor-only) mode, but you need to run it in tumor-normal paired mode. You can find the example command here.

  2. You need both Ensemble.sSNV.tsv and Ensemble.sINDEL.tsv. Once you have them, you can run the following command to combine them:

cat <(cat Ensemble.sSNV.tsv |grep CHROM|head -1) \
    <(cat Ensemble.sSNV.tsv Ensemble.sINDEL.tsv |grep -v CHROM) | sed "s/nan/0/g" > ensemble_ann.tsv

and provide ensemble_ann.tsv as the --ensemble_tsv argument to preprocess.py.

  3. Note that if you are using the pre-trained models, you should use the same set of callers that was used for training. For instance, if you are using the SEQC-II models (which are the recommended ones), you should use MuTect2, MuSE, Strelka2, SomaticSniper, and VarDict.

  4. We also have a Dockerized solution for running all of the individual somatic callers, and a wrapper that combines their output, here. It has not been updated lately, so the tools may be old, but it is an alternative for you. Just drop VarScan if you are using the SEQC-II model.
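
For anyone who prefers Python to the shell one-liner in step 2, the merge can be sketched roughly as below, together with a quick check that the columns named in the error message are present before calling preprocess.py. This is only an illustration and not part of NeuSomatic or SomaticSeq: the function names `combine_ensemble` and `missing_columns` are made up here, and the sketch assumes the header line is the one whose first tab-separated field contains CHROM. It also differs from the `sed` command in one respect: only fields that are exactly `nan` are set to `0`, whereas `sed "s/nan/0/g"` replaces the substring anywhere on the line.

```python
def combine_ensemble(snv_path, indel_path, out_path):
    """Merge two ensemble TSVs: one header line, then the data rows of both files.

    Hypothetical helper for illustration; not a NeuSomatic or SomaticSeq function.
    Returns the header (as a list of column names) and the number of data rows written.
    """
    header = None
    n_rows = 0
    with open(out_path, "w") as out:
        for path in (snv_path, indel_path):
            with open(path) as handle:
                for line in handle:
                    line = line.rstrip("\r\n")
                    if not line:
                        continue
                    fields = line.split("\t")
                    # Assumption: the header is the line whose first field contains CHROM.
                    if "CHROM" in fields[0]:
                        if header is None:
                            header = fields
                            out.write("\t".join(fields) + "\n")
                        continue
                    # Only fields that are exactly "nan" become "0".
                    cleaned = ["0" if f == "nan" else f for f in fields]
                    out.write("\t".join(cleaned) + "\n")
                    n_rows += 1
    return header, n_rows


def missing_columns(header, required):
    """Return the names in `required` that do not appear in `header`."""
    present = set(header or [])
    return [name for name in required if name not in present]
```

Calling `combine_ensemble("Ensemble.sSNV.tsv", "Ensemble.sINDEL.tsv", "ensemble_ann.tsv")` and then passing the returned header to `missing_columns` with the six names from the exception (for example `nBAM_Z_Ranksums_MQ` and `tBAM_Z_Ranksums_BQ`) should return an empty list if the paired-mode run produced those features. The full set of features NeuSomatic expects is not reproduced here, so an empty result covers only the names you pass in.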
