failed at the last step #53

cbiitbian · 2019-11-24T04:43:04Z

example sample ran well, but when run with a pair of real data set, neusomatic failed at the last step with error:

NeuSomatic stand-alone FAILED: Files ../example/work_standalone/NeuSomatic_standalone.vcf and ../NeuSomatic_standalone.vcf Are Different!

NeuSomatic ensemble FAILED: Files ../example/work_ensemble/NeuSomatic_ensemble.vcf and ../NeuSomatic_ensemble.vcf Are Different!

Please advise what could be wrong

cbiitbian · 2019-11-24T05:35:47Z

I see what is wrong here. The last step of comparison is for the testing sample to make sure testing process run successfully.
Can I substitute the input files with my sample sets including normal bam, tumor bam, bed and reference genome and consider the result vcf at /example/work_standalone/NeuSomatic_standalone.vcf and /example/work_ensemble/NeuSomatic_ensemble.vcf are the variant results for my data?

msahraeian · 2019-11-24T08:17:00Z

@cbiitbian happy to see your interest in NeuSomatic. As you pointed out the test script will only succeed if it is used on test data. For your real application, you should follow the instructions in README. The inputs/models/parameters to use may need modification depending on your data.
Particularly,
1- I would recommend to use newer trained models (NeuSomatic_v0.1.4_standalone_SEQC-WGS-Spike.pth for standalone and NeuSomatic_v0.1.4_ensemble_SEQC-WGS-Spike.pth for ensemble), more information here. Depending how different your data is from our training set you may be interested to train the network too.
2- For ensemble mode, you need to have somatic calls from five other tools (MuTect2, VarDict, Strelka2, SomaticSniper, and MuSE) for your samples, you can find instruction for that here
3- The reference/regions/minimum AFs can also be adjusted as explained https://github.com/bioinform/neusomatic#example-usage. As discussed there, for inference, you need to follow three steps (preprocess/call/postprocess).

Please let me know if you need more information

cbiitbian · 2019-11-26T04:32:58Z

Hi, Msahraeian Thanks for the quick response. I followed your instruction and successfully rerun a standalone test of chr1 of my sample data, but I have a couple of questions I’d like to get your help. 1. I basically use the same code as the testing script, but substitute the ref genome, bed, tumor bam and normal bam, I use the NeuSomatic_v0.1.4_standalone_SEQC-WGS-Spike.pth as pretrained model and ran the three steps preprocess/call/postprocess, is this the right way? 2. Is “work_standalone/NeuSomatic_standalone.vcf” the final variant call result? The reason I am asking is because the number of variant (270) is almost triple that in the grand truth (95), which represent a lot of false positive even before a comparison. Maybe the parameters need to adjust or the pretrained model is not a good match? 3. I saw in the paper, you use cll sample and got very good results. I al so have the sample and grand truth, I’d like to follow your protocol to do it, I also have the DREAM sets mentioned in your paper. Would you let me know the parameters and trained model you use so that I can re-run? 4. If it is necessary, I’d like to learn bamsuegeon and produce my own trained model, so any information about how you did spike in would also be greatly appreciated. Thanks very much! Xiaopeng Bian CBIIT/NCI From: msahraeian <[email protected]> Sent: Sunday, November 24, 2019 3:17 AM To: bioinform/neusomatic <[email protected]> Cc: Bian, Xiaopeng (NIH/NCI) [E] <[email protected]>; Mention <[email protected]> Subject: Re: [bioinform/neusomatic] failed at the last step (#53) @cbiitbian<https://github.com/cbiitbian> happy to see your interest in NeuSomatic. As you pointed out the test script will only succeed if it is used on test data. For your real application, you should follow the instructions in README. The inputs/models/parameters to use may need modification depending on your data. Particularly, 1- I would recommend to use newer trained models (NeuSomatic_v0.1.4_standalone_SEQC-WGS-Spike.pth for standalone and NeuSomatic_v0.1.4_ensemble_SEQC-WGS-Spike.pth for ensemble), more information here<https://github.com/bioinform/neusomatic#trained-network-models>. Depending how different your data is from our training set you may be interested to train the network too. 2- For ensemble mode, you need to have somatic calls from five other tools (MuTect2, VarDict, Strelka2, SomaticSniper, and MuSE) for your samples, you can find instruction for that here<https://github.com/bioinform/neusomatic#ensemble-mode> 3- The reference/regions/minimum AFs can also be adjusted as explained https://github.com/bioinform/neusomatic#example-usage<url>. As discussed there, for inference, you need to follow three steps (preprocess/call/postprocess). Please let me know if you need more information — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub<#53?email_source=notifications&email_token=ABRXDTN4QGZ4TQKYUSKNS6DQVIZ7ZA5CNFSM4JQ5LG4KYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEFAGCNA#issuecomment-557867316>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/ABRXDTOTSR7HFE3VTUF4NP3QVIZ7ZANCNFSM4JQ5LG4A>.

msahraeian · 2019-11-26T05:51:26Z

@cbiitbian

Yes, it should be the right steps to go. You may also set the number of threads based on your resources. Another important note is that on the test script I used min AF of 5%. If it is too high for your sample, you can modify the parameters. For our new SEQC-II robustness analysis paper, we use the following setting: --scan_maf 0.01 --min_mapq 10 --snp_min_af 0.03 --snp_min_bq 15 --snp_min_ao 3 --ins_min_af 0.02 --del_min_af 0.02.
Each variant call is assigned a quality score in the VCF. Based on this score we categorized the calls to three classes PASS (score>=0.7), LowQUAL (0.4<=score<0.7), and REJECT (score<0.4). This is the default limits we use. You can modify these limits in the postprocessing step using input parameters. You should then use the PASS calls for analysis.
For real samples in NatureComm paper, we used DREAM challenge model NeuSomatic_v0.1.3_standalone_Dream3.pth to infer. There, we used 0.97 pass threshold for CLL1 and COLO-829 samples. For the DREAM challenge stage 3 we also used DREAM challenge model to infer with 0.7 pass threshold. For the SEQC-II robustness analysis paper, we used the default 0.7 pass threshold for whole analysis. With the new SEQC trained models, 0.7 pass threshold should give you similar accuracy on CLL1 and COLO samples.
You can find the information on how to use BAMSurgeon to spike in mutations here.

Also note that the best performance is usually achieved in the ensemble mode.

cbiitbian · 2019-11-28T06:01:57Z

Hi, Another question, I am running WES (Whole Exome Sequencing) sample, do you have trained model for that or I can use WGS model? I saw you have a older model for WEX, “NeuSomatic_v0.1.0_standalone_WEX_100purity.pth”, is that a WES model I can use? Thanks. Xiaopeng From: msahraeian <[email protected]> Sent: Tuesday, November 26, 2019 12:51 AM To: bioinform/neusomatic <[email protected]> Cc: Bian, Xiaopeng (NIH/NCI) [E] <[email protected]>; Mention <[email protected]> Subject: Re: [bioinform/neusomatic] failed at the last step (#53) @cbiitbian<https://github.com/cbiitbian> 1. Yes, it should be the right steps to go. You may also set the number of threads based on your resources. Another important note is that on the test script I used min AF of 5%. If it is too high for your sample, you can modify the parameters. For our new SEQC-II robustness analysis paper<https://doi.org/10.1101/667261>, we use the following setting: --scan_maf 0.01 --min_mapq 10 --snp_min_af 0.03 --snp_min_bq 15 --snp_min_ao 3 --ins_min_af 0.02 --del_min_af 0.02. 2. Each variant call is assigned a quality score in the VCF. Based on this score we categorized the calls to three classes PASS (score>=0.7), LowQUAL (0.4<=score<0.7), and REJECT (score<0.4). This is the default limits we use. You can modify these limits in the postprocessing step using input parameters. You should then use the PASS calls for analysis. 3. For real samples in NatureComm paper<https://doi.org/10.1038/s41467-019-09027-x>, we used DREAM challenge model NeuSomatic_v0.1.3_standalone_Dream3.pth to infer. There, we used 0.97 pass threshold for CLL1 and COLO-829 samples. For the DREAM challenge stage 3 we also used DREAM challenge model to infer with 0.7 pass threshold. For the SEQC-II robustness analysis paper<https://doi.org/10.1101/667261>, we used the default 0.7 pass threshold for whole analysis. With the new SEQC trained models<https://github.com/bioinform/neusomatic#trained-network-models>, 0.7 pass threshold should give you similar accuracy on CLL1 and COLO samples. 4. You can find the information on how to use BAMSurgeon to spike in mutations here<https://github.com/bioinform/neusomatic#creating-training-data>. Also note that the best performance is usually achieved in the ensemble mode. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub<#53?email_source=notifications&email_token=ABRXDTLIZH4U3IUGHOKONE3QVS2N7A5CNFSM4JQ5LG4KYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEFEZVWI#issuecomment-558471897>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/ABRXDTKKEVTCRAFPKO45QODQVS2N7ANCNFSM4JQ5LG4A>.

msahraeian · 2019-11-28T23:52:56Z

@cbiitbian The NeuSomatic_v0.1.0_standalone_WEX_100purity.pth is the model we used in original paper for WES. We have also applied our WGS trained model NeuSomatic_v0.1.4_standalone_SEQC-WGS-Spike.pth on WES data in the SEQC paper and it seems to perform good.

cbiitbian · 2019-12-01T17:52:08Z

Thanks, I am trying both model based on your information. The process terminated due to error. I could not figure out how to fix it. I am attaching the error log for the run and a log file suggested by the process log: Please check error log at work_standalone/work_tumor/work.212/scan.err Could you please have a look and give me some advice on how to fix it? Thanks very much. Xiaopeng Bian From: msahraeian <[email protected]> Sent: Thursday, November 28, 2019 6:53 PM To: bioinform/neusomatic <[email protected]> Cc: Bian, Xiaopeng (NIH/NCI) [E] <[email protected]>; Mention <[email protected]> Subject: Re: [bioinform/neusomatic] failed at the last step (#53) @cbiitbian<https://github.com/cbiitbian> The NeuSomatic_v0.1.0_standalone_WEX_100purity.pth is the model we used in original paper for WES. We have also applied our WGS trained model NeuSomatic_v0.1.4_standalone_SEQC-WGS-Spike.pth on WES data in the SEQC paper and it seems to perform good. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub<#53?email_source=notifications&email_token=ABRXDTMRYANQS732LXTTSX3QWBKVTA5CNFSM4JQ5LG4KYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEFNSFPI#issuecomment-559620797>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/ABRXDTLQEZ6567BDYY3NX43QWBKVTANCNFSM4JQ5LG4A>.

msahraeian · 2019-12-01T21:27:53Z

@cbiitbian I cannot see the attachment. Would you please attach the scan.err file again?

cbiitbian · 2019-12-02T03:23:17Z

Here is all that in the scan.err file: terminate called after throwing an instance of 'std::ios_base::failure[abi:cxx11]' what(): basic_ios::clear: iostream error From: msahraeian <[email protected]> Sent: Sunday, December 1, 2019 4:28 PM To: bioinform/neusomatic <[email protected]> Cc: Bian, Xiaopeng (NIH/NCI) [E] <[email protected]>; Mention <[email protected]> Subject: Re: [bioinform/neusomatic] failed at the last step (#53) @cbiitbian<https://github.com/cbiitbian> I cannot see the attachment. Would you please attach the scan.err file again? — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub<#53?email_source=notifications&email_token=ABRXDTIHYO52AN3JA4WCBGDQWQT5TA5CNFSM4JQ5LG4KYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEFRWFBY#issuecomment-560161415>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/ABRXDTMA7FSSDCEB7IEMP5DQWQT5TANCNFSM4JQ5LG4A>.

msahraeian · 2019-12-02T08:01:14Z

@cbiitbian it is not clear to me. Can you make sure you alignment .bam files have index files .bam.bai with them. Also please make sure the reference file used to align reads matches the one you use here. If it didn't help, can you restrict your bam to a smaller region and send it to me (if the code fails on the restricted bam).

cbiitbian · 2019-12-03T20:39:52Z

Hi, I am testing what you suggested and will let you know. In the mean time I got another error for another sample, would you please help me out. ERROR 2019-12-03 05:39:01,890 run_scan_alignments (ForkPoolWorker-14) Please check error log at work_standalone/work_tumor/work.148/scan.err Thanks. Xiaopeng From: msahraeian <[email protected]> Sent: Monday, December 2, 2019 3:01 AM To: bioinform/neusomatic <[email protected]> Cc: Bian, Xiaopeng (NIH/NCI) [E] <[email protected]>; Mention <[email protected]> Subject: Re: [bioinform/neusomatic] failed at the last step (#53) @cbiitbian<https://github.com/cbiitbian> it is not clear to me. Can you make sure you alignment .bam files have index files .bam.bai with them. Also please make sure the reference file used to align reads matches the one you use here. If it didn't help, can you restrict your bam to a smaller region and send it to me (if the code fails on the restricted bam). — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub<#53?email_source=notifications&email_token=ABRXDTLRQHTNWMMDM2IDY7DQWS6EXA5CNFSM4JQ5LG4KYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEFSS2HQ#issuecomment-560278814>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/ABRXDTLSXIVW5YJIOGOXCJDQWS6EXANCNFSM4JQ5LG4A>.

cbiitbian · 2019-12-03T23:05:13Z

Hi, I did the test you suggested, I used the problematic region bed to restricted the bam to a small region without change anything else, it worked well without any error. I’ll use the full bed file and try again, hope it is just a random error, but I do see this error in a lot of samples, though some samples ran successfully, they are all very similar samples. Thanks. Xiaopeng Bian From: msahraeian <[email protected]> Sent: Monday, December 2, 2019 3:01 AM To: bioinform/neusomatic <[email protected]> Cc: Bian, Xiaopeng (NIH/NCI) [E] <[email protected]>; Mention <[email protected]> Subject: Re: [bioinform/neusomatic] failed at the last step (#53) @cbiitbian<https://github.com/cbiitbian> it is not clear to me. Can you make sure you alignment .bam files have index files .bam.bai with them. Also please make sure the reference file used to align reads matches the one you use here. If it didn't help, can you restrict your bam to a smaller region and send it to me (if the code fails on the restricted bam). — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub<#53?email_source=notifications&email_token=ABRXDTLRQHTNWMMDM2IDY7DQWS6EXA5CNFSM4JQ5LG4KYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEFSS2HQ#issuecomment-560278814>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/ABRXDTLSXIVW5YJIOGOXCJDQWS6EXANCNFSM4JQ5LG4A>.

msahraeian · 2019-12-03T23:26:12Z

@cbiitbian if it is a public sample, it would be good if you could share with me the whole bam file and the command line parameters you use and I will take a look.

cbiitbian · 2019-12-06T04:38:53Z

I am sorry this is not a public data set. We are comparing performance of 5 variant caller, including Neusomatic Most of the samples ran successfully, however, the results have no overlap with the results of other 4 callers which have pretty good overlap among them. I am worried that I may not have run the process correctly? I am attaching the scripts I used, which was basically modified from the example script included in the package. Would you have a look and give some suggestions? I am attaching both WGS and WEX models scripts. Thanks very much! Xiaopeng Bian From: msahraeian <[email protected]> Sent: Tuesday, December 3, 2019 6:26 PM To: bioinform/neusomatic <[email protected]> Cc: Bian, Xiaopeng (NIH/NCI) [E] <[email protected]>; Mention <[email protected]> Subject: Re: [bioinform/neusomatic] failed at the last step (#53) @cbiitbian<https://github.com/cbiitbian> if it is a public sample, it would be good if you could share with me the whole bam file and the command line parameters you use and I will take a look. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub<#53?email_source=notifications&email_token=ABRXDTPKU6XCEMWGJ3ILNMLQW3TJJA5CNFSM4JQ5LG4KYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEF3FN6I#issuecomment-561403641>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/ABRXDTJCDSSYDSHLF7CFLUTQW3TJJANCNFSM4JQ5LG4A>.

msahraeian · 2019-12-07T08:19:08Z

@cbiitbian I think there should be sth wrong with how you run NeuSomatic. I cannot see your attachments. Would you please send them again. You can attach them to the Github message.

cbiitbian · 2019-12-08T04:06:41Z

Hi, When I compared with results from two other centers, there are good overlaps, so I think there maybe something wrong with the data from the other center. The only thing is that Neusomatic called more variants than other callers. I am attaching my run scripts and a VCFcomparator somparison result for your reference. My scripts were named with .sh, which maybe blocked by your system last time, so I renamed it to .txt, hope will get through. Let me know if you still cannot receive them. Thanks. Xiaopeng From: msahraeian <[email protected]> Sent: Saturday, December 7, 2019 3:19 AM To: bioinform/neusomatic <[email protected]> Cc: Bian, Xiaopeng (NIH/NCI) [E] <[email protected]>; Mention <[email protected]> Subject: Re: [bioinform/neusomatic] failed at the last step (#53) @cbiitbian<https://github.com/cbiitbian> I think there should be sth wrong with how you run NeuSomatic. I cannot see your attachments. Would you please send them again. You can attach them to the Github message. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub<#53?email_source=notifications&email_token=ABRXDTLJTYJWJZCJ6NHBI4DQXNL7ZA5CNFSM4JQ5LG4KYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEGGBDCQ#issuecomment-562827658>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/ABRXDTONP6CHBAEM6ZLZUPTQXNL7ZANCNFSM4JQ5LG4A>. #!/bin/bash #module load neusomatic set -e tumor=$1 normal=$2 center=$3 refGenome=/data/nextgen/CIMAC/ref/hg38.fa bed=/data/nextgen/Xiaopeng/match/neusomatic/MATCH/region_hg38.bed tumor=/data/nextgen/Xiaopeng/match/neusomatic/MATCH/${center}/${tumor} normal=/data/nextgen/Xiaopeng/match/neusomatic/MATCH/${center}/${normal} # test_dir="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null && pwd )" # neusomatic_dir="$( dirname ${test_dir} )" test_dir=.. neusomatic_dir=../.. # cd ${test_dir} tum=$(echo "${tumor}" | cut -f 1 -d '.') mkdir -p ${tum} cd ${tum} rm -rf work_standalone #Stand-alone NeuSomatic test # python ${neusomatic_dir}/neusomatic/python/preprocess.py \ preprocess.py \ --mode call \ --reference ${refGenome} \ --region_bed ${bed} \ --tumor_bam ${tumor} \ --normal_bam ${normal} \ --work work_standalone \ --scan_maf 0.01 \ --min_mapq 10 \ --snp_min_af 0.03 \ --snp_min_bq 15 \ --snp_min_ao 3 \ --ins_min_af 0.02 \ --del_min_af 0.02 \ --num_threads 20 \ --scan_alignments_binary ${NEUSOMATIC_SCAN_ALIGNMENTS} # --scan_alignments_binary ${neusomatic_dir}/neusomatic/bin/scan_alignments # CUDA_VISIBLE_DEVICES= python ${neusomatic_dir}/neusomatic/python/call.py \ call.py \ --candidates_tsv work_standalone/dataset/*/candidates*.tsv \ --reference ${refGenome} \ --out work_standalone \ --checkpoint /data/nextgen/Xiaopeng/match/neusomatic/neusomatic/models/NeuSomatic_v0.1.4_standalone_SEQC-WGS-Spike.pth \ --num_threads 1 \ --batch_size 100 # python ${neusomatic_dir}/neusomatic/python/postprocess.py \ postprocess.py \ --reference ${refGenome} \ --tumor_bam ${tumor} \ --pred_vcf work_standalone/pred.vcf \ --candidates_vcf work_standalone/work_tumor/filtered_candidates.vcf \ --pass_threshold 0.7 \ --lowqual_threshold 0.4 \ --output_vcf work_standalone/NeuSomatic_standalone.vcf \ --work work_standalone :' #comment start rm -rf work_ensemble #Ensemble NeuSomatic test # python ${neusomatic_dir}/neusomatic/python/preprocess.py \ preprocess.py \ --mode call \ --reference ${refGenome} \ --region_bed ${bed} \ --tumor_bam ${tumor} \ --normal_bam ${normal} \ --work work_ensemble \ --scan_maf 0.05 \ --min_mapq 10 \ --snp_min_af 0.05 \ --snp_min_bq 20 \ --snp_min_ao 10 \ --ins_min_af 0.05 \ --del_min_af 0.05 \ --num_threads 1 \ --ensemble_tsv ${test_dir}/ensemble.tsv \ --scan_alignments_binary ${NEUSOMATIC_SCAN_ALIGNMENTS} # CUDA_VISIBLE_DEVICES= python ${neusomatic_dir}/neusomatic/python/call.py \ call.py \ --candidates_tsv work_ensemble/dataset/*/candidates*.tsv \ --reference ${refGenome} \ --out work_ensemble \ --checkpoint ${neusomatic_dir}/neusomatic/models/NeuSomatic_v0.1.0_ensemble_Dream3_70purity.pth \ --num_threads 1 \ --ensemble \ --batch_size 100 # python ${neusomatic_dir}/neusomatic/python/postprocess.py \ postprocess.py \ --reference ${refGenome} \ --tumor_bam ${tumor} \ --pred_vcf work_ensemble/pred.vcf \ --candidates_vcf work_ensemble/work_tumor/filtered_candidates.vcf \ --ensemble_tsv ${test_dir}/ensemble.tsv \ --output_vcf work_ensemble/NeuSomatic_ensemble.vcf \ --work work_ensemble # cd .. file1=${test_dir}/example/work_standalone/NeuSomatic_standalone.vcf file2=${test_dir}/NeuSomatic_standalone.vcf cmp --silent $file1 $file2 && echo "### NeuSomatic stand-alone: SUCCESS! ###" \ || echo "### NeuSomatic stand-alone FAILED: Files ${file1} and ${file2} Are Different! ###" file1=${test_dir}/example/work_ensemble/NeuSomatic_ensemble.vcf file2=${test_dir}/NeuSomatic_ensemble.vcf cmp --silent $file1 $file2 && echo "### NeuSomatic ensemble: SUCCESS! ###" \ || echo "### NeuSomatic ensemble FAILED: Files ${file1} and ${file2} Are Different! ###" #comment end ' #!/bin/bash #module load neusomatic set -e tumor=$1 normal=$2 center=$3 refGenome=/data/nextgen/CIMAC/ref/hg38.fa bed=/data/nextgen/Xiaopeng/match/neusomatic/MATCH/region_hg38.bed tumor=/data/nextgen/Xiaopeng/match/neusomatic/MATCH/${center}/${tumor} normal=/data/nextgen/Xiaopeng/match/neusomatic/MATCH/${center}/${normal} # test_dir="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null && pwd )" # neusomatic_dir="$( dirname ${test_dir} )" test_dir=.. neusomatic_dir=../.. # cd ${test_dir} tum=$(echo "${tumor}" | cut -f 1 -d '.') mkdir -p ${tum}_wex cd ${tum}_wex rm -rf work_standalone #Stand-alone NeuSomatic test # python ${neusomatic_dir}/neusomatic/python/preprocess.py \ preprocess.py \ --mode call \ --reference ${refGenome} \ --region_bed ${bed} \ --tumor_bam ${tumor} \ --normal_bam ${normal} \ --work work_standalone \ --scan_maf 0.01 \ --min_mapq 10 \ --snp_min_af 0.03 \ --snp_min_bq 15 \ --snp_min_ao 3 \ --ins_min_af 0.02 \ --del_min_af 0.02 \ --num_threads 20 \ --scan_alignments_binary ${NEUSOMATIC_SCAN_ALIGNMENTS} # --scan_alignments_binary ${neusomatic_dir}/neusomatic/bin/scan_alignments # CUDA_VISIBLE_DEVICES= python ${neusomatic_dir}/neusomatic/python/call.py \ call.py \ --candidates_tsv work_standalone/dataset/*/candidates*.tsv \ --reference ${refGenome} \ --out work_standalone \ --checkpoint /data/nextgen/Xiaopeng/match/neusomatic/neusomatic/models/NeuSomatic_v0.1.0_standalone_WEX_100purity.pth \ --num_threads 1 \ --batch_size 100 # python ${neusomatic_dir}/neusomatic/python/postprocess.py \ postprocess.py \ --reference ${refGenome} \ --tumor_bam ${tumor} \ --pred_vcf work_standalone/pred.vcf \ --candidates_vcf work_standalone/work_tumor/filtered_candidates.vcf \ --pass_threshold 0.7 \ --lowqual_threshold 0.4 \ --output_vcf work_standalone/NeuSomatic_standalone.vcf \ --work work_standalone :' #comment start rm -rf work_ensemble #Ensemble NeuSomatic test # python ${neusomatic_dir}/neusomatic/python/preprocess.py \ preprocess.py \ --mode call \ --reference ${refGenome} \ --region_bed ${bed} \ --tumor_bam ${tumor} \ --normal_bam ${normal} \ --work work_ensemble \ --scan_maf 0.05 \ --min_mapq 10 \ --snp_min_af 0.05 \ --snp_min_bq 20 \ --snp_min_ao 10 \ --ins_min_af 0.05 \ --del_min_af 0.05 \ --num_threads 1 \ --ensemble_tsv ${test_dir}/ensemble.tsv \ --scan_alignments_binary ${NEUSOMATIC_SCAN_ALIGNMENTS} # CUDA_VISIBLE_DEVICES= python ${neusomatic_dir}/neusomatic/python/call.py \ call.py \ --candidates_tsv work_ensemble/dataset/*/candidates*.tsv \ --reference ${refGenome} \ --out work_ensemble \ --checkpoint ${neusomatic_dir}/neusomatic/models/NeuSomatic_v0.1.0_ensemble_Dream3_70purity.pth \ --num_threads 1 \ --ensemble \ --batch_size 100 # python ${neusomatic_dir}/neusomatic/python/postprocess.py \ postprocess.py \ --reference ${refGenome} \ --tumor_bam ${tumor} \ --pred_vcf work_ensemble/pred.vcf \ --candidates_vcf work_ensemble/work_tumor/filtered_candidates.vcf \ --ensemble_tsv ${test_dir}/ensemble.tsv \ --output_vcf work_ensemble/NeuSomatic_ensemble.vcf \ --work work_ensemble # cd .. file1=${test_dir}/example/work_standalone/NeuSomatic_standalone.vcf file2=${test_dir}/NeuSomatic_standalone.vcf cmp --silent $file1 $file2 && echo "### NeuSomatic stand-alone: SUCCESS! ###" \ || echo "### NeuSomatic stand-alone FAILED: Files ${file1} and ${file2} Are Different! ###" file1=${test_dir}/example/work_ensemble/NeuSomatic_ensemble.vcf file2=${test_dir}/NeuSomatic_ensemble.vcf cmp --silent $file1 $file2 && echo "### NeuSomatic ensemble: SUCCESS! ###" \ || echo "### NeuSomatic ensemble FAILED: Files ${file1} and ${file2} Are Different! ###" #comment end ' VCF Comparator Settings: R1-2-F.somatic.vcf Key vcf file region_hg38.bed Key interrogated regions file mda_R1-2-F_standalone_pass.vcf Test vcf file region_hg38.bed Test interrogated regions file mda_R1-2-F Save directory for parsed datasets TRUE Require matching alternate bases FALSE Require matching genotypes FALSE Use record VQSLOD score as ranking statistic FALSE Exclude non PASS or . records TRUE Compare SNPs, not non-SNP variants 3088286401 Interrogated bps in key 3088286401 Interrogated bps in test 3088286401 Interrogated bps in common 426 Key variants 426 Key variants in shared regions 0.537906137 Shared key variants Ti/Tv 2459 Test variants 2459 Test variants in shared regions 0.721988796 Shared test variants Ti/Tv QUALThreshold NumMatchTest NumNonMatchTest FDR=nonMatchTest/(matchTest+nonMatchTest) decreasingFDR TPR=matchTest/totalKey FPR=nonMatchTest/totalKey PPV=matchTest/(matchTest+nonMatchTest) none 360 2099 0.853599 0.853599 0.8450704 4.92723 0.14640097 5.2394 360 2097 0.85347986 0.85347986 0.8450704 4.9225354 0.14652015 5.2485 360 2096 0.8534202 0.8534202 0.8450704 4.920188 0.1465798 5.2496 360 2095 0.8533605 0.8533605 0.8450704 4.9178405 0.14663951 5.2538 360 2094 0.85330075 0.85330075 0.8450704 4.915493 0.14669926 5.2569 360 2093 0.8532409 0.8532409 0.8450704 4.9131455 0.14675908 5.2675 360 2092 0.85318106 0.85318106 0.8450704 4.910798 0.14681892 5.2782 360 2091 0.85312116 0.85312116 0.8450704 4.9084506 0.14687882 5.2797 360 2090 0.8530612 0.8530612 0.8450704 4.906103 0.14693877 5.2822 360 2089 0.85300124 0.85300124 0.8450704 4.9037557 0.14699878 5.2983 360 2088 0.85294116 0.85294116 0.8450704 4.9014087 0.14705883 5.3032 360 2087 0.8528811 0.8528811 0.8450704 4.899061 0.14711893 5.3096 360 2086 0.85282093 0.85282093 0.8450704 4.8967137 0.14717907 5.3118 360 2085 0.85276073 0.85276073 0.8450704 4.8943663 0.14723927

msahraeian · 2019-12-09T08:47:18Z

@cbiitbian are you using only PASS calls during evaluation? And if your sample is WEX you should intersect the final vcf with the exons bed.

cbiitbian · 2019-12-20T19:03:41Z

Hi, The Neusomatic output default to the same folder as the bam file folder. Can I redirect to another directory? Thanks. Xiaopeng From: msahraeian <[email protected]> Sent: Monday, December 2, 2019 3:01 AM To: bioinform/neusomatic <[email protected]> Cc: Bian, Xiaopeng (NIH/NCI) [E] <[email protected]>; Mention <[email protected]> Subject: Re: [bioinform/neusomatic] failed at the last step (#53) @cbiitbian<https://github.com/cbiitbian> it is not clear to me. Can you make sure you alignment .bam files have index files .bam.bai with them. Also please make sure the reference file used to align reads matches the one you use here. If it didn't help, can you restrict your bam to a smaller region and send it to me (if the code fails on the restricted bam). — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub<#53?email_source=notifications&email_token=ABRXDTLRQHTNWMMDM2IDY7DQWS6EXA5CNFSM4JQ5LG4KYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEFSS2HQ#issuecomment-560278814>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/ABRXDTLSXIVW5YJIOGOXCJDQWS6EXANCNFSM4JQ5LG4A>.

msahraeian · 2019-12-20T20:36:39Z

@cbiitbian you don't need to output to the same directory. For each step there is an input parameter that specifies the output location. for preprocess.py you can specify the work folder with --work, for call.py you can specify the output folder that with --out, and for postprocess.py you can specify the output vcf using --output_vcf and the work folder with --work.

cbiitbian · 2019-12-24T02:04:08Z

Hi, I ran Neusomatic on some Hapmap samples but the results still showed very low overlap with the “ground truth” they provided. These samples were aligned to hg38 genome, does that have to be the same with the trained model? I am very anxious to get some idea why the overlap is so low? Thanks. Xiaopeng From: msahraeian <[email protected]> Sent: Friday, December 20, 2019 3:37 PM To: bioinform/neusomatic <[email protected]> Cc: Bian, Xiaopeng (NIH/NCI) [E] <[email protected]>; Mention <[email protected]> Subject: Re: [bioinform/neusomatic] failed at the last step (#53) @cbiitbian<https://github.com/cbiitbian> you don't need to output to the same directory. For each step there is an input parameter that specifies the output location. for preprocess.py you can specify the work folder with --work, for call.py you can specify the output folder that with --out, and for postprocess.py you can specify the output vcf using --output_vcf and the work folder with --work. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub<#53?email_source=notifications&email_token=ABRXDTLTCB55RKZ3EM6VHPLQZUUFRA5CNFSM4JQ5LG4KYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEHODGTI#issuecomment-568079181>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/ABRXDTLKOAQLOE4Q4Y4BAH3QZUUFRANCNFSM4JQ5LG4A>.

cbiitbian · 2019-12-24T02:08:08Z

For hapmap samples, which trained model do you suggest I should use, any particular attention, like the parameters, I should pay to? Thanks. Xiaopeng From: msahraeian <[email protected]> Sent: Friday, December 20, 2019 3:37 PM To: bioinform/neusomatic <[email protected]> Cc: Bian, Xiaopeng (NIH/NCI) [E] <[email protected]>; Mention <[email protected]> Subject: Re: [bioinform/neusomatic] failed at the last step (#53) @cbiitbian<https://github.com/cbiitbian> you don't need to output to the same directory. For each step there is an input parameter that specifies the output location. for preprocess.py you can specify the work folder with --work, for call.py you can specify the output folder that with --out, and for postprocess.py you can specify the output vcf using --output_vcf and the work folder with --work. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub<#53?email_source=notifications&email_token=ABRXDTLTCB55RKZ3EM6VHPLQZUUFRA5CNFSM4JQ5LG4KYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEHODGTI#issuecomment-568079181>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/ABRXDTLKOAQLOE4Q4Y4BAH3QZUUFRANCNFSM4JQ5LG4A>.

msahraeian · 2019-12-25T07:48:20Z

@cbiitbian I think there is something wrong here. Are you using HapMap samples as tumor-normal pairs? As HapMap data is public, it would be great if you could share with me the ID of samples you are using as well as your neusomatic run scripts and I can make sure you are using the right settings.

cbiitbian · 2019-12-27T06:29:08Z

Hi, Sorry, I don’t have the hapmap id of the files because the data were re-processed and renamed. But I found that the “ground truth” provided contains a lot of duplicate after annotation, so I removed the duplicate and re-compared them, the overlap got much better but still low and the false positive is quite high. I attached my run script and one of the vcfcomparator results FYI. Maybe can manipulate the parameters to get better results? Your suggestion would be highly appreciated. Thanks. Xiaopeng From: msahraeian <[email protected]> Sent: Wednesday, December 25, 2019 2:48 AM To: bioinform/neusomatic <[email protected]> Cc: Bian, Xiaopeng (NIH/NCI) [E] <[email protected]>; Mention <[email protected]> Subject: Re: [bioinform/neusomatic] failed at the last step (#53) @cbiitbian<https://github.com/cbiitbian> I think there is something wrong here. Are you using HapMap samples as tumor-normal pairs? As HapMap data is public, it would be great if you could share with me the ID of samples you are using as well as your neusomatic run scripts and I can make sure you are using the right settings. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub<#53?email_source=notifications&email_token=ABRXDTJOM5L2Z6E4FN7THHDQ2MF4JA5CNFSM4JQ5LG4KYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEHUBVKA#issuecomment-568859304>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/ABRXDTNSSSS73RL2MAQ7WJDQ2MF4JANCNFSM4JQ5LG4A>.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

failed at the last step #53

failed at the last step #53

cbiitbian commented Nov 24, 2019

cbiitbian commented Nov 24, 2019

msahraeian commented Nov 24, 2019

cbiitbian commented Nov 26, 2019 via email

msahraeian commented Nov 26, 2019

cbiitbian commented Nov 28, 2019 via email

msahraeian commented Nov 28, 2019

cbiitbian commented Dec 1, 2019 via email

msahraeian commented Dec 1, 2019

cbiitbian commented Dec 2, 2019 via email

msahraeian commented Dec 2, 2019

cbiitbian commented Dec 3, 2019 via email

cbiitbian commented Dec 3, 2019 via email

msahraeian commented Dec 3, 2019

cbiitbian commented Dec 6, 2019 via email

msahraeian commented Dec 7, 2019

cbiitbian commented Dec 8, 2019 via email

msahraeian commented Dec 9, 2019

cbiitbian commented Dec 20, 2019 via email

msahraeian commented Dec 20, 2019

cbiitbian commented Dec 24, 2019 via email

cbiitbian commented Dec 24, 2019 via email

msahraeian commented Dec 25, 2019

cbiitbian commented Dec 27, 2019 via email

failed at the last step #53

failed at the last step #53

Comments

cbiitbian commented Nov 24, 2019

NeuSomatic stand-alone FAILED: Files ../example/work_standalone/NeuSomatic_standalone.vcf and ../NeuSomatic_standalone.vcf Are Different!

NeuSomatic ensemble FAILED: Files ../example/work_ensemble/NeuSomatic_ensemble.vcf and ../NeuSomatic_ensemble.vcf Are Different!

cbiitbian commented Nov 24, 2019

msahraeian commented Nov 24, 2019

cbiitbian commented Nov 26, 2019 via email

msahraeian commented Nov 26, 2019

cbiitbian commented Nov 28, 2019 via email

msahraeian commented Nov 28, 2019

cbiitbian commented Dec 1, 2019 via email

msahraeian commented Dec 1, 2019

cbiitbian commented Dec 2, 2019 via email

msahraeian commented Dec 2, 2019

cbiitbian commented Dec 3, 2019 via email

cbiitbian commented Dec 3, 2019 via email

msahraeian commented Dec 3, 2019

cbiitbian commented Dec 6, 2019 via email

msahraeian commented Dec 7, 2019

cbiitbian commented Dec 8, 2019 via email

msahraeian commented Dec 9, 2019

cbiitbian commented Dec 20, 2019 via email

msahraeian commented Dec 20, 2019

cbiitbian commented Dec 24, 2019 via email

cbiitbian commented Dec 24, 2019 via email

msahraeian commented Dec 25, 2019

cbiitbian commented Dec 27, 2019 via email