output columns explain #10

Open
junyanzho opened this issue Jun 4, 2020 · 9 comments

@junyanzho

Dear VIcaller developer,

I would like to know the meaning of some output columns.
In the output of the detect function, No._chimeric_reads and No._split_reads are both zero, but No._reads_supporting_VI has a value greater than 0. Is it right that supporting reads = chimeric reads + split reads? And how should these columns be interpreted?

Best,
JY

@xunchen85
Owner

Yes, there is a bug in the script, but the values will be corrected after you run the validation step.

Xun

@junyanzho
Author

Hi, xunchen,

Maybe I didn’t quite understand what you said. Which columns are correct, and which are wrong and get corrected by validation?
I tried running the validate function on one integration using the command below:
VIcaller.pl validate \
  -c VIcaller.config -i Sample \
  -S Sample_18_72481819_72483840_hepatitis_b_virus_21326584 \
  -G 21326584 -V hepatitis_b_virus -t 10

It ran without error, but No._chimeric_reads and No._split_reads are still 0, while No._reads_supporting_VI is 61. The values still seem inconsistent.

For virus integration detection, are validation and allele fraction calculation required? Can I run only the detection analysis?

Thanks, and I hope for a reply.
JY

@xunchen85
Owner

xunchen85 commented Jun 6, 2020

I see. After you run the validation step, it adds a few columns, including the validation_chimeric and Validation_split columns. The sum of these columns should be "61", so it should give you the same information. My original idea was that, because some reads cannot be successfully validated, the detected chimeric and split read counts may not be correct. The sum of the validated chimeric and split reads should therefore be the main figure to consider, unless you want to keep as many candidates as possible.

Regarding your other question: yes, you can run the "detect" function to detect viral integrations and the "calculate" function to obtain the allele fraction.

Best,
Xun

@xunchen85
Owner

I will soon correct the two columns in the main VIcaller script so that they use the original numbers.

Xun

@junyanzho
Author

Dear Xun,
I didn't find columns named "validation_chimeric" and "Validation_split". For the other, similarly named columns, the sum is not equal to 61.

| No._reads_supporting_VI | Average_alignment_score | Is_cell_line_contamination | Is_vector | Validation_chimeric_confident | Validation_chimeric_weak | Validation_chimeric_false | Validation_split_confident | Validation_split_weak | Validation_split_false |
|---|---|---|---|---|---|---|---|---|---|
| 61 | 89.18033 | - | - | 100 | 0 | 0 | 32 | 0 | 0 |
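
For readers who want to reproduce the comparison, here is a minimal Python sketch that pairs the column names with the values quoted above and adds up the validation counts. The helper `validation_total` is hypothetical and not part of VIcaller; whether the `_false` counts belong in the sum is also an assumption (they are all zero in this row, so it does not change the result here).

```python
# Header and value row copied from the output quoted above.
header = (
    "No._reads_supporting_VI Average_alignment_score "
    "Is_cell_line_contamination Is_vector "
    "Validation_chimeric_confident Validation_chimeric_weak "
    "Validation_chimeric_false Validation_split_confident "
    "Validation_split_weak Validation_split_false"
).split()
values = "61 89.18033 - - 100 0 0 32 0 0".split()

# Pair each column name with its value.
record = dict(zip(header, values))


def validation_total(record, kind):
    """Sum the confident, weak and false counts for 'chimeric' or 'split'.

    Hypothetical helper: including the 'false' level is an assumption.
    """
    return sum(
        int(record[f"Validation_{kind}_{level}"])
        for level in ("confident", "weak", "false")
    )


chimeric = validation_total(record, "chimeric")
split = validation_total(record, "split")
supporting = int(record["No._reads_supporting_VI"])

print(chimeric, split, chimeric + split, supporting)  # 100 32 132 61
```

The validated chimeric and split counts add up to 132, not to the 61 reported in No._reads_supporting_VI.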

Best!

JY

@xunchen85
Owner

I'm wondering if you can share more information. I only see the inconsistency in the supporting reads, and it is hard for me to follow and address the potential issues without additional information. I am also not sure whether you ran the validation step successfully.

Validated reads are extracted from the visualization figure, so you can first check the corresponding visualization record. You can count how many unique reads there are, and how many chimeric and split reads: 61 or 132? If you are not sure, you can share the screenshot, visualization file, fuq file, and output file with me so that I can help debug it.

Running the script step by step may also introduce issues, especially if you have modified the script yourself.

Best,
Xun

@junyanzho
Author

Hi xunchen,

  1. I checked the sample.visualization file: the number of lines starting with 'O2' is 132, and the sequences run from 'seq0' to 'seq132'.
  2. After revising the script <Result_visual3-3.pl>, I ran the detect function as your manual recommends, not step by step, and the run log shows no errors.
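
The line count in point 1 can be reproduced with a few lines of Python. This is only a sketch: `count_prefixed_lines` is a hypothetical helper, and the example lines are made up, since the thread states only that the relevant lines start with 'O2' and says nothing else about the layout of the visualization file.

```python
def count_prefixed_lines(lines, prefix="O2"):
    """Count the lines whose first characters equal `prefix`."""
    return sum(1 for line in lines if line.startswith(prefix))


# Made-up stand-in for a few lines of a visualization file;
# the real file layout may differ.
example = [
    "O2 seq0 placeholder",
    "O1 reference placeholder",
    "O2 seq1 placeholder",
    "",
    "O2 seq2 placeholder",
]

print(count_prefixed_lines(example))  # 3
```

On a real file, passing an open file handle (for example `open("sample.visualization")`) in place of `example` would count its lines in the same way.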

Regards,
JY

@junyanzho
Author

Hi xunchen,
I found some information:

  1. No._reads_supporting_VI in the output is field 39 of the file sample.virus_f2.
  2. Validation reads are counted from the file sample.visualization, which is derived from the file sample_f2.
  3. sample_f2 -> sample_f22 -> sample.virus_f -> sample.virus_f2, through multiple processing steps.
  4. The total number of validation reads is 132, while No._reads_supporting_VI is 61.
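
Pulling field 39 out of a line, as in point 1, could look like the sketch below. It assumes fields are numbered from 1 (as in awk) and separated by whitespace; neither is confirmed in the thread. `nth_field` is a hypothetical helper, and the sample line is synthetic, not taken from a real sample.virus_f2 file.

```python
def nth_field(line, n, sep=None):
    """Return the n-th field of `line`, counting from 1.

    sep=None splits on any run of whitespace, which is an assumption
    about the file; pass sep="\t" if it is tab-delimited with empty fields.
    """
    fields = line.rstrip("\n").split(sep)
    if not 1 <= n <= len(fields):
        raise IndexError(f"line has {len(fields)} fields, asked for field {n}")
    return fields[n - 1]


# Synthetic line with 40 tab-separated fields; field 39 is set to "61".
fields = [f"f{i}" for i in range(1, 41)]
fields[38] = "61"
line = "\t".join(fields) + "\n"

print(nth_field(line, 39))  # 61
```

The returned value is a string, so it needs converting with `int()` before it is compared with the validation counts.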

Regards
JY

@xunchen85
Owner

From what you describe, you may be checking or validating the wrong visualization record.

You can double-check the GI number, because if multiple candidate GIs are detected, we keep all of them. You can also check whether another record shows a read count similar or equal to "61".

If you use your own customized library, the viral reference name may also be the problem and cause the inconsistency.

Xun
