eLRR: Clarification on LRR adjustments #46

Open
ryan-ed-bailey opened this issue Feb 26, 2024 · 6 comments

Comments

@ryan-ed-bailey

Hi there,

Could you provide details on the type of LRR adjustments made to produce the eLRR plots?

Cheers,

Ryan

@freeseek
Owner

If you look at the code in mocha_plot.R, you will find this:

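```r
for (gt in c('AA', 'AB', 'BB')) {
  idx <- df$gts == gt
  # BAF is corrected first, using the still-unadjusted LRR
  df$BAF[idx] <- df$BAF[idx] - df[idx, paste0(gt, '_BAF1')] * df$LRR[idx] - df[idx, paste0(gt, '_BAF0')]
  # then LRR is shifted by the genotype-specific offset
  df$LRR[idx] <- df$LRR[idx] - df[idx, paste0(gt, '_LRR0')]
}
```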

The MoChA output VCF will include the following nine variables:

AA_LRR0
AA_BAF0
AA_BAF1
AB_LRR0
AB_BAF0
AB_BAF1
BB_LRR0
BB_BAF0
BB_BAF1

These explain how to adjust LRR and BAF for each genotype using the following formulas:

BAF = BAF - BAF1 * LRR - BAF0
LRR = LRR - LRR0
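As an illustration of the per-genotype formulas above, here is a minimal Python sketch of the same adjustment for a single marker. It assumes the nine coefficients have already been read from the VCF into a plain dictionary keyed by the names listed above; the function name `adjust_marker` and the coefficient values in the example are hypothetical. As in the R loop, BAF is corrected with the unadjusted LRR before LRR itself is shifted.

```python
def adjust_marker(gt, baf, lrr, coefs):
    """Apply the genotype-specific BAF/LRR adjustment to one marker.

    gt:    genotype string, one of 'AA', 'AB', 'BB'
    baf:   raw B allele frequency
    lrr:   raw log R ratio
    coefs: dict with keys such as 'AB_BAF0', 'AB_BAF1', 'AB_LRR0'
           (hypothetical container for the nine VCF values)
    """
    if gt not in ('AA', 'AB', 'BB'):
        raise ValueError(f"unexpected genotype: {gt!r}")
    # BAF is corrected using the *unadjusted* LRR, matching the R loop order
    adj_baf = baf - coefs[f'{gt}_BAF1'] * lrr - coefs[f'{gt}_BAF0']
    # LRR is then shifted by the genotype-specific offset
    adj_lrr = lrr - coefs[f'{gt}_LRR0']
    return adj_baf, adj_lrr


# Hypothetical coefficients, for illustration only
coefs = {
    'AA_BAF0': 0.0, 'AA_BAF1': 0.0, 'AA_LRR0': 0.0,
    'AB_BAF0': 0.01, 'AB_BAF1': 0.05, 'AB_LRR0': -0.02,
    'BB_BAF0': 0.0, 'BB_BAF1': 0.0, 'BB_LRR0': 0.0,
}
print(adjust_marker('AB', 0.52, 0.10, coefs))
```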

After that, LRR is further adjusted for GC content, using coefficients extracted from the .stats.tsv file with the following code:

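```r
df_stats <- read.table(args$stats, sep = '\t', header = TRUE)
# polynomial order = number of lrr_gc_<i> columns minus one
lrr_gc_order <- sum(grepl('^lrr_gc_[0-9]', names(df_stats))) - 1
# attach each sample's GC coefficients to its rows (joined on sample_id)
df <- merge(df, df_stats[, c('sample_id', paste0('lrr_gc_', 0:lrr_gc_order))])
for (i in 0:lrr_gc_order) {
  # subtract the i-th polynomial term: lrr_gc_i * GC^i
  df$LRR <- df$LRR - as.numeric(df$gc)^i * df[, paste0('lrr_gc_', i)]
}
```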

This means that LRR will be further adjusted as follows:

LRR = LRR - LRR_GC_0 - GC * LRR_GC_1 - GC*GC * LRR_GC_2 - ...

where the number of terms is set by the degree of the polynomial used for the GC correction (a degree-d polynomial has d + 1 terms, from LRR_GC_0 to LRR_GC_d).
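The GC step can be sketched the same way. This minimal Python sketch assumes one sample's row of the .stats.tsv file is available as a dictionary from column name to value. Like the R code, it infers the polynomial order by counting the `lrr_gc_<i>` columns (assuming they are numbered contiguously from 0) and then subtracts each `lrr_gc_i * GC^i` term. The helper names and example coefficients are hypothetical.

```python
import re


def gc_order(stats_row):
    """Polynomial order, inferred as in the R code: count the columns whose
    names start with lrr_gc_<digit> and subtract one."""
    n = sum(1 for name in stats_row if re.match(r'^lrr_gc_[0-9]', name))
    return n - 1


def gc_adjust(lrr, gc, stats_row):
    """Subtract the fitted GC polynomial from a single LRR value."""
    for i in range(gc_order(stats_row) + 1):
        lrr -= stats_row[f'lrr_gc_{i}'] * gc ** i
    return lrr


# Hypothetical per-sample coefficients (degree-2 polynomial, so 3 terms)
row = {'sample_id': 'S1', 'lrr_gc_0': 0.1, 'lrr_gc_1': -0.2, 'lrr_gc_2': 0.05}
print(gc_order(row), gc_adjust(0.3, 0.4, row))
```

If a row has no `lrr_gc_` columns, this sketch leaves LRR unchanged.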

@ryan-ed-bailey
Author

Thanks for the clarifications! Would there be a case in which the eLRR picks up a signal yet the unadjusted LRR looks fine?

@freeseek
Owner

Yes, in theory that is possible, and it could be desirable if it reflects a real signal.

@ryan-ed-bailey
Author

I am trying to troubleshoot this call (14q11):

[Image: plot of the 14q11 call]

This exact call is present in many unrelated samples, so we suspect a false positive, especially since it is not apparent in the unadjusted LRR (or BAF). It is isolated to a single batch of samples. Dropping the missingness threshold does not affect the call.

We are hoping to understand these types of calls more deeply. Any guidance on how to troubleshoot further would be appreciated. Thanks.
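One way to quantify the pattern described above (the same call recurring across unrelated samples within one batch) is to compute, for each batch, the fraction of samples carrying a call that overlaps the region. This minimal Python sketch assumes calls have been exported as `(sample_id, batch, chrom, start, end)` tuples and that batch sizes are known. The tuple layout, the function name, and the coordinates are hypothetical placeholders, not MoChA's output format.

```python
def recurrence_by_batch(calls, batch_sizes, chrom, start, end):
    """Fraction of samples in each batch with a call overlapping [start, end].

    calls:       iterable of (sample_id, batch, chrom, call_start, call_end)
    batch_sizes: dict mapping batch -> number of samples in that batch
    """
    carriers = {}  # batch -> set of sample IDs with an overlapping call
    for sample_id, batch, c_chrom, c_start, c_end in calls:
        # Two intervals overlap if each one starts before the other ends
        if c_chrom == chrom and c_start <= end and start <= c_end:
            carriers.setdefault(batch, set()).add(sample_id)
    # A sample with several overlapping calls is counted once (set semantics)
    return {b: len(carriers.get(b, ())) / n for b, n in batch_sizes.items()}


# Hypothetical calls and arbitrary placeholder coordinates
calls = [
    ('S1', 'batch1', 'chr14', 20_000_000, 21_000_000),
    ('S2', 'batch1', 'chr14', 20_100_000, 20_900_000),
    ('S3', 'batch2', 'chr1', 1_000_000, 2_000_000),
]
sizes = {'batch1': 2, 'batch2': 4}
print(recurrence_by_batch(calls, sizes, 'chr14', 20_000_000, 21_000_000))
```

A region carried by a large fraction of one batch and almost no samples elsewhere is consistent with a batch-specific artifact, though it does not prove one.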

@ryan-ed-bailey
Author

Just following up on this. Any guidance on how to troubleshoot further?

Cheers,

Ryan

@freeseek
Owner

freeseek commented Apr 3, 2024

Visually, something does seem to be going on across multiple consecutive markers, so it is hard to argue that MoChA is doing anything wrong. Whatever explains it might not be a CNV, but I am not an expert on LRR. There does not seem to be any BAF signal in this call. You could try to filter out germline duplications based solely on LRR, but without further testing and investigation I have no further advice; you would have to try to understand what exactly is going on in these regions. Do you think the GC correction is at fault here? Do the affected markers have outlier GC values? Are you using the newest version of MoChA?
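On the question of outlier GC values: a polynomial correction can be poorly constrained at GC values far from where most markers lie, so one quick check is whether the markers in the call sit outside the bulk of the GC distribution. This minimal Python sketch uses Tukey's fences (1.5 × IQR) as the outlier rule, an illustrative choice rather than anything MoChA itself does. The function name and the example values are hypothetical.

```python
import statistics


def gc_outliers(all_gc, region_gc, k=1.5):
    """Return the values in region_gc lying outside Tukey fences
    (Q1 - k*IQR, Q3 + k*IQR) computed from the genome-wide GC values."""
    # quantiles(n=4) returns the three quartile cut points Q1, Q2, Q3
    q1, _, q3 = statistics.quantiles(all_gc, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [g for g in region_gc if g < lower or g > upper]


# Hypothetical GC fractions: a genome-wide sample vs. markers inside the call
genome_gc = [0.38, 0.40, 0.41, 0.42, 0.43, 0.45, 0.46, 0.48]
region_gc = [0.41, 0.70, 0.10]
print(gc_outliers(genome_gc, region_gc))
```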
