eLRR: Clarification on LRR adjustments #46

ryan-ed-bailey · 2024-02-26T20:48:20Z

Hi there,

I was curious whether you could provide details on the types of LRR adjustments that are made to produce the eLRR plots.

Cheers,

Ryan

freeseek · 2024-02-26T20:59:54Z

If you look in the code for mocha_plot.R you will find this:

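for (gt in c('AA', 'AB', 'BB')) {
  # select the markers called with this genotype
  idx <- df$gts == gt
  # BAF: remove the genotype-specific LRR slope (BAF1) and offset (BAF0), using LRR before it is adjusted
  df$BAF[idx] <- df$BAF[idx] - df[idx, paste0(gt, '_BAF1')] * df$LRR[idx] - df[idx, paste0(gt, '_BAF0')]
  # LRR: remove the genotype-specific offset (LRR0)
  df$LRR[idx] <- df$LRR[idx] - df[idx, paste0(gt, '_LRR0')]
}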

The MoChA output VCF will include the following nine variables:

AA_LRR0
AA_BAF0
AA_BAF1
AB_LRR0
AB_BAF0
AB_BAF1
BB_LRR0
BB_BAF0
BB_BAF1

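These describe how to adjust LRR and BAF for each genotype, using the following formulas (with BAF0, BAF1, and LRR0 taken from the fields matching the marker's genotype, and BAF adjusted before LRR):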

BAF = BAF - BAF1 * LRR - BAF0
LRR = LRR - LRR0
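As a minimal sketch of the two formulas above applied to a single marker, assuming the nine coefficients have already been pulled out of the VCF into a named numeric vector (the helper `adjust_marker`, the vector layout, and the coefficient values are all hypothetical and made up for illustration, not part of MoChA):

```r
# Hypothetical helper (not part of MoChA): apply the genotype-specific adjustments to one marker.
# coefs is assumed to be a named numeric vector with entries AA_LRR0, AA_BAF0, ..., BB_BAF1.
adjust_marker <- function(baf, lrr, gt, coefs) {
  stopifnot(gt %in% c("AA", "AB", "BB"))
  lrr0 <- coefs[[paste0(gt, "_LRR0")]]
  baf0 <- coefs[[paste0(gt, "_BAF0")]]
  baf1 <- coefs[[paste0(gt, "_BAF1")]]
  # BAF is corrected with the unadjusted LRR, matching the order in mocha_plot.R
  baf_adj <- baf - baf1 * lrr - baf0
  lrr_adj <- lrr - lrr0
  c(BAF = baf_adj, LRR = lrr_adj)
}

# made-up coefficients, for illustration only
coefs <- c(AA_LRR0 = 0.02, AA_BAF0 = 0.01, AA_BAF1 = 0.00,
           AB_LRR0 = -0.05, AB_BAF0 = 0.03, AB_BAF1 = 0.10,
           BB_LRR0 = 0.04, BB_BAF0 = -0.01, BB_BAF1 = 0.00)
res <- adjust_marker(baf = 0.55, lrr = 0.20, gt = "AB", coefs = coefs)
print(res)
```

With these values the heterozygous marker ends up with BAF = 0.55 - 0.10 * 0.20 - 0.03 = 0.50 and LRR = 0.20 + 0.05 = 0.25.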

After that, LRR is further adjusted for GC content, using per-sample coefficients read from the .stats.tsv file, through the following code:

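# read the per-sample GC-correction coefficients (lrr_gc_0, lrr_gc_1, ...)
df_stats <- read.table(args$stats, sep = '\t', header = TRUE)
# polynomial degree = number of lrr_gc_<k> columns minus one
lrr_gc_order <- sum(grepl('^lrr_gc_[0-9]', names(df_stats))) - 1
# attach each sample's coefficients to its markers (merged on sample_id)
df <- merge(df, df_stats[, c('sample_id', paste0('lrr_gc_', 0:lrr_gc_order))])
# subtract GC^i * lrr_gc_i for each term of the polynomial
for (i in 0:lrr_gc_order) {
  df$LRR <- df$LRR - as.numeric(df$gc)^i * df[, paste0('lrr_gc_', i)]
}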

This means that LRR will be further adjusted as follows:

LRR = LRR - LRR_GC_0 - GC * LRR_GC_1 - GC*GC * LRR_GC_2 - ...

where the sum continues up to the degree of the polynomial used for the GC correction (terms 0 through lrr_gc_order).
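A minimal sketch of that GC polynomial for one sample's markers, assuming the coefficients are given as a plain vector c(lrr_gc_0, lrr_gc_1, ...); the helper name `gc_correct_lrr` and the numbers are hypothetical, chosen only to illustrate the formula:

```r
# Hypothetical helper (not part of MoChA): subtract the per-sample GC polynomial from LRR,
# i.e. LRR - sum over i of GC^i * lrr_gc_i, for i = 0..degree.
gc_correct_lrr <- function(lrr, gc, lrr_gc) {
  # lrr_gc: numeric vector c(lrr_gc_0, lrr_gc_1, ..., lrr_gc_degree)
  deg <- length(lrr_gc) - 1
  for (i in 0:deg) {
    # gc^0 is 1 in R, so the i = 0 term is just the constant offset
    lrr <- lrr - gc^i * lrr_gc[i + 1]
  }
  lrr
}

# made-up coefficients for a degree-2 correction
adj <- gc_correct_lrr(lrr = c(0.10, -0.05), gc = c(0.40, 0.60), lrr_gc = c(0.01, 0.05, -0.02))
print(adj)
```

For the first marker this gives 0.10 - 0.01 - 0.40 * 0.05 - 0.16 * (-0.02) = 0.0732.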

ryan-ed-bailey · 2024-02-26T21:58:05Z

Thanks for the clarifications! Would there be a case in which the eLRR picks up a signal yet the unadjusted LRR looks fine?

freeseek · 2024-02-26T22:35:11Z

Yes, in theory that is certainly possible, and it could be desirable if it comes from a real signal.

ryan-ed-bailey · 2024-02-26T22:42:36Z

I am attempting to troubleshoot this call (14q11):

This exact call is present in many unrelated samples, so we suspect a false positive, especially given that it is not apparent in the unadjusted LRR (or BAF). It is isolated to a single batch of samples. Lowering the missingness threshold does not affect the call.

We are hoping to understand these types of calls more deeply. Any guidance on how to further troubleshoot would be appreciated. Thanks.

ryan-ed-bailey · 2024-04-02T23:21:59Z

Just following up on this. Any guidance on how to further troubleshoot?

Cheers,

Ryan

freeseek · 2024-04-03T06:57:06Z

Visually, something does seem to be going on across multiple consecutive markers, so it is hard to argue that MoChA is doing anything wrong. Whatever explains it might not be a CNV, but I am not an expert on LRR. There does not seem to be any BAF signal in this call. You could try filtering out germline duplications based solely on LRR, but without further testing and investigation I have no more advice; you would have to work out exactly what is going on in these regions. Do you think the GC correction is at fault here? Do the affected markers have outlier GC values? Are you using the newest version of MoChA?
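One quick way to look into the GC question is to check whether the markers in the suspicious region sit at the extremes of the GC distribution. A minimal sketch, assuming you have a numeric vector of per-marker GC values (such as the gc column used in mocha_plot.R); the helper `flag_gc_outliers` and the 3-SD cutoff are arbitrary illustrative choices, not MoChA settings:

```r
# Hypothetical check (not part of MoChA): flag markers whose GC content lies
# more than n_sd standard deviations from the mean across all markers.
flag_gc_outliers <- function(gc, n_sd = 3) {
  z <- (gc - mean(gc, na.rm = TRUE)) / sd(gc, na.rm = TRUE)
  # markers with missing GC are never flagged
  !is.na(z) & abs(z) > n_sd
}

# made-up GC values: twenty typical markers and one extreme one
gc <- c(rep(0.40, 20), 0.90)
flags <- flag_gc_outliers(gc)
print(which(flags))
```

If the markers under the call are flagged disproportionately often, that would point toward the GC correction rather than a true copy-number change.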
