Discussion: possible improvements for copy number processing / analyses #486

jaclyn-taroni · 2020-01-29T13:45:15Z

We at the CCDL have been thinking quite a bit about the copy number data this sprint (#479, #485, #480, #482, #463, #467, #476). Here I want to document some potential improvements that have come up during various discussions. I don't know if anything here will rise to the level of "must-have", and it may be more appropriate to split individual points into their own, more fleshed-out tickets.

From a discussion with @jashapiro (please tell me if I've got this wrong 🙂): anything that goes through copy_number_consensus_call with a copy number of 2 in CNVkit is essentially treated as a neutral change, because CNVkit doesn't take ploidy into account (see "A Note on Ploidy"). So in some cases we may be underreporting losses (or gains, if there are haploid samples, but we don't think that is the case). Here's the distribution of tumor_ploidy for WGS tumor samples in v13 of pbta-histologies.tsv for reference:

> histologies_df %>% 
+   filter(experimental_strategy == "WGS", 
+          sample_type == "Tumor") %>% 
+   group_by(tumor_ploidy) %>% 
+   tally()
# A tibble: 3 x 2
  tumor_ploidy     n
         <dbl> <int>
1            2   690
2            3   134
3            4   116
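
To make the ploidy point concrete, here is a minimal Python sketch of what a ploidy-aware status call could look like. It is not the code in copy_number_consensus_call or focal-cn-file-preparation; the function name, the tuple layout, and the example samples are all hypothetical, and it assumes an integer copy number and an integer tumor_ploidy are available for each segment.

```python
def status_relative_to_ploidy(copy_number: int, tumor_ploidy: int) -> str:
    """Label a segment by comparing its copy number to the sample's ploidy.

    Hypothetical helper: a copy number of 2 is only "neutral" when the
    sample's ploidy is also 2.
    """
    if copy_number > tumor_ploidy:
        return "gain"
    if copy_number < tumor_ploidy:
        return "loss"
    return "neutral"


# Made-up segments for illustration: (sample, tumor_ploidy, copy_number)
segments = [
    ("sample_A", 2, 2),
    ("sample_B", 3, 2),
    ("sample_C", 4, 2),
    ("sample_C", 4, 5),
]

calls = [
    (sample, ploidy, cn, status_relative_to_ploidy(cn, ploidy))
    for sample, ploidy, cn in segments
]

for row in calls:
    print(row)
```

Under this rule, a copy number of 2 in a sample with tumor_ploidy of 3 or 4 would be labeled a loss rather than neutral, which is the case that may currently be underreported.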

In the focal-cn-file-preparation steps for SEG files (CNVkit, consensus SEG), we call copy number changes broadly: we only label an event as a loss, without considering whether the loss is homozygous or hemizygous or how many copies are lost (see "A Note on Ploidy" again). In the consensus case at least, we could use the seg.mean values to guide any kind of thresholding. See this section of the notebook for processing the consensus SEG file: https://alexslemonade.github.io/OpenPBTA-analysis/analyses/focal-cn-file-preparation/02-add-ploidy-consensus.nb.html#does_segmean_agree_with_status

Related to the two points above: the neutral segments (e.g., segments that are not in cnv_consensus.tsv) don't have seg.mean values (Include neutral changes in cnv consensus .seg file #476). It "shouldn't be terrible" ™️ to 'backfill' these regions with the seg.mean values from pbta-cnv-cnvkit.seg.gz.

When we look at genes affected by copy number alterations (this is where we do that work), we only look at exons. Based on what was written up on Update focal CN file prep to use exons again and cover the consensus SEG case #479, it seems reasonable to attempt to include the promoter regions as well.
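
For the backfilling idea above (neutral segments that lack seg.mean), one possible approach is a length-weighted average of the CNVkit seg.mean values that overlap each neutral region. The Python sketch below is only an illustration under assumptions: the function name and tuple layout are hypothetical, coordinates are treated as half-open intervals for simplicity (real SEG files may use a different convention), and the example segments are made up rather than taken from pbta-cnv-cnvkit.seg.gz.

```python
from typing import List, Optional, Tuple

# Hypothetical segment layout: (chrom, start, end, seg_mean)
Segment = Tuple[str, int, int, float]


def backfill_seg_mean(
    chrom: str, start: int, end: int, cnvkit_segments: List[Segment]
) -> Optional[float]:
    """Length-weighted mean of seg_mean over CNVkit segments that overlap
    the region [start, end) on chrom. Returns None if nothing overlaps."""
    total_bp = 0
    weighted_sum = 0.0
    for seg_chrom, seg_start, seg_end, seg_mean in cnvkit_segments:
        if seg_chrom != chrom:
            continue
        # Number of bases shared by the region and this CNVkit segment
        overlap = min(end, seg_end) - max(start, seg_start)
        if overlap > 0:
            total_bp += overlap
            weighted_sum += overlap * seg_mean
    if total_bp == 0:
        return None
    return weighted_sum / total_bp


# Made-up CNVkit segments for a single sample
cnvkit_segments = [
    ("chr1", 0, 100, 0.1),
    ("chr1", 100, 300, -0.2),
    ("chr2", 0, 500, 0.05),
]

# A neutral region that spans parts of both chr1 segments
print(backfill_seg_mean("chr1", 50, 200, cnvkit_segments))
```

Weighting by overlap length keeps a short overlapping segment from dominating the backfilled value. Whether a weighted mean is the right summary here (as opposed to, say, splitting the neutral region at the CNVkit breakpoints) is an open design question.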

I'll note that #387 is also related to all of this!

jaclyn-taroni added updated analysis improvement discussion labels Jan 29, 2020

This was referenced Jan 30, 2020

Subtype chordoma #475

Merged

Updated analysis: CNVkit consensus calls when tumor_ploidy > 2 #501

Open

Updated analysis: "more nuanced" copy number calls #502

Open

jashapiro mentioned this issue Feb 3, 2020

Updated analysis: Backfill seg.mean into neutral segments #504

Closed

cbethell mentioned this issue Mar 23, 2020

Define most focal units of recurrent CNVs #644

Merged

5 tasks

cbethell mentioned this issue May 12, 2020

Find recurrent focal CN calls #686

Merged

5 tasks

jaclyn-taroni mentioned this issue Apr 19, 2021

Updated analysis: Neutral region called as losses when compared to ploidy #1010

Closed
