This repository has been archived by the owner on Jun 21, 2023. It is now read-only.
We at the CCDL have been thinking quite a bit about the copy number data this sprint (#479, #485, #480, #482, #463, #467, #476). Here I want to document some potential improvements that have come up during various discussions. I don't know if anything here will rise to the level of "must-have", and it may also be more appropriate to split individual notes/points into their own, more fleshed out tickets.
From a discussion with @jashapiro (please tell me if I've got this wrong 🙂): anything that goes through `copy_number_consensus_call` with a copy number of 2 in CNVkit is essentially considered a neutral change, because CNVkit doesn't take ploidy into account (see A Note on Ploidy). So in some cases we may be underreporting losses (or gains, if there are haploid samples, but we don't think this is the case). Here's the distribution of `tumor_ploidy` among WGS tumor samples from v13 of `pbta-histologies.tsv` for reference:
```r
> histologies_df %>%
+   filter(experimental_strategy == "WGS",
+          sample_type == "Tumor") %>%
+   group_by(tumor_ploidy) %>%
+   tally()
# A tibble: 3 x 2
  tumor_ploidy     n
         <dbl> <int>
1            2   690
2            3   134
3            4   116
```
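To make the ploidy point concrete, here is a minimal base R sketch that labels a copy number relative to the sample's `tumor_ploidy` instead of assuming a diploid baseline. The helper `call_relative_to_ploidy()` is hypothetical and is not part of `copy_number_consensus_call`; it assumes the copy number is already an integer absolute estimate and ignores tumor purity.

```r
# Hypothetical helper (not part of copy_number_consensus_call): label an
# absolute copy number relative to the sample's tumor ploidy instead of
# assuming a diploid baseline. Inputs are equal-length numeric vectors.
call_relative_to_ploidy <- function(copy_number, ploidy) {
  stopifnot(length(copy_number) == length(ploidy))
  status <- rep("neutral", length(copy_number))
  # which() drops NA comparisons; missing values are handled below
  status[which(copy_number < ploidy)] <- "loss"
  status[which(copy_number > ploidy)] <- "gain"
  # Without a copy number or a ploidy estimate we cannot make a call
  status[is.na(copy_number) | is.na(ploidy)] <- NA_character_
  status
}

# A copy number of 2 is neutral in a diploid sample but a loss in a
# triploid or tetraploid one
call_relative_to_ploidy(copy_number = c(2, 2, 2, 5),
                        ploidy      = c(2, 3, 4, 4))
# returns c("neutral", "loss", "loss", "gain")
```

Comparing against the per-sample ploidy (rather than a fixed 2) is exactly what would surface the losses that a CNVkit copy number of 2 currently hides in triploid and tetraploid samples.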
In the `focal-cn-file-preparation` steps for SEG files (CNVkit, consensus SEG), we only call copy number changes broadly: an event is labeled a loss without considering whether the loss is homozygous or hemizygous, or how many copies are lost (see A Note on Ploidy again). In the consensus case at least, we can potentially use the `seg.mean` values to guide any thresholding; see this section of the notebook for processing the consensus SEG file: https://alexslemonade.github.io/OpenPBTA-analysis/analyses/focal-cn-file-preparation/02-add-ploidy-consensus.nb.html#does_segmean_agree_with_status
Related to the two points above: the neutral segments (i.e., segments that are not in `cnv_consensus.tsv`) don't have `seg.mean` values (Include neutral changes in cnv consensus .seg file #476). It "shouldn't be terrible" ™️ to essentially 'backfill' these regions with the `seg.mean` values from `pbta-cnv-cnvkit.seg.gz`.
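As a rough sketch of how backfilling could feed into `seg.mean`-based thresholding, the base R below (1) fills in `seg.mean` for neutral segments with the length-weighted mean of overlapping CNVkit segments from the same sample, and (2) turns `seg.mean` into an estimated copy number that separates homozygous from hemizygous losses. Everything here is hypothetical: the function names, the SEG-style column names (`ID`, `chrom`, `loc.start`, `loc.end`, `seg.mean`), and especially the assumption that `seg.mean` is a log2 ratio against a baseline of `ploidy` copies in a pure tumor sample, which the notebook linked above would need to confirm.

```r
# Hypothetical sketch, not project code. ASSUMES both tables use SEG-style
# columns (ID, chrom, loc.start, loc.end, seg.mean) with the same coordinate
# convention, and that seg.mean = log2(copies / ploidy) in a pure sample.

# Fill in seg.mean for neutral segments with the length-weighted mean of the
# overlapping CNVkit segments from the same sample and chromosome
backfill_seg_mean <- function(neutral, cnvkit) {
  neutral$seg.mean <- vapply(seq_len(nrow(neutral)), function(i) {
    hits <- cnvkit[cnvkit$ID == neutral$ID[i] &
                     cnvkit$chrom == neutral$chrom[i] &
                     cnvkit$loc.start < neutral$loc.end[i] &
                     cnvkit$loc.end > neutral$loc.start[i], , drop = FALSE]
    if (nrow(hits) == 0) return(NA_real_)
    # Length of each CNVkit segment that falls inside the neutral region
    overlap <- pmin(hits$loc.end, neutral$loc.end[i]) -
      pmax(hits$loc.start, neutral$loc.start[i])
    weighted.mean(hits$seg.mean, w = overlap)
  }, numeric(1))
  neutral
}

# Convert seg.mean into an estimated integer copy number and a finer status
classify_seg_mean <- function(seg_mean, ploidy = 2) {
  est_cn <- round(ploidy * 2^seg_mean)
  status <- ifelse(est_cn == ploidy, "neutral",
            ifelse(est_cn > ploidy, "gain",
            ifelse(est_cn == 0, "homozygous_loss",
            ifelse(est_cn == 1, "hemizygous_loss", "loss"))))
  data.frame(seg_mean = seg_mean, est_cn = est_cn, status = status,
             stringsAsFactors = FALSE)
}

# Toy data (hypothetical samples S1/S2)
neutral <- data.frame(ID = "S1", chrom = "chr1", loc.start = 100,
                      loc.end = 300, stringsAsFactors = FALSE)
cnvkit <- data.frame(ID = c("S1", "S1", "S2"), chrom = "chr1",
                     loc.start = c(0, 200, 0), loc.end = c(200, 400, 1000),
                     seg.mean = c(0.2, -0.4, 2), stringsAsFactors = FALSE)

filled <- backfill_seg_mean(neutral, cnvkit)   # seg.mean is -0.1
classify_seg_mean(c(-3, -1, 0, 0.6), ploidy = 2)
```

Rounding to the nearest integer copy number is the crudest possible threshold; purity and subclonality would blur these boundaries, so any real cutoffs should come from the distributions examined in the consensus SEG notebook.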
As for finding genes affected by copy number alterations (this is where we do that work), we only look at exons. Based on what was written up in Update focal CN file prep to use exons again and cover the consensus SEG case #479, it seems reasonable to try including the promoter regions as well.
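One way to include promoters would be to build strand-aware windows around each transcription start site (TSS) and add them to the exon coordinates before overlapping genes with segments. The base R sketch below assumes 1-based inclusive coordinates and a hypothetical `genes` table with `gene`, `chrom`, `start`, `end`, and `strand` columns; the 2000 bp upstream / 200 bp downstream window mirrors the defaults of Bioconductor's `GenomicRanges::promoters()` and is only illustrative, not a value chosen for this project.

```r
# Hypothetical sketch: strand-aware promoter windows around each gene's TSS.
# ASSUMES 1-based inclusive coordinates; the window sizes are illustrative.
promoter_windows <- function(genes, upstream = 2000, downstream = 200) {
  plus <- genes$strand != "-"
  # The TSS is the gene start on the + strand and the gene end on the - strand
  tss <- ifelse(plus, genes$start, genes$end)
  start <- ifelse(plus, tss - upstream, tss - downstream + 1)
  end   <- ifelse(plus, tss + downstream - 1, tss + upstream)
  data.frame(gene = genes$gene, chrom = genes$chrom,
             start = pmax(1, start),  # do not run off the chromosome start
             end = end, strand = genes$strand,
             stringsAsFactors = FALSE)
}

# Hypothetical genes on opposite strands
genes <- data.frame(gene = c("GENE_A", "GENE_B"), chrom = "chr1",
                    start = c(10000, 50000), end = c(20000, 60000),
                    strand = c("+", "-"), stringsAsFactors = FALSE)
promoter_windows(genes)
# GENE_A: 8000-10199 (upstream of start); GENE_B: 59801-62000 (upstream of end)
```

In a Bioconductor-based pipeline, calling `GenomicRanges::promoters()` on a `GRanges` of genes or transcripts would produce the same windows without hand-rolling the strand logic.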
I'll note that #387 is also related to all of this!