Add exploratory analyses of mutation data #22

dhimmel · 2016-09-06T16:01:16Z

This pull request is based on a preliminary notebook we created at the 2016-08-23 Cognoma Meetup. Tagging @Mike1906, @stephenshank, @drolejoel, and @linzho, who were part of this group (we'd love your feedback).

Specifically, I'd like feedback on interesting cancer genes whose mutation status we expect to segregate with disease. For example, the present notebook shows the enrichment of VHL mutations in kidney clear cell carcinoma.
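To make "segregate with disease" concrete, here is a minimal sketch of the per-disease mutation frequency behind such a heatmap, plus an enrichment ratio relative to the pan-cancer frequency. It assumes a hypothetical in-memory layout (a list of `(disease, mutated_genes)` pairs) rather than the repository's actual files, the function names are illustrative, and the samples are made-up toy values; the notebook itself may compute this differently.

```python
from collections import defaultdict


def mutation_frequency_by_disease(samples, gene):
    """Fraction of samples in each disease that carry a mutation in `gene`.

    `samples` is a list of (disease, mutated_genes) pairs, a hypothetical
    layout chosen for illustration.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for disease, mutated_genes in samples:
        totals[disease] += 1
        if gene in mutated_genes:
            hits[disease] += 1
    return {disease: hits[disease] / totals[disease] for disease in totals}


def enrichment_by_disease(samples, gene):
    """Per-disease frequency divided by the pan-cancer frequency (1.0 = no enrichment)."""
    overall = sum(gene in mutated for _, mutated in samples) / len(samples)
    by_disease = mutation_frequency_by_disease(samples, gene)
    if overall == 0:
        # The gene is never mutated, so no enrichment can be measured.
        return {disease: 0.0 for disease in by_disease}
    return {disease: freq / overall for disease, freq in by_disease.items()}


# Made-up toy samples with TCGA-style disease codes; not real data.
toy_samples = [
    ("KIRC", {"VHL", "PBRM1"}),
    ("KIRC", {"VHL"}),
    ("KIRC", set()),
    ("LUAD", {"TP53", "KRAS"}),
    ("LUAD", {"EGFR"}),
    ("SKCM", {"BRAF"}),
]
print(mutation_frequency_by_disease(toy_samples, "VHL"))
print(enrichment_by_disease(toy_samples, "VHL"))
```

In this toy input VHL is mutated in two of three KIRC samples and nowhere else, so KIRC shows an enrichment ratio of 2.0 against the pan-cancer rate.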

Based on a preliminary notebook we created at the 2016-08-23 Cognoma Meetup (https://www.meetup.com/DataPhilly/events/233403001/).

gwaybio · 2016-09-07T13:05:15Z

BRAF should segregate with melanoma and subsets of lung cancer.

BRAF V600E should be a good test case for the machine learning group once we get the columns mentioned in #16.

We can also visualize BRCA1 and BRCA2, which will largely segregate into breast cancer and subsets of ovarian, cervical, and uterine cancers.

gwaybio · 2016-09-07T13:20:38Z

Can you also add ALK? It should segregate into subsets of lung cancer. ALK is interesting because it is usually activated by chromosomal rearrangements, and I suspect a gene expression signature for ALK activation could be interesting.

gwaybio · 2016-09-07T13:23:15Z

could possibly incorporate COSMIC here too

linzho · 2016-09-07T13:46:14Z

You can also look at MEN1 and RET, genes that are associated with many neuroendocrine tumors (pancreas, pituitary, parathyroid, medullary thyroid, pheochromocytoma).

Are you interested in genes associated with cancers in general, or genes where we might expect that the majority of cancers segregate with a single gene?

dhimmel · 2016-09-07T14:10:14Z

> Are you interested in genes associated with cancers in general, or genes where we might expect that the majority of cancers segregate with a single gene?

@linzho both. Since this is an exploratory analysis, I'm just looking to look!

dhimmel · 2016-09-07T14:15:54Z

@linzho & @gwaygenomics thanks for your suggestions. I added them to the heatmap in 29c926a, which now looks like this:

[Figure: mutation frequency by disease heatmap]

I also scaled each gene's mutation rates by that gene's maximum mutation rate across diseases. Note that one issue remains outstanding: some diseases harbor more mutations overall (see the row-wise bands above and cognoma/machine-learning#8).
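A minimal sketch of the per-gene max scaling described here, assuming a hypothetical nested-dict layout (disease mapped to a dict of gene frequencies) and made-up toy values; the notebook's actual implementation may differ, for example by dividing a pandas DataFrame by its column maxima.

```python
def scale_by_gene_max(freqs):
    """Divide each gene's per-disease frequencies by that gene's maximum.

    `freqs` maps disease -> {gene: mutation frequency} (a hypothetical layout).
    After scaling, each gene's most frequently mutated disease has value 1.0.
    Genes never mutated anywhere are left at 0.0 to avoid dividing by zero.
    """
    genes = {gene for row in freqs.values() for gene in row}
    gene_max = {
        gene: max(row.get(gene, 0.0) for row in freqs.values()) for gene in genes
    }
    return {
        disease: {
            gene: value / gene_max[gene] if gene_max[gene] > 0 else 0.0
            for gene, value in row.items()
        }
        for disease, row in freqs.items()
    }


# Made-up toy frequencies; not real data.
toy_freqs = {
    "KIRC": {"VHL": 0.5, "TP53": 0.02},
    "LUAD": {"VHL": 0.01, "TP53": 0.5},
}
print(scale_by_gene_max(toy_freqs))
```

Because the scaling is per gene (column-wise), it puts rarely and frequently mutated genes on one color scale but does not remove the row-wise bands caused by diseases with a higher overall mutation load; that would need a separate per-disease adjustment.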

gwaybio · 2016-09-07T15:04:35Z

Would it be useful to add functionality to the script? If the final output is the mutation-by-tissue heatmap, could you add an argparse argument? Then the above graph would be generated like this:

```shell
python scripts/3.explore-mutations.py --gene-list "BRCA2,ALK,CD274,MEN1,VHL,RET,TP53,BRCA1"
```

just a thought
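A minimal sketch of how the proposed `--gene-list` flag could be parsed with the standard library's `argparse`. The helper names `parse_gene_list` and `build_parser` are hypothetical, and the heatmap plotting itself is omitted; the example parses an explicit argument list instead of `sys.argv` so the result is visible.

```python
import argparse


def parse_gene_list(text):
    """Split a comma-separated string into gene symbols, ignoring blanks and spaces."""
    return [gene.strip() for gene in text.split(",") if gene.strip()]


def build_parser():
    parser = argparse.ArgumentParser(
        description="Plot mutation frequency by disease for selected genes."
    )
    # argparse stores --gene-list as args.gene_list and applies `type` to the raw string.
    parser.add_argument(
        "--gene-list",
        type=parse_gene_list,
        required=True,
        help='comma-separated gene symbols, e.g. "BRCA2,ALK,VHL"',
    )
    return parser


# Parse an explicit argument list here instead of sys.argv, to show the result.
args = build_parser().parse_args(
    ["--gene-list", "BRCA2,ALK,CD274,MEN1,VHL,RET,TP53,BRCA1"]
)
print(args.gene_list)
# ['BRCA2', 'ALK', 'CD274', 'MEN1', 'VHL', 'RET', 'TP53', 'BRCA1']
```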

dhimmel · 2016-09-07T15:20:14Z

@gwaygenomics I have a slightly different philosophy here.

scripts/3.explore-mutations.py is an auto-exported script version of the notebook for diff viewing, so all code changes should be made in the notebook. Passing arguments to the notebook doesn't make sense because you should be able to use notebooks interactively.

So one option is to create a Python module, e.g. heatmap.py, containing a function that 3.explore-mutations.ipynb would call and a `__main__` block that would enable script execution. However, I don't see a major benefit that justifies the added complexity. If you want to add more genes, you can just open the notebook and add them to the dictionary.

IMO, notebooks are better than scripts with arguments for agile data science.
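For comparison, the module option described above might look like the following minimal sketch. The module name `heatmap.py` comes from the comment; `gene_frequency_table`, `main`, the `(disease, mutated_genes)` data layout, and the placeholder samples are hypothetical. The notebook would import the function directly, while the `__main__` guard keeps script execution available; that extra indirection is the added complexity being weighed here.

```python
"""Hypothetical heatmap.py: an importable function plus a thin command-line wrapper."""
import argparse
from collections import Counter


def gene_frequency_table(samples, genes):
    """Return {disease: {gene: fraction of samples with that gene mutated}}.

    A notebook would import and call this directly. `samples` is a list of
    (disease, mutated_genes) pairs, an assumed layout.
    """
    n_samples = Counter(disease for disease, _ in samples)
    n_mutated = Counter(
        (disease, gene)
        for disease, mutated in samples
        for gene in genes
        if gene in mutated
    )
    return {
        disease: {gene: n_mutated[(disease, gene)] / n for gene in genes}
        for disease, n in n_samples.items()
    }


def main(argv=None):
    """Script entry point; argparse reads sys.argv[1:] when argv is None."""
    parser = argparse.ArgumentParser(description="Mutation frequency by disease.")
    parser.add_argument("genes", nargs="*", default=["VHL", "TP53"])
    args = parser.parse_args(argv)
    # Placeholder toy data; a real module would load the mutation matrix here.
    samples = [("KIRC", {"VHL"}), ("KIRC", set()), ("LUAD", {"TP53"})]
    for disease, row in gene_frequency_table(samples, args.genes).items():
        print(disease, row)


if __name__ == "__main__":
    main()
```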

gwaybio · 2016-09-07T15:25:38Z

Got it, I agree for this script.

Although I do think that moving toward this philosophy will be important when we think about how a user will visualize input genes and input tissues (i.e. the frontend/cancer data discussion yesterday; see cognoma/frontend#12).

LGTM 👍


* Evaluate performance of covariates on TP53

  Creates an explore directory and README for this type of exploratory notebook. See how well covariates (non-expression features) predict TP53 mutation.

  Related to #8: General mutation-load does provide some ability to predict mutation status of TP53.

  Partially addresses #21: Covariates are extracted from samples.tsv.

* Evaluate more covariate/mutation combinations

  Evaluate covariate-only classifiers for the interesting mutations compiled in cognoma/cancer-data#22 (comment). Switches to an expand grid system for evaluating all possible covariate combinations. Plot performance of all covariates on each mutation. Switches to `covariates.tsv` created in cognoma/cancer-data#24 for encoded covariates.

* Export clean notebook to script

* Address review comments
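The "expand grid" approach mentioned in this commit message can be sketched as enumerating every non-empty covariate subset for every mutation, giving one covariate-only model to fit per grid entry. This is a standard-library sketch under assumptions: the covariate names are hypothetical, and the actual notebook in cognoma/machine-learning#47 may build the grid differently (for example with pandas) before fitting a classifier per row.

```python
from itertools import combinations, product


def covariate_subsets(covariates):
    """Every non-empty subset of `covariates`, ordered from smallest to largest."""
    return [
        subset
        for size in range(1, len(covariates) + 1)
        for subset in combinations(covariates, size)
    ]


def expand_grid(mutations, covariates):
    """Cartesian product of mutations and covariate subsets: one entry per model to fit."""
    return list(product(mutations, covariate_subsets(covariates)))


# Hypothetical covariate names; the real encoded columns come from covariates.tsv.
covariates = ["disease", "mutation_load", "gender"]
mutations = ["TP53", "VHL", "BRAF"]
grid = expand_grid(mutations, covariates)
print(len(grid))  # 3 mutations x (2**3 - 1) subsets = 21
for mutation, subset in grid[:3]:
    print(mutation, subset)
```

Since the number of subsets grows as 2^k - 1 for k covariates, an exhaustive grid stays practical only for a handful of covariates.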

Add exploratory analyses of mutation data

a3c1cf9

Based on a preliminary notebook we created at the 2016-08-23 Cognoma Meetup (https://www.meetup.com/DataPhilly/events/233403001/).

Add genes to mutation frequency by disease heatmap

29c926a

dhimmel merged commit 67f8032 into cognoma:master Sep 7, 2016

dhimmel mentioned this pull request Sep 19, 2016

Evaluate performance of covariates at predicting various mutations cognoma/machine-learning#47

Merged
