Add exploratory analyses of mutation data #22
Based on a preliminary notebook we created at the 2016-08-23 Cognoma Meetup (https://www.meetup.com/DataPhilly/events/233403001/).
Can you also add |
could possibly incorporate COSMIC here too |
You can also look at MEN1 and RET, genes that are associated with many neuroendocrine tumors (pancreas, pituitary, parathyroid, medullary thyroid, pheochromocytoma). Are you interested in genes associated with cancers in general, or genes where we might expect that the majority of cancers segregate with a single gene?
@linzho both. Since this is an exploratory analysis, I'm just looking to look! |
@linzho & @gwaygenomics thanks for your suggestions. I added them to the heatmap in 29c926a, which now looks like this: [heatmap image]. I also scaled the mutation rates for each gene by that gene's max mutation rate. Note that there is still the outstanding issue that some diseases harbor more mutations (see the row-wise bands in the heatmap & cognoma/machine-learning#8).
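The per-gene scaling described above could look roughly like the following sketch. It assumes a pandas DataFrame with diseases as rows and genes as columns; the disease labels and rate values are made up for illustration and are not taken from the notebook.

```python
import pandas as pd

# Toy mutation rates (fraction of samples mutated).
# Row labels and values are hypothetical, for illustration only.
rates = pd.DataFrame(
    {"TP53": [0.50, 0.25, 0.10], "VHL": [0.02, 0.40, 0.00]},
    index=["disease_a", "disease_b", "disease_c"],
)

# Divide each gene (column) by its maximum rate across diseases,
# so every gene peaks at 1.0 in the heatmap.
scaled = rates / rates.max(axis=0)
print(scaled)
```

Scaling by column makes genes comparable to each other, but it does not remove the row-wise bands: a disease with a high overall mutation load will still look enriched across many genes. A gene that is never mutated would have a maximum of 0 and come out as NaN.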
Would it be useful to add functionality to the script? If the final output is the mutation-by-tissue heatmap, could you add an option such as `python scripts/3.explore-mutations.py --gene-list "BRCA2,ALK,CD274,MEN1,VHL,RET,TP53,BRCA1"`? Just a thought.
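For reference, a `--gene-list` option like the one suggested above could be parsed as in this minimal sketch. The flag is only a proposal in this thread and the script does not have it; the function name `parse_gene_list` is hypothetical.

```python
import argparse


def parse_gene_list(argv=None):
    """Parse a comma-separated --gene-list argument into a list of symbols."""
    parser = argparse.ArgumentParser(
        description="Explore mutation rates for selected genes (sketch)."
    )
    # Hypothetical flag, as proposed in the comment above.
    parser.add_argument(
        "--gene-list",
        default="",
        help="Comma-separated gene symbols, e.g. 'TP53,VHL'",
    )
    args = parser.parse_args(argv)
    # Drop surrounding whitespace and empty entries.
    return [gene.strip() for gene in args.gene_list.split(",") if gene.strip()]


genes = parse_gene_list(["--gene-list", "BRCA2,ALK,CD274,MEN1,VHL,RET,TP53,BRCA1"])
print(genes)
```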
@gwaygenomics I have a slightly different philosophy here.
So one option is to create a Python module. IMO, notebooks are better than scripts with arguments for agile data science.
Got it, I agree for this script. I do think, though, that moving towards this philosophy will be important when thinking about how a user will visualize input genes and input tissues (i.e. the frontend/cancer data discussion yesterday; see cognoma/frontend#12). LGTM 👍
* Evaluate performance of covariates on TP53

  Creates an explore directory and README for this type of exploratory notebook. See how well covariates (non-expression features) predict TP53 mutation.

  Related to #8: General mutation-load does provide some ability to predict mutation status of TP53.

  Partially addresses #21: Covariates are extracted from samples.tsv.

* Evaluate more covariate/mutation combinations

  Evaluate covariate-only classifiers for the interesting mutations compiled in cognoma/cancer-data#22 (comment). Switches to an expand grid system for evaluating all possible covariate combinations. Plot performance of all covariates on each mutation. Switches to `covariates.tsv` created in cognoma/cancer-data#24 for encoded covariates.

* Export clean notebook to script

* Address review comments
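One possible reading of the "expand grid" approach mentioned in the commit message is to cross every mutation with every non-empty subset of covariate groups. The sketch below illustrates that reading only; the function name `expand_grid` and the covariate group names are assumptions, not taken from the referenced notebook.

```python
from itertools import combinations, product


def expand_grid(mutations, covariate_groups):
    """Return every (mutation, covariate subset) pair as a list of dicts."""
    # All non-empty subsets of the covariate groups.
    subsets = [
        subset
        for size in range(1, len(covariate_groups) + 1)
        for subset in combinations(covariate_groups, size)
    ]
    # Cross each mutation with each subset.
    return [
        {"mutation": mutation, "covariates": subset}
        for mutation, subset in product(mutations, subsets)
    ]


# Covariate group names here are hypothetical placeholders.
grid = expand_grid(["TP53", "VHL"], ["disease", "gender", "mutation_load"])
print(len(grid))
```

Each entry of `grid` would then correspond to one covariate-only classifier to fit and evaluate. The number of subsets grows as 2^n - 1 in the number of covariate groups, so this is only practical for a handful of groups.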
This pull request is based on a preliminary notebook we created at the 2016-08-23 Cognoma Meetup. Tagging @Mike1906 @stephenshank, @drolejoel, @linzho, who were part of this group (we'd love your feedback).
Specifically, I'd like feedback on interesting cancer genes where we expect to see mutation status segregate with disease. For example, the present notebook shows the enrichment of VHL mutations in kidney clear cell carcinoma.