Evaluate performance of covariates at predicting various mutations #47

Merged 4 commits on Sep 22, 2016

@dhimmel dhimmel commented Sep 15, 2016

Creates an explore directory and README for this type of exploratory notebook.

See how well covariates (non-expression features) predict TP53 mutation.

Related to #8: General mutation-load does provide some ability to predict mutation status of TP53.

Partially addresses #21: Covariates are extracted from samples.tsv.


cgreene commented Sep 15, 2016

> Related to #8: General mutation-load does provide some ability to predict mutation status of TP53.

This isn't too surprising given TP53's role in controlling cell cycle checkpoints. Is this only true for genes in cell cycle checkpoint or DNA repair pathways, or is it also true for genes in proliferation pathways?

Evaluate covariate-only classifiers for the interesting mutations compiled in
cognoma/cancer-data#22 (comment).

Switches to an expand grid system for evaluating all possible covariate
combinations.

Plot performance of all covariates on each mutation.

Switches to `covariates.tsv` created in
cognoma/cancer-data#24 for encoded covariates.
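
For readers unfamiliar with the "expand grid" idea mentioned in this commit message, here is a minimal standard-library sketch. It is not the notebook's code (the notebook appears to build a pandas DataFrame), and the covariate group names below are illustrative assumptions.

```python
import itertools


def expand_grid(options):
    """Return one dict per combination of the values in `options`."""
    keys = list(options)
    return [dict(zip(keys, combo))
            for combo in itertools.product(*options.values())]


# Hypothetical covariate groups; each may be included (1) or excluded (0).
covariate_groups = ['mutation_load', 'disease', 'organ']
grid = expand_grid({name: [0, 1] for name in covariate_groups})

# Drop the all-excluded row, since a model needs at least one feature.
grid = [row for row in grid if any(row.values())]

for row in grid:
    included = [name for name, use in row.items() if use]
    print(included)
```

Each remaining row names one covariate subset, so a classifier can be fit and scored once per row.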

dhimmel commented Sep 19, 2016

> Is this only true for genes in cell cycle checkpoint or DNA repair pathways, or is it also true for genes in proliferation pathways?

@cgreene, I updated the notebook to evaluate performance of covariate-only classifiers for the 8 interesting mutations we've previously considered. Here is performance of the models with all covariates included:

[Figure: covariate-performance]

So actually, TP53 is among the hardest to predict using only covariates. VHL, which is highly disease-specific, achieves a near-perfect AUROC. Therefore, without the disease or organ covariate, expression classifiers of VHL are likely just classifying kidney clear cell carcinoma / kidney tissue.

See this dataframe to get a general idea of covariate importance. Mutation load and disease type both seem important.
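
To make the AUROC comparison concrete, here is a minimal sketch of the metric using only the standard library. It is not the notebook's implementation, the function name `auroc` is my own, and the data are made up purely for illustration.

```python
def auroc(labels, scores):
    """AUROC as the chance that a random positive outranks a random negative.

    A tie between a positive and a negative score counts as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Made-up toy data: 1 = mutated, 0 = not mutated.
mutated = [1, 1, 0, 0, 0]
# A covariate that lines up exactly with the mutation (e.g. a disease indicator).
disease_specific = [1, 1, 0, 0, 0]
# A covariate that is identical for every sample and so carries no signal.
constant = [0, 0, 0, 0, 0]

print(auroc(mutated, disease_specific))  # 1.0
print(auroc(mutated, constant))          # 0.5
```

A covariate that tracks the mutation exactly reaches 1.0 on its own, which is the pattern described for VHL; an uninformative covariate sits at 0.5.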


cgreene commented Sep 19, 2016

@dhimmel: Interesting! I can imagine that gene expression would clearly capture disease/organ, so if one usually had a mutation in a single gene that was relatively specific (e.g. VHL), I could imagine that creating a strong signal.

@dhimmel dhimmel assigned dhimmel and gwaybio and unassigned dhimmel Sep 19, 2016
@dhimmel dhimmel changed the title from "Evaluate performance of covariates on TP53" to "Evaluate performance of covariates at predicting various mutations" Sep 19, 2016
@@ -0,0 +1,280 @@

Member

looks like you need to nbconvert this again

Member

I'll take a look again once it's easier to read

Member Author

Good catch. Should be fixed now.

@gwaybio gwaybio left a comment

looks like nbconvert confounding.ipynb has to be rerun

@gwaybio gwaybio left a comment

Interesting and informative analysis. My comments are generally cosmetic, but it would be nice to get some additional documentation on some of the functions!

@@ -0,0 +1,9 @@
# A directory for exploratory machine learning analyses

This directory is home is exploratory analyses that help answer questions about how we should do machine learning. For algorithm implementations see the [`algorithms`](../algorithms) directory. For other types of analyses, place them here.

@gwaybio gwaybio Sep 21, 2016

"This directory is home to exploratory analyses"?


# coding: utf-8

# # Create a logistic regression model to several mutations from covariates

Member

"Create a logistic regression model to predict several mutations from covariates"?

return pd.DataFrame.from_records(rows, columns=data_dict.keys())

mutations = {
    '7157': 'TP53', # tumor protein p53

Member

pep8 says inline comments should be separated by two spaces
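
As a small illustration of that PEP 8 point, the dictionary entry could be written as below. Only the one entry visible in the diff is shown; the rest of the dictionary is left out.

```python
# PEP 8: separate an inline comment from the statement by at least two
# spaces, and start the comment with '#' followed by a single space.
mutations = {
    '7157': 'TP53',  # tumor protein p53
}
```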

# In[8]:

def get_aurocs(X, y, series):
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=0)

Member

what does series look like?

@dhimmel dhimmel merged commit cbc0604 into cognoma:master Sep 22, 2016
@dhimmel dhimmel deleted the confounding branch September 22, 2016 13:28