Add cv csvs paper #191
Open
DarioMarzella wants to merge 18 commits into paper from add_cv_csvs_paper
18 commits
b968ef3 Add scripts to make crossvalidation data splits and misc scripts (DarioMarzella)
5c49ecb Add crossval code for CNN and EGNN (DarioMarzella)
28ec7cd Edit MLP code to take train, valid and test csvs for crossvalidations (DarioMarzella)
f921747 change make_crossval_csvs.py to use original allele clustering csv file (DarioMarzella)
a2c2e71 Update CNN scripts to include early stopping for cross validations (DarioMarzella)
9c7e01f Updating MLP scripts for consistent crossvalidations (DarioMarzella)
3add327 Add scripts for cv csvs (DarioMarzella)
33ced2a Update code for crossvalidation (DarioMarzella)
8cfc3ec Add revised MLP code (DarioMarzella)
1842628 Add EGNN conde for HBV testing and update crossvalidation code (DarioMarzella)
00999bb Add EGNN HBV code. Update .gitignore to ignore data folders. (DarioMarzella)
955264d remove old plotting scripts (DarioMarzella)
b637a1d Update gitignore (DarioMarzella)
b17a8c0 remove old code to run mhcflurry (DarioMarzella)
e83a3a6 Add Fig1A plotting to dendrograms.ipynb (DarioMarzella)
cba7e9c Add scatterplot to auc per allele plot. Make barplots colorblind-safe (DarioMarzella)
332965a Update figures to editorial requests (DarioMarzella)
81c2db5 Fix zenodo published data. Fix MLP output column (previously accident… (DarioMarzella)
@@ -134,3 +134,8 @@ dmypy.json

# slurm
.out
.err

# data and reports
data/
reports/
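The `# slurm` entries above are written as `.out` and `.err`. Assuming these are gitignore-style patterns, a slash-free pattern without a wildcard matches only a file with exactly that name, so job logs ending in `.out` would presumably need `*.out` and `*.err`. The sketch below approximates that basename matching with Python's `fnmatch`. It is an illustration only (real gitignore matching has extra rules for slashes and negation), and both the log file name and the helper `is_ignored` are made up for the example.

```python
from fnmatch import fnmatchcase

# Hypothetical log name, modelled on a "<name>-<jobid>.out" SLURM log file.
log_name = "test_erasmusmcData-123.out"


def is_ignored(basename, patterns):
    """Approximate gitignore matching for slash-free patterns: glob on the basename."""
    return any(fnmatchcase(basename, pattern) for pattern in patterns)


literal_patterns = [".out", ".err"]      # as written in the diff
wildcard_patterns = ["*.out", "*.err"]   # assumed intent, not confirmed by the PR

print(is_ignored(log_name, literal_patterns))   # False
print(is_ignored(log_name, wildcard_patterns))  # True
print(is_ignored(".out", literal_patterns))     # True
```

If the logs are only ever written outside the repository, the literal patterns are harmless and no change is needed.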
@@ -0,0 +1,251 @@
import warnings
from sklearn import metrics
import numpy as np
# info
# https://en.wikipedia.org/wiki/Precision_and_recall


def sensitivity(yp, yt):
    """sensitivity, recall or true positive rate (TPR)

    Args:
        yp (array): predictions
        yt (array): targets

    Returns:
        float: sensitivity value
    """
    tp = true_positive(yp, yt)
    p = positive(yt)
    if p == 0:
        tpr = float('inf')
        warnings.warn(
            'Number of positive cases is 0, '
            'TPR or sensitivity is assigned as inf')
    else:
        tpr = tp / p
    return tpr


def specificity(yp, yt):
    """specificity, selectivity or true negative rate (TNR)

    Args:
        yp (array): predictions
        yt (array): targets

    Returns:
        float: specificity value
    """
    tn = true_negative(yp, yt)
    n = negative(yt)
    if n == 0:
        warnings.warn(
            'Number of negative cases is 0, '
            'TNR or specificity is assigned as inf')
        tnr = float('inf')
    else:
        tnr = tn / n
    return tnr


def precision(yp, yt):
    """precision or positive predictive value (PPV)

    Args:
        yp (array): predictions
        yt (array): targets

    Returns:
        float: precision value
    """
    tp = true_positive(yp, yt)
    fp = false_positive(yp, yt)
    tp, fp = map(np.float64, [tp, fp])
    if tp + fp == 0:
        warnings.warn(
            'Total number of true positive and false positive cases is 0, '
            'PPV or precision is assigned as inf')
        ppv = float('inf')
    else:
        ppv = tp / (tp + fp)
    return ppv


def accuracy(yp, yt):
    """Accuracy.

    Args:
        yp (array): predictions
        yt (array): targets

    Returns:
        float: accuracy value
    """
    tp = true_positive(yp, yt)
    tn = true_negative(yp, yt)
    p = positive(yt)
    n = negative(yt)
    tp, tn, p, n = map(np.float64, [tp, tn, p, n])
    acc = (tp + tn) / (p + n)
    return acc


def F1(yp, yt):
    """F1 score.

    Args:
        yp (array): predictions
        yt (array): targets

    Returns:
        float: F1 score
    """
    tp = true_positive(yp, yt)
    fp = false_positive(yp, yt)
    fn = false_negative(yp, yt)
    tp, fp, fn = map(np.float64, [tp, fp, fn])
    f1 = 2 * tp / (2 * tp + fp + fn)
    return f1

def mcc(yp, yt):
    """Matthews correlation coefficient (MCC)

    Args:
        yp (array): predictions
        yt (array): targets

    Returns:
        float: MCC value
    """
    tp = true_positive(yp, yt)
    tn = true_negative(yp, yt)
    fp = false_positive(yp, yt)
    fn = false_negative(yp, yt)
    tp, tn, fp, fn = map(np.float64, [tp, tn, fp, fn])

    with np.errstate(invalid='raise'):
        try:
            mcc = (tp * tn - fp * fn) / np.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
        except FloatingPointError:
            # A zero denominator implies a zero numerator as well, so the division is 0/0.
            # In that case set the denominator to 1, which gives an MCC of 0
            # (source: https://en.wikipedia.org/wiki/Phi_coefficient)
            mcc = (tp * tn - fp * fn) / 1

    return mcc

def roc_auc(yp, yt):
    """compute roc auc with sklearn

    Args:
        yp (array): predictions
        yt (array): targets

    Returns:
        float: roc auc
    """
    return metrics.roc_auc_score(np.expand_dims(yt, 1), yp)


def tpr_fpr_thresholds(yp, yt):
    """compute arrays of true positive rate and false positive rate with sklearn;
    can be used for plotting roc curves and computing roc auc

    Args:
        yp (ndarray): probabilities for all indices
        yt (ndarray): true labels for all indices

    Returns:
        np.array: true positive rate at each threshold selected by metrics.roc_curve
        np.array: false positive rate at each threshold selected by metrics.roc_curve
    """
    # roc_curve derives its thresholds from the score values in yp, not from a fixed grid
    fprs, tprs, _ = metrics.roc_curve(np.expand_dims(yt, 1), yp)

    return tprs, fprs

def rmse(yp, yt):
    """Root Mean Squared Error (RMSE).

    Args:
        yp (array): predictions
        yt (array): targets

    Returns:
        float: Root Mean Squared Error (RMSE) score
    """
    return np.sqrt(np.sum(((yp - yt)**2)/yp.size))

def true_positive(yp, yt):
    """number of true positive cases.

    Args:
        yp (array): predictions
        yt (array): targets
    """
    yp, yt = _to_bool(yp), _to_bool(yt)
    tp = np.logical_and(yp, yt)
    return np.sum(tp)


def true_negative(yp, yt):
    """number of true negative cases.

    Args:
        yp (array): predictions
        yt (array): targets
    """
    yp, yt = _to_bool(yp), _to_bool(yt)
    tn = np.logical_and(np.logical_not(yp), np.logical_not(yt))
    return np.sum(tn)


def false_positive(yp, yt):
    """number of false positive cases.

    Args:
        yp (array): predictions
        yt (array): targets
    """
    yp, yt = _to_bool(yp), _to_bool(yt)
    fp = np.logical_and(yp, np.logical_not(yt))
    return np.sum(fp)


def false_negative(yp, yt):
    """number of false negative cases.

    Args:
        yp (array): predictions
        yt (array): targets
    """
    yp, yt = _to_bool(yp), _to_bool(yt)
    fn = np.logical_and(np.logical_not(yp), yt)
    return np.sum(fn)


def positive(yt):
    """The number of real positive cases.

    Args:
        yt (array): targets
    """
    yt = _to_bool(yt)
    return np.sum(yt)


def negative(yt):
    """The number of real negative cases.

    Args:
        yt (array): targets
    """
    yt = _to_bool(yt)
    return np.sum(np.logical_not(yt))


def _to_bool(x):
    """convert array values to boolean values.

    Args:
        x (array): values should be 0 or 1

    Returns:
        array: boolean array
    """
    return x.astype(bool)
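As a quick check of the definitions in the file above, the following sketch recomputes the same quantities on a small hand-made example using only the standard library. The names `confusion_counts`, `ratio` and `summary` and the toy labels are illustrative and not part of the PR. The `inf` values for undefined sensitivity, specificity and precision and the MCC fallback (denominator set to 1) mirror the conventions in the file; returning `nan` for an undefined accuracy or F1 is an assumption of this sketch, since the file does not guard those cases.

```python
import math


def confusion_counts(yp, yt):
    """Return (tp, tn, fp, fn) for 0/1 predictions yp and targets yt."""
    pairs = [(bool(p), bool(t)) for p, t in zip(yp, yt)]
    tp = sum(p and t for p, t in pairs)
    tn = sum(not p and not t for p, t in pairs)
    fp = sum(p and not t for p, t in pairs)
    fn = sum(not p and t for p, t in pairs)
    return tp, tn, fp, fn


def ratio(numerator, denominator, undefined):
    """Divide, returning `undefined` when the denominator is zero."""
    return numerator / denominator if denominator else undefined


def summary(yp, yt):
    tp, tn, fp, fn = confusion_counts(yp, yt)
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": ratio(tp, tp + fn, float("inf")),
        "specificity": ratio(tn, tn + fp, float("inf")),
        "precision": ratio(tp, tp + fp, float("inf")),
        "accuracy": ratio(tp + tn, tp + tn + fp + fn, float("nan")),
        "f1": ratio(2 * tp, 2 * tp + fp + fn, float("nan")),
        # Zero denominator: use 1 instead, which yields an MCC of 0.
        "mcc": (tp * tn - fp * fn) / (mcc_denominator or 1),
    }


targets = [1, 1, 1, 0, 0, 0, 0, 1]
predictions = [1, 0, 1, 0, 1, 0, 0, 1]
print(confusion_counts(predictions, targets))  # (3, 3, 1, 1)
print(summary(predictions, targets))
```

With three true positives, three true negatives, one false positive and one false negative, every ratio metric here is 0.75 and the MCC is (9 - 1) / 16 = 0.5, which makes the example easy to verify by hand.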
13 changes: 13 additions & 0 deletions
src/5_train_models/DeepRank2/GNN/run_pre-trained_testing.sh
@@ -0,0 +1,13 @@
#!/bin/bash
#SBATCH --job-name split_h5
#SBATCH --partition thin
#SBATCH -o /projects/0/einf2380/data/test_logs/test_erasmusmcData-%J.out
#SBATCH -e /projects/0/einf2380/data/test_logs/test_erasmusmcData-%J.err
#SBATCH --nodes 1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=96
#SBATCH --time=01:00:00


source activate dr2
python -u pre-trained_testing.py
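The job requests 96 CPUs per task, but whether `pre-trained_testing.py` actually uses them depends on the script itself, which is not shown in this diff. As a hedged sketch, a Python entry point launched this way could read its allocation from the `SLURM_CPUS_PER_TASK` environment variable, which SLURM sets when `--cpus-per-task` is given, and fall back to the local CPU count otherwise. The helper name `allocated_cpus` is hypothetical and not taken from the PR.

```python
import os


def allocated_cpus(environ=None):
    """Number of CPUs to use: the SLURM allocation if present, else the local count."""
    environ = os.environ if environ is None else environ
    value = environ.get("SLURM_CPUS_PER_TASK")
    if value and value.isdigit() and int(value) > 0:
        return int(value)
    # Outside SLURM (or without --cpus-per-task) fall back to what the machine reports.
    return os.cpu_count() or 1


print(allocated_cpus({"SLURM_CPUS_PER_TASK": "96"}))  # 96
print(allocated_cpus({}) >= 1)  # True
```

The returned value could then be passed to whatever worker or data-loader setting the script exposes.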
What is this file? I think it shouldn't be here. Also, the structure in the paper branch is the following: [structure listing not preserved]. Please move the files accordingly.