Add cv csvs paper #191

DarioMarzella · 2024-06-28T15:09:32Z

No description provided.

gcroci2

Thanks for the PR :) I left a few comments, and these are additional remarks:

Remove the two init files added with this PR.
Clarify (ideally through the file names) the difference between MLP.py, SeqBased_model.py, and mlp_baseline.py.
In 6_test_cases, it's really hard to tell which script does what, and the deeprank2 files are still there (they should be removed). I'd try to make the structure of the folder clearer.

gcroci2 · 2024-07-01T08:53:10Z

src/5_train_models/DeepRank2/GNN/run_pre-trained_testing.sh

@@ -0,0 +1,13 @@
+#!/bin/bash


What is this file? I think it shouldn't be here. Also, the structure in the paper branch is the following: [folder structure not captured in this export]. Please move the files accordingly.

gcroci2 · 2024-07-01T08:54:48Z

src/5_train_models/seq/mlp_baseline.py

@@ -46,12 +46,20 @@
 arg_parser.add_argument("--csv-file", "-f",
    help="Name of the csv file in data/external/processed containing the cluster column. \n \
        Works as a train and validation set if provided with --test-csv.",
-    default="/home/daqop/mountpoint_snellius/3D-Vac/data/external/processed/all_hla_j_4.csv"
+    default=False, #"/home/daqop/mountpoint_snellius/3D-Vac/data/external/processed/all_hla_j_4.csv"


What is the difference between this file and MLP.py?
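
For context on the hunk above, here is a minimal sketch of how the two CSV options named in the help text could be resolved once the hard-coded default path is gone. It is only an illustration, not the code in mlp_baseline.py or MLP.py: the function names `build_parser` and `resolve_csvs`, the returned dictionary keys, and the use of `None` (rather than `False`) as the "not provided" value are all assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with the two CSV options mentioned in the help text."""
    parser = argparse.ArgumentParser(description="Sketch of CSV argument handling")
    parser.add_argument(
        "--csv-file", "-f",
        default=None,  # assumption: None instead of False or a user-specific path
        help="CSV containing the cluster column. "
             "Works as train and validation set if --test-csv is provided.",
    )
    parser.add_argument(
        "--test-csv",
        default=None,  # assumption: the test CSV is optional
        help="Optional CSV used as a separate test set.",
    )
    return parser


def resolve_csvs(argv=None) -> dict:
    """Parse argv and report which CSV plays which role (hypothetical helper)."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.csv_file is None:
        # Fail early with a usage message instead of relying on a local path.
        parser.error("--csv-file is required")
    return {
        "train_valid_csv": args.csv_file,
        "test_csv": args.test_csv,
        "has_separate_test": args.test_csv is not None,
    }
```

Calling `resolve_csvs(["-f", "all.csv", "--test-csv", "test.csv"])` would report that a separate test set is present, while omitting `--csv-file` exits with a usage error rather than silently falling back to a path that exists on one machine only.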

gcroci2 · 2024-07-01T08:59:01Z

src/5_train_models/str/PyTorch/process_hbv_testcases.py

+import torch
+
+
+def parse_pdb_dataset(folder, return_tuple=False, one_hot=False, residue_level=True, radius_pocket=10, elements=['C', 'N', 'O', 'S', 'P'], exclude_elements=['H']):


We have a folder called 6_test_cases; I think this file should be moved there.

gcroci2 · 2024-07-01T09:01:22Z

src/5_train_models/str/PyTorch/test_hbv.py

+
+from egnn import EGNNModel
+from data_proccess_fn import data_process_fn
+


As above, I think this file should be moved to the folder 6_test_cases.

gcroci2 · 2024-07-01T09:03:08Z

src/6_test_cases/train.py

@@ -0,0 +1,355 @@
+import numpy as np


I think that the train file should not be in this folder, which should contain test-related scripts only.

gcroci2 · 2024-07-01T09:10:12Z

src/exploration/manuscript/code/get_paper_csvs.ipynb

@@ -0,0 +1,675 @@
+{


We don't need these two subfolders (manuscript and code). We can keep only one subfolder and give it a more descriptive name, like paper_plots. Make sure that all the notebooks used for the actual paper's plots are there. In general, the two notebooks in code could be made clearer (even just by adding meaningful subsection titles).

DarioMarzella added 13 commits June 4, 2024 11:30

Add scripts to make crossvalidation data splits and misc scripts

b968ef3

Add crossval code for CNN and EGNN

5c49ecb

Edit MLP code to take train, valid and test csvs for crossvalidations

28ec7cd

change make_crossval_csvs.py to use original allele clustering csv file

f921747

Update CNN scripts to include early stopping for cross validations

a2c2e71

Updating MLP scripts for consistent crossvalidations

9c7e01f

Add scripts for cv csvs

3add327

Update code for crossvalidation

33ced2a

Add revised MLP code

8cfc3ec

Add EGNN conde for HBV testing and update crossvalidation code

1842628

Add EGNN HBV code. Update .gitignore to ignore data folders.

00999bb

remove old plotting scripts

955264d

Update gitignore

b637a1d

DarioMarzella requested a review from gcroci2 July 1, 2024 08:38

remove old code to run mhcflurry

b17a8c0

gcroci2 requested changes Jul 1, 2024

DarioMarzella added 4 commits July 1, 2024 16:13

Add Fig1A plotting to dendrograms.ipynb

e83a3a6

Add scatterplot to auc per allele plot. Make barplots colorblind-safe

cba7e9c

Update figures to editorial requests

332965a

Fix zenodo published data. Fix MLP output column (previously accidentally switched with labels). Fix CNN output (previously including both test and validation)

81c2db5

gcroci2 approved these changes Nov 26, 2024
