1. Predict enzymatic pathway type/class from partially/fully annotated enzymes**.
2. Identify similar, existing pathways from KEGG and MetaCyc, given a set of partially/fully annotated enzymes**.

** from a prokaryotic/eukaryotic organism

- ecdata: contains all relevant unlabeled data from BioCyc and MetaCyc (scraped from webpages and downloaded from the API)
- labeldata: contains all relevant labeled data from KEGG and MetaCyc
- src: contains all the modeling and scraping scripts
- Flaskapp: contains Flask templates and scripts for the web server
- tests: contains unit tests

- ExtractUnlabeledData: class to extract unlabeled data from BioCyc and MetaCyc
- SampleUnlabeledData: class to expand unlabeled data using resampling, child of ExtractUnlabeledData
- PathwayScrapper: class to extract labeled data from MetaCyc
- ExtractLabeledData: class to process and clean labeled data from MetaCyc webpages
- BalanceLabelData: class to balance labeled data
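
The balancing step can be illustrated with a minimal sketch. The README does not say which strategy `BalanceLabelData` uses, so random oversampling of minority classes is an assumption made here for illustration, and the function name, toy pathways, and class labels below are hypothetical.

```python
import random
from collections import Counter, defaultdict


def oversample_to_balance(samples, labels, seed=0):
    """Randomly oversample minority classes until each class matches the largest one.

    Assumption: random oversampling is only one possible balancing strategy;
    it is not confirmed to be what BalanceLabelData does.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_label[label].append(sample)

    target = max(len(group) for group in by_label.values())
    balanced = []
    for label, group in by_label.items():
        # Draw extra samples with replacement to reach the target size.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        balanced.extend((sample, label) for sample in group + extra)

    rng.shuffle(balanced)
    return balanced


# Hypothetical toy data: each pathway is a list of EC numbers.
pathways = [["1.1.1.1", "2.7.1.1"], ["4.2.1.11"], ["3.5.1.2", "6.3.1.2"], ["2.6.1.1"]]
classes = ["Degradation", "Degradation", "Degradation", "Biosynthesis"]

balanced = oversample_to_balance(pathways, classes)
counts = Counter(label for _, label in balanced)
print(counts)
```

Oversampling duplicates minority-class pathways, so any train/test split should be made before balancing to avoid leaking duplicates into the test set.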

- CreateEmbedding: class to create fastText embeddings
- ParameterizeEmbedding: class to explore hyperparameters for creating embeddings, child of CreateEmbedding
- ClusterEmbedding: class to cluster enzyme embedding vectors, child of CreateEmbedding
- ClusterPWYEmbedding: class to cluster pathway embedding vectors, child of CreateEmbedding
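
One reason fastText suits partially annotated enzymes is that it builds word vectors from character n-grams, so a truncated EC number shares subword units with its fully annotated form. The sketch below shows only that decomposition in plain Python; it is a simplification (fastText also hashes n-grams into buckets and keeps the whole token), and treating each EC number as a single token is an assumption about how the repository prepares its input. The helper name `char_ngrams` is hypothetical.

```python
def char_ngrams(token, min_n=3, max_n=6):
    """Return the character n-grams of a token wrapped in '<' and '>' markers.

    The 3..6 range mirrors fastText's default subword lengths; the boundary
    markers distinguish prefixes and suffixes from interior n-grams.
    """
    wrapped = f"<{token}>"
    grams = set()
    for n in range(min_n, max_n + 1):
        for start in range(len(wrapped) - n + 1):
            grams.add(wrapped[start:start + n])
    return grams


full = char_ngrams("2.7.1.1")   # fully annotated enzyme
partial = char_ngrams("2.7.1")  # same enzyme with a 3-digit annotation
shared = full & partial

print(sorted(shared))
```

The shared set contains prefix n-grams such as `<2.7`, which is why vectors for partial and full annotations of the same enzyme can end up close together.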
- ModelVectors: class to get dependent and independent variables for building models
- BuildClassicalModel: class to set up and run classical ML models using scikit-learn, child of ModelVectors
- BuildControlModels: class to set up and run classical ML models on control datasets, child of BuildClassicalModel
- BuildAnnot3Models: class to set up and run classical ML models on datasets built using 3-digit EC number annotations, child of BuildClassicalModel
- BuildAnnot2Models: class to set up and run classical ML models on datasets built using 2-digit EC number annotations, child of BuildClassicalModel
- BuildAnnot1Models: class to set up and run classical ML models on datasets built using 1-digit EC number annotations, child of BuildClassicalModel
- DNNModel: class to create, parameterize, and test an LSTM-based neural network model, child of ModelVectors
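
The 3-, 2-, and 1-digit datasets mentioned above correspond to progressively coarser EC annotations. A minimal sketch of deriving them from fully annotated EC numbers follows; the helper `truncate_ec` and the example pathway are hypothetical, and the repository may represent partial annotations differently (for example, by padding missing levels with `-`).

```python
def truncate_ec(ec_number, digits):
    """Keep only the first `digits` levels of a four-level EC number."""
    if not 1 <= digits <= 4:
        raise ValueError("digits must be between 1 and 4")
    levels = ec_number.split(".")
    return ".".join(levels[:digits])


# Hypothetical pathway made of fully annotated enzymes.
pathway = ["1.1.1.1", "2.7.1.1", "4.2.1.11"]

annot3 = [truncate_ec(ec, 3) for ec in pathway]
annot2 = [truncate_ec(ec, 2) for ec in pathway]
annot1 = [truncate_ec(ec, 1) for ec in pathway]

print(annot3)  # ['1.1.1', '2.7.1', '4.2.1']
print(annot2)  # ['1.1', '2.7', '4.2']
print(annot1)  # ['1', '2', '4']
```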
- Metrics: class to define all required metrics
- EvaluateMetrics: class to evaluate output from ML models, child of Metrics and ModelVectors
- EvaluateControl: class to evaluate output from ML models for control datasets, child of EvaluateMetrics
- EvaluateAnnot3: class to evaluate output from ML models for datasets built using 3-digit EC number annotations, child of EvaluateMetrics
- EvaluateAnnot2: class to evaluate output from ML models for datasets built using 2-digit EC number annotations, child of EvaluateMetrics
- EvaluateAnnot1: class to evaluate output from ML models for datasets built using 1-digit EC number annotations, child of EvaluateMetrics
- combinedEvaluations: class to compare and contrast output from all datasets, child of Metrics and ModelVectors
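
The README does not list which metrics the `Metrics` class defines. As an illustration only, the sketch below computes per-class precision, recall, and F1 plus a macro-averaged F1 for pathway class predictions; the function names and the toy labels are hypothetical, and these metrics are an assumption rather than the repository's confirmed choice.

```python
def per_class_scores(y_true, y_pred):
    """Compute precision, recall, and F1 for every label seen in the data."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    scores = {}
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = {"precision": precision, "recall": recall, "f1": f1}
    return scores


def macro_f1(scores):
    """Average F1 over classes, weighting every class equally."""
    return sum(s["f1"] for s in scores.values()) / len(scores)


# Hypothetical true and predicted pathway classes.
y_true = ["Biosynthesis", "Biosynthesis", "Degradation", "Degradation"]
y_pred = ["Biosynthesis", "Degradation", "Degradation", "Degradation"]

scores = per_class_scores(y_true, y_pred)
macro = macro_f1(scores)
print(scores)
print(round(macro, 4))
```

Macro averaging treats rare and common pathway classes equally, which matters when the class distribution is imbalanced.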
- MetricsDNN: class to define all required metrics for NN output
- EvaluateMetricsDNN: class to evaluate output from the DNN model, child of MetricsDNN and ModelVectors
- EvaluateMetricsAnnot3DNN: class to evaluate output from the DNN model on datasets built using 3-digit EC number annotations, child of EvaluateMetricsDNN
- ExtractKEGGData: class to extract and evaluate KEGG data
- ExtractKEGGDataControl: class to evaluate KEGG data on control models, child of ExtractKEGGData

Paper: Predicting ontology of enzymatic pathways using transfer learning

Author(s): Sai J. Ganesan, Andrej Sali

Citation: TBD