ESM-2 for Binding Site Prediction

This repository uses the ESM-2 protein language model to predict protein binding sites from sequence alone. I use the 6-layer, 8M-parameter checkpoint esm2_t6_8M_UR50D, trained on this dataset. See here and here for more details on the pre-trained model.

The goal is to use single-sequence (no MSA) protein language models for binary token classification tasks, such as predicting the binding and active sites of a protein. The model was fine-tuned both with and without LoRA to classify each residue as part of a binding site (or active site) or not.
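To make the task framing concrete, here is a minimal sketch of binary token classification at the residue level: each residue gets a 0/1 label, and predictions are scored per residue. It is an illustration only, not this repository's code. The helper names `residue_labels` and `precision_recall_f1`, the toy sequence, and the use of 1-based site positions are all assumptions.

```python
from typing import Iterable, List, Tuple


def residue_labels(sequence: str, site_positions: Iterable[int]) -> List[int]:
    """Return one 0/1 label per residue (1 = binding site).

    Assumption: site_positions are 1-based residue indices.
    """
    sites = set(site_positions)
    for pos in sites:
        if not 1 <= pos <= len(sequence):
            raise ValueError(f"position {pos} is outside the sequence")
    return [1 if i in sites else 0 for i in range(1, len(sequence) + 1)]


def precision_recall_f1(
    y_true: List[int], y_pred: List[int]
) -> Tuple[float, float, float]:
    """Per-residue precision, recall and F1 for the positive (site) class."""
    if len(y_true) != len(y_pred):
        raise ValueError("label and prediction lengths differ")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy example: a hypothetical 10-residue sequence with sites at positions 3, 4, 8.
sequence = "MKTAYIAKQR"
labels = residue_labels(sequence, [3, 4, 8])

# Hypothetical per-residue predictions from a token classifier.
predictions = [0, 0, 1, 0, 0, 0, 0, 1, 1, 0]
precision, recall, f1 = precision_recall_f1(labels, predictions)
```

Binding-site residues are usually a small minority of each sequence, so per-residue precision, recall and F1 on the positive class tend to be more informative than plain accuracy.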

For data (protein sequence) pre-processing, see the notebook data_preprocessing_notebook_v1.ipynb.

Acknowledgements

A great deal of the heavy lifting with ESM-2 models has been done by Amelie Schreiber on Hugging Face, whose blog posts (and code) inspired this repository. My contribution is integrating these models with the DeepChem open-source framework. This work was sponsored by the Google Summer of Code program.

elisagdelope/ESM2-bindingsites
