Experimental deep learning architecture for scoring protein-protein interactions.
See the PointNet paper for the original architecture description. This implementation contains two architectures, neither of which contains the transformer networks (T-Nets), so both can be considered variants of the vanilla PointNet. The first differs only in its dropout rate (50%), whereas the second is a novel architecture called Siamese PointNet, shown in the image below.
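As a rough illustration of the Siamese idea (not the repository's actual implementation), the sketch below shows a per-point MLP with shared weights applied to two point clouds, each reduced by max pooling to an order-invariant global feature, with the two features concatenated for downstream scoring. The function names and the single-layer MLP are hypothetical simplifications:

```python
import numpy as np

def shared_mlp(points, W, b):
    # Per-point feature transform with shared weights
    # (the NumPy equivalent of a 1x1 convolution + ReLU).
    return np.maximum(points @ W + b, 0.0)

def siamese_pointnet_features(cloud_a, cloud_b, W, b):
    # Sketch only: the SAME weights embed both point clouds (the Siamese
    # part); max pooling over points makes each global feature invariant
    # to point order, as in PointNet.
    feat_a = shared_mlp(cloud_a, W, b).max(axis=0)
    feat_b = shared_mlp(cloud_b, W, b).max(axis=0)
    return np.concatenate([feat_a, feat_b])
```

Because of the max pooling, permuting the atoms within either cloud leaves the output unchanged.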
Other adaptations include cosine annealing learning-rate decay, implemented to improve the accuracy and generalizability of the trained network (see Stochastic Gradient Descent with Warm Restarts), and a custom loss function that biases learning towards higher-scoring decoys.
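A minimal sketch of these two ideas, assuming a cosine schedule without restarts and a simple score-proportional loss weighting; the exact schedule and weighting used in the repository are not specified here, so both functions are illustrative:

```python
import numpy as np

def cosine_annealing_lr(epoch, total_epochs, lr_max=1e-4, lr_min=0.0):
    # Cosine-annealed learning rate as in SGDR (single cycle, no restart):
    # decays smoothly from lr_max at epoch 0 to lr_min at total_epochs.
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + np.cos(np.pi * epoch / total_epochs))

def weighted_mse_loss(pred, target):
    # Hypothetical biased loss: scale each decoy's squared error by
    # (1 + target score), so higher-scoring decoys (DockQ in [0, 1])
    # contribute more to the gradient than near-random ones.
    weights = 1.0 + target
    return np.mean(weights * (pred - target) ** 2)
```

With `lr_max = 1e-4` and 15 epochs this matches the defaults listed below; the weighting factor is one simple way to realize the bias and stands in for whatever the training script actually uses.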
- Python 3.x
- H5Py for fast data retrieval
- PyTorch <0.4 and its dependencies
- Data conversion uses DeepRank and its dependencies
- Seaborn for plotting
python train.py
--batch_size BATCH_SIZE Input batch size (default = 256)
--num_points NUM_POINTS Points per point cloud used (default = 1024)
--num_epoch NUM_EPOCH Number of epochs to train for (default = 15)
--CUDA Train on GPU
--out_folder OUT_FOLDER Model output folder
--model MODEL Model input path
--data_path DATA_PATH Path to HDF5 file
--lr LR Learning rate (default = 0.0001)
--optimizer OPTIMIZER Which optimizer to use. Options: Adam, SGD, SGD_cos
--avg_pool Use average pooling for feature pooling (instead of default max pooling)
--dual Use Siamese PointNet architecture
--metric METRIC Metric to be used. Options: irmsd, lrmsd, fnat, dockQ (default)
--dropout DROPOUT Dropout rate in the last layer; when 0, dropout is replaced by batch normalization (default = 0.5)
--root Apply a square root to the metric (for DockQ score balancing)
--patience PATIENCE Number of epochs to observe overfitting before early stopping
--classification Classification instead of regression
The network takes the atoms involved in an interaction as point-cloud data. Data conversion can be performed with the extract_pc.py script.
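A sketch of what such a conversion could look like (this is not the actual extract_pc.py): atoms from each chain within a distance cutoff of the other chain form the interface, which is then subsampled or padded to a fixed number of points. The cutoff value, the padding-by-resampling strategy, and the function name are all assumptions:

```python
import numpy as np

def interface_point_cloud(coords_a, coords_b, cutoff=8.5, num_points=1024):
    # Hypothetical sketch: pairwise distances between the two chains'
    # atom coordinates, shape (len(a), len(b)).
    dists = np.linalg.norm(coords_a[:, None, :] - coords_b[None, :, :], axis=-1)
    # Keep atoms that have at least one partner on the other chain
    # within the cutoff (in angstroms).
    iface = np.vstack([coords_a[(dists < cutoff).any(axis=1)],
                       coords_b[(dists < cutoff).any(axis=0)]])
    # Subsample large interfaces; pad small ones by resampling with
    # replacement, so the network always sees `num_points` points.
    replace = len(iface) < num_points
    idx = np.random.choice(len(iface), num_points, replace=replace)
    return iface[idx]
```

In the real pipeline each point would also carry atom features (the HDF5 datasets store more than coordinates), which this sketch omits.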
Data is saved in HDF5 format with three groups: train, test, and holdout. Datasets within these groups contain atom features in float32 precision, with the iRMSD, lRMSD, FNAT, and DockQ scores stored as dataset attributes.
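The layout described above could be written and read back with h5py roughly as follows; the dataset name, feature count, and attribute values are made up for illustration:

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "decoys.hdf5")

# Write one decoy into the "train" group: a float32 feature matrix with
# the four quality scores attached as attributes (layout assumed from
# the description above; real files hold many decoys per group).
with h5py.File(path, "w") as f:
    grp = f.create_group("train")
    ds = grp.create_dataset("decoy_0001",
                            data=np.random.rand(1024, 6).astype(np.float32))
    for name, value in [("irmsd", 2.3), ("lrmsd", 5.1),
                        ("fnat", 0.6), ("dockQ", 0.55)]:
        ds.attrs[name] = value

# Read it back: features as an array, the training target from the attrs.
with h5py.File(path, "r") as f:
    ds = f["train/decoy_0001"]
    features = ds[()]          # (atoms, features) float32 array
    dockq = ds.attrs["dockQ"]  # scalar score used as regression target
```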
- Architecture & training scripts have been fully implemented