updated to most recent AZ version
Löffler, Hannes committed Feb 20, 2024
1 parent 5a462e9 commit 5bc8b35
Showing 32 changed files with 1,072 additions and 269 deletions.
72 changes: 72 additions & 0 deletions CHANGELOG.md
@@ -10,6 +10,78 @@ This follows the guideline on [keep a changelog](https://keepachangelog.com/)
- CAZP scoring component


## [4.1.8] 2024-02-20

### Fixed

- Tensorboard histogram bug fixed again


## [4.1.7] 2024-02-20

### Fixed

- TL is now running for the expected `num_epochs`


## [4.1.6] 2024-02-19

### Fixed

- Get model\_type from save\_dict prior model correctly


## [4.1.5] 2024-02-13

### Fixed

- Staged learning does not allocate GPU memory if device is set to CPU


## [4.1.4] 2024-02-06

### Added

- Prior model files have been tagged with metadata
- Model files read in are checked for integrity


## [4.1.3] 2024-02-06

### Fixed

- Tab reader unit tests now use mocks for open
- Write the CSV scoring file correctly when reading from a one-column SMILES file


## [4.1.2] 2024-02-04

### Fixed

- Scoring filter components work as filters again


## [4.1.1] 2024-02-02

### Added

- CSV and SMILES file reader for the scoring run mode; retains all columns from the input and writes them to the output CSV


## [4.1.0] 2024-01-26

### Added

- Tobias Ploetz' (Merck) REINFORCE implementations of the DAP, MAULI and MASCOF RL reward strategies


## [4.0.36] 2024-01-22

### Added

- Check if RDKit descriptor names are valid


## [4.0.35] 2024-01-19

### Fixed
91 changes: 56 additions & 35 deletions configs/toml/SCORING.md
@@ -3,7 +3,7 @@
This is a list of currently supported scoring components together with their
parameters.

* Qed: QED drug-likeness score (RDKit)
Basic molecular physical properties:
* SlogP: Crippen SLogP (RDKit)
* MolecularWeight: molecular weight (RDKit)
* TPSA: topological polar surface area (RDKit)
@@ -21,28 +21,74 @@ parameters.
* NumRings: number of total rings (RDKit)
* NumAromaticRings: number of aromatic rings (RDKit)
* NumAliphaticRings: number of aliphatic rings (RDKit)
* PMI: principal moment of inertia to assess dimensionality (RDKit)
* _property_: "npr1" or "npr2" to choose index
* MolVolume: molecular volume (RDKit)

Similarity and cheminformatics components:
* CustomAlerts: list of undesired SMARTS patterns
* _smarts_: SMARTS patterns
* GroupCount: count how many times the SMARTS pattern is found
* _smarts_: SMARTS pattern
* MatchingSubstructure: penalty applied to final score when SMARTS pattern is found
* _smarts_: list of SMARTS patterns
* _use_chirality_: check for chirality
* PMI: principal moment of inertia to assess dimensionality (RDKit)
* _property_: "npr1" or "npr2" to choose index
* DockStream: generic docking interface for AutoDock Vina, rDock,
OpenEye's Hybrid, Schrodinger's Glide and CCDC's GOLD
* _configuration_path_: path for the Dockstream config json
* _docker_script_path_: location of Dockstream "AZdock/docker.py" file
* _docker_python_path_: python interpreter with Dockstream install, e.g. conda/envs/envname/bin/python
* TanimotoDistance: Tanimoto distance using the Morgan fingerprint (RDKit)
* _smiles_: list of SMILES to match against
* _radius_: Morgan fingerprint radius
* _use_counts_: Morgan fingerprint, whether to use counts
* _use_features_: Morgan fingerprint, whether to use featurs
* _use_features_: Morgan fingerprint, whether to use features
* MMP: Matched Molecular Pair Similarity. Use with _value\_mapping_ score transform, returns "MMP" or "No MMP"
* _reference\_smiles_: list of reference SMILES to be similar to
* _num\_of\_cuts_: number of bonds to cut in fragmentation (default 1)
* _max\_variable\_heavies_: max heavy atom change in MMPs (default 40)
* _max\_variable\_ratio_: max ratio of heavy atoms in MMPs (default 0.33)
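
The TanimotoDistance entry above builds on the Tanimoto coefficient between two fingerprints. The following is a minimal sketch of that coefficient only, assuming fingerprints are represented as plain Python sets of on-bit indices (a stand-in for RDKit Morgan fingerprints). The function names and the choice to take the maximum over the reference list are illustrative assumptions, not the component's actual implementation.

```python
from typing import Iterable, Set


def tanimoto_similarity(bits_a: Set[int], bits_b: Set[int]) -> float:
    """Tanimoto coefficient |A & B| / |A | B| for two sets of on-bit indices."""
    union = len(bits_a | bits_b)
    if union == 0:
        # Convention chosen for this sketch: two empty fingerprints score 0.0.
        return 0.0
    return len(bits_a & bits_b) / union


def best_match(query: Set[int], references: Iterable[Set[int]]) -> float:
    """Highest similarity of the query against a list of references.

    Taking the maximum is an assumption made for illustration only.
    """
    return max(tanimoto_similarity(query, ref) for ref in references)


if __name__ == "__main__":
    query = {1, 2, 3}
    references = [{2, 3, 4}, {7, 8}]
    print(best_match(query, references))
```

A distance is commonly derived as one minus this similarity; whether the component reports the similarity or its complement is not stated in this list.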

Physics/structure/ligand based components:
* ROCSSimilarity: OpenEye ROCS
* _color\_weight_: float between 0-1, default 0.5, weighting between shape and color scores
* _shape\_weight_: float between 0-1, default 0.5, weighting between shape and color scores
* _custom\_cff_: path to custom ROCS force field, optional
* _max\_stereocenters_: max number of stereo centers to enumerate
* _ewindow_: energy window for conformers (kJ/mol)
* _maxconfs_: max number of confs per compound
* _rocs\_input_: input file, sdf or sq
* _similarity\_measure_: how to compare shapes. Must be Tanimoto, RefTversky or FitTversky
* DockStream: generic docking interface for AutoDock Vina, rDock,
OpenEye's Hybrid, Schrodinger's Glide and CCDC's GOLD (https://github.com/MolecularAI/DockStream). Superseded by MAIZE.
* _configuration_path_: path for the Dockstream config json
* _docker_script_path_: location of Dockstream "AZdock/docker.py" file
* _docker_python_path_: python interpreter with Dockstream install, e.g. conda/envs/envname/bin/python
* Icolos: generic interface to Icolos (https://github.com/MolecularAI/Icolos). Superseded by MAIZE.
* _name_: label of the score to extract
* _executable_: Icolos executable
* _config_file_: JSON config file for Icolos
* MAIZE: generic interface to MAIZE (https://github.com/MolecularAI/maize)
* _executable_: MAIZE executable
* _workflow_: workflow file for MAIZE
* _config_: custom MAIZE config file (optional)
* _debug_: bool, execute MAIZE with debug (default false)
* _keep_: bool, retain intermediate MAIZE files (default false)
* _log_: path for MAIZE log file (optional)
* _parameters_: dictionary of workflow parameters to override (optional)

QSAR/QSPR model-related components:
* ChemProp: ChemProp D-MPNN models
* _checkpoint_dir_: checkpoint directory with the models
* _rdkit_2d_normalized_: whether to use RDKit 2D normalization
* CustomAlerts: SMARTS substructure filter applied to the total score
* _smarts_: list of SMARTS
* Qptuna: QSAR models with Qptuna
* _model\_file_: model file name

Scoring components about drug-likeness, synthesizability & reactions:
* Qed: QED drug-likeness score (RDKit)
* SAScore: Ertl's synthesizability score (higher is more difficult). Based on https://doi.org/10.1186/1758-2946-1-8.
* ReactionFilter: reaction filter for Libinvent, applied to total score
* _type_: filter type
* _reaction\_smarts_: RDKit reaction SMARTS

Generic scoring components:
* ExternalProcess: generic component to run an external process for scoring
* _executable_: name of the executable to run
* _args_: command line arguments for the executable
@@ -53,32 +99,8 @@ parameters.
* _predictor_id_: request parameter
* _predictor_version_: request parameter
* _header_: request header
* Icolos: generic interface to Icolos
* _name_: label of the score to extract
* _executable_: Icolos executable
* _config_file_: JSON config file for Icolos
* MMP: matched molecular pairs
* _reference\_smiles_:
* _num\_of\_cuts_:
* _max\_variable\_heavies_:
* _max\_variable\_ratio_:
* Qptuna: QSAR models with Qptuna
* _mode\_file_: model file name
* ReactionFilter: reaction filter for Libinvent, applied to total score
* _tyoe_: filter type
* _reaction\_smarts_: RDKit reaction SMARTS
* ROCSSimilarity: OpenEye ROCS
* _color\_weight_: float between 0-1, default 0.5, weighting between shape and color scores
* _shape\_weight_: float between 0-1, default 0.5, weighting between shape and color scores
* _custom\_cff_: path to custom ROCs forecfield, optional
* _max\_stereocenters_: max number of stereo centers to enumerate
* _ewindow_: energy window for conformers (kJ/mol)
* _maxconfs_: max number of confs per compound
* _rocs\_input_: input file, sdf or sq
* _similarity\_measure_: how to compare shapes. Must be Tanimoto, RefTversky or FitTversky
* SAScore: Ertl's synthesizability score (higher is more difficult). based on https://doi.org/10.1186/1758-2946-1-8.

Linkinvent specific physchem properties:
LinkInvent linker-specific physchem properties:
* FragmentMolecularWeight
* FragmentNumAliphaticRings
* FragmentGraphLength
@@ -100,7 +122,6 @@ Linkinvent specific physchem properties:
* _left\_step_, _right\_step_, _step_: one- and two-sided step functions
* _value\_mapping_: map labels/categories to numbers, numbers must be in the range \[0.0, 1.0\]
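
The _value\_mapping_ transform can be pictured with a short sketch. It assumes a component that returns string labels, such as the "MMP" / "No MMP" labels mentioned for the MMP component. The function name and the concrete numbers in the mapping are hypothetical choices for illustration; only the \[0.0, 1.0\] range requirement comes from the list above.

```python
from typing import Dict, List, Sequence


def value_mapping(labels: Sequence[str], mapping: Dict[str, float]) -> List[float]:
    """Map categorical labels to numbers, rejecting values outside [0.0, 1.0]."""
    for label, value in mapping.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"mapped value for {label!r} must be in [0.0, 1.0], got {value}")
    # A label missing from the mapping raises KeyError in this sketch.
    return [mapping[label] for label in labels]


if __name__ == "__main__":
    # Hypothetical mapping: reward matched pairs, give nothing otherwise.
    mmp_mapping = {"MMP": 1.0, "No MMP": 0.0}
    print(value_mapping(["MMP", "No MMP", "MMP"], mmp_mapping))
```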


## Aggregation functions

* _arithmetic\_mean_: weighted arithmetic mean
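
As a small illustration of a weighted arithmetic mean over component scores, here is a sketch under stated assumptions: each component contributes one transformed score and one non-negative weight, and the result is the weight-normalized sum. The function name and input layout are hypothetical and do not reflect the project's actual aggregation code.

```python
from typing import Sequence


def weighted_arithmetic_mean(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Return sum(w_i * s_i) / sum(w_i) for paired scores and weights."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("the sum of weights must be positive")
    return sum(s * w for s, w in zip(scores, weights)) / total_weight


if __name__ == "__main__":
    # Two components: the second counts three times as much as the first.
    print(weighted_arithmetic_mean([0.2, 0.8], [1.0, 3.0]))
```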