Commit

minor_changes updates
IanAWatson committed Oct 30, 2024
1 parent c7089fd commit f8d0873
Showing 9 changed files with 531 additions and 48 deletions.
3 changes: 2 additions & 1 deletion contrib/bin/minor_changes.sh
@@ -5,5 +5,6 @@ if [[ ! -v LILLYMOL_HOME ]] ; then
 fi

 config=${LILLYMOL_HOME}/data/minor_changes/minor_changes.textproto
+fragments=${LILLYMOL_HOME}/data/minor_changes/fragments.textproto

-$LILLYMOL_HOME/bin/$(uname)/minor_changes -C ${config} "$@"
+$LILLYMOL_HOME/bin/$(uname)/minor_changes -P UST:AY -F ${fragments} -C ${config} "$@"
5 changes: 4 additions & 1 deletion data/MolecularVariants/triphenyl_to_phenyl.rxn
@@ -1,7 +1,7 @@
 name: "triphenyl_to_phenyl"
 scaffold {
 id: 0
-smarts: "[CD4](c1ccccc1)(c1ccccc1)c1ccccc1"
+smarts: "[CD4R0](c1ccccc1)(c1ccccc1)c1ccccc1"
 break_bond {
 a1: 0
 a2: 1
@@ -12,4 +12,7 @@ scaffold {
 }
 remove_fragment: 1
 remove_fragment: 7
+match_conditions {
+one_embedding_per_start_atom: true
+}
 }
60 changes: 60 additions & 0 deletions data/minor_changes/fragments.textproto
@@ -0,0 +1,60 @@
O[3038CH3] iso: ATYPE smi: "O[3038CH3]" par: "CHEMBL6974" nat: 2 n: 1680
N[6007CH]=N iso: ATYPE smi: "N[6007CH]=N" par: "CHEMBL509348" nat: 3 n: 1749
C[9001CH2]C iso: ATYPE smi: "C[9001CH2]C" par: "CHEMBL1201933" nat: 3 n: 1792
[3001NH2]CC iso: ATYPE smi: "[3001NH2]CC" par: "CHEMBL547407" nat: 3 n: 1835
[3038NH2]C iso: ATYPE smi: "[3038NH2]C" par: "CHEMBL454492" nat: 2 n: 1907
[3001SH]C iso: ATYPE smi: "[3001SH]C" par: "CHEMBL508166" nat: 2 n: 1944
C[21001NH]C iso: ATYPE smi: "C[21001NH]C" par: "CHEMBL12873" nat: 3 n: 2033
[6007CH3]CC iso: ATYPE smi: "[6007CH3]CC" par: "CHEMBL532645" nat: 3 n: 2255
O=[3038CH]C iso: ATYPE smi: "O=[3038CH]C" par: "CHEMBL548334" nat: 3 n: 2585
[6044CH3]C iso: ATYPE smi: "[6044CH3]C" par: "CHEMBL525153" nat: 2 n: 2624
N[3038CH]=N iso: ATYPE smi: "N[3038CH]=N" par: "CHEMBL1202102" nat: 3 n: 2652
N[3001CH3] iso: ATYPE smi: "N[3001CH3]" par: "CHEMBL1204421" nat: 2 n: 2662
[3001BrH] iso: ATYPE smi: "[3001BrH]" par: "CHEMBL502354" nat: 1 n: 2822
[6007OH]C iso: ATYPE smi: "[6007OH]C" par: "CHEMBL160210" nat: 2 n: 2871
[21001OH2] iso: ATYPE smi: "[21001OH2]" par: "CHEMBL1205831" nat: 1 n: 3041
O=[3001CH]N iso: ATYPE smi: "O=[3001CH]N" par: "CHEMBL503272" nat: 3 n: 3783
C[6007CH2]C iso: ATYPE smi: "C[6007CH2]C" par: "CHEMBL1203252" nat: 3 n: 3827
[3038SH]C iso: ATYPE smi: "[3038SH]C" par: "CHEMBL8394" nat: 2 n: 4939
N#[3001CH] iso: ATYPE smi: "N#[3001CH]" par: "CHEMBL6461" nat: 2 n: 5379
[3038IH] iso: ATYPE smi: "[3038IH]" par: "CHEMBL263058" nat: 1 n: 5744
[3001NH2]C iso: ATYPE smi: "[3001NH2]C" par: "CHEMBL1203109" nat: 2 n: 5948
C[3038CH2]C iso: ATYPE smi: "C[3038CH2]C" par: "CHEMBL540951" nat: 3 n: 6111
O=[6007CH]C iso: ATYPE smi: "O=[6007CH]C" par: "CHEMBL7892" nat: 3 n: 6240
O=[3038CH]N iso: ATYPE smi: "O=[3038CH]N" par: "CHEMBL1208835" nat: 3 n: 6397
C[3038NH]C iso: ATYPE smi: "C[3038NH]C" par: "CHEMBL593669" nat: 3 n: 6702
C[3001CH2]C iso: ATYPE smi: "C[3001CH2]C" par: "CHEMBL581501" nat: 3 n: 7459
[9001CH3]C iso: ATYPE smi: "[9001CH3]C" par: "CHEMBL1202061" nat: 2 n: 7513
[3038CH3]C iso: ATYPE smi: "[3038CH3]C" par: "CHEMBL154228" nat: 2 n: 7789
C[3001NH]C iso: ATYPE smi: "C[3001NH]C" par: "CHEMBL549128" nat: 3 n: 8249
[3001ClH] iso: ATYPE smi: "[3001ClH]" par: "CHEMBL8137" nat: 1 n: 8342
[3001CH3]OC iso: ATYPE smi: "[3001CH3]OC" par: "CHEMBL582360" nat: 3 n: 8497
[3001CH3]CC iso: ATYPE smi: "[3001CH3]CC" par: "CHEMBL1203224" nat: 3 n: 8908
N#[3038CH] iso: ATYPE smi: "N#[3038CH]" par: "CHEMBL500021" nat: 2 n: 9326
[3001OH]CC iso: ATYPE smi: "[3001OH]CC" par: "CHEMBL1202061" nat: 3 n: 9329
[18013OH2] iso: ATYPE smi: "[18013OH2]" par: "CHEMBL1161220" nat: 1 n: 9439
[6007CH3]C iso: ATYPE smi: "[6007CH3]C" par: "CHEMBL547643" nat: 2 n: 10078
O[3001CH3] iso: ATYPE smi: "O[3001CH3]" par: "CHEMBL507303" nat: 2 n: 10372
O[3038CH]=O iso: ATYPE smi: "O[3038CH]=O" par: "CHEMBL263193" nat: 3 n: 10456
[21001NH3] iso: ATYPE smi: "[21001NH3]" par: "CHEMBL580443" nat: 1 n: 10990
[6007OH2] iso: ATYPE smi: "[6007OH2]" par: "CHEMBL501969" nat: 1 n: 11541
[6044CH4] iso: ATYPE smi: "[6044CH4]" par: "CHEMBL385384" nat: 1 n: 15874
[3038OH]CC iso: ATYPE smi: "[3038OH]CC" par: "CHEMBL156037" nat: 3 n: 16134
O[3001CH]=O iso: ATYPE smi: "O[3001CH]=O" par: "CHEMBL503802" nat: 3 n: 17578
O=[3038NH]=O iso: ATYPE smi: "O=[3038NH]=O" par: "CHEMBL263879" nat: 3 n: 18032
[3001OH]C iso: ATYPE smi: "[3001OH]C" par: "CHEMBL503258" nat: 2 n: 21243
[21001CH4] iso: ATYPE smi: "[21001CH4]" par: "CHEMBL155263" nat: 1 n: 23642
[3001NH3] iso: ATYPE smi: "[3001NH3]" par: "CHEMBL4116111" nat: 1 n: 25451
[3001CH3]C iso: ATYPE smi: "[3001CH3]C" par: "CHEMBL1203224" nat: 2 n: 26652
[6007CH4] iso: ATYPE smi: "[6007CH4]" par: "CHEMBL1203109" nat: 1 n: 49040
[3038NH3] iso: ATYPE smi: "[3038NH3]" par: "CHEMBL501701" nat: 1 n: 52698
[3001OH2] iso: ATYPE smi: "[3001OH2]" par: "CHEMBL216546" nat: 1 n: 55955
[9001CH4] iso: ATYPE smi: "[9001CH4]" par: "CHEMBL1203132" nat: 1 n: 61112
[3038BrH] iso: ATYPE smi: "[3038BrH]" par: "CHEMBL268339" nat: 1 n: 61298
[3001FH] iso: ATYPE smi: "[3001FH]" par: "CHEMBL547542" nat: 1 n: 87151
[3038OH2] iso: ATYPE smi: "[3038OH2]" par: "CHEMBL503634" nat: 1 n: 118294
[3038OH]C iso: ATYPE smi: "[3038OH]C" par: "CHEMBL504077" nat: 2 n: 138279
[3038FH] iso: ATYPE smi: "[3038FH]" par: "CHEMBL405225" nat: 1 n: 205627
[3001CH4] iso: ATYPE smi: "[3001CH4]" par: "CHEMBL503865" nat: 1 n: 248905
[3038ClH] iso: ATYPE smi: "[3038ClH]" par: "CHEMBL441131" nat: 1 n: 272736
[3038CH4] iso: ATYPE smi: "[3038CH4]" par: "CHEMBL263810" nat: 1 n: 296770
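
Each line of the fragment library above has the same shape: a leading SMILES token, then `iso:`, `smi:`, `par:`, `nat:` and `n:` fields. The sketch below shows one way such lines could be read in Python for inspection. It is an illustration only, not the tool's own reader: the regular expression, the `Fragment` type and both function names are assumptions, as is the reading of `nat` as the fragment's atom count, `n` as an occurrence count and `par` as an example parent molecule.

```python
import re
from typing import List, NamedTuple


class Fragment(NamedTuple):
    """One library entry (hypothetical container, not part of LillyMol)."""
    smiles: str   # isotopically labelled SMILES from the smi: field
    parent: str   # identifier from the par: field
    natoms: int   # value of nat:, assumed to be the atom count
    count: int    # value of n:, assumed to be an occurrence count


# Matches lines of the form seen in fragments.textproto.
_LINE = re.compile(
    r'^(?P<token>\S+)\s+iso:\s*ATYPE\s+smi:\s*"(?P<smi>[^"]+)"\s+'
    r'par:\s*"(?P<par>[^"]+)"\s+nat:\s*(?P<nat>\d+)\s+n:\s*(?P<n>\d+)\s*$'
)


def parse_fragment_line(line: str) -> Fragment:
    """Parse one library line, raising ValueError if it does not match."""
    m = _LINE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognised fragment line: {line!r}")
    return Fragment(m["smi"], m["par"], int(m["nat"]), int(m["n"]))


def attachment_labels(smiles: str) -> List[int]:
    """Return the isotope labels in a SMILES, e.g. [3038CH3] gives 3038."""
    return [int(label) for label in re.findall(r"\[(\d+)[A-Za-z]", smiles)]


if __name__ == "__main__":
    frag = parse_fragment_line(
        'O[3038CH3] iso: ATYPE smi: "O[3038CH3]" par: "CHEMBL6974" nat: 2 n: 1680'
    )
    print(frag, attachment_labels(frag.smiles))
```

A regex is used here only because the lines are regular; a reader built on the actual proto definition would be the more robust choice.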
2 changes: 2 additions & 0 deletions data/minor_changes/minor_changes.textproto
@@ -35,6 +35,8 @@ replace_inner_fragments: true
 max_fragment_lib_size: 100
 max_bivalent_fragment_lib_size: 200

+remove_fused_aromatics: true
+
 reaction: "make_ring_amide_1.rxn"
 reaction: "make_ring_amide_2.rxn"
 reaction: "fuse_biphenyl.rxn"
84 changes: 46 additions & 38 deletions docs/Molecule_Tools/minor_changes.md
@@ -1,46 +1,54 @@
# minor_changes

## TLDR
## Objective

This tool uses rule based transformations to make small changes to starting
molecules. The most common usage is to help fill out an SAR around
a set of active molecules. The transformations applied might
include adding small fragments, removing CH2 groups, changing
a substitution pattern, etc... Since it applies all available
transformations exhaustively, potentially large numbers of molecules
can be formed.

Want a potentially large number of new molecules that are derived
from a set of starting molecules:
Taking the default, which turns on all known transformations that
do not depend on external fragments.
```
minor_changes.sh -c file.smi > file.variants.smi
minor_changes file.smi > file.variants.smi
```
generates 31k molecules from a starting set of 1000 in 1.2 seconds.

The rules applied can be controlled via a configuration file, which will
also enable specification of fragments to be added, and reactions to be
performed. A shell script (minor_changes.sh)[../../contrib/bin/minor_changes.sh]
invokes the executable with all options turned on, as well as making
available some reaction based transformations, as well as a small
fragment library (60 functional groups, max 3 atoms). Given
1000 starting molecules that generates 337k new molecules in about
15 seconds.

Probably generates more than you need. 500 random molecules from
ChEMBL took 30 minutes to generate 10.5M unique variants. There are
many ways in which these numbers can be reduced.
Obviously if a larger fragment library is used, those number can increase
dramatically - see below for how to generate fragment libraries.

## Introduction
The reactions specified with the script include a number of common
isostere type transformations. More could be added - definitely open
to ideas...

This tool is designed for making minor modifications to an existing molecule,
using simple transformations, as well as externally derived fragments. Many
attempts are made to have it generate plausible molecules, so the results
should mostly be reasonable, but this is definitely not guaranteed. Passing the
results through a synthetic feasibility assessment would generally be
desirable. Experience says that this may eliminate 80% of what is generated,
but this is obviously very dependent on the parameters selected.
## Details

The tool is designed for making minor modifications to an existing molecule,
using simple transformations, as well as externally derived fragments. Many
attempts are made to have it generate plausible molecules, so the results should
mostly be reasonable, but this is definitely not guaranteed. Passing the
results through a synthetic feasibility assessment would generally be desirable.
Experience says that this may eliminate a significant fraction of what is
generated, but this is obviously very dependent on the parameters selected.

## Complementary Tools
`ring_replacement` can be used for exploring different substitution patterns
in rings. By default, `minor_changes` does not swap atoms in an aromatic ring. Pipe
the output from `ring_replacement` to this tool in order to explore
further variants involving different ring atom arrangements.

Isostere replacement can be done with the reactions associated with the
`molecular_variants` tool - although some of those need to have the
reverse transformation also implemented. Again, a pipeline of generators
can be formed.

## Numbers
This tool can generate large numbers of molecules - largely driven
by the libraries of fragments specified, as noted previously. This
is despite the fact that the fragment libraries contained no more
than 4 atoms per molecule.

Again, it is recommended that the output be passed to a synthetic precedent
tool to eliminate unlikely atomic arrangements.

## Specifics.
The tool is complex, with a lot of choices made about how it behaves. For
@@ -80,21 +88,19 @@ swap_adjacent_atoms: true
swap_adjacent_aromatic_atoms: true
insert_fragments: true
replace_inner_fragments: true
remove_fused_aromatics: true
# These substantially cut the run time.
max_fragment_lib_size: 100
max_bivalent_fragment_lib_size: 200
```
The proto definition is [GitHub](https://github.com/EliLillyCo/LillyMolPrivate/blob/a6b84fa94d451438a6a16166c9eaf4b5f4c76d53/src/Molecule_Tools/minor_changes.proto#L1)

The instance above takes 52 seconds to generated 680k variants from 500 input
molecules.
The proto definition is [GitHub](../../src/Molecule_Tools/minor_changes.proto)

## Atom Typing.
It is recommended that the tool be used with atom typing enabled. As
seen in the proto above, I have been using `UST:AY` which classifies
atoms by their atomic number and aromaticity only. Other atom types
would clearly be possible.
If library fragments are being added, it is recommended that the tool be used
with atom typing enabled. As seen in the proto above, I have been using
`UST:AY` which classifies atoms by their atomic number and aromaticity only.
Other atom types would clearly be possible.

The reason to use atom types in the library is that ensures that the
fragment is joined to an atom similar to the atomic context from
@@ -123,7 +129,7 @@ atom types match, to the join point. `replace_terminal_fragments` directive.
Bivalent fragments can have either 1 or 2 attachment points. If there is
a single attachment point, it is inserted between two atoms, and each of
those two atoms bond to the same atom in the fragment. If there are two
attachment points, a bond in the parent molecle is selected and removed.
attachment points, a bond in the parent molecule is selected and removed.
The bivalent fragment is inserted by bonding (in both directions) to the
remaining atoms in the parent.
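
The two bivalent cases described above can be sketched on a toy graph, with a molecule held as a dictionary mapping atom index to its set of neighbours. This is only an illustration of the bonding pattern under stated assumptions: `insert_bivalent` and its helpers are hypothetical names, bond orders and atom types are ignored, and the parent bond is removed in both cases, which is one reading of "inserted between two atoms".

```python
from typing import Dict, List, Sequence, Set

Graph = Dict[int, Set[int]]  # atom index -> neighbouring atom indices


def add_bond(g: Graph, a: int, b: int) -> None:
    g.setdefault(a, set()).add(b)
    g.setdefault(b, set()).add(a)


def remove_bond(g: Graph, a: int, b: int) -> None:
    g[a].discard(b)
    g[b].discard(a)


def insert_bivalent(parent: Graph, a1: int, a2: int,
                    fragment: Graph, attach: Sequence[int]) -> List[Graph]:
    """Splice `fragment` into the parent bond a1-a2 (hypothetical helper).

    One attachment atom: both parent atoms bond to that same fragment atom.
    Two attachment atoms: both orientations are generated.
    """
    if a2 not in parent.get(a1, set()):
        raise ValueError("a1 and a2 are not bonded in the parent")
    if len(attach) == 1:
        orientations = [(attach[0], attach[0])]
    elif len(attach) == 2:
        orientations = [(attach[0], attach[1]), (attach[1], attach[0])]
    else:
        raise ValueError("a bivalent fragment has 1 or 2 attachment points")

    offset = max(parent) + 1  # renumber fragment atoms after the parent atoms
    variants = []
    for f1, f2 in orientations:
        g = {atom: set(nbrs) for atom, nbrs in parent.items()}
        remove_bond(g, a1, a2)
        for atom, nbrs in fragment.items():
            g[atom + offset] = {n + offset for n in nbrs}
        add_bond(g, a1, f1 + offset)
        add_bond(g, a2, f2 + offset)
        variants.append(g)
    return variants


if __name__ == "__main__":
    ethane = {0: {1}, 1: {0}}
    two_atom_fragment = {0: {1}, 1: {0}}
    for variant in insert_bivalent(ethane, 0, 1, two_atom_fragment, [0, 1]):
        print(variant)
```

For a symmetric fragment the two orientations give the same molecule, so a real pipeline would need a uniqueness filter after this step.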

@@ -165,7 +171,7 @@ F[9001CH2]N iso: ATYPE smi: "F[9001CH2]N" par: "CHEMBL3897128" nat: 3 n: 1
```
Again, note the extra token in column 1. The isotopes are the atom types
of the atoms to which these fragments used to be attached. These numbers of
course make no sense on their own. If curios, use `fileconv` to get molecules
course make no sense on their own. If curious, use `fileconv` to get molecules
labelled by atom type
```
fileconv.sh -I atype -Y atype=UST:AY -S - file.smi
@@ -236,3 +242,5 @@ prolific variants are done last.
Note that even if a max number of variants is specified, you may get more
than that number produced, because the tool only checks periodically on
the number if items generated.

More information about the transformations are in the [proto](../../src/Molecule_Tools/minor_changes.proto)
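
The note above about exceeding the requested maximum follows from checking the count only periodically. The sketch below illustrates that effect in general terms; it is not taken from the tool's source, and the function name, the batch structure and the `check_every` parameter are all hypothetical.

```python
from typing import Iterable, List


def generate_with_periodic_check(batches: Iterable[List[str]],
                                 max_variants: int,
                                 check_every: int = 3) -> List[str]:
    """Collect variants, testing the limit only every `check_every` batches.

    Because the limit is not tested after every item, the result can hold
    more than `max_variants` entries.
    """
    produced: List[str] = []
    for i, batch in enumerate(batches, start=1):
        produced.extend(batch)
        if i % check_every == 0 and len(produced) >= max_variants:
            break
    return produced


if __name__ == "__main__":
    # Ten batches of four placeholder variants each.
    batches = [[f"variant_{b}_{k}" for k in range(4)] for b in range(10)]
    result = generate_with_periodic_check(batches, max_variants=5)
    print(len(result))
```

If a hard cap matters downstream, truncating the output after generation is the simple workaround.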