Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wrong outcome of RunReaction depending on atom numbering #39

Open
hesther opened this issue Jan 17, 2023 · 5 comments
Open

Wrong outcome of RunReaction depending on atom numbering #39

hesther opened this issue Jan 17, 2023 · 5 comments

Comments

@hesther
Copy link

hesther commented Jan 17, 2023

Hi, I just noticed that the outcome of RunReaction depends on the atom ordering of the input molecule for some edge cases.

For example, applying the template '[C:1]/[C:2](-[C;D1;H3:3])=[C:4]/[CH2;D2;+0:5]-[OH;D1;+0:6]>>[C:1]/[C:2](-[C;D1;H3:3])=[C:4]/[CH;D2;+0:5]=[O;H0;D1;+0:6]' (a simple reduction of an aldehyde to an alcohol next to a double bond, which does not change the double bond or its stereochemistry)
to the molecule '[CH3:1][C:2]1=[C:3](/[CH:4]=[CH:5]/[C:6]([CH3:7])=[CH:8]/[CH:9]=[CH:10]/[C:11]([CH3:12])=[CH:13]/[CH2:14][OH:15])[C:16]([CH3:17])([CH3:18])[CH2:19][CH2:20][CH2:21]1'
correctly produces 'CC1=C(/C=C/C(C)=C/C=C/C(C)=C/C=O)C(C)(C)CCC1'

while applying the same template to the molecule '[CH3:1][C:13](=[CH:28]\\[CH:48]=[CH:8]\\[C:34](=[CH:21]\\[CH2:49][OH:6])[CH3:35])/[CH:37]=[CH:15]/[C:29]1=[C:2]([CH3:44])[CH2:11][CH2:16][CH2:41][C:39]1([CH3:12])[CH3:24]' (exactly the same as above, only different atom numbering)
wrongly produces 'CC1=C(/C=C/C(C)=C/C=C/C(C)=C\\C=O)C(C)(C)CCC1' (i.e. flipped a trans to a cis bond)

Code to reproduce:

import random
from rdkit import Chem
from rdchiral.main import rdchiralRun, rdchiralReaction, rdchiralReactants

smi = '[CH3:1][C:2]1=[C:3](/[CH:4]=[CH:5]/[C:6]([CH3:7])=[CH:8]/[CH:9]=[CH:10]/[C:11]([CH3:12])=[CH:13]/[CH2:14][OH:15])[C:16]([CH3:17])([CH3:18])[CH2:19][CH2:20][CH2:21]1'
template = '[C:1]/[C:2](-[C;D1;H3:3])=[C:4]/[CH2;D2;+0:5]-[OH;D1;+0:6]>>[C:1]/[C:2](-[C;D1;H3:3])=[C:4]/[CH;D2;+0:5]=[O;H0;D1;+0:6]'

outcomes = rdchiralRun(rdchiralReaction(template), rdchiralReactants(smi), combine_enantiomers=False)
print("Case 1, right outcome:",outcomes)

#Change atom map numbers (but not map numbers 0)
mol = Chem.MolFromSmiles(smi)
x=[x for x in range(1,50)]
random.Random(0).shuffle(x)
x=[0]+x
for a in mol.GetAtoms():
    a.SetAtomMapNum(x[a.GetAtomMapNum()])
smi=Chem.MolToSmiles(mol)

outcomes = rdchiralRun(rdchiralReaction(template), rdchiralReactants(smi), combine_enantiomers=False)
print("Case 2, wrong outcome:",outcomes)

Is there any way this could be fixed in the near future? (Am relying on rdchiral for a big project)

@connorcoley
Copy link
Owner

That is a very interesting edge case, and an unfortunate bug. It's not immediately clear to me why this would be the case since the bond direction map is built-up from the atom mapping (here). I don't believe RDKit would be making any assumptions or errors about atom map numbers and the directionality of the bond in terms of ENDDOWNRIGHT and ENDUPRIGHT flags. It seems like something must be wrong in my possible_defs definition that is only triggered when the atom mapping is "unusual".

I'm afraid that we don't have anyone actively working on RDChiral at the moment. I will tag @ljn917 in case this is something he encountered when working on the C++ version. My suggestion would otherwise be to drop in some debugging statements into the function that restores bond stereochemistry in that bonds.py file to start to narrow down where the bug is coming from

@ljn917
Copy link
Contributor

ljn917 commented Jan 18, 2023

I confirm that the C++ version also has this bug. I encountered an rdkit bug where renumbering changes double bond directions, though I don't think they are related.

I tried to narrow it down a bit by simplifying the reactant structure, but find more interesting behaviors. Please see the script below. It was run with the C++ version. In summary, to trigger this bug, a second double bond (between mapno 9 and 10) must exist, and the atom 9 (mapno) must be substituted, no matter the substituents are symmetric or not.

Update: the two double bonds must be next to each other (in conjugation).

#!/usr/bin/env python3

import random

from rdkit import Chem
from rdchiral.main import rdchiralRun, rdchiralReaction, rdchiralReactants

random.seed(0)

def randomize_mapno(s: str):
    mol = Chem.MolFromSmiles(s)
    x=[x for x in range(1,50)]
    random.shuffle(x)
    x=[0]+x
    for a in mol.GetAtoms():
        a.SetAtomMapNum(x[a.GetAtomMapNum()])
    return Chem.MolToSmiles(mol)

# smi1 = r'[CH3:9][CH2:10]/[C:11]([CH3:12])=[CH:13]/[CH2:14][OH:15]' # case 5: correct behavior
# smi1 = r'[CH2:9]=[CH:10]/[C:11]([CH3:12])=[CH:13]/[CH2:14][OH:15]' # case 4: correct behavior
smi1 = r'[CH3:8]/[CH:9]=[CH:10]/[C:11]([CH3:12])=[CH:13]/[CH2:14][OH:15]' # case 3: incorrect behavior
#smi1 = r'[CH3:8][CH:9]=[CH:10]/[C:11]([CH3:12])=[CH:13]/[CH2:14][OH:15]' # case 3-1: incorrect behavior
# smi1 = r'[CH2:4]=[CH:5]/[C:6]([CH3:7])=[CH:8]/[CH:9]=[CH:10]/[C:11]([CH3:12])=[CH:13]/[CH2:14][OH:15]' # case 2: incorrect behavior
# smi1 = '[CH3:1][C:2]1=[C:3](/[CH:4]=[CH:5]/[C:6]([CH3:7])=[CH:8]/[CH:9]=[CH:10]/[C:11]([CH3:12])=[CH:13]/[CH2:14][OH:15])[C:16]([CH3:17])([CH3:18])[CH2:19][CH2:20][CH2:21]1' # case 1: incorrect behavior

template = r'[C:1]/[C:2](-[C;D1;H3:3])=[C:4]/[CH2;D2;+0:5]-[OH;D1;+0:6]>>[C:1]/[C:2](-[C;D1;H3:3])=[C:4]/[CH;D2;+0:5]=[O;H0;D1;+0:6]'

outcomes1 = rdchiralRun(rdchiralReaction(template), rdchiralReactants(smi1), combine_enantiomers=False)
print("right outcome:",outcomes1)

#smi2 = r'[CH3:48][CH2:8]\[C:34](=[CH:21]\[CH2:49][OH:6])[CH3:35]'
for cnt in range(100000):
    smi2 = randomize_mapno(smi1)
    outcomes2 = rdchiralRun(rdchiralReaction(template), rdchiralReactants(smi2), combine_enantiomers=False)
    # print(smi2, outcomes2[0])
    if outcomes1[0] != outcomes2[0]:
        print(f'wrong, smi2={smi2}, outcomes2={outcomes2[0]}')
    else:
        # print('correct')
        pass

@hesther
Copy link
Author

hesther commented Jan 18, 2023

That's interesting ... I will run a couple of tests tonight, let's see whether we can narrow down which part of the code it comes from (shouldn't be too hard with a couple of print statements)

@hesther
Copy link
Author

hesther commented Jan 25, 2023

Update: Identified the issue. If a template specifies one or multiple double bonds that are in a conjugated system and reverses a bond direction (which happens based on the atom order which in turn depends on the map number) we do not reverse all other bond directions in the conjugated system (which would be the correct thing to do). So the copied over bond directions in unchanged parts of the molecule might not match the bond directions set by the template if the system is conjugated. I am now working on a fix (check whether a conjugated system exists and whether changes were made to any bond, if so, correct all bond directions). Will post a PR in the next days

@connorcoley
Copy link
Owner

connorcoley commented Jan 25, 2023

Very interesting & good catch! While the test cases are not that exhaustive, would you also be able to add this example into the list?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants