How can we get the bond breaking and forming information (numbers at the end of each reaction) from our own SMILES datasets? #4

Frank-LIU-520 · 2019-03-06T05:24:14Z

If we only have the reaction
'CH2:15[Mg+:19].[CH2:20]1[O:21][CH2:22][CH2:23][CH2:24]1.[Cl-:14].[OH:1][c:2]1[n:3][cH:4]c:5[cH:12][cH:13]1>>[OH:1][c:2]1[n:3][cH:4]c:5[cH:12][cH:13]1 ' ,
how can we get the bond-change information (e.g. 6-8-0.0;15-6-1.0;15-19-0.0) that appears at the end of each reaction in the datasets?
That's important for us to train the neural networks.
Could you share your code and post it in this rexgen_direct issue?
Many thanks.

connorcoley · 2019-03-06T15:55:31Z

Hi @mario-liu,

You only need the atom-mapped reaction SMILES as input (note: what you've written isn't valid SMILES due to missing square brackets). The script at rexgen_direct/scripts/prep_data.py takes care of preparing the bond-change information. This is the difference between the example data files that do and do not contain .proc.
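For readers curious what that preparation involves, here is a minimal sketch, not the actual prep_data.py. It compares the bonds of fully atom-mapped reactants and products and writes edits in the `a-b-order` style shown in the question, assuming the third field is the bond order in the product (0.0 for a broken bond). It assumes RDKit is installed; `bond_orders`, `changed_bonds`, `format_edits` and `edits_from_reaction` are hypothetical names. Atom pairs are sorted numerically here, so pair order and edit order may differ from the repo's .proc files.

```python
def bond_orders(mol):
    """Map each bond to its order, keyed by the sorted pair of atom map numbers.

    `mol` is an RDKit Mol; unmapped atoms report map number 0.
    """
    orders = {}
    for bond in mol.GetBonds():
        pair = tuple(sorted((bond.GetBeginAtom().GetAtomMapNum(),
                             bond.GetEndAtom().GetAtomMapNum())))
        orders[pair] = bond.GetBondTypeAsDouble()  # 1.0, 1.5 (aromatic), 2.0, 3.0
    return orders


def changed_bonds(reactant_bonds, product_bonds, product_maps):
    """Return {(map_i, map_j): new_order} for bonds that form, break or change order."""
    changes = {}
    for pair, old in reactant_bonds.items():
        # Ignore bonds that lie entirely inside a leaving group.
        if pair[0] not in product_maps and pair[1] not in product_maps:
            continue
        new = product_bonds.get(pair, 0.0)  # 0.0 means the bond is broken
        if new != old:
            changes[pair] = new
    for pair, new in product_bonds.items():
        if pair not in reactant_bonds:  # newly formed bond
            changes[pair] = new
    return changes


def format_edits(changes):
    """Serialize edits as an 'a-b-order;a-b-order' string."""
    return ";".join(f"{a}-{b}-{order}" for (a, b), order in sorted(changes.items()))


def edits_from_reaction(rxn_smiles):
    """Edit string for one fully atom-mapped 'reactants>agents>products' SMILES."""
    from rdkit import Chem  # imported here so the helpers above work without RDKit

    reactant_smi, _, product_smi = rxn_smiles.split(">")
    reactants = Chem.MolFromSmiles(reactant_smi)
    products = Chem.MolFromSmiles(product_smi)
    if reactants is None or products is None:
        raise ValueError("could not parse reaction SMILES")
    for mol in (reactants, products):
        if any(atom.GetAtomMapNum() == 0 for atom in mol.GetAtoms()):
            raise ValueError("every atom needs a unique atom map number")
    product_maps = {atom.GetAtomMapNum() for atom in products.GetAtoms()}
    return format_edits(changed_bonds(bond_orders(reactants),
                                      bond_orders(products), product_maps))
```

For example, `format_edits({(2, 3): 1.0, (11, 12): 0.0})` gives `"2-3-1.0;11-12-0.0"`.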

Frank-LIU-520 · 2019-03-07T17:08:51Z

Many thanks for your reply. Here is another question:
What if some of the molecules, like C1(C)C=CC=CC=1 and C1(P(=[CH:38][C:39]([O:41][CH2:42][CH3:43])=[O:40])(C2C=CC=CC=2)C2C=CC=CC=2)C=CC=CC=1, do not have mapped atom numbers in your own reaction, like
C1(C)C=CC=CC=1.[CH2:1]([O:8][C:9]1[CH:10]=[C:11]([CH:12]=[CH:38][C:39]([O:41][CH2:42][CH3:43])=[O:40])[CH:14]=[C:15]([O:17][CH3:18])[CH:16]=1)[C:2]1[CH:7]=[CH:6][CH:5]=[CH:4][CH:3]=1>>[CH2:1]([O:8][C:9]1[CH:10]=[C:11]([CH:14]=[C:15]([O:17][CH3:18])[CH:16]=1)[CH:12]=O)[C:2]1[CH:7]=[CH:6][CH:5]=[CH:4][CH:3]=1.C1(P(=[CH:38][C:39]([O:41][CH2:42][CH3:43])=[O:40])(C2C=CC=CC=2)C2C=CC=CC=2)C=CC=CC=1 ?
When I run rexgen_direct/scripts/prep_data.py, I get:

 line 22, in get_changed_bonds
    [bond.GetBeginAtom().GetProp('molAtomMapNumber'),
KeyError: 'molAtomMapNumber'

How can I solve this?

connorcoley · 2019-03-07T18:12:18Z

You'll need to complete the atom mapping on the product side for this code; that is a consequence of using this graph-based representation where we enforce atom conservation and thus need to understand the atom-to-atom correspondence. Any reactant leaving groups can be assigned dummy atom map numbers that do not appear in the products.
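A minimal sketch of that dummy numbering, assuming RDKit and assuming every product atom is already mapped: unmapped reactant atoms get consecutive numbers above the largest map number in use, so they cannot collide with any product atom. `fill_dummy_maps` and `map_leaving_groups` are hypothetical helpers; this does not repair unmapped product atoms, which still need a real atom mapper.

```python
def fill_dummy_maps(map_nums, start):
    """Replace unmapped entries (0) with consecutive numbers starting at `start`."""
    filled, next_num = [], start
    for num in map_nums:
        if num == 0:
            filled.append(next_num)
            next_num += 1
        else:
            filled.append(num)
    return filled


def map_leaving_groups(rxn_smiles):
    """Give every unmapped reactant atom a unique dummy map number."""
    from rdkit import Chem  # imported here so fill_dummy_maps works without RDKit

    reactant_smi, agent_smi, product_smi = rxn_smiles.split(">")
    reactants = Chem.MolFromSmiles(reactant_smi)
    products = Chem.MolFromSmiles(product_smi)
    if reactants is None or products is None:
        raise ValueError("could not parse reaction SMILES")
    atoms = list(reactants.GetAtoms())
    used = [a.GetAtomMapNum() for a in atoms]
    used += [a.GetAtomMapNum() for a in products.GetAtoms()]
    new_maps = fill_dummy_maps([a.GetAtomMapNum() for a in atoms],
                               max(used, default=0) + 1)
    for atom, num in zip(atoms, new_maps):
        atom.SetAtomMapNum(num)
    # Only the reactant side is rewritten (as canonical SMILES).
    return f"{Chem.MolToSmiles(reactants)}>{agent_smi}>{product_smi}"
```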

There are a number of ways to assign atom mapping. The USPTO dataset was prepared (not by me) using Indigo. This is an open-source toolkit and should be straightforward to apply as a pre-processing step.

WYejian · 2020-07-18T15:23:33Z

How can I enforce atom conservation?

WYejian · 2020-07-19T06:10:44Z

> Many thanks for your reply. Here is another question:
> What if some of the molecules do not have mapped atom numbers in your own reaction? When I run rexgen_direct/scripts/prep_data.py, it gives:
>  line 22, in get_changed_bonds
>     [bond.GetBeginAtom().GetProp('molAtomMapNumber'),
> KeyError: 'molAtomMapNumber'
> How can I solve this?

Did you solve this problem?

connorcoley · 2020-07-19T16:35:04Z

Hi @WYejian, completing the atom mapping so that all product atoms are mapped will solve this problem.
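Before running prep_data.py on a whole dataset, it can help to find the reactions that would hit the `KeyError: 'molAtomMapNumber'` discussed above. A rough pure-Python check, hypothetical and not part of rexgen_direct, lists product atoms without a map number. It uses a simplified regex that only knows bracket atoms and the organic subset, so treat it as an approximation rather than a SMILES parser.

```python
import re

# Bracket atoms, or organic-subset symbols (two-letter halogens listed first).
ATOM_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]|\*")
HAS_MAP = re.compile(r":\d+\]$")


def unmapped_atoms(smiles):
    """Return the atom tokens in `smiles` that carry no atom map number."""
    return [tok for tok in ATOM_TOKEN.findall(smiles) if not HAS_MAP.search(tok)]


def product_fully_mapped(rxn_smiles):
    """True if every product atom of a 'reactants>agents>products' SMILES is mapped."""
    product_smi = rxn_smiles.split(">")[-1]
    return not unmapped_atoms(product_smi)
```

Filtering with `[r for r in reactions if not product_fully_mapped(r)]` gives the reactions that still need mapping.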

WYejian · 2020-07-20T01:04:17Z

> Hi @WYejian, completing the atom mapping so that all product atoms are mapped will solve this problem

Thanks for your reply. I want to know how you numbered the unmapped leaving group in the reaction.

connorcoley · 2020-07-20T01:09:39Z

Leaving groups in the reactants can have arbitrary (but unique) atom map numbers that do not appear in the product.
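Those rules (unique numbers on each side, and every product number present in the reactants) can be checked before running prep_data.py. The following is a small hypothetical helper that reads map numbers with a regex rather than a full SMILES parser, so it is a sanity check under that assumption, not a validator.

```python
import re
from collections import Counter

# In SMILES, ":<digits>]" only occurs at the end of a mapped bracket atom.
MAP_NUM = re.compile(r":(\d+)\]")


def map_numbers(smiles):
    """All atom map numbers in a SMILES string, in order of appearance."""
    return [int(n) for n in MAP_NUM.findall(smiles)]


def mapping_problems(rxn_smiles):
    """List problems with the atom map numbers of a 'reactants>agents>products' SMILES."""
    reactant_smi, _, product_smi = rxn_smiles.split(">")
    reactant_maps = map_numbers(reactant_smi)
    product_maps = map_numbers(product_smi)
    problems = []
    for side, maps in (("reactants", reactant_maps), ("products", product_maps)):
        for num, count in Counter(maps).items():
            if count > 1:
                problems.append(f"map number {num} used {count} times in {side}")
    for num in sorted(set(product_maps) - set(reactant_maps)):
        problems.append(f"product map number {num} not found in reactants")
    return problems
```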

WYejian · 2020-07-20T01:18:35Z

> Leaving groups in the reactants can have arbitrary (but unique) atom map numbers that do not appear in the product

Can Indigo be used to number the atoms of the leaving group?

Frank-LIU-520 · 2020-07-20T01:28:41Z

> > Leaving groups in the reactants can have arbitrary (but unique) atom map numbers that do not appear in the product
>
> Can Indigo be used to number the atoms of the leaving group?

Yes, and there are other papers and algorithms that do this. Search Google for them.

WYejian · 2020-07-20T05:52:53Z

Can I add you on WeChat? My WeChat ID is 1527808346.


liuyifei11 · 2020-07-23T02:15:19Z

> Many thanks for your reply. Here is another question:
> What if some of the molecules do not have mapped atom numbers in your own reaction? When I run rexgen_direct/scripts/prep_data.py, it gives:
>  line 22, in get_changed_bonds
>     [bond.GetBeginAtom().GetProp('molAtomMapNumber'),
> KeyError: 'molAtomMapNumber'

Have you solved this problem? I still can't solve it. Can you help me?

liuyifei11 · 2020-07-23T02:18:27Z

> You'll need to complete the atom mapping on the product side for this code; that is a consequence of using this graph-based representation where we enforce atom conservation and thus need to understand the atom-to-atom correspondence. Any reactant leaving groups can be assigned dummy atom map numbers that do not appear in the products.
>
> There are a number of ways to assign atom mapping. The USPTO dataset was prepared (not by me) using Indigo. This is an open source toolkit and should be straightforward to apply as a pre-processing step

I still don't know how to number the unmapped atoms. Do you have a method?

connorcoley · 2020-07-23T13:31:49Z

See "Reaction Atom-to-Atom Mapping" in the Indigo documentation.
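A minimal sketch of running Indigo's automapper as a pre-processing step over a list of reactions, assuming the `epam.indigo` Python package (imported as `indigo`). `loadReaction`, `automap` and `canonicalSmiles` are Indigo calls; `make_indigo_mapper` and `map_dataset` are hypothetical helpers. Reactions Indigo cannot handle are collected instead of stopping the run, and the results should still be checked for unmapped product atoms, since automapping may not map every atom.

```python
def make_indigo_mapper(mode="discard"):
    """Return a function that atom-maps one reaction SMILES with Indigo."""
    from indigo import Indigo  # pip install epam.indigo

    indigo = Indigo()  # create once and reuse for every reaction

    def mapper(rxn_smiles):
        rxn = indigo.loadReaction(rxn_smiles)
        rxn.automap(mode)  # "discard" ignores any existing mapping
        return rxn.canonicalSmiles()

    return mapper


def map_dataset(reactions, mapper):
    """Apply `mapper` to each reaction; return (mapped, failures)."""
    mapped, failures = [], []
    for rxn_smiles in reactions:
        try:
            mapped.append(mapper(rxn_smiles))
        except Exception as exc:  # e.g. IndigoException on unparsable input
            failures.append((rxn_smiles, str(exc)))
    return mapped, failures
```

Typical use would be `mapped, failures = map_dataset(my_reactions, make_indigo_mapper())`, where `my_reactions` is your own list of reaction SMILES.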

Frank-LIU-520 · 2020-07-23T13:39:56Z

> > You'll need to complete the atom mapping on the product side for this code; that is a consequence of using this graph-based representation where we enforce atom conservation and thus need to understand the atom-to-atom correspondence. Any reactant leaving groups can be assigned dummy atom map numbers that do not appear in the products.
> > There are a number of ways to assign atom mapping. The USPTO dataset was prepared (not by me) using Indigo. This is an open source toolkit and should be straightforward to apply as a pre-processing step
>
> I still don't know how to number the unmapped atoms, do you have any method?

There are many other algorithms for atom mapping; Indigo is no longer the best choice.
See "ReactionMap: An Efficient Atom-Mapping Algorithm for Chemical Reactions" in JCIM for a clear comparison.

liuyifei11 · 2020-08-05T05:27:19Z

I would like to ask whether graph convolutional neural networks can be used to predict compounds with chirality?

connorcoley · 2020-08-05T13:50:06Z

Currently, the code is designed to work only with achiral compounds. Some additional comments:

- Chirality can be included as an atom-level feature, although one can debate how meaningful this representation is.
- The function that applies predicted graph edits could be changed to ensure that tetrahedral chirality is preserved upon reacting.
- Handling changes in chirality (introduction or inversion of a tetrahedral center, or introduction of cis/trans isomerism) would require allowing an additional type of "graph edit" to be predicted.
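To illustrate the first point, here is a minimal sketch of adding tetrahedral chirality as an atom-level feature, assuming RDKit: each atom's chiral tag is one-hot encoded, with an extra slot for values outside the expected range. It is hypothetical and not part of rexgen_direct's featurizer, and, as noted in the reply, how meaningful such a feature is remains debatable.

```python
# RDKit ChiralType values: 0 = CHI_UNSPECIFIED, 1 = CHI_TETRAHEDRAL_CW,
# 2 = CHI_TETRAHEDRAL_CCW, 3 = CHI_OTHER.
CHIRAL_TAGS = [0, 1, 2, 3]


def one_hot(value, choices):
    """One-hot encode `value` over `choices`, with a final 'other' slot."""
    vec = [0] * (len(choices) + 1)
    vec[choices.index(value) if value in choices else len(choices)] = 1
    return vec


def chirality_features(smiles):
    """Return one chirality one-hot vector per atom of `smiles`."""
    from rdkit import Chem  # imported here so one_hot works without RDKit

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles}")
    return [one_hot(int(atom.GetChiralTag()), CHIRAL_TAGS) for atom in mol.GetAtoms()]
```

These vectors would be concatenated onto the existing atom features; they do not, by themselves, let the model predict changes in chirality.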

liuyifei11 · 2020-08-05T13:53:34Z

Thank you for your reply.

crazythor123 · 2020-08-25T18:07:28Z

Hi, I have tried Indigo for atom mapping, and I still get this error in the data-preparation step:

File "", line 28, in get_changed_bonds
    [bond.GetBeginAtom().GetProp('molAtomMapNumber'),
KeyError: 'molAtomMapNumber'

Here is my code for preparing the atom mapping in indigo:
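from indigo import Indigo  # pip install epam.indigo
indigo = Indigo()
mol1 = indigo.loadMolecule("C(=O)(N([H])[H])N([H])C(=O)O[H]")  # reactant
mol2 = indigo.loadMolecule("C(=O)=O")  # product
rxn = indigo.createReaction()
rxn.addReactant(mol1)
rxn.addProduct(mol2)
rxn.automap("discard")  # "discard": ignore any existing mapping and map from scratch
result = rxn.smiles()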

Here is my input reaction: 'C(=O)(N([H])[H])N([H])C(=O)O[H]>>C(=O)=O'

And here is the output of Indigo: 'C(=[O:4])(N([H])[H])N([H])C:5O[H]>>C:5=[O:4]'

Can you give me some idea about how to fix this or what alternative software I can use to solve this?
Thanks a lot!
@mario-liu @liuyifei11 @wengong-jin @connorcoley @WYejian

connorcoley · 2020-08-26T14:55:47Z

These tools aren't meant to work with explicit hydrogen atoms in your SMILES strings.

Are you sure that's the direct output of Indigo? That isn't a well-formed SMILES string at all. Ignoring the atom mapping, neither the reactant side nor the product side is valid SMILES.
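One way to follow that advice is to let RDKit turn explicit hydrogens into implicit ones before atom mapping. A minimal sketch, assuming RDKit; `map_sides` and `strip_explicit_h` are hypothetical helpers. Each side of the reaction is processed separately, and the output is canonical SMILES, so it will not match the input character for character.

```python
def map_sides(rxn_smiles, fn):
    """Apply `fn` to each non-empty side of a 'reactants>agents>products' SMILES."""
    return ">".join(fn(side) if side else side for side in rxn_smiles.split(">"))


def _without_explicit_h(smiles):
    from rdkit import Chem  # imported here so map_sides works without RDKit

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles}")
    # RemoveHs folds explicit H atoms into implicit hydrogen counts.
    return Chem.MolToSmiles(Chem.RemoveHs(mol))


def strip_explicit_h(rxn_smiles):
    """Return the reaction SMILES with explicit hydrogens made implicit."""
    return map_sides(rxn_smiles, _without_explicit_h)
```

For the reaction above, `strip_explicit_h("C(=O)(N([H])[H])N([H])C(=O)O[H]>>C(=O)=O")` would be run first, and its result passed to the atom mapper.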

crazythor123 · 2020-08-27T11:45:11Z

Thanks for your reply.
I just rechecked Indigo, and I think I forgot to use canonicalSmiles. After switching to it, I get:
'[O:1]=C:3[OH:7]>>[O:1]=[C:3]=[O:2]'
which I think will be a valid SMILES string for the software input.
Thanks again for the quick response!
