How to decide on values of NK and NK0? #24

Open
gnsrivastava opened this issue Feb 17, 2021 · 11 comments

Comments

@gnsrivastava

Hello Dr Coley,

In your scripts, you set NK and NK0 to 20 and 10, respectively. NK and NK0 are used for reporting accuracies during training, and NK also sets the number of edits included in the output file during inference.
I was wondering whether I should keep these NK and NK0 values unchanged.
Could you elaborate on how you chose these values?

Gopal

PS: Some of my biological reactions have a total of roughly 50 or more bond changes.
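A top-K accuracy for a bond-change model is commonly defined as the fraction of reactions for which every ground-truth bond change appears among the K highest-scoring predictions, so reporting it at NK0 and NK presumably gives two points on that curve. The sketch below assumes that definition and a hypothetical (atom_map_i, atom_map_j, new_bond_order) tuple format; it is an illustration, not the repository's training code.

```python
from typing import List, Sequence, Set, Tuple

# Hypothetical representation of one bond change: (atom_map_i, atom_map_j, new_bond_order).
Edit = Tuple[int, int, float]


def top_k_coverage(ranked_preds: Sequence[List[Edit]],
                   true_edits: Sequence[Set[Edit]],
                   k: int) -> float:
    """Fraction of reactions whose true bond changes all appear in the top-k predictions."""
    if not true_edits:
        return 0.0
    hits = sum(1 for preds, truth in zip(ranked_preds, true_edits)
               if truth <= set(preds[:k]))  # subset test: every true edit is in the top k
    return hits / len(true_edits)


if __name__ == "__main__":
    ranked = [[(1, 2, 0.0), (2, 3, 1.0), (4, 5, 2.0)],
              [(1, 3, 1.0), (2, 5, 0.0), (3, 4, 1.0)]]
    truth = [{(1, 2, 0.0), (2, 3, 1.0)},
             {(1, 3, 1.0), (3, 4, 1.0)}]
    print(top_k_coverage(ranked, truth, 2))  # 0.5
    print(top_k_coverage(ranked, truth, 3))  # 1.0
```

The second reaction only counts once K reaches 3, because one of its true edits is ranked third.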

@connorcoley
Owner

These values are somewhat arbitrary. Increasing NK (the number of different bond changes considered) will improve the coverage of the first step, but it will make the number of candidates after enumeration much larger.
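To see why the candidate count grows so quickly, one can count the combinations of the top NK bond changes when each candidate applies between 1 and some maximum number of them. The sketch below gives an upper bound only: `max_edits` is a hypothetical cap on bond changes per candidate (5 is an illustrative choice), and a real enumeration would also discard chemically invalid combinations, so actual counts will be smaller.

```python
from math import comb


def max_candidates(nk: int, max_edits: int) -> int:
    """Upper bound on candidates built from 1..max_edits of the top-nk bond changes."""
    return sum(comb(nk, k) for k in range(1, max_edits + 1))


if __name__ == "__main__":
    for nk in (10, 20, 35, 100):
        print(nk, max_candidates(nk, 5))
    # 10 637
    # 20 21699
    # 35 384167
    # 100 79375495
```

With `max_edits=5`, raising NK from 20 to 100 lifts the bound from 21,699 to 79,375,495 candidates per reaction.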

If some of the reactions you're interested in have 50 or more bond changes, I'd suggest that this probably isn't the right tool. I'm not sure what reactions you're working with, but it's unlikely they are single-step reactions with contiguous reaction centers.
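Before applying the tool to a new dataset, it can help to count the bond changes in each atom-mapped reaction and check whether the changed atoms form one connected reaction center. The sketch below assumes reactant and product bonds are already available as dictionaries keyed by sorted atom-map-number pairs (for example, extracted with a cheminformatics toolkit). That representation, the function names, and the strict connectivity criterion (edited bonds linked through shared atoms) are illustrative assumptions.

```python
from typing import Dict, List, Set, Tuple

Bonds = Dict[Tuple[int, int], float]  # (map_i, map_j) with map_i < map_j -> bond order


def bond_changes(reactant: Bonds, product: Bonds) -> List[Tuple[int, int, float]]:
    """Return (i, j, new_order) for every pair whose bond order differs; 0.0 means broken."""
    changes = []
    for pair in sorted(set(reactant) | set(product)):
        old, new = reactant.get(pair, 0.0), product.get(pair, 0.0)
        if old != new:
            changes.append((pair[0], pair[1], new))
    return changes


def is_contiguous(changes: List[Tuple[int, int, float]]) -> bool:
    """True if the changed bonds, linked through shared atoms, form a single connected set."""
    if not changes:
        return True
    adjacency: Dict[int, Set[int]] = {}
    for i, j, _ in changes:
        adjacency.setdefault(i, set()).add(j)
        adjacency.setdefault(j, set()).add(i)
    start = next(iter(adjacency))
    seen, stack = {start}, [start]
    while stack:  # depth-first traversal over atoms touched by edits
        for nbr in adjacency[stack.pop()]:
            if nbr not in seen:
                seen.add(nbr)
                stack.append(nbr)
    return len(seen) == len(adjacency)


if __name__ == "__main__":
    reactant = {(1, 2): 1.0, (3, 4): 1.0}
    product = {(1, 3): 1.0, (2, 4): 1.0}
    edits = bond_changes(reactant, product)
    print(len(edits), is_contiguous(edits))  # 4 True
```

A reaction whose edit count is far above the NK being used, or whose edits split into several disconnected groups, is a sign that the single-step assumption may not hold.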

@YH-88

YH-88 commented Feb 19, 2021

Hi. I have some chemical reactions with at most 20 bond changes in total. Should kmax be set to 20? In addition, I plan to set NK0=25 and NK=35 when training the WLN model, and to vary NK from 40 to 100 when testing it. Is this the right way to choose the values of NK and NK0?

@connorcoley
Copy link
Owner

Those changes would work in theory, but I'm afraid the number of candidates generated after the first step will be impractically large, because the combinatorial enumeration produces a huge number of candidates. I would suggest testing this with a very small batch size before committing to this approach.
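One way to act on this advice is to run the first-step model on a small held-out sample, record for each reaction the rank of its lowest-ranked ground-truth bond change, and pick the smallest NK that reaches a target coverage. The sketch below assumes those ranks have already been computed (1-based, with `None` when a true change never appears in the ranked list); the helper name and the target values are illustrative.

```python
from typing import List, Optional


def smallest_nk(worst_ranks: List[Optional[int]], target: float) -> Optional[int]:
    """Return the smallest K whose top-K coverage reaches `target`, or None.

    worst_ranks[n] is the 1-based rank of the lowest-ranked true bond change for
    reaction n, or None if some true change never appears in the ranked list.
    """
    total = len(worst_ranks)
    found = sorted(r for r in worst_ranks if r is not None)
    for covered, rank in enumerate(found, start=1):
        # At K = rank, at least `covered` reactions have all true changes in the top K.
        if covered / total >= target:
            return rank
    return None  # target not reachable at any K


if __name__ == "__main__":
    ranks = [3, 7, 12, None, 5]
    print(smallest_nk(ranks, 0.6))  # 7
    print(smallest_nk(ranks, 0.8))  # 12
    print(smallest_nk(ranks, 1.0))  # None
```

Pairing the chosen K with a candidate-count estimate shows whether the coverage gain is worth the extra enumeration cost.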

@YH-88

YH-88 commented Feb 20, 2021

I got it. Thanks a lot.

@gnsrivastava
Author

Hello Dr Coley,

When I train rank_diff_wln on my data, I get the following warning:
warning! could not recover true smiles from gbonds:
Could you tell me what "true smiles" means?

I apologize for the trivial question.

Gopal

@connorcoley
Owner

The true SMILES is whatever the dataset provides as the ground-truth answer, for example, in data/test.txt.proc.
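Read this way, the warning presumably means that rebuilding the product from the ground-truth bond edits ("gbonds") did not yield the SMILES stored as the answer. When debugging such cases, an order-insensitive comparison of the dot-separated molecules is a useful first check. The sketch below assumes both strings were canonicalized by the same toolkit beforehand; it performs no chemistry-aware canonicalization itself, and the helper name is hypothetical.

```python
from collections import Counter


def same_molecule_set(smiles_a: str, smiles_b: str) -> bool:
    """Compare two dot-separated SMILES strings, ignoring the order of fragments.

    Assumes each fragment is already canonical; "CCO" and "OCC" will NOT match.
    """
    return Counter(smiles_a.split(".")) == Counter(smiles_b.split("."))


if __name__ == "__main__":
    print(same_molecule_set("CCO.O", "O.CCO"))  # True
    print(same_molecule_set("CCO", "OCC"))      # False: not canonicalized
```

If this check fails only because of fragment order or non-canonical strings, the mismatch is cosmetic; otherwise the recorded edits and the stored product genuinely disagree.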

@YH-88

YH-88 commented Apr 21, 2021

Hello Dr Coley,

I only have the SMILES of the reactants. Can I use your trained model directly, giving it only the reactant SMILES, to predict which products will form?
Thanks a lot.

@connorcoley
Owner

Yes, that's the intended use case. All you need to use the trained model is the reactant SMILES!

@YH-88

YH-88 commented Apr 22, 2021

Thank you. But the initial model takes an atom-mapped reaction as input. How can we do atom mapping without the products, and how do we get the reaction center?

@connorcoley
Owner

You can use the fully trained model to predict outcomes by following the example at the end of rexgen_direct/rank_diff_wln/directcandranker.py.

@YH-88

YH-88 commented Apr 23, 2021

Thank you. I got it.
