Models requiring more information #30

Open
peastman opened this issue Jun 9, 2022 · 2 comments

Comments

@peastman
Member

peastman commented Jun 9, 2022

createSystem() takes a Topology as its input. That's fine for ANI, but some models will require more information. We'll definitely need formal charges, and we might need hybridization states or bond orders. We need to extend the API in some way so that this information can be supplied. Here are some ideas for approaches we might take. They aren't mutually exclusive; we could do more than one of them.

We could extend Topology to store more information. It already can store bond orders, and it would be easy to add formal charges. On its own, this is only one piece of a solution. It still leaves the problem of where to get the information from. None of the standard file formats OpenMM imports contains it. That's why in practice Topologies never specify bond orders, even though in principle they can.
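As a rough illustration of this idea, here is a minimal sketch of a topology-like container in which formal charges and bond orders are optional fields. The class and method names (`AnnotatedTopology`, `missing_information`, and so on) are hypothetical and are not OpenMM's API; the point is only that a loader can leave fields unset when the file does not provide them, and a caller can then ask what is missing.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Atom:
    element: str
    formal_charge: Optional[int] = None  # None means "not specified by the source"


@dataclass
class Bond:
    atom1: int
    atom2: int
    order: Optional[int] = None  # None means "not specified by the source"


@dataclass
class AnnotatedTopology:
    """Hypothetical container; not the OpenMM Topology class."""
    atoms: List[Atom] = field(default_factory=list)
    bonds: List[Bond] = field(default_factory=list)

    def add_atom(self, element: str, formal_charge: Optional[int] = None) -> int:
        self.atoms.append(Atom(element, formal_charge))
        return len(self.atoms) - 1  # index of the new atom

    def add_bond(self, atom1: int, atom2: int, order: Optional[int] = None) -> None:
        self.bonds.append(Bond(atom1, atom2, order))

    def missing_information(self) -> List[str]:
        """List the kinds of optional information that are not fully specified."""
        missing = []
        if any(a.formal_charge is None for a in self.atoms):
            missing.append("formal_charges")
        if any(b.order is None for b in self.bonds):
            missing.append("bond_orders")
        return missing


# A water molecule as it might come from a file with only elements and bonds.
topology = AnnotatedTopology()
o = topology.add_atom("O")
h1 = topology.add_atom("H")
h2 = topology.add_atom("H")
topology.add_bond(o, h1)
topology.add_bond(o, h2)
print(topology.missing_information())
```

Storing the fields is the easy part; the sketch says nothing about where the values would come from, which is the harder problem described above.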

We could copy the approach used by ForceField. It would be easy to create a pseudo-forcefield that would fill in chemical information for standard residues by matching templates. Nonstandard ones could be specified manually similar to the way SMIRNOFFTemplateGenerator works.
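To make the template idea concrete, here is a deliberately simplified sketch. It looks templates up by residue name and atom name, whereas ForceField matches templates against the bond graph, so treat it as an assumption-laden toy. The template contents, `register_template`, and `assign_formal_charges` are all hypothetical, and the charge values are illustrative (one common protonation state, with the charge placed on one atom of a resonance pair).

```python
from typing import Dict, List, Tuple

# Hypothetical templates: residue name -> {atom name: formal charge}.
# Atoms not listed in a template are taken to have formal charge 0.
TEMPLATES: Dict[str, Dict[str, int]] = {
    "ALA": {},
    "LYS": {"NZ": 1},    # illustrative: protonated side-chain amine
    "ASP": {"OD2": -1},  # illustrative: one resonance form of the carboxylate
}


def register_template(res_name: str, charges: Dict[str, int]) -> None:
    """Manually add a template for a nonstandard residue."""
    TEMPLATES[res_name] = dict(charges)


def assign_formal_charges(residues: List[Tuple[str, List[str]]]) -> List[int]:
    """Return one formal charge per atom, in input order.

    `residues` is a list of (residue name, list of atom names).
    """
    charges: List[int] = []
    for res_name, atom_names in residues:
        if res_name not in TEMPLATES:
            raise ValueError(
                f"No template for residue {res_name}; register one manually"
            )
        template = TEMPLATES[res_name]
        charges.extend(template.get(name, 0) for name in atom_names)
    return charges


# Standard residues are filled in from the built-in templates.
print(assign_formal_charges([("LYS", ["N", "CA", "NZ"]), ("ALA", ["N", "CA"])]))

# A nonstandard residue has to be described by the user first.
register_template("LIG", {"N1": 1})
print(assign_formal_charges([("LIG", ["C1", "N1"])]))
```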

Another possibility is to allow createSystem() to accept an OpenFF Topology in place of the OpenMM Topology. It already provides mechanisms for building descriptions of the relevant information. Models that don't need additional information would work with either type of topology.

We also could try to determine the missing information automatically based on what we do have (elements, bonds, positions). RDKit can do this. In practice I find it isn't very robust, though, so this probably isn't a good idea.

@jchodera
Member

> We'll definitely need formal charges, and we might need hybridization states or bond orders.

There are several important caveats here:

  • Requiring information about bonds would prevent us from treating systems where bonds are dynamic.
  • Requiring information about bond orders further ties us to a specific aromaticity model, of which there are many. Different cheminformatics toolkits will provide different models.
  • Requiring information about formal charges also breaks any chance of ensuring chemically equivalent atoms are treated equivalently, since different resonance structures of the same molecule (e.g. guanidinium) place the formal charges on different atoms.
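The guanidinium case in the last bullet can be worked through in a few lines. This is a sketch under stated assumptions: every heavy atom is assumed to have a complete octet, each nitrogen carries two hydrogens, and the helper names (`formal_charge`, `guanidinium_form`) are made up for illustration. The formal charge is computed as valence electrons minus nonbonding electrons minus the sum of bond orders.

```python
from fractions import Fraction

VALENCE_ELECTRONS = {"C": 4, "N": 5}
NITROGENS = ("N1", "N2", "N3")


def formal_charge(element: str, bond_order_sum: int) -> int:
    """Formal charge of an atom, assuming it has a complete octet."""
    nonbonding = 8 - 2 * bond_order_sum
    return VALENCE_ELECTRONS[element] - nonbonding - bond_order_sum


def guanidinium_form(double_bonded_n: str) -> dict:
    """Formal charges for the resonance form with C=N drawn to the given nitrogen."""
    charges = {}
    carbon_bond_orders = 0
    for n in NITROGENS:
        cn_order = 2 if n == double_bonded_n else 1
        carbon_bond_orders += cn_order
        # Each nitrogen also has two single bonds to hydrogen.
        charges[n] = formal_charge("N", cn_order + 2)
    charges["C"] = formal_charge("C", carbon_bond_orders)
    return charges


forms = [guanidinium_form(n) for n in NITROGENS]
for form in forms:
    print(form)

# Averaging over the three equivalent forms restores the symmetry.
average = {
    atom: Fraction(sum(form[atom] for form in forms), len(forms))
    for atom in forms[0]
}
print(average)
```

Each form puts +1 on a different nitrogen, so a model that consumes the formal charges of any single form sees three chemically equivalent atoms as different. Only the average over forms treats them identically.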

It's worth asking whether we really need to support these things, since doing so (1) carries a significant infrastructure burden and (2) essentially requires that we carry over all of the awful things about MM into the QML world, hindering rapid progress.

> We could extend Topology to store more information. It already can store bond orders, and it would be easy to add formal charges. On its own, this is only one piece of a solution. It still leaves the problem of where to get the information from. None of the standard file formats OpenMM imports contains it. That's why in practice Topologies never specify bond orders, even though in principle they can.

We would still need to choose (and possibly implement) a single aromaticity model even if we did this, which carries huge infrastructure costs.

> We could copy the approach used by ForceField. It would be easy to create a pseudo-forcefield that would fill in chemical information for standard residues by matching templates. Nonstandard ones could be specified manually similar to the way SMIRNOFFTemplateGenerator works.

This would bring over significant limitations from MM into the QML regime.

> Another possibility is to allow createSystem() to accept an OpenFF Topology in place of the OpenMM Topology. It already provides mechanisms for building descriptions of the relevant information. Models that don't need additional information would work with either type of topology.

The OpenFF molecular topology representations have solved some, but not all, of these problems: they adopt a single aromaticity model implemented (almost consistently) in multiple cheminformatics toolkits, and provide access to bond orders and formal charges. The formal charges still depend on which resonance form is chosen, so the chemical equivalence problem remains. And, for biopolymers, they determine bond orders by matching against templates from the PDB Chemical Component Dictionary, which is also a significant burden.

We should ask ourselves: Do we really want this? Or do we want to focus on architectures that do not require this legacy information, freeing ourselves of the problems inherent to MM potentials?

@peastman
Member Author

It's not so much a question of what we want, but rather of what a particular potential function requires. For example, if a particular potential function requires information about bonds, it clearly won't be able to model chemical reactions. But there are lots of applications where that's fine, and there are likely to be potential functions designed for that purpose. If we have no way to specify bond information, it will be impossible for us to support those potentials.

Some potentials will require nothing but elements, and those will be easy to support. But we don't want to be limited to only those.
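One way to picture this is to have each potential declare what it needs, and to check that against what the caller has supplied before building anything. The sketch below is hypothetical throughout: the model names, the requirement labels, and the functions `missing_requirements` and `can_support` are invented for illustration and are not part of any existing API.

```python
from typing import Dict, FrozenSet, Iterable, List

# Hypothetical registry: model name -> kinds of information it requires.
MODEL_REQUIREMENTS: Dict[str, FrozenSet[str]] = {
    "elements_only_model": frozenset({"elements"}),
    "bonded_model": frozenset({"elements", "bonds"}),
    "charged_bonded_model": frozenset({"elements", "bonds", "formal_charges"}),
}


def missing_requirements(model_name: str, available: Iterable[str]) -> List[str]:
    """Return the required kinds of information that were not supplied, sorted."""
    return sorted(MODEL_REQUIREMENTS[model_name] - set(available))


def can_support(model_name: str, available: Iterable[str]) -> bool:
    """True if everything the model requires has been supplied."""
    return not missing_requirements(model_name, available)


# A topology read from a typical file: elements and bonds, nothing else.
available = {"elements", "bonds"}
for name in MODEL_REQUIREMENTS:
    print(name, can_support(name, available), missing_requirements(name, available))
```

With a check like this, potentials that need only elements work with any input, while those needing more fail early with a clear statement of what is missing.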
