Models requiring more information #30
Comments
There are several important caveats here:
It's worth asking whether we really need to support these things, since doing so (1) carries a significant infrastructure burden and (2) essentially requires that we carry over all of the awful things about MM into the QML world, hindering rapid progress.
We would still need to choose (and possibly implement) a single aromaticity model even if we did this, which carries huge infrastructure costs.
This would bring over significant limitations from MM into the QML regime.
The OpenFF molecular topology representations have solved some, but not all, of these problems: they adopt a single aromaticity model implemented (almost consistently) in multiple cheminformatics toolkits, and provide access to bond orders and formal charges. However, formal charges are not unique across resonance forms, which creates the chemical equivalence problem. And, for biopolymers, they determine bond orders by matching templates from the PDB Chemical Component Dictionary, which is also a significant burden.
We should ask ourselves: Do we really want this? Or do we want to focus on architectures that do not require this legacy information, freeing ourselves of the problems inherent to MM potentials?
It's not so much a question of what we want, but rather of what a particular potential function requires. For example, if a particular potential function requires information about bonds, it clearly won't be able to model chemical reactions. But there are lots of applications where that's fine, and there are likely to be potential functions designed for that purpose. If we have no way to specify bond information, it will be impossible for us to support those potentials. Some potentials will require nothing but elements, and those will be easy to support. But we don't want to be limited to only those.
createSystem() takes a Topology as its input. That's fine for ANI, but some models will require more information. We'll definitely need formal charges, and we might need hybridization states or bond orders. We need to extend the API in some way to allow this information to be determined. Here are some ideas for approaches we might take. These aren't exclusive. We could do more than one of them.
We could extend Topology to store more information. It already can store bond orders, and it would be easy to add formal charges. On its own, this is only one piece of a solution. It still leaves the problem of where to get the information from. None of the standard file formats OpenMM imports contains it. That's why in practice Topologies never specify bond orders, even though in principle they can.
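To make the "extend Topology" idea concrete, here is a minimal sketch in plain Python. It does not use the OpenMM API: the `ExtendedTopology`, `Atom`, and `Bond` classes and the `missing_information()` method are hypothetical names invented for illustration. It shows optional per-atom formal charges and per-bond orders, and why storage alone does not solve the problem: a topology built from elements and connectivity only still has nothing in those fields.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Atom:
    element: str
    formal_charge: Optional[int] = None  # None means "not specified"


@dataclass
class Bond:
    atom1: int  # index into ExtendedTopology.atoms
    atom2: int
    order: Optional[int] = None  # None means "not specified"


@dataclass
class ExtendedTopology:
    """Hypothetical stand-in for a Topology with optional chemical information."""
    atoms: List[Atom] = field(default_factory=list)
    bonds: List[Bond] = field(default_factory=list)

    def add_atom(self, element, formal_charge=None):
        self.atoms.append(Atom(element, formal_charge))
        return len(self.atoms) - 1

    def add_bond(self, atom1, atom2, order=None):
        self.bonds.append(Bond(atom1, atom2, order))

    def missing_information(self):
        """Return the names of optional fields that are not fully specified."""
        missing = []
        if any(a.formal_charge is None for a in self.atoms):
            missing.append("formal_charge")
        if any(b.order is None for b in self.bonds):
            missing.append("bond_order")
        return missing


# A topology as it might come from a structure file: elements and bonds only.
top = ExtendedTopology()
c = top.add_atom("C")
o = top.add_atom("O")
top.add_bond(c, o)
```

Calling `top.missing_information()` on this topology reports both fields as unspecified, which is the gap the remaining ideas try to fill.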
We could copy the approach used by ForceField. It would be easy to create a pseudo-forcefield that would fill in chemical information for standard residues by matching templates. Nonstandard ones could be specified manually similar to the way SMIRNOFFTemplateGenerator works.
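The template idea could look roughly like the sketch below. It is an illustration only: `TEMPLATES`, `register_template()`, and `annotate_residue()` are hypothetical names, and matching here is done by residue and atom name, which is a deliberate simplification and not necessarily how ForceField matches templates. The `"OH"` hydroxide entry stands in for a nonstandard residue supplied manually.

```python
# Hypothetical template table: residue name -> formal charges and bond orders.
TEMPLATES = {
    "HOH": {
        "charges": {"O": 0, "H1": 0, "H2": 0},
        "bond_orders": {frozenset(("O", "H1")): 1, frozenset(("O", "H2")): 1},
    },
}


def register_template(name, charges, bond_orders):
    """Manually add a template for a nonstandard residue."""
    TEMPLATES[name] = {
        "charges": dict(charges),
        "bond_orders": {frozenset(pair): order for pair, order in bond_orders.items()},
    }


def annotate_residue(name, atom_names, bonds):
    """Look up formal charges and bond orders for one residue.

    atom_names is a list of atom names; bonds is a list of (name, name) pairs.
    Returns (charges, orders) in the same order as the inputs.
    """
    template = TEMPLATES.get(name)
    if template is None:
        raise KeyError("no template for residue " + name)
    if set(atom_names) != set(template["charges"]):
        raise ValueError("atoms do not match template for residue " + name)
    charges = [template["charges"][a] for a in atom_names]
    orders = [template["bond_orders"][frozenset(b)] for b in bonds]
    return charges, orders


# A nonstandard residue (hydroxide) specified by hand.
register_template("OH", {"O": -1, "H": 0}, {("O", "H"): 1})

water = annotate_residue("HOH", ["O", "H1", "H2"], [("O", "H1"), ("O", "H2")])
hydroxide = annotate_residue("OH", ["O", "H"], [("H", "O")])
```

Bonds are keyed by `frozenset` so that a lookup does not depend on the order in which the two atoms are listed.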
Another possibility is to allow createSystem() to accept an OpenFF Topology in place of the OpenMM Topology. It already provides mechanisms for building descriptions of the relevant information. Models that don't need additional information would work with either type of topology.
We also could try to determine the missing information automatically based on what we do have (elements, bonds, positions). RDKit can do this. In practice I find it isn't very robust, though, so this probably isn't a good idea.
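The suggestion of accepting either kind of topology might be sketched as follows. Everything here is an assumption made for illustration: `BasicTopology` and `RichTopology` are stand-ins, not the OpenMM or OpenFF classes, and this `create_system()` is a toy function, not the real createSystem(). The point is only that each model declares what it requires, and a topology of either type is accepted as long as it supplies those fields.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BasicTopology:
    """Stand-in for a topology that knows only elements (hypothetical)."""
    elements: List[str]


@dataclass
class RichTopology:
    """Stand-in for a topology that also carries formal charges (hypothetical)."""
    elements: List[str]
    formal_charges: List[int]


def available_fields(topology):
    """Report which kinds of information a topology object can supply."""
    fields = {"elements"}
    if getattr(topology, "formal_charges", None) is not None:
        fields.add("formal_charges")
    return fields


def create_system(required, topology):
    """Collect the fields a model requires, or fail if any are unavailable."""
    missing = set(required) - available_fields(topology)
    if missing:
        raise ValueError("topology is missing: " + ", ".join(sorted(missing)))
    return {name: getattr(topology, name) for name in sorted(required)}


basic = BasicTopology(["O", "H"])
rich = RichTopology(["O", "H"], [-1, 0])

# A model that needs only elements works with either topology type.
elements_only = create_system({"elements"}, basic)
elements_from_rich = create_system({"elements"}, rich)

# A model that needs formal charges works only with the richer one.
with_charges = create_system({"elements", "formal_charges"}, rich)
```

Passing `basic` to a model that requires `formal_charges` raises a `ValueError` naming the missing field, which gives users an early, explicit failure instead of a silently wrong model input.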