-
Notifications
You must be signed in to change notification settings - Fork 189
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Atom index of the ligand in covalent bond restraint. And explicit hydrogen #217
Comments
Regarding the linkage between ligand and protein, we do support arbitrary SMILES string in addition to standard CCD codes, and you can find the atom index by doing something like this, opening the resulting file in PyMOL (or something similar), clicking through, and finding the name of the atom that you wish to bind to: def write_reference_conformer(query: str, query_type: str, output_file="ligand.pdb"):
"""Writes reference conformer of the given query to the given pdb file.
NOTE: query_type must either be "ccd" or "smiles" and determine how we parse query.
Meant as a research utility for quickly determining atom names and checking the
conformer generated for a smiles string.
"""
conformer_generator = RefConformerGenerator()
if query_type == "ccd":
conf_data = conformer_generator.get(query)
assert conf_data is not None
elif query_type == "smiles":
conf_data = conformer_generator.generate(query)
else:
raise ValueError(f"Unrecognized {query_type=}")
# log the atom names
for i, n in enumerate(conf_data.atom_names):
logging.info(f"Atom {i}:\t{n}")
with open(output_file, "w") as f:
# PDB format uses 1-based indexing for atom numbers
for atom_index, (atom, coord) in enumerate(
zip(conf_data.atom_names, conf_data.position), 1
):
# Format each line according to PDB specifications
line = (
f"HETATM{atom_index:5d} {atom:<3s} LIG A 1 "
f"{coord[0]:8.3f}{coord[1]:8.3f}{coord[2]:8.3f}"
f" 1.00 0.00 {atom:>2s}\n"
)
f.write(line)
bonds_grouped = defaultdict(list)
for i, j in sorted(conf_data.bonds):
bonds_grouped[i + 1].append(j + 1) # Converts 0 to 1 indexed
for center_atom, connections in bonds_grouped.items():
bonded_atoms = sorted(connections)
while bonded_atoms:
# Take up to 4 bonded atoms at a time
current_bonds = bonded_atoms[:4]
bonded_atoms = bonded_atoms[4:]
# Format the CONECT record
# The format requires each atom number to take exactly 5 columns
line = f"CONECT{center_atom:5d}"
for bonded_atom in current_bonds:
line += f"{bonded_atom:5d}"
line += "\n"
f.write(line)
# Write the END record to properly terminate the PDB file
f.write("END\n") Regarding hydrogens - we remove all hydrogens during training, so while it might be possible to hack this in by changing the conformer generation logic, I wouldn't expect particularly good results when doing so. If you wish to hack it in, I would start by seeing what happens if you remove the call to remove hydrogens here: chai-lab/chai_lab/data/sources/rdkit.py Lines 95 to 97 in 3278943
|
We've also added an example of specifying covalently bound ligands via SMILES string here: https://github.com/chaidiscovery/chai-lab/tree/main/examples/covalent_bonds#non-glycan-ligands |
Thanks for providing the script. I ran it and here is the PDB I ended up with. In your example, you specified the sulfur atom as @s1. Does the "1" here means the first S atom in the atom order of the generated PDB? Because the index is not 1 for the desired atom in the output file.
|
Atom names are assigned in order of occurrence in the reference conformer, so In the general case, you can hover over each atom in something like PyMOL to find the atom name we assign to it, and change which atom you specify in your restraint accordingly. |
Hi Chai team,
Thanks for the update for the covalent bond restraint. I'm wondering to specify a linkage between ligand and protein, how do I find the atom name/index to put in the restraint file for the ligand atom of interest? Or is the function for now just for glycan and other modification with standard names?
My second question: I realize Chai removes all hydrogens in the output ligand, which is, I think, due to rdkit's procedure reading the smiles string. But sometimes there are some important hydrogens, like metal-hydride, that I wish to be explicitly predicted. Is there any way to do that in the current version or near future?
The text was updated successfully, but these errors were encountered: