Atom index of the ligand in covalent bond restraint. And explicit hydrogen #217

andrew0901 · 2024-12-05T22:22:00Z

Hi Chai team,

Thanks for the update for the covalent bond restraint. I'm wondering to specify a linkage between ligand and protein, how do I find the atom name/index to put in the restraint file for the ligand atom of interest? Or is the function for now just for glycan and other modification with standard names?

My second question: I realize Chai removes all hydrogens in the output ligand, which is, I think, due to rdkit's procedure reading the smiles string. But sometimes there are some important hydrogens, like metal-hydride, that I wish to be explicitly predicted. Is there any way to do that in the current version or near future?

wukevin · 2024-12-06T17:51:26Z

Regarding the linkage between ligand and protein, we do support arbitrary SMILES string in addition to standard CCD codes, and you can find the atom index by doing something like this, opening the resulting file in PyMOL (or something similar), clicking through, and finding the name of the atom that you wish to bind to:

def write_reference_conformer(query: str, query_type: str, output_file="ligand.pdb"):
    """Writes reference conformer of the given query to the given pdb file.

    NOTE: query_type must either be "ccd" or "smiles" and determine how we parse query.

    Meant as a research utility for quickly determining atom names and checking the
    conformer generated for a smiles string.
    """
    conformer_generator = RefConformerGenerator()

    if query_type == "ccd":
        conf_data = conformer_generator.get(query)
        assert conf_data is not None
    elif query_type == "smiles":
        conf_data = conformer_generator.generate(query)
    else:
        raise ValueError(f"Unrecognized {query_type=}")

    # log the atom names
    for i, n in enumerate(conf_data.atom_names):
        logging.info(f"Atom {i}:\t{n}")

    with open(output_file, "w") as f:
        # PDB format uses 1-based indexing for atom numbers
        for atom_index, (atom, coord) in enumerate(
            zip(conf_data.atom_names, conf_data.position), 1
        ):
            # Format each line according to PDB specifications
            line = (
                f"HETATM{atom_index:5d}  {atom:<3s} LIG A   1    "
                f"{coord[0]:8.3f}{coord[1]:8.3f}{coord[2]:8.3f}"
                f"  1.00  0.00          {atom:>2s}\n"
            )
            f.write(line)

        bonds_grouped = defaultdict(list)
        for i, j in sorted(conf_data.bonds):
            bonds_grouped[i + 1].append(j + 1)  # Converts 0 to 1 indexed
        for center_atom, connections in bonds_grouped.items():
            bonded_atoms = sorted(connections)
            while bonded_atoms:
                # Take up to 4 bonded atoms at a time
                current_bonds = bonded_atoms[:4]
                bonded_atoms = bonded_atoms[4:]

                # Format the CONECT record
                # The format requires each atom number to take exactly 5 columns
                line = f"CONECT{center_atom:5d}"
                for bonded_atom in current_bonds:
                    line += f"{bonded_atom:5d}"
                line += "\n"
                f.write(line)

        # Write the END record to properly terminate the PDB file
        f.write("END\n")

Regarding hydrogens - we remove all hydrogens during training, so while it might be possible to hack this in by changing the conformer generation logic, I wouldn't expect particularly good results when doing so. If you wish to hack it in, I would start by seeing what happens if you remove the call to remove hydrogens here:

chai-lab/chai_lab/data/sources/rdkit.py

Lines 95 to 97 in 3278943

    
           def _load_ref_conformer_from_rdkit(self, mol: Chem.Mol) -> ConformerData: 
        
               mol = Chem.RemoveAllHs(mol)

wukevin · 2024-12-12T22:02:21Z

We've also added an example of specifying covalently bound ligands via SMILES string here: https://github.com/chaidiscovery/chai-lab/tree/main/examples/covalent_bonds#non-glycan-ligands

andrew0901 · 2024-12-13T22:57:31Z

Thanks for providing the script. I ran it and here is the PDB I ended up with. In your example, you specified the sulfur atom as @s1. Does the "1" here means the first S atom in the atom order of the generated PDB? Because the index is not 1 for the desired atom in the output file.

HETATM 1 C LIG A 1 0.867 -0.164 -1.047 1.00 0.00 C
HETATM 2 C LIG A 1 0.155 -1.005 -1.898 1.00 0.00 C
HETATM 3 C LIG A 1 -1.020 -0.593 -2.489 1.00 0.00 C
HETATM 4 C LIG A 1 -1.470 0.679 -2.211 1.00 0.00 C
HETATM 5 C LIG A 1 -0.754 1.506 -1.363 1.00 0.00 C
HETATM 6 C LIG A 1 0.422 1.108 -0.765 1.00 0.00 C
HETATM 7 O LIG A 1 1.081 1.989 0.071 1.00 0.00 O
HETATM 8 C LIG A 1 2.271 1.682 0.745 1.00 0.00 C
HETATM 9 C LIG A 1 2.181 0.616 1.751 1.00 0.00 C
HETATM 10 O LIG A 1 3.235 0.274 2.391 1.00 0.00 O
HETATM 11 N LIG A 1 0.962 -0.037 2.033 1.00 0.00 N
HETATM 12 C LIG A 1 0.932 -1.083 3.030 1.00 0.00 C
HETATM 13 C LIG A 1 1.340 -0.610 4.406 1.00 0.00 C
HETATM 14 S LIG A 1 1.246 -2.029 5.541 1.00 0.00 S
HETATM 15 CL LIG A 1 -2.960 1.237 -2.944 1.00 0.00 CL
HETATM 16 CL LIG A 1 -1.907 -1.671 -3.564 1.00 0.00 CL
CONECT 1 2
CONECT 2 3
CONECT 3 4 16
CONECT 4 5 15
CONECT 5 6
CONECT 6 1 7
CONECT 7 8
CONECT 8 9
CONECT 9 10 11
CONECT 11 12
CONECT 12 13
CONECT 13 14
END

wukevin · 2024-12-14T01:40:43Z

Thanks for providing the script. I ran it and here is the PDB I ended up with. In your example, you specified the sulfur atom as @s1. Does the "1" here means the first S atom in the atom order of the generated PDB? Because the index is not 1 for the desired atom in the output file.

Atom names are assigned in order of occurrence in the reference conformer, so @S1 indicates the first sulfur in the reference conformer.

In the general case, you can hover over each atom in something like PyMOL to find the atom name we assign to it, and change which atom you specify in your restraint accordingly.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Atom index of the ligand in covalent bond restraint. And explicit hydrogen #217

Atom index of the ligand in covalent bond restraint. And explicit hydrogen #217

andrew0901 commented Dec 5, 2024

wukevin commented Dec 6, 2024 •

edited

Loading

wukevin commented Dec 12, 2024 •

edited

Loading

andrew0901 commented Dec 13, 2024

wukevin commented Dec 14, 2024

Atom index of the ligand in covalent bond restraint. And explicit hydrogen #217

Atom index of the ligand in covalent bond restraint. And explicit hydrogen #217

Comments

andrew0901 commented Dec 5, 2024

wukevin commented Dec 6, 2024 • edited Loading

wukevin commented Dec 12, 2024 • edited Loading

andrew0901 commented Dec 13, 2024

wukevin commented Dec 14, 2024

wukevin commented Dec 6, 2024 •

edited

Loading

wukevin commented Dec 12, 2024 •

edited

Loading