mol22lt.py

mol22lt.py is a program for converting MOL2 files into moltemplate (LT) file format.

WARNING: BETA SOFTWARE. THIS SOFTWARE IS EXPERIMENTAL AS OF 2024-12-05

Usage:

   mol22lt.py \
      --in FILE.MOL2 \
      --out FILE.LT \
      [--name MOLECULE_NAME] \
      [--charges charges.txt] \
      [--ff FORCE_FIELD_NAME] \
      [--ff-file FORCE_FIELD_FILE_NAME]

Example:

Convert a polyphenylene sulfide (PPS) polymer (stored in a file named "PPS_5mer.mol2") into moltemplate format:

   mol22lt.py \
      --in PPS_5mer.mol2 \
      --out PPS_5mer.lt \
      --name PPS5 \
      --ff GAFF2 \
      --ff-file "gaff2.lt"

Later on, you would use this "PPS_5mer.lt" file we just created by referring to it in another file (usually "system.lt"). Here is an example "system.lt" file which uses the "PPS_5mer.lt" file we just created:

import "PPS_5mer.lt"
pps5_copy = new PPS5  # (instantiate a single copy of the "PPS5" polymer)

To make multiple copies of "PPS5", you could use:

import "PPS_5mer.lt"
pps5_copy1 = new PPS5.move(-24.7, -3.9, -4.3)
pps5_copy2 = new PPS5.move(-21.3, 1.9, 0.7)

To prepare a LAMMPS simulation, we would enter this command into the terminal:

moltemplate.sh system.lt

Once defined, molecules such as "PPS5" can be customized and combined with (bonded to) other molecules, as demonstrated in the moltemplate manual.

WARNING: THIS SOFTWARE DOES NOT WORK WITH MULTIPLE CHAINS

This software does not work with MOL2 files containing multiple "chains". ("Chains" are optional features located in the SUBSTRUCTURE section of some MOL2 files.) However, there is a manual workaround. (See "Working with multiple chains" below.)

Details

MOL2 is a versatile file format generated by many popular molecular simulation software tools (including AmberTools, Gaussian, OpenBabel, and the RED-server).

This program extracts the following information from a MOL2 file and writes it to a moltemplate LT file (using the "full" atom-style):

  • charge (column 9 of the ATOM section)
  • atom-names (column 2 of the ATOM section)
  • XYZ coordinates (columns 3,4,5 of the ATOM section)
  • atom-type (column 6 of the ATOM section)
  • subunit-id (column 7 of the ATOM section)
  • subunit-name (column 8 of the ATOM section)
  • bonds (columns 2 and 3 from the BOND section)
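
The column mapping above can be illustrated with a short Python sketch. This is not the parser that mol22lt.py uses; the names `Mol2Atom` and `parse_atom_line` and the sample record are invented for illustration, and the sketch assumes whitespace-separated fields with all nine columns present.

```python
from typing import NamedTuple


class Mol2Atom(NamedTuple):
    name: str          # column 2
    x: float           # column 3
    y: float           # column 4
    z: float           # column 5
    atom_type: str     # column 6
    subunit_id: int    # column 7
    subunit_name: str  # column 8
    charge: float      # column 9


def parse_atom_line(line: str) -> Mol2Atom:
    """Split one whitespace-separated ATOM record into the fields listed above."""
    cols = line.split()
    if len(cols) < 9:
        raise ValueError("expected at least 9 columns, got %d" % len(cols))
    # cols[0] is the atom id (column 1); it is not stored.
    return Mol2Atom(
        name=cols[1],
        x=float(cols[2]), y=float(cols[3]), z=float(cols[4]),
        atom_type=cols[5],
        subunit_id=int(cols[6]),
        subunit_name=cols[7],
        charge=float(cols[8]),
    )


# Hypothetical ATOM record (values invented for illustration).
atom = parse_atom_line("1 C1 24.700 3.900 4.300 ca 1 PPS -0.1150")
print(atom.name, atom.atom_type, atom.subunit_name, atom.charge)
```

A record with fewer than nine columns (for example, one without a charge) raises a ValueError in this sketch; how mol22lt.py itself treats such records is not shown here.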

This program will IGNORE the following information in a MOL2 file:

  • any information not contained in the ATOM or BOND sections
  • atom id (column 1 from the ATOM section)
  • bond id (column 1 from the BOND section)
  • bond type (column 4 from the BOND section)
  • "chain" (subunit/substructure ID numbers are considered, but not the "chain")
  • status bits (columns 10 and 5 from the ATOM and BOND sections, respectively)

If the MOL2 file contains multiple subunits, a new molecule-object definition will be created for each subunit. In that case, if you want the entire system to be stored in a single molecule definition, use the --name argument. (See "Optional Arguments" below.)

MOL2 file format requirements

  • The atom-names (2nd column) must be unique within each molecular subunit.

  • All of the atom-ID numbers and subunit-ID numbers in the file must be unique and begin at 1 (although the order can vary).
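
Before converting a file, these requirements can be checked with a few lines of Python. The sketch below is only an illustration: the function names are hypothetical, and it assumes the atoms have already been read into (atom_id, atom_name, subunit_id) tuples.

```python
from collections import Counter


def duplicate_atom_names(atoms):
    """Return sorted (subunit_id, atom_name) pairs that occur more than once.

    `atoms` is an iterable of (atom_id, atom_name, subunit_id) tuples.
    """
    counts = Counter((subunit_id, name) for _, name, subunit_id in atoms)
    return sorted(key for key, n in counts.items() if n > 1)


def atom_ids_ok(atoms):
    """True if the atom ids are unique and the smallest one is 1."""
    ids = [atom_id for atom_id, _, _ in atoms]
    return len(ids) == len(set(ids)) and min(ids, default=0) == 1


# Hypothetical data: "C1" is repeated inside subunit 1.
# Reusing the name "C1" in subunit 2 is allowed.
atoms = [(1, "C1", 1), (2, "C1", 1), (3, "C1", 2), (4, "S1", 2)]
print(duplicate_atom_names(atoms))
print(atom_ids_ok(atoms))
```

An empty list from `duplicate_atom_names` means the atom names are unique within every subunit.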

Force Fields

The atom type names (column 6 of the MOL2 file) may correspond to atom types used by popular force fields (such as AMBER GAFF or GAFF2). If you want to use these force fields in your simulations, you must let moltemplate know the name of the force field and the file that stores the force field parameters using the --ff and --ff-file arguments. (Example: "--ff GAFF2 --ff-file gaff2.lt")

Molecular Subunits

LT files are typically used to store (one or more) molecule type definitions (or monomers or other types of molecular subunits). The LT files generated by mol22lt.py contain definitions of all of the molecules or molecular subunits (a.k.a. "substructures") defined in the MOL2 file. Again, if you want the entire system to be stored in a single molecule definition, use the --name argument.

Redundant Subunits

If the MOL2 file contains multiple identical molecules or molecular subunits, the resulting LT file will contain multiple redundant definitions of the same molecular subunit (but with different atomic coordinates). This won't cause any problems (other than larger LT files).

(If, for some reason, the user wants to avoid redefining the same types of molecules or molecular subunits, they should supply a MOL2 file containing only a single copy of that molecule or subunit. Later they can use moltemplate's "new", ".move()", and ".rot()" commands to instantiate multiple copies of the molecular subunit at those positions instead of redefining it.)

Centering the molecule(s)

mol22lt.py ignores the "CENTROID" and "CENTER_OF_MASS" sections of the MOL2 file. Instead, each molecular subunit (or the entire molecule) can be manually recentered or rotated by editing the LT file generated by this program and appending a line containing a sequence of .move() and/or .rot() commands to correct the position. In the example above, if the "PPS5" polymer is centered at (24.7, 3.9, 4.3), we could append this line to the end of the "PPS_5mer.lt" file to recenter it:

PPS5.move(-24.7, -3.9, -4.3)

This modifies the definition of the "PPS5" molecule, adding (-24.7, -3.9, -4.3) to the coordinates of all the atoms in the molecule (before it is copied/instantiated using the "new" command).
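
If the center of the molecule is not known in advance, the displacement can be computed from the atomic coordinates. The following Python sketch assumes that "center" means the unweighted geometric center of the atoms (not the center of mass); the function name `recenter_command` and the two sample coordinates are made up for illustration.

```python
def recenter_command(name, coords):
    """Return an LT line that moves the geometric center of `coords` to the origin.

    `coords` is a list of (x, y, z) tuples, e.g. columns 3-5 of the ATOM section.
    """
    n = len(coords)
    if n == 0:
        raise ValueError("no coordinates given")
    cx = sum(x for x, _, _ in coords) / n
    cy = sum(y for _, y, _ in coords) / n
    cz = sum(z for _, _, z in coords) / n
    # Translate by the negative of the center, rounded to one decimal place.
    return "%s.move(%.1f, %.1f, %.1f)" % (name, -cx, -cy, -cz)


# Two hypothetical atoms whose midpoint is (24.7, 3.9, 4.3).
coords = [(23.7, 2.9, 3.3), (25.7, 4.9, 5.3)]
print(recenter_command("PPS5", coords))
```

The printed line can then be appended to the end of the LT file, as described above.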

Arguments

--in FILE.mol2

Specify the name of the MOL2 file you want to convert. (If omitted, the input is read from the terminal (stdin) by default.)

--out FILE.lt

Specify the name of the moltemplate file (LT file) you want to create. (If omitted, the output is written to the terminal (stdout) by default.)

Optional Arguments

--charges CHARGES.txt

By default, mol22lt.py reads the charges from the MOL2 file (if present). If the charges in the MOL2 file are absent or incorrect, you can override them by supplying a file containing the correct charges using the --charges argument. This is a one-column text file containing one number per line. (Comments following '#' characters are allowed.) The charges in this file must appear in the same order as the atom-ID numbers in the first column of the MOL2 file.
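
To make the charges file format concrete, here is a minimal Python sketch of a reader for it. It is not taken from mol22lt.py: the function name `read_charges` and the sample values are hypothetical, and skipping blank and comment-only lines is an assumption.

```python
def read_charges(lines):
    """Parse a one-column charges file: one number per line, '#' starts a comment."""
    charges = []
    for line in lines:
        text = line.split("#", 1)[0].strip()  # drop the comment, if any
        if text:  # skip blank and comment-only lines (assumed behavior)
            charges.append(float(text))
    return charges


# Contents of a hypothetical charges.txt for a 3-atom molecule.
sample = [
    "# charges, in the same order as the atom-ID numbers\n",
    "-0.8340  # atom 1\n",
    " 0.4170  # atom 2\n",
    " 0.4170  # atom 3\n",
]
charges = read_charges(sample)
print(len(charges), round(sum(charges), 6))
```

Comparing `len(charges)` with the number of atoms in the MOL2 file, and checking that the total charge is what you expect, are cheap sanity checks before running the conversion.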

--name MOLECULE_NAME

By default, mol22lt.py treats each molecular subunit (a.k.a. "substructure") in the MOL2 file as an independent molecule. Bonds connecting the subunits are still included, but each subunit gets its own molecule name (and the atoms in different subunits are assigned different molecule-ID numbers). This is inconvenient: if you later want to create multiple copies of the entire molecule (polymer), you have to copy each of the subunits it is built from.

The --name argument allows you to group everything together in a single molecule definition. Later on, you can refer to this entire compound molecule using the MOLECULE_NAME you gave it. (And all of the atoms in the entire file will share the same molecule-ID.)

This is useful if you plan to use this molecule as a building block for creating larger simulations.

Note: There is no need to use the --name argument if your MOL2 file only contains a single molecular subunit definition. This argument was intended for use with more complex molecules that contain multiple subunits, such as polymers.

--ff FORCE_FIELD

If the molecules are associated with a particular force field (such as GAFF2), you can specify it using this argument (e.g. "--ff GAFF2"). The atom type names in the MOL2 file (column 6 of the ATOM section) will be used to look up the parameters from that force field. (You should probably also specify the name of the file containing that force field using the --ff-file argument.)

--ff-file FORCE_FIELD_FILE

This will add a line to the beginning of the LT file generated by this program telling moltemplate to load a file. (Typically this file contains atom type definitions and force field parameters.) In the example above, if you are using the GAFF2 force field, you would use "--ff-file gaff2.lt". (The "gaff2.lt" file stores the GAFF2 parameters.)

--upper-case-types

This will force all of the atom type names to use upper-case letters. (This is useful for fixing some force-field specific format errors.)

--lower-case-types

This will force all of the atom type names to use lower-case letters. (This is useful for fixing some force-field specific format errors.)

--upper-case-names

This will force all of the atom names to use upper-case letters.

--lower-case-names

This will force all of the atom names to use lower-case letters.

(Note that atom names are used to identify atoms in bonds. They are not used to look up force-field information. Make sure they remain uniquely named, even after changing capitalization.)
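
One way to check whether a change of capitalization would make two atom names collide is sketched below in Python. The function name `case_collisions` and the sample names are hypothetical, and the sketch assumes the atom names have already been grouped by subunit.

```python
from collections import defaultdict


def case_collisions(names_by_subunit):
    """Map each subunit to the groups of atom names that differ only by case."""
    collisions = {}
    for subunit, names in names_by_subunit.items():
        groups = defaultdict(list)
        for name in names:
            groups[name.lower()].append(name)
        clashes = [group for group in groups.values() if len(group) > 1]
        if clashes:
            collisions[subunit] = clashes
    return collisions


# Hypothetical atom names: "CA" and "Ca" would collide in subunit "PPS".
names = {"PPS": ["CA", "Ca", "S1"], "END": ["H1", "H2"]}
print(case_collisions(names))
```

An empty result suggests that --upper-case-names or --lower-case-names can be used without making atom names ambiguous.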

Working with multiple chains

If your MOL2 file contains multiple chains, split it into multiple MOL2 files (one per chain). Then convert each file separately. Afterwards, if you want to define a large molecular complex (such as a protein with quaternary structure), you can use moltemplate to define a large molecule composed of multiple chain subunits. For example, suppose we have a .mol2 file containing two chains. If we split that file into two files ("chainA.mol2", "chainB.mol2"), we can create two .lt files, one for each chain:

mol22lt.py --in chainA.mol2 --out chainA.lt --name ChainA --ff GAFF2 --ff-file "gaff2.lt"
mol22lt.py --in chainB.mol2 --out chainB.lt --name ChainB --ff GAFF2 --ff-file "gaff2.lt"
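
With more than a handful of chains, the commands above can be generated in a loop. The Python sketch below uses only the arguments shown in this document; the helper names are hypothetical, and it assumes the files follow the "chainX.mol2" naming pattern and that mol22lt.py is on your PATH. By default it only prints the commands.

```python
import shlex
import subprocess


def mol22lt_command(chain, ff="GAFF2", ff_file="gaff2.lt"):
    """Build the mol22lt.py argument list for one chain, e.g. chain="A"."""
    return [
        "mol22lt.py",
        "--in", "chain%s.mol2" % chain,
        "--out", "chain%s.lt" % chain,
        "--name", "Chain%s" % chain,
        "--ff", ff,
        "--ff-file", ff_file,
    ]


def convert_chains(chains, dry_run=True):
    """Print each command; also run it unless dry_run is True."""
    for chain in chains:
        cmd = mol22lt_command(chain)
        print(shlex.join(cmd))  # shlex.join requires Python 3.8 or later
        if not dry_run:
            subprocess.run(cmd, check=True)


convert_chains(["A", "B"])  # dry run: only prints the two commands
```

Calling `convert_chains(["A", "B"], dry_run=False)` would run the conversions and stop at the first one that fails.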

Then we can manually create a new .lt file (e.g. "protein_with_2_chains.lt") defining a molecular complex containing two chains:

import "chainA.lt"  # Defines "ChainA"
import "chainB.lt"  # Defines "ChainB"
ProteinWith2Chains {
  a = new ChainA
  b = new ChainB
}

And then (in our "system.lt" file) we can instantiate that complex this way (for example):

protein1 = new ProteinWith2Chains

Python API

It is possible to access the functionality of mol22lt.py from within Python. Example:

# (This import assumes ConvertMol22Lt is defined in the moltemplate.mol22lt module.)
from moltemplate.mol22lt import ConvertMol22Lt
# Open the file you want to convert, and create a new moltemplate file.
# (The "with" statement closes both files when the conversion is finished.)
with open('PPS_5mer.mol2', 'r') as fMol2, open('PPS_5mer.lt', 'w') as fLT:
    # Write the contents of the new file
    ConvertMol22Lt(fMol2,
                   fLT,
                   ff_name = 'GAFF2',        # <-- optional argument (force field)
                   ff_file = 'gaff2.lt',     # <-- optional argument (ff file)
                   object_name = 'PPS5')     # <-- optional argument (molecule name)