Home
SMILES: a simple ascii string-based method for representing molecules and reactions (see http://www.daylight.com/dayhtml/doc/theory/theory.smiles.html). Note that a single molecule can be represented by multiple SMILES strings.
SMARTS: a simple ascii string-based method for representing molecular substructures; an extension of SMILES (see http://www.daylight.com/dayhtml/doc/theory/theory.smarts.html)
SMILES file: A text file containing SMILES strings; each SMILES is the first element of each line/row. Traditionally, the second column is the identifier (ID) of the molecule. Additional columns may exist. Columns are separated by space or tab with the former being the standard for LillyMol. Typical file extension: ‘.smi’
SMARTS file: A text file containing SMARTS strings; each SMARTS is the first element of each line/row. Traditionally, the second column is the identifier (ID) of the substructure. Additional columns may exist. Columns are separated by space or tab with the former being the standard for LillyMol. Typical file extension: ‘.smt’
SDF or SD file: A simple, ascii connection table-based method for representing molecules and substructures (see https://en.wikipedia.org/wiki/Chemical_table_file). Typical file extension: ‘.sdf’
RDF or RD file: A simple, ascii connection table-based method for representing chemical reactions (see http://c4.cabrillo.edu/404/ctfile.pdf). Typical file extension: ‘.rdf’
Chemical substructure: A contiguous chemical fragment; may not be a valid molecule.
Substructure search: The process of searching for the presence of a chemical substructure in molecules.
Canonical SMILES: A special, unique SMILES representation for a specific chemical structure.
Structure clean-up: A common chemoinformatics process that ‘cleans’ a structure representation of salts, fragments, etc. and checks it for simple errors, e.g. syntax, valence, etc.
Reaction SMILES: a simple ascii string-based method for representing chemical reactions using SMILES strings.
Reaction SMILES file: A text file containing reaction SMILES strings; each reaction SMILES is the first element of each line/row. Traditionally, the second column is the identifier (ID) of the reaction. Additional columns may exist. Columns are separated by space or tab with the former being the standard for LillyMol. Typical file extension: ‘.rsmi’
Reaction signature: The unique SMILES-like string representing the extended reaction core of a chemical reaction.
https://en.wikipedia.org/wiki/Chemical_file_format
http://c4.cabrillo.edu/404/ctfile.pdf
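The column layout of the SMILES, SMARTS and reaction SMILES files defined above can be illustrated with a minimal Python sketch. It is not LillyMol code: the names `SmilesRecord`, `parse_smiles_line` and `split_reaction_smiles` and the sample records are hypothetical, and the sketch assumes whitespace-separated columns and the Daylight `reactants>agents>products` reaction layout.

```python
from typing import Dict, List, NamedTuple


class SmilesRecord(NamedTuple):
    smiles: str       # column 1: SMILES, SMARTS or reaction SMILES string
    identifier: str   # column 2: the ID; empty if the line has one column
    extra: List[str]  # any additional columns


def parse_smiles_line(line: str) -> SmilesRecord:
    """Split one line of a .smi/.smt/.rsmi file on spaces or tabs."""
    columns = line.split()
    if not columns:
        raise ValueError("empty line")
    identifier = columns[1] if len(columns) > 1 else ""
    return SmilesRecord(columns[0], identifier, columns[2:])


def split_reaction_smiles(rsmi: str) -> Dict[str, List[str]]:
    """Split a reaction SMILES into reactants, agents and products."""
    reactants, agents, products = rsmi.split(">")

    def components(part: str) -> List[str]:
        # '.' separates the individual molecules within one part
        return part.split(".") if part else []

    return {
        "reactants": components(reactants),
        "agents": components(agents),
        "products": components(products),
    }


# Hypothetical sample data, for illustration only
record = parse_smiles_line("CCO ethanol 46.07")
reaction = split_reaction_smiles("CC(=O)O.OCC>>CC(=O)OCC.O")
print(record)
print(reaction)
```

Splitting on any whitespace accepts both space-separated and tab-separated files, which matches the two separators named in the definitions.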
Description:
Merge identical chemical structures to one common name in a SMILES file (also see unique_molecules tool). Useful for identifying unique chemical structures in a SMILES file.
Author/owner: C3/Eli Lilly and Co
Sample 1:
common_names input1.smi input2.smi -S ./output -r 10000 -D + -v
Explanation:
Find all compounds in input1.smi and input2.smi with common structure and different name; write them to file output.smi with new name consisting of old names separated by a "+" symbol; report progress every 10000 rows (-r 10000)
Shell output:
… output to './output'
File output: (output.smi)
Combined compound list
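The name-merging behaviour described in this sample can be sketched in Python. This is a hypothetical illustration, not the common_names implementation: `merge_common_names` and the sample records are made up, and the sketch compares SMILES strings literally, whereas the tool compares chemical structures (so `CCO` and `OCC` would count as the same molecule for the tool but not here).

```python
from typing import Dict, Iterable, List, Tuple


def merge_common_names(
    records: Iterable[Tuple[str, str]], separator: str = "+"
) -> List[Tuple[str, str]]:
    """Group names by structure string and join them with the separator."""
    names_by_structure: Dict[str, List[str]] = {}
    for smiles, name in records:
        names = names_by_structure.setdefault(smiles, [])
        if name not in names:  # do not repeat a name already recorded
            names.append(name)
    # dicts keep insertion order, so structures appear in first-seen order
    return [(smiles, separator.join(names))
            for smiles, names in names_by_structure.items()]


# Hypothetical sample data: two records share the structure string "CCO"
records = [("CCO", "A1"), ("c1ccccc1", "B7"), ("CCO", "Z9")]
merged = merge_common_names(records)
print(merged)
```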
Help command:
common_names
Description:
Fetches records from one file based on identifiers in another file
Author/Owner: C3/Eli Lilly and Co
Sample 1:
fetch_smiles_quick -c 1 -C 2 -X notInRecord -Y notInIdentifier record.w identifier.smi
Explanation:
Fetches the records in record.w whose identifiers also occur in identifier.smi. Identifiers are matched between column 1 of record.w and column 2 of identifier.smi, and the matched records are displayed in the shell window. The list of unmatched identifiers will be saved in the notInRecord file. The list of unmatched records will be saved in the notInIdentifier file.
Shell output: (matched record)
O=C(C)C=CC=C(C)CCC=C(C)C PBCHM1756999 24 3 6 349.4 5 0.2083 10
File output: (notInIdentifier)
Unmatched records
File output: (notInRecord)
Unmatched identifiers
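The matching logic described in this sample amounts to a join on an identifier column, which the following Python sketch illustrates. It is not the fetch_smiles_quick implementation: the function name `fetch_by_identifier`, its parameters and the sample lines are hypothetical, and the real tool's output layout may differ.

```python
from typing import Dict, List, Sequence, Tuple

Row = List[str]


def fetch_by_identifier(
    record_lines: Sequence[str],
    identifier_lines: Sequence[str],
    record_col: int = 1,
    identifier_col: int = 2,
) -> Tuple[List[Tuple[Row, Row]], List[Row], List[Row]]:
    """Return (matched pairs, identifiers without a record, unused records)."""
    # Index the record file by its identifier column (1-based, as on the CLI)
    records: Dict[str, Row] = {}
    for line in record_lines:
        columns = line.split()
        if columns:
            records[columns[record_col - 1]] = columns

    matched: List[Tuple[Row, Row]] = []
    not_in_record: List[Row] = []
    used = set()
    for line in identifier_lines:
        columns = line.split()
        if not columns:
            continue
        key = columns[identifier_col - 1]
        if key in records:
            matched.append((columns, records[key]))
            used.add(key)
        else:
            not_in_record.append(columns)

    not_in_identifier = [row for key, row in records.items() if key not in used]
    return matched, not_in_record, not_in_identifier


# Hypothetical sample data
record_lines = ["ID1 24 3", "ID2 10 1"]
identifier_lines = ["CCO ID1", "CCC ID3"]
matched, not_in_record, not_in_identifier = fetch_by_identifier(
    record_lines, identifier_lines
)
print(matched, not_in_record, not_in_identifier)
```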
Help command:
fetch_smiles_quick
Description:
Filters out duplicate chemical structures based on unique smiles
Author/owner: C3/Eli Lilly and Co
Sample 1:
unique_molecules -S unique input.smi -D duplicate -v -l
Explanation:
Traverse structures in input.smi and identify duplicate structures; write duplicates in duplicate.smi; write unique in unique.smi; only consider largest fragment of each smiles (-l)
Shell output:
Execution summary
File output: (duplicate.smi)
Duplicate molecule list
File output: (unique.smi)
Unique molecule list
Help command:
unique_molecules
Description:
Identifies the unique rows in a file
Author/Owner: C3/Eli Lilly and Co
Sample 1:
unique_rows -c 1 -c 2 input.dat
Explanation:
Check input.dat file for unique rows based on values in column 1 and 2. The unique rows will be displayed in the shell window
Shell output:
Unique row list
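The idea of uniqueness keyed on selected columns can be sketched in Python as follows. This is an illustration only, not the unique_rows implementation: the function name and sample rows are hypothetical, and the sketch assumes whitespace-separated columns and that the first row seen for each key is the one kept.

```python
from typing import Iterable, List, Sequence


def unique_rows(lines: Iterable[str], key_columns: Sequence[int] = (1, 2)) -> List[str]:
    """Keep the first row for each distinct combination of key columns."""
    seen = set()
    unique: List[str] = []
    for line in lines:
        columns = line.split()
        # Column numbers are 1-based, mirroring "-c 1 -c 2"
        key = tuple(columns[c - 1] for c in key_columns)
        if key not in seen:
            seen.add(key)
            unique.append(line)
    return unique


# Hypothetical sample data: rows 1 and 2 agree on columns 1 and 2
rows = ["a 1 x", "a 1 y", "a 2 x", "b 1 x"]
result = unique_rows(rows)
print(result)
```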
Help command:
unique_rows
Description:
Extract columns from a text file
Author/Owner: C3/Eli Lilly and Co
Sample 1:
iwcut -f 5,3 input.txt
Explanation:
Extract column 5 and column 3 from the input.txt file. The extracted column data will be displayed in the shell window
Shell output:
Data from column 5 and column 3
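Column extraction of this kind can be sketched in Python. This is a hypothetical illustration, not the iwcut implementation: `cut_columns` and the sample lines are made up, and the sketch assumes whitespace-separated input and that columns are emitted in the order requested (5 first, then 3).

```python
from typing import Iterable, List, Sequence


def cut_columns(
    lines: Iterable[str],
    columns: Sequence[int] = (5, 3),
    output_separator: str = " ",
) -> List[str]:
    """Extract the given 1-based columns from each line, in the order given."""
    extracted: List[str] = []
    for line in lines:
        fields = line.split()
        extracted.append(output_separator.join(fields[c - 1] for c in columns))
    return extracted


# Hypothetical sample data with five columns per line
lines = ["a b c d e", "1 2 3 4 5"]
selected = cut_columns(lines)
print(selected)
```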
Help command:
iwcut
Description:
Structure file utility to clean up SMILES files and filter on specific criteria. It can also be used to convert between chemical file formats, e.g. from SDF to SMILES
Author/owner: C3/Eli Lilly and Co
Sample 1:
fileconv -Y dbg -B 100 -S -a input.smi
Explanation:
Debug/print each molecule structure in input.smi; ignore as many as 100 fatal input errors
Shell output:
Molecule information
Sample 2:
fileconv -F 6 -c 4 -C 14 -v -i smi list.smi -S selection
Explanation:
Select the molecules in list.smi that have between 4 and 14 atoms and fewer than 6 fragments; store the results in selection.smi
Shell output:
Execution summary
File output: (selection.smi)
List of molecules meeting the search criteria
Sample 3:
fileconv -o sdf -i smi single.smi -S single
Explanation:
Convert single.smi file to the sdf format single.sdf
File output: (single.sdf)
Converted sdf file
Help command:
fileconv
Description:
Generates reaction signatures for input reactions.
Author/Owner: C3/Eli Lilly and Co
Sample 1:
rxn_signature -v -r 0,1,2 -C Cfile -F Ffile all.rsmi >all.sig 2>all.log
Explanation:
Extract the reaction signatures of all reactions in all.rsmi. The program prints signatures to stdout, which is redirected here to all.sig; stderr is redirected to all.log. Signatures are generated at radii 0, 1 and 2 from the reaction core (i.e. the changing atoms). The list of changed atoms is written to Cfile. Failed reactions are written to Ffile.
Notes:
Reaction signatures capture the extended core of a reaction around the atoms that change in a reaction. A signature is based on the unique smiles of the reaction core. The smiles includes atoms colored by their environment in the original reaction smiles. In addition, information about the ring bond status in the original reaction smiles is appended to the reaction signature produced.
Help command:
rxn_signature
Description:
Checks and standardizes input chemical reactions; converts to a reaction smiles file format
Author/Owner: C3/Eli Lilly and Co
Sample 1:
rxn_standardize -s -c -D x -X igbad -v -C 60 -K -E autocreate -e -o -b -f gsub input.rsmi > output.rsmi
Explanation:
Check and standardize reactions in an input reaction SMILES (.rsmi) file. Discard chirality on input (-c). Discard reactions containing duplicate atom map numbers (-D x). Ignore bad reactions (-X igbad). Discard any reaction where the largest reactant has more than 60 atoms (-C 60). Kekule fix (-K). Automatically create new elements when encountered (-E autocreate). Move small fragments that show up on products to orphan status (-e). Create reagent fragments that are orphans (-o). Remove duplicate reactants, even if atom maps are scrambled (-b). Replace unusual characters in reaction names with _ (-f gsub).
Notes:
Input file can be in RDF or rsmi format. Output is in rsmi format
Help command:
rxn_standardize
Description:
Perform 2D substructure searches with SMILES/SMARTS against SMILES files
Author/owner: C3/Eli Lilly and Co
Sample 1:
tsubstructure -s 'C(C)(=O)C' -m hits.smi -n nonhits.smi list.smi
Explanation:
Search for molecules in list.smi containing the defined SMARTS (-s); write hits to hits.smi (-m) and nonhits to nonhits.smi (-n)
Sample 2:
tsubstructure.sh -f -b -A D -o smi -m hits.smi -s 'C(C)(=O)C' list.smi
Explanation:
Search for molecules containing defined smarts (-s); only find one embedding of the query (-f); for each molecule, break after finding a query which matches (-b); use daylight aromaticity (-A D); write hits in hits.smi
Note:
Use -X to successfully skip structures with unconventional symbols, e.g. X, R, ...
Sample 3:
tsubstructure -s '[ND1H2]-[C@H]1CCN2CCCCC2C1' -A D -o usmi list.smi -m match.out
Explanation:
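Search for molecules in list.smi containing the defined SMARTS (-s); use Daylight aromaticity (-A D); write matches using the output type given by -o and the file name given by -m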
Sample 4:
tsubstructure -A D -q carboxylic_acids.qry -u -M imp2exp -m match.smi list.smi
Explanation:
Find all matches to specific query file (-q) and place in match.smi (-m); use Daylight aromaticity; convert implicit hydrogen in target molecules to explicit before matching attempt (-M imp2exp); find unique matches only (-u)
Help command:
tsubstructure
Description:
Defines synthetic routes for input chemical structures by deconstructing input molecules into reactants using a set of known reaction templates. Conceptually, the inverse process of chemical reaction synthesis as implemented by tool trxn.
Author/Owner: C3/Eli Lilly and Co
Sample 1:
retrosynthesis -Y all -X kg -X kekule -X ersfrm -a 2 -q f -v -R 1 -I CentroidRxnSmi_1 -P UST:AZUCORS -M ncon -M ring -M unsat -M arom 10Cmpds.smi
Explanation:
Looks for synthesis paths for the molecules in 10Cmpds.smi using the reaction signatures in CentroidRxnSmi_1. Various standardization flags (-Y, -X, -q, -P, -M options). Require at least 2 heavy atoms in fragments (-a), verbose (-v), centroid radius 1 (-R).
Help command:
retrosynthesis
Description:
Performs reactions between reactant molecules to enumerate product structures. Uses a control reaction file, a scaffold SMILES file and zero or more reactant SMILES files. Conceptually, the inverse of the retrosynthesis process as implemented by tool retrosynthesis.
Author/owner: C3/Eli Lilly and Co
Sample 1:
trxn -v -r 1.2.1_Aldehyde_reductive_amination_FROM_amines_AND_aldehydes.rxn -Z -z i -M RMX -m RMX -S 1.2.1_run 20180412_amines.smi 20180412_aldehydes.smi
Explanation:
Perform the reaction in 1.2.1_Aldehyde_reductive_amination_FROM_amines_AND_aldehydes.rxn, ignoring sidechains (-Z) and molecules (-z i) that do not react, ignoring sidechains with multiple substructure matches (-M RMX) and ignoring scaffolds that generate multiple structure hits (-m RMX). The input files are 20180412_amines.smi and 20180412_aldehydes.smi; output is written using the file name stem 1.2.1_run (-S).
Sample 2:
trxn -v -r 2.1.2_Carboxylic_acid_+_amine_condensation_FROM_amines_AND_carboxylic_acids.rxn -Z -z i -M RMX -m RMX -S 2.1.2_run 20180412_amines.smi 20180412_carboxylic_acids.smi
Explanation:
Perform reaction in ./2.1.2_Carboxylic_acid_+_amine_condensation_FROM_amines_AND_carboxylic_acids.rxn
Help command:
trxn