Home
SMILES: a simple ascii string-based method for representing molecules and reactions (see http://www.daylight.com/dayhtml/doc/theory/theory.smiles.html). Note that a single molecule can be represented by multiple SMILES strings.
SMARTS: a simple ascii string-based method for representing molecular substructures; an extension of SMILES (see http://www.daylight.com/dayhtml/doc/theory/theory.smarts.html)
SMILES file: A text file containing SMILES strings; each SMILES is the first element of each line/row. Traditionally, the second column is the identifier (ID) of the molecule. Additional columns may exist. Columns are separated by space or tab with the former being the standard for LillyMol. Typical file extension: ‘.smi’
SMARTS file: A text file containing SMARTS strings; each SMARTS is the first element of each line/row. Traditionally, the second column is the identifier (ID) of the substructure. Additional columns may exist. Columns are separated by space or tab with the former being the standard for LillyMol. Typical file extension: ‘.smt’
SDF or SD file: A simple, ascii connection table-based method for representing molecules and substructures (see https://en.wikipedia.org/wiki/Chemical_table_file). Typical file extension: ‘.sdf’
RDF or RD file: A simple, ascii connection table-based method for representing chemical reactions (see http://c4.cabrillo.edu/404/ctfile.pdf). Typical file extension: ‘.rdf’
Chemical substructure: A contiguous chemical fragment; may not be a valid molecule.
Substructure search: The process of searching for the presence of a chemical substructure in molecules.
Canonical SMILES: A special, unique SMILES representation for a specific chemical structure.
Structure clean-up: Common chemoinformatics process that ‘cleans’ a structure representation from salts, fragments, etc and checks the structure representation for simple errors e.g. syntax, valence, etc.
GFP (or gfp): Generalized FingerPrint format commonly used by LillyMol tools. A TDT-like format (inspired by TDT - Thor Data Tree format introduced by Daylight Chemical Information Systems; see http://www.daylight.com/meetings/summerschool01/course/basics/tdt.html)
Reaction SMILES: a simple ascii string-based method for representing chemical reactions using SMILES strings.
Reaction SMILES file: A text file containing reaction SMILES strings; each reaction SMILES is the first element of each line/row. Traditionally, the second column is the identifier (ID) of the reaction. Additional columns may exist. Columns are separated by space or tab with the former being the standard for LillyMol. Typical file extension: ‘.rsmi’
Reaction signature: The unique SMILES-like string representing the extended reaction core of a chemical reaction.
https://en.wikipedia.org/wiki/Chemical_file_format
http://c4.cabrillo.edu/404/ctfile.pdf
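The SMILES file and reaction SMILES file layouts defined in the glossary above can be read with a few lines of code. The following is a minimal sketch, not part of LillyMol: the names `SmilesRecord`, `parse_smiles_line` and `split_reaction_smiles` are hypothetical, and the reaction split assumes the usual `reactants>agents>products` layout with `.` separating components.

```python
from typing import NamedTuple


class SmilesRecord(NamedTuple):
    smiles: str       # first column
    identifier: str   # second column, empty if absent
    extra: list       # any additional columns


def parse_smiles_line(line):
    """Split one line of a SMILES file into SMILES, ID and extra columns."""
    fields = line.split()  # splits on runs of spaces or tabs
    if not fields:
        raise ValueError("empty line")
    identifier = fields[1] if len(fields) > 1 else ""
    return SmilesRecord(fields[0], identifier, fields[2:])


def split_reaction_smiles(rsmi):
    """Split 'reactants>agents>products' into three lists of SMILES."""
    parts = rsmi.split(">")
    if len(parts) != 3:
        raise ValueError("expected exactly two '>' separators")
    # An empty part (e.g. no agents) becomes an empty list.
    return tuple(part.split(".") if part else [] for part in parts)
```

The same line parser applies to SMARTS files, since only the meaning of the first column differs.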
Description:
Merge identical chemical structures to one common name in a SMILES file (also see unique_molecules tool). Useful for identifying unique chemical structures in a SMILES file.
Author/owner: C3/Eli Lilly and Co
Sample 1:
common_names -S ./output -s 10000 -r 10000 -D + -v input1.smi input2.smi
Explanation:
Find all compounds in input1.smi and input2.smi with common structure and different name; write them to file output.smi with new name consisting of old names separated by a "+" symbol; maximum number of molecules to process is 10000 (-s 10000); report progress every 10000 rows (-r 10000)
Shell output:
… output to './output'
File output: (output.smi)
Combined compound list
Help command:
common_names
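The name-merging idea behind this tool can be sketched as follows. This is an illustration only, not the tool's implementation: the function name `merge_common_names` is hypothetical, and it assumes each structure has already been reduced to a canonical key (for example a unique SMILES) so that identical structures compare equal as strings.

```python
def merge_common_names(records, separator="+"):
    """Merge names of records that share a structure key.

    records: iterable of (structure_key, name) pairs.
    Returns (structure_key, merged_name) pairs in first-seen order.
    """
    names_by_key = {}
    for key, name in records:
        names = names_by_key.setdefault(key, [])
        if name not in names:  # do not repeat an identical name
            names.append(name)
    return [(key, separator.join(names)) for key, names in names_by_key.items()]
```

For example, `merge_common_names([("CCO", "a"), ("CC", "b"), ("CCO", "c")])` yields `[("CCO", "a+c"), ("CC", "b")]`.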
Description:
Fetches records from one file based on identifiers in another file
Author/Owner: C3/Eli Lilly and Co
Sample 1:
fetch_smiles_quick -j -c 1 -C 2 -X notInRecord -Y notInIdentifier record.w structures.smi
Explanation:
Fetches records from the record.w file for the structures.smi file based on common identifiers. The matched records (column 1 in record.w and column 2 in structures.smi) will be displayed in the shell window. The list of unmatched identifiers will be saved in the notInRecord file. The list of unmatched records will be saved in the notInIdentifier file. The generated identifier file is a descriptor file without header record (-j).
Shell output: (matched record)
O=C(C)C=CC=C(C)CCC=C(C)C PBCHM1756999 24 3 6 349.4 5 0.2083 10
File output: (notInIdentifier)
Unmatched records
File output: (notInRecord)
Unmatched identifiers
Help command:
fetch_smiles_quick
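The join on identifiers described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the tool's code: `fetch_by_identifier` is a hypothetical name, columns are 1-based and whitespace-separated, and each matched output line is assumed to be the structure line followed by the remaining record columns.

```python
def fetch_by_identifier(record_lines, structure_lines, record_col=1, structure_col=2):
    """Join structure lines with record lines on a shared identifier.

    Returns (matched_lines, ids_without_record, records_without_id).
    """
    records = {}
    for line in record_lines:
        fields = line.split()
        records[fields[record_col - 1]] = fields

    matched, ids_without_record, used = [], [], set()
    for line in structure_lines:
        fields = line.split()
        key = fields[structure_col - 1]
        if key in records:
            # Append every record column except the identifier itself.
            rest = [f for i, f in enumerate(records[key]) if i != record_col - 1]
            matched.append(" ".join(fields + rest))
            used.add(key)
        else:
            ids_without_record.append(key)

    records_without_id = [key for key in records if key not in used]
    return matched, ids_without_record, records_without_id
```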
Description:
Filters out duplicate chemical structures based on unique smiles
Author/owner: C3/Eli Lilly and Co
Sample 1:
unique_molecules -S unique -D duplicate -v -l input.smi
Explanation:
Traverse structures in input.smi and identify duplicate structures; write duplicates in duplicate.smi; write unique in unique.smi; only consider largest fragment of each smiles (-l)
Shell output:
Execution summary
File output: (duplicate.smi)
Duplicate molecule list
File output: (unique.smi)
Unique molecule list
Help command:
unique_molecules
Description:
Identifies the unique rows in a file
Author/Owner: C3/Eli Lilly and Co
Sample 1:
unique_rows -c 1 -c 2 input.dat
Explanation:
Check input.dat file for unique rows based on values in column 1 and 2. The unique rows will be displayed in the shell window
Shell output:
Unique row list
Help command:
unique_rows
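Selecting unique rows by a subset of columns amounts to keeping the first row seen for each key. A minimal sketch, assuming whitespace-separated columns and 1-based column numbers (the function name `unique_rows_by_columns` is hypothetical):

```python
def unique_rows_by_columns(lines, columns):
    """Keep the first line seen for each combination of the given columns."""
    seen = set()
    unique = []
    for line in lines:
        fields = line.split()
        key = tuple(fields[c - 1] for c in columns)
        if key not in seen:
            seen.add(key)
            unique.append(line)
    return unique
```

With columns 1 and 2 as the key, the rows `a b 1` and `a b 2` count as duplicates even though their third column differs.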
Description:
Extract columns from a text file
Author/Owner: C3/Eli Lilly and Co
Sample 1:
iwcut -f 5,3 input.txt
Explanation:
Extract column 5 and column 3 from input.txt file. The extracted columns data will be displayed in the shell window
Shell output:
Data from column 5 and column 3
Help command:
iwcut
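The column extraction can be illustrated with a short sketch. It assumes whitespace-separated input and that columns are emitted in the order requested (column 5 before column 3 in the sample); the function name `cut_columns` is hypothetical.

```python
def cut_columns(line, columns, sep=None):
    """Return the requested 1-based columns of a line, in the order given."""
    fields = line.split(sep)  # sep=None splits on runs of spaces or tabs
    return " ".join(fields[c - 1] for c in columns)
```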
Description:
Structure file utility to clean up SMILES files and filter on specific criteria. It can also be used to convert between chemical file formats, e.g. from SDF to SMILES
Author/owner: C3/Eli Lilly and Co
Sample 1:
fileconv -Y dbg -B 100 -S -a input.smi
Explanation:
Debug/print each molecule structure in input.smi; ignore as many as 100 fatal input errors
Shell output:
Molecule information
Sample 2:
fileconv -F 6 -c 4 -C 14 -v -i smi -S selection list.smi
Explanation:
Select the molecules that have number of atoms ranging from 4-14 and less than 6 fragments from list.smi file; store results in file selection.smi
Shell output:
Execution summary
File output: (selection.smi)
List of molecules meeting the search criteria
Sample 3:
fileconv -o sdf -i smi -S single single.smi
Explanation:
Convert single.smi file to the sdf format single.sdf
File output: (single.sdf)
Converted sdf file
Help command:
fileconv
Description:
Generates reaction signatures for input reactions.
Author/Owner: C3/Eli Lilly and Co
Sample 1:
rxn_signature -v -r 0,1,2 -C Cfile -F Ffile all.rsmi >all.sig 2>all.log
Explanation:
Extract the reaction signatures of all reactions in all.rsmi. Store signatures in all.sig (the program prints to stdout). Signatures are generated at radii 0, 1 and 2 from the reaction core, i.e. the changing atoms (-r 0,1,2). The list of changed atoms is written to Cfile. Failed reactions are written to Ffile.
Notes:
Reaction signatures capture the extended core of a reaction around the atoms that change in a reaction. A signature is based on the unique smiles of the reaction core. The smiles includes atoms colored by their environment in the original reaction smiles. In addition, information about the ring bond status in the original reaction smiles is appended to the reaction signature produced.
Help command:
rxn_signature
Description:
Checks and standardizes input chemical reactions; converts to a reaction smiles file format
Author/Owner: C3/Eli Lilly and Co
Sample 1:
rxn_standardize -s -c -D x -X igbad -v -C 60 -K -E autocreate -e -o -b -f gsub input.rsmi > output.rsmi
Explanation:
Check and standardize reactions in an input reactions smiles (.rsmi) file. Discard chirality on input (-c). Discard reactions containing duplicate atom map numbers (-D x). Ignore bad reactions (-X igbad). Discard any reaction where the largest reactant has more than 60 atoms (-C 60). Kekule fix (-K). Automatically create new elements when encountered (-E autocreate). Move small fragments that show up on products to orphan status (-e). Create reagent fragments that are orphans (-o). Remove duplicate reactants, even if atom maps scrambled (-b). Replace unusual characters in reaction names with _ (-f gsub).
Notes:
Input file can be in RDF or rsmi format. Output is in rsmi format
Help command:
rxn_standardize
Description:
Perform 2D substructure searches with SMILES/SMARTS against SMILES files
Author/owner: C3/Eli Lilly and Co
Sample 1:
tsubstructure -s 'C(C)(=O)C' -m hits.smi -n nonhits.smi list.smi
Explanation:
Search for molecules in list.smi containing defined smarts (-s); write hits in hits.smi (-m) and nonhits in nonhits.smi (-n)
Sample 2:
tsubstructure.sh -f -b -A D -o smi -m hits.smi -s 'C(C)(=O)C' list.smi
Explanation:
Search for molecules containing defined smarts (-s); only find one embedding of the query (-f); for each molecule, break after finding a query which matches (-b); use daylight aromaticity (-A D); write hits in hits.smi
Note:
Use -X to successfully skip structures with unconventional symbols, e.g. X, R, ...
Sample 3:
tsubstructure -s '[ND1H2]-[C@H]1CCN2CCCCC2C1' -A D -o usmi -m match.out list.smi
Explanation:
Search for molecules containing defined smarts (-s)
Sample 4:
tsubstructure -A D -q carboxylic_acids.qry -u -M imp2exp -m match.smi list.smi
Explanation:
Find all matches to specific query file (-q) and place in match.smi (-m); use Daylight aromaticity; convert implicit hydrogen in target molecules to explicit before matching attempt (-M imp2exp); find unique matches only (-u)
Help command:
tsubstructure
Description:
Defines synthetic routes for input chemical structures by deconstructing input molecules into reactants using a set of known reaction templates. Conceptually, the inverse process of chemical reaction synthesis as implemented by tool trxn.
Author/Owner: C3/Eli Lilly and Co
Sample 1:
retrosynthesis -Y all -X kg -X kekule -X ersfrm -a 2 -q f -v -R 1 -I CentroidRxnSmi_1 -P UST:AZUCORS -M ncon -M ring -M unsat -M arom 10Cmpds.smi
Explanation:
Looks for synthesis paths for the molecules in 10Cmpds.smi using the reaction signatures in CentroidRxnSmi_1. Various standardization flags (-Y, -X, -q, -P, -M options). Require at least 2 heavy atoms in fragments (-a), verbose (-v), centroid radius 1 (-R).
Help command:
retrosynthesis
Description:
Performs reactions between reactant molecules to enumerate product structures. Uses a control reaction file, a scaffold SMILES file and zero or more reactant SMILES files. Conceptually inverse of retrosynthesis process as implemented by tool retrosynthesis.
Author/owner: C3/Eli Lilly and Co
Sample 1:
trxn -v -r 1.2.1_Aldehyde_reductive_amination_FROM_amines_AND_aldehydes.rxn -Z -z i -M RMX -m RMX -S 1.2.1_run 20180412_amines.smi 20180412_aldehydes.smi
Explanation:
Perform the reaction in 1.2.1_Aldehyde_reductive_amination_FROM_amines_AND_aldehydes.rxn, ignoring sidechains (-Z) and molecules (-z i) not reacting, ignoring sidechains with multiple substructure matches (-M RMX), and ignoring scaffolds that generate multiple structure hits (-m RMX). Output is written to a file with stem 1.2.1_run (-S 1.2.1_run); the input files are 20180412_amines.smi and 20180412_aldehydes.smi.
Sample 2:
trxn -v -r 2.1.2_Carboxylic_acid_+_amine_condensation_FROM_amines_AND_carboxylic_acids.rxn -Z -z i -M RMX -m RMX -S 2.1.2_run 20180412_amines.smi 20180412_carboxylic_acids.smi
Explanation:
Perform reaction in ./2.1.2_Carboxylic_acid_+_amine_condensation_FROM_amines_AND_carboxylic_acids.rxn
Help command:
trxn
Description:
Computes demerit of a molecule. In this context demerits refers to non-desirable molecular structure characteristics/features.
Author/owner: C3/Eli Lilly and Co
Sample 1:
iwdemerit -A D -A I -S foo -G - -f 99999 -t -W imp2exp -W maxe=1 -E autocreate -q F:PAINS/queries_latest -O hard -W dnv=0 -W slist -i smi pubchem_example.smi
Explanation:
Compute the demerits for the molecules in pubchem_example.smi (-i smi pubchem_example.smi) using the queries_latest query file (-q F:PAINS/queries_latest). The good, non-rejected (-G) structures will be written into foo.demerit (-S foo). Use Daylight aromaticity definitions (-A D) and enable input of aromatic structures (-A I). Molecules are rejected when they have 99999 or higher demerits (-f 99999). Append demerit text to molecule names (-t). Make implicit hydrogen explicit (-W imp2exp), maximum number of substructure queries to identify is 1 (-W maxe=1), use value 0 in the query file as the demerit score (-W dnv=0) and write a sorted list of demerit values and reasons (-W slist). Skip all the hard coded substructure queries (-O hard).
Help command:
iwdemerit
Description:
Generate random smiles based on input smiles.
Author/owner: C3/Eli Lilly and Co
Sample 1:
random_smiles.sh -n 5 -a -e -A D -v pubchem_example.smi
Explanation:
Generate 5 new random smiles (-n 5) based on the smiles in pubchem_example.smi using Daylight aromaticity (-A D). Append permutation number to name (-a) and echo initial molecule (-e)
Help command:
random_smiles
Description:
Write either unique smiles (if interpretable) or non aromatic unique form.
Author/owner: C3/Eli Lilly and Co
Sample 1:
preferred_smiles.sh pubchem_example.smi
Explanation:
Write unique smiles for the smiles in the pubchem_example.smi
Help command:
preferred_smiles
Description:
Calculate the rotatable bonds in the molecules
Author/owner: C3/Eli Lilly and Co
Sample 1:
rotatable_bonds.sh pubchem_example.smi
Explanation:
Calculate the rotatable bonds for molecules in the pubchem_example.smi
Help command:
rotatable_bonds
Description:
Concatenates descriptor files by joining on identifiers
Author/owner: C3/Eli Lilly and Co
Sample 1:
concat_files t1.1 t1.2
Explanation:
Concatenates the descriptor files t1.1 and t1.2 based on the identifier
Help command:
concat_files
Description:
Sorts a molecule file by various criteria
Author/owner: C3/Eli Lilly and Co
Sample 1:
msort -a pubchem_example.smi
Explanation:
Sort the molecule file pubchem_example.smi based on the number of atoms the molecule has (-a)
Help command:
msort
Description:
Runs molecules through the Lilly medchem rules, skip to next molecule upon crossing a threshold or instant kill rule
Author/owner: C3/Eli Lilly and Co
Sample 1:
tp_first_pass -C 20 -i smi -o smi -a -L bad0 -S ok0 pubchem_example.smi
Explanation:
Filter the input SMILES file (-i smi) pubchem_example.smi with maximum atom count 20 (-C 20). Write molecules to the SMILES file (-o smi) bad0.smi (-L bad0) if the molecule atom count is larger than 20, otherwise write the molecules to ok0.smi (-S ok0)
Help command:
tp_first_pass
Description:
Converts a molecule to a query file
Author/owner: C3/Eli Lilly and Co
Sample 1:
mol2qry -M 'C1=CC=CC=C1CCCC' -S out
Explanation:
Convert the molecule 'C1=CC=CC=C1CCCC' (-M 'C1=CC=CC=C1CCCC') to the output query file out.qry (-S out)
Help command:
mol2qry
Description:
Chemical structure fragmentation tool that recursively cuts molecules into fragments
Author/owner: C3/Eli Lilly and Co
Sample 1:
molecular_scaffold -g all -t -c pubchem_example.smi
Explanation:
Recursively cuts molecules in pubchem_example.smi file into fragments with all standardisations (-g all), removing cis-trans bonds from input (-t) and all chirality from input molecules (-c)
Help command:
molecular_scaffold
Description:
Extracts a subset of atoms from set of molecules based on a single substructure
Author/owner: C3/Eli Lilly and Co
Sample 1:
molecule_subset -s c1ccccc1 pubchem_example.smi
Explanation:
List a subset of molecules from pubchem_example.smi which contain the substructure c1ccccc1 (-s c1ccccc1)
Help command:
molecule_subset
Description:
Identifies substituents from substructure matched molecule
Author/owner: C3/Eli Lilly and Co
Sample 1:
rgroup -s c1ccccc1 pubchem_example.smi
Explanation:
Identifies substituents for the molecules matching substructure c1ccccc1 (-s c1ccccc1)
Help command:
rgroup
Description:
Extracts rings from molecules
Author/owner: C3/Eli Lilly and Co
Sample 1:
ring_extraction pubchem_example.smi
Explanation:
Extracts rings from molecules in the pubchem_example.smi
Help command:
ring_extraction
Description:
Exhaustively trim rings from ring systems, preserving aromaticity
Author/owner: C3/Eli Lilly and Co
Sample 1:
ring_trimming -u -c -w parent -w rings -w scaffold -m 1 -J 2 -j 1 pubchem_example.smi
Explanation:
Exhaustively trim rings from pubchem_example.smi, using only unique structures from each input molecule (-u), removing all chiral centres (-c), writing the parent (-w parent), writing isolated ring systems (-w rings), writing the scaffold (-w scaffold), removing at most 1 ring from a ring system (-m 1), placing isotope 2 where the ring joins are broken (-J 2) and isotope 1 where the scaffold joined the rest of the molecule (-j 1).
Help command:
ring_trimming
Description:
Filter molecules according to sp3 content
Author/owner: C3/Eli Lilly and Co
Sample 1:
sp3_filter -c 2 -x 2 -U out pubchem_example.smi
Explanation:
Filter molecules in pubchem_example.smi with a minimum of 2 carbon sp3 atoms (-c 2) and a minimum of 2 non-carbon sp3 atoms (-x 2). The rejected molecules are written into the out.smi file (-U out)
Help command:
sp3_filter
Description:
Enumerate tautomeric forms for molecules
Author/owner: C3/Eli Lilly and Co
Sample 1:
tautomer_generation pubchem_example.smi
Explanation:
Enumerate tautomeric forms for molecules in the pubchem_example.smi
Help command:
tautomer_generation
Description:
Generate new smiles from a set of input molecules by random string operations
Author/owner: C3/Eli Lilly and Co
Sample 1:
smiles_mutation -N 50000 -n 20 -p 5 -c 15 -C 40 pubchem_example_short.smi
Explanation:
Generate new smiles from the molecules in pubchem_example_short.smi, running 50000 iterations (-N 50000), completing a refresh from the initial smiles every 20 iterations (-n 20), generating 5 random replicates of each starting molecule (-p 5), with a minimum of 15 atoms (-c 15) and a maximum of 40 atoms (-C 40) in generated molecules
Help command:
smiles_mutation
Description:
Search substructure over reactions
Author/owner: C3/Eli Lilly and Co
Sample 1:
rxn_substructure_search -q 'C(F)(F)F>>' -m found sample_reactions.rsmi
Explanation:
Search for the substructure 'C(F)(F)F>>' (-q 'C(F)(F)F>>') in the sample_reactions.rsmi file, allowing a match anywhere in reagents/agents/products (-b), and save the matched results to the found.rxnsmi file (-m found)
Help command:
rxn_substructure_search
Description:
Group identical molecules (including isomers) with varying activity values
Author/owner: C3/Eli Lilly and Co
Sample 1:
activity_consistency -a -l -e 2 -X pubchem_in.act pubchem.smi
Explanation:
Group identical molecules in pubchem.smi using experimental data in pubchem_in.act (-X pubchem_in.act), using the activity data from column 2 of the pubchem_in.act file (-e 2), reducing to graph form (-a) and reducing to largest fragment (-l).
Help command:
activity_consistency
Description:
Converts an integer descriptor file to fingerprints, either of type fixed 0/1 or non colliding counted
Author/owner: C3/Eli Lilly and Co
Sample 1:
descriptor_file_to_01_fingerprints -F NCFP -S pubchem.smi cleaned_descriptor.txt
Explanation:
Create the fingerprints for the smiles in the pubchem.smi (-S pubchem.smi), using descriptors in the cleaned_descriptor.txt and tag string NCFP (-F NCFP)
Help command:
descriptor_file_to_01_fingerprints
Description:
Converts descriptors to a sparse fingerprint
Author/owner: C3/Eli Lilly and Co
Sample 1:
descriptors_to_fingerprint -S pubchem.smi -D w_natoms:10,40,1 -D w_nelem:1,8,1 pubchem.w
Explanation:
Compute fingerprints for the smiles in pubchem.smi (-S pubchem.smi), using the descriptor corresponding to number of heavy atoms w_natoms (-D w_natoms:10,40,1) and the descriptor corresponding to number of elements w_nelem (-D w_nelem:1,8,1) in the tabular descriptor file pubchem.w. The minimum value for the w_natoms descriptor is 10, the maximum value is 40, and the incremental unit between the minimum and maximum is 1. The minimum value for the w_nelem descriptor is 1, the maximum value is 8, and the incremental unit between the minimum and maximum is 1.
Help command:
descriptors_to_fingerprint
Description:
Computes the distance matrix for a pool of gfp fingerprints; generates a human-readable ascii matrix
Author/owner: C3/Eli Lilly and Co
Sample 1:
gfp_distance_matrix pubchem.gfp
Explanation:
Compute the distance matrix for the gfp fingerprint pubchem.gfp
Help command:
gfp_distance_matrix
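What a fingerprint distance matrix contains can be shown with a small sketch. It is not the tool's implementation: fingerprints are modelled as Python sets of "on" bit numbers, the distance is assumed to be 1 minus the Tanimoto coefficient, and the function names are hypothetical.

```python
def tanimoto_distance(a, b):
    """1 - |A and B| / |A or B| for two sets of bit numbers."""
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen here: two empty fingerprints are identical
    return 1.0 - len(a & b) / union


def distance_matrix(fingerprints):
    """Symmetric matrix of pairwise distances with zeros on the diagonal."""
    n = len(fingerprints)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = tanimoto_distance(fingerprints[i], fingerprints[j])
            matrix[i][j] = matrix[j][i] = d
    return matrix
```

Because the matrix is symmetric, only the upper triangle is computed; the cost still grows quadratically with the number of fingerprints.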
Description:
Performs clustering with leader (sphere exclusion) algorithm on gfp descriptors
Author/owner: C3/Eli Lilly and Co
Sample 1:
gfp_leader_v2 -t 0.3 pubchem.gfp
Explanation:
Compute clustering with leader algorithm on pubchem.gfp with distance threshold 0.3 (-t 0.3)
Help command:
gfp_leader_v2
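The leader (sphere exclusion) idea can be sketched independently of the fingerprint format. This is a minimal illustration, not the tool's code: `leader_cluster` is a hypothetical name, `dist` is any caller-supplied distance function, the first unassigned item is taken as the next leader, and an item joins a cluster when its distance to the leader is at or below the threshold.

```python
def leader_cluster(items, threshold, dist):
    """Return a list of (leader_index, member_indices) clusters."""
    unassigned = list(range(len(items)))
    clusters = []
    while unassigned:
        leader = unassigned[0]
        # Everything still unassigned and within the threshold joins the leader.
        members = [i for i in unassigned
                   if dist(items[leader], items[i]) <= threshold]
        clusters.append((leader, members))
        taken = set(members)
        unassigned = [i for i in unassigned if i not in taken]
    return clusters
```

The result depends on input order, since the order determines which items become leaders.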
Description:
Finds near neighbours in a set of fingerprints
Author/owner: C3/Eli Lilly and Co
Sample 1:
gfp_nearneighbours -p pubchem.gfp -n 2 -T 0.5 pubchem_10.gfp
Explanation:
Find near neighbours for the fingerprints in pubchem_10.gfp; compare against fingerprints in haystack fingerprint set pubchem.gfp (-p pubchem.gfp), search for 2 neighbours for each descriptor (-n 2) and discard distance greater than 0.5 (-T 0.5)
Help command:
gfp_nearneighbours
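The needle-versus-haystack search can be sketched as a brute-force scan. This is an illustration only: `nearest_neighbours` is a hypothetical name, `dist` is a caller-supplied distance function, and the sketch simply keeps the `n` closest haystack items whose distance does not exceed `max_distance`.

```python
def nearest_neighbours(needles, haystack, dist, n, max_distance):
    """For each needle, return up to n (haystack_index, distance) pairs,
    closest first, ignoring anything further away than max_distance."""
    results = []
    for needle in needles:
        scored = sorted((dist(needle, h), j) for j, h in enumerate(haystack))
        close = [(j, d) for d, j in scored if d <= max_distance]
        results.append(close[:n])
    return results
```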
Description:
Finds the single linkage in the gfp fingerprint set
Author/owner: C3/Eli Lilly and Co
Sample 1:
gfp_single_linkage -t 0.25 pubchem.gfp
Explanation:
Compute clustering with single linkage algorithm on pubchem.gfp with distance 0.25 (-t 0.25) as the threshold value for grouping
Help command:
gfp_single_linkage
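Single linkage grouping at a fixed threshold is equivalent to finding connected components of the graph that links every pair closer than the threshold. A minimal union-find sketch follows; the function name `single_linkage` is hypothetical, `dist` is caller-supplied, and this is not the tool's implementation.

```python
def single_linkage(items, threshold, dist):
    """Group item indices so that any two items within `threshold`
    of each other (directly or through a chain) share a cluster."""
    parent = list(range(len(items)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if dist(items[i], items[j]) <= threshold:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[max(ri, rj)] = min(ri, rj)

    clusters = {}
    for i in range(len(items)):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())
```

Unlike leader clustering, two members of the same cluster can be far apart as long as a chain of close neighbours connects them.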
Description:
Converts non-colliding fingerprints to fixed counted, or binary forms
Author/owner: C3/Eli Lilly and Co
Sample 1:
gfp_sparse_to_fixed -F NCSELW pubchem.gfp
Explanation:
Convert non-colliding fingerprints NCSELW (-F NCSELW) in pubchem.gfp to binary form
Help command:
gfp_sparse_to_fixed
Description:
Filters molecules according to how close they are to members of a comparison pool
Author/owner: C3/Eli Lilly and Co
Sample 1:
gfp_distance_filter -p pubchem_10.gfp -t 0.7 -n 3 -U U1 pubchem.gfp
Explanation:
Compare fingerprints in pubchem.gfp to pubchem_10.gfp. The minimum required distance between molecules is 0.7 (-t 0.7). Any molecule in pubchem.gfp will be rejected if it violates the minimum distance requirement at least 3 times (-n 3). The rejected molecules will be saved into the U1 file (-U U1)
Help command:
gfp_distance_filter
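The rejection rule described above can be sketched as a count of pool members that lie too close to each candidate. This is an illustration under stated assumptions, not the tool's code: `distance_filter` is a hypothetical name and `dist` is a caller-supplied distance function.

```python
def distance_filter(candidates, pool, dist, min_distance, max_violations):
    """Split candidates into (kept, rejected).

    A candidate is rejected when at least `max_violations` pool members
    are closer to it than `min_distance`.
    """
    kept, rejected = [], []
    for candidate in candidates:
        violations = sum(1 for p in pool if dist(candidate, p) < min_distance)
        if violations >= max_violations:
            rejected.append(candidate)
        else:
            kept.append(candidate)
    return kept, rejected
```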
Description:
Calculates pairwise distance between molecules
Author/owner: C3/Eli Lilly and Co
Sample 1:
gfp_pairwise_distances -p pubchem.gfp -T 0.5 100PairsId
Explanation:
Calculate the pairwise distance between the molecules in pubchem.gfp (-p pubchem.gfp). The distance will not be reported if it is larger than the threshold value 0.5 (-T 0.5). The required pairs for calculation are listed in the 100PairsId file.
Help command:
gfp_pairwise_distances
Description:
Converts fingerprints to descriptors
Author/owner: C3/Eli Lilly and Co
Sample 1:
gfp_to_descriptors -f -F FPDSC pubchem.gfp
Explanation:
Convert the fixed width fingerprint (-f) pubchem.gfp into descriptor format; use fingerprint type FPDSC (-F FPDSC) in fingerprint file.
Help command:
gfp_to_descriptors
Description:
Finds near neighbours within a set of fingerprints
Author/owner: C3/Eli Lilly and Co
Sample 1:
gfp_nearneighbours_single_file -p -z -T 0.2 pubchem.gfp -F FPDSC,w=0.2 -F NCSELW,nc,w=0.8 -V a=0.03 -V b=1.7
Explanation:
Finds near neighbours in the pubchem.gfp file. Writes all pair-wise distances in 3 columns (-p); discards any molecule without neighbours (-z) and any distance greater than 0.2 (-T 0.2). Assign weight 0.2 to fingerprint type FPDSC (-F FPDSC,w=0.2), and weight 0.8 to the non-colliding fingerprint with tag NCSELW (-F NCSELW,nc,w=0.8) for distance calculation. Use Tversky asymmetric similarity with parameter a set to 0.03 (-V a=0.03) and b set to 1.7 (-V b=1.7)
Help command:
gfp_nearneighbours_single_file
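The Tversky parameters and fingerprint weights mentioned above can be illustrated with the textbook formulas. This sketch is an assumption about the general technique, not a description of how the tool combines fingerprints: fingerprints are modelled as sets of bit numbers, and the function names are hypothetical.

```python
def tversky_similarity(a, b, alpha, beta):
    """|A and B| / (|A and B| + alpha * |A - B| + beta * |B - A|)."""
    common = len(a & b)
    denominator = common + alpha * len(a - b) + beta * len(b - a)
    if denominator == 0:
        return 1.0  # convention chosen here for two empty fingerprints
    return common / denominator


def weighted_distance(distances, weights):
    """Combine per-fingerprint distances as a weighted sum."""
    return sum(w * d for d, w in zip(distances, weights))
```

With `alpha == beta == 1` the formula reduces to the Tanimoto coefficient; unequal values make the measure asymmetric, so swapping the two fingerprints changes the result.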
Description:
Finds near neighbours of compounds supplied in gfp fingerprint format; can handle LARGE numbers of compounds
Author/owner: C3/Eli Lilly and Co
Sample 1:
gfp_lnearneighbours -F FPDSC,w=0.3 -F NCSELW,nc,w=0.7 -T 0.4 -n 2 -h -p pubchem_needles.gfp pubchem_haystack.gfp
Explanation:
Finds 2 near neighbours (-n 2) for each fingerprint in the needle file pubchem_needles.gfp against haystack file pubchem_haystack.gfp (-p). Discards neighbours with zero distance and the same ID as the target (-h) or with a distance larger than 0.4 (-T 0.4). The fingerprint of type FPDSC has weight 0.3 (-F FPDSC,w=0.3), and the non-colliding fingerprint with tag NCSELW has weight 0.7 in the distance calculation (-F NCSELW,nc,w=0.7)
Help command:
gfp_lnearneighbours
Description:
Adds descriptors to the gfp file with matching ID
Author/owner: C3/Eli Lilly and Co
Sample 1:
gfp_add_descriptors -D FPADD pubchem.gfp cleaned_descriptor.txt
Explanation:
Add the descriptor cleaned_descriptor.txt to the gfp file pubchem.gfp; use tag FPADD (-D FPADD)
Help command:
gfp_add_descriptors
Description:
Scans a fingerprint file and computes average activity associated with each bit; an activity file needs to be provided
Author/owner: C3/Eli Lilly and Co
Sample 1:
gfp_profile_activity_by_bits -E activityFile.txt pubchem.gfp
Explanation:
Compute the average activity from file activityFile.txt (-E activityFile.txt) for fingerprints in the pubchem.gfp.
Help command:
gfp_profile_activity_by_bits
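Averaging activity per bit can be sketched as follows. This is a simplified illustration, not the tool's implementation: fingerprints are modelled as a mapping from molecule ID to a set of "on" bits, activities as a mapping from ID to a number, and `average_activity_by_bit` is a hypothetical name.

```python
from collections import defaultdict


def average_activity_by_bit(fingerprints, activities):
    """Return {bit: mean activity of the molecules that set that bit}.

    Molecules without an activity value are skipped.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for name, bits in fingerprints.items():
        if name not in activities:
            continue
        for bit in bits:
            totals[bit] += activities[name]
            counts[bit] += 1
    return {bit: totals[bit] / counts[bit] for bit in totals}
```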
Description:
Calculates the spread distance of fingerprints against target fingerprint set. Sort output by spread distance.
Author/owner: C3/Eli Lilly and Co
Sample 1:
gfp_spread_v2 -A pubchem_10.gfp pubchem.gfp
Explanation:
Compute the spread distance of fingerprints in pubchem.gfp. Bias away from fingerprints in pubchem_10.gfp (-A pubchem_10.gfp)
Help command:
gfp_spread_v2
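One common way to compute a spread ordering is max-min selection: repeatedly pick the item whose distance to everything already chosen is largest. The sketch below assumes that this is what "spread distance" refers to, which is not confirmed by the text above; `spread` is a hypothetical name, `dist` is caller-supplied, and `avoid` plays the role of a set of fingerprints to bias away from.

```python
def spread(items, dist, avoid=()):
    """Return (index, distance) pairs in max-min selection order.

    `distance` is the distance from the picked item to the nearest item
    that was already picked or listed in `avoid` (infinity if none).
    """
    nearest = [min((dist(item, a) for a in avoid), default=float("inf"))
               for item in items]
    remaining = set(range(len(items)))
    order = []
    while remaining:
        # Largest nearest-distance wins; ties go to the lowest index.
        pick = max(remaining, key=lambda i: (nearest[i], -i))
        order.append((pick, nearest[pick]))
        remaining.remove(pick)
        for i in remaining:
            nearest[i] = min(nearest[i], dist(items[i], items[pick]))
    return order
```

The reported distances never increase along the ordering, so truncating the list at any point gives a diverse subset.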
Description:
Calculates the spread distance of fingerprints against target fingerprint set while considering a bucketized variable like activity/property
Author/owner: C3/Eli Lilly and Co
Sample 1:
gfp_spread_buckets_v2 -B NATOMS pubchem_with_natom.gfp
Explanation:
Compute the spread distance of the fingerprints in pubchem_with_natom.gfp while considering the variable NATOMS (-B NATOMS) also included in the fingerprint file.
Help command:
gfp_spread_buckets_v2
Description:
Processes the output of several gfp nearneighbour-based tools into human readable form including SMILES format
Author/owner: C3/Eli Lilly and Co
Sample 1:
nplotnn -L def leader_result_raw.txt
Explanation:
Reformat the output of leader clustering to tabular (-L tbl)
Help command:
nplotnn
Description:
Sorts fields of a TDT file or stream according to specific tag, properties, or the degree of each node
Author/owner: C3/Eli Lilly and Co
Sample 1:
tdt_sort -T FPADD,col=4 -r unsorted.gfp
Explanation:
Sort the file unsorted.gfp in reverse order (-r) based on the 4th column of FPADD field (-T FPADD,col=4)
Help command:
tdt_sort
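Sorting TDT-like records by a column of a tag can be sketched in a few lines. The record layout used here is an assumption based on the Daylight TDT convention (one `TAG<value>` item per line, each record terminated by a line containing only `|`); the function names are hypothetical and this is not the tool's implementation.

```python
import re

TAG_LINE = re.compile(r"^([^<]+)<(.*)>$")


def parse_tdt(text):
    """Parse TDT-like text into a list of {tag: value} dictionaries."""
    records, current = [], {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "|":  # end of record
            if current:
                records.append(current)
            current = {}
            continue
        match = TAG_LINE.match(line)
        if match is None:
            raise ValueError("not a TAG<value> line: " + line)
        current[match.group(1)] = match.group(2)
    if current:  # tolerate a missing final terminator
        records.append(current)
    return records


def sort_by_tag_column(records, tag, column, reverse=False):
    """Sort records numerically on a 1-based, space-separated column of `tag`."""
    return sorted(records,
                  key=lambda r: float(r[tag].split()[column - 1]),
                  reverse=reverse)
```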
Description:
Joins two TDT streams, possibly with different identifiers
Author/owner: C3/Eli Lilly and Co
Sample 1:
tdt_join.sh -d part1.gfp part2.gfp
Explanation:
Join fingerprint file part1.gfp with fingerprint file part2.gfp and eliminate duplicate tags from second file (-d)
Help command:
tdt_join