G-Huang edited this page Apr 11, 2019 · 22 revisions

Instruction for LillyMol

Glossary

SMILES: a simple ascii string-based method for representing molecules and reactions (see http://www.daylight.com/dayhtml/doc/theory/theory.smiles.html). Note that a single molecule can be represented by multiple SMILES strings.

SMARTS: a simple ascii string-based method for representing molecular substructures; an extension of SMILES (see http://www.daylight.com/dayhtml/doc/theory/theory.smarts.html)

SMILES file: A text file containing SMILES strings; each SMILES is the first element of each line/row. Traditionally, the second column is the identifier (ID) of the molecule. Additional columns may exist. Columns are separated by a space or a tab, with space being the standard for LillyMol. Typical file extension: ‘.smi’

SMARTS file: A text file containing SMARTS strings; each SMARTS is the first element of each line/row. Traditionally, the second column is the identifier (ID) of the substructure. Additional columns may exist. Columns are separated by a space or a tab, with space being the standard for LillyMol. Typical file extension: ‘.smt’

SDF or SD file: A simple, ascii connection table-based method for representing molecules and substructures (see https://en.wikipedia.org/wiki/Chemical_table_file). Typical file extension: ‘.sdf’

RDF or RD file: A simple, ascii connection table-based method for representing chemical reactions (see http://c4.cabrillo.edu/404/ctfile.pdf). Typical file extension: ‘.rdf’

Chemical substructure: A contiguous chemical fragment; may not be a valid molecule.

Substructure search: The process of searching for the presence of a chemical substructure in molecules.

Canonical SMILES: A special, unique SMILES representation for a specific chemical structure.

Structure clean-up: Common chemoinformatics process that ‘cleans’ a structure representation from salts, fragments, etc and checks the structure representation for simple errors e.g. syntax, valence, etc.

GFP (or gfp): Generalized FingerPrint format commonly used by LillyMol tools. A TDT-like format (inspired by TDT - Thor Data Tree format introduced by Daylight Chemical Information Systems; see http://www.daylight.com/meetings/summerschool01/course/basics/tdt.html)

Reaction SMILES: a simple ascii string-based method for representing chemical reactions using SMILES strings.

Reaction SMILES file: A text file containing reaction SMILES strings; each reaction SMILES is the first element of each line/row. Traditionally, the second column is the identifier (ID) of the reaction. Additional columns may exist. Columns are separated by a space or a tab, with space being the standard for LillyMol. Typical file extension: ‘.rsmi’

Reaction signature: The unique SMILES-like string representing the extended reaction core of a chemical reaction.
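The SMILES file layout defined in this glossary (structure in the first column, identifier in the second, optional extra columns, space or tab as separator) can be illustrated with a short Python sketch. This is not LillyMol code; the names SmilesRecord and parse_smiles_line are hypothetical and exist only for illustration.

```python
from typing import NamedTuple


class SmilesRecord(NamedTuple):
    smiles: str
    identifier: str
    extra: list[str]


def parse_smiles_line(line: str) -> SmilesRecord:
    """Split one line of a SMILES file into SMILES, ID, and extra columns."""
    # str.split() with no argument splits on any run of spaces or tabs.
    fields = line.split()
    if not fields:
        raise ValueError("empty line")
    # The identifier is optional; use an empty string when it is absent.
    identifier = fields[1] if len(fields) > 1 else ""
    return SmilesRecord(fields[0], identifier, fields[2:])


record = parse_smiles_line("O=C(C)C=CC=C(C)CCC=C(C)C PBCHM1756999 24 3")
print(record.smiles, record.identifier, record.extra)
```

The same splitting applies to SMARTS and reaction SMILES files, since they share the column layout.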

References/Resources

https://en.wikipedia.org/wiki/Chemical_file_format

http://c4.cabrillo.edu/404/ctfile.pdf

Tool Documentation

1. common_names

Description:

Merge identical chemical structures to one common name in a SMILES file (also see unique_molecules tool). Useful for identifying unique chemical structures in a SMILES file.

Author/owner: C3/Eli Lilly and Co

Sample 1:

common_names -S ./output -s 10000 -r 10000 -D + -v input1.smi input2.smi

Explanation:

Find all compounds in input1.smi and input2.smi with common structure and different name; write them to file output.smi with new name consisting of old names separated by a "+" symbol; maximum number of molecules to process is 10000 (-s 10000); report progress every 10000 rows (-r 10000)

Shell output:

… output to './output'

File output: (output.smi)

Combined compound list

Help command:

common_names
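The name-merging idea behind the sample can be sketched in Python. This is a conceptual illustration only, not the tool's implementation: it assumes each structure is already given as a canonical (unique) SMILES so that identical structures compare equal as strings, and merge_common_names is a hypothetical helper name.

```python
def merge_common_names(records, separator="+"):
    """Merge names of records that share the same structure key.

    `records` is an iterable of (structure_key, name) pairs. The key is
    assumed to already be a canonical (unique) SMILES, so identical
    structures compare equal as plain strings.
    """
    merged = {}  # dict preserves first-seen order of structures
    for key, name in records:
        merged.setdefault(key, []).append(name)
    return [(key, separator.join(names)) for key, names in merged.items()]


records = [("CCO", "id1"), ("c1ccccc1", "id2"), ("CCO", "id3")]
print(merge_common_names(records))
```

The separator argument plays the role of the -D + option in the sample command.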

2. fetch_smiles_quick

Description:

Fetches records from one file based on identifiers in another file

Author/Owner: C3/Eli Lilly and Co

Sample 1:

fetch_smiles_quick -j -c 1 -C 2 -X notInRecord -Y notInIdentifier record.w structures.smi

Explanation:

Fetches records from the record.w file for the structures in structures.smi based on common identifiers (column 1 in record.w, -c 1; column 2 in structures.smi, -C 2). The matched records will be displayed in the shell window. The list of unmatched identifiers will be saved in the notInRecord file (-X notInRecord). The list of unmatched records will be saved in the notInIdentifier file (-Y notInIdentifier). The generated identifier file is a descriptor file without a header record (-j).

Shell output: (matched record)

O=C(C)C=CC=C(C)CCC=C(C)C PBCHM1756999 24 3 6 349.4 5 0.2083 10

File output: (notInIdentifier)

Unmatched records

File output: (notInRecord)

Unmatched identifiers

Help command:

fetch_smiles_quick
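The identifier join described in the explanation can be sketched in Python as follows. This is a rough model of the behaviour, not the tool's code: fetch_by_identifier is a hypothetical name, the inputs are lists of whitespace-delimited lines, and the output column order (structure columns first, then the remaining record columns) is an assumption based on the sample shell output.

```python
def fetch_by_identifier(record_lines, smiles_lines, record_col=1, smiles_col=2):
    """Join two whitespace-delimited files on an identifier column.

    Column numbers are 1-based, mirroring -c 1 and -C 2 in the sample.
    Returns (matched, ids_not_in_records, records_not_in_ids).
    """
    records = {}
    for line in record_lines:
        fields = line.split()
        records[fields[record_col - 1]] = fields

    matched, ids_not_in_records, seen = [], [], set()
    for line in smiles_lines:
        fields = line.split()
        identifier = fields[smiles_col - 1]
        if identifier in records:
            seen.add(identifier)
            # Structure columns first, then the record minus its ID column.
            rest = [f for i, f in enumerate(records[identifier])
                    if i != record_col - 1]
            matched.append(" ".join(fields + rest))
        else:
            ids_not_in_records.append(identifier)

    records_not_in_ids = [key for key in records if key not in seen]
    return matched, ids_not_in_records, records_not_in_ids


matched, no_record, no_identifier = fetch_by_identifier(
    ["A 24 3", "B 10 1"],  # stands in for record.w
    ["CCO A", "CCN C"],    # stands in for structures.smi
)
print(matched, no_record, no_identifier)
```

Here the two returned lists of leftovers correspond to the contents of the notInRecord and notInIdentifier files.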

3. unique_molecules

Description:

Filters out duplicate chemical structures based on unique smiles

Author/owner: C3/Eli Lilly and Co

Sample 1:

unique_molecules -S unique -D duplicate -v -l input.smi

Explanation:

Traverse structures in input.smi and identify duplicate structures; write duplicates in duplicate.smi; write unique in unique.smi; only consider largest fragment of each smiles (-l)

Shell output:

Execution summary

File output: (duplicate.smi)

Duplicate molecule list

File output: (unique.smi)

Unique molecule list

Help command:

unique_molecules

4. unique_rows

Description:

Identifies the unique rows in a file

Author/Owner: C3/Eli Lilly and Co

Sample 1:

unique_rows -c 1 -c 2 input.dat

Explanation:

Check input.dat file for unique rows based on values in column 1 and 2. The unique rows will be displayed in the shell window

Shell output:

Unique row list

Help command:

unique_rows
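A minimal Python sketch of the same idea, keeping the first row seen for each distinct combination of key columns. Whether the tool itself keeps the first or a later occurrence is not stated above, so that choice is an assumption, and unique_rows here is just an illustrative function.

```python
def unique_rows(lines, columns=(1, 2)):
    """Yield the first line seen for each distinct combination of key columns."""
    seen = set()
    for line in lines:
        fields = line.split()
        # Column numbers are 1-based, as in -c 1 -c 2.
        key = tuple(fields[c - 1] for c in columns)
        if key not in seen:
            seen.add(key)
            yield line


rows = ["a 1 x", "a 1 y", "a 2 x", "b 1 x"]
print(list(unique_rows(rows)))
```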

5. iwcut

Description:

Extract columns from a text file

Author/Owner: C3/Eli Lilly and Co

Sample 1:

iwcut -f 5,3 input.txt

Explanation:

Extract column 5 and column 3 from input.txt file. The extracted columns data will be displayed in the shell window

Shell output:

Data from column 5 and column 3

Help command:

iwcut
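The column extraction can be sketched in Python. The sketch emits the columns in the order requested (5 then 3), which is how the explanation above reads; treat that ordering as an assumption about the tool. cut_columns is a hypothetical helper.

```python
def cut_columns(line, columns=(5, 3)):
    """Return the requested 1-based columns of a whitespace-delimited line."""
    fields = line.split()
    return " ".join(fields[c - 1] for c in columns)


print(cut_columns("a b c d e"))
```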

6. fileconv

Description:

Structure file utility to clean up SMILES files and filter on specific criteria. It can also be used to convert between chemical file formats, e.g. from SDF to SMILES.

Author/owner: C3/Eli Lilly and Co

Sample 1:

fileconv -Y dbg -B 100 -S -a input.smi

Explanation:

Debug/print each molecule structure in input.smi; ignore as many as 100 fatal input errors

Shell output:

Molecule information

Sample 2:

fileconv -F 6 -c 4 -C 14 -v -i smi -S selection list.smi

Explanation:

Select the molecules in list.smi that have between 4 and 14 atoms (-c 4 -C 14) and fewer than 6 fragments (-F 6); store the results in selection.smi (-S selection)

Shell output:

Execution summary

File output: (selection.smi)

List of molecules meeting the search criteria

Sample 3:

fileconv -o sdf -i smi -S single single.smi

Explanation:

Convert single.smi file to the sdf format single.sdf

File output: (single.sdf)

Converted sdf file

Help command:

fileconv

7. rxn_signature

Description:

Generates reaction signatures for input reactions.

Author/Owner: C3/Eli Lilly and Co

Sample 1:

rxn_signature -v -r 0,1,2 -C Cfile -F Ffile all.rsmi >all.sig 2>all.log

Explanation:

Extract the reaction signatures of all reactions in all.rsmi. The program prints signatures to stdout, which is redirected to all.sig here. Signatures are generated at radii 0, 1, and 2 from the reaction core, i.e. the changing atoms (-r 0,1,2). The lists of changed atoms are written to Cfile (-C Cfile). Failed reactions are written to Ffile (-F Ffile).

Notes:

Reaction signatures capture the extended core of a reaction around the atoms that change in a reaction. A signature is based on the unique smiles of the reaction core. The smiles includes atoms colored by their environment in the original reaction smiles. In addition, information about the ring bond status in the original reaction smiles is appended to the reaction signature produced.

Help command:

rxn_signature

8. rxn_standardize

Description:

Checks and standardizes input chemical reactions; converts to a reaction smiles file format

Author/Owner: C3/Eli Lilly and Co

Sample 1:

rxn_standardize -s -c -D x -X igbad -v -C 60 -K -E autocreate -e -o -b -f gsub input.rsmi > output.rsmi

Explanation:

Check and standardize reactions in an input reactions smiles (.rsmi) file. Discard chirality on input (-c). Discard reactions containing duplicate atom map numbers (-D x). Ignore bad reactions (-X igbad). Discard any reaction where the largest reactant has more than 60 atoms (-C 60). Kekule fix (-K). Automatically create new elements when encountered (-E autocreate). Move small fragments that show up on products to orphan status (-e). Create reagent fragments that are orphans (-o). Remove duplicate reactants, even if atom maps scrambled (-b). Replace unusual characters in reaction names with _ (-f gsub).

Notes:

Input file can be in RDF or rsmi format. Output is in rsmi format

Help command:

rxn_standardize

9. tsubstructure

Description:

Perform 2D substructure searches with SMILES/SMARTS against SMILES files

Author/owner: C3/Eli Lilly and Co

Sample 1:

tsubstructure -s 'C(C)(=O)C' -m hits.smi -n nonhits.smi list.smi

Explanation:

Search for molecules in list.smi containing the defined smarts (-s); write hits to hits.smi (-m) and non-hits to nonhits.smi (-n)

Sample 2:

tsubstructure.sh -f -b -A D -o smi -m hits.smi -s 'C(C)(=O)C' list.smi

Explanation:

Search for molecules containing defined smarts (-s); only find one embedding of the query (-f); for each molecule, break after finding a query which matches (-b); use daylight aromaticity (-A D); write hits in hits.smi

Note:

Use -X to successfully skip structures with unconventional symbols, e.g. X, R, ...

Sample 3:

tsubstructure -s '[ND1H2]-[C@H]1CCN2CCCCC2C1' -A D -o usmi -m match.out list.smi

Explanation:

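Search for molecules in list.smi containing the defined smarts (-s); use Daylight aromaticity (-A D); write the hits as unique smiles (-o usmi) using the output name match.out (-m match.out)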

Sample 4:

tsubstructure -A D -q carboxylic_acids.qry -u -M imp2exp -m match.smi list.smi

Explanation:

Find all matches to specific query file (-q) and place in match.smi (-m); use Daylight aromaticity; convert implicit hydrogen in target molecules to explicit before matching attempt (-M imp2exp); find unique matches only (-u)

Help command:

tsubstructure

10. retrosynthesis

Description:

Defines synthetic routes for input chemical structures by deconstructing input molecules into reactants using a set of known reaction templates. Conceptually, the inverse of chemical reaction synthesis as implemented by the trxn tool.

Author/Owner: C3/Eli Lilly and Co

Sample 1:

retrosynthesis -Y all -X kg -X kekule -X ersfrm -a 2 -q f -v -R 1 -I CentroidRxnSmi_1 -P UST:AZUCORS -M ncon -M ring -M unsat -M arom 10Cmpds.smi

Explanation:

Looks for synthesis paths for the molecules in 10Cmpds.smi using the reaction signatures in CentroidRxnSmi_1. Various standardization flags (-Y, -X, -q, -P, -M options). Require at least 2 heavy atoms in fragments (-a), verbose (-v), centroid radius 1 (-R).

Help command:

retrosynthesis

11. trxn

Description:

Performs reactions between reactant molecules to enumerate product structures. Uses a control reaction file, a scaffold SMILES file and zero or more reactant SMILES files. Conceptually inverse of retrosynthesis process as implemented by tool retrosynthesis.

Author/owner: C3/Eli Lilly and Co

Sample 1:

trxn -v -r 1.2.1_Aldehyde_reductive_amination_FROM_amines_AND_aldehydes.rxn -Z -z i -M RMX -m RMX -S 1.2.1_run 20180412_amines.smi 20180412_aldehydes.smi

Explanation:

Perform the reaction in 1.2.1_Aldehyde_reductive_amination_FROM_amines_AND_aldehydes.rxn on the input files 20180412_amines.smi and 20180412_aldehydes.smi, ignoring sidechains (-Z) and molecules (-z i) not reacting, ignoring sidechains with multiple substructure matches (-M RMX), and ignoring scaffolds that generate multiple substructure hits (-m RMX). Output is saved under the name 1.2.1_run (-S 1.2.1_run).

Sample 2:

trxn -v -r 2.1.2_Carboxylic_acid_+_amine_condensation_FROM_amines_AND_carboxylic_acids.rxn -Z -z i -M RMX -m RMX -S 2.1.2_run 20180412_amines.smi 20180412_carboxylic_acids.smi

Explanation:

Perform reaction in ./2.1.2_Carboxylic_acid_+_amine_condensation_FROM_amines_AND_carboxylic_acids.rxn

Help command:

trxn

12. iwdemerit

Description:

Computes demerit of a molecule. In this context demerits refers to non-desirable molecular structure characteristics/features.

Author/owner: C3/Eli Lilly and Co

Sample 1:

iwdemerit -A D -A I -S foo -G - -f 99999 -t -W imp2exp -W maxe=1 -E autocreate -q F:PAINS/queries_latest -O hard -W dnv=0 -W slist -i smi pubchem_example.smi

Explanation:

Compute the demerits for the molecules in pubchem_example.smi (-i smi pubchem_example.smi) using the queries_latest query file (-q F:PAINS/queries_latest). The good, non-rejected (-G) structures will be written into foo.demerit (-S foo). Use Daylight aromaticity definitions (-A D) and enable input of aromatic structures (-A I). Molecules are rejected when they have 99999 or more demerits (-f 99999). Append demerit text to molecule names (-t). Make implicit hydrogens explicit (-W imp2exp), set the maximum number of substructure queries to identify to 1 (-W maxe=1), use value 0 in the query file as the demerit score (-W dnv=0), and write a sorted list of demerit values and reasons (-W slist). Skip all the hard-coded substructure queries (-O hard).

Help command:

iwdemerit

13. random_smiles

Description:

Generate random smiles based on input smiles.

Author/owner: C3/Eli Lilly and Co

Sample 1:

random_smiles.sh -n 5 -a -e -A D -v pubchem_example.smi

Explanation:

Generate 5 new random smiles (-n 5) based on the smiles in pubchem_example.smi using Daylight aromaticity (-A D). Append the permutation number to the name (-a) and echo the initial molecule (-e)

Help command:

random_smiles

14. preferred_smiles

Description:

Write either unique smiles (if interpretable) or non aromatic unique form.

Author/owner: C3/Eli Lilly and Co

Sample 1:

preferred_smiles.sh pubchem_example.smi

Explanation:

Write unique smiles for the smiles in the pubchem_example.smi

Help command:

preferred_smiles

15. rotatable_bonds

Description:

Calculate the rotatable bonds in the molecules

Author/owner: C3/Eli Lilly and Co

Sample 1:

rotatable_bonds.sh pubchem_example.smi

Explanation:

Calculate the rotatable bonds for molecules in the pubchem_example.smi

Help command:

rotatable_bonds

16. concat_files

Description:

Concatenates descriptor files by joining on identifiers

Author/owner: C3/Eli Lilly and Co

Sample 1:

concat_files t1.1 t1.2

Explanation:

Concatenates the descriptor files t1.1 and t1.2 based on the identifier

Help command:

concat_files

17. msort

Description:

Sorts a molecule file by various criteria

Author/owner: C3/Eli Lilly and Co

Sample 1:

msort -a pubchem_example.smi

Explanation:

Sort the molecule file pubchem_example.smi based on the number of atoms the molecule has (-a)

Help command:

msort

18. tp_first_pass

Description:

Runs molecules through the Lilly medchem rules, skip to next molecule upon crossing a threshold or instant kill rule

Author/owner: C3/Eli Lilly and Co

Sample 1:

tp_first_pass -C 20 -i smi -o smi -a -L bad0 -S ok0 pubchem_example.smi

Explanation:

Filter the input SMILES file (-i smi) pubchem_example.smi with a maximum atom count of 20 (-C 20). Write molecules to the SMILES file (-o smi) bad0.smi (-L bad0) if the molecule atom count is larger than 20; otherwise write the molecules to ok0.smi (-S ok0)

Help command:

tp_first_pass

19. mol2qry

Description:

Converts a molecule to a query file

Author/owner: C3/Eli Lilly and Co

Sample 1:

mol2qry -M 'C1=CC=CC=C1CCCC' -S out

Explanation:

Convert the molecule 'C1=CC=CC=C1CCCC' (-M 'C1=CC=CC=C1CCCC') to the output query file out.qry (-S out)

Help command:

mol2qry

20. molecular_scaffold

Description:

Chemical structure fragmentation tool that recursively cuts molecules into fragments

Author/owner: C3/Eli Lilly and Co

Sample 1:

molecular_scaffold -g all -t -c pubchem_example.smi

Explanation:

Recursively cut the molecules in the pubchem_example.smi file into fragments, applying all standardisations (-g all) and removing cis-trans bonds (-t) and all chirality (-c) from the input molecules

Help command:

molecular_scaffold

21. molecule_subset

Description:

Extracts a subset of atoms from set of molecules based on a single substructure

Author/owner: C3/Eli Lilly and Co

Sample 1:

molecule_subset -s c1ccccc1 pubchem_example.smi

Explanation:

List a subset of molecules from pubchem_example.smi which contain the substructure c1ccccc1 (-s c1ccccc1)

Help command:

molecule_subset

22. rgroup

Description:

Identifies substituents from substructure-matched molecules

Author/owner: C3/Eli Lilly and Co

Sample 1:

rgroup -s c1ccccc1 pubchem_example.smi

Explanation:

Identifies substituents for the molecules matching substructure c1ccccc1 (-s c1ccccc1)

Help command:

rgroup

23. ring_extraction

Description:

Extracts rings from molecules

Author/owner: C3/Eli Lilly and Co

Sample 1:

ring_extraction pubchem_example.smi

Explanation:

Extracts rings from molecules in the pubchem_example.smi

Help command:

ring_extraction

24. ring_trimming

Description:

Exhaustively trim rings from ring systems, preserving aromaticity

Author/owner: C3/Eli Lilly and Co

Sample 1:

ring_trimming -u -c -w parent -w rings -w scaffold -m 1 -J 2 -j 1 pubchem_example.smi

Explanation:

Exhaustively trim rings from the molecules in pubchem_example.smi, using only unique structures from each input molecule (-u), removing all chiral centres (-c), writing the parent (-w parent), writing isolated ring systems (-w rings), writing the scaffold (-w scaffold), removing at most 1 ring from a ring system (-m 1), placing isotope 2 where the ring joins are broken (-J 2), and placing isotope 1 where the scaffold joined the rest of the molecule (-j 1).

Help command:

ring_trimming

25. sp3_filter

Description:

Filter molecules according to sp3 content

Author/owner: C3/Eli Lilly and Co

Sample 1:

sp3_filter -c 2 -x 2 -U out pubchem_example.smi

Explanation:

Filter molecules in pubchem_example.smi, requiring a minimum of 2 carbon sp3 atoms (-c 2) and a minimum of 2 non-carbon sp3 atoms (-x 2). The rejected molecules are written to the out.smi file (-U out)

Help command:

sp3_filter

26. tautomer_generation

Description:

Enumerate tautomeric forms for molecules

Author/owner: C3/Eli Lilly and Co

Sample 1:

tautomer_generation pubchem_example.smi

Explanation:

Enumerate tautomeric forms for molecules in the pubchem_example.smi

Help command:

tautomer_generation

27. smiles_mutation

Description:

Generate new smiles from a set of input molecules by random string operations

Author/owner: C3/Eli Lilly and Co

Sample 1:

smiles_mutation -N 50000 -n 20 -p 5 -c 15 -C 40 pubchem_example_short.smi

Explanation:

Generate new smiles from the molecules in pubchem_example_short.smi, running 50000 iterations (-N 50000), doing a complete refresh from the initial smiles every 20 iterations (-n 20), and generating 5 random replicates of each starting molecule (-p 5). The minimum number of atoms in generated molecules is 15 (-c 15) and the maximum number of atoms in generated molecules is 40 (-C 40)

Help command:

smiles_mutation

28. rxn_substructure_search

Description:

Search substructure over reactions

Author/owner: C3/Eli Lilly and Co

Sample 1:

rxn_substructure_search -q 'C(F)(F)F>>' -m found sample_reactions.rsmi

Explanation:

Search for the substructure 'C(F)(F)F>>' (-q 'C(F)(F)F>>') in the sample_reactions.rsmi file, saving the matched reactions to the found.rxnsmi file (-m found). The -b option, which is not part of this sample command, allows the query to match anywhere in reagents/agents/products.

Help command:

rxn_substructure_search

29. activity_consistency

Description:

Group identical molecules (including isomers) with varying activity values

Author/owner: C3/Eli Lilly and Co

Sample 1:

activity_consistency -a -l -e 2 -X pubchem_in.act pubchem.smi

Explanation:

Group identical molecules in pubchem.smi using experimental data in pubchem_in.act (-X pubchem_in.act), using the activity data from column 2 of the pubchem_in.act file (-e 2), reducing to graph form (-a), and reducing to the largest fragment (-l).

Help command:

activity_consistency

30. descriptor_file_to_01_fingerprints

Description:

Converts an integer descriptor file to fingerprints, either of type fixed 0/1 or non colliding counted

Author/owner: C3/Eli Lilly and Co

Sample 1:

descriptor_file_to_01_fingerprints -F NCFP -S pubchem.smi cleaned_descriptor.txt

Explanation:

Create the fingerprints for the smiles in the pubchem.smi (-S pubchem.smi), using descriptors in the cleaned_descriptor.txt and tag string NCFP (-F NCFP)

Help command:

descriptor_file_to_01_fingerprints

31. descriptors_to_fingerprint

Description:

Converts descriptors to a sparse fingerprint

Author/owner: C3/Eli Lilly and Co

Sample 1:

descriptors_to_fingerprint -S pubchem.smi -D w_natoms:10,40,1 -D w_nelem:1,8,1 pubchem.w

Explanation:

Compute fingerprints for the smiles in pubchem.smi (-S pubchem.smi), using the descriptor for the number of heavy atoms, w_natoms (-D w_natoms:10,40,1), and the descriptor for the number of elements, w_nelem (-D w_nelem:1,8,1), from the tabular descriptor file pubchem.w. For w_natoms, the minimum value is 10, the maximum value is 40, and the increment between the minimum and maximum is 1. For w_nelem, the minimum value is 1, the maximum value is 8, and the increment between the minimum and maximum is 1.

Help command:

descriptors_to_fingerprint

32. gfp_distance_matrix

Description:

Computes the distance matrix for a pool of gfp fingerprints; generates a human-readable ascii matrix

Author/owner: C3/Eli Lilly and Co

Sample 1:

gfp_distance_matrix pubchem.gfp

Explanation:

Compute the distance matrix for the gfp fingerprint pubchem.gfp

Help command:

gfp_distance_matrix
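To make the idea of a fingerprint distance matrix concrete, here is a small Python sketch. It assumes each fingerprint is a set of "on" bit positions and uses Tanimoto distance (1 minus the Tanimoto coefficient); the metric and weighting the tool actually applies depend on its options, and the function names are hypothetical.

```python
def tanimoto_distance(a, b):
    """1 - |a & b| / |a | b| for two sets of 'on' bit positions."""
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen here: two empty fingerprints are identical
    return 1.0 - len(a & b) / union


def distance_matrix(fingerprints):
    """Return the symmetric matrix of pairwise distances."""
    n = len(fingerprints)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = tanimoto_distance(fingerprints[i], fingerprints[j])
            matrix[i][j] = matrix[j][i] = d
    return matrix


fps = [{1, 2, 3}, {2, 3, 4}, {7, 8}]
matrix = distance_matrix(fps)
for row in matrix:
    print(" ".join(f"{d:.2f}" for d in row))
```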

33. gfp_leader_v2

Description:

Performs clustering with leader (sphere exclusion) algorithm on gfp descriptors

Author/owner: C3/Eli Lilly and Co

Sample 1:

gfp_leader_v2 -t 0.3 pubchem.gfp

Explanation:

Compute clustering with leader algorithm on pubchem.gfp with distance threshold 0.3 (-t 0.3)

Help command:

gfp_leader_v2
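The leader (sphere exclusion) idea can be sketched in a few lines of Python. This is a simplified model under stated assumptions: fingerprints are sets of bit positions, distance is Tanimoto distance, and leaders are taken in input order. The real tool may select leaders differently, and leader_cluster is a hypothetical name.

```python
def tanimoto_distance(a, b):
    """1 - |a & b| / |a | b| for two sets of 'on' bit positions."""
    union = len(a | b)
    return 1.0 - len(a & b) / union if union else 0.0


def leader_cluster(fingerprints, threshold):
    """Sphere-exclusion clustering; returns a list of clusters of indices.

    The first unassigned item becomes a leader, and every other unassigned
    item within `threshold` of it joins that cluster.
    """
    unassigned = list(range(len(fingerprints)))
    clusters = []
    while unassigned:
        leader = unassigned[0]
        members = [
            i for i in unassigned
            if tanimoto_distance(fingerprints[leader], fingerprints[i]) <= threshold
        ]
        clusters.append(members)
        unassigned = [i for i in unassigned if i not in members]
    return clusters


fps = [{1, 2, 3, 4}, {1, 2, 3, 4, 5}, {10, 11}, {10, 11, 12, 13}]
clusters = leader_cluster(fps, 0.3)
print(clusters)
```

With the 0.3 threshold, items 0 and 1 (distance 0.2) share a cluster, while items 2 and 3 (distance 0.5) stay apart.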

34. gfp_nearneighbours

Description:

Finds near neighbours in a set of fingerprints

Author/owner: C3/Eli Lilly and Co

Sample 1:

gfp_nearneighbours -p pubchem.gfp -n 2 -T 0.5 pubchem_10.gfp

Explanation:

Find near neighbours for the fingerprints in pubchem_10.gfp; compare against fingerprints in haystack fingerprint set pubchem.gfp (-p pubchem.gfp), search for 2 neighbours for each descriptor (-n 2) and discard distance greater than 0.5 (-T 0.5)

Help command:

gfp_nearneighbours
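The roles of -n (number of neighbours) and -T (distance cutoff) can be illustrated with a brute-force Python sketch. It assumes set-based fingerprints and Tanimoto distance, and near_neighbours is a hypothetical helper rather than the tool's algorithm.

```python
def tanimoto_distance(a, b):
    """1 - |a & b| / |a | b| for two sets of 'on' bit positions."""
    union = len(a | b)
    return 1.0 - len(a & b) / union if union else 0.0


def near_neighbours(needles, haystack, n=2, max_distance=0.5):
    """For each needle, return up to n (haystack_index, distance) pairs."""
    results = []
    for needle in needles:
        # Sort haystack entries by distance, breaking ties by index.
        scored = sorted(
            (tanimoto_distance(needle, fp), index)
            for index, fp in enumerate(haystack)
        )
        kept = [(index, d) for d, index in scored if d <= max_distance][:n]
        results.append(kept)
    return results


haystack = [{1, 2, 3}, {1, 2, 3, 4}, {1, 2}, {9}]
neighbours = near_neighbours([{1, 2, 3}], haystack, n=2, max_distance=0.5)
print(neighbours)
```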

35. gfp_single_linkage

Description:

Finds the single linkage clusters in the gfp fingerprint set

Author/owner: C3/Eli Lilly and Co

Sample 1:

gfp_single_linkage -t 0.25 pubchem.gfp

Explanation:

Compute clustering with single linkage algorithm on pubchem.gfp with distance 0.25 (-t 0.25) as the threshold value for grouping

Help command:

gfp_single_linkage
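Single linkage grouping places two items in the same cluster whenever a chain of pairwise distances at or below the threshold connects them. A minimal Python sketch using union-find follows; it assumes set-based fingerprints and Tanimoto distance, and single_linkage is a hypothetical name.

```python
def tanimoto_distance(a, b):
    """1 - |a & b| / |a | b| for two sets of 'on' bit positions."""
    union = len(a | b)
    return 1.0 - len(a & b) / union if union else 0.0


def single_linkage(fingerprints, threshold):
    """Group indices connected by any chain of distances <= threshold."""
    parent = list(range(len(fingerprints)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(fingerprints)):
        for j in range(i + 1, len(fingerprints)):
            if tanimoto_distance(fingerprints[i], fingerprints[j]) <= threshold:
                parent[find(j)] = find(i)

    clusters = {}
    for i in range(len(fingerprints)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


fps = [{1, 2, 3, 4}, {1, 2, 3, 4, 5}, {1, 2, 3, 4, 5, 6}, {20, 21}]
linked = single_linkage(fps, 0.25)
print(linked)
```

Items 0 and 2 are about 0.33 apart, which is above the threshold, yet they end up together because each is within 0.25 of item 1.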

36. gfp_sparse_to_fixed

Description:

Converts non-colliding fingerprints to fixed counted, or binary forms

Author/owner: C3/Eli Lilly and Co

Sample 1:

gfp_sparse_to_fixed -F NCSELW pubchem.gfp

Explanation:

Convert non-colliding fingerprints NCSELW (-F NCSELW) in pubchem.gfp to binary form

Help command:

gfp_sparse_to_fixed

37. gfp_distance_filter

Description:

Filters molecules according to how close they are to members of a comparison pool

Author/owner: C3/Eli Lilly and Co

Sample 1:

gfp_distance_filter -p pubchem_10.gfp -t 0.7 -n 3 -U U1 pubchem.gfp

Explanation:

Compare fingerprints in pubchem.gfp to those in pubchem_10.gfp (-p pubchem_10.gfp). The minimum required distance between molecules is 0.7 (-t 0.7). A molecule in pubchem.gfp is rejected if it violates the minimum distance requirement at least 3 times (-n 3). Rejected molecules are saved to the U1 file (-U U1)

Help command:

gfp_distance_filter

38. gfp_pairwise_distances

Description:

Calculates pairwise distance between molecules

Author/owner: C3/Eli Lilly and Co

Sample 1:

gfp_pairwise_distances -p pubchem.gfp -T 0.5 100PairsId

Explanation:

Calculate the pairwise distance between the molecules in pubchem.gfp (-p pubchem.gfp). The distance will not be reported if it is larger than the threshold value 0.5 (-T 0.5). The required pairs for calculation are listed in the 100PairsId file.

Help command:

gfp_pairwise_distances

39. gfp_to_descriptors

Description:

Converts fingerprints to descriptors

Author/owner: C3/Eli Lilly and Co

Sample 1:

gfp_to_descriptors -f -F FPDSC pubchem.gfp

Explanation:

Convert the fixed width fingerprint (-f) pubchem.gfp into descriptor format; use fingerprint type FPDSC (-F FPDSC) in fingerprint file.

Help command:

gfp_to_descriptors

40. gfp_nearneighbours_single_file

Description:

Finds near neighbours within a set of fingerprints

Author/owner: C3/Eli Lilly and Co

Sample 1:

gfp_nearneighbours_single_file -p -z -T 0.2 pubchem.gfp -F FPDSC,w=0.2 -F NCSELW,nc,w=0.8 -V a=0.03 -V b=1.7

Explanation:

Finds near neighbours in the pubchem.gfp file. Writes all pair-wise distances in 3 columns (-p); discards any molecule without neighbours (-z) and any distance greater than 0.2 (-T 0.2). Assigns weight 0.2 to fingerprint type FPDSC (-F FPDSC,w=0.2) and weight 0.8 to the non-colliding fingerprint with tag NCSELW (-F NCSELW,nc,w=0.8) for the distance calculation. Uses Tversky asymmetric similarity with parameter a set to 0.03 (-V a=0.03) and b set to 1.7 (-V b=1.7)

Help command:

gfp_nearneighbours_single_file
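For readers unfamiliar with Tversky similarity, the standard definition is |A∩B| / (|A∩B| + a·|A−B| + b·|B−A|), which is asymmetric whenever a and b differ. The Python sketch below evaluates that textbook formula on sets of bit positions; how the tool maps its a and b parameters onto the two fingerprints is not stated above, so the argument order here is an assumption.

```python
def tversky_similarity(a_bits, b_bits, alpha, beta):
    """Textbook Tversky index on two sets of 'on' bit positions."""
    common = len(a_bits & b_bits)
    denominator = (
        common
        + alpha * len(a_bits - b_bits)
        + beta * len(b_bits - a_bits)
    )
    # Convention chosen here: report 0.0 when the denominator vanishes.
    return common / denominator if denominator else 0.0


small = {1, 2, 3, 4}
large = {1, 2, 3, 4, 5, 6}
forward = tversky_similarity(small, large, 0.03, 1.7)
backward = tversky_similarity(large, small, 0.03, 1.7)
print(round(forward, 4), round(backward, 4))
```

Swapping the two fingerprints changes the result, and setting both parameters to 1 reduces the formula to the Tanimoto coefficient.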

41. gfp_lnearneighbours

Description:

Finds near neighbours of compounds supplied in gfp fingerprint format; can handle LARGE numbers of compounds

Author/owner: C3/Eli Lilly and Co

Sample 1:

gfp_lnearneighbours -F FPDSC,w=0.3 -F NCSELW,nc,w=0.7 -T 0.4 -n 2 -h -p pubchem_needles.gfp pubchem_haystack.gfp

Explanation:

Finds 2 near neighbours (-n 2) for each fingerprint in the needle file pubchem_needles.gfp (-p pubchem_needles.gfp) against the haystack file pubchem_haystack.gfp. Discards neighbours with zero distance and the same ID as the target (-h) or with a distance larger than 0.4 (-T 0.4). The fingerprint of type FPDSC has weight 0.3 (-F FPDSC,w=0.3), and the non-colliding fingerprint with tag NCSELW has weight 0.7 in the distance calculation (-F NCSELW,nc,w=0.7)

Help command:

gfp_lnearneighbours

42. gfp_add_descriptors

Description:

Adds descriptors to the gfp file with matching ID

Author/owner: C3/Eli Lilly and Co

Sample 1:

gfp_add_descriptors -D FPADD pubchem.gfp cleaned_descriptor.txt

Explanation:

Add the descriptor cleaned_descriptor.txt to the gfp file pubchem.gfp; use tag FPADD (-D FPADD)

Help command:

gfp_add_descriptors

43. gfp_profile_activity_by_bits

Description:

Scans a fingerprint file and computes average activity associated with each bit; an activity file needs to be provided

Author/owner: C3/Eli Lilly and Co

Sample 1:

gfp_profile_activity_by_bits -E activityFile.txt pubchem.gfp

Explanation:

Compute the average activity from file activityFile.txt (-E activityFile.txt) for fingerprints in the pubchem.gfp.

Help command:

gfp_profile_activity_by_bits

44. gfp_spread_v2

Description:

Calculates the spread distance of fingerprints against target fingerprint set. Sort output by spread distance.

Author/owner: C3/Eli Lilly and Co

Sample 1:

gfp_spread_v2 -A pubchem_10.gfp pubchem.gfp

Explanation:

Compute the spread distance of fingerprints in pubchem.gfp. Bias away from fingerprints in pubchem_10.gfp (-A pubchem_10.gfp)

Help command:

gfp_spread_v2
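One common way to realise a spread ordering is max-min selection: repeatedly pick the item whose nearest already-selected item is farthest away. The Python sketch below follows that scheme under stated assumptions (set-based fingerprints, Tanimoto distance, ties broken by lowest index). It is not claimed to be the tool's exact algorithm, and spread_order and its avoid argument are hypothetical names; avoid plays the role of the fingerprints to bias away from.

```python
def tanimoto_distance(a, b):
    """1 - |a & b| / |a | b| for two sets of 'on' bit positions."""
    union = len(a | b)
    return 1.0 - len(a & b) / union if union else 0.0


def spread_order(fingerprints, avoid=()):
    """Return (index, distance_to_nearest_selected) in max-min order."""
    remaining = set(range(len(fingerprints)))
    nearest = {}
    for i in remaining:
        dists = [tanimoto_distance(fingerprints[i], fp) for fp in avoid]
        nearest[i] = min(dists) if dists else float("inf")

    order = []
    while remaining:
        # Farthest from anything already selected; ties go to the lowest index.
        pick = max(sorted(remaining), key=lambda i: nearest[i])
        order.append((pick, nearest[pick]))
        remaining.remove(pick)
        for i in remaining:
            d = tanimoto_distance(fingerprints[i], fingerprints[pick])
            nearest[i] = min(nearest[i], d)
    return order


pool = [{1, 2, 3}, {1, 2, 3, 4}, {8, 9}]
order = spread_order(pool, avoid=[{1, 2, 3}])
print(order)
```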

45. gfp_spread_buckets_v2

Description:

Calculates the spread distance of fingerprints against target fingerprint set while considering a bucketized variable like activity/property

Author/owner: C3/Eli Lilly and Co

Sample 1:

gfp_spread_buckets_v2 -B NATOMS pubchem_with_natom.gfp

Explanation:

Compute the spread distance of the fingerprints in pubchem_with_natom.gfp while considering the variable NATOMS (-B NATOMS) also included in the fingerprint file.

Help command:

gfp_spread_buckets_v2

46. nplotnn

Description:

Processes the output of several gfp nearneighbour-based tools into human readable form including SMILES format

Author/owner: C3/Eli Lilly and Co

Sample 1:

nplotnn -L def leader_result_raw.txt

Explanation:

Reformat the output of leader clustering in leader_result_raw.txt. The sample command passes -L def; -L tbl requests tabular output.

Help command:

nplotnn

47. tdt_sort

Description:

Sorts fields of a TDT file or stream according to specific tag, properties, or the degree of each node

Author/owner: C3/Eli Lilly and Co

Sample 1:

tdt_sort -T FPADD,col=4 -r unsorted.gfp

Explanation:

Sort the file unsorted.gfp in reverse order (-r) based on the 4th column of FPADD field (-T FPADD,col=4)

Help command:

tdt_sort
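The sort in the sample can be pictured with a small Python sketch. It assumes the usual TDT layout of one TAG<value> item per line with a lone | closing each record, and it assumes the FPADD value is a space-separated list of numbers; the sample records and both function names are hypothetical.

```python
import re

TAG_LINE = re.compile(r"^([^<]+)<(.*)>$")


def read_tdt_records(text):
    """Split TDT text into records; each record is a dict of tag -> value."""
    records, current = [], {}
    for line in text.splitlines():
        line = line.strip()
        if line == "|":  # a lone vertical bar closes a record
            records.append(current)
            current = {}
        elif (match := TAG_LINE.match(line)):
            current[match.group(1)] = match.group(2)
    return records


def sort_by_tag_column(records, tag, column, reverse=False):
    """Sort records numerically on a 1-based column inside one tag's value."""
    return sorted(
        records,
        key=lambda r: float(r[tag].split()[column - 1]),
        reverse=reverse,
    )


# Hypothetical two-record input, for illustration only.
sample = """$SMI<CCO>
PCN<id1>
FPADD<1 2 3 4.5>
|
$SMI<CCN>
PCN<id2>
FPADD<1 2 3 9.0>
|
"""
ordered = sort_by_tag_column(read_tdt_records(sample), "FPADD", 4, reverse=True)
print([r["PCN"] for r in ordered])
```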

48. tdt_join

Description:

Joins two TDT streams, possibly with different identifiers

Author/owner: C3/Eli Lilly and Co

Sample 1:

tdt_join.sh -d part1.gfp part2.gfp

Explanation:

Join fingerprint file part1.gfp with fingerprint file part2.gfp and eliminate duplicate tags from second file (-d)

Help command:

tdt_join
