ReleaseNotes.txt
Version 2.8.0 (2023-10-29)
Support for undecahectane/undecadictane (previously only hendeca was supported)
Support for dicarboximido
Improved support for lysergic acid derivatives
Added a few more sugars e.g. digitalose
Added borodeuteride and hydro contractions of pharmaceutical salts e.g. hydromethanesulfonate
Support substitution on glyceric acid
Corrected interpretation of imidazolium, trioxane and phthalhydrazide
Version 2.7.0 (2022-08-16)
Improved coverage of flavonoid parent structures
Support for apiofuranosyl, added 5 locant to apiose
Improved support for n-amyl
Superscripted numbers in poly spiro systems are now intelligently determined if the input lacks superscript indication
Support for annulynes
Fixed issues where amino acid salts were being interpreted as functionalisation of the amino acid
Fixed bug where annulene parsing was case sensitive
Chalcone, in accordance with current IUPAC recommendations, is now interpreted as specifically the trans isomer
Minor dependency updates
Version 2.6.0 (2021-12-21)
OPSIN now requires Java 8 (or higher)
OPSIN command-line functionality moved to opsin-cli module
OPSIN standalone jars are now built with mvn package
Updated from InChI 1.03 to InChI 1.06
Support for capturing relative/racemic stereochemistry (output via CxSmiles) [contributed by John Mayfield]
Support for deaza/dethia
Support nitrile as a suffix on amino acids [contributed by John Mayfield]
Support more glycero-n-phospho substituents
Support for chloroxime and other haloximes
Support cis/trans on rings where a stereocenter has two non-hydrogen substituents, using Cahn-Ingold-Prelog rules to determine which are relative
Multiple improvements to implicit bracketing logic
Corrected interpretation of methylselenopyruvate
Added group 1/2 nitrides e.g. magnesium nitride
Added molecular diatomics e.g. molecular hydrogen (or dihydrogen)
Fixed out of memory error if a fusion bracket referenced an interior atom instead of a peripheral atom
Fixed out of memory error while parsing very long ambiguous input, by switching parsing algorithm from breadth-first to depth-first
Dependency changes:
Updated logging from Log4J v1.2.17 to the latest Log4J2 (v2.17.0). Neither OPSIN 2.5.0 nor 2.6.0 is vulnerable to Log4Shell. The logging implementation is only included in the opsin-cli module
opsin-inchi now uses JNA-InChI (https://github.com/dan2097/jna-inchi) rather than JNI-InChI. This supports the latest version of InChI and also supports new Macs with ARM64 processors
Woodstox now uses groupid com.fasterxml.woodstox (the groupid change did not signify a break in API compatibility)
dk.brics.automaton now uses groupid dk.brics (the groupid change did not signify a break in API compatibility)
commons-cli is only used by the opsin-cli module
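The switch from breadth-first to depth-first parsing noted above can be illustrated with a small sketch. This is not OPSIN's parser: the function name parses and the toy vocabulary are hypothetical, and the code only shows why a depth-first search keeps one partial parse in memory at a time, whereas a breadth-first search holds every partial parse of the current length.

```python
from typing import Iterator, List, Sequence


def parses(name: str, vocabulary: Sequence[str], start: int = 0) -> Iterator[List[str]]:
    """Lazily yield every way of splitting `name` into vocabulary tokens.

    The search is depth-first: one candidate tokenisation is followed to the
    end of the name before the next is tried, so memory use grows with the
    length of the name rather than with the number of ambiguous parses.
    """
    if start == len(name):
        yield []  # reached the end of the name: one complete parse
        return
    for token in vocabulary:
        if name.startswith(token, start):
            for rest in parses(name, vocabulary, start + len(token)):
                yield [token] + rest


# Hypothetical toy vocabulary; real name-to-structure grammars are far larger.
vocabulary = ["meth", "methyl", "yl", "eth", "ethane", "ane"]
first = next(parses("methylethane", vocabulary))
```

Because the generator is lazy, a caller that only needs one interpretation can stop after the first result without enumerating the remaining ambiguous parses.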
Version 2.5.0 (2020-10-04)
OPSIN now requires Java 7 (or higher)
Support for traditional oxidation state names e.g. ferric
Added support for defining the stereochemistry of phosphines/arsines
Added newly discovered elements
Improved algorithm for correctly interpreting ester names with a missing space e.g. 3-aminophenyl-4-aminobenzenesulfonate
Fixed structure of canavanine
Corrected interpretation of silver oxide
Vocabulary improvements
Minor improvements/bug fixes
Internal XML Changes:
tokenList files now all use the same schema (tokenLists.dtd)
Version 2.4.0 (2018-12-23)
OPSIN is now licensed under the MIT License
Locant labels included in extended SMILES output
Command-line now has a name flag to include the input name in SMILES/InChI output (tab delimited)
Added support for carotenoids
Added support for Vitamin B-6 related compounds
Added support for more fused ring system bridge prefixes
Added support for anilide as a functional replacement group
Allow heteroatom replacement as a detachable prefix e.g. 3,6,9-triaza-2-(4-phenylbutyl)undecanoic acid
Support Boughton system isotopic suffixes for 13C/14C/15N/17O/18O
Support salts of acids in CAS inverted names
Improved support for implicitly positively charged purine nucleosides/nucleotides
Added various biochemical groups/substituents
Improved logic for determining intended substitution in names with too few brackets
Incorrectly capitalized locants can now be used to reference ring fusion atoms
Some names no longer allow substitution e.g. water, hydrochloride
Many minor precision/recall improvements
Version 2.3.1 (2017-07-23)
Fixed fused ring numbering algorithm incorrectly numbering some ortho- and peri-fused fused systems involving 7-membered rings
Support P-thio to indicate thiophosphate linkage
Count of isotopic replacements no longer required if locants given
Fixed bug where CIP algorithm could assign priorities to identical substituents
Fixed "DL" before a substituent not assigning the substituted alpha-carbon as racemic stereo
L-stereochemistry no longer assumed on semi-systematic glycine derivatives e.g. phenylglycine
Fixed some cases where substituents like carbonyl should have been part of an implicitly bracketed section
Fixed interpretation of leucinic acid and 3/4/5-pyrazolone
Version 2.3.0 (2017-02-23)
D/L stereochemistry can now be assigned algorithmically e.g. L-2-aminobutyric acid
Other minor improvements to amino acid support e.g. homoproline added
Extended SMILES added to command-line interface
Names intended to include the triiodide/tribromide anion no longer erroneously have three monohalides
Ambiguity detected when applying unlocanted subtractive prefixes
Better support for adjacent multipliers e.g. ditrifluoroacetic acid
deoxynucleosides are now implicitly 2'-deoxynucleosides
Added support for <number> as a syntax for a superscripted number
Added support for amidrazones
Aluminium hydrides/chlorides/bromides/iodides are now covalently bonded
Fixed names with isotopes less than 10 not being supported
Fixed interpretation of some trivial names that clash with systematic names
Version 2.2.0 (2016-10-16)
Added support for IUPAC system for isotope specification e.g. (3-14C,2,2-2H2)butane
Added support for specifying deuteration using the Boughton system e.g. butane-2,2-d2
Added support for multiplied bridges e.g. 1,2:3,4-diepoxy
Front locants after a von Baeyer descriptor are now supported e.g. bicyclo[2.2.2]-7-octene
onosyl substituents now supported e.g. glucuronosyl
More sugar substituents e.g. glucosaminyl
Improved support for malformed polycyclic spiro names
Support for oximino as a suffix
Added method [NameToStructure.getVersion()] to retrieve OPSIN version number
Allowed bridges to be used as detachable prefixes
Allow odd numbers of hydro to be added e.g. trihydro
Added support for unbracketed R stereochemistry (but not S, for the moment, due to the ambiguity with sulfur locants)
Various minor bug fixes e.g. stereochemistry was incorrect for isovaline
Minor vocabulary improvements
Version 2.1.0 (2016-03-12)
Added support for fractional multipliers e.g. hemihydrochloride
Added support for abbreviated common salts e.g. HCl
Added support for sandwich compounds e.g. ferrocene
Improved recognition of names missing the last 'e' (common in German)
Support for E/Z directly before double bond indication e.g. 2Z-ylidene, 2Z-ene
Improved support for functional class ethers e.g. "glycerol triglycidyl ether"
Added general support for names involving an ester formed from an alcohol and an ate group
Grignard reagents and certain compounds (e.g. uranium hexafluoride) are now treated as covalent rather than ionic
Added experimental support for outputting extended SMILES. Polymers and attachment points are annotated explicitly
Polymers when output as SMILES now have atom classes to indicate which end of the repeat unit is which
Support * as a superscript indicator e.g. *6* to mean superscript 6
Improved recognition of racemic stereochemistry terms
Added general support for names like "beta-alanine N,N-diacetic acid"
Allowed "one" and "ol" suffixes to be used in more cases where another suffix is also present
"ic acid halide" is not interpreted the same as "ic halide"
Fixed some cases where ambiguous operations were not considered ambiguous e.g. monosubstituted phenyl
Improvements/bug fixes to heuristics for detecting when spaces are omitted from ether/ester names
Improved support for stereochemistry in older CAS index names
Many precision improvements e.g. cyclotriphosphazene, thiazoline, TBDMS/TBDPS protecting groups, S-substituted-methionine
Various minor bug fixes e.g. names containing "SULPH" not recognized
Minor vocabulary improvements
Internal XML Changes:
Synonyms of the same concept are now or-ed rather than being separate entities e.g. <token>tertiary|tert-|t-</token>
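As an illustration of the *6* superscript indicator mentioned above (and the ~number~ form added in Version 0.6.1), the sketch below rewrites both markers into a single form. It is a minimal example, not OPSIN's code: the function name normalise_superscripts and the ^n^ target notation are assumptions chosen for the example.

```python
import re

# Matches *digits* or ~digits~. The back-reference \1 requires the closing
# delimiter to be the same character as the opening one.
_SUPERSCRIPT = re.compile(r"([*~])(\d+)\1")


def normalise_superscripts(name: str) -> str:
    """Rewrite *n* and ~n~ superscript markers as ^n^ (a hypothetical form)."""
    return _SUPERSCRIPT.sub(r"^\2^", name)


normalised = normalise_superscripts("a*6*b")
```

Requiring matching delimiters means a stray asterisk or tilde elsewhere in a name is left untouched.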
Version 2.0.0 (2015-07-10)
MAJOR CHANGES:
Requires Java 1.6 or higher
CML (Chemical Markup Language) is now returned as a String rather than a XOM Element
OPSIN now attempts to identify if a chemical name is ambiguous. Names that appear ambiguous return with a status of WARNING with the structure provided being one interpretation of the name
Added support for "alcohol esters" e.g. phenol acetate [meaning phenyl acetate]
Multiplied unlocanted substitution is now more intelligent e.g. all substituents must connect to same group, and degeneracy of atom environments is taken into account
The ester interpretation is now preferred in more cases where a name does not contain a space but the parent is methanoate/ethanoate/formate/acetate/carbamate
Inorganic oxides are now interpreted, yielding structures with [O-2] ions
Added more trivial names of simple molecules
Support for nitrolic acids
Fixed parsing issue where a directly substituted acetal was not interpretable
Fixed certain groups e.g. phenethyl, not having their suffix attached to a specific location
Corrected interpretation of xanthyl, and various trivial names that look systematic
Name to structure is now ~20% faster
Initialisation time reduced by a third
InChI generation is now ~20% faster
XML processing dependency changed from XOM to Woodstox
Significant internal refactoring
Utility functions designed for internal use are no longer on the public API
Various minor bug fixes
Internal XML Changes:
Groups lacking a labels attribute now have no locants (previously had ascending numeric locants)
Syntax for addGroup/addHeteroAtom/addBond attributes changed to be easier to parse and allow specification of whether the name is ambiguous if a locant is not provided
Version 1.6.0 (2014-04-26)
Added API/command-line options to generate StdInchiKeys
Added support for the IUPAC recommended nomenclature for carbohydrate lactones
Added support for boronic acid pinacol esters
Added basic support for specifying chalcogen acid tautomer form e.g. thioacetic S-acid
Fused ring bridges are now numbered
Names with Endo/Exo/Syn/Anti stereochemistry can now be partially interpreted if warnRatherThanFailOnUninterpretableStereochemistry is used
The warnRatherThanFailOnUninterpretableStereochemistry option will now assign as much stereochemistry as OPSIN understands (All ignored stereochemistry terms are mentioned in the OpsinResult message)
Many minor nomenclature support improvements e.g. succinic imide; hexaldehyde; phenyldiazonium, organotrifluoroborates etc.
Added more trivial names that can be confused with systematic names e.g. Imidazolidinyl urea
Fixed StackOverFlowError that could occur when processing molecules with over 5000 atoms
Many minor bug fixes
Minor vocabulary improvements
Minor speed improvements
NOTE: This is the last release to support Java 1.5
Version 1.5.0 (2013-07-21)
Command line interface now accepts files to read and write to as arguments
Added option to allow interpretation of acids missing the word acid e.g. "acetic" (off by default)
Added option to treat uninterpretable stereochemistry as a warning rather than a failure (off by default)
Added support for nucleotide chains e.g. guanylyl(3'-5')uridine
Added support for parabens, azetidides, morpholides, piperazides, piperidides and pyrrolidides
Vocabulary improvements e.g. homo/beta amino acids
Many minor bug fixes e.g. fulminic acid correctly interpreted
Version 1.4.0 (2013-01-27)
Added support for dialdoses, diketoses, ketoaldoses, alditols, aldonic acids, uronic acids, aldaric acids, glycosides and oligosaccharides, named systematically or from trivial stems, in cyclic or acyclic form
Added support for ketoses named using dehydro
Added support for anhydro
Added more trivial carbohydrate names
Added support for sn-glycerol
Improved heuristics for phospho substitution
Added hydrazido and anilate suffixes
Allowed more functional class nomenclature to apply to amino acids
Added support for inverting CAS names with substituted functional terms e.g. Acetaldehyde, O-methyloxime
Double substitution of a deoxy chiral centre now uses the CIP rules to decide which substituent replaced the hydroxy group
Unicode right arrows, superscripts and the soft hyphen are now recognised
Version 1.3.0 (2012-09-16)
Added option to output radicals as R groups (* in SMILES)
Added support for carbolactone/dicarboximide/lactam/lactim/lactone/olide/sultam/sultim/sultine/sultone suffixes
Resolved some cases of ambiguity in the grammar; the program's capability to handle longer peptide names is improved
Allowed one (as in ketone) before yl e.g. indol-2-on-3-yl
Allowed primed locants to be used as unprimed locants in a bracket e.g. 2-(4'-methylphenyl)pyridine
Vocabulary improvements
SMILES writer will no longer reuse ring closures on the same atom
Fixed case where a name formed of many words that could be parsed ambiguously would cause OPSIN to run out of memory
NameToStructure.getInstance() no longer throws a checked exception
Many minor bug fixes
Version 1.2.0 (2011-12-06)
OPSIN is now available from Maven Central
Basic support for cyclised carbohydrates e.g. alpha-D-glucopyranose
Basic support for systematic carbohydrate stems e.g. D-glycero-D-gluco-Heptose
Added heuristic for correcting esters with omitted spaces
Added support for xanthates/xanthic acid
Minor vocabulary improvements
Fixed a few minor bugs/limitations in the Cahn-Ingold-Prelog rules implementation and made more memory efficient
Many minor improvements and bug fixes
Version 1.1.0 (2011-06-16)
Significant improvements to fused ring numbering code, specifically 3/4/5/7/8 member rings are no longer only allowed in chains of rings
Added support for outputting to StdInChI
Small improvements to fused ring building code
Improvements to heuristics for disambiguating what group is being referred to by a locant
Lower case indicated hydrogen is now recognised
Improvements to parsing speed
Many minor improvements and bug fixes
Version 1.0.0 (2011-03-09)
Added native isomeric SMILES output
Improved command-line interface. The desired format i.e. CML/SMILES/InChI as well as options such as allowing radicals can now all be specified via flags
Debugging is now performed using log4j rather than by passing a verbose flag
Added traditional locants to carboxylic acids and alkanes e.g. beta-hydroxybutyric acid
Added support for cis/trans indicating the relative stereochemistry of two substituents on rings and fused ring systems
Added support for stoichiometry ratios and mixture indicators
Added support for alpha/beta stereochemistry on steroids
Added support for the method for naming spiro systems described in the 1979 recommendations rule A-42
Added detailedFailureAnalysis option to detect the part of a chemical name that fails to parse
Added support for deoxy
Added open-chain saccharides
Improvements to CAS index name uninversion algorithm
Added support for isotopes into the program allowing deuterio/tritio
Added support for R/S stereochemistry indicated by a locant which is also used to indicate the point of substitution for a substituent
Many minor improvements and bug fixes
Version 0.9.0 (2010-11-01)
Added transition metals/f-block elements and noble gases
Added support for specifying the charge or oxidation number on elements e.g. aluminium(3+), iron(II)
Calculations based off a van Arkel diagram are now used to determine whether functional bonds to metals should be treated as ionic or covalent
Improved support for prefix functional replacement e.g. hydrazono/amido/imido/hydrazido/nitrido/pseudohalides can now be used for functional replacement on appropriate acids
Ortho/meta/para handling improved - can now only apply to six membered rings
Added support for methylenedioxy
Added support for simple bridge prefixes e.g. methano as in 2,3-methanoindene
Added support for perfluoro/perchloro/perbromo/periodo
Generalised alkane support to allow alkanes of lengths up to 9999 to be described without enumeration
Updated dependency on JNI-InChI to 0.7, hence InChI 1.03 is now used.
Improved algorithm for assigning unlocanted hydro terms
Improved heuristic for determining meaning of oxido
Improved charge balancing e.g. ionic substance of an implicit ratio 2:3 can now be handled rather than being represented as a net charged 1:1 mixture
Grammar is a bit more lenient of placement of stereochemistry and multipliers
Vocabulary improvements especially in the area of nucleosides and nucleotides
Esters of biochemical compounds e.g. triphosphates are now supported
Many minor improvements and bug fixes
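The charge balancing described above (an ionic substance with an implicit 2:3 ratio) comes down to finding the smallest whole-number counts of cation and anion that sum to zero charge. The sketch below shows that arithmetic only; neutral_ratio is a hypothetical helper and does not reflect how OPSIN implements it.

```python
from math import gcd
from typing import Tuple


def neutral_ratio(cation_charge: int, anion_charge: int) -> Tuple[int, int]:
    """Return the smallest (cations, anions) counts giving a neutral compound."""
    if cation_charge <= 0 or anion_charge >= 0:
        raise ValueError("expected a positive cation charge and a negative anion charge")
    magnitude = -anion_charge
    # The least common multiple of the two charge magnitudes is the total
    # positive (and negative) charge in the smallest neutral combination.
    total = cation_charge * magnitude // gcd(cation_charge, magnitude)
    return total // cation_charge, total // magnitude


# A 3+ cation with a 2- anion needs two cations for every three anions.
ratio = neutral_ratio(3, -2)
```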
Version 0.8.0 (2010-07-16)
NameToStructureConfig can now be used to configure whether radicals e.g. ethyl are output or not.
Names like carbon tetrachloride are now supported
glycol ethers e.g. ethylene glycol ethyl ether are now supported
Prefix functional replacement support now includes halogens e.g. chlorophosphate
Added support for epoxy/epithio/episeleno/epitelluro
Added support for hydrazides/fluorohydrins/chlorohydrins/bromohydrins/iodohydrins/cyanohydrins/acetals/ketals/hemiacetals/hemiketals/diketones/disulfones named using functional class nomenclature
Improvements to algorithm for assigning and finding atoms corresponding to element symbol locants
Added experimental right to left parser (ReverseParseRules.java)
Vocabulary improvements
Parsing is now even faster
Various bug fixes and name interpretation fixes
Version 0.7.0 (2010-06-09)
Added full support for conjunctive nomenclature e.g. 1,3,5-benzenetriacetic acid
Added basic support for CAS names
Added trivial poly-noncarboxylic acids and more trivial carboxylic acids
Added support for spirobi/spiroter/dispiroter and the majority of spiro(ring-locant-ring) nomenclature
Indicators of the direction that a chemical rotates plane polarised light are now detected and ignored
Fixed many cases of trivial names being interpreted systematically by adding more trivial names and detecting such cases
Names such as oxalic bromide cyanide where a halide/pseudohalide replaces an oxygen are now supported
Amino acid esters named from the neutral amino acid are now supported e.g. glycine ethyl ester
Added more heteroatom replacement terms
Allowed creation of an OPSIN parser through NameToStructure.getOpsinParser()
Added support for dehydro - for unsaturating bonds
Improvements to element symbol locant assignment and retrieving appropriate atoms from locants like N2
OPSIN's SMILES parser now accepts specification of number of hydrogens in cases other than chiral atoms
Mixtures specified by separating components with a semicolon followed by a space are now supported
Many internal improvements and bug fixes
Version 0.6.1 (2010-03-18)
Counter ions are now duplicated so as to give, where possible, a neutral compound
In names like nitrous amide the atoms modified by the functional replacement can now be substituted
Allowed ~number~ for specifying superscripts
Vocabulary improvements
Added quinone suffix
Tetrahedral sulfur stereochemistry is now recognised
Bug fixes to fix incorrect interpretation of some names e.g. triphosgene is now unparseable rather than 3 x phosgene, phospho has different meanings depending on whether it is used on an amino acid or another group etc.
Version 0.6.0 (2010-02-18)
OPSIN is now a mavenised project consisting of two modules: core and inchi. Core does name -->CML, inchi depends on core and allows conversion to inchi
Instead of CML an OpsinResult can be returned which can yield information as to why a name was not interpretable
Added support for unlocanted R/S/E/Z stereochemistry. Removed limit on number of atoms that stereochemistry code can handle
Added support for polymers e.g. poly(ethylene)
Improvements in handling of multiplicative nomenclature
Improvements to fusion nomenclature handling: multiplied components and multi parent systems are now supported
Improved support for functional class nomenclature; space detection has been improved and support has been added for anhydride, oxide, oxime, hydrazone, semicarbazone, thiosemicarbazone, selenosemicarbazone, tellurosemicarbazone, imide
Support for the lambda convention
Locanted esters
Improvements in dearomatisation code
CML output changed to being CML-Lite compliant
Speed improvements
Support for Greek letters e.g. as alpha or $a or α
Added more infixes
Added more suffixes
Vocabulary improvements
Systematic handling of amino acid nomenclature
Added support for perhydro
Support for ylium/uide
Support for locants like N-1 (instead of N1)
Fixed potential infinite loop in fused ring numbering
Made grammar more lenient in many places e.g. euphonic o, optional square brackets
Sulph is now treated like sulf as in sulphuric acid
and many misc fixes and improvements
Version 0.5.3 (2009-10-22)
Added support for amic, aldehydic, anilic, anilide, carboxanilide and amoyl suffixes
Added support for cyclic imides e.g. succinimide/succinimido
Added support for amide functional class
Support for locants such as N5 which means a nitrogen that is attached in some way to position 5. Locants of this type may also be used in ester formation.
Some improvements to functional replacement using prefixes e.g. thioethanoic acid now works
Disabled stereochemistry in molecules with over 300 atoms as a temporary fix to the problem in 0.5.2
Slight improvement in method for deciding which group detachable hydro prefixes apply to.
Minor vocabulary update
Version 0.5.2 (2009-10-04)
Outputting directly to InChI is now supported using the separately available nameToInchi jar (an OPSIN jar is expected in the same location as the nameToInchi jar)
Fused rings with any number of rings in a chain or formed entirely of 6 membered rings can now be numbered
Added support for E/Z/R/S where locants are given. Unlocanted cases will be dealt with in a subsequent release. Very large molecules may cause OPSIN to run out of memory; this will be resolved in a subsequent release
Some Infixes are now supported e.g. ethanthioic acid
All spiro systems with Von Baeyer brackets are now supported e.g. dispiro[4.2.4.2]tetradecane
Vocabulary increase (especially: terpenes, inorganic acids, fused ring components)
Fixed some problems with components with both acyclic and cyclic sections e.g. trityl
Improved locant assignments e.g. 2-furyl is now also fur-2-yl
Speed improvements
Removed dependence on Nux/Saxon
Misc minor fixes
Version 0.5.1 (2009-07-20)
Huge reduction in OPSIN initialisation time (typical ~7 seconds -->800ms)
Allowed thio/seleno/telluro as divalent linkers and for functional replacement when used as prefixes. Peroxy can now be used for functional replacement
Better support for semi-trivially named hydrocarbon fused rings e.g. tetracene
Better handling of carbonic acid derivatives
Improvements to locant assignment
Support for names like triethyltetramine and triethylene glycol
Misc other fixes to prevent OPSIN generating the wrong structure for certain types of names
Version 0.5 (2009-06-23)
Too many changes to list
Version 0.1 (2006-10-11)
Initial release