JHOVE - JSTOR/Harvard Object Validation Environment Copyright 2003-2015 by JSTOR and the President and Fellows of Harvard College. Copyright 2015-2023 by The Open Preservation Foundation. JHOVE is made available under the GNU Lesser General Public License (LGPL; see the file LICENSE for details).
Versions 1.7 to 1.11 of JHOVE released independently. Versions 1.12 onwards released by the Open Preservation Foundation.
2023-05-19
- Don't report duplicate messages [#839]
- GitHub action build and QA. [#811], [#812], [#813], [#814], [#815], [#816]
- Make only one pass to xml-encode the values [#818], addresses [#817]
- Update build dependencies, and tidied POMs with minor build fixes [#798]
- Removed erroneous test file that prevented Windows checkouts [#630]
- NISO Image metadata gets a new GPSHPositioningError tag [#801], fixes [#787]
- NISO Image metadata, date validation [#800], fixes [#799]
- Array added to JSON reporting to support multi file reporting [#728], fixes [#667]
- Fixed small issue in generated reports where schema version wasn't incremented to 1.9. [#849]
- Copy the orientation info from the Exif structure [#748] and [#821], fixes [#747]
- Purged string constant message [#831]
- Fixed bug with string-valued token initialisation [#806], mitigates [#668]
- Only inform about unknown PDF Name prefixes [#807], mitigates [#668]
- Now report PDF Encryption for non-references [#743], and [#810]
- Fixed unhandled exception when Size is not an integer [#744] and [#819]
- Improved handling of empty string properties [#782], [#820]. Fixes [#809]
- Now handle Filters which are indirect objects [#672] and [#822]
- Updated the Prefix Registry with new Prefixes [#779]
- Remove references to defunct PDF/A profile [#759]
- Now handle dictionary encryption objects [#783]
- Caught unhandled exception when size is not an integer [#744]
- Fixed minor duplicate error issue [#778]
- Check added to ensure that an Extension is a direct object [#780]
- Handle encrypted Name and LastMod properties found in Annotation [#781]
- Regression tests for empty string cases [#825]
- Purged string constant message [#830], [#835]
- Updated dependencies [#803]
- descriptions for the `FILESOURCE` tag. [#804], closing [#767]
- break out of parseIFDChain() infinite loop [#784]
- Fix incorrect 'byteoffset' config. description [#751]
- Fix to handle unaligned TIFF data [#750]
- Purged string constant message [#829]
- Purged string constant message [#828]
- Purged string constant message [#833]
- Purged string constant message [#832]
- Purged string constant message [#827]
- Reverted reporting of XmlParseExceptions so that exception detail is part of message body. [#850]
- Purged string constant message [#836]
- Added mechanism to mint errors with IDs and removed plain text messages [#836]
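
The single-pass XML encoding change ([#818]) is described above only by its title. As a rough illustration of the general idea, and not JHOVE's actual code, the sketch below escapes the five XML special characters in one pass over the input instead of running one replacement per character. The class and method names are hypothetical.

```java
final class XmlEncodeSketch {
    /** Escapes the five XML special characters in a single pass over the input. */
    static String encode(String value) {
        StringBuilder out = new StringBuilder(value.length() + 16);
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("a < b & \"c\""));
    }
}
```

Because each character is examined once, an ampersand produced by an earlier substitution is never escaped a second time, which is one common failure of chained replacements.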
2022-07-14
2022-06-09
- GUI improvements including keyboard shortcuts and code clarifications. [#635]
- Fixed issue [#667] GUI JSON output truncated. [#728]
- Fixed issue [#628] Allow for folder analysis from GUI. [#635]
- Fixed issue [#627] GUI: in character encoding options, "UTF-8" appearing twice. [#635]
- Fixed issue [#643] Language changeable using Java property and configuration file. [#693], [#729]
For example:
<languageCode>de</languageCode>
- Refactoring and fixes for Java 11 compatibility. [#688], [#682], [#716]
- Java constants for various core classes. [#426], [#586], [#594] - [#598], [#600], [#603] - [#606]
- Minor refactorings to address Java warnings and similar. [#577]
- Builds now performed by Github actions. [#718]
- Third party module JARS added to documentation site. [#566]
- Re-prioritize logging levels for core events. [#636]
- Remove extra "s" from "fonts" in Representation Info [#677]
- Bump EPUBCheck in extension modules from 4.2.2 to 4.2.6 [#650]
- Use ModuleBase#readUnsignedByte to read global color table [#663]
- Fixed issue [#358] Parenthesis handling in Document Information Dictionary. [#359]
- Fixed issue [#375] ClassCastException when handling indirect objects. [#596]
- Fixed issue [#531] Missing error IDs for "Size entry missing in trailer dictionary." [#579], [#590], [#597]
- Added support of 256 bit AES encryption algorithm. [#621]
- Fixed bug in handling the reporting of skipped pages. [#620]
- Improvements to reporting of cross-reference exceptions. [#619]
- Ignore comments in PDF annotations. [#622]
- Fixed issue [#669] Inconsistent double entries in pdf module's errormessages.properties and translations [#689]
- Safely exit infinite loops on AProfile.outlinesOK / checkItemOutline [#704]
- Prevent infinite loop in Literal.readUTFLanguageCode() [#709]
- Fixed German translation of PDF-HUL-18 [#673]
- Fixed issue [#662] PDF-Hul produces Invalid Page Dictionary for PDF's with VP dictionaries. [#665]
- Fixed issue [#653] No document catalog dictionary (PDF-HUL-86) error reported even though document catalog exists [#654]
- Fixed issue [#645] StackOverflowError with 1.24.1 in PDF-hul. [#652]
- Fixed issue [#646] TimeOut / stuck in loop (?) - 1.24.1 PDF-hul. [#652]
- Fixed issue [#101] JHOVE reporting PDF as v1.3 and as ISO PDF/A-1, Level B, inadequate PDF/A disabled [#393]
- Added support for PDF extension levels [#626]
- Fixed issue [#696] Close parenthesis included in Literal _rawBytes, causes incorrect NameTreeNode.compareKey() [#734]
- Replace += with StringBuilder for whitespace to speed up Tokenizer. [#615]
- Added Java constants for error strings, magic numbers, etc. [#578] [#587]
- Fixed issue [#148] null pointer exception. [#580]
- Fixed issue [#624] Codes missing for several Geographic CS Types. [#623]
- Fixed issue [#690] JHOVE inappropriately defaults the TIFF Exif Version tag to 0220. [#691]
- Fixed variable formatting in message translations. [#557]
- Fixed issue [#681] XML Signature detection does not work. [#683]
- Fixed issue [#680] XML should not be validated when no schema provided [#685]
- Assorted improvements to reporting of schema locations. [#634]
- Fixed XML version reporting for documents with byte-order marks (BOMs). [#634]
- Cleaned up unnecessary code, formatting and documentation. [#634]
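
The Tokenizer speed-up listed above ([#615]) replaces `+=` string concatenation with a `StringBuilder`. The following is a minimal sketch of that general pattern, not the Tokenizer's real code; the class and method names are hypothetical.

```java
final class WhitespaceBufferSketch {
    /** Collects the run of whitespace that starts at the given offset. */
    static String readWhitespace(String input, int offset) {
        StringBuilder ws = new StringBuilder();
        int i = offset;
        while (i < input.length() && Character.isWhitespace(input.charAt(i))) {
            // append() is amortised constant time; "ws += c" on a String
            // would copy the whole accumulated value on every iteration
            ws.append(input.charAt(i));
            i++;
        }
        return ws.toString();
    }

    public static void main(String[] args) {
        System.out.println(readWhitespace("   abc", 0).length());
    }
}
```

For long whitespace runs the concatenation form is quadratic in the run length, which is why the builder form tends to be noticeably faster on padded files.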
2020-03-12
- Added a JSON output handler [#515]
- Fixed compatibility issues for JDK 9 and greater [#514]
- SHA-256 Checksumming [#497, #386]
- Error ID node for GUI message display [#546]
- Improved formatting of error IDs by text handler [#547]
- Improved handling of sub-messages [#548]
- Plugged Message ID assignation gaps in various modules [#536]
- Fixed bug with formatting of Rationals in MIX 2.0 [#504]
- Message properties defined for core constants [#499, #500]
- Fix for quote encoding error [#472]
- Addition of dedicated `Utils` class for rescoped encoding methods [#465, #462]
- Documentation clean up and improvement [#488, #494, #495]
- German error message translations [#461, #462, #463, #464, #467]
- Portuguese error message translations [#490, #491, #492, #493, #496, #557]
- Dutch error message translations [#503, #550]
- Danish error message translations [#551]
- French error message translations [#552]
- Check that chunk IDs only consist of characters in the printable ASCII range [#468]
- Check that spaces do not precede printable characters in chunk IDs [#468]
- Clarified error messages and improved offset reporting accuracy [#468]
- Update error message properties for Control Extensions [#513]
- Enhanced to handle the APP14 marker segment [#518]
- Enhancements to MIX format metadata [#445]
- Fixed issues with PNG module error messages [#545]
- Fixed issue with PDF version inconsistency reporting [#486]
- Fixed bug with PDF destination handling [#498]
- Corrected handling of empty date for CreationDate [#549]
- Fixed issue with array instantiation [#510]
- Check that chunk IDs only consist of characters in the printable ASCII range [#468]
- Check that spaces do not precede printable characters in chunk IDs [#468]
- Clarified error messages and greatly improved offset reporting accuracy [#468]
- Added reporting of unrecognized data in the top-level RIFF structure [#468]
- Made the Table Length field of `ds64` chunks optional to better align with the specification [#468]
- Reinstated WAVE-HUL-4 reporting which had been lost during refactoring [#468]
- Corrected WAVE-HUL-15 from an Error to an Informational message [#468]
- Retired WAVE-HUL-16, an unused duplicate of WAVE-HUL-19 [#468]
- Documented undocumented chunks and specification references [#501]
- Module documentation [#489]
- Only one declaration per line [#477]
- Field declarations at the top of a class [#484]
- Default cases for switch statements [#507, #485]
- Fixed nested if statements [#517]
- Merged or refactored duplicate if statements [#505]
- Removed redundant imports [#509]
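
The two chunk ID checks listed above ([#468]) can be illustrated with a short sketch. It assumes four-character chunk IDs and treats 0x20 to 0x7E as the printable ASCII range; the class and method names are hypothetical, and the modules' actual checks may differ in detail.

```java
final class ChunkIdSketch {
    /**
     * Returns true when the ID is four printable ASCII characters and
     * no space comes before a non-space character.
     */
    static boolean isValidChunkId(String id) {
        if (id == null || id.length() != 4) {
            return false;
        }
        boolean seenSpace = false;
        for (int i = 0; i < id.length(); i++) {
            char c = id.charAt(i);
            if (c < 0x20 || c > 0x7E) {
                return false; // outside the printable ASCII range
            }
            if (c == ' ') {
                seenSpace = true;
            } else if (seenSpace) {
                return false; // a printable character follows a space
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValidChunkId("fmt "));
    }
}
```

Under these assumptions `"fmt "` passes because its space is trailing padding, while `" fmt"` is rejected.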
2019-04-18
- Error IDs for JHOVE messages [#397]
- Fixed Rational data types for MIX metadata [#394, #429]
- Individual, Maven based versioning for internal modules [#390]
- Java support upgraded to Java 1.8 [#342, #343, #391]
- Factored out error messages for core applications [#348]
- Code maintenance [#351, #392]
- Improvements to test scripts and automated build [#350, #352, #379, #383]
- Error IDs and message constants as external resources [#398]
- Refactoring and code readability improvements [#353]
- Error IDs and message constants as external resources [#399]
- Error IDs and message constants as external resources [#400]
- Error IDs and message constants as external resources [#401]
- Error IDs and message constants as external resources [#402]
- Error IDs and message constants as external resources [#403]
- Refactored all PDF Module error messages [#347]
- Fixed class cast exception on cross-ref streams [#349]
- Error IDs and message constants as external resources [#404]
- Refactoring and code readability improvements [#389]
- Error IDs and message constants as external resources [#406]
- Flag not well-formed when chunk exceeds RIFF length [#360]
2018-03-29
- Removed obsolete substitution from izpack installer [#300]
- Improved counting accuracy of skipped bytes, allowing better EOF detection [#308]
- Fixed bug causing JHOVE to skip the wrong number of characters in `APP0` segments [#303]
- Header check for invalid PDF minor version (not > 7) [#317]
- Unit tests for PDF Header parsing conditions [#317]
- Check that document catalog dictionary key `/Type` equals `Catalog` [#318]
- Test that document catalog XRef lookup retrieves the right object number [#319]
- Unit tests for document catalog issues [#318]
- Test that page dictionary key `/Type` equals `Pages` [#322]
- Unit tests for page dictionary issues [#322]
- Improved handling of XRef lookup errors for document catalog and pages dictionary [#322]
- Added synthetic test files created by @asciim0 for iPres as unit test resources ([#317-#319])
- Fixed assignment of `application/pdf` as MIME type for images embedded in a PDF [#324]
- Added method to derive MIME type from Filters and assign to NISO metadata and added String constants for Filter names [#324]
- Fixed byte skipping issue when parsing Associated Data List chunks [#309]
- Added support for parsing and validating RF64 files [#308]
- Made WAVE parser more resilient to unexpected chunk data [#308]
- Improved reporting of WAVE codecs in WAVEFORMATEXTENSIBLE files [#308]
- Avoids reporting file format and MIME type until signatures have been verified and reports extended MIME type information, e.g. `audio/vnd.wave; codec=1`, as per RFC 2361 [#308]
- Subformat GUIDs are now reported in their standard format, e.g. `00000001-0000-0010-8000-00AA00389B71`, instead of as an array of byte values [#308]
- Added checks to verify the existence of Data chunks and their appearance after Format chunks [#308]
- Expanded WAVE example corpora to cover more formats and errors [#308]
- Improved truncation detection and reporting [#308]
- Fixed erroneous reporting of Cue Point values and renamed "Cue" report property to "CuePoints" [#308]
- NISO MIX 1.0 output now includes MIME type as `FormatName` [#323]
- NISO MIX 1.0 output now includes mandatory `<FormatDesignation>` element [#323]
- Image MIME type output as mandatory `<FormatName>` element [#323]
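
The WAVE subformat GUID change in this release ([#308]) reports a 16-byte value in the standard textual form. A minimal sketch of such a conversion follows, assuming the usual on-disk GUID layout in which the first three fields are little-endian and the last eight bytes are kept in order. The class and method names are hypothetical and this is not the module's actual code.

```java
final class GuidSketch {
    /** Formats 16 raw GUID bytes as XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX. */
    static String formatGuid(byte[] b) {
        if (b == null || b.length != 16) {
            throw new IllegalArgumentException("GUID must be 16 bytes");
        }
        // Data1 (4 bytes), Data2 and Data3 (2 bytes each) are little-endian
        long data1 = (b[0] & 0xFFL) | ((b[1] & 0xFFL) << 8)
                   | ((b[2] & 0xFFL) << 16) | ((b[3] & 0xFFL) << 24);
        int data2 = (b[4] & 0xFF) | ((b[5] & 0xFF) << 8);
        int data3 = (b[6] & 0xFF) | ((b[7] & 0xFF) << 8);
        StringBuilder sb = new StringBuilder(36);
        sb.append(String.format("%08X-%04X-%04X-", data1, data2, data3));
        // Data4 (8 bytes) is written in stored order, split 2 + 6
        for (int i = 8; i < 16; i++) {
            if (i == 10) {
                sb.append('-');
            }
            sb.append(String.format("%02X", b[i] & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] pcm = {0x01, 0, 0, 0, 0, 0, 0x10, 0,
                      (byte) 0x80, 0, 0, (byte) 0xAA, 0, 0x38, (byte) 0x9B, 0x71};
        System.out.println(formatGuid(pcm));
    }
}
```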
2017-11-30
- Installation of external modules is now optional [#292]
- Inaccessible files are now reported as of "Unknown" status instead of "Not well-formed" [#257]
- Improvements to error handling and uncaught module exceptions, increasing resilience during batch processing [#257, #259]
- Improved path handling, allowing installation locations and file paths to contain spaces, and more exotic characters [#206]
- Error and informational messages have been consolidated into discrete message classes for easier maintenance and future improvement [#120, #157, #283–#285, #287–#291]
- Increased the minimum version of Java from 1.5 to 1.6 [#273]
- Fixed a false invalid result for some types of encrypted document [#257]
- Fixed incorrect parsing of escaped characters in name objects [#280]
- More detailed error messages for indirect references to non-existent destinations [#123]
- Report invalid NISO color types [#171]
- Added validation for ICC profiles [#249]
- Added support for reporting BWF v2 fields [#273]
- Simplified BWF profile detection, allowing detection of any future BWF versions. All BWF versions will now be reported as "BWF" instead of "BWF version #", with any unrecognized versions being flagged [#273]
- Reformatted the BWF UMID field into a hexadecimal string instead of a long sequence of numbers [#273]
- Changed property label from "Originator Reference" to "OriginatorReference" for consistency and predictability [#273]
- Fixed incorrectly reported format names and `ArrayIndexOutOfBoundsException` errors when processing certain non-PCM WAVE files [#118]
- Changed reported MIME type from `audio/x-wave` to `audio/vnd.wave` [#257]
- Fixed MIX 1.0 and TextMD XML generation for images with certain properties [#220]
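
The name-object fix in this release ([#280]) concerns escaped characters in PDF names, where `#` followed by two hexadecimal digits stands for a single byte. The sketch below shows one way to decode such escapes. It is an illustration only: the class and method names are hypothetical, the name is passed without its leading slash, and malformed escapes are simply left untouched.

```java
final class PdfNameSketch {
    /** Returns the value of a hex digit, or -1 if the character is not one. */
    private static int hexValue(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }

    /** Replaces each #xx escape in a PDF name with the byte it encodes. */
    static String decodeName(String raw) {
        StringBuilder out = new StringBuilder(raw.length());
        int i = 0;
        while (i < raw.length()) {
            char c = raw.charAt(i);
            if (c == '#' && i + 2 < raw.length()) {
                int hi = hexValue(raw.charAt(i + 1));
                int lo = hexValue(raw.charAt(i + 2));
                if (hi >= 0 && lo >= 0) {
                    out.append((char) (hi * 16 + lo));
                    i += 3;
                    continue;
                }
            }
            out.append(c); // ordinary character or malformed escape
            i++;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decodeName("A#20B"));
    }
}
```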
2017-07-20
- Fixed: Some PDFs being reported as "Well-formed and valid" while remaining largely unchecked [#258]
2017-03-20
- Fixed: Core method causing modules to skip more bytes than expected [#194]
2017-03-16
- Added PDF and WAVE test files submitted by community during JHOVE hack day
- JHOVE Maven artefacts made available on Maven Central in addition to OPF Artifactory
- Improved error reporting for Travis test failures
- Improvements to GitHub pages website
- Formatting improvements to README.md, RELEASENOTES.md and pom.xml
- Fixed: CrossRefStream incorrectly assumes Index value is a two-element array
- Fixed: Bug in `skipIISBytes` and `PdfModule.getObject`
- Better handling where image heights and widths are PdfIndirectObjects
- Better handling of "empty" hex strings
- Better handling where form-fields are PdfIndirectObjects
- Fixed: Validation of WAVE files larger than 2 GB
- Fixed: Skip Bytes issue for WAVE files larger than 100 MB
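
The CrossRefStream fix above relates to the Index entry of a cross-reference stream, which holds one (first object number, count) pair per subsection rather than a single pair. A minimal sketch of reading every pair follows; the class and method names are hypothetical and the real parser works on PDF objects rather than an `int[]`.

```java
import java.util.ArrayList;
import java.util.List;

final class XrefIndexSketch {
    /** Expands an Index array of (first, count) pairs into object numbers. */
    static List<Integer> objectNumbers(int[] index) {
        if (index.length % 2 != 0) {
            throw new IllegalArgumentException("Index must hold (first, count) pairs");
        }
        List<Integer> numbers = new ArrayList<>();
        for (int i = 0; i < index.length; i += 2) {
            int first = index[i];
            int count = index[i + 1];
            for (int n = 0; n < count; n++) {
                numbers.add(first + n);
            }
        }
        return numbers;
    }

    public static void main(String[] args) {
        System.out.println(objectNumbers(new int[] {0, 2, 10, 3}));
    }
}
```

Code that reads only the first two elements would miss every subsection after the first, which is the kind of assumption the fix removes.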
2016-05-12
Version 1.12 was never officially released, so to avoid confusion the 1.12 changes are included with the 1.14 notes below.
- Ant build replaced with Maven
- Modularised project structure with "fat" JAR packaging
- Java 5 support
- Cross-platform installer
- Travis CI builds
- Maven distribution through OPF Artifactory server
- Updated JHOVE site pages
- GZIP Module, ported from JHOVE2 via JWAT by KB
- WARC Module, ported from JHOVE2 via JWAT by KB
- PNG Module, developed by Gary McGath
- Support for Unicode 7.0.0
2013-09-30
- I've added lots of logging code. Calls at the FINE level and lower don't show up no matter what I do, so I've put them at the INFO level. The level is set in JhoveBase.java.
- All .bat and _bat.tmpl files now have CR-LF line endings. That is, they do in the gzip and zip archives you download. I'm not sure how SourceForge will treat files that you download individually, but hopefully it will have the sense to keep CR-LF when downloading to a Windows system.
- All .bat files now assume JHOVE_HOME is the directory from which they're run. They no longer try to set JAVA_HOME (which was still stuck in Java 1.4 and probably wasn't working for many people), instead assuming that the `java` command is available on the command line.
- All `javac` commands in build.xml files now specify source=1.5 for compatibility with more recent compilers.
- gdumpwin.bat is deleted. It's redundant with gdump.bat and has bugs of its own.
- Fix to PDF module, submitted by willp-bl, may reduce tendency to run out of heap space on some files.
2013-06-10
- The amount of logging code has been increased, mostly at the DEBUG level.
- Further work on generics in Java code.
- JhoveView now checks for Java 1.5. Was previously allowing 1.4 even though it wouldn't work.
- XHTML files are processed by the HTML module, which invokes the XML modules. In this case, the XML module doesn't have the parameters specified in the JHOVE configuration file and so won't use local copies of schemas. Starting with this version, the parameters of the HTML module are passed to the XML module when invoking it. However, this doesn't work properly (in either module) for a DTD that invokes additional DTDs by relative URLs. Such DTDs should be edited to use only absolute URLs.
- Failure to get a page object number wasn't being handled cleanly, resulting in a report of an invalid document without an error message to explain it (SourceForge bug 49). This has been fixed.
- The PDF module unnecessarily uses huge amounts of memory to build complex structure trees, when it doesn't need to keep the whole tree in memory to validate it. In the new version, it uses memory more economically. This should result in the successful processing of some PDF files that ran out of memory or took hours to process before.
- If an annotation isn't a dictionary object, report that explicitly. This happens with some otherwise good files; I can't find any warrant for it in the PDF spec.
- Some efficiency improvements to PDF parser. Increased buffer size from 4K to 64K. Made Parser.collapseObjectVector more efficient. Parser now returns pseudo-objects for array and dictionary end instead of throwing an exception.
- Minor cleanup of error reporting.
- If an object uses a compression scheme which JHOVE can't deal with, JHOVE will try to give a specific error message.
2012-12-17
- Jhove.java and JhoveView.java now get their version information from JhoveBase.java. Before it was redundantly kept in three places, and sometimes they didn't all get updated for a new release. Like in 1.8.
- ConfigWriter was in the package edu.harvard.hul.ois.jhove.viewer, which caused a NoClassDefFoundError if non-GUI configurations didn't include JhoveViewer.jar in the classpath. It's been moved to edu.harvard.hul.ois.jhove.
- Added script packagejhove.sh and made md5.pl part of the CVS repository to make packaging for delivery easier.
- jhove.bat now simply uses the Java command rather than requiring the user to set up the Java path.
- JhoveView.jar and jhove (the top level shell script) are now forced by ant to be executable so there are no mistakes.
- Warning message given on invalid buffer size string, and minimum buffer size is 1024.
- Configuration file code for adding handlers and giving init strings to modules was an awful mess that never could have worked. Major repairs done.
- If an AIFF file was found to be little-endian, the module instance would stay in little-endian mode for all subsequent files. This has been fixed.
- TIFF files that had strip or tile offsets but no corresponding byte counts were throwing an exception all the way to the top level. Now they're correctly being reported as invalid.
- Cleaned up reporting of schemas, Added some small classes to replace the use of string arrays for information structures. Made URI comparison for local schema parameter case-independent. Resolved conflict between "s" and "schema" parameters.
- Some uncaught exceptions caused the module to throw all the way back to JhoveBase and not report any result for certain defective files. These now report the file as not well-formed.
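
The buffer size note in this release mentions a warning on an invalid buffer size string and a minimum of 1024. The following sketch shows one plausible reading of that behaviour: warn and fall back on an unparsable value, then clamp to the minimum. The class name, the method name, and the fallback parameter are assumptions for illustration; JHOVE's own handling may differ.

```java
import java.util.logging.Logger;

final class BufferSizeSketch {
    private static final Logger LOG =
            Logger.getLogger(BufferSizeSketch.class.getName());

    /** Minimum stated in the release note. */
    static final int MIN_BUFFER_SIZE = 1024;

    /**
     * Parses a buffer size string. The fallback argument is a hypothetical
     * default used when the text is not a valid integer.
     */
    static int parseBufferSize(String text, int fallback) {
        int size;
        try {
            size = Integer.parseInt(text == null ? "" : text.trim());
        } catch (NumberFormatException e) {
            LOG.warning("Invalid buffer size '" + text + "', using " + fallback);
            size = fallback;
        }
        // Never go below the documented minimum
        return Math.max(size, MIN_BUFFER_SIZE);
    }

    public static void main(String[] args) {
        System.out.println(parseBufferSize("abc", 4096));
    }
}
```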
2012-11-07
- If JHOVE doesn't find a configuration file, it creates a default one.
- Generics widely added to clean up the code.
- build.xml files fixed to force compilation to Java 1.5.
- Shell script "jhove" no longer makes you figure out where JAVA_HOME is.
- Several errors in checking for PDF/A compliance were corrected. Aside from fixing some outright bugs, the Contents key for non-text Annotations is no longer checked, as its presence is only recommended and not required.
- Improved code by Håkan Svenson is now used for finding the trailer.
- TIFF tag 700 (XMP) now accepts field type 7 (UNDEFINED) as well as 1 (BYTE), on the basis of Adobe's XMP spec, part 3.
- If compression scheme 6 is used in a file, an InfoMessage will report that the file uses deprecated compression.
- The Originator Reference property, found in the Broadcast Wave Extension (BEXT) chunk, is now reported.
2012-08-12
- JHOVE 1.7, as well as future releases unless noted otherwise, is released independently of Harvard under the GNU General Public License.
- JHOVE now will tell you where it was looking for the config file if it can't open it. This should help debug configuration problems.
- Changes to XmlHandler.java and NisoImageMetadata.java to correct invalid MIX 2.0 XML output in the value of grayResponseUnit. It was previously writing integers (as in 1.0) rather than the expected enumerated strings.
- A situation that caused an infinite loop and eventual memory exhaustion processing in some PDF files with malformed literals has been fixed.
2011-01-04
- The default version of MIX is now 2.0. In earlier versions it was 0.2. However, MIX 2.0 still isn't supported in the text handler, so it will produce 1.0 output by default. The XML handler will produce MIX 2.0 output.
- JHOVE returned a "String index out of range: 4" exception during TIFF validation for a TIFF containing an empty (not NULL) date/time field. This has been corrected so that a date/time field with the wrong length won't be parsed but will report an error instead.
- If text tags contain characters which aren't printable ASCII, these are now output as escape sequences so that invalid XML isn't output.
- Updated to Unicode 6.0.0.
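
The TIFF date/time fix in this release checks the field length before parsing. A minimal sketch of such a guard is shown below, assuming the TIFF `YYYY:MM:DD HH:MM:SS` form of 19 characters plus optional trailing NUL terminators. The class and method names are hypothetical.

```java
final class TiffDateTimeSketch {
    /** Length of "YYYY:MM:DD HH:MM:SS" without the NUL terminator. */
    static final int DATE_TIME_LENGTH = 19;

    /** Returns true when the value has the expected length and may be parsed. */
    static boolean hasValidLength(String value) {
        if (value == null) {
            return false;
        }
        // Ignore trailing NUL terminators carried by TIFF ASCII fields
        int end = value.length();
        while (end > 0 && value.charAt(end - 1) == '\0') {
            end--;
        }
        return end == DATE_TIME_LENGTH;
    }

    public static void main(String[] args) {
        System.out.println(hasValidLength("2011:01:04 10:30:00"));
    }
}
```

Checking the length first means a caller can report an error for an empty or truncated field instead of indexing past the end of the string.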
2009-12-17
- An ArrayIndexOutOfBoundsException was thrown on a PDF with an invalid object number in the cross-reference stream. In JHOVE 1.5, this is correctly reported as a violation of well-formedness.
- With some very simple UTF-8 files, JHOVE handlers would throw an exception processing them, and the GUI would fail silently. This happened with files using no UTF-8 blocks. This has been fixed.
- TextMD metadata can now optionally be reported. To get this, it's necessary to edit jhove.conf. TextMD can be enabled on a per-module basis for HtmlModule, AsciiModule, Utf8Module, and XmlModule. The `<module>` element for each chosen module must contain the element `<param>withtextmd=true</param>` (no spaces).
- The TextMD feature was added by Thomas Ledoux.
2009-07-31
- The PDF/A profile has been updated to the final version of 19005-1:2005(E) and made more thorough. Among the changes:
   a. The set-state and no-op actions disqualify a PDF/A candidate.
   b. The ASCIIHexDecode and ASCII85Decode filters no longer disqualify a candidate.
   c. Checking of outlines has been added.
   d. Additional checking of Type 1 fonts and symbolic fonts.
   e. Bug fix in checking type 2 subfonts.
   f. An LZW filter in an image object disqualifies a candidate.
   g. The xpacket processing instruction is checked for attributes which disqualify from PDF/A.
   h. Conformity to implementation limits is checked as a condition of PDF/A conformity.
- The pathological case of an image with no components is checked so it won't cause a crash.
- A reset() function has been added so that if the handler is reused, it will return to a valid initial state.
2009-06-04
- The build.xml files now force compilation to Java 1.4, preventing accidental distributions that aren't 1.4-compatible.
- Spaces are allowed in file paths on Windows, if the path is enclosed in quotes. This fix had been in version 1.1i, and had been lost since then.
- According to the PDF 1.6 specification, table 3.4, parameters for a stream filter can be either a dictionary or the null object. The null object was treated as an error; it is now allowed.
- Object stream handling was seriously buggy, causing rejection of well-formed and valid files; it's better now.
- In PDF 1.4, an outline dictionary unconditionally must have a "First" and a "Last" entry. JHOVE follows this requirement, declaring a file invalid if it isn't met. However, PDF 1.6 relaxes the requirement, applying it only "if there are any open or closed outline entries." Thus, an empty outline dictionary with no "First" or "Last" entry is valid. It is now accepted (for all PDF versions).
- If a page number tree in a PDF file is missing an expected "Nums" entry, this was being reported as an invalid date. A more appropriate error message is now given.
- TIFF tag 33723 (IPTC-NAA) was considered valid only if the data type is ASCII or LONG. But according to Aware Systems, the valid types are UNDEFINED and BYTE. All four types are now accepted.
- Omissions in MIX 1.0 and 2.0 output have been fixed.
2009-02-10
- A bug has been fixed in CountedInputStream, which could potentially have caused infinite recursion in some modules.
- An incompatibility with Java 1.6 has been fixed.
- A null pointer exception would be thrown for PDF documents without a document root tree. This has been fixed.
- A source of possible false positives in PDF profiles has been fixed.
- Certain checks weren't being done to Type 2 fonts, and some PDF/A profile violations might have been missed as a result. This has been fixed.
- Sub-chunks of the 'adtl' chunk are now constrained to even byte boundaries.
- MIX 2.0 is now supported.
- The URL for the MIX 0.2 schema has changed to reflect the change on the LOC MIX site.
- The handler was sometimes incorrectly reporting whether the AESAudioMetadata property had an empty value or not. This has been fixed.
2008-02-22
- Allow filenames with internal spaces if they are quoted on the command line.
- Corrected error setting the Classpath in the Windows Shell script (jhove.bat).
- Corrected error opening the configuration file using the default GCJ parser in the GNU Java Runtime Environment.
- AES metadata properties displayed in the RepInfo window rearranged slightly to make their ordering consistent with the Text and XML handlers.
- The JhoveView.main() method will now accept a `-c configFile` option on the command line. The GUI interface can now be invoked by: `java -jar bin/JhoveView.jar -c configFile`
- Corrected error opening the configuration file using the default GCJ parser in the GNU Java Runtime Environment.
- Correct recurrent problems with reading the configuration file on Windows installations.
- Correct value for first sample offset by including the non-zero offset defined in the SSND chunk.
- Do not report bitrate reduction data for PCM data.
- All non-final instance fields and methods are protected, rather than private.
- A minimal file containing no line-end characters now does not produce an empty ASCIIMetadata property, which is invalid against the JHOVE schema.
- Zero-length files are considered not well-formed.
- Issue informative message if file contains no printable characters.
- All non-final instance fields and methods are protected, rather than private.
- All non-final instance fields and methods are protected, rather than private.
- All non-final instance fields and methods are protected, rather than private.
- The HTMLMetadata block in the module output is only produced if there is at least one actual metadata property to report.
- All non-final instance fields and methods are protected, rather than private.
- The JPEG module reports the X and Y sampling frequency for files meeting the JFIF profile.
- The JPEG module reports the pixel aspect ratio for JFIF profile files for which it is defined.
- File handles were not being properly closed when processing embedded EXIF metadata. In cases where JHOVE was invoked against large numbers of objects this was causing a premature crash due to the resource leak.
- All non-final instance fields and methods are protected, rather than private.
- Correct parsing of the EXIF "subsecTimeOriginal" (37521) and "subsecTimeDigitized" (37522) properties.
- Validation errors in embedded EXIF metadata were not being fully reported.
- All non-final instance fields and methods are protected, rather than private.
- Files generated by the LuraWave codec are no longer incorrectly identified as having unrecognized QCC marker segments.
- Date strings are now parsed with strict conformance to the ASN.1 syntax.
- Destinations defined by indirect references to non-existent objects are assumed to have the value "null". Files containing such destinations are reported as "well-formed, but not valid".
- No attempt is made to display encrypted outline item title strings.
- Catch error if the Info key of the trailer dictionary is not an indirect reference.
- Read entire page tree structure, regardless of its internal organization. This error may have caused the under reporting of page resources, such as fonts and images.
- The NISO Compression Scheme for all images using the CCITTFaxDecode compression filter is now reported properly; previously, the scheme was always reported as CCITT 1D even if the actual compression algorithm was CCITT Group 3 or 4.
- Properly parse UTF-16 escape characters encoded in double-byte form.
- The module properly stops looking for the header comment after 1024 bytes.
- All non-final instance fields and methods are protected, rather than private.
- The number of incremental updates is now reported correctly, rather than the total number of file trailers, which is one greater than the number of updates.
- Only up to 1000 fonts will be reported. After that, an informative message will be generated. The limit can be set using the parameter "nxxxx" in the module-specific section of the configuration file:
  `<module> <class>edu.harvard.hul.ois.jhove.module.PdfModule</class> <param>n2000</param> </module>`
- Subfonts of Type 0 are now being properly reported.
- PDF/A-1b profile is now being properly reported.
- Permit trailer info key to be optional.
- Additional correction for outline recursion.
- Fix treatment of indirect object of Actions.
- Correctly handle trailer dictionary without Info entry.
- Ignore comments within dictionaries.
- Corrected error parsing pyramidal TIFF using the SubIFDs tag with a type of IFD (13) rather than LONG (4).
- Correct parsing of the EXIF "subsecTimeOriginal" (37521) and "subsecTimeDigitized" (37522) properties.
- All sub-IFDs of a pyramidal TIFF are now properly parsed.
- The EXIF GainControl tag (41991) is now correctly identified as a SHORT, not a RATIONAL, value.
- Corrected error in which valid files were reported as being only well-formed due to an incorrect parsing of the DateTime (306) tag.
- Byte-aligned offsets can be considered well-formed if the module parameter "byteoffset=true" is set in the configuration file:
  `<module> <class>edu.harvard.hul.ois.jhove.module.TiffModule</class> <param>byteoffset=true</param> </module>`
- All non-final instance fields and methods are protected, rather than private.
- Correct parsing of the EXIF "subsecTimeOriginal" (37521) and "subsecTimeDigitized" (37522) properties.
- Using the `-s` option, the TIFF module was incorrectly reporting signature matches for text files starting with "II".
- Validation errors in embedded EXIF metadata were not being fully reported.
- Corrected error under which malformed UTF-8 files containing encoding sequences starting with a byte value in the range 0xF8 through 0xFF were reported as well-formed and valid.
- Zero-length files are considered not well-formed.
- Issue informative message if file contains no printable characters.
- All non-final instance fields and methods are protected, rather than private.
- BWF files now set the correct start time in the AES metadata.
- All non-final instance fields and methods are protected, rather than private.
- "cue" and "adtl" chunks are now properly read.
- The DTD is assumed to be the first DOCTYPE system ID in the file with a ".dtd" extension.
- All non-final instance fields and methods are protected, rather than private.
- The module correctly handles schemaLocation attributes that do not provide two whitespace-separated URIs.
- AES audio metadata properties rearranged slightly to make their ordering consistent with the XML schema.
-
Correct sample rate formatting in AES Time Code Format (TCF) temporal references.
-
Correct face IDREF in AES metadata.
-
Disallowed control characters are removed from content.
-
Null property values no longer generate empty elements.
-
Image technical metadata can be reported in terms of the MIX 1.0 schema, as opposed to the default reporting against MIX 0.2. To specify the 1.0 schema, include the <mixVersion>1.0</mixVersion> directive in the configuration file.
-
The process() and processFile() methods of the JhoveBase class are now public, to permit direct access to the API by applications.
-
Checksum calculations now use buffered I/O uniformly for improved performance.
-
All non-final fields and methods in the JhoveBase class are protected, rather than private.
-
When invoked with the -s option, JHOVE now reports the signature-matched format and MIME type.
-
The processing of files in a directory is now performed in an alphabetically sorted order.
- Display the field values of known chunks.
-
New format that sorts all tag definitions by their byte offset and also displays the byte ranges for image data.
-
Command line flags permit the suppression of BYTE data display (-b) and subIFD parsing (-s).
- A new utility program, UserHome, is available to determine the value of the Java user.home property needed to know where to place the configuration file. This utility can be invoked by the driver scripts "userhome" (Bourne shell) or "userhome.bat" (Windows).
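The lookup that the UserHome utility performs can be approximated with the standard Java system property API. The following is a minimal sketch, not the utility's actual source: the class name UserHomeSketch and the helper under() are hypothetical, and the relative path passed to under() is supplied by the caller, since these notes do not state where under user.home the configuration file belongs.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch; not the source of the UserHome utility.
final class UserHomeSketch {

    // Returns the value of the standard Java "user.home" system property.
    static String userHome() {
        return System.getProperty("user.home");
    }

    // Resolves a caller-supplied relative path against a home directory.
    // The relative location of the configuration file is an assumption
    // left to the caller.
    static Path under(String home, String relative) {
        return Paths.get(home).resolve(relative).normalize();
    }

    public static void main(String[] args) {
        // Print the directory that Java regards as the user's home.
        System.out.println(userHome());
    }
}
```

Running the class prints the same value that a JVM started by the same user would see, which is the information the driver scripts are meant to expose.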
2005-05-26
-
Zero length files are now handled properly in all modules.
-
Missing start time in audio files is now handled properly in all audio modules.
-
Miscellaneous bug fixes, enhancements, and documentation updates.
- Corrected error causing BitrateReduction to be incorrectly reported for uncompressed PCM audio.
-
The module now validates the enumerated ICC profile types in the Color Specification Box. In the JP2 profile, an unrecognized ICC profile type marks the file as not well-formed; in the JPX profile, the file is merely not valid.
-
In the beta 3 release certain invalid JPEG 2000 files were reported as well formed in the JP2 profile. This has been corrected.
-
Following the practice of Acrobat, the PDF module will accept the "%PDF-1.n" header comment anywhere in the first 1024 bytes of a file (with appropriate notification via an information message), rather than requiring that it start at byte offset 0.
-
The requirements for the PDF/A profile have been brought into conformance with the most recent version of the PDF/A specification, ISO/DIS 19005-1 of 2004-12-22.
-
Corrected bug that prevented valid PDF/X-1 files from being recognized as such.
- Corrected error causing BitrateReduction to be incorrectly reported for uncompressed PCM audio.
-
Dates reported for the NISO Z39.87 <mix:DateTimeCreated> element are now canonicalized to be in proper ISO 8601 form.
-
The NISO Z39.87 <mix:ScannerManufacturer> element is now reported, if known.
- The current working directory is reported as the "home" attribute of the <audit> element and individual files are reported as relative pathnames.
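The relaxed PDF header rule in this release (accepting "%PDF-1.n" anywhere in the first 1024 bytes) can be illustrated with a small scan. This is a sketch under stated assumptions, not the PDF module's code: the class PdfHeaderScan and its method names are hypothetical, and it assumes the whole header token, including the version digit, must fall inside the 1024-byte window.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; not the JHOVE PDF module implementation.
final class PdfHeaderScan {
    private static final byte[] MARKER = "%PDF-1.".getBytes(StandardCharsets.US_ASCII);
    private static final int WINDOW = 1024; // window size taken from the release note

    // Returns the byte offset of "%PDF-1.n" within the first 1024 bytes, or -1.
    // Offset 0 is the strict case; a positive offset would merit an
    // information message.
    static int findHeader(byte[] data) {
        int limit = Math.min(data.length, WINDOW);
        // Marker plus one version digit must fit inside the window (assumption).
        for (int i = 0; i + MARKER.length < limit; i++) {
            if (startsWithMarker(data, i) && isAsciiDigit(data[i + MARKER.length])) {
                return i;
            }
        }
        return -1;
    }

    private static boolean startsWithMarker(byte[] data, int offset) {
        for (int j = 0; j < MARKER.length; j++) {
            if (data[offset + j] != MARKER[j]) {
                return false;
            }
        }
        return true;
    }

    private static boolean isAsciiDigit(byte b) {
        return b >= '0' && b <= '9';
    }

    public static void main(String[] args) {
        byte[] sample = "junk bytes\n%PDF-1.4\n".getBytes(StandardCharsets.US_ASCII);
        int offset = findHeader(sample);
        System.out.println(offset < 0 ? "no header found" : "header found at offset " + offset);
    }
}
```

A caller can treat a return value of 0 as the conventional case and any positive value as acceptable but worth reporting.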
2005-02-04
-
The architecture has been modified to simplify the use of JHOVE with new "front ends." The new JhoveBase class is used in conjunction with the App class to incorporate nearly all the work of setting up a JHOVE instance. The main Jhove class and the App class are now smaller than before.
-
Checksums were often being reported with incorrect values due to an output formatting error that dropped zeroes. This has been fixed.
-
New utilities GDUMP and JDUMP created for GIF and JPEG documents.
-
Error messages are more consistently factored into submessages. This allows messages indicating the same type of error to be more readily grouped.
-
Some modules were reporting a MIME type for a document that is not well-formed. This no longer occurs.
-
Duplicate reporting of AES BitDepth has been suppressed.
-
New module for HTML format. Be sure to update the configuration file, jhove/conf/jhove.conf, to include the module:
<module> <class>edu.harvard.hul.ois.jhove.module.HtmlModule</class> </module>
-
The AES audio metadata representation has been updated to conform with schema version 1.02b (pre-release).
-
New property, sigMatches, has been added to RepInfo. This records which module(s) regarded the signature of the document as a match, even if the document was not well-formed. This is useful in identifying broken documents that are reported as ASCII or Bytestream.
-
The logging API is supported, permitting the generation of debugging messages.
-
All modules are now non-final, so that they can be subclassed by adventurous users.
-
The -p and -P arguments of the command line are no longer supported. Instead, the equivalent parameters can be provided to all variants of JHOVE (including those which don't take a command line) by specifying a <param> element within the <module> element of the configuration file. Example:
<module> <class>edu.harvard.hul.ois.jhove.module.PdfModule</class> <param>a</param> <param>f</param> <param>p</param> </module>
-
The JHOVE command-line interface can now accept directory names, as well as file pathnames and URIs:
java jhove [-c config] [-m module] [-h handler] [-e encoding] [-H handler] [-o output] [-x saxclass] [-t tempdir] [-b bufsize] [-l loglevel] [[-krs] dir-file-or-uri [...]]
All of the files in the directories are processed in a depth-first recursive descent.
-
The JhoveViewer class now allows dragging of a directory or of multiple files, and the output for all files is presented in a single window. This significantly reduces the window clutter.
-
The JhoveViewer presents the module menu in alphabetical order rather than configuration file order.
-
The JhoveViewer was failing to report some submessages. This is fixed.
-
The JhoveViewer was failing silently on certain URL errors; it now puts up an error alert.
-
If an empty module class name is added in the Configuration dialog, it is ignored.
-
Descriptive properties added.
-
Checksum was sometimes missing; fixed.
-
Specification URL added to descriptive information.
-
Reported MIME type changed to 'audio/x-aiff' from 'application/aiff'.
- BitsPerSample is now reported.
-
Errors occurring when parsing an optional EXIF segment were not being reported. This problem manifested itself by incorrectly reporting that the JPEG file is not well-formed.
-
Array size bug in BitsPerSample fixed.
-
Specification information added for ITU.
-
Errors in parsing of an EXIF segment are now reported.
-
In certain instances the module was inappropriately reporting well-formed PDF files as not well-formed, indicating (incorrectly) that the file does not contain a trailer.
-
Fixed a NullPointerException being thrown with a defective page root tree.
-
Certain broken cross-reference tables would throw the module into a loop. This is fixed.
-
Problems in XMP data that triggered a SAX error were being reported to standard output as a "fatal error." They are now properly reported.
-
Error in offset reporting fixed.
-
Now reports FontFile2 and FontFile3.
-
File trailers are now found more reliably.
-
PDF/A profile updated to latest draft proposal, ISO/CD 19005-1 (2004-09-20).
-
Parameters that would have been specified by the -p argument of the command line are now specified by the <param> element in the configuration file. The sense of these parameters has been reversed; by default, the PDF module presents the maximum amount of information unless suppressed by including the characters a, p, f, or o in the parameter value(s).
-
Adobe DNG tags are recognized, and a DNG profile has been added.
-
Bug in DATETIME checking fixed.
-
Changes in validity tests for PhotometricInterpretation, SamplesPerPixel and BitsPerSample.
-
Corrected spurious null values for some properties.
-
Tag data type checking was badly broken; this is now fixed.
-
Type 'exif' recognized in LIST chunk.
-
Format and signature information updated.
-
Checksum was sometimes missing; fixed.
-
Reported MIME type changed to 'audio/x-wave' from 'audio/x-wav'.
-
Now reports 1.0 and 1.1 as versions rather than profiles.
-
Reported MIME type changed to 'text/xml' from 'application/xml'.
-
A base URL for DTDs may now be specified using the <param> element. The URL must be preceded by the letter b to distinguish it from potential future parameters, e.g.,
<module> <class>edu.harvard.hul.ois.jhove.module.XmlModule</class> <param>bhttp://www.example.com/</param> </module>
-
The "xsi" namespace is now defined in the NISO Image Metadata <mix:mix> and AES Audio Metadata <aes:audioObject> elements. This allows these segments to validate when extracted from the JHOVE output document.
-
The <ImagingPerformanceAssessment> element is properly named; it had been improperly displayed as <ImagePerformanceAssessment>.
-
X and YSamplingFrequency are reported as positive integers ("600"), not ratios ("600/1"), for consistency with the MIX schema.
-
An empty Properties element in the XML handler is now suppressed.
- New utility to dump GIF files in human-readable form.
- New utility to dump JPEG files in human-readable form.
-
The output format has changed slightly, e.g.
00000000: "II" (little endian) 42
00000008: IFD 1 with 15 entries
00000034: 254 (NewSubFileType) LONG 1 = 0
00000046: 256 (ImageWidth) LONG 1 = 2948
00000058: 257 (ImageLength) LONG 1 = 4620
...
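The checksum fix earlier in this release concerned an output formatting error that dropped zeroes. The notes do not say what the error was; one common way such a bug arises in Java is converting each digest byte to hexadecimal without padding. The sketch below contrasts the two approaches. The class ChecksumHex and both method names are hypothetical and are not taken from the JHOVE source.

```java
// Hypothetical sketch of a zero-dropping hex formatter and a padded one.
final class ChecksumHex {

    // Buggy variant: Integer.toHexString yields one digit for values
    // below 0x10, so leading zeroes of individual bytes are lost.
    static String unpadded(byte[] digest) {
        StringBuilder sb = new StringBuilder();
        for (byte b : digest) {
            sb.append(Integer.toHexString(b & 0xff));
        }
        return sb.toString();
    }

    // Corrected variant: every byte is rendered as exactly two hex digits.
    static String padded(byte[] digest) {
        StringBuilder sb = new StringBuilder(digest.length * 2);
        for (byte b : digest) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] digest = {0x00, 0x0a, (byte) 0xff};
        System.out.println(unpadded(digest)); // prints "0aff" (too short)
        System.out.println(padded(digest));   // prints "000aff"
    }
}
```

The padded form always has twice as many characters as the digest has bytes, which gives a quick sanity check on any reported checksum.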
2004-07-19
-
Multiple files can now be specified in command line:
jhove ... [[-krs] file-or-uri ...]
A single output document (XML or text) will be generated for a set of files specified in a command line.
-
API version information is now available through methods in the App class.
-
AESAudioMetadata property has been added for sound formats. The new PropertyPath class facilitates the extraction of Properties by applications that use the JHOVE API.
-
The ErrorMessage and InfoMessage classes now support a submessage string for more flexible message factoring.
-
The SAX parser class may now be specified in the jhove.properties file in the property "edu.harvard.hul.ois.jhove.saxClass".
-
Supports drag and drop of directories; subdirectories are processed recursively.
-
The menu option "File > Close document windows" closes all document windows.
-
Performance has been improved in all modules.
-
New modules for JPEG 2000, AIFF, and WAVE formats. Be sure to update the configuration file, jhove/conf/jhove.conf, to include these modules:
<module> <class>edu.harvard.hul.ois.jhove.module.AiffModule</class> </module> <module> <class>edu.harvard.hul.ois.jhove.module.WaveModule</class> </module> <module> <class>edu.harvard.hul.ois.jhove.module.Jpeg2000Module</class> </module>
-
Bug reading unsigned integers has been fixed.
-
More information provided about encryption keys.
-
UserAccess property now shows "No permissions" if no bits are set.
- Unexpected EOF is now handled cleanly.
- Exif data exception properly thrown.
-
Identification of Exif profile has been improved.
-
Photoshop tags 34377 and 50255 are now recognized.
-
Bug in handling ExtraSamples tag fixed.
-
Bug in determining valid date/time formats; the range for hours was incorrectly constrained to 1-24, rather than 0-24.
-
If no encoding is specified, encoding is now reported as UTF-8.
-
Catches and reports UTFDataFormatException.
-
A greater range of parsers (including Xerces) now will do schema validation.
-
Omitted values in NisoImageMetadata were being reported in XML in some cases as default values (e.g., -1). These have been suppressed.
-
The <PlanarConfiguration> element was inappropriately nested underneath the <Segments> element.
-
The "subMessage" attribute is now properly defined in the jhove.xsd schema.
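The SAX parser property introduced in this release, "edu.harvard.hul.ois.jhove.saxClass", uses the standard Java properties file format, so an application can read it with java.util.Properties. The following is a minimal sketch, assuming the properties text is already in memory. The class SaxClassProperty and its method are hypothetical and do not reproduce how JHOVE itself loads jhove.properties.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical sketch; not JHOVE's own property-loading code.
final class SaxClassProperty {
    // Property name as given in the release note.
    static final String KEY = "edu.harvard.hul.ois.jhove.saxClass";

    // Returns the configured SAX parser class name, or null when the
    // property is absent or blank.
    static String saxClass(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String value = props.getProperty(KEY);
        if (value == null || value.trim().isEmpty()) {
            return null;
        }
        return value.trim();
    }

    public static void main(String[] args) {
        String text = KEY + "=org.apache.xerces.parsers.SAXParser\n";
        System.out.println(saxClass(text));
    }
}
```

A null result would let the caller fall back to whatever parser the Java runtime supplies by default.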