A contribution to Yvonne Friese’s publication on the topic: “Ensuring long-term access: PDF validation with JHOVE?”
CINES is a French centre, one of whose main missions is digital archiving. The team in charge of this task is made up of archivists, developers and file format experts. They have developed a platform called PAC (Plateforme d’Archivage du CINES, i.e. the CINES Archiving Platform), through which users can check their files and store them once they are “well formed and valid”. The files are checked with a “validator”, which is a package of tools and file format validators, including JHOVE. In addition, a web service, FACILE, has been developed to help users validate their files.
JHOVE contains several modules, each of which deals with one file format and its subtypes. These file formats include AIFF, GIF, JPEG, JPEG 2000, PDF, TIFF and WAVE. Up to now, JHOVE has been useful for most of the files CINES has stored through the PAC project. Most of these files are PDFs, in versions ranging from 1.1 to 1.7, and PDF/A files have recently joined the bundles.
Some months ago, the file format experts started a survey of PDF validators. The purpose was to identify the best validator(s). To do this, they gathered files from the Isartor test suite, the Bavaria test suite and some test files they produced themselves. The results confirmed what Yvonne Friese wrote in her publication: JHOVE performed poorly. Of all the invalid files, only one was reported as invalid. Yvonne Friese discusses the requirements of PDF/A-1b, which leads to the conclusion that JHOVE checked the PDF/A files as ordinary PDFs. She therefore concluded that JHOVE is not suited to PDF/A validation.
She nevertheless noted that JHOVE remains useful, provided that users understand its error reports. Indeed, JHOVE is useful not only for PDF: as mentioned above, it also handles other formats. It will surely be flattering and encouraging for the originator of this project to see that many people and institutions around the world are interested in promoting his idea.
I know of no other open-source validator that is as efficient as JHOVE, handles about 12 formats, is written in Java and is as well known. There are surely others, but I don’t know of any free one that also covers PDF. The work was well done and deserves to be continued. Unfortunately, the project has not had any major update since version 1.11 was released at the end of September 2013. The most recent activity on it has been the “mavenization” of JHOVE, i.e. moving its build to Maven. Gary McGath explained on his blog that his current job involves rather long hours, and when he comes home, looking at more Java code isn’t at the top of his list of things to do.
This is clearly an opportunity for the whole digital archiving community to join forces to maintain and improve this international tool. JHOVE can do better than it does now. Gary McGath can be contacted through his blog. Contributions of all kinds would help ensure the long-term preservation of JHOVE itself.
By Franklin Boumda, posted in Franklin Boumda's Blog