When we work with government records we tend to focus on file formats. We think about documents in formats such as PDF, DOCX (Word) or PPT (PowerPoint), or more exotic formats such as Serif PagePlus. We tend not to think as much about the web. For one, it can be argued that preserving the web is its own separate discipline: complementary to, and sitting somewhere between, digital preservation and archives more generally. Yet as soon as we type a hyperlink into a Word document, what do we have? We have a link out to the web. It doesn't magically turn that document into the World Wide Web, but we have given the document a new intrinsic characteristic: it relies on the web to aid its interpretation or understanding. When we look at that link, we will probably find something about it that requires preservation. How do we record that information? How do we expose its existence? How do we preserve hyperlinks in documentary heritage?
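One practical starting point for exposing a link's existence: a DOCX file is a ZIP package, and Word records each external hyperlink in the main document body as a relationship in `word/_rels/document.xml.rels`. The Python below is a minimal sketch under that assumption. It reads only the main body's relationships, so links in headers, footers, footnotes or comments (stored in other relationship parts) and links that exist only as HYPERLINK field codes are not covered. The function name `extract_docx_hyperlinks` and the file name in the usage note are illustrative, not part of any existing tool.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the OPC relationships part found inside every DOCX package.
REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
# Relationship type that Word uses for hyperlinks.
HYPERLINK_TYPE = (
    "http://schemas.openxmlformats.org/officeDocument/2006/relationships/hyperlink"
)


def extract_docx_hyperlinks(docx):
    """Return external hyperlink targets declared for a DOCX main document.

    `docx` may be a filesystem path or a binary file-like object.
    Hypothetical helper: only word/_rels/document.xml.rels is inspected.
    """
    with zipfile.ZipFile(docx) as package:
        try:
            rels_xml = package.read("word/_rels/document.xml.rels")
        except KeyError:
            # No relationships part means the body declares no hyperlinks.
            return []
    root = ET.fromstring(rels_xml)
    links = []
    for rel in root.iter(f"{{{REL_NS}}}Relationship"):
        # Keep only hyperlinks pointing outside the package (i.e. to the web).
        if rel.get("Type") == HYPERLINK_TYPE and rel.get("TargetMode") == "External":
            links.append(rel.get("Target"))
    return links
```

Calling `extract_docx_hyperlinks("record.docx")` on a hypothetical record returns its external URLs in the order they are declared, which could then be listed in a catalogue entry or passed to a web archiving tool.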
