Building a digital preservation toolkit for digital curators

In the context of digital information, many curation tasks need to be performed to ensure continuous access to information. As digital assets grow in size and number, tools must be deployed to ease the execution of common digital preservation tasks, thereby making the whole digital preservation process more manageable.

In the SCAPE project, a set of preservation action tools were surveyed, evaluated and selected for use in preservation plans to be deployed in large-scale digital preservation scenarios (see http://tinyurl.com/c68qns6 for the full list of surveyed action tools). File format converters are a typical example of preservation action tools. The selected tools were developed by many different people and organizations, so they are very distinct applications, each with its own way of being invoked, receiving parameters, handling errors, etc.

In order to create a user-friendly digital preservation toolkit, we decided to normalize the way tools are named and invoked. To accomplish that, we developed an application called Tool wrapper, which reads a tool specification file (called a toolspec) describing a particular digital preservation tool: what it does, who developed it, how to install it, how to invoke it, what its dependencies are, and other technical details.

The toolspec is written in XML following a well-defined schema. Making a toolspec file for a tool enables the following outputs to be automatically generated:

  1. A command-line script that standardizes the tool's name and the way parameters are passed, adds support for input and output streams, normalizes error output, etc.;
  2. A web service for invoking the tool over the Web;
  3. A single-step Taverna workflow that enables anyone to easily use the tool in larger, more complex Taverna workflows;
  4. A software package for easy installation of these artifacts and all their dependencies on Debian Linux machines.
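To make item 1 more concrete, here is a minimal POSIX shell sketch of what such a generated command-line script might do for one operation: substitute the ${input} and ${output} placeholders of a toolspec command template, accept "-" to mean standard input or standard output, and report any failure with one uniform message and exit status. The function names expand_command and run_operation, and the "-" convention, are assumptions made for illustration; this is not the code the Tool wrapper actually generates.

```shell
#!/bin/sh
# Hypothetical sketch of a wrapper derived from a toolspec <command>.
# Function names and the "-" convention are illustrative assumptions,
# not the Tool wrapper's actual generated code.

# Substitute the ${input} and ${output} placeholders of a command template.
# Assumes the values contain no '|', '&' or '\' (special to sed).
expand_command() {
    printf '%s\n' "$1" \
        | sed -e 's|\${input}|'"$2"'|g' -e 's|\${output}|'"$3"'|g'
}

# Run one operation. "-" as input reads stdin and "-" as output writes
# stdout, going through temporary files because many wrapped tools only
# accept file paths. Failures are reported uniformly with status 1.
run_operation() {
    op_template=$1 op_in=$2 op_out=$3
    tmp_in='' tmp_out=''
    if [ "$op_in" = "-" ]; then
        tmp_in=$(mktemp) || return 1
        cat > "$tmp_in"          # materialize stdin as a file
        op_in=$tmp_in
    fi
    if [ "$op_out" = "-" ]; then
        tmp_out=$(mktemp) || return 1
        op_out=$tmp_out
    fi
    if sh -c "$(expand_command "$op_template" "$op_in" "$op_out")"; then
        status=0
    else
        status=$?
    fi
    # Copy the result to stdout only if the tool succeeded.
    if [ "$status" -eq 0 ] && [ -n "$tmp_out" ]; then
        cat "$tmp_out"
    fi
    [ -n "$tmp_in" ] && rm -f "$tmp_in"
    [ -n "$tmp_out" ] && rm -f "$tmp_out"
    if [ "$status" -ne 0 ]; then
        echo "error: operation failed with exit status $status" >&2
        return 1
    fi
}
```

Going through temporary files keeps the wrapped tool unchanged: it still only sees file paths, while callers of the wrapper can use it in a pipeline. Note that the placeholders in the toolspec are unquoted, so paths containing spaces would need extra quoting in a real implementation.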

Let’s demonstrate how to use the Tool wrapper. Imagine that one wants to create a smaller version of a set of digitized images stored as TIFF, using GIMP as a command-line application. This is not straightforward, as GIMP offers no direct command-line way of converting a file; it has to be driven through its batch mode with a script. To do a TIFF to PNG conversion, we first need to define a Script-Fu function, to be passed to GIMP, that reads the input image, merges all visible layers and then saves the image in the desired output format. Together with a call for a sample file, the code looks something like:

; Load the TIFF, merge its visible layers, save as PNG, then free the image.
(define (convert-tiff-to-png filename outfile)
  (let* ((image (car (gimp-file-load RUN-NONINTERACTIVE filename filename)))
         (drawable (car (gimp-image-merge-visible-layers image CLIP-TO-IMAGE))))
    (file-png-save-defaults RUN-NONINTERACTIVE image drawable outfile outfile)
    (gimp-image-delete image)))
(convert-tiff-to-png "image1.tiff" "image1.png")
(gimp-quit 0)

This is code that one would like to write once and reuse often, so it would naturally end up in a bash script that can be run whenever needed.

But having this script is only part of the solution. Nothing is defined about where to put the script, what dependencies it has and how to install them. This is where the Tool wrapper comes in. One only needs to describe the tool in the toolspec XML format, reference the previously created script and package everything; the result is ready to be installed, used and distributed.

The following excerpt is an example of a toolspec that solves the described scenario.

<?xml version="1.0" encoding="utf-8" ?>
<tool name="GIMP" version="1.0" homepage="http://www.gimp.org/">
   <installation>
      <dependency operatingSystemName="Debian">gimp</dependency>
      <license type="Apache Licence 2.0">Apache License, Version 2.0</license>
   </installation>
   <operations>
      <operation name="digital-preservation-migration-image-gimp-tiff2png">
         <description>Converts TIFF to PNG</description>
         <command>/usr/share/digital-preservation-migration-image-gimp-tiff2png/digital-preservation-migration-image-gimp-tiff2png.sh ${input} ${output}</command>
         <inputs>
            <input name="input" required="true">
               <description>Reference to input file</description>
            </input>
         </inputs>
         <outputs>
            <output name="output" required="true">
               <description>Reference to output file</description>
            </output>
         </outputs>
      </operation>
   </operations>
</tool>

The script “digital-preservation-migration-image-gimp-tiff2png.sh” referenced in the toolspec is as follows:

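#!/bin/bash
# Converts a TIFF image to PNG with GIMP in non-interactive batch mode.
# Usage: digital-preservation-migration-image-gimp-tiff2png.sh <input> <output>

if [ $# -ne 2 ]; then
    echo "usage: $0 <input> <output>" >&2
    exit 1
fi

# GIMP reads the Script-Fu program from standard input ("-b -");
# the script's exit status is GIMP's exit status.
echo "(define (convert-tiff-to-png filename outfile)
  (let* ((image (car (gimp-file-load RUN-NONINTERACTIVE filename filename)))
         (drawable (car (gimp-image-merge-visible-layers image CLIP-TO-IMAGE))))
    (file-png-save-defaults RUN-NONINTERACTIVE image drawable outfile outfile)
    (gimp-image-delete image)))
(convert-tiff-to-png \"$1\" \"$2\")
(gimp-quit 0)" | gimp -i -b -
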
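Since the original goal was to convert a whole set of TIFF images, the installed script can be driven from a simple loop. The sketch below is a hypothetical helper (convert_dir is an illustrative name, not part of the toolkit) that calls the converter, by default the path given in the toolspec command, once per *.tiff file in a directory, writes each PNG next to its source, and returns a non-zero status if any conversion fails. It assumes the files use the .tiff extension.

```shell
#!/bin/sh
# Hypothetical batch driver for the tiff2png wrapper script.
# The default converter path is the one referenced in the toolspec.
DEFAULT_CONVERTER=/usr/share/digital-preservation-migration-image-gimp-tiff2png/digital-preservation-migration-image-gimp-tiff2png.sh

# convert_dir <directory> [converter]
# Calls "<converter> <input> <output>" for every *.tiff file in <directory>.
convert_dir() {
    dir=$1
    converter=${2:-$DEFAULT_CONVERTER}
    failures=0
    for tiff in "$dir"/*.tiff; do
        # If nothing matches, the glob stays unexpanded; skip it.
        [ -e "$tiff" ] || continue
        png=${tiff%.tiff}.png
        if ! "$converter" "$tiff" "$png"; then
            echo "conversion failed: $tiff" >&2
            failures=$((failures + 1))
        fi
    done
    # Succeed only if every conversion succeeded.
    [ "$failures" -eq 0 ]
}
```

For a hypothetical directory /data/scans, running convert_dir /data/scans would write /data/scans/page1.png next to /data/scans/page1.tiff. Passing the converter as an optional second argument makes it easy to substitute another tool that follows the same "<input> <output>" convention.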
The SCAPE project has already produced a first version of the Digital Preservation Toolkit. The toolkit is available at http://scape.keep.pt/apt. More information on how to install and use it is available at http://tinyurl.com/cbd72jd.

At the moment, the toolkit only includes preservation action tools, but over the course of the project it will be extended to include tools for file format identification, digital object characterization and migration quality assurance.

By hsilva, posted in hsilva's Blog

8th Nov 2012  5:49 PM