Resource Audit and Comparison Tool (ReACT)

The ReACT tool, developed following the SPRUCE mash-up in Glasgow, uses VBA macros in Microsoft Excel to automate the comparison of files across folder and directory structures, all driven through a simple GUI.

Ray Moore (Archaeology Data Service) and Andrew Amato (London School of Economics and Political Science)

At the SPRUCE mash-up in Glasgow (16-18 April 2012), digital archive practitioners highlighted a number of practical issues in the management of digital files, particularly the comparison of lists and directories of files to monitor how files migrate during the archiving process (http://wiki.opf-labs.org/display/SPR/Disassociation+of+files+and+metadata). During the event Andrew Amato (London School of Economics and Political Science) developed a series of tools, based on Microsoft Excel and VBA macros, to assist in auditing collections.[1] The main aim from the outset was a tool that was relatively easy to use and could be run directly from the desktop, so that it would appeal to users of varying computing ability. Once a proof of concept had been developed, however, it became clear that the uniqueness of each repository's infrastructure made the tool difficult to apply outside the organisations for which it was originally built. A more generic version was therefore expected to have broader appeal and greater value to the wider digital preservation community. A successful application for a SPRUCE Award, made possible with the support of JISC (http://wiki.opf-labs.org/display/SPR/SPRUCE+Awards+-+funding+opportunity+for+digital+preservation), gave Andrew Amato (LSE) and Ray Moore (ADS) time between July and October 2012 to develop and test the tools further.

[GUI screenshot]

During development it was recognised that, to give the tool wide applicability across a broad range of organisational infrastructures, a degree of flexibility was necessary; the ability to specify which folders and directories to compare was an obvious first step. For organisations following the Open Archival Information System (OAIS) reference model, the tool can compare the Submission Information Package (SIP), Archival Information Package (AIP) and Dissemination Information Package (DIP), allowing a full audit of the archive's contents [2]. The tool is flexible enough, however, to be extended to almost any structure. Comparisons run recursively through existing structures, so an audit can be started at the highest level of the file structure and still cover everything beneath it.

The tool also addresses the difficult issue of relating files in different formats by allowing users to stipulate which file types should be matched with which other formats, making the resulting audit more accurate. This feature is particularly useful in an archive, where the format initially deposited often differs from the archived or disseminated format. For example, if an archive receives Microsoft Word documents (.doc) and its preservation policy dictates that they be preserved as Microsoft Word Open XML (.docx) and disseminated as Portable Document Format (.pdf), users can set this relationship at the outset.

Once an audit has been completed, the results are presented in separate datasheets, making it straightforward to see which files have been ‘Matched’ and which may be missing or mismatched. Each item in the audit results is also hyperlinked, so users can go directly to the file's location, which opens in a pop-up window.
Throughout development, an awareness of the wide range of technical ability among potential users made a simple Graphical User Interface (GUI) a necessity.
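The format-matching audit described above can be sketched in a few lines. The Python below is an illustration under stated assumptions, not the ReACT tool's VBA code: it assumes the SIP, AIP and DIP are held as separate folder trees that mirror one another's relative paths, and that a migrated file keeps its name with only the extension changed. The `SIP_TO_AIP` and `SIP_TO_DIP` rule dictionaries and the functions `index_tree`, `compare_trees` and `make_file` are hypothetical names chosen for this example.

```python
import tempfile
from pathlib import Path

# Hypothetical format rules following the .doc -> .docx -> .pdf example in the
# text. This dictionary layout is an illustration, not ReACT's configuration.
SIP_TO_AIP = {".doc": ".docx"}
SIP_TO_DIP = {".doc": ".pdf"}


def index_tree(root):
    """Recursively list every file under root, as paths relative to root."""
    root = Path(root)
    return {p.relative_to(root) for p in root.rglob("*") if p.is_file()}


def compare_trees(source_root, target_root, ext_map):
    """Compare two folder trees and sort the results into three 'datasheets'.

    Each source file is expected in the target tree at the same relative
    path, with its extension swapped according to ext_map (extensions not in
    the map are expected unchanged). Target files that no source file
    accounts for are reported as 'Unexpected'.
    """
    source = index_tree(source_root)
    target = index_tree(target_root)
    results = {"Matched": [], "Missing": [], "Unexpected": []}
    expected = set()
    for rel in sorted(source):
        # Swap the extension if a rule exists; otherwise expect it unchanged.
        wanted = rel.with_suffix(ext_map.get(rel.suffix.lower(), rel.suffix))
        expected.add(wanted)
        status = "Matched" if wanted in target else "Missing"
        results[status].append((rel, wanted))
    # Anything in the target tree that no source file accounts for.
    for rel in sorted(target - expected):
        results["Unexpected"].append((None, rel))
    return results


def make_file(root, rel):
    """Create an empty file (and its parent folders) for the demo."""
    path = Path(root, rel)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.touch()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp_name:
        tmp = Path(tmp_name)
        sip, aip, dip = tmp / "SIP", tmp / "AIP", tmp / "DIP"
        for rel in ("site1/report.doc", "site1/plan.tif"):
            make_file(sip, rel)
        for rel in ("site1/report.docx", "site1/plan.tif"):
            make_file(aip, rel)
        make_file(dip, "site1/report.pdf")  # plan.tif was never disseminated
        make_file(dip, "site2/stray.pdf")   # no corresponding deposit
        for label, target, rules in (("AIP", aip, SIP_TO_AIP), ("DIP", dip, SIP_TO_DIP)):
            for status, rows in compare_trees(sip, target, rules).items():
                for source_rel, target_rel in rows:
                    print(f"{label:4} {status:10} {source_rel} -> {target_rel}")
```

Keying each file on its relative path with the extension swapped reduces the audit to one set lookup per file, and running the same comparison twice (SIP against AIP, SIP against DIP) covers the three OAIS packages. A repository whose migrations rename files or flatten folders would need a different matching rule.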

The ReACT tool was subsequently tested on a collection of archives within the ADS that form the backbone of the Grey Literature Library [3], a resource that makes unpublished archaeological fieldwork reports available to heritage professionals and the wider public. The sheer size of this resource (21,000 files across 157 collections), and the fact that it is added to on a monthly basis, made discrepancies more likely than in a standard collection; the ReACT tool proved invaluable in auditing its content.

The ReACT file and folder audit tool is available from the SPRUCE repository on GitHub (https://github.com/openplanets/SPRUCE/tree/master/ReACT), and a write-up of the solution can be found on the SPRUCE wiki (http://wiki.opf-labs.org/display/SPR/File+management+and+matching+of+tif%2C+htm+and+pdf+files+solution).



[1] Amato, A (2012) ‘SPRUCE/ExcelComparisonMacros’. https://github.com/openplanets/SPRUCE/tree/master/ExcelComparisonMacros, accessed 19 November 2012.

[2] Lavoie, BF (2004) The Open Archival Information System Reference Model: Introductory Guide. DPC Technology Watch Series Report 04-01. http://www.dpconline.org/docs/lavoie_OAIS.pdf, accessed 19 November 2012.

[3] Grey Literature Library. http://archaeologydataservice.ac.uk/archives/view/greylit/, accessed 19 November 2012.

By Ray Moore, posted in Ray Moore's Blog

5th Dec 2012, 11:54 AM
