Software Archiving for EaaS

The typical digital artefact or complex object does not function (render, execute, …) without a certain software environment. Emulation-as-a-Service (EaaS) provides original environments running in platform emulators. Depending on the (complex) object to be handled, several software components are required to reproduce an original environment. Often, these components are proprietary and require a software license. Both the software itself and the licenses need to be preserved to enable the reproduction of the original environments. There are a couple of issues linked to software licenses. The situation can change over time and will definitely influence EaaS, as licenses (and software "patents") expire or local and remote license servers become unavailable. Another interesting point, massively disputed by some software vendors, is the development of a second-hand software market.

Software Archive of Standard Components

Software components required to reproduce original environments for certain (complex) digital objects can be classified in several ways. There is standard software such as operating systems and off-the-shelf applications sold in (significant) numbers to customers. There might exist different releases and various localized versions (the user interaction part translated to different languages as is the case for Microsoft Windows or Adobe products) but otherwise the copies were exactly the same. Such software should be described uniquely and kept in a software archive of standard components.

There are several ideas on software identification and description already discussed in this blog (e.g. by Andrew Jackson). DOIs would definitely be helpful to tag software, just as ISBNs describe books and other media. These tags would be useful for tool registries like TOTEM, too. Optimally, such software archives are managed by the relevant (national) memory institutions. As the archive's content is comparably small and well described by the tags, the workload can easily be shared (federated) among several institutions. Different ways could be envisioned to stock these archives. Legal deposit, as is well established for books and other media, is one option. Alternatively, software components could be collected on demand upon object ingest. This option is discussed and demonstrated e.g. by the bwFLA project. It provides the necessary interfaces to a software archive, so that all required software components can be collected and described. This is done via observed installation processes, which record all the user interaction required to install a certain component. Such additional information is to be stored alongside the standard metadata such as license keys. The successful rendering of the object can be directly validated by the user to verify the complete capture of all relevant components.
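To make the idea of a described archive entry a bit more concrete, here is a minimal Python sketch of what such a record could hold: a persistent tag, the release and localization, license keys, and the recorded installation steps. All class and field names, the identifier format and the step vocabulary are assumptions made for illustration; they are not taken from bwFLA or any existing registry.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class InstallStep:
    """One recorded user interaction from an observed installation (hypothetical)."""
    order: int    # position of the step in the recording, starting at 1
    action: str   # e.g. "click" or "type"; the vocabulary is an assumption
    detail: str   # free-text description of what was done


@dataclass
class SoftwareComponent:
    """Illustrative description of one standard component in a software archive."""
    identifier: str   # persistent tag, e.g. a DOI-like string (format assumed)
    name: str
    release: str
    locale: str
    license_keys: list[str] = field(default_factory=list)
    install_steps: list[InstallStep] = field(default_factory=list)

    def is_replayable(self) -> bool:
        """True if the recorded steps form a gap-free sequence 1..n."""
        orders = sorted(step.order for step in self.install_steps)
        return bool(orders) and orders == list(range(1, len(orders) + 1))


# Made-up example entry; no real product, key or identifier is implied.
component = SoftwareComponent(
    identifier="example:office-suite/1.0/en",
    name="Example Office Suite",
    release="1.0",
    locale="en",
    license_keys=["XXXX-XXXX"],
    install_steps=[
        InstallStep(1, "click", "accept license dialog"),
        InstallStep(2, "type", "enter license key"),
        InstallStep(3, "click", "finish setup"),
    ],
)
```

The replayability check is only a sanity test on the recording itself. Whether the component was captured completely would still be validated by the user rendering the object, as described above.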

Unfortunately, general, coordinated software archiving is still a partially unresolved issue. There are several activities going on at the National Archives of New Zealand or the National Library of Australia. These activities are very valuable to the whole community, as software producers often do not archive their products for very long. Additionally, some companies leave the market and not all assets are maintained. There exist initiatives which try to tackle this problem but operate in a legally problematic domain. They might go down because of take-down requests or simply because they run out of funding. Other sources are specialized archives such as those for web browsers. The drive-by software archiving as run by the Internet Archive might not capture all relevant software, as many components were not freely and openly available for download. Especially for older and less popular platforms it becomes more difficult to get hold of obsolete software. Nevertheless, storing and maintaining software components is a prerequisite for EaaS, and memory institutions should have special rights to archive software.


Every running instance of an original environment requires a certain set of licenses depending on the installed or used software. If, for example, a set of presentation slides with embedded audio, video and spreadsheets needs to be rendered, the licenses for the operating system and the presentation software are required. Additionally, audio and video codecs as well as an appropriate spreadsheet renderer need to be obtained and installed to make the presentation of the object complete. For EaaS, a license management component is required to match the number of available licenses to the original environments requested to run. The sources of the licenses could differ and could depend on the user (and institution) requiring access to a certain digital object in its original environment. In a federated EaaS environment run by different institutions, the sharing and handling of licenses becomes an interesting topic, especially if national borders are crossed (e.g. because software vendors try to maintain separated markets with different pricing).
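The matching of available licenses to requested environments can be sketched as a simple counting pool. This is only an illustration under strong assumptions: one license per product per running environment, no distinction between license types, users or institutions, and hypothetical names throughout (`LicensePool`, `acquire`, `release`).

```python
from __future__ import annotations

from collections import Counter


class LicensePool:
    """Tracks how many licenses per product are free for running environments."""

    def __init__(self, available: dict[str, int]) -> None:
        self._free = Counter(available)
        self._sessions: dict[str, list[str]] = {}

    def acquire(self, session_id: str, products: list[str]) -> bool:
        """Reserve one license per listed product, all or nothing."""
        if session_id in self._sessions:
            return False  # this environment already holds its licenses
        needed = Counter(products)
        if any(self._free[product] < count for product, count in needed.items()):
            return False  # at least one product has too few free licenses
        self._free.subtract(needed)
        self._sessions[session_id] = list(products)
        return True

    def release(self, session_id: str) -> None:
        """Return the licenses of a stopped environment to the pool."""
        self._free.update(self._sessions.pop(session_id, []))

    def free(self, product: str) -> int:
        """Number of currently unused licenses for a product."""
        return self._free[product]


# Two operating system licenses, but only one for the presentation software.
pool = LicensePool({"os": 2, "presentation": 1})
first = pool.acquire("session-1", ["os", "presentation"])
second = pool.acquire("session-2", ["os", "presentation"])  # refused: no presentation license left
pool.release("session-1")
third = pool.acquire("session-2", ["os", "presentation"])
```

A federated setup would additionally need to record where each license comes from and who may use it, which this sketch deliberately leaves out.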

Within the realm of (national) libraries and archives, the licenses of the legal deposit might suffice. For a more open and general service, other ways of licensing are required. Either the software producers offer a specific type of license for that purpose, or specifically acquired licenses (e.g. from the pre-owned license market) are used. Another option is that licenses are obtained (from the original user/producer of the object) when ingesting the particular object. This might be the case for finished (scientific) projects or end-of-life office environments in companies or government organizations. At the moment, licenses are often just thrown away like used IT equipment. In the future, a more elaborate digital lifecycle management should be put in place. When a project is planned and started, the licensing of all required components should be secured for the complete intended lifecycle of a particular object.

Custom Made Software Components

A (federated) software archive of standard components does not make sense for all software components. In many domains, custom-made software and user programming play a significant role. These could be scripts or applications written by scientists to run their analysis on gathered data, run specific computations or extend existing standard software packages. Other examples are software tools written for governmental offices or companies to produce certain forms or to implement and configure business processes. Such software is to be taken care of and stored alongside the preserved object. The same applies to complex setups of standard components with lots of very specific configurations. In these cases it could make sense to preserve the system as a whole (see the blog post on full system preservation).

Pre-Produced and On-Demand Original Environments

EaaS makes it possible to centralize services and share the effort. This could be especially useful for re-using pre-produced original environments of standard components. Depending on the type of user (someone rendering the object within the premises of the memory institution, a commercial entity or a private person), different ways of (re)producing original environments could be chosen:

  • Provide complete environments together with the metadata required to run them in the chosen virtual machine or emulator. This would be the method to deploy for imaged complete systems.
  • Reproduce the complete environment from standard components using the license information delivered by the user together with the object to render. This may take a while as the setup procedure needs to be completed. The bwFLA project started to implement workflows to gather all the required metadata and user interaction to automatically reproduce such steps.
  • Re-use existing environments from a "cache" (pre-produced environments). This should be possible for in-house use or as an external service if the required type and number of licenses are available. Here a couple of legal concerns might prove problematic, as many licenses may not explicitly allow software lending.
  • Partially re-use pre-configured environments whose licenses are less problematic and just add the problematic/proprietary component.
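The choice between the options listed above could be expressed as a small decision function. The following Python sketch is an assumption-laden illustration: the order of preference (system image first, then cache, then partial re-use, then full rebuild) and all names are hypothetical and not prescribed by EaaS or bwFLA.

```python
from dataclasses import dataclass
from enum import Enum


class Strategy(Enum):
    RUN_IMAGE = "run imaged complete system"
    REUSE_CACHED = "re-use pre-produced environment"
    EXTEND_BASE = "extend pre-configured base with the proprietary component"
    REBUILD = "reproduce from standard components"


@dataclass
class Request:
    """Facts about one rendering request (all fields are illustrative)."""
    has_system_image: bool    # a complete imaged system is stored with the object
    cached_env_exists: bool   # a matching pre-produced environment is in the cache
    licenses_available: bool  # required type and number of licenses are available
    base_env_exists: bool     # a pre-configured base with unproblematic licenses exists


def choose_strategy(request: Request) -> Strategy:
    """Pick a (re)production method; the preference order is an assumption."""
    if request.has_system_image:
        return Strategy.RUN_IMAGE
    if request.cached_env_exists and request.licenses_available:
        return Strategy.REUSE_CACHED
    if request.base_env_exists:
        return Strategy.EXTEND_BASE
    # Slowest path: the full setup procedure has to be replayed.
    return Strategy.REBUILD


# A cached environment exists, but its licenses are not available to this user,
# so the sketch falls back to extending a pre-configured base environment.
choice = choose_strategy(
    Request(
        has_system_image=False,
        cached_env_exists=True,
        licenses_available=False,
        base_env_exists=True,
    )
)
```

In practice the decision would also depend on the type of user and on the legal concerns mentioned for software lending, which cannot be reduced to a boolean as easily as done here.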

Several ways to automatically re-produce certain environments have been described, e.g. for Windows operating systems or as researched within the bwFLA context. Nevertheless, these procedures take time to complete and extend the time span until an artefact or original environment can be presented to the user.

By Dirk von Suchodoletz, posted in Dirk von Suchodoletz's Blog

1st Apr 2013  2:23 PM

