In defence of migration

There is a trend in digital preservation circles to question the need for migration. The argument varies a little from proponent to proponent but, in essence, it states that software exists (and will continue to exist) that can read old formats and perform the requisite functions on them (e.g., rendering). Hence, proponents conclude, there is no need for migration. I had thought this was a minority view, but at a recent workshop it became apparent that it has been accepted by many.

However, I’ve never thought this is a very strong argument. I’ve always seen a piece of software that can deal with both new and old formats as really just a piece of software that can deal with new formats, with a migration tool seamlessly bolted onto the front of it. In essence, it is like saying I don’t need a migration tool and a separate rendering tool because I have a combined migration and rendering tool. Clearly that’s OK, but it does not mean you’re not performing a migration.

As I see it, whenever a piece of software is used to interpret a non-native format it will need to perform some form of transformation from the information model inherent in the format to the information model used in the software. It can then perform a number of subsequent operations, e.g., render to the screen or maybe even save to a native format of that software. (If the latter happens this would, of course, be a migration.)
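To make the point concrete, here is a minimal sketch in Python. Everything in it is invented for illustration: the legacy format (a title line followed by paragraphs separated by `|`), the `Document` information model and the function names are assumptions, not any real tool’s API. It only aims to show that a “combined” tool is a transformation step followed by a rendering step.

```python
from __future__ import annotations

import json
from dataclasses import dataclass


@dataclass
class Document:
    """Hypothetical internal information model of the software."""
    title: str
    paragraphs: list[str]


def read_legacy(raw: str) -> Document:
    """Transform the made-up legacy format into the internal model.

    Assumed format: first line is the title, the rest is paragraphs
    separated by '|'. This is the step where information can be lost.
    """
    title, _, body = raw.partition("\n")
    paragraphs = [p.strip() for p in body.split("|") if p.strip()]
    return Document(title=title.strip(), paragraphs=paragraphs)


def render(doc: Document) -> str:
    """Render from the internal model (no format transformation needed)."""
    return "\n\n".join([doc.title.upper(), *doc.paragraphs])


def save_native(doc: Document) -> str:
    """Save the internal model in the tool's (hypothetical) native format."""
    return json.dumps({"title": doc.title, "paragraphs": doc.paragraphs})


def open_and_render(raw: str) -> str:
    """The 'combined' tool: a migration step, then a rendering step."""
    return render(read_legacy(raw))
```

In this sketch, `save_native(read_legacy(raw))` would be what we normally call a migration, while `open_and_render(raw)` performs exactly the same transformation and simply never writes the result down.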

Clearly the way software behaves is infinitely variable, but it seems fair to say that there will normally be a greater risk of information loss in the first operation (the transformation between information models) than in subsequent operations (be it rendering or saving in the native format), since those are likely to use the information model inherent in the software. Hence, if we are concerned with whether or not we are seeing a faithful representation of the original, it is the transformation step that should be verified.

This is where using a separate migration tool comes into its own (at least in principle). It allows the quality of the transformation to be checked independently, by comparing the significant properties of the files before and after. Subsequent use of the migrated file (e.g., by a rendering tool) is assumed to be lossless (or at least less lossy), since you can choose the migrated format to be the native format of the tool you intend to use subsequently, meaning that no transformation of information model is required when the file is read.
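As a rough illustration of what such a check might look like, the Python sketch below compares two sets of significant properties. The property names, the tolerance mechanism and the function name are all assumptions made for the example; extracting the properties from real files would be the job of a characterisation tool, which is not shown here.

```python
from __future__ import annotations


def compare_significant_properties(before, after, tolerances=None):
    """Return {property: (before_value, after_value)} for each mismatch.

    `before` and `after` are dicts of property name -> value, assumed to
    have been extracted from the original and migrated files.
    `tolerances` optionally maps a numeric property to the largest
    absolute difference that is still accepted.
    """
    tolerances = tolerances or {}
    differences = {}
    for name in sorted(set(before) | set(after)):
        old, new = before.get(name), after.get(name)
        tol = tolerances.get(name)
        numeric = isinstance(old, (int, float)) and isinstance(new, (int, float))
        if tol is not None and numeric:
            if abs(old - new) <= tol:
                continue  # within the accepted tolerance
        elif old == new:
            continue  # exact match
        differences[name] = (old, new)
    return differences


# Hypothetical property values for an original and its migrated copy.
original = {"page_count": 12, "width_px": 800, "title": "Annual report"}
migrated = {"page_count": 12, "width_px": 799, "title": None}

report = compare_significant_properties(original, migrated, {"width_px": 1})
print(report)  # {'title': ('Annual report', None)}
```

An empty result would suggest the migration preserved everything that was measured. It says nothing about properties nobody thought to extract, which is why the set of significant properties matters so much.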

However, I would concede that there are some pragmatic things to consider…

First of all, migration either has a cost (if it requires the migrated file to be stored) or is slow (if it is done on demand). Hence, there are probably cases where simply using a combined migration and rendering tool is a more convenient solution and might be good enough.

Secondly, is migration validation worth the effort? It is certainly worth testing, say, a rendering tool with some example files before deciding to use it, and most of the time that should be sufficient to determine that the tool works without detailed validation. However, we have seen cases where validation detects uncommon issues in common migration libraries, issues that would go unnoticed if the same libraries were used inside a combined migration and rendering tool.

Thirdly, is migration validation comprehensive enough? The answer depends on the formats, but for some (even common) formats it is clear that the existing validation tools leave gaps that more comprehensive tools would close. Of course, the hope is that this will continually improve over time.

So, to conclude, I do see migration as a valid technique (and in fact a technique that almost everyone uses, even if they don’t realise it). I think one of the aims of the digital preservation community should be to provide an intellectually sound view of what constitutes a high quality migration (e.g., through a comprehensive view of significant properties across a wide range of object types). It might be that real-life tools provide some pragmatic approximation to this idealistic vision (potentially taking short cuts such as using a combined migration and rendering tool), but we should at least understand and be able to express what these short cuts are.

I hope this post helps to generate some useful debate.

Rob
