BnF Preservation Strategy for the PSD Format
By Alix Bruys, Bertrand Caron, Yannick Grandcolas, Thomas Ledoux, & Anne Paounov, Bibliothèque nationale de France (BnF), National Library of France
[N.B. This post is also available in French under the title “Tout transformer pour que rien ne change”]
English translation of the blog post “Tout transformer pour que rien ne change”, produced with the help of DeepL.com/Translator
Previously, on the BnF Formats working group…
This blog post describes the continued work of BnF’s working group on Data and Metadata Formats for Digital Preservation, known as “Groupe Formats”, which was introduced in a previous post published on November 5, 2020 and in the OPF webinar presented on April 14, 2021. In previous episodes, the group published BnF’s policy on formats for preservation: how it evaluated them, how it analyzed files, which formats it preferred, what it knew about them and, implicitly, what it didn’t know.
The BnF format policy document also mentions the strategies BnF adopts when data arrives in a format other than its preferred ones. Ideally, the institution’s policy is to engage in a dialogue with the producer to agree on a format that is acceptable and manageable for both parties.
However, in some cases, negotiation is not possible because of the workload that it would entail for the producer, or because the producer is not available. In such cases, BnF has to consider transforming the data. The first studies carried out on this subject led BnF to conclude that the strategy to adopt depended on many criteria, and that the decision whether or not to transform the data had to be taken case by case. A textbook case arose when PSD files were received, within a short period of time, in three distinct collections.
Note that in this article we use the terms “transform” / “transformation” rather than their equivalents “convert” / “conversion” and “migrate” / “migration”, in the sense defined by OAIS(1): the modification of the content information for preservation purposes, with the objective that the result can replace the original.
The librarian context: the three collections
In the course of 2020, three collection managers from different departments at BnF were working simultaneously on three collections that had recently joined the library’s heritage collections. These three collections, which are treated like all digital collections received as donations or acquisitions by BnF, have in common that they contain digital images intended to be viewed through the Gallica digital library. Here is an overview of the main characteristics of these collections, as they were known before PSD files were detected.
The archive of filmmaker Amos Gitai, known as the “Gitai collection”
Received as a donation by the Performing Arts Department, this collection consists of more than 150,000 files produced during the making of the film Rabin, the Last Day; in addition to the sound and video content, it includes nearly 2,000 photographs taken in 2015 by three different photographers, most of which are in JPEG format.
Posters by the graphic artist Philippe Apeloig, known as the “Apeloig collection”
Received as a donation by the Prints and Photography Department, this collection concerns the creation of posters for the “Fête du livre d’Aix-en-Provence” between 1997 and 2015, by Philippe Apeloig. In addition to the printed posters and sketches, it includes nearly 300 digital sketches, mainly in PDF, TIFF and JPEG formats; the source files of the final printed poster are also identified, in TIFF or PDF formats.
Michèle Laurent’s photographs, known as the “Laurent collection”
Acquired in 2008 by the Performing Arts Department, this mixed collection (digital and paper) contains several hundred files representing the performances of actor Philippe Caubère, mainly in TIFF format.
The presence of PSD files, which are very much in the minority in the three collections and sometimes even “hidden” behind a .tif extension, was revealed by an internally developed tool called Frontin. This tool, which is available to collection managers sorting digital collections, identifies and characterizes the format of the files analyzed and makes an initial diagnosis of the files’ acceptability before ingestion.
Figure 1: Analysis result of a batch of files by the Frontin tool
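Frontin itself is not described in detail here, so the following is only a minimal Python sketch of the kind of check that can reveal a PSD file behind a .tif extension. It relies on the published file signatures (PSD files start with the bytes 8BPS, TIFF files with II*\0 or MM\0*) and on the fixed 26-byte PSD header. All function names are hypothetical and do not reflect Frontin's actual implementation.

```python
import struct
from pathlib import Path

# Published file signatures ("magic numbers").
PSD_SIGNATURE = b"8BPS"
TIFF_SIGNATURES = (b"II*\x00", b"MM\x00*")  # little-endian, big-endian

# Extensions mapped to the format they claim to be (illustrative subset).
DECLARED = {".psd": "PSD", ".psb": "PSD", ".tif": "TIFF", ".tiff": "TIFF"}

# Colour mode codes from the Photoshop file format specification.
COLOR_MODES = {0: "Bitmap", 1: "Grayscale", 2: "Indexed", 3: "RGB",
               4: "CMYK", 7: "Multichannel", 8: "Duotone", 9: "Lab"}


def sniff_format(header: bytes) -> str:
    """Identify the format from the first bytes of the file."""
    if header.startswith(PSD_SIGNATURE):
        return "PSD"
    if header.startswith(TIFF_SIGNATURES):
        return "TIFF"
    return "unknown"


def is_disguised(filename: str, header: bytes) -> bool:
    """True when the content does not match the format the extension claims."""
    declared = DECLARED.get(Path(filename).suffix.lower())
    return declared is not None and sniff_format(header) != declared


def parse_psd_header(header: bytes) -> dict:
    """Characterize a PSD file from its fixed 26-byte, big-endian header."""
    if len(header) < 26 or not header.startswith(PSD_SIGNATURE):
        raise ValueError("not a PSD header")
    # signature, version, 6 reserved bytes, channels, height, width, depth, mode
    _sig, version, channels, height, width, depth, mode = struct.unpack(
        ">4sH6xHIIHH", header[:26])
    return {"version": version, "channels": channels, "height": height,
            "width": width, "bit_depth": depth,
            "color_mode": COLOR_MODES.get(mode, f"unknown ({mode})")}


def check_file(path: str) -> dict:
    """Hypothetical helper: read a file's header and report on it."""
    with open(path, "rb") as f:
        header = f.read(26)
    report = {"detected": sniff_format(header),
              "disguised": is_disguised(path, header)}
    if report["detected"] == "PSD":
        report.update(parse_psd_header(header))
    return report
```

Calling check_file on each file of a delivery would flag any file whose extension and content disagree. A production tool would more likely rely on a signature registry such as PRONOM than on a hand-written table like this one.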
From there, three options were quickly eliminated:
- requesting a new delivery in a format accepted by BnF (this option proved impossible)
- excluding the files (this made no sense for these research collections)
- accepting the files as is (they were not compatible with BnF’s format policy).
In the end, BnF had no choice but to carry out the transformation itself.
The study
The study, which brought together preservation experts and collection managers from the departments concerned by the collections, began with a reminder of the nature, particularities and uses of the proprietary Adobe Photoshop software. Photoshop is a tool for creating images, but also for retouching photographs. It is not, however, designed for creating posters, although it may have been used for this purpose in a roundabout way. The tool allows you to save an image editing project in its own format (PSD), which generally includes several “layers”: raster or vector images and/or text, possibly transparent, whose ordered superimposition composes an image. Each layer can thus be modified separately. In recent versions of Photoshop, the PSD format can also keep track of the most recent successive modifications.
Figure 2. Multi-layered Photoshop document (Apeloig collection)
Receiving a PSD file, which is a production format and not a distribution format, can therefore be an opportunity if one is interested in the genesis of the image. Furthermore, since the format can retain layers of text and vector images, elements of this type can be printed in large format without quality degradation. All this information is lost when a final version is produced in a raster image format (JFIF/JPEG, TIFF, PNG, etc.). A PDF output retains the transparent, textual and vector layers separately, but not the traces of the creation process contained in the PSD file.
Figure 3. Photoshop document with textual elements (Apeloig collection)
In the process of defining a preservation strategy, our experts usually try to find a preferred target format for the given type of content, as well as a method of producing it, that captures all the information and functionality of the original. This work has been started for digitised images and for digital photographs. However, the PSD contents received were fundamentally different, because they were original graphic creations (Apeloig collection) and/or because they were in an intermediate state of production, where the creator’s intention can still evolve towards very different final realisations (Apeloig collection and Gitai collection). To the untrained eye, a PSD file and a TIFF file differ only in their extensions. Examining the PSD files was an opportunity to learn, collectively, to adopt a more informed stance: there are potentially as many differences between these files as there are between a charcoal sketch and an oil on canvas.
An example? Merging the different layers present in a PSD file, which is essential to produce a version in a final raster image format, is not as trivial as it seems. One of the files in the Gitai collection had an alpha (transparent) layer used, strangely enough, to crop the original image, but this layer was disabled(2). A direct export would therefore not have taken the cropping into account. So which should guide the export: the presence of a layer that crops the image, or the fact that this layer is not activated in the file received?
Figure 4. Photoshop document with a transparent layer (Gitai collection)
This uncertainty as to the final rendering that the creator would have wished for leads most heritage institutions to favour collecting content frozen in its final state and in a dissemination format(3), some even specifying, like the Library of Congress, that it should be unlayered.
Although our knowledge and skills in this area were still developing, a choice had to be made. After long and fascinating discussions on the richness of the information contained in the PSD format and the difficulty of finding a format and a method capable of capturing all the information present in the original, the floor was given to the collection managers.
The collection managers responsible for the Apeloig collection wanted to allow their researchers to explore the traces of the creative process (layers, internal metadata, modification history). The ability to print the image at its original size was considered important only for the file intended to produce the final poster, not for the sketches. PDF was therefore initially considered as the target format, both for consistency, as it was the most represented format in the collection, and because it is the most suitable for posters in a printable version. However, an additional requirement from the collection manager settled the matter: the collection had to be put online quickly, and therefore had to use the existing dissemination mechanisms, and browsing between the various sketches, grouped by packages, inside Gallica, the BnF’s digital library, had to remain easy and efficient. The JFIF/JPEG format was therefore chosen.
Figure 5. A batch of images (Apeloig collection) viewed in Gallica
In contrast, for the documents in the Gitai collection, the digital photographs were themselves a record of the film’s creative process; keeping track of their production was secondary. Our collection managers would have preferred a version in a final release format. Furthermore, the naming of the PSD files suggested that their source was a JFIF file that had been altered. The JFIF/JPEG format was therefore chosen.
As for the images in the Michèle Laurent collection, insofar as they had been digitised and had no usable production history, their interest did not extend beyond the image data of the single layer contained in the file. Transformation to TIFF, the majority format in this collection, was therefore the natural choice.
In all cases, the irreversible nature of the transformation and the certainty of a loss of information, whether or not it could be measured, led BnF to keep the original file in the same information package.
In the absence of a preservation strategy that would allow us to find in the target file all the richness of the source file, in a format that meets our requirements (open, compact, stable, widespread, etc.), we preferred to seek a compromise. The aim was to reconcile a target format accepted by BnF, a result that would reflect the producer’s intention and a method that could be applied consistently and homogeneously to the other files in the collection. This compromise implied accepting the loss of information resulting from the transformation, which was all the easier to accept as the original file was kept in case a better transformation method came along in the future.
Thus, the experts’ concern, quite legitimate in itself, to find a target format controlled by BnF that could capture all the information contained in the PSD was usefully narrowed, through the intervention of the collection managers, to what was really relevant to the business, which shows how fundamental the role of the collection manager is. Even without any technical expertise, a collection manager is still able to express a preservation intention.
Implementation and control
To minimise drift, and given the small number of files to be processed, the transformations were carried out manually by an imaging expert using the proprietary tool that created the format, Photoshop, in preference to a “foreign” tool that is not designed to handle the .psd format. As BnF is not able to maintain every version of the software, the version used for processing differs from the one used by the artists to create their files. The experts considered that converting a .psd file with a version of Photoshop more recent than the original one was the less risky option. The version used was Photoshop 21.1.0.
The expert’s intervention ensured that the settings and handling of the software were carried out in the best possible conditions and with the best practices.
Once the transformation had been carried out, an initial technical check ensured that there was no chromatic drift and that the target image was representative. Then the collection managers were able to visually examine the images and validate the new files to ensure that they were suitable for the deposit.
The information packages were then created by integrating the two versions (original version and preservation version), knowing that only the transformed versions would be directly accessible and viewable by users. Several comments, in the form of PREMIS events, were also added to the file processing history to keep track of these transformations and to inform the user.
Figure 6. PREMIS event documenting the transformation
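As an illustration only, here is a minimal Python sketch of how an event of this kind could be serialized with the PREMIS 3 XML vocabulary. It is not the event actually recorded by BnF. The identifiers, the date and the event detail wording are hypothetical, and the choice of the values "migration", "source", "outcome" and "executing program" is an assumption based on the Library of Congress preservation vocabularies. Only the Photoshop version comes from the text above.

```python
import xml.etree.ElementTree as ET

PREMIS_NS = "http://www.loc.gov/premis/v3"
ET.register_namespace("premis", PREMIS_NS)


def q(tag: str) -> str:
    """Qualify a tag name with the PREMIS namespace."""
    return f"{{{PREMIS_NS}}}{tag}"


def add_link(parent, kind: str, value: str, role: str) -> None:
    """Append a linkingAgentIdentifier or linkingObjectIdentifier block."""
    link = ET.SubElement(parent, q(f"linking{kind}Identifier"))
    ET.SubElement(link, q(f"linking{kind}IdentifierType")).text = "local"
    ET.SubElement(link, q(f"linking{kind}IdentifierValue")).text = value
    ET.SubElement(link, q(f"linking{kind}Role")).text = role


def build_transformation_event(event_id, date_time, detail,
                               agent, source_id, outcome_id):
    """Build a PREMIS event linking the original and the transformed file."""
    event = ET.Element(q("event"))
    ident = ET.SubElement(event, q("eventIdentifier"))
    ET.SubElement(ident, q("eventIdentifierType")).text = "local"
    ET.SubElement(ident, q("eventIdentifierValue")).text = event_id
    ET.SubElement(event, q("eventType")).text = "migration"
    ET.SubElement(event, q("eventDateTime")).text = date_time
    info = ET.SubElement(event, q("eventDetailInformation"))
    ET.SubElement(info, q("eventDetail")).text = detail
    add_link(event, "Agent", agent, "executing program")
    add_link(event, "Object", source_id, "source")
    add_link(event, "Object", outcome_id, "outcome")
    return event


# All values below are hypothetical, except the software version.
event = build_transformation_event(
    event_id="event-0001",
    date_time="2021-01-01T00:00:00Z",
    detail="Manual transformation of a PSD file to JFIF/JPEG; "
           "original file kept in the same information package.",
    agent="Adobe Photoshop 21.1.0",
    source_id="original.psd",
    outcome_id="preservation.jpg",
)
xml_text = ET.tostring(event, encoding="unicode")
```

Recording both the source and the outcome object in the same event is what lets a later reader trace the viewable file back to the original kept alongside it.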
Conclusion
This concrete case of processing a particular proprietary format for which the BnF did not consider it appropriate to invest in the long term is rich in lessons for the processing of digital information. It is clear that the satisfactory resolution of the problem requires close collaboration and in-depth dialogue between the collection managers, who contextualize the collection and explain the preservation intention, and the preservation experts, who provide tools and objective assessments of both formats and transformations.
This dialogue is largely indebted to Trevor Owens for the notion of ‘preservation intent’, which he details in his book The Theory and Craft of Digital Preservation(4):
“A statement of preservation intent states precisely why content has been collected and what characteristics of that content need to be addressed in order for the content to be used for the purpose for which it was collected.”(5)
It is because this preservation intention is made explicit that the loss of information through transformation becomes acceptable, all the more so because it opens the way to a long-term preservation and access trajectory for the transformed object.
The example described in this article is only the first step in a process that is being formalized so that similar cases can be treated systematically; we will have the opportunity to develop this in a future communication.
Finally, we feel it is important to emphasise three key points:
- preservation decisions must be taken and endorsed by the business side, that is, the collection managers;
- the role of the experts is to assist the collection managers, to inform them and to present them with the options best suited to their preservation intentions;
- even without specific technical knowledge, a collection manager is able to express and assert his or her preservation intention.
(1) Transformation is, according to OAIS, a “Digital Migration in which the Content Information or Preservation Description Information (PDI) of an Archival Information Package (AIP) is modified”. [Reference Model for an Open Archival Information System (OAIS). Version 2 (June 2012), CCSDS 650.0-M-2 (E)].
(2) Non-activated layers are not displayed in the main preview of the software and will not be taken into account in the printing or merging of layers.
(3) That is, in PDF rather than PSD, or rather than DOCX for textual content, although the fixed and unalterable character of PDF is relative.
(4) See:
– Oh, you wanted us to preserve that?!, Colin Webb, David Pearson, Paul Koerbin, D-Lib Magazine, 2013. https://www.dlib.org/dlib/january13/webb/01webb.html
– Chapter 5 of The Theory and Craft of Digital Preservation, Trevor Owens, Baltimore: Johns Hopkins University Press, 2018.
(5) Ibid., p. 82.
Barbara Sierman
November 8, 2021 @ 9:44 AM Europe/Berlin
A small remark: the concept of the “preservation intent” originated from our colleagues in Australia: Colin Webb, David Pearson, Paul Koerbin. They published it in their D-Lib article in 2013. ‘Oh, you wanted us to preserve that?!’
Trevor Owens used several sources in his book without references to the original source, which could lead to wrong assumptions.