Last Friday I ran a workshop at the BL to identify what I guess we might call the significant properties of ebooks. This is to inform requirements for ebook characterisation tools developed as part of SCAPE and also to help inform BL staff involved in ebook ingest projects. To this end I wasn't just interested in the theoretically interesting features that anyone can get excited about – and there are plenty of interesting things about ebooks – but rather in which properties of ebooks were really important for the BL's core business. (In DP speak, the significant properties of ebooks as defined by the designated community of the British Library!)
For this workshop we only invited Collections staff from non-technical backgrounds. It was suggested that we re-run it for other communities within the BL – collection content groups and developers perhaps. Certainly I think we would get different discussions with more technical folk who, for example, cared more about ebook representation information and internal structures of the ebook format.
I had planned two sessions and a fairly structured agenda, but only 4 non-DPT staff attended, so it made sense to let the agenda develop on its own. I still think the two group sessions could work with a bigger group, but the materials remain untested. The plan was as follows:
- Using a number of ebook devices explore the books and create a list of properties of interest.
- Prioritise that list into business requirements for the BL using our old friend the user story.
We managed the first part pretty well and participants were very interested in exploring a set of ebooks (a selection of fiction and non-fiction, including enhanced books with embedded audio) on a number of devices – an iPad, an iPad mini, a Kindle Fire, a Nook HD, a Kindle Keyboard, a Sony PRS-505 (clearly the Kindle's ancestor!) and an Elonex Ebook. The latter two are early examples of ebook readers and were, sadly, largely unusable. The Elonex was clearly underpowered and the Sony PRS-505's battery could no longer hold its charge. I included these in the selection deliberately to get people thinking about device preservation. Keeping the Sony PRS-505 going would be costly, and things like iPads do not have user-serviceable batteries (though, unlike the Sony, at least they remain operational while plugged into a USB charger!). I also provided a couple of physical copies of the same books for comparison.
What struck me during this session was how much the device featured in the discussions. When I raised this with the group they said this was mainly because they were unfamiliar with the operation of the devices and this was a barrier to the content. The people at the workshop stated quite categorically that it is important to separate content from reading device in BL systems. To do this we really need to be storing content in open formats that are not tied to devices or accounts, but I think we knew that already.
I then gave a brief presentation on the properties of ebooks that we in DPT have considered important, and used this as the start of a brainstorming session on significant properties. We came up with this list, in no particular order:
- Interactivity – where book meets computer game – e.g. interactive fiction ebooks, apps that are books, etc.
- Searching – whilst reading instead of consulting indexes, etc. and also full-text search via the catalogue
- Versions – who published this edition, when, etc. Ebooks can be remotely updated/removed.
- Authenticity – ebooks are easy to change and re-publish. There are plenty of cheap editions on the book stores from unknown "publishers".
- Accessibility – text to speech support, manipulation of font size, colours, etc.
- Skills – the skills required to make use of an ebook – probably lacking for some of our researchers at present
- Social Context – reviews, ratings, recommendations, tweets, comments, tags, annotations, etc. that are associated with a book (typically part of a content seller's system).
- Language of the content and on-the-fly translation support
- Linked Resources – references, bibliography, extra content (such as PDFs of knitting patterns, print at home origami, additional appendices)
- Embedded Resources – images, audio, video, fonts, software, etc.
- Layout – where the words, images, etc. appear on the page.
- Structure – where chapters start and finish, what is a heading, what is the Table of Contents, etc.
- Citation – how do you cite a passage when there are no page numbers?
- Metadata – embedded metadata at many levels (author by chapter for example) and the ability to embed further BL enhanced metadata
- Devices – the reader hardware and software
- Digital Rights & Restrictions – the BL has a policy on what it will and will not accept so we could quickly skim this one, but it is important to know if a document can be printed, cut and pasted, accessed on only one device at a time, etc. All of these restrictions seriously hamper preservation activities (imagine doing conservation on the cover of a book that you could not touch!)
- Usage Statistics and Recording – it is reasonable to assume the ebook readers record and perhaps report statistics. The latest incarnation of the Kindle reading software for example will tell you how fast you read and how long you have to the end of the chapter. Handy if your train is near the station I guess.
- Content – related to searching, but preserving the words themselves.
That is a long list! We expanded on a few of them:
Devices
While it was felt that the devices were interesting and had cultural and historic value – something the BL may be interested in as part of the history of the book – keeping these devices was not considered a priority. Indeed, as previously mentioned, it was felt the devices got in the way of the content. I wondered what would happen if I'd provided Calibre on a laptop instead of or as well as the devices.
Layout
I showed a slide with a scan of E. E. Cummings's The Cubist Break-Up (not available as an ebook) and we discussed The Waste Land and poetry in general. It is easy to alter the formatting of a poem by changing the font size, typeface, line-spacing and screen size. One of the participants noted how a stanza that should've been on a single page was split across two. Another noted how the text of The Hobbit was not flowing correctly around an image no matter what settings were used. At the same time participants seemed happy they could alter the text as they saw fit. This suggests a need to be able to preserve text layout – perhaps in the form of hints – but have the ability to turn this off when necessary.
Linked Resources
This one provoked a lot of discussion. My colleague Will Palmer noted that we do not go to great lengths to ensure every book referenced in every bibliography is also available to readers at the BL. Given that, why should we want to preserve links found in ebooks? Some argued that it was a question of expectation. A reader of a physical book expects to have to do some leg work to find references, but an ebook user expects that any links will work or at least resolve to something useful. (This raises a bigger issue of use of ebooks in reading rooms on restricted networks, but that isn't a preservation problem.) Further, some content is probably more important to obtain than others. Bibliographic links can perhaps be left, but what about additional content that was omitted from the book, and thus the ebook, and is only available as a download? The BL separates CD-ROMs from books and holds these separately. Should we do the same for downloadable content? Do we need to define an ebook as an aggregation and preserve that rather than just the book itself? How does all this hook into the Web Archive?
Having created our list we spent the last half hour or so identifying those we felt mattered most to the BL and came up with this subset:
- Searching
- Linked Resources
- Digital Rights and Restrictions
- Structure – internal dictionaries, table of contents, etc.
- Metadata
- Layout and layout hints
- Versions
- Authenticity
- Content
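Several of the properties on this shortlist (metadata, structure, embedded resources, rights restrictions) leave visible traces inside an EPUB file, which is just a ZIP container with an XML package document. As a rough illustration of what a characterisation tool might report, here is a minimal Python sketch. It is not the SCAPE tooling: the function name `characterise_epub` and the keys of the returned dictionary are hypothetical, and it assumes the file is an EPUB whose container itself can be opened.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the EPUB container and package documents.
NS = {
    "c": "urn:oasis:names:tc:opendocument:xmlns:container",
    "opf": "http://www.idpf.org/2007/opf",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def characterise_epub(source):
    """Report a few easily detected properties of an EPUB (hypothetical helper).

    `source` is a file path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as z:
        names = set(z.namelist())
        # container.xml points at the package (OPF) document.
        container = ET.fromstring(z.read("META-INF/container.xml"))
        opf_path = container.find(".//c:rootfile", NS).get("full-path")
        opf = ET.fromstring(z.read(opf_path))

    def texts(tag):
        # Collect the text of every Dublin Core element with this name.
        return [e.text for e in opf.findall(f".//dc:{tag}", NS) if e.text]

    items = opf.findall(".//opf:manifest/opf:item", NS)
    media_types = sorted({i.get("media-type", "") for i in items})
    # EPUB 3 marks its navigation document with properties="nav";
    # EPUB 2 uses an NCX file instead.
    has_nav = any("nav" in i.get("properties", "").split() for i in items)
    has_ncx = "application/x-dtbncx+xml" in media_types

    return {
        "titles": texts("title"),
        "languages": texts("language"),
        "identifiers": texts("identifier"),
        "embedded_media_types": media_types,
        "has_toc": has_nav or has_ncx,
        # encryption.xml may indicate DRM, but it is also used for
        # font obfuscation, so treat this as a flag to investigate.
        "encrypted_resources": "META-INF/encryption.xml" in names,
        "rights_file": "META-INF/rights.xml" in names,
    }
```

Calling `characterise_epub("some-book.epub")` (the file name is a placeholder) returns a dictionary that could be compared against the shortlist. Properties such as searching, authenticity and layout fidelity cannot be read off the container in this way and would need a different approach.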
I would have liked to explore these priorities further, including working them into user stories, but we ran out of time. Hopefully the workshop will be run again and we can find out more. If you want to repeat the whole thing at your institution and add to the debate, that would make me very happy!
Slides and other materials are all on GitHub.
andy jackson
September 5, 2013 @ 12:26 PM Europe/Berlin
I really like this approach – it reminds me of the NLA's Statements of Preservation Intent work (more here). After all, if significance is in the eye of the stakeholder, then we must talk to our stakeholders!