Web pages are more complex than ever. Thus, identifying the different elements of web pages, such as main content, menus, user comments, advertising, among others,...
Blogs
The Web is constantly evolving. Web content such as text and images is updated frequently. One of the major problems encountered by archiving systems...
Sarah McKenzie, a student completing a summer scholarship project with Victoria University and Archives New Zealand, blogs for the OPF on the work she is currently doing. Delving into the world of Electronic Document and Records Management Systems and the challenges of technical metadata extraction, she describes how the challenge is as much about understanding the range of EDRMS in use across the government horizon as it is about connecting the tools in the digital preservation toolkit to that range of systems. Sarah talks about how she went about that research, the technical work completed so far, and her goals in the remaining few weeks of the project.
First things first. The GitHub repository with the Audio QA workflows is here: https://github.com/statsbiblioteket/scape-audio-qa. And version 1 is working. Version is really all wrong here....
One of my first blogs here covered an evaluation of a number of format identification tools. One of the more surprising results of that work...
Anyone wishing to preserve digital content must be aware of events that might constitute a relevant risk. In SCAPE we are developing tools that will...
