Blogs Archive - Page 43 of 89 - Open Preservation Foundation

SCAPE QA Tool: Technologies behind Pagelyzer – II Web Page Segmentation

Web pages are getting more complex than ever. Thus, identifying different elements from web pages, such as main content, menus, user comments, advertising among others,...

SCAPE QA Tool: Technologies behind Pagelyzer – I Support Vector Machine

The Web is constantly evolving. Web content such as text and images is updated frequently. One of the major problems encountered by archiving systems...

EDRMS across New Zealand’s Government – Challenges with even the most managed of records management systems!

Sarah McKenzie, a student completing a summer scholarship project with Victoria University and Archives New Zealand, blogs for the OPF on the work she is currently doing. Delving into the world of Electronic Document and Records Management Systems and the challenges of technical metadata extraction, she describes how the challenge is as much about understanding the range of EDRMS in use across the government horizon as it is about connecting the tools in the digital preservation toolkit to that range of systems. Sarah talks about how she went about that research, the technical work completed so far, and her goals in the remaining few weeks of the project.

Developing an Audio QA workflow using Hadoop: Part II

First things first. The GitHub repository with the Audio QA workflows is here: https://github.com/statsbiblioteket/scape-audio-qa. And version 1 is working. Version is really all wrong here....

Why can’t we have digital preservation tools that just work?

One of my first blog posts here covered an evaluation of a number of format identification tools. One of the more surprising results of that work...

SCAPE survey on preservation monitoring. Participate now!

Anyone who wants to preserve digital content must be aware of events that might pose a relevant risk. In SCAPE we are developing tools that will...