Dependency Discovery Tool (for office files) – Code published

Niklas is finishing up his time here at Archives New Zealand, and we are pleased to announce the first publication of code from his work on identifying office files with linked dependencies.

“The Dependency Discovery Tool searches through binary office files (.doc, .xls and .ppt) and tries to find any documents or files that are linked to the document. 

It is written in Java, using the Apache POI libraries (http://poi.apache.org).

This project was part of a summer scholarship from the School of Engineering and Computer Science at Victoria University, Wellington (http://ecs.vuw.ac.nz) in conjunction with Archives New Zealand (http://archives.govt.nz ). 

At the moment it requires Java 6 to build and run, but this should change soon so that it runs on Java 5 and later.”

The code can be found here: Dependency Discovery Tool Web Site

Features

  • Finds links in .xls, .doc and .ppt files.
  • Output to plain text, XML or CSV
  • Public API (http://officeddt.sourceforge.net/api)
  • Command-line interface

We are also hoping to incorporate this code into the National Library of New Zealand (NLNZ) Metadata Extraction Tool later this year.

3 Comments

  1. andy jackson
    February 7, 2012 @ 1:58 pm CET

    Ok, I’ve added my feedback to Trac, and also added a reference for this tool on the OPF Tool Registry: http://wiki.opf-labs.org/display/TR/Dependency+Discovery+Tool

    Thanks for the great work!

  2. ecochrane
    February 6, 2012 @ 8:12 pm CET

    I’m fairly sure Niklas wants feedback via Trac, yes. I’ll confirm with him post Waitangi Day. And thank you! (Confirmed: Trac it is.)

  3. andy jackson
    February 3, 2012 @ 7:47 am CET

    I’ve had a quick experiment with the tool, and have a few bits of feedback. They are probably best captured as a set of issues – I assume you want such feedback via Trac?
