Dependency Discovery Tool (for office files) – Code published

Niklas is finishing up his time here at Archives New Zealand, and we are pleased to announce the first publication of code to come out of his work on identifying office files with linked dependencies.

“The Dependency Discovery Tool searches through binary office files (.doc, .xls and .ppt) and tries to find any documents or files that are linked to the document. 

It is written in Java, using the Apache POI libraries (http://poi.apache.org).

This project was part of a summer scholarship from the School of Engineering and Computer Science at Victoria University, Wellington (http://ecs.vuw.ac.nz), in conjunction with Archives New Zealand (http://archives.govt.nz).

At the moment it requires Java 6 to build and run, but this should change soon so that it will run on Java 5 upwards.”
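To make the idea of a "linked dependency" more concrete, here is a deliberately naive sketch in Java. It is not how the Dependency Discovery Tool works: the tool parses the file structure with Apache POI, whereas this hypothetical `NaiveLinkScan` class simply decodes the raw bytes of a file and looks for strings shaped like URLs, UNC network paths or drive-letter paths. Binary office formats store many strings either as UTF-16LE or as 8-bit text, so the sketch decodes the bytes both ways. The regular expression is an illustrative assumption and will produce both false positives and misses.

```java
import java.nio.charset.StandardCharsets;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical, naive illustration: scans raw bytes for link-like strings.
 * This is NOT the Dependency Discovery Tool's method (it uses Apache POI to
 * parse the file structure); it only shows what link strings can look like.
 */
public class NaiveLinkScan {

    // Assumed patterns: URLs, UNC paths (\\server\share\...) and C:\...\file.ext paths.
    private static final Pattern LINK = Pattern.compile(
            "(?:https?|file)://[\\w\\-./?=&%#:~]+"
            + "|\\\\\\\\[\\w\\-.$]+(?:\\\\[\\w\\-. $]+)+"
            + "|[A-Za-z]:\\\\(?:[\\w\\-. ]+\\\\)*[\\w\\-. ]+\\.\\w{2,4}");

    /** Returns the distinct link-like strings found in the data, sorted. */
    public static Set<String> findCandidateLinks(byte[] data) {
        Set<String> found = new TreeSet<>();
        // UTF-16LE text may start at an even or odd byte offset, so try both alignments.
        scan(new String(data, StandardCharsets.UTF_16LE), found);
        if (data.length > 1) {
            scan(new String(data, 1, data.length - 1, StandardCharsets.UTF_16LE), found);
        }
        // 8-bit strings: ISO-8859-1 maps every byte to exactly one char.
        scan(new String(data, StandardCharsets.ISO_8859_1), found);
        return found;
    }

    private static void scan(String text, Set<String> found) {
        Matcher m = LINK.matcher(text);
        while (m.find()) {
            found.add(m.group().trim());
        }
    }
}
```

Passing the bytes of a real file (for example from `java.nio.file.Files.readAllBytes`) to `findCandidateLinks` returns whatever link-like strings happen to be present; deciding which of them are genuine links, rather than ordinary text, is exactly why a structure-aware library such as POI is needed.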

The code can be found here: Dependency Discovery Tool Web Site

Features

  • Finds links in .xls, .doc and .ppt files.
  • Output to plain text, XML or CSV
  • Public API (http://officeddt.sourceforge.net/api)
  • Command-line interface
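CSV output has to cope with Windows paths, which can contain commas. The following hypothetical `LinkCsv` sketch writes one row per (document, linked target) pair and quotes fields as described in RFC 4180. The column names and layout are assumptions, not the tool's actual CSV format.

```java
import java.util.List;

/**
 * Hypothetical sketch: formats (document, linked target) pairs as CSV.
 * Column names and layout are assumptions, not the tool's real output format.
 */
public class LinkCsv {

    /** Quotes a field if it contains a comma, double quote or line break (RFC 4180). */
    public static String escape(String field) {
        if (field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r")) {
            // Embedded double quotes are escaped by doubling them.
            return "\"" + field.replace("\"", "\"\"") + "\"";
        }
        return field;
    }

    /** Builds a CSV document: a header row, then one CRLF-terminated row per target. */
    public static String toCsv(String document, List<String> targets) {
        StringBuilder sb = new StringBuilder("document,linked_target\r\n");
        for (String target : targets) {
            sb.append(escape(document)).append(',')
              .append(escape(target)).append("\r\n");
        }
        return sb.toString();
    }
}
```

For example, `LinkCsv.toCsv("report.doc", Arrays.asList("C:\\Data\\sales, 2011.xls"))` returns the header row followed by `report.doc,"C:\Data\sales, 2011.xls"`, with the comma-containing path safely quoted.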

We also hope to incorporate this code into the National Library of New Zealand (NLNZ) Metadata Extraction Tool later this year.

By Euan Cochrane, posted in Euan Cochrane's Blog

1st Feb 2012 8:22 PM
