A Sustainable Future for FITS
As Paul mentioned here, FITS is a classic case of a great digital preservation tool that many of us use and benefit from, but that wasn't set up to accept community code contributions. Different versions of FITS were proliferating instead of dovetailing into a better product, so we decided to take a look at the situation and see what we could do to change it.
First we looked at the current FITS codebase and all of the forks out there, with the aim of merging all existing stable features and patches. While merging appears to be a rather trivial task, ensuring that existing functionality is not broken afterwards isn't. This is especially tricky when there aren't (m)any unit tests. Writing unit tests after the fact usually involves refactoring code for testability, and, as any seasoned developer will likely agree, refactoring a large code base without unit tests usually means one thing: bugs…
So how do you verify, with a relatively high level of confidence, that the code base still works as expected following the merge? Blackbox testing and git-bisect to the rescue!
To work around this in the limited time available during the FITS Blitz, we decided to use blackbox testing. We created a FITS XML comparator, which compares the output files produced by different FITS versions, and an accompanying script that combines this comparison tool with git-bisect. For those of you who don't know git-bisect, it is a tool that pinpoints the specific commit in a git repository that introduced a problem. It does this with a simple binary search over the commit history, using a test (in our case the FITS XML comparator) to decide whether each commit it checks out is good or bad.
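The comparator and bisect script themselves are not reproduced in this post, so here is a rough, hypothetical Python sketch of how the two pieces can fit together; it is not the tool used during the Blitz. It assumes that attributes named `timestamp`, `version` and `toolversion` change between runs without signalling a regression, and that the order of sibling elements is insignificant; both are assumptions about what counts as a meaningful difference. `bisect_exit_code` maps the comparison onto the exit-code convention of `git bisect run` (0 good, 1 bad, 125 skip), and `first_bad_commit` is a simplified, linear-history version of the binary search that git bisect performs.

```python
import xml.etree.ElementTree as ET

# Hypothetical: attributes assumed to change from run to run without signalling a regression.
VOLATILE_ATTRIBUTES = {"timestamp", "version", "toolversion"}

# Exit codes understood by `git bisect run`.
GOOD, BAD, SKIP = 0, 1, 125


def normalise(element):
    """Reduce an element to a comparable tuple, ignoring volatile attributes and sibling order."""
    attributes = tuple(sorted(
        (name, value) for name, value in element.attrib.items()
        if name not in VOLATILE_ATTRIBUTES
    ))
    text = (element.text or "").strip()
    children = tuple(sorted(normalise(child) for child in element))
    return (element.tag, attributes, text, children)


def bisect_exit_code(reference_path, candidate_path):
    """Compare two FITS XML files and map the result onto git bisect's exit-code convention."""
    try:
        candidate = ET.parse(candidate_path).getroot()
    except OSError:
        return SKIP  # no output, e.g. this commit did not build: let git bisect skip it
    except ET.ParseError:
        return BAD   # FITS ran but produced malformed XML
    reference = ET.parse(reference_path).getroot()
    return GOOD if normalise(reference) == normalise(candidate) else BAD


def first_bad_commit(commits, is_good):
    """Binary search over a linear history ordered oldest to newest.

    Assumes commits[0] is known good, commits[-1] is known bad, and that once
    a commit is bad every later commit is bad too (the same assumption git bisect makes).
    """
    low, high = 0, len(commits) - 1  # invariant: commits[low] is good, commits[high] is bad
    while high - low > 1:
        middle = (low + high) // 2
        if is_good(commits[middle]):
            low = middle
        else:
            high = middle
    return commits[high]
```

In practice a wrapper script (hypothetical, not shown) would rebuild FITS at each commit, run it on a sample file to regenerate the candidate output, and exit with the value of `bisect_exit_code`; after `git bisect start <bad> <good>`, running `git bisect run` with that wrapper lets git drive the search. Real git bisect also copes with merge-heavy, non-linear history, which this linear sketch does not.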
We were able to go through the different branches, merge the ones that didn't break functionality and leave aside those that still needed more work. As a result of all this merging during the FITS Blitz, the next version of FITS will include:
- A few minor performance optimisations
- The possibility to run FITS in a nailgun server
- DROID updated to version 6
- Apache Tika enhancements
- Numerous bug fixes
- Better error reporting
And the best thing is: these are all community improvements! Unfortunately, not all of the contributors have dared to hit the Pull Request button on GitHub, and that is something we have to improve as a community.
In any case, having this simple way of validating that nothing major is broken has another advantage: we can now set up a continuous integration infrastructure that will give FITS maintainers further insight into future patches before they merge them. Note that this doesn't mean that no unit tests should be written. Quite the opposite: creating a unit test suite and refactoring the core of FITS where necessary is the next logical step.
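As an illustration of how the blackbox approach could run in continuous integration, here is a minimal Python sketch that compares a corpus of reference FITS outputs with the outputs of a new build. It assumes both sets are stored as same-named `.xml` files in two directories; the directory layout, the `find_regressions` and `load_outputs` helpers and the ignored attribute names are all hypothetical. Instead of a custom tree walk it uses Python's built-in C14N 2.0 canonicalization (`xml.etree.ElementTree.canonicalize`, Python 3.8+), which normalises attribute order and whitespace but, unlike an order-insensitive comparison, treats reordered elements as a difference.

```python
from pathlib import Path
import xml.etree.ElementTree as ET

# Hypothetical: attribute names assumed to vary between runs without meaning a regression.
VOLATILE_ATTRIBUTES = {"timestamp", "version", "toolversion"}


def canonical(xml_text):
    """Serialise XML in canonical form (C14N 2.0), trimming text and dropping volatile attributes."""
    return ET.canonicalize(xml_data=xml_text, strip_text=True, exclude_attrs=VOLATILE_ATTRIBUTES)


def load_outputs(directory):
    """Map each *.xml file name in a directory to its text."""
    return {p.name: p.read_text(encoding="utf-8") for p in sorted(Path(directory).glob("*.xml"))}


def find_regressions(reference, candidate):
    """Return (file name, reason) pairs for outputs that are missing, malformed or different."""
    problems = []
    for name, reference_xml in sorted(reference.items()):
        if name not in candidate:
            problems.append((name, "missing"))
            continue
        try:
            if canonical(reference_xml) != canonical(candidate[name]):
                problems.append((name, "differs"))
        except ET.ParseError:
            problems.append((name, "malformed"))
    return problems
```

A CI job could process the test corpus with the new build, call `find_regressions(load_outputs("reference"), load_outputs("candidate"))` (directory names hypothetical), and fail the build whenever the returned list is not empty, giving maintainers a quick signal before a patch is merged.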
From this foundation, made possible by a Jisc-funded SPRUCE award, we will now work in partnership with interested members of the community to develop and maintain FITS in a way that we hope will give its users much greater confidence in its reliability and in its ability to accept code contributions. To that end we're in the process of establishing a Steering Group that will meet regularly to review the status of FITS, manage a more sustainable development process, develop and champion community contributions to FITS, and create a development roadmap for the toolset. The Group will be composed of a variety of experienced FITS developers and users, and we'll aim to be as inclusive as possible, particularly within the developer community.
So how will all this work in practice? When we've added the finishing touches to this phase of the work, Carl will be back to blog about the new development process and how you can get involved to make FITS better. We are also setting up a new website for FITS to centralize (and improve!) the FITS documentation.
Our ultimate aim is to make FITS a community-maintained tool that is kept up to date with a reliable build at everyone's fingertips, and hopefully demonstrate a better way to sustain community-created preservation tools.
Petar Petrov, Carl Wilson, Andrea Goethals, Spencer McEwen and Paul Wheatley
By paul, posted in paul's Blog