Beyond code: Versioning data with Git and Mercurial

  • Charlie Collett, California Digital Library, charlie.collett@ucop.edu
  • Martin Haye, California Digital Library, martin.haye@ucop.edu

code4lib 2012, Tuesday 7 February 2012, 10:20-10:40

Within a relatively short time since their introduction, distributed version control systems (DVCS) like Git and Mercurial have enjoyed widespread adoption for versioning code. It didn’t take long for the library development community to start discussing the potential for using DVCS within our applications and repositories to version data. After all, many of the features that have made some of these systems popular in the open source community to version code (e.g. lightweight, file-based, compressed, reliable) also make them compelling options for versioning data. And why write an entire versioning system from scratch if a DVCS solution can be a drop-in solution? At the California Digital Library (CDL) we’ve started using Git and Mercurial in some of our applications to version data. This has proven effective in some situations and unworkable in others. This presentation will be a practical case study of CDL’s experiences with using DVCS to version data. We will explain how we’re incorporating Git and Mercurial in our applications, describe our successes and failures and consider the issues involved in repurposing these systems for data versioning.

Slides attached below.