Hey, Dilbert. Where’s My Data?!

  • Thomas Barker, University of Pennsylvania, tbarker@pobox.upenn.edu

Code4Lib 2011, Tuesday 8 February, 11:30 - 11:50

Libraries are notorious for maintaining data in massively disparate systems such as databases, flat files, XML and web services. The data is rich and valuable for assessment, but extracting value from multiple systems is complex and time-consuming. Yes, there are open source and commercial solutions available, but libraries have unique requirements that can be difficult to integrate into these products. Commercial options also tend to be overly complex, or they reserve the cool features for an expensive enterprise edition.

With funding from the Institute of Museum and Library Services, UPenn is developing MetriDoc to address data integration headaches within the library and to support reporting requirements from management. MetriDoc’s mission is to provide an open source API and tool set where users can specify dataflows and use library-based services to solve integration problems while MetriDoc worries about scalability and performance. MetriDoc accomplishes this with no complex XML configuration or scary SOA middleware, using a simple DSL where possible instead. Eventually the project will also include dashboards to assist with complex job management and dataflow monitoring.

The first half of the presentation will briefly discuss MetriDoc’s architecture, while the remainder will present code samples to illustrate the problems it can solve. Information on how to download MetriDoc or contribute to it will be provided as well.