

You know, when people get together and talk about stuff.

The Case for Code4Lib 501(c)(3)

Roy Tennant

Libraries face tremendous challenges to create effective and responsive institutions in a Googlezon world. But the type of leadership we need so far hasn’t materialized. If it isn’t going to come from the administrators, let it come from the coders. In this talk I will build a case for establishing Code4Lib as a nonprofit library software cooperative. A financial structure would allow us to put real resources—both financial and human—into bringing libraries into the 21st century.

1,000 Lines of Code, and other topics from OCLC Research

Thomas Hickey

You can do a lot in 1,000 lines of code. This talk presents a retrieval system, currently about 1,000 lines long, that blends a Dewey browser with an interactive ‘live’ search. As the user types, the system searches for records, classifies them into DDC categories, and displays them using standard protocols. The data structures that support this are computed in parallel by a Python implementation of map-reduce.
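The abstract does not include the code, so here is a minimal sketch of how a map-reduce pass could precompute the data a live DDC lookup needs. It is our own illustration, not OCLC's implementation: records are assumed to be dicts with hypothetical `title` and `ddc` keys, the map step emits each title word with the record's DDC hundreds class, and the reduce step counts classes per word. Threads stand in for real parallel workers.

```python
from collections import Counter, defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, List, Tuple

Record = Dict[str, str]  # hypothetical shape: {"title": ..., "ddc": ...}


def map_record(record: Record) -> List[Tuple[str, str]]:
    """Map step: emit (term, DDC hundreds class) for each title word."""
    ddc_class = record["ddc"][0] + "00"  # e.g. "025.04" -> "000"
    return [(word.lower(), ddc_class) for word in record["title"].split()]


def reduce_term(classes: Iterable[str]) -> Counter:
    """Reduce step: count how often a term occurs in each DDC class."""
    return Counter(classes)


def build_index(records: List[Record], workers: int = 4) -> Dict[str, Counter]:
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Map phase, spread across the worker pool.
        mapped = pool.map(map_record, records)
        # Shuffle: group every emitted class under its term.
        groups: Dict[str, List[str]] = defaultdict(list)
        for pairs in mapped:
            for term, ddc_class in pairs:
                groups[term].append(ddc_class)
        # Reduce phase, one task per term.
        terms = list(groups)
        reduced = pool.map(reduce_term, (groups[t] for t in terms))
        return dict(zip(terms, reduced))


def classify_prefix(index: Dict[str, Counter], prefix: str) -> Counter:
    """'Live' lookup: DDC class counts for every term starting with prefix."""
    totals: Counter = Counter()
    for term, counts in index.items():
        if term.startswith(prefix.lower()):
            totals.update(counts)
    return totals
```

For example, after indexing "Python Programming" (005.133) and "Python Snakes" (597.96), typing `py` yields counts for both the 000 and 500 classes. For CPU-bound work, `ProcessPoolExecutor` could replace the thread pool, provided the functions stay at module level so they can be pickled.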

I will also discuss the reasoning behind OCLC’s recent shift to version 2.0 of the Apache License for releasing our open source software.

--Thomas Hickey, Chief Scientist, OCLC

Chasing Babel

Devon Smith

“Two Paths to Interoperable Metadata” [1] proposed a model for metadata
translation that offers substantial gains over models based on the
current community standard, which usually involves an XSLT
implementation. In this presentation, I will discuss implementation
issues with the Semantic Equivalence Expression Language (Seel), our
alternative to XSLT [2]. I will show how Seel eases the complex task of
change management because it represents a more faithful computational
model of the metadata translation problem.


Devon Smith
Senior Software Engineer
Office of Research
OCLC Online Computer Library Center, Inc

Library Text Mining

Rob Sanderson

Using the TeraGrid [1] and the SRB DataGrid [2], we have sufficient
computational and storage facilities to run processing tasks that would
normally be prohibitively expensive. By integrating text and data mining
tools [3][4] within the Cheshire3 [5] information architecture, we can
parse the natural language present in 20 million MARC records (the
University of California’s MELVYL collection) and extract information to
provide to search/retrieve applications. In this talk, we’ll discuss
the results of applying new techniques to ‘old’ data.
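To make "extract information from the natural language in MARC records" concrete, here is a deliberately naive sketch. It is not Cheshire3's pipeline, which the abstract says uses dedicated text and data mining tools. It assumes a hypothetical record shape (MARC tag mapped to a list of field strings), reads only the free-text 5XX note fields, and reduces them to keyword counts that a search index could consume.

```python
import re
from collections import Counter
from typing import Dict, List

# Hypothetical record shape: MARC tag -> list of field values.
MarcRecord = Dict[str, List[str]]

# Tiny illustrative stopword list; a real system would use a proper one.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "on", "for", "to", "with", "is", "by"}


def free_text(record: MarcRecord) -> str:
    """Concatenate the natural-language note fields (tags 5XX)."""
    return " ".join(
        value for tag, values in record.items() if tag.startswith("5") for value in values
    )


def extract_terms(record: MarcRecord) -> Counter:
    """Naive extraction: lowercase word tokens minus stopwords."""
    tokens = re.findall(r"[a-z]+", free_text(record).lower())
    return Counter(t for t in tokens if t not in STOPWORDS)
```

Real natural-language processing (part-of-speech tagging, phrase chunking, entity recognition) would replace the regular-expression tokenizer; the point of the sketch is only where the free text lives in a MARC record and how the output feeds retrieval.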


Rob Sanderson

Standards, Reusability, and the Mating Habits of Learning Content

Robby Robson

Digital libraries are supposed to foster reuse of digital content, but it is hard to combine content from different sources. We are building prototype software that (1) converts different types of courseware to an XML interchange format based on OpenDocument and other specs/standards, (2) enables the content to be disaggregated, recombined, re-styled, and endowed with SCORM reporting behaviors, and (3) realizes instructional design through SCORM (or IMS) Simple Sequencing. I will demo and discuss the software and, given a longer slot, am happy to talk about the bigger picture of reusability and standards in educational digital libraries.
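Step (2), disaggregating and recombining content, can be illustrated with the standard library's ElementTree. This sketch does not use the project's actual interchange format: the `course` and `section` element names and the `id` attribute are hypothetical stand-ins for whatever the OpenDocument-based format defines.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def disaggregate(xml_text: str) -> Dict[str, ET.Element]:
    """Split a course document into its sections, keyed by section id."""
    root = ET.fromstring(xml_text)
    return {sec.get("id"): sec for sec in root.iter("section")}


def recombine(parts: Dict[str, ET.Element], order: List[str], title: str) -> str:
    """Build a new course from selected sections, in a new order."""
    course = ET.Element("course", title=title)
    for section_id in order:
        course.append(parts[section_id])
    return ET.tostring(course, encoding="unicode")
```

For example, splitting a two-section course and recombining it as `["s2", "s1"]` yields a new course whose sections appear in reverse order. Re-styling and SCORM behaviors would be further transformations on the recombined tree.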

Robby Robson

Anatomy of aDORe

Ryan Chute

The aDORe Archive is a write-once/read-many storage approach for Digital Objects and their constituent datastreams. First, XML-based representations of multiple Digital Objects are concatenated into a single, valid XML file called an XMLtape. Second, ARC files, as introduced by the Internet Archive, are used to hold the constituent datastreams of the Digital Objects. The software was developed by the LANL Digital Library Research & Prototyping Team and is available under the GNU LGPL.
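The XMLtape idea (many XML documents concatenated into one valid XML file) can be sketched in a few lines. This is a hypothetical illustration, not the LANL schema: the `tape` wrapper element and the identifiers are invented, and records are assumed to carry no XML declaration of their own (otherwise the tape would not be well-formed). The byte-offset index shows why a write-once file still allows fast random reads.

```python
from typing import Dict, List, Tuple

Index = Dict[str, Tuple[int, int]]  # identifier -> (byte offset, byte length)


def write_xmltape(records: List[Tuple[str, str]]) -> Tuple[bytes, Index]:
    """Concatenate (identifier, xml) records into one XML document.

    Each record must be a bare element (no <?xml ...?> declaration),
    so the wrapped result remains well-formed XML.
    """
    out = bytearray(b'<?xml version="1.0" encoding="UTF-8"?>\n<tape>\n')
    index: Index = {}
    for identifier, xml in records:
        body = xml.encode("utf-8")
        index[identifier] = (len(out), len(body))  # where this record starts
        out += body + b"\n"
    out += b"</tape>\n"
    return bytes(out), index


def read_record(tape: bytes, index: Index, identifier: str) -> str:
    """Read one record back by offset, without parsing the whole tape."""
    offset, length = index[identifier]
    return tape[offset:offset + length].decode("utf-8")
```

Because the tape is only ever appended to and then sealed, the offsets stay valid forever, which fits the write-once/read-many model described above.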

Ryan Chute
Los Alamos National Laboratory, Research Library

Quality Metrics

Aaron Krowne

This talk will discuss the core development activities of the “Quality
Metrics” project at Emory’s Woodruff Library. This project is being
conducted under an IMLS grant to research requirements for and build
a working prototype digital library search system.

What is new about this project is that it truly generalizes and
integrates explicit and latent quality indicators that allow
users to ascertain the fitness of digital library resources. Most
search engine components have only one indicator: content-query
similarity (“relevance”). Google has only two, adding PageRank to
relevance. Our system, QM-search, will support an unlimited number of
them, which the digital librarian can customize for the target
community and collections, and which can even be customized from user
to user or search to search.

Some basic examples of quality indicators that digital libraries might
be able to exploit would be activations (views online or check-outs in
circulation), selection (compilation in “bookmark” lists online or
additions to course reserves lists), extent of review (from a peer-
reviewed journal, conference, or not?), or citation-based metrics.
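One simple way to combine many indicators with per-community or per-user weights is a weighted average. The abstract does not say how QM-search combines them, so treat this as a hypothetical stand-in: the indicator names echo the examples above, and each value is assumed to be pre-normalized to the range [0, 1].

```python
from typing import Dict, List

# Hypothetical indicator vector, e.g. {"relevance": 0.9, "activations": 0.1}.
Indicators = Dict[str, float]


def quality_score(indicators: Indicators, weights: Dict[str, float]) -> float:
    """Weighted average of the indicators a weighting profile cares about.

    Indicators missing from a resource count as 0.0.
    """
    total = sum(weights.values())
    if total == 0:
        return 0.0
    return sum(w * indicators.get(name, 0.0) for name, w in weights.items()) / total


def rank(results: Dict[str, Indicators], weights: Dict[str, float]) -> List[str]:
    """Order result identifiers by descending quality score."""
    return sorted(results, key=lambda rid: quality_score(results[rid], weights), reverse=True)
```

With relevance as the only weight this behaves like a classic search engine; adding weights for activations or peer review can reorder the same results, which is the kind of per-user or per-search customization the abstract describes.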

The output of QM-search will be in a completely generalized XML format,
with the search results represented as a structure based on the
structure specified in the input “organization spec”. This XML output
can be transformed into presentation HTML resembling anything from a
“linear” Google-like search results list to an A9-like column display to
more exotic groupings and breakdowns.
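A grouped XML result structure of this kind is easy to produce with ElementTree. The "organization spec" is reduced here to a single hypothetical grouping field, and the `results`, `group`, and `result` element names are invented for the sketch; the real QM-search format may differ.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def results_to_xml(results: List[Dict[str, str]], group_by: str) -> str:
    """Group ranked results under one <group> per value of `group_by`.

    Input order is preserved: each group lists its results in rank order,
    and groups appear in the order of their best-ranked member.
    """
    root = ET.Element("results")
    groups: Dict[str, ET.Element] = {}
    for result in results:
        key = result.get(group_by, "unknown")
        if key not in groups:
            groups[key] = ET.SubElement(root, "group", name=key)
        ET.SubElement(groups[key], "result", id=result["id"])
    return ET.tostring(root, encoding="unicode")
```

An XSLT stylesheet or template could then render one group as a linear list or several groups as columns, keeping presentation separate from the ranked data.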

Requirements for QM-search are being gleaned from focus groups being
conducted at Emory (preliminary results will be shared), and development
is being conducted as a high-level layer atop the excellent Lucene open
source search engine project.

Aaron Krowne
Head of Digital Library Research
Emory University General Libraries
President and Founder,
Office: 404-712-2810
Cell: 404-405-5766

Connecting Everything with unAPI and OPA

Dan Chudnov

unAPI is a simple-to-use, simple-to-implement API for web sites that allows rich object access and can be easily layered over existing services like Atom, OpenSearch, OAI-PMH, or SRU. OPA is a general-purpose identifier resolver that wraps API calls to heavily used but incompatible web services like those from Amazon, Flickr, and PubMed.
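The request logic of an unAPI endpoint fits in one function. This sketch follows the unAPI conventions as we understand them (no parameters lists all formats; `id` alone returns a formats list with HTTP 300 Multiple Choices; `id` plus `format` returns the object, or 404 for an unknown identifier and 406 for an unavailable format). The catalog contents and identifiers are hypothetical, and it returns a status tuple instead of running a real HTTP server.

```python
from typing import Dict, Iterable, Optional, Tuple
from xml.sax.saxutils import quoteattr

# Hypothetical catalog: identifier -> {format name: serialized object}.
OBJECTS: Dict[str, Dict[str, str]] = {
    "item:1": {"oai_dc": "<dc><title>Example</title></dc>"},
}
# Every format this server can emit, with its MIME type.
FORMATS: Dict[str, str] = {"oai_dc": "application/xml", "mods": "application/xml"}


def formats_xml(names: Iterable[str], obj_id: Optional[str] = None) -> str:
    """Serialize a <formats> list, tagged with the identifier if given."""
    id_attr = f" id={quoteattr(obj_id)}" if obj_id else ""
    items = "".join(
        f"<format name={quoteattr(n)} type={quoteattr(FORMATS[n])}/>" for n in names
    )
    return f'<?xml version="1.0" encoding="UTF-8"?>\n<formats{id_attr}>{items}</formats>'


def unapi(obj_id: Optional[str] = None, fmt: Optional[str] = None) -> Tuple[int, str, str]:
    """Return (HTTP status, content type, body) for an unAPI request."""
    if obj_id is None:
        # No identifier: list every format the server offers
        # (this sketch ignores a format given without an id).
        return 200, "application/xml", formats_xml(FORMATS)
    if obj_id not in OBJECTS:
        return 404, "text/plain", "unknown identifier"
    available = OBJECTS[obj_id]
    if fmt is None:
        # Identifier only: 300 Multiple Choices listing this object's formats.
        return 300, "application/xml", formats_xml(available, obj_id)
    if fmt not in available:
        return 406, "text/plain", "format not available for this identifier"
    return 200, FORMATS[fmt], available[fmt]
```

Because the endpoint only needs a lookup by identifier and format, it can sit on top of an existing Atom, OAI-PMH, or SRU service that already knows how to produce those serializations.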

Together they will do the same thing we do every code4libcon – try to take over the world!

[Update 2006-02-28: Slides are here, in pdf.]

What Blog Applications Can Teach Us About Library Software Architecture

Casey Bisson

The number of programmers in the library world is growing and our individual efforts have shown great promise, but they exist largely as a spectacle that few libraries can enjoy. We need better means to aggregate our efforts and share solutions that can be employed by libraries without programming staff.

Looking outside libraries, we see some interesting examples in the blog world. The blog world is growing with new bloggers every day, but its most interesting aspect is how many people with limited technical skills are using (maintaining and configuring) blog applications like WordPress or Movable Type, and how quickly the contributions of the many plugin and theme developers are adopted on those blogs. What lessons can we learn from this, and how might a library application built on those lessons work? Are some software architectures better than others at leveraging the network effects of the growing number of developers in our community?
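Much of WordPress's extensibility comes from named hooks: the core calls a hook at each extension point, and plugins register callbacks on it without touching core code. The method names below mirror WordPress's PHP `add_filter` and `apply_filters`, but this Python registry is a hypothetical sketch of the pattern, not WPopac's code, and the `record_title` hook is invented for the example.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List


class HookRegistry:
    """Named hooks that plugins attach callbacks to (hypothetical API)."""

    def __init__(self) -> None:
        self._filters: DefaultDict[str, List[Callable[[Any], Any]]] = defaultdict(list)

    def add_filter(self, hook: str, func: Callable[[Any], Any]) -> None:
        # Plugins register here; the core never needs to import plugin modules.
        self._filters[hook].append(func)

    def apply_filters(self, hook: str, value: Any) -> Any:
        # The core calls this at each extension point; every registered
        # callback transforms the value in registration order.
        for func in self._filters[hook]:
            value = func(value)
        return value
```

A catalog display could call `hooks.apply_filters("record_title", title)` before rendering; a plugin that registers `str.title` and then a callback appending `" [New]"` turns `"moby dick"` into `"Moby Dick [New]"`, while unregistered hooks pass values through unchanged. A library without programmers only has to install the plugin.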

I’m working on a project that attempts to answer those questions and I hope to release a public beta shortly (update: it's WPopac, online now). I’d like to demo it and ask for participation.

Update: slides posted.

Casey Bisson
E-Learning Application Developer
Plymouth State University
Plymouth, New Hampshire

Voting on Code4Lib 2006 Presentation Proposals

Vote for the Code4lib 2006 presentations!

Please log in to participate in voting!

You may choose up to 11 proposals.

Voting closes January 9th at 11PM EST.

The 11 proposals with the most votes win. In case of a tie, we will hold a runoff election tomorrow, January 10th, from 5PM to 11PM EST.
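The counting rule above (up to 11 choices per ballot, top 11 win, ties at the cutoff go to a runoff) can be sketched as follows. This is our illustration of the stated rule, not the script actually used for the vote; ballots are assumed to be sets of proposal titles.

```python
from collections import Counter
from typing import List, Set, Tuple


def tally(ballots: List[Set[str]], seats: int = 11) -> Tuple[List[str], List[str]]:
    """Return (winners, runoff); runoff lists proposals tied at the cutoff."""
    counts: Counter = Counter()
    for ballot in ballots:
        if len(ballot) > seats:
            raise ValueError("ballot selects too many proposals")
        counts.update(ballot)
    ranked = counts.most_common()
    if len(ranked) <= seats:
        return [p for p, _ in ranked], []
    cutoff = ranked[seats - 1][1]  # votes held by the last winning slot
    above = [p for p, c in ranked if c > cutoff]
    tied = [p for p, c in ranked if c == cutoff]
    if len(above) + len(tied) == seats:
        return above + tied, []  # no tie straddles the cutoff
    # Proposals strictly above the cutoff win outright; the tied ones
    # compete in the runoff for the remaining seats - len(above) slots.
    return above, tied
```
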

I will be deleting all votes cast before 5PM EST unless you specifically tell me that you have to vote early. So, be sure to tell me. Seriously. Send an email to

Happy voting!

