
A Better Advanced Search - Naomi Dushay and Jessie Keck - Code4Lib 2010

A Better Advanced Search

  • Naomi Dushay, Stanford University
  • Jessie Keck, Stanford University

Code4Lib 2010 - Wednesday, February 24 - 13:00-13:20

Even though we'd love to get basic searches working so well that advanced search wouldn't be necessary, there will always be a small set of users who want it, and there will always be some library searching needs that basic searching can't serve. Our user interface designer was dissatisfied with many aspects of advanced search as currently available in most library discovery software; the form she designed was excellent but challenging to implement. We'll share details of how we implemented Advanced Search in Blacklight:

  1. an HTML form for the user, designed by a non-techie
  2. Boolean syntax while using Solr dismax magic (dismax does not speak Boolean)
  3. checkbox facets (multiple facet value selection)
  4. fielded searching while using Solr dismax magic (dismax allows complex weighting formulae across multiple author/title/subject/... fields, but does not allow "fielded" searching the way Lucene does) - easily configured in solrconfig.xml
  5. manipulating user-entered queries before sending them to Solr
  6. making advanced search results look like other search results: breadcrumbs, selectable facets, and other fun.
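
As a rough illustration of points 2 and 4: Solr lets you embed dismax subqueries inside a larger Boolean query via the `_query_` pseudo-field with `{!dismax}` local params, which is one way to reconcile Boolean forms with dismax. The sketch below is a hedged simplification, not Stanford's actual code; the `qf_title`/`qf_author` parameter names are invented stand-ins for weighting formulae that would live in solrconfig.xml.

```python
def dismax_clause(qf_param, user_terms):
    """Wrap one form field's terms in a dismax subquery via _query_."""
    escaped = user_terms.replace('"', '\\"')
    return '_query_:"{!dismax qf=$%s}%s"' % (qf_param, escaped)

def advanced_query(clauses, op="AND"):
    """Join per-field dismax subqueries with a Boolean operator."""
    return (" %s " % op).join(dismax_clause(p, t) for p, t in clauses)

# e.g. a title AND author search from the advanced form:
q = advanced_query([("qf_title", "nature"), ("qf_author", "muir")])
```

Each field keeps its own dismax relevancy magic while the outer query supplies the Boolean structure.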

I'm sure slides will be made available on the code4lib site, but in the meantime, you can see them at

The document explaining the specifics of the Solr queries is

Stanford's SearchWorks Solr configuration files:

Naomi says: "Jessie and I are working on getting a version of this into ProjectBlacklight, but we're not quite there yet."

Ask Anything! – Facilitated by Dan Chudnov - Code4Lib 2010

Ask Anything!

  • Dan Chudnov, dchud at umich edu

Code4Lib 2010 - Wednesday, February 24 - 11:15-12:00

a.k.a. "Human Search Engine". A chance for you to ask a roomful of code4libbers anything that's on your mind: questions seeking answers (short or long), requests for things (hardware, software, skills, or help), or offers of things. We'll keep the pace fast, and the answers faster. Come with questions and line up at the start of the session and we'll go through as many as we can; sometimes we'll stop at finding the right person or people to answer a query and it'll be up to you to find each other after the session. First time at code4libcon! (Thanks to Ka-Ping Yee for the inspiration/explanation, reused here in part.)

Becoming Truly Innovative: Migrating from Millennium to Koha - Ian Walls - Code4Lib 2010

Becoming Truly Innovative: Migrating from Millennium to Koha

  • Ian Walls, (formerly) System Integration Librarian, NYU Health Sciences Libraries, (currently) Ian.Walls at

Code4Lib 2010 - Wednesday, February 24 - 10:55-11:15

On Sept. 1st, 2009, the NYU Health Sciences Libraries made the unprecedented move from their Millennium ILS to Koha. The migration was done over the course of 3 months, without assistance from either Innovative Interfaces, Inc. or any Koha vendor. The in-house script, written in Perl and XSLT, can be used with any Millennium installation, regardless of which modules have been purchased, and can be adapted to work for migration to systems other than Koha. Helper scripts were also developed to capture the current circulation state (checkouts, holds and fines), and do minor data cleanup.

This presentation will cover the planning and scheduling of the migration, as well as an overview of the code that was written for it. Opportunities for systems integration and development made newly available by having an open source platform will also be discussed.
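
To give a flavor of the circulation-state capture, here is a hedged sketch in Python (the actual scripts were Perl/XSLT, and the export column names below are hypothetical; they vary by Millennium installation). It maps exported checkouts onto rows shaped like Koha's `issues` table:

```python
import csv
import io

# Hypothetical layout of a Millennium circulation export; real exports
# vary by installation and by which fields are included.
MILL_FIELDS = ["patron_barcode", "item_barcode", "due_date"]

def checkouts_to_koha(export_text, borrower_by_barcode, item_by_barcode):
    """Map exported checkouts to rows destined for Koha's issues table,
    resolving barcodes to the internal IDs created during migration."""
    rows = []
    for rec in csv.DictReader(io.StringIO(export_text), fieldnames=MILL_FIELDS):
        rows.append({
            "borrowernumber": borrower_by_barcode[rec["patron_barcode"]],
            "itemnumber": item_by_barcode[rec["item_barcode"]],
            "date_due": rec["due_date"],
        })
    return rows
```

The same pattern (export, resolve identifiers, reshape) applies to holds and fines.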

Media, Blacklight, and Viewers Like You (pdf, 2.61MB) - Chris Beer - Code4Lib 2010

Media, Blacklight, and Viewers Like You

  • Chris Beer, WGBH

Code4Lib 2010 - Wednesday, February 24 - 10:35-10:55

Libraries and archives share many problems (and solutions) in the interest of helping the user. There are also many "new" developments in the archives world that the library communities have been working on for ages, including item-level cataloging, metadata standards, and asset management. Even with these similarities, media archives have additional issues that are less relevant to libraries: the choice of video players, large file sizes, proprietary file formats, challenges of time-based media, etc. In developing a web presence, many archives, including the WGBH Media Library and Archives, have created custom digital library applications to expose material online. In 2008, we began a prototyping phase for developing scholarly interfaces by creating a custom-written PHP front-end to our Fedora repository.

In late 2009, we finally saw the (black)light, and after some initial experimentation, decided to build a new, public website to support our IMLS-funded /Vietnam: A Television History/ archive (as well as existing legacy content). In this session, we will share our experience of and challenges with customizing Blacklight as an archival interface, including work in rights management, how we integrated existing Ruby on Rails user-generated content plugins, and the development of media components to support a rich user experience.

Slides in PDF (2.61 MB)

I Am Not Your Mother: Write Your Test Code - Naomi Dushay, Willy Mene, and Jessie Keck - Code4Lib 2010

I Am Not Your Mother: Write Your Test Code

  • Naomi Dushay, Stanford University
  • Willy Mene, Stanford University
  • Jessie Keck, Stanford University

Code4Lib 2010 - Wednesday, February 24 - 09:55-10:15

How is it worth it to slow down your code development to write tests? Won't it take you a long time to learn how to write tests? Won't it take longer if you have to write tests AND develop new features and fix bugs? Isn't it hard to write test code? To maintain test code? We will address these questions as we talk about how test code is crucial for our software. By way of illustration, we will show how it has played a vital role in making Blacklight a true community collaboration, as well as how it has positively impacted coding projects in the Stanford Libraries.
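
As a minimal illustration of the payoff (in Python; Blacklight's own tests are written in Ruby, and the function here is hypothetical), a regression test pins down behavior so a later refactor or contributor can't silently break it:

```python
import unittest

def normalize_callnumber(raw):
    """Hypothetical helper: collapse whitespace and uppercase a call number."""
    return " ".join(raw.split()).upper()

class NormalizeCallnumberTest(unittest.TestCase):
    def test_collapses_internal_whitespace(self):
        self.assertEqual(normalize_callnumber("ml  410 .b3"), "ML 410 .B3")

    def test_strips_leading_and_trailing_space(self):
        self.assertEqual(normalize_callnumber(" pr6023 "), "PR6023")

if __name__ == "__main__":
    unittest.main()
```

Once a bug is found, adding a failing test first means the fix is verified forever after, across the whole community's changes.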


Vampires vs. Werewolves: Ending the War Between Developers and Sysadmins with Puppet - Bess Sadler - Code4Lib 2010

Vampires vs. Werewolves: Ending the War Between Developers and Sysadmins with Puppet

  • Bess Sadler, University of Virginia

Code4Lib 2010 - Wednesday, February 24 - 09:35-09:55

Developers need to be able to write software and deploy it, and often require cutting-edge software tools and system libraries. Sysadmins are charged with maintaining stability in the production environment, and so are often resistant to rapid upgrade cycles. This has traditionally pitted us against each other, but it doesn't have to be that way. Using tools like Puppet for maintaining and testing server configuration, Nagios for monitoring, and Hudson for continuous code integration, UVA has brokered a peace that has given us the ability to maintain a stable production environment with a rapid upgrade cycle. I'll discuss the individual tools, our server configuration, and the social engineering that got us here.

Presentation (PDF)

iBiblio copy of presentation (PDF)

Iterative Development Done Simply - Emily Lynema - Code4Lib 2010

Iterative Development Done Simply

  • Emily Lynema, North Carolina State University Libraries

Code4Lib 2010 - Wednesday, February 24 - 09:15-09:35

With a small IT unit and a wide array of projects to support, requests for development from business stakeholders in the library can quickly spiral out of control. To help make sense of the chaos, increase the transparency of the IT "black box," and shorten the time lag between requirements definition and functional releases, we have implemented a modified Agile/SCRUM methodology within the development group in the IT department at NCSU Libraries.

This presentation will provide a brief overview of the Agile methodology as an introduction to our simplified approach to iteratively handling multiple projects across a small team. This iterative approach allows us to regularly re-evaluate requested enhancements against institutional priorities and more accurately estimate timelines for specific units of functionality. The presentation will highlight how we approach each development cycle (from planning to estimating to re-aligning) as well as some of the actual tools and techniques we use to manage work (like JIRA and Greenhopper). It will identify some challenges faced in applying an established development methodology to a small team of multi-tasking developers, the outcomes we've seen, and the areas we'd like to continue improving. These types of iterative planning/development techniques could be adapted by even a single developer to help manage a chaotic workplace.

Slides in Powerpoint (2.15 MB)

Metadata Editing – A Truly Extensible Solution - David Kennedy and David Chandek-Stark - Code4Lib 2010

Metadata Editing – A Truly Extensible Solution

  • David Kennedy, Duke University
  • David Chandek-Stark, Duke University

Code4Lib 2010 - Tuesday, February 23 - 14:00-14:20

We set out in the Trident project to create a metadata tool that scales. In doing so we have conceived of the metadata application profile: a profile which provides instructions to software on how to edit metadata. We have built a set of web services and some web-based tools for editing metadata. The metadata application profile allows these tools to extend across different metadata schemes, and allows for different rules to be established for editing items of different collections. Some features of the tools include integration with authority lists, auto-complete fields, validation, and clean integration of batch editing with Excel. I know, I know, Excel, but in the right hands, this is a powerful tool for cleanup and batch editing.

In this talk, we want to introduce the concepts of the metadata application profile, and gather feedback on its merits, as well as demonstrate some of the tools we have developed and how they work together to manage the metadata in our Fedora repository.
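
The abstract doesn't spell out the profile format, but the core idea, data-driven editing rules that tools interpret rather than hard-code, can be sketched as follows. Everything here (field names, rule keys) is a hypothetical simplification, not Trident's actual schema:

```python
# Each entry tells editing software how to render and validate one element.
PROFILE = {
    "title": {"required": True, "repeatable": False, "widget": "text"},
    "subject": {"required": False, "repeatable": True,
                "widget": "autocomplete", "authority": "lcsh"},
}

def validate(record, profile=PROFILE):
    """Check a record (dict of field name -> list of values) against the profile."""
    errors = []
    for name, rules in profile.items():
        values = record.get(name, [])
        if rules["required"] and not values:
            errors.append("%s is required" % name)
        if not rules["repeatable"] and len(values) > 1:
            errors.append("%s is not repeatable" % name)
    return errors
```

Because the editing rules live in data rather than code, the same form generator and validator can serve different schemes and per-collection policies.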

Link: Trident Project site

Slides in Google Docs
Slides (PDF)

HIVE: A New Tool for Working With Vocabularies - Ryan Scherle and Jose Aguera - Code4Lib 2010

HIVE: A New Tool for Working With Vocabularies

  • Ryan Scherle, National Evolutionary Synthesis Center
  • Jose Aguera, University of North Carolina

Code4Lib 2010 - Tuesday, February 23 - 13:40-14:00

HIVE is a toolkit that assists users in selecting vocabulary and ontology terms to annotate digital content. HIVE combines the ease of folksonomies with the rigor of traditional vocabularies. By combining semantic web standards with text mining techniques, HIVE will improve the effectiveness of subject metadata generation, allowing users to search and browse terms from a variety of vocabularies and ontologies. Documents can be submitted to HIVE to automatically generate suggested vocabulary terms.

Your system can interact with common vocabularies such as LCSH and MESH via the central HIVE server, or you can install a local copy of HIVE with your own custom set of vocabularies. This talk will give an overview of the current features of HIVE and describe how to build tools that use the HIVE services.
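
A toy sketch of the core interaction, suggesting controlled terms for a document, is below. The vocabulary and URIs are invented, and real HIVE combines text mining with SKOS-encoded vocabularies rather than this naive label matching:

```python
import re

# Invented stand-in for a SKOS vocabulary loaded into HIVE;
# real deployments would use LCSH, MeSH, etc.
VOCAB = {
    "climate change": "http://example.org/term/climate-change",
    "evolution": "http://example.org/term/evolution",
    "genomics": "http://example.org/term/genomics",
}

def suggest_terms(document_text, vocab=VOCAB):
    """Suggest vocabulary terms whose preferred labels occur in the text."""
    text = document_text.lower()
    return sorted(
        (label, uri) for label, uri in vocab.items()
        if re.search(r"\b%s\b" % re.escape(label), text)
    )
```

The value of a HIVE-style service is precisely that it replaces this kind of brittle string matching with proper indexing and ranking across multiple vocabularies.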

Slides in PowerPoint

Matching Dirty Data – Yet Another Wheel - Anjanette Young and Jeff Sherwood - Code4Lib 2010

Matching Dirty Data – Yet Another Wheel

  • Anjanette Young, University of Washington Libraries, younga3 at u washington edu
  • Jeff Sherwood, University of Washington Libraries, jeffs3 at u washington edu

Code4Lib 2010 - Tuesday, February 23 - 13:20-13:40

Regular expressions are a powerful tool for identifying matching data between similar files. When one or both of the files has inconsistent data due to differing character encodings or miskeying, using regular expressions to find matches becomes impractically complex.

The Levenshtein distance (LD) algorithm is a basic sequence comparison technique that can be used to measure word similarity more flexibly. Employing the LD to calculate difference eliminates the need to identify and code into regex patterns all of the ways in which otherwise matching strings might be inconsistent. Instead, a similarity threshold is tuned to identify close matches while eliminating false positives.

Recently, the UW Libraries began an effort to store Electronic Theses and Dissertations (ETDs) in our institutional repository, which runs on DSpace. We received 6,756 PDFs along with a file of UMI-created MARC records, which needed to be matched to our library's custom MARC records (60,175 records). Once matched, merged information from both records would be used to create the dublin_core.xml file needed for batch ingest into DSpace. Unfortunately, the records within the MARC data had no common unique identifiers to facilitate matching. Direct matching by title or author was impractical due to slight inconsistencies in data entry. Additionally, one of the files had "flattened" the characters in title and author fields to ASCII. We successfully employed LD to match records between the two files before merging them.

This talk demonstrates one method of matching sets of MARC records that lack common unique identifiers and may contain slight differences in the matching fields. It will cover basic usage of several Python tools. No large stack traces, just the comfort of pure Python and basic computational algorithms in a step-by-step presentation on dealing with an old library task: matching dirty data. While much literature exists on matching/merging duplicate bibliographic records, most of it does not specify how to accomplish the task, reporting only on the efficiency of the tools used, often within a larger system such as an ILS.
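
The approach described above can be sketched in pure Python; the 0.9 threshold is illustrative, and tuning it against real data is exactly the point of the talk:

```python
def levenshtein(a, b):
    """Minimum number of single-character edits turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def is_match(a, b, threshold=0.9):
    """Treat two strings as matching if their similarity clears a threshold."""
    a, b = a.lower().strip(), b.lower().strip()
    if a == b:
        return True
    return 1 - levenshtein(a, b) / max(len(a), len(b)) >= threshold
```

A high threshold tolerates miskeyed or flattened characters in otherwise matching titles while still rejecting genuinely different strings.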

Slides on Slideshare
Presentation Slides (PDF)
