You know, when people get together and talk about stuff.

Evolving Towards a Consortium MARCR Redis Datastore

  • Jeremy Nelson, Colorado College
  • Sheila Yeh, University of Denver

Library automation technology is not keeping pace with the explosive growth of information storage and retrieval systems. The lag is costly for institutions and for their users’ resource discovery. To address this problem, we should look at how successful enterprises such as Craigslist and StackOverflow manage and scale their enormous volumes of data. The key lies in Redis, an open source NoSQL key-value data structure server. Colorado College and the University of Denver, along with the Colorado Alliance of Research Libraries, are therefore exploring and co-developing a MARCR Redis Datastore. It is a peer-to-peer bibliographic datastore, modeled using the Library of Congress Bibliographic Framework's new Linked Data based MARC 21 replacement, called MARCR (MARC Resources). The structure of MARCR lends itself to an advanced consortium catalog where a Work is cataloged once and multiple institutions have complete control over their own Instances of the Work, de-duplicating cataloging efforts while supporting real-time resource sharing between the Instances. Control, access, and discovery of records in the proposed MARCR Redis Datastore are provided through lightweight, responsive HTML5 apps built with Django, Bootstrap, and KnockoutJS that also integrate with both open-source and commercial discovery products.

Redis offers many advantages for a shared MARCR bibliographic datastore, such as speed, scalability, and ease of deployment. In particular, it can support multiple cloud models, which benefits institutions of varying size and capital. We will demonstrate an MVP (Minimum Viable Product) iteration of this MARCR Datastore, built by transforming MARC 21 records from Colorado College and the University of Denver into Redis, with coordination by the Colorado Alliance of Research Libraries.
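
The "catalog a Work once, let each institution own its Instances" idea can be sketched roughly as follows. This is only an illustration under stated assumptions, not the presenters' schema: the key names (`work:1`, `instance:cc:1`), the field names, and the `TinyStore` class are all hypothetical. `TinyStore` is an in-memory stand-in whose method names mirror the Redis commands HSET, HGETALL, SADD, and SMEMBERS, so the sketch runs without a Redis server.

```python
from collections import defaultdict


class TinyStore:
    """In-memory stand-in for a few Redis hash and set commands (assumption)."""

    def __init__(self):
        self.hashes = defaultdict(dict)
        self.sets = defaultdict(set)

    def hset(self, key, field, value):
        self.hashes[key][field] = value

    def hgetall(self, key):
        return dict(self.hashes.get(key, {}))

    def sadd(self, key, member):
        self.sets[key].add(member)

    def smembers(self, key):
        return set(self.sets.get(key, set()))


def add_work(store, work_id, title):
    """Catalog the shared Work once; returns its (hypothetical) key."""
    key = f"work:{work_id}"
    store.hset(key, "title", title)
    return key


def add_instance(store, institution, instance_id, work_id, call_number):
    """Add an institution-owned Instance that points back to the shared Work."""
    key = f"instance:{institution}:{instance_id}"
    store.hset(key, "work", f"work:{work_id}")
    store.hset(key, "call_number", call_number)
    # A set per Work lists every Instance held across the consortium.
    store.sadd(f"work:{work_id}:instances", key)
    return key


def instances_of(store, work_id):
    """Return the keys of all Instances of a Work, across institutions."""
    return store.smembers(f"work:{work_id}:instances")


store = TinyStore()
add_work(store, "1", "Pride and Prejudice")
add_instance(store, "cc", "1", "1", "PR4034 .P7")
add_instance(store, "du", "9", "1", "PR4034 .P7 2003")
```

Each institution edits only its own `instance:<institution>:…` hashes, while the title lives in a single Work hash that all of them reference.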

n Characters in Search of an Author

  • Jay Luker, IT Specialist, Smithsonian Astrophysics Data System

When it comes to author names, the disconnect between our metadata and what a user might enter into a search box presents challenges when trying to maximize both precision and recall [0]. When indexing a paper written by "Wäterwheels, A", a goal should be to preserve as much of the original information as possible. However, users searching by author name may frequently omit the diaeresis and search simply for "Waterwheels". The reverse of this scenario is also possible, i.e., your decrepit metadata contains only the ASCII "Supybot, Zoia", whereas the user enters "Supybot, Zóia". If recall is your highest priority, the simple solution is to always downgrade to ASCII when indexing and querying. However, this strategy sacrifices precision, as you will be unable to provide an "exact" search, necessary in cases where "Hacker, J" and "Häcker, J" really are two distinct authors.
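
One way to picture the precision/recall trade-off is to index each name twice, once as given and once folded to ASCII, and let the query choose which index to hit. The sketch below is an assumption for illustration only, not the ADS implementation (which relies on Solr/Lucene); the function names and the two-dictionary "index" are hypothetical.

```python
import unicodedata


def fold_to_ascii(name):
    """Drop combining marks, so 'Wäterwheels' becomes 'Waterwheels'."""
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def exact_key(name):
    """Preserve diacritics, but normalize Unicode form and case."""
    return unicodedata.normalize("NFC", name).casefold()


def folded_key(name):
    """Recall-oriented key: diacritics removed, case ignored."""
    return fold_to_ascii(name).casefold()


def build_index(authors):
    """authors: iterable of (doc_id, author_name) pairs."""
    exact, folded = {}, {}
    for doc_id, name in authors:
        exact.setdefault(exact_key(name), set()).add(doc_id)
        folded.setdefault(folded_key(name), set()).add(doc_id)
    return exact, folded


def search(index, query, exact_only=False):
    """Folded search favors recall; exact_only=True favors precision."""
    exact, folded = index
    if exact_only:
        return exact.get(exact_key(query), set())
    return folded.get(folded_key(query), set())


index = build_index([
    (1, "Hacker, J"),
    (2, "Häcker, J"),
    (3, "Supybot, Zoia"),
])
```

With this index, a default search for "Hacker, J" returns both documents 1 and 2, while the same query with `exact_only=True` returns only document 1; "Supybot, Zóia" still finds the ASCII-only record.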

This talk will describe the strategy ADS[1] has devised for addressing common and edge-case problems faced when dealing with author name indexing and searching. I will cover not only our approach to the transliteration issue described above, but also how we deal with author initials vs. full first and/or middle names, authors who have published under different forms of their name, and authors who change their names (wha? people get married?!). Our implementation relies on Solr/Lucene[2], but my goal is an 80/20 mix of high- vs. low-level details to keep things both useful and stackgnostic [3].


Practical Relevance Ranking for 10 million books.

  • Tom Burton-West, University of Michigan Library

HathiTrust Full-text search indexes the full-text and metadata for over 10 million books. There are many challenges in tuning relevance ranking for a collection of this size. This talk will discuss some of the underlying issues, some of our experiments to improve relevance ranking, and our ongoing efforts to develop a principled framework for testing changes to relevance ranking.

Some of the topics covered will include:

  • Length normalization for indexing the full-text of book-length documents
  • Indexing granularity for books
  • Testing new features in Solr 4.0:
    • New ranking formulas that should work better with book-length documents: BM25 and DFR.
    • Grouping/Field Collapsing. Can we index 3 billion pages and then use Solr's field collapsing feature to rank books according to the most relevant page(s)?
    • Finite State Automata/Block Trees for storing the in-memory index to the index. Will this let us support wildcards/truncation despite over 2 billion unique terms per index?
  • Relevance testing methodologies: query log analysis, click models, interleaving, A/B testing, and test collection based evaluation.
  • Testing of a new high-performance storage system to be installed in early 2013. We will report on any tests we are able to run prior to conference time.
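
Two of the ideas in the list above, length normalization in BM25 and ranking books by their best-scoring page, can be illustrated with a small sketch. It is a simplification under stated assumptions, not HathiTrust's or Solr's code: the function names are hypothetical, `k1=1.2` and `b=0.75` are commonly used defaults, and the idf shown is one common variant.

```python
import math


def bm25_term_score(tf, doc_len, avg_doc_len, n_docs, doc_freq, k1=1.2, b=0.75):
    """Score one query term against one document with the BM25 formula.

    b controls length normalization: b=0 ignores document length,
    b=1 normalizes fully by doc_len / avg_doc_len.
    """
    idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    length_norm = 1 - b + b * (doc_len / avg_doc_len)
    return idf * (tf * (k1 + 1)) / (tf + k1 * length_norm)


def collapse_by_book(page_hits):
    """Keep the best-scoring page per book, then rank books by that score.

    page_hits: iterable of (book_id, page_number, score) tuples.
    """
    best = {}
    for book_id, page_number, score in page_hits:
        if book_id not in best or score > best[book_id][1]:
            best[book_id] = (page_number, score)
    ranked = [(book_id, page, score) for book_id, (page, score) in best.items()]
    return sorted(ranked, key=lambda hit: hit[2], reverse=True)


# Same term frequency, very different document lengths.
short_doc = bm25_term_score(tf=5, doc_len=100, avg_doc_len=1000, n_docs=1000, doc_freq=10)
long_doc = bm25_term_score(tf=5, doc_len=10000, avg_doc_len=1000, n_docs=1000, doc_freq=10)

ranked_books = collapse_by_book([("a", 1, 0.2), ("b", 7, 0.9), ("a", 3, 0.5)])
```

With the same term frequency, the document ten times shorter than average scores higher than the one ten times longer, which is why the choice of `b` matters so much when the "documents" are whole books rather than pages.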

The Avalon Media System: A Next Generation Hydra Head For Audio and Video Delivery

  • Michael Klein, Senior Software Developer, Northwestern University Library, michael.klein AT northwestern DOT edu
  • Nathan Rogers, Programmer/Analyst, Indiana University, rogersna AT indiana DOT edu

Based on the success of the Variations digital music platform, Indiana University and Northwestern University have developed a next generation educational tool for delivering multimedia resources to the classroom. The Avalon Media System (formerly Variations on Video) supports the ingest, media processing, management, and access-controlled delivery of library-managed video and audio collections. To do so, the system draws on several existing, mature, open source technologies:

  • The ingest, search, and discovery functionality of the Hydra framework
  • The powerful multimedia workflow management features of Opencast Matterhorn
  • The flexible Engage audio/video player
  • The streaming capabilities of both Red5 Media Server (open source) and Adobe Flash Media Server (proprietary)

Extensive customization options are built into the framework for tailoring the application to the needs of a specific institution.

Our goal is to create an open platform that can be used by other institutions to serve the needs of the academic community. Release 1 is planned for a late February launch with future versions released every couple of months following. For more information visit and

EAD without XSLT: A Practical New Approach to Web-Based Finding Aids

  • Trevor Thornton, New York Public Library

The New York Public Library is reengineering its system for delivering archival finding aids on the Web. The foundation of this system is a data management application, written in Rails, within which collections and their components are managed as associated model instances, and descriptive data is stored natively as JSON and HTML. Front-end applications interact with the back-end via a flexible API that is capable of returning any part of the description at any level. This approach provides a number of benefits over the traditional XML/XSLT approach:

  • Data is stored natively in the format in which it is needed by the front-end application, making rendering much faster
  • Finding aid data can be lazy-loaded via AJAX requests
  • Enables presentation of the archival description beyond the traditional finding aid structure (alternate arrangements, visualizations, etc.)
  • Links to digital assets can be maintained independently of archival description
  • Data cleanup and normalization can be accomplished during and/or after ingest of original data into the system, ensuring data quality and consistency
  • Data is stored in a schema-neutral format, enabling easy transformation into other formats as required (e.g. RDF for semantic web applications, future version(s) of EAD schema for harvesting, etc.)
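
To make the "any part of the description at any level" idea concrete, here is a minimal sketch of a nested finding aid stored as JSON-like data, with a lookup that returns one component and expands its children only as deep as requested. This is an assumption for illustration: the real system is written in Rails, and the field names (`id`, `level`, `children`, `child_ids`) and functions below are hypothetical.

```python
import json

# A tiny, made-up finding aid: collection > series > file.
FINDING_AID = {
    "id": "c0", "title": "Example papers", "level": "collection",
    "children": [
        {"id": "c1", "title": "Series I. Correspondence", "level": "series",
         "children": [
             {"id": "c2", "title": "Letters, 1901-1910", "level": "file",
              "children": []},
         ]},
    ],
}


def find_component(node, component_id):
    """Depth-first search for a component anywhere in the hierarchy."""
    if node["id"] == component_id:
        return node
    for child in node["children"]:
        found = find_component(child, component_id)
        if found is not None:
            return found
    return None


def describe(node, depth=0):
    """Return a node's description, expanding children `depth` levels.

    Below that depth only child IDs are listed, so a front end could
    lazy-load each child with a later request.
    """
    summary = {key: value for key, value in node.items() if key != "children"}
    if depth > 0:
        summary["children"] = [describe(c, depth - 1) for c in node["children"]]
    else:
        summary["child_ids"] = [c["id"] for c in node["children"]]
    return summary


top_level = describe(FINDING_AID)
series_view = describe(find_component(FINDING_AID, "c1"), depth=1)
payload = json.dumps(series_view)
```

A front end could request `top_level` first and fetch `series_view` only when the reader opens that series, which is the lazy-loading pattern the list above mentions.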

In this session I will describe the architecture of this system and its data model, and discuss the challenges presented in the design process.

Hacking the DPLA

  • Nate Hill, Chattanooga Public Library, nathanielhill AT
  • Sam Klein, Wikipedia, metasj AT

The Digital Public Library of America is a growing open-source platform to support digital libraries and archives of all kinds. DPLA-alpha is available for testing, with data from six initial Hubs. New APIs and data feeds are in development, with the next release scheduled for April.

Come learn what we are doing, how to contribute or hack the DPLA roadmap, and how you (or your favorite institution) can draw from and publish through it. Larger institutions can join as a (content or service) hub, helping to aggregate and share metadata and services from across their {region, field, archive-type}. We will discuss current challenges and possibilities (UI and API suggestions wanted!), apps being built on the platform, and related digitization efforts.

DPLA has a transparent community and planning process; new participants are always welcome. Half the time will be for suggestions and discussion. Please bring proposals, problems, partnerships and possible paradoxes to discuss.

Pitfall! Working with Legacy Born Digital Materials in Special Collections

  • Donald Mennerich, The New York Public Library, don.mennerich AT
  • Mark A. Matienzo, Yale University Library, mark AT

Archives and special collections face a growing abundance of born digital material, along with many promising tools for managing it. However, one must consider the potential problems that can arise when approaching a collection containing legacy materials (from roughly the pre-internet era). Many of the tried and true, "best of breed" tools for digital preservation don't always work on them as they do on more recent materials, requiring a fair amount of ingenuity and use of "word of mouth tradecraft and knowledge exchanged through serendipitous contacts, backchannel conversations, and beer" (Kirschenbaum, "Breaking badflag").

Our presentation will focus on some of the strange problems encountered and creative solutions devised by two digital archivists in the course of preserving, processing, and providing access to collections at their institutions. We'll be placing particular emphasis on the pitfalls and crocodiles we've learned to swing over safely, while collecting treasure in the process. We'll address working with CP/M disks in collections of authors' papers, reconstructing a multipart hard drive backup spread across floppy disks, and more.

ARCHITECTING ScholarSphere: How We Built a Repository App That Doesn't Feel Like Yet Another Janky Old Repository App

  • Dan Coughlin, Penn State University
  • Mike Giarlo, Penn State University

ScholarSphere is a web application that allows the Penn State research community to deposit, share, and manage its scholarly works. It is also, as some of our users and our peers have observed, a repository app that feels much more like Google Docs or GitHub than earlier-generation repository applications. ScholarSphere is built upon the Hydra framework (Fedora Commons, Solr, Blacklight, Ruby on Rails), MySQL, Redis, Resque, FITS, ImageMagick, jQuery, Bootstrap, and FontAwesome. We'll talk about techniques we used to:

  • eliminate Fedora-isms in the application
  • model and expose RDF metadata in ways that users find unobtrusive
  • manage permissions via a UI widget that doesn't stab you in the face
  • harvest and connect controlled vocabularies (such as LCSH) to forms
  • make URIs cool
  • keep the app snappy without venturing into the architectural labyrinth of YAGNI
  • build and queue background jobs
  • expose social features and populate activity streams
  • tie checksum verification, characterization, and version control to the UI
  • let users upload and edit multiple files at once

The application will be demonstrated; code will be shown; and we solemnly commit to showing ABSOLUTELY NO XML.

Code4Lib 2013 Scholarship (deadline: December 14, 2012)

Oregon State University and the Digital Library Federation are sponsoring five scholarships to promote gender and cultural diversity. Each scholarship will provide up to $1,000 to cover travel costs and conference fees for one qualified attendee to attend the 2013 Code4Lib Conference, which will be held in Chicago, Illinois, from Monday, February 11 through Thursday, February 14. The Code4Lib scholarship committee will award two scholarships per category, with the remaining scholarship going to the best remaining candidate in either category. The committee will award these scholarships based on merit and need.


Applicants, if eligible, may apply for both scholarships, but no applicant will receive more than one scholarship. Past winners of either scholarship are not eligible for either scholarship. Scholarship recipients will be required to write a short trip report to be submitted to the scholarships committee by February 17, 2012.


For more information on the Code4Lib Conference, please see the conference website:

and write-ups of previous Code4Lib Conferences:


The Gender Diversity Scholarships will provide up to $1,000 to cover travel costs and conference fees for two qualified applicants to attend the 2013 Code4Lib Conference. Any woman or transgendered person who is interested in actively contributing to the mission and goals of the Code4Lib Conference is encouraged to apply.


The Minority Scholarships will provide up to $1,000 to cover travel costs and conference fees for two qualified applicants to attend the 2013 Code4Lib Conference. To qualify for this scholarship, an applicant must be interested in actively contributing to the mission and goals of the Code4Lib Conference and must be of Hispanic or Latino, Black or African-American, Asian, Native Hawaiian or Pacific Islander, or American Indian or Alaskan Native descent.


To apply, please send an email to Jeremy Frumkin with the following (preferably combined into a single PDF):

  • An indication of which scholarship (Gender, Minority, or both) you are applying for
  • A brief letter of interest, which:
    1. Describes your interest in the conference and how you intend to participate
    2. Discusses your statement of need
    3. Indicates your eligibility
  • A résumé or CV
  • Contact information for two professional or academic references

The application deadline is Dec. 14, 2012.
The scholarship committee will notify successful candidates the week of Jan. 1, 2013.

Code4Lib 2013 Call for Host Proposals

The Code4Lib Conference Planning Group is calling for
proposals to host the 2013 Code4Lib Conference.  Information on the
kind of venue we seek and the delineation of responsibilities between
the host organization and the Planning Group can be found at the
conference hosting web page [1] and on the Code4Lib Wiki [2].

The deadline for proposals is Sunday January 22, 2012. The
decision will be made over the course of the following weeks by a
popular vote.  Voting will begin on or around Wednesday January 25,
2012 and will continue through the first three days of Code4Lib 2012 until
11:59PM Pacific on Wednesday, February 8th. The results of the vote will be
announced on Thursday, February 9th, the final day of Code4Lib 2012.
You can apply by making your pitch to the Code4Lib Conference Planning
list [3]; attention to the criteria listed on the conference hosting
page is appreciated.  May the best site win!

Feel free to take a look at the winning proposal from 2012

and past hosting proposals from 2011 for ideas:




