conferences

Practical Relevance Ranking for 10 million books

  • Tom Burton-West, University of Michigan Library, tburtonw@umich.edu

HathiTrust Full-text search indexes the full text and metadata of over 10 million books. There are many challenges in tuning relevance ranking for a collection of this size. This talk will discuss some of the underlying issues, some of our experiments to improve relevance ranking, and our ongoing efforts to develop a principled framework for testing changes to relevance ranking.

Some of the topics covered will include:

  • Length normalization for indexing the full text of book-length documents
  • Indexing granularity for books
  • Testing new features in Solr 4.0:
    • New ranking formulas that should work better with book-length documents: BM25 and DFR.
    • Grouping/Field Collapsing. Can we index 3 billion pages and then use Solr's field collapsing feature to rank books according to the most relevant page(s)?
    • Finite State Automata/Block Trees for storing the in-memory index to the index. Will this let us support wildcards/truncation despite over 2 billion unique terms per index?
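
To make two of the topics above concrete, here is a minimal sketch in plain Python, not HathiTrust's or Solr's actual code. The first function scores a single query term with the commonly published BM25 formula, where the parameter b controls how strongly document length is normalized (this is why BM25 is of interest for book-length documents). The second function imitates the idea behind field collapsing by ranking books according to their best-scoring page. The function names, the default values k1=1.2 and b=0.75, and the sample numbers are assumptions chosen for illustration.

```python
import math


def bm25_term_score(tf, doc_len, avg_doc_len, doc_freq, num_docs, k1=1.2, b=0.75):
    """Score one query term for one document with the common BM25 formula.

    Hypothetical helper for illustration; parameter names are assumptions.
    """
    # Inverse document frequency; the 1.0 inside the log keeps it non-negative.
    idf = math.log(1.0 + (num_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    # Length normalization: b=0 ignores document length, b=1 normalizes fully.
    norm = 1.0 - b + b * (doc_len / avg_doc_len)
    # Term frequency saturates: repeated occurrences add less and less.
    return idf * (tf * (k1 + 1.0)) / (tf + k1 * norm)


def rank_books_by_best_page(page_scores):
    """Order book ids by the score of their single best page.

    page_scores is an iterable of (book_id, score) pairs, one per page.
    This only imitates the effect of grouping results by book.
    """
    best = {}
    for book_id, score in page_scores:
        if score > best.get(book_id, float("-inf")):
            best[book_id] = score
    return sorted(best, key=best.get, reverse=True)


if __name__ == "__main__":
    # Same term frequency in a short document and in a book-length one.
    short_doc = bm25_term_score(tf=5, doc_len=100, avg_doc_len=1000,
                                doc_freq=10, num_docs=1000)
    long_doc = bm25_term_score(tf=5, doc_len=100000, avg_doc_len=1000,
                               doc_freq=10, num_docs=1000)
    print(short_doc > long_doc)
    print(rank_books_by_best_page([("a", 1.0), ("b", 3.0), ("a", 2.5), ("c", 0.5)]))
```

With b set to 0 the two documents above would score identically, which shows that the length penalty comes entirely from the normalization term. How aggressive that penalty should be for whole books versus pages is the kind of question the talk sets out to test.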

The Avalon Media System: A Next Generation Hydra Head For Audio and Video Delivery

Slide presentation

EAD without XSLT: A Practical New Approach to Web-Based Finding Aids

Slide presentation

Hacking the DPLA

Slide presentation

  • Nate Hill, Chattanooga Public Library, nathanielhill AT gmail.com
  • Sam Klein, Wikipedia, metasj AT gmail.com

The Digital Public Library of America is a growing open-source platform to support digital libraries and archives of all kinds. DPLA-alpha is available for testing, with data from six initial Hubs. New APIs and data feeds are in development, with the next release scheduled for April.

Come learn what we are doing, how to contribute or hack the DPLA roadmap, and how you (or your favorite institution) can draw from and publish through it. Larger institutions can join as a (content or service) hub, helping to aggregate and share metadata and services from across their {region, field, archive-type}. We will discuss current challenges and possibilities (UI and API suggestions wanted!), apps being built on the platform, and related digitization efforts.

DPLA has a transparent community and planning process; new participants are always welcome. Half the time will be for suggestions and discussion. Please bring proposals, problems, partnerships and possible paradoxes to discuss.

Pitfall! Working with Legacy Born Digital Materials in Special Collections

Slide presentation

ARCHITECTING ScholarSphere: How We Built a Repository App That Doesn't Feel Like Yet Another Janky Old Repository App

Slide presentation

Code4Lib 2013 Scholarship (deadline: December 14, 2012)

Oregon State University and the Digital Library Federation are sponsoring five scholarships to promote gender and cultural diversity. Each scholarship will provide up to $1,000 to cover travel costs and conference fees for one qualified attendee to attend the 2013 Code4Lib Conference, which will be held in Chicago, Illinois, from Monday, February 11 through Thursday, February 14. The Code4Lib scholarship committee will award two scholarships per category, awarding the remaining scholarship to the best remaining candidate in either category. The Code4Lib scholarship committee will award these scholarships based on merit and need.

Code4Lib 2013 Call for Host Proposals

The Code4Lib Conference Planning Group is calling for
proposals to host the 2013 Code4Lib Conference.  Information on the
kind of venue we seek and the delineation of responsibilities between
the host organization and the Planning Group can be found at the
conference hosting web page [1] and on the Code4Lib Wiki [2].

The deadline for proposals is Sunday, January 22, 2012. The
decision will be made over the course of the following weeks by a
popular vote.  Voting will begin on or around Wednesday, January 25,
2012 and will continue through the first three days of Code4Lib 2012 until
11:59 PM Pacific on Wednesday, February 8th. The results of the vote will be
announced on Thursday, February 9th, the final day of Code4Lib 2012.
You can apply by making your pitch to the Code4Lib Conference Planning
list [3]; attention to the criteria listed on the conference hosting
page is appreciated.  May the best site win!

Feel free to take a look at the winning proposal from 2012
https://sites.google.com/site/code4lib2012seattle/

and past hosting proposals from 2011 for ideas:

https://wiki.dlib.indiana.edu/display/EVENTS/Code4Lib+2011+Proposal
http://www.library.yale.edu/~dlovins/c4l/code4lib2011.html
http://sites.google.com/site/code4libvancouver2011

1. http://code4lib.org/conference/hosting
2. http://wiki.code4lib.org/index.php/How_To_Plan_A_Code4LibCon
3. code4libcon@googlegroups.com

Code4Lib 2012 Scholarship (deadline: December 9, 2011)

Oregon State University and the Digital Library Federation are sponsoring five scholarships to promote gender and cultural diversity. Each scholarship will provide up to $1,000 to cover travel costs and conference fees for one qualified attendee to attend the 2012 Code4Lib Conference, which will be held in Seattle, Washington, from Monday, February 6 through Thursday, February 9.

The Code4Lib scholarship committee will award two scholarships per category, awarding the remaining scholarship to the best remaining candidate in either category. The Code4Lib scholarship committee will award these scholarships based on merit and need.

ELIGIBILITY:
Applicants, if eligible, may apply for both scholarships, but no applicant will receive more than one scholarship. Past winners of either scholarship are not eligible for either scholarship. Scholarship recipients will be required to write a short trip report to be submitted to the scholarships committee by February 17, 2012.

CONFERENCE INFO:
For more information on the Code4Lib Conference, please see the conference website: http://code4lib.org/conference/2012/
and write-ups of previous Code4Lib Conferences:

Code4Lib 2012 Schedule

The schedule for the 2012 Code4Lib Conference in Seattle, Washington.

DIRECT LINK TO LIVE STREAM

Sign-up for Lightning Talks


Monday, February 6

Pre-Conference day overview:

08:00-09:00 - Registration / coffee
09:00-16:30 - Full Day sessions

Developing applications using REST web services

09:00-12:00 - Morning sessions

Linkfest | wiki page for notes, etc

What's New in Solr

Knocking Down Silos: Tools and Approaches for Simplifying Discovery

12:00-13:30 - Lunch (on your own)
13:30-16:30 - Afternoon sessions

Git -r done

Blacklight

DACS and EAD Overview

Digging into metadata: context, code, and collaboration

"Geo"

16:15-21:00 - Evening session

Microsoft Campus Visit

Conference Program

Tuesday, February 7

08:00-09:00 - Registration / Continental Breakfast
09:00-09:15 - Welcome
09:15-10:00 - Keynote #1 - Dan Chudnov
10:00-10:20 - Break (beverage service only)
10:20-10:40 - Beyond code: Versioning data with Git and Mercurial. (Stephanie Collett and Martin Haye) - Slides
10:40-11:00 - "Linked-Data-Ready" Software for Libraries - Slides
11:00-11:20 - Your Catalog in Linked Data - Slides
11:20-11:40 - HTML5 Microdata and Schema.org - Slides
11:40-12:00 - ALL TEH METADATAS! or How we use RDF to keep all of the digital object metadata formats thrown at us. - Slides
12:00-13:00 - Lunch (provided)
13:00-13:20 - HathiTrust Large Scale Search: Scalability meets Usability - Slides
13:20-13:40 - Relevance Ranking in the Scholarly Domain - Slides
13:40-14:00 - Kill the search button II - the handheld devices are coming
14:00-14:20 - Design for Developers - Slides
14:20-14:40 - The Golden Road (To Unlimited Devotion): Building a Socially Constructed Archive of Grateful Dead Artifacts - Slides
14:40-15:00 - Break (beverage service only)
15:00-15:50 - Breakout Sessions 1
15:50-16:10 - Break (beverage service only)
16:10-16:20 - Breakout reports
16:20-17:20 - Lightning Talks 1
17:20-17:30 - Daily wrap-up
18:00-20:00 - Newcomer dinners

Wednesday, February 8

08:00-09:00 - Continental Breakfast
09:00-09:15 - Housekeeping, Introductions
09:15-09:35 - Discovering Digital Library User Behavior with Google Analytics - Slides
09:35-09:55 - How people search the library from a single search box - Slides
09:55-10:15 - Building research applications with Mendeley - Slides
10:15-10:35 - Break (beverage service only)
10:35-10:55 - Stack View: A Library Browsing Tool - Slides
10:55-11:15 - NoSQL Bibliographic Records: Implementing a Native FRBR Datastore with Redis - Slides
11:15-12:00 - Ask Anything!
12:00-13:00 - Lunch (provided)
13:00-13:20 - Indexing big data with Tika, Solr & map-reduce - Slides
13:20-13:40 - In-browser data storage and me - Slides
13:40-14:00 - Lies, Damned Lies, and Lines of Code Per Day - Slides
14:00-14:20 - Practical Agile: What's Working for Stanford, Blacklight, and Hydra (Slides as PDF) (Slides as PowerPoint)
14:20-14:30 - Break (beverage service only)
14:30-15:40 - Breakout Sessions 2
15:40-15:50 - Break (beverage service only)
15:50-16:00 - Breakout reports
16:00-17:00 - Lightning Talks 2
17:00-17:15 - Daily wrap-up
18:00-20:00 - Social network dine-arounds

Thursday, February 9

08:00-09:00 - Continental Breakfast
09:00-09:15 - Housekeeping
09:15-10:00 - Keynote #2 - Bethany Nowviskie
10:00-10:15 - Break (beverage service only)
10:15-11:00 - Lightning Talks 3
11:00-11:20 - Your UI can make or break the application (to the user, anyway) - Slides
11:20-11:40 - Quick and Dirty Clean Usability: Rapid Prototyping with Bootstrap
11:40-12:00 - Search Engine Relevancy Tuning - A Static Rank Framework for Solr/Lucene - Slides
12:00-12:20 - Wrap-Up