You know, when people get together and talk about stuff.

n Characters in Search of an Author

  • Jay Luker, IT Specialist, Smithsonian Astrophysics Data System

When it comes to author names, the disconnect between our metadata and what a user might enter into a search box makes it hard to maximize both precision and recall [0]. When indexing a paper written by "Wäterwheels, A", a goal should be to preserve as much of the original information as possible. However, users searching by author name may frequently omit the diaeresis and search simply for "Waterwheels". The reverse scenario is also possible, i.e., your decrepit metadata contains only the ASCII "Supybot, Zoia", whereas the user enters "Supybot, Zóia". If recall is your highest priority, the simple solution is to always downgrade to ASCII when indexing and querying. However, this strategy sacrifices precision, as you will be unable to provide an "exact" search, which is necessary in cases where "Hacker, J" and "Häcker, J" really are two distinct authors.
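
The trade-off described above can be pictured with a small sketch. This is not necessarily the approach the talk proposes; it only illustrates one common pattern, keeping two keys per author so that a folded key serves recall and an unfolded key serves exact search. The names `AuthorIndex`, `fold_to_ascii`, and `normalize_exact` are hypothetical, and the folding shown here only strips combining marks, so letters such as "ø" that have no decomposition would need extra handling.

```python
import unicodedata


def fold_to_ascii(name: str) -> str:
    """Decompose accented letters and drop the combining marks (illustrative only)."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def normalize_exact(name: str) -> str:
    """Keep diacritics, but unify Unicode composition and letter case."""
    return unicodedata.normalize("NFC", name).casefold()


class AuthorIndex:
    """Hypothetical in-memory index with one exact key and one folded key per author."""

    def __init__(self) -> None:
        self.exact: dict[str, set[str]] = {}
        self.folded: dict[str, set[str]] = {}

    def add(self, doc_id: str, author: str) -> None:
        self.exact.setdefault(normalize_exact(author), set()).add(doc_id)
        self.folded.setdefault(fold_to_ascii(author), set()).add(doc_id)

    def search(self, query: str, exact: bool = False) -> set[str]:
        # exact=True favors precision; the default folded lookup favors recall.
        if exact:
            return set(self.exact.get(normalize_exact(query), set()))
        return set(self.folded.get(fold_to_ascii(query), set()))


index = AuthorIndex()
index.add("paper-1", "Hacker, J")
index.add("paper-2", "Häcker, J")
index.add("paper-3", "Supybot, Zoia")

print(index.search("Hacker, J"))              # folded lookup matches both spellings
print(index.search("Häcker, J", exact=True))  # exact lookup keeps the authors apart
print(index.search("Supybot, Zóia"))          # accented query still finds ASCII metadata
```

With both keys stored, the default search behaves like the "downgrade to ASCII" strategy, while the exact mode still distinguishes "Hacker, J" from "Häcker, J".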

Practical Relevance Ranking for 10 million books

  • Tom Burton-West, University of Michigan Library

HathiTrust Full-text search indexes the full-text and metadata for over 10 million books. There are many challenges in tuning relevance ranking for a collection of this size. This talk will discuss some of the underlying issues, some of our experiments to improve relevance ranking, and our ongoing efforts to develop a principled framework for testing changes to relevance ranking.

Some of the topics covered will include:

  • Length normalization for indexing the full-text of book-length documents
  • Indexing granularity for books
  • Testing new features in Solr 4.0:
    • New ranking formulas that should work better with book-length documents: BM25 and DFR.
    • Grouping/Field Collapsing. Can we index 3 billion pages and then use Solr's field collapsing feature to rank books according to the most relevant page(s)?
    • Finite State Automata/Block Trees for storing the in-memory index to the index. Will this let us support wildcards/truncation despite over 2 billion unique terms per index?
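
To make the length-normalization topic above concrete, here is a minimal sketch of the standard BM25 score for a single query term. It is a textbook form of the formula rather than HathiTrust's configuration; the function name `bm25_term_score`, the defaults `k1=1.2` and `b=0.75` (commonly used values), and the collection statistics in the usage lines are all assumptions chosen for illustration.

```python
import math


def bm25_term_score(tf, doc_len, avg_doc_len, n_docs, doc_freq, k1=1.2, b=0.75):
    """BM25 contribution of one query term to one document's score.

    tf: occurrences of the term in the document
    doc_len / avg_doc_len: document length relative to the collection average
    n_docs / doc_freq: collection size and number of documents containing the term
    k1: term-frequency saturation; b: strength of length normalization (0 disables it)
    """
    idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    length_norm = 1 - b + b * (doc_len / avg_doc_len)
    return idf * (tf * (k1 + 1)) / (tf + k1 * length_norm)


# Made-up statistics: the same term frequency in a page-sized document
# and in a book-sized document, against a hypothetical collection.
stats = dict(avg_doc_len=100_000, n_docs=10_000_000, doc_freq=50_000)
page_score = bm25_term_score(tf=5, doc_len=300, **stats)
book_score = bm25_term_score(tf=5, doc_len=400_000, **stats)
print(page_score > book_score)  # the longer document is penalized for the same tf
```

The `b` parameter controls how strongly long documents are penalized, which is why book-length documents make the choice of ranking formula and indexing granularity matter.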

Hacking the DPLA

Slide presentation

  • Nate Hill, Chattanooga Public Library, nathanielhill AT
  • Sam Klein, Wikipedia, metasj AT

The Digital Public Library of America is a growing open-source platform to support digital libraries and archives of all kinds. DPLA-alpha is available for testing, with data from six initial Hubs. New APIs and data feeds are in development, with the next release scheduled for April.

Code4Lib 2013 Scholarship (deadline: December 14, 2012)

Oregon State University and the Digital Library Federation are sponsoring five scholarships to promote gender and cultural diversity. Each scholarship will provide up to $1,000 to cover travel costs and conference fees for one qualified attendee to attend the 2013 Code4Lib Conference, which will be held in Chicago, Illinois, from Monday, February 11 through Thursday, February 14. The Code4Lib scholarship committee will award two scholarships per category, awarding the remaining scholarship to the best remaining candidate in either category.

Code4Lib 2013 Call for Host Proposals

The Code4Lib Conference Planning Group is calling for
proposals to host the 2013 Code4Lib Conference.  Information on the
kind of venue we seek and the delineation of responsibilities between
the host organization and the Planning Group can be found at the
conference hosting web page [1] and on the Code4Lib Wiki [2].


