Library Text Mining

Rob Sanderson

Using the TeraGrid [1] and the SRB DataGrid [2], we have sufficient
computational and storage facilities to run processing tasks that would
normally be prohibitively expensive. By integrating text and data mining
tools [3][4] within the Cheshire3 [5] information architecture, we can
parse the natural language present in 20 million MARC records (the
University of California’s MELVYL collection) and extract information to
provide to search/retrieve applications. In this talk, we’ll discuss
the results of applying new techniques to ‘old’ data.

1: http://www.teragrid.org
2: http://www.sdsc.edu/srb
3: http://www.ailab.si/orange
4: http://www-tsujii.is.s.u-tokyo.ac.jp/
5: http://www.cheshire3.org/

Rob Sanderson (azaroth@liv.ac.uk)

Attachment: code4lib.odp.zip (32.3 KB)