Rob Sanderson

Using the TeraGrid [1] and the SRB DataGrid [2], we have sufficient computational and storage facilities to run processing tasks that would normally be prohibitively expensive. By integrating text and data mining tools [3][4] within the Cheshire3 [5] information architecture, we can parse the natural language present in 20 million MARC records (the University of California’s MELVYL collection) and extract information to provide to search/retrieve applications. In this talk, we’ll discuss the results of applying new techniques to ‘old’ data.

1: http://www.teragrid.org
2: http://www.sdsc.edu/srb
3: http://www.ailab.si/orange
4: http://www-tsujii.is.s.u-tokyo.ac.jp/
5: http://www.cheshire3.org/

Rob Sanderson (azaroth@liv.ac.uk)