Consuming Big Linked Open Data in Practice: Authority Shifts and Identifier Drift

Kathryn Stine, katstine@berkeley.edu, UC Berkeley (first-time presenter)
Charlie Collett, charlie.collett@ucop.edu, California Digital Library, UC (c4l 2012 presenter)

Increasingly, authoritative datasets of interest to libraries (subjects, names, classifications, etc.) are available in bulk, exposed as linked open data. Unfettered access allows libraries to aggregate, connect, and augment data in new ways that benefit users. This talk will describe our exploratory experience integrating bulk data from the Virtual International Authority File (VIAF) into HathiTrust metadata to improve discovery and collection management.

Authoritative data is not static: datasets change with new contributions and re-clustering, resulting in new identifier relationships. We will describe the challenges this presents for accessing and processing a massive, complex linked dataset and for keeping our metadata in sync with it. We will talk about our technical approach to navigating an ecosystem of identifiers and mitigating cached identifier drift between systems as authority data shifts. We aim to spark conversation about data accessibility and the relationships between local, consortial, and authoritative metadata as the library community moves beyond “Hello, world” linked data examples to integrating this data at scale into existing systems.
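
To make "cached identifier drift" concrete, here is a minimal Python sketch of one way it could be detected. It is not the approach the talk will present. It assumes a hypothetical redirect table (old cluster ID to new cluster ID), such as one might derive by comparing successive bulk dumps, and a local cache mapping records to cluster IDs. All names and identifiers are invented for illustration.

```python
def resolve(cluster_id, redirects):
    """Follow old -> new redirects until reaching an ID with no further redirect."""
    seen = set()
    while cluster_id in redirects:
        if cluster_id in seen:
            # A loop in the redirect table would otherwise never terminate.
            raise ValueError(f"redirect cycle at {cluster_id}")
        seen.add(cluster_id)
        cluster_id = redirects[cluster_id]
    return cluster_id


def find_drift(cached, redirects):
    """Return {record: (cached_id, current_id)} for entries whose cached ID is stale."""
    drift = {}
    for record, cached_id in cached.items():
        current_id = resolve(cached_id, redirects)
        if current_id != cached_id:
            drift[record] = (cached_id, current_id)
    return drift


# Made-up identifiers, for illustration only.
redirects = {"100": "200", "200": "300"}  # hypothetical re-clustering history
cached = {"rec-a": "100", "rec-b": "300", "rec-c": "999"}
drift = find_drift(cached, redirects)
```

In this toy data only `rec-a` is reported, because its cached ID has been superseded twice. A real sync would also need to handle clusters that split or are deleted, which a simple one-to-one redirect table cannot express.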