Consuming Big Linked Open Data in Practice: Authority Shifts and Identifier Drift

  • Kathryn Stine, UC Berkeley (first-time
  • Charlie Collett, California Digital
    Library, UC (c4l 2012 presenter)


Increasingly, authoritative datasets of interest to libraries (subjects,
names, classifications, etc.) are available in bulk, exposed as
linked open data. Unfettered access can allow libraries to aggregate,
connect, and augment data in new ways that will benefit users. This talk
will describe our exploratory experience integrating bulk data from the
Virtual International Authority File (VIAF) into HathiTrust metadata to
improve discovery and collection management.

Authoritative data is not static: datasets change with new
contributions and re-clustering, resulting in new identifier
relationships. We will describe the challenges this presents for
accessing and processing a massive, complex linked dataset and for
keeping our metadata in sync with it. We will talk about our technical
approach to navigating
an ecosystem of identifiers and mitigating cached identifier drift
between systems as authority data shifts. We aim to spark conversation
about data accessibility and the relationships between local,
consortial, and authoritative metadata as the library community moves
beyond “Hello, world” linked data examples to integrating this data at
scale into existing systems.
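
To make "cached identifier drift" concrete, here is a minimal sketch of
one way it might be detected: compare the cluster assignments a local
system cached from an earlier bulk download against those in a newer
one. This is an illustration only, not the approach presented in the
talk. The function name, the snapshot layout (source identifier mapped
to cluster identifier), and every identifier value below are
hypothetical.

```python
from typing import Dict, NamedTuple, Set, Tuple


class DriftReport(NamedTuple):
    """Differences between a cached snapshot and a newer one."""
    moved: Dict[str, Tuple[str, str]]  # source ID -> (old cluster, new cluster)
    dropped: Set[str]                  # source IDs absent from the newer snapshot
    added: Set[str]                    # source IDs that appear only in the newer snapshot


def detect_drift(cached: Dict[str, str], current: Dict[str, str]) -> DriftReport:
    """Compare two {source ID: cluster ID} mappings and report what changed."""
    # A source ID present in both snapshots but assigned to a different
    # cluster has drifted, e.g. after re-clustering.
    moved = {
        src: (old_cluster, current[src])
        for src, old_cluster in cached.items()
        if src in current and current[src] != old_cluster
    }
    dropped = set(cached) - set(current)
    added = set(current) - set(cached)
    return DriftReport(moved, dropped, added)


# Hypothetical snapshots: all identifiers here are made up for illustration.
cached = {
    "LC|n-0001": "viaf-100",
    "LC|n-0002": "viaf-200",
    "LC|n-0003": "viaf-300",
}
current = {
    "LC|n-0001": "viaf-100",  # unchanged
    "LC|n-0002": "viaf-250",  # moved to a different cluster
    "LC|n-0004": "viaf-400",  # new contribution
}

report = detect_drift(cached, current)
```

In this sketch, `report.moved` lists the locally cached links that would
need updating, while `report.dropped` flags records whose authority
link can no longer be confirmed against the newer snapshot.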