Heiðrún: DPLA's Metadata Harvesting, Mapping and Enhancement System

  • Audrey Altman, audrey at dp.la, Digital Public Library of America
  • Gretchen Gueguen, gretchen at dp.la, Digital Public Library of
    America
  • Mark Breedlove, mb at dp.la, Digital Public Library of America
  • Slides and more info

The Digital Public Library of America aggregates metadata for over 8
million objects from more than 24 direct partners, or Hubs, using its
Metadata Application Profile (MAP), an RDF metadata application profile
based on the Europeana Data Model. After working with the initial system
for harvesting, mapping and enhancing our Hubs’ metadata for a year, we
realized that it was inadequate for working with data at this scale.
There were architectural issues; it was opaque to non-developer and
partner staff; there were inadequate tools for quality assurance and
analysis; and the system was unaware that it was working with RDF data.
As the network of Hubs expanded and we ingested more metadata, it became
increasingly difficult to tell when or why a harvest, a mapping task, or
an enrichment had gone wrong.

The DPLA Content and Technology teams decided to develop a new system
from the ground up to address those problems. Development of Heidrun,
the internal version of the new system, started in October 2014.
Heidrun’s goals are to make it easier for us to harvest and map metadata
from various sources and in a variety of schemas to the DPLA MAP, to
better enrich that metadata using external data sources, and to actively
involve our partners in the ingestion process through access to better
QA tools. Heidrun and its componentry are built on Ruby on Rails,
Blacklight, and ActiveTriples. Our presentation will give some
background on the design principles and processes we used during
development, the architecture of the system, and its functionality. We
plan to release a version of Heidrun and its components as a generalized
metadata aggregation system for use by DPLA Hubs and others working to
aggregate cultural heritage metadata.