
planet code4lib


Ed Summers: Culture Machines

Fri, 2017-05-12 04:00

We believe in the power of code as a set of magical symbols linking the invisible and visible, echoing our long cultural tradition of logos, or language as an underlying system of order and reason, and its power as a kind of sourcery. We believe in the elegant abstractions of cybernetics and, ultimately, the computational universe–that algorithms embody and reproduce the mathematical substrate of reality in culturally readable ways. This is what it means to say that an algorithm is a culture machine: it operates both within and beyond the reflexive barrier of effective computability, producing culture at a macro-social level at the same time as it produces cultural objects, processes, and experiences. (Finn, 2017, p. 34)

Finn, E. (2017). What algorithms want: Imagination in the age of computing. MIT Press.

FOSS4Lib Recent Releases: ArchivesSpace - 2.0.1

Fri, 2017-05-12 02:25

Last updated May 11, 2017. Created by Peter Murray on May 11, 2017.

Package: ArchivesSpace
Release Date: Tuesday, May 2, 2017

FOSS4Lib Recent Releases: ArchivesSpace - 2.0.0

Fri, 2017-05-12 02:24

Last updated May 11, 2017. Created by Peter Murray on May 11, 2017.

Package: ArchivesSpace
Release Date: Tuesday, April 18, 2017

Jonathan Rochkind: Memo on Technical Operational Considerations for IIIF in a Sufia/Hyrax app

Thu, 2017-05-11 20:56

(A rather lengthy, as is my wont, memo I wrote for internal use, which I also share with you)

IIIF (International Image Interoperability Framework) is a standard API for a server which delivers on-demand image transformations.

What sort of transformations are we interested in (and IIIF supports)?

  • Changing image formats
  • Resizing images (to produce thumbnails and other display or delivery sizes)
  • Creating tiled image sources to support high-res zoom-in without having to deliver enormous original source images. (Such an operation involves resizing too, to create tiles at different zoom levels, as well as often a format change if the original source is not in JPG or another suitable web format.)
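These transformations map directly onto the IIIF Image API URL template, which encodes each request as path segments: {identifier}/{region}/{size}/{rotation}/{quality}.{format}. A minimal sketch of assembling such request URLs (the server prefix and identifier here are hypothetical; the path template comes from the IIIF Image API spec):

```python
# Sketch: assembling IIIF Image API request URLs for the transformations
# listed above. The base URL and identifier are hypothetical.

def iiif_url(base, identifier, region="full", size="full",
             rotation="0", quality="default", fmt="jpg"):
    """Assemble an IIIF Image API URL from its path segments."""
    return f"{base}/{identifier}/{region}/{size}/{rotation}/{quality}.{fmt}"

base = "https://example.org/iiif"  # hypothetical server prefix

# A 300-pixel-wide thumbnail, delivered as JPG:
thumb = iiif_url(base, "page1", size="300,")

# One 512x512 tile cut from the top-left corner of the full-resolution image:
tile = iiif_url(base, "page1", region="0,0,512,512", size="512,")
```

A tiling viewer issues many such tile requests at different regions and scales, which is exactly the load an on-demand server has to absorb.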

@jcoyne has created Riiif, an IIIF server in ruby that uses imagemagick to do the heavy lifting, packaged as a Rails engine that can turn any Rails app into an IIIF server. In addition to the fact that we know ruby and so can tweak it if needed, this also allows it to use your existing ruby logic for looking up original source images from app ids, and for access controls. It’s unclear how you’d handle these things with an external IIIF server in a sufia/hyrax app; to my knowledge, nobody is using anything but riiif.

Keep in mind that you only need a tiled image source when the full-resolution image (or the image at the resolution you want to allow zooming to), as a JPG, is too large to deliver to the browser in its entirety (at least with reasonable performance). If that isn’t the case, you can allow pan and zoom in a browser with JS without needing a tiled image source.

And keep in mind that the primary reason you need an on-demand image transformation service (whether for tiled image sources or other transformations) is when storing all the transformations you want would take more disk space than you can afford, or is otherwise infeasible. (There are digital repositories with hundreds of thousands or millions of images, each of which needs various transformations.)

There is additionally some development/operational convenience to an on-demand transformation aside from disk space issues, but there is a trade-off in additional complexity in other areas — mainly in dealing with caching and performance.

The first step is defining what UI/UX we want for our app, before being able to decide if an on-demand image transformation server is useful in providing that. But here, we’ll skip that step, assume we’ve arrived at a point from UI/UX to wanting to consider an on-demand image transformation service, and move on to consider some operational issues with deploying RIIIF.

Server/VM separation?

riiif can conceivably be quite resource-intensive: lots of CPU spent calling out to imagemagick to transform images; lots of disk IO reading and writing images (affected by cache and access strategies, see below); and lots of app server http connections/threads tied up by clients requesting images, some of which, depending on caching strategies, can be quite slow to return.

In an ideal scenario, one wouldn’t want this running on the same server(s) that handle ordinary Rails app traffic; one would want to segregate it so it does not interfere with the main Rails app, and so each can be scaled independently.

This would require some changes to our ansible/capistrano deploy scripts, and some other infrastructure/configuration/deploy setup. The riiif server would probably still need to be deployed as the entire app, so it has access to app-located authorization and retrieval logic; but be limited to only serving riiif routes. This is all do-able, just a bunch of tweaking and configuring to set up.

This may not be necessary even if strictly ideal.

Original image access

The riiif server needs access to the original image bytestreams, so it can transform them.

In the most basic setup, the riiif server somehow has access to the file system that fedora bytestreams are stored on, and knows how to find the bytestream for a particular fedora entity on disk.

The downsides of this are that shared file systems are… icky, as is having to reverse engineer fedora’s file storage.

Alternately, riiif can be set up to request the original bytestreams from fedora via http, on demand, and cache them in the local (riiif server) file system. The downsides of this are:

  • performance — if a non-cached transformation is requested, and the original source image is also not in the local file system cache, riiif first must download it from fedora, before moving on to transform it, and only then delivering it to the client.
  • cache management. Cache management as a general rule can get surprisingly complicated. If you did not trim/purge the local ‘original image source’ file system cache at all, it would of course essentially grow to the size of the complete corpus of images (which in our case are quite large uncompressed TIFFs), rather defeating the purpose of saving file space with an on-demand image transformer in the first place (the actual transformed products are almost always in a compressed format and a fraction of the size of the original TIFFs).

    • There is no built-in routine to trim the original source file cache; the basic approach is straightforward, but the devil can be in the details.
    • To do an LRU cache, you’d need your file system to track access times. Linux file systems are not infrequently configured with ‘noatime’ for performance these days, which wouldn’t work. Alternately, you’d need to add code to riiif to track last access time by some other means.
    • When trimming, you have to be careful not to trim sources currently being processed by an imagemagick transformation.
    • Even if trimming/purging regularly, there is a danger of bursts of access filling up the cache quickly, possibly exceeding volume space (unless the volume is big enough to hold all original sources, of course). For instance, if using riiif for derivatives, one could imagine googlebot or another web spider visiting much of the corpus fairly quickly. (This is a use case we ideally want to support; the site ought to be easily spiderable.)
      • There is of course a trade-off between cache size and overall end-user responsiveness percentiles.
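To make the trimming discussion concrete, the basic approach could be sketched like this (illustrative Python, not riiif code; the directory layout and byte budget are hypothetical, and mtime stands in for access time because of the ‘noatime’ caveat above):

```python
# Sketch of an LRU-style cache trim: delete the least-recently-touched
# files until the cache directory fits under a byte budget. A production
# version would also need to skip files currently open for an imagemagick
# transformation, and would use real access times where available.
import os

def trim_cache(cache_dir, max_bytes):
    """Delete oldest files until total size is at or below max_bytes."""
    entries = []
    for name in os.listdir(cache_dir):
        path = os.path.join(cache_dir, name)
        if os.path.isfile(path):
            st = os.stat(path)
            entries.append((st.st_mtime, st.st_size, path))
    total = sum(size for _, size, _ in entries)
    # Oldest mtime first; remove until we are back under the budget.
    for _, size, path in sorted(entries):
        if total <= max_bytes:
            break
        os.remove(path)
        total -= size
    return total
```

Run periodically from cron or a background job; the devil-in-the-details parts (concurrency, in-use files, burst traffic) are exactly the caveats listed above.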

It is unclear to me how many institutions are using riiif in production, but my sense is that most or even all of them take the direct file system access approach rather than http access with a local file cache. Anyone I could find using riiif at all was taking this approach, one way or another.

Transformed product caching

Recall a main motivation for using an on-demand image transformer is not having to store every possible derivative (including tiles) on disk.

But there can be a significant delay in producing a transformation. It depends on the size and characteristics of the original image; on whether we are using local file system access or http downloading as above (and, if the latter, on whether the original is in the local cache); and on network speed, disk I/O speed, and imagemagick (cpu) speed.

  • It’s hard to predict what this latency would be, but in the worst case, with a very large source image, one could conceive of it being a few seconds. Note that’s per image, and you could pay it each time you move from page to page in a multi-page work, or even, in a pathological case, each time you pan or zoom in a pan-and-zoom viewer.

As a result, riiif tries to cache its transformation output.

It uses an ActiveSupport::Cache::Store to do so, by default the one being used by your entire Rails app as Rails.cache. It probably makes sense to give riiif a separate cache, so a large volume of riiif products isn’t pushing your ordinary app cache content out of the cache (and vice versa), so both caches can be sized appropriately, and so they can even use different cache backends.

ActiveSupport::Cache::Store supports caching in the file system, local app memory, or a Memcached instance; or you can write an adapter for any back-end store you want. But for this use case, anything but the file system probably doesn’t make sense; it would get too expensive for the large quantity of bytes involved. (One could consider something like an S3 store instead of the immediate file system; that has its own complications, but could be considered.)

So we have the same issues to consider we did with http original source cache: performance, and cache management.

  • Even when something is in the riiif image cache, it’s not going to be as fast as an ordinary web-server-served image. ActiveSupport::Cache::Store does not support streaming, so the entire product needs to be read from the cache into local app memory before a byte of it goes to the client. (One could imagine writing an ActiveSupport::Cache::Store adapter that extends the API to support streaming.)
    • How much slower? Hard to say. I’d guess in the hundreds of ms, maybe less, probably not usually more but there could be pathological edge cases.
    • Not actually sure how this compares to serving from fedora, I don’t know for sure if the serving from fedora case also needs a local memory copy before streaming to browser. I know some people work around this with nginx tricks, where the nginx server also needs access to fedora filesystem.
  • And there is still a cache management issue, similar to the cache management issues above.

Consider: Third-party CDN

Most commercial sector web apps these days use a third party (or at least external) CDN (Content Delivery Network) — certainly especially image-heavy ones.

A CDN is basically a third-party cloud-hosted HTTP cache, which additionally distributes the cache geographically to provide very fast access globally.

Using a CDN, you can effectively “cache everything”: they usually have pricing structures (in some cases free) that do not significantly limit your storage space. One could imagine putting a CDN in front of some or all of our delivered image assets (originals, derivatives, and tile sources); you could then turn off riiif’s own image caching and just count on the CDN to cache everything.

This could work out quite well, and would probably be worth considering for our image-heavy site even if we were not using an on-demand IIIF image server — a specialized CDN can serve images faster than our Rails or local web server can.

Cloudflare is a very popular CDN (significant portions of the web are cached by cloudflare) which offers a free tier that would probably do everything we need.

One downside of a CDN is that it only works for public images; access-controlled images available only to some users don’t work in a CDN. In our app, where images are either public or still ‘in process’, one could imagine pointing at cloudflare CDN-cached images for public images, but serving staff-only in-process images locally.

Another downside is that it would make tracking download counts somewhat harder, although probably not insurmountably; there are ways.

Image-specializing CDN or cloud image transformation service

In addition to general purpose CDNs, there exist a number of fairly successful cloud-hosted on-demand image transformation services that effectively function as image-specific CDNs. They basically give you what a CDN gives you (including a virtually unlimited cache, so they can cache everything) plus what an on-demand image transformation service gives you, combined.

One popular one I have used before is imgix. Imgix supports all the features an IIIF server like riiif gives you, although it does not actually support the IIIF API. Nonetheless, one could imagine using imgix instead of a local IIIF server, even with tools like JS viewers that expect IIIF, by writing a translation gateway, or by writing a plugin to (e.g.) OpenSeadragon to read from imgix. (OpenSeadragon’s IIIF support was not original; it was contributed by the hydra community.) (One could even imagine convincing imgix to support the IIIF API natively.)

imgix is not free, but its pricing is pretty reasonable: “$3 per 1,000 master images accessed each month. 8¢ per GB of CDN bandwidth for images delivered each month.” It’s difficult for me to estimate how much bandwidth we’d end up paying for (recall our derivatives will be substantially smaller than the original uncompressed TIF sources).

An image transformation CDN like imgix would almost entirely get us out of worrying about cache management (it takes care of it for us), as well as managing disk space ourselves for storing derivatives, and CPU and other resource issues. It has the same access control and analytics issues as the general CDN.

Consider the lowest-tech solution

Is it possible we can get away without an on-demand image transformation service
at all?

For derivatives (alternate formats and sizes of the whole image), we can if
we can feasibly manage the disk space to simply store them all.

For pan-and-zoom, we only need a tile source if our full-resolution images (or images at as high a resolution as we want to support zooming to in a browser) are too big to deliver to a browser.

Note that in both cases (standard derivatives or a derived tile source) the JPGs we’re delivering to the browser are significantly smaller than the uncompressed source TIFFs. In one simple experiment, a 100MB source TIF I chose from our corpus turned into a 3.8MB JPG, and that’s without focusing on making the smallest usable/indistinguishable JPG possible.

At least hypothetically, one could even pre-render and store all the sub-images necessary for a tiling pan-and-zoom viewer, without using an on-demand image transformation service.

(PS: We might consider storing our original source TIFFs losslessly compressed. I believe they are entirely uncompressed now. Lossless compression could store the images with substantially smaller footprints, losing no original data or resolution.)


We have a variety of potentially feasible paths. It’s important to remember that none of them are going to be just “install it and flip the switch”, they are all going to take some planning and consideration, and some time spent configuring, tweaking, and/or developing.

I guess the exception would be installing riiif in the most naive way possible, and incurring the technical debt of dealing with problems (performance and/or resource consumption) later when they arise. Although even this would still require some UI/UX design work.


Filed under: General

District Dispatch: Corporate champions urge all Senators to support FY18 library funding

Thu, 2017-05-11 13:59

If you’re part of or connected to the library world in any way, you know that the President’s “skinny budget” released in mid-March proposed eliminating the Institute of Museum and Library Services (IMLS), the small and respected federal agency that administers the bulk of federal library funding. It also “zeroed out” virtually all such appropriations anywhere in the federal government, including programs authorized by the Library Services and Technology Act (LSTA) and the Innovative Approaches to Literacy (IAL) program.

(AP Photo/Carolyn Kaster)

Happily, however, Congress just added $1 million to the IMLS FY 2017 budget in the omnibus spending act signed into law last week. Moreover, it specifically directs that $600,000 of that $1 million be used for LSTA-authorized purposes. It did so after one-third of the entire House of Representatives signed separate “Dear Appropriator” letters in support of slightly increased FY 2018 funding for LSTA ($186.6 million) and level funding for IAL ($27 million).  Two similar bipartisan letters are now circulating in the Senate, where both programs also historically have enjoyed the support of approximately one-third of all Senators. ALA’s Fight for Libraries! grassroots campaign for FY 2018 LSTA and IAL funding is aiming to increase that base of support to 51 Senators – a majority of all members of that chamber.

Today, that effort got an enormous boost when ALA delivered a powerful letter by eight leading national companies with collective revenue measured in billions of dollars to the offices of all Senators who have not already signed both the Senate LSTA and IAL letters urging them to do so.  As Baker & Taylor, Follett School Solutions, Gale/Cengage, OverDrive, Peachtree Publishers, Penguin Random House, ProQuest and Rosen Publishing detail in the letter, they took such action because fundamentally: “[L]ibrary funding may be among the very best yielding and most leverageable investment that Congress makes across the entire federal budget. Libraries are thus very much critical national infrastructure: ubiquitous, indispensable and economically essential.”

ALA President Julie Todaro warmly welcomed today’s endorsement of libraries’ value and the companies’ specific appeal to all Senators to sign the Dear Appropriator letters now circulating, noting:

The eight leading companies that today have urged all Senators to support critical federal library programs are spotlighting an often overlooked and critical aspect of libraries’ service to the public and the nation: libraries mean business. In addition to loaning print and electronic materials, modern libraries are job training, job search, workforce-building, entrepreneur-training, veteran-helping and business building centers at the core of almost every community in America. As these corporate leaders accurately recognize, libraries are among the highest yielding and most leverageable investments that Congress makes. Now is not the time to deprive taxpayers of that tremendous ROI by cutting federal library funding.

The companies’ letter is open to online signature by any other similarly supportive business.  Please pass the word to any and all other business owners you may know and ask them to join our corporate champions today.

Finally, corporate support is terrific, but if Senators don’t hear from you – their constituent – they are far less likely to sign the “Dear Appropriator” letters on which we need 51 Senators’ signatures by May 19.  Please, check our sortable tracker tool to see if both of your Senators have signed both the LSTA and IAL letters and, if they both haven’t signed both, contact them – or re-contact them if you’ve already reached out – TODAY.

The post Corporate champions urge all Senators to support FY18 library funding appeared first on District Dispatch.

Open Knowledge Foundation: A summary of the last 8 months at Open Knowledge Brazil

Thu, 2017-05-11 09:21

This blog post is part of our on-going Network series featuring updates from chapters across the Open Knowledge Network and was written by the Open Knowledge Brazil (OKBR) team.  

This post was written by Ariel Kogan, CEO of Open Knowledge Brazil and first appeared on OKBR

Open Knowledge Brazil (OKBR) seeks to build a more open, transparent, fair and collaborative society. At the same time, we seek to build an organisational model that is in dialogue with, and coherent with, that mission and the challenges of the 21st century.

And this is the direction we are going. Eight months ago, with this purpose, I accepted the challenge to become the organisation’s CEO. In collaboration with advisors and the OKBR community, we established partnerships, participated in projects and developed others. It has been a very productive period! And, for this reason, we would like to share an overview.

Firstly, we have put together an essential document: the planning for the organisation until 2018.

The priority initiatives will be:

  1. Our participation in the national and sub-national process in #OGP (Open Government Partnership);
  2. Open expenses project: more than 180 leaders from different regions of Brazil are playing a game to open their cities’ budget data. We have also developed the first Open Budget course in partnership with ITS Rio;
  3. In partnership with FGV-DAPP, we are building the Brazilian Open Data Index at the federal and local level (Rio de Janeiro and São Paulo). We are going to post the results in April 2017;
  4. We are designing and building a lean and intelligent structure, which manages to give the necessary support to projects and community.

With the support of Fundación Avina, we travelled to Estonia to learn about two of the main current cases of digital government. We shared this experience in the following articles: “Estonia: a digital democracy” and “The country that votes through the internet“.


We are building new strategic partnerships for the OKBR mission. We have already agreed with the Social Observatory, Brazil Transparency and we are about to do the same with the Planning Ministry and the Ministry of Transparency of Brazilian Federal Government, Superior Electoral Court (TSE), Alagoas State Government and São Paulo City Hall.

We became partners, advisors and we are helping in the planning and fundraising for Operação Serenata de Amor project, one of the most interesting initiatives that are taking place nowadays in Brazil. It mixes artificial intelligence, open data, fights against corruption and engages an excellent team of young people who are passionate about what they do.

We have also built (in collaboration) and published a Manifesto for Brazilian Digital Identification. This is a very important initiative that seeks to protect and preserve citizens’ personal data in order to create a more digital society.

Furthermore, in partnership with Fluxo and AppCivico, we are building a microfinance tool for independent journalism. Soon, we are going to share more news about this project through the OKBR social networks.


In December 2016, we launched the webinar series “OKBR Webinar Series: open knowledge & information worldwide” with open knowledge and open government specialists. The first of them took place on December 6th, with entrepreneur Bart van Leewen, about “Linked data and emergencies prevention”.

In January, the webinar guest was Paola Villarreal, Fellow at the Berkman Klein Center and programmer/data scientist, who spoke about Justice Data; in February, Fernanda Campagnucci, journalist and public policy analyst, spoke about “Open data and local governments: how to assist people’s data demands.”

The next webinar will take place on May 5th at 1 pm, with Rufus Pollock, an economist and the founder of Open Knowledge International. It will be about “Civil society’s role in open data and knowledge.”

LITA: Jobs in Information Technology: May 10, 2017

Wed, 2017-05-10 20:18

New vacancy listings are posted weekly on Wednesday at approximately 12 noon Central Time. They appear under New This Week and under the appropriate regional listing. Postings remain on the LITA Job Site for a minimum of four weeks.

New This Week

San Jose State University, Data Services Librarian, San Jose, CA

Visit the LITA Job Site for more available jobs and for information on submitting a job posting.

Islandora: Islandoracon Next Week!

Wed, 2017-05-10 17:11
The second Islandoracon is coming up next week! We have a full schedule running from Monday’s Hack/Doc, through the Tuesday & Wednesday general conference and Thursday workshops, to Friday’s post-conference meetings and workshops. Here are a few bits of information to help you get the most out of the event:

Day Rates

Know anyone who wants to come to Islandoracon but can only make it for a day or two? Single-day registration is open.

Travel & Accommodations

We have a list of some local hotels near the conference venue, recommendations for nearby airports to consider, and ways to get from the airport to Hamilton. Finally, we have a map with conference and travel locations.

Ride Sharing

If you will have a car at Islandoracon and are willing to give your fellow attendees a ride to and from the conference and other events, please consider adding your name to our Ride Share Sign Up Sheet. You can indicate which hotel you're at and any other limitations in the notes. This is also the place to look if you don't have a ride and need one.

Post Conference Workshops

May 19th is the "Post-Con", with longer workshops and working group meetings. Everyone is welcome and there are no additional fees, but space may be limited, so please remember to sign up for the workshops you want. If we run out of room, preference will be given to those who have signed up. Sign up here.

Questions? Concerns? Please drop me a line.

District Dispatch: “The Durationator” webinar archived

Wed, 2017-05-10 14:59

An archived copy of the CopyTalk webinar “Code + Copyright: Creating the Durationator as the 21st Century Helpdesk for Libraries, Archives and Museums” is now available. Originally webcast on May 4th by the Office for Information Technology Policy’s Copyright Education Subcommittee, our presenters were Elizabeth Townsend Gard, Associate Professor of Law and the Jill H. and Avram A. Glazer Professor of Social Entrepreneurship at Tulane University and Ron Gard, CEO of Limited Times, the parent company of the Durationator. Learn about the Durationator—a tool, helpdesk and resource for solving copyright questions for libraries, archives, museums, artists, content owners, and everyone else!

Plan ahead! One hour CopyTalk webinars occur on the first Thursday of every month at 11am Pacific/2 pm Eastern Time. Free!

On June 1st, our webinar will focus on copyright and music. Don’t miss it!

The post “The Durationator” webinar archived appeared first on District Dispatch.

District Dispatch: #NLLD17 roundup

Wed, 2017-05-10 14:54

Last week, we were thrilled to welcome over 530 librarians to Washington, D.C. for National Library Legislative Day (NLLD) 2017. With representatives from every state, and over 1,000 additional library supporters who committed to join in the advocacy efforts from home, the energy this year was higher than ever.

In the wake of the President’s budget proposal, which included the elimination of IMLS and most federal library funding, the stakes were also high. After hearing a fantastic keynote delivered by Hina Shamsi of the ACLU, attendees were given an appropriations update by Washington Office staff. You can catch both the archived keynote and the issue briefing on YouTube, and find out how you can get involved in library advocacy by visiting Fight for Libraries.

The VA delegation speaks with a staffer at Rep. Bobby Scott’s office. Photo credit: Andrew Propp


This year, advocacy training was provided by the team from the Campaign Workshop. They discussed research, gave advice for telling your story to legislators, and provided plenty of tips for conducting a successful meeting with Members of Congress. The slides from the presentation are available here.

Want to hear more? Check out these articles about the event:

You can also catch up on all the social media coverage of #NLLD17 on Storify!

Looking ahead to next year? National Library Legislative Day 2018 will be held on May 7-8th. But don’t wait till 2018 to get involved with library advocacy – head to Fight for Libraries for more information or head straight to the Action Center for talking points and pre-written scripts.

The post #NLLD17 roundup appeared first on District Dispatch.

DPLA: New Exhibition: American Empire

Wed, 2017-05-10 14:45

We are pleased to announce the publication of our newest exhibition, American Empire, curated by DPLA intern and Brown University Public Humanities MA candidate Andrea Ledesma. American Empire offers a dynamic exploration of the Age of American Imperialism between the Spanish-American War and the mid-twentieth century. Through the examination of the culture, administration, and development of American colonialism in Puerto Rico, the Philippines, and Hawaii, the exhibition explores the rise and fall of the American empire and its legacies.

American Empire draws on diverse materials from across DPLA’s partner collections, including propaganda posters, administrative documents, photographs documenting life in the empire, and video clips. Using this rich trove of materials, the exhibition investigates critical questions about the Age of American Imperialism and its place in the nation’s ongoing conversations about heritage, citizenship, racism, and globalization.

  • What are the origins of American imperialistic foreign policy?
  • How did American colonizers and territorial residents navigate the boundaries between citizen and subject?
  • In what ways did indigenous residents wage cultural, political, and military resistance against American colonization?
  • How did Americans at home both embrace and reject the empire and imperialistic foreign policy?
  • How and why did the American empire fall?

Explore the answers to these questions and more in the exhibition.

View the Exhibition

American Empire was curated using materials contributed by institutions across our partner network. In particular, we would like to thank Recollection Wisconsin, California Digital Library, Illinois Digital Heritage Hub and The New York Public Library for their assistance in creating this exhibition.  

Mark E. Phillips: How do metadata records change over time?

Wed, 2017-05-10 14:00

Since September 2009 the UNT Libraries has been versioning the metadata edits that happen in the digital library system that powers The Portal to Texas History, the UNT Digital Library, and the Gateway to Oklahoma History.  In those eight years the collection has grown from a modest size of 66,000 digital objects to the 1,814,000 digital objects that we manage today.  We’ve always tried to think of the metadata in our digital library as a constantly changing dataset.  Just how much it changes we don’t always pay attention to.

In 2014 a group of us worked on a few papers about metadata change at a fairly high level in the repository. How Descriptive Metadata Changes in the UNT Libraries Collections: A Case Study reported on the analysis of almost 700,000 records that were in the repository at that time. Another study, Exploration of Metadata Change in a Digital Repository, was presented the following year, in 2015, by some colleagues in the UNT College of Information, using a smaller sample of records to answer a few more questions about what changes in descriptive metadata at the UNT Libraries.

It has been a few years since these studies so it is time again to take a look at our metadata and do a little analysis to see if anything pops out.

Metadata Edit Dataset

The dataset we are using for this analysis was generated on May 4th, 2017 by copying all of the metadata records and their versions to a local filesystem for further analysis. The complete dataset covers 1,811,640 metadata records.

Of those 1,811,640 metadata records, 683,933 had been edited at least once since they were loaded into the repository: 62% of the records have just one instance (no changes) in the system, and the other 38% have at least one edit.

Records Edited in Dataset

We store all of our metadata on the filesystem as XML files using a local metadata format we call UNTL.  When a record is edited, the old version of the record is renamed with a version number and the new version of the record takes its place as the current version of a record. This has worked pretty well over the years for us and allows us to view previous versions of metadata records through a metadata history screen in our metadata system.

UNT Metadata History Interface

This metadata history view is helpful for tracking down strange things that happen in metadata systems from time to time.  Because some records are edited multiple times (like in the example screenshot above) we end up with a large number of metadata edits that we can look at over time.

After staging all of the metadata records on a local machine, I wrote a script that would compare two different records and output which elements in the record changed. While this sounds like a pretty straightforward thing to do, there are some fiddly bits to watch out for that I will probably cover in a separate blog post. Most of these have to do with XML as a serialization format and questions about how you interpret different things.  As a quick example, think about these three notations.

<title></title> <title /> <title qualifier='officialtitle'></title>

When comparing fields, should those three examples all mean the same thing as far as a metadata record is concerned?  Like I said, that is something to get into in a later post.
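To make the comparison concrete, here is a minimal, hypothetical sketch of such an element-by-element comparison. This is not the actual UNT script; it assumes one simple normalization rule (empty elements are treated as absent), which is exactly the kind of interpretive choice discussed above.

```python
import xml.etree.ElementTree as ET

def changed_elements(old_xml, new_xml):
    """Return the names of top-level elements whose content differs
    between two versions of a record."""
    def field_values(xml_string):
        values = {}
        for child in ET.fromstring(xml_string):
            tag = child.tag.split('}')[-1]  # drop any namespace prefix
            text = (child.text or '').strip()
            # Treat empty elements (<title></title>, <title />) as absent.
            if text:
                values.setdefault(tag, []).append(
                    (text, tuple(sorted(child.attrib.items()))))
        return values

    old, new = field_values(old_xml), field_values(new_xml)
    return {tag for tag in set(old) | set(new) if old.get(tag) != new.get(tag)}

# Under this rule, the empty-title variations compare as equal:
r1 = "<metadata><title></title><subject>Texas</subject></metadata>"
r2 = "<metadata><title /><subject>History</subject></metadata>"
print(changed_elements(r1, r2))  # {'subject'}
```

A different normalization choice (say, treating a qualified empty element as distinct from a missing one) would produce different change counts, which is why these fiddly bits matter.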

Once I had my script to compare two records, the next step was to create pairs of records to compare and then iterate over all of those record pairs.  This resulted in 1,332,936 edit events that I could look at.  I created a JSON document for each of these edit events and then loaded this document into Solr for some later analysis.  Here is what one of these records looks like.

{
  "change_citation": 0,
  "change_collection": 0,
  "change_contributor": 1,
  "change_coverage": 0,
  "change_creator": 0,
  "change_date": 0,
  "change_degree": 0,
  "change_description": 0,
  "change_format": 0,
  "change_identifier": 1,
  "change_institution": 0,
  "change_meta": 1,
  "change_note": 0,
  "change_primarySource": 0,
  "change_publisher": 0,
  "change_relation": 0,
  "change_resourceType": 0,
  "change_rights": 0,
  "change_source": 0,
  "change_subject": 0,
  "change_title": 0,
  "collections": ["NACA", "TRAIL"],
  "completeness_change": 0,
  "content_length_change": 12,
  "creation_to_edit_seconds": 123564535,
  "edit_number": 1,
  "elements_changed": 3,
  "id": "metadc58589_2015-10-16T11:02:09Z",
  "institution": ["UNTGD"],
  "metadata_creation_date": "2011-11-16T07:33:14Z",
  "metadata_edit_date": "2015-10-16T11:02:09Z",
  "metadata_editor": "htarver",
  "r1_ark": "ark:/67531/metadc58589",
  "r1_completeness": 0.9830508474576272,
  "r1_content_length": 2108,
  "r1_record_length": 2351,
  "r2_ark": "ark:/67531/metadc58589",
  "r2_completeness": 0.9830508474576272,
  "r2_content_length": 2120,
  "r2_record_length": 2543,
  "record_length_change": 192,
  "systems": "DC"
}

Some of the fields don’t mean much now, but the main ones we want to look at are the change_* fields.  These represent the 21 metadata elements that we use here for the UNTL metadata format.  Here they are in a more compact view.

  • title
  • creator
  • contributor
  • publisher
  • date
  • description
  • subject
  • primarySource
  • coverage
  • source
  • citation
  • relation
  • collection
  • institution
  • rights
  • resourceType
  • format
  • identifier
  • degree
  • note
  • meta

You may notice that these elements include the 15 Dublin Core elements plus six other fields that we’ve found useful to have in our element set.

The first thing I wanted to answer was which of these 21 fields was edited the most in the 1.3 million records edits that we have.

Metadata Element Changes

You can see that the meta field changes in almost 100% of the edits.  That is because whenever you edit a record, the values for the most recent metadata editor and the edit time are updated, so this element changes with each edit.

I have to admit that I was surprised that the description field was the most edited field in the metadata edits.  There were 403,713 (30%) of the edits that had the description field change in some way. This is followed by title at 304,396 (23%) and subject at 272,703 (20%).

There are a number of other things that I will be doing with this dataset as I move forward. In addition to which fields changed, I should be able to look at how many fields change on average per edit.  I then want to see if there are any noticeable differences when you look at different subsets, such as specific editors or collections.

So if you are interested in metadata change stay tuned.

If you have questions or comments about this post,  please let me know via Twitter.

Open Knowledge Foundation: Open Data Day in Burkina Faso: What is the environmental impact of the extractive industry?

Wed, 2017-05-10 13:00

This blog is part of the event report series on International Open Data Day 2017. On Saturday 4 March, groups from around the world organised over 300 events to celebrate, promote and spread the use of open data. 44 events received additional support through the Open Knowledge International mini-grants scheme, funded by SPARC, the Open Contracting Program of Hivos, Article 19, Hewlett Foundation and the UK Foreign & Commonwealth Office. This event was supported through the mini-grants scheme under the Environment theme.

This blog has been translated from the French original at

Open Data Day was celebrated through more than 300 events in many places around the world: in Burkina Faso, the community celebrated on 11 March. This year, the open data community of Burkina Faso is strongly advocating for the opening of data about the environment and mining, and in this spirit, the topic of the day was “Open Data: environmental impact of the extractive industry”.

In Burkina Faso, the initiative attracted many people in Ouagadougou, all of them coming with different backgrounds and willing to learn, exchange and share experiences around extractive data.

Idriss T. Tinto, a member of Open Burkina, stated that ‘Open Data Day was initiated by Open Knowledge International. The current Minister of ICT, Hadja Fatimata Ouattara / Sanon, was present at the very first edition of this event in 2013. This year, Open Burkina opted to work specifically on data related to the environment.”

The objective is to lead the development of an action plan to collect data and then make it accessible and available to everyone. The goal is also to collect data from the field in order to compare it with the data used in studies that have already been carried out.

Malick Lingani, co-organizer of the Open Data Day in Burkina Faso, explained that the day aimed to identify data on the environmental impact of both industrial and artisanal mines in Burkina Faso that can be made publicly available and published as open data.

Group photo before the start of the work

An action plan will be established today as a deliverable on this objective. The Open Data community in Burkina is also working on the development of the Open Government Partnership (OGP) Action Plan in Burkina Faso. “The Open Data movement in Burkina is not just about opening up new data. The initiative allows us to work on both the re-use of data and their social impact,” said the co-founder of the BEOG-NEERE Association.

According to the head of the Open Data Burkina Initiative (BODI), Abdoul Malick Tapsoba, the mining industry in Burkina Faso has a direct impact on the environment, “but what kind of data can prove it?” he wonders. He adds that one of the main aims, as a first step, would be to identify this data, map the actors, and elaborate on what can be done with it. He argues that this initiative is in line with the objectives of the Open Government Partnership, which aims, among other priorities, at improving transparency in the management of public affairs.

As a reminder, Burkina Faso joined the Open Government Partnership on December 7, 2016, which was a great step for the open data community. The Minister in charge of reforming the Administration was appointed to lead the OGP in Burkina.

It is worth noting that the open data community of Burkina Faso is composed of several structures including Open Burkina, Open Knowledge International, Burkina Open Data Initiative (BODI), Ouaga Lab, the BEOG-NEERE Association and the Geek Developer Network (GDN). Members of this community often come together to share experiences, develop applications, release and share data, create visualisations and publish analyses using open data.


Open Knowledge Foundation: 8 stereotypes about migrants that we broke with data at #ddjcamp

Wed, 2017-05-10 09:15

They often say data journalism is a more objective kind of journalism. Is it really? We wanted to try it out and use data to combat prejudices that exist about migration. For this, we organised #ddjcamp: a data journalism training + hackathon where 60 journalists, developers and designers gathered from 11 countries.

Everybody pitched a story or an angle they would like to focus on when reporting on migration, and this is how the teams were formed. We built the training schedule to provide the necessary skills and resources for participants to conduct a data-driven story about migration.

During the camp, the stories evolved in parallel with the skill-building process: from finding the data to communicating it in a journalism piece. The balance between training sessions and hands-on work on projects ensured that the newly acquired knowledge was applied to practice straight away and could be replicated in the future.

We also agreed with national European media houses that they would send a young journalist from their newsroom to the training and in return, will publish the story of his/her team. Let us tell you what came out of it.

Stereotype one: ‘Refugees flee in search of a better life’

Reality: The team that worked on this stereotype compared the data on IDPs (internally displaced people) and refugees (those who flee the country) with the data on armed conflicts and terrorist attacks in the cities to find a correlation. In their investigation, they drew attention to the unreported case of Yemen, where the escalation of the conflict has created horrible conditions for the civilian population, but it is nearly impossible to flee the country. Read why this is so in the story published by Texty, Ukraine (EN).

A visualisation sketch for the Texty article by trainer Gianna Gruen

Stereotype two: Just build the borders properly and the refugee crisis will stop

Reality: This team came to the conclusion that despair and war push people to leave their countries of origin. Thus, increasing the security and decreasing the rescue teams only means that more people will die on their way to Europe. Read the story published in Spiegel Online, Germany (EN).

Stereotype three: Refugees are scary

Reality: In Latvia, people are not afraid of migrants from the former Soviet Union. Those outnumber the migrants from the Middle East and Africa. In fact, people are just afraid of the unknown. To read more about this, check out the story published in Delfi, Latvia (LV).

Stereotype four: There are too many refugees for our country

Reality: Refugees who come to Montenegro, actually do not stay there and do not apply for the asylum. Read the full story published in Vijesti, Montenegro (ME).

Stereotype five: Refugees commit more crimes

Reality: Not only do they not commit more crimes, but there is a huge spike in crimes against refugees (established thanks to Die Zeit data)! Read the full story published by Dennik N, Slovakia (SK).

Participants of #ddjcamp visiting data newsroom at Die Zeit with Sascha Venohr

Stereotype six: Migrants are stealing our jobs

Reality: In Italy, it is easier for migrants to get a low-skilled job than a high-skilled job, regardless of the level of their education. This means migrants are taking the jobs that locals do not want. Read the story published in L’Espresso, Italy (IT).

Stereotype seven: Migrants are ‘kebab technicians’

Reality: In Denmark, migrants take more and more high-skilled positions thanks to their integration into the education system. Read why in the story published by Mandag Morgen, Denmark (DK).

Stereotype eight: Migrants enjoy nice life and social benefits

Reality: In Armenia, 90% of those who ask for asylum get rejected. This is more than in neighbouring Azerbaijan and Georgia. Read why in the story published by HETQ, Armenia (EN).

Participants present their projects to each other halfway through the training

Our takeaways for you:

We shared the detailed schedule, training materials, and a blog post on our website.

If you are working in media: bring innovation into your newsroom through hackathons and training. Engage in building the external community. This way, you can harness the power of talented people from different backgrounds and may find future employees.

If you are a journalist: get inspired by data as we did. There are plenty of resources online to work on your skills, but the best thing is to find a team and engage in a real-life project. We wanted #ddjcamp to be a safe space where people can try things out and work together.

If you are an NGO: support projects like #ddjcamp – they enhance cross-disciplinary work and create synergies. We would be interested in scaling up the model of #ddjcamp. If you have ideas, please contact us via anastasiya.valeeva(at)  

#ddjcamp was a data journalism training that took place in Berlin from 12 to 20 November 2016. The project was organized by European Youth Press – a network of young media makers and run by Nika Aleksejeva and Anastasia Valeeva. The core funding was Erasmus+ grant provided by the German National Agency “Jugend in Aktion”.

Library of Congress: The Signal: Automating Digital Archival Processing at Johns Hopkins University

Tue, 2017-05-09 19:27

This is a guest post from Elizabeth England, National Digital Stewardship Resident, and Eric Hanson, Digital Content Metadata Specialist, at Johns Hopkins University. 

Elizabeth: In my National Digital Stewardship Residency at Johns Hopkins University’s Sheridan Libraries, I am responsible for a digital preservation project addressing a large backlog (about 50 terabytes) of photographs documenting the university’s born-digital visual history. Since 2004, the campus photography unit, Homewood Photography, has used an exclusively digital workflow and the photographs have been stored on optical media. Before I arrived, the Ferdinand Hamburger University Archives had already taken physical custody of thousands of these DVDs, but needed someone who could be devoted to processing this large-scale collection.

I’ve relied heavily on the electronic records accessioning workflow written by my mentor, the Libraries’ Digital Archivist Lora Davis, and worked with her to adapt the workflow for the Homewood Photography collection. It’s the first digital collection of this size for the University Archives, so some processing steps that work for smaller digital collections don’t work as well in this scenario. For example, typically, disk images are created to capture content from physical carriers such as hard drives or DVDs in order to preserve all the files and how they were structured on the carrier. However, in this collection, many large jobs, such as university-wide graduation ceremonies, have content split across multiple DVDs. I needed to reunite split content to restore original order, hence the decision to work with the photographs at the file level and not as disk images. On the DVDs, the photographs are saved as both .NEF (Nikon’s proprietary camera raw file format) and .JPEG. When transferring the photographs off DVDs, I’ve been keeping just the raw files since .NEF is a lossless format and is preferable for creating preservation files over the lossy .JPEG derivatives.

All this to say, this collection is being processed at a more granular level than may be expected given its size. From the beginning, I knew that using scripts to manage bulk actions would be a huge time-saver, but as someone with essentially zero scripting experience, I didn’t have a good sense of what could be automated. While reviewing the workflow with Lora, I showed her how time consuming it was going to be to manually move the .NEF files so that they were nested directly below the appropriate descriptive job titles. She recommended I look into using a script to collapse the directory structures, and although I found some scripts that accomplish this, none could manage the variety of disc directory structures.

Two examples of disc directory structures within this collection.

I described the situation to Eric Hanson, the Digital Content Metadata Specialist here at the Johns Hopkins Sheridan Libraries, knowing that he had experience with Python and might be able to help.

Eric: At the time, I had been using Python for a few months to interact with the REST API for our archival data management system, ArchivesSpace, and to retrieve uniform resource identifiers (URI) for name and subject heading for potential use in linked data applications. Up until then, I had not used Python for manipulating files on a hard drive, aside from creating a few text files, but I assumed that there were Python libraries that could handle this type of task. I soon came across the “os” module and the “os.walk” function for mapping the directory structure and the “shutil” module and the “shutil.move” function for actually moving files. Both modules are part of the built-in Python Standard Library.

I worked with Elizabeth to examine the directory structures and get a sense of the variations that we would need to act upon. Given the inconsistent depth of the files, the script was written so that any file existing below the eighth level in our directory structure (i.e. /media/bitCurator/RAID/HomewoodPhoto/IncomingTransfer/BatchNumber/Disc/JobTitle/…) would be moved to the eighth level, so that the files would be directly under the folder with the descriptive job title.

We added a time-stamped log creation function to the script so that we would have a record of all of the changes that were made. We did several test runs where the log was written with the changes that would be made, but I disabled the part of the script that would actually move the files. After we were satisfied with the results in the test log, I fully enabled the script and Elizabeth put it use. The final version of the script can be found on Github.
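As a much-simplified illustration of the approach (not the actual script on Github, which also handles the variable disc-level nesting; the function name and log format here are hypothetical), the core loop walks each job directory with "os.walk", logs each planned move, and uses "shutil.move" to pull files up under the job title:

```python
import os
import shutil

def collapse_into_job_dirs(root, log_path):
    """Move files nested below each job-title directory (directly under
    root) up into the job directory itself, logging every move."""
    with open(log_path, 'w') as log:
        for job in os.listdir(root):
            job_dir = os.path.join(root, job)
            if not os.path.isdir(job_dir):
                continue
            for dirpath, _dirnames, filenames in os.walk(job_dir):
                if dirpath == job_dir:
                    continue  # files already at the right level stay put
                for name in filenames:
                    src = os.path.join(dirpath, name)
                    dst = os.path.join(job_dir, name)
                    log.write('%s -> %s\n' % (src, dst))
                    shutil.move(src, dst)
```

Writing the log line before each move means that even if the script dies partway through, the log records exactly which files were touched.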

Elizabeth: The success of the collapse directories script helped me realize other processing tasks that could potentially be automated. Considering the storage implications of this collection, it was decided early on that sampling as a selection strategy was necessary, so I researched archival sampling to determine a standard for retention and how to go about conducting the sampling. Most resources I consulted recommended using a random number table; however, this would be too time-consuming to implement for thousands of jobs. After discussing with my other mentor, the Hodson Curator of the University Archives Jordon Steele, we decided that 10% would be a sufficient standard of retention, which would be accomplished by systematically keeping every 10th image from the jobs. I decided to start the sampling with the 2nd image from each job (then 12th, 22nd, and so on) because the 1st image was often of a color checker, an important tool for photographers, but of low archival value. While this systematic sampling may not be perfect, it ensures that what’s retained captures the beginning, middle, and end of each job.

A color checker from this collection, often the 1st image in a job.

Eric: The second Python script I created for Elizabeth came together more quickly since I was now familiar with the “os” and “shutil” modules. The initial version of the script looped through every job directory, skipped the first file, then selected every 10th file of the job (rounding up), and moved those files into a separate “sampled” directory that kept the descriptive job title. We again used a time-stamped log to see what changes would be made before fully enabling the script and moving the files from “processing” to “sampled.”

10% of the files within each job are retained, and automatically moved from the Processing Directory into the Sampled Directory.

Elizabeth: I implemented the sampling script without issue the first time around, but hadn’t fully considered its future use. Because of the size of the collection and limited short-term processing storage space, I’ve been processing the collection a few terabytes at a time. I may have content in multiple parts of the processing pipeline simultaneously, as was the case the second time I went to use the sampling script. Luckily, I identified the issue before implementing the script: if the “unsampled” 90% stayed in the processing directory after running the sampling script, I couldn’t move any new content into the processing directory, because I’d be mixing to-be-sampled jobs with already-sampled jobs in the same storage space. I realized that each time I was enacting something on the content, I wanted to move the content into a new directory in the pipeline, which was an automation Eric was able to add to the sampling script.

Eric: When Elizabeth first described the unsampled issue to me, I figured that I would just add a section at the end of the script that moves all of the unsampled files after the sampling was completed. After talking it over with Elizabeth, we realized that this approach could cause problems if the sampling script failed for any reason, as we would have to manually find and move any jobs that had been sampled before the script failed. With that in mind, I actually found it was easier and more efficient to move the unsampled files in the same loop that was moving the sampled files, which would leave the source directory empty after the script had run through it. The final version of this script is also available on Github.
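In outline, that combined loop looks something like the following sketch (a hypothetical simplification of the script linked above; the function name and directory layout are illustrative):

```python
import os
import shutil

def sample_job(job_dir, sampled_root, unsampled_root):
    """Keep every 10th file starting with the 2nd (the 2nd, 12th,
    22nd, ...), moving sampled and unsampled files out in the same
    pass so the source job directory is left empty."""
    job = os.path.basename(job_dir)
    sampled_dir = os.path.join(sampled_root, job)
    unsampled_dir = os.path.join(unsampled_root, job)
    os.makedirs(sampled_dir, exist_ok=True)
    os.makedirs(unsampled_dir, exist_ok=True)
    for i, name in enumerate(sorted(os.listdir(job_dir))):
        # 0-based index 1, 11, 21, ... is the 2nd, 12th, 22nd file.
        dest = sampled_dir if i % 10 == 1 else unsampled_dir
        shutil.move(os.path.join(job_dir, name), os.path.join(dest, name))
```

Because every file leaves the source directory in a single pass, an empty job folder doubles as confirmation that the job has been fully sampled.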

Elizabeth: The final step I needed to accomplish before normalizing the photographs to .DNG, our chosen preservation file format for the collection, was to rename the job folders. This step was very important because the folder-level job names are essentially the only pre-existing description about the jobs. The names typically followed a date_client_event format, with underscores between each word, such as 20081211_mechanical_engineering_faculty_portraits. I wanted to simplify and standardize the names, so that they would read more like 20081211_mechEng_facultyPortraits. I knew OpenRefine was a good option for cleaning up the thousands of names, but hadn’t worked with it before.

Eric: Prior to using Python for most of my work, I worked extensively with OpenRefine, which introduced me to a number of programming and automation concepts that I carried over into Python. I agreed with Elizabeth that OpenRefine was an excellent option for the type of clean-up she needed to do. I wrote a script that created a .CSV file of all of the job names, and Elizabeth created an OpenRefine project using that .CSV file. Given that the job names were a source of metadata elements, such as dates and the job clients and events, I worked with Elizabeth to establish naming conventions, in order to simplify the extraction of these elements. I showed her some basic functions in OpenRefine’s native language, GREL, so that she could take charge of the clean-up. After the clean-up was completed, Elizabeth exported a .CSV file containing both the original job names and the corrected names. I created a simple find-and-replace script that used the “os.rename” function to change names based on the .CSV file, available here.
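A rough sketch of that find-and-replace renaming (again a hypothetical simplification of the linked script; a two-column CSV of original name, corrected name is assumed):

```python
import csv
import os

def rename_from_csv(csv_path, parent_dir):
    """Rename job folders under parent_dir using a two-column CSV of
    original name, corrected name (one pair per row)."""
    with open(csv_path, newline='') as f:
        for old_name, new_name in csv.reader(f):
            src = os.path.join(parent_dir, old_name)
            # Skip rows where nothing changed or the folder is missing.
            if old_name != new_name and os.path.isdir(src):
                os.rename(src, os.path.join(parent_dir, new_name))
```

Keeping both the original and corrected names in the exported CSV means the same file serves as the rename instruction and the audit trail.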

The content is moved from optical discs and onto our Forensic Recovery of Evidence Device (FRED), where it then undergoes a number of bulk-scripting actions before being converted to the preservation file format, bagged, and then transferred to long-term storage.

Elizabeth: I started with 1,500 DVDs for the first processing iteration. While I had outlined the processing workflow in advance, some details weren’t fully figured out until I was actually in the trenches. It took extra time to figure out how to automate processes, but devoting that time up front has already proven worthwhile for many reasons. First, the workflow for these 1,500 DVDs took 2 months to complete, while the second iteration of 1,400 DVDs was accomplished in just 2 weeks. Automating saved so much time that I’m now very ahead of schedule with the project!

Second, automating processes means there’s less room for human error. After my close call with almost running the sampling script on content a second time, I realized scripts can protect against errors through building in simple actions, such as moving content into new directories or prompting the user to enter the name of the directory on which the script should run.

Third, the logs that are generated by these scripts are useful not just when testing. Ultimately the logs generated when running these scripts in production will form part of the accession record for the collection, and document actions taken on the content. Since the jobs were renamed, tracing these changes is important for archival chain of custody.

While these scripts were written for this very specific use case, they have potential future use for my residency project. When I’m done with the DVDs, I will begin developing workflows for transferring content from other sources and physical media, such as athletics-related photos which will be transferred directly from Athletic Communications via external hard drives. This new carrier format will introduce new naming conventions and systems of organization.

Eric’s role in the greater landscape of my project is to assist with metadata clean-up (much of which is still forthcoming), and I couldn’t have predicted how extensive this collaboration would become back when Lora suggested I look into a script to collapse directory structures. One of the biggest takeaways for me has been to reach out to colleagues in other departments, ask for help, and you both might learn a new thing or two. Our collaboration has been successful not just in producing these scripts to automate processing. When we began this process in January, I was rather intimidated by Python. I still have a ways to go with learning Python, but I’m now more intrigued than apprehensive because I have a better sense of its capabilities and potential use in processing large, digital archival collections.

Eric: By tackling these problems with Elizabeth, I learned how to use the “os” and “shutil” Python modules, which I have already reused in other projects, including a script for batch ingesting files into a DSpace repository and a documentation clean-up project. More importantly though, this collaboration highlighted the advantage of taking a broad view of what it means to provide metadata support for a project. While most of these tasks could be classified as file management, none of them required all that much of my time, and helping Elizabeth with these issues deepened my understanding of the project and allowed me to offer better recommendations in regards to the metadata. Additionally, these tasks were mutually beneficial to both of our interests in the project. Given that the folder names were a primary source of metadata about the images, when I helped Elizabeth with the renaming of folders, I was also helping myself because the information would be easier for me to extract at a later stage of the project.

LITA: #LITAchat “Choose Privacy Week – recap”

Tue, 2017-05-09 16:18

Meet with us on Twitter.

Friday, May 19
Noon to 1:00 pm, Central Time
on Twitter at #LITAchat

Joining us this month will be Mike Robinson, Chair, ALA-IFC Privacy Subcommittee and we’ll be following up this year’s Choose Privacy Week that took place the first week of May.

We hope you’ll join us with your questions, comments, as well as to share your Choose Privacy Week experience and feedback from your community.

To participate, fire up your favorite Twitter client and check out the #LITAchat hashtag. On the web client, just search for #LITAchat and then click “LIVE” to follow along. Ask questions using the hashtag #LITAchat, add your own comments, and even answer questions posed by other participants. Hope to see you there!

Open Knowledge Foundation: The Global Open Data Index as a national indicator – So why do we have Northern Ireland?

Tue, 2017-05-09 13:00

In May 2nd, 2017 we launched the Global Open Data Index (GODI). This blog post is part of a series that explore the main findings of GODI and the next challenges in open data measurement.  

In the past, we were asked why the Global Open Data Index assesses ‘places’ and not countries. Why do we evaluate Hong Kong? Why the Crown Dependencies Isle of Man, Jersey and Guernsey? And why do we regard Northern Ireland separately from Great Britain in this year’s edition?

To clarify our rationale, we first have to explain which data we are looking at. The Global Open Data Index assesses the publication of open data at the highest administrative level in a country. This can take three forms:

  • The data describes national government processes or procedures ( government bodies operating at the highest administrative level)
  • The data is collected or produced by national government or a national government agency
  • The data describes national parameters and public services for the entire national territory but is collected by sub-national agencies.

The Global Open Data Index looks at very different government data: from national budgets to water and air quality information. We acknowledge that not all countries have the same political structure. Data assessed through the index might not necessarily be produced by national government due to the devolution of power. Furthermore, it is possible that not all sub-national governments provide the same data as they are potentially subject to different laws and/or procedures.

So why do we look at ‘places’ instead of countries?  The Index wants to be a meaningful and actionable indicator for government by assessing those government bodies that are responsible for data publication. We regard territories with legislative, executive, and administrative autonomy separately, including the Crown Dependencies (Isle of Man, Jersey, Guernsey) and Hong Kong. We keep the option open to include regions with a disputed status that are not officially recognised as independent countries.  

Why does the Index include sub-national government this year?

As described above, sub-national governments may act autonomously from the national government and collect or produce data individually. This has always been a challenge for the Index – sometimes open data was provided in one region but not in another. How to adequately assess these gaps? This year we experimented with measuring data on a sub-national level more systematically and in a comparable way. As a test case, we considered Northern Ireland separately from Great Britain. By doing so, we investigate how the responsibilities of open data publication are distributed across government. Thus we open up the debate on how to understand open data on a subnational level. This experiment is part of a larger research effort to understand open data governance models (see our call for research).

We asked both the Northern Ireland and UK governments to comment on this decision, but due to Purdah (the pre-election period), we were unable to get a comment.

LibUX: Our first LibUX Virtual Meetup is on Wednesday, May 17th, at 4PM EST

Tue, 2017-05-09 12:20

On Wednesday, May 17th, at 4 p.m., the Library User Experience Community will have its first virtual meetup.

We are planning to do a meetup each month at a regular time to make it easy for folks with lives to schedule and attend. This is an excuse to just get together and talk shop face-to-face (or face-to-ears) in about as informal a setting as we can think of. When we talked this over in our Slack, we figured we'd have a few canned topics to get the chat rolling and then take it from there. That said, this can also be a space to kvetch, actually workshop a user-experience problem, or practice a talk.

It’s a no-fear-of-repercussions space so there will be no recording.


If you think you might attend, I'd really appreciate the RSVP. If there are more than ten folks, then we just need to arrange an alternative to Google Hangouts. It's no biggie, we just need to know.

I’m not utilizing any forms or eventbrites for this one. Just holler on twitter (@schoeyfield) or join our Slack channel. See you there!

DPLA: Sowing Seeds: A New Librarian at DPLAfest

Wed, 2017-05-03 14:45

This guest post was submitted by DPLAfest Travel Awardee Amanda H. Davis.

DPLAfest Awardee and guest blogger Amanda H. Davis

This April I was honored to receive one of five travel grants to attend DPLAfest at the Harold Washington Library Center, the central library in the Chicago Public Library System. That’s right—in a library named after the first black mayor of Chicago, hundreds of dedicated, inspired people gathered for two days to discuss the work they are doing to allow greater public access to this country’s cultural and historical resources. As a public librarian interested in diversity, social justice, and technology, this location and subject matter couldn’t have been more relevant.

I’d anticipated a jam-packed conference, so I arrived in Chicago a day early to do some sightseeing. After checking in to my hotel, I braved the rain and hopped on the Pink Line to Pilsen to visit the National Museum of Mexican Art. It was a nice pick-me-up after a long, cloudy morning of travel–all of the art conveyed a sense of energy, through the vibrant colors, as well as through the passionate sociopolitical and historical messages. In hindsight, visiting a museum was a great way to prepare myself for DPLAfest. Engaging with artwork reminded me of how important it is to provide access to our world’s vast cultural knowledge.

The following morning I arrived at Harold Washington for Day 1 of the conference, and I couldn’t have been greeted more warmly than I was by the DPLA staff. I registered, attended a brief plenary and panel discussion, and then I was off to my first session, Partners in History: Chicago State University Archive and International Society of Sons & Daughters of Slave Ancestry Digital Collaboration. This is definitely a project I plan to follow in the coming years, as they’re doing amazing work to digitize images of enslaved people and their ancestors. I also attended sessions on living archives, interoperable tools for digital image delivery, and primary source sets for K-12 educators. The day ended with a reception where I and the other travel grant awardees met with Dan Cohen, DPLA’s Founding Executive Director. I did some networking, took myself on a tour of the library, checked out the famous Harold’s Chicken for dinner (YUM!), then turned in for the night.

DPLAfest Travel Awardees Nicole Umayam, Valerie Hawkins, and Amanda H. Davis with Executive Director Dan Cohen and Board Chair Amy Ryan.

Day 2 was all about social justice and K-12 education. I attended a session about Tennessee librarians teaching archival skills to teens and exposing them to careers they may not have considered, and I attended a HathiTrust lesson plan jam where I learned some tips for teaching technical skills (Tip #1: Always state the obvious). After lunch, my afternoon (and my mind) exploded when I attended what ended up being my favorite presentation, Cultural Heritage and Social Change: Libraries Measuring Social Impact. Several presenters spoke about projects they were working on to support social movements in their communities, from developing Black Lives Matter conversation guides to creating an impact assessment model in Colombian libraries to respectfully recording Native American experiences.

How do we strengthen communities & collections & make these reflections in the archives by gathering stories? #historypin #dplafest

— BMRC (@ChicagoBMRC) April 21, 2017

All in all, DPLAfest was a conference focused on collaboration—libraries and archives collaborating with each other, with their communities, and with countries all over the world. It was heartening to see so much good work being done, and I left with a renewed sense of purpose, which I desperately needed.

When I graduated from library school about a year and a half ago, I knew I wanted to use my research and writing skills to meet a need in my community. But knowing and doing are two different things, and I’ve struggled to find my natural fit within the profession. The sessions I attended at DPLAfest gave me some new ideas to consider, and I’m grateful for the seeds that were planted in my mind. It’s funny; as I was touring Harold Washington, I noticed the library was giving away actual seeds to market their One Book, One Chicago selection, Animal, Vegetable, Miracle by Barbara Kingsolver. I picked up a packet and planted them when I got home, and they’re already growing. Maybe there’s hope for me yet.

In the Library, With the Lead Pipe: The Innovation in Libraries Awesome Foundation Chapter

Wed, 2017-05-03 13:00

In Brief: This article discusses the creation, philosophy, and future directions of the Innovation in Libraries Awesome Foundation Chapter, a grassroots crowdfunding initiative incubated within Library Pipeline.

On May 1st, the Innovation in Libraries Awesome Foundation Chapter will distribute its monthly call for grant applications across a global network.

On May 15th, the Innovation in Libraries Awesome Foundation Chapter will close its call for grant applications.

Between May 16th and May 30th, the Innovation in Libraries Awesome Foundation Chapter trustees will read, debate, build consensus, and ultimately vote on a winning project.

On May 31st, the Innovation in Libraries Awesome Foundation Chapter will announce the winning project.

This cycle will repeat every month through September.

What We Are & What We Fund

The Innovation in Libraries micro-grant initiative is a community-funded effort that follows on the Awesome Foundation’s model: the grant brings together a group of trustees who make or direct a monthly personal contribution of $50 USD, for an aggregate of $1000 USD, to collectively fund one project per month.

The trustees who fund and select the grant awardees were carefully recruited by members of Library Pipeline from around the world, and across the spectrum of librarianship, with an aim to create a funding body that would support innovation in all its diversity. To prevent financial constraints from creating a barrier to participation, recruited trustees have the option of directly funding their own seat or requesting financial sponsorship for their monthly contribution. We gratefully acknowledge the generosity of our sponsors, whose funding has made it possible for librarians from all walks of life to serve as trustees. Though several of our financial sponsors comprise the Advisory Board of Library Pipeline, many individuals chose to donate to the initiative anonymously.

Throughout the pilot period, ending with a final granted project in September 2017, grant applications are accepted from the first through the fifteenth of each month. Trustees then discuss, vote, and fund one innovative library project in that month. Through this model, the Innovation in Libraries Awesome Foundation Chapter provides a catalyst for prototyping both technical and non-technical library innovations that embody the principles of diversity, inclusivity, creativity, and risk-taking. In March, for example, the trustees funded a collaborative project between the Summit Public Library and The Warehouse Project & Gallery entitled 100 Years…100 Selfies!

Who We Are

This initiative was spearheaded by Bonnie Tijerina, Robin Champieux, and well…us—Joshua Finnell and Stacy Konkiel—and we serve as the deans of the chapter. Incorporating the principles of diversity and inclusivity into its formation, the chapter is purposefully assembled from a global cohort of librarians, with a range of experiences and backgrounds across public, academic, nonprofit, and corporate librarianship.

Other trustees include:

Sally Brazil, Chief, Archives and Records Management, The Frick Collection

Sophie Bussman-Kemdjo, Director of the African Law Library, a core program of the African Innovation Foundation

Kate Byrne, Repository Platform Product Manager, Symplectic

Priya Charry, Librarian, Boston Public Library

Kayode Sunday John Dada, Librarian, Ahmadu Bello University

Ellah Diba-Khani, Librarian, University of Botswana

Lenore England, Assistant Director, Electronic Resource Management, University of Maryland University College

Miguel Figueroa, Director, ALA Center for the Future of Libraries

Sheila Garcia, MLIS student, Wayne State University

Michael Gutierrez, Reference & Research Services, New Mexico State University

Lareese Hall, Library Director, Rhode Island School of Design

Purity Kavuri-Mutuku, Senior Librarian at Nakuru Public Library

Stephanie Kays, Fine Arts Librarian, Denison University

Jerry Mathema, College Librarian, Masiyephambili College

Andrew Nagy, SaaS Product Leader, EBSCO

Anna Naruta-Moya, Project Director of the Indigenous Digital Archive, Museum of Indian Arts and Culture

Amber Painter, Southwest Regional Coordinator, Indiana State Library

Mark Robison, Research Services Librarian, Valparaiso University

Roy Tennant, Senior Program Manager, OCLC Programs and Research

Laura Zeigen, Assistant Professor and Liaison Librarian, Oregon Health & Science University

This initiative was incubated within the Innovation within LIS Committee of Library Pipeline, a grassroots, volunteer-driven initiative dedicated to supporting structural change in librarianship by providing opportunities, funding, and services that improve the library as an institution and librarianship as a profession. After conducting an environmental scan, the committee identified both challenges and opportunities in growing an innovation culture within library and information science that include: funding, infrastructure, diversity, and inclusivity. As a corollary, the committee created recommendations for strengthening innovation in each of these areas. Under the leadership of Stacy as Board Chair, each committee was tasked with launching a “minimum viable project” that would pilot an idea suggested in their environmental scan. As a result, two projects have launched: the Green Open Access Working Group, coordinated by the LIS Publications Committee, and the Innovation in Libraries initiative.

Why We Are

Whereas some see crowdfunding as a potential replacement for existing long-term funding mechanisms, especially in cultural production, we see crowdfunding as an opportunity to experiment on a small scale, creating a first step for projects that fall outside of mainstream funding models for myriad reasons. Most ideas, at the point of illumination, are not scaled to the size of an IMLS or Knight Foundation grant. It's the micro ideas, the ones that might be small but could build into something bigger if they only had a little bit of support and funding, that we want to encourage and incubate. In this spirit, our application process is simple and succinct, with a focus on minimizing the time between the "tell us about your project" and the "do your project" phases. Moreover, the Innovation in Libraries initiative is a tangible outgrowth of the Library Pipeline's ethos: be dedicated, embrace risk, and trust the process. Ultimately, membership in Library Pipeline is not just a line on one's CV; it's a commitment to building community, growing as a professional, and putting in work on ideas and projects that matter.

If your interest is piqued by the Innovation in Libraries grants, send us your innovative, creative, and risky ideas.

If you are interested in becoming a trustee or sponsoring an Innovation in Libraries initiative, please send us an email:

If you are the type of library professional who is interested in “less talking, more doing,” we invite you to join us at Library Pipeline and grow our grassroots organization.

For news and updates on Library Pipeline and any of our projects, you can follow Library Pipeline, the Innovation in Libraries Awesome Foundation Chapter, or the #GOAWG hashtag on Twitter.

Acknowledgements:  Many thanks to our internal reviewer for this article, Amy Koester, and to publishing editor Annie Pho. We would also like to thank the dedicated and generous trustees and sponsors who made the Innovation in Libraries AF Chapter a reality and the impatient optimists who make Library Pipeline hum.