planet code4lib

Open Knowledge Foundation: Document Freedom Day in Kathmandu, Nepal

Thu, 2015-04-09 12:23

On 2015’s Document Freedom Day, Open Knowledge Nepal organized a seminar on Open Standards at CLASS Nepal at Maitighar, Kathmandu.

We intended to pitch openness to a new audience in Nepal and help them learn documentation skills. As we could not hope to teach documentation and spreadsheets in less than a day, we used the session to teach small bits of information and skills that participants could take home, and to gather information about their current knowledge and pertinent needs so that we can plan future events and trainings.

The target audience was office bearers and representatives of labor unions in many private and government organizations in Nepal. We also invited some students of Computer Science and Information Technology (CSIT). A few of the students are core members of the Open Knowledge Nepal team and also represented us at Open Data Day 2015, Kathmandu. We invited the students so that they could get to know the audience they will be working with in the days to come.

It was a lazy March afternoon in Kathmandu, and participants trickled in from around 2 pm. While waiting for enough participants to begin the event formally, the organizers and students chatted about open, tech, football and other topics. Participants kept coming in ones and twos until the hall was filled to its limit (35+), and we started formally just after 3:00 PM (NST).

Mr. Durga of CLASS Nepal opened the event by welcoming all participants and introducing CLASS Nepal to them. He then invited Mr. Lekhnath Pokhrel, who represented UNI-Global Union at the event. He asked all participants to take full advantage of the seminar and announced that they will be organizing more useful events in the future. Nikesh Balami, our active member and Open Government lead, followed with his presentation on “Open Knowledge, Open Data, Open Standards, and Open Formats.” He started by gathering information about participants’ organizational backgrounds. This lightened the mood as everybody opened up to each other. Nikesh introduced Open Knowledge Nepal and our activities to the hall (see the slides).

Kshitiz Khanal, Open Access lead at Open Knowledge Nepal, went next. This session was intended to be an open discussion and skill dissemination on documentation and spreadsheet basics. We started by asking everybody to share their experience, their current skills, and the skills they would like to learn at the event.

We were in for a surprise. While we had prepared to teach pivot tables, our audience was interested in learning more basic skills. Most of our audience were familiar with documentation packages like Microsoft Word, some were using spreadsheets at work, and most of them had to use slides to present their work. We paired our students with members of our target audience so that one could teach the other. Based on the requests, we decided to teach basic spreadsheet actions such as sorting and filtering data and performing basic mathematical operations.

We also explained basic presentation philosophy: use pictures in place of words whenever possible, use as few words as possible and make them big when we do use them, and rehearse before presenting. These sound obvious, but they are not commonplace yet because they were never taught to our audience as part of any curriculum. This was well received. We also had a strange request: how to attach a sound recording to an email. We decided to teach how to use Google Drive. We demonstrated how Google Drive can be used to store documents and how its links can be used to send any type of file by email.

There were a few female participants as well. This was a good turnout compared to most of our own and other tech / open events in Kathmandu, which have no female participation. One of our female participants said that while she wants to learn more skills, she doesn’t have time to learn at home while taking care of her children, and at the office she mostly has her hands full with work.

Most of the work in many offices is documentation, and this day and age makes strong documentation skills almost mandatory. While having freedom in the sense of document freedom entails having access to proper tools, it also necessitates having the proper set of skills to use the tools.

We learned about the current skills and interests of people like our audience, and about the level of skill we need to start from when preparing modules for similar events.

See the photo stream here, and find a more detailed account here on the Open Knowledge Nepal blog.

Open Library Data Additions: Amazon Crawl: part fi

Wed, 2015-04-08 23:04

Part fi of Amazon crawl..

This item belongs to: data/ol_data.

This item has files of the following types: Data, Data, Metadata, Text

Jonathan Rochkind: simple config for faster travis ruby builds

Wed, 2015-04-08 20:08

There are a few simple things you can configure in your .travis.yml to make your travis builds faster for ruby builds. They are oddly under-documented by travis in my opinion, so I’m noting them here.


Odds are your ruby/rails app uses nokogiri. (all Rails 4.2 apps do, as nokogiri has become a rails dependency in 4.2)  Some time ago (in the past year I think?) nokogiri releases switched to building libxml and libxslt from source when you install the gem.

This takes a while. On various machines I’ve seen 30 seconds, two minutes, 5 minutes.  I’m not sure how long it usually takes on travis, as travis logs don’t seem to give timing for this sort of thing, but I know I’ve been looking at the travis live web console and seen it paused on “installing nokogiri” for a while.

But you can tell nokogiri to use already-installed libxml/libxslt system libraries if you know the system already has compatible versions installed — which travis seems to — with the ENV variable `NOKOGIRI_USE_SYSTEM_LIBRARIES=true`.  Although I can’t seem to find that documented anywhere by nokogiri, it’s the word on the street, and seems to be so.

You can set such in your .travis.yml thusly:

env:
  global:
    - NOKOGIRI_USE_SYSTEM_LIBRARIES=true

Use the new Travis architecture

Travis introduced a new architecture on their end using Docker, which is mostly invisible to you as a travis user.  But the architecture is, at the moment, opt-in, at least for existing projects. 

Travis plans to eventually start moving over even existing projects to the new architecture by default. You will still be able to opt-out, which you’d do mainly if your travis VM setup needed “sudo”, which you don’t have access to in the new architecture.

But in the meantime, what we want is to opt-in to the new architecture, even on an existing project. You can do that simply by adding:

sudo: false

To your .travis.yml.

Why do we care?  Well, travis suggests that the new architecture “promises short to nonexistent queue wait times, as well as offering better performance for most use cases.” But even more importantly for us, it lets you do bundler caching too…

Bundler caching

If you’re like me, a significant portion of your travis build time is just installing all those gems. On your personal dev box, you have gems you already installed, and when they’re listed in your Gemfile.lock they just get used; bundler/rubygems doesn’t need to go reinstalling them every time.

But the travis environment normally starts with a clean slate on every build, so every build it has to go reinstalling all your gems from your Gemfile.lock.

Aha, but travis has introduced a caching feature that can cache installed gems.  At first this feature was only available for paid private repos, but now it’s available for free open source repos if you are using the new travis architecture (above).

For most cases, simply add this to your .travis.yml:

cache: bundler

There can be complexities in your environment which require more complex setup to get bundler caching to work; see the travis docs.
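
Putting the settings from this post together, a minimal sketch of a complete .travis.yml might look like the following. The language and rvm entries are illustrative assumptions that do not come from the settings above (the Ruby version shown is only a placeholder), so adjust them to your own project.

```yaml
# Sketch of a combined .travis.yml (assumption: a plain Ruby project).
language: ruby
rvm:
  - 2.1.5   # placeholder Ruby version, not taken from the post

# Opt in to the new container-based architecture (no sudo available).
sudo: false

# Cache installed gems between builds.
cache: bundler

env:
  global:
    # Build nokogiri against system libxml/libxslt instead of compiling them.
    - NOKOGIRI_USE_SYSTEM_LIBRARIES=true
```

If your build steps do need sudo, the first two settings will not apply as written, since bundler caching for open source repos depends on the new architecture.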

Happy travis’ing

The existence of travis offering free CI builds to open source software, and with such a well-designed platform, has seriously helped open source software quality/reliability increase in leaps and bounds. I think it’s one of the things that has allowed the ruby community to deal with fairly quickly changing ruby versions, that you can CI on every commit, on multiple ruby versions even.

I love travis.

It’s odd to me that they don’t highlight some of these settings in their docs better. In general, I think travis docs have been having trouble keeping up with travis changes — travis docs are quite good as far as being written well, but seem to sometimes be missing key information, or including not quite complete or right information for current travis behavior. I can’t even imagine how much AWS CPU time all those libxml/libxslt compilations on every single travis build are costing them!  I guess they’re working on turning on bundler caching by default, which will significantly reduce the number of times nokogiri gets built, once they do.

Filed under: General

OCLC Dev Network: Web Services Maintenance April 17

Wed, 2015-04-08 18:30

All Web services that require user level authentication will be unavailable during the installation window, which is between 2:00 – 4:00 AM local time, Friday April 17th. 

HangingTogether: Bracket competition: And the winner is …

Wed, 2015-04-08 18:21

OCLC Research Collective Collections Tournament


Thanks to everyone who entered the 2015 OCLC Research Collective Collections Tournament Bracket Competition! A quick re-cap of the rules: all entrants picked a conference. If no one chose the winning conference, then a random drawing would be held among all entrants to determine the winner of the prize. Well, that’s where we’re at! No one picked Atlantic 10 to prevail, so everyone gets another chance to win!

A random drawing was held this morning in the Tournament offices (well, here in OCLC Research). The winner of the 2015 OCLC Research Collective Collections Tournament Bracket Competition is …

Carol Diedrichs!

Carol wins a $100 Visa Gift Card, along with the right to call herself Bracket Competition Champion! Congratulations! And thanks to all of our Bracket Competition participants for playing.

We hope you enjoyed the Collective Collections Tournament! Keep up to date with OCLC Research as we continue to use the concept of collective collections to explore a wide range of library topics.


More information:

Introducing the 2015 OCLC Research Collective Collections Tournament! Madness!

OCLC Research Collective Collections Tournament: Round of 32 Bracket Revealed!

Round of 32: Blow-outs, buzzer-beaters, and upsets!

Round of 16: The plot thickens … and so do the books

Round of 8: Peaches and Pumpkins

The Semi-Finals

Champion Revealed! Real-ly!

About Brian Lavoie

Brian Lavoie is a Research Scientist in OCLC Research. Brian's research interests include collective collections, the system-wide organization of library resources, and digital preservation.

LITA: LITA Lightning Rounds at 2015 ALA Annual

Wed, 2015-04-08 18:10

Will you be at the American Library Association Conference in San Francisco this June? Do you have a great new technology idea that you’d like to share informally with colleagues? How about a story related to a clever tech project that you just pulled off at your institution, successfully, or less-than-successfully?

The LITA Program Planning Committee (PPC) is now accepting proposals for a round of Lightning Talks to be given at ALA.

To submit your idea please fill out this form:

The lightning rounds will be Saturday June 27, 10:30 – 11:30

All presenters will be given 5 minutes to speak.

Proposals are due Monday, May 4 at midnight. Questions? Please contact the PPC chair, Debra Shapiro.


Open Knowledge Foundation: Community building through the DM2E project

Wed, 2015-04-08 16:58

During the past three years, Open Knowledge has been leading the community building work in the Digitised Manuscripts to Europeana (DM2E) project, a European research project in the area of Digital Humanities led by Humboldt University. Open Knowledge activities included the organisation of a series of events such as Open Data in Cultural Heritage workshops, running two rounds of the Open Humanities Awards and the establishment of OpenGLAM as an active volunteer-led community pushing for increased openness in cultural heritage.

DM2E and the Linked Open Web

As one of its core aims, the DM2E project worked on enabling libraries and archives to easily upload their digitised material into Europeana – the online portal that provides access to millions of items from a range of Europe’s leading galleries, libraries, archives and museums. In total, over 20 million manuscript pages from libraries, archives and research institutions were added during the three years of the project. In line with the Europeana Data Exchange Agreement, all contributing institutions agreed to make their metadata openly available under the Creative Commons Public Domain Dedication license (CC-0), which allows for easier reuse.

Since different providers make their data available in different formats, the DM2E consortium developed a toolset that converted metadata from a diverse range of formats into the DM2E model, an application profile of the Europeana Data Model (EDM). The developed software also allows the contextualisation and linking of these cultural heritage data sets, which makes this material suitable for use within the Linked Open Web. An example of this is the Pundit tool, which Net7 developed to enable researchers to add annotations in a digital text and link them to related texts or other resources on the net (read more).

Open Knowledge achievements

Open Knowledge was responsible for the community building and dissemination work within DM2E, which, apart from promoting and documenting the project results for a wide audience, focused on promoting and raising awareness around the importance of open cultural data. The presentation below sums up the achievements made during the project period, including the establishment of OpenGLAM as a community, the organisation of the event series and the Open Humanities Awards, next to the extensive project documentation and dissemination through various channels.

DM2E community building from Digitised Manuscripts to Europeana

OpenGLAM

In order to realise the value of the tools developed in DM2E, as well as to truly integrate the digitised manuscripts into the Linked Data Web, there need to be enough other open resources to connect to and an active community of cultural heritage professionals and developers willing to extend and re-use the work undertaken as part of DM2E. That is why Open Knowledge set up the OpenGLAM community: a global network of people and organisations who are working to open up cultural content and data. OpenGLAM focuses on promoting and furthering free and open access to digital cultural heritage by maintaining an overview of Open Collections, providing documentation on the process and benefits of opening up cultural data, publishing regular news and blog items and organising diverse events.

Since the start in 2012, OpenGLAM has grown into a large, global, active volunteer-led community (and one of the most prominent Open Knowledge working groups to date), supported by a network of organisations such as Europeana, the Digital Public Library of America, Creative Commons and Wikimedia. Apart from the wider community taking part in the OpenGLAM discussion list, there is a focused Working Group of 17 open cultural data activists from all over the world, a high-level Advisory Board providing strategic guidance and four local groups that coordinate OpenGLAM-related activities in their specific countries. Following the end of the DM2E project, the OpenGLAM community will continue to push for openness in digital cultural heritage.

Open Humanities Awards

As part of the community building efforts, Open Knowledge set up a dedicated awards series focused on supporting innovative projects that use open data, open content or open source tools to further teaching and research in the humanities: the Open Humanities Awards. During the two competition rounds that took place in 2013 and 2014, over 70 applications were received, and 5 winning projects were executed as a result, ranging from an open source Web application which allows people to annotate digitized historical maps (Maphub) to an improved search application for Wittgenstein’s digitised manuscripts (Finderapp WITTfind). Winners published their results on a regular basis through the DM2E blog and presented their findings at conferences in the field, proving that the awards served as a great way to stimulate innovative digital humanities research using open data and content. Details on all winning projects, as well as final reports on their results, are available from this final report.

DM2E event series

Over the course of the project, Open Knowledge organised a total of 18 workshops, focused on promoting best practices in legal and technical aspects of opening up metadata and cultural heritage content, providing demonstration and training with the tools and platforms developed in the project, and running hackdays and coding sprints. Highlights included the Web as Literature conference at the British Library in 2013, the Open Humanities Hack series and the Open Data in Cultural Heritage workshops, as a result of which several local OpenGLAM groups were started up. A full list of events and their outcomes is available from this final report.

Open Data in Cultural Heritage Workshop: Starting the OpenGLAM group for Germany (15 July 2014, Berlin)

It has been a great experience being part of the DM2E consortium: following the project end, the OpenGLAM community will be sustained and built upon, so that we can realise a world in which our shared cultural heritage is open to all regardless of their background, where people are no longer passive consumers of cultural content created by an elite, but contribute, participate, create and share.

LITA: Jobs in Information Technology: April 8

Wed, 2015-04-08 16:46

New vacancy listings are posted weekly on Wednesday at approximately 12 noon Central Time. They appear under New This Week and under the appropriate regional listing. Postings remain on the LITA Job Site for a minimum of four weeks.

New This Week

eResources & Discovery Librarian, Sonoma County Library, Santa Rosa, CA

Head of Library Learning Services, University Park, Pennsylvania State University Libraries,  University Park, PA

Project Manager Fresh Air CLIR Project, WHYY, inc., Philadelphia, PA

Reference and Instruction Librarian, Abington Campus, Pennsylvania State University Libraries,  Philadelphia, PA

Visit the LITA Job Site for more available jobs and for information on submitting a  job posting.

LITA: 2015 LITA/Ex Libris Student Writing Award Winner Heather Terrell

Wed, 2015-04-08 15:21

Heather Terrell, MLIS degree candidate at San Jose State University, has been named the winner of the 2015 LITA/Ex Libris Student Writing Award, sponsored by Ex Libris Group and the Library and Information Technology Association (LITA), a division of the American Library Association (ALA).

Terrell’s paper, titled “Reference is dead, long live reference: electronic collections in the digital age,” describes the changing landscape of electronic reference sources and explores the possibilities inherent in building hybrid library collections.

“The members of the LITA/Ex Libris Student Writing Award Committee are pleased to acknowledge and honor with this award Heather Terrell’s manuscript, which addresses the benefits and challenges of electronic reference materials to libraries and library users,” said Sandra Barclay, chair of the committee.

The LITA/Ex Libris Student Writing Award recognizes outstanding writing on a topic in the area of libraries and information technology by a student or students enrolled in an ALA-accredited library and information studies graduate program. The winning manuscript will be published in Information Technology and Libraries (ITAL), and the winner will receive $1,000 and a certificate of merit.

The award will be presented at the LITA Awards Ceremony on Sunday, June 28, 2015 during the ALA Annual Conference in San Francisco.

About Ex Libris

Ex Libris is a leading provider of automation solutions for academic libraries. Offering the only comprehensive product suite for electronic, digital, and print materials, Ex Libris provides efficient, user-friendly products that serve the needs of libraries today and will facilitate their transition into the future. Ex Libris maintains an impressive customer base consisting of thousands of sites in more than 80 countries on six continents. For more information about Ex Libris Group visit

About LITA

Established in 1966, LITA is the leading organization reaching out across types of libraries to provide education and services for a broad membership of nearly 3,000 systems librarians, library technologists, library administrators, library schools, vendors and many others interested in leading edge technology and applications for librarians and information providers. For more information about LITA go to, or contact the LITA office by phone, 800-545-2433, ext. 4268; or e-mail:

Question and Comments

Mary Taylor
Executive Director
Library & Information Technology Association (LITA)
(800) 545-2433 ext 4267

Jonathan Rochkind: subscription libraries, back to the future

Wed, 2015-04-08 12:55

Maybe you thought libraries were “the netflix for books”, but in this Wired article, The ‘Netflix for Books’ Just Invaded Amazon’s Turf, it’s not libraries they’re talking about, and it’s not just Amazon’s turf they’re invading. Although they’re talking about the vendor, Oyster, starting to sell books, not just offer a subscription lending library, that’s what they mean by “Amazon’s turf.”  Still, one might have thought that lending books was the “turf” of libraries, but they don’t even get a mention.

Before the existence of public libraries, paid subscription libraries were a thing, both as commercial entities and private clubs, popular in the 18th and 19th centuries. Books were expensive then compared to now.

The United States played a key role in developing public free libraries, democratizing access to published knowledge and cultural production.

It might be instructive to compare the user workflow in actually getting books onto your device of choice between Amazon and Oyster’s systems (for both lending and purchase), and the vendors and solutions typically used by libraries (OverDrive, etc).  I suspect it wouldn’t look pretty for libraries’ offerings. The ALA has a working group trying to figure out what can be done.

Filed under: General

In the Library, With the Lead Pipe: Randall Munroe’s What If as a Test Case for Open Access in Popular Culture

Wed, 2015-04-08 10:30

Randall Munroe, What If. CC-BY-NC 2.5. Retrieved from


In Brief:

Open access to scholarly research benefits not only the academic world but also the general public. Questions have been raised about the popularity of academic materials for nonacademic readers. However, when scholarly materials are available, they are also available to popularizers who can recontextualize them in unexpected and more accessible ways. Randall Munroe’s blog/comic What If uses open access scholarly and governmental documents to answer bizarre hypothetical questions submitted by his readers. His work is engaging, informative, and reaches a large audience. While members of the public may not rush to read open access scientific journals, their availability to writers like Munroe nevertheless contributes to better science education for the general public. Popularizers outside of academia benefit significantly from open access; so do their readers.

Open Access and the Public Good

Open access (OA) is a longstanding and important discussion within librarianship. As Peter Suber explains, the “basic idea of OA is simple: Make research literature available online without price barriers and without most permission barriers.” For a good grounding in the basics of open access, I refer the interested reader to Suber’s book Open Access; for a quick overview of open access, see this blog post by Jill Cirasella.

Open access has many benefits, both to academics and to the wider public. The benefits to academics are obvious: authors get wider distribution of their work, researchers at institutions with small budgets have better access to scholarly materials, and, for librarians, it represents a partial solution to the serials crisis.

In this article, however, I will focus on the benefit of open access to the public. When scholarship is freely available on the Web, it is available not only to scholars, but to anyone with an internet connection, the research skills to locate these materials, and the proficiency to read them. Open access has the potential to support lifelong learning by making scholarship available to people without any current academic affiliation, whether they are professionals in a field that requires continuing education, or hobbyists fascinated by a particular subject, or just people who are interested in many things and want to keep learning. In The Access Principle: The Case for Open Access to Research and Scholarship, John Willinsky describes the value of scholarly information to several specific segments of the public, including medical patients, amateur astronomers, and amateur linguists.

Both Suber and Willinsky cite critics who argue that most members of the public are not interested in reading scholarly articles or books, that the public cannot understand this material, or even that the information could be harmful to them. Suber criticizes the presumptuous attitudes of those who would make these claims, pointing out that the public’s demand for scholarly information cannot be determined until this information is made widely available. Willinsky objects strongly to the presumptuous attitudes of those who question the ability of the public to benefit from open access:

[P]roving that the public has sufficient interest in, or capacity to understand, the results of scholarly research is not the issue. The public’s right to access of this knowledge is not something that people have to earn. It is grounded in a basic right to know.

Willinsky’s argument for the public’s moral right to access scholarly research is both stirring and compelling. This is especially true for librarians, for whom access to information is a professional value.

Open access need not rely on any demonstration that the public has met some arbitrary threshold of interest and education. Without believing in a need for such proofs, I would nevertheless like to present one case illustrating how open access can benefit the public.

The public is, by its very nature, diverse. It includes the amateur and professional users of information cited above. The public also includes popularizers who can use open access scholarly literature in unexpected ways, not only to more widely distribute the fruits of scholarly research but also to create projects of their own. By looking at the role of one such popularizer, Randall Munroe, I will question two assumptions: first, that the public is so uniformly unsophisticated, and second, that they all need to read the open access literature in order to benefit from its wide availability.

What If

What If, a weekly blog that answers hypothetical questions using science and stick figure illustrations, is the work of Randall Munroe. Munroe is a former National Aeronautics and Space Administration (NASA) roboticist but is better known as a webcomic artist. His primary project is popular webcomic xkcd, which explains itself with a disclaimer:

Warning: this comic occasionally contains strong language (which may be unsuitable for children), unusual humor (which may be unsuitable for adults), and advanced mathematics (which may be unsuitable for liberal-arts majors).

It would be fair to describe xkcd as a nerdy joke-a-day comic with stick figure art, but I should point out immediately that Munroe has often used it to explain scientific concepts. Notable comics include “Lakes and Oceans,” which illustrates the depth of the Earth’s lakes and oceans in a way that gives a better idea of their scope, and “Up Goer Five,” which uses the simplest possible vocabulary to explain how the Saturn V rocket works. Munroe’s science education agenda is thus visible even in xkcd.

The connection to science education is clearer in What If, in which Munroe uses real, scientific information to provide serious answers to ridiculous hypothetical questions posed by his readership, such as:

What if everything was antimatter, EXCEPT Earth? (“Antimatter”)

What would happen if one person had all the world’s money? (“All the Money”)

At what speed would you have to drive for rain to shatter your windshield? (“Windshield Raindrops”)

Munroe answers these questions using math, science, humor, and art. He pitches his answers appropriately to a smart and curious, but not necessarily scientific, audience. In fact, several questions have been submitted by parents on behalf of their children.

A good example is the first of the questions listed above: “What if everything was antimatter, EXCEPT Earth?” In about 500 words, Munroe covers the proportion of matter to antimatter in the universe, the solar wind, Earth’s magnetic field, the effect of space dust on the Earth’s atmosphere, and the dangers of large asteroids. This sounds like a lot of information, but with Munroe’s straightforward style and amusing illustrations, it is easy to read and understand.

Figure 1: An illustration from “Antimatter.” CC-BY-NC 2.5 Retrieved from

So, What If is humorous and silly, but the questions are taken seriously, and the answers in fact provide real scientific information. Having read this particular post, we know more not only about the prevalence of matter and antimatter, but also about the Earth, asteroids, and more.

What If is extremely popular. In 2014, Munroe published a book including some of the questions he’d answered in the blog along with some others which he felt deserved fuller attention. The book, the #6 overall bestseller on Amazon as of December 10, 2014, has been successful in reaching a large audience. While bestseller status is not necessarily an indicator of the book’s value, it does suggest a high level of public awareness of this work.

Sourcing Information for Hypothetical Questions

As a guest on National Public Radio’s Science Friday, Munroe explained that What If is driven partially by his own desire to know the answers to the questions that people send him. He likes hypothetical questions, partly because “they’re fun,” but also:

A lot of the time it ends up taking you into exploring a bunch of weird questions or subjects in the real or practical science that you might not have thought about. (Munroe, “Interview”)

Through What If, Munroe uses research to explore questions and information sources. Munroe delves into many different types of sources in order to answer these questions.

Munroe’s sophistication as an information user manifests itself in his use of a wide variety of sources to answer many different kinds of questions. He uses Wikipedia as a starting point and YouTube as a useful source of visualizations, but he’s clearly familiar with a wide variety of ways to search the web and kinds of sources available there. He uses specialized web tools like Wolfram Alpha, a “knowledge engine” built to perform computations and provide controlled web searching. He takes advantage of online educational materials for the clarity with which they explain basic concepts and present mathematical formulae. He consults commercial catalogs to get the specifications on various products—unsurprising behavior for a former engineer! He consults blogs and enthusiast resources, such as amateur aviation and auto-repair sites, where there is a large and knowledgeable fan community. Amid this landscape, academic sources certainly have a place. They provide detailed information and a look at ongoing research, as I’ll discuss further below. Munroe’s frequent use of articles in ResearchGate and arXiv suggests that these repositories are also among his favorite sites.

Munroe’s teasing links to conspiracy sites also hint that he is well aware of the need to evaluate information for accuracy and confident in his ability to do so. He makes an effort to link to high-quality sites, although he has on one occasion (“All The Money”) admitted defeat (when trying to find the angle of repose for coins) and resorted to linking to a message board posting. Still, he carefully considers the information he uses; even when using a fairly standard resource like Google Maps, he looks carefully at the route it recommends. In “Letter to Mom,” he notes with surprise that Google Maps does not take advantage of the Buffalo Valley Rail Trail as a walking route and jokingly suggests it may be haunted. He also acknowledges other kinds of gaps in the information that’s available. His investigation into the amount of data storage available at Google (“Google’s Datacenters on Punch Cards”) works around the fact that Google does not disclose this information by looking into the cost of their data centers and the power that they consume.

In short, throughout What If, Munroe displays a high awareness of the information landscape and a strong ability to find, interpret, and appropriately deploy information, even though his information needs may be highly unorthodox.

Sources Used in What If

Since links serve as citations in the world of the web, I have gone through the entire run of the blog, which included 120 posts as of December 10, 2014, and analyzed the links.

This is an informal analysis; I examined and coded each entry but I have not done any validity tests on the categories. This chart is intended only to give an at-a-glance idea of the general types of sources Munroe consults.

Figure 2: Sources Used in What If

Academic Sources include scholarly journal articles, books, and online course-related materials such as textbooks and slides.

News, Blogs, and Associations includes a wider variety of sources, but what they have in common is that they are written not for professionals or academics. Rather, they address either the general public or a specialized, non-professional community. Here I include news reports, blogs by experts, hobby sites, and so on.

Reference Sources comprise popular online reference sources, mostly Wikipedia but also the Internet Movie Database (IMDb) and similar sources.

Government and Commercial Documents often present analysis and scientific or technical information. NASA is the biggest source here, with many documents written by engineers.

Data and Images include charts, datasets, results from the online search engine/calculator Wolfram Alpha, videos, and so on.

Self Citation links lead back to other What If posts, to xkcd, or to Munroe’s blog.

Other includes links to social networks, other webcomics, company front pages, and so on.

Sophisticated Use of Popular Online Information

Not all the sources Munroe uses are scholarly in nature. Of the source types listed above, three of them — Academic Sources, News Sources (etc.), and Government and Commercial Documents — might provide experimental or analytical information about the phenomenon of interest. This accounts for about half of Munroe’s citations. The remainder serve other purposes, such as reference, demonstration, or humor.

Munroe’s use of sources, including nonscholarly sources, demonstrates his sophistication and understanding of the internet. “Popular Reference Sources” is the largest category other than the three mentioned above; this category is dominated by Wikipedia, the most commonly cited source in What If. Wikipedia is a commonly reviled source in academic contexts, but Munroe uses it in an appropriate and knowledgeable way.

Munroe understands that Wikipedia is a reference source, and generally points to it when introducing concepts with which his readers may not be familiar. In the antimatter example discussed above, Munroe links to the pages on Baryon asymmetry and CP symmetry when discussing the prevalence of matter and antimatter in the universe. By linking to these pages, he avoids unnecessarily introducing technical jargon into the main text of his article but still invites his readers to learn more about it. Most of his uses of Wikipedia are similarly definitional. Occasionally, they are playful, as in “Balloon Car”, where he breaks the word “catenary” into two links, one to the entry for “cat” and the other for “domestic canary.” Note that this moment communicates something about Munroe’s expectations for his audience; they are of course perfectly capable of both recognizing the joke and searching Wikipedia for the correct term (“catenary”) themselves. The links, then, are only a courtesy to readers. Notably, Wikipedia is not cited in the print book.

In fact, Munroe’s expectation that Wikipedia is a major part of his readers’ information landscape is so strong that he occasionally inserts the Wikipedia tag “[citation needed]” into his articles in an ironic, jokey way, when he takes for granted something that appears obvious. My personal favorite is in the post “One-Second Day”, in which he remarks that the Earth rotates, inserts a “[citation needed]” tag, and links to the famous conspiracy site “Time Cube.”

His use of other popular online sources is similar; a good example is YouTube, to which he frequently links when he needs visual aids. In “Extreme Boating”, he links to several videos showing reactions with different substances through which he proposes rowing a boat.

Academic/Analytical Sources

Munroe is an information omnivore who constantly and intentionally mixes popular and scholarly, humorous and serious. Although he uses Wikipedia heavily for background information, he turns to deeper sources when more precise analysis is needed. His sources for this work include academic journal articles, and also government and commercial documents with scientific or technical content. However, the academic articles are of particular interest in a discussion of open access.

The post about antimatter is a good example. In it, Munroe’s links to Wikipedia are used to establish the basic concepts relevant to the question. Later in the post, questions come up that scientists still disagree about; here is where books and articles begin to be cited. The antimatter question leads to a discussion of just how much antimatter is in the universe and whether, for instance, antimatter galaxies could exist; this question is addressed with one scientific article that shows this has not yet been observed and another that proposes a new telescope to further examine the question. In other posts, many other questions are examined using similar sources. “Burning Pollen” cites a chemistry paper explaining the reaction between Diet Coke and Mentos in order to explain oxidation. “Laser Umbrella” cites several scientific articles about vaporizing liquids using lasers, as this question has often been studied. In “Speed Bump,” Munroe is working on a question about the fastest speed at which one can survive going over a speed bump, so an article in a medical journal about spinal injuries from speed bumps is useful.

As noted above, academic articles are not Munroe’s only source of scientific information. Articles from government agencies, particularly NASA, often serve a similar purpose. Munroe also often links to books, either by linking to a book’s record in WorldCat or Amazon, or by using Google Books to link a specific page, often one with a diagram or graph. What If also includes a few links (specifically, twenty-five of them) to educational materials such as class sites, lecture slides, and online textbooks.

For statistics and other kinds of quantitative information, Munroe often turns to other sites. Some of the government documents provide this sort of information, as do commercial entities such as rope manufacturers, cargo transporters, and so on. What If includes citations to data safety sheets and international standards, most notably in “Lake Tea,” which needs to cite standards for several different types of tea in order to answer a question about the strength of the brew made from dumping all the tea in the world into the Great Lakes. He uses Wolfram Alpha, the “computational knowledge engine” for calculations and conversions and Google Maps for locations and distances.

Contributions of Amateurs and Journalists

Finally, popular sources also have a place in What If. Munroe often links to news, professional and hobby associations, and blogs, both those produced by passionate amateurs and those used by professionals to connect to a lay audience. These include the New York Times and Slate, but also the popular Bad Astronomy blog, a visualization blog known as Data Pointed, aviation history enthusiast sites, and a linguistics blog by scholars at the University of Pennsylvania. In most cases, these are used because they provide specific, current information by knowledgeable people.

Thus, academic journals do not have a monopoly on useful scientific information. However, at 13% of all links, they comprise a substantial portion of Munroe’s research.

Open Access in What If

Munroe is aware of the open access movement; he has illustrated the available amount of open access literature (“The Rise of Open Access”).

As of December 10, 2014, Munroe had referenced 100 academic articles in What If, and about 72 of them can be considered open access because their full text is freely available to the public in one way or another.

Within the Open Access movement, authors often refer to two ways to achieve open access—the “gold road” and the “green road.” Gold open access is provided by open access publishers who make their content freely available rather than using paywalls and subscriptions. Green open access is achieved when authors publicly archive their content online, with the permission of their publishers.1

Figure 3: Availability of Academic Articles Cited in What If

For the purposes of this pie chart, anything that Munroe has linked from a repository or an author’s page is considered green open access, and anything linked from the journal’s website is considered gold open access. Because I am attempting to capture the perspective of a reader interested in a particular article rather than that of a publisher or librarian, I am ignoring some nuances important to open access advocates. In particular, I am counting all open access articles that are available through the publishers’ sites as “gold,” even including those which are available via hybrid models. The hybrid model, in which subscription journals make some articles available to the public, contingent on author fees, does not support all the goals of the open access movement. However, it does make content available to readers within the journal itself, so from a reader’s point of view, it makes sense to classify these articles as gold open access.

“Gold” and “green” open access were used about equally in What If (34% and 38%, respectively). “Gold” open access included some links to very well-known open access publications such as PLOS One, but also a wide variety of other journals and some conference proceedings. The “green” open access links were to repositories; arXiv, the open access repository of physics papers, appeared frequently, as did academic social networks like ResearchGate and, of course, many university repositories and faculty websites. Munroe occasionally links to articles that are not freely accessible, including some from major publishers such as Nature, Springer, and Elsevier. For these articles, only the abstracts are available. These comprise 23% of the academic articles cited. This is a substantial proportion of all academic articles, but much smaller than the proportion of open access materials.

Why Open Access Matters to What If

Although Munroe occasionally links to an article that is not freely accessible, open access articles are preferable for obvious reasons. Munroe is a professional cartoonist, not an academic, so his profession does not automatically grant him access to subscription resources. Moreover, he cannot assume that his readers have access to any given closed-access resource. If Munroe succeeds in inspiring in his readers the kind of curiosity about the world that characterizes his own work, they will need resources that they are actually able to access. Open access is thus important to both the success and the quality of What If.

What If is an example of what can be achieved when information, and scholarly information in particular, is made readily available outside of academia. While Munroe depends on information from a variety of sources, the information he gleans from open access academic works is especially important because it connects him directly to the science.

Imagine a non-academic freelancer attempting to write a weekly column like What If in an environment in which all or most scholarly information is available only by subscription. Without academic affiliation, it is very difficult to obtain scholarly material in the quantity in which it is used in What If. To pay the access fee for each article needed would soon become prohibitive. Most current scholarly materials are not held in public libraries, many public libraries limit or charge for their interlibrary loans, and waiting for articles to arrive could affect the weekly publication schedule. Under such circumstances, it is not surprising that popularizers in the past have tended to be either academics or journalists, two professions which grant their practitioners access to information.

What If is driven by Munroe’s wide-ranging curiosity and that of his readers. What If began with the questions that xkcd readers sent to him; he found he too was curious about the answers. Because of the time he spent researching these questions, he decided to write them up and post them on his website. This is suggestive of the way that being an audience member sometimes works on the internet: Munroe’s readers felt sufficiently connected to him to send him these questions, and he felt sufficiently interested in the questions to research and respond to them. The ability to answer the questions to his satisfaction depends on the availability to both Munroe and to his readers of reliable information.

The Readership of What If

Munroe writes What If for a general audience. However, he believes that whether his audience is general or technical, “the principles of explaining things clearly are the same. The only real difference is which things you can assume they already know…” (“Not So Absurd”). Munroe expresses skepticism toward assumptions about general knowledge:

[R]eal people are complicated and busy, and don’t need me thinking of them as featureless objects and assigning them homework. Not everyone needs to know calculus, Python or how opinion polling works. Maybe more of them should, but it feels a little condescending to assume I know who those people are. I just do my best to make the stuff I’m talking about interesting; the rest is up to them. (“Not So Absurd”)

Munroe resists the idea that his audience needs to learn how to do the things that he knows how to do, like using good estimation techniques to understand the size of a number. Instead, he states that “the rest is up to them.” What If uses links according to this principle; anyone can understand the articles without reference to their sources, but the sources are nevertheless available for their reference.

The nature of What If as a question-and-answer site ensures that Munroe is always addressing at least one of his readers directly. Linking his sources, then, becomes part of the answer. Munroe does not simply dispense answers; rather, he encourages his readers to see where the information is coming from. Occasionally, he even makes comments on the things that he links, for example: “a positively stunning firsthand account” (“Visit Every State”), “one of the worst scanned pdfs I’ve ever seen” (“Enforced by Radar”), “a wonderful chart” (“Star Sand”), and so on. In one case (“All the Money”), he links to a book in Google Books and refers to a specific page so that a reader can find the information that he used. Like any citation, these links make it possible for a reader to consult the author’s sources. To a non-academic audience, however, citations are a meaningless gesture if they are not to open access resources. Thus, open access resources are important not only for Munroe to access his sources, but also so that he can share them with his readers. This attitude, that readers should be able to access cited sources in a click, contrasts strongly with that of open access critics who claim there is little public interest in scholarly works. Although in most cases it is not clear how many readers click through, YouTube videos linked from What If do see increased views; many commenters on such videos indicate that they arrived via links from What If.

Why What If Matters for Open Access

Why does it matter what resources are available to the author of a silly blog with stick figure illustrations? Although What If contributes to ongoing science education, the stakes are lower than they are for some of the other things that can be accomplished with open access, such as providing education and medical information to rural, underfunded, or poor areas.

I want to be clear that the purpose of open access is not only to benefit those who are highly educated, famous, and have a large platform of their own. I must also acknowledge that, as a white man on the Internet, Munroe’s path to popularizing scientific information is far smoother than that of others who do not share his privilege. However, I still think What If matters for open access, for several reasons.

First, scholarly information is sneaking into popular culture. What If shows how scholarly information can be relevant to people in their daily lives, even if they only use it to amuse themselves by thinking about unlikely scenarios. This increases the reach of scholarly research and contributes to public science education. There is interest in this information beyond a scholarly or professional context.

In the fields of most interest to Munroe (mathematics, physics, astronomy, and earth and environmental sciences), open access has increased faster than in most other fields (Bjork et al.). Munroe relies on open access in a way that many humanities popularizers like Idea Channel’s Mike Rugnetta do not. However, as open access in the humanities increases, I hope to see projects that make use of it in interesting ways.

Second, What If has a very large audience. As of December 10, 2014, the book based on the blog was the #6 bestselling title on Amazon. Although many readers may never consult the book’s sources, they still benefit from their availability through the existence of What If. Munroe’s role here is that of a popularizer; he reads the scholarly literature that is relevant to his writing and produces something more accessible for the public. What If joins a host of science blogs in recontextualizing science for a different audience (Luzón). Popularizers have been writing since long before the beginning of the open access movement, but open access can make it much easier for popularizers to succeed, especially those who live outside of academia or journalism.

Additionally, although many readers might not click through the links in a What If entry to read the scholarly research that Munroe cites, those who are interested have the ability and the access to do so. What If mediates the information with accessibility and clarity, but because it exists as a born-digital work and because most of the links are to open access materials, readers are invited to examine Munroe’s sources.

Finally, Munroe and his readership stand as an example of a sophisticated, curious, and playful public who, although they may not be members of the scholarly community, have a strong interest in the work that is produced there.


To read What If requires the playfulness to put serious academic work to a silly purpose, the curiosity to learn about the universe in new and unusual contexts, and the sophistication to understand the larger information landscape from which all this proceeds. What If’s readers have the ability to understand the Wikipedia jokes, to have a basic awareness of the existence of both high- and low-quality information on the internet, and to integrate scholarly concepts into this larger landscape.

One of the most intriguing aspects of What If is its repurposing of scholarly information in ways unlikely to occur to more traditional popularizers with an explicitly educational mission. What If is not the work of an academic trying to produce more accessible information for the public; rather, it is the work of one member of the public putting academic work to use in a way that is meaningful for his audience. Munroe’s work draws on scholarly research, but it is markedly different from anything that we would expect to find in an academic context.

Given the success of What If, it is clear that there is a readership for unexpected reuses of scholarly information. Without open access, What If could not exist. As open access expands and the public finds its way to materials it did not previously have available, what other intriguing projects might we see?

Thanks to Hugh Rundle and Jill Cirasella, who pushed me to think through the things that were messy and unfinished in this article, and who asked lots of good, difficult questions.  David Williams helped me with the images and Kelly Blanchat answered a copyright question for me. Thanks also to Steven Ovadia, for arranging the 2014 Grace-Ellen McCrann Lectures, at which an early version of this paper was originally presented, and to everyone who encouraged me to turn that presentation into an article.

References and Further Reading

Bjork, B. C., Laakso, M., Welling, P. & Pateau, P. “Anatomy of Green Open Access.” Journal of the American Society for Information Science and Technology. 64.2 (2014): 237-250. doi: 10.1002/asi.22963. Web. Preprint available at <>

Cirasella, Jill. “Open Access to Scholarly Articles: The Very Basics.” Open Access @ CUNY [blog]. May 18, 2011. Web. <>

Munroe, Randall. Interview with Ira Flatow. Science Friday, NPR. 5 Sep. 2014. Web. <>

–. “Randall Munroe of xkcd Answers Our (Not So Absurd) Questions.” [Interview by Walt Hickey.] FiveThirtyEight, September 2, 2014. Web. <>

–. “The Rise of Open Access.” Science 342.6154 (2013): 58-59. Web. doi: 10.1126/science.342.6154.58

–. What If. Web. <>

–. xkcd. Web. <>

Luzón, María José. “Public Communication of Science in Blogs: Recontextualizing Scientific Discourse for a Diversified Audience.” Written Communication 30.4 (2013): 428-457.

Panitch, Judith and Sarah Michalak. “The Serials Crisis: A White Paper for the UNC-Chapel Hill Scholarly Communications Convocation.” Scholarly Communications in a Digital World: A Convocation. January 27-28, 2005. Chapel Hill, North Carolina. Web. <>

Suber, Peter. Open Access. Cambridge, MA: MIT Press, 2012. Web. <>

Willinsky, John. The Access Principle: The Case for Open Access to Research and Scholarship. Cambridge, MA: MIT Press, 2012. Web. <>

  1. Many journals allow self-archiving by default, either immediately or after an embargo period. Other journals may agree to allow self-archiving after negotiation with the author. In some cases, journals allow authors to keep their copyright, so their permission is not required.

DPLA: Open Technical Advisory Committee Call: Wednesday, April 8, 2015, 2:00 PM Eastern

Wed, 2015-04-08 00:12

The DPLA Technical Advisory Committee will hold an open call on Wednesday, April 8 at 2:00 PM Eastern. To register, complete the short registration form here.


  • Continued work on the new DPLA ingestion system
  • IMLS grant awarded to DPLA, Stanford University, and DuraSpace regarding Hydra development
  • Previews of other tech-related sessions at DPLAfest

Library Tech Talk (U of Michigan): Web Accessibility, Part 2: Who Took Away My Tables?!

Wed, 2015-04-08 00:00

If you really love something you might have to let it go.

DuraSpace News: NOW AVAILABLE: Fedora 4.1.1 with Key Design Improvements

Wed, 2015-04-08 00:00

From Andrew Woods on behalf of the Fedora Team

Winchester, MA  We are proud to announce the release of Fedora 4.1.1. The release date was March 27, 2015. In addition to addressing numerous bugs, this 4.1.1 release includes the initial design work of the Fedora 3 to 4 tooling, a Vagrant box, ontology updates and improvements to the Java Client and Camel Component.

Casey Bisson: Sinistrality vs. dextrality in design

Tue, 2015-04-07 20:27

Photo CC-BY-SA Gerry Dincher

This post on why people focus on the right-hand side of a design is an old one, but still valuable today:

These days there is a lot of talk about emotional design and how to properly create a connection between users and our products. Focusing on the right-hand side of our designs can create these connections. We have the ability to influence and change a user’s belief in what is right and honest with our designs.

Take, for example, these simple dialog boxes (above). Which will get the best positive reaction by users? I’ll tell you: it’s the one with the positive actionable button on the right. This is because users trust that the next step will pull them forward in the journey they are on.

By having the positive action on the right, we’re helping create a more intuitive response. So why do we change this in our full sites by placing the most important and actionable elements in an area that the majority of the population doesn’t trust, connect with or find intuitively?

I landed on Wikipedia when I was trying to remember the correct spelling of the Latin sinister and dexter. There’s nothing special about the Latin spelling, as you can see, but I did find these two great articles on heraldry and science/medicine.

District Dispatch: Make some noise for school library funding

Tue, 2015-04-07 19:30

Photo by mikael altemark

This Thursday, the American Library Association is joining a social media push hosted by the National Education Association (NEA) in support of dedicated funding for school libraries through the Elementary and Secondary Education Act (ESEA) and the Strengthening Kids’ Interest in Learning and Libraries (SKILLS) Act, which would expand federal investment in school libraries so they can continue to offer students the tools they need to develop the critical thinking, digital, and research skills necessary for success in the twenty-first century.

We need to tell legislators how important school libraries and librarians are to our students and we need your voice to take up the rallying cry.

Here’s how you can help:

On Thursday, April 9, 2015, between 7-8pm EST, post one or all of the tweets below, or create your own:

  • School libraries receive over 1.3 billion student visits a year! #getESEAright – include funding for school libraries with the SKILLS Act.
  • Studies show test scores trend higher in schools with full-time, certified school librarians. #getESEAright with the SKILLS Act.
  • Strong school libraries drive student achievement! #getESEAright – include the SKILLS Act!
  • .@PattyMurray Help us #getESEAright – include the SKILLS Act! Listen to teens speak out about the importance of #libraries
  • .@SenAlexander #getESEAright – include the SKILLS Act and support dedicated funding for school libraries.

For more information about ESEA and the SKILLS Act, check out our earlier post on the topic.

Senator Patty Murray (D-WA) and Senator Lamar Alexander (R-TN) are the keys to unlocking more sustainable funding for our school libraries. Join us Thursday on the #getESEAright hashtag and let them know why they should include the SKILLS Act in ESEA.

The post Make some noise for school library funding appeared first on District Dispatch.

FOSS4Lib Recent Releases: Fedora Repository - 4.1.1

Tue, 2015-04-07 15:15

Last updated April 7, 2015. Created by Peter Murray on April 7, 2015.

Package: Fedora Repository
Release Date: Friday, March 27, 2015

FOSS4Lib Updated Packages: OpenScholar

Tue, 2015-04-07 13:26

Last updated April 7, 2015. Created by Peter Murray on April 7, 2015.

OpenScholar is open source software built on top of Drupal that allows end users to easily create dynamic and customizable academic web sites. Each site comes with a suite of apps, widgets and themes, enabling users to build and manage feature-rich web sites.

OpenScholar is developed and maintained by The Institute for Quantitative Social Science in collaboration with HPAC and HUIT at Harvard University, with contributions from the open source community.

Package Type: Content Management System
License: GPLv2
Works well with: Drupal

Mark E. Phillips: Metadata Edit Events: Part 6 – Average Edit Duration by Facet

Tue, 2015-04-07 12:29

This is the sixth post in a series of posts related to metadata edit events for the UNT Libraries’ Digital Collections from January 1, 2014 to December 31, 2014. If you are interested in the previous posts in this series, they covered the when, what, who, duration based on time buckets, and finally the calculation of the average edit event time.

In the previous post I was able to come up with what I’m using as the edit event duration ceiling for the rest of this analysis. This means that the rest of the analysis in this post will ignore the events that took longer than 2,100 seconds. Removing the 2,306 events with a duration of over 2,100 seconds leaves us with 91,916 valid events (97.6% of the original dataset) to analyze.
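The filtering step can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used for the analysis: the `filter_events` helper and the sample durations are hypothetical, and only the 2,100-second ceiling and the two event counts come from the post.

```python
DURATION_CEILING = 2100  # seconds; the ceiling chosen in the previous post


def filter_events(durations, ceiling=DURATION_CEILING):
    """Split edit durations (in seconds) into kept and dropped lists.

    Events longer than the ceiling are dropped; an event of exactly
    `ceiling` seconds is kept.
    """
    kept = [d for d in durations if d <= ceiling]
    dropped = [d for d in durations if d > ceiling]
    return kept, dropped


# Hypothetical sample durations, for illustration only.
sample = [2, 45, 2100, 2101, 5000]
kept, dropped = filter_events(sample)

# Re-derive the share of the dataset that remains, using the counts
# reported in the post.
valid, removed = 91916, 2306
share = valid / (valid + removed) * 100
print(f"{share:.1f}% of {valid + removed:,} events kept")
```

Keeping events of exactly 2,100 seconds is a choice made here to match "longer than 2,100 seconds" being the cutoff for removal.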


The table below is the user stats for our top ten editors once I’ve ignored items over 2,100 seconds.

username     min    max  edit events  duration sum    mean  stddev
htarver        2  2,083       15,346     1,550,926  101.06  132.59
aseitsinger    3  2,100        9,750     3,920,789  402.13  437.38
twarner        5  2,068        4,627       184,784   39.94  107.54
mjohnston      3  1,909        4,143       562,789  135.84  119.14
atraxinger     3  2,099        3,833     1,192,911  311.22  323.02
sfisher        5  2,084        3,434       468,951  136.56  241.99
cwilliams      4  2,095        3,254       851,369  261.64  340.47
thuang         4  2,099        3,010       770,836  256.09  397.57
mphillips      3    888        2,669        57,043   21.37   41.32
sdillard       3  2,052        2,516     1,599,329  635.66   388.3

You can see that many of these users have very short edit times for their lowest edits and all but one have edit times for the maximum that approach the duration ceiling.  The average amount of time spent per edit event ranges from 21 seconds to 10 minutes and 35 seconds.
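As a quick check on that range, the mean column is the duration sum divided by the number of edit events. The sketch below recomputes it for the two extremes in the table; the helper names `mean_duration` and `as_minutes_seconds` are made up for this illustration.

```python
def mean_duration(duration_sum, edit_events):
    """Average seconds per edit event."""
    return duration_sum / edit_events


def as_minutes_seconds(seconds):
    """Format a duration in seconds as whole minutes and seconds (truncated)."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes} min {secs} s"


# (duration sum, edit events) for the lowest and highest means in the table.
editors = {
    "mphillips": (57043, 2669),
    "sdillard": (1599329, 2516),
}

for name, (total, events) in editors.items():
    mean = mean_duration(total, events)
    print(name, round(mean, 2), as_minutes_seconds(mean))
```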

I know that for user mphillips (me) the bulk of the work I tend to do in the edit system is fixing quick mistakes like missing language codes, editing dates that aren’t in Extended Date/Time Format (EDTF), or hiding and un-hiding records. Other users such as sdillard have been working exclusively on a project to create metadata for a collection of Texas Patents that we are describing in the Portal.


The top ten most edited collections and their statistics are presented below.

Code    Collection Name                                     min    max  edit events  duration sum    mean  stddev
ABCM    Abilene Library Consortium                            2  2,083        8,418     1,358,606  161.39  240.36
JBPC    Jim Bell Texas Architecture Photograph Collection     3  2,100        5,335     2,576,696  482.98  460.03
JJHP    John J. Herrera Papers                                3  2,095        4,940     1,358,375  274.97  346.46
ODNP    Oklahoma Digital Newspaper Program                    5  2,084        3,946       563,769  142.87  243.83
OKPCP   Oklahoma Publishing Company Photography Collection    4  2,098        5,692       869,276  152.72  280.99
TCO     Texas Cultures Online                                 3  2,095        5,221     1,406,347  269.36  343.87
TDNP    Texas Digital Newspaper Program                       2  1,989        7,614     1,036,850  136.18  185.41
TLRA    Texas Laws and Resolutions Archive                    3  2,097        8,600     1,050,034   122.1  172.78
TXPT    Texas Patents                                         2  2,099        6,869     3,740,287  544.52  466.05
TXSAOR  Texas State Auditor’s Office: Reports                 3  1,814        2,724       428,628  157.35  142.94
UNTETD  UNT Theses and Dissertations                          5  2,098        4,708     1,603,857  340.67  474.53
UNTPC   University Photography Collection                     3  2,096        4,408     1,252,947  284.24  340.36

This data is a little easier to see with a graph.

Average edit duration per collection

Here is my interpretation of what I see in these numbers based on personal knowledge of these collections.

The collections with the highest average duration are the TXPT and JBPC collections. These are followed by the UNTETD, UNTPC, JJHP and TCO collections.  The first two, Texas Patents (TXPT) and the Jim Bell Texas Architecture Photograph Collection (JBPC), are examples of collections that were having metadata records created for the first time via our online editing system.  These collections generally required more investigation (either by reading the patent or researching the photograph) and therefore took more time on average to create the records.

Two of the others, the UNT Theses and Dissertation Collection (UNTETD) and the UNT Photography Collection (UNTPC), involved an amount of copy cataloging for the creation of the metadata, either from existing MARC records or local finding aids.  The John J. Herrera Papers (JJHP) involved, I believe, working with an existing finding aid, and I know that there was a two-step process of creating the record and then publishing it as unhidden in a different event, which lowered the average time considerably.  I don’t know enough about the Texas Cultures Online (TCO) work in 2014 to be able to comment there.

On the other end of the spectrum you have collections like ABCM, ODNP, OKPCP, and TDNP, which were projects that averaged a much shorter amount of time on records.  For these there were many small edits to the records that were typically completed one field at a time.  For some of these it might have just involved fixing a consistent typo, adding the record to a collection, or hiding or un-hiding it from public view.

This raises a question for me: is it possible to detect the “kind” of edits that are being made based on their average edit times?  That’s something to look at.

Partner Institutions

And now the ten partner institutions that had the most metadata edit events.

| Partner Code | Partner Name | min | max | edit events | duration sum | mean | stddev |
|---|---|---|---|---|---|---|---|
| UNTGD | UNT Libraries Government Documents Department | 2 | 2,099 | 21,342 | 5,385,000 | 252.32 | 356.43 |
| OKHS | Oklahoma Historical Society | 4 | 2,098 | 10,167 | 1,590,498 | 156.44 | 279.95 |
| UNTA | UNT Libraries Special Collections | 3 | 2,099 | 9,235 | 2,664,036 | 288.47 | 362.34 |
| UNT | UNT Libraries | 2 | 2,098 | 6,755 | 2,051,851 | 303.75 | 458.03 |
| PCJB | Private Collection of Jim Bell | 3 | 2,100 | 5,335 | 2,576,696 | 482.98 | 460.03 |
| HMRC | Houston Metropolitan Research Center at Houston Public Library | 3 | 2,095 | 5,127 | 1,397,368 | 272.55 | 345.62 |
| HPUL | Howard Payne University Library | 2 | 1,860 | 4,528 | 544,420 | 120.23 | 113.97 |
| UNTCVA | UNT College of Visual Arts + Design | 4 | 2,098 | 4,169 | 1,015,882 | 243.68 | 364.92 |
| HSUL | Hardin-Simmons University Library | 3 | 2,020 | 2,706 | 658,600 | 243.39 | 361.66 |
| HIGPL | Higgins Public Library | 2 | 1,596 | 1,935 | 131,867 | 68.15 | 118.5 |

Again presented as a simple chart.

Average edit duration per partner.

It is easy to see the difference between the Private Collection of Jim Bell (PCJB), with an average of 482 seconds or roughly 8 minutes per edit, and the Higgins Public Library (HIGPL), which had an average of 68 seconds, or just over one minute.  In the first case, with the Private Collection of Jim Bell (PCJB), we were active in creating records for the first time for these items, and the average of eight minutes seems to track with what one would imagine it takes to create a metadata record for a photograph.  The Higgins Public Library (HIGPL) collection is a newspaper collection that had a single change in the physical description made to all of the items in that partner’s collection.  Other partners fall between these two extremes and have similar characteristics, with the lower edit averages happening for partners whose content is either being edited in a small way, or hidden or un-hidden from view.
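For readers who want to check the seconds-to-minutes conversions quoted in this post, here is a small sketch. The helper name `as_minutes_seconds` is made up for this example, and it truncates fractional seconds rather than rounding them.

```python
def as_minutes_seconds(seconds):
    """Convert a duration in seconds to (minutes, seconds), dropping fractions."""
    minutes, remainder = divmod(int(seconds), 60)
    return minutes, remainder


# Mean edit durations (in seconds) taken from the partner table above.
for label, mean_seconds in [("PCJB", 482.98), ("HIGPL", 68.15)]:
    minutes, secs = as_minutes_seconds(mean_seconds)
    print(f"{label}: {minutes} min {secs} s")
# PCJB: 8 min 2 s
# HIGPL: 1 min 8 s
```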

Resource Type

The final way we will slice the data for this post is by looking at the stats for the top ten resource types.

| resource type | min | max | count | sum | mean | stddev |
|---|---|---|---|---|---|---|
| image_photo | 2 | 2,100 | 30,954 | 7,840,071 | 253.28 | 356.43 |
| text_newspaper | 2 | 2,084 | 11,546 | 1,600,474 | 138.62 | 207.3 |
| text_leg | 3 | 2,097 | 8,604 | 1,050,103 | 122.05 | 172.75 |
| text_patent | 2 | 2,099 | 6,955 | 3,747,631 | 538.84 | 466.25 |
| physical-object | 2 | 2,098 | 5,479 | 1,102,678 | 201.26 | 326.21 |
| text_etd | 5 | 2,098 | 4,713 | 1,603,938 | 340.32 | 474.4 |
| text | 3 | 2,099 | 4,196 | 1,086,765 | 259 | 349.67 |
| text_letter | 4 | 2,095 | 4,106 | 1,118,568 | 272.42 | 326.09 |
| image_map | 3 | 2,034 | 3,480 | 673,707 | 193.59 | 354.19 |
| text_report | 3 | 1,814 | 3,339 | 465,168 | 139.31 | 145.96 |

Average edit duration for the top ten resource types

The resource type that really stands out in this graph is text_patent, at 538 seconds per record.  These items belong to the Texas Patent Collection. They were loaded into the system with very minimal records, and we have been working to add new metadata to these resources.  The roughly nine minutes per record seems to be very standard for the amount of work that is being done with the records.

The text_leg collection is one that I wanted to take another quick look at.

If we calculate the statistics for the users that edited records in this collection we get the following data.

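| username | min | max | count | sum | mean | stddev |
|---|---|---|---|---|---|---|
| bmonterroso | 3 | 1,825 | 890 | 85,254 | 95.79 | 163.25 |
| htarver | 9 | 23 | 5 | 82 | 16.4 | 5.64 |
| mjohnston | 3 | 1,909 | 3,309 | 329,585 | 99.6 | 62.08 |
| mphillips | 5 | 33 | 30 | 485 | 16.17 | 7.68 |
| rsittel | 3 | 1,436 | 654 | 22,168 | 33.9 | 88.71 |
| tharden | 3 | 2,097 | 1,143 | 213,817 | 187.07 | 241.2 |
| thuang | 4 | 1,812 | 2,573 | 398,712 | 154.96 | 227.7 |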

Again you really see it with the graph.

Average edit duration for users who edited records that were the text_leg resource type

In this you see that there were a few users (htarver, mphillips, rsittel) who brought down the average duration because they had very quick edits, while the rest of the editors averaged either right around 100 seconds per edit (bmonterroso, mjohnston) or roughly two and a half to three minutes per edit (thuang, tharden).
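To put a number on that effect, the sketch below recomputes the pooled text_leg mean from the per-user sums and counts in the table above, then again with the three quick editors left out. The figures come from the table; the function and variable names are just for illustration.

```python
# (sum of durations in seconds, number of edit events) per user,
# copied from the text_leg table above.
text_leg = {
    "bmonterroso": (85_254, 890),
    "htarver": (82, 5),
    "mjohnston": (329_585, 3_309),
    "mphillips": (485, 30),
    "rsittel": (22_168, 654),
    "tharden": (213_817, 1_143),
    "thuang": (398_712, 2_573),
}


def pooled_mean(rows):
    """Mean duration across all edits: total seconds divided by total events."""
    rows = list(rows)  # materialize so the rows can be iterated twice
    total_seconds = sum(seconds for seconds, _ in rows)
    total_events = sum(events for _, events in rows)
    return total_seconds / total_events


quick_editors = {"htarver", "mphillips", "rsittel"}

all_mean = pooled_mean(text_leg.values())
without_quick = pooled_mean(
    row for user, row in text_leg.items() if user not in quick_editors
)

print(f"all editors:           {all_mean:.2f} s")       # 122.05 s
print(f"without quick editors: {without_quick:.2f} s")  # 129.80 s
```

The per-user sums and counts add up to the text_leg totals in the resource type table (1,050,103 seconds over 8,604 events). Leaving out the three quick editors raises the pooled mean by a little under eight seconds.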

I think that there is more to do with these numbers. Calculating the average total duration for a given metadata record in the system, as edits are performed on it, will be something of interest for a later post. So check back for the next post in this series.

As always feel free to contact me via Twitter if you have questions or comments.

Galen Charlton: Preserving the usefulness of the Hugo Awards as a selection tool for libraries

Tue, 2015-04-07 11:21

The Hugo Awards have been awarded by the World Science Fiction Convention for decades, and serve to recognize the works of authors, editors, directors – fans and professionals – in the genres of science fiction and fantasy.  The Hugos are unique in being a fan-driven award that has as much process – if not more – as juried awards.

That process has two main steps.  First, there’s a nomination period where members of Worldcon select works to appear on the final ballot. Second, members of the upcoming Worldcon vote on the final ballot and the awards are given out at the convention.

Typically, rather more folks vote on the final ballot than nominate – and that means that small, organized groups of people can unduly influence the nominations.  However, there have been surprisingly few attempts to actually do that.

Until this year.

Many of the nominations this year match the slates of two groups, the “Sad Puppies” and the “Rabid Puppies.”  Not only that, some of the categories contain nothing but Puppy nominations.

The s.f. news site File 770 has a comprehensive collection of back-and-forth about the matter, but suffice it to say that the Puppy slates have a primarily political motivation – and one, in the interests of full disclosure, that I personally despise.

There are a lot of people saying smart things about the situation, so I’ll content myself with the following observation:

Slate nominations and voting destroy the utility of the Hugo Award lists for librarians who select science fiction and fantasy.

Why? Ideally, the Hugo process ascertains the preferences of thousands of Worldcon members to arrive at a general consensus of science fiction and fantasy that is both good and generally appealing.  As it happens, that’s a pretty useful starting point for librarians trying to round out collections or find new authors that their patrons might like – particularly for those librarians who are not themselves fans of the genre.

However, should slate voting become a successful tactic, the Hugo Awards are in danger of ending up simply reflecting which factions in fandom are best able to game the system.  The results of that… are unlikely to be all that useful for librarians.

Here’s my suggestion for librarians who are fans of science fiction and fantasy and who want to help preserve a collection development tool: get involved.  In particular:

  1. Join Worldcon. A $40 supporting membership suffices to get voting privileges.
  2. Vote on the Hugos this year. I won’t tell you who to vote for, but if you agree with me that slate nominations are a problem, consider voting accordingly.
  3. Next year, participate in the nomination process. Don’t participate in nomination slates; instead, nominate those works that you think are worthy of a Hugo – full stop.