
Equinox Software: Evergreen 2012: ownership and interdependence

planet code4lib - Fri, 2016-08-26 13:53

“Cats that Webchick is herding” by Kathleen Murtagh on Flickr (CC-BY)

A challenge common to any large project is, of course, herding the cats. The Evergreen project has pulled off a number of multi-year projects, including completely replacing the public catalog interface, creating acquisitions and serials modules from scratch, creating a kid’s catalog, writing Evergreen’s manual, and instituting a unit and regression testing regime. As we speak, we’re in the middle of a project to replace the staff client with a web-based staff interface.

All of this happened — and continues to happen — in a community where there’s little room for anybody to dictate to another community member to do anything in particular. We have no dictator, benevolent or otherwise; no user enhancement committee; no permanent staff employed by the Evergreen Project.

How does anything get done? By the power of Voltron interdependence.

In 2011, Evergreen became a member project of the Software Freedom Conservancy, representing a culmination of the efforts started in 2010 (as Grace mentioned).

As a member project of Conservancy, Evergreen receives several benefits: Conservancy holds the project’s money, negotiates venue contracts for the annual conference and hack-a-way, and holds the project’s trademark. However, Conservancy does not run the project — nor do they want to.

As part of joining Conservancy, the Evergreen Project established an Oversight Board, and in 2012, I had the privilege of beginning a term as chair of the EOB. The EOB is Conservancy’s interface with the Evergreen Project, and the EOB is the group that is ultimately responsible for making financial decisions.

Aha! You might think to yourself: “So, if the Evergreen Project doesn’t have a dictator in the mold of Linus Torvalds, it has elected oligarchs in the form of the Oversight Board!”

And you would be wrong. The Evergreen Oversight Board does not run the project either. The EOB does not appoint the release managers; it does not dictate who is part of the Documentation Interest Group; it does not mandate any particular sort of QA.

What does the EOB do? In part, it does help establish policies for the entire project; for example, Evergreen’s decision to adopt a code of conduct in 2014 arose from the suggestions and actions of EOB members, including Kathy Lussier and Amy Terlaga. It also, in conjunction with Conservancy, helps to protect the trademark.

The trademark matters. It represents a key piece of collective ownership, ownership that is in the hands of the community via a nonprofit, disinterested organization. Evergreen is valuable, not just as a tool that libraries can use to help patrons get access to library resources, but in part as something that various institutions have built successful services (commercial or otherwise) on.  If you take nothing else away from this post, take this: if you plan to launch an open source project for the benefit of libraries, give a thought to how the trademark should be owned and managed.  The consequences of not doing so can end up creating a huge distraction from shipping excellent software… or worse.

But back to the question of governance: how does the day to day work of writing documentation, slinging code, updating websites, training new users, seeking additional contributors, unruffling feathers, and so forth get done? By constant negotiation in a sea of interdependence. This is complicated, but not chaotic. There are plenty of contracts helping protect the interests of folks contributing to and using Evergreen: contracts with non-profit and for-profit service providers like Equinox; contracts to join consortia; contracts to pool money together for a specific project. There are also webs of trust and obligation: a developer can become a committer by showing that they are committed to improving Evergreen and have a track record of doing so successfully.

Governance is inescapable in any project that has more than one person; it is particularly important in community-based open source projects. Evergreen has benefited from a lot of careful thought about formal and informal rules and lines of communication…. and will continue to do so.

— Galen Charlton, Added Services and Infrastructure Manager

This is the seventh in our series of posts leading up to Evergreen’s Tenth birthday.

OCLC Dev Network: Leveraging Client-Side API and Linked Data Support

planet code4lib - Fri, 2016-08-26 13:00

See an example of how client-side support in APIs and Linked Data create opportunities for innovative applications.

FOSS4Lib Recent Releases: DIVA.js - 5.0

planet code4lib - Fri, 2016-08-26 12:55

Last updated August 26, 2016. Created by Peter Murray on August 26, 2016.

Package: DIVA.js
Release Date: Thursday, August 25, 2016

LibUX: Improve your UX with Google Analytics

planet code4lib - Fri, 2016-08-26 11:45

Michael Beasley — author of Practical Web Analytics for User Experience — shares really quite useful tips for using Google Analytics to infer intent.

Evergreen ILS: Evergreen 2.9.7 and 2.10.6 released

planet code4lib - Fri, 2016-08-26 00:59

We are pleased to announce the release of Evergreen 2.9.7 and 2.10.6, both bugfix releases.

Evergreen 2.9.7 fixes the following issues:

  • The claims never checked out counter on the patron record is now incremented correctly when marking a lost loan as claims-never-checked-out.
  • When a transit is canceled, the copy’s status is changed only if its status was previously “In Transit”.
  • Retrieving records with embedded holdings via SRU and Z39.50 is now faster.
  • The hold status message in the public catalog now uses better grammar.
  • The error message displayed when a patron attempts to place a hold but is prevented from doing so due to policy reasons is now more likely to be useful.
  • The public catalog now draws the edition statement only from the 250 field; it no longer tries to check the 534 and 775 fields.
  • Embedded microdata now uses “offeredBy” rather than “seller”.
  • The ContentCafe added content plugin now handles the “fake” ISBNs that Baker and Taylor assigns to media items.
  • Attempting to renew a rental or deposit item in the public catalog no longer causes an internal server error.
  • Various format icons now have transparent backgrounds (as opposed to white).
  • The staff client will no longer wait indefinitely for Novelist to supply added content, improving its responsiveness.
  • A few additional strings are now marked as translatable.

Evergreen 2.10.6 fixes the same issues fixed in 2.9.7, and also fixes the following:

  • Those stock Action Trigger event definitions that send email will now include a Date header.
  • Prorating invoice charges now works again.
  • A performance issue with sorting entries on the public catalog circulation history page is fixed.
  • Various style and responsive design improvements are made to the circulation and holds history pages in the public catalog.
  • The public catalog holds history page now indicates if a hold had been fulfilled.

Evergreen 2.10.6 also includes updated translations. In particular, Spanish has received a huge update with over 9,000 new translations, Czech has received a sizable update of over 800 translations, and additional smaller updates have been added for Arabic, French (Canada), and Armenian.

Please visit the downloads page to retrieve the server software and staff clients.

David Rosenthal: Evanescent Web Archives

planet code4lib - Thu, 2016-08-25 18:00
Below the fold, discussion of two articles from last week about archived Web content that vanished.

At Urban Milwaukee Michail Takach reports that Journal Sentinel Archive Disappears:
Google News Archive launched [in 2008] with ambitious plans to scan, archive and release the world’s newspapers in a single public access database. ... When the project abruptly ended three years later, the project had scanned over a million pages of news from over 2,000 newspapers. Although nobody is entirely sure why the project ended, Google News Archive delivered an incredible gift to Milwaukee: free digital access to more than a century’s worth of local newspapers.
But now:
on Tuesday, August 16, the Milwaukee Journal, Milwaukee Sentinel, and Milwaukee Journal Sentinel listings vanished from the Google News Archive home page. This change came without any advance warning and still has no official explanation. The result for Takach is:
For years, I’ve bookmarked thousands of articles and images for further exploration at a later date. In one lightning bolt moment, all of my Google News Archive bookmarks went from treasure to trash.
To be fair, this doesn't appear to be another case of Google abruptly canceling a service:
“Google News Archive no longer has permission to display this content.”
According to the Milwaukee Journal Sentinel:
“We have contracted with a new vendor (Newsbank.) It is unclear when or if the public will have access to the full inventory that was formerly available on Google News Archive.”
The owner of the content arbitrarily decided to vanish it.

At U.S. News & World Report, Steven Nelson's “Wayback Machine Won’t Censor Archive for Taste, Director Says After Olympics Article Scrubbed” is an excellent, detailed and even-handed look at the issues raised for the Internet Archive when the Daily Beast's:
straight reporter created a gay dating profile and reported the weights, athletic events and nationalities of Olympians who contacted him, including those from "notoriously homophobic" countries.
As furor spread last week, the Daily Beast revised and then retracted the article, sending latecomers to the controversy to the Wayback Machine. The Internet Archive has routine processes that make content it has collected inaccessible, for example in response to DMCA takedown notices. It isn't clear exactly what happened in this case. Mark Graham is quoted:
“The page we’re talking about here was removed from the Wayback Machine out of a concern for safety and that’s it.”... Graham was not immediately able to think of a similar safety-motivated removal and declined to say if the Internet Archive retains a non-public copy. In fact, he says he has no proof, just circumstantial evidence, that the article ever was in the Wayback Machine.
I would endorse Chris Bourg's stance on this issue:
Chris Bourg, director of libraries at the Massachusetts Institute of Technology, says the matter is a "a tricky situation where librarian/archivists values of privacy and openness come in to conflict" and says in an email the article simply could be stored in non-public form for as long as necessary.

"My personal opinion is that we should always look for answers that cause the least harm, which in this case would be to dark archive the article; and keep it archived for as long as needed to best protect the gay men who might otherwise be outed," she says. "That’s a difficult thing to do, and is no guarantee that the info won’t be released and available from other sources; but I think archivists/librarians have special responsibilities to the subjects in our collections to 'do no harm'."
These two stories bring up four points to consider:
  • The Internet Archive is the most-used, but only one among a number of Web archives, which will naturally have different policies. Portals to the archived Web that use Memento to aggregate their content could well find content the Wayback Machine had suppressed in other archives.
  • Copyright enables censorship. Anything on the public Web, or in public Web archives, can be rendered inaccessible without notice by the use or abuse of copyright processes, such as the DMCA takedown process.
  • Just because archived Web resources are in the custody of a major company, such as Google, or even what we may now thankfully call a major institution, the Internet Archive, does not guarantee them permanence.
  • Thus, scholars such as Takach are faced with a hard choice: either risk losing access without notice to the resources on which their work is based, or ignore the law and maintain, on their own equipment, a personal archive of all those resources.
While not specifically about Web archives, emptywheel's account of the removal of the Shadow Brokers files from GitHub, Reddit and Tumblr, and Roxane Gay's The Blog That Disappeared about Google's termination of Dennis Cooper's account, show that one cannot depend on what services such as these say in their Terms of Service.

LITA: New Titles in the LITA Guide Series

planet code4lib - Thu, 2016-08-25 17:55

A new relationship between LITA and Rowman and Littlefield publishers kicks off with the announcement of seven exciting recent and upcoming titles on library technology. The LITA Guide Series books from Rowman and Littlefield publishers contain practical, up-to-date, how-to information, and are usually under 100 pages. Proposals for new titles can be submitted to the Acquisitions editor using this link.

LITA members receive a 20% discount on all the titles. To get that discount, use promotion code RLLITA20 when ordering from the Rowman and Littlefield LITA Guide Series web site.


Here are the current new LITA Guide Series titles:

Integrating LibGuides into Library Websites
Edited by Aaron W. Dobbs and Ryan L. Sittler (October 2016)

Innovative LibGuides Application: Real World Examples
Edited by Aaron W. Dobbs and Ryan L. Sittler (October 2016)

Data Visualization: A Guide to Visual Storytelling for Libraries
Edited by Lauren Magnuson (September 2016)

Mobile Technologies in Libraries
Ben Rawlins (September 2016)

Library Service Design: A LITA Guide to Holistic Assessment, Insight, and Improvement
Joe J. Marquez and Annie Downey (July 2016)

The Librarian’s Introduction to Programming Languages
Edited by Beth Thomsett-Scott (June 2016)

Digitizing Flat Media: Principles and Practices
Joy M. Perrin (December 2015)

LITA publications help to fulfill its mission to educate, serve and reach out to its members, other ALA members and divisions, and the entire library and information community through its publications, programs and other activities designed to promote, develop, and aid in the implementation of library and information technology.

Open Knowledge Foundation: OpenTrials launch date + Hack Day

planet code4lib - Thu, 2016-08-25 13:26

Exciting news! OpenTrials, a project in which Open Knowledge is developing an open, online database of information about the world’s clinical research trials, will officially launch its beta on Monday 10th October 2016 at the World Health Summit in Berlin. After months of behind-the-scenes meeting, planning, and developing, we’re all really excited about demoing OpenTrials to the world and announcing how to access and use the site!

The launch will take place at the ‘Fostering Open Science in Global Health’ workshop, with OpenTrials being represented by our Community Manager, Ben Meghreblian. The workshop will be a great opportunity to talk about the role of open data, open science, and generally how being open can bring improvements in medicine and beyond!

As the workshop’s theme is public health emergencies, we’ll also be demoing Ebola Trials Tracker, another OpenTrials project showing how long it takes for the results of Ebola trials to be made available.

If you’ll be attending the conference or the workshop, we’d love to meet you – please do get in touch and let us know.

Hack Day

If that wasn’t enough, we also have a confirmed date and location for the OpenTrials Hack Day – it will take place on Saturday 8th October at the German office of Wikimedia in Berlin.

We’re inviting people from a range of backgrounds. So, if you’re a developer, data scientist, health technologist, or open data advocate, or otherwise interested in health, medicine, and clinical trials, come along and learn more about the data that powers OpenTrials, how it’s structured, and how to use our API to search the OpenTrials database or build applications using the data.

On the day our technical lead and a domain expert will be on hand to explain the data and facilitate the day – we’re really looking forward to seeing what clever hacks and mini-projects you’ll create.

For those of you who have already asked, we’ll be releasing documentation on the OpenTrials API and database soon, but meanwhile if you’re interested in the event you’ll find more details on the OpenTrials Eventbrite page, or you can register quickly below.

OpenTrials is funded by The Laura and John Arnold Foundation and directed by Dr. Ben Goldacre, an internationally known leader on clinical transparency.

Twitter: @opentrials


Karen Coyle: Catalogs and Content: an Interlude

planet code4lib - Thu, 2016-08-25 03:02
This entire series is available as a single file on my web site.

"Editor's note. Providing subject access to information is one of the most important professional services of librarians; yet, it has been overshadowed in recent years by AACR2, MARC, and other developments in the bibliographic organization of information resources. Subject access deserves more attention, especially now that results are pouring in from studies of online catalog use in libraries."
American Libraries, Vol. 15, No. 2 (Feb., 1984), pp. 80-83
Having thought and written about the transition from card catalogs to online catalogs, I began to do some digging in the library literature, and struck gold. In 1984, Pauline Atherton Cochrane, one of the great thinkers in library land, organized a six-part "continuing education" series to bring librarians up to date on the thinking regarding the transition to new technology. (Dear ALA - please put these together into a downloadable PDF for open access. It could make a difference.) What is revealed here is both stunning and disheartening, as the quote above shows; in terms of catalog models, very little progress has been made, and we are still spending more time organizing atomistic bibliographic data while ignoring subject access.

The articles are primarily made up of statements by key library thinkers of the time, many of whom you will recognize. Some responses contradict each other, others fall into familiar grooves. Library of Congress is criticized for not moving faster into the future, much as it is today, and yet respondents admit that the general dependency on LC makes any kind of fast turn-around of changes difficult. Some of the desiderata have been achieved, but not the overhaul of subject access in the library catalog.

The Background
If you think that libraries moved from card catalogs to online catalogs in order to serve users better, think again. Like other organizations that had a data management function, libraries in the late 20th century were reaching the limits of what could be done with analog technology. In fact, as Cochrane points out, by the mid-point of that century libraries had given up on the basic catalog function of providing cross references from unused to used terminology, as well as from broader and narrower terms in the subject thesaurus. It simply wasn't possible to keep up with these, not to mention that although the Library of Congress and service organizations like OCLC provided ready-printed cards for bibliographic entries, they did not provide the related reference cards. What libraries did instead (and I remember this from my undergraduate years) was place copies of the "Red Book" near the card catalog. This was the printed Library of Congress Subject Headings list, which by my time was in two huge volumes, and, yes, was bound in red. Note that this was the volume intended for cataloging librarians who were formulating subject headings for their collections. It was never intended for the end-users of the catalog. The notation ("x", "xx", "sa") was far from intuitive. In addition, for those users who managed to follow the references, it pointed them to the appropriate place in LCSH, but not necessarily in the catalog of the library in which they were searching. Thus a user could be sent to an entry that simply did not exist.

The "Red Book" today
From my own experience, when we brought up the online catalog at the University of California, the larger libraries had for years had difficulty keeping the card catalog up to date. The main library at the University of California at Berkeley regularly ran from 100,000 to 150,000 cards behind in filing into the catalog, which filled two enormous halls. That meant that a book would be represented in the catalog about three months after it had been cataloged and shelved. For a research library, this was a disaster. And Berkeley was not unusual in this respect.

Computerization of the catalog was both a necessary practical solution and a kind of holy grail. At the time that these articles were written, only a few large libraries had an online catalog, and that catalog represented only a recent portion of the library's holdings. (Retrospective conversion of the older physical card catalog to machine-readable form came later, culminating in the 1990's.) Abstracting and indexing databases (DIALOG, PRECIS, and others) had preceded libraries in automating, and these gave librarians their first experience in searching computerized bibliographic data.

This was the state of things when Cochrane presented her 6-part "continuing education" series in American Libraries.

Subject Access
The series of articles was stimulated by an astonishingly prescient article by Marcia Bates in 1977. In that article she articulates both concerns and possibilities that, quite frankly, we should all take to heart today. In Lesson 3 of Cochrane's articles, Bates is quoted from 1977 saying:
"...with automation, we have the opportunity to introduce many access points to a given book. We can now use a subject approach... that allows the naive user, unconscious of and uninterested in the complexities of synonymy and vocabulary control, to blunder on to desired subjects, to be guided, without realizing it, by a redundant but carefully controlled subject access system." and
"And now is the time to change -- indeed, with MARC already so highly developed, past time. If we simply transfer the austerity-based LC subject heading approach to expensive computer systems, then we have used our computers merely to embalm the constraints that were imposed on library systems back before typewriters came into use!"
This emphasis on subject access was one of the stimuli for the AL lessons. In the early 1980's, studies done at OCLC and elsewhere showed that over 50% of the searches being done in the online catalogs of that day were subject searches, even those going against title indexes or mixed indexes. (See footnotes to Lesson 3.) Known item searching was assumed to be under control, but subject searching posed significant problems. Comments in the article include:
"...we have not yet built into our online systems much of the structure for subject access that is already present in subject cataloging. That structure is internal and known by the person analyzing the work; it needs to be external and known by the person seeking the work."
"Why should a user ever enter a search term that does not provide a link to the syndetic apparatus and a suggestion about how to proceed?"
Interestingly, I don't see that any of these problems has been solved in today's systems.

As a quick review, here are some of the problems, some proposed solutions, and some hope for future technologies that are presented by the thinkers that contributed to the lessons.

Problems noted
Many problems were surfaced, some with fairly simple solutions, others that we still struggle with.
  • LCSH is awkward, if not nearly unusable, both for its vocabulary and for the lack of a true hierarchical organization
  • Online catalogs' use of LCSH lacks syndetic structure (see, see also, BT, NT). This is true not only for display but also in retrieval: a search on a broader term does not retrieve items with a narrower term (which would be logical to at least some users)
  • Libraries assign too few subject headings
  • For the first time, some users are not in the library while searching so there are no intermediaries (e.g. reference librarians) available. (One of the flow diagrams has a failed search pointing to a box called "see librarian" something we would not think to include today.)
  • Lack of a professional theory of information seeking behavior that would inform systems design. ("Without a blueprint of how most people want to search, we will continue to force them to search the way we want to search." Lesson 5)
  • Information overload, aka overly large results, as well as too few results on specific searches

Proposed solutions
Some proposed solutions were mundane (add more subject headings to records) while others would require great disruption to the library environment.
  • Add more subject headings to MARC records
  • Use keyword searching, including keywords anywhere in the record.
  • Add uncontrolled keywords to the records.
  • Make the subject authority file machine-readable and integrate it into online catalogs.
  • Forget LCSH, instead use non-library bibliographic files for subject searching, such as A&I databases.
  • Add subject terms from non-library sources to the library catalog, and/or do (what today we call) federated searching
  • LCSH must provide headings that are more specific as file sizes and retrieved sets grow (in the document, a retrieved set of 904 items was noted with an exclamation point)

Future thinking
As is so often the case when looking to the future, some potential technologies were seen as solutions. Some of these are still seen as solutions today (c.f. artificial intelligence), while others have been achieved (storage of full text).
  • Full text searching, natural language searches, and artificial intelligence will make subject headings and classification unnecessary
  • We will have access to back-of-the-book indexes and tables of contents for searching, as well as citation indexing
  • Multi-level systems will provide different interfaces for experts and novices
  • Systems will be available 24x7, and there will be a terminal in every dorm room
  • Systems will no longer need to use stopwords
  • Storage of entire documents will become possible
End of Interlude
Although systems have allowed us to store and search full text, to combine bibliographic data from different sources, and to deliver world-wide, 24x7, we have made almost no progress in the area of subject access. There is much more to be learned from these articles, and it would be instructive to do an in-depth comparison of them to where we are today. I highly recommend reading them; each is only a few pages long.

----- The Lessons -----
*Modern Subject Access in the Online Age: Lesson 1
by Pauline Atherton Cochrane
Source: American Libraries, Vol. 15, No. 2 (Feb., 1984), pp. 80-83
Stable URL:

*Modern Subject Access in the Online Age: Lesson 2
Pauline A. Cochrane
Source: American Libraries, Vol. 15, No. 3 (Mar., 1984), pp. 145-148, 150
Stable URL:

*Modern Subject Access in the Online Age: Lesson 3
Author(s): Pauline A. Cochrane, Marcia J. Bates, Margaret Beckman, Hans H. Wellisch, Sanford Berman, Toni Petersen, and Stephen E. Wiberley, Jr.
Source: American Libraries, Vol. 15, No. 4 (Apr., 1984), pp. 250-252, 254-255
Stable URL:

*Modern Subject Access in the Online Age: Lesson 4
Author(s): Pauline A. Cochrane, Carol Mandel, William Mischo, Shirley Harper, Michael Buckland, Mary K. D. Pietris, Lucia J. Rather and Fred E. Croxton
Source: American Libraries, Vol. 15, No. 5 (May, 1984), pp. 336-339
Stable URL:

*Modern Subject Access in the Online Age: Lesson 5
Author(s): Pauline A. Cochrane, Charles Bourne, Tamas Doczkocs, Jeffrey C. Griffith, F. Wilfrid Lancaster, William R. Nugent and Barbara M. Preschel
Source: American Libraries, Vol. 15, No. 6 (Jun., 1984), pp. 438-441, 443
Stable URL:

*Modern Subject Access In the Online Age: Lesson 6
Author(s): Pauline A. Cochrane, Brian Aveney and Charles Hildreth
Source: American Libraries, Vol. 15, No. 7 (Jul. - Aug., 1984), pp. 527-529
Stable URL:

DuraSpace News: KnowledgeArc Launches the NHS and FFI on DSpace

planet code4lib - Thu, 2016-08-25 00:00

By Michael Guthrie
We are pleased to welcome the NHS Derby Teaching Hospitals and the Norwegian Defence Research Establishment (FFI) to the KnowledgeArc managed, hosted DSpace platform.

Eric Lease Morgan: Limit to full text in VuFind

planet code4lib - Wed, 2016-08-24 20:16

This posting outlines how a “limit to full text” functionality was implemented in the “Catholic Portal’s” version of VuFind.

While there are many dimensions of the Catholic Portal, one of its primary components is a sort of union catalog of rare and infrequently held materials of a Catholic nature. This union catalog is comprised of metadata from MARC records, EAD files, and OAI-PMH data repositories. Some of the MARC records include URLs in 856$u fields. These URLs point to PDF files that have been processed with OCR. The Portal’s indexer has been configured to harvest the PDF documents when it comes across them. Once harvested, the OCR text is extracted from the PDF file, and the resulting text is added to the underlying Solr index. The values of the URLs are saved to the Solr index as well. Almost by definition, all of the OAI-PMH content indexed by the Portal is full text; almost all of the OAI-PMH content includes pointers to images or PDF documents.

Consequently, if a reader wanted to find only full text content, then it would be nice to: 1) do a search, and 2) limit to full text. And this is exactly what was implemented. The first step was to edit Solr’s definition of the url field. Specifically, its “indexed” attribute was changed from false to true. Trivial. Solr was then restarted.
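In a stock VuFind Solr schema, that change is a single attribute flip in the url field definition in schema.xml. The exact field type and other attribute values below are a sketch and may differ between VuFind versions; the point is only the indexed attribute:

```xml
<!-- schema.xml: change indexed="false" to indexed="true" for the url field -->
<field name="url" type="string" indexed="true" stored="true" multiValued="true"/>
```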

The second step was to re-index the MARC content. When this is complete, the reader is able to search the index for URL content — “url:*”. In other words, find all records whose URL equals anything.

The third step was to understand that all of the local VuFind OAI-PMH identifiers have the same shape. Specifically, they all include the string “oai”. Consequently, the very astute reader could find all OAI-PMH content with the following query: “id:*oai*”.

The fourth step was to turn on a VuFind checkbox option found in facets.ini. Specifically, the “[CheckboxFacets]” section was augmented to include the following line:

id:*oai* OR url:* = “Limit to full text”

When this was done a new facet appeared in the VuFind interface.

Finally, the whole thing comes to fruition when a person does an initial search. The results are displayed, and the facets include a limit option. Upon selection, VuFind searches again, but limits the query by “id:*oai* OR url:*” — only items that have URLs or come from OAI-PMH repositories. Pretty cool.

Kudos go to Demian Katz for outlining this process. Very nice. Thank you!

LITA: Jobs in Information Technology: August 24, 2016

planet code4lib - Wed, 2016-08-24 18:51

New vacancy listings are posted weekly on Wednesday at approximately 12 noon Central Time. They appear under New This Week and under the appropriate regional listing. Postings remain on the LITA Job Site for a minimum of four weeks.

New This Week

American Institute for Radiologic Pathology, Medical Archivist / Case manager, Silver Spring, MD

Visit the LITA Job Site for more available jobs and for information on submitting a job posting.

Andromeda Yelton: An open letter to Heather Bresch

planet code4lib - Wed, 2016-08-24 13:49

Dear Heather Bresch,

You lived in Morgantown. I did, too: born and raised. My parents are retired from the university you attended. My elementary school took field trips to Mylan labs. They were shining, optimistic.

You’re from West Virginia. I am, too. This means we both know something of the coal industry that has both sustained and destroyed our home. You know, as I do, how many miners have been killed in explosions: trapped underground when a pocket of methane ignites. We both know that miners long carried safety lamps: carefully shielded but raw flames that would go out when the oxygen went too low, a warning to get away — if they had not first exploded, as open flames around methane do. Perhaps you know, as I only recently learned, that miners were once required to buy their own safety lamps: so when safer ones came out, ones that would only warn without killing you first, miners did not carry them. They couldn’t afford to. They set probability against their lives, went without the right equipment, and sometimes lost, and died.

I’m a mother. You are, too. I don’t know if your children carry medication for life-threatening illnesses; I hope you have not had to face that. I have. In our case it’s asthma, not allergies, and an inhaler, not an Epi-Pen. It’s a $20 copay with our insurance and lasts for dozens of doses. It doesn’t stop asthma attacks once they start — my daughter’s asthma is too severe for that — but sometimes it prevents them. And when it does not, it still helps: we spend two days in the hospital instead of five; we don’t go to the ICU. (Have you ever been with your child in a pediatric ICU? It is the most miraculous, and the worst, place on earth.)

Most families can find their way to twenty dollars. Many cannot find six hundred. They’ll go without, and set probability against their children’s lives. Rich children will live; poor children will sometimes lose, and die.

I ask you to reconsider.


Andromeda Yelton

Equinox Software: Year 2010 : Sine Qua Non

planet code4lib - Wed, 2016-08-24 13:43

This is the fifth in our series of posts leading up to Evergreen’s Tenth birthday.  

I often tell people I hire that when you start a new job the first month is the honeymoon period. At month three you are panicking and possibly wondering why you thought you could do this. At six months you realize you’ve actually got the answers and at twelve months it’s like you never worked anywhere else. For me, 2010 represented months six through eighteen of my employment with Equinox and it was one of the most difficult, rewarding, and transformative years of my career. Coincidentally, it was also an incredibly transformative year for Evergreen.

In early 2010, Evergreen 1.6 was planned and released on schedule thanks to contributing efforts from the usual suspects back at that time. Bug fixes and new development were being funded or contributed by PINES, Conifer, Mohawk College, Evergreen Indiana, Calvin College, SAGE, and many others in the community. Somewhere in the midst of the ferocious adoption rate and evolution of 2010, Evergreen quietly and without fanfare faced (and passed) its crucible. Instead of being thrown off stride, this amazingly determined community not only met the challenge, but deftly handled the inevitable friction that was bound to arise as the community grew.

In late August of 2010 KCLS went live on a beta version of Evergreen 2.0 after just over a year of intense and exhilarating development. It marked the beginning of another major growth spurt for Evergreen, including full support for Acquisitions and Serials, as well as the introduction of the template toolkit OPAC (or TPAC). I have nothing but positive things to say about the teams that worked to make that go-live a reality. KCLS and Equinox did amazing things together and, while not everything we did was as successful as we had envisioned, we were able to move Evergreen forward in a huge leap. More importantly, everyone involved learned a lot about ourselves and our organizations – including the community itself.

The community learned that we were moving from a small group of “insiders” and enthusiasts into a more robust and diverse community of users. This is, of course, natural and desirable for an open source project but the thing that sticks out in my mind is how quickly and easily the community adapted to rapid change. At the Evergreen Conference in 2010 a dedicated group met and began the process of creating an official governance structure for the Evergreen project. This meeting led to the eventual formation of the Evergreen Oversight Board and our current status as a member project of the Software Freedom Conservancy.

In the day-to-day of the Evergreen project I witnessed how the core principles of open source projects could shape a community of librarians. And I was proud to see how this community of librarians could contribute their core principles to strengthen the project and its broader community. We complement one another even as we share the most basic truths:
* The celebration of community
* The merit of the individual
* The empowerment of collaboration
* The belief that information should be free

Evergreen is special. More importantly, our community is special. And it’s special because behind each line of code there are dozens of people who contributed their time to create it. Each of those people brought with them their passion, their counter-argument, their insight, their thoughtfulness, and their sheer determination. And together, this community created something amazing. They made great things. They made mistakes. They learned. They adapted. They persevered. And those people behind those lines of code? They’re not abstractions. They are people I know and respect; people who have made indelible marks on our community. It’s Mike, Jason, Elizabeth, Galen, Kathy, Bill, Amy, Dan, Angela, Matt, Elaine, Ben, Tim, Sharon, Lise, Jane, Lebbeous, Rose, Karen, Lew, Joan, and too many others to name. They’re my community and when I think back on how much amazing transformation we’ve achieved in just one year, or ten years, I can’t wait to see what we do in the next ten.

– Grace Dunbar, Vice President

Open Knowledge Foundation: Open Knowledge Switzerland Summer 2016 Update

planet code4lib - Wed, 2016-08-24 10:46

The first half of 2016 was a very busy one for the Open Knowledge Swiss chapter. Between April and June alone, the chapter held 3 hackathons, 15 talks, 3 meetups, and 10 workshops. In this blog post we highlight some of these activities to update the Open Knowledge community on our chapter’s work.


Main projects

Our directors worked on relaunching the federal Open Government Data portal and its new online handbook. We gathered and published datasets and ran workshops in support of various hackdays – and we migrated and improved our web infrastructure with better support of the open Transport API (handling up to 1.7 million requests per day!).


Main events

We held our annual conference in June, ran energy-themed hackdays in April and ran an OpenGLAM hackathon in July. Additionally, we supported two smaller regional hackathons in the spring, and a meetup on occasion of Open Data Day.



Like other organisations in this space, our main challenge is redefining our manifesto and restructuring our operations to become a smoother running chapter that is more responsive to the needs of our members and community. This restructuring continues to be a challenge that we are learning from – and need to learn more about.



Our media presence and public identity continues to be stronger than ever. We are involved in a wide range of political and inter-organizational activities in support of diverse areas of openness, and in general we are finding that our collective voice is stronger and our messages better received everywhere we go.



We have had several retreats with the board to discuss changes in governance and to welcome new directors: Catherine Pugin, Martin Grandjean, and Alexandre Cotting.

We are primarily working on a better overall organizational structure to support our community and working groups; starting and igniting new initiatives will be the next step. Among them will be the launch of a business-oriented advocacy group called “Swiss Data Alliance”.


Looking ahead

We will soon announce a national program on food data, which includes hackdays and a funded follow-up/incubation phase for prototypes produced. And we are busy setting up a hackathon at the end of September with international scope and support called Hack for Ageing Well. Follow #H4AW for more info.

We are excited about upcoming cross-border events like #H4AW and Jugend Hackt, opening doors to development and research collaborations. Reach out through the Open Knowledge forums and we’ll do our best to connect you into the Swiss community!

LibUX: Helping users easily access content on mobile

planet code4lib - Wed, 2016-08-24 04:55

 Pages that show intrusive interstitials provide a poorer experience to users than other pages where content is immediately accessible. This can be problematic on mobile devices where screens are often smaller. To improve the mobile search experience, after January 10, 2017, pages where content is not easily accessible to a user on the transition from the mobile search results may not rank as highly.

I wonder, by their description, whether this describes exit-intent pop-ups OptinMonster made popular.

 Showing a popup that covers the main content, either immediately after the user navigates to a page from the search results, or while they are looking through the page.

One can hope.

Helping users easily access content on mobile

LibUX: A few things Brodie Austin learned doing usability tests on library websites

planet code4lib - Wed, 2016-08-24 04:43


 My #1 rule when it came to thinking about website usability was that no one was allowed to claim to know what “normal people” would think or do until we actually sat down with normal(ish) people.

So, you want to do usability testing on your library website

Galen Charlton: Visualizing the global distribution of Koha installations from Debian packages

planet code4lib - Wed, 2016-08-24 04:15

A picture is worth a thousand words:

Click to get larger image.

This represents the approximate geographic distribution of downloads of the Koha Debian packages over the past year. Data was taken from the Apache logs from, which MPOW hosts. I counted only completed downloads of the koha-common package, of which there were over 25,000.

Making the map turned out to be an opportunity for me to learn some Python. I first adapted a Python script I found on Stack Overflow to query and get the latitude and longitude corresponding to each of the 9,432 distinct IP addresses that had downloaded the package.

I then fed the results to OpenHeatMap. While that service is easy to use and is written with GPL3 code, I didn’t quite like the fact that the result is delivered via an Adobe Flash embed.  Consequently, I turned my attention to Plotly, and after some work, was able to write a Python script that does the following:

  1. Fetch the CSV file containing the coordinates and number of downloads.
  2. Exclude as outliers rows where a given IP address made more than 100 downloads of the package during the past year — there were seven of these.
  3. Truncate the latitude and longitude to one decimal place — we need not pester corn farmers in Kansas for bugfixes.
  4. Submit the dataset to Plotly with which to generate a bubble map.

Here’s the code:

#!/usr/bin/python
# adapted from example found at
import plotly.plotly as py
import pandas as pd

df = pd.read_csv('')
df.head()

# scale factor for the size of the bubble
scale = 3

# filter out rows where an IP address did more than
# one hundred downloads
df = df[df['value'] <= 100]

# truncate latitude and longitude to one decimal place
df['lat'] = df['lat'].map('{0:.1f}'.format)
df['lon'] = df['lon'].map('{0:.1f}'.format)

# sum up the 'value' column as 'total_downloads'
aggregation = {'value': {'total_downloads': 'sum'}}

# create a DataFrame grouping by the truncated coordinates
df_sub = df.groupby(['lat', 'lon']).agg(aggregation).reset_index()

coords = []
pt = dict(
    type = 'scattergeo',
    lon = df_sub['lon'],
    lat = df_sub['lat'],
    text = 'Downloads: ' + df_sub['value']['total_downloads'],
    marker = dict(
        size = df_sub['value']['total_downloads'] * scale,
        color = 'rgb(91,173,63)',  # Koha green
        line = dict(width=0.5, color='rgb(40,40,40)'),
        sizemode = 'area'
    ),
    name = '')
coords.append(pt)

layout = dict(
    title = 'Koha Debian package downloads',
    showlegend = True,
    geo = dict(
        scope = 'world',
        projection = dict(type='eckert4'),
        showland = True,
        landcolor = 'rgb(217, 217, 217)',
        subunitwidth = 1,
        countrywidth = 1,
        subunitcolor = "rgb(255, 255, 255)",
        countrycolor = "rgb(255, 255, 255)"
    ),
)

fig = dict(data=coords, layout=layout)
py.iplot(fig, validate=False, filename='koha-debian-downloads')

An interactive version of the bubble map is also available on Plotly.

HangingTogether: Slam bam WAM: Wrangling best practices for web archiving metadata

planet code4lib - Wed, 2016-08-24 01:14

The OCLC Research Library Partnership Web Archiving Metadata Working Group (WAM, of course) was launched last January and has been working hard–really hard–ever since. Twenty-five members from Partner libraries and archives have dug in to address the challenge of devising best practices for describing websites–which are, it turns out, very odd critters compared to other types of material for which descriptive standards and guidelines already exist. In addition, user needs and behaviors are quite different from those we’re familiar with.

Our plan at the outset: do an extensive literature review on both user needs and existing metadata practices in the web context, study relevant descriptive standards and institution-specific web archiving metadata guidelines, engage the community along the way to confirm the need for this work and obtain feedback, and, ultimately, issue two reports: the first on user needs and behaviors specific to archived web content, the second outlining best practices for metadata. The heart of the latter will be a set of recommended data elements accompanied by definitions and the types of content that each should contain.

At this juncture we’ve drawn several general conclusions:

  • Descriptive standards don’t address the unique characteristics of websites.
  • Local metadata guidelines have little in common with each other.
  • It’ll therefore be challenging to sort it all out and arrive at recommended best practices that will serve the needs of users of archived websites.

We’ve reviewed nine sets of institution-specific guidelines. The table below shows the most common data elements, some of which are defined very differently from one institution to another. Only three appear in all nine guidelines: creator/contributor, title, and description.

  • Collection name/title
  • Language
  • Creator/contributor
  • Publisher
  • Date of capture
  • Rights/access conditions
  • Date of content
  • Subject
  • Description
  • Title
  • Genre
  • URL

Our basic questions: Which types of content are the most important to include in metadata records describing websites? And which generic data elements should be designated for each of these concepts?

Here are some of the specific issues we’ve come across:

  • Website creator/owner: Is this the publisher? Creator? Subject? All three?
  • Publisher: Does a website have a publisher? If so, is it the harvesting institution or the creator/owner of the live site?
  • Title: Should it be transcribed verbatim from the head of the home page? Or edited to clarify the nature/scope of the site? Should acronyms be spelled out? Should the title begin with, e.g., “Website of the …”
  • Dates: Beginning/end of the site’s existence? Date of capture by a repository? Content? Copyright?
  • Extent: How should this be expressed? “1 online resource”? “6.25 Gb”? “approximately 300 websites”?
  • Host institution: Is the institution that harvests and hosts the site the repository? Creator? Publisher? Selector?
  • Provenance: In the web context, does provenance refer to the site owner? The repository that harvests and hosts the site? Ways in which the site has evolved?
  • Appraisal: Does this mean the reason why the site warrants being archived? The collection of a set of sites as named by the harvesting institution? The scope of the parts of the site that were harvested?
  • Format: Is it important to be clear that the resource is a website? If so, how best to do this?
  • URL: Which URLs should be linked to? Seed? Access? Landing page?
  • MARC21 record type: When coded in the MARC 21 format, should a website be considered a continuing resource? Integrating resource? Electronic resource? Textual publication? Mixed material? Manuscript?

We’re getting fairly close to completing our literature review and guidelines analysis, at which point we’ll turn to determining the scope and substance of the best practices report. In addition to defining a set of data elements, it’ll be important to set the problem in context and explain how our analysis has led to the conclusions we draw.

So stay tuned! We’ll be sending out a draft for community review and are hoping to publish both reports within the next six months. In the meantime, please send your own local guidelines, as well as pointers to a few sample records, to me at Help us make sure we get it right!

About Jackie Dooley

Jackie Dooley leads OCLC Research projects to inform and improve archives and special collections practice.


