Peter Murray: Free Software in Libraries, success stories and their impact on the library today

planet code4lib - Thu, 2016-09-01 00:29

Today I was privileged to present to the 6th International Congress of Technological Innovation, Innovatics 2016, organized by Duoc UC Libraries, Library of Santiago, and University of Chile Libraries. The conference was simultaneously translated into English and Spanish. To aid the translators, I wrote out the text of my presentation for them to review. Below is the text as it was intended to be presented; I did diverge in a few places, mostly based on what others said earlier in the conference.

Evolution of Open Source in Libraries

Thank you for the opportunity to talk with you today. My name is Peter Murray, and I’ve been involved in open source projects in libraries for at least 20 years. After receiving my undergraduate degree in computer science, I went to work for my university library as they were bringing up their first library automation system. This was early in the days of information retrieval on the internet, and I adapted the Gopher program to offer a front end to the library’s resources. Gopher came from the University of Minnesota in the United States, and they released their code on the internet for free. There wasn’t a notion of organized open source at the time – it was just free software that anyone could download and adapt. There wasn’t a sense of community around the software, and the tools to share changes with each other were very rudimentary. Stretching back to the 1950s and 1960s, this was the era of “free software”.

During the mid-1990s I worked for Case Western Reserve University in Cleveland, Ohio. I was part of a team that saw the early possibilities of the World Wide Web and aggressively pursued them. We worked to reorganize the library services onto the web and to try to add personalization to the library’s website using a new programming language called Personal Home Page. We know it today as PHP. It was also at this time that the phrase “open source” was coined. “Open source” meant more than just having the source code available. It was also a recognition that organizations had a business case for structuring a community around the source code. In this case, it was the release of the Netscape browser code that was the spark that ignited the use of the “open source” phrase.

In the early 2000s I worked for the University of Connecticut. During this time, we saw the formation of open source projects in libraries. Two projects that are still successful today are the DSpace and Fedora repository projects. These two projects started with grants from foundations to create software that allowed academic libraries to efficiently and durably store the growing amount of digital files being produced by scholars. Both projects followed paths where the software was created for the needs of their parent organizations, was seen as valuable by other organizations, and gained new developers as contributors to the project.

Also in the early 2000s the Koha integrated library system started to build its community. The core code was written by a small team of developers for a public library in New Zealand in the last few months of 1999 to solve a year-2000 issue with their existing integrated library system. Within a year, Koha publicly released its code and created a community of users on SourceForge – a website popular in the 2000s for hosting code, mailing lists, documentation, and bug reports. The tools for managing the activity of open source communities were just starting to be formed. There is a direct line between SourceForge – the most popular open source community of its time – and GitHub – arguably the most important source code hosting community today.

In the late 2000s I was working for a consortium of academic libraries in Ohio called OhioLINK. In this part of my career, I was with an organization that began actively using library open source software to deliver services to our users. Up until this point, I had – like many organizations – made use of open source tools – the HTTPd web server from Apache, MySQL as a database, PHP and Perl as programming languages, and so forth. Now we saw library-specific open source make headway into front-line library services. And now that our libraries were relying on this software for services to our patrons, we began looking for supporting organizations. DSpace and Fedora each created foundations to hold the source code intellectual property and hire staff to help guide the software. The DSpace and Fedora foundations then merged to become DuraSpace. Foundations are important because they become a focal point of governance around the software and a place where money can be sent to ensure the ongoing health of the open source project.

In the early 2010s I went to work for a larger library consortium called LYRASIS. LYRASIS has about 1,400 member libraries across much of the United States. I went to work at LYRASIS on a project funded by the Andrew W. Mellon Foundation to help libraries make decisions about using open source software. The most visible part of the project was the website. It hosts decision support tools, case studies, and a repository of library-oriented open source software. We proposed the project to the Mellon Foundation because LYRASIS member libraries were asking questions about how they could make use of open source software themselves. Throughout the 2000s it was the libraries with developers that were creating and contributing to open source projects. Libraries without developers were using open source through service provider companies, and now they wanted to know more about what it meant to get involved in open source communities. The site was one place where library professionals could learn about open source.

The early 2010s also saw growth in service providers for open source software in libraries. The best example of that is this list of service providers for the Koha integrated library system. As we let this scroll through the continents, I hope this gives you a sense that large, well-supported, multinational projects are alive and well in the library field. Koha is the most impressive example of a large service provider community. Other communities, such as DSpace, also have worldwide support for the software. What is not represented here is the number of library consortia that have started supporting open source software for their members. Where it makes sense for libraries to pool their resources to support a shared installation of an open source system, those libraries can reap the benefits of open source.

Now here in the mid-2010s I’m working for a small, privately-held software development company called Index Data. Index Data got its start 20 years ago when its two founders left the National Library of Denmark to create software tools that they saw libraries needed. Index Data’s open source Z39.50 toolkit is widely used in commercial and open source library systems, as is its MasterKey metasearch framework. The project I’m working on now is called FOLIO, an acronym for the English phrase “Future of Libraries is Open”. I’ll be talking more about FOLIO this afternoon, but by way of introduction I want to say now that the FOLIO project is a community of library professionals and an open source project that is rethinking the role of the integrated library system in library services.

Revisit the Theme

With that brief review of the evolution of open source software in libraries, let’s return to the topic of this talk – Free Software in Libraries: Success Stories and Their Impact on Today’s Libraries. As you might have guessed, open source software can have a significant impact on how services are delivered to our patrons. In fact, open source software – in its best form – is significantly different from buying software from a supplier. On the one hand, when you buy software from a supplier you are at the mercy of that supplier for implementing new features and for fixing bugs in the software. You also have an organization that you can hold accountable. On the other hand, open source software is as much about the community surrounding the software as it is the code itself. And to be a part of the community means that you have rights and responsibilities. I’d like to start first with rights.


The rights that come along with open source are, I think, somewhat well understood. These are encoded in the open source licenses that projects adopt. You have the right to view, adapt, and redistribute the source code. You can use the software for any purpose you desire, even for purposes not originally intended by the author. And the one that comes most to mind, you have the right to run the software without making payment to someone else. Let’s look at these rights.

Use

In the open source license, the creator of the software is giving you permission to use the software without needing to contact the author. This right cannot be revoked, so you have the assurance that the creator cannot suddenly interrupt your use, even if you decide to use the software for something the creator didn’t intend. This also means that you can bring together software from different sources and create a system that meets the needs of your users.

Copy

You have the right to make a copy of the software. You can copy the software for your own use or to give to a friend or colleague. You can create backup copies and run the software in as many places as you need. Most importantly, you have this right without having to pay a royalty or fee to the creator. One example of this is Fenway Libraries Online in the Boston, Massachusetts, area. Ten libraries within the Fenway consortium needed a system to manage their electronic resource licenses and entitlements. After an exhaustive search for a system that met their requirements, they selected the CORAL project originally built at the University of Notre Dame. There is a case study on Fenway’s adoption of CORAL on the website.

Inspect

A key aspect of open source is the openness of the code itself. You can look at the source code and figure out how it works. This is especially crucial if you want to move away from the system to something else; you can figure out, for instance, how the data is stored in a database and write programs that will translate that data from one system to another. Have you ever needed to migrate from one system to another? Even if you didn’t have to do the data migration yourself, can you see where it would be helpful to have a view of the data structures?
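
As a hypothetical sketch of what that inspection makes possible, the fragment below reads bibliographic rows out of an imagined legacy system's SQLite database and translates them into a neutral structure for loading into another system. The table and column names (`biblio`, `id`, `title`, `author`) are invented for illustration; a real migration would use whatever structures inspection of the source code and database actually reveals.

```python
import sqlite3

def export_bib_records(db_path):
    """Read bibliographic rows from a (hypothetical) legacy table and
    translate them into neutral dictionaries for loading elsewhere."""
    conn = sqlite3.connect(db_path)
    conn.row_factory = sqlite3.Row  # lets us address columns by name
    try:
        rows = conn.execute("SELECT id, title, author FROM biblio")
        return [
            {
                "identifier": row["id"],
                "title": row["title"].strip(),
                "creator": row["author"],
            }
            for row in rows
        ]
    finally:
        conn.close()
```

The point is not the particular code but the capability: because the schema is open to inspection, a script like this can be written without waiting on a vendor.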

Modify

Hand-in-hand with the right to inspect open source code is the right to modify it to suit your needs. In almost all cases, the modifications to the code use the same open source license as the original work. What is interesting about modifications, though, is that sometimes the open source license may specify conditions for sharing modifications.

Fork

Ultimately, if the open source project you are using is moving in a different direction, you have the right to take the source code and start in your own direction. Much like a fork in the road, users of the open source project decide which branch of the fork to take. Forks can sometimes remain relatively close together, which makes it somewhat easy to move back and forth between them. Over time, though, forks usually diverge and go separate ways, or one will fail to attract enough users and die off. With this right to fork the code, it is ultimately the community that decides the best direction for the software. There was an example a few years ago within the Koha community where one service provider wanted to control the process of creating and releasing the Koha software in ways that a broader cross-section of the community didn’t appreciate. That segment of the community took the code and re-formed new community structures around it.


These rights – Use, Copy, Inspect, Modify, and Fork – form the basis of the open source license statement that is a part of the package. Some form of each of these rights is spelled out in the statement. What is left unsaid is the responsibilities of users of the open source software. These responsibilities are not specified in the open source license that accompanies the software, but they do form the core values of the community that grows around the development and enhancement of the project. Each community is different, just like each software package has its own features and quirks, but communities generally have some or all of the following characteristics. And depending on each adopter's needs and capacity, there will be varying levels of commitment each organization can make to these responsibilities. As you work with open source code, I encourage you to keep these responsibilities in mind.

Participate

The first responsibility is to participate in the community. This can be as simple as joining web forums or mailing lists to get plugged into what the community is doing and how it goes about doing it. By joining and lurking for a while, you can get familiar with the community norms and find out what roles others are playing. The larger automation companies in libraries typically have users groups, and joining the community of an open source project is usually no different from joining one of those. One of the key aspects of open source projects is how welcoming they are to new participants. The Evergreen community has what I think is a great web page for encouraging libraries to try out the software, read the documentation, and get involved in the community.

Report Bugs

Library software has bugs – the systems we use are just too complicated to account for every possible variation as the software is developed and tested. If you find something that doesn't work, report it! Find the bug reporting process and create a good, comprehensive description of your problem. Chances are you are not the only one seeing the problem, and your report can help others triangulate the issue. This is an area where open source is, I think, distinctly different from proprietary software. With proprietary systems, the list of bugs is hidden from the customers. You may not know if you are the only one seeing a problem or if the issue is common to many other libraries. In open source projects, the bug list is open, and you can see if you are one among many people seeing the issue. In most open source projects, bug reports also include a running description of how the issue is being addressed – so you can see when to anticipate a fix coming in the software. As an aside for open source projects in the audience: make it easy for new community members to find and use your bug reporting process. This is typically a low-barrier point of entry into the community, and a positive experience in reporting a bug and getting an answer will encourage that new community member to stick around and help more. We'll talk about triaging bug reports in a minute.

Financially Support

Thinking back to our open source software rights, we know we can use the software without paying a license fee. That doesn't mean, though, that open source is free to write and maintain. Some of the most mature software projects are backed by foundations or community homes, and these organizations hire developers and outreach staff, fund user group meetings, and do other things that grow the software and the community surrounding it. If your organization's operations rely on a piece of open source, use some of what you save by not paying for a vendor's proprietary system to contribute to the organization that supports the software. DuraSpace is a success story here. Since its founding, it has attracted memberships from libraries all around the world. Libraries don’t have to pay to use the DSpace or Fedora software. Those that do pay recognize that their membership dues go to fund community and technical managers as well as server infrastructure that everyone counts on to keep the projects running smoothly.

Share Expertise

As staff become more familiar with the open source system, they can share that expertise with others around them. This is not only personally rewarding, but it also improves the reputation of your organization. A healthy and growing open source community will have new adopters coming in all the time, and your experience can help someone else get started too. EIFL, an acronym for the English “Electronic Information for Libraries”, is a not-for-profit organization that works to enable access to knowledge in 47 countries in Africa, Asia, and Europe. One of their programs is to help libraries in developing countries adopt and use open source software for their own institutions. They gather groups of new users and match them with experienced users so they can all learn about a new open source software package at about the same pace. Through this mentoring program, these libraries now have capabilities that they previously didn’t have or couldn’t afford.

Triage Bugs

A few slides earlier, I encouraged new adopters to report bugs. That can quickly overwhelm a bug tracking system with reports that are not really issues, issues that have already been fixed in code awaiting the next release, issues with a workaround in a frequently asked questions document, and real bugs where the developers need more detail to solve the problem. As a community member triaging bugs, you look for new reports that match your experience and where you can add more detail or point the submitter to a known solution or workaround. Sometimes this points to a need for better documentation (discussed in the next slide); other times it needs a fix or an enhancement to the software, and the report moves on to the development group. Another note for projects: provide a clear way for reported issues to move through the system. This can be as informal as a shared understanding in the community, or as formal as a state diagram published as part of the project's documentation that describes how an issue is tagged and moved through various queues until it reaches some resolution.
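
One informal way to publish such a workflow is to encode the allowed states and transitions directly. The sketch below is a minimal illustration of the kind of state diagram described above; the state names are invented and not taken from any particular project's tracker.

```python
# Illustrative issue lifecycle: each state maps to the set of states
# an issue may legitimately move into next.
ALLOWED_TRANSITIONS = {
    "new": {"needs-info", "duplicate", "confirmed"},
    "needs-info": {"confirmed", "closed"},
    "confirmed": {"in-progress", "closed"},
    "in-progress": {"fixed-awaiting-release"},
    "fixed-awaiting-release": {"closed"},
    "duplicate": {"closed"},
    "closed": set(),
}

class Issue:
    def __init__(self, title):
        self.title = title
        self.state = "new"
        self.history = ["new"]  # audit trail of every state visited

    def move_to(self, new_state):
        """Advance the issue, refusing moves the workflow doesn't allow."""
        if new_state not in ALLOWED_TRANSITIONS[self.state]:
            raise ValueError(
                f"cannot move from {self.state!r} to {new_state!r}")
        self.state = new_state
        self.history.append(new_state)
```

Whether a project publishes this as code, a diagram, or a paragraph of prose matters less than that new triagers can find it and apply it consistently.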

Write Documentation

Open source software is often criticized — rightly so — for poor documentation. It is often the last thing created as part of the development process, and is sometimes created by developers who feel more comfortable using a writing voice full of jargon rather than a voice that is clear, concise, and targeted to end-users. Contributing to documentation is a perfect place for expert users who are inexperienced coders to make the deepest impact on a project. You don't need to understand how the code was written; you just need to describe clearly how the software is used. Documentation can come in the form of user manuals, frequently-asked-questions lists, and requests to the developers to add help language or change the display of a feature to make its use clearer.

Translate

Translation is also something that an experienced user can do to support a project. One sign of a mature open source project is that the developers have taken the time to extract the strings of field labels, help text, and user messages into a language file so that these strings can be translated into another language. Translating doesn't mean being able to write code; it just means being able to take these strings and convert them into another language. If you find an open source package that has all of the functionality you need but the native language of the system is not the native language of your users, providing translations can be a great way to make the software more useful to you while also opening up a whole new group of organizations that can use the project as well.
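
The extraction pattern can be sketched in a few lines: user-facing strings live in per-language catalogs rather than in the program logic, so a translator only ever edits the catalog. The keys, labels, and fallback rule below are illustrative, not any particular project's format; real projects often use gettext `.po`/`.mo` files for the same purpose.

```python
# Illustrative string catalogs: one dictionary per language, keyed by a
# stable message identifier. Adding a language never touches the code.
CATALOGS = {
    "en": {"checkout.due": "Item due on {date}",
           "search.none": "No results found"},
    "es": {"checkout.due": "Artículo vence el {date}",
           "search.none": "No se encontraron resultados"},
}

def translate(key, lang="en", **params):
    """Look up a label in the requested language, falling back to
    English when no translation exists yet."""
    catalog = CATALOGS.get(lang, {})
    template = catalog.get(key) or CATALOGS["en"][key]
    return template.format(**params)
```

A translator working on, say, a French catalog would add a `"fr"` dictionary with the same keys; until then, users simply see the English fallback rather than a broken screen.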

Test Release Candidates

This is getting a little more complicated, but if you are running the software locally and can set up a test environment, as release candidates for new software come from the developers, try them out. Run a copy of your data and your workflows through the release candidate to make sure that nothing breaks and new features work as advertised. Some projects will create downloadable "virtual machines" or put the release candidate version of the software on a sandbox for everyone to test, and that lowers the barrier for testing to just about anyone.

Request Features

How feature requests are made is another distinguishing characteristic between open source and proprietary systems. All proprietary systems have some way of registering feature requests and various processes for working through them. In an open source community, there is a lot more transparency about what is being worked on. All of the bug reports and feature requests are listed for everyone to see and comment on. There might even be a formal voting mechanism that guides developers on what to work on next. Volunteer developers from different organizations with similar needs can more easily find each other and tackle a problem together. Developers hired by the software's foundation or community home have a better understanding of what activity will have the biggest impact for the users. This all starts, though, with you making your requests and commenting on the requests of others.

Offer Constructive Criticism

Healthy projects need forward tension to keep moving, and one way to do that is with eyes-wide-open constructive criticism. It is easy and common for communities to get bogged down in doing things the same way when it might make sense to try a different technique or adopt a different tool. It is also, unfortunately, common for communities to become insular and unwelcoming to new people or people unlike themselves. Open source works best when a wide variety of people are all driving towards the same goal. Be aware, though, that the good will within communities can be destroyed by unkind and insulting behavior. Just as meetings have adopted codes of conduct, I think it is appropriate for project communities to develop codes of conduct and to have action and enforcement mechanisms in place to step in when needed. This can be tough — in the cultural heritage field, participants in open source are typically volunteers, and it can be difficult to confront or offend a popular or prolific volunteer. The long-term health of the community requires it, though.

Contribute Code

Only at the very last do we get to coding. The software can't exist without developers, but it is too easy to put the developers first and forget the other pillars that the community relies on — bug reporters and triage volunteers, documentation writers and translators, software testers and community helpers. If you are a developer, try fixing a bug. Pick something that is small but annoying — something that scratches your own itch but that the more mainstream developers don't have time to tackle. A heads-up to open source project leaders: create a mentorship pathway for new developers to join the project, and provide a mechanism to list "easy" bugs that would be useful for developers new to a project to work on.

Public Libraries

Throughout the presentation I’ve mentioned academic libraries and library organizations that are making use of open source now. Open source adoption is not limited to academic libraries, though, and I wanted to mention the work of the Meadville Public Library in rural Pennsylvania in the United States. There is a case study on the website where they describe their library and how they came to choose the Koha integrated library system. The Meadville Public Library has a small staff and an even smaller technology budget. When they decided to migrate to a new system in the mid-2000s, they realized they had a choice to pay the commercial software licensing fees to a traditional library vendor or to put that money towards building skills in the staff to host a system locally. The case study describes their decision-making process, including site visits, fiscal analysis, and even joining a “hackfest” developer event in France to help build new functionality that their installation would need. I invite you to read through the case study to learn about their path to open source software. This library uses open source almost exclusively throughout their systems – from their desktop computers and word processing software to their servers.


Making use of open source software is more often about the journey than the destination. In the end, our libraries need systems that enable patrons to find the information they are seeking and to solve the problems that they face. If nothing else, open source software in libraries is a different path for meeting those needs. Open source software, though, can be more. It can be about engaging the library and its staff in the process of designing, building, and maintaining those systems. It can be about the peer-to-peer exchange of ideas with colleagues from other institutions and from service providers on ways to address our patrons’ needs. And sometimes open source software can be about reducing the total cost of ownership as compared with solutions from proprietary software providers. Libraries across the world have successfully adopted open source software. There have been a few unsuccessful projects as well. From each of these successful and unsuccessful projects, we learn a little more about the process and grow a little more as a profession. I encourage you if you haven’t done so already, to learn about how open source software can help in reaching your library’s goals.

Thank you for your attention, and I am happy to take questions, observations, or to hear about your stories of using open source software.

DuraSpace News: AVAILABLE: Fedora 4.6.0 Release

planet code4lib - Thu, 2016-09-01 00:00

From A. Soroka, the University of Virginia

DuraSpace News: NOW AVAILABLE–TRAC Certified Long-term Digital Preservation: DuraCloud and Chronopolis for Institutional Treasures

planet code4lib - Thu, 2016-09-01 00:00

Austin, TX – An institution’s identity is often formed by what it saves for current and future access. Digital collections curated by the academy can include research data, images, texts, reports, artworks, books, and historic documents that help define an academic institution’s identity.

Galen Charlton: The blossoming of the Mellie-cat

planet code4lib - Wed, 2016-08-31 23:58

A cat who has decided to take up more space in the world.

Sixteen years is long enough, surely, to get to know a cat.


Amelia had always been her mother’s child. She had a father and a sister too, but LaZorra was the one Mellie always cuddled up to and followed around. Humans were of dubious purpose, save for our feet: from the scent we trod back home Mellie seemed to learn all she needed of the outside world.

Her father, Erasmus, left us several years ago; while Mellie’s sister mourned, I’m not sure Rasi’s absence made much of an impression on our clown princess — after all, LaZorra remained, to provide orders and guidance and a mattress.

Where Zorri went, Mellie followed — and thus a cat who had little use for humans slept on our bed anyway.

Recently, we lost both LaZorra and Sophia, and we were afraid: afraid that Amelia’s world would close in on her. We were afraid that she would become a lost cat, waiting alone for comfort that would never return.

The first couple days after LaZorra’s passing seemed to bear our fears out. Amelia kept to her routine and food, but was isolated. Then, some things became evident.

Our bed was, in fact, hers. Hers to stretch out in, space for my legs be damned.

Our feet turned out not to suffice; our hands were required too. For that matter, for the first time in her life, she started letting us brush her.

And she enjoyed it!

Then she decided that we needed correction — so she began vocalizing, loudly and often.

And now we have a cat anew: talkative and demanding of our time and attention, confident in our love.

Sixteen years is not long enough to get to know a cat.

Karen Coyle: User tasks, Step one

planet code4lib - Wed, 2016-08-31 20:07
Brian C. Vickery, one of the greats of classification theory and a key person in the work of the Classification Research Group (active from 1952 to 1968), gave this list of the stages of "the process of acquiring documentary information" in his 1959 book Classification and Indexing in Science[1]:

  1. Identifying the subject of the search. 
  2. Locating this subject in a guide which refers the searcher to one or more documents. 
  3. Locating the documents. 
  4. Locating the required information in the documents. 

These overlap somewhat with FRBR's user tasks (find, identify, select, obtain), but the first step in Vickery's group is my focus here: identifying the subject of the search. It is a step that I do not perceive as implied in the FRBR "find", and is all too often missing from library/user interactions today.

A person walks into a library... Presumably, libraries are an organized knowledge space. If they weren't, the books would just be thrown onto the nearest shelf, and subject cataloging would not exist. However, if this organization isn't both visible and comprehended by users, we are, firstly, not getting the return on our cataloging investment, and secondly, users are not getting the full benefit of the library.

In Part V of my series on Catalogs and Context, I had two salient quotes. One by Bill Katz: "Be skeptical of the information the patron presents"[2]; the other by Pauline Cochrane: "Why should a user ever enter a search term that does not provide a link to the syndetic apparatus and a suggestion about how to proceed?"[3]. Both of these address the obvious, yet often overlooked, primary point of failure for library users, which is the disconnect between how the user expresses his information need vis-à-vis the terms assigned by the library to the items that may satisfy that need.

Vickery's Three Issues for Stage 1

Issue 1: Formulating the topic

Vickery talks about three issues that must be addressed in his first stage, identifying the subject on which to search in a library catalog or indexing database. The first one is "...the inability even of specialist enquirers always to state their requirements exactly..." [1 p.1] That's the "reference interview" problem that Katz writes about: the user comes to the library with an ill-formed expression of what they need. We generally consider this to be outside the boundaries of the catalog, which means that it only exists for users who have an interaction with reference staff. Given that most users of the library today are not in the physical library, and that online services (from Google to Amazon to automated courseware) have trained users that successful finding does not require human interaction, these encounters with reference staff are a minority of the user-library sessions.

In online catalogs, we take what the user types into the search box as an appropriate entry point for a search, even though another branch of our profession is based on the premise that users do not enter the library with a perfectly formulated question, and need an intelligent intervention to have a successful interaction with the library. Formulating a precise question may not be easy, even for experienced researchers. For example, in a search about serving persons who have been infected with HIV, you may need to decide whether the research requires you to consider whether the person who is HIV positive has moved along the spectrum to be medically diagnosed as having AIDS. This decision is directly related to the search that will need to be done:

HIV-positive persons--Counseling of
AIDS (Disease)--Patients--Counseling of

Issue 2: From topic to query

The second of Vickery's caveats is that "[The researcher] may have chosen the correct concepts to express the subject, but may not have used the standard words of the index."[1 p.4] This is the "entry vocabulary" issue. What user would guess that the question "Where all did Dickens live?" would be answered with a search using "Dickens, Charles -- Homes and haunts"? And that all of the terms listed as "use for" below would translate to the term "HIV (Viruses)" in the catalog? (h/t Netanel Ganin):

As Pauline Cochrane points out[4], beginning in the latter part of the 20th century, libraries found themselves unable to include the necessary cross-reference information in their card catalogs, due to the cost of producing the cards. Instead, they asked users to look up terms in the subject heading reference books used by catalog librarians to create the headings. These books are not available to users of online catalogs, and although some current online catalogs include authorized alternate entry points in their searches, many do not.* This means that we have multiple generations of users who have not encountered "term switching" in their library catalog usage, and who probably do not understand its utility.
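The "term switching" that card-catalog cross references used to provide is simple enough to sketch: variant entry vocabulary maps to the authorized heading, which is then used for the actual search. Here is a minimal illustration in Python; the headings and cross references below are invented stand-ins for the purpose of the example, not actual LCSH data.

```python
# Minimal sketch of "use for" term switching: map variant entry
# vocabulary to the authorized heading before searching the catalog.
# The headings below are illustrative examples, not actual LCSH data.

USE_FOR = {
    "carcinoma": "Cancer",
    "malignancy": "Cancer",
    "hiv virus": "HIV (Viruses)",
    "aids virus": "HIV (Viruses)",
}

def authorized_heading(user_term: str) -> str:
    """Return the authorized heading for a user's term, or the term itself
    when no cross reference exists."""
    return USE_FOR.get(user_term.strip().lower(), user_term)

print(authorized_heading("Carcinoma"))   # -> Cancer
print(authorized_heading("botany"))      # -> botany (no cross reference)
```

A catalog that performs this lookup silently, or better, tells the user "searching Cancer instead of carcinoma," would restore the utility that the cross-reference cards once provided.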

Even with such a terminology-switching mechanism, finding the proper entry in the catalog is not at all simple. The article by Thomas Mann (of Library of Congress, not the German author) on “The Peloponnesian War and the Future of Reference, Cataloging, and Scholarship in Research Libraries” [5] shows not only how complex that process might be, but it also indicates that the translation can only be accomplished by a library-trained expert. This presents us with a great difficulty because there are not enough such experts available to guide users, and not all users are willing to avail themselves of those services. How would a user discover that literature is French, but performing arts are in France?:

French literature
Performing arts -- France -- History

Or, using the example in Mann's piece, the searcher looking for information on tribute payments in the Peloponnesian war needed to look under "Finance, public–Greece–Athens".  This type of search failure fuels the argument that full text search is a better solution, and a search of Google Books on "tribute payments Peloponnesian war" does yield some results. The other side of the argument is that full text searches fail to retrieve documents not in the search language, while library subject headings apply to all materials in all languages. Somehow, this latter argument, in my experience, doesn't convince.

Issue 3: term order

The third point by Vickery is one that keyword indexing has solved, which is "...the searcher may use the correct words to express the subject, but may not choose the correct combination order."[1 p.4] In 1959, when Vickery was writing this particular piece, having the wrong order of terms resulted in a failed search. Mann, however, would say that with keyword searching the user does not encounter the context that the pre-coordinated headings provide; thus keyword searching is not a solution at all. I'm with him part way, because I think keyword searching as an entry to a vocabulary can be useful if the syndetic structure is visible with such a beginning. Keyword searching directly against bibliographic records, less so.

Comparison to FRBR "find"

FRBR's "find" is described as "to find entities that correspond to the user’s stated search criteria". [6 p. 79] We could presume that in FRBR the "user's stated search criteria" has either been modified through a prior process (although I hardly know what that would be, other than a reference interview), or that the library system has the capability to interact with the user in such a way that the user's search is optimized to meet the terminology of the library's knowledge organization system. This latter would require some kind of artificial intelligence and seems unlikely. The former simply does not happen often today, with most users being at a computer rather than a reference desk. FRBR's find seems to carry the same assumption that has been made functional in online catalogs, which is that the appropriateness of the search string is not questioned.

Summary

There are two take-aways from this set of observations:

  1. We are failing to help users refine their query, which means that they may actually be basing their searches on concepts that will not fulfill their information need in the library catalog. 
  2. We are failing to help users translate their query into the language of the catalog(s). 

I would add that the language of the catalog should show users how the catalog is organized and how the knowledge universe is addressed by the library. This is implied in the second take-away, but I wanted to bring it out specifically, because it is a failure that particularly bothers me.

Notes

* I did a search in various catalogs on "cancer" and "carcinoma". Cancer is the form used in LCSH-cataloged bibliographic records, and carcinoma is a cross reference. I found a local public library whose Bibliocommons catalog did retrieve all of the records with "cancer" in them when the search was on "carcinoma", and that the same search in the Harvard Hollis system did not (carcinoma: 1,889 retrievals; cancer: 21,311). These are just two catalogs, and not a representative sample, to say the least, but they do seem to illustrate the point.

References

[1] Vickery, B. C. Classification and Indexing in Science. New York: Academic Press, 1959.
[2] Katz, Bill. Introduction to Reference Work: Reference Services and Reference Processes. New York: McGraw-Hill, 1992. p. 82. Cited in: Brown, Stephanie Willen. "The Reference Interview: Theories and Practice." Library Philosophy and Practice, 2008. ISSN 1522-0222.
[3] Cochrane, Pauline A., Marcia J. Bates, Margaret Beckman, Hans H. Wellisch, Sanford Berman, Toni Petersen, and Stephen E. Wiberley, Jr. "Modern Subject Access in the Online Age: Lesson 3." American Libraries, Vol. 15, No. 4 (Apr. 1984), pp. 250-252, 254-255.
[4] Cochrane, Pauline A. "Modern Subject Access in the Online Age: Lesson 2." American Libraries, Vol. 15, No. 3 (Mar. 1984), pp. 145-148, 150.
[5] Mann, Thomas. "The Peloponnesian War and the Future of Reference, Cataloging, and Scholarship in Research Libraries" (June 13, 2007). PDF, 41 pp.
[6] IFLA Study Group on the Functional Requirements for Bibliographic Records. Functional Requirements for Bibliographic Records, 2009.

Library of Congress: The Signal: Nominations Sought for the U.S. Federal Government End of Term Web Archive

planet code4lib - Wed, 2016-08-31 15:58

TOP: January 21, 2013, Fifty-Seventh Inaugural Ceremonies, President Barack H. Obama and Vice President Joseph R. Biden Jr. Photo from U.S. Senate Photo Studio.
BOTTOM: January 20, 2005, Fifty-Fifth Inaugural Ceremonies, President George W. Bush and Vice President Dick Cheney. Photo from Architect of the Capitol.

This is a guest post by Abbie Grotke, lead information technology specialist of the Library of Congress Web Archiving Team

Readers of The Signal may recall prior efforts to archive United States Federal Government websites during the end of presidential terms. I last wrote about this in 2012 when we were working on preserving the government domain during the end of President Obama’s first term. To see the results of our 2008 and 2012 efforts, visit the End of Term Archive.

As the Obama administration comes to a close, the End of Term project team has formed again and we need help from you.

For the End of Term 2016 archive, the Library of Congress, California Digital Library, University of North Texas Libraries, Internet Archive, George Washington University Libraries, Stanford University Libraries and the U.S. Government Publishing Office have joined together for a collaborative project to preserve public United States Government websites at the end of the current presidential administration ending January 20, 2017. Partners are joining together to select, collect, preserve, and make the web archives available for research use.

This web harvest — like its predecessors in 2008 and 2012 — is intended to document the federal government’s presence on the web during the transition of Presidential administrations and to enhance the existing collections of the partner institutions. This broad comprehensive crawl of the .gov domain will include as many federal .gov sites as we can find, plus federal content in other domains (such as .mil, .com and social media content).

And that’s where you come in. You can help the project immensely by nominating your favorite .gov website, other federal government websites or governmental social media account with the End of Term Nomination Tool. Please nominate as many sites as you want. Nominate early and often. Tell your friends, family and colleagues to do the same. Help us preserve the .gov domain for posterity, public access and long-term preservation.

Jonathan Rochkind: Mythical Man-Months et al

planet code4lib - Wed, 2016-08-31 15:12

I’ve never actually read Fred Brooks’ Mythical Man-Month, but have picked up many of its ideas by cultural osmosis.  I think I’m not alone; it’s a book that’s very popular by reputation, but perhaps not actually very influential in terms of its ideas being internalized by project managers and architects.

Or as Brooks himself said:

Some people have called the book the “bible of software engineering.” I would agree with that in one respect: that is, everybody quotes it, some people read it, and a few people go by it.

Ha. I should really get around to reading it, I routinely run into things that remind me of the ideas I understand from it that I’ve just sort of absorbed (perhaps inaccurately).

In the meantime, here’s another good quote from Brooks to stew upon:

The ratio of function to conceptual complexity is the ultimate test of system design.

Quite profound, really. Software packages that are terribly frustrating to work with can, I think, almost always be described in those terms: the ratio of function to conceptual complexity is far, far too low.  That is nearly(?) the definition of a frustrating-to-work-with software package.

Filed under: General

Open Knowledge Foundation: What does personal data have to do with open data? Initial thoughts from #MyData2016

planet code4lib - Wed, 2016-08-31 13:45

This piece is part of a series of posts from MyData 2016 – an international conference that focuses on human centric personal information management. The conference is co-hosted by the Open Knowledge Finland chapter of the Open Knowledge International Network.

What does personal data have to do with open data? We usually preach NOT to open personal data, and to be responsible about it. So why should an open knowledge organisation devote a whole conference to topics related to personal data management?  I will explore these questions in a series of blog posts written straight from the MyData16 conference in Helsinki, Finland.

MyData is a very abstract concept that is still in the process of refinement. In its essence, MyData is about giving users control of the personal data trail that we leave on the internet. Under the MyData framework, users decide where to store their data and can control and guide how that data is used. In most applications today, our data is closed off and owned by big corporations, where it is primarily used to make money. The MyData concept looks to bring back control to the user, but also tries to develop the commercial use of the data, making everyone happy.

Here is Mika Honkanen, vice chairman of the OK Finland board, explaining MyData:


Personally, after this weekend’s announcement that Facebook will use WhatsApp messages to improve the ads experience (effectively using our private messages to generate revenue for Facebook), I understood something I already knew, but decided to deny – my own data is not free. Sure, I can opt out from the current terms of use, but that won’t last forever. Very few platforms give me the choice about what will happen to my data. I would even prefer to pay for an app to know that my data is mine, but very few apps even offer this. My data is rarely *open* for me. It is sometimes held hostage. How can we build a system that challenges this landscape and provides opportunities for users to understand their personal data?

For those of you who missed Open Knowledge Festival in 2012 (like me), Open Knowledge Finland know how to produce events. Besides the conference program (and super exciting evening program!), you can also find the Ultrahack, a 72-hour hackathon that will try to answer my questions above and help create practical applications of the MyData concept.  I am excited to see how it will turn out and what uses, social and financial, people can find.

For the following three days, keep following us on the OKI Twitter account for updates from the conference. Check the MyData website, and let us know if you want us to go to a session for you!

Equinox Software: Evergreen 2015: Rhythm

planet code4lib - Wed, 2016-08-31 12:08

Sundial by liz west. CC-BY on Flickr

In 2015, Evergreen saw two major releases, 2.8.0 and 2.9.0, and a number of maintenance releases.

Two major releases in 2015, just as there were two in 2014, and in each of the three years before that — just as there will be two major releases in 2016.

In 2015, the seventh Evergreen Conference was held in Hood River, Oregon — continuing an unbroken string of annual conferences that was started in 2009.

In 2015, Evergreen’s development team gained a new core committer, Kathy Lussier.

New folks started writing documentation; more libraries started using Evergreen; more bug reports were filed.

In 2015, in particular with the release of 2.9.0, a number of components of Evergreen that had served their purpose were removed. Gone was JSPac. Goodbye, old selfcheck page! Auf Nimmerwiedersehen, script-based circulation policies!

In 2015, work continued on the web-based staff client.

In 2015, the Evergreen web team took steps to ensure backwards compatibility.

To sum up: 2015 was not the most exciting year in the project’s history, but it was a solid one: a year continuing rhythms that had been established and strengthened as the project grew.

Rhythms matter to libraries, of course. There is the staccato of each visit to the library, each checkout, each reference question, each person finding refuge or hope or a few minute distraction. Themes arise and repeat each year: summer reading; the onslaught of undergraduates; conferences; board meetings and budgetary cycles. Sometimes a crescendo surprises us: the adoption of MARC; the disquiet and discussions of seeking to replace MARC; libraries deciding to reclaim their tools and embrace free software.

And the music does not stop: libraries must simultaneously embrace the now, do their part to keep the past alive, and look to the future.

— Galen Charlton, Infrastructure and Added Services Manager

This is the tenth in our series of posts leading up to Evergreen’s Tenth birthday.

DuraSpace News: Nominations Sought for the U.S. Federal Government Domain End of Term Web Archive

planet code4lib - Wed, 2016-08-31 00:00

From Abigail Grotke, Digital Library Project Manager, Library of Congress

Washington, DC  How would YOU like to help preserve the United States federal government .gov domain for future generations? But that's too huge a swath of Internet real estate for any one person to preserve, right?!

Wrong! The volunteers working on the End of Term Web Archiving Project are doing just that. But we need your help.

DuraSpace News: VIVO Updates for Aug 28–New VIVOs, Posters, Conferences, Documentation

planet code4lib - Wed, 2016-08-31 00:00

From Mike Conlon, VIVO project director

New sites!  August was a big month for new VIVO implementations.  Eight new implementations are underway:

Library of Congress: The Signal: The Signal is Evolving

planet code4lib - Tue, 2016-08-30 18:35

Minerva, Roman goddess of wisdom. Mosaic by Elihu Vedder within central arched panel leading to the Visitor’s Gallery. Library of Congress Thomas Jefferson Building, Washington, D.C. Digital photograph by Carol Highsmith, 2007. LC call number LOT 13860.

When The Signal debuted in 2011, its focus was exclusively on the challenge of digital preservation, which was reflected in its original URL. The Signal was a forum for news and information about digital preservation — unique problems and solutions, standards, collaborations and achievements. The Signal’s authors interviewed leaders in the field, profiled colleagues and drew attention to exemplary projects.

In time, The Signal became a leading source of information about digital preservation. The success of The Signal’s community engagement was evident in the volume of responses we got to our blog posts and the dialogs they sparked; some posts still attract readers and get comments years after the posts’ original publications.

The scope of The Signal has grown organically beyond digital preservation and we are reflecting that growth by changing The Signal’s URL. Old links will still work but will redirect to the new URL. If you subscribe to an RSS feed, please update it to point at the new address.

We will continue to share information about Library of Congress digital initiatives and cover broad topics such as digital humanities, digital stewardship, crowd sourcing, computational research, scholar labs, data visualization, digital preservation and access, eBooks, rights issues, metadata, APIs, data hosting and technology sharing and innovative trends.


SearchHub: Improving Search for Rackspace Email

planet code4lib - Tue, 2016-08-30 17:23

As we countdown to the annual Lucene/Solr Revolution conference in Boston this October, we’re highlighting talks and sessions from past conferences. Today, we’re highlighting George Bailey and Cameron Baker’s talk, “Rackspace Email’s solution for indexing 50k documents per second”.

Customers of Rackspace Email have always needed the ability to search and review login, routing, and delivery information for their emails. In 2008, a solution was created using Hadoop MapReduce to process all of the logs from hundreds of servers and create Solr 1.4 indexes that would provide the search functionality. Over the next several years, the number of servers generating the log data grew from hundreds to several thousands which required the cluster of Hadoop and Solr 1.4 servers to grow to ~100 servers. This growth caused the MapReduce jobs for indexing the data to take anywhere from 20 minutes to several hours.

In 2015, Rackspace Email set out to solve this ever-growing need to index and search billions of events from thousands of servers and decided to leverage SolrCloud 5.1. This talk covers how Rackspace replaced ~100 physical servers with 10 and improved functionality to allow for documents to be indexed and searchable within 5 seconds.

George Bailey is a Software Developer for Rackspace Email Infrastructure.

Cameron Baker is a Linux Systems Engineer for Rackspace Email Infrastructure.

Rackspace: Email's Solution for Indexing 50K Documents per Second: Presented by George Bailey & Cameron Baker, Rackspace from Lucidworks

Join us at Lucene/Solr Revolution 2016, the biggest open source conference dedicated to Apache Lucene/Solr on October 11-14, 2016 in Boston, Massachusetts. Come meet and network with the thought leaders building and deploying Lucene/Solr open source search technology. Full details and registration…


Equinox Software: Evergreen 2014

planet code4lib - Tue, 2016-08-30 11:00

This past weekend I visited a farm in Central Washington and was able to see the full life cycle of crop production.  In one area, closed and light controlled, seeds germinate into small seedlings.  When large enough, the seedlings are tempered and prepared for movement out to the greenhouse.  In the greenhouse, the plants are carefully monitored and cultivated as they grow.  The last phase is moving the plants, now hardy and larger, out into the open air where, under the sun, they grow and fully develop for harvest.  My visit to the farm came at just the right time—there were fully grown plants ready for harvesting within the next few weeks and new seedlings, which will become next year’s crop, were just starting to grow.  While taking in this cyclical process of growth and harvest I couldn’t help but think about the growth of Evergreen over the years.

2014 is the year that saw the first seeds planted for the next generation of Evergreen.  While we all know and love the XUL staff client, the power and flexibility of newer technologies, such as AngularJS, was leading the Evergreen community to explore new options for a web-based staff interface.  In January 2014, a prototype for the web client was released to the Evergreen community, thanks to the research and work of Bill Erickson and the rest of the Equinox development team.  Bill planted those first seeds and gave the community something to cultivate.  After evaluating the prototype, the community came together to move forward with the project. With the support of many development partners (BC Libraries Cooperative, Bibliomation, C/W MARS, GPLS, Grand Rapids Public Library, Howe Library, Kenton County Public Library, MassLNC, NC Cardinal, OhioNET, PaILS, Pioneer Library System, and SC LENDS), the web client project became a reality.  And with that, the project moved into the greenhouse, where real growth and careful cultivation could occur.

Like staging the crop on the farm, the development for the web client was broken up into sprints to tackle the modules individually and allocate proper time for each stage of growth and development.  Since 2014, Equinox has continued steady development on the web client sprints.  The goal of the web client was to maintain feature parity with the old client by porting over newer HTML interfaces and re-writing the older XUL interfaces.  Happily, and with much input from the users, many improvements to use and usability have been incorporated throughout the process.  In order to allow the web client to grow, the community decided to stop accepting new features into the XUL client, but development did not cease.  New features have been developed alongside the web client, and upon implementation there will be some new features, such as customizable copy alerts and statistical popularity badges, along with the new browser-based interface.

The web client is currently in the last stages of the greenhouse phase of development.  Sprints 1, 2, and 3, Circulation, Cataloging, and Administration/Reports, respectively, are complete.  Sprint 4, Acquisitions and Serials, is currently in development and will be completed this fall. Sprints 5 (Booking, Offline Mode, etc.) and 6 (Bug Fixing) will round out the development phase and, upon completion, the Evergreen web client will move out of the greenhouse and into the community for use where it will continue to grow organically to meet the needs of the community.

As a trainer, I introduce new libraries to Evergreen and the Evergreen community and help translate their workflows to a new interface and ILS.  Evergreen feels like home to me and I hope that I have been able to help other libraries feel at home with Evergreen as well.  Through community support and development, Evergreen has undergone tremendous growth in the past 10 years.  It is constantly evolving and becoming a stronger ILS that meets the needs of its users.  The web client is the next phase of this evolution and it is a big step forward.  I’m looking forward to getting to know “Webby” and seeing what the harvest will bring in the next 10 years.  

–Angela Kilsdonk, Education Manager

This is the ninth in our series of posts leading up to Evergreen’s Tenth birthday.

Conal Tuohy: Linked Open Data Visualisation at #GLAMVR16

planet code4lib - Tue, 2016-08-30 02:02

On Thursday last week I flew to Perth, in Western Australia, to speak at an event at Curtin University on visualisation of cultural heritage. Erik Champion, Professor of Cultural Visualisation, who organised the event, had asked me to talk about digital heritage collections and Linked Open Data (“LOD”).

The one-day event was entitled “GLAM VR: talks on Digital heritage, scholarly making & experiential media”, and combined presentations and workshops on cultural heritage data (GLAM = Galleries, Libraries, Archives, and Museums) with advanced visualisation technology (VR = Virtual Reality).

The venue was the Curtin HIVE (Hub for Immersive Visualisation and eResearch); a really impressive visualisation facility at Curtin University, with huge screens and panoramic and 3d displays.

There were about 50 people in attendance, and there would have been over a dozen different presenters, covering a lot of different topics, though with common threads linking them together. I really enjoyed the experience, and learned a lot. I won’t go into the detail of the other presentations, here, but quite a few people were live-tweeting, and I’ve collected most of the Twitter stream from the day into a Storify story, which is well worth a read and following up.

My presentation

For my part, I had 40 minutes to cover my topic. I’d been a bit concerned that my talk was more data-focused and contained nothing specifically about VR, but I think on the day the relevance was actually apparent.

The presentation slides are available here as a PDF: Linked Open Data Visualisation

My aims were:

  • At a tactical level, to explain the basics of Linked Data from a technical point of view (i.e. to answer the question “what is it?”); to show that it’s not as hard as it’s usually made out to be; and to inspire people to get started with generating it, consuming it, and visualising it.
  • At a strategic level, to make the case for using Linked Data as a basis for visualisation; that the discipline of adopting Linked Data technology is not at all a distraction from visualisation, but rather a powerful generic framework on top of which visualisations of various kinds can be more easily constructed, and given the kind of robustness that real scholarly work deserves.
Linked Data basics

I spent the first part of my talk explaining what Linked Open Data means; starting with “what is a graph?” and introducing RDF triples and Linked Data. Finally I showed a few simple SPARQL queries, without explaining SPARQL in any detail, but just to show the kinds of questions you can ask with a few lines of SPARQL code.
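To make the idea concrete, here is a toy version of the graph-query pattern in plain Python rather than SPARQL: a handful of invented (subject, predicate, object) triples and a matcher in which None plays the role of a query variable. This is a sketch of the concept only, not the technology itself.

```python
# The graph model in miniature: a set of (subject, predicate, object)
# triples and a tiny pattern-matcher. This is not SPARQL itself, just
# an illustration of the kind of question a few lines of query can ask.
# All names below are invented for the example.

triples = {
    ("ex:hamlet",  "ex:author", "ex:shakespeare"),
    ("ex:hamlet",  "ex:title",  "Hamlet"),
    ("ex:macbeth", "ex:author", "ex:shakespeare"),
    ("ex:macbeth", "ex:title",  "Macbeth"),
}

def match(s=None, p=None, o=None):
    """Yield triples matching a pattern; None acts like a query variable."""
    for t in triples:
        if (s in (None, t[0])) and (p in (None, t[1])) and (o in (None, t[2])):
            yield t

# "What are the titles of works by Shakespeare?"
works  = {s for s, _, _ in match(p="ex:author", o="ex:shakespeare")}
titles = sorted(o for s, _, o in match(p="ex:title") if s in works)
print(titles)  # -> ['Hamlet', 'Macbeth']
```

The equivalent SPARQL query is not much longer than the two lines at the bottom, which is the point: once the data is in graph form, asking questions of it is cheap.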

What is an RDF graph?

While I explained about graph data models, I saw attendees nodding, which I took as a sign of understanding and not that they were nodding off to sleep; it was still pretty early in the day for that.

One thing I hoped to get across in this part of the presentation was just that Linked Data is not all that hard to get into. Sure, it’s not a trivial technology, but barriers to entry are not that high; the basics of it are quite basic, so you can make a start and do plenty of useful things without having to know all the advanced stuff. For instance, there are a whole bunch of RDF serializations, but in fact you can get by with knowing only one. There are a zillion different ontologies, but again you only need to know the ontology you want to use, and you can do plenty of things without worrying about a formal ontology at all. I’d make the case for university eResearch agencies, software carpentry, and similar efforts, to be offering classes and basic support in this technology, especially in library and information science, and the humanities generally.

Linked Data as architecture

People often use the analogy of building, when talking about making software. We talk about a “build process”, “platforms”, and “architecture”, and so on. It’s not an exact analogy, but it is useful. Using that analogy, Linked Data provides a foundation that you can build a solid edifice on top of. If you skimp on the foundation, you may get started more quickly, but you will encounter problems later. If your project is small, and if it’s a temporary structure (a shack or bivouac), then architecture is not so important, and you can get away with skimping on foundations (and you probably should!), but the larger the project is (an office building), and the longer you want it to persist (a cathedral), the more valuable a good architecture will be. In the case of digital scholarly works, the common situation in academia is that weakly-architected works are being cranked out and published, but being hard to maintain, they tend to crumble away within a few years.

Crucially, a Linked Data dataset can capture the essence of what needs to be visualised, without being inextricably bound up with any particular genre of visualisation, or any particular visualisation software tool. This relative independence from specific tools is important because a dataset which is tied to a particular software platform needs to rely on the continued existence of that software, and experience shows that individual software packages come and go depressingly quickly. Often only a few years are enough for a software program to be “orphaned”, unavailable, obsolete, incompatible with the current software environment (e.g. requires Windows 95 or IE6), or even, in the case of software available online as a service, for it to completely disappear into thin air, if the service provider goes bust or shuts down the service for reasons of their own. In these cases you can suddenly realise you’ve been building your “scholarly output” on sand.

By contrast, a Linked Data dataset is standardised, and it’s readable with a variety of tools that support that standard. That provides you with a lot of options for how you could go on to visualise the data; that generic foundation gives you the possibility of building (and rebuilding) all kinds of different things on top of it.

Because of its generic nature and its openness to the Web, Linked Data technology has become a broad software ecosystem which already has a lot of people’s data riding on it; that kind of mass investment (a “bandwagon”, if you like) is insurance against it being wiped out by the whims or vicissitudes of individual businesses. That’s the major reason why a Linked Data dataset can be archived and stored long term with confidence.

Linked Open Data is about sharing your data for reuse

Finally, by publishing your dataset as Linked Open Data (independently of any visualisations you may have made of it), you are opening it up to reuse not only by yourself, but by others.

The graph model allows you to describe the meaning of the terms you’ve used (i.e. the analytical categories used in your data can themselves be described and categorised, because everything is a node in a graph). This means that other people can work out what your dataset actually means.

The use of URIs for identifiers means that others can easily cite your work and effectively contribute to your work by creating their own annotations on it. They don’t need to impinge on your work; their annotations can live somewhere else altogether and merely refer to nodes in your graph by those nodes’ identifiers (URIs). They can comment; they can add cross-references; they can assert equivalences to nodes in other graphs, elsewhere. Your scholarly work can break out of its box, to become part of an open web of knowledge that grows and ramifies and enriches us all.

Equinox Software: Statistical Popularity Badges

planet code4lib - Tue, 2016-08-30 01:55

Statistical Popularity Badges allow libraries to set popularity parameters that define popularity badges, which bibliographic records can earn if they meet the set criteria.  Popularity badges can be based on factors such as circulation and hold activity, bibliographic record age, or material type.  The popularity badges that a record earns are used to adjust catalog search results to display more popular titles (as defined by the badges) first.  Within the OPAC there is a new sort option called “Sort by Popularity” which will allow users to sort records based on the popularity assigned by the popularity badges.

Popularity Rating and Calculation

Popularity badge parameters define the criteria a bibliographic record must meet to earn the badge, as well as which bibliographic records are eligible to earn the badge.  For example, the popularity parameter “Circulations Over Time” can be configured to create a badge that is applied to bibliographic records for DVDs.  The badge can be configured to look at circulations within the last 2 years, but assign more weight or popularity to circulations from the last 6 months.

Multiple popularity badges may be applied to a bibliographic record.  For each applicable popularity badge, the record will be rated on a scale of 1-5, where a 5 indicates the most popularity.  Evergreen will then assign an overall popularity rating to each bibliographic record by averaging all of the popularity badge points earned by the record.  The popularity rating is stored with the record and will be used to rank the record within search results when the popularity badge is within the scope of the search.  The popularity badges are recalculated on a regular and configurable basis by a cron job.  Popularity badges can also be recalculated by an administrator directly on the server.
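The averaging described above can be sketched as follows. This is an illustrative reading of the documentation (treating each badge's Weight as a multiplier in a weighted average), not Evergreen's actual source code.

```python
# Combine per-badge ratings (1-5) into an overall popularity rating,
# with each badge's Weight acting as a multiplier on its rating.

def overall_rating(badge_scores):
    """badge_scores: list of (rating_1_to_5, weight) pairs for one record."""
    if not badge_scores:
        return 0.0
    total_weight = sum(weight for _, weight in badge_scores)
    weighted_sum = sum(rating * weight for rating, weight in badge_scores)
    return weighted_sum / total_weight

# A record earning a weight-1 badge rated 5 and a weight-2 badge rated 2:
# (5*1 + 2*2) / (1 + 2) = 3.0
print(overall_rating([(5, 1), (2, 2)]))  # 3.0
```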

Creating Popularity Badges

There are two main types of popularity badges:  point-in-time popularity (PIT), which looks at the popularity of a record at a specific point in time—such as the number of current circulations or the number of open hold requests; and temporal popularity (TP), which looks at the popularity of a record over a period of time—such as the number of circulations in the past year or the number of hold requests placed in the last six months.

The following popularity badge parameters are available for configuration:

  • Holds Filled Over Time (TP)
  • Holds Requested Over Time (TP)
  • Current Hold Count (PIT)
  • Circulations Over Time (TP)
  • Current Circulation Count (PIT)
  • Out/Total Ratio (PIT)
  • Holds/Total Ratio (PIT)
  • Holds/Holdable Ratio (PIT)
  • Percent of Time Circulating (takes into account all circulations, not a specific period of time)
  • Bibliographic Record Age (days, newer is better) (TP)
  • Publication Age (days, newer is better) (TP)
  • On-line Bib has attributes (PIT)
  • Bib has attributes and copies (PIT)
  • Bib has attributes and copies or URIs (PIT)
  • Bib has attributes (PIT)

To create a new Statistical Popularity Badge:

  1. Go to Administration>Local Administration>Statistical Popularity Badges.
  2. Click on Actions> Add badge.
  3. Fill out the following fields as needed to create the badge:

(Note: only Name, Scope, Weight, Recalculation Interval, Importance Interval, and Discard Value Count are required)

  • Name: Library assigned name for badge.  Each name must be unique.  The name will show up in the OPAC record display.  For example: Most Requested Holds for Books-Last 6 Months.  Required field.
  • Description: Further information to provide context to staff about the badge.
  • Scope: Defines the owning organization unit of the badge.  Badges will be applied to search result sorting when the Scope is equal to, or an ancestor of, the search location.  For example, a branch-specific search will include badges where the Scope is the branch, the system, or the consortium.  A consortium-level search will include only badges where the Scope is set to the consortium.  Item-specific badges will apply only to records that have items owned at or below the Scope.  Required field.
  • Weight:  Can be used to indicate that a particular badge is more important than the other badges that the record might earn.  The weight value serves as a multiplier of the badge rating.  Required field with a default value of 1.
  • Age Horizon:  Indicates the time frame during which events should be included for calculating the badge.  For example, a popularity badge for Most Circulated Items in the Past Two Years would have an Age Horizon of ‘2 years’.   The Age Horizon should be entered as a number followed by ‘day(s)’, ‘month(s)’, ‘year(s)’, such as ‘6 months’ or ‘2 years’.  Use with temporal popularity (TP) badges only.
  • Importance Horizon: Used in conjunction with Age Horizon, this allows more recent events to be considered more important than older events.  A value of zero means that all events included by the Age Horizon will be considered of equal importance.  With an Age Horizon of 2 years, an Importance Horizon of ‘6 months’ means that events, such as checkouts, that occurred within the past 6 months will be considered more important than the circulations that occurred earlier within the Age Horizon.
  • Importance Interval:  Can be used to further divide up the timeframe defined by the Importance Horizon.  For example, if the Importance Interval is ‘1 month’, Evergreen will combine all of the events within that month for adjustment by the Importance Scale (see below).  The Importance Interval should be entered as a number followed by ‘day(s)’, ‘week(s)’,  ‘month(s)’, ‘year(s)’, such as ‘6 months’ or ‘2 years’.  Required field.
  • Importance Scale: The Importance Scale can be used to assign additional importance to events that occurred within the most recent Importance Interval.  For example, if the Importance Horizon is ‘6 months’ and the Importance Interval is ‘1 month’, the Importance Scale can be set to ‘6’ to indicate that events that happened within the last month will count 6 times, events that happened 2 months ago will count 5 times, etc.  The Importance Scale should be entered as a number.
  • Percentile:  Can be used to assign a badge to only the records that score above a certain percentile.  For example, setting the field to ‘95’ will assign the badge only to records in the top 5% of results.  To optimize the popularity badges, the percentile should be set between 95 and 99 to assign a badge to the top 5%-1% of records.
  • Attribute Filter:  Can be used to assign a badge to records that contain a specific Record Attribute.  Currently this field can be configured by running a report (see note below) to obtain the JSON data that identifies the Record Attribute.  The JSON data from the report output can be copied and pasted into this field.   A new interface for creating Composite Record Attributes will be implemented with future development of the web client.
    • To run a report to obtain JSON data for the Attribute Filter, use SVF Record Attribute Coded Value Map as the template Source.  For Displayed Fields, add Code, ID, and/or Description from the Source; also display the Definition field from the Composite Definition linked table.  This field will display the JSON data in the report output.  Filter on the Definition from the Composite Definition linked table and set the Operator to ‘Is not NULL’.
  • Circ Mod Filter: Apply the badge only to items with a specific circulation modifier.  Applies only to item related badges as opposed to “bib record age” badges, for example.
  • Bib Source Filter:  Apply the badge only to bibliographic records with a specific source.
  • Location Group Filter:  Apply the badge only to items that are part of the specified Copy Location Group.  Applies only to item related badges.
  • Recalculation Interval: Indicates how often the popularity value of the badge should be recalculated for bibliographic records that have earned the badge.  Recalculation is controlled by a cron job.  Required field with a default value of 1 month.
  • Fixed Rating: Can be used to set a fixed popularity value for all records that earn the badge.  For example, the Fixed Rating can be set to 5 to indicate that records earning the badge should always be considered extremely popular.
  • Discard Value Count:  Can be used to prevent certain records from earning the badge to make Percentile more accurate by discarding titles that are below the value indicated.   For example, if the badge looks at the circulation count over the past 6 months, Discard Value Count can be used to eliminate records that had too few circulations to be considered “popular”.  If you want to discard records that only had 1-3 circulations over the past 6 months, the Discard Value Count can be set to ‘3’.  Required field with a default value of 0.
  • Last Refresh Time: Displays the last time the badge was recalculated based on the Recalculation Interval.
  • Popularity Parameter: Types of TP and PIT factors described above that can be used to create badges to assign popularity to bibliographic records.
  4. Click OK to save the badge.
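The interplay of the Importance Horizon, Importance Interval, and Importance Scale parameters above can be sketched as follows. This is an assumed reading of the documented example (a 6-month Importance Horizon, 1-month Importance Interval, and Importance Scale of 6), not Evergreen's internals.

```python
# Weight a single event by how long ago it occurred: last month's
# events count 6x, two months ago 5x, and so on down to 1x for
# anything older than the Importance Horizon (but still inside the
# Age Horizon).

def event_weight(months_ago, horizon_months=6, interval_months=1, scale=6):
    if months_ago >= horizon_months:
        return 1  # older events within the Age Horizon count once
    interval_index = months_ago // interval_months  # 0 = most recent interval
    return max(scale - interval_index, 1)

print(event_weight(0))  # 6  (a checkout within the last month)
print(event_weight(1))  # 5  (a checkout two months ago)
print(event_weight(9))  # 1  (beyond the Importance Horizon)
```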

New Global Flags

OPAC Default Sort:  can be used to set a default sort option for the catalog.  Users can always override the default by manually selecting a different sort option while searching.

Maximum Popularity Importance Multiplier:  used with the Popularity Adjusted Relevance sort option in the OPAC.  Provides a scaled adjustment to relevance score based on the popularity rating earned by bibliographic records.  See below for more information on how this flag is used.

Sorting by Popularity in the OPAC

Within the stock OPAC template there is a new option for sorting search results called “Most Popular”.  Selecting “Most Popular” will first sort the search results based on the popularity rating determined by the popularity badges and will then apply the default “Sort by Relevance”.  This option will maximize the popularity badges and ensure that the most popular titles appear higher up in the search results.

There is a second new sort option called “Popularity Adjusted Relevance” that can be turned on by editing the ctx.popularity_sort setting in the OPAC template configuration.  The “Popularity Adjusted Relevance” sort option can be used to find a balance between popularity and relevance in search results.  For example, it can help ensure that records that are popular, but not necessarily relevant to the search, do not supersede records that are both popular and relevant in the search results.

It does this by sorting search results using an adjusted version of Relevance sorting.  When sorting by relevance, each bibliographic record is assigned a baseline relevance score between 0 and 1, with 0 being not relevant to the search query and 1 being a perfect match.  With “Popularity Adjusted Relevance” the baseline relevance is adjusted by a scaled version of the popularity rating assigned to the bibliographic record.  The scaled adjustment is controlled by a Global Flag called “Maximum Popularity Importance Multiplier” (MPIM).  The MPIM takes the average popularity rating of a bibliographic record (1-5) and creates a scaled adjustment that is applied to the baseline relevance for the record.  The adjustment can be between 1.0 and the value set for the MPIM.

For example, if the MPIM is set to 1.2, a record with an average popularity badge score of 5 (maximum popularity) would have its relevance multiplied by 1.2, in effect giving it the maximum increase of 20% in relevance.  If a record has an average popularity badge score of 2.5, the baseline relevance of the record would be multiplied by 1.1 (due to the popularity score scaling the adjustment to halfway between 1.0 and the MPIM of 1.2) and the record would receive a 10% increase in relevance.  A record with a popularity badge score of 0 would be multiplied by 1.0 (due to the popularity score being 0) and would not receive a boost in relevance.
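The scaling described above can be sketched with a small formula inferred from the text's worked examples (this is an assumed formula, not Evergreen's source): the average popularity rating (0-5) linearly interpolates the multiplier between 1.0 and the MPIM.

```python
# Adjust a baseline relevance score (0-1) by the record's average
# popularity rating (0-5), capped by the MPIM Global Flag.

def adjusted_relevance(baseline, popularity, mpim=1.2):
    multiplier = 1.0 + (popularity / 5.0) * (mpim - 1.0)
    return baseline * multiplier

print(round(adjusted_relevance(0.80, 5.0), 2))  # 0.96 (full 20% boost)
print(round(adjusted_relevance(0.80, 2.5), 2))  # 0.88 (10% boost)
print(round(adjusted_relevance(0.80, 0.0), 2))  # 0.8  (no boost)
```

This reproduces all three examples in the text: a rating of 5 yields the full MPIM multiplier, 2.5 yields the halfway point (1.1), and 0 leaves the baseline unchanged.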

Popularity Badge Example

A popularity badge called “Long Term Holds Requested” has been created which has the following parameters:

Popularity Parameter:  Holds Requested Over Time

Scope: CONS

Weight: 1 (default)

Age Horizon: 5 years

Percentile: 99

Recalculation Interval: 1 month (default)

Discard Value Count: 0 (default)

This popularity badge will rate bibliographic records based on the number of holds that have been placed on it over the past 5 years and will only apply the badge to the top 1% of records (99th percentile).

If a keyword search for harry potter is conducted and the sort option “Most Popular” is selected, Evergreen will apply the popularity rankings earned from badges to the search results.

Title search: harry potter. Sort by: Most Popular.

The popularity badge also appears in the bibliographic record display in the catalog. The name of the badge earned by the record and the popularity rating are displayed in the Record Details.

A popularity badge of 5.0/5.0 has been applied to the most popular bibliographic records where the search term “harry potter” is found in the title. In the image above, the popularity badge has identified records from the Harry Potter series by J.K. Rowling as the most popular titles matching the search and has listed them first in the search results.

Equinox Software: Copy Alerts

planet code4lib - Tue, 2016-08-30 01:38

The Copy Alerts feature allows library staff to add customized alert messages to copies. The copy alerts will appear when a specific event takes place, such as when the copy is checked in, checked out, or renewed. Alerts can be temporary or persistent: temporary alerts will be disabled after the initial alert and acknowledgement from staff, while persistent alerts will display each time the alert event takes place. Copy Alerts can be configured to display at the circulating or owning library only or, alternatively, when the library at which the alert event takes place is not the circulating or owning library. Copy Alerts at check in can also be configured to provide options for the next copy status that should be applied to an item. Library administrators will have the ability to create and customize Copy Alert Types and to suppress copy alerts at specific org units.

Adding a Copy Alert

Copy Alerts can be added to new copies or existing copies using the Volume/Copy Editor. They can also be added directly to items through the Check In, Check Out, Renew, and Item Status screens.

To add a Copy Alert in the Volume/Copy Editor:

1. Within the Volume/Copy Editor, scroll to the bottom of the screen and click on Copy Alerts.

2. A New Copy Alert window will pop up.

3. Select an alert Type and enter an additional alert message if needed. Check the box next to Temporary if this alert should not appear after the initial alert is acknowledged. Leaving the Temporary box unchecked will create a persistent alert that will appear each time the action to trigger the alert occurs, such as check in or check out.

4. Click OK to save the new Copy Alert. After a Copy Alert has been added, clicking on the Copy Alerts button in the Volume/Copy Editor will allow you to add another Copy Alert and to view and edit existing Copy Alerts.

5. Make any additional changes to the item record and click Store Selected to store these changes and the new copy alert(s) to the Completed Copies tab. If you are done modifying the copy, click Save & Exit to finalize the changes.

To add a Copy Alert from the Check In, Check Out, or Renewal screens:

1. Navigate to the appropriate screen, for example to Circulation>Check In.
2. Scan in the item barcode.
3. Select the item row and go to Actions>Add Copy Alerts or right click on the item row and select Add Copy Alerts.

4. The Add Copy Alert window will pop up. Select the alert Type, add an additional alert message if needed, and Click OK to save. This alert will be added to the copy.

To add a Copy Alert from the Item Status screen:

1. Go to the Detail View of the Item Status screen.
2. In the bottom left-hand corner of the item record there is a Copy Alerts option. Click Add to create a new copy alert.

3. The Add Copy Alert window will pop up. Select the alert Type, add an additional alert message if needed, and Click OK to save. This alert will be added to the copy.

Triggering a Copy Alert

The Copy Alert will appear when the action required to trigger the alert occurs. For example, the Normal Checkin Alert will appear when the item is checked in:

If Next Status options have been configured for the Checkin Alert, staff will see a drop down menu that allows them to select the next Status for the copy:

Managing Copy Alerts

Copy Alerts can be managed from the Item Status screen. Within the Quick Summary tab of the Detailed View of an item, click on Manage to view and Remove copy alerts.

Administration of Copy Alerts

Copy Alert Types

Copy Alert Types are created and managed in Administration>Local Administration>Copy Alert Types. Copy Alert Types define the action and behavior of an alert message type. The Alert Types included in a stock installation of Evergreen are:

• Normal checkout
• Normal checkin
• Checkin of missing copy
• Checkin of lost-and-paid copy
• Checkin of damaged copy
• Checkin of claims-returned copy
• Checkin of long overdue copy
• Checkin of claims-never-checked-out copy
• Checkin of lost copy

To create a new Copy Alert Type:

1. Go to Administration>Local Administration>Copy Alert Types.
2. Click on Create and fill out the following fields as needed:
Name: name of the Copy Alert Type.
Active: indicates if the alert type is currently in use (Yes) or not (No).
State: indicates the Copy Status of the item at the time of the event.
Event: the action that takes place in the ILS to trigger the alert.
Scope Org Unit: indicates which org unit(s) the alert type will apply to.
Next Status: can be used with temporary Checkin Alerts only. If a next status is configured, staff will be presented with a list of statuses to choose from when the item is checked in. Next statuses should be configured by using the Copy Status ID # surrounded by curly brackets. For example {7, 11}.
Renewing?: indicates if the alert should appear during a renewal.
Invert location?: if set to yes, this setting will invert the following two settings. For example, if an alert is set to appear at the Circulating Library only, inverting the location will cause the alert to appear at all libraries except the Circulating Library.
At Circulation Library?: indicates if the alert should appear at the circulation library only.
At Owning Library?: indicates if the alert should appear at the owning library only.
3. Click Save.

To edit an existing Copy Alert Type:

1. Go to Administration>Local Administration>Copy Alert Types.
2. Click on the type and go to Actions>Edit or right-click and select Edit.
3. Make changes to the existing configuration and click Save.

Copy Alert Suppression

The Copy Alert Suppression interface can be used to suppress alert types at a specific org unit. Suppression of alerts will adhere to the organization unit hierarchy. For example, if an alert is suppressed at the System level, it will be suppressed for all descendent branches.
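The hierarchy rule above can be sketched as a walk up the org unit tree: an alert is suppressed for a unit if a suppression entry exists at that unit or at any of its ancestors. The org unit names and parent map below are hypothetical, not Evergreen's schema.

```python
# child -> parent; the consortium "CONS" is the root of the hierarchy.
parents = {"BR1": "SYS1", "BR2": "SYS1", "SYS1": "CONS"}

def is_suppressed(org_unit, suppressed_at):
    """True if the alert is suppressed at org_unit or any ancestor."""
    while org_unit is not None:
        if org_unit in suppressed_at:
            return True
        org_unit = parents.get(org_unit)
    return False

# Suppressing at the System level suppresses all descendant branches,
# but not the consortium above it.
suppressed = {"SYS1"}
print(is_suppressed("BR1", suppressed))   # True
print(is_suppressed("CONS", suppressed))  # False
```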

To suppress an alert type:

1. Go to Administration>Local Administration>Copy Alert Suppression.
2. Click Create and select the Alert Type that you want to suppress from the drop down menu.
3. Next, select the Org Unit at which the alert should be suppressed.
4. Click Save.

DuraSpace News: NEW RELEASE: Message-based Integrations for Fedora

planet code4lib - Tue, 2016-08-30 00:00

From Aaron Coburn, Programmer and Systems Administrator, Amherst College

Amherst, MA  I would like to announce the immediate availability of version 4.6.0 of the Fedora Messaging Toolbox.

The messaging toolbox is designed to support a variety of asynchronous integrations with external tools and services, such as a Solr search engine or an external Triplestore. Version 4.6.0 of the messaging toolbox is compatible with both the forthcoming 4.6.0 release of the Fedora Commons server and previous releases of Fedora.

DuraSpace News: Learn More About Scholars@Duke

planet code4lib - Tue, 2016-08-30 00:00

From Julia Trimmer, Manager, Faculty Data Systems & Analysis, Office of the Provost, Duke University

Durham, NC  Will you be attending the Symplectic User Conference at Duke University on September 13 and 14?  If you would like to get together around that event to learn more about VIVO at Duke University, members of the Scholars@Duke team are available to meet before or after the event.

DuraSpace News: NEW Fedora Repository Web Site

planet code4lib - Tue, 2016-08-30 00:00

Austin, TX  DuraSpace is pleased to announce that the Fedora team recently completed a redesign of the Fedora web site. The site was designed in consultation with members of the Fedora Leadership Group and reflects a modern, mobile-friendly approach that makes it easy to find key items first.

