Feed aggregator

David Rosenthal: Walking Away From The Table

planet code4lib - Thu, 2016-12-22 16:00
Last time we were buying a car, at the end of a long and frustrating process we finally decided that what we wanted was the bottom end of the range, with no options. The dealer told us that choice wasn't available in our market. We said "OK, call us if you ever find a car like that" and walked away. It was just over two weeks before we got the call. At the end of 2014 I wrote:
The discussions between libraries and major publishers about subscriptions have only rarely been actual negotiations. In almost all cases the libraries have been unwilling to walk away and the publishers have known this. This may be starting to change; Dutch libraries have walked away from the table with Elsevier.

Actually, negotiations continued and a year later John Bohannon reported for Science that a deal was concluded:
A standoff between Dutch universities and publishing giant Elsevier is finally over. After more than a year of negotiations — and a threat to boycott Elsevier's 2500 journals — a deal has been struck: For no additional charge beyond subscription fees, 30% of research published by Dutch researchers in Elsevier journals will be open access by 2018. ... The dispute involves a mandate announced in January 2014 by Sander Dekker, state secretary at the Ministry for Education, Culture and Science of the Netherlands. It requires that 60% of government-funded research papers should be free to the public by 2019, and 100% by 2024.

By being willing to walk away, the Dutch achieved a partial victory against Elsevier's defining away of double-dipping, their insistence that article processing charges were in addition to subscriptions, not instead of them. This is a preview of the battle over the EU's 2020 open access mandate.

The UK has just concluded negotiations, and a major German consortium is in the midst of them. Below the fold, some commentary on their different approaches.

In the UK, JISC Collections negotiates a national deal with each publisher; universities can purchase from the publisher under the deal. Tim Gowers reports that:
The current deal that the universities have with Elsevier expires at the end of this year, and a new one has been negotiated between Elsevier and Jisc Collections, the body tasked with representing the UK universities.

According to Gowers, JISC Collections had some key goals in the negotiation:
  1. No real-terms price increases.
  2. An offsetting agreement for article processing charges.
  3. No confidentiality clauses.
  4. A move away from basing price on “historic spend”.
  5. A three-year deal rather than a five-year deal.
Gowers quotes a JISC representative saying:
We know from analysis of the experiences of other consortia that Elsevier really do want to reach an agreement this year. They really hate to go over into the next year …

A number of colleagues from other consortia have said they wished they had held on longer …

If we can hold firm even briefly into 2017 that should have quite a profound impact on what we can achieve in these negotiations.

This isn't what happened:
But this sensible negotiating strategy was mysteriously abandoned, on the grounds that it had become clear that the deal on offer was the best that Jisc was going to get.

Gowers' assessment of the eventual deal against the goals is bleak:
  1. It is conceivable that [JISC] will end up achieving their first aim of not having any real-terms price increases: this will depend on whether Brexit causes enough inflation to cancel out such money-terms price increases as there may or may not be.
  2. There is no offsetting agreement.
  3. When Elsevier insisted on confidentiality clauses, [JISC] meekly accepted this. ... It is for that reason that I have been a bit vague about prices above.
  4. "The agreement includes the ability for the consortium to migrate from historical print spend and reallocate costs should we so wish." I have no information about whether any “migration” has started, but my guess would be that it hasn't.
  5. The deal is for five years and not for three years.
In other words, because JISC wasn't prepared to walk away they achieved little or nothing. In particular, they failed to make the progress the Dutch had already achieved against Elsevier's double-dipping on APCs.

Contrast this with the German DEAL project:
The DEAL project, headed by HRK (German Rectors' Conference) President Prof Hippler, is negotiating a nationwide license agreement for the entire electronic Elsevier journal portfolio with Elsevier. Its objective is to significantly improve the status quo regarding the provision of and access to content (Open Access) as well as pricing. It aims at relieving the institutions' acquisition budgets and at improving access to scientific literature in a broad and sustainable way.

In order to improve their negotiating power, about 60 major German research institutions including Göttingen University cancelled their contracts with Elsevier as early as October 2016. Others have announced that they will follow this example.

The 60 institutions are preparing their readers for the result of cancellation:
From 1 January 2017 on, Göttingen University — as well as more than 60 other major German research institutions — is expected to have no access to the full texts of journals by the publisher Elsevier. In Göttingen, this applies to 440 journals. There will be access to most archived issues of Göttingen journals (PDF 95 KB), but there may be no access to individual Göttingen e-packages for the economic sciences in particular (PDF 89 KB).

From this time on, we will offer you a free ordering service for articles from these journals: Please send your email request citing the necessary bibliographical data to our colleagues at the library (email). Should an inter-library loan not be possible, we will endeavor to procure the article by another delivery route for you. This service will be free of charge.

Elsevier's press release indicates that DEAL is sticking to the strategy JISC abandoned:
Since such negotiations for 600+ institutions are complex, both sides have met regularly during the second half of this year and it was a mutual agreement to pause talks until early in the new year.

And that being hard-nosed has an impact:
In fact, it was those institutions themselves that informed us of their intention not to auto-renew their expiring individual access agreements based on the assumption that a national deal would be reached by the end of 2016. It goes without saying that all institutions, even if they cancelled their contracts, will be serviced beyond 2016 should they so choose.

It will be very interesting to see how this plays out.

LITA: Jobs in Information Technology: December 22, 2016

planet code4lib - Thu, 2016-12-22 15:11

New vacancy listings are posted weekly on Wednesday at approximately 12 noon Central Time. They appear under New This Week and under the appropriate regional listing. Postings remain on the LITA Job Site for a minimum of four weeks.

New This Week

Brandeis University Library, Digital Initiatives Librarian, Boston, MA

UC Riverside, University Library, Business Systems Analyst, Riverside, CA

Visit the LITA Job Site for more available jobs and for information on submitting a job posting.

FOSS4Lib Recent Releases: YAZ - 5.19.0

planet code4lib - Thu, 2016-12-22 13:23

Last updated December 22, 2016. Created by Peter Murray on December 22, 2016.

Package: YAZ
Release Date: Friday, December 16, 2016

Open Knowledge Foundation: New Report: Making Citizen-Generated Data Work

planet code4lib - Thu, 2016-12-22 10:00

Read the full report here.

We are pleased to announce a new research series investigating how citizens and civil society create data to drive sustainable development. The series follows on from earlier papers on Democratising The Data Revolution and how citizen-generated data can change what public institutions measure. The first report “Making Citizen-Generated Data Work” asks what makes citizens and others want to produce and use citizen-generated data. It was written by myself, Shazade Jameson, and Eko Prasetyo.

“The goal of Citizen-Generated Data is to monitor, advocate for, or drive change around an issue important to citizens”

The report demonstrates that citizen-generated data projects are rarely the work of individual citizens. Instead, they often depend on partnerships to thrive and are supported by civil society organisations, community-based organisations, governments, or business. These partners play a necessary role to provide resources, support, and knowledge to citizens. In return, they can harness data created by citizens to support their own mission. Thus, citizens and their partners often gain mutual benefits from citizen-generated data.

“How can the success of these projects be encouraged and what factors support strategic uptake of citizen-generated data in the short and long term”

But if CGD projects rely on partnerships, who has to be engaged, and through which incentives, to enable CGD projects to achieve their goals? How are such multi-stakeholder projects organised, and which resources and expertise do partners bring into a project? What can other projects do to support and benefit their own citizen-generated data initiatives? This report offers recommendations to citizens, civil society organisations, policy-makers, donors, and others on how to foster stronger collaborations.

Read the full report here.

Brown University Library Digital Technologies Projects: Fedora 4 – testing

planet code4lib - Wed, 2016-12-21 19:04

Fedora 4.7.1 is scheduled to be released on 1/5/2017, and testing is important to ensure good quality releases (release testing page for Fedora 4.7.1).

Sanity Builds

Some of the testing is for making sure the Fedora .war files can be built with various options on different platforms. To perform this testing, you need to have three required dependencies installed, and then run a couple of commands.


Java 8 is required for running Fedora. Git is required to clone the Fedora code repositories. Finally, Fedora uses Maven as its build/management tool. For each of these dependencies, you can grab it from your package manager, or download it directly (Java, Git, Maven).

Build Tests

Once your dependencies are installed, it’s time to build the .war files. First, clone the repository you want to test (e.g. fcrepo-webapp-plus):

git clone https://github.com/fcrepo4-exts/fcrepo-webapp-plus.git

Next, in the directory you just created, run the following command to test building it:

mvn clean install

If the output shows a successful build, you can report that to the mailing list. If an error was generated, you can ask the developers about that (also on the mailing list). The generated .war files will be installed to your local Maven repository (as noted in the output of the "mvn clean install" command).

Manual Testing

Another part of the testing is to perform different functions on a deployed version of Fedora.


One way to deploy Fedora is on Tomcat 7. After downloading Tomcat, uncompress it and run ./bin/startup.sh. You should see the Tomcat Welcome page at localhost:8080.

To deploy the Fedora application, shut down your Tomcat instance (./bin/shutdown.sh) and copy the fcrepo-webapp-plus .war file you built in the steps above to the Tomcat webapps directory. Next, add the following line to a new setenv.sh file in the bin directory of your Tomcat installation (update the fcrepo.home directory as necessary for your environment):

export JAVA_OPTS="${JAVA_OPTS} -Dfcrepo.home=/fcrepo-data -Dfcrepo.modeshape.configuration=classpath:/config/file-simple/repository.json"

By default, the fcrepo-webapp-plus application is built with WebACLs enabled, so you’ll need a user with the “fedoraAdmin” role to be able to access Fedora. Edit your Tomcat conf/tomcat-users.xml file to add the “fedoraAdmin” role and give that role to whatever user you’d like to log in as.
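For reference, a minimal tomcat-users.xml along these lines should work; the “fedoraAdmin” role name comes from the WebAC defaults mentioned above, while the username and password here are placeholders to replace with your own:

```xml
<tomcat-users>
  <!-- role required by fcrepo-webapp-plus when WebACLs are enabled -->
  <role rolename="fedoraAdmin"/>
  <!-- placeholder credentials: change before use -->
  <user username="testadmin" password="change-me" roles="fedoraAdmin"/>
</tomcat-users>
```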

Now start Tomcat again, and you should be able to navigate to http://localhost:8080/fcrepo-webapp-plus-4.7.1-SNAPSHOT/ and start testing Fedora functionality.

Library of Congress: The Signal: “Volun-peers” Help Liberate Smithsonian Digital Collections

planet code4lib - Wed, 2016-12-21 15:56

Scan of Chamaenerion Latifolium. US National Herbarium, Smithsonian.

The Smithsonian Transcription Center creates indexed, searchable text by means of crowdsourcing…or as Meghan Ferriter, project coordinator at the TC describes it, “harnessing the endless curiosity and goodwill of the public.” As of the end of the current fiscal year, 7,060 volunteers at the TC have transcribed 208,659 pages.

The scope, planning and execution of the TC’s work – the in-house coordination among the Smithsonian’s units and the external coordination of volunteers — is staggering to think about. The Smithsonian Institution is composed of 19 museums, archives, galleries and libraries; nine research centers; and a zoo. Fifteen of the Smithsonian units have collections in the TC, which is run by Ching-hsien Wang, Libraries and Archives System Support Branch manager with the Smithsonian Institution Office of the Chief Information Officer.

Ferriter said, “To manage a project of this scope, one must understand and troubleshoot the system and unit workflows as well as work with unit representatives as they select content and set objectives for their projects.  Neither simply building a tool nor merely inviting participation is enough to sustain and grow a digital project, whatever the scale.”

The TC benefits from the Smithsonian’s online collections. Though individual units may have their own databases, they all link to a central repository, the Smithsonian’s “Enterprise Digital Asset Network,” or EDAN, which is searchable from the Smithsonian’s Collections Search Center. The TC leverages the capabilities of EDAN and builds on the foundation of data and collections-management systems supported by the Office of the Chief Information Officer. In some cases, for example, a unit may have digitized a collection and the TC arranges for volunteers to add metadata.

Ching-hsien Wang.

Each unit has a different goal for its digital collections. The goal for one project might be to transcribe handwritten notes; the goal for another project might be to key in text from a scanned document. A project might call for geotagging or adding metadata from controlled vocabularies (pre-set tags, used to avoid ambiguities or sloppy mistakes). But the source for each TC project is always a collection of digital files that a volunteer can access online.

Sharing data across the Smithsonian’s back end is an impressive technological feat but it’s only half of this story. The other half is about the relationship between the TC and the volunteers. And the pivotal component that enables the two sides to engage effectively: trust.

The TC’s role at the Smithsonian is as an aggregator, making bulk data available for volunteers to process and directing the flow of volunteer-processed data to the main repository. So, more than just trafficking in data, the TC nurtures its relationships with volunteers by means of technical fail-safe resources and down-to-earth, sincere human engagement.

Ferriter shows her respect for the volunteers when she refers to them as “volunpeers.” Ferriter said, “ ‘Volunpeers’ indicates the ways unit administrators and Smithsonian staff experience the TC along with volunteers. ‘Volunpeers’ underscores the values articulated by volunteers describing their activities and personal goals on the TC, including to learn, to help and to give back to something bigger. … Establishing a collaborative space that uses peer-review resources brings to the foreground what is being done together rather than exclusively highlighting what is being done by particular individuals.”

TC staff made a crucial discovery when they figured out that what motivated people to volunteer was a sincere desire to help. Wang said, “Volunteers feel privileged and take the responsibility seriously. And they like that the Smithsonian values what they do.”

Meghan Ferriter.

Ferriter said, “Volunteers indicated they were seeking increased behind-the-scenes access as a reward for participating, rather than receiving discounts or merchandise from Smithsonian vendors.” So TC staff developed a close relationship with the volunteers and they remain in constant contact by means of social media.

“Communicating in an authentic way is central to my strategy,” Ferriter said. “Being authentic includes being vulnerable and expressing real enthusiasm. It also entails revealing my lack of knowledge while learning alongside volunteers. My strategy incorporates an inclusive attitude with the intent of shortening the distance of institutional authority and public positioning.”

Institutional authority — or the perception of institutional authority — can be a potential obstacle to finding volunteers. Wang said the Smithsonian — like other staid old institutions — was perceived several years ago to have an image problem. She said that research indicated, “People think it’s nothing but old white men scientists.” Wang and Ferriter do not suggest that the solution is for the TC to appear young and hip and “with it.” Rather the TC demonstrates its inclusiveness in a very real and sincere way: by reaching out to any and all volunteers and treating them with appreciation and respect.

Volunteers are always publicly credited for their work. They can download and review PDFs of what they’ve done once a project is finished. Ferriter said, “I advise Smithsonian staff members who want to be part of the Transcription Center, ‘You need to understand that there is a commitment that you’re making to participate in this project, which requires you to be involved with communicating with the public, to answer their questions, to tell them specific details about projects, to be prepared to provide a behind-the-scenes tour.’ ”

Scan of handwritten document from “The Legend of Sgenhadishon.” National Anthropological Archives, the Smithsonian.

Each project includes three steps: transcription, review and approval. One of the remarkable results of the TC/volunteer relationship is that the review process has become so thorough and consistently reliable, and volunteers behave so professionally and responsibly, that there is often little change required during the approval phase. This trust in the reviewers — trust that the reviewers earn and deserve — saves the Smithsonian a significant amount of staff time in the approval phase.

Another remarkable result of the volunteers’ dedication is that TC staff have found that manual transcriptions are far superior to OCR output, which often tends to be “dirty” and requires additional time and labor to correct.

Ferriter said that as successful as the Transcription Center is, as evidenced by the amount of digital collections it has made keyword searchable, there remain further opportunities to look at the larger picture of inter-related data. “The story may be more than merely what is contained within the TC project,” Ferriter said. “There are opportunities to connect the project to its significance in history, science and other related SI and cultural heritage collections.”

When those opportunities arise, the volunpeers will no doubt help make the connections happen.

FOSS4Lib Recent Releases: veraPDF - 0.26

planet code4lib - Wed, 2016-12-21 14:17

Last updated December 21, 2016. Created by Peter Murray on December 21, 2016.

Package: veraPDF
Release Date: Wednesday, December 21, 2016

Open Knowledge Foundation: PersonalData.IO helps you get access to your personal data

planet code4lib - Wed, 2016-12-21 13:00

PersonalData.IO is a free and open platform for citizens to track their personal data and understand how it is used by companies. It is part of the MyData movement, promoting a human-centric approach to personal data management.

A lot of readers of this blog will be familiar with Freedom of Information laws, a legal mechanism that forces governments to be more open. Individuals, journalists, startups and other actors can use this “right-to-know” to understand what the government is doing and try to make it function better. There are even platforms that help facilitate the exercise of this right, like MuckRock, WhatDoTheyKnow or FragDenStaat. These platforms also have an education function around information rights.

In Europe we enjoy a similar right with respect to personal data held by private companies, but it is often very hard to exercise it. We want to change that, with PersonalData.IO.

Image credit: Kevin O’Connor (CC BY)

What is personal data?

In European law, the definition of personal data is extremely broad: “any information relating to an identified or identifiable natural person”. Unlike in the U.S., the concept of identifiability is crucial in defining personal data, and ever-expanding to match technical possibilities: if some intermediate identifier (license plate, cookie, dynamic IP address, phone number, etc.) can reasonably be traced back to you given the likely evolution of technology, all the data associated with that identifier becomes personal data.

Why should you care?

Holding personal data often translates into power over people, which in turn becomes economic might (at the extreme, think Facebook, Google, etc). This situation often creates uncomfortable issues of transborder transparency and accountability, but also hinders the appearance of other innovative uses for the data, for instance for research, art, business, education, advocacy, journalism, etc.

The PersonalData.IO portal

Leveraging the same mechanisms as FOI portals, we are focused on making such requests easier to initiate, to follow through, to share and then to clone. Processing the requests in the open helps increase the pressure on companies to comply. In practice, we have taken the Froide software developed by Open Knowledge Germany, themed it to our needs and made some basic modifications in the workflow. Our platform is growing its user base slowly, but we benefit from many network effects: for any given company, you only need one person to go through the process of getting their hands on their data, and afterwards everyone benefits!


Getting to the data is only the first step; even then, the bar to making it really useful is still high. In May 2018, new regulations will come into place in Europe to help individuals leverage their personal data even more: individuals will enjoy a new right to data portability, i.e. the right to transfer data from one service to another.

In anticipation a whole movement has arisen focused on human-centric personal data management, called MyData. Open Knowledge Finland recently organised a conference with tons of people building new services giving you more control over all that data! I am looking forward to a tool helping individuals turn their personal data into Open Data (by scrubbing direct identifiers, for instance). Many companies will also benefit from the Frictionless Data project, since there will be a requirement to transfer that data “in a structured, commonly used, machine-readable and interoperable format”.

Image credit: Salla Thure (Public Domain)

In anticipation of this exciting ecosystem, we want to build experiences expanding access to this data with PersonalData.IO and to encourage companies to broaden their view of what constitutes personal data. The more data is considered personal data, the more you will be in control. Feel free to join us!

You can sign up to our mailing list or directly to the portal itself and initiate new requests. You can also follow us on Twitter or contact us directly. We welcome individual feedback and ideas and are always looking for new partners, developers and contributors!

District Dispatch: School district kicks out fair use

planet code4lib - Tue, 2016-12-20 21:26

I have an email folder called “sad, but true,” where I store weird news items, usually about copyright/fair use: the Irish daycare centers that were sued for showing DVDs at playschool, where each child was charged a three euro penalty fee. The news that Rod Stewart was being sued over the rights to an image of his own head. The litigation over who was the rights holder of the selfie of a monkey. (Yes, the monkey took the photograph of himself.) Tales like these fill the “sad, but true” folder.

Photo credit: daveynyn for Flickr

Here’s a “sad, but true” tale I found recently, and it has a horrible conclusion:

The story is about a school board that was deciding whether to “take an official policy position on the legally tricky issue of copyright compliance for its 18,000 employees and 159,000 students.” The school board debated whether having a copyright policy was more trouble than it was worth. (That made me laugh). After some discussion, the school board ultimately thought they needed to “take a stand.”

The board decided to remove the section on fair use because it would give teachers, staff and students “a false sense of security.” Multiple attorneys advised and the School Board Association agreed that “the problem with fair use is that there’s no specific provision in the law” that gives a “cut and dried” definition of “what is fair use.” The committee also agreed to “drop the section about providing training on fair use” (well, yeah).

But there’s more. The committee chair suggested that, because fair use is “super broad,” teachers should always ask for guidance from the principal, but the group felt it would be too much of a burden for the principals. The committee reached consensus on the issue and ruled that “employees, students and visitors are prohibited from the use or duplication of any copyright materials not allowed by copyright law, fair use guidelines sanctioned by Congress, licenses or contractual agreements.” This is crystal clear, and will not be confusing to people.

I then envisioned the things that the schools in this district would have to stop doing. Without fair use, using digital technologies and networks for teaching and classroom purposes would have to be severely curtailed. No cutting and pasting! No linking! No collages (maybe no art class altogether because it would be too risky). If a student is required to write a story, it better be original (and when I say original, I mean original). No music over the intercom system during morning assembly. I suppose you could read (but not aloud) and watch DVDs in classrooms. Class assignments would be very hard for teachers to craft, and new standards for assessing performance would need to be created for not learning. School talent shows would have to be prohibited, no yearbook, no student newspaper. (Reader: I wonder how many things you could name that this school district could not do without fair use. Let’s make a big list.)

This poor school district. Being advised by attorneys to refrain from using fair use because there are no cut and dried answers to “what is fair use.” Resorting to copyright guidelines that are not even in the copyright law. Censoring student assignments. Never being allowed to build on an existing work, or even comment on it. Of course, you could just get permission from the rights holder before doing anything – that would give one a “sense of security” (as if there are security guards in the school that arrest people who copy more than 10% of a book). This sad school district without fair use. Teachers can’t teach, students can’t learn and you’ll never be able to sing a song in the talent show.

The post School district kicks out fair use appeared first on District Dispatch.

FOSS4Lib Upcoming Events: West Coast Hydra Regional Meeting

planet code4lib - Tue, 2016-12-20 21:25
Date: Friday, February 17, 2017 - 08:00 to 16:30
Supports: Hydra

Last updated December 20, 2016. Created by Peter Murray on December 20, 2016.

February 10, 2017 (UC Santa Cruz) - Hydra - DuraSpace Wiki

David Rosenthal: Reference Rot Is Worse Than You Think

planet code4lib - Tue, 2016-12-20 19:32
At the Fall CNI Martin Klein presented a new paper from LANL and the University of Edinburgh, Scholarly Context Adrift: Three out of Four URI References Lead to Changed Content. Shawn Jones, Klein and the co-authors followed on from the earlier work on web-at-large citations from academic papers in Scholarly Context Not Found: One in Five Articles Suffers from Reference Rot, which found:
one out of five STM articles suffering from reference rot, meaning it is impossible to revisit the web context that surrounds them some time after their publication. When only considering STM articles that contain references to web resources, this fraction increases to seven out of ten.

Reference rot comes in two forms:
  • Link rot: The resource identified by a URI vanishes from the web. As a result, a URI reference to the resource ceases to provide access to referenced content.
  • Content drift: The resource identified by a URI changes over time. The resource’s content evolves and can change to such an extent that it ceases to be representative of the content that was originally referenced.
The British Library's Andy Jackson analyzed the UK Web Archive and found:
I expected the rot rate to be high, but I was shocked by how quickly link rot and content drift come to dominate the scene. 50% of the content is lost after just one year, with more being lost each subsequent year. However, it’s worth noting that the loss rate is not maintained at 50%/year. If it was, the loss rate after two years would be 75% rather than 60%. This indicates there are some islands of stability, and that any broad ‘average lifetime’ for web resources is likely to be a little misleading.

Clearly, the problem is very serious. Below the fold, details on just how serious and discussion of a proposed mitigation.
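Jackson's point about islands of stability can be checked with a quick calculation: a constant 50%/year loss rate would predict 75% lost after two years, not the observed 60%, which implies the second-year loss rate is much lower. A minimal sketch:

```python
# If a constant fraction of content were lost each year, the cumulative
# loss after n years would follow a simple compounding formula.
def cumulative_loss(rate_per_year, years):
    return 1 - (1 - rate_per_year) ** years

# A constant 50%/year rate predicts 75% lost after two years...
predicted = cumulative_loss(0.5, 2)
print(f"{predicted:.0%}")  # 75%

# ...but the UK Web Archive observed only ~60% lost, so the per-year
# rate must fall after year one: some resources are stable.
observed_two_year_loss = 0.60
implied_second_year_rate = 1 - (1 - observed_two_year_loss) / (1 - 0.5)
print(f"{implied_second_year_rate:.0%}")  # 20%
```

The drop from 50% to an implied 20% in the second year is exactly why a single "average lifetime" figure would be misleading.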

This work is enabled by Web archives' support for RFC 7089 (Memento), which allows access to preserved versions (Mementos) of web pages by [URI, datetime]. The basic question to ask is "does the web-at-large URI still resolve to the content it did when it was published?".
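In RFC 7089 terms, an archive's TimeMap lists every Memento of a URI with its archival datetime, and the Pre/Post Mementos around a publication date can be picked out of it. A minimal sketch, with a simplified link-format parser and illustrative archive URIs (real TimeMap entries vary in whitespace and attributes):

```python
import re
from datetime import datetime

# A fragment of an RFC 7089 link-format TimeMap (illustrative entries).
timemap = '''
<http://arxiv.example/paper>;rel="original",
<http://archive.example/20120301120000/paper>;rel="memento";datetime="Thu, 01 Mar 2012 12:00:00 GMT",
<http://archive.example/20120915080000/paper>;rel="memento";datetime="Sat, 15 Sep 2012 08:00:00 GMT",
<http://archive.example/20140110000000/paper>;rel="memento";datetime="Fri, 10 Jan 2014 00:00:00 GMT"
'''

def mementos(timemap_text):
    """Yield (datetime, uri) for each rel="memento" entry."""
    pattern = re.compile(r'<([^>]+)>;rel="memento";datetime="([^"]+)"')
    for uri, dt in pattern.findall(timemap_text):
        yield datetime.strptime(dt, "%a, %d %b %Y %H:%M:%S GMT"), uri

def pre_post(timemap_text, publication_date):
    """Memento Pre: closest snapshot at or before publication;
    Memento Post: closest snapshot after publication."""
    pre = post = None
    for dt, uri in mementos(timemap_text):
        if dt <= publication_date and (pre is None or dt > pre[0]):
            pre = (dt, uri)
        if dt > publication_date and (post is None or dt < post[0]):
            post = (dt, uri)
    return pre, post

pub = datetime(2012, 6, 1)
pre, post = pre_post(timemap, pub)
print(pre[1])   # the March 2012 snapshot
print(post[1])  # the September 2012 snapshot
```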

The earlier paper:
estimated the existence of representative Mementos for those URI references using an intuitive technique: if a Memento for a referenced URI existed with an archival datetime in a temporal window of 14 days before and after the publication date of the referencing paper, the Memento was regarded as representative.

The new paper takes a more careful approach:
For each URI reference, we poll multiple web archives in search of two Mementos: a Memento Pre that has a snapshot date closest and prior to the publication date of the referencing article, and a Memento Post that has a snapshot date closest and past the publication date. We then assess the similarity between these Pre and Post Mementos using a variety of similarity measures.

[Graph: Incidence of web-at-large URIs]

They worked with three corpora (arXiv, Elsevier and PubMed Central) with a total of about 1.8M articles referencing web-at-large URIs. This graph, whose data I took from Tables 4, 5 and 6 of the earlier paper, shows that the proportion of articles with at least one web-at-large URI was increasing rapidly through 2012. It would be interesting to bring this analysis up to date, and to show not merely the proportion through time of articles with at least one web-at-large URI as in this graph, but also histograms through time of the proportion of citations that were to web-at-large URIs.
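The paper combines several similarity measures; as a hedged illustration only (not the paper's actual measures), a single normalized score such as difflib's ratio shows how a stringent threshold separates stable Pre/Post pairs from drifted ones:

```python
from difflib import SequenceMatcher

def similarity(text_a, text_b):
    """Normalized similarity in [0, 1]; 1.0 means identical text."""
    return SequenceMatcher(None, text_a, text_b).ratio()

# Illustrative Memento contents.
pre_text  = "Dataset v1: 500 records collected in 2011."
post_text = "Dataset v1: 500 records collected in 2011."
drifted   = "Page not found. The requested resource has moved."

# A stringent threshold keeps only near-identical Pre/Post pairs,
# mirroring the paper's requirement of maximum scores on all measures.
THRESHOLD = 1.0
print(similarity(pre_text, post_text) >= THRESHOLD)  # True
print(similarity(pre_text, drifted) >= THRESHOLD)    # False
```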

From those articles 3,983,985 URIs were extracted. 1,059,742 were identified as web-at-large URIs, for 680,136 of which it was possible to identify [Memento Pre, Memento Post] pairs. Eliminating non-text URIs left 648,253. They use four different techniques to estimate similarity. By comparing the results they set an aggregate similarity threshold, then:
We apply our stringent similarity threshold to the collection of 648,253 URI references for which Pre/Post Memento pairs can be compared ... and find 313,591 (48.37%) for which the Pre/Post Memento pairs have the maximum similarity score for all measures; these Mementos are considered representative.

Then they:
use the resulting subset of all URI references for which representative Mementos exist and look up each URI on the live web. Predictably, and as shown by extensive prior link rot research, many URIs no longer exist. But, for those that still do, we use the same measures to assess the similarity between the representative Memento for the URI reference and its counterpart on the live web.

This revealed that over 20% of the URIs had suffered link rot, leaving 246,520. More now had a different content type, or no longer contained text to be compared. 241,091 URIs remained for which:
we select the Memento with an archival date closest to the publication date of the paper in which the URI reference occurs and compare it to that URI’s live web counterpart using each of the normalized similarity measures.

The aggregated result is:
a total of 57,026 (23.65%) URI references that have not been subject to content drift. In other words, the content on the live web has drifted away from the content that was originally referenced for three out of four references (184,065 out of 241,091, which equals 76.35%).

Another way of looking at this result is that only 57,026 of the 313,591 URIs for which matching Pre/Post Memento pairs existed, or 18.18%, could be shown not to have rotted. For 334,662 out of 648,253 references with Pre/Post Memento pairs, or 51.63%, the referenced URI changed significantly between the Pre and Post Mementos, showing that it was probably unstable even as the authors were citing it. The problem gets worse through time:
even for articles published in 2012 only about 25% of referenced resources remain unchanged by August of 2015. This percentage steadily decreases with earlier publication years, although the decline is markedly slower for arXiv for recent publication years. It reaches about 10% for 2003 through 2005, for arXiv, and even below that for both Elsevier and PMC.

Similarity over time at arXiv

Thus, as this arXiv graph shows, they find that, after a few years, it is very unlikely that a reader clicking on a web-at-large link in an article will see what the author intended. They suggest that this problem can be addressed by:
  • Archiving Mementos of cited web-at-large URIs during publication, for example using Web archive nomination services such as
  • The use of "robust links":
    a link can be made more robust by including:
    • The URI of the original resource for which the snapshot was taken;
    • The URI of the snapshot;
    • The datetime of linking, of taking the snapshot.
The robust link proposal describes the model of link decoration that Klein discussed in his talk:
information is conveyed as follows:
  • href for the URI of the original resource for which the snapshot was taken;
  • data-versionurl for the URI of the snapshot;
  • data-versiondate for the datetime of linking, of taking the snapshot.
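Rendered as HTML, a link decorated under this first model might look like the output below. The data-versionurl and data-versiondate attribute names come from the robust links proposal quoted above; the URIs and helper name are invented:

```python
# Sketch: generating a robust link under the first decoration model.
from html import escape

def robust_link(original, snapshot, version_date, text):
    """First robust-links model: href stays the original URI, with the
    snapshot carried along in data-* attributes."""
    return ('<a href="{}" data-versionurl="{}" data-versiondate="{}">{}</a>'
            .format(escape(original, quote=True), escape(snapshot, quote=True),
                    version_date, escape(text)))

print(robust_link(
    "http://example.com/report",
    "https://web.archive.org/web/20150801000000/http://example.com/report",
    "2015-08-01",
    "the cited report"))
```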
But this has a significant problem. The eventual reader will click on the link and be taken to the original URI, which as the paper shows, even if it resolves is very unlikely to be what the author intended. The robust links site also includes JavaScript to implement pop-up menus giving users a choice of Mementos, which they assume a publisher implementing robust links would add to their pages. An example of this is Reminiscing About 15 Years of Interoperability Efforts. Note the paper-clip and down-arrow appended to the normal underlined blue link rendering. Clicking on this provides a choice of Mementos.

The eventual reader, who has not internalized the message of this research, will click on the link. If it returns 404, they might click on the down-arrow and choose an alternate Memento. But far more often they will click on the link and get to a page that they have no way of knowing has drifted. They will assume it hasn't, so will not click on the down-arrow and not get to the page the author intended. The JavaScript has no way to know that the page has drifted, so cannot warn the user that it has.

The robust link proposal also describes a different model of link decoration:
information is conveyed as follows:
  • href for the URI that provides the specific state, i.e. the snapshot or resource version;
  • data-originalurl for the URI of the original resource;
  • data-versiondate for the datetime of the snapshot, of the resource version.
If this model were to be used, the eventual reader would end up at the preserved Memento, which for almost all archives would be framed with information from the archive. This would happen whether or not the original URI had rotted or the content had drifted. The reader would both access, and would know they were accessing, what the author intended. JavaScript would be needed only for the case where the linked-to Memento was unavailable, and other Web archives would need to be queried for the best available Memento.
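The same helper style renders the second model, with the roles of the two URIs swapped: href points at the snapshot and data-originalurl carries the original. Again, the attribute names are from the proposal and the URIs and helper name are invented:

```python
# Sketch: generating a robust link under the second decoration model.
from html import escape

def robust_link_v2(snapshot, original, version_date, text):
    """Second robust-links model: the reader lands on the Memento;
    the original URI rides along in data-originalurl."""
    return ('<a href="{}" data-originalurl="{}" data-versiondate="{}">{}</a>'
            .format(escape(snapshot, quote=True), escape(original, quote=True),
                    version_date, escape(text)))

print(robust_link_v2(
    "https://web.archive.org/web/20150801000000/http://example.com/report",
    "http://example.com/report",
    "2015-08-01",
    "the cited report"))
```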

The robust links specification treats these two models as alternatives, but in practice only the second provides an effective user experience without significant JavaScript support beyond what has been demonstrated. Conservatively, these two papers suggest that between a quarter and a third of all articles will contain at least one web-at-large citation and that after a few years it is very unlikely to resolve to the content the article was citing. Given the very high probability that the URI has suffered content drift, it is better to steer the user to the contemporaneous version if one exists.

With some HTML editing, I made the links to the papers above point to their DOIs at, so they should persist, although DOIs have their own problems. I could not archive the URLs to which the DOIs currently resolve, apparently because PLOS blocks the Internet Archive's crawler. With more editing I decorated the link to Andy Jackson's talk in the way Martin suggested - the BL's blog should be fairly stable, but who knows? I saved the two external graphs to Blogger and linked to them there, as is my habit. Andy's graph was captured by the Internet Archive, so I decorated the link to it with that copy. I nominated the arXiv graph and my graph to the Internet Archive and decorated the links with their copy.

The difficulty of actually implementing these links, and the increased understanding of how unlikely it is that the linked-to content will be unchanged, reinforce the arguments in my post from last year entitled The Evanescent Web:
All the proposals depend on actions being taken either before or during initial publication by either the author or the publisher. There is evidence in the paper itself ... that neither authors nor publishers can get DOIs right. Attempts to get authors to deposit their papers in institutional repositories notoriously fail. The LOCKSS team has met continual frustration in getting publishers to make small changes to their publishing platforms that would make preservation easier, or in some cases even possible. Viable solutions to the problem cannot depend on humans to act correctly. Neither authors nor publishers have anything to gain from preservation of their work.

It is worth noting that discussions with publishers about the related set of changes discussed in Improving e-Journal Ingest (among other things) are on-going. Nevertheless, this proposal is more problematic for them. Journal publishers are firmly opposed to pointing to alternate sources for their content, such as archives, so they would never agree to supply that information in their links to journal articles. Note that very few DOIs resolve to multiple targets. They would therefore probably be reluctant to link to alternate sources for web-at-large content from other for-profit or advertising-supported publishers, even if it were open access. The idea that journal publishers would spend the effort needed to identify whether a web-at-large link in an article pointed to for-profit content seems implausible.

Update: Michael Nelson alerted me to two broken links, which I fixed. The first was my mistake during hand-editing the HTML to insert the Memento links. The second is both interesting and ironic. The link with text "point to their DOIs at" referred to
Persistent URIs Must Be Used To Be Persistent, a paper by Herbert van de Sompel, Martin Klein and Shawn Jones which shows:
a significant number of references to papers linked by their locating URI instead of their identifying URI. For these links, the persistence intended by the DOI persistent identifier infrastructure was not achieved.

I.e., how important it is for links to papers to be links. I looked up the paper using Google Scholar and copied the link from the landing page at the ACM's Digital Library, which was. The link I copied was, which is broken!

LibUX: User Testing with Lookback

planet code4lib - Tue, 2016-12-20 17:16

Kristen Droesch and Iris Bierlein, from the NYU Libraries UX department, wrote a lot about using Lookback for observed testing – especially in lieu of something like GoToMeeting.

Lookback looks pretty boss. The main obstacle I think for libraries would be working it into a probably slim — let alone non-existent — user testing budget. Its low price is $29/month. I would prioritize observed user testing higher than not, but if it’s coming out of the same budget as a browser-testing tool like Browserstack, or an application like Sketch, then suddenly the organizational pocket might be feeling a little tight.

This is what we’re talking about.

Sounds like it hits the sweet spot.

They add that

Even compared to tools like Silverback and Morae, which have been specifically designed for user testing, Lookback is the clear winner. Other existing tools have a host of limitations. Let’s put aside for a moment the fact that the Silverback update is nowhere on the horizon and focus on the usability issues of both it and Morae: sharing videos is a less-than-simple process, Morae is limited to Windows, the general user experience with both is clunky from a moderator’s perspective, etc. The list goes on and on.

So in our quest to find something that was designed with user testing in mind, and was less clunky/more intuitive, we found our way to Lookback. Compared to GoToMeeting, with its complicated setup, Lookback is a dream; installing it on multiple devices is a simple, clean process, and the cloud-based nature of this tool makes it especially useful for collaboration.

Kudos to Carli Spina for sharing this post in our slack.

Open Knowledge Foundation: Divide, rant and conquer: Addressing the difficulty of 2016 and the future of open government at #OGP16

planet code4lib - Tue, 2016-12-20 10:51

Mor Rubinstein reports on one of the Civil Society Morning workshop sessions during the Open Government Partnership Summit. The structure of the session involved ‘ranting’ in turns with fellow attendees. As 2016 draws to a close and a new year begins, the session serves as a useful reminder of the cathartic and productive processes of ranting and listening as necessary steps toward progress.

About a month ago, I got an exciting email from the Open Government Partnership support unit in which I was invited to host a workshop during the civil society morning of the OGP Summit in Paris with Zuzana Wienk about the future of the open government movement.

Session in action!

To be honest, 2016 was a very challenging year for open government, and in many ways, this movement often feels just unrealistic. Maybe citizens don’t really care about the facts anymore, but about their emotions. And those emotions are usually a combination of fear and hate of the other, the unknown and change. I was really upset and started to rant about it. A LOT. So when I got the opportunity to actually host a session, I thought – What if I could actually take other people’s rants into a productive space? How can we get all of the negative out, look it in the eye without being afraid of it, and then move from there to somewhere better?

What if I could actually take other people’s rants into a productive space? How can we get all of the negativity out, look it in the eye without being afraid of it, and then move from there to somewhere better?

Thanks to Google, I found the method of “Rant for a productive meeting”. Briefly, the principles are simple – divide the room into pairs and allow the pairs to meet with one another for a minute. After that, let the pairs rant in turns: for three minutes one person speaks and rants while the other listens and prompts for reactions. In the next three minutes, the partners swap roles. After this rant period, a new question is raised to the room: “Now what?” Allow the participants to write (on a post-it, of course) their thoughts and ideas on how to move forward. As the last step – share!

So we had around 16 participants from different regions in the session. You can find their thoughts just below. If you think any of these ideas is worth pursuing or discussing, just start a discussion about it on our forum here.

Here they are in raw – ideas from our session

Here are the Open Government Future ideas that came out from the rant session (to make it an easier read, we divided them into themes):

Civil society
  • Linking and leveraging with other initiatives to achieve greater results.
  • Now, what? Civic monitoring not only related to National Actions Plans, sustained with a small percentage of funding.
  • We need to call out Open Washing
  • Be proactive to share data CSOs
  • More educational programs – Civic educations
  • Better coordination of national society.
  • How to sustain in the long term a municipal civil society monitoring ecosystem?
    • Solution: Every local context has project funded with public money.
    • Every (almost) project funded have correction issues.
    • Use a little part of the budget to the project to fund civic monitoring actions.
  • Empowering individuals legislatures / elected officials.
  • Dependencies of politicians & businesses
  • Government to engage the youth to support government openness.
  • Share knowledge within government institutions to avoid duplication – reinventing the wheel.
  • Engagement of the EU institutions
Relationships between government and citizens
  • Rebirth of the Socratic dialogue
  • Democratic participation digital tools (e.g., Parliament hackathons)
  • Opinions != Facts
  • We need fact checking
Process of OGP
  • For OGP, the first five years have been about quantity, next five years should be about quality.
  • Now, what? Locally based processes. Cities involved in subnational OGP is not enough.
  • OGP needs to connect more deliberately with international processes (SDG, FFD, etc.)
  • Use innovative ways to share data
  • Clarify engagement and action opportunities for civil society with OGP and for opposition parties.
  • OGP needs to “Speak” to the citizens, adapting its communications tools and vocabulary / bridging the civil society and citizens gap.
  • A strategic planning event with CSO steering committee participation in a near future OGP, what’s next?
  • Learn from other’s experience and build on lessons learnt.
  • We need to make OGP sexier! (Link with other related agendas, better comms, better citizen language).
  • Clear and coordinate M&E framework to track changes over time
  • Integrating OGP into the national development agenda.
  • OGP agenda should transcend political transitions or change of government or agenda of a country e.g. – Brexit, Trump.

I hope that out of these ideas, we can get a better and vibrant open government community in 2017!

Credit: Open Government Partnership/Photograph by Evan Abramson

Coral Sheldon-Hess: Switching tracks

planet code4lib - Mon, 2016-12-19 21:13

It has recently come to my attention that I do not—should not—actually have to spend any more time being a web developer, if my ultimate goal is to go into data science. More experience with frameworks and more practice coding for cross-browser compatibility could both be fine things, but they wouldn’t really get me closer to my goal. If I want to be a data scientist I can just … switch, and do that.

“Just” is a misleadingly tiny word. I have relevant experience and skills, but if I want to switch fields, I’ll have to build up more of both.

When a person is in the opposite boat—wanting to get into web development—the temptation (assuming they have the cash at hand) is to go to a coding bootcamp. Good bootcamps make sure their graduates have placements—as apprentices or fellows or junior developers, depending upon the program—and are therefore a pretty decent investment. There’s even one in Pittsburgh that does not cost money to attend, and it’s graduating solid web developers.

As far as I know, though, there’s nothing like that for data science. Even if there is, it’s not in Pittsburgh, and I don’t have thousands of dollars lying around, so I would not be tempted. My library school alma mater has a data science program, but “get another masters degree” is not on my todo list, unless it’s followed immediately by a PhD—which, honestly, isn’t off the table. I’ve danced around the idea of a PhD for years, and the University of Pittsburgh’s announcement that they are combining their School of Information with their Department of Computer Science makes the whole prospect a lot more interesting to me, all of a sudden—provided I can pay for the whole thing with research and teaching fellowships, anyway. But I digress…

To be ready for employment or a PhD program or anything else really interesting in data science, I have some work to do. Luckily(?), I also have a full semester with no teaching responsibilities and no big contracts, starting in January. (I’ll still take small contracts, of course.) So my plan is to teach (or re-teach) myself the things I don’t know (or don’t remember) about data-focused development, statistics, databases, machine learning, and the theoretical underpinnings of computer science, over the course of the spring semester, 2017. I think I’m starting from a strong enough position that I’ll be a solid asset to a data science team after a single semester of (very) hard work. If contracts pile up, or some aspect of what I want to learn takes longer than I expect, and I go into the summer, so be it.

All of this is to say: I plan to blog about what I’m doing, in part because any practitioners who see my posts may see gaps in what I’m covering, in part for accountability, and in part to help lay a roadmap for other people who want to do what I’m trying to do. (Some of that roadmap will be “avoid this stretch of road,” I can already tell.)

A note for beginning programmers:

I already know how to program, and Python is generally my language of choice, so I’m starting at an intermediate level, with that. If you don’t already know how to program, I recommend Python for Everybody (at least course 1 and course 2). I haven’t taken it, but the folks I know who have, liked it. When I was first learning Python, I took Rice’s Coursera course, before it was split into two; it was pretty good, but I dropped it once it became more about geometry than about Python. You might consider trying both and sticking with whichever works best for you.

The O’Reilly book Learning Python is incredibly detailed and … honestly, a bit much for a beginner. Python wasn’t my first language, and I still kept hitting walls with that book—not so much because I couldn’t follow it (though that would be a concern, if it were one’s very first programming book!), but because it starts out with far too many details and too little of practical use. I’ll make another run at it, sometime, but my “Python books for newbies” advice is sadly limited, sorry. Maybe someone will comment with a good suggestion.

Although I don’t like it, by itself, for new language acquisition, I do like to work through Code Academy as a refresher and to build up muscle memory for syntax. They have a course in Python that is likely also worth a look, as a supplement to some other resource.

My background (just for context, feel free to skip):

I earned a minor in computer science while I was studying to be an electrical engineer. I took data structures, but not algorithms. I learned assembly and built a processor, but never took operating systems. I used to know about linked lists (OK, those I still pretty much get) and binary trees and QuickSort, but the knowledge has faded with disuse. People I graduated with, who took exactly the same classes I did (and, cough, didn’t do as well in them), came right out of school and started work as programmers, but I didn’t come out with the confidence to do that.

I did do a significant amount of scientific programming, though. A large part of my ECE graduate work was done in MATLAB, analyzing and making visualizations with data pulled out of (essentially) csv files and stored in cells (matrices of matrices). Some of that code is still out there, though 1) I won’t help you find it (I did not write for readability or reuse), and 2) the data is long gone. Fun fact: a lot of my interest in institutional data management is very, very personal.

More recently, I have written Python scripts to pull XML metadata out of institutional and subject repositories and standardize it into JSON. I really enjoyed that work, more than I’ve enjoyed any web development, prior or since. That code is easy to find, though as I understand it, the system it was part of has since been redesigned, so that nothing I wrote is directly in production. (I mean, it likely influenced what is in production. But I’m not sure that any one line of code I wrote is still in the system anywhere.)

As an engineer, of course I’ve taken probability and statistics. As an electrical engineer focusing on wireless propagation and networking, I have more statistics background than you can shake a stick at. But I haven’t used any of it in roughly a decade, which means I have retained very little of it. We engineers also treated statistics somewhat differently than it seems like social scientists (just for instance) do, so while I’ll have some things to relearn, I’ll also probably have to learn some of it for the first time. I’m oddly excited about taking a math class again!

As far as databases go, I know … more than nothing about relational databases, as a thing, and I have administered more than a few MySQL databases, including writing simple queries. But “databases,” as an area of knowledge, has always felt like something I’ve been weak on. I’m looking forward to building skills in that area, too.

The current plan (subject to change):

I plan to work through all of the courses in Coursera’s/the University of Michigan’s Applied Data Science With Python specialization. (Because I don’t know how often they run it, I’m actually taking the first course in the specialization right now.) This is … very intermediate. I’ve found it challenging, in part because they fly through the explanations, without a lot of examples, and (I think) in part because there is no official accompanying book. I’ve managed to finish the first programming assignment out of three, with 100%, but it was a real challenge.

My advice to someone going down this path is not to skip UM’s courses, but rather to do some preparation beforehand. I definitely regret not having worked through Data Science from Scratch first—something I’ll remedy as early in January as I can. The first course in the specialization also recommends, as supplementary texts, Python for Data Analysis (which I can use through my employer’s Safari Books Online, assuming they don’t revoke my credentials during any semester when I’m not teaching) and Learning the Pandas Library (which I went ahead and bought, because $9.99 is under my pain threshold for a technical book). Reading through probably either of those would make the course significantly easier.

I assume the second course will start in January, though it does not yet have a start date listed.

In January I’ll also start Basic Statistics by Coursera/the University of Amsterdam. It’s an eight week class, which is long for a MOOC, but it covers a lot of ground. Hopefully after that course I’ll have a better idea of what I need to do next, to build my statistical skills. Unfortunately, the course textbook is well above my pain threshold for a book. If you run across a used copy (with the CD, ideally) for a reasonable price, or if you have it and would let me borrow it, please let me know?

As far as rebuilding my theoretical CS skills—and adding ones I never got to—I’m going to take a little bit of a shortcut. Then again, it’s a 500 page book, so maybe “shortcut” isn’t the right term. I plan to work through The Impostor’s Handbook, which is supposed to cover those things that a CS degree covers, outside of practical programming skills. (Hat tip to Andromeda Yelton for pointing it out to me!)

I don’t have a solid plan for learning about databases, yet; these other skills are a higher priority. I’m open to suggestions, though.

I also don’t have a solid plan for learning about machine learning, but I just found this and am thinking of signing up right now, because it looks amazing. (I’m supposed to take a week off between Christmas and New Years. I sort of promised my spouse. But this course looks so good!)

Oh, and, somewhere in there, I plan to learn D3.js. Unless there’s something better that I should be learning, to create interactive visualizations on the web?

Anyway, I’m really excited about getting to spend a semester learning all of this interesting material—as evidenced by my taking one or two of the classes immediately, with no break, after a kind of hellish semester. (I promised; there will be a break.)

I’m also interested in ideas about what topics I’m not covering, but should be, or alternate approaches to covering any of the topics I am covering. It’s very Coursera- and O’Reilly-heavy, right now, I know. (And, while I’m doing as much of it for free as I can, that is not a hard rule. If a resource is really good, I don’t mind if it costs money, within reason.)


Image at the top of the post is courtesy of Thomas Tydal, on Quora

FOSS4Lib Upcoming Events: Hydra Connect 2017

planet code4lib - Mon, 2016-12-19 21:04
Date: Monday, September 11, 2017 - 08:00 to Thursday, September 14, 2017 - 17:00
Supports: Hydra

Last updated December 19, 2016. Created by Peter Murray on December 19, 2016.
Log in to edit this page.

Hydra Connect 2017 - DuraSpace Wiki

FOSS4Lib Upcoming Events: SobekCM Code Camp Florida

planet code4lib - Mon, 2016-12-19 21:01
Date: Monday, April 24, 2017 - 08:00 to Wednesday, April 26, 2017 - 17:00
Supports: SobekCM Digital Repository Software

Last updated December 19, 2016. Created by Peter Murray on December 19, 2016.
Log in to edit this page.

SobekCM Code Camp Florida 2017

Peter Murray: RA21: A new effort to ease the user experience for off-campus access to licensed content

planet code4lib - Mon, 2016-12-19 19:35

Earlier this year the STM Association — a trade association for academic and professional publishers — started a project called RA21: Resource Access in the 21st Century. The project is a renewed approach to moving past network address recognition and proxy agents as a way of authenticating access to licensed content. I describe the RA21 effort in general on the Index Data blog and list some of the potential impacts on the FOLIO project.

I attended an RA21 Project meeting last week and was struck by something that one of the presenters (I think it was Chris Shillum from Elsevier) said about the longevity of network address recognition. Paraphrasing, the utter simplicity of authenticating on-campus users with IP address recognition has slowed the development of solutions that enable off-campus access. I mean, it doesn’t get any easier than the black-and-white decision on the server end to determine if the request is coming from a recognized address. If it is, you’re in! If not, you’re presented with the option to pay money to view the article. What if we added a little bit of process for on-campus users to benefit access by off-campus users? Say we asked on-campus users to authenticate themselves with a campus user id and password before getting access. And say we made it easier by automatically sending users to their campus single sign-on system when we could. And say that when we couldn’t, we smoothed the user experience of getting to their campus single sign-on system through consistent use of graphic design, wording, and page layout by the content providers. That is what is envisioned for the RA21 project.
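The server-side simplicity being pointed at really is a single membership test. A sketch — the address ranges below are the RFC 5737 documentation blocks, not any real campus:

```python
# Sketch: the whole of IP-based authorization for licensed content.
from ipaddress import ip_address, ip_network

# Ranges a publisher would have on file for one licensee (made up).
LICENSED_RANGES = [ip_network("192.0.2.0/24"), ip_network("198.51.100.0/24")]

def on_campus(remote_addr):
    """Is the requesting address inside any registered range?"""
    addr = ip_address(remote_addr)
    return any(addr in net for net in LICENSED_RANGES)

print(on_campus("192.0.2.17"))   # True  -> serve the article
print(on_campus("203.0.113.9"))  # False -> paywall or sign-on redirect
```

Everything RA21 proposes — sign-on discovery, consistent redirect pages — is overhead compared to this one line of logic, which is why the simple scheme has been so hard to displace.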

Jonathan Rochkind: Segmenting “Catalog” and “Articles” in EDS API

planet code4lib - Mon, 2016-12-19 19:23

About 4 years ago, I posted a long position paper arguing that a “bento-style” search  was appropriate for the institution I then worked at. (I’ve taken to calling it a “search dashboard” approach too since then.)   The position paper stated that this recommendation was being made in the face of actually existing technical/product constraints at the time; as well as with the (very limited) research/evidence we had into relevant user behavior and preferences. (And also because for that institution at the time, a bento-style search could be implemented without any additional 6-figure software licenses, which some of the alternatives entailed).

I never would have expected that 4 years later the technical constraint environment would be largely unchanged, and we would not have (so far as I’m aware) any significant additional user research (If anyone knows about any write-ups, please point us to them). But here we are. And “bento style” search has kind of taken over the landscape.

Putting reasons for that and evaluations of whether it’s currently the best decision aside, for a client project I have been implementing a “bento style” search dashboard with several of the components targeting the EDS API.  (The implementation is of course using the bento_search gem, expect a new release in the near future with many enhancements to the EDS adapter).

The client wanted to separate “Catalog” and “Article” results into separate “bento” boxes — clicking the “see all results” link should take the user to the EDS standard interface, still viewing results limited to “Catalog” or “Articles”. It was not immediately clear how best to accomplish that in EDS.  The distinction could be based on the actual source of the indexed records (indexed from the local ILS vs. the EDS central index), or on format (‘article’ vs. ‘monograph and things like that’, regardless of indexing source).  I was open to either solution in exploring possibilities.

I sent a query to the Code4Lib listserv for people doing this with EDS and discovered: this is indeed a very popular thing to do with EDS; people are doing it in a whole variety of hacky ways.  The conclusion is I think probably the best way might be creating a custom EDS “limiter” corresponding to a “(ZT articles)” query, but I’m not sure if anyone is actually doing that, and I haven’t tried it yet myself.

Possibilities identified in people’s off-list responses to me:

  • Some people actually just use full un-limited EDS results for “Articles”, even though it’s labelled “Articles”! Obviously not a great solution.

  • Some people set up different EDS ‘profiles’, one which just includes the Catalog source/database, and one which includes all source/databases except ‘Catalog’.  This works, but I think doesn’t give the user a great UI for switching back and forth once they are in the EDS standard interface, or choosing to search over everything once they are there — although my client ultimately decided this was good enough, or possibly even preferred to keep ‘catalog’ and ‘articles’ entirely separate in the UI.
  • One person was automatically adding "AND (ZT article)" to the end of user-entered queries, which actually gives great results. Interestingly, it even returns some results marked with the "Book" format type in EDS — because they are book chapters, which actually seems just right. On the API end, it's fine to invisibly append "AND (ZT article)" to the query. But once we direct to 'view all results', redirecting to a query with "AND (ZT article)" at the end looks sloppy, and doesn't give the user a good UI for choosing to switch between articles, catalog, and everything once they are in the EDS standard interface.

  • Some people are using the EDS "source type" facets, limiting to certain specified 'article-like' values. That doesn't seem as good as the "(ZT article)" hack, because it won't include things like book chapters that are rightly included in "(ZT article)", but it may be good enough, or the best available option. However, while I believe I can apply that limit fine from the API, I haven't figured out any way to 'deep link' into EDS results with a pre-selected query that has pre-selected "source type" facet limits. I'm not sure if there are any parameters I can add to the `?direct=true&bquery=your-query-here` "deep link" URL to pre-select source type facets.
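To make the "(ZT article)" hack concrete, here is a minimal stdlib-only sketch of building both the API-side query and the corresponding "see all results" deep link. The hostname, path, and profile parameter are placeholders I've made up for illustration; only the `?direct=true&bquery=` pattern comes from the deep-link URLs discussed above.

```ruby
require "cgi"

# Invisibly append the articles-only hack to a user-entered query,
# for use when searching via the EDS API.
def articles_only_query(user_query)
  "#{user_query} AND (ZT article)"
end

# Build an EDS "deep link" URL for the same query, using the
# ?direct=true&bquery= pattern. Host, path, and profile are placeholders.
def eds_deep_link(user_query, host: "search.ebscohost.com", profile: "edsexample")
  bquery = CGI.escape(articles_only_query(user_query))
  "https://#{host}/login.aspx?direct=true&profile=#{profile}&bquery=#{bquery}"
end
```

As noted above, the user-facing downside remains: the appended "AND (ZT article)" is visible in the standard interface's search box after the redirect.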

Through some communication with an internal EDS developer contact, I learned it ought to be possible to create a custom "limiter" in EDS corresponding to the "AND (ZT articles)" hack. I'm not sure if anyone is actually doing this, but it sounds good to me for making an "Articles Only" limiter which can be used both in standard EDS and via the API. The instructions I was given were:

Good news, we can do your best option here. We've got a feature called "Custom Limiters" that should do the trick. Take a look at how this search "pre-selects" the custom limiter and removes the syntax from the search query. In order to accomplish this, the library needs to add a custom limiter for the specific search syntax you'd like to use. In this case, this needs to be pasted into the bottom branding of their EDS profile: <script type="text/javascript" src=""></script> This script catches any use of "AND PZ Article" and instead simulates a limiter based on that search syntax.

I haven’t actually tried this myself yet, but it sounds like it should work (modulo the typo "PZ Article" for "ZT Article", which I think is the right syntax to use in EDS). It's hard to be sure of anything until you try it out extensively with the EDS API, but it sounds good to me.


Islandora: Islandoracon CFP Extended to January 13th

planet code4lib - Mon, 2016-12-19 19:08

Islandoracon is happening May 15 - 19 in Hamilton, Ontario, Canada. Join the Islandoracon community for a week of sessions, workshops, and collaboration. Our Call for Proposals for sessions, posters, and post-conference events is being extended to January 13th, 2017.

This year’s conference theme is Beyond the Island. Since its creation at the University of Prince Edward Island in 2006, Islandora has spread around the world. It has grown to include diverse institutions, collections, and strategies for digital repository management that add to the richness of the Islandora community. The 2017 Islandoracon will celebrate these multifaceted visions of Islandora that are continually emerging, inspiring constant revision in the concept of a digital repository.

If your session is selected, you will be entitled to register for the conference at a reduced Speaker's rate.


District Dispatch: The way forward – better together

planet code4lib - Mon, 2016-12-19 16:33

Guest post by the Chair of the Committee on Legislation, Dr. Ann Dutton Ewbank.

ALA’s Washington Office has, for the past 70+ years, been the voice of libraries in the nation’s capital. That voice — your voice — has had a significant positive impact on laws, policies and attitudes on matters about which libraries and librarians care deeply, because they profoundly affect what you do for your patrons, your communities and ultimately the nation.

Over the years, library values, grassroots advocacy and direct lobbying have positively influenced scores of bills, agency rules and even presidential orders that have defined our intellectual freedom, privacy, and abilities to use and provide our patrons copyrighted material, government information, access to the internet and more.

It’s clear that the recent elections have raised important questions, often passionately expressed, about how ALA can and should pursue its members’ and the Association’s policy goals, and defend its core principles, in Washington in the coming Administration. The Committee on Legislation and the Washington Office recently shared some thoughts about that with ALA’s Council and leadership in the statements reprinted in full below. Both the Committee and Washington Office are committed to continuing this important introspection and discussion and hope that you’ll share word of two near-term opportunities to make your views heard.

First, please take time to share your thoughts about the Committee’s and Office’s statements by commenting on them here on District Dispatch. As Chair of the Committee on Legislation (COL), I’m committed to promptly and personally replying online to foster a real dialogue about these critical matters.

Second, the Committee on Legislation will devote the second half of its joint public meeting with the ALA Legislation Assembly (LA) at the upcoming mid-winter meeting in Atlanta entirely to an open 90-minute discussion of these issues. Visitors will be welcome to ask questions and share their views with the Committee and Assembly.

That joint COL/LA meeting will take place on Friday, January 20th at 1:00 pm in GWCC Room B 214. Suggestions for other means of continuing to share thoughts and ideas after the Midwinter Meeting for those unable to attend also are very welcome.

ALA has risen to – and met – many public policy challenges throughout its history. If there’s a common lesson to be drawn from those experiences it’s that we’re better together. COL and the Washington Office feel strongly that that’s the best way forward in these challenging times. Please join us as often and in any way that you can.

Statement from the Committee on Legislation (12/12/2016):

In light of the recent discussions among the membership, Council and the Executive Board, the Committee on Legislation (COL) has been asked by the Executive Committee to provide a statement to Council regarding our work.

COL is a committee of ALA Council. COL believes that ALA’s core values (including but not limited to equity, diversity, inclusion, intellectual freedom, and privacy) are best supported by robust federal funding for libraries of all types, policy legislation, and Executive Branch procedures that are in alignment with these core values.  Our charge is as follows:

“To have full responsibility for the association’s total legislative program on all levels: federal, state, and local. To recommend legislative policy and programs for council approval and to take the necessary steps for implementation. To protest any legislative or executive policy adversely affecting libraries. To seek rulings and interpretations of laws and regulations affecting the welfare and development of libraries. To represent the ALA before the executive and legislative branches of government as required at all levels. To provide a forum within ALA to gather information about needed legislation and to keep all units of the association informed of the ALA legislative programs. To direct the activities of all units of the association in matters relating to legislation.”

Some confusion may exist surrounding the roles of the Office of Government Relations (OGR) and the Washington Office in regard to recent statements issued following the Presidential election. OGR is one of two offices* in the Washington Office, and reports to the Committee on Legislation. OGR is the lobbying arm of the Washington Office and actively lobbies for ALA core values, issues, and policies approved by the ALA Council after being vetted by COL.

COL is in constant, year-round contact with OGR, including an annual retreat, nine hours of meetings at both Annual and Midwinter, monthly conference calls, and regular email communication. Members of COL are thoughtful and deliberate as they vet each policy position and make recommendations to ALA Council and provide direction to OGR.

To this end, OGR, under the direction of ALA policy and COL, works with elected officials on both sides of the aisle, in order to secure positive federal legislation for libraries of all types and to advocate for federal policies that are consistent with ALA’s core values. This includes continuing to lobby for E-rate, LSTA, the Library of Congress, IMLS, ESSA, balanced copyright, privacy protections, and a host of other federal programs and policies that directly impact libraries. Continued and increased support for these programs is vital to libraries’ ability to enact the core values that are expressed by ALA.

Be assured that COL will continue to keep ALA’s core values at the forefront as we make recommendations to ALA Council and provide direction to OGR. We invite all ALA members to join us at our open meetings at any Midwinter and Annual conference. We invite dialogue and questions from ALA members. We thank members who have expressed their thoughts and opinions during this time and ask that members continue to communicate with COL.

We also recommend that library supporters participate in National Library Legislative Day, subscribe to District Dispatch, and respond to Calls for Action around legislation and policy.  Our library patrons depend on the services we deliver and we need to make sure those expanding services can continue with positive legislation and policy decisions.  And that only comes from very active advocacy on our part. Together we can make a positive impact for libraries of all types.

If you are interested in learning more about the work of COL, please see the following article:

Ewbank, A., Masland, J.T., & Zabriskie, C. (2016). Library issues at the federal level: An introduction to ALA’s Washington Office and the Committee on Legislation. Political Librarian, 2(1), 11-16.


Ann Dutton Ewbank
COL Chair, 2015-2017

Statement from the ALA Washington Office (12/14/2016):

Dear Councilors:

I wanted to follow up on the message from the Committee on Legislation by providing some background on how we have worked with new Administrations and Congresses in the past, and how this work is always informed by our core values as an Association.

After every federal election, both the Offices of Government Relations and Information Technology Policy in ALA’s Washington Office actively engage in educating new political actors on library issues. Specifically, we reach out to every newly elected Member of Congress in both the House and Senate, and to every new President’s Administration, to share our white papers, one-pagers and brochures on all the various issues that affect library professionals’ ability to deliver the best services to their patrons. Both OGR and OITP will continue to work closely with COL and the OITP Advisory Committee to promote the best in library services.

This practice gives us a valuable early chance to fully explain how a broad range of laws and policies can potentially help or hurt our ability to deliver needed services across the nation.  These issues include the obvious, like LSTA funding and privacy, and more complicated matters such as copyright, surveillance and even aspects of immigration policy. Eight years ago, for example, we worked with both the incoming Congress and Obama Administration on their efforts to implement a Stimulus Program, which led to libraries benefiting directly from the “BTOP Program,” which underwrote library broadband services across the country.  Similarly, in the immediately prior Bush Administration, ALA’s direct engagement with the White House and Congress resulted in First Lady Laura Bush’s championing, and Congress’ creation of a dedicated title supporting  Librarian Education in LSTA.

As it has for over 70 years, the ALA Washington Office will continue to work with all federal decision-makers to maximize libraries’ and librarians’ ability to make the most and best library services possible and accessible to all of our patrons, and to keep libraries at the core of their communities.  These actions are always aligned with our core values of access, confidentiality/privacy, diversity, education and lifelong learning, intellectual freedom,  social responsibility,  professionalism, and service, and in concert with ALA’s  policies in these areas. Much of our work  with federal programs, including LSTA and the E-Rate, directly supports the ability of thousands of libraries to provide equity of access and inclusion for all.

Emily Sheketoff



