
Ed Summers: Assemblages of Appraisal

planet code4lib - Wed, 2017-07-12 04:00

I had the opportunity to put together a poster for AERI this year. The poster presents a paper that I recently gave at CSCW (Summers & Punzalan, 2017). Creating it was a surprisingly useful process of distilling the paper to its essentials while re-presenting it visually. It occurred to me that the poster session audience and the typical web audience have something in common: limited attention. So I reworked the poster content here as a blog post to try to make my research findings a bit more accessible.

Even after over 20 years of active web archiving we know surprisingly little about how archivists appraise and select web content for preservation. Since we can’t keep it all, how we decide what to keep from the web is certain to shape the historical record (Cook, 2011). In this context, we ask the following research questions:

  1. How are archivists deciding what to collect from the web?

  2. How do technologies for web archiving figure in their appraisal decisions?

  3. Are there opportunities to design more useful systems for the appraisal of content for web archives?


To answer these questions I conducted a series of ethnographic interviews with 29 individuals involved in the selection of web content. Participants included web archivists as well as researchers, managers, local government employees, volunteers, social activists, and entrepreneurs. The field notes from these interviews were analyzed using inductive thematic analysis.

Analysis began with reading all the field notes together, followed by line-by-line coding. While coding was done without reference to an explicit theoretical framework, it was guided by an interest in understanding archival appraisal as a sociotechnical and algorithmic system (Botticelli, 2000; Kitchin, 2016).


Coding and analysis surfaced six interconnected and interdependent themes that fell into two categories, the social and the technical, which are illustrated here in green and yellow respectively.

Appraisal in the context of web archiving is a complex interplay between the following:

Crawl Modalities: The selection strategies designed into tools and chosen by archivists in their work: domains, websites, documents, topics, and events.

Information Structures: Specific formations of web content that archivists interacted with during appraisal: hierarchies, networks, streams, and lists.

Tools: Configurations of tools that were used: archiving services, storage, spreadsheets, email, social media, content management systems.

People: Field archivists, managers, technicians, journalists, volunteers, software developers, groups (activists, professional), and institutions.

Time: How long to collect, how often to collect, how quickly web content needed to be gathered, perceptions of change in content.

Money: Grants from foundations and agencies to support collection activities, staffing, subscription fees, relationship between money and storage.


The findings highlighted sites of breakdown that are illustrated by the red lines in the thematic diagram. These breakdowns are examples of infrastructural inversion (Bowker & Star, 2000), or sites where the infrastructure of web archiving became legible.

Breakdowns between People and Tools were seen in the use of external applications such as email, spreadsheets and forms to provide missing communication features for documenting provenance and appraisal decisions.

Breakdowns between Crawl Modalities, Information Structures and Tools were also evident when archivists improvised communication tools to coordinate selection decisions when geopolitical boundaries complicated collection policies.

Breakdowns in Money, Crawl Modalities and Information Structures occurred when archivists could not determine how much it would cost to archive a website, and attempted to estimate the size of websites.

Appraisal decisions depend on visualizations of the material archive.

While our chosen research methodology and findings do not suggest specific implications for design (Dourish & Bell, 2011), they do highlight rich sites for repair work as well as improvisational and participatory design (Jackson, 2014).


Thank you to Ricky Punzalan for much guidance during the planning and execution of the study. Leah Findlater and Jessica Vitak also helped in the selection of research methods. Nicholas Taylor, Jess Ogden and Samantha Abrams provided lots of useful feedback on early drafts, as well as pointers into the literature that were extremely helpful.

I also want to thank the Maryland Institute for Technology in the Humanities and the Documenting the Now project (funded by the Mellon Foundation) who provided generous support for this research. My most heartfelt thanks are reserved for the members of the web archiving community who shared their time, expertise and wisdom with me.

Noun Project images by Nirbhay, il Capitano, Creative Stall, Setyo Ari Wibowo, Agni, and Shuaib Usman Yusuf.


Botticelli, P. (2000). Records appraisal in network organizations. Archivaria, 49, 161–191.

Bowker, G. C., & Star, S. L. (2000). Sorting things out: Classification and its consequences. MIT Press.

Cook, T. (2011). We are what we keep; we keep what we are: Archival appraisal past, present and future. Journal of the Society of Archivists, 32(2), 173–189.

Dourish, P., & Bell, G. (2011). Divining a digital future: Mess and mythology in ubiquitous computing. MIT Press.

Jackson, S. J. (2014). Rethinking repair. In T. Gillespie, P. J. Boczkowski, & K. A. Foot (Eds.), Media technologies: Essays on communication, materiality and society (pp. 221–239). MIT Press.

Kitchin, R. (2016). Thinking critically about and researching algorithms. Information, Communication & Society, 20(1), 1–16.

Summers, E., & Punzalan, R. (2017). Bots, seeds and people: Web archives as infrastructure. In Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing (pp. 821–834). New York, NY, USA: ACM.

District Dispatch: Ready to Code Faculty get ready to go at ALA Annual!

planet code4lib - Tue, 2017-07-11 20:31

During ALA’s recent Annual Conference in Chicago in June, the Libraries Ready to Code (RtC) Faculty Fellows and the Phase II project team met in person for the first time, for a full day-and-a-half workshop. The purpose: to get deep into defining computational thinking in a way that resonates with the library community, parsing out RtC concepts and deconstructing faculty syllabi with these things in mind.

Ready to Code Phase II Faculty Fellows and project team met at Google Chicago during ALA’s Annual Conference in June. Photo credit: Emily Wagner

ALA’s Office for Information Technology Policy (OITP) launched Phase II of the RtC project along with our partner, Google, Inc., in January 2017. Phase II focuses on Library and Information Science graduate programs and consists of a cohort of six RtC Faculty Fellows, each of whom will redesign one of their current technology/media courses based on RtC concepts (i.e., increasing access and exposure to CS, changing perceptions of who does CS, and connecting CS to youth interests, or CS+X). Faculty will pilot the redesigned courses at their institutions this fall.

RtC Faculty Fellows are: Dr. Colette Drouillard, Valdosta State University (GA); Dr. Melissa Johnston, University of West Georgia; Dr. Rachel Magee, University of Illinois at Urbana-Champaign; Dr. Jennifer Moore, Texas Woman’s University; Dr. Joe Sanchez, City University of New York; and Dr. Natalie Taylor, University of South Florida. Phase II RtC project team members are: Marijke Visser (OITP), Linda Braun (Librarians & Educators Online), Mega Subramaniam (University of Maryland) and Caitlin Martin (Stanford University).

Ready to Code Phase II Faculty Fellows brainstormed content for LIS course syllabi with sticky notes and marker scribbles. Photo credit: Caitlin Martin

RtC Faculty Fellow Rachel Magee describes the workshop this way:

Attending the Ready to Code Workshop at this year’s ALA Annual Meeting was a unique opportunity to collaborate with other faculty who teach “pre-service librarians,” or students currently completing library school. Our group included professors from all over the country, and we were able to work together to develop our understandings of computational thinking and brainstorm ways to incorporate it into classes for students specifically focused on youth services.

Our classes range from in-person courses to online classes that meet both asynchronously and synchronously. We’re all focused on youth services broadly, but each class has its own flavor. My course is built on service learning and requires students to volunteer in an organization that serves or supports youth. Bringing Ready to Code concepts into this course will include in-depth discussions of ways these organizations engage with computational thinking, and give students the opportunity to plan these kinds of programs themselves.

At the end of the workshop, Fellows left not only with a framework and timeline for redesigning their syllabi, but also with a strong commitment to the importance of their work as RtC Fellows. Throughout the summer, the Fellows will continue to connect with their cohort colleagues as they fine-tune the approaches they will take in embedding RtC into their courses.

While the end of the fall semester seems like a ways off, Phase II will culminate in graduate level course models that equip MLIS students to deliver coding programs through public and school libraries that foster computational thinking skills among the nation’s youth. Well-trained MLIS graduates will enable libraries around the country to broaden and diversify access to computer science education. Faculty Fellows will share revised syllabi and course models with colleagues across the LIS community and serve as ambassadors to encourage other faculty to embed RtC concepts in curriculum.

Additional information is available on the Libraries Ready to Code website.


The post Ready to Code Faculty get ready to go at ALA Annual! appeared first on District Dispatch.

District Dispatch: Protecting public access to earth science information

planet code4lib - Tue, 2017-07-11 15:32

This week, ALA sent a letter calling for continued funding for the U.S. Geological Survey (USGS) Library and public access to its collections and services. 

USGS is a scientific agency within the Interior Department. The USGS Library bills itself as the “largest earth science library in the world.”

President Trump’s budget proposes to cut $137.8 million from USGS, a 13% reduction from the current year. Out of that total, $3 million would come from the USGS Library – reportedly, a 52% cut. Such a drastic and disproportionate cut would close at least three, and potentially all four, of the library’s locations, and would eliminate public access to its collections.

Implementing these proposed library cuts would be penny wise and pound foolish. Investments in the USGS Library provide at least a 7:1 return, according to a group of trade and science organizations. Industries ranging from energy to mining to insurance widely use the maps, publications, and other resources that the USGS Library provides.

The House Interior Appropriations subcommittee is scheduled to begin consideration of its funding bill for USGS this week, with the Senate expected to follow at a later date. Amidst the larger “Fight for Libraries!” campaign to preserve federal funding for libraries, we hope that Congress will also remember the value of the USGS Library to America’s economic and scientific enterprise.

The post Protecting public access to earth science information appeared first on District Dispatch.

Open Knowledge Foundation: Are Indian laws really ‘open’?

planet code4lib - Tue, 2017-07-11 14:07

Opening up India’s laws – the journey of Nyaaya, a non-profit legal tech organization in India

India seems to fare reasonably well on ‘open data’ and ‘open government’ indicators. However, its abysmal record with respect to enforcement of rights and its civil and criminal justice systems points towards an inability to use this information meaningfully for actual legal empowerment. Nyaaya, a new legal tech organization in India, is attempting to make an intervention by applying standard legislative markup tools for the first time in the country and by developing plain language explainers and guides.

Surveys conducted in other jurisdictions point towards a direct relationship between people’s awareness of the law and legal services and problem resolution strategies. For example, consider the case of a senior citizen who has been abandoned by her children and needs financial help in supporting herself. There is more than one law she can look at to address her problem in India – provisions in the criminal procedure code, religious laws of certain communities or a special law on the maintenance of senior citizens. Even to take the first step of approaching a lawyer, she needs to have already known that she has a right to maintenance under various laws or be able to easily find this information. Assuming she’s taken this first step, the senior citizen is very likely to be misinformed or not be presented with her full range of options given the poor quality of legal representation one is very likely to find in most Indian courts. Access to accurate and easy-to-comprehend legal information is a critical enabler in the realization of various constitutional and legal rights.

The Global Open Data Index 2016, published by Open Knowledge International, describes India’s national laws as 45% ‘open’. Yet, as we discuss in this post, the way in which these laws have been made available does not allow citizens to extract even simple legal information easily.

In the mid-1990s, the central government of India, in its effort to publish laws digitally, put together a portal known as IndiaCode. The design and search functions of this portal were rudimentary – the laws are published without more recent amendments and in completely unreadable formats. For example, the image below indicates how the primary and most important criminal law of the country (i.e., the Indian Penal Code, 1860) continues to be published on the portal.

India has a quasi-federal structure – laws are issued by the central government and its various states. At the state level, there are only a few state governments or courts which have actually made complete sets of their laws available online. Even then, most states continue to publish laws in formats which are not machine readable. For example, the eastern state of Odisha publishes its laws in this format:

A third very important body of statutory law in the country is subordinate law in the form of rules, regulations, circulars, notifications and guidelines. These are again published in an ad-hoc manner by both central and state governments. While the central government’s Gazette carries more recent rules and regulations, these again take the form of PDFs, very often scanned PDFs.

These different resources suffer from numerous problems, including the fact that they are not searchable, updated to reflect amendments, hyperlinked or published in user-friendly formats. Consider one such problem: the lack of updated or consolidated laws makes them unreliable for laypersons, who are generally unaware that they must read a law with all its amendments to understand the position of law as it stands today. The laws passed by the Central Government can be considered some of the most important laws regulating a person’s life in India – even these were unavailable in a consolidated version until 2016. A 2015 decision by the Chief Information Commission (the authority set up under the freedom of information law in India) recognized the responsibility of the government to make consolidated versions of the law available. Following this decision, the Central Government has, as of date, made most laws between 1947 and 2016 available on IndiaCode.

However, the government has approached this exercise without giving much thought to best practices, and without actually being interested in ‘opening’ laws in the same way that it has approached opening other forms of data: the work has been done manually, at a slow pace, with the output taking the form of PDFs. Recognizing that this approach is not sustainable in the long run, the Delhi High Court has very recently directed the government to consider developing technical standards for the drafting and publication of laws.

The unavailability of laws passed by different types of governments, the inaccessible format of the available documents and the text of the law itself has essentially allowed the understanding of laws and legal information to remain the privilege of a few (lawyers and judges). Locking up of laws in this manner contributes significantly to the alienation of the justice system and public services from the public.

To understand how we could use laws and legal information in bridging this gap, a small team of lawyers and technologists got together with the support of a visionary philanthropist in 2016 to form Nyaaya. It was started with the simple goal of being a single resource for national and state laws along with plain language explainers. Nyaaya is a digital resource that is trying to bridge the gap between the availability of laws and legal literacy by making laws machine readable and by providing simple legal information that answers simple legal queries as well as explains complex legal processes.

It aims to be a single resource where one can connect the dots between different types of laws, so that people can use it fruitfully in understanding their rights under the law. Our constant endeavour is to use design principles to connect the dots between unstructured and disaggregated legal information in different forms such as Acts, Rules, Regulations and Notifications. While the linking of legal information is currently being done by an internal team, we are also building systems which will allow external volunteers and contributors to make the connections and create content.

A key component in this integrated platform will be presentation and tagging of annotations to individual legal provisions. These annotations can help not only serve the purpose of an explainer or commentary for the section, but also help in understanding the interplay between the primary law with subordinate legislation and state amendments to federal laws (a unique form of statutory instrument in India).

To make legal information consumable and empowering, the information must also be machine readable. Making laws machine readable can monumentally improve users’ searches for accurate answers to their legal queries. Going back to the example, technology that is able to identify maintenance provisions across the gamut of Indian laws will deliver faster and more accurate results for the senior citizen who is using the internet as her first source of information. Nyaaya uses Indigo, a tool developed by OpenUp, a civic-tech organisation in South Africa, to capture and manage laws in Akoma Ntoso, a popular XML schema for describing legislative documents. The laws marked up in Akoma Ntoso and published on Nyaaya are available on GitHub.
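To make the markup idea concrete, here is a rough sketch of how a single section of an act might be represented in Akoma Ntoso-style XML, built with Python’s standard library. The element names follow the Akoma Ntoso vocabulary, but the namespace URI, the `eId` value and the section text are illustrative assumptions, not taken from Nyaaya’s actual files.

```python
import xml.etree.ElementTree as ET

# Illustrative namespace; real documents declare the official Akoma Ntoso namespace.
AKN_NS = ""

akn = ET.Element("akomaNtoso", {"xmlns": AKN_NS})
body = ET.SubElement(ET.SubElement(akn, "act"), "body")

# One section of a hypothetical act: a number, a heading, and its text.
section = ET.SubElement(body, "section", {"eId": "sec_302"})
ET.SubElement(section, "num").text = "302."
ET.SubElement(section, "heading").text = "Punishment for murder"
content = ET.SubElement(section, "content")
ET.SubElement(content, "p").text = "Whoever commits murder shall be punished with ..."

xml_text = ET.tostring(akn, encoding="unicode")
print(xml_text)
```

Because each provision carries a stable `eId`, explainers and annotations can point at individual sections rather than at whole PDFs, which is exactly the kind of linking described above.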

Example from the Guide to Marriage, Divorce and Maintenance

The biggest challenge when it comes to making laws machine readable is parsing and extraction. The long legacy of private publishing has continued to influence the presentation of laws published online. Government bodies publish laws in the form in which they are extracted – as scanned PDFs, which are highly unstructured data. Further, the traditional structure of marginal notes and footnotes (indicating amendments) makes it difficult for machines to identify the components of a law correctly, and also makes publishing laws as they stood at different points in time (prior to and after an amendment) next to impossible.

India being a country with 29 states and 7 union territories, there is an entire body of law which hasn’t even been published digitally, let alone in machine-readable and open formats. Open data enthusiasts in India have several things going their way – a government which is (at least superficially) enthusiastic about embracing technology, a robust freedom of information law and a growing open data movement. However, the opening up of laws is only a very recent focus for a limited section of civil society and government.

While the initial inspiration behind this initiative was to make laws accessible to laypersons, our experience this last year has helped us realize how critical this is to access to justice and public services. The journey of having to re-imagine laws and legal information this last year has also helped us realize that the need of the hour is an India-specific legislative mark-up language. We’re smitten by the transformative potential that a combination of laws, design and technology offers, and hope to use the next year to make giant leaps in the manner in which simple and useful legal information is delivered to the general public (including in regional languages!).


Zotero: New Features for Chrome and Safari Connectors

planet code4lib - Mon, 2017-07-10 23:51

We are excited to announce major improvements to the Zotero Connectors for Chrome and Safari.


The Zotero Connector for Chrome now includes functionality that was previously available only in Zotero for Firefox.

Automatic Institutional Proxy Detection

Many institutions provide a way to access electronic resources while you are off-campus by signing in to a web-based proxy system. The Zotero Connector for Chrome makes this more convenient by automatically detecting your institutional proxy. Once you’ve accessed a site through the proxy, the connector will automatically redirect future requests to that site through the proxy (e.g., if you open a link to, you’ll be automatically redirected to

Proxy detection does not require manual configuration. You can disable or customize it from the connector preferences.

Style Installation and Automatic File Importing

Zotero for Firefox has long included support for installing citation styles and importing reference manager files such as BibTeX or RIS when opened in Firefox. This functionality is now supported by the Zotero Connector for Chrome.

Clicking on a link to a CSL file will display a prompt to install the style into Zotero. Clicking on a BibTeX or RIS file that is served correctly by the website will prompt to import its data into Zotero. If you choose ‘Cancel’, you can download the file normally.

Note that this feature requires Zotero 5.0.


If you have the latest version of Chrome installed, you should be automatically updated to version 5.0.5 or later of the Zotero Connector. If you’re not currently using Zotero with Chrome, you can install the extension from the downloads page. Make sure you’ve installed Zotero 5.0 as well for the best experience.


We’ve brought several features to Safari that were previously available only in Chrome and Firefox.

Streamlined Saving

Previously, on pages where Zotero could not detect any high-quality data, the Zotero save button was inactive, and you could save a basic webpage item and snapshot by right-clicking on the page and selecting “Save Page to Zotero”. In the latest version of the Zotero Connector for Safari, it’s possible to save any page to Zotero with the click of a button.

When high-quality data is available, the button will show the same icon as before: book, newspaper, etc. On all other pages, you’ll see an empty page icon, and clicking on it will create a basic webpage item and snapshot in Zotero. Hovering over the icon will tell you which translator, if any, Zotero would use to save the page.

One-click PDF saving

You can also now save PDFs to Zotero with a single click.

Previously, saving a PDF to Zotero from Safari required dragging from the address bar into Zotero or saving the PDF to disk and adding it to Zotero manually. Now, when you’re viewing a PDF in Safari, simply click the toolbar button to save the PDF as a top-level attachment in Zotero, after which you can use “Retrieve Metadata from PDF” or “Create Parent Item” to create a full bibliographic item. Note that, when possible, it’s still preferable to save from a webpage with full metadata.

Secondary Translators

On some webpages, Zotero can save data using multiple translators, but up until now it hasn’t been possible to choose a secondary translator from Safari. You can now long-press the Zotero button to see additional options for saving from the current page, including saving as a regular webpage instead of using a translator.

Secondary translators may provide different data for the page itself or data for other sources referenced in the page (for example, cited references on Wikipedia, as in the example above).


You can get the new version of the Safari extension from the downloads page or by updating to 5.0.0 or later from the Extensions pane of the Safari preferences. Make sure you’ve installed Zotero 5.0 as well for the best experience.

Zotero: A Unified Zotero Experience

planet code4lib - Mon, 2017-07-10 23:50

Since the introduction of Zotero Standalone in 2011, Zotero users have had two versions to choose from: the original Firefox extension, Zotero for Firefox, which provides deep integration into the Firefox user interface, and Zotero Standalone, which runs as a separate program and can be used with any browser.

Starting with the release of Zotero 5.0, Zotero for Firefox and Zotero Standalone have been replaced by a single standalone application for users of all browsers. A new Zotero Connector for Firefox, similar to the extensions available for Chrome and Safari, allows saving to the Zotero application or in a single click.

If you’re using Zotero for Firefox, you’ll be automatically upgraded to the new Zotero Connector for Firefox in the next few weeks, and you’ll need to install the standalone Zotero application — either 4.0 or 5.0 — to access your data going forward. The Zotero application offers the same interface and runs off the same database as Zotero for Firefox, so you’ll be able to pick up right where you left off.

If you’re already using Zotero Standalone, you can continue using Zotero as you were before — with some new features available in the Zotero Connector for your browser.

We know that many people rely on Zotero for Firefox, so we wanted to take a moment to explain why this is happening and what it means for the future of Zotero.

The primary reason for this change is technical, and, unfortunately, out of our control: Mozilla is discontinuing the powerful extension framework on which Zotero for Firefox is based in favor of WebExtensions, a new framework based on the Chrome extension model. WebExtensions offer many advantages, including improved browser performance, improved security, a fine-grained permissions model, and the ability to create a single extension that runs in both Chrome and Firefox (as we’ve done with the Zotero Connector). The trade-off is that it’s no longer possible to create an extension like Zotero for Firefox that affects the browser in more profound ways (adding an entirely new pane or tab, creating native-looking windows, adding options to the open/save dialog, etc.) or that uses low-level features of Firefox to manage a database, access the filesystem, or run local programs.

But while we’re not able to continue offering the full version of Zotero for Firefox, we think that this change will ultimately benefit the Zotero ecosystem going forward. The Zotero interface will no longer need to fit into a small browser pane, allowing for a much richer user experience. Offering a single version will mean that documentation and instruction can be greatly simplified. And most importantly, Zotero developers will be able to spend less time maintaining separate versions and responding to Firefox changes and more time improving Zotero for everyone.

In the lead-up to Zotero 5.0, we’ve worked hard to add features that were previously available only in Zotero for Firefox, such as institutional proxy support, to the existing Chrome and Safari connectors and the new Zotero Connector for Firefox. Those changes are documented in a separate post. As Mozilla and Chrome add additional capabilities to their extension frameworks, we’ll continue to add new features to the connectors.

When we launched Zotero a little over 10 years ago, the Firefox extension framework allowed us to create a new type of research tool that lived where people worked, in the browser itself. Today, we think we can offer the best of both worlds — unparalleled extensions for every browser, all connecting to a powerful standalone app. While it’s tough to say goodbye to the original version of Zotero, we couldn’t be more excited about Zotero’s future, beginning with Zotero 5.0. Here’s to the next 10 years.

Zotero: Zotero 5.0

planet code4lib - Mon, 2017-07-10 23:50

We’re delighted to announce the release of Zotero 5.0, the next major version of Zotero and the biggest upgrade in Zotero’s history. Zotero 5.0 brings many new features, as well as a huge number of changes under the hood to improve Zotero’s responsiveness and stability and lay the groundwork for other big developments coming soon. We’ll be highlighting some of the new features in upcoming posts, but for now see the changelog for more details on all that’s new.

Download Zotero 5.0 now to get started with the new version.

If you’re already using Zotero Standalone 4.0, you’ll be offered the choice to upgrade within the next few weeks, or you can update now via Help -> Check for Updates. (Windows users may wish to reinstall from the download page instead to rename the program from “Zotero Standalone” to “Zotero”.) Your database will be automatically upgraded to work with the new version.

If you’re using Zotero 4.0 for Firefox, be aware that Zotero 5.0 now runs only as a standalone application, and a new Zotero Connector for Firefox replaces the full Firefox extension. We’ve written a separate post explaining this change. Existing Zotero for Firefox users will be automatically upgraded to the Zotero Connector for Firefox within the next few weeks and will need to install Zotero 5.0 to continue accessing their Zotero data locally. If you install Zotero 5.0 now, be sure to install the Zotero Connector for Firefox from the download page as well.

Thanks to everyone from the Zotero community who has helped test Zotero 5.0 over the last year and get it ready for today. We’re excited to finally share it with the world!

District Dispatch: First key vote on FY18 library funding set for July 13

planet code4lib - Mon, 2017-07-10 23:06

Earlier today, the House Appropriations Subcommittee that oversees library funding announced that it will meet this Thursday afternoon, July 13 at 4:30 p.m., for a key vote on a large spending bill that will save, trim or totally wipe out funding for the Institute of Museum and Library Services (IMLS), the Library Services and Technology Act (LSTA) and Innovative Approaches to Literacy (IAL). That means that if you happen to live in one of the 13 congressional districts (in 12 states) represented by a member of this powerful and pivotal Subcommittee, you also have incredible power for the next 48 hours to help determine what happens to IMLS, LSTA and IAL funding for FY 2018.

This Thursday will be the first key vote in the House of Representatives related to federal library funding in FY2018. Photo credit: CT Health Notes blog

Please, use that power now. To find out if you’re a key constituent, enter your zip code at ALA’s Legislative Action Center. If you are, the Center is set to help you quickly send an email to or Tweet at your Representative. (If you’re not in one of these 13 congressional districts, please don’t be discouraged; we will definitely need you later!)

The message is simple: “Save IMLS. Give LSTA $186.6 million and IAL $27 million in FY18. Thank you!”

That’s it. There will be more votes in other committees later, but this first one could set the stage for all others. We have until Thursday afternoon to convince 13 key members of Congress to do the right thing. Now’s really the time to fight for libraries!

The post First key vote on FY18 library funding set for July 13 appeared first on District Dispatch.

Islandora: Islandora CLAW: Check out the MVP Sandbox!

planet code4lib - Mon, 2017-07-10 16:38

At Open Repositories 2017 in Brisbane, Australia, the Islandora project unveiled a brand new sandbox site where you can take Islandora CLAW for a test drive, with a few test collections and theming set up by our awesome Islandora UI Interest Group. Check it out at and login with:

  • User: sandbox
  • Pass: Cl@wS@ndb0x

This sandbox is, like Islandora CLAW itself, still something of a work in progress, so you can expect to see new functions and configurations appear over the next few weeks, and a lot of changes to come as we work with the community to determine development priorities post-MVP (for more background on that process, check out the CLAW FAQ). But for now, please come in and kick the tires and get a feel for Drupal 8 with Fedora 4. Please let us know about bugs, improvements, and new features you'd like to see by filing an issue in GitHub (not sure how? check out for instructions).

Some differences you'll notice from the old Islandora:

  • "Add Content" - just like you're making a Drupal page. There is no special menu to add new Islandora objects in CLAW. Add them from any part of the site and indicate the parent, instead of having to navigate to the collection you want the object created in.
  • Your objects are nodes. Don't like the standard collection view? Go ahead and use Drupal Views to build whatever you like.
  • No XML Form Builder. Want to edit your metadata forms? You can use the regular Drupal form builder for each content type to add and edit fields, manage displays, and even map fields to RDF.

A few known issues that aren't quite as we'd like them yet:

  • Derivative creation is still being developed, so you have to add your own thumbnails and medium-sized files.
  • You can only use jpegs for the MEDIUM_SIZE datastream. This is a quick fix, but it didn't quite make it into the sandbox, so expect to see this corrected in the near future.

We will periodically wipe and restore the sandbox, so don't put anything there that you want to keep. If you would like your own enduring copy of Islandora CLAW to play with, you can build your own virtual machine with the Islandora CLAW Vagrant.

You can see slides from the sessions where we launched the next phase of CLAW here:

Eric Hellman: Creative Works *Ascend* into the Public Domain

planet code4lib - Mon, 2017-07-10 16:26

It's a Wonderful Life, the movie, became a public domain work in 1975 when its copyright registration was not renewed. It had been a disappointment at the box office, but became a perennial favorite in the 80s as television stations began to play it (and play it again, and again) at Christmas time, partly because it was inexpensive content. Alas, copyright for the story it was based on, The Greatest Gift by Philip Van Doren Stern, HAD been renewed, and the movie was thus a derivative work on which royalties could be collected. In 1993, the owners of the story began to cash in on the film's popularity by enforcing their copyright on the story.

I learned about the resurrection of Wonderful Life from a talk by Krista Cox, Director of Public Policy Initiatives for ARL (Association of Research Libraries) during June's ALA Annual Conference. But I was struck by the way she described the movie's entry into the public domain. She said that it "fell into the public domain". I'd heard that phrase used before, and maybe used it myself. But why "fall"? Is the public domain somehow lower than the purgatory of being forgotten but locked into the service of a copyright owner? I don't think so. I think that when a work enters the public domain, it's fitting to say that it "ascends" into the public domain.

If you're still fighting this image in your head, consider this example: what happens when a copyright owner releases a poem from the chains of intellectual property? Does the poem drop to the floor, like a jug of milk? Or does it float into the sky, seen by everyone far and wide, and so hard to recapture?

It is a sad quirk of the current copyright regime that the life cycle of a creative work is yoked to the death of its creator. That seems wrong to me. Wouldn't it be better to use the creator's birth date? We could then celebrate an author's birthday by giving their books the wings of an angel. Wouldn't that be a wonderful life?

Harvard Library Innovation Lab: LIL Talks: Comedy

planet code4lib - Mon, 2017-07-10 16:12

This is a guest post by our LIL interns — written by Zach Tan with help from Anna Bialas and Doyung Lee

This week, LIL’s resident comic (and staff member) Brett Johnson taught a room full of LIL staff, interns, and fellows the finer intricacies of stand up comedy, which included the construction of a set, joke writing, and the challenges and high points of the craft.

As one example, Brett showed and broke down multiple jokes into the core structure of setup and punch line (or, platform and dismount) for analysis. We were also given insight into an industry where the sheer amount of work, honing, and refining that goes into a set is often taken for granted.

We also explored what it meant to be a comic, and how the immediacy of audience reaction and enjoyment means that stand up comedy is one of the only art forms with an extremely evident (and sometimes, brutal) line between success and failure.

Though the talk was littered with choice jokes and funny bits, we definitely came away with a refreshing look into some aspects of stand-up comedy that rarely go noticed.

Islandora: Islandora CLAW Calls: Alternating Schedule Starts this Week

planet code4lib - Mon, 2017-07-10 15:46

This is just a reminder that this week marks the first use of our alternating schedule for CLAW Calls, which switches between 9:00 AM EST and 1:00 PM EST each week so that we can welcome Islandora users from more time zones. This week's meeting will be at 9:00 AM EST on July 12th. Agenda here.

If this is your first CLAW Call (or your first CLAW Call in a while), don't worry. We'll start with a long recap and update to get you up to speed. We hope to see you there!

LITA: Call for Panelists – 2018 ALA LITA Top Trends Program

planet code4lib - Mon, 2017-07-10 15:45

We are currently seeking nominations for panelists for the 2018 ALA Midwinter LITA Top Tech Trends program in Denver, CO!  You may nominate yourself or someone you know who would be a great addition to the panel of speakers.

LITA’s Top Trends Program has traditionally been one of the most popular programs at ALA. Each panelist discusses two trends in technology impacting libraries and engages in a moderated discussion with each other and the audience.

Submit your nominations online. The deadline is Saturday, September 30, 2017.

The LITA Top Tech Trends Committee will review each submission and select panelists based on their proposed trends, experience, and overall balance to the panel. Submission Guidelines can be found at

For more information about the Top Tech Trends program, please visit


Brown University Library Digital Technologies Projects: Using synonyms in Solr

planet code4lib - Mon, 2017-07-10 14:43

A few days ago somebody reported that our catalog returns different results if a user searches for “music for the hundred years war” than if the user searches for “music for the 100 years war”.

To handle this issue I decided to use the synonyms feature in Solr. My thought was to tell Solr that “100” and “hundred” are synonyms and should be treated as such. I had seen a synonyms.txt file in the Solr configuration folder, and I thought it was just a matter of adding a few lines to this file and, voilà, synonyms would kick in. It turns out that using synonyms in Solr is a bit more complicated than that: not too complicated, but not as straightforward as I had thought.

Configuring synonyms in Solr

To configure Solr to use synonyms you need to add a filter to the field type where you want synonyms to be used. For example, to enable synonyms for the text field in Solr I added a filter using the SynonymFilterFactory in our schema.xml

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.ICUFoldingFilterFactory" />
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.SnowballPorterFilterFactory" language="English" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.ICUFoldingFilterFactory" />
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" />
  </analyzer>
</fieldType>

You can add this filter for indexing, for querying, or both. In the example above I am only configuring the use of synonyms at query time.

Notice how the SynonymFilterFactory references a synonyms.txt file. This text file is where synonyms are defined. Notice also the expand="true" setting.

File synonyms.txt accepts the list of synonyms in two formats. The first format is just a comma-separated list of words that are considered synonyms, for example:

 100, hundred
With this format, every time Solr sees “100” or “hundred” in a value it will automatically expand the value to include both “100” and “hundred”. For example, if we were to search for “music for the hundred years war” it will actually search for “music for the 100 hundred years war”; notice how it now includes both variations (100 and hundred) in the text to search. The same will be true if we were to search for “music for the 100 years war”: Solr will search for both variations.

A second format we can use to configure synonyms is by using the => operator to consolidate various terms into a different term, for example:

 100 => hundred

With this format, every time Solr sees “100” it will replace it with “hundred”. For example, if we search for “music for the 100 years war” it will search for “music for the hundred years war”. Notice that in this case Solr will include “hundred” but drop “100”. The => operator in synonyms.txt overrides the expand=true setting, replacing the values on the left with the values on the right side.
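The effect of the two formats on a token stream can be illustrated with a small sketch. This is plain Python, not Solr's actual implementation, and the function names here are our own:

```python
# Illustrative sketch of how the two synonyms.txt formats affect a
# token stream. This is not Solr code; it only models the semantics
# described above.

def expand_tokens(tokens, groups):
    """Equivalence format ("100, hundred" with expand=true):
    any token found in a group is expanded to the whole group."""
    out = []
    for token in tokens:
        group = next((g for g in groups if token in g), None)
        out.extend(sorted(group) if group else [token])
    return out

def replace_tokens(tokens, mapping):
    """Explicit mapping format ("100 => hundred"): tokens on the
    left-hand side are replaced by the right-hand side token."""
    return [mapping.get(token, token) for token in tokens]

query = "music for the 100 years war".split()
print(expand_tokens(query, [{"100", "hundred"}]))
# ['music', 'for', 'the', '100', 'hundred', 'years', 'war']
print(replace_tokens(query, {"100": "hundred"}))
# ['music', 'for', 'the', 'hundred', 'years', 'war']
```

The first call keeps both variants in the stream, so either spelling matches; the second drops “100” in favor of “hundred”, mirroring the behavior of the => operator.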

Testing synonym matching in Solr

To see how synonyms are applied you can use the “Analysis” option available on the Solr dashboard page.

The following picture shows how this tool can be used to verify how Solr is handling synonyms at index time. Notice, in the highlighted rectangle, how “hundred” was indexed as both “hundred” and “100”.

We can also use this tool to see how values are handled at query time. The following picture shows how a query for “music for the 100 years war” is handled and matched to the original text “music for the hundred years war”. In this particular case synonyms are enabled in the Solr configuration only at query time, which explains why the indexed value (on the left side) only has “hundred” while the value used at query time has been expanded to include both “100” and “hundred”, resulting in a match.

Index vs Query time

When configuring synonyms in Solr, it is important to consider the advantages and disadvantages of using them at index time, query time, or both.

Using synonyms at query time is easy because you don’t have to change your index to add or remove synonyms. You just add/remove lines from the synonyms.txt file, restart your Solr core, and the synonyms are applied in subsequent searches.

However, there are some benefits of using synonyms at index time particularly when you want to handle multi-term synonyms. This blog post by John Berryman and this page on the Apache documentation for Solr give a good explanation on why multi-term synonyms are tricky and why applying synonyms at index time might be a good idea. An obvious disadvantage of applying synonyms at index time is that you need to reindex your data for changes to the synonyms.txt to take effect.

Terry Reese: MarcEdit 7 Main Window Wireframes and other notes

planet code4lib - Mon, 2017-07-10 14:35

I’ve been thinking about the new UI for MarcEdit 7. I haven’t decided yet if the main window should have the ribbon or keep the menus (menus seem most appropriate for the MARC Tools and Main Window), but the main thing I want to do with the new UI in MarcEdit 7 is find ways to surface tools based on common questions/actions, as well as push the last-used tools up. I’ll keep the user-defined buttons (I like those; I use them all the time). One other thing I’ll likely end up doing: presently I keep the MARCNext toolset framed in the main window (as well as the about window), and I’ll be pushing those into their own windows, as the way it currently works complicates updates. Also, all fonts will be updated from 8.5 pt (the default) to 10.5 pt. I’d like to set the default font to the Google Noto fonts, but distributing the font set is out of the question (it is 450 MB in total), though maybe I can include something in the installer to allow users to select this font for download if they want… I’ll have to think about it. With that, I’ll be improving the accessibility functionality so that users can continue to easily update font sizes. In fact, I’ll be changing the window that shows on first install to be a series of questions (rather than showing the preferences). The questions will be:

  1. Preferred Font/Size: I’ll show current settings and sample of typography
  2. MARC Flavor:  You tell me are you using MARC21 or something else
  3. Default Z39.50/SRU servers (you’ll have a list of known servers to select from this way you have servers in the tool at the beginning)
  4. Link to the Tutorials/Help


After this, you’ll have the option to select the preferences and update all of the options.  But I’m looking for ways to make this easier so when users install MarcEdit 7 for the first time, you don’t have to look for specific settings (specifically fonts).

Feedback is welcome.



  • Note: this wireframe is for the Windows/Linux version. Some of these concepts will make it to the MarcEdit Mac version, but I try to keep that development in line with Apple's UI recommendations when possible.

FOSS4Lib Recent Releases: VuFind - 4.0

planet code4lib - Mon, 2017-07-10 13:45

Last updated July 10, 2017. Created by Demian Katz on July 10, 2017.
Log in to edit this page.

Package: VuFindRelease Date: Monday, July 10, 2017

HangingTogether: Visualizing Digital Humanities Workshop

planet code4lib - Mon, 2017-07-10 12:00

Last month I and four OCLC colleagues—William Harvey, Rob Koopman, Shenghui Wang, and Jeff Young—participated in a “Visualizing Digital Humanities” workshop held at the Lorentz Center at the University of Leiden (photo at right).  It brought together experts who had created “research datasets”, scholars, and visualization experts. Two of the four datasets were created by OCLC Research:

  • “Translation History of Works”, a linked data dataset consisting of bibliographic information about five works and all their associated translations extracted from WorldCat and enhanced by data retrieved from Wikidata.
  • “Semantic Maps of Research Disciplines”, consisting of clusters of documents in three categories, Digital Humanities, Digital Libraries, and Digital Curation.

The workshop was divided into plenary sessions where experts gave talks about their particular areas and seven working sessions where visualization experts and scholars collaborated with the dataset creators to come up with visualizations. I presented Challenges of Multilingualism, which focused on translations, as that is how we learn about other cultures and other cultures learn about us. I summarized OCLC Research’s exploration of enriching WorldCat records with data extracted from Wikidata to associate translations to the original work when that information is not included in the MARC records. (The full program is here).

We all enjoyed collaborating with experts during the five-day workshop. To sum up our experiences:

  • The opportunity to work in the same room, on the same problems, with some of the world’s experts was immensely valuable. For our translation dataset, our scholar noted the importance of including commentaries and interpretations (which we had excluded) along with different editions of the original and translations.
  • Visualization helps minds efficiently synthesize and transmit knowledge. Continuous conversations about the purpose of the visualization are critical so that the design is not led astray.
  • Team dynamics have a big impact on projects, especially in time-limited sprints. Face-to-face discussions were crucial to our efficiency. We were all impressed by how quickly teams had something to show everyone!
  • Visualization researchers are not engineers—building a tool is not their focus. We were surprised by the reluctance to share modifications to open source software in this community.
  • For digital humanities, any visualization tool that helps researchers to locate source data that warrant “close reading” is valuable.

We returned with ideas for possible future research and papers we might publish about our work.

District Dispatch: Congratulations to Marijke Visser

planet code4lib - Mon, 2017-07-10 11:16

Associate Director and Senior Policy Advocate Marijke Visser (center) speaking on a panel about E-rate alongside her colleagues at ALA’s most recent National Library Legislative Day.

I had a great experience at the recent ALA Annual Conference in Chicago and hope you did, too. There is such positive energy in being amongst thousands of library folks dedicated to advancing services in their respective communities, the library profession and ALA. A particular highlight for me was announcing Marijke Visser’s promotion to Associate Director and Senior Policy Advocate at the Friday meeting of the OITP Advisory Committee.

Under Marijke’s leadership, ALA’s engagement and profile in libraries and coding have blossomed. The most recent development is Google’s sponsorship of $500,000 of funding to ALA to provide grants to libraries for coding initiatives, with a focus on promoting youth engagement by girls and other underrepresented groups in computer science-related educational and career paths. But this development is only the latest installment in a multi-year initiative with Google that includes an in-process project on MLS education and coding and an initial project to characterize the lay of the land.

Marijke’s work also serves to strengthen our collaboration with ALA divisions. In the latest effort, YALSA will administer the grant program (the request for proposals will be released in the next few weeks: please apply!). AASL, ALSC, YALSA and OITP will comprise the selection committee.

Through Marijke’s initiative, we also are collaborating with Rosen Publishing. Already one advocacy video for Libraries Ready to Code has been produced through the generosity of Rosen Publishing and more is in the works. In addition, we have a new collaboration with the National Center for Women & Information Technology and a budding one with another key organization in coding (stay tuned!).

Last, but certainly not least, is Marijke’s leadership in advocating for libraries within the E-rate program. Her work with the Federal Communications Commission (FCC) helped lead to an increase in annual E-rate funding from $2.4 billion to $3.9 billion. We expect E-rate to rise on the FCC policy agenda once net neutrality is addressed, so work has begun on our strategy for advocacy this fall and into 2018. Marijke’s reputation also has developed within the world of telecommunications policy more broadly, as illustrated by her invitation to serve on an advisory committee of NTCA—the Rural Broadband Association.

I hope that you will join me in congratulating Marijke on this well-deserved recognition.

The post Congratulations to Marijke Visser appeared first on District Dispatch.

Terry Reese: MarcEdit 6.3 Windows Update

planet code4lib - Mon, 2017-07-10 04:32

As noted this week – I’ve updated MarcEdit 6.3. The updates are as follows:


* Enhancement: Exact Match searching in the Extract and Delete Selected Records tools
* Enhancement: Exact Match searching in the Find/Replace tool
* Enhancement: Work updates in the Linked Data tool to support the new MAC proposal
* Update: Performance improvements in the editor to load large files faster. This was planned for MarcEdit 7, but I decided to make the change so that the final versions that support XP include this work.
* Update: Context Search additions/improvements
* Bug fixes, including API updates (the streaming function for exporting tab-delimited data was throwing an error), a merge update when using 022$a as a query index, etc.
* Plugin-framework updates: this requires a change to the Interfaces that allow plugins and MarcEdit to speak to each other. I’ll be updating all plugins as a result. Newer versions of MarcEdit will update your plugins automatically.
* Accessibility changes (a couple of forms weren’t scaling correctly with large fonts and large resolutions; this has been corrected)

Please note: *you must update any plugins* after this update. If you don’t update MarcEdit, you must *decline* the plugin updates when offered. I had to update the plugin framework, which includes the Interface host file. This shouldn’t mean much to most users, but the gist is that I had to change an assembly signature, so once the update happens, you will have to update your plugins. I believe I’ve updated all the plugins that are currently in use. If I’ve missed something, let me know.

Updates are available via the automatic updates or from

Questions, let me know.


DuraSpace News: EXTENDED: Early Bird Discount for Fedora and Samvera Camp

planet code4lib - Mon, 2017-07-10 00:00

From the organizers of Fedora and Samvera Camp at Oxford

DuraSpace and Data Curation Experts are set to offer the Fedora and Samvera Camp at Oxford University, Sept 4 - 8, 2017. The camp will be hosted by Oxford University Oxford, UK and is supported by Jisc. The organizers are pleased to announce that the early bird discount registration price has been extended until July 24.


Subscribe to code4lib aggregator