Feed aggregator

District Dispatch: Happy 72nd birthday to the Washington Office

planet code4lib - Sun, 2017-10-01 19:07

In 1945, the ALA announced the establishment of the Washington Office. It began operation 72 years ago today on October 1. The Washington Office was charged with educating and working with legislators and public officials to obtain funding and policies that benefit libraries and public access to information. In addition, the Washington Office was—and continues to be—responsible for making official comments on proposed regulations and advocating for legislation that supports libraries and library service through the press and personal contacts, in cooperation with state and local library agencies.

We are grateful to ALA’s members for their steadfast engagement and for allowing us the privilege of being ALA’s voice in Washington for over seven decades.

Here are seven ways you can celebrate our birthday this week:

  1. Sign up to receive ALA’s advocacy alerts.
  2. Let us know about your recent meetings with your elected officials.
  3. Browse the Washington Office’s newsletters in the archives.
  4. Check out archival photos of our leaders over the years.
  5. Save the date for National Library Legislative Day 2018—May 7 and May 8, 2018
  6. Follow ALA on Twitter and send the Washington Office a message via the hashtag #ALAWO.
  7. Speaking of archives, do you have some history (photos, postcards, documents, stories, etc.) related to the Washington Office? Send us a note and share what you have.

The post Happy 72nd birthday to the Washington Office appeared first on District Dispatch.

Access Conference: How did we do?

planet code4lib - Sat, 2017-09-30 10:00

Thanks for coming to Access 2017. We hope you had a good time at the conference and in Saskatoon. While you are waiting at the airport please take a few minutes to fill out the post-conference survey and let us know what you liked, what you would like to see next year, and where we missed the mark.  This information really helps out the organizers for next year.

Take the survey (it’s really quick, we promise)

District Dispatch: Libraries again oppose unneeded, risky Section 108 update

planet code4lib - Fri, 2017-09-29 22:31

As reported last month, ALA and the other members of the Library Copyright Alliance (LCA) have been scrutinizing the Copyright Office’s extensive new analysis of and recommendations for statutory changes to Section 108 of the Copyright Act. Section 108 (a.k.a. “the library exception”) allows libraries to make copies for preservation and other purposes, including interlibrary loan. The report was released on September 15.

In a statement released last Friday, LCA commended the Copyright Office for a thorough report and its balanced and well-reasoned legislative suggestions for updating section 108. Of special note was the Copyright Office’s strong, unequivocal rejection of arguments long made by some commercial stakeholders that libraries may not rely on both section 108 and fair use to undertake section 108-related activities. The Office couldn’t have been clearer in its conclusion that “it is essential that the fair use savings clause stay in section 108.”

That said, LCA nevertheless reiterated its prior calls on Congress not to take up section 108 reform and instead address other, more pressing copyright matters. In support of its position, LCA again cited strong fair use court decisions and the uncertainties inherent in a political legislative process that ultimately could weaken libraries’ rights. Notwithstanding LCA’s objections, many in Washington consider it probable that section 108 (and possibly other targeted) copyright reform legislation will be introduced later this year. If and when it is, the help of library supporters may be needed to remind Congress of libraries’ “thanks, but no thanks” perspective on section 108.

The post Libraries again oppose unneeded, risky Section 108 update appeared first on District Dispatch.

District Dispatch: ALA advocacy panel at New York Comic Con

planet code4lib - Fri, 2017-09-29 13:00

This year, New York Comic Con (NYCC) is teaming up with The New York Public Library to expand offerings for educators and librarians by providing them with a space for panels, networking, and workshops on Thursday, October 5.

ALA has organized a panel that will take place in the Celeste Auditorium at 10:30 a.m. The panel, Citizen-Centric Library Advocacy: Building People Power for Your Branch, will offer a space to talk about how library staff and patrons can use personal stories to advocate for their library. The following panelists will join us:

Ricci Yuhico
Managing Librarian, Young Adult Services, The New York Public Library (NYPL)

Ricci is a recent transplant to NYC from Miami. In the daytime, she is the Managing Librarian for Young Adult Services in NYPL’s Mid-Manhattan Library. Outside of work, she is the Advocacy Chair for Urban Librarians Unite. In the political realm, she co-founded CALM (Community Advocates for Libraries in Miami), a political action committee active from 2013 to 2015, with other librarians and library advocates, and advocated heavily for fully funding the Miami-Dade Public Library System. CALM also partnered with local organizations and stakeholders to form the Coalition to Save Our Libraries.

Davis Erin Anderson
Program Manager, Technology, Culture, and Libraries, Metropolitan New York Library Council (METRO)

Davis Erin Anderson’s work focuses on the cultural impact of internet technologies as they affect libraries. Davis manages projects like METRO’s contribution to the Mozilla Web Literacy for Library Staff pilot program and NYC Digital Safety, a new initiative that will train frontline library staff on internet privacy and security. Her work includes planning, developing, and recruiting trainers for a wide range of professional development events in the library field and beyond.

Robin Lester Kenton
Vice President, Marketing & Communications, Brooklyn Public Library (BPL)

Robin is the Vice President of Marketing & Communications at BPL, the nation’s fifth-largest public library system. She is charged with overseeing most of BPL’s public-facing communications, including social media, print materials, website management, press relations and email marketing. Her team promotes library services and materials and encourages Brooklynites to become more involved in the success of their local branch. Prior to her role at BPL, Robin served as the Director of Strategic Communications for the NYC Department of Transportation, driving community involvement in local projects such as the planning and rollout of the city’s bike share program.

Nicholas Higgins
Director, Outreach Services, Brooklyn Public Library (BPL)

Nicholas is the Director of Outreach Services at BPL, where he oversees services for older adults, correctional services, immigrant services and outreach to individuals and families experiencing homelessness. From 2009 to 2013, he oversaw NYPL’s Correctional Services Department, developing several innovative programs, including mobile libraries at city jails, an early literacy and book-recording program for incarcerated parents, and a 12-week literature class offered for men in federal prison. He received an MLS from the Pratt Institute and a BA in British Literature from Hunter College.

Hours for Comic Con at NYPL are 9 a.m. to 5:30 p.m. at NYPL’s Stephen A. Schwarzman Building (enter at 42nd Street near 5th Avenue). This is a free, ticketed Comic Con event for librarians, teachers, and educators; a NYCC badge valid for Thursday is required for entry.

The post ALA advocacy panel at New York Comic Con appeared first on District Dispatch.

Library of Congress: The Signal: Introducing Beyond Words

planet code4lib - Thu, 2017-09-28 18:30

As a part of Library of Congress Labs release last week, the National Digital Initiatives team launched Beyond Words. This pilot crowdsourcing application was created in collaboration with the Serial and Government Publications Division and the Office of the Chief Information Officer (OCIO) at the Library of Congress. In our first week and a half, we’ve hosted nearly 1,300 volunteers and marked over 30,000 pictures in historic newspaper pages. In this post, we explore the goals, background, workflows, possibilities, and more on our progress so far with Beyond Words.

Beyond Words Goals and Background

Beyond Words is part of the recently launched Library of Congress Labs, within its Experiments section. As a pilot, the main goal of Beyond Words is to identify and caption pictures in newspaper pages to create public domain data for researchers to use. The crowdsourced data generated in Beyond Words are released into the public domain, then made available for download as JSON and for exploration in a public gallery.

Beyond Words Picture Gallery – Search and Filter

Our secondary goal is to generate feedback about the workflow, instructions, and resulting data. Beyond Words may change quickly and will continue to serve as an experimental application. The pilot is also an opportunity to continue to learn from and apply lessons from other cultural heritage institutions with established transcription programs such as the U.S. National Archives and Records Administration Citizen Archivist and the Smithsonian Institution Transcription Center, as well as examples from the Library including Flickr Commons. Beyond Words further allows us to observe activity and pain points as we begin the design of our forthcoming transcription and tagging platform.

Building Beyond Words

Beyond Words is a web-based application that was developed as an Innovator-in-Residence project by Library of Congress OCIO developer Tong Wang. Beyond Words is an open source crowdsourcing pilot built as an instance of Scribe, the NEH-funded collaboration between the New York Public Library and Zooniverse. You can learn more about our implementation of Scribe on GitHub and watch for updates.

The newspaper pages that are marked and transcribed in Beyond Words are selected from Chronicling America. Chronicling America is a dynamic project that currently supports over 12 million newspaper pages from 40 states, with new papers added every day. Since we designed Beyond Words as a pilot, we needed to home in on a focused set of newspapers. We targeted the centennial commemoration of World War I and limited our range to the U.S. declaration of war through the cessation of hostilities, 06 April 1917 to 11 November 1918. Since new pages are added each day, we also limited our data set to what was available in Chronicling America, in the date range, as of 14 September 2017.

Jumping in: Tasks & Tips 

How does Beyond Words work? First: No log in! Secondly, you’ll need to know what we’re seeking. We ask that you mark pictures and transcribe the title, caption, and cutline when present; you’ll also categorize the picture type and make a note of the artist, if present. We use the word “pictures” in the instructions to include photographs, illustrations, editorial cartoons, comics, and maps. However, we are excluding advertisements–despite interesting content that lasts–in this pilot newspaper set.

On Beyond Words, you can get started right away by selecting one of three steps: mark, transcribe, verify. At least two people must agree on their task in each step; matching marks and transcriptions skip the verify step. If inconsistencies emerge, volunteers select the best transcription, category, and artist (if present) in the verify step. Our tutorial shows how to break out the title, caption, and cutline; watch for all three, plus category and artist, as you verify.
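
The agreement rule described above can be illustrated with a short sketch. This is not the Scribe implementation; the function names, the whitespace normalization, and the tie handling are assumptions made for illustration. It shows how a step could accept a value once at least two volunteers agree and otherwise pass the candidates on to the verify step.

```python
from collections import Counter


def normalize(text):
    # Assumption: collapse runs of whitespace so trivially different
    # transcriptions still count as a match.
    return " ".join(text.split())


def resolve(submissions, min_agreement=2):
    """Decide what happens to one item after volunteers have worked on it.

    Returns a (status, payload) pair:
      ("pending", values)   - too few submissions to compare yet
      ("accepted", value)   - enough volunteers agree, so verify is skipped
      ("verify", values)    - no clear agreement, so volunteers pick the best
    """
    cleaned = [normalize(s) for s in submissions]
    if len(cleaned) < min_agreement:
        return ("pending", cleaned)

    ranked = Counter(cleaned).most_common()
    top_value, top_count = ranked[0]
    # Hypothetical tie rule: two equally popular answers go to verify.
    tie = len(ranked) > 1 and ranked[1][1] == top_count
    if top_count >= min_agreement and not tie:
        return ("accepted", top_value)

    # dict.fromkeys drops duplicates while keeping first-seen order.
    return ("verify", list(dict.fromkeys(cleaned)))


if __name__ == "__main__":
    print(resolve(["Capt. Wickersham", "Capt.  Wickersham"]))
    print(resolve(["Capt. Wickersham", "Captain Wickersham"]))
```

In this sketch, two matching transcriptions are accepted immediately, while two differing ones are returned as candidates for the verify step.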

Transcribing Captain Wickersham’s Promotion

We ask that you take your time as you work to carefully identify the pictures. Pages without images should be marked “Done.” Some of the older photographs may look like illustrations; watch for mix ups of illustration and map. Also keep in mind that the artist is often included in very small print. Common photographers include Underwood & Underwood and Harris & Ewing. You’ll see comics from A.D. Condo, Hop, and W. R. Allman.

Want additional hints? This application works best on a desktop or laptop with a mouse. Zoom in using your keyboard or the zoom tool. You can also begin your Beyond Words activity in a preferred state from the home page. Reminders of instructions are found in the “View A Tutorial” section, as well as the FAQ. Want to transcribe a picture right after you mark it? Select “Transcribe this page now!” And at any point in any of the three steps, you can view the original page in Chronicling America.

We invite you to have fun and do your best; the newspapers are fascinating but marking and transcription isn’t always easy. Remember to take breaks and send us feedback! If you are inspired by what you are learning while using Beyond Words, you can explore Library of Congress World War I collections.

Doors to Discoveries

What might a volunteer discover while marking, transcribing, and verifying newspaper pictures? Certainly many social and cultural changes that marked the Great War era. On 05 January 1918, you’ll see “Women Performing Hard Tasks of Men in Big Chemical Plants” and “Capable Women and their doings” in Ogden, Utah. Another page reveals a significant victory of Florence Ellinwood Allen: successfully defending a women’s suffrage amendment to the charter of East Cleveland before the Supreme Court of Ohio.

Verifying Miss Allen’s Victory before Ohio Supreme Court

There are also views into African American papers like the Nashville Globe, established in response to the extension of Jim Crow to Nashville’s city transportation system; the paper began as a means of documenting black business owners and their attempts to establish an alternate streetcar system. The Nashville Globe ran from 1906 to 1960.


We’re continuing to seek and receive feedback on Beyond Words on formatting text, improving accessibility, extending the volunteer experience, greater precision around identifying artists, and more. We hope that educators, researchers, and artists will take advantage of the ability to group image collections by time frame, such as identifying all historic cartoons appearing in World War I era newspapers. If you create something with the data set, tweet us and use the hashtag #BuiltwithLC.
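
As a rough illustration of grouping pictures by category and time frame, here is a minimal sketch. The records are made up and the field names (`title`, `category`, `date`) are hypothetical; the actual structure of the Beyond Words JSON download may differ, so check the real data before adapting this.

```python
import json
from datetime import date

# Made-up records for illustration only; the field names are hypothetical
# and are not taken from the Beyond Words export.
SAMPLE_JSON = """
[
  {"title": "Example cartoon A", "category": "Editorial Cartoon", "date": "1917-06-02"},
  {"title": "Example photograph B", "category": "Photograph", "date": "1918-01-05"},
  {"title": "Example cartoon C", "category": "Editorial Cartoon", "date": "1918-11-20"}
]
"""

# Date range of the pilot newspaper set, as stated earlier in the post.
PILOT_START = date(1917, 4, 6)
PILOT_END = date(1918, 11, 11)


def select_pictures(records, category, start=PILOT_START, end=PILOT_END):
    """Return the records of one category whose date falls within [start, end]."""
    return [
        record
        for record in records
        if record["category"] == category
        and start <= date.fromisoformat(record["date"]) <= end
    ]


records = json.loads(SAMPLE_JSON)
cartoons = select_pictures(records, "Editorial Cartoon")

if __name__ == "__main__":
    for record in cartoons:
        print(record["date"], record["title"])
```

With the sample records, only the first cartoon is selected, because the other cartoon is dated after the end of the pilot range.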

With over 1,200 images waiting to be verified, we could use your help! Thanks in advance for joining us and for your feedback; we’ll share what we’re learning again soon.

Access Conference: Are you as smart as an Access organizer?

planet code4lib - Thu, 2017-09-28 15:00

If you are sticking around Saskatoon on Friday, join us for a very special Access Trivia Night at Amigos Cantina (home of the best burritos in town). Trivia gets started at 9pm in the Amigos backroom (turn left and go past the washrooms). Teams of up to 4, but you don’t need to find a team to play. Just turn up and we will find you a group. Everyone is welcome (you don’t have to be an Access attendee) – make a new friend and invite them! Saskatoonians are friendly like that.

$5 each to play and there will also be a 50/50 draw with proceeds going toward the diversity scholarships for the 2018 conference. We also have some coveted prizes for the winning team (in addition to bragging rights).

Your Access 2017 organizers have been spending as much time quizzing each other with Trivial Pursuit cards from the 1970s as we have planning this conference, so we hope you are up to the challenge.

If you are a punk fan (or a fan of supporting local independent radio), Amigos also has a lineup of three prairie punk bands – Chunder Buffet, Wint, and Owners – starting at 10pm after trivia – $7 at the door.


David Rosenthal: Web DRM Enables Innovative Business Model

planet code4lib - Thu, 2017-09-28 15:00
Earlier this year I wrote at length about the looming disaster that was Web DRM, or the W3C's Encrypted Media Extensions (EME). Ten days ago, after unprecedented controversy, a narrow majority of W3C members made EME official.

So now I'm here to tell you the good news about how the combination of EME and the blockchain, today's sexiest technology, solves the most pressing issue for the Web, a sustainable business model. Innovators like the Pirate Bay and Showtime are already experimenting with it. They have yet to combine it with EME and gain the full benefit. Below the fold, I explain the details of this amazing new business opportunity. Be one of the first to effortlessly profit from the latest technology!

The Web has two traditional business models. As I wrote back in March, both are struggling:
  • Paywalled content. It turns out that, apart from movies and academic publishing, only a very few premium brands such as The Economist, the Wall Street Journal and the New York Times have viable subscription business models based on (mostly) paywalled content. Even excellent journalism such as The Guardian is reduced to free access, advertising and voluntary donations. ...
  • Advertising-supported content. The market for Web advertising is so competitive and fraud-ridden that Web sites have been forced into letting advertisers run ads that are so obnoxious and indeed riddled with malware, and to load up their sites with trackers, that many users have rebelled and use ad-blockers. ...
The innovative third business model that sites are starting to use is to mine cryptocurrency in the reader's browser, using technology from Coinhive. TorrentFreak estimated that The Pirate Bay could make $12K/month in this way.

The problem with this approach is twofold. First, it annoys the readers by consuming CPU:

Needless to say, the reaction has not been good -- even from the Pirate Bay's own moderators. Over on Reddit, there are complaints about "100% CPU on all 8 threads while visiting TPB," and there are also threads on the PirateBay Forum.

BleepingComputer tested a Chrome extension that used Coinhive and reported:

The impact on our test computer was felt immediately. Task Manager itself froze and entered a Not Responding state seconds after installing the extension. The computer became sluggish, and the SafeBrowse Chrome extension continued to mine Monero at all times when the Chrome browser was up and running.

It is no wonder that users reacted with vitriol on the extension's review section. A Reddit user is currently trying to convince other users to report SafeBrowse as malware to the Chrome Web Store admins.

Second, it is easy for annoyed readers to see the cause of their problems:

The code in question is tucked away in the site’s footer and uses a miner provided by Coinhive. This service offers site owners the option to convert the CPU power of users into Monero coins.

The miner does indeed appear to increase CPU usage quite a bit. It is throttled at different rates (we’ve seen both 0.6 and 0.8) but the increase in resources is immediately noticeable.

Then it is easy for them to disable the cryptocurrency miner:

noscript will block it from running, as will disabling javascript.

Ad-blockers have rapidly adapted to this new incursion:

At least two ad blockers have added support for blocking Coinhive's JS library (AdBlock Plus and AdGuard), and developers have also put together Chrome extensions that terminate anything that looks like Coinhive's mining script: AntiMiner, No Coin, and minerBlock.

So, is this new business model doomed to failure? No! This is where EME comes in. The whole goal of EME is to ensure that the reader and their browser neither know what encrypted content is doing, nor can do anything about it. All that is needed for robust profitability is for the site to use EME to encrypt the payload with the cryptocurrency miner. The reader and their browser may see their CPU cycles vanishing, but they can't know why nor be able to stop it. Is this brilliant, or what?

District Dispatch: Exploring the Washington Office special collections

planet code4lib - Thu, 2017-09-28 13:30

This is a guest post from Andrew Staton, our fall special collections intern joining us from the University of Maryland. Andrew is a genealogist and budding archivist with two semesters left at UMD. He graduated from the College of Charleston in Charleston, South Carolina with a Bachelor of Arts in history and a Bachelor of Arts in historic preservation and community planning.

My first projects as an intern in the ALA Washington Office have focused on three of the Washington Office’s archival collections – two that contain mostly bound materials such as books and reports, and one photographic collection representing events and staff from the office’s nearly seventy-five-year history.

The historical theme of library advocacy in Washington is visible throughout all three of these special collections. The photographs, for example, depict the history of the Washington Office and a variety of events connecting libraries and the government from the 1950s to the 2000s. These photos visually place the Washington Office at the forefront of library advocacy throughout its history, particularly through former directors Julia Bennett, Germaine Krettek and Eileen Cooke, whose combined tenure spans over forty years.

I think the moments captured in the photographs are important for two reasons: they show how the ALA and the Washington Office have been consistently at the forefront of library advocacy and they illustrate how U.S. Presidents in recent history have demonstrated their commitment to the library and information professions.

President John F. Kennedy celebrates the swearing in of the US Commissioner of Education, Francis Keppel, in December 1962. Germaine Krettek, director of the Washington Office, looks on from the far left of the photo.

The bound materials I have processed also point to our rich history of advocacy. The first collection of reports that I processed related to the National Commission on Libraries and Information Services (NCLIS), an organization that lasted from 1970 to 2002. The reports detail the development of many library programs that continue to exist and thrive today. For example, an August 1974 NCLIS report makes the argument for a national interlibrary loan program, a system that continues to be widely used in the 21st century.

The second collection of reports is related to the White House Conference on Libraries and Information Services (WHCLIS), first held in 1979 under President Jimmy Carter and again in 1991 under President George H. W. Bush. WHCLIST – the White House Conference on Libraries and Information Science Taskforce – turned over its assets to the ALA Washington Office in 2012 to foster a new generation of library advocacy through an annual award that sponsors an attendee to National Library Legislative Day.

The purpose of NCLIS and WHCLIS was to foster a more standardized, universal dialogue regarding the world of libraries and information. Both were responsible for important constructive conversations that brought the profession forward.

Over the next half of my internship, I am looking forward to digging into more archival files and artifacts. Now, we are boxing up the photos and NCLIS and WHCLIS reports to send them to the American Library Association Archives, housed at the University of Illinois at Urbana-Champaign. Once there, they will be digitized and made searchable, just in time for the Washington Office to celebrate its 75th anniversary.

Director of the Washington Office Germaine Krettek (far left) and Executive Director of the ALA David Clift (second to left, behind Ms. Krettek) attend a 1964 White House event with President Lyndon Johnson (far right).

Germaine Krettek, director of the Washington Office, shakes hands with President Richard Nixon in the Oval Office during a meeting with representatives of the American Library Trustees Association (ALTA, a division of the ALA) in October 1969.

Also present at President Nixon’s October 1969 meeting with ALTA representatives is then-House Minority Leader and future President Gerald Ford, far left.

Washington Office director Eileen Cooke, right, shakes hands with then-Second Lady (and future First Lady) Barbara Bush during a 1983 event in Washington.

A welcome letter from President Jimmy Carter greets guests at the first White House Conference on Libraries and Information Services (WHCLIS) in November 1979 – at which he also spoke to attendees about his previous service as a library trustee in Georgia.

Future President Bill Clinton, then the governor of Arkansas, is a speaker at the November 1979 WHCLIS, with his accomplishments relating to libraries and the information profession highlighted in the event’s program.

The post Exploring the Washington Office special collections appeared first on District Dispatch.

DuraSpace News: Introducing the DuraSpace Code of Conduct

planet code4lib - Thu, 2017-09-28 00:00

DuraSpace is an international community dedicated to providing a welcoming, inclusive and positive experience for all members of our community. This includes formal gatherings, social settings, and online activities including mailing lists, wikis, web sites, IRC channels, Slack, and/or private correspondence. Accordingly, all community members are expected to show respect and courtesy to others at events and in all online and in-person interactions associated with DuraSpace.

Evergreen ILS: Release candidate for Evergreen 3.0 now available

planet code4lib - Wed, 2017-09-27 23:40

The release candidate of Evergreen 3.0 is now available for testing from the downloads page.

The release candidate includes various bugfixes and improvements, including:

  • Fixing the ability to record credit card payments in the web staff client.
  • Improvements to the upgrade script.
  • Several other fixes; the complete list can be found on Launchpad.

Evergreen 3.0 will be a major release that includes:

  • community support of the web staff client for production use
  • serials and offline circulation modules for the web staff client
  • improvements to the display of headings in the public catalog browse list
  • the ability to search patron records by date of birth
  • copy tags and digital bookplates
  • batch editing of patron records
  • better support for consortia that span multiple time zones
  • and numerous other improvements

For more information on what’s coming in Evergreen 3.0.0, please read the updated draft of the release notes.

Users of Evergreen are strongly encouraged to use the release candidate to test new features and the web staff client; bugs should be reported via Launchpad. General release of 3.0.0 is scheduled for Tuesday, 3 October.

Evergreen admins installing the release candidate or upgrading a test system to it should be aware of the following:

  • OpenSRF 3.0.0-alpha is the minimum required version of OpenSRF.
  • The minimum version of PostgreSQL required to run Evergreen 3.0 is PostgreSQL 9.4.
  • Evergreen 3.0 requires that the open-ils.qstore service be active.
  • SIP2 bugfixes in Evergreen 3.0 require an upgrade of SIPServer to be fully effective.
  • There is no database upgrade script to go from 3.0-beta2 to 3.0-rc. We recommend testing an upgrade by starting from a 2.12 test system.
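
For admins scripting a pre-upgrade check, the PostgreSQL minimum listed above could be tested along these lines. This is a minimal sketch and not part of Evergreen: the function names are hypothetical, and it assumes the version string contains a `major.minor` pair, as in the output of PostgreSQL's `SELECT version();`.

```python
import re

# Minimum PostgreSQL version stated in the announcement above.
MIN_POSTGRES = (9, 4)


def parse_version(text):
    """Extract the first major.minor pair from a version string."""
    match = re.search(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no major.minor version found in {text!r}")
    return (int(match.group(1)), int(match.group(2)))


def meets_minimum(version_text, minimum=MIN_POSTGRES):
    # Tuples compare element by element, so (9, 6) >= (9, 4) and
    # (10, 0) >= (9, 4), while (9, 3) is rejected.
    return parse_version(version_text) >= minimum


if __name__ == "__main__":
    for sample in ("PostgreSQL 9.3.20", "PostgreSQL 9.4.12", "PostgreSQL 9.6.5"):
        print(sample, "->", "ok" if meets_minimum(sample) else "too old")
```

Comparing parsed tuples rather than raw strings avoids mistakes such as treating "10.0" as older than "9.4".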

LITA: Jobs in Information Technology: September 27, 2017

planet code4lib - Wed, 2017-09-27 18:43

New vacancy listings are posted weekly on Wednesday at approximately 12 noon Central Time. They appear under New This Week and under the appropriate regional listing. Postings remain on the LITA Job Site for a minimum of four weeks.

New This Week

University at Albany, State University of New York, Desktop Support Technician, Albany, NY

Rebecca M. Arthurs Memorial Library, Public Library Director, Brookville, PA

Visit the LITA Job Site for more available jobs and for information on submitting a job posting.

LITA: Call for Panelists – 2018 ALA LITA Top Trends Program – Final Reminder!

planet code4lib - Wed, 2017-09-27 16:51

We are currently seeking nominations for panelists for the 2018 ALA Midwinter LITA Top Tech Trends program in Denver, CO!  You may nominate yourself or someone you know who would be a great addition to the panel of speakers.

LITA’s Top Trends Program has traditionally been one of the most popular programs at ALA. Each panelist discusses two trends in technology impacting libraries and engages in a moderated discussion with each other and the audience.

The deadline for nominations is Saturday, September 30, 2017.

The LITA Top Tech Trends Committee will review each submission and select panelists based on their proposed trends, experience, and overall balance to the panel. Submission Guidelines can be found at

For more information about the Top Tech Trends program, please visit

Islandora: New to Islandora: The Technical Advisory Group

planet code4lib - Wed, 2017-09-27 13:49

Along with our recent shift to calling the Islandora Roadmap Committee the Islandora Coordinating Committee comes another addition to our governing structure: the Islandora Technical Advisory Group (TAG). This new body, capped at eight members so it stays agile, will work with the Technical Lead (Danny Lamb) to make recommendations about the development of Islandora, while considering the very diverse needs of our community. Members of the Technical Advisory Group can be drawn from any member institution in the Islandora Foundation, and are elected by and report to the Board of Directors.

Activities of the TAG include:

  • Remaining abreast of trends in community development and their implications for the larger code base.
  • Researching and advising on architectural decisions.

  • Advising on development and deprecation timelines for major versions.

  • Recommending priorities for community sprints.

  • At the request of the Committers, providing guidance about specific development issues.

  • Liaising with the Islandora community to clarify optimal development goals and general needs.

  • Providing guidance and support to the Technical Lead for making decisions and setting priorities.

  • Presenting recommendations to the Board of Directors and/or Roadmap regarding things such as:

    • Licensing issues

    • Spending of money (such as to hire contractors or pay for development tools)

    • Suggesting new nominees at lower levels of membership for the TAG

    • Acceptance of new modules into the code base

  • Organizing meetings and communicating regularly with stakeholders.

Meetings of the TAG will be limited to members, but agendas will be published ahead of time with a request for suggested items from the community, and notes will be published after each meeting.

The initial members of the Islandora Technical Advisory Group, as elected by the Board of Directors, are:

  • Jared Whiklo (University of Manitoba)
  • Bryan Brown (Florida State University)

  • William Panting (discoverygarden, Inc.)

  • Jonathan Green (LYRASIS)

  • Derek Merleaux (Born-Digital)

  • Rosie Le Faive (University of Prince Edward Island)

District Dispatch: After 20 years, House hearing focuses on depository libraries

planet code4lib - Wed, 2017-09-27 13:30

On September 26, Congress’ Committee on House Administration held a hearing to discuss the Federal Depository Library Program (FDLP) – the first such hearing in 20 years.

The hearing was part of the committee’s initiative to examine Title 44 of the U.S. Code, which is the basis for the FDLP and the Government Publishing Office (GPO). While much of the law has not been substantially changed since 1962, today’s meeting is further evidence of growing momentum in Congress to develop legislation that will bolster the FDLP and help libraries connect Americans to their government.

The committee heard today from four librarians, testifying as individual experts rather than for their institutions, about their ideas for strengthening the program to improve the public’s access to government information. In addition, Laurie Hall, GPO’s acting Superintendent of Documents (and a librarian!), testified about the office’s oversight of the program.

Appearing before the committee were:

  • Mike Furlough, executive director of HathiTrust Digital Library, whose members include 128 Federal Depository Libraries
  • Celina McDonald, government documents & criminology librarian at the University of Maryland, the regional depository library for Maryland, Delaware, and the District of Columbia
  • Stephen Parks, State Librarian of the Mississippi State Law Library, which is a selective depository
  • Beth Williams, library director at Stanford Law School, a selective depository

Testimony highlighted the enduring value of the FDLP in ensuring that Americans can access the documents of their government, not just today but in the future. The witnesses also discussed several ideas for facilitating collaboration between GPO and libraries, preserving publications over the long term and improving digital access to government publications.

Committee chairman Rep. Gregg Harper (R-MS3) described the hearing as an opportunity “to see how we can make something that we like, better.” ALA extends our thanks to Chairman Harper and the committee members for their interest in modernizing Title 44 and their thoughtful questions today.

The post After 20 years, House hearing focuses on depository libraries appeared first on District Dispatch.

Terry Reese: MarcEdit 7: Continued Task Refinements

planet code4lib - Wed, 2017-09-27 05:26

Now that MarcEdit 7 is available for alpha testers, I’ve been getting back some feedback on the new task processing.  Some of this feedback relates to a couple of errors showing up in tasks that request user interaction…other feedback is related to the tasks themselves and continued performance improvements.

In this implementation, one of the areas that I am really focusing on is performance.  To that end, I changed the way that tasks are processed.  Previously, task processing looked very much like this:

A user would initiate a task via the GUI or command-line, and once the task was processed, the program would then, via the GUI, open a hidden window that would populate each of the “task” windows and then “click” the process button.  Essentially, it was working much like a program that “sends” keystrokes to a window, but in a method that was a bit more automated.

This process had some pros and cons.  On the plus side, Tasks was something added to MarcEdit 6.x, so this allowed me to easily add the task processing functionality without tearing the program apart.  That was a major win, as tasks then were just a simple matter of processing the commands and filling a hidden form for the user.  On the con side, the task processing had a number of hidden performance penalties.  While tasks automated processing (which allowed for improved workflows), each task processed the file separately, and after each process, the file would be reloaded into the MarcEditor.  Say you had a file that took 10 seconds to load and a task list with 6 tasks.  The file loading alone would cost you a minute.  Now, consider if that same file had to be processed by a task list with 60 different task elements – that would be 10 minutes dedicated just to file loading and unloading, and that doesn’t count the time to actually process the data.
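
The reload penalty described above is simple multiplication, and a few lines make it easy to plug in other numbers. This is only an illustration of the arithmetic; the function name is made up for this sketch and is not part of MarcEdit.

```python
def reload_overhead_seconds(load_seconds: float, task_count: int) -> float:
    """Time spent purely on reloading the file, once per task (hypothetical helper)."""
    return load_seconds * task_count


# The two cases from the text: a 10 second load with 6 tasks and with 60 tasks.
print(reload_overhead_seconds(10, 6) / 60, "minute(s) for 6 tasks")
print(reload_overhead_seconds(10, 60) / 60, "minute(s) for 60 tasks")
```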

This was a problem, so with MarcEdit 7, I took the opportunity to actually tear down the way that tasks work.  This meant divorcing the application from the task process and creating a broker that could evaluate tasks being passed to it, and manage the various aspects of task processing.  This has led to the development of a process model that looks more like this now:

Once a task is initiated and it has been parsed, the task operations are passed to a broker.  The broker then looks at the task elements and the file to be processed, and then negotiates those actions directly with the program libraries.  This removes any file loading penalties, and allows me to manage memory and temporary file use in a much more granular way.  It also immediately speeds up the process.  Take that file that takes 10 seconds to load and 60 tasks to complete.  Immediately, you cut 10 minutes of file loading from the processing time.  But the question still arises, could I do more?
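
To make the difference between the two flows concrete, here is a minimal sketch, assuming a record can be modelled as a plain dictionary and a task as a function that returns an edited record. `FileStore`, `run_with_reloads`, and `run_with_broker` are hypothetical names used only for illustration; they are not MarcEdit's actual classes or API.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]             # hypothetical stand-in for a MARC record
Task = Callable[[Record], Record]   # a task returns the edited record


class FileStore:
    """Hypothetical file wrapper that counts how often the file is loaded."""

    def __init__(self, records: List[Record]) -> None:
        self._records = [dict(r) for r in records]
        self.loads = 0

    def load(self) -> List[Record]:
        self.loads += 1
        return [dict(r) for r in self._records]

    def save(self, records: List[Record]) -> None:
        self._records = [dict(r) for r in records]


def run_with_reloads(store: FileStore, tasks: List[Task]) -> List[Record]:
    """Old-style flow: every task loads the file, edits it, and saves it again."""
    for task in tasks:
        store.save([task(r) for r in store.load()])
    return store.load()  # final reload to display the result


def run_with_broker(store: FileStore, tasks: List[Task]) -> List[Record]:
    """Broker-style flow: load once, run every task in memory, save once."""
    records = store.load()
    for task in tasks:
        records = [task(r) for r in records]
    store.save(records)
    return records


def upper_title(record: Record) -> Record:
    return {**record, "245": record["245"].upper()}


def add_note(record: Record) -> Record:
    return {**record, "500": "Processed by task list"}


source = [{"245": "first title"}, {"245": "second title"}]
tasks = [upper_title, add_note]

old_store, new_store = FileStore(source), FileStore(source)
old_result = run_with_reloads(old_store, tasks)
new_result = run_with_broker(new_store, tasks)

# Same output either way; only the number of file loads differs.
print(old_result == new_result, old_store.loads, new_store.loads)
```

The point of the sketch is that the result is identical, while the number of loads drops from one per task (plus the final reload) to one in total.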

And the answer to this question is actually yes.  The broker has the ability to process tasks in a number of different ways.  One of these is by handling each task process one by one at a file level; the other is handling all tasks at once, but at a record level.  You might think that record level processing would always be faster, but it’s not.  Consider the task list with 60 tasks.  Some of these elements may only apply to a small subset of records.  In the by file process, I can quickly shortcut processing of records that are out of scope; in a record by record approach, I actually have to evaluate the record.  So, in testing, I found that when records are smaller than a certain size, and the number of task actions to process was within a certain number (regardless of file size), it was almost always better to process the data by file.  Where this changes is when you have a larger task list.  How large?  I’m still trying to figure that out.  But as an example, I had a real-world case sent to me that has over 950 task actions to process on a file ~350 MB (344,000 records) in size.  While the by file process is significantly faster than the MarcEdit 6.x method (each process incurred a 17 second file load penalty), this still takes a lot of time to process because you are doing 950+ actions and complete file reads.  While this type of processing might not be particularly common (I do believe this is getting into outlier territory), the process can help to illustrate what I’m trying to teach the broker to do.  I ran this file using the three different processing methodologies, and here are the results:

  1. MarcEdit 6.3.x: 962 Task Actions, completing in ~7 hours
  2. By File: 962 Task Actions, completing in 3 hours, 12 minutes
  3. By Record: 962 Task Actions, completing in 2 hours and 20 minutes
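
The By File and By Record strategies compared above differ only in loop order. The sketch below is an illustration under simplifying assumptions (records as dictionaries, tasks as functions, all names hypothetical), not MarcEdit's implementation. It shows that both orders give the same output as long as the tasks for any one record still run in sequence.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]             # hypothetical stand-in for a MARC record
Task = Callable[[Record], Record]


def process_by_file(records: List[Record], tasks: List[Task]) -> List[Record]:
    """Run one task over the whole file, then the next task over the result."""
    for task in tasks:
        records = [task(r) for r in records]
    return records


def process_by_record(records: List[Record], tasks: List[Task]) -> List[Record]:
    """Run every task, in order, on one record before moving to the next record."""
    results = []
    for record in records:
        for task in tasks:
            record = task(record)
        results.append(record)
    return results


def flag_drafts(record: Record) -> Record:
    if "draft" in record.get("245", ""):
        return {**record, "500": "Draft record"}
    return record


def mark_flagged(record: Record) -> Record:
    # Depends on the field that flag_drafts may have added, so order matters.
    if "500" in record:
        return {**record, "999": "review"}
    return record


records = [{"245": "draft report"}, {"245": "final report"}]
tasks = [flag_drafts, mark_flagged]

by_file = process_by_file(records, tasks)
by_record = process_by_record(records, tasks)
print(by_file == by_record)
```

Because `mark_flagged` depends on what `flag_drafts` did, swapping the two tasks changes the result. That dependency is why the tasks cannot simply run in parallel.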


So, that’s still a really, really long time, but take a closer look at the file and the changes made, and you can start to see why this process takes so much time.  Looking at the results file, >10 million changes have been processed against the 340,000+ records.  Also, consider the number of iterations that must take place.  The average record has approximately 20 fields.  Since each task needs to act upon the results of the task before it, it’s impossible to have tasks process at the same time – rather, tasks must happen in succession.  This means that each task must process the entire record, as the results of a task may require an action based on data changed anywhere in the record.  This means that for one record, the program needs to run 962 operations, which means looping through 19,240 fields (assuming no fields are added or deleted).  Extrapolate that number for 340,000 records, and the program needs to evaluate 6,541,600,000 fields, or over 6 billion field evaluations, which works out to 49,557,575 field evaluations per minute.
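
The field counts above can be reproduced directly from the three inputs given in the text (962 task actions, about 20 fields per record, 340,000 records). The constant names are just labels for this worked example.

```python
TASK_ACTIONS = 962            # task actions in the test task list
AVG_FIELDS_PER_RECORD = 20    # approximate average from the text
RECORDS = 340_000             # rounded record count used in the text

# Every task walks every field of a record, and tasks run in succession.
fields_per_record = TASK_ACTIONS * AVG_FIELDS_PER_RECORD
total_field_evaluations = fields_per_record * RECORDS

print(f"{fields_per_record:,} field evaluations per record")
print(f"{total_field_evaluations:,} field evaluations for the whole file")
```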

Ideally, I’d love to see the processing time for this task/file pair to be down around 1 hour and 30 minutes.  That would cut the current MarcEdit 7 processing time in half, and be almost 5 hours and 30 minutes faster than the current MarcEdit 6.3.x processing.  Can I get the processing down to that number?  I’m not sure.  There are still optimizations to be had – loops that can be optimized, buffering, etc. – but I think the biggest potential speed gains may be available by adding some pre-processing to a task process to do a cursory evaluation of a recordset if a set of find criteria is present.  This wouldn’t affect every task, but potentially could improve selective processing of Edit Indicator, Edit Field, Edit Subfield, and Add/Delete field functions.  This is likely the next area that I’ll be evaluating.
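
One way to picture the pre-processing idea is a cheap text check that runs before the full edit. The sketch below assumes each record is held as a single string and each task may carry a literal `find` value. Both are simplifications made up for illustration, as are the `Task` class and `run_task` function; nothing here reflects how MarcEdit actually stores records or find criteria.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Task:
    apply: Callable[[str], str]   # the (expensive) edit itself
    find: Optional[str] = None    # hypothetical literal find criterion


def run_task(records: List[str], task: Task) -> List[str]:
    """Apply a task, skipping records that cannot match its find criterion."""
    if task.find is None:
        # No criterion available, so every record has to be evaluated.
        return [task.apply(r) for r in records]
    # Cursory evaluation: a substring test decides whether the edit runs at all.
    return [task.apply(r) if task.find in r else r for r in records]


evaluated = []


def rename(record: str) -> str:
    evaluated.append(record)      # track how many records got the full edit
    return record.replace("$aOld", "$aNew")


records = [
    "=245  10$aOld title",
    "=245  10$aAnother title",
    "=100  1 $aOld, Author",
]
result = run_task(records, Task(apply=rename, find="$aOld"))
print(len(evaluated), "of", len(records), "records were fully evaluated")
```

The saving depends entirely on how selective the criterion is: a task that matches nearly every record gains nothing from the extra check.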

Of course, the other question to solve is what exactly is the tipping point when By File Processing becomes less efficient than By Record processing.  My guess is that the characteristic that will be most applicable in this decision will be the number of task actions needing to be processed.  Splitting this file, for example, into a file of 1,000 records and running this task by record versus by file, we see the following:

  1. By File processing, 962 Task Actions, completed in: 0.69 minutes
  2. By Record Processing, 962 Task Actions, completed in: 0.36 minutes


The processing times are close in absolute terms, but By Record processing is nearly twice as fast as By File processing.  If we reduced the number of tasks to under 20, there is a dramatic switch in the processing time and By File Processing is the clear winner.
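
A broker could encode this observation as a simple rule keyed on the number of task actions. The sketch below is purely illustrative: the function name is hypothetical, and the default cut-off of 20 is a placeholder borrowed from the observation above, since the real tipping point has not been determined.

```python
def choose_strategy(task_count: int, by_record_threshold: int = 20) -> str:
    """Pick a processing style from the task count alone (placeholder rule)."""
    return "by_record" if task_count >= by_record_threshold else "by_file"


# Ratio of the two timings reported for the 1,000 record sample.
speedup = 0.69 / 0.36
print(round(speedup, 2), "times faster by record on the 962 action list")

print(choose_strategy(962))
print(choose_strategy(6))
```

A real rule would likely need more inputs than task count, such as record size, which the testing above also found to matter.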

Obviously, there is some additional work to be done here, and more testing to do to understand what characteristics and which processing style will lead to the greatest processing gains, but from this testing, I came away with a couple pieces of information.  First, the MarcEdit 7 process, regardless of method used, is way faster than MarcEdit 6.3.x.  Second, the MarcEdit 7 process and the MarcEdit 6.3.x process suffered from a flaw related to temp file management.  You can’t see it unless you work with files this large and with this many tasks, but the program cleans up temporary files after all processing is complete.  Normally, in a single operation environment, that happens right away.  Since a task represents a single operation, ~962 temporary files at 350 MB per file were created as part of both processes.  That’s 336,700 MB of data, or roughly 336 GB of temporary data!  When you close the program, that data is all cleared, but again, ouch.  As I say, normally, you’d never see this kind of problem, but in this kind of edge case, it shows up clearly.  This has led me to implement periodic temp file cleanup so that no more than 10 temporary files are stored at any given time.  While that still means that in the case of this test file, up to about 3.5 GB of temporary data could be stored, the size of that temp cache would never grow larger.  This seems to be a big win, and something I would have never seen without working with this kind of data file and use case.
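
A capped temp file cache of this kind can be sketched in a few lines. This is a minimal illustration using Python's standard `tempfile` module, assuming the oldest file is always safe to delete once the cap is reached; the `TempFilePool` class is hypothetical and is not how MarcEdit implements its cleanup.

```python
import os
import tempfile
from collections import deque


class TempFilePool:
    """Keeps at most max_files temporary files on disk (hypothetical helper)."""

    def __init__(self, max_files: int = 10) -> None:
        self.max_files = max_files
        self._paths = deque()     # oldest path sits at the left end

    def new_file(self, data: bytes) -> str:
        """Write data to a new temp file, then evict the oldest files over the cap."""
        fd, path = tempfile.mkstemp(suffix=".tmp")
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
        self._paths.append(path)
        while len(self._paths) > self.max_files:
            os.remove(self._paths.popleft())
        return path

    def cleanup(self) -> None:
        """Remove every file still tracked by the pool."""
        while self._paths:
            path = self._paths.popleft()
            if os.path.exists(path):
                os.remove(path)


# Without a cap, 962 task results at ~350 MB each would all stay on disk.
print(f"{962 * 350:,} MB of temporary data without periodic cleanup")

pool = TempFilePool(max_files=10)
paths = [pool.new_file(b"task output") for _ in range(25)]
print(sum(os.path.exists(p) for p in paths), "temp files remain on disk")
pool.cleanup()
```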

Finally, let’s say after all this work, I’m able to hit the best case benchmarks (1 hr. 30 min.) and a user still feels that this is too long.  What more could be done?  Honestly, I’ve been thinking about that…but really, very little.  There will be a performance ceiling given how MarcEdit has to process task data.  So for those users – if this kind of performance time wasn’t acceptable, I believe only a custom built solution would provide better performance – but even with a custom build, I doubt you’d see significant gains if one continued to require tasks to be processed in sequence.

Anyway – this is maybe a deeper dive into how tasks work in MarcEdit 6.3.x and how they will work in MarcEdit 7 than anyone really was looking for – but this particular set of files and use case represented an interesting opportunity to really test the various methods and provide benchmarks that easily demonstrate the impact of the current task process changes.

If you have questions, feel free to let me know.


DuraSpace News: PASIG 2017 Roundup

planet code4lib - Wed, 2017-09-27 00:00

Contributed by Erin Tripp, Business Development Manager for DuraSpace

This was my second time attending PASIG and it was a wonderful experience. It featured stories of successes and failures, detailed explanations of existing guidelines, and expressions of support that we can gain the skills and knowledge to meet our digital preservation goals. If you couldn’t be there, we have you covered:

DuraSpace News: Hot Topics at 4Science: ORCiD API v.2 and New Projects in Four Continents

planet code4lib - Wed, 2017-09-27 00:00

From Susanna Mornati, 4Science

4Science released DSpace-CRIS 5.8, including support for ORCiD API v.2.

DuraSpace News: Benjamin Gross joins Clarivate Analytics to Support VIVO Implementations

planet code4lib - Wed, 2017-09-27 00:00

From Ann Beynon, Clarivate Analytics

We are pleased to announce that Benjamin Gross has joined the Clarivate Analytics team to support VIVO implementations, as well as other data integration projects.  Benjamin is well-known in the VIVO community as an active contributor to the VIVO project, Google groups, and conferences. Benjamin was instrumental in supporting VIVO when at UNAVCO.

David Rosenthal: Sustaining Open Resources

planet code4lib - Tue, 2017-09-26 15:00
Cambridge University Office of Scholarly Communication's Unlocking Research blog has an interesting trilogy of posts looking at the issue of how open access research resources can be sustained for the long term:
Below the fold I summarize each of their arguments and make some overall observations.
Lauren Cadwallader
From the researcher's perspective, Dr. Cadwallader uses the example of the Virtual Fly Brain, a domain-specific repository for the connections of neurons in Drosophila brains. It was established by UK researchers 8 years ago and is now used by about 10 labs in the UK and about 200 worldwide. It was awarded a 3-year Research Council grant, which was not renewed. The Wellcome Trust awarded a further 3 year grant, ending this month. As of June:
it is uncertain whether or not they will fund it in the future. ... On the one hand funders like the Wellcome Trust, Research Councils UK and National Institutes of Health (NIH) are encouraging researchers to use domain specific repositories for data sharing. Yet on the other, they are acknowledging that the current approaches for these resources are not necessarily sustainable.
Clearly, this is a global resource not a UK one, but there is no global institution funding research in Drosophila brains. There is a free rider problem; each individual national or charitable funder depends on the resource but would rather not pay for it, and there is no penalty for avoiding paying until it is too late and the resource has gone.
David Carr
From the perspective of the Open Research team at the Wellcome Trust, Carr notes that:
Rather than ask for a data management plan, applicants are now asked to provide an outputs management plan setting out how they will maximise the value of their research outputs more broadly.

Wellcome commits to meet the costs of these plans as an integral part of the grant, and provides guidance on the costs that funding applicants should consider. We recognise, however, that many research outputs will continue to have value long after the funding period comes to an end. We must accept that preserving and making these outputs available into the future carries an ongoing cost.
Wellcome has been addressing these on-going costs by providing:
significant grant funding to repositories, databases and other community resources. As of July 2016, Wellcome had active grants totalling £80 million to support major data resources. We have also invested many millions more in major cohort and longitudinal studies, such as UK Biobank and ALSPAC. We provide such support through our Biomedical Resource and Technology Development scheme, and have provided additional major awards over the years to support key resources, such as PDB-Europe, Ensembl and the Open Microscopy Environment.
However, these are still grants with end-dates such as faced the Virtual Fly Brain:
While our funding for these resources is not open-ended and subject to review, we have been conscious for some time that the reliance of key community resources on grant funding (typically of three to five years’ duration) can create significant challenges, hindering their ability to plan for the long-term and retain staff.
Clearly funders have difficulty committing funds for the long term. And if their short-term funding is successful, they are faced with a "too big to fail" problem. The repository says "pay up now or the entire field of research gets it". Not where a funder wants to end up. Nor is the necessary brinkmanship conducive to "their ability to plan for the long-term and retain staff".

An international workshop of data resources and major funders in the life sciences:
resulted in a call for action (reported in Nature) to coordinate efforts to ensure long-term sustainability of key resources, whilst supporting resources in providing access at no charge to users.  The group proposed an international mechanism to prioritise core data resources of global importance, building on the work undertaken by ELIXIR to define criteria for such resources.  It was proposed national funders could potentially then contribute a set proportion of their overall funding (with initial proposals suggesting around 1.5 to 2 per cent) to support these core data resources.
A voluntary "tax" of this kind may be the least bad approach to funding global resources.
Dave Gerrard
From the perspective of a Technical Specialist Fellow from the Polonsky-Foundation-funded Digital Preservation at Oxford and Cambridge project, Gerrard argues that there are two different audiences for open resources. I agree with him about the impracticality of the OAIS concept of Designated Community:
The concept of Designated Communities is one that, in my opinion, the OAIS Reference Model never adequately gets to grips with. For instance, the OAIS Model suggests including explanatory information in specialist repositories to make the content understandable to the general community.

Long term access within this definition thus implies designing repositories for Designated Communities consisting of what my co-Polonsky-Fellow Lee Pretlove describes as: “all of humanity, plus robots”. The deluge of additional information that would need to be added to support this totally general resource would render it unusable; to aim at everybody is effectively aiming at nobody. And, crucially, “nobody” is precisely who is most likely to fund a “specialist repository for everyone”, too.
Gerrard argues that the two audiences need:
two quite different types of repository. There’s the ‘ultra-specialised’ Open Research repository for the Designated Community of researchers in the related domain, and then there’s the more general institutional ‘special collection’ repository containing materials that provide context to the science, ... Sitting somewhere between the two are publications – the specialist repository might host early drafts and work in progress, while the institutional repository contains finished, publish work. And the institutional repository might also collect enough data to support these publications
Gerrard is correct to point out that:
a scientist needs access to her ‘personal papers’ while she’s still working, so, in the old days (i.e. more than 25 years ago) the archive couldn’t take these while she was still active, and would often have to wait for the professor to retire, or even die, before such items could be donated. However, now everything is digital, the prof can both keep her “papers” locally and deposit them at the same time. The library special collection doesn’t need to wait for the professor to die to get their hands on the context of her work. Or indeed, wait for her to become a professor.
This works in an ideal world because:
A further outcome of being able to donate digitally is that scientists become more responsible for managing their personal digital materials well, so that it’s easier to donate them as they go along.
But in the real world this effort to "keep their ongoing work neat and tidy" is frequently viewed as a distraction from the urgent task of publishing not perishing. The researcher bears the cost of depositing her materials, the benefits accrue to other researchers in the future. Not a powerful motivation.

Gerrard argues that his model clarifies the funding issues:
Funding specialist Open Research repositories should be the responsibility of funders in that domain, but they shouldn’t have to worry about long-term access to those resources. As long as the science is active enough that it’s getting funded, then a proportion of that funding should go to the repositories that science needs to support it.
Whereas:
university / institutional repositories need to find quite separate funding for their archivists to start building relationships with those same scientists, and working with them to both collect the context surrounding their science as they go along, and prepare for the time when the specialist repository needs to be mothballed. With such contextual materials in place, there don’t seem to be too many insurmountable technical reasons why, when it’s acknowledged that the “switch from one Designated Community to another” has reached the requisite tipping point, the university / institutional repository couldn’t archive the whole of the specialist research repository, describe it sensibly using the contextual material they have collected from the relevant scientists as they’ve gone along, and then store it cheaply
This sounds plausible, but both halves ignore problems:
  • The value of the resource will outlast many grants, where the funders are constrained to award short-term grants. A voluntary "tax" on these grants would diversify the repository's income, but voluntary "taxes" are subject to the free-rider problem. To assure staff recruiting and minimize churn, the repository needs reserves, so the tax needs to exceed the running cost, reinforcing the free-rider's incentives.
  • These open research repositories are a global resource. Once the "tipping point" happens, which of the many university or institutional repositories gets to bear the cost of ingesting and preserving the global resource? All the others get to free-ride. Or does Gerrard envisage disaggregating the domain repository so that each researcher's contributions end up in their institution's repository? If so, how are contributions handled from (a) collaborations between labs, and (b) a researcher's career that spans multiple institutions? Or does he envisage the researcher depositing everything into both the domain and the institutional repository? The researcher's motivation is to deposit into the domain repository. The additional work to deposit into the institutional repository is just make-work to benefit the institution, to which these days most researchers have little loyalty. The whole value of domain repositories is the way they aggregate the outputs of all researchers in a field. Isn't it important to preserve that value for the long term?

LITA: LITA Fall 2017 Online Learning Line Up

planet code4lib - Tue, 2017-09-26 14:00

Don’t miss out on the excellent online offerings put together by the LITA Education committee for this fall.

Check out all the offerings at the LITA Online Learning page.

Select from these titles and find registration details and links on each of the sessions pages.

Building Services Around Reproducibility & Open Scholarship
with presenter: Vicky Steeves
A blended web course with weekly webinars, offered: October 18, 2017 – November 8, 2017

Taking Altmetrics to the Next Level in Your Library’s Systems and Services
with presenter: Lily Troia
Webinar offered: October 19, 2017

Diversity and Inclusion in Library Makerspace
with presenters: Sharona Ginsberg and Lauren Di Monte
Webinar offered: October 24, 2017

Digital Life Decoded: A user-centered approach to cyber-security and privacy
with presenters: Hannah Rainey, Sonoe Nakasone and Will Cross
Webinar offered: November 7, 2017

Introduction to and JSON-LD
with presenter: Jacob Shelby
Webinar offered: November 15, 2017

Sign up for any and all of these great sessions today.

Questions or Comments?

Contact LITA at (312) 280-4269 or Mark Beatty,


