
Feed aggregator

Eric Lease Morgan: How not to work during a sabbatical

planet code4lib - Tue, 2016-07-19 14:43

This presentation — given at Code4Lib Midwest (Chicago, July 14, 2016) — outlines the various software systems I wrote during my recent tenure as an adjunct faculty member at the University of Notre Dame. (This presentation is also available as a one-page PDF handout designed to be duplex printed and folded in half as if it were a booklet.)

  • How rare is rare? – In an effort to determine the “rarity” of items in the Catholic Portal, I programmatically searched WorldCat for specific items, counted the number of times each was held by libraries in the United States, and recorded the list of the holding libraries. Through the process I learned that most of the items in the Catholic Portal are “rare”, but I also learned that “rarity” can be defined as the triangulation of scarcity, demand, and value. Thus the “rare” things may not be rare at all.
  • Image processing – By exploiting the features and functions of an open source library called OpenCV, I started exploring ways to evaluate images in the same way I have been evaluating texts. By counting & tabulating the pixels in an image it is possible to create ratios of colors, do facial recognition, or analyze geometric composition. Through these processes it may be possible to supplement art history and criticism. For example, one might be able to ask things like, “Show me all of the paintings from Picasso’s Rose Period.”
  • Library Of Congress Name Authorities – Given about 125,000 MARC authority records, I wrote an application that searched the Library Of Congress (LOC) Name Authority File, and updated the local authority records with LOC identifiers, thus making the local authority database more consistent. For items that needed disambiguation, I created a large set of simple button-based forms allowing librarians to choose the most correct name.
  • MARC record enrichment – Given about 500,000 MARC records describing ebooks, I wrote a program that found the richest OCLC record in WorldCat and then merged the found record with the local record. Ultimately the local records included more access points and thus proved to be more useful in a library catalog setting.
  • OAI-PMH processing – I finally got my brain around the process of harvesting & indexing OAI-PMH content into VUFind. Whoever wrote the original OAI-PMH applications for VUFind did a very good job, but there is a definite workflow to the process. Now that I understand the workflow it is relatively easy to ingest metadata from things like ContentDM, but issues with the way Dublin Core is implemented still make the process challenging.
  • EEBO/TCP – Given the most beautiful TEI mark-up I’ve ever seen, I have systematically harvested the Early English Books Online (EEBO) content from the Text Creation Partnership (TCP) and done some broad & deep but also generic text analysis against subsets of the collection. Readers are able to search the collection for items of interest, save the full text to their own space for analysis, and have a number of rudimentary reports done against the result. This process allows the reader to see the corpus from a “distance”. Very similar work has been done against subsets of content from JSTOR as well as the HathiTrust.
  • VIAF Lookup – Given about 100,000 MARC authority records, I wrote a program to search VIAF for the most appropriate identifier and associate it with the given record. Through the process I learned two things: 1) how to exploit the VIAF API, and 2) how to exploit the Levenshtein algorithm. Using the latter I was able to make automated and “intelligent” choices when it came to name disambiguation. In the end, I was able to accurately associate more than 80% of the authority names with VIAF identifiers.
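The Levenshtein-based disambiguation described in the VIAF item above can be sketched roughly as follows. This is an illustrative sketch, not the actual program: the candidate records, identifiers, and the distance cutoff are all invented.

```javascript
// Classic dynamic-programming Levenshtein edit distance.
function levenshtein(a, b) {
  const m = a.length, n = b.length;
  const d = Array.from({ length: m + 1 }, (_, i) =>
    Array.from({ length: n + 1 }, (_, j) => (i === 0 ? j : j === 0 ? i : 0))
  );
  for (let i = 1; i <= m; i++) {
    for (let j = 1; j <= n; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      d[i][j] = Math.min(
        d[i - 1][j] + 1,       // deletion
        d[i][j - 1] + 1,       // insertion
        d[i - 1][j - 1] + cost // substitution
      );
    }
  }
  return d[m][n];
}

// Given a heading from a local authority record and candidate names from a
// VIAF search, pick the closest candidate -- but only when it is close
// enough to trust (maxDistance is an arbitrary example threshold).
function bestViafMatch(localName, candidates, maxDistance = 3) {
  let best = null;
  for (const candidate of candidates) {
    const dist = levenshtein(localName.toLowerCase(), candidate.name.toLowerCase());
    if (dist <= maxDistance && (best === null || dist < best.dist)) {
      best = { ...candidate, dist };
    }
  }
  return best; // null means "no confident match; leave it for a human"
}
```

A record whose best candidate falls outside the threshold is left for manual review; an automated pass along these lines is one plausible way to reach the 80% association rate described above while leaving the doubtful names to a person.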

My tenure as an adjunct faculty member was very much akin to a one-year education, but for a fifty-five-year-old. I did many of the things college students do: go to class, attend sporting events, go on road trips, make friends, go to parties, go home for the holidays, write papers, give oral presentations, eat too much, drink too much, etc. Besides the software systems outlined above, I gave four or five professional presentations, attended & helped coordinate five or six professional meetings, taught an online, semester-long, graduate-level class on the topic of XML, took many different classes (painting, sketching, dance, & language) many times, lived many months in Chicago, Philadelphia, and Rome, visited more than two dozen European cities, painted about fifty paintings, bound & filled about two dozen hand-made books, and took about three thousand photographs. The only thing I didn’t do is take tests.

Islandora: Islandora's Technical Lead: Daniel Lamb

planet code4lib - Mon, 2016-07-18 22:07

Following on our earlier announcement, the Islandora Foundation would like to announce that Daniel Lamb will be our Technical Lead, starting August 1st. The Islandora Technical Lead is responsible for providing leadership, technical guidance, coordination, and support to the open source community in the interests of releasing ongoing enhancements and maintenance to Islandora. 

Danny comes to us from discoverygarden, Inc., where he has spent the better part of four years working with Islandora. He is uniquely suited to taking on the role of Islandora’s Technical Lead as both a long-time committer to the Islandora 7.x-1.x project and the primary architect of Islandora CLAW. He is also an experienced presenter and Islandora Camp instructor, an official Committer on both Islandora teams, and has long been a leader in the Islandora community. We did a Meet Your Developer interview with Danny back in 2014 if you want to learn more about his background and his approach to development.


LITA: Transmission #7 – A Special Transmission

planet code4lib - Mon, 2016-07-18 18:21

Hi, everyone! Due to technical challenges and delays, I am reopening the Begin Transmission survey and feedback form (below). Join the conversation! Thanks for your help.


LITA: Did you attend the recent ALA Annual 2016 conference in Orlando FL?

planet code4lib - Mon, 2016-07-18 16:12

If so, please complete our LITA conference programs survey evaluation at:

We hope you had the best ALA Annual conference, and that attending useful, informative, and fun LITA programs was an important part of your conference experience. If so, please take a moment to complete our evaluation survey. Your responses are very important to your colleagues who are planning programming for next year’s ALA Annual, as well as LITA’s year-round continuing education sessions.

To complete your survey it might also help to check back at the

Full schedule of LITA programs and meetings.

And recall other details at the LITA @ ALA Annual page.

Thank you and we hope to see you at the

LITA Forum in Fort Worth, TX, November 17-20, 2016

Islandora: Islandora is Getting a Technical Lead

planet code4lib - Mon, 2016-07-18 14:28

The Islandora Foundation could not be more pleased to announce that it will be hiring a Technical Lead to start in August, 2016. The Islandora Technical Lead is responsible for providing leadership, technical guidance, coordination, and support to the open source community in the interests of releasing ongoing enhancements and maintenance to Islandora. Together with the Project & Community Manager and Islandora's governing committees, the Technical Lead ensures that Islandora software fulfills the mission of the project and the needs of community members.

The Technical Lead creates an inclusive, welcoming, open team environment based on a meritocracy of committers, contributors, documentation specialists, technical trainers, and other volunteer resources. They strive to recruit new members to the team from the larger community of volunteers. 

Hiring a Technical Lead has been a long-term goal of the Islandora Foundation since its launch in 2013, and we could not have gotten here without the support of our wonderful members, both institutions and individuals. While you may note that our membership funding has not quite reached the $160,000 minimum that we set as a goal, we have the opportunity to top up that funding with support from a grant for 2016/2017. We trust in our community to bring our membership funding up to close that gap over the next year so that the Islandora Foundation will remain sustainable going forward.

This is a huge step forward for the Islandora Foundation and the community that it serves. As we enter our fourth year as an independent non-profit, we look to that community for direction on where the project will go next. Adding a Technical Lead to our team will provide an invaluable resource to help achieve our goals, but the role of the Islandora community and the many wonderful volunteers within it will remain paramount. Islandora is for the community, by the community - just now with a Technical Lead for that community to work with.


District Dispatch: Dear RNC and DNC: Libraries are essential to achieving national priorities

planet code4lib - Mon, 2016-07-18 14:14

Today, the Republican National Convention (RNC) kicks off in Cleveland, and the Democratic National Convention (DNC) begins next Monday in Philadelphia. In the latest installment of the Policy Revolution! initiative, ALA submitted comments to the Republican and Democratic Party Platform Committees. A party platform is a formal set of value statements and goals to guide the direction of a political party. Final discussion and ratification of the platforms will take place during the respective conventions.

ALA’s submission is based on a large body of prior work. At the most fundamental level, such comments are informed by internal ALA policies, approved by ALA’s Council. In terms of our work more specifically targeted to the national public policy arena, we completed the National Policy Agenda for Libraries in June 2015 to provide the strategic policy direction for libraries, under the auspices of a Library Advisory Committee that included a number of library community organizations in addition to ALA.

At this point in the process, the primary goal is to showcase how libraries contribute to the broad range of national goals of importance to the major political parties. Given the economic unease around the country, ALA comments highlighted the roles of libraries in advancing economic opportunity. The comments also address several issues that are prominent in the campaigns, such as national infrastructure, veterans, and education and learning, among others.

There will be some library presence at the conventions. In Cleveland, the Cuyahoga County (Ohio) Public Library will be streaming briefings organized by The Atlantic. In Philadelphia, Siobhan Reardon, President and Director, Free Library of Philadelphia, will serve on a panel on getting online and digital inclusion that will be keynoted by FCC Commissioner Mignon Clyburn.

Susan Hildreth, Tony Sarmiento and Alan Inouye (L-R) discuss the challenges and opportunities presented by the upcoming national elections.

OITP held a number of sessions at the 2016 ALA Annual Conference to provide briefings and obtain guidance from the ALA community for our future policy direction.  In particular, we held one public panel session moderated by Mark Smith, Director, Texas State Library and Archives Commission with panelists Susan Hildreth, Executive Director, Peninsula (Calif.) Library System; Tony Sarmiento, Executive Director, Senior Service America; and me. Thanks to those who attended our session or one of our meetings and provided advice.

Of course, the political conventions only mark the beginning of the actual presidential campaigns, so there is much more work to be done in the months leading into the election, the transition to the next President and the first 100 days of the new Administration. We will be developing and disseminating much more information and homing in on specific recommendations. So here’s a question for you: If you could say one (or two) things to the presidential candidates about the value of libraries to our respective communities, what would you highlight? We’d like to hear from you—via the comments option on District Dispatch or send email directly to me at

The post Dear RNC and DNC: Libraries are essential to achieving national priorities appeared first on District Dispatch.

LibUX: Create Once, Publish Everywhere

planet code4lib - Mon, 2016-07-18 05:35

In 2014 I persuaded my library to build another website. No, it wasn’t a redesign, nor a new entity built from the ground up to replace what we had. This was another website — a second one.

Ours is a unique joint-use academic and public facility. Divide this library’s users into its broadest audiences and there are still plenty to account for: faculty, undergraduate and graduate students (local, distant), alumni, the public – whatever that means.

Chunking the latter into one big patron base isn’t particularly useful, but the allocation of the homepage’s real estate constricted our ability to finely tune it. Our incentive to accommodate the academic community crowded out our ability to accommodate the audience who cared about events and entertainment – and this is precisely where our usability studies drew the line. Public cardholders appreciated the site but asked for more prominent access to new popular materials and programs, while students and faculty were pretty clear about what they didn’t want.

So, I talked colleagues into spinning off a new website — different look and feel, tone, even domain — just for public library services. They weren’t shy about voicing concerns about the increased workload involved in doubling up and maintaining two sets of content, and about whether this decision would, for example, obfuscate research services from the public or programming from the faculty. Content locked away in a silo is, after all, locked away in a silo. There’s a risk that a graduate student using the academic library website might not see that a favorite author is visiting when that event is only posted for the public.

Right. Big problem, but not one exactly unique to this project. Libraries have been suffering these pain points for years. Assuaging this grief is exactly the selling point for discovery layers. That “library website” that we refer to in the singular is more like an organism of microsites and applications: there is the catalog, a static homepage hosted by the institution or county, maybe a one-off Drupal site on a server the library controls, subject guides, an event management system, a room reservation system, and an iPhone app. Silos are a library’s present and future.

The increasing complexity and capability of devices on the web reinforces silos and will continue to do so. As libraries approach their mobile moment, library websites that try to do too much will fail, whereas sites and apps that focus on doing just one thing well will succeed. It’s this sentiment that recommends developers consider breaking functionality out among multiple apps: there is a point at which an app can be too feature-rich.

The Kano model can illustrate that some features have a negative impact on customer satisfaction.

Everything is designed. Few things are designed well. Brian Reed

Libraries are actually in a good position to benefit from this strategy. So much of the library web presence is already single-purposed that it wouldn’t take much to retrofit. Rather than roll the catalog into the same template as the homepage, it can be embraced as a standalone web app with its own unique purpose-driven user interface. This isn’t about going off-brand, but without the pressure of inheriting a mega-menu from the homepage, the library can be more judicious with the catalog’s design. This makes sense for library services when patrons are task-driven. Time is better spent optimizing for engagement rather than making the sites identical.

Not to mention silos aren’t inherently bad for discovery. Organizing web content in the way that news sites have sports sections is sound. Robots and crawlers have an easier time indexing content when there is a delineated path in which similar content is clustered and interlinked. The machines are cool with it. What makes discovery tricky for humans is that content on one site isn’t usually available on another. If patrons visit the library with a task in mind — “I want to find something to read”, “I need to renew my items”, “I want to see events”, or “I need to write a paper” — then there isn’t much incentive to browse outside of that content silo.

Libraries can’t depend on patrons just happening onto the event calendar after picking through the databases, nor can they depend on cramming everything on, or funneling everyone through, the front page. Getting found is going to get harder. If an institution has the ability and incentive to build an app, stakeholders want that option to be on the table without dramatically impacting workflow.  Libraries will need to be able to grow, adapt, and iterate without having to fuss over content.

A COPE-ing mechanism

I knew a standalone, public-themed, public-toned, public-specific library website would better resonate with, well, the public. If we were better able to fine-tune the content for the audience, patrons would be more likely to engage with library services for a longer time. This allows more opportunity to introduce new services, promote databases, maybe increase circulation.

At the same time, by relieving the pressure from just one homepage, the library can also better serve academic patrons. The opportunity to increase engagement all around won this gamble the stakeholder support it needed, but not if it dramatically strained workflow or blocked any potential content from any user. We needed to change how we approached content so that it was possible to share one item across all platforms, but at the same time prevent the need to micromanage which piece of content appeared where.

In 2009, Daniel Jacobson, Director of Application Development at NPR, wrote a series of posts on ProgrammableWeb about the NPR API, beginning with “C.O.P.E.: Create Once, Publish Everywhere.” To meet the content demand for native apps on various platforms and microsites, including the NPR affiliate sites, the team wrote an API, which made it easier to fetch and repurpose media. This remains an important principle for addressing the challenges of a future-friendly web.

For most libraries it’s not going to be realistic to control all the content from one system, yet consolidating what’s possible will make it easier to manage over time. With some static pages sitting on an institutional web server over which we had limited control, we began migrating this old content into a WordPress multisite, with which staff were already familiar.

There were specific types of content we intended to broadcast to the four corners: notifications and schedules, databases, events, reader’s advisory in the form of lists and staff picks, guides, and instructional videos. If the library’s success was determined by usage, turnout, and circulation, on the web that success very much depends on the ability to spotlight this content at the patron’s point of engagement.

A content system as-is doesn’t cut it. Popular content management systems like WordPress and Drupal are wonderful, but to meet libraries’ unique and portable needs these need a little rigging. If an institution hopes to staff-source content and expect everyone to use the system, then tailoring the CMS to the needs and culture of the library is an important step.

Subject specialists were creating guides and videos. Librarians involved with programming (both academic and public) were creating events. Others maintained departmental info, policies, schedules.

To ensure consistent and good content from folks better suited to create it, it is unfair and counterproductive to present a system with too steep a learning curve. I admit to being naive and surprised to see how strange and unfamiliar WordPress could be for those who don’t spend all day in it. De-jargoning the content system is no less important than de-jargoning the content.

Plus, these systems require tweaking to make content sufficiently modular. WordPress’s default editor (title, tags and categories, a featured image, and a blank-slate text box) doesn’t fly for a content type like an event, which requires start and end times, dates, and all-day or multi-day options.

Moreover, the blank slate is intimidating.

Rigid content requirements and a blank-slate WYSIWYG don’t scale. When demanding content is detail-oriented enough to have instructions, the stock editor can be replaced with smaller custom fields, which, like any form element, can be required before the post can be published.

Here’s an example: a self-hosted video actually requires multiple video formats to be cross-browser compatible and captions to be accessible. Publishing without captions violates accessibility guidelines, but without being able to ensure that the captions exist, it is inevitable that at some point an inaccessible video will go live. Breaking the content editor into smaller, manageable chunks allows for fine control and checks and balances, and adds the opportunity to insert instructions at each step to streamline the process.
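Those checks and balances reduce to a simple rule: each content type declares its required fields, and publishing is blocked until all of them are present. A minimal sketch, with hypothetical field names rather than the library's actual plugin code:

```javascript
// Each content type declares the fields that must be filled in before the
// post may be published (field names here are illustrative).
const requiredFields = {
  video: ["title", "mp4Url", "webmUrl", "captionsUrl"],
  event: ["title", "startTime", "endTime"],
};

// Return the list of missing fields; an empty list means "safe to publish".
function missingFields(type, post) {
  return (requiredFields[type] || []).filter(
    (field) => !post[field] || String(post[field]).trim() === ""
  );
}
```

A video missing its captions file would come back as `["captionsUrl"]`, and the editor can refuse to publish until the gap is filled.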

A cross-system, cross-department, controlled vocabulary is key. When we first started to think about sharing content between public and academic library websites, we knew that on some level all content would need to be filterable by either terms “public” or “academic.” We’re not going to publish something twice, so the public library website will have to know that it needs “public” content.

This was an addicting train of thought. We could go hog wild if new pages knew what kind of content to curate. What would it take then to create a page called “gardening” and make it a hub for all library content about the topic? It needs to be dynamic so it can stay current without micromanagement. It needs to populate itself with gardening book lists, staff-picks, upcoming “gardening” events, agricultural resources and databases – assuming the content exists. Isn’t this just a subject search for “gardening”?

If a library can assign one or two taxonomies that could be applied to all sorts of disparate content, then the query a site makes for the API could match categories regardless of their content type. The taxonomy has to be controlled and enforced so that it is consistent, and when possible can be built right into the post editor. Using WordPress, custom taxonomies can be tied to multiple types without fuss.

register_taxonomy(
    'your-taxonomy',
    // Add any content type here that 'your-taxonomy' can be used with
    array( 'databases', 'events', 'reviews', 'items', 'video' ),
    // other taxonomy options omitted for brevity
    array( /* options */ )
);

I created two taxonomies: “Library Audience,” which lets us filter content by type of patron (academic, public, teen, children and family, distance student, etc.), and “Subject,” which lets us filter by subject. The no-red-tape way to create a global “Subject” taxonomy was to just use the subjects that the library’s electronic resources use, a standardized vocabulary overseen by a committee. In our specific case, database subjects actually boil down to a three-letter designation. So while users see “Business,” the slug passed around behind the scenes is “zbu.”
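With those two taxonomies in place, a hub page can curate itself by matching terms across content types. A sketch with made-up records (the real site queries an API rather than an in-memory array):

```javascript
// Disparate content types, all tagged with the same shared taxonomies.
const content = [
  { type: "event",    title: "Container Gardening 101", audience: ["public"],   subject: ["gardening"] },
  { type: "list",     title: "Gardening Staff Picks",   audience: ["public"],   subject: ["gardening"] },
  { type: "database", title: "Agricola",                audience: ["academic"], subject: ["gardening"] },
  { type: "event",    title: "Citation Workshop",       audience: ["academic"], subject: ["writing"]   },
];

// Match on taxonomy terms regardless of content type;
// an omitted filter matches everything.
function curate(items, { audience, subject }) {
  return items.filter(
    (item) =>
      (!audience || item.audience.includes(audience)) &&
      (!subject || item.subject.includes(subject))
  );
}
```

A “gardening” page on the public site would call `curate(content, { audience: "public", subject: "gardening" })` and pick up the event and the staff picks, but not the academic database.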

Here is what a query against our eventual API for “business” looked like:

Content is then liberated by an API. Content management systems like WordPress and Drupal already have an API of sorts: the RSS feed. Technically, any site can ingest the XML and republish whatever content is made available, but it won’t include anything added custom to the CMS. This isn’t an uncommon need, so both WordPress and Drupal have REST APIs, which are a little beyond the scope of this writeup.

These enable the programmatic fetch and push of content from one platform into another.

In LibGuides — an increasingly capable library-specific content management system — our content creators can use HTML5 data attributes as hooks in the template that help determine the audience and type of content to grab. It creates a little empty placeholder, like an ad spot, to be populated by a relevant event (if any), book lists, relevant resources, past or upcoming workshops, and more.

At the time this article was originally written, in the summer of 2014, it looked a little like

<span data-content="event" data-audience="public" data-subject="comics"></span>

in which librarians decided what type of content (e.g., an event) went where on their page. For each placeholder, based on its parameters, a script builds and submits the API query string using jQuery:

$.getJSON( '//' )
    .success( function( response ) {
        // do something
    });
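Stripped of jQuery and the DOM, the query-string construction amounts to serializing the placeholder’s parameters. A framework-free sketch (the endpoint `api.example.org` is invented, since the post elides the real one):

```javascript
// Turn a placeholder's data-* parameters into an API query string.
// "api.example.org" stands in for the real, unnamed endpoint.
function buildQuery(params, endpoint = "//api.example.org/content") {
  const pairs = Object.entries(params).filter(
    ([, value]) => value != null && value !== ""
  );
  return `${endpoint}?${new URLSearchParams(pairs).toString()}`;
}
```

For the placeholder shown earlier, `buildQuery({ content: "event", audience: "public", subject: "comics" })` yields `//api.example.org/content?content=event&audience=public&subject=comics`.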

We have since largely traded jQuery for Angular. When there is a placeholder, it’s a tad more agnostic:

<div ng-repeat="ad in ads">
    {{ ad.title }}
    {{ ad.etc }}
</div>

but more often than not we just weasel it in there using attributes such as audience and type, which, unless otherwise specified, take their values from the page.

Not random, but library events that make sense on the pages where they appear.

Remember that just a few years ago many libraries rushed to create mobile sites but then struggled to maintain two sets of content, and that the follow-up, responsive web design, is a long process involving a lot of stakeholders; many haven’t gotten this far because of the red tape. The landscape of the web will only get weirder. There are and will continue to be new markets, new corners of the internet where libraries will want to be.

Libraries that can C.O.P.E. will be able to grow, iterate, and evolve. Libraries that can’t, won’t.

The post Create Once, Publish Everywhere appeared first on LibUX.

Galen Charlton: Cats who reside in story

planet code4lib - Sun, 2016-07-17 22:54

The tragedy of keeping house with cats is that their lives are so short in comparison to our own.

On Friday, Marlene and I put Sophie to rest; today, LaZorra. Four years ago, we lost Erasmus; before that, Scheherazade and Jennyfur. At the moment, we have just one, Amelia. It was a relief that she got a clean bill of health on Saturday… but she is nonetheless sixteen years old. The inexorability of time weighs heavily on me today.

I have no belief that there is any continuation of thought or spirit or soul after the cessation of life; the only persistence I know of for our cats is in the realm of story. And it is not enough: I am not good enough with words to capture and pin down the moment of a cat sleeping and purring on my chest or how the limbs of our little feline family would knot and jumble together.

Words are not nothing, however, so I shall share some stories about the latest to depart.

LaZorra was named after the white “Z” on her back, as if some bravo had decided to mark her before she entered this world. LaZorra was a cat of great brain, while her brother Erasmus was not. We would joke that LaZorra had claimed not only her brain cells, but those of her daughters Sophia and Amelia. (Who were also Erasmus’ children; suffice it to say that I thought I had more time to spay LaZorra than was actually the case).

Although she was a young mother, LaZorra was a good one. Scheherazade was alive at the time and also proved to be a good auntie-cat.

Very early on, a pattern was set: Sophie would cuddle with her father Rasi; Mellie with her mother Zorrie. LaZorra would cuddle with me; as would Erasmus; per the transitive property, I ended up squished.

But really, it took only one cat to train me. For a while LaZorra had a toy that she would drag to me when she wanted me to play with her. I always did; morning, afternoon, evening, at 2 in the morning…


Well, that was Marlene reminding me that once I taught a cat that I could be trained to play with her at two a.m., there would be no end of it—nor any rest for us—so I did not end up being perfectly accommodating.

But I came close. LaZorra knew that she was due love and affection; that her remit included unlimited interference with keyboards and screens. And in the end, assistance when she could no longer make even the slight jump to the kitchen chair.

When we lost Erasmus to cancer, Marlene and I were afraid that Sophie would inevitably follow. For her, Rasi was her sun, moon, and stars. We had Erasmus euthanized at home so that the others would know that, unlike the many trips for chemo, this time he was not coming back. Nonetheless, Sophie would often sit at the door, waiting for her daddy to come back home.

She never stopped doing that until we moved.

It was by brush and comb, little by little as she camped out on the back of the couch, that I showed her that humans might just possibly be good for something (though not as a substitute for her daddy-cat). It is such a little thing, but I hold it as one of my personal accomplishments that I helped her look outward again.

Eventually those little scritches on the back of the couch became her expected due: we learned that we were to pay the Sophie-toll every time we passed by her.

Both LaZorra and Sophie were full of personality—and thus, they were often the subjects of my “Dear Cat” tweets. I’ll close with a few of them.

Butter to LaZorra was as mushrooms to hobbits:

Dear cat:

There is no butter left.
There was no butter.
Butter never exists.
Butter? What's that?
(May I eat breakfast in peace?)


— Galen Charlton (@gmcharlt) July 9, 2016

At times, she was a little too clever for her own good:

Dear cat:

Precision in eating the pill pocket around the pill is NOT a virtue.


— Galen Charlton (@gmcharlt) April 26, 2016

Sophie was the only cat I’ve known to like popping bubblewrap:

Dear cat:

Carry on, carry on. No need to stop popping that bubble-wrap on my account.

Er, why are you looking so guilty?



— Galen Charlton (@gmcharlt) March 20, 2016

Sophie apparently enjoyed the taste of cables:

Dear cat:

If you persist in chewing on that cable, I will be unable to post your picture on the internet, and that is just not allo

— Galen Charlton (@gmcharlt) November 1, 2015

LaZorra was the suitcase-inspector-in-chief:

Dear cat,

Out of the suitcase, please. It suffices for me to bring your legend to #alamw16; your fur is optional.


— Galen Charlton (@gmcharlt) January 7, 2016

And, of course, they could be counted on to help with computation:

Dear cat:

I'm not sure even Google can turn up useful results for the query "ssser xzssfcvvvd|{PO? (99999"



— Galen Charlton (@gmcharlt) December 9, 2014

They both departed this world with pieces of our hearts in their claws.

FOSS4Lib Recent Releases: VuFind Harvest - 2.1.0

planet code4lib - Sun, 2016-07-17 17:08

Last updated July 17, 2016. Created by Peter Murray on July 17, 2016.

Package: VuFind Harvest
Release Date: Thursday, July 14, 2016

FOSS4Lib Updated Packages: VuFind Harvest

planet code4lib - Sun, 2016-07-17 17:07

Last updated July 17, 2016. Created by Peter Murray on July 17, 2016.

VuFindHarvest contains OAI-PMH harvesting logic. This is part of the VuFind project ( but may be used as a stand-alone tool or incorporated into other software dealing with metadata harvesting.

Package Type: Metadata Manipulation
License: GPLv2
Package Links: Releases for VuFind Harvest
Programming Language: PHP
Open Hub Link:
Open Hub Stats Widget:

LibUX: 63% of web traffic originates from a smartphone

planet code4lib - Sun, 2016-07-17 00:49

More library users than not have a touch screen that connects to the Internet in their pocket. In the United States, 63% of web traffic originates from a smartphone. This number aligns with the Pew Research Center report most recently updated in October 2014, which adds context by noting that, as of last year, more than “90% of American adults own a cell phone” and 42% own a tablet. Of them, 34% “go online mostly using their phones, and not using some other device such as a desktop or laptop computer.”

These trends hearken back to “the mobile moment,” a term coined by Googler Luke Wroblewski: the point at which mobile traffic to an organization’s website eclipses traditional non-mobile traffic, such as from a desktop.
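Spotting an organization’s own mobile moment is a one-pass scan over its analytics export. A minimal sketch, using invented session counts (the quarters and figures below are hypothetical, not MOVR data):

```python
# Find the "mobile moment": the first period in which mobile sessions
# overtake desktop sessions. All figures below are invented.
quarterly_sessions = [
    # (quarter, mobile, desktop)
    ("2014-Q3", 410_000, 620_000),
    ("2014-Q4", 480_000, 600_000),
    ("2015-Q1", 550_000, 590_000),
    ("2015-Q2", 630_000, 570_000),
    ("2015-Q3", 700_000, 560_000),
]

def mobile_moment(rows):
    """Return the first quarter where mobile traffic exceeds desktop."""
    for quarter, mobile, desktop in rows:
        if mobile > desktop:
            return quarter
    return None  # the moment hasn't arrived yet

print(mobile_moment(quarterly_sessions))  # -> 2015-Q2
```

Any real analytics platform can export rows like these; the logic stays the same whatever the reporting period.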

The pressure for libraries to make their websites mobile friendly increases from quarter to quarter as the upward trajectory makes itself more distinct.

2015 Q2 Mobile Overview Report

The post 63% of web traffic originates from a smartphone appeared first on LibUX.

LibUX: 21% of people start scrolling before the page finishes loading

planet code4lib - Sun, 2016-07-17 00:38

21% of people start scrolling before the page finishes loading. They might just scroll past the upcoming events or new resources at the top of the page that libraries are trying to promote. Whoops. Chalk that up to “above the fold” irony.

2015 Q2 Mobile Overview Report

The post 21% of people start scrolling before the page finishes loading appeared first on LibUX.

LibUX: Average load time on mobile

planet code4lib - Sun, 2016-07-17 00:30

The average load time on mobile is about 4 seconds. Type of connection—3G, 4G, LTE, WiFi, or something else—isn’t part of the report, but we know the majority of the devices are smartphones using Android or iOS. Okay, you could have guessed that. But the potential speed difference between 3G and WiFi is enormous, and we should be interested in how that breaks down. While overall web traffic from tablets makes up less than a third, the Adobe Digital Index 2014 U.S. Mobile Benchmark reported that 93% of this traffic is over WiFi, enough—I think—to skew MOVR’s load time average. My gut feeling is that a four second load for mobile devices is optimistic if those devices aren’t on WiFi but—I digress.

This is an important benchmark when considered with other data showing the importance of web page speed in terms of user behavior. For instance, the 2014 Radware Mobile State of the Union suggests that almost half of mobile users expect pages to load in just two seconds, and of them a whopping 40% will abandon a page taking longer than three. So, if the average mobile user is already in a perpetual state of uggggh come onnnnn then trying to connect to a slow library website or database isn’t doing much for his or her opinion of us.

2015 Q2 Mobile Overview Report

The post Average load time on mobile appeared first on LibUX.

David Rosenthal: What is wrong with science?

planet code4lib - Sat, 2016-07-16 22:31
This is a quick post to flag two articles well worth reading.

The 7 biggest problems facing science, according to 270 scientists by Julia Belluz, Brad Plumer, and Brian Resnick at Vox is an excellent overview of some of the most serious problems, with pointers to efforts to fix them. Their 7 are:
  • Academia has a huge money problem:
    In the United States, academic researchers in the sciences generally cannot rely on university funding alone to pay for their salaries, assistants, and lab costs. Instead, they have to seek outside grants. "In many cases the expectations were and often still are that faculty should cover at least 75 percent of the salary on grants," writes John Chatham, ... Grants also usually expire after three or so years, which pushes scientists away from long-term projects. Yet as John Pooley ... points out, the biggest discoveries usually take decades to uncover and are unlikely to occur under short-term funding schemes.
  • Too many studies are poorly designed:
    An estimated $200 billion — or the equivalent of 85 percent of global spending on research — is routinely wasted on poorly designed and redundant studies, according to meta-researchers who have analyzed inefficiencies in research. We know that as much as 30 percent of the most influential original medical research papers later turn out to be wrong or exaggerated.
  • Replicating results is crucial — and rare:
    A 2015 study looked at 83 highly cited studies that claimed to feature effective psychiatric treatments. Only 16 had ever been successfully replicated. Another 16 were contradicted by follow-up attempts, and 11 were found to have substantially smaller effects the second time around. Meanwhile, nearly half of the studies (40) had never been subject to replication at all.
  • Peer review is broken:
    numerous studies and systematic reviews have shown that peer review doesn’t reliably prevent poor-quality science from being published.
  • Too much science is locked behind paywalls:
    "Large, publicly owned publishing companies make huge profits off of scientists by publishing our science and then selling it back to the university libraries at a massive profit (which primarily benefits stockholders)," Corina Logan, an animal behavior researcher at the University of Cambridge, noted. "It is not in the best interest of the society, the scientists, the public, or the research." (In 2014, Elsevier reported a profit margin of nearly 40 percent and revenues close to $3 billion.)
  • Science is poorly communicated:
    Science journalism is often full of exaggerated, conflicting, or outright misleading claims. If you ever want to see a perfect example of this, check out "Kill or Cure," a site where Paul Battley meticulously documents all the times the Daily Mail reported that various items — from antacids to yogurt — either cause cancer, prevent cancer, or sometimes do both.
    Indeed, one review in BMJ found that one-third of university press releases contained either exaggerated claims of causation (when the study itself only suggested correlation), unwarranted implications about animal studies for people, or unfounded health advice.
  • Life as a young academic is incredibly stressful:
    A 2015 study at the University of California Berkeley found that 47 percent of PhD students surveyed could be considered depressed
Amen to all of those. At the New York Times, Gina Kolata limits So Many Research Scientists, So Few Openings as Professors to the over-production of Ph.D.s:
Dr. Larson and his colleagues calculated R0s for various science fields in academia. There, R0 is the average number of Ph.D.s that a tenure-track professor will graduate over the course of his or her career, with an R0 of one meaning each professor is replaced by one new Ph.D. The highest R0 is in environmental engineering, at 19.0. It is lower — 6.3 — in biological and medical sciences combined, but that still means that for every new Ph.D. who gets a tenure-track academic job, 5.3 will be shut out. In other words, Dr. Larson said, 84 percent of new Ph.D.s in biomedicine “should be pursuing other opportunities” — jobs in industry or elsewhere, for example, that are not meant to lead to a professorship.

Again, amen. A friend of mine spotted this problem years ago and has made a business of advising grad students and post-docs on how to transition to "real work".
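Larson’s arithmetic is easy to reproduce: if each professor graduates R0 Ph.D.s but vacates only one tenure-track slot, the fraction shut out is (R0 − 1)/R0. A quick check of the figures quoted above:

```python
def shut_out_fraction(r0):
    """Fraction of new Ph.D.s who can't be absorbed into tenure-track
    posts when each professor produces r0 Ph.D.s but is replaced by one."""
    return (r0 - 1) / r0

print(round(shut_out_fraction(6.3) * 100))   # biomedicine -> 84 (percent)
print(round(shut_out_fraction(19.0) * 100))  # environmental engineering -> 95
```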

LibUX: If library programmers were given access to everything

planet code4lib - Sat, 2016-07-16 13:55

I’ve lost the religion that I long held that if library programmers are finally given access to everything the world would be better. I think we’re moving into a somewhat darker world where everything is in the cloud, things are locked down, and they’re probably pretty good. Tim Spalding

The post If library programmers were given access to everything appeared first on LibUX.

District Dispatch: Congress leaves town with funding bills unfinished

planet code4lib - Fri, 2016-07-15 18:09

House Appropriations Bills include support for LSTA and Title IV, but defund IAL

The twelve FY 2017 appropriations bills continued to progress slowly in the House and Senate as Congress leaves today for a lengthy summer recess. This past week, the House Appropriations Committee marked up (approved) its contentious FY2017 Labor, Health and Human Services, Education, and Related Agencies Appropriations bill (commonly called “LHHS”), approving the bill 31-19 with the vote splitting along party lines. The Senate Appropriations Committee approved its version of LHHS last month by a decidedly bipartisan vote of 29-1.

Most important to ALA members, the LHHS funding bill in both chambers includes funding for several programs of significant importance to the library community: LSTA, IAL, and Title IV of the Every Student Succeeds Act (ESSA). While the House Committee bill provides increased support for two of our priorities, it also eliminates funding for another.

First, the good news. Both the House and Senate funding bills include increased funding for LSTA and its Grants to States program, rejecting the President’s disappointing proposal to cut funding for both. LSTA Grants to States would receive $155.9 million in the House bill: a slight increase over FY 2016 ($155.7 million), though $240,000 less than what the Senate requested in May ($156.1 million). The President had requested only $154.8 million. Overall LSTA funding would be boosted in the House bill to $183.0 million. That compares with $183.3 million in the Senate bill, $182.4 million in the President’s request, and $182.9 million ultimately approved by Congress for FY 2016. ALA continues to oppose cuts to LSTA programs.

The House also made slight upward adjustments to three other LSTA programs from FY2016 levels. Native American Library Services grants would receive $4.1 million ($3.8 million in the Senate and $4.0 million in FY2016); the National Leadership: Libraries program would receive $13.1 million ($13.4 million in the Senate and $13.0 million in FY 2016); and the Laura Bush 21st Century Librarian program would receive level funding at $10 million.

New this year is a block grant created with bipartisan support under Title IV of ESSA. The “Student Support and Academic Enrichment Grants” (SSAEG) will provide supplemental funding to help states and school districts underwrite a “well-rounded” educational experience for students, including STEM and arts programs. Best of all, libraries are expressly authorized to receive SSAEG funds. Although SSAEG was originally authorized in the ESSA at $1.65 billion, the President and Congress appear willing to fund the program at much lower levels. The President requested only $500 million, while the Senate approved even less: $300 million. The House approved a higher level of $1 billion for FY2017, though that is still below the authorized level.

Next, the decidedly bad news is that House appropriators have proposed to eliminate all funding for school libraries through the Innovative Approaches to Literacy (IAL) program. According to the House Committee’s Report, “The Committee has chosen to focus resources on core formula-based education programs instead of narrowly-focused competitive grants such as this one.” IAL received $27 million in FY2016, which was also the funding level requested by the President and supported in the Senate bill. One half of IAL funding is reserved for school libraries, with the remainder open to any national non-profit by application.

Likely limiting its chances of passage, however, the House LHHS bill includes a number of divisive policy riders addressing highly controversial issues like family planning, NLRB joint-employer standards, and “Obamacare”. The bill also includes education-related policy riders addressing the “gainful employment” rule aimed at for-profit colleges, forthcoming teacher preparation rules, and the federal definition of a credit hour. All of the amendments introduced at the full Committee markup to strike these riders or to restore cuts in education funding failed along party lines.

Congress will return from its recess in September, leaving them only a few weeks to adopt funding measures to keep the government open beyond the October 1 start of the Fiscal Year. That’s unlikely, so Congress probably will be forced to enact a “Continuing Resolution,” or CR, to fund the Government. Under CR rules the previous year’s level of funding is maintained for most programs. Vigorous discussions on the Hill already have begun as to what the length of the CR can and should be. If a CR that extends into the new calendar year is adopted, the new President will be forced to negotiate government-wide spending levels with Congress soon after being sworn in, possibly even before key Cabinet and other budget-related positions are filled.

The post Congress leaves town with funding bills unfinished appeared first on District Dispatch.

Open Knowledge Foundation: Progress report: OpenTrials – linking clinical trial data

planet code4lib - Fri, 2016-07-15 16:59

Since last year Open Knowledge has been developing OpenTrials, an open, online database linking the publicly available data and documents on all clinical trials conducted – something that has been talked about for many years but never created. The project is funded by The Laura and John Arnold Foundation and directed by Dr. Ben Goldacre, an internationally known leader on clinical trial transparency. Having an open and freely re-usable database of the world’s clinical trial data will increase discoverability, facilitate research, identify inconsistent data, enable audits on the availability and completeness of this information, support advocacy for better data, and drive standards around open data in evidence-based medicine.

The project is currently in its first phase (which runs until March 2017), where the focus is on building and populating the first prototype of the OpenTrials database, as well as raising awareness of the project in the community and getting user involvement and feedback. The progress that has been made so far was presented last month at the Evidence Live conference in Oxford, which brought together leaders across the world of Evidence Based Medicine, including researchers, doctors, and the pharmaceutical industry. This was an excellent opportunity to demonstrate the project and speak to both researchers who want to use the platform as well as people with a general enthusiasm for its impact on medicine.

Around 40 people attended our talk, which explained why OpenTrials is an important infrastructure project for medicine, covered some of the technical aspects of the platform, detailed what data we’ve imported so far, and closed with a quick demo.

If you’re feeling impatient, here are the slides from the talk, or scroll down for a summary.

Ben Goldacre and Vitor Baptista present OpenTrials at Evidence Live 2016 (photo by benmeg / CC BY)

What we’ve imported into the OpenTrials database so far
  • 331,999 deduplicated trials, collected from nine clinical trial registries:
    • ANZCTR 11,645
    • 205,422
    • EU CTR 35,159
    • GSK 4,131
    • ISRCTN 14,256
    • Pfizer 1,567
    • Takeda 1,142
    • UMIN 20,557
    • WHO ICTRP 298,688
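Note that the per-registry counts sum to far more than 331,999 because many trials are registered in, or echoed by, several sources (WHO ICTRP in particular aggregates other registries). The deduplication step can be sketched as merging records that share any identifier; the records and field names below are invented for illustration and are not OpenTrials’ actual schema:

```python
# Toy deduplication across registries: records that share any trial
# identifier are merged into one cluster. All data here is invented.
records = [
    {"source": "Registry A", "ids": {"NCT01234567"}},
    {"source": "WHO ICTRP",  "ids": {"NCT01234567", "ISRCTN49688"}},
    {"source": "ISRCTN",     "ids": {"ISRCTN49688"}},
    {"source": "ANZCTR",     "ids": {"ACTRN12600000001"}},
]

def deduplicate(records):
    """Greedy single pass: merge each record into the first existing
    cluster that shares an identifier with it."""
    clusters = []
    for rec in records:
        for cluster in clusters:
            if cluster["ids"] & rec["ids"]:
                cluster["ids"] |= rec["ids"]
                cluster["sources"].append(rec["source"])
                break
        else:
            clusters.append({"ids": set(rec["ids"]),
                             "sources": [rec["source"]]})
    return clusters

print(len(records), "->", len(deduplicate(records)))  # 4 -> 2
```

A production version would need a union-find structure so that a late-arriving record can bridge two previously separate clusters; the greedy pass here is only illustrative.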
Current functionality
  • Basic search (by keyword)
  • Searching for trials with publications
  • Uploading missing data/documents for a particular trial
  • Showing trials with discrepancies (e.g. target sample size)
What we’re importing next

Feedback and get involved

If you attended the talk and have any questions or feedback, please email us. And generally if you’re interested in contributing to OpenTrials, get in touch.

Want to get early access to the data and be a user tester? Sign up and we’ll be in touch soon.


John Mark Ockerbloom: Mid-20th century newspapers: Minding the copyrights

planet code4lib - Fri, 2016-07-15 16:39

I was pleased to read last week that the National Digital Newspaper Program, which has sponsored the digitization of over 1 million historically significant newspaper pages, has announced that it has expanded its scope to include content published up to 1963, as long as public domain status can be established. I’m excited about this initiative, which will surface content of historic interest that’s in many readers’ living memory. I’ve advocated opening access to serials up to 1963 for a long time, and have worked on various efforts to surface information about serial copyright renewals (like this one), to make it easier to find public domain serial content that can be made freely readable online. (In the US, renewal became automatic for copyrights secured after 1963, making it difficult to republish most newspapers after that date. Up till then, though, there’s a lot that can be put online.)

Copyright in contributions

Clearing copyright for newspapers after 1922 can be challenging, however. Relatively few newspapers renewed copyrights for entire issues; as I noted 10 years ago, none outside of New York City did so before the end of World War II. But newspapers often aggregate lots of content from lots of sources, and determining the copyright status of those various pieces of content is necessary as well, as far as I can tell. While section 201(c) of copyright law normally gives copyright holders of a collective work, such as a magazine or newspaper, the right to republish contributions as part of that work, people digitizing a newspaper that didn’t renew its own copyright aren’t usually copyright holders for that newspaper. (I’m not a lawyer, though; if any legal experts want to argue that digitizing libraries get similar republication rights as the newspaper copyright holders, feel free to comment.)

Text contributions

As I mentioned in my last post, we at Penn are currently going through the Catalog of Copyright Entries to survey which periodicals have contributions with copyright renewals, and when those renewals started.  (My previous post discussed this in the context of journals, but the survey covers newspapers as well.)  Most of the contributions in the section we’re surveying are text, and we’ve now comprehensively surveyed up to 1932.  In the process, we’ve found a number of newspapers that had copyright-renewed text contributions, even when they did not have copyright-renewed issues.  The renewed contributions are most commonly serialized fiction (which was more commonly run in newspapers decades ago than it is now).  Occasionally we’ll see a special nonfiction feature by a well-known author renewed.  I have not yet seen any contribution renewals for straight news stories, though, and most newspapers published in the 1920s and early 1930s have not made any appearance in our renewal survey to date.  I’ll post an update if I see this pattern changing; but right now, if digitizers are uncertain about the status of a particular story or feature article in a newspaper, searching for its title and author in the Catalog of Copyright Entries should suffice to clear it.
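The clearance check described above, searching a contribution’s title and author against the renewal records, reduces to a normalized lookup once the Catalog of Copyright Entries is transcribed. A minimal sketch; the renewal entries below are invented, not actual CCE records:

```python
# Hypothetical transcription of renewal records as (title, author) pairs,
# normalized to lowercase for matching.
renewals = {
    ("some serialized novel", "jane example"),
    ("a special feature", "john sample"),
}

def renewal_found(title, author):
    """True if a renewal record matches this title/author pair."""
    return (title.strip().lower(), author.strip().lower()) in renewals

print(renewal_found("Some Serialized Novel", "Jane Example"))  # True
print(renewal_found("Local News Story", "Staff Reporter"))     # False
```

Real matching would need fuzzier comparison (variant spellings, subtitles, pseudonyms), but the workflow is the same: no match in the renewal records is evidence toward public domain status.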

Photographs and advertisements

Newspapers contain more than text, though. They also include photos, as well as other graphical elements, which often appear in advertisements. It turns out, however, that the renewal rate for images is very low, and the renewal rate for “commercial prints”, which include advertisements, is even lower. There isn’t yet a searchable text file or database for these types of copyright renewals (though I’m hoping one can come online before long, with help from Distributed Proofreaders), and in any case, images typically don’t have unambiguous titles one can use for searching. However, most news photographs were published just after they were taken, and therefore they have a known copyright year and specific years in which a renewal, if any, should have been filed. It’s possible to go through the complete artwork and commercial prints of any given year, get an overview of all the renewed photos and ads that exist, and look for matches. (It’s a little cumbersome, but doable, with page images of the Catalog of Copyright Entries; it will be easier once there are searchable, classified transcriptions of these pages.)

Fair use arguments may also be relevant.  Even in the rare case where an advertisement was copyright-renewed, or includes copyright-renewed elements (like a copyrighted character), an ad in the context of an old newspaper largely serves an informative purpose, and presenting it there online doesn’t typically take away from the market for that advertisement.  As far as I can tell, what market exists for ads mostly involves relicensing them for new purposes such as nostalgia merchandise.  For that matter, most licensed reuses of photographs I’m aware of involve the use of high-resolution original prints and negatives, not the lower-quality copies that appear on newsprint (and that could be made even lower-grade for purposes of free display in a noncommercial research collection, if necessary).   I don’t know if NDNP is planning to accommodate fair use arguments along with public domain documentation, but they’re worth considering.

Syndicated and reprinted content: A thornier problem

Many newspapers contain not only original content, but also content that originated elsewhere.  This type of content comes in many forms: wire-service stories and photos, ads, and syndicated cartoons and columns.  I don’t yet see much cause for concern about wire news stories; typically they originate in a specific newspaper, and would normally need to be renewed with reference to that newspaper.  And at least as far as 1932, I haven’t yet seen any straight news stories renewed.   Likewise, I suspect wire photos and national ads can be cleared much like single-newspaper photos and ads can be.

But I think syndicated content may be more of a sticky issue.  Syndicated comics and features grew increasingly popular in newspapers in the 20th century, and there’s still a market for some content that goes back a long way.  For instance, the first contribution renewal for the Elizabethan Star, dated September 8, 1930, is the very first Blondie comic strip.  That strip soon became wildly popular, published by thousands of newspapers across the country.  It still enjoys a robust market, with its official website noting it runs in over 2000 newspapers today.  Moreover, its syndicator, King Features, also published weekly periodicals of its own, with issues as far back as 1933 renewed.  (As far as I can tell, it published these for copyright purposes, as very few libraries have them, but according to WorldCat an issue “binds together one copy of each comic, puzzle, or column distributed by the syndicate in a given week”.  Renew that, and you renew everything in it.)  King Features remains one of the largest syndicators in the world.  Most major newspapers, then, include at least some copyrighted (and possibly still marketable) material at least as far back as the early 1930s.

Selective presentation of serial content

The most problematic content of these old newspapers from a copyright point of view, though, is probably the least interesting content from a researcher’s point of view.  Most people who want to look at a particular locale’s newspaper want to see the local content: the news its journalists reported, the editorials it ran, the ads local businesses and readers bought.  The material that came from elsewhere, and ran identically in hundreds of other newspapers, is of less research interest.  Why not omit that, then, while still showing all the local content?

This should be feasible given current law and technology. We know from the Google and HathiTrust cases that fair use allows completely copyrighted volumes to be digitized and used for certain purposes like search, as long as users aren’t generally shown the full text. And while projects like HathiTrust and Chronicling America now typically show all the pages they scan, commonly used digitized newspaper software can either highlight or blank out not only specific pages but even the specific sections of a page in which a particular article or image appears.

This gives us a path forward for providing access to newspapers up to 1963 (or whatever date the paper started being renewed in its entirety).  Specifically, a library digitization project can digitize and index all the pages, but then only expose the portions of the issues it’s comfortable showing given its copyright knowledge.  It can summarize the parts it’s omitting, so that other libraries (or other trusted collaborators) can research the parts it wasn’t able to clear on its own.  Sections could then be opened up as researchers across the Internet found evidence to clear up their status.   Taken as a whole, it’s a big job, but projects like the Copyright Review Management System show how distributed copyright clearance can be feasibly done at scale.
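The workflow sketched here, index everything but expose only cleared segments, can be modeled as a per-segment gate. The segment identifiers, kinds, and status values below are illustrative, not any project’s actual schema:

```python
# Sketch of selective display: every article segment is indexed, but only
# segments whose copyright status has been cleared are shown in full.
segments = [
    {"id": "p1-news",   "kind": "local news",         "status": "public_domain"},
    {"id": "p1-comic",  "kind": "syndicated comic",   "status": "renewed"},
    {"id": "p2-ad",     "kind": "local ad",           "status": "public_domain"},
    {"id": "p3-serial", "kind": "serialized fiction", "status": "unknown"},
]

def displayable(segment):
    """Show a segment only when its status has been cleared."""
    return segment["status"] == "public_domain"

shown = [s["id"] for s in segments if displayable(s)]
held = [s["id"] for s in segments if not displayable(s)]
print(shown)  # ['p1-news', 'p2-ad']
print(held)   # ['p1-comic', 'p3-serial'] -- summarized, pending clearance
```

Segments marked renewed or unknown would be summarized rather than displayed, then flipped to public view as clearance evidence comes in from distributed researchers.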

Moreover, if we can establish a workable clearance and selective display process for US newspapers, it will probably also work for most other serials published in the US.  Most of them, whether magazines, scholarly journals,  conference proceedings, newsletters, or trade publications, are no more complicated in their sources and structures than newspapers are, and they’re often much simpler.   So I look forward to seeing how this expansion in scope up to 1963 works out for the National Digital Newspaper Program.   And I hope we can use their example and experience to open access to a wider variety of serials as well.



Open Knowledge Foundation: Open Access: Why do scholarly communication platforms matter and what is the true cost of gold OA?

planet code4lib - Fri, 2016-07-15 09:43

During the past 2.5 years Open Knowledge has been a partner in PASTEUR4OA, a project focused on aligning open access policies for European Union research. As part of this work, a series of advocacy resources was produced that stakeholders can use to promote the development and reinforcement of such open access policies. The final two briefing papers, written by Open Knowledge, were published this week and deal with two pressing issues around open access today: the financial opacity of open access publishing and its potential harmful effects on the research community, and the expansion of open and free scholarly communication platforms in the academic world, explaining the new dependencies that may arise from those platforms and why this matters for the open access movement.

Revealing the true cost of gold OA

“Reducing the costs of readership while increasing access to research outputs” has been a rallying cry for open access publishing, or Gold OA. Yet the Gold OA market is largely opaque, which makes it hard to evaluate how the costs of readership actually develop. Data on both the costs of subscriptions (for hybrid OA journals) and of APCs are hard to gather, and where they can be obtained, they offer only partial and very different insights into the market. This is a problem for efficient open access publishing. Funders, institutions, and individual researchers are therefore increasingly concerned that a transition to Gold OA could leave the research community open to exploitative financial practices and prevent effective market coordination.

Which factors contribute to the current opacity in the market? Which approaches are taken to foster financial transparency of Gold OA? And what are recommendations to funders, institutions, researchers and publishers to increase transparency?

The paper Revealing the true costs of Gold OA – Towards a public data infrastructure of scholarly publishing costs, written by researchers of Open Knowledge International, King’s College London and the University of London, presents the current state of financial opacity in scholarly journal publishing. It describes what information is needed in order to obtain a bigger, more systemic picture of financial flows, and to understand how much money is going into the system, where this money comes from, and how these financial flows might be adjusted to support alternative kinds of publishing models.
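One concrete piece of such a public data infrastructure is straightforward aggregation of APC payments, so that per-publisher totals become comparable across institutions. A sketch with invented figures (the publishers, journals, and amounts are all hypothetical):

```python
# Toy aggregation of APC payments to expose per-publisher publishing
# costs. All figures below are invented for illustration.
apc_payments = [
    {"publisher": "Publisher A", "journal": "J1", "apc_gbp": 1800},
    {"publisher": "Publisher A", "journal": "J2", "apc_gbp": 2200},
    {"publisher": "Publisher B", "journal": "J3", "apc_gbp": 950},
]

def totals_by_publisher(payments):
    """Sum APC spend per publisher across all recorded payments."""
    out = {}
    for p in payments:
        out[p["publisher"]] = out.get(p["publisher"], 0) + p["apc_gbp"]
    return out

print(totals_by_publisher(apc_payments))
# {'Publisher A': 4000, 'Publisher B': 950}
```

The hard part in practice is not the arithmetic but getting institutions to release payment records in a consistent, machine-readable form, which is exactly the gap the paper describes.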


Why do scholarly communication platforms matter for open access?

Over the past two decades, open access advocates have made significant gains in securing public access to the formal outputs of scholarly communication (e.g. peer reviewed journal articles). The same period has seen the rise of platforms from commercial publishers and technology companies that enable users to interact and share their work, as well as providing analytics and services around scholarly communication.

How should researchers and policymakers respond to the rise of these platforms? Do commercial platforms necessarily work in the interests of the scholarly community? How and to what extent do these proprietary platforms pose a threat to open scholarly communication? What might public alternatives look like?

The paper Infrastructures for Open Scholarly Communication provides a brief overview of the rise of scholarly platforms – describing some of their main characteristics as well as debates and controversies surrounding them. It argues that in order to prevent new forms of enclosure, it is essential that public policymakers should be concerned with the provision of public infrastructures for scholarly communication as well as public access to the outputs of research. It concludes with a review of some of the core elements of such infrastructures, as well as recommendations for further work in this area.

Journal of Web Librarianship: Thinking Like a Student: Subject Guides in Small Academic Libraries

planet code4lib - Fri, 2016-07-15 05:23
Catherine E. Kerrigan

