Planet Code4Lib

LITA: The 2016 LITA Forum includes 3 amazing Keynotes

Thu, 2016-10-06 16:57

Fort Worth, TX
November 17-20, 2016

Join your LITA and library technology colleagues for the 2016 LITA Forum

  • LITA Forum early bird rates end October 14, 2016
  • The guaranteed discount hotel rate at the Omni Fort Worth Hotel ends Wednesday, October 21, 2016; rooms at that rate are available thereafter only as space allows.
  • Online registration closes Sunday November 13th, 2016

Register Now!

This year’s Forum has three amazing keynotes you won’t want to miss:


Cecily Walker is the Systems Project Librarian at Vancouver Public Library, where she focuses on user experience, community digital projects, digital collections, and the intersection of social justice, technology, and public librarianship. It was her frustration with the way that software was designed to meet the needs of highly technical users rather than the general public that led her to user experience, but it was her love of information, intellectual freedom, and commitment to social justice that led her back to librarianship. Cecily can be found on Twitter (@skeskali) where she frequently holds court on any number of subjects, but especially lipstick.


Waldo Jaquith is the director of U.S. Open Data, an organization that works with government and the private sector to advance the cause of open data. He previously worked in open data with the White House Office of Science and Technology Policy. No stranger to libraries, Jaquith used to work with digital assets at the Scripps Library at the Miller Center at the University of Virginia, and served on the Board of Trustees at his regional library. He lives near Charlottesville, Virginia, with his wife and son. Waldo can also be found on Twitter (@waldojaquith).


Tara Robertson is the Systems Librarian and Accessibility Advocate at CAPER-BC. “I’m a librarian who doesn’t work in a library. I like figuring out how things work, why they break, and how to make them work better. I’m passionate about universal design, accessibility, open source software, intellectual freedom, feminism and Fluevog shoes.”

Twitter (@tararobertson)
Blog

Don’t forget the Preconference Workshops

Come to the 2016 LITA Forum a day early and choose to participate in one of two outstanding preconferences.

Librarians can code! A “hands-on” computer programming workshop just for librarians
With presenter: Kelly Smith, founder of Prenda – a learning technology company with the vision of millions of kids learning to code at libraries all over the country.

Letting the Collections Tell Their Story: Using Tableau for Collection Evaluation
With presenters: Karen Harker, Collection Assessment Librarian University of North Texas Libraries; Janette Klein, Interdisciplinary Information Science PhD student University of North Texas; Priya Parwani, Graduate Research Assistant University of North Texas Libraries.

Full Details

Join us in Fort Worth, Texas, at the Omni Fort Worth Hotel located in Downtown Fort Worth, for the 2016 LITA Forum, a three-day education and networking event featuring 2 preconferences, 3 keynote sessions, more than 55 concurrent sessions and 25 poster presentations. It’s the 19th annual gathering of the highly regarded LITA Forum for technology-minded information professionals. Meet with your colleagues involved in new and leading edge technologies in the library and information technology field. Registration is limited in order to preserve the important networking advantages of a smaller conference. Attendees take advantage of the informal Friday evening reception, networking dinners and other social opportunities to get to know colleagues and speakers.

Get all the details, register and book a hotel room at the 2016 Forum Web site.

Forum Sponsors:

OCLC, Yewno, EBSCO, BiblioCommons


See you in Fort Worth.

David Rosenthal: Software Heritage Foundation

Thu, 2016-10-06 15:00
Back in 2009 I wrote:
“who is to say that the corpus of open source is a less important cultural and historical artifact than, say, romance novels.”

Back in 2013 I wrote:

“Software, and in particular open source software is just as much a cultural production as books, music, movies, plays, TV, newspapers, maps and everything else that research libraries, and in particular the Library of Congress, collect and preserve so that future scholars can understand our society.”

There are no legal obstacles to collecting and preserving open source code. Technically, doing so is much easier than general Web archiving. It seemed to me like a no-brainer, especially because almost all other digital preservation efforts depended upon the open source code no-one was preserving! I urged many national libraries to take this work on. They all thought someone else should do it, but none of the someones agreed.

Finally, a team under Roberto di Cosmo, with initial support from INRIA, has stepped into the breach. As the collection statistics on their website show (as of 6 October 2016), they are already collecting a vast amount of code from open source repositories around the Internet. They are in the process of setting up a foundation to support this work. Everyone should support this important, essential work.

Evergreen ILS: Evergreen 2.11.0 released

Wed, 2016-10-05 20:21

On behalf of the build-master team and myself, I am pleased to announce the release of Evergreen 2.11.0. Included in Evergreen 2.11.0 are the following new features:

  • Add Date Header to Action Trigger Email/SMS Templates
  • Support for Ubuntu 16.04
  • Purge User Activity
  • Authority Record Import Updates Editor, Edit Date
  • Authority Propagation Updates Bib Editor, Edit Date
  • Bibliographic Record Source Now Copied to 901$s
  • Option to Update Bib Source and Edit Details on Record Import
  • Staff Client Honors Aged Circulations
  • “Canceled Transit” Item Status
  • Copy Status “Is Available” Flag
  • Email Checkout Receipts
  • Set Per-OU Limits on Allowed Payment Amounts
  • Additional Fields Available for Display in Some Interfaces
  • Merge Notification Preferences Tables in TPAC
  • Improved Holds Screens in My Account
  • Popularity Boost for Ranking Search Results
  • Badge Configuration
  • Removal of Advanced Hold Options link when part holds are expected
  • SIP Renewals
  • Treat SIP Location Field as Login Workstation

These, along with dozens of bug fixes and updates to documentation, are the result of work by more than thirty individuals at over 15 organizations.

To download Evergreen 2.11.0 and to read the full release notes, please visit the downloads page.

LITA: Jobs in Information Technology: October 5, 2016

Wed, 2016-10-05 19:40

New vacancy listings are posted weekly on Wednesday at approximately 12 noon Central Time. They appear under New This Week and under the appropriate regional listing. Postings remain on the LITA Job Site for a minimum of four weeks.

New This Week

MIT Libraries, Web Developer, Cambridge, MA

Boston College, Digital Library Applications Developer, Chestnut Hill, MA

Visit the LITA Job Site for more available jobs and for information on submitting a job posting.

FOSS4Lib Upcoming Events: JHOVE Online Hack Day

Wed, 2016-10-05 15:46
Date: Tuesday, October 11, 2016 - 09:00 to 17:00
Supports: JHOVE

Last updated October 5, 2016. Created by Peter Murray on October 5, 2016.


JHOVE is a widely-used open source digital preservation tool, used for validating content, such as PDFs. However, some of the validation output can be difficult to understand. The aim of this online hack day is to enhance our knowledge about JHOVE errors – to create descriptions of errors and to identify example files, as well as to start to understand their preservation impact and what can possibly be done about them.

David Rosenthal: Another Vint Cerf Column

Wed, 2016-10-05 15:00
Vint Cerf has another column on the problem of digital preservation. He concludes:

“These thoughts immediately raise the question of financial support for such work. In the past, there were patrons and the religious orders of the Catholic Church as well as the centers of Islamic science and learning that underwrote the cost of such preservation. It seems inescapable that our society will need to find its own formula for underwriting the cost of preserving knowledge in media that will have some permanence. That many of the digital objects to be preserved will require executable software for their rendering is also inescapable. Unless we face this challenge in a direct way, the truly impressive knowledge we have collectively produced in the past 100 years or so may simply evaporate with time.”

Vint is right about the fundamental problem but wrong about how to solve it. He is right that the problem isn’t not knowing how to make digital information persistent, it is not knowing how to pay to make digital information persistent. Yearning for quasi-immortal media makes the problem of paying for it worse, not better, because quasi-immortal media such as DNA are both more expensive and their cost is front-loaded. Copyability is inherent in on-line information; that’s how you know it is on-line. Work with this grain of the medium, don’t fight it.

LITA: Social Media For My Institution – a LITA web course

Wed, 2016-10-05 14:59

Don’t miss out on this informative LITA web course starting soon.

Social Media For My Institution: from “mine” to “ours”

Instructor: Dr. Plamen Miltenoff
Wednesdays, 10/19/2016 – 11/9/2016
Blended format web course

Register Online, page arranged by session date (login required)

A course for librarians who want to explore the institutional application of social media, based on an established academic course at St. Cloud State University, “Social Media in Global Context.” The course will critically examine the institutional need for social media (SM) and juxtapose it with private use, discuss how to choose among current and future SM tools, present a theoretical introduction to the subculture of social media, and show how to align library SM policies with the goals and mission of the institution. There will be hands-on exercises on creating and disseminating textual and multimedia content and on patron engagement, along with brainstorming on strategies suited to the institution regarding human and technological resources, workload sharing, storytelling, and branding, as well as related issues such as privacy and security.

This is a blended format web course:

The course will be delivered as 4 separate live webinar lectures, one per week on Wednesdays, October 19, 26, November 2, and 9 at 2pm Central. The webinars will also be recorded and distributed through the web course platform, Moodle, for asynchronous participation.

Details here and Registration here

Dr. Plamen Miltenoff is an information specialist and Professor at St. Cloud State University. His education includes several graduate degrees in history and Library and Information Science and in education. His professional interests encompass social Web development and design, gaming and gamification environments. For more information see

And don’t miss other upcoming LITA fall continuing education offerings:

Beyond Usage Statistics: How to use Google Analytics to Improve your Repository
Presenter: Hui Zhang
Tuesday, October 11, 2016
11:00 am – 12:30 pm Central Time
Register Online, page arranged by session date (login required)

Online Productivity Tools: Smart Shortcuts and Clever Tricks
Presenter: Jaclyn McKewan
Tuesday November 8, 2016
11:00 am – 12:30 pm Central Time
Register Online, page arranged by session date (login required)

Questions or Comments?

For questions or comments, contact LITA at (312) 280-4268 or Mark Beatty,

Islandora: Islandora CLAW MVP

Wed, 2016-10-05 14:08

The Minimum Viable Product document that I have been working on with the CLAW committers over the past few weeks has been made public, and is available for review. It defines the scope of the project, what we think is required for a stable release, proposed content modeling, and the overall design of the various subsystems of the software. We will be using this as a starting point for detailed scoping before attempting future sprints.

Please feel free to review and make comments. All feedback is appreciated.

District Dispatch: Failing to finish funding bills, Congress presses pause

Wed, 2016-10-05 13:32

Unable to reach agreement on 12 appropriations bills, Congress last week passed H.R. 5325, a “Continuing Resolution” often called a CR, that keeps the Federal government operating through December 9 when Congress will be forced to return for a lame-duck session to finish the appropriations bills. The President signed the CR late last week.

The CR provides funding levels for nearly all Federal programs, such as LSTA, IAL, the Library of Congress, etc., at nearly the same levels set in last year’s Omnibus funding package. There is, however, a slight decrease of 0.549% to many programs, due in part to “advance” funding increases for veterans’ health approved last year.

Earlier last week, Senate Republicans failed in their efforts to force through a Republican-written CR. Facing unified Democratic opposition and with the threat of a government shutdown – unappealing to all so close to the elections – Republicans and Democrats set aside funding disagreements for the time being.  The President signed the “clean” CR hours after the House passed the funding measure that had passed the Senate Tuesday.

Congress will be forced to address a series of contentious funding issues in a post-election lame-duck session. The most significant disagreements concerned funding for relief of the Flint, MI, water contamination and the Louisiana flooding disaster. Democrats successfully blocked the Republicans’ CR, which did not include funding for Flint. The final CR did not include funding for either need, but agreements were reached last week to address both in the lame-duck session.

ALA continues to work to ensure strong funding for library programs, which fared well in both the House and Senate Appropriations Committees. While overall funding for education was cut significantly, both committees recommended small increases for LSTA and the Grants to States program, as well as level funding for Innovative Approaches to Literacy.

The outlook for this year’s lame-duck session could be quite interesting for several reasons. The elections may result in a change of party control in the Senate and/or the White House. The session will also include a number of departing Representatives and Senators who are retiring or will have lost re-election, not to mention that the President will be departing in January. We will be watching, along with many others, to see whether the lame-duck session will be contentious or whether Members will simply want to wrap up business quickly and quietly.

The post Failing to finish funding bills, Congress presses pause appeared first on District Dispatch.

Access Conference: Live and Streaming: Access 2016

Wed, 2016-10-05 11:54

Can’t join us in Fredericton this year? Don’t worry! We’re live streaming the show! Join us each day, starting at 9:00 AM.

Keep watching this space for updates! We’ll add individual talks to our YouTube channel over the next couple of weeks.

Open Knowledge Foundation: Who Will Shape the Future of the Data Society?

Wed, 2016-10-05 08:49

This piece was originally posted on the blog of the International Open Data Conference 2016, which takes place in Madrid, 6-7th October 2016.

The contemporary world is held together by a vast and overlapping fabric of information systems. These information systems do not only tell us things about the world around us. They also play a central role in organising many different aspects of our lives. They are not only instruments of knowledge, but also engines of change. But what kind of change will they bring?

Contemporary data infrastructures are the result of hundreds of years of work and thought. In charting the development of these infrastructures we can learn about the rise and fall not only of the different methods, technologies and standards implicated in the making of data, but also about the articulation of different kinds of social, political, economic and cultural worlds: different kinds of “data worlds”.

Beyond the rows and columns of data tables, the development of data infrastructures tells tales of the emergence of the world economy and global institutions; different ways of classifying populations; different ways of managing finances and evaluating performance; different programmes to reform and restructure public institutions; and how all kinds of issues and concerns are rendered into quantitative portraits in relation to which progress can be charted – from gender equality to child mortality, biodiversity to broadband access, unemployment to urban ecology.

The transnational network assembled in Madrid for the International Open Data Conference has the opportunity to play a significant role in shaping the future of these data worlds. Many of those present have made huge contributions towards an agenda of opening up datasets and developing capacities to use them. Thanks to these efforts there is now global momentum around open data amongst international organisations, national governments, local administrations and civil society groups – which will have an enduring impact on how data is made public.

Perhaps, around a decade after the first stirrings of interest in what we now know as “open data”, it is time to have a broader conversation around not only the opening up and use of datasets, but also the making of data infrastructures: of what issues are rendered into data and how, and the kinds of dynamics of collective life that these infrastructures give rise to. How might we increase public deliberation around the calibration and direction of these engines of change?

Anyone involved with the creation of official data will be well aware that this is not a trivial proposition. Not least because of the huge amount of effort and expense that can be incurred in everything from developing standards, commissioning IT systems, organising consultation processes and running the social, technical and administrative systems which can be required to create and maintain even the smallest and simplest of datasets. Reshaping data worlds can be slow and painstaking work. But unless we instate processes to ensure alignment between data infrastructures and the concerns of their various publics, we risk sustaining systems which are at best disconnected from and at worst damaging towards those whom they are intended to benefit.

What might such social shaping of data infrastructures look like? Luckily there is no shortage of recent examples – from civil society groups campaigning for changes in existing information systems (such as advocacy around the UK’s company register), to cases of citizen and civil society data leading to changes in official data collection practices, to the emergence of new tools and methods to work with, challenge and articulate alternatives to official data. Official data can also be augmented by “born digital” data derived from a variety of different platforms, sources and devices which can be creatively repurposed in the service of studying and securing progress around different issues.

While there is a great deal of experimentation with data infrastructures “in the wild”, how might institutions learn from these initiatives in order to make public data infrastructures more responsive to their publics? How can we open up new spaces for participation and deliberation around official information systems at the same time as building on the processes and standards which have developed over decades to ensure the quality, integrity and comparability of official data? How might participatory design methods be applied to involve different publics in the making of public data? How might official data be layered with other “born digital” data sources to develop a richer picture around issues that matter? How do we develop the social, technical and methodological capacities required to enable more people to take part not just in using datasets, but also reshaping data worlds?

Addressing these questions will be crucial to the development of a new phase of the open data movement – from the opening up of datasets to the opening up of data infrastructures. Public institutions may find they have not only new users, but new potential contributors and collaborators as the sites where public data is made begin to multiply and extend outside of the public sector – raising new issues and challenges related to the design, governance and political economics of public information systems.

The development of new institutional processes, policies and practices to increase democratic engagement around data infrastructures may be more time consuming than some of the comparatively simpler steps that institutions can take to open up their datasets. But further work in this area is vital to secure progress on a wide range of issues – from tackling tax base erosion to tracking progress towards commitments made at the recent Paris climate negotiations.

As a modest contribution to advancing research and practice around these issues, a new initiative called the Public Data Lab is forming to convene researchers, institutions and civil society groups with an interest in the making of data infrastructures, as well as the development of capacities that are required for more people to not only take part in the data society, but also to more meaningfully participate in shaping its future.

Stuart Yeates: How would we know when it was time to move from TEI/XML to TEI/JSON?

Tue, 2016-10-04 21:20
This post was inspired by TEI Next by Hugh Cayless.

How would we know when it was time to move from TEI/XML to TEI/JSON?

If we stand back and think about what it is we (the TEI community) need from the format:
  1. A common format for storing and communicating Texts and augmentations of Texts (Transcriptions, Manuscript Description, Critical Apparatus, Authority Control, etc, etc.).
  2. A body of documentation for shared use and understanding of that format.
  3. A method of validating Texts in the format as being in the format.
  4. A method of transforming Texts in the format for computation, display or migration.
  5. The ability to reuse the work of other communities so we don't have to build everything for ourselves (Unicode, IETF language tags, URIs, parsers, validators, outsourcing providers who are tooled up to at least have a conversation about what we're trying to do, etc)
[Everyone will have slightly different priorities for a list like this, but I'm sure we can agree that a list of important functionality could be drawn up and expanded into a requirements list at a sufficiently granular level, so we can assess different potential technologies against those items.]
If we really want to ponder whether TEI/JSON is the next step after TEI/XML, we need to compare the two approaches against such a list of requirements. Personally, I'm confident that TEI/XML will come out in front right now. Whether JavaScript has the potential to replace XSLT as the preferred method for building really exciting interfaces to TEI/XML docs is a much more open question, in my mind.
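For concreteness, here is a minimal sketch in Python of what a purely mechanical TEI/XML-to-JSON conversion might look like, using a made-up JSON shape (element name, attributes, text, children). It is deliberately simplistic and lossy (for instance, it drops tail text) and is an illustration rather than a proposal; much of requirement 1 above is really about agreeing on what such a shape should be.

# A minimal sketch of a hypothetical TEI/JSON rendering -- not any agreed TEI standard.
import json
import xml.etree.ElementTree as ET

TEI_FRAGMENT = """<p xmlns="http://www.tei-c.org/ns/1.0">
  A line of <hi rend="italic">transcribed</hi> text.
</p>"""

def element_to_dict(el):
    """Recursively convert an XML element into a plain dict (tail text is dropped)."""
    return {
        "name": el.tag.split("}")[-1],          # strip the TEI namespace
        "attributes": dict(el.attrib),
        "text": (el.text or "").strip(),
        "children": [element_to_dict(child) for child in el],
    }

if __name__ == "__main__":
    root = ET.fromstring(TEI_FRAGMENT)
    print(json.dumps(element_to_dict(root), indent=2))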
That's not to say that the criticisms of XML aren't true (they are) or valid (they are) or worth repeating (they are), but perfection is commonly the enemy of progress.

Terry Reese: MarcEdit Task Management–planned behavior changes

Tue, 2016-10-04 21:08

Below are two messages from a conversation on the MarcEdit list around the Task Manager/Task Functionality in MarcEdit.  I’ve been discussing some changes to the way that this works – particularly related to how the networked task management works.  Since this is a widely used function of the application, I’m pulling these two messages out and making them available here.  If you see something that gives you pause, let me know.



Following up – I spent a bit of time working on this and, as far as I can tell, everything now works as I’ve described below. This definitely complicates the management of tasks a bit on my end, but after doing a bit of work, here are the results. Please look over the notes here and below and let me know if you see any potential issues. If I don’t hear anything, I’ll include this in my update over the weekend.

Simplifying Network Sharing:

When you set up network sharing, the tool now recognizes whether your path is a networked path and, if it is, will automatically do a couple of things:

Adds an import option to the Preferences (option only shows if your path is a network path)

If you click the copy option, it will create a .network folder in the local tasks folder and then move a copy of all your items into the network space and into the _tasks.txt file on the network.

On startup, MarcEdit automatically will update your .network task folder with the current data in the defined network folder.

When off-line, MarcEdit automatically will use the data in the .network folder (a local cached copy of your networked data)

When off-line and using a networked path, opening the task manager shows the locally cached copy of your networked tasks.

When you have a networked folder, MarcEdit creates the local .network cache as read-only.  You’ll see this is enforced in the editor.

Changes to the _tasks.txt file: file paths are no longer embedded, just the name of the file. The assumption is that all files will be found in the defined tasks directory. The program will determine whether an entry contains an old path or just a filename, ignore any old path information, and extract just the filename, using it with the defined task paths now managed by the program.

Within the TASKS themselves, MarcEdit no longer stores paths to task lists. Like the _tasks.txt file, just the name of the task file is stored, with the assumption being that the task list will be in the defined tasks directory. This means imports and exports can be done through the tool, or just by copying and pasting files into the managed task folder.

Finally, on Windows, networked drives can cause long delays when they are offline. I’ve created a somewhat novel approach to checking this, but it means I’ve set a low timeout (~150 ms); if your network has a lot of latency (i.e., is not responsive), MarcEdit could think it’s offline. I can make this value configurable if necessary, but the timeout really has an impact on parts of the program because the tasks data is read into the UI of the Editor. In my testing, the timeouts appear to be appropriate.
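As an illustration of the availability check described above (not MarcEdit’s actual implementation, which is not written in Python), the sketch below probes the networked folder on a worker thread and gives up after roughly 150 ms, treating a slow or unreachable share as offline.

# A minimal sketch of checking whether a networked tasks folder answers within a
# short deadline, so a slow or offline share does not hang the UI.
import os
from concurrent.futures import ThreadPoolExecutor, TimeoutError

NETWORK_TIMEOUT_SECONDS = 0.150  # ~150 ms, mirroring the timeout described above

def network_folder_is_available(path, timeout=NETWORK_TIMEOUT_SECONDS):
    """Return True if `path` exists and responds within `timeout` seconds."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(os.path.isdir, path)
    try:
        return future.result(timeout=timeout)
    except TimeoutError:
        # The share did not answer in time; treat it as offline and fall back
        # to the local .network cache.
        return False
    finally:
        pool.shutdown(wait=False)

# Example (hypothetical share name): decide which tasks folder to read at startup.
# tasks_dir = r"\\fileserver\marcedit\tasks"
# use_network = network_folder_is_available(tasks_dir)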

These changes have just been implemented on the Windows/Linux side right now – I’ll move them to the Mac version tonight.

If you’d like to try these changes, let me know – I won’t release these till the weekend, but would be willing to build a test build for folks interested in testing how this works. 

Finally, I know how much the tasks function is used, so I’m giving time for feedback before going forward.  If you see something described that you think may be problematic, let me know.



From: Terry Reese
Subject: Proposed TASK changes…Feedback requested. [Re: Saving tasks lists on the network….]

Following up on this – I’m looking at a behavior change and want to make certain that this won’t cause anyone problems.  Let me explain how the tasks works currently (for access) and how I’ve reworked the code.  Please let me know if anyone is using a workflow where this might be problematic.


When a task is created or cloned, the task and its full path are saved to the _tasks.txt file (in your local preferences). Example:

Main task with a whole lot of text is right here               C:\Users\reese.2179\AppData\Roaming\marcedit\macros\tasksfile-2016_03_18_052022564.txt         SHIFT+F1
AAR Library         C:\Users\reese.2179\AppData\Roaming\marcedit\macros\tasksfile-2016_03_18_052317664.txt        
TLA        C:\Users\reese.2179\AppData\Roaming\marcedit\macros\tasksfile-2016_04_19_113756870.txt        

This is a tab-delimited list, with the first value being the name, the second the path to the task, and the third a shortcut, if assigned. The task naming convention has changed a number of times through the years. Currently, I use GUIDs to prevent any possibility of collision between users. This is important when saving to the network.

When a user saves to the network, internally the tool just changes where it looks for the _tasks.txt file. But data is still saved in that location using the full network path. So, if users have different drive letters, or someone uses a UNC path and someone else doesn’t, then you have problems. Also, when setting up a network folder, if you have existing tasks, you have to import and export them into the folder. Depending on how deep your task references are, that may require some manual intervention.

If you are off the network and you save to a network path, your tasks are simply unavailable unless you keep a copy in your local store (which most wouldn’t).


Here’s what I propose to do. This will require changing the import and export processes (trivial), making a change to the preferences (trivial), and updating the run code for the macros (a little less trivial).

First, the _tasks.txt file will have all filepaths removed.  Items will only be referenced by filenames.  Example:

Main task with a whole lot of text is right here    tasksfile-2016_03_18_052022564.txt               SHIFT+F1
AAR Library         tasksfile-2016_03_18_052317664.txt      
TLA        tasksfile-2016_04_19_113756870.txt      

The program already sets the taskfile path so that it can run tasks that are on the network; this will continue that process of normalization. Now, no file paths will be part of the string, and access to task locations will be completely tied to the information set in the preferences.
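To illustrate the normalization described above, here is a rough Python sketch (MarcEdit itself is not written in Python) that strips any embedded path from a _tasks.txt entry and resolves the remaining filename against the configured tasks directory.

# A minimal sketch of normalizing legacy _tasks.txt entries: any embedded path
# is dropped so only the task filename is kept, and the file is then resolved
# against whatever tasks directory is configured in the preferences.
import ntpath   # handles Windows-style backslash paths on any platform
import os

def normalize_entry(line):
    """Turn 'name<TAB>path-or-filename<TAB>shortcut' into a path-free tuple."""
    parts = line.rstrip("\n").split("\t")
    name = parts[0]
    filename = ntpath.basename(parts[1]) if len(parts) > 1 else ""
    shortcut = parts[2] if len(parts) > 2 else ""
    return name, filename, shortcut

def resolve_task(filename, tasks_dir):
    """All task files are assumed to live in the configured tasks directory."""
    return os.path.join(tasks_dir, filename)

# Example with an old-style entry containing a full local path:
old = "AAR Library\tC:\\Users\\reese.2179\\AppData\\Roaming\\marcedit\\macros\\tasksfile-2016_03_18_052317664.txt\t"
name, filename, shortcut = normalize_entry(old)
print(name, filename)   # AAR Library tasksfile-2016_03_18_052317664.txt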


Pros:

  • This will make tasks more portable as I no longer have to fix paths.
  • Makes it easier to provide automatic import of tasks when first setting up a network location
  • Allows me, for the first time, to provide local caching of networked tasks so that if a user is configured to use a network location and is offline, their tasks will continue to be available. At this point, I’m thinking caches would be updated when the user opens MarcEdit, or makes a change to a network task. The assumption, for simplicity’s sake, is that if you are using a network drive, you cannot edit the locally cached networked tasks – and I’ll likely find a way to enforce that to avoid problems with caching and updates.


Cons:

  • If you use a networked task, you won’t be able to edit the offline cache when you are offline. You’ll still have access to your tasks (which isn’t true today), but you won’t be allowed to change them.
  • This isn’t possible with the task manager, but in theory, someone could code tasks to live in locations other than the places MarcEdit manages.  This would go away.  All tasks would need to be managed within the defined network path, or the local tasks folder.
  • Using GUIDs for filenames, it shouldn’t happen – but there is always the theoretical chance of duplicate files being created.  I’ll need to ensure I keep checks to make sure that doesn’t happen, which complicates management slightly.

Second – Tasks themselves….

Presently, when a task references a task list, it uses the full path. Again, the path will be reduced to just the filename. The same pros and cons apply here as above.

For users using tasks, you honestly shouldn’t see a difference. The first time you run the program, MarcEdit will likely modify the tasks that you have (network or otherwise), and things will just go on as is. If you are creating a new networked folder, you’ll see a new option in the locations preference that allows you to automatically copy your current tasks to the network when setting that up. And, if you are offline and a networked user, you’ll find that you now have access to your networked tasks. This part, though, represents one more behavior change: presently, if you have networked tasks, you can take yourself offline and create and run tasks local to you. In the above format, that goes away, since MarcEdit will be caching your network tasks locally in a .network location. If the current functionality is still desired (i.e., folks have workflows where they are connected to the network for shared tasks but disconnect to run tasks local only to them), I may be able to set up something so that the task runner checks both the .network and local task directories. My preference would be not to, but I understand that the current behavior has been around for years now, and I really would like to minimize the impact of making these changes.
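For illustration, here is a sketch (in Python, not MarcEdit’s actual code) of the dual lookup floated above, where the task runner would check the cached .network folder first and then fall back to the local tasks folder.

# A minimal sketch of resolving a task file against the .network cache first,
# then the local tasks folder.
import os

def find_task_file(filename, tasks_root):
    """Look for `filename` in the .network cache, then in the local tasks folder."""
    for folder in (os.path.join(tasks_root, ".network"), tasks_root):
        candidate = os.path.join(folder, filename)
        if os.path.isfile(candidate):
            return candidate
    return None  # the task list is missing from both locations

# Example (hypothetical local tasks root):
# path = find_task_file("tasksfile-2016_04_19_113756870.txt",
#                       os.path.expanduser("~/AppData/Roaming/marcedit/macros"))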


David Rosenthal: RU18?

Tue, 2016-10-04 20:00
LOCKSS is eighteen years old! I told the story of its birth three years ago.

There's a list of the publications in that time, and talks in the last decade, on the LOCKSS web site.

Thanks again to the NSF, Sun Microsystems, and the Andrew W. Mellon Foundation for the funding that allowed us to develop the system, and to the steadfast support of the libraries of the LOCKSS Alliance, and the libraries and publishers of the CLOCKSS Archive that has sustained it in production.

Brown University Library Digital Technologies Projects: Researchers@Brown Ranks #1 in SEO Analysis

Tue, 2016-10-04 17:27

At the 2016 VIVO national conference in August 2016, Anirvan Chatterjee from the University of California, San Francisco gave a presentation on Search Engine Optimization (SEO) — strategies for increasing a site’s ranking in search results. He analyzed 90 Research Networking Systems (RNS) to determine the proportion of faculty profile pages appearing among the top 3 search results on Google. His analysis ranked Researchers@Brown #1 out of the 90 sites tested.

Chatterjee’s talk was entitled “The SEO State of the Union 2016: 5 Data-Driven Steps to Make your Research Networking Sites Discoverable by Real Users, Based on Real Google Results”

The report of the research,  “RNS SEO 2016: How 90 research networking sites perform on Google” is available here:

David Rosenthal: Panel on Software Preservation at iPRES

Tue, 2016-10-04 14:10
I was one of five members of a panel on Software Preservation at iPRES 2016, moderated by Maureen Pennock. We each had three minutes to answer the question "what have you contributed towards software preservation in the past year?" Follow me below the fold for my answer.

So, what have I contributed towards software preservation in the past year? In my case, the answer won't take much of my three minutes. I published a 37-page report, funded by the Andrew W. Mellon Foundation, entitled Emulation and Virtualization as Preservation Strategies, wrote 15 blog posts on emulation, and gave 4 talks on the topic.

Obviously, you need to read each and every word of them, but for now I will condense this mass of words into six key points for you:
  1. The barriers to delivering emulations of preserved software are no longer technical. Rhizome's Theresa Duncan CD-ROMs, the Internet Archive's Software Library, and Ilya Kreymer's oldweb.today are examples of emulations transparently embedded in the Web. Fabrice Bellard's v86 Javascript emulator allows you not merely to run Linux or OpenBSD in your browser, but even to boot your own floppy, CD-ROM or disk image in it.
  2. The cost of preparing such emulations is still too high. Some progress has been made towards tools for automatically extracting the necessary technical metadata from CD-ROMs, but overall the state of ingest tooling is inadequate.
  3. Most ways in which emulations are embedded in Web pages are not themselves preservable. The Web page embeds not merely the software and hardware environment to be emulated, which should remain the same, but also the particular technology to be used to emulate it, which will change. That's not the only problem. One-size-fits-all emulation delivers some users a miserable experience. The appropriate emulation technology to use depends on the user's device, browser, latency, bandwidth, etc. What's needed is a standard for representing preserved system images and metadata, an emulation mime-type, and an analog of pdf.js, code that is downloaded to work out an appropriate emulation at dissemination time.
  4. Emulation of software that connects to the Internet is a nightmare, for two reasons. First, it will almost certainly use some network services, which will likely not be there when needed. Second, as the important paper Familiarity Breeds Contempt shows, the software will contain numerous vulnerabilities. It will be compromised as soon as it connects.
  5. Except for open source environments, the legal framework in which software is preserved and run in emulation is highly problematic, being governed by both copyright, and the end user license agreement, of every component in the software stack from the BIOS through the operating system to the application. Because software is copyright, national libraries with copyright deposit should be systematically collecting it to enable future emulations. The only systematic collection I'm aware of is by NIST on behalf of the FBI for forensic purposes.
  6. Even if the cost of ingest could be greatly reduced, a sustainable business model for ingesting, preserving and disseminating software is nowhere to be seen.
As usual, the full text of these remarks with links to the sources will go up on my blog shortly after this session.

I was also asked to prepare a response to one of the questions to be used to spark debate:
“Economic sustainability: What evidence is required to determine commercial viability for software preservation services? Can cultural heritage institutions make a business case to rights holders that preserving software can co-exist with commercial viability?”

I decided to re-formulate the two questions:
  • Can closed-source software vendors be persuaded to allow their old software to be run in emulation? The picture here is somewhat encouraging. Some major vendors are cooperating to some extent. For example, Microsoft's educational licenses allow their old software to be run in emulation. Microsoft has not objected to the Internet Archive's Windows 3.x Showcase. I've had encouraging conversations with senior people at other major vendors. But this is all about use of old software without payment. It is not seen as depriving the vendor of income.
  • Is there a business model to support emulation services? This picture is very discouraging. Someone is going to have to pay the cost of emulation. As Rhizome found when the Theresa Duncan CD-ROMs went viral, if the end user doesn't pay there can be budget crises. If the end user does pay, it's a significant barrier to use, and it starts looking like it is depriving the vendor of income. Some vendors might agree to cost-recovery charging. But typically there are multiple vendors involved. Consider emulating an old Autodesk on Windows environment. That is two vendors. Do they both agree to the principle of cost-recovery, and to the amount of cost-recovery?

    Update: On the fly, I pointed out the analogy between software preservation and e-journal preservation, the other area of preservation that handles expensive commercial content. Three approaches have emerged in e-journal preservation:
    • LOCKSS implements a model in which each purchaser preserves their own purchase. This does not look like it deprives the vendor of income.
    • CLOCKSS implements a model analogous to software escrow, in which software that is no longer available from the vendor is released from escrow. This does not look like it deprives the vendor of income.
    • Portico implements a subscription model, which could be seen as diverting income from the vendor to the archive. Critical to Portico's origin was agreement to this model by Elsevier, the dominant publisher. Other publishers then followed suit.
    This suggests that persuading dominant software publishers to accept a business model is critical.

Open Knowledge Foundation: Open Knowledge Finland Summer 2016 Update

Tue, 2016-10-04 12:00

This blog post is part of our summer series featuring chapter updates from across the Open Knowledge Network and was written by the team of Open Knowledge Finland.

Summer is a great time in Finland. It’s so sunny that everyone seems to be on holiday! However, there was no time for extended holidays at Open Knowledge Finland – we had a very busy summer. Here is our update for the Network, with key news from the last few months.

Open Knowledge Finland has a new board!

One of the most exciting changes this year was our annual meeting. OKFFI held its annual meeting on Monday May 30 at the Helsinki office. Nearly 40 people (well over 10% of members) attended face-to-face or online – quite a good number, in fact!

Antti ‘Jogi’ Poikola was unanimously selected to continue as the chairman. The new board consists of 3 old members (Jogi, as well as Lilli Linkola and Mika Honkanen) and no less than 5 new members – Susanna Ånäs, Liisi Soroush, Raoul Plommer, Mikael Seppälä and Jessica Parland-von Essen. In its first meeting, each board member was assigned a primary and secondary role as follows:

Antti Poikola – chairman and  web communications

Mika Honkanen – vice chairman and  2nd treasurer

Lilli Linkola – secretary and working group contact

Mikael Seppälä – treasurer and working group contact

Raoul Plommer – web communications and tools and international relations

Susanna Ånäs – internal communications and international relations

Liisi Soroush – collaboration networks and member secretary

Jessica Parland-von Essen – external communications  and collaboration networks

With the new board, it is nice to see the gender split is at 50-50. It is also a great sign that there are a lot of people who want to apply for the board (13 candidates) and that we have great new people aboard to help steer the community. Congratulations and good luck to the board!

Open Knowledge Finland is growing!

Currently, 8 people are employed by Open Knowledge Finland. However, this number will soon decrease slightly as projects are ending. This year we have had a number of new people joining us – Emilia Hjelm, John Sperryn, and Konsta Happonen. Previously active members such as Heidi Laine and Mika Honkanen have received part-time contracts. On average, we have about 4-5 FTE in staff.

In terms of finances, we have managed to grow at a good pace – from just under 200k EUR in 2014 to about 300k EUR in 2015 – and we are still on the rise, with nearly 500,000 EUR in total turnover expected in 2016. The funding is short-term, fragmented and diverse, which is both exciting and a cause of concern for longer-term planning.

Open Knowledge Finland currently has over 350 members – and hosts an Open Knowledge Network of nearly 4000 people in Finland.

MyData 2016 gathered about 700 international people to Helsinki – and accelerated the movement for human-centric personal data

2016 is the year of MyData. Open Knowledge Finland is all about the free flow of information. Open data, open knowledge, open collaboration – and, we believe this also includes free (user-controlled) flow of personal information. The MyData movement encompasses concepts and tools not only to build more transparent societies – but also to develop effective services and create new business in the digital domain.

Actions around the MyData conceptual framework represent the BIGGEST concentration of effort for us this year. In particular, Open Knowledge Finland’s key actions for the fall of 2016 were geared towards the MyData 2016 conference (31 Aug – 2 Sep) and the Ultrahack MyData hackathon running in parallel with the conference.

We had some 700 visitors in total – over 500 conference visitors, over 100 hackers or hack visitors, and over 30 partner organisations involved. Amazingly, we had 140+ speakers in 40+ sessions. Visitors came from about 30 countries. The feedback has been excellent – a great result for a first-time conference!

Check out the event images on the Flickr pool: Conference video archive is available at . Please stay tuned to and @mydata2016 on Twitter. More wrap-ups and posts to follow. And yes, MyData 2017 is on the drawing board! Follow @MyData2017 to keep up on the plans for next year!

That’s not all, folks!

In addition to MyData, many of our 9 working groups have interesting ongoing projects, ranging in size, duration and scope. In a nutshell, here are a few of the active ones:

The 3 year EU project “D-CENT” (Democracy WG) is wrapping up soon. D-CENT is a Europe-wide project creating privacy-aware tools and applications for direct democracy and economic empowerment. Together with citizens and developers, we are creating a decentralised social networking platform for large-scale collaboration and decision-making. Contact :

Yhtäköyttä (Democracy WG), “Common knowledge practices in research and decision-making”, is our first project for the Finnish Government’s analysis and assessment of research activities (VN TEAS), coordinated by the Prime Minister’s Office (VNK). The aim of the project is to find out what kinds of tools and methods could be used in government in order to utilize knowledge management and research data better and to improve evidence-based decision making. This project will involve theoretical study, 30+ interviews and 4 experiments in new tools and methods, such as data visualization, open impact assessment, real-time document editing and real-time fact-checking. Contact:

Cost-effective utilization of open data and basic registers: The research project’s goal is to better understand and measure the impacts of open data and the use of the basic public registers. As an outcome, we expect policy recommendations and suggestions for new methods, processes or technical changes to help improve cost-efficient publishing of open data and increase the impact of the basic registers. Contact ;

Open Citizen Science: Citizen science has most notably been used as a method for creating observational data for life science research. Against the backdrop of current technological advancement, we need to create Citizen Science v 2.0 – open, diverse, responsible, dialogic and academically excellent. In terms of outcomes, we envision a set of concrete recommendations for national stakeholders; we will create understanding, awareness and discussion about citizen science as a scientific method and a community; and we will catalyze a research agenda for a new kind of open citizen science. Contact:

Creative Commons Licensing Support: As Creative Commons licenses are the official recommended license for open data in the Finnish governmental sector, awareness and instructions for using them in practice are needed across many sectors of society, including for public open bids, content creation subcontracting, and data purchasing. Contact:

Other projects…to be updated in the next blog! See also summary of OK Finland projects in a few slides.

Get in touch!

During the autumn, we will also be having an extra general meeting and plan to change our rules to better accommodate for scaling. Stay tuned – more to follow!

Want to get in touch? Contact executive director Teemu Ropponen, or the international collaboration team: board members Raoul Plommer & Susanna Ånäs.

LITA: Using Google Statistics for your Repository

Mon, 2016-10-03 19:06

Don’t miss this new LITA webinar!

Beyond Usage Statistics: How to use Google Analytics to Improve your Repository

Presenter: Hui Zhang
Tuesday, October 11, 2016
11:00 am – 12:30 pm Central Time

Register Online, page arranged by session date (login required)

Librarians and repository managers are increasingly asked to take a data-centric approach to content management and impact measurement. Usage statistics, such as page views and downloads, have been widely used for demonstrating repository impact. However, usage statistics alone restrict your capacity to identify user trends and patterns, such as how many visits come from crawlers, originate from mobile devices, or are referred by a search engine. Knowing these figures will help librarians optimize digital content for better usability and discoverability. This 90-minute webinar will teach you the concepts of metrics and dimensions, along with hands-on activities on how to use Google Analytics (GA) with library data from an institutional repository. Be sure to check the details page for takeaways and prerequisites.

Details here and Registration here

Hui Zhang is the Digital Application Librarian at Oregon State University Libraries and Press. He has years of experience in generating impact reports with major platforms such as DSpace and Hydra Sufia using Google Analytics or a local statistics index. Other than repository development, his interests include altmetrics, data visualization, and linked data.

And don’t miss other upcoming LITA fall continuing education offerings:

Social Media For My Institution: from “mine” to “ours”
Instructor: Plamen Miltenoff
Starting Wednesday October 19, 2016, running for 4 weeks
Register Online, page arranged by session date (login required)

Online Productivity Tools: Smart Shortcuts and Clever Tricks
Presenter: Jaclyn McKewan
Tuesday November 8, 2016
11:00 am – 12:30 pm Central Time
Register Online, page arranged by session date (login required)

Questions or Comments?

For questions or comments, contact LITA at (312) 280-4268 or Mark Beatty,

John Mark Ockerbloom: Forward to Libraries update, and some thoughts on sustainability and scale

Mon, 2016-10-03 18:35

It’s been a while since I posted about Forward to Libraries, but if you’ve been following my Github repo, you may have noticed that it’s had a steady stream of updates and growth.  If making connections across library collections or between libraries and Wikipedia interests you, the service is more comprehensive and wide-ranging than ever.  Here’s an update:

Number of libraries

We now support forwarding to over 1,000 library systems worldwide, running dozens of off-the-shelf and custom-developed catalog and discovery systems.  I’ve expanded coverage in all 50 states, Canada, the UK, and Australia, and have also added links to more countries in all inhabited continents.  (Because the system currently focuses on Library of Congress headings, it works best with Anglo-American catalogs, but it yields acceptable results in some other catalogs as well, like the big Norway research library union catalog I just added today.)  While major research libraries and big-city public libraries are well-represented, I’ve also been trying to add HBCUs, community colleges, rural library networks, and other sometimes-overlooked communities.  And I respond quickly to requests from users to add their libraries.

Wikipedia coverage

Forward to Libraries can be used to make links between library collections, and to and from my millions-strong Online Books Page listings.  But it’s also known for its interoperation with Wikipedia, where it knows how to link between over half a million distinct Library of Congress name and subject headings and their corresponding English language Wikipedia articles.  The majority of these mappings are name mappings provided by OCLC’s VIAF, which in its most recent data dump includes over 485,000 VIAF identifiers that include both a Library of Congress Name Authorities identifier and an English Wikipedia article link.  An additional 50,000 or so topical, demographic, and geographical LC subject headings are also mapped.  These mappings derive from exact matches, algorithmic matching for certain kinds of heading similarities, and a manually curated mappings file that has grown over time to include more than 22,000 correspondences.
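To make the mapping concrete, here is a minimal Python sketch of loading a headings-to-Wikipedia lookup table and turning a match into an article URL. The tab-separated file format and its name are assumptions for illustration only; the actual FTL data files on Github and the VIAF dump format are not reproduced here.

# A minimal sketch, assuming a hypothetical tab-separated mappings file with one
# "LC heading<TAB>Wikipedia article title" pair per line.
import csv

def load_mappings(path):
    """Build a dict from LC headings to English Wikipedia article titles."""
    mappings = {}
    with open(path, encoding="utf-8") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if len(row) >= 2:
                heading, article = row[0], row[1]
                mappings[heading] = article
    return mappings

def wikipedia_url(article_title):
    """Turn an article title into an English Wikipedia URL."""
    return "https://en.wikipedia.org/wiki/" + article_title.replace(" ", "_")

# Example (hypothetical file name):
# mappings = load_mappings("lcsh_to_wikipedia.tsv")
# print(wikipedia_url(mappings.get("Library science", "Library science")))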

What you can do with this data

If you’re browsing a topic or an author’s works on the Online Books Page, you can follow links to searches for the same topic or author in any of the 1000+ libraries currently in my dataset.  (If your library’s not one of them, just ask me to add it.)  If the topic or the author is covered in one of the 500,000 corresponding Wikipedia articles the system knows of, you’ll also be offered a link to the relevant article.

If you’re browsing certain Wikipedia articles, you’ll also find links from them back to library searches in your favorite of those 1000+ library systems (or any other of those systems you wish to search).  Right now those links use templates that must be manually placed, so there are only about 2500 Wikipedia articles with those links, but any Wikipedia editor can add the templates to additional articles.  (A bot could potentially add more quickly, but that would require some negotiation with the Wikipedia community that I haven’t undertaken to date.)  If you’re involved in a Wikipedia-library collaboration project (like this one), you may want to add one of these templates when editing articles on topics that are likely to have relevant source materials in multiple libraries.  (The most common template used is the Library Resources Box, generally added to the External Links or Further Reading section of an article.)

If you’re interested in offering a similar library or Wikipedia linking service from your own catalog or discovery system, I’d be interested in hearing from you.  You can either point to my forwarding service (using a standard linking syntax) or implement your own forwarder based on my code and data on Github.  Right now it requires some effort and expertise to implement either method, but I’m happy to work with interested libraries or developers to make forwarding easier to implement.

Scaling and sustaining the service

The Forward to Libraries service still runs largely as a 1-person part-time project.  (The Wikipedia templates are largely placed by others, and the service fundamentally depends on regularly updated data sets from OCLC, the Library of Congress, and Wikipedia, but I maintain the code and coordinate the central referral database myself.)

Part-time, 1-person projects raise some common questions:  “Will they be sustained?”  and “Can they scale?”  I wasn’t sure myself of the answers to those questions when I started developing this service.  Fortunately, I went ahead and introduced it anyway, and three-going-on-four years later, I’m happy to say that the basic service *is* in fact more sustainable and scalable than I’d thought it might be.   The code is fairly simple, and doesn’t require a lot of updating to keep running.  The main scale issues for the basic service have to do with the number of library systems and the number of topic mappings in the system, and those are both manageable.

The number of libraries turns out to be the more challenging factor to manage.  Libraries change their catalogs and discovery systems on a regular basis, and when they do, search links that worked for their old catalogs often don’t work for the new ones.  I have an automated tool that I run occasionally to flag libraries that return an error code to my search links; it’s not very sophisticated, but it does alert me to many libraries whose profiles I need to update, without my having to check all of them manually.  (If you find any libraries where forwarding is no longer working, you can also use the suggestion form to alert me to the problem.)  The more pressing scaling problem at the moment is the user experience: right now, when you’re asked to choose a library, the program shows a list of links to all 1000+ libraries currently in the system.  That can be a bit much to handle, especially for users whose data is metered.  Updating the library choice form to only show local libraries after the user selects the state, province or country they’re in will cut down on the data sent to the user; that may cost the user an extra click, but the tradeoff seems worth it at this point.
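The automated check could look something like the following sketch; this is not the actual tool, and the test URLs in the libraries dict are hypothetical. It simply flags any library whose search link fails or returns an HTTP error so its profile can be reviewed by hand.

# A minimal sketch of flagging library systems whose test search URLs return an
# HTTP error, so their profiles can be reviewed.
from urllib.request import Request, urlopen
from urllib.error import HTTPError, URLError

libraries = {
    "Example Public Library": "https://catalog.example.org/search?q=test",  # hypothetical
}

def check_library_links(libraries, timeout=15):
    """Return the library names whose test search link did not return 200 OK."""
    broken = []
    for name, url in libraries.items():
        request = Request(url, headers={"User-Agent": "ftl-link-check"})
        try:
            with urlopen(request, timeout=timeout) as response:
                if response.status != 200:
                    broken.append(name)
        except (HTTPError, URLError, OSError):
            broken.append(name)
    return broken

if __name__ == "__main__":
    for name in check_library_links(libraries):
        print("Needs review:", name)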

The number of topic mappings, on the other hand, has been easier to manage than I’d thought.  VIAF publishes updated data files for names about once a month, and I can run a script over it to automatically update my service’s name heading mappings when a new file comes out. Likewise, Wikipedia is now providing twice-a-month dump files of its English language encyclopedia.  I can download one of Wikipedia’s files in a couple of hours, and then run a script that flags any topical articles I map to that have gone away, changed their title, or redirected to other articles.  I can then change my mappings file accordingly in under an hour. Library of Congress subject headings change as well, but they don’t change very fast.  New and changed topical headings are published online about once a month, and one can generally add or change mappings from one of their updates within a few hours.  I spend a few spare-time hours each month adding mappings for older subject headings, so if the current rate of LCSH growth holds, in theory I could cover *all* LCSH topical headings in my manual mappings file after some number of years.  In practice, I don’t have to do this, especially as topical mappings also start to get collaboratively managed in places like Wikidata.  (I’m not currently working with their data set, but hope to do so in the future.)
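As a rough illustration of that dump check, here is a sketch that assumes the current article titles and redirect targets have already been extracted from the Wikipedia dump into a set and a dict; parsing the dump itself is left out.

# A minimal sketch of flagging mapped articles that have gone away or now
# redirect elsewhere, given pre-extracted dump data.
def flag_stale_mappings(mappings, current_titles, redirects):
    """Report mapped articles that no longer exist or redirect to another title."""
    problems = []
    for heading, article in mappings.items():
        if article in redirects:
            problems.append((heading, article, "redirects to " + redirects[article]))
        elif article not in current_titles:
            problems.append((heading, article, "no longer exists"))
    return problems

# Example with made-up data:
# mappings = {"Library science": "Library science", "Old heading": "Old title"}
# current = {"Library science", "New title"}
# redirects = {"Old title": "New title"}
# for heading, article, issue in flag_stale_mappings(mappings, current, redirects):
#     print(heading, "->", article, ":", issue)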

Broadening the service

While I can maintain and gradually grow the basic service with my current resources, broadening it would require more.  Not every library, particularly outside the US, uses Library of Congress Subject Headings, so it would be nice to offer mappings to more subject ontologies and languages.  Similarly, not everyone likes to work with Wikipedia (often with good reason), and it’d be nice to support links to and from alternative knowledge bases as well.  The basic architecture of Forward to Libraries is capable of handling multiple library and web-based ontologies, but additional coding and data maintenance would be required.  There are also various ways to build on the service to engage more deeply with libraries and current topics of interest; I’ve explored some ideas along these lines, but haven’t had the time to implement them.

Things continuing as they are, though, I should be able to maintain and grow the basic Forward to Libraries service for quite some time to come.  I’m thankful to the people and the data providers that have made this service possible. And if you’re interested in doing more with it, or helping develop it in new directions, I’d be very glad to talk with you.

LibUX: User Experience Debt

Mon, 2016-10-03 14:59

An IA Summit presentation by Andrew Wright that demonstrates a way to think about the gap between the current user experience of a site and its potential.

User Experience Debt: Creating Awareness and Acting on Missed Opportunities