We’re pleased to announce that several free webinars are scheduled for the first three months of 2016. Mark your calendars!

Date/Time: 26 January 2016 (14:00-17:00 UTC / 09:00-12:00 EST)
Speaker: Owen Stephens
Title: Installing OpenRefine

This webinar will be an opportunity for folks to see how OpenRefine can be installed and to get help doing so, and serves as preparation for the webinar in March. There will also be folks on hand in the Mashcat Slack channel to assist.
The registration link for each webinar will be communicated in advance. Many thanks to Alison Hitchens and the University of Waterloo for offering up their Adobe Connect instance to host the webinars.
Early in the project their advisory board strongly encouraged them to focus on emulation as a strategy, advice that they followed. Their work thus parallels to a considerable extent the German National Library's (DNB's) use of Freiburg's Emulation as a Service (EaaS) to provide access to their collection of CD-ROMs. The Cornell team's contribution includes surveys of artists, curators and researchers to identify their concerns about emulation because, as they write:
emulation is not always an ideal access strategy: emulation platforms can introduce rendering problems of their own, and emulation usually means that users will experience technologically out-of-date artworks with up-to-date hardware. This made it all the more important for the team to survey media art researchers, curators, and artists, in order to gain a better sense of the relative importance of the artworks' most important characteristics for different kinds of media archives patrons. The major concern they reported was experiential fidelity:
Emulation was controversial for many, in large part for its propensity to mask the material historical contexts (for example, the hardware environments) in which and for which digital artworks had been created. This part of the artwork's history was seen as an element of its authenticity, which the archiving institution must preserve to the best of its ability, or lose credibility in the eyes of patrons. We determined that cultural authenticity, as distinct from forensic or archival authenticity, derived from a number of factors in the eyes of the museum or archive visitor. Among our survey respondents, a few key factors stood out: acknowledgement of the work's own historical contexts, preservation of the work's most significant properties, and fidelity to the artist's intentions, which is perhaps better understood as respect for the artist's authority to define the work's most significant properties. As my report pointed out (Section 2.4.3), hardware evolution can significantly impair the experiential fidelity of legacy artefacts, and (Section 3.2.2) the current migration from PCs to smartphones as the access device of choice will make the problem much worse. Except in carefully controlled "reading room" conditions, the Cornell team significantly underestimates the problem:
Accessing historical software with current hardware can subtly alter aspects of the work's rendering. For example, a mouse with a scroll wheel may permit forms of user interactivity that were not technologically possible when a software-based artwork was created. Changes in display monitor hardware (for example, the industry shift from CRT to LED display) bring about color shifts that are difficult to calibrate or compensate for. The extreme disparity between the speed of current and historical processors can lead to problems with rendering speed, a problem that is unfortunately not trivial to solve. They overestimate a different part of the problem when they write:
emulators, too, are condemned to eventual obsolescence; as new operating systems emerge, the distance between "current" and "historical" operating systems must be recalculated, and new emulators created to bridge this distance anew. We attempted to establish archival practices that would mitigate these instabilities. For example, we collected preservation metadata specific to emulators that included documentation of versions used, rights information about firmware, date and source of download, and all steps taken in compiling them, including information about the compiling environment. We were also careful to keep metadata for artworks emulator-agnostic, in order to avoid future anachronism in our records. If the current environment they use to access a digital artwork is preserved, including the operating system and the emulator the digital artwork currently needs, future systems will be able to emulate the current environment. Their description of the risk of emulator obsolescence assumes we are restricted to a single layer of emulation. We aren't. Multi-layer emulations have a long history, for example in the IBM world, and in the Internet Archive's software collection.
Ilya Kreymer's oldweb.today shows that another concern the Cornell team raise is also overblown:
The objective of a 2013 study by the New York Art Resources Consortium (NYARC) was to identify the organizational, economic, and technological challenges posed by the rapidly increasing number of web-based resources that document art history and the art market. One of the conclusions of the study was that regardless of the progress made, "it often feels that the more we learn about the ever-evolving nature of web publishing, the larger the questions and obstacles loom." Although there are relevant standards and technologies, web archiving solutions remain costly, and harvesting technologies do not yet have the maturity to completely capture the more complex cases. The study concluded that there need to be organized efforts to collect and provide access to art resources published on the web. The ability to view old web sites with contemporary browsers provided by oldweb.today should allay these fears.
Ultimately, as do others in the field, the Cornell team takes a pragmatic view of the potential for experiential fidelity, refusing to make the best be the enemy of the good.
The trick is finding ways to capture the experience - or a modest proxy of it - so that future generations will get a glimpse of how early digital artworks were created, experienced, and interpreted. So much of new media works' cultural meaning derives from users' spontaneous and contextual interactions with the art objects. Espenschied et al. point out that digital artworks relay digital culture and "history is comprehended as the understanding of how and in which contexts a certain artifact was created and manipulated and how it affected its users and surrounding objects."
I’ve posted an update for all versions – changes noted here:
- Windows/Linux Changelog: http://marcedit.reeset.net/software/update.txt
- Mac OSX Changelog: http://marcedit.reeset.net/software/mac.txt
The significant change was a shift in how the linked data processing works. I’ve shifted from hard-coded logic to a rules file. You can read about that here: http://blog.reeset.net/archives/1887
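To illustrate the general pattern (not MarcEdit's actual rules format, which the linked post documents), a minimal sketch of driving linked-data lookups from an external rules file rather than hard-coded logic might look like this; the field tags, service names, and JSON layout here are illustrative assumptions:

```python
# Hypothetical sketch of config-driven linked-data processing: which
# reconciliation service to query is read from a rules file instead of
# being hard-coded, so it can be changed without touching the program.
import json

# Stand-in for the contents of an external rules file.
RULES_JSON = """
[
  {"field": "100", "subfield": "a", "service": "id.loc.gov/authorities/names"},
  {"field": "650", "subfield": "a", "service": "id.loc.gov/authorities/subjects"}
]
"""

def load_rules(text):
    """Parse the rules file into a lookup table keyed by MARC field tag."""
    return {rule["field"]: rule for rule in json.loads(text)}

def service_for(rules, field_tag):
    """Return the service configured for a MARC field, or None if unmapped."""
    rule = rules.get(field_tag)
    return rule["service"] if rule else None

rules = load_rules(RULES_JSON)
print(service_for(rules, "100"))  # id.loc.gov/authorities/names
print(service_for(rules, "245"))  # None
```

Adding support for a new field then means editing the rules file, not recompiling the tool, which is the practical benefit of the shift described above.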
If you need to download the file, you can get it from the automated update tool or from: http://marcedit.reeset.net/downloads.
The next version of Mirlyn (mirlyn.lib.umich.edu) is going to take some time to create, but let's take a peek under the hood and see how the next generation of search will work.
This post is a brief overview of the process in designing for large web-based systems. This includes understanding what makes up an interface and how to start fresh to create a good foundation that won't be regrettable later.
From the OR2016 Fedora Interest Group program committee
Join us for the Fedora Interest Group sessions at Open Repositories 2016 in Dublin to meet other Fedora users and developers and share your experiences with one another. If you are new to Fedora you will find many opportunities to learn more about the repository software from your peers in the open source community.
This year’s Fedora Interest Group track will showcase presentations, panels, demonstrations, and project updates. Some of the central themes include:
From the OR2016 DSpace Interest Group program committee
The DSpace community will meet again at the 11th Open Repositories Conference, 13–16 June, in Dublin, Ireland. The DSpace Interest Group program committee invites your contributions to share, describe and report on use of the DSpace platform, outlining novel experiences or developments in the construction and use of DSpace repositories. Whether you’re a developer, researcher, repository manager, administrator or practitioner, we would like to hear from you.
I think it can be easy to look around a library — especially a smooth-running one — and forget that the work that gets done there ranges from the merely difficult to the incredibly complex. This isn’t the sort of stuff just anyone can do, no matter how well-meaning and interested they might be, which is why there are specialized degree programs designed to turn out inventive and effective experts.
I’m talking, of course, about the accountants. And computer programmers. And instructional designers. And usability experts.
And, oh, yeah, the librarians.
A double standard?
There’s a temptation among librarians (and programmers too, of course, and an awful lot of professors) to think that the world consists of two types of work:
- Stuff only we can do, and
- Everything else
If I were to head off to a library school for a semester and take a single course on cataloging, my colleagues would be understandably worried about dropping me next to the ILS with a stack of new books. A single group project looking broadly at research methodologies doesn’t qualify me for … well, for anything, inside the library or not.
But I often see librarians with only half a semester of programming, or a survey course on usability testing (never mind actual UX), or experience in a group project where they got stuck with the title Project Manager take on (or, often, be thrust into) actual professional roles to do those things.
The unspoken, de facto standard seems to be, “We can teach a librarian to do anything, but we can’t or won’t teach anyone else to do Real Librarian work.”
Subject-matter expertise is not overall expertise
I’m lucky enough to work in a ginormous academic library, where we’re not afraid to hire specialists when warranted. And yet, even here, there persists the curious belief that librarians can and often should do just about everything.
This leads me to what I believe is a Truth That Must Be Spoken:
A committee of four interested and well-meaning librarians is not equivalent to a trained expert with actual education and experience.
There’s a reason most disciplines separate out the “subject-matter expert” (SME) from the other work. Instructional Designers are trained to do analysis, study users and measure outcomes, and work with an SME to incorporate their knowledge into a useful instructional product. The world at large differentiates between web design, content management, and quality assurance. And the first time you work with a real project manager, you’ll come to the stark realization that you’ve never before worked with a real project manager, because the experience is transformative.
Knowing the content and culture makes you a necessary part of a complete intervention. It doesn’t make you the only necessary part.
A question of value
“But Bill,” you’re saying after doing a quick check to see what my name is, “we don’t have the money to hire experts in everything, and besides, we’re dedicated to growing those sorts of expertise within the library profession.”
I’m not against that — who could be against that? But I do worry that it exemplifies an attitude that the value the library really offers is essentially embodied in the sorts of things librarians have been doing for a century or more — things that only librarians can do — and everything else that happens in a library adds notable but ultimately marginal value to the patrons.
That’s not true. The website, the instructional and outreach activities, increasingly complicated management, and (the big one these days) contract negotiation with vendors are all hugely important to the library, and arguably have a much bigger impact on the patrons as a group than, say, face-to-face reference work, or original cataloging. I know our digital environment is used orders of magnitude more than our physical plant, up to and including the actual librarians. Not all users are (or should be) valued equally, but when the zeros start stacking up like that, you should at least take a hard look at where your resources are being spent compared to where your patrons are deriving most of the value.
It’s great if you can get a librarian with the skills needed to excel at these “other” things. But when you put a near-novice in charge of something, you’re implicitly saying two things:
- This isn’t all that important to do well or quickly, which you can tell because we put you, a novice, in charge of it, and
- The work you were doing before isn’t that important, because we’re willing to pay you to try to learn all this stuff on-the-job instead of whatever you were doing before.
If there’s an eyes-wide-open assessment of the needs of the institution and they decide in favor of internal training, then that’s great. What I’m railing against is starting a project/program/whatever with the implicit attitude that the “library part” is specialized and hard, and that we don’t really care if everything else is done well, agilely, and quickly, because it’s essentially window dressing.
What to do?
Unfortunately, librarianship is, as a discipline, constantly under attack by people looking for a simple way to cut costs. I worry this has the unfortunate side effect of causing librarians as a culture to close ranks. One way this manifests itself is by many institutions requiring an MLS for just about any job in the library. I don’t think that’s in anyone’s interest.
Are you better off hiring another librarian, or a programmer? Should you move someone off their duties to do system administration (almost certainly badly), or should you cut something else and outsource it? Do you have any idea at all if your instructional interventions have lasting impact? If not, maybe it’s time to hire someone to help you find out.
The days when the quality of a library’s services depended almost exclusively on the librarians and the collection are behind us. It takes a complex, heterogeneous set of knowledge and expertise to provide the best service you can for as many patrons as you can. And maybe, just maybe, the best way to gather those skills is to hire some non-librarians and take advantage of what they know.
Librarians deserve to be valued for their expertise, education, and experience. So does everyone else.
Here’s the first web course in the LITA spring 2016 offerings:
Which Test for Which Data: Statistics at the Reference Desk
Instructor: Rachel Williams, PhD student in the School of Library and Information Studies at UW-Madison
Offered: February 29 – March 31, 2016
A Moodle based web course with asynchronous weekly content lessons, tutorials, assignments, and group discussion.
This web course is designed to help librarians faced with statistical questions at the reference desk. Whether assisting a student reading through papers or guiding them when they brightly ask “Can I run a t-test on this?”, librarians will feel more confident facing statistical questions. This course will be ideal for library professionals who are looking to expand their knowledge of statistical methods in order to provide assistance to students who may use basic statistics in their courses or research. Students taking the course should have a general understanding of mean, median, and mode.
- Develop knowledge related to statistical concepts, including basic information on what the goals of statistical tests are and which kinds of data scales are associated with each, with a focus on t-tests, correlations, and chi-square tests.
- Explore different kinds of statistical tests and increase ability to discern between the utility of different types of statistical tests and why one may be more appropriate than another.
- Increase literacy in evaluating and describing statistical research that uses t-tests, correlations, and chi-square tests.
- Improve confidence in answering questions about statistical tests in a reference setting, including explaining tests and results, assisting users in determining which statistical tests are appropriate for a dataset, and helping others analyze graphical representations of statistics.
Rachel Williams is a PhD student in the School of Library and Information Studies at UW-Madison. Rachel has several years of experience in public and academic libraries and is passionate about research design and methods. She has also taught courses at SLIS on database design, metadata, and social media in information agencies. Rachel’s research explores the constraints and collaborations public libraries operate within to facilitate access to health information and services for the homeless.
February 29 – March 31, 2016
- LITA Member: $135
- ALA Member: $195
- Non-member: $260
Moodle login info will be sent to registrants the week prior to the start date. The Moodle-developed course site will include weekly new content lessons and is composed of self-paced modules with facilitated interaction led by the instructor. Students regularly use the forum and chat room functions to facilitate their class participation. The course web site will be open for 1 week prior to the start date for students to have access to Moodle instructions and set their browser correctly. The course site will remain open for 90 days after the end date for students to refer back to course material.
Register Online, page arranged by session date (login required)
Mail or fax form to ALA Registration
call 1-800-545-2433 and press 5
Questions or Comments?
For all other questions or comments related to the course, contact LITA at (312) 280-4268 or Mark Beatty, firstname.lastname@example.org
The following is a guest post by Jeffrey Erickson, National Digital Stewardship Resident at the University Archives and Special Collections at UMass Boston. He participates in the NDSR-Boston cohort.
I am a recent graduate of Simmons College’s School of Library and Information Science as well as a current participant in this year’s Boston cohort of the National Digital Stewardship Residency (NDSR) program. At Simmons, I focused my studies on archiving and cultural heritage informatics. In the NDSR program, I am excited to continue learning as I work on a digital preservation planning project at the University Archives and Special Collections (UASC) at UMass Boston.
My project involves developing a digital preservation plan for the digital objects of the UASC project called the “Mass. Memories Road Show (MMRS)”. Because the UASC operates with limited IT support, hosted technical systems and services are used wherever possible. Therefore, this project is testing the use of ArchivesDirect, a hosted digital preservation solution that combines the Archivematica digital preservation workflow tool and the DuraCloud storage service.
The project is divided into three phases:
- Research and Practice
- Research digital preservation concepts, good practices, tools and services
- Assess the digital stewardship landscape at UMass Boston
- Examine digitization practices and the digital asset management systems employed by the University Archives and Special Collections
- Review and Testing
During the Review and Testing phase I will be collaborating with Archives staff, Library leadership and University stakeholders to:
- Further develop/refine workflows which prepare UASC for continuing digitization projects
- Develop policies and procedures to implement long-term preservation of holdings using cloud-based storage services
- Review and test new policies and procedures
- Implementation and Final Reporting
- Apply the new digital preservation policies and procedures to the MMRS digital objects
- Perform digital preservation tasks
- Assign technical and preservation metadata
- Generate and verify fixity information to ensure data integrity
- Create Archival Information Packages (AIPs), upload them to DuraCloud service
- Other tasks as necessary
- Prepare a final report documenting the project, the procedures and my recommendations
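The "generate and verify fixity information" task above can be sketched in a few lines. This is only a generic illustration of the technique (compute a checksum per file, store it, re-check later); Archivematica records comparable values in its own PREMIS metadata, and the manifest format here is an assumption for the example:

```python
# Minimal fixity sketch: build a manifest of SHA-256 checksums for a
# directory of digital objects, then re-verify it later to detect any
# files whose bytes have changed (corruption, truncation, tampering).
import hashlib
from pathlib import Path

def sha256_of(path):
    """Stream the file through SHA-256 so large videos never load whole into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def build_manifest(directory):
    """Map each file's path (relative to the directory) to its checksum."""
    root = Path(directory)
    return {str(p.relative_to(root)): sha256_of(p)
            for p in root.rglob("*") if p.is_file()}

def verify(directory, manifest):
    """Return the relative paths whose current checksum no longer matches."""
    current = build_manifest(directory)
    return [name for name, checksum in manifest.items()
            if current.get(name) != checksum]
```

Running `verify` on a schedule against a stored manifest is the core of an ongoing fixity-checking procedure: an empty result means the collection's bytes are intact.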
The Mass. Memories Road Show is an ongoing community-based digital humanities project operated by UMass Boston’s Archives and Special Collections. The project captures stories and photographs about Massachusetts communities contributed by its residents. The goal of the project is to build communities and create a collection of images and videos collected at public events (“Road shows”) for educational purposes. Members of the community participate by contributing their personal photographs, which are digitized by UMass Boston Archives staff and volunteers. Additionally, participants may be photographed or video recorded telling their stories. The collected images and videos are then processed and uploaded to the UASC’s CONTENTdm repository. The digital objects in this collection require digital preservation because they are original materials that cannot be replaced if they were to become lost or damaged. To date the collection consists of more than 7,500 photographs and 600 video interviews, with several hundred added each year.
The initial Research Phase of the project has been completed. I have gathered a lot of information about digital preservation concepts and good practices. I have evaluated many digital preservation tools, including Archivematica and DuraCloud. I have familiarized myself with the Mass. Memories Road Show, documenting the digitization workflows and studying the digital asset management system in use at UMass Boston. A Road Show event was held in October on Martha’s Vineyard, so I was able to witness first-hand how the digital objects are captured and digitized.
The primary deliverables for the Research Phase of the project were a Digital Content Review (DCR) and a GAP Analysis. The DCR defines the scope of the digital objects in the collection and assesses the collection's future growth and digital preservation needs. The GAP Analysis considers the current digitization practices and the digital asset management system and compares them to the digital preservation requirements as outlined in the OAIS Reference Model. The GAP Analysis and the parameters of the project dictate that digital preservation efforts be concentrated on preparing digital objects for ingest and implementing archival storage.
Moving forward, I will be working with Archives staff and ArchivesDirect consultants to address the following issues.
- Identifying tools and developing procedures for generating and reviewing fixity information to ensure the authenticity and data integrity of the collection.
- Refining the workflows to determine how best to integrate the new systems, Archivematica and DuraCloud, with the current digital asset management system centered on the CONTENTdm repository.
- Developing local storage options to ensure that multiple copies of the collection are preserved in archival storage.
- Determining procedures and solutions for incorporating descriptive, technical and preservation metadata requirements into the digital preservation workflow.
- Creating an exit strategy to ensure digital preservation can continue in the event that any of the hosted services become unavailable for any reason.
I am looking forward to moving from the research phase to the practice phases of the project and applying the information I have gathered to the tasks involved in the preservation process. I anticipate that the most challenging elements standing between me and a successful completion of the project will be figuring out how Archivematica, CONTENTdm and DuraCloud can best be configured to work together and how to manage the metadata requirements. Integrating the three systems to work together will require me to gain an in-depth understanding of how they work and how they are configured. As a “systems” guy, I look forward to taking a look under the hood. Similarly, I am looking forward to gaining a stronger understanding of how to manage and work with technical and preservation metadata.
I feel like I have learned so much through the first half of my NDSR project. But I realize that there is still a lot to do and even more to learn. Thank you for your interest in my project.
We're taking this week's blog spot to highlight a nifty little tool for Islandora that anyone can adopt: YUDLbot. Short for York University Digital Library bot, the YUDLbot was written by Nick Ruest to take objects from York's Islandora repository and tweet them hourly, using the object description as the body of the tweet and linking back to the object in the repo. Randomly trolling through such an extensive repository turns up some pretty fun things, such as:
Image of a cat with a cast on its leg, laying in a basket. https://t.co/duQT9jBvGW— YUDLbot (@YUDLbot) January 18, 2016
Or my personal favourite:
Blurry image of closeup of surface. https://t.co/i8suwEdS93— YUDLbot (@YUDLbot) January 14, 2016
The parameters the bot uses to select objects can also be tweaked further, spawning new bots like YUDLcat and YUDLdog. The code behind all of this is available on GitHub, along with some quick instructions on how to customize it for your own repo. The University of Toronto Scarborough has its own YUDLbot-based Twitterbot and there is an unofficial bot tweeting from the collection of the Toronto Public Library. Why not take the code for a spin and start tweeting your repo?
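The actual YUDLbot code is on GitHub; the sketch below only illustrates the pattern described above: pick a random object, use its description as the tweet body, and link back to the object in the repo, trimming to Twitter's 140-character limit. The record structure, PID, and URL path here are illustrative assumptions, not YUDLbot's real data model:

```python
# Hypothetical sketch of a YUDLbot-style repository Twitterbot.
import random

def compose_tweet(record, base_url, limit=140):
    """Build a tweet from an object's description plus a link back to it."""
    link = f"{base_url}/islandora/object/{record['pid']}"
    room = limit - len(link) - 1  # reserve space for the separating space
    description = record["description"]
    if len(description) > room:
        # Trim long descriptions so the link always survives intact.
        description = description[:room - 1] + "…"
    return f"{description} {link}"

def pick_object(records):
    """Select a random object; swapping in a filter here is how variants
    like YUDLcat (cats only) or YUDLdog (dogs only) are spawned."""
    return random.choice(records)

records = [{"pid": "yul:12345",
            "description": "Image of a cat with a cast on its leg, laying in a basket."}]
print(compose_tweet(pick_object(records), "https://digital.library.yorku.ca"))
```

A real bot would fetch records from the repository's API, post the result via a Twitter client, and run on an hourly schedule, but the compose-and-link step is the heart of it.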
Jim Cramer, who coined the “FANG” acronym as a descriptor for the high-flying Facebook, Amazon, Netflix, and Google group of tech stocks that have dramatically outperformed the market …. In fact, though, Cramer was more right than he apparently knows: the performance of the FANG group is entirely justified because of the underlying companies, or, to be more precise, because the underlying companies are following the exact same playbook. Ben Thompson
We read The FANG Playbook on Stratechery by Ben Thompson, who explains how controlling users’ entry points into a market category enables them to exert control over that user experience and subsequently control — like a dam in a river — what happens in that market.
There is a clear pattern for all four companies: each controls, to varying degrees, the entry point for customers to the category in which they compete. This control of the customer entry point, by extension, gives each company power over the companies actually supplying what each company “sells”, whether that be content, goods, video, or life insurance. Ben Thompson
Libraries aren’t so dissimilar. I wrote in The Library Interface about how looking at the library as an intermediary touchpoint between what patrons want and that content / product / service shows the value of being an access point and designing for that role:
These are the core features of the library interface. Libraries absorb the community-wide cost to access information curated by knowledge-experts that help sift through the Googleable cruft. They provide access to a repository of physical items users want and don’t want to buy (books, tools, looms, 3d printers, machines). A library is, too, where community is accessed. In the provision of this access anywhere on the open web and through human proxies, the library creates delight. Michael Schofield
If you like you can download the MP3.
As usual, you support us by helping us get the word out: share a link and take a moment to leave a nice review. Thanks!
The post 032 – What can libraries learn from the FANG playbook? appeared first on LibUX.
DuraSpace News: Digital Preservation Planning: An NDSR Boston Project Update featuring DuraCloud and ArchivesDirect
From The Signal: Digital Preservation Blog from Library of Congress
Last updated January 23, 2016. Created by Peter Murray on January 23, 2016.
From the announcement:
Fedora 4, the new, revitalized version of Fedora, was released into production in November 2014. Fedora 4 features include vast improvements in scalability, linked data capabilities, research data support, modularity, ease of use and more. Fedora Camp offers everyone a chance to dive in and learn all about Fedora 4.
Over the winter break I’ve been enjoying The Culture of Connectivity which was recommended by Nicholas Proferes during a talk last fall at the UMD iSchool about his doctoral research. I might have more to say when I finish it, but the chapter that examines Twitter contained a nice description of a tweet that seemed particularly relevant as Twitter considers changing from 140 to 10,000 characters.
Both the quality and quantity of tweets have been vital elements in the dispute of what constitutes Twitter’s essence during the stage of interpretive flexibility. The “tweet” is arguably Twitter’s most distinctive contribution to online culture; a sentence limited to 140 characters flagged by a hashtag has become a global format for online public commentary. This new cultural form has been adopted widely outside the platform proper, for instance in newspapers and on television. Its concise syntax and delimited length render the tweet virtually synonymous with a quote–a citation from a source for which authentication resides with the platform, not the journalist. Aside from figuring in the news the tweet has emerged as a cultural form inspiring poets and literary authors. (van Dijck, 2013, pp. 76–77)
This was my first encounter with the notion of interpretive flexibility, a relatively old idea (at least relative to social media) from Science and Technology Studies that technological artifacts can often exist in time periods that support multiple (and possibly conflicting) interpretations (Pinch & Bijker, 1984). In some ways interpretive flexibility sounds like common sense, but maybe not when you consider how easy it is to slip into thinking of technologies as fit for a particular purpose, and scientific findings as facts. I think it’s kinda interesting to consider use as a form of interpretation when it comes to software. Hacking as interpretation.

References
Dijck, J. van. (2013). The culture of connectivity: A critical history of social media. Oxford University Press. Retrieved from https://global.oup.com/academic/product/the-culture-of-connectivity-9780199970780
Pinch, T. J., & Bijker, W. E. (1984). The social construction of facts and artifacts. Social Studies of Science, 14(3), 399–441. Retrieved from http://www.jstor.org/stable/285355
Library of Congress: The Signal: Intellectual Property Rights Issues for Software Emulation: An Interview with Euan Cochrane, Zach Vowell, and Jessica Meyerson
The following is a guest post by Morgan McKeehan, National Digital Stewardship Resident at Rhizome. She is participating in the NDSR-NYC cohort.
I began my National Digital Stewardship Residency at Rhizome — NDSR project description here (PDF) — by leading a workshop for the Emulation as a Service framework (EaaS), at “Party Like it’s 1999: Emulated Access to Complex Media Collections,” a panel about software emulation organized by NDSR-NYC alum Julia Kim, at the 2015 AMIA conference in Portland, OR. Software emulation is an important tool for preservation of digital artworks because it allows researchers to experience complex digital materials in their native creation environments, and can thereby enable full access to “software dependent content,” the term offered by Euan Cochrane, Digital Preservation Manager at Yale University, for content that is integral to the overall meaning of a work, but which “requires a particular and limited range of software environments in order to be interacted with, rendered, viewed or consumed.”
The EaaS framework provides a streamlined user experience for accessing a number of different emulated software environments, and the AMIA panel provided an opportunity to examine this approach alongside other recent projects using software emulation, including computer games within the Timothy Leary Papers that are available to play at the New York Public Library Brooke Russell Astor Reading Room, Cornell University Library’s Preservation and Access Frameworks for Digital Art Objects project (PAFDAO), and the Jeremy Blake Papers at New York University. In my workshop presentation I discussed Rhizome’s collaborative development with bwFLA of a cloud-based EaaS implementation allowing users to access artworks via Rhizome’s emulator site, which can also be embedded into other websites such as blogs and social media sites. Since April 2015, Rhizome’s online EaaS implementation has allowed visitors to immerse themselves in the interactive narratives and richly detailed audio and graphics of three video games/artworks created as CD-ROMs in the 1990s by the artist Theresa Duncan.
Even with such promising technical developments for emulation as an access strategy, however, at present intellectual property (IP) rights and software licensing issues remain as primary obstacles to widespread implementation. For example, one of the case studies I examined, Cochrane’s August 2014 post in The Signal analyzing the EaaS implementation at Yale University Library, described three compelling use-case scenarios for emulation across different research contexts, but noted that even given the potential usefulness of EaaS for each project, licensing issues ultimately prevented researchers from actually using EaaS beyond their initial research stages.
Meanwhile, examining and addressing the complex challenges of software IP rights has been included as a key part of the work of the Software Preservation Network (SPN), one of the national digital platform priority projects among nine recently selected for funding by the IMLS. SPN’s project proposal (PDF) includes “articulating a business model and organizational structure through which software licensing and legitimate, ongoing non-profit re-use can operate harmoniously” as a critical step toward preserving software, and therefore a core practice within digital preservation.
To learn more about IP rights issues related to software emulation, I reached out to Euan Cochrane, and to SPN’s co-PIs: Zach Vowell, Digital Archivist at California Polytechnic State University, and Jessica Meyerson, Digital Archivist at the University of Texas at Austin, with a few questions about broad challenges within software IP rights, specific licensing scenarios they are addressing, and potential future licensing models. Their observations shed light on the range of avenues they are pursuing to further establish emulation as a viable and practical method for enabling researchers’ direct interaction with valuable software-dependent content that is locked inside legacy media or inaccessible file formats.
Morgan: The SPN’s IMLS project summary cites a 2014-15 survey of archives professionals in which 51% of survey participants identified access to licenses as an obstacle for emulated access to born-digital materials. Could you share your observations about the implications of these findings, and what approaches SPN may take to address some of these challenges?
Jessica: At the most basic level, that survey response pattern indicates that even in a scenario where staff resources and technical constraints are not considered the most significant barrier, organizations are still hesitant to endorse emulation as an institutional preservation and access strategy unless they have some way to reconcile the software licensing requirements. With no systematic software collection program, most institutions do not currently know how they would access the software titles required to provide emulated access. And in terms of systematic software collection, no single institution can possibly collect all of the software titles needed to render/access the digital objects in its collections, present and future.
So, unless we want restrictive software licensing to play a crucial role in appraisal and collection development decisions moving forward, we as a community might think about systematic software collection in terms of documentation strategy. In other words, art and design organizations could focus on creative design software such as Adobe Creative Suite, while business history collections might focus on accounting software such as QuickBooks or business intelligence software such as Tableau. Meanwhile, digital objects created in software only available as a cloud-hosted service will likely require participation from the software industry.
Morgan: How will SPN address software licensing as it relates specifically to emulation? For example, what priority might be given to the creation of allowances for emulation use-cases?
Zach: Academic licensing may provide some protection from infringement claims, but the extent of that protection remains to be seen. (That is, is it enough for an institution to show a receipt for its license purchase, and does that cover all software under its care?) Regardless, licenses will need to be well documented and linked to the license keys used to activate the software.
Ideally, we want to establish a licensing framework that directly addresses the emulation use-cases that cultural heritage organizations are facing now (and in the near future). Pursuing such a framework is a high priority for our project, and hopefully our project’s research can develop the literature on use cases for emulation and other born-digital access strategies. Of course, establishing such a framework presupposes a working relationship with software rights holders — to that end, we’ve been discussing three distinct licensing scenarios:
- rights holders are interested in collaborating on a new licensing framework
- rights holders are not interested in collaborating on a new licensing framework
- orphan works
The SPN project will analyze and document the liability for copyright infringement present in each scenario — and potentially produce best practices based on that analysis.
Morgan: Euan, at Yale, you are working to implement the EaaS framework to enable researchers’ access to digitized materials from the libraries’ collections. For example, you’ve described one project that aims to increase overall use of the wealth of (currently underused) digital materials stored on floppy disks and CD-ROMs in Yale’s general collections by making these materials directly available to users on library computers running legacy software via the EaaS framework. What kinds of licensing issues have you discovered in the course of this process, and how are you addressing them?
Euan: Since writing the Signal article, we’ve discovered that existing agreements with Microsoft allow us to use their older software in many contexts, which is great. But, while very useful, that leaves many thousands of other products that we don’t yet have any solution for; and furthermore, our existing agreement with Microsoft may need to be updated to take into account uses that aren’t covered by it. We will implement EaaS into a production environment in coming months, and start using it with the disks that are clearly allowable within that purpose and which only require either Linux or Windows to be accessed (within the terms of our current Microsoft agreement).
To make digitized content from the general collections available via the EaaS framework, we need the following:
- Either: the ability to restrict access to only as many users as we have copies of the software—something the bwFLA team have been working on recently; or: permission to make the content available to as many people as want it, regardless of how many copies we possess or own.
- Either: an understanding of any end-user license agreements (EULAs) associated with each item and what they allow us to do (where applicable); or: permission to ignore the EULAs; or: permission to use the items in this context, even when it is not mentioned in the EULAs.
- Permission to make enough copies of the data to enable this to technically happen and to properly preserve the bits.
- Sufficient software licenses to support concurrent users of the software needed to interact with the content on the disks/discs.
Morgan: Could you describe any licensing models you’re looking at which may provide effective approaches to these issues?
Euan: Tools like EaaS and cloud gaming enable software to be treated as a performance, and as a service, much as music is treated by streaming music services. As such, we could address licensing by establishing organizations to manage it similar to those that exist for music, i.e. performance rights organizations (PROs). So my suggestion is that we set up such an organization for licensing old software.
Some of the significant benefits to a performance rights model for software would be:
- Memory institutions could archive software and pay (if anything) only when it is used, not simply for archiving it.
- Appropriate (e.g. academic/non-profit) organizations might be provided with royalty-free licenses to older software, but these could be managed/validated by the PRO.
- Commercial services could go to the PRO for licenses for commercial uses.
- PROs would relieve the current inefficient situation in which each organization must make a separate access request to each vendor for each needed software product.
- PROs could deal with transnational licensing. For example, a single PRO could enable French-licensed software to be made available in other countries for use in accessing content produced in France using that software.
Morgan: Jessica, what kinds of licensing models is SPN exploring? I’m also curious about what you’re learning about how these issues factor into the use of emulation for creating “virtual reading rooms” for digital materials, and what kinds of models might inform the future development of software emulation. Are there examples of successful current implementations that you’re looking at?
Jessica: In terms of licensing models, one that has come up in our conversations (and during the 2014 SAA session) is the performing rights licensing model. Again, that model presupposes that copyright holders are willing to opt in and embrace it.
In terms of virtual reading rooms, Dragan Espenschied’s work at Rhizome with Theresa Duncan’s Zero Zero is one exciting example of current emulation projects. The 2014-2015 Born Digital Access Survey (referenced in the SPN IMLS project narrative) investigated current efforts to provide access to born-digital collection material, and interview questions specifically addressed any attempts by participants to experiment with emulation as a preservation/access method in their institutions—that data set should be available to the community in the spring. Others are working on similar research—but so far, the general impression is that institutions currently engaged in emulation projects are taking a calculated risk by relying on the section 108 exemption for libraries and archives. However, we can only apply that logic to physical installation media (either already in our collections or available by some other means, i.e., purchase or donation). And in the current era of cloud-based subscription software, even where that logic applies, procuring the actual software (once the individual or institutional subscription has lapsed) becomes part of the larger challenge.
This underscores the need for the SPN project—to serve as a mechanism for the coordination of existing but disparate collection efforts, as well as a vehicle for collective licensing negotiations. SPN’s emphasis is distinctive but complementary to the UNESCO Memory of the World PERSIST project, which is working to address the broader challenge of “sustainability for the information society” by focusing on several key areas, including the development of technical infrastructure in the form of the Software Heritage Platform (PDF).
Happy Friday everyone! This is part five of my Linked Data Series. You can find the previous posts by going to my author page. Last week I was fortunate enough to attend Mashcat 2016 in Boston. It was a wonderful one-day conference. We had some very interesting conversations aimed at breaking down communication barriers in libraries (archives and museums), and I was able to meet some fantastic professionals (and students).
In addition to attending, I also presented a talk titled Finding Aid-LD: Implementing Linked Data in a Finding Aid Environment (slides). During the presentation I identified various Linked Data publishing strategies that are currently being implemented. I thought this would be a neat topic to post here as well, so today I’m going to give you the deets on Linked Data publishing strategies.

Survey of Publishing Strategies
Note that these strategies are not mutually exclusive. You can combine these strategies for any particular solution.

Data Dump
A data dump is a zipped file or set of files that contain the complete dataset of a provider.
Somebody wants to download a provider’s full dataset for research, reuse, etc.
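Data dumps are commonly serialized as N-Triples, with one statement per line (and often gzipped). As a minimal sketch of what consuming such a dump might look like, the following Python handles only the simplest slice of the N-Triples grammar (IRI subjects/predicates, IRI or plain-literal objects, no blank nodes, language tags, or escapes); the example URIs are made up:

```python
import re

# One N-Triples statement per line. This regex covers only the simplest
# slice of the grammar: IRI subject/predicate, IRI or plain-literal object.
TRIPLE_RE = re.compile(r'^(<[^>]+>)\s+(<[^>]+>)\s+(<[^>]+>|"[^"]*")\s*\.\s*$')

def parse_ntriples(lines):
    """Yield (subject, predicate, object) tuples, skipping blanks and comments."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith('#'):
            continue
        m = TRIPLE_RE.match(line)
        if m:
            yield m.groups()

# A tiny two-line "dump" with illustrative example URIs.
dump = [
    '<http://example.org/alice> <http://xmlns.com/foaf/0.1/name> "Alice" .',
    '<http://example.org/alice> <http://xmlns.com/foaf/0.1/knows> <http://example.org/bob> .',
]
triples = list(parse_ntriples(dump))
```

For a real dump you would stream the file (e.g. via gzip.open) rather than hold it in memory, and use a proper RDF parser for full grammar coverage.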
Subject Page

A subject page is a document or set of documents that contain all the data about a resource. Subject pages are very similar to traditional metadata records. Common practice is to use content negotiation, so that when you dereference a URI the server redirects you to a human-readable or machine-readable document based on the HTTP Accept header. A newer and increasingly popular practice is to embed RDFa into HTML documents. Google and the other big search engines index RDFa and other types of embedded metadata. RDFa is becoming an added layer to content negotiation, and in many cases an alternative altogether.
A person wants to dereference a resource URI and discover new knowledge by browsing through resource links.
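To make the content-negotiation pattern concrete, here is a rough Python sketch of the server-side decision: inspect the Accept header and choose which document variant a 303 redirect should point to. The /id/-versus-/doc/ URI layout and the file extensions are illustrative assumptions, not a standard, and a real server would parse q-values rather than do substring checks:

```python
def preferred_representation(accept_header):
    """Decide which variant of a subject page to serve, based on the
    HTTP Accept header. Simplified: a real server would parse q-values
    instead of doing substring checks."""
    if "text/turtle" in accept_header or "application/rdf+xml" in accept_header:
        return "rdf"   # machine-readable document
    return "html"      # default: human-readable page

def redirect_target(resource_uri, accept_header):
    """Where the 303 redirect for a resource URI should point.
    The /id/ vs /doc/ layout and the extensions are made-up examples."""
    doc_uri = resource_uri.replace("/id/", "/doc/")
    return doc_uri + "." + preferred_representation(accept_header)
```

So a browser asking for text/html lands on the human-readable page, while an RDF-aware client asking for text/turtle is sent to the data document, both describing the same resource URI.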
SPARQL Endpoint

Triplestores are databases for storing RDF triples/data. SPARQL is a query language for RDF, and it most commonly accesses RDF data through triplestores. SPARQL can run very complex, semantic queries on RDF and can infer new knowledge from them. A SPARQL endpoint is a server access point where you run queries against a triplestore.
A researcher wants to run complex, semantic querying of the data. A reference librarian needs to perform a complex query during a reference session.
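In practice a SPARQL endpoint is just an HTTP service: under the SPARQL 1.1 Protocol, a query can travel as the percent-encoded query parameter of a GET request. A small Python sketch, using DBpedia's public endpoint and the birthName example that appears later in this post (real DBpedia literals may also carry a language tag such as @en):

```python
from urllib.parse import urlencode

def sparql_get_url(endpoint, query):
    """Build the GET URL for a query against a SPARQL endpoint,
    following the SPARQL 1.1 Protocol: the query text travels in
    the percent-encoded 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})

# The same ask as the triple-pattern example below, written as SPARQL.
query = """SELECT ?person WHERE {
  ?person <http://dbpedia.org/ontology/birthName> "Christopher Frank Carandini Lee" .
}"""

url = sparql_get_url("http://dbpedia.org/sparql", query)
```

Fetching that URL (e.g. with urllib.request) returns the result set; most endpoints also honour an Accept header or a format parameter to select JSON, XML, or CSV results.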
Triple Pattern Fragments

A relatively new strategy is Triple Pattern Fragments (TPF). TPF aims to be an efficient solution for querying RDF data. TPF breaks queries down into triple patterns (subject, predicate, object). Example:
Give me all the resources whose birthName is “Christopher Frank Carandini Lee”.
?subject <http://dbpedia.org/ontology/birthName> "Christopher Frank Carandini Lee"
There are currently two types of TPF software: TPF servers and TPF clients. The server answers simple triple pattern queries like the one shown above; the client composes such patterns to run complex, SPARQL-like queries. According to the TPF website, this approach has lower server cost and higher availability than SPARQL endpoints, which means it might be a good alternative to them. The only caveat is that the TPF client uses more bandwidth and bears a higher client-side cost.
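A TPF server's core job is to answer one triple pattern at a time, with unbound positions acting as variables. This toy Python matcher over an in-memory list of triples illustrates the idea; prefixed names are written as plain strings purely for brevity, and real TPF servers also page their results, which is omitted here:

```python
def match_pattern(triples, s=None, p=None, o=None):
    """Answer a single triple pattern, TPF-server style: a None argument
    is an unbound variable (e.g. ?subject); bound arguments must match."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# Prefixed names written as plain strings purely for brevity.
data = [
    ("ex:lee", "dbo:birthName", "Christopher Frank Carandini Lee"),
    ("ex:lee", "rdf:type", "dbo:Person"),
    ("ex:hamill", "dbo:birthName", "Mark Richard Hamill"),
]

# ?subject dbo:birthName "Christopher Frank Carandini Lee"
hits = match_pattern(data, p="dbo:birthName", o="Christopher Frank Carandini Lee")
```

A TPF client evaluates a full SPARQL query by issuing many such single-pattern requests and joining the results itself, which is why the server stays cheap while the client pays in bandwidth and computation.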
Linked Data API

A Linked Data API is an effort to expose complex RDF data as simple RESTful APIs. The only such software that I’ve found is aptly named Linked Data API. According to the documentation, it is an API layer that sits on top of a SPARQL endpoint. It can generate documents (subject pages) and run “sophisticated queries” (though I don’t think these can be as complex as SPARQL queries). I’ll confess that this is the strategy I’m least knowledgeable about, so please feel free to delve into the documentation.
- Cool URIs for the Semantic Web
- Apache Jena
- Linked Data Fragments | In depth
- linkeddata.org, Tools
- Linked Data Platform
- SPARQL By Example
I hope this gives you a good idea of the plethora of ways to publish Linked Data. If you know of any others, please list them in the comments. As always, I invite you to post questions and comments below or send them to me via email. Thanks for reading!