Planet Code4Lib - http://planet.code4lib.org

Journal of Web Librarianship: Improving Primo Usability and Teachability with Help from the Users

Thu, 2016-07-07 04:09
DOI: 10.1080/19322909.2016.1190678
Barbara Valentine

Ed Summers: DocNow's Theoretical Bricolage

Thu, 2016-07-07 04:00

Below are some remarks I made on a panel at Archival Education and Research Institute (AERI) 2016 about the Documenting the Now project.

Thank you for the opportunity to be here at AERI today. I have to apologize for not being able to present on the topic proposed by Bergis Jules a little less than a year ago. Had he been able to make it, Bergis was hoping to talk to you today about research he has been doing into how African Americans’ access to traditionally privileged digital information spaces presents archivists with opportunities and challenges around collecting and preserving social media records. In many ways that particular research was eclipsed by a project he and I have been collaborating on for the last year called Documenting the Now. When deciding whether to continue the presentation, Ricky suggested that I use this time to share some background information and preliminary work about the Documenting the Now project. For the purposes of this panel I thought I would also try to highlight some of the theoretical perspectives that I think are guiding the project. Please bear in mind that this is largely what we might call in the software world vaporware. It is research that is still in the process of becoming.

Documenting the Now

A good friend of mine once advised me to lead presentations with the punchline: to start with the thing I want the audience to remember, in case they decide to tune out, or walk out. Perhaps that window has already closed. But just in case I thought I would start by briefly describing the goals of the Documenting the Now project.

The Documenting the Now project is a two year partnership involving Washington University in St. Louis, the University of Maryland and the University of California at Riverside that will achieve three distinct but interdependent goals:

  1. To deposit a collection of 13M tweets into an institutional repository running at Washington University in St. Louis.
  2. To develop an open source application, DocNow, that allows researchers and archivists to create their own collections of Twitter and Web content.
  3. To build a community of practice and documentation around social media archiving, with particular attention to ethical practices of collecting “public” data from Twitter.

If you remember nothing else from this presentation just remember those three things we are trying to achieve. I’m going to spend the rest of my time here telling you a bit more about each of the goals, and their theoretical context.

The Dataset

Almost two years ago, Michael Brown, an 18-year-old black man, was killed by Darren Wilson, a white police officer, in Ferguson, Missouri. The killing initiated a series of ongoing protests and demonstrations in Ferguson that were then amplified by the emerging BlackLivesMatter movement. Two days after Michael Brown was killed, the Society of American Archivists annual meeting began in Washington, DC. There was much conversation among attendees about the need to document what was going on in social media and on the Web around the protests. It makes sense that this would be the case, for as Punzalan & Caswell (2016) recently wrote:

… the cause of social justice has been a topic of discussion on various archival fronts for at least 40 years. If the birth of the modern Western archival profession occurred in 1898 with the publication of the Dutch Manual, then the field has been tackling various aspects of social justice issues for nearly half of its modern history. With this trajectory, we believe that the conversation will continue in years to come. (p. 32)

Standing on these 40-year-old shoulders, Bergis and I decided to do what we could to collect the conversation on Twitter that was happening in response to the killing of Michael Brown. We collected 13 million tweets that mentioned the word “ferguson” in the two weeks following his death. The process highlighted the strengths and weaknesses of the tools that were available to us, and ultimately raised questions about the ethics of performing this collection of data that was generated by hundreds of thousands of users. Our work took a new turn in the following year as the #BlackLivesMatter movement raised awareness about the deaths of Eric Garner, Tamir Rice, Walter Scott, Freddie Gray, Sandra Bland and Samuel DuBose; we collected Twitter conversations for many of these. Indeed, the work continues today, as awareness was raised on Twitter about the killings of Alton Sterling and Philando Castile just days ago.

Bergis and I wrote about this data collection work on Medium in the On Archivy publication, which tapped into an existing wellspring of interest in the role of archives in social justice, and the emerging BlackLivesMatter movement. The analysis of Jackson & Foucault Welles (2015) as well as Freelon, Mcilwain, & Clark (2016) has since confirmed the pivotal role that Twitter played in incubating and building awareness about the killing of Michael Brown. They write:

Without focusing on the first week of the Ferguson network and further unpacking the network by day, we would not have been able to see the important influence of key crowdsourced elites and members of American counterpublics. In particular, our data spotlight the discursive labor of initiators and other influential everyday citizens, most of whom were young and/or African-American, who pushed the larger public sphere to address what happened to Michael Brown and offered ideological interpretations of Brown’s death and resulting events firmly situated in minority experiences with state oppression. (Jackson & Foucault Welles, 2015, pp. 412–413)

This Ferguson dataset is what we are planning to deposit in some form into the Fedora based institutional repository at Washington University in St. Louis.

The Application

We performed data collection using a utility I had previously created called twarc. I think it’s fair to say that twarc is a user unfriendly tool. It runs (assuming you can install it in the first place) from the command line, which isn’t really accessible to a large number of users. There are a variety of other tools available, but none that would allow us to easily search tweets that had already been sent, and save the richly structured data available from the Twitter API. Realistically, twarc was also ready-to-hand since I was already familiar with it, and time was of the essence.
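
To give a flavor of the tool, here is a minimal sketch of collecting search results with twarc's Python interface, roughly as it worked at the time (the exact class name and arguments may differ between versions, and the credentials are placeholders):

    import json

    from twarc import Twarc

    # Placeholder credentials from a registered Twitter application
    t = Twarc("consumer_key", "consumer_secret",
              "access_token", "access_token_secret")

    # Page through the search API for every available tweet mentioning
    # "ferguson", writing one JSON tweet per line
    with open("ferguson.jsonl", "w") as fh:
        for tweet in t.search("ferguson"):
            fh.write(json.dumps(tweet) + "\n")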

Time was of the essence because of some peculiarities of the Twitter API. Only tweets from the past seven days are available via the Twitter search API. In addition you can only request 100 tweets at a time, and those requests can only be issued 180 times every 15 minutes. These quotas or rate limits cap how many tweets can be requested in a day: 100 tweets × 180 requests yields 18,000 tweets per 15-minute window, and with 96 such windows in a day that is 1,728,000 tweets. If you do not observe these limits your requests for data will be denied, and your application can potentially be blocked by Twitter. So we were working against the clock to collect the tweets before they fell out of the 7-day window.
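
To make those constraints concrete, here is a rough sketch of a collection loop against the v1.1 search endpoint; the credentials are placeholders, error handling is omitted, and twarc does all of this (and more) for you:

    import time

    import requests
    from requests_oauthlib import OAuth1

    SEARCH_URL = "https://api.twitter.com/1.1/search/tweets.json"
    WINDOW = 15 * 60           # rate-limit window, in seconds
    REQUESTS_PER_WINDOW = 180  # search API quota per window

    def collect(query, auth):
        """Page backwards through search results, pausing between windows."""
        params = {"q": query, "count": 100}
        while True:
            for _ in range(REQUESTS_PER_WINDOW):
                resp = requests.get(SEARCH_URL, params=params, auth=auth)
                statuses = resp.json().get("statuses", [])
                if not statuses:
                    return  # nothing older is left within the 7-day window
                yield from statuses
                # Ask for tweets older than the last one we saw
                params["max_id"] = statuses[-1]["id"] - 1
            time.sleep(WINDOW)  # sit out the window rather than be denied

    auth = OAuth1("consumer_key", "consumer_secret",
                  "access_token", "access_token_secret")
    for tweet in collect("ferguson", auth):
        print(tweet["id_str"])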

While it is by no means a secret, understanding social media platforms, and how to work with them as they change and evolve over time is not knowledge that is widely held by archivists and researchers. Bergis and I became convinced that there was an opportunity to build an application that would empower users to create these collections for themselves. In addition to collecting the Twitter conversation we also wanted to build on the work of Rollason-Cass & Reed (2015) at the Internet Archive in using the Twitter conversation as an appraisal tool in archives of Web content.

In his recent critical analysis of Web 2.0 technologies Proferes (2016) offers an interpretation of social media platforms using Braman (2006)’s concept of information power, where power is realized through an individual’s or collective’s ability to make choices about how information about them is collected and shared.

What is important is that access to information about how this part of the platform works creates the possibility for the individual to make a choice. Choice creates the possibility for the expression of informational power. These possibilities are closed off when users do not have the basis of informational power from which to enter these fields of action.

I think Proferes’ application of information power is a useful critical lens through which to view our application development. We know in this post-Snowden environment that powerful entities already have the ability to collect and analyze large amounts of social media content. We want the DocNow application to inform and empower archivists, researchers and content creators in the building of social media archives.

The Community

Finally, and perhaps most importantly, we want to build a community of practice around the ethical collection of content like the Ferguson dataset from Twitter and the Web. This work began in small part at the Maryland Institute for Technology in the Humanities, where we hosted a series of four BlackLivesMatter Teach-Ins around the Ferguson dataset. We also sought out a partnership with Washington University in St. Louis, since their work on the Documenting Ferguson project complements the work we were doing in social media.

As the Documenting the Now project took shape, and we put our proposal to the Mellon Foundation together, Bergis assembled an advisory board of 20 individuals coming from a variety of backgrounds: sociologists, political scientists, archivists, software developers, and journalists. We will be meeting for the first time in St. Louis on August 21-23 in order to explore together what shape the DocNow application could take using a prototype we have been working on for the past few months. We will also be joined by a group of activists who used social media during the Ferguson protests. The involvement of activists and our advisory board in the design of DocNow is central to our work, and is informed by two strands of theory.

The first is Value Sensitive Design, where the ethical values of direct and indirect stakeholders are factored into the design. In particular, Shilton (2012)’s work on value levers is an important conceptual tool for opening and sustaining conversations about values while allowing those conversations to inform the design.

The second is the application of a feminist ethics of care recently outlined by Caswell & Cifor (2016) where relations between the archivist, record creators, subjects, users and communities are marked by an attention to the contingencies of mutual responsibility. As they say:

We cannot ethically continue to conceive of our primary users as academic scholars; survivors of human rights abuse and victims’ families use records, community members use records. We need to build policies, procedures, and services with these users in mind, but even more so, we need to shift our affective orientations in service to these users.

We see the activists who generated this content in social media as key participants in the design of DocNow. Hopefully a year from now we will have more to report about how this fusion of theoretical ideas plays out in the form of the DocNow application and community. If any of this is of interest we welcome your feedback in our Slack channel, which currently has over 100 members. Don’t worry, not all of them are active at the same time. Also, please join us for the livestream portions of our St. Louis meeting in August. We will be sending out information about that via our newsletter. I’ll be around here at AERI till Tuesday, so please find me if you want to learn anything more about the project.

References

Braman, S. (2006). Change of state: Information, policy and power. MIT Press.

Caswell, M., & Cifor, M. (2016). From human rights to feminist ethics: Radical empathy in the archives. Archivaria, 82, 23–43.

Freelon, D., Mcilwain, C. D., & Clark, M. D. (2016). Beyond the hashtags: #Ferguson, #BlackLivesMatter, and the struggle for online justice. Center for Media & Social Impact. Retrieved from http://www.cmsimpact.org/sites/default/files/beyond_the_hashtags_2016.pdf

Jackson, S. J., & Foucault Welles, B. (2015). #Ferguson is everywhere: Initiators in emerging counterpublic networks. Information, Communication & Society, 1–22.

Proferes, N. (2016). Web 2.0 user knowledge and the limits of individual and collective power. First Monday, 21(6). Retrieved from http://firstmonday.org/ojs/index.php/fm/article/view/6793

Punzalan, R. L., & Caswell, M. (2016). Critical directions for archival approaches to social justice. Library Quarterly, 86(1), 25–42.

Rollason-Cass, S., & Reed, S. (2015). Living movements, living archives: Selecting and archiving web content during times of social unrest. New Review of Information Networking, 20(1-2), 241–247.

Shilton, K. (2012). Values levers: Building ethics into design. Science, Technology & Human Values, 0162243912436985.

SearchHub: Lucidworks View 1.2 Ready for Download

Wed, 2016-07-06 21:22

A new version of our UX framework Lucidworks View is ready for download!

View is an extensible search interface designed to work with Fusion, allowing for the deployment of an enterprise-ready search front end with minimal effort. View has been designed to use the power of Fusion query pipelines and signals, and provides essential search capabilities including faceted navigation, typeahead suggestions, and landing page redirects.

New Features

  • Support for the range facet type
  • Multi-select faceting, meaning you can facet on multiple items within a single query
  • A ‘clear all’ button on facet lists, clearing all selected facets
  • Improved signals that now pass in more information, including position, language, and platform
  • An enhanced signals service, adding additional functionality
  • Improved field display in templates by setting a max-length
  • And several bug fixes, including fixing typeahead with a query profile or pipeline, and proper truncation for fixed field values

Lucidworks View 1.2 is available for immediate download at http://lucidworks.com/products/view

Read the release notes or documentation, learn more on the Lucidworks View product page, or browse the source on GitHub.


District Dispatch: 3D printing and educational equity at ALA Annual

Wed, 2016-07-06 19:22

Guest Blogger Rebeccah Baker served as the Student-to-Staff Program participant for the ALA Office for Information Technology Policy (OITP) at the 2016 ALA Annual Conference. Rebeccah completed her M.L.S. at the University of Maryland’s College of Information Studies in May 2016.

Model of a DNA double helix. Image from the Benetech Diagram Center.

“3D printing is the latest wave of the digital revolution,” as stated by OITP’s senior information policy analyst Charlie Wapner during the session “3D Accessibility Synergy: Anchor Institutions ‘Make’ Opportunities for Diverse Learners” at the 2016 American Library Association (ALA) Annual Conference held in Orlando, Florida. This OITP session focused on the forward-thinking partnership between libraries, museums, schools and the Silicon Valley-based social advocacy organization Benetech. Dr. Lisa Wadors Verne, program manager of Education, Research, and Partnerships for Benetech discussed the organization’s aspiration to level the playing field for learners with disabilities. Imagine 3D printing a model of DNA at your library and having the opportunity to witness learners, regardless of their level of ability, better understand the model through multimodal learning.

Benetech’s Diagram Center, which is primarily funded by the Office of Special Education Programming in the U.S. Department of Education, is leading the effort to create learning tools and strategies that help individuals with print and other disabilities more easily grasp complex science, technology, engineering, and mathematics (STEM) topics. The center focuses on taking the images and information within books and bringing that information directly into the hands of the learners through tactile representations. Tactile representations are traditionally expensive learning tools, but 3D printed alternatives provide the benefits of being both customizable and low-cost. The organization is actively working with publishers to make books that are born-accessible. Examples of these tactile representations include:

  • Haptic feedback technology used for charts
  • Sonification for bell curve illustrations
  • 3D printed models

Benetech has created a Repository of Accessible 3D Objects that is actively expanding, as well as a Quick Start Guide to 3D Printing and Accessible Education, which is designed for institutions with new makerspace programs and is constantly updated to remain relevant.

OITP wants library staff to feel confident with the knowledge and skills needed to develop their makerspaces. Wapner has developed a tip sheet, “Progress in the Making: An Introduction to 3D Printing and Public Policy”; a white paper, “Progress in the Making: 3D Printing Policy Considerations through the Library Lens”; and the report “Toward a More Printed Union.” Libraries possess three unique characteristics that provide the optimal environment for people to explore 3D printing:

  • Non-judgmental work spaces
  • Ubiquity, with libraries in communities across the country
  • Informal-learning labs that promote experimentation and creativity

The number of public libraries that provide access to 3D printing has rapidly increased in the United States. OITP is deeply involved in advocacy on this front, since the topic involves copyright, patent, trademark, product liability, and intellectual freedom issues. “3D printing represents the present, not just the future,” said Wapner. 3D-printed tactile representations offer learners with disabilities a way to build self-confidence by creating accessible models of complex STEM topics, making this technology an invaluable asset to any library.

As a recent M.L.S. graduate, a first-time ALA conference attendee, and a soon-to-be Federal employee, attending OITP’s conference sessions as the University of Maryland’s Student-to-Staff representative was an overwhelmingly positive learning experience. OITP’s sessions gave me the opportunity to learn about how policy leaders are addressing real-world challenges in education, the digital disconnect, and entrepreneurship with innovative solutions and collaborative partnerships with libraries. These sessions reminded me of the importance of our profession and why I chose to pursue a career in this field.


DPLA: DPLA Workshop: DPLA for Genealogy and Family History, July 26, 2016 at 3:00pm EST

Wed, 2016-07-06 15:00

Following the recent announcement of our partnership with FamilySearch, we are pleased to invite our extended community to attend a free DPLA workshop webinar — DPLA for Genealogy and Family History, taking place on July 26, 2016 at 3:00pm EST.

In this hour-long workshop webinar, DPLA Community Reps and experienced genealogists Tamika Maddox Strong and Amy Johnson Crow will introduce DPLA as a resource for family historians, genealogists, and anyone interested in learning about new research skills and cool collections in DPLA.  You will leave this workshop with new tips for searching for family names in DPLA and exploring resources in your family’s hometown or region.  We’ll also take a peek at some of the collections that may prove invaluable for your family research: yearbooks, letters, newspapers, military records, family Bibles and more!  With content from 2,000 libraries, archives, and museums across the country, DPLA also offers a unique opportunity to dig deeper and add context to the lives of our ancestors. The workshop is free and open to all so whether you are new to DPLA, new to genealogy, or highly experienced, the workshop will have something for you.  

Click here to register

DPLA Workshops are online learning opportunities highlighting subjects central to our community, such as education, metadata, technology, copyright, and more. These events are open to the public (registration required).  To hear about all upcoming workshop announcements, sign up for our mailing list.

Banner image: “Family Tree” ca. 1580-1651 from the collections of The New York Public Library.

David Rosenthal: Talk at JISC/CNI Workshop

Wed, 2016-07-06 15:00
I was invited to give a talk at a workshop convened by JISC and CNI in Oxford. Below the fold, an edited text with links to the sources.

Thanks to funding from the Mellon Foundation, I spent last summer researching and writing, on behalf of the Mellon and Sloan Foundations and IMLS, a report entitled Emulation & Virtualization as Preservation Strategies. Time allows only a taste of what is in the report and of subsequent developments, which you can find on my blog, linked from the text of this talk.

Migration and emulation were the two techniques identified in Jeff Rothenberg's seminal 1995 Ensuring the Longevity of Digital Documents. He came down strongly in favor of emulation. Despite this, migration has been overwhelmingly favored. The emulators were never a problem; they have been part of the mainstream since the early days of IBM computers. But emulation was thought to be restricted to hackers such as Nick Lee, who put MacOS on his Apple Watch, and Hacking Jules, who put Nintendo64 on his Android Wear. What has changed is that emulation frameworks have been developed that present emulations as a normal part of the Web. You don't even need to know you're seeing an emulation.
Theresa Duncan CD-ROMs

From 1995 to 1997 Theresa Duncan produced three seminal feminist CD-ROM games, Chop Suey, Smarty and Zero Zero. Rhizome, a project hosted by the New Museum in New York, has put emulations of them on the Web. You can visit http://archive.rhizome.org/theresa-duncan-cdroms/, click any of the "Play" buttons and have an experience very close to that of playing the CD on MacOS 7.5. This has proved popular. For several days after their initial release they were being invoked on average every 3 minutes.

These demos were pre-recorded using Kazam and a Chromium browser on my Acer C720 Chromebook running Ubuntu 14.04.
What Happened?

What happened when I clicked Smarty's Play button?
  • The browser connects to a session manager in Amazon's cloud, which notices that this is a new session.
  • Normally it would authenticate the user, but because this CD-ROM emulation is open access it doesn't need to.
  • It assigns one of its pool of running Amazon instances to run the session's emulator.
  • Each instance can run a limited number of emulators. If no instance is available when the request comes in it can take up to 90 seconds to start another.
  • It starts the emulation on the assigned instance, supplying metadata telling the emulator what to run.
  • The emulator starts.
  • After a short delay the user sees the Mac boot sequence, and then the CD-ROM starts running.
  • At intervals, the emulator sends the session manager a keep-alive signal. Emulators that haven't sent one in 30 seconds are presumed dead, and their resources are reclaimed to avoid paying the cloud provider for unused resources (sketched below).
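
Purely as an illustration of that last bit of bookkeeping (this is not bwFLA's actual code, and release_instance is a hypothetical callback), the session manager's side might look something like:

    import time

    KEEPALIVE_TIMEOUT = 30  # seconds of silence before an emulator is presumed dead

    last_seen = {}  # session id -> timestamp of the last keep-alive

    def keepalive(session_id):
        """Record that an emulator checked in."""
        last_seen[session_id] = time.time()

    def reap_dead_sessions(release_instance):
        """Reclaim instances whose emulators have stopped checking in,
        so we are not paying the cloud provider for unused resources."""
        now = time.time()
        for session_id, seen in list(last_seen.items()):
            if now - seen > KEEPALIVE_TIMEOUT:
                release_instance(session_id)
                del last_seen[session_id]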
bwFLA

Rhizome, and others such as Yale, the DNB and ZKM Karlsruhe, use technology from the bwFLA team at the University of Freiburg to provide Emulation As A Service (EAAS). Their GPLv3-licensed framework runs in "the cloud" to provide comprehensive management and access facilities wrapped around a number of emulators. It can also run as a bootable USB image or via Docker. bwFLA encapsulates each emulator so that the framework sees three standard interfaces:
  • Data I/O, connecting the emulator to data sources such as disk images, user files, an emulated network containing other emulators, and the Internet.
  • Interactive Access, connecting the emulator to the user using standard HTML5 facilities.
  • Control, providing a Web Services interface that bwFLA's resource management can use to control the emulator.
The communication between the emulator and the user takes place via standard HTTP on port 80; there is no need for a user to install software, or browser plugins, and no need to use ports other than 80. Both of these are important for systems targeted at use by the general public.
VisiCalc

In 1979 Dan Bricklin and Bob Frankston launched VisiCalc for the Apple ][, the world's first spreadsheet. You can run it on an emulated Apple ][ by visiting https://archive.org/details/VisiCalc_1979_SoftwareArts and clicking the power button. Some of the key-bindings are strange to users conditioned by decades of Excel, but once you've found the original VisiCalc reference card, it is perfectly usable.

What Happened?

The Apple ][ emulator isn't running in the cloud, as bwFLA's does. Instead, it is running inside my browser. The emulators have been compiled into JavaScript, using emscripten. When I clicked on the link to the emulation, metadata describing the emulation, including the emulator to use, was downloaded into my browser, which then downloaded the JavaScript for the emulator and the system image for the Apple ][ with VisiCalc installed.
Emularity

This is Emularity, the framework underlying the Internet Archive's software library, which currently holds nearly 36,000 items, including more than 7,300 for MS-DOS, 3,600 for Apple, 2,900 console games and 600 arcade games. Some can be downloaded, but most can only be streamed.

The oldest is an emulation of a PDP-1 with a DEC 30 display running the Space War game from 1962, more than half a century ago. As I can testify, having played this and similar games on Cambridge University's PDP-7 with a DEC 340 display seven years later, this emulation works well.

The quality of the others is mixed. Resources for QA and fixing problems are limited; with a collection this size problems are to be expected. Jason Scott crowd-sources most of the QA; his method is to see if the software boots up and if so, put it up and wait to see whether visitors who remember it post comments identifying problems, or whether the copyright owner objects. The most common problem is the sound.

It might be thought that the performance of running the emulator locally, with the added layer of emulation that the JavaScript virtual machine represents, would be inadequate, but this is not the case for two reasons. First, the user's computer is vastly more powerful than an Apple ][ and, second, the performance of the JavaScript engine in a browser is critical to its success, so large resources are expended on optimizing it.
The Internet is for Cats

The Internet is for cats. Well, no, the Internet is for porn. But after porn, it is for cats. Among the earliest cats to colonize the Internet were Nijinski and Pavlova, who were in charge of Mark Weiser and Vicky Reich. On 11 Jan 1995 Mark put up their Web page, and here it is from the Wayback Machine. The text and images are all there and the links work. Pretty good preservation.
The Internet was for Cats

But when Mark put it up, it looked different.

Here is the same page from the Wayback Machine viewed with NCSA Mosaic 2.7, a nearly contemporary browser on a nearly contemporary Linux system, courtesy of Ilya Kreymer's oldweb.today. The background and the fonts are quite different. In some cases this can be important, so this is even better preservation.
oldweb.today

Here is the BBC News front page from 1999 in Internet Explorer 4.01 on Windows. oldweb.today uses Docker to assemble an appropriate OS and browser combination, emulates them, and uses Memento (RFC7089) to aggregate the contents of what are now about 15 Web archives, for each resource in the page choosing to retrieve it from the archive which has the version collected closest to the requested time.
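
The datetime negotiation Memento defines is plain HTTP. As a sketch, assuming the Wayback Machine's TimeGate URL layout, asking for the BBC front page as it existed in late 1999 looks like:

    import requests

    # A Memento TimeGate: the Wayback Machine's URL layout (an assumption here)
    timegate = "http://web.archive.org/web/http://news.bbc.co.uk/"

    resp = requests.get(
        timegate,
        headers={"Accept-Datetime": "Fri, 01 Oct 1999 00:00:00 GMT"},
    )

    # The TimeGate redirects to the capture closest to the requested time
    print(resp.url)                              # URL of the selected memento
    print(resp.headers.get("Memento-Datetime"))  # when that capture was made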
Use cases

I've shown you three different emulation frameworks with three different use cases, implemented in three different ways:
  • Old CD-ROMs, emulated via a Web service framework.
  • Historically important software, emulated in your browser using JavaScript.
  • Preserved Web content, emulated using Docker container technology.
I'm sure many of you will have other use cases. For us, the use case we're looking at arrived late last year, when the feed the CLOCKSS archive ingests from one of the major publishers suddenly got much bigger. We looked into it, and discovered that among the supplementary material attached to some papers were now virtual machines. If that content is ever triggered, we need to be able to run those VMs under emulation.

If you care about reproducibility of in silico science, it isn't enough to archive the data, or even the data plus the source code of the analysis software. The results depend on the entire stack of software, all the libraries and the operating system.

The Olive project at CMU has the data and source code for CHASTE 3.1, a simulation package for biology and physiology from 2013. But the system can only run on one specific version of Ubuntu Linux, 12.04. Even recent scientific software has complex dependencies that require archiving the binaries and emulating them.
How do you use emulation?

How do you go about creating a Web page containing an emulation like the ones I've shown you? At a high level, the stages are (a scripted sketch follows the list):
  • Create a bootable disk image in the format your emulation framework needs, which is typically QEMU's "qcow2". It should contain the binaries you want to run installed in the operating system they need.
  • Configure suitable hardware to boot the image by specifying the CPU type, the amount of memory, and the peripherals and their contents, such as CD-ROM .iso images. Express this configuration in the metadata format used by your emulation framework.
  • Add the disk image and the configuration metadata to a Web server.
  • Embed the necessary links to connect them into a "landing page" for the emulation such as the ones I've shown you.
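
Here is the promised sketch of the first two stages, shelling out to QEMU's standard tools; the image size, machine settings and metadata layout are placeholders, since each framework defines its own format:

    import json
    import subprocess

    # Stage 1: create an empty qcow2 disk image; the operating system and
    # binaries would then be installed into it by booting an installer
    subprocess.run(["qemu-img", "create", "-f", "qcow2", "disk.qcow2", "2G"],
                   check=True)

    # Stage 2: describe the hardware to emulate. This metadata layout is a
    # made-up placeholder, not any framework's actual format
    config = {
        "cpu": "i386",
        "memory_mb": 128,
        "drives": [{"file": "disk.qcow2"}],
        "cdrom": "cdrom.iso",
    }
    with open("emulation.json", "w") as fh:
        json.dump(config, fh, indent=2)

    # A local test boot of the configured machine
    subprocess.run(["qemu-system-i386", "-m", "128",
                    "-hda", "disk.qcow2", "-cdrom", "cdrom.iso", "-boot", "d"])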
Technical Issues

This sequence sounds pretty simple, and it soon should be. But right now there are some technical issues:
  • You need tools to create disk images, and they aren't currently that easy to use.
  • You need tools to create the configuration metadata. The bwFLA team and the DNB have had considerable success automating the process for CD-ROMs, but for other use cases the tools need a lot of work.
  • The way each framework embeds its emulations in a Web page is different and incompatible. The links are to specific emulation instances. Over time emulation technology will improve, and these links will break, rendering the landing pages useless. We need a standard way to embed emulations that leaves the browser to figure out how best to do the emulation, an emulation mime-type and an "emul.js" by analogy with "pdf.js".
Following on from my report, efforts are under way to address these and some other technical issues such as the very tricky issues that crop up when you connect an emulation of old software to the Internet.
Legal Issues

The big barrier to widespread adoption of emulation for preservation is legal. Open source software is not a problem, but proprietary software is protected in two overlapping ways, by copyright and by the End User License Agreement. In theory copyright eventually expires, but the EULA never does. Copyright controls whether you can make and keep copies, such as those in disk images. EULAs vary, but probably control not merely copying but also running the software. And, since the software stack consists of multiple layers, each with its own copyright and EULA, you are restricted to the intersection of them all.

There are a few rays of hope. Microsoft academic site licenses these days allow old Microsoft software to be copied for preservation and to be run for internal use. UNESCO's PERSIST is trying to engage major software vendors in a discussion of these legalities. The Internet Archive's massive software collection operates similarly to the DMCA's "safe harbor" provision, in that if the copyright owner objects the emulation is taken down. Objections have been rare, but this is really old software and mostly games. In theory, companies do not lose money because someone preserves and lets people run really old software. In practice, there are two reasons why their lawyers are reluctant to agree to this, the "not one inch" copyright maximalist ethos, and the risk for individual lawyers of making a career-limiting move.
Conclusion

Especially as container technology takes over the mainstream of IT, it is likely that over the next few years it will become evident that migration-based preservation strategies are obsolete.

David Rosenthal: The Major Threat is Economic

Tue, 2016-07-05 15:00
I've frequently said that the major threat to digital preservation is economic; back in 2013 I posted The Major Threat is Economic. We are reminded of this by the announcement last March that:
The future of the Trove online database is in doubt due to funding cuts to the National Library of Australia.

Trove is the National Library's system:
In 2014, the database's fifth year, an estimated 70,000 people were using the website each day.

Australia Library and Information Association chief executive Sue McKarracher said Trove was a visionary move by the library and had turned into a world-class resource.
...
"If you look at things like the digital public libraries in the United States, really a lot of that came from looking at our Trove and seeing what a nation could do investing in a platform that would hold museum, gallery and library archives collections and make them accessible to the world."

FOSS4Lib Recent Releases: VuFind - 3.0.2

Tue, 2016-07-05 14:37
Package: VuFind
Release Date: Tuesday, July 5, 2016

Last updated July 5, 2016. Created by Demian Katz on July 5, 2016.

Minor bug fix release.

Library Tech Talk (U of Michigan): Designing for the Library Website

Tue, 2016-07-05 00:00

This post is a brief overview of the process of designing for large web-based systems. This includes understanding what makes up an interface and how to start fresh to create a good foundation that won't be regretted later.

DuraSpace News: VIVO Updates for July 8–Welcome RSP Thomson Reuters, Summertime Meetings/Events

Tue, 2016-07-05 00:00

From Mike Conlon, VIVO project director

Thomson Reuters is a VIVO Registered Service Provider! Need help with your VIVO implementation? Thomson Reuters can help you plan and implement VIVO at your institution. For more information about Thomson Reuters services, please contact Ann Beynon at Thomson Reuters.

Thomson Reuters joins Symplectic and Gunter Media Group as VIVO Registered Service Providers.

Mita Williams: Ex Libris

Fri, 2016-07-01 20:49

Last week, as Europe staggered from the implications of the Brexit referendum, I was in Denmark on vacation with most of my nights free to read about the Existentialists and how their lives were torn asunder by the violence we now call history.

I enjoyed my copy of At the Existentialist Cafe very much and I’m hoping to pass it on to a friend or even my local library if they would have it. But before I do, I’m going to add my very own bookplate.

I don’t have any bookplates yet as still I haven’t decided on the design. I’m hoping to materialize a handful of ideas and choose the best one(s) for printing.

Years ago I suggested that every librarian should write a book. That was clearly too big of an ask. So I would like to use my first post here on my new blog to suggest that everyone should make their own bookplate for their books.

I am not suggesting that you should do so for the benefit of future historians, libraries, or book collectors.

I’m suggesting you do so because you have a history that is worth commemorating in your own expression.

District Dispatch: Fourth Feted with FOIA Reform!

Fri, 2016-07-01 13:38

After nearly a decade of advocacy by ALA, other activists, and government officials, the President's signature last night finally gave us and the nation a long-needed update to FOIA, just in time to celebrate FOIA's 50th birthday on July 4th. Now we can all celebrate our nation's independence with the knowledge that our country will have an even more transparent and open government! The Founders would be pleased!

At a time when not much seems to be getting done in Congress, ALA was thrilled to see this important step forward for government transparency! The FOIA Improvement Act:

  • Codifies the “presumption of openness” for government documents for future administrations;
  • Harnesses technology to improve the FOIA process;
  • Limits to 25 years the time that agencies may keep records of their internal deliberations confidential; and
  • Increases the effectiveness of the FOIA by strengthening the Office of Government Information Services (created in the last FOIA reform bill, the OPEN Government Act of 2007)

When the Senate bill was first introduced on February 2, 2015, at the start of this Congress, we were hopeful that FOIA reform would progress swiftly.  While that was not quite the case (the Senate bill passed on March 15th of this year and the House adopted its version on May 13th), Congress still successfully reconciled the two bills and sent the version just signed into law to the President on June 22nd.

The White House has said that they will release guidance for compliance with the new law later this year and has committed to setting up a centralized online request portal for all federal agencies by next year. Watch this space for more details as they become available and, in the meantime, have an extra-happy Fourth of July!


State Library of Denmark: 2D visualization of high dimensional word embeddings

Fri, 2016-07-01 11:58

In this blog post I tried to make a method for a computer to read a text, analyse the characters, and then make a 2D visualization of the similarity of the characters. To achieve this I used the word2vec algorithm, then built a matrix of all mutual distances and fitted it into a 2D plot. The three texts I used were:

  • All 3 Lord of the Rings books
  • Pride and Prejudice + Emma by Jane Austen
  • A combined text of 35,000 free English Gutenberg e-books

Word2Vec is an algorithm invented by Google researchers in 2013. Its input is a text that has been preprocessed, as I will explain later. The algorithm then extracts all words and maps each word to a multidimensional vector, typically of 200 dimensions. Think of the quills of a hedgehog, where each quill is a word, except in more than 3 dimensions. What is remarkable about the algorithm is that it captures some of the context of the words, and this is reflected in the multidimensional vectors. Words that are somewhat similar are very close in this vector space, where ‘close’ is measured by the angle between two vectors. Furthermore, the relative positions of two words also capture a relation between them. A well-known example is that the distance vector from ‘man’ to ‘king’ is almost identical to the distance vector from ‘woman’ to ‘queen’. Using this information you are able to predict the word ‘queen’ given the three words <man, king> <woman, ?>. It is far from obvious why the algorithm reflects this behaviour in the vector space, and I have not fully understood the algorithm yet. Before you can use word2vec you have to remove all punctuation, split the text into one sentence per line, and lowercase everything. Splitting into sentences is not just splitting whenever you meet a ‘.’ character; for instance, ‘Mr. Anderson’ should not trigger a split.
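
As a sketch of this step, here is roughly what it looks like with the gensim library's Word2Vec implementation (an assumption on my part: the post does not say which implementation was used, and gensim's parameter names have changed across versions):

    from gensim.models import Word2Vec

    # One preprocessed sentence per line, tokens separated by spaces
    with open("corpus.txt") as fh:
        sentences = [line.split() for line in fh]

    # Map every word to a 200-dimensional vector
    model = Word2Vec(sentences, vector_size=200, window=5, min_count=5)

    # The classic analogy: king - man + woman should land near queen
    print(model.wv.most_similar(positive=["king", "woman"], negative=["man"]))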

First I create the multidimensional representation of the words using word2vec, which is just all the words (like a dictionary) and the vector for each word. Next I manually input the characters (or words, in fact) that I want to visualize, and calculate the matrix of all mutual distances by taking the cosine of the angle between the vectors. This gives a value between -1 and +1, which I then shift to 0 to 2 so that I have a positive distance between the words. Finally I take this distance matrix and turn it into a 2D visualization, trying to keep the distances in the 2D plot as close as possible to those in the vector space. In general this is not possible; even for 3 vectors it can be impossible (if the triangle inequality is broken). I create the plot by dividing the 2D plane into a grid and placing the first character in the middle. The next character is also easy to place, on the circle whose radius is the distance. Each following character I place, one at a time, in the grid cell that minimizes the sum of the distance errors to the characters already placed. This is a greedy algorithm that prioritizes the characters added to the plot first, which is why I plotted the main characters first and had the other characters placed relative to them.
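
A minimal sketch of the distance matrix and the greedy placement, assuming a trained gensim model as above; the grid size and the distance-to-grid scale factor are guesses rather than the post's actual values:

    import numpy as np

    def distance_matrix(model, words):
        """Pairwise distances: cosine similarity (-1..+1) shifted to 0..2."""
        n = len(words)
        dist = np.zeros((n, n))
        for i in range(n):
            for j in range(n):
                dist[i, j] = 1.0 - model.wv.similarity(words[i], words[j])
        return dist

    def greedy_place(dist, grid=70, scale=17):
        """Drop each word on the grid cell minimizing the summed error
        against its distances to the words already placed."""
        cells = [(r, c) for r in range(grid) for c in range(grid)]
        placed = {0: (grid // 2, grid // 2)}  # first word goes in the middle
        for k in range(1, dist.shape[0]):
            def error(cell):
                return sum(abs(np.hypot(cell[0] - p[0], cell[1] - p[1])
                               - dist[k, i] * scale)  # scale is assumed
                           for i, p in placed.items())
            placed[k] = min((c for c in cells if c not in placed.values()),
                            key=error)
        return placed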

I tried to use the Stanford entity extraction tool to extract both locations and persons from a given text, but there were way too many false positives, so I had to feed the algorithm the characters manually. To do it perfectly I should also have replaced each character mentioned by multiple names with a single name: Gandalf, Gandalf the Grey and Mithrandir are the same character, etc., but I did not perform this substitution. So when I select the character Gandalf I only get the contexts where he is mentioned as Gandalf and not Mithrandir.

And now let's see some of the 2D visualizations!

Lord of the Rings

0) Frodo
1) Sam
2) Gandalf
3) Gollum
4) Elrond
5) Saruman
6) Legolas
7) Gimli
8) Bilbo
9) Galadriel

______________________________________________________________________
___________________________________________3__________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
_______________________________________________1______________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
5_____________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
_______________________________________________0__________________8___
_______________________2______________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
_______________________________________6______________________________
______________________________________________________________________
______________________________________________________________________
__________________________________7___________________________________
______________________________________________________________________
______________________________________________________________________
_______________________4______________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
_____________________________9________________________________________
______________________________________________________________________

 

Jane Austen: Pride and Prejudice + Emma

0) Elizabeth (Elizabeth Bennet)
1) Wickham (George Wickham)
2) Darcy (Mr. Darcy)
3) Bourgh (Lady Catherine de Bourgh)
4) Lydia (Lydia Bennet)
5) William (Mr. William Collins)
6) Kitty (Catherine “Kitty” Bennet)
7) Emma (Emma Woodhouse)
8) Knightley (George Knightley)

__________________________________________________________________________
_________________________________________8________________________________
__________________________________________________________________________
__________________________________________________________________________
__________________________________________________________________________
__________________________________________________________________________
__________________________________________________________________________
__________________________________________________________________________
__________________________________________________________________________
___________________________1__________2___________________________________
__________________________________________________________________________
__________________________________________________________________________
__________________________________________________________________________
__________________________________________________________________________
__________________________________________________________________________
__________________________________________________________________________
__________________________________________________________________________
__________________________________________________________________________
__________________________________________________________________________
__________________________________________________________________________
__________________________________________________________________________
__________________________________________________________________________
__________________________________________________________________________
__________________________________________________________________________
__________________________________________________________________________
__________________________________________________________________________
__________________________________________________________________________
__________________________________________________________________________
__________________________________________________________________________
__________________________________________________________________________
__________________________________________________________________________
____________________________________________________________________7_____
__________________________________________________________________________
__________________________________________________________________________
__________________________________________________________________________
__________________________________________________________________________
__________________________________________________________________________
__________________________________________________________________________
__________________________________________________________________________
__________________________________________________________________________
__________________________________________________________________________
__________________________________________________________________________
__________________________________________________________________________
_____________________________________________________0____________________
________________3__________4______________________________________________
__________________________________________________________________________
__________________________________________________________________________
__________________________________________________________________________
__________________________________________________________________________
__________________________________________________________________________
__________________________________________________________________________
__________________________________________________________________________
_______________________________________________________5__________________
__________________________________________________________________________
__________________________________________________________________________
__________________________________________________________________________
__________________________________________________________________________
__________________________________________________________________________
__________________________________________________________________________
__________________________________________________________________________
__________________________________________________________________________
__________________________________________________________________________
__________________________________________________________________________
__________________________________________________________________________
___________________6______________________________________________________
__________________________________________________________________________

 

35,000 English Gutenberg books

In this plot, instead of characters, I selected different animals:

0) Fish
1) Cat
2) Dog
3) Cow
4) Bird
5) Crocodile
6) Donkey
7) Mule
8) Horse
9) Snake
__________________________________________________________
_____________________________________________7____________
__________________________________________________________
__________________________________________________________
__________________________________________________________
__________________________________________________________
__________________________________________________________
__________________________________________________________
__________________________________________________________
__________________________________________________________
__________________________________________________________
__________________________________________________________
____________________________________________6_____________
__________________________________________________________
__________________________________________________________
__________________________________________________________
__________________________________________________________
________________________________________________________8_
__________________________________________________________
__________________________________________________________
________________________________2_________________________
__________________________________________________________
_______________1__________________________________________
__________________________________________________________
__________________________________________________________
__________________________________________________________
__________________________________________________________
__________________________________________________________
___________________________________________________3______
__________________________________________________________
__________________________________________________________
__________________________________________________________
__________________________________________________________
__________________________________________________________
__________________________________________________________
__________________________________________________________
__4_______________________________________________________
__________________________________________________________
__________________________________________________________
__________________________________________________________
__________________________________________________________
__________________________________________________________
__________________________________________________________
__________________________________________________________
__________________________________________________________
__________________________________________________________
__________________________________________________________
__________________________________________________________
__________________________9_______________________________
__________________________________________________________
______________5___________________________________________
__________________________________________________________
__________________________________________________________
__________________________________________________________
__________________________________________________________
__________________________________________________________
__________________________________________________________
__________________________________________________________
__________________________________________________________
__________________________________________________________
__________________________________________________________
__________________________0_______________________________
__________________________________________________________

Conclusion

Does the 2D plotting catch some of the essence of the words/characters from the books? Or does it look like they are just thrown at random onto the plane?

I look forward to your conclusion! For the Gutenberg animals plot I believe the visualization really does match how I see the animals: fish and reptiles are grouped together, and in the upper left corner we have the horse family of animals. For the Jane Austen plot it is also interesting that the character Emma matches Elizabeth most, even though the two are from different books, albeit books with somewhat similar main characters.


Information Technology and Libraries: Let's Get Virtual: An Examination of Best Practices to Provide Public Access to Digital Versions of Three-Dimensional Objects

Fri, 2016-07-01 03:51

Three-dimensional objects are important sources of information that should not be ignored in the increasing trend towards digitization. Previous research has not addressed the evaluation of digitized versions of three-dimensional objects. This paper first reviews research concerning such digitization, in both two and three dimensions, as well as public access in this context. Next, evaluation criteria for websites incorporating digital versions of three-dimensional objects are extrapolated from previous research. Finally, five websites are evaluated, and suggestions for best practices to provide public access to digital versions of three-dimensional objects are proposed.

Information Technology and Libraries: In the Name of the Name: RDF literals, ER attributes and the potential to rethink the structures and visualizations of catalogs

Fri, 2016-07-01 03:51

The aim of this study is to contribute to the field of machine-processable bibliographic data that is suitable for the Semantic Web. We examine the Entity Relationship (ER) model, which has been selected by IFLA as a “conceptual framework” in order to model the FR family (FRBR, FRAD and RDA), and the problems ER causes as we move towards the Semantic Web. Subsequently, while maintaining the semantics of the aforementioned standards but rejecting the ER as a conceptual framework for bibliographic data, this paper builds on the Resource Description Framework (RDF) potential and documents how both the RDF and Linked Data’s rationale can affect the way we model bibliographic data.

In this way, a new approach to bibliographic data emerges where the distinction between description and authorities is obsolete. Instead, the integration of the authorities with descriptive information becomes fundamental so that a network of correlations can be established between the entities and the names by which the entities are known. Naming is a vital issue for human cultures because names are not random sequences of characters or sounds that serve merely as identifiers for the entities; they also carry socio-cultural meanings and interpretations. Thus, instead of describing indivisible resources, we could describe entities that appear under a variety of names on various resources. In this study, a method is proposed to connect the names with the entities they represent and, in this way, to document the provenance of these names by connecting specific resources with specific names.

Information Technology and Libraries: Hitting the Road towards a Greater Digital Destination: Evaluating and Testing DAMS at the University of Houston Libraries

Fri, 2016-07-01 03:51

Since 2009, tens of thousands of rare and unique items have been made available online for research through the University of Houston Digital Library. Six years later, the Libraries' new digital initiatives call for a more dynamic digital repository infrastructure that is extensible, scalable, and interoperable. The Libraries’ mission and the mandate of its strategic directions drive the pursuit of seamless access and expanded digital collections. To answer the calls for technological change, the Libraries Administration appointed a Digital Asset Management System (DAMS) Implementation Task Force to explore, evaluate, test, recommend, and implement a more robust digital asset management system. This article focuses on the task force’s DAMS selection activities: needs assessment, systems evaluation, and systems testing. The authors also describe the task force’s DAMS recommendation based on the evaluation and testing data analysis, a comparison of the advantages and disadvantages of each system, and system cost. Finally, the authors outline their DAMS implementation strategy, comprising a phased rollout with the following stages: system installation, data migration, and interface development.

William Denton: Tidy data in Org

Fri, 2016-07-01 03:02

Today at work I was doing some analysis of spending on electronic resources. I’d done it a few months ago for fiscal year 2015, in a hacky kind of way, but now that F2016 is complete I had two years of data to work with. As usual I used Org and R, but this time I rejigged everything to use Hadley Wickham’s idea of tidy data and his tools for working with such data, and it made things not only simpler to work with in R but also simpler to present in Org.

Here’s a simplified example of what it looked like. First, I load the R packages I’ll need for this brief example. In Org, hitting Ctrl-c Ctrl-c runs these code blocks. This one is configured to have no output.

#+BEGIN_SRC R :session :results silent :exports none
library(dplyr)
library(tidyr)
library(ggplot2)
#+END_SRC

Next, a table of data: costs of things that librarians spend money on. (We can’t share our eresource spending data … perhaps some day.) This table is meant for people to read and it will appear in the exported PDF. The way it’s presented is good for humans, but not right for machines. I call it tab_costs because it’s a table of costs and I’m going to need to refer to the table later.

#+NAME: tab_costs
| name      | F2015 | F2016 |
|-----------+-------+-------|
| books     |   200 |   300 |
| cardigans |    90 |    60 |
| cats      |   400 |   410 |
| champagne |    80 |    90 |
| internet  |   130 |   140 |
| notebooks |    50 |    60 |
| tea       |    30 |    35 |

The way I have Emacs configured, that looks like this (with extra-prettified source blocks):

Emacs rocks.

The next bit of R reads that table into the variable costs_raw, which I then transform with tidyr’s gather function into something more machine-usable. The gather statement says: take all the columns except “name”, and turn the column names into “year” and the cell values into “cost”. So that I can see it and make sure it’ll work, the output is given below, but :exports none means that this table won’t be exported when the document is turned into a PDF. Only I can see this, in Emacs.

#+BEGIN_SRC R :session :results values :exports none :var costs_raw=tab_costs :colnames yes
costs <- costs_raw %>% gather(year, cost, -name)
#+END_SRC

#+RESULTS:
| name      | year  | cost |
|-----------+-------+------|
| books     | F2015 |  200 |
| cardigans | F2015 |   90 |
| cats      | F2015 |  400 |
| champagne | F2015 |   80 |
| internet  | F2015 |  130 |
| notebooks | F2015 |   50 |
| tea       | F2015 |   30 |
| books     | F2016 |  300 |
| cardigans | F2016 |   60 |
| cats      | F2016 |  410 |
| champagne | F2016 |   90 |
| internet  | F2016 |  140 |
| notebooks | F2016 |   60 |
| tea       | F2016 |   35 |

That’s hard for humans to read, but it means making a chart comparing spending across the two years is easy.

#+BEGIN_SRC R :session :results graphics :exports results :file /tmp/tmp.png :width 600 :height 400
ggplot(costs, aes(x = year, y = cost)) +
  geom_bar(aes(fill = name), stat = "identity", position = "dodge") +
  labs(x = "", y = "$", title = "Librarian spending")
#+END_SRC

What librarians spend money on.

Or (see the geom_bar docs for more):

#+BEGIN_SRC R :session :results graphics :exports results :file /tmp/tmp-year.png :width 600 :height 400
ggplot(costs, aes(x = name, y = cost)) +
  geom_bar(aes(fill = year), stat = "identity", position = "dodge") +
  labs(x = "", y = "$", title = "Librarian spending")
#+END_SRC

Another way to show it.

Another Emacs screenshot showing how Org mixes code, graphics and text (well, text if I’d written some, but I didn’t here):

Emacs rocks!
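A last sketch, using the same session and the tidy costs data frame from above: summaries become one-liners too. For example, totalling each year’s spending with dplyr:

#+BEGIN_SRC R :session :results values :exports results :colnames yes
## Sum the cost column within each year. With the original wide table
## this would mean adding up the F2015 and F2016 columns separately.
costs %>% group_by(year) %>% summarise(total = sum(cost))
#+END_SRC

Given the numbers above, that should come out to 980 for F2015 and 1095 for F2016.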

Access Conference: New Brunswick HST Increases 2% at Midnight

Thu, 2016-06-30 15:49

While Early Bird sales don’t end until Wednesday, July 13th, New Brunswick HST goes up by 2% tomorrow. A 2% savings means more lobster in your carry-on for the trip home.

Don’t miss out on this amazing deal. Full conference (and Early Bird) tickets include admission to hackfest, two and a half days of our amazing single-stream conference, and a half-day workshop on the last day. Plus there will be some amazing food and entertainment, and a chance to meet up with colleagues near and far.


Library of Congress: The Signal: DPOE Program Harnesses the Spirit of Kentucky Librarians

Thu, 2016-06-30 12:05

This is a guest post by Barrie Howard.

(left to right) Anne Abate, Anne Ryckbost, Joseph Shankweiler, Suellyn Lathrop, Heather Fox, Amy Rudersdorf, Mary Molinaro, Brittany Netherton, George Coulbourne, Lynn Kahkola, Sam Meister, Rachel Howard, Dieter Ullrich, Lori Thompson, Sarah Hopley, Jamie Haddix, Jackie Couture, Denise Fritsch, AJ Boston.

The Library of Congress’s Digital Preservation Outreach and Education program delivered a train-the-trainer workshop on June 10, providing professional development in digital preservation to library professionals from Kentucky and West Virginia.

The workshop was held at Northern Kentucky University and sponsored by the State Assisted Academic Library Council of Kentucky, which recruited 14 workshop participants from six of its member institutions, two from the Federation of Kentucky Academic Libraries and one from West Virginia.

The event in Kentucky marks the eighth workshop since the program began in 2010; to date, the program has trained 173 working professionals. The first workshop was held at the Library of Congress, and since then the workshops have traveled as far afield as Alaska and Australia. Two more workshops are planned for the fall: one in the Washington, D.C. metropolitan area and one in Jackson, Mississippi.

The aim of the DPOE workshops is to produce a corps of trainers equipped to teach others the basic principles and practices of preserving digital materials. In this way, DPOE’s “teach-a-person-to-fish” model extends the workshop beyond only those who can attend.

DPOE trainers go on to develop training events of their own and have held webinars and workshops in the Midwest, Pacific Northwest and Southeast regions of the United States, events that will be replicated across Kentucky over the next year. After each workshop, the new trainers enter a network of practitioners and continue to engage with each other – and the broader digital-preservation community – online.

DPOE supports this network by providing an email-distribution list that allows practitioners to share information about digital-preservation best practices, services and tools, as well as stories about their experiences in advancing digital preservation.

SAALCK executive director Anne Abate worked with the Library’s George Coulbourne, chief of the Internship and Fellowship Programs Division, and the three anchor instructors for the workshop: Sam Meister (Educopia Institute), Mary Molinaro (Digital Preservation Network) and Amy Rudersdorf (AVPreserve). The instructors have provided subject-matter expertise to the program in the past, offering guidance to DPOE by reviewing and revising the baseline curriculum for the workshop.

The workshops are just one way that DPOE seeks to foster outreach and education about digital preservation on a national scale. DPOE maintains a training calendar as a public service to help working professionals discover continuing-education opportunities in the practice of digital preservation. The calendar is updated monthly and includes training events hosted by DPOE trainers.
