We are delighted to partner with figshare to make #VIVO15 presentations openly available through their new figshare for institutions service. This makes the terrific work of the VIVO community more accessible than ever in a beautiful, easy-to-use interface. Moreover, the materials are now persistent and citable with a DOI. Our sincere thanks to the figshare and Digital Science teams for their work to make this possible!
A decade ago, when the Open Metadata Registry (OMR) was just being developed as the NSDL Registry, the vocabulary world was a very different place than it is today. At that point we were tightly focussed on SKOS (not fully cooked at that point, but Jon was on the WG that was developing it, so we felt pretty secure diving in).
But we were thinking about versioning in the Open World of RDF even then. The NSDL Registry kept careful track of all changes to a vocabulary (who, what, when) and the only way to get data in was through the user interface. We ran an early experiment in making versions based on dynamic, timestamp-based snapshots (we called them ‘time slices’, Git calls them ‘commit snapshots’) available for value vocabularies, but this failed to gain any traction. This seemed to be partly because, well, it was a decade ago for one, and while it attempted to solve an Open World problem with versioned URIs, it created a new set of problems for Closed World experimenters. Ultimately, we left the versions issue to sit and stew for a bit (6 years!).
All that started to change in 2008 as we started working with RDA, and needed to move past value vocabularies into properties and classes, and beyond that into issues around uploading data into the OMR. Lately, Git and GitHub have taken off and provide a way for us to make some important jumps in functionality that have culminated in the OMR/GitHub-based RDA Registry. Sounds easy and intuitive now, but it sure wasn’t at the time, and what most people don’t know is that the OMR is still where RDA/RDF data originates — it wasn’t supplanted by Git/GitHub, but is chugging along in the background. The OMR’s RDF CMS is still visible and usable by all, but folks managing larger vocabularies now have more options.
One important aspect of the use of Git and GitHub was the ability to rethink versioning.
Just about a year ago our paper on this topic (Versioning Vocabularies in a Linked Data World, by Diane Hillmann, Gordon Dunsire and Jon Phipps) was presented to the IFLA Satellite meeting in Paris. We used as our model the way software on our various devices and systems is updated: more and more, these changes happen without much (if any) interaction with us.
In the world of vocabularies defining the properties and values in linked data, most updating is still very manual (if done at all), and the important information about what has changed and when is often hidden behind web pages or downloadable files that provide no machine-understandable connections identifying changes. And just solving the change management issue does little to solve the inevitable ‘vocabulary rot’ that can make published ‘linked data’ less and less meaningful, accurate, and useful over time.
Building stable change management practices is a critical missing piece of the linked data publishing puzzle. The problem will grow exponentially as language versions and inter-vocabulary mappings start to show up as well — and it won’t be too long before that happens.
Please take a look at the paper and join in the conversation!
Here are my reading notes for week 4 of the Engaged Intellectual. Superficially these papers seemed oriented around the three cultures of social science, the humanities and the physical sciences. But there were some interesting cross-currents between them.
Crotty, M. (1998). The foundations of social research: Meaning and perspective in the research process. London; Thousand Oaks, CA: Sage Publications.
Crotty starts out by outlining a set of questions that must be answered when embarking on a research project:
- What methods do we propose to use?
- What methodology governs our choice and use of methods?
- What theoretical perspective lies behind the methodology in question?
- What epistemology informs this theoretical perspective?
He then goes on to define this terminology, while pointing out that we often talk about them all together:
- Methods: the techniques or procedures (e.g. Participant Observation)
- Methodology: the strategy or design of the chosen methods that fit the desired outcomes (e.g. Ethnography)
- Theoretical perspective: the philosophical underpinnings for the methodology (e.g. Symbolic Interactionism)
- Epistemology: the theory of knowledge that is embedded in the theoretical perspective; how we know what we know (e.g. Constructivism)
The pragmatist in me wants to pause here to reflect that the real value does not lie in the truth of this picture of research, but that it’s a useful way of distinguishing between all the concepts that are flying around when learning about research. A firm understanding of these different levels helps ground the decisions made about what methodologies to use, and how to interpret the results.
The theoretical perspective is often assumed as part of the methodology and needs to be made explicit. Epistemology (how we know) is distinguished from ontology (the study of being); they often get muddled up too. Crotty points out that we often start with particulars: a specific problem that needs to be solved, a research question or set of research questions.
We typically start with a real-life issue that needs to be addressed, a problem that needs to be solved, a question that needs to be answered. We plan our research in terms of that issue or problem or question. What, we go on to ask, are the objectives of our research? What strategy seems likely to provide what we are looking for? What does that strategy direct us to do to achieve our aims and objectives? In this way our research question, incorporating the purposes of our research, leads us to methodology and methods. (Crotty, 1998, p. 13)
The methodology needs to be defended, so that people will understand the results. Methodologies are created based on need, and it can help to understand the menu of methodologies that are available. Methodologies can also be combined, and of course new ones can be created.
The discussion has helped me partially unravel my own muddled thoughts about what I want to research from how I want to research it. It at least helped me feel OK about being muddled! I have come into the program wanting to study Web archives, specifically the ways we decide what to archive, also known as appraisal. At the same time I am interested in looking at these decisions as an expression of individual and collective notions of values in particular contexts. How are these values arrived at? What’s the best way to study them? As I’ve mentioned in the past, reading Steven Johnson’s work about repair was an inspiration for me to enter the PhD program, so I’m hoping to use his ethnographic approach, possibly in combination with Geiger’s trace ethnography (Geiger & Ribes, 2011). But I’m still not sure a) what methods this approach suggests, and b) whether they are a good fit for the problem I’m studying (appraisal in Web archives). Hopefully mixed methods will grant me some license to use several methods in a coherent way.
This was a lot to pack into a book introduction. I might have to return to read more Crotty (1998) when there is time since he was able to explain some pretty complicated things in a very clear, compelling and useful way.
Uzzi, B., Mukherjee, S., Stringer, M., & Jones, B. (2013). Atypical combinations and scientific impact. Science, 342(6157), 468–472.
It was really quite interesting having read Crotty (1998) beforehand, because it made me think about this paper a little bit differently in terms of method, methodology, theory, and epistemology. The authors are trying to measure the novelty of scientific ideas, in an attempt to show that high impact articles combine conventional and unusual ideas. At first it seemed like it would be extremely hard to measure novelty. The methods were firmly from descriptive and inferential statistics (the methodology), grounded in statistical theory. Wikipedia tells me that statistics are from the formal epistemology school, which uses models and inference to answer questions.
One of the epistemological foundations of the paper also seems to be that Science (writ large) is measurable by the body of scientific literature and the citations found within it. I guess the authors would agree that it is possible that non-documentary factors could determine the creativity and impact of scientific ideas as well, but this does nothing to necessarily invalidate their own findings. Since the Philosophical Transactions began publishing in 1665, the very idea of science has been explicitly tied to publishing. But this is a relatively new occurrence, and the history of science extends much further back in time.
Still the authors have done a nice job of using the structure of the literature to infer novelty. It’s pretty cool that they were able to use all the literature (no need to sample), and it was already structured and easy to process. Hats off to them for writing this article without mentioning Big Data once. I guess they must have had a contact at Thomson Reuters to get access. One caveat here is the biases built in to Web of Science when it comes to what is indexed and how. WoS is a black box of sorts. Also, only data for 1980-2000 was examined, and the scientific endeavor has been going on for a while.
I liked how they cited Darwin and Einstein so that they could increase their own z-score. The Google-colored cubes in Fig. 2 are also cute in their suggestiveness. What if this were applied to the Web graph, where a journal was replaced with a website, and an article was replaced with a web page?
At any rate, their idea of a z-score seemed like an interesting technique.
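For context on the technique: a z-score expresses how far an observed value lies from a baseline mean, in units of standard deviation; Uzzi et al. use it to score how atypical a pair of co-cited journals is relative to randomized citation networks. A minimal sketch of the arithmetic, with made-up numbers rather than the paper’s data:

```python
from statistics import mean, stdev

def z_score(observed, baseline):
    """How many standard deviations the observed value sits above
    the mean of a baseline distribution."""
    return (observed - mean(baseline)) / stdev(baseline)

# Made-up example: a journal pair co-cited 14 times, compared against
# co-citation counts for the same pair in randomized citation networks.
baseline = [4, 5, 6, 5, 4, 6, 5, 5]
print(round(z_score(14, baseline), 2))  # prints 11.91 — a highly atypical pair
```

A large positive z-score marks a conventional pairing, a negative one a novel pairing; the paper’s finding is that the highest-impact work mixes mostly conventional pairings with a tail of novel ones.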
Mayer, N. (2004, October). Reclaiming our history: The mysterious birth of art and design. American Institute of Graphic Arts.
Professor Kraus asked us to focus on the rhetoric that Mayer uses when reading this piece. Immediately I was struck by a) the number of large, colorful and really quite beautiful images, and b) the sparse layout, which read almost like poetry in places. I guess it helps to know that this was a presentation at a conference, presumably slides with reading notes. It feels like it was written to be heard and seen rather than read, although it reads well. Her format underlines her essential point that reading words is ultimately looking at symbols and imagery.
Mayer seems focused on helping us see how dependent we are on culture for interpreting symbols. I liked how she humorously used the perceived penises and vulvas in the cave paintings to deconstruct the anthropologists who were studying the paintings. I also liked how she ultimately grounded the analysis in art, and the multiple ways of seeing as subjects, and as groups. I was reminded of Neil Gaiman’s Long Now talk How Stories Last, in which he talked about the effort by Thomas Sebeok and the Human Interference Task Force of the Department of Energy to figure out ways of warning people in the future about the nuclear waste stored underneath Yucca Mountain.
How Stories Last is a very entertaining talk about stories and information if you have 45 minutes to spare. Here is the specific segment about Sebeok’s work if you want to quickly listen. As Gaiman so artfully points out, Sebeok’s advice in the end was to use culture to create an information relay across the generations. Here are Sebeok’s words in the conclusion of his report:
It follows that no fail-safe method of communication can be envisaged 10,000 years ahead. To be effective, the intended messages have to be recoded, and recoded again and again, at relatively brief intervals. For this reason, a “relay-system” of communication is strongly recommended, with a built-in enforcement mechanism, for dramatic emphasis here dubbed an “atomic priesthood”, i.e., a commission, relatively independent of future political currents, self-selective in membership, using whatever devices for enforcement are at its disposal, including those of a folkloristic character. (Sebeok, 1984, p. 28)
This in turn reminds me of Janée, Frew, & Moore (2009), which talks about how digital preservation systems can model this type of relay… but that’s a rabbit hole for another post. I only brought this up to highlight Mayer’s essential point that our understanding of information is mediated by shared culture and context. We are lost without it. But understanding is always imperfect. All models are wrong but some are useful. Maybe learning the art of feeling good lost has its uses. Yes, I’m just seeing if you are still awake. Hi!

References
Crotty, M. (1998). The foundations of social research: Meaning and perspective in the research process. London; Thousand Oaks, CA: Sage Publications.
Geiger, R. S., & Ribes, D. (2011). Trace ethnography: Following coordination through documentary practices. In Proceedings of the 44th Hawaii International Conference on System Sciences (pp. 1–10). IEEE. Retrieved from http://www.stuartgeiger.com/trace-ethnography-hicss-geiger-ribes.pdf
Janée, G., Frew, J., & Moore, T. (2009). Relay-supporting archives: Requirements and progress. International Journal of Digital Curation, 4(1), 57–70.
Sebeok, T. A. (1984). Communication measures to bridge ten millennia. (6705990). United States Department of Energy. Retrieved from http://www.osti.gov/scitech/biblio/6705990
How about investing a couple hours to learn how libraries and museums can strengthen executive skills?
Mind in the Making (MITM), a program of the Families and Work Institute (FWI), and partner, the Institute of Museum and Library Services (IMLS), will present a free webinar for museum and library professionals on executive function life skills. The webinar will feature findings from the just-released groundbreaking report, Brain-Building Powerhouses: How Museums and Libraries Can Strengthen Executive Function Life Skills.
The webinar presenters include report contributors Mimi Howard and Andrea Camp, Mind in the Making author and FWI President Ellen Galinsky, and IMLS Supervisory Grants Management Specialist Helen Wechsler. They will discuss new findings from research on brain development, the importance of executive function skills, and how museums and libraries across the country are incorporating this research into their programs and exhibits.
Some of the outstanding initiatives in museums and libraries featured in the report will be presented in the webinar by the following:
• Laurie Kleinbaum Fink, Science Museum of Minnesota
• Stephanie Terry, Children’s Museum of Evansville
• Kerry Salazar, Portland Children’s Museum
• Kimberlee Kiehl, Smithsonian Early Enrichment Center
• Holly Henley, Arizona State Library
• Anne Kilkenny, Providence Public Library
• Kathy Shahbodaghi, Columbus Metropolitan Library
Executive function skills are built on the brain processes we use to pay attention and exercise self control, to hold information in our minds so that we can use it, and to think flexibly. These skills become foundational for other skills, including delaying gratification, understanding the perspectives of others, reflection, innovation, critical thinking, problem solving, and taking on challenges.
Webinar: Brain-Building Powerhouses: How Museums and Libraries Can Strengthen Executive Function Life Skills
Date: Tuesday, September 22, 2015
Time: 2:00 PM EDT
Link: Join the webinar with this link to the Blackboard Collaborate system.
Phone: 1-866-299-7945, enter guest code 5680404#
Note: IMLS-hosted webinars use the Blackboard Collaborate system. If you are a first-time user of Blackboard Collaborate, click here to check your system compatibility in advance of the webinar. You will be able to confirm that your operating system and Java are up-to-date, and enter a Configuration Room that will allow you to configure your connection speed and audio settings before the IMLS webinar begins. (If you choose to enter a Configuration Room, please note that the IMLS webinar will use Blackboard version 12.6.)
# # #
The Digital Public Library of America (http://dp.la) and Europeana (http://europeana.eu) invite interested and qualified individuals or firms to submit a proposal for development related to the infrastructure for the International Rights Statements Working Group.
- A PDF version of this request for proposals is also available at: http://dp.la/info/wp-content/uploads/2015/09/rs-rfp.pdf
- A PDF version of the Requirements for the Technical Infrastructure for Standardized International Rights Statements is available at: http://dp.la/info/wp-content/uploads/2015/09/irswg-tech-white-paper-rfp.pdf
- RFP issued: 18 September 2015
- Deadline for proposals: 00:00 GMT, 6 October 2015
- Work is to be performed no sooner than 8 October 2015.
- Functional prototypes for components A and C must be completed by 24 December 2015.
- Work for components A, B, and C below must be completed by 15 January 2016.
This document specifies the project scope and requirements for technical infrastructure supporting a framework and vocabulary of machine-readable rights statements under development by the International Rights Statements Working Group, a joint Digital Public Library of America (DPLA)–Europeana Foundation working group.
The working group shall provide and maintain RDF descriptions of the rights statements, with canonical serializations in Turtle, modeled as a vocabulary in the Simple Knowledge Organization System (SKOS). These descriptions will include multiple official translations of each statement, and support versioning of the statements and/or vocabulary scheme. Alongside these descriptions, the working group will produce a summary of the data model and properties used.
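To give a concrete sense of the shape such a description might take, here is a minimal Turtle sketch of one statement modeled as a SKOS concept with language-tagged translations. The URI, labels, and date below are invented for illustration; the actual statements, properties, and data model are defined by the working group.

```turtle
@prefix skos:    <http://www.w3.org/2004/02/skos/core#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix xsd:     <http://www.w3.org/2001/XMLSchema#> .

# Hypothetical statement URI with a version segment; not an official statement.
<http://rightsstatements.org/vocab/Example/1.0/> a skos:Concept ;
    skos:inScheme <http://rightsstatements.org/vocab/1.0/> ;
    skos:prefLabel "Example Statement"@en ,
        "Exemple de déclaration"@fr ;        # official translations as language-tagged literals
    skos:definition "A placeholder definition of the statement."@en ;
    dcterms:modified "2015-09-18"^^xsd:date .
```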
The contractor will provide an implementation that acts as a platform for hosting these statements. The platform consists of an application for publishing the rights statements according to linked data best practices and a content management system to be used by the working group to publish materials related to the project. These two components should provide the feel of an integrated user experience, and must be served publicly from a single domain (http://rightsstatements.org/). As part of this contract, the contractor will also provide one year of maintenance, support, and security updates for these components, their dependencies, and the operating systems for servers on which they are deployed.

Components

Component A. Rights statements application
A web application that provides both machine-readable representations of the rights statements (in RDF serializations including JSON-LD and Turtle) and human-readable representations. The working group will provide a canonical version of the rights statements in Turtle-serialized RDF as needed for launch, as well as a testing version used to implement and test specific features, including, but not limited to, versions (see 3a) and translations (see 4b and 4c).
- Human readable representations
- The application shall provide a human-readable web page representing each rights statement, with provision for versions, multiple language support, and additional request parameters as described in Requirements for the Technical Infrastructure for Standardized International Rights Statements.
- All human-readable representations shall be generated from the canonical Turtle-serialized RDF.
- Human-readable representations must be available as HTML5 with RDFa 1.1 or RDFa 1.1 Lite.
- Human-readable representations must provide links to the RDF representations listed below.
- RDF representations
- The application shall provide multiple RDF serializations of the individual rights statements through content negotiation on the statement URI. Minimally, it must support Turtle and JSON-LD. Additional serializations are desirable but not required.
- The application shall provide multiple RDF serializations of the entire vocabulary through content negotiation on the vocabulary version URI. The vocabulary shall support the same RDF serializations as the individual statements.
- All RDF serializations must be equivalent to the canonical Turtle-serialized RDF.
- The application shall support multiple versions of each statement. The structure of the versions is described in Requirements for the Technical Infrastructure for Standardized International Rights Statements.
- Otherwise valid statement URIs that omit the version number should respond with code 404.
- Languages and translation
- Human-readable representations should dynamically handle requests for translations of the statements through HTTP Accept-Language headers and through the use of parameters as specified in Requirements for the Technical Infrastructure for Standardized International Rights Statements.
- The working group will provide text in one or more languages for each statement as RDF language-tagged literals in compliance with IETF BCP47. All language-tagged literals will be encoded as UTF-8.
- The working group will provide translations for content not derived from the statement RDF, e.g., navigational elements. The application will support this through an internationalization framework, e.g., GNU gettext.
- Additional request parameters
- For specific statements, human-readable representations must accept query string parameters and generate a view of the statement enhanced by additional metadata described in Requirements for the Technical Infrastructure for Standardized International Rights Statements.
- Resource URIs and HTTP request patterns
- The HTTP behavior of the application shall follow the URI structure and interaction patterns described in Requirements for the Technical Infrastructure for Standardized International Rights Statements.
- Resources must follow best practices for serving both human- and machine-readable representations for linked data vocabularies.
- Visual identity
- The working group will provide static HTML templates developed by another vendor charged with implementing the site’s visual identity.
- These templates must be transformed to work in the context of the application to ensure that human-readable representations follow the visual identity of the site as provided by the working group.
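To illustrate the kind of content negotiation Component A calls for, here is a minimal Python sketch that maps an HTTP Accept header onto the serializations the RFP requires (Turtle and JSON-LD, with HTML as the human-readable fallback). The function and its return labels are illustrative only; a production implementation would likely rely on a web framework’s negotiation support.

```python
# Minimal sketch of server-side content negotiation for the statement URIs.
# The supported media types come from the RFP; everything else is illustrative.

SUPPORTED = {
    "text/html": "html",             # human-readable representation
    "text/turtle": "turtle",         # canonical RDF serialization
    "application/ld+json": "jsonld", # JSON-LD serialization
}

def negotiate(accept_header, default="html"):
    """Pick the best supported representation from an HTTP Accept header."""
    candidates = []
    for i, part in enumerate(accept_header.split(",")):
        fields = part.strip().split(";")
        media_type = fields[0].strip().lower()
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.strip().partition("=")
            if name == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        if media_type in SUPPORTED and q > 0:
            # Prefer higher q-values; break ties by order of appearance.
            candidates.append((-q, i, SUPPORTED[media_type]))
    if not candidates:
        return default
    return min(candidates)[2]

print(negotiate("text/turtle"))                                 # turtle
print(negotiate("application/ld+json;q=0.9,text/html;q=0.5"))   # jsonld
print(negotiate("image/png"))                                   # falls back to html
```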
Component B. Content management system

An implementation of an off-the-shelf, free/libre/open source content management system (CMS), and associated plugins to publish pages about the project and initiative, related publications, etc.
- The CMS will be separate from the rights statements application.
- The CMS may be a static site generator.
- The CMS should support multilingual versions of content, either natively or through the use of plugin modules.
- A theme or templates for the CMS must be provided, which follow the visual identity defined for the site.
- The CMS must provide export of static content (text and multimedia).
- All content will be edited and maintained by members of the working group.
Component C. Configuration management and deployment automation

An implementation of an existing free/libre/open source configuration management and deployment automation system, and any needed templates, scripts, etc., used to install dependencies, to configure and deploy components A and B above, and to manage the servers.
- The implementation must be published to a version control repository under the working group’s organization on GitHub.
- The implementation should support a shared set of configuration with templating to allow the components above to be deployed to a staging virtual machine and a production virtual machine using a common set of procedures.
- An implementation of an agentless configuration and deployment management system (e.g., Ansible) is strongly preferred.
- The implementation must include a configuration for an HTTP proxy server (e.g., Nginx, Apache HTTPD, etc.) that will allow components A and B to be presented together through a single domain name.
- The proxy server configuration must allow components A and B to be served from a common domain name (http://rightsstatements.org/).
- The proxy server configuration should provide caching for requests that respects the HTTP interaction patterns described in Requirements for the Technical Infrastructure for Standardized International Rights Statements.
- The vendor will also develop, execute, and provide reports for a load testing strategy for the implemented configuration.
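As a rough sketch of how the single-domain requirement might be met with Nginx (one of the proxy servers the RFP names as an example): the URL paths and backend ports below are assumptions for illustration, not part of the RFP.

```nginx
# Illustrative only: routing paths and backend ports are invented assumptions.
server {
    listen 80;
    server_name rightsstatements.org;

    # Component A: rights statements application (assumed to run on port 8080)
    location /vocab/ {
        proxy_pass http://127.0.0.1:8080;
        proxy_set_header Host $host;
    }

    # Component B: CMS for project pages (assumed to run on port 8081)
    location / {
        proxy_pass http://127.0.0.1:8081;
        proxy_set_header Host $host;
    }
}
```

A caching layer respecting the HTTP interaction patterns would sit in front of these locations, per the requirement above.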
All components must run within a shared Linux virtual machine, preferably running Debian stable. The virtual machine will be hosted on a server physically located in a Luxembourg-based data center. The working group is providing both a staging environment and a production environment.
All materials developed during this project shall be released under open source/open content licensing. Source code will be licensed under the European Union Public License, version 1.1. Documentation will be licensed under a CC0 Public Domain Dedication.

Guidelines for proposals
All proposals must adhere to the following submission guidelines and requirements.
- Proposals are due no later than 00:00 GMT, 6 October 2015.
- Proposals should be sent via email to email@example.com as a single PDF file attached to the message. Questions about the proposal can also be sent to this address.
- Please format the subject line with the phrase “RightsStatements.org Proposal – [Name of respondent].”
- You should receive confirmation of receipt of your proposal no later than 00:00 GMT, 8 October 2015. If you have not received confirmation by this time, please send an email to firstname.lastname@example.org, following the same guidelines as above.
All proposals should include the following:
- Pricing, in US Dollars and/or Euros, as costs for each work component identified above, and as an hourly rate for any maintenance costs. The exchange rate will be set in the contract. The currency for payment will be chosen by the agent of the working group that is the party to this contract.
- Proposed staffing plan, including qualifications of project team members (resumes/CVs and links or descriptions of previous projects such as open source contributions).
- References, listing all clients/organizations with whom the proposer has done business like that required by this solicitation within the last three years.
- Qualifications and experience, including
- General qualifications and development expertise
- Information about development and project management skills and philosophy
- Examples of successful projects, delivered on time and on budget
- Preferred tools and methodologies used for issue tracking, project management, and communication
- Preferences for change control tools and methodologies
- Project specific strategies
- History of developing software in the library, archives, or museum domain
- Information about experience with hosting and maintenance of RDF/SKOS vocabularies and linked data resources
- Legal authority/capacity, or proof that the vendor is authorized to perform the contract under national law. Proof of the above is to be provided by (a copy of) a certificate of registration in the relevant trade or professional registers in the country of establishment/incorporation.
- Proposals must be submitted by the due date.
- Proposers are asked to guarantee their proposal prices for a period of at least 60 days from the date of the submission of the proposal.
- Proposers must be fully responsible for the acts and omissions of their employees and agents.
- The working group reserves the right to extend the deadline for proposals.
- The working group reserves the right to include a mandatory meeting via teleconference with proposers individually before acceptance. Top scored proposals may be required to participate in an interview to support and clarify their proposal.
- The working group reserves the right to negotiate with each contractor.
- There is no allowance for project expenses, travel, or ancillary expenses that the contractor may incur.
- Ownership of any intellectual property will be shared between the Digital Public Library of America and the Europeana Foundation.
Back in January, The Atlantic ran an article on a new device being used at the Cooper Hewitt design museum in New York City. This device allows museum visitors to become curators of their own collections, saving information about exhibits to their own special account they can access via computer after they leave. This device is called a pen; Robinson Meyer, the article’s author, likens it to a “gray plastic crayon the size of a turkey baster”. I think it’s more like a magic wand. (Image courtesy of the Cooper Hewitt Museum website.)
Not only can you use the pen to save information you think is cool, you can also interact with the museum at large: in the Immersion Room, for example, you can draw a design with your pen and watch it spring to life on the walls around you. In the Process Lab, you use the pen to solve real-life design problems. As Meyer puts it, “The pen does something that countless companies, organizations, archives, and libraries are trying to do: It bridges the digital and the physical.”
The mention of libraries struck me: how could something like the Cooper Hewitt pen be used in your average public library?
The first thing that came to my mind was RFID. In my library, we use RFID to tag and label our materials. There are currently RFID “wands” that, when waved over stacks, can help staff locate books they thought were missing.
But let’s turn that around: give the patron the wand – rather, the pen – and program in a subject they’re looking for…say, do-it-yourself dog grooming. As the patron wanders, the pen is talking with the stacks via RFID asking where those materials would be. Soon the pen vibrates and a small LED light shines on the materials. Eureka!
Or, just as the Cooper Hewitt allows visitors to build their own virtual collection online, we can have patrons build their own virtual libraries. Using the same RFID scanning technology as before, patrons can link items to their library card number that they’ve already borrowed or maybe want to view in the future. It could be a system similar to Goodreads (or maybe even link it to Goodreads itself) or it could be a personal website that only the user – not the library – has access to.
What are some ways you might be able to use this tech in your library system?
Creating Tomorrow’s Technologists: Contrasting Information Technology Curriculum in North American Library and Information Science Graduate Programs against Code4lib Job Listings by Monica Maceli recently appeared in the Journal of Education for Library and Information Science 56.3 (DOI:10.12783/issn.2328-2967/56/3/3). As the title states, it studies listings on jobs.code4lib.org:
This research study explores technology-related course offerings in ALA-accredited library and information science (LIS) graduate programs in North America. These data are juxtaposed against a text analysis of several thousand LIS-specific technology job listings from the Code4lib jobs website. Starting in 2003, as a popular library technology mailing list, Code4lib has since expanded to an annual conference in the United States and a job-posting website. The study found that database and web design/development topics continued to dominate course offerings with diverse sub-topics covered. Strong growth was noted in the area of user experience but a lack of related jobs for librarians was identified. Analysis of the job listings revealed common technology-centric librarian and non-librarian job titles, as well as frequently correlated requirements for technology skillsets relating to the popular foci of web design/development and metadata. Finally, this study presents a series of suggestions for LIS educators in order that they continue to keep curriculum aligned with current technology employment requirements.
Open Knowledge Founder Rufus Pollock was recently recognized as Ashoka UK’s fellow of the month. This brief video highlights his thoughts on open knowledge, and his vision for an information age grounded in openness, collaboration, sharing, and distributed power.
Video produced and provided by Ashoka UK Ashoka builds networks of social innovators and selects high-impact entrepreneurs, who creatively solve some of the world’s biggest social challenges, to become Ashoka Fellows. Their work also extends to the education sector, where they are creating a network of schools that teach students skills for the modern world: empathy, teamwork, and creativity. Read more from Ashoka UK here.
The LOCKSS Program develops and supports open source peer-to-peer digital preservation software used by libraries. Although initial development and deployment were funded by grants, including from the NSF and the Mellon Foundation, grant funding is not a sustainable basis for long-term preservation. The LOCKSS Program instead runs on the "Red Hat" model of free, open source software and paid support. From 2007 through 2012 the program was in the black with no grant funds at all.
The demands of the "Red Hat" model make it hard to devote development resources to enhancements that don't address immediate user demands but are targeted at longer-term issues. After discussing this issue with the Mellon Foundation, the LOCKSS Program was awarded a grant to cover a specific set of infrastructure enhancements, which delivered significant functional and performance improvements to the LOCKSS software in the areas of ingest, preservation and dissemination. The LOCKSS Program's experience shows that the "Red Hat" model is a viable basis for long-term digital preservation, but that it may need to be supplemented by occasional small grants targeted at longer-term issues. Among the enhancements described in the paper are implementations of Memento (RFC 7089) and Shibboleth, support for crawling sites that use AJAX, and some significant enhancements to the LOCKSS peer-to-peer polling protocol.
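For a sense of what the Memento support involves: a Memento client performs datetime negotiation by sending an Accept-Datetime header in HTTP-date format (RFC 1123), as RFC 7089 specifies. A minimal sketch of forming that header (the example date is arbitrary):

```python
from datetime import datetime, timezone
from email.utils import format_datetime

def accept_datetime(dt):
    """Format a datetime as an RFC 7089 Accept-Datetime value,
    which uses the RFC 1123 HTTP-date format."""
    return format_datetime(dt.astimezone(timezone.utc), usegmt=True)

# Ask a TimeGate for the memento closest to this moment in the past.
header = {"Accept-Datetime": accept_datetime(
    datetime(2001, 3, 20, 13, 36, 10, tzinfo=timezone.utc))}
print(header["Accept-Datetime"])  # Tue, 20 Mar 2001 13:36:10 GMT
```

The TimeGate then redirects to the archived copy whose capture time is closest to the requested datetime.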
Winchester, MA: The Digital Library Federation Fall Forum (http://dlfforum2015.sched.org, #DLFforum) will be held in Vancouver, British Columbia, Canada, Oct 26-28, 2015. Fedora product manager David Wilcox will offer a hands-on Fedora 4 Workshop on Tuesday, Oct 27 from 9:00am - 12:00pm:
“Telling DSpace Stories” is a community-led initiative aimed at introducing project leaders and their ideas to one another while providing details about DSpace implementations for the community and beyond. The following interview includes personal observations that may not represent the opinions and views of Creighton University or the DSpace Project.
Jonathan Markow from DuraSpace interviewed Richard Jizba to learn about Creighton University’s DSpace Repositories.
“What’s your role with DSpace at your institution?”
I’m rereading the Three Musketeers saga by Alexandre Dumas—one of the greatest of all works of literature—and just re-met a quote I wrote down when I first read it. It’s from chapter twenty-seven of Twenty Years After (Vingt ans après), and is about Urbain Grandier, but I’ll leave out his name to strengthen it:
“[He] was not a sorcerer, but a learned man, which is quite another thing. He did not foretell the future. He knew the past, which is sometimes much worse.”
In the original, it reads:
“[Il] n'était pas un sorcier, c'était un savant, ce qui est tout autre chose. [Il] ne prédisait pas l'avenir. Il savait le passé, ce qui quelquefois est bien pis.”
Thanks to the efforts of many contributors, the Evergreen community is pleased to announce the release of version 2.9.0 of the Evergreen open source integrated library system. Please visit the download page to get it!
New features and enhancements of note in Evergreen 2.9.0 include:
- Evergreen now supports placing blanket orders, allowing staff to invoice an encumbered amount multiple times, paying off the charge over a period of time.
- There is now better reporting of progress when a purchase order is activated.
- The Acquisitions Administration menu in the staff client is now directly accessible from the main “Admin” menu.
- There is now an action/trigger event definition for sending alerts to users before their accounts are scheduled to expire.
- When registering a new user, the duplicate record search now includes inactive users.
- Evergreen now offers more options for controlling whether and when users can carry negative balances on their account.
- The web-based self-check interface now warns the user if their session is about to expire.
- The “Manage Authorities” results list now displays the thesaurus associated with each authority record.
- Item statistical categories can now be set during record import.
- The web staff interface preview now includes cataloging functionality, including a new MARC editor, Z39.50 record import, and a new volume/copy editor.
- The account expiration date is now displayed on the user’s “My Account” page in the public catalog.
- Users can now sort their lists of items checked out, check out history, and holds when logged into the public catalog.
- The bibliographic record source is now available for use by public catalog templates.
- The public catalog can now cache Template Toolkit templates, improving its speed.
- On the catalog’s record summary page, there is now a link to allow staff to forcibly clear the cache of added content for that record.
- Google Analytics (if enabled at all) is now disabled in the staff client.
- Several deprecated parts of the code have been removed, including script-based circulation policies, the open-ils.penalty service, the legacy self-check interface, and the old “JSPAC” public catalog interface.
For more information about what’s in the release, check out the release notes.
Libraries are transforming amidst the changing information landscape and a report released this week by the Pew Research Center, Libraries at the Crossroads, affirms the evolving role of public libraries within their communities as vital resources that advance education and digital empowerment.
“Public libraries are transforming beyond their traditional roles and providing more opportunities for community engagement and new services that connect closely with patrons’ needs,” ALA President Sari Feldman said. “The Pew Research Center report shows that public libraries are far from being just ‘nice to have,’ but serve as a lifeline for their users, with more than 65 percent of those surveyed indicating that closing their local public library would have a major impact on their community.
“Libraries are not just about what we have for people, but what we do for and with people,” Feldman said. “Today’s survey found that three-quarters of the public say libraries have been effective at helping people learn how to use new technologies. This is buttressed by the ALA’s Digital Inclusion Survey, which finds that virtually all libraries provide free public access to computers and the Internet, Wi-Fi, technology training and robust digital content that supports education, employment, e-government access and more.
“Although the report affirms the value of public libraries, the ALA recognizes the need for greater public awareness of the transformation of library services, as the report shows library visits over the past three years have slightly decreased. In response, libraries of all types are preparing for the launch of a national public awareness campaign entitled ‘Libraries Transform.’
“Libraries from across the country will participate in the campaign and will work to change the perception that ‘libraries are just quiet places to do research, find a book, and read’ to ‘libraries are centers of their communities: places to learn, create and share, with the help of library staff and the resources they provide,’” she noted.
The report also reveals that 75 percent of the public say libraries have been effective at helping people learn how to use new technologies.
With their accessibility to the public in virtually every community around the country, libraries offer online educational tools for students, employment resources for job-seekers, computer access for those without it and innovation centers for entrepreneurs of all ages.
Other interesting findings in the report that point to the vital role of libraries in communities nationwide include:
- 65 percent maintain that libraries contribute to helping people decide what information they can trust.
- 75 percent say libraries have been effective at helping people learn how to use new technologies.
- 78 percent believe that libraries are effective at promoting literacy and love of reading.
- Aggregation jobs now run only using Spark. In previous versions, you could run them in Spark optionally, or natively in Fusion. We’ve found we’re happy enough with Spark to make it the only option now.
- You can now send alerts to PagerDuty. Previously, you could send an email or a Slack message. PagerDuty was a fairly popular request.
- Several new options for crawling websites
- Improvements to SSL when communicating between Fusion nodes
- A reorganization of the Fusion directory structure to better isolate your site-specific data and config from version-specific Fusion binaries, for easier upgrades and maintenance releases
- Better logging and debuggability
- Incremental enhancements to document parsing
- As always, some performance, reliability, and stability improvements
Thus devoting a large proportion of the resources available for preservation to obsessively collecting metadata intended to ease eventual format migration was economically unjustifiable, for three reasons. First, the time value of money meant that paying the cost later would allow more content to be preserved. Second, the format might never suffer obsolescence, so the cost of preparing to migrate it would be wasted. Third, if the format ever did suffer obsolescence, the technology available to handle it when obsolescence occurred would be better than when it was ingested.
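The first two reasons can be made concrete with a toy net-present-value calculation (the cost, discount rate, and obsolescence probability below are illustrative assumptions, not figures from the post):

```python
def present_value(cost, years, rate):
    """Discount a future cost back to today's dollars."""
    return cost / (1 + rate) ** years

cost, rate = 1000.0, 0.05  # assumed migration-prep cost and discount rate

pay_now = present_value(cost, 0, rate)     # prepare for migration at ingest
pay_later = present_value(cost, 10, rate)  # same nominal cost, 10 years out

# Reason 2: weight the deferred cost by the chance the format ever
# actually becomes obsolete (assumed 50% here).
expected_later = 0.5 * pay_later

print(round(pay_now), round(pay_later), round(expected_later))  # 1000 614 307
```

Under these assumptions, deferring the work until obsolescence actually threatens costs less than a third as much in present-value terms, before even counting reason three (better future tooling).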
Below the fold, I ask how well the predictions have held up in the light of subsequent developments.
Research by Matt Holden at INA in 2012 showed that the vast majority of even 15-year-old audio-visual content was easily rendered with current tools. The audio-visual formats used in the early days of the Web would be among the most vulnerable to obsolescence. The UK Web Archive's Interject prototype's Web site claims that these formats are obsolete and require migration:
- image/x-bitmap and image/x-pixmap, both rendered in my standard Linux environment via Image Viewer.
- x-world/x-vrml, versions 1 and 2, not rendered in my standard Linux environment, but migration tools available.
- ZX Spectrum software, not suitable for migration.
The prediction that technology for access to preserved content would improve is borne out by recent developments. Two and a half years ago the team from Freiburg University presented their emulation framework bwFLA which, like those from the Olive Project at CMU and the Internet Archive, is capable of delivering an emulated environment to the reader as a part of a normal Web page. An example of this is Rhizome's art piece from 2000 by Jan Robert Leegte, untitled[scrollbars]. To display the artist's original intent, it is necessary to view the piece using a contemporary Internet Explorer, which Rhizome does using bwFLA.
Increasingly, scrollbars are not permanent but pop up when needed. Viewing the piece with, for example, Safari on OS X is baffling because the scrollbars are not visible.
The prediction that if obsolescence were to happen to a widely used format it would happen very slowly is currently being validated, but not for the expected reason and not as a demonstration of the necessity of format migration. Adobe's Flash has been a very widely used Web format. It is not obsolete in the sense that it can no longer be rendered. It is becoming obsolete in the sense that browsers are following Steve Jobs' lead and deprecating its use, because it is regarded as too dangerous in today's Internet threat environment:
Five years ago, 28.9% of websites used Flash in some way, according to Matthias Gelbmann, managing director at web technology metrics firm W3Techs. As of August, Flash usage had fallen to 10.3%.
But larger websites have a longer way to go. Flash persists on 15.6% of the top 1,000 sites, Gelbmann says. That’s actually the opposite situation compared to a few years ago, when Flash was used on 22.2% of the largest sites, and 25.6% of sites overall.
If browsers won't support Flash because it poses an unacceptable risk to the underlying system, much of the currently preserved Web will become unusable. It is true that some of that preserved Web is Flash malware, thus simply asking the user to enable Flash in their browser is not a good idea. But if Web archives emulated a browser with Flash, either remotely or locally, the risk would be greatly reduced.
Even if the emulation fell victim to the malware, the underlying system would be at much less risk. If the goal of the malware was to use the compromised system as part of a botnet, the emulation's short life-cycle would render it ineffective. Users would have to be warned against entering any sensitive information that the malware might intercept, but it seems unlikely that many users would send passwords or other credentials via a historical emulation. And, because the malware was captured before the emulation was created, the malware authors would be unable to update it to target the emulator itself rather than the system it was emulating.
So, how did my predictions hold up?
- It is clear that obsolescence of widely used Web formats is rare. Flash is the only example in two decades, and it isn't obsolete in the sense that advocates of preemptive migration meant.
- It is clear that if it occurs, obsolescence of widely used Web formats is a very slow process. For Flash, it has taken half a decade so far, and isn't nearly complete.
- The technology for accessing preserved content has improved considerably. I'm not aware of any migration-based solution for safely accessing preserved Flash content. It seems very likely that a hypothetical technique for migrating Flash would migrate the malware as well, vitiating the reason for the migration.
New vacancy listings are posted weekly on Wednesday at approximately 12 noon Central Time. They appear under New This Week and under the appropriate regional listing. Postings remain on the LITA Job Site for a minimum of four weeks.
New This Week:
Visit the LITA Job Site for more available jobs and for information on submitting a job posting.
I like to take on a lot of projects. I love seeing projects come to fruition, and I want to provide the best possible services for my campus community. I think the work we do as librarians is important work. As I’ve taken on more responsibilities in my current job, though, I’ve learned I can’t do everything. I have had to reevaluate the number of things I can accomplish and projects I can support.
Photo by Darren Tunnicliff. Published under a CC BY-NC-ND 2.0 license.
Libraries come in all different shapes and sizes. I happen to work at a small library. We are a small staff—3 professional librarians including the director, 2 full-time staff, 1 part-time staff member, and around 10 student workers. I think we do amazing things at my place of employment, but I know we can’t do everything. I would love to be able to do some of the projects I see staff at larger universities working on, but I am learning that I have to be strategic about my projects. Time is a limited resource and I need to use my time wisely to support the campus in the best way possible.
This has been especially true for tech projects. The maintenance, updating, and support needed for technology can be a challenge. Now don’t get me wrong, I love tech and my library does great things with technology, but I have also had to be more strategic as I recognize my capacity does have limits. So with new projects I’ve been asking myself:
- How does this align with our strategic plan? (I’ve always asked this with new projects, but it is always good to remember)
- What are top campus community needs?
- What is the estimated time commitment for a specific project?
- Can I support this long term?
Some projects are so important that you are going to work on the project no matter what your answers are to these questions. There are also some projects that are not even worth the little bit of capacity they would require. Figuring out where to focus time and what will be the most beneficial for your community is challenging, but worth it.
How do you decide priorities and time commitments?
Archivists, scientists, librarians, and curators face millions of crucial tasks. One of the ways cultural institutions are addressing this challenge is through crowdsourcing: many have accelerated the development of their digital collections and data sets by allowing citizen volunteers to help.
In this post, I’ll look at a few sample crowdsourcing projects from libraries and archives in the U.S. and around the world. This is strictly a general overview. For more detailed information, follow the linked examples or search online for crowdsourcing platforms, tools, or infrastructures.
In general, volunteers help with:
- Analyzing images, creating tags and metadata, and subtitling videos
- Transcribing documents and correcting OCR text
- Identifying geographic locations, aligning/rectifying historical maps with present locations, and adding geospatial coordinates
- Classifying data, cross-referencing data, researching historic weather, and monitoring and tracking dynamic activities
The Library of Congress utilizes public input for its Flickr project. Visitors analyze and comment on the images in the Library’s general Flickr collection of over 20,000 images and the Library’s Flickr “Civil War Faces” collection. “We make catalog corrections and enhancements based on comments that users contribute,” said Phil Michel, digital conversion coordinator at the Library.
In another type of image analysis, Cancer Research UK’s Cellslider project invites volunteers to analyze and categorize cancer cell cores. Volunteers are not required to have a background in biology or medicine for the simple tasks. They are shown what visual elements to look for and instructed on how to record what they see on the webpage. Cancer Research UK states on its website that as of the publication of this story, 2,571,751 images have been analyzed.
Both of the examples above use descriptive metadata or tagging, which helps make the images more findable by means of the specific keywords associated with — and mapped to — the images.
The British National Archives runs a project titled “Operation War Diary,” in which volunteers help tag and categorize diaries of World War I British soldiers. The tags are fixed in a controlled vocabulary list, a menu from which volunteers can select keywords, which helps avoid the typographical variations and errors that may occur when a crowd of individuals freely type their text in.
The New York Public Library’s “Community Oral History Project” makes oral history videos searchable by means of topic markers tagged into the slider bar by volunteers; the tags map to time codes in the video. So, for example, instead of sitting through a one-hour interview to find a specific topic, you can click on the tag — as you would select from a menu — and jump to that tagged topic in the video.
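The underlying mechanism is just a mapping from topic tags to time codes. A toy sketch (the tags and offsets are invented, not from the NYPL project):

```python
# Hypothetical topic markers for one interview: tag -> offset in seconds.
topic_markers = {
    "childhood": 95,
    "immigration": 610,
    "opening the store": 1480,
}

def jump_to(tag):
    """Return the playback position for a tagged topic, as mm:ss,
    like clicking a marker on the video's slider bar."""
    minutes, seconds = divmod(topic_markers[tag], 60)
    return f"{minutes:02d}:{seconds:02d}"

print(jump_to("immigration"))  # 10:10
```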
The National Archives and Records Administration offers a range of crowdsourcing projects on its Citizen Archivist Dashboard. Volunteers can tag records and subtitle videos to be used for closed captions; they can even translate and subtitle non-English videos into English subtitles. One NARA project enables volunteers to transcribe handwritten old ship’s logs that, among other things, contain weather information for each daily entry. Such historic weather data is an invaluable addition to the growing body of data in climate-change research.
Transcription is one of the most in-demand crowdsourcing tasks. In the Smithsonian’s Transcription Center, volunteers can select transcription projects from at least ten of the Smithsonian’s 19 museums and archives. The source material consists of handwritten field notes, diaries, botanical specimen sheets, sketches with handwritten notations and more. Transcribers read the handwriting and type into the web page what they think the handwriting says. The Smithsonian staff then runs the data through a quality control process before they finally accept it. In all, the process comprises three steps:
- The volunteer types the transcription into the web page
- Another set of registered users compares the transcriptions with the handwritten scans
- Smithsonian staff or trained volunteers review and have final approval over the transcription
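The three steps above can be sketched as a tiny state machine (state names and identifiers are illustrative, not the Transcription Center's actual vocabulary):

```python
class Transcription:
    def __init__(self, page_id, text):
        self.page_id = page_id
        self.text = text
        self.state = "new"

    def submit(self):
        """Step 1: a volunteer types the transcription into the web page."""
        self.state = "transcribed"

    def peer_review(self, matches_scan):
        """Step 2: another registered user compares it with the scan."""
        if self.state == "transcribed" and matches_scan:
            self.state = "peer_reviewed"

    def approve(self, accepted):
        """Step 3: staff or trained volunteers give final approval."""
        if self.state == "peer_reviewed" and accepted:
            self.state = "approved"

t = Transcription("page-001", "Sept. 4. Fine weather, light winds.")
t.submit()
t.peer_review(True)
t.approve(True)
print(t.state)  # approved
```

The ordering constraint in each method captures the quality-control idea: nothing reaches the final collection without passing through every earlier review stage.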
Notable transcription projects from other institutions are the British Library’s Card Catalogue project, Europeana’s World War I documents, the Massachusetts Historical Society’s “The Diaries of John Quincy Adams,” The University of Delaware’s, “Colored Conventions,” The University of Iowa’s “DIY History,” and the Australian Museum’s Atlas of Living Australia.
Optical Character Recognition is the process of taking text that has been scanned into solid images – sort of a photograph of text – and machine-transforming that text image into text characters and words that can be searched. The process often generates incomplete or mangled text, because OCR is often a “best guess” by the software and hardware. Institutions ask for help comparing the source text image with its OCR text-character results and hand-correcting the mistakes.
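Hand-correction amounts to comparing the OCR “best guess” against what a human reads off the scan. A small sketch with Python's standard difflib (the sample strings are invented):

```python
import difflib

# Illustrative OCR output with typical character-recognition errors,
# alongside a volunteer's hand-corrected version.
ocr_text = "Tbe quick hrown fox jumps ovcr the lazy dog"
corrected = "The quick brown fox jumps over the lazy dog"

# Word-level list of the volunteer's corrections.
corrections = [(o, c) for o, c in zip(ocr_text.split(), corrected.split())
               if o != c]

# A rough similarity score shows how close the OCR guess was.
accuracy = difflib.SequenceMatcher(None, ocr_text, corrected).ratio()

print(corrections)  # [('Tbe', 'The'), ('hrown', 'brown'), ('ovcr', 'over')]
print(round(accuracy, 2))
```

Correction platforms typically store both versions, so the human fixes can also be fed back to improve the OCR software's future guesses.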
Newspapers comprise much of the source material. The Library of Virginia, The Cambridge Public Library, and the California Digital Newspaper collection are a sampling of OCR-correction sites. Examples outside of the U.S. include the National Library of Australia and the National Library of Finland.
The New York Public Library was featured in the news a few years ago for the overwhelming number of people who volunteered to help with its “What’s on the Menu” crowdsourcing transcription project, where the NYPL asked volunteers to review a collection of scanned historic menus and type the menu contents into a browser form.
NYPL Labs has gotten even more creative with map-oriented projects. With “Building Inspector” (whose peppy motto is, “Kill time. Make history.”), it reaches out to citizen cartographers to review scans of very old insurance maps and identify each building — lot by lot, block by block — by its construction material, its address and its spatial footprint; in an OCR-like twist, volunteers are also asked to note the name of the then-existing business that is handwritten on the old city map (e.g. MacNeil’s Blacksmith, The Derby Emporium). Given the population density of New York, and the propensity of most of its citizens to walk almost everywhere, there’s a potential for millions of eyes to look for this information in their daily environment, and go home and record it in the NYPL databases.
Volunteers can also use the NYPL Map Warper to rectify the alignment differences between contemporary maps and digitized historic maps. The British Library has a similar map-rectification crowdsourcing project called Georeferencer. Volunteers are asked to rectify maps scanned from 17th-, 18th- and 19th-century European books. In the course of the project, maps get geospatially enabled and become accessible and searchable through Old Maps Online.
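Rectification boils down to fitting a transform from pixel coordinates on the scan to real-world coordinates, using control points a volunteer places. A bare-bones sketch that fits an affine transform from exactly three point pairs (real georeferencers use many points and least-squares fitting; the coordinates below are invented):

```python
def solve_affine(pixels, world):
    """Fit x' = a*x + b*y + c, y' = d*x + e*y + f from 3 control-point
    pairs, using Cramer's rule on the 3x3 system."""
    (x1, y1), (x2, y2), (x3, y3) = pixels
    det = x1*(y2 - y3) - y1*(x2 - x3) + (x2*y3 - x3*y2)
    def row(v1, v2, v3):
        a = (v1*(y2 - y3) - y1*(v2 - v3) + (v2*y3 - v3*y2)) / det
        b = (x1*(v2 - v3) - v1*(x2 - x3) + (x2*v3 - x3*v2)) / det
        c = (x1*(y2*v3 - y3*v2) - y1*(x2*v3 - x3*v2)
             + v1*(x2*y3 - x3*y2)) / det
        return a, b, c
    return (row(*(w[0] for w in world)), row(*(w[1] for w in world)))

def warp(coeffs, x, y):
    """Map a pixel coordinate to a real-world coordinate."""
    (a, b, c), (d, e, f) = coeffs
    return a*x + b*y + c, d*x + e*y + f

# Three control points: pixel positions on a scan matched to lon/lat
# (image y grows downward, so latitude decreases as y increases).
pixels = [(0, 0), (1000, 0), (0, 1000)]
world = [(-74.0, 40.8), (-73.9, 40.8), (-74.0, 40.7)]

coeffs = solve_affine(pixels, world)
lon, lat = warp(coeffs, 500, 500)  # center of the scan
print(round(lon, 6), round(lat, 6))  # -73.95 40.75
```

Once the transform is known, every pixel of the historic map can be placed on a modern basemap, which is what makes the rectified scans searchable through services like Old Maps Online.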
Citizen Science projects range from the cellular level to the astronomical level. The Audubon Society’s Christmas Bird Count asks volunteers to go outside and report on what birds they see. The data goes toward tracking the migratory patterns of bird species.
Geo-Wiki is an international platform that crowdsources monitoring of the earth’s environment. Volunteers give feedback about spatial information overlaid on satellite imagery or they can contribute new data.
Gamification makes a game out of potentially tedious tasks. Malariaspot, from the Universidad Politécnica de Madrid, makes a game of identifying the parasites that lead to malaria. Their website states, “The analysis of all the games played will allow us to learn (a) how fast and accurate is the parasite counting of non-expert microscopy players, (b) how to combine the analysis of different players to obtain accurate results as good as the ones provided by expert microscopists.”
Carnegie Mellon and Stanford collaboratively developed EteRNA, a game where users play with puzzles to design RNA sequences that fold up into target shapes and contribute to a large-scale library of synthetic RNA designs. MIT’s “Eyewire” uses gamification to get players to help map the brain. MIT’s “NanoDoc” enables game players to design new nanoparticle strategies towards the treatment of cancer. The University of Washington’s Center for Game Science offers “Nanocrafter,” a synthetic biology game, which enables players to use pieces of DNA to create new inventions. “Purposeful Gaming,” from the Biodiversity Heritage Library, is a gamified method of cleaning up sloppy OCR. Harvard uses the data from its “Test My Brain” game to test scientific theories about the way the brain works.
Crowdsourcing enables institutions to tap vast resources of volunteer labor, to gather and process information faster than ever, despite the daunting volume of raw data and limitations of in-house resources. Sometimes the volunteers’ work goes directly into a relational database that maps to target digital objects, and sometimes it is held until a human can review it and accept or reject it. The process requires institutions to trust “outsiders” — average people, citizen archivists, historians, hobbyists. If a project is well structured and the user instructions are clear and simple, there is little reason for institutions to not ask the general public for help. It’s a collaborative partnership that benefits everyone.