Planet Code4Lib - http://planet.code4lib.org

Cynthia Ng: Notes for BC SirsiDynix Users Group Meeting 2017

Thu, 2017-05-18 22:54
We got a number of presentations from SirsiDynix (SD) as well as a couple of presentations from libraries. COSUGI Updates: Rick Branham, VP, Pre-Sales Solutions; Steve Donoghue, Senior Library Relations Manager; Tom Walker, Executive Accounts Manager. Striving to listen to what customers want, and branding/marketing focused on customers and libraries. Popular “add-on” products: Enterprise, MobileCirc, Visibility, eResource … Continue reading Notes for BC SirsiDynix Users Group Meeting 2017

Harvard Library Innovation Lab: LIL Talks: A Small Study of Epic Proportions

Thu, 2017-05-18 20:26

(This is a guest post by John Bowers, a student at Harvard College who is collaborating with us on the Entropy Project. John will be a Berktern here this Summer.)

In last week’s LIL talk, team member and graduating senior Yunhan Xu shared some key findings from her prize-winning thesis “A Small Study of Epic Proportions: Toward a Statistical Reading of the Aeneid.” As an impressive entry into the evolving “digital humanities” literature, Yunhan’s thesis blended the empirical rigor of statistical analysis with storytelling and interpretive methods drawn from the study of classics.

The presentation dealt with four analytical methodologies applied in the thesis. For each, Yunhan offered a detailed overview of tools and key findings.

  1. Syntactic Analysis. Yunhan analyzed the relative frequencies with which different verb tenses and parts of speech occur across the Aeneid’s 12 books. Her results lent insight into the “shape” of the epic’s narrative, as well as its stylistic character in relation to other works.
  2. Sentiment Analysis. Yunhan used sentiment analysis tools to examine the Aeneid’s emotional arc, analyze the normative descriptive treatment of its heroes and villains, and differentiate—following more conventional classics scholarship—the tonality of its books.
  3. Topic Modeling. Here, Yunhan subjected existing bipartite and tripartite “partitionings” of the Aeneid to statistical inquiry. By applying sophisticated topic modeling techniques including Latent Dirichlet Allocation and Non-Negative Matrix Factorization, she made a compelling case for the tripartite interpretation. In doing so, she added a novel voice to a noteworthy debate in the classics community. (A rough sketch of this kind of comparison follows the list.)
  4. Network Analysis. By leveraging statistical tools to analyze the coincidence of and interactions between the Aeneid’s many characters, Yunhan generated a number of compelling visualizations mapping narrative progression between books in terms of relationships.
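
As a rough, purely illustrative sketch of the kind of LDA-versus-NMF comparison described in point 3 above (this is not Yunhan’s code; the file names, preprocessing, and parameters are all assumptions), such an analysis might be set up with scikit-learn along these lines:

    # Hypothetical sketch of an LDA vs. NMF topic model comparison on the twelve
    # books of an epic. File names, preprocessing, and parameters are invented;
    # this is not the code from the thesis discussed above.
    from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
    from sklearn.decomposition import LatentDirichletAllocation, NMF

    # Assume each book has been saved as a plain-text file.
    books = [open(f"aeneid_book_{i}.txt", encoding="utf-8").read()
             for i in range(1, 13)]

    # LDA is typically fit on raw term counts, NMF on tf-idf weights.
    count_vec = CountVectorizer(max_df=0.95, min_df=2)
    tfidf_vec = TfidfVectorizer(max_df=0.95, min_df=2)
    counts = count_vec.fit_transform(books)
    tfidf = tfidf_vec.fit_transform(books)

    n_topics = 3  # e.g. probing a tripartite partitioning
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=0).fit(counts)
    nmf = NMF(n_components=n_topics, random_state=0).fit(tfidf)

    def top_terms(model, feature_names, n=10):
        # Print the highest-weighted terms for each topic.
        for idx, weights in enumerate(model.components_):
            terms = [feature_names[i] for i in weights.argsort()[::-1][:n]]
            print(f"topic {idx}: {', '.join(terms)}")

    top_terms(lda, count_vec.get_feature_names_out())
    top_terms(nmf, tfidf_vec.get_feature_names_out())

Comparing how each model’s topics align with the proposed two- and three-part divisions of the text is one way to put a traditional interpretive debate on a statistical footing.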

 

In the closing minutes of her presentation, Yunhan reflected on the broader implications of the digital humanities for the study of classics. While some scholars remain skeptical of the digital humanities, Yunhan sees enormous potential for collaboration and coevolution between the new way and the old.

LITA: 2017 LITA Forum – Call for Proposals, Deadline Extended

Thu, 2017-05-18 19:39

The LITA Forum is a highly regarded annual event for those involved in new and leading-edge technologies in the library and information technology field. Please send your proposal submissions here by June 2, 2017, and join your colleagues in Denver, Colorado.

The 2017 LITA Forum Committee seeks proposals for the 20th Annual Forum of the Library and Information Technology Association in Denver, Colorado, November 9-12, 2017 at the Embassy Suites by Hilton Denver Downtown Convention Center.

Submit your proposal at this site

The Forum Committee welcomes proposals for concurrent sessions, workshops, poster sessions, or full-day preconferences related to all types of libraries: public, school, academic, government, special, and corporate. Collaborative, hands-on, and interactive concurrent sessions, such as panel discussions, hands-on practical workshops, or short talks followed by open moderated discussions, are especially welcomed. Proposals may cover projects, plans, ideas, or recent discoveries. We accept proposals on any aspect of library and information technology. The committee particularly invites submissions from first-time presenters and library school students. We deliberately seek and strongly encourage submissions from underrepresented groups, such as women, people of color, the LGBTQA+ community, and people with disabilities.

The new submission deadline is Friday, June 2, 2017.

Presenters will submit final presentation slides and/or electronic content (video, audio, etc.) to be made available online following the event. Presenters are expected to register and participate in the Forum as attendees; a discounted registration rate will be offered.

If you have any questions, contact Vincci Kwong, Forum Planning Committee Chair, at vkwong@iusb.edu.

Additional details are at the Submission site

More information about LITA is available from the LITA website, Facebook, and Twitter.

Questions or Comments?

Contact LITA at (312) 280-4268 or Mark Beatty, mbeatty@ala.org

 

Jonathan Rochkind: On open source, consensus, vision, and scope

Thu, 2017-05-18 19:30

Around minute 27 of “Building Rails ActionDispatch::SystemTestCase Framework” by Eileen Uchitelle.

What is unique to open source is that the stakeholders you are trying to find consensus with have varying levels of investment in the end result…

…but I wasn’t prepared for all the other people who would care. Of course caring is good, I got a lot of productive and honest feedback from community members, but it’s still really overwhelming to feel like I needed to debate — everyone.

Rails ideologies of simplicity differ a lot from capybara’s ideology of lots of features. And all the individuals who were interested in the feature had differing opinions as well… I struggled with how to respect everyone’s opinions while building system tests, but also maintaining my sense of ownership.

I knew that if I tried to please all groups and build system tests by consensus, then I would end up pleasing no one. Everyone would end up unhappy because consensus is the enemy of vision. Sure, you end up adding everything everyone wants, but the feature will lose focus, and the code will lose style, and I will lose everything that I felt was important.

I needed to figure out a way to respect everyone’s opinions without making system tests a hodgepodge of ideologies or feeling like I threw out everything I cared about. I had to remind myself that we all had one goal: to integrate system testing into Rails. Even if we disagreed about the implementation, this was our common ground.

With this in mind, there are a few ways you can keep your sanity when dealing with multiple ideologies in the open source world. One of the biggest things is to manage expectations. In open source there are no contracts, you can’t hold anyone else accountable (except for yourself) and nobody else is going to hold you accountable either… You are the person who has to own the scope, and you are the person who has to say ‘no’. There were a ton of extra features suggested for system tests that I would love to see, but if I had implemented all of them it still wouldn’t be in Rails today. I had to manage the scope and the expectations of everyone involved to keep the project within budget…

…When you are building open source features, you are building something for others. If you are open to suggestions, the feature might change for the better. Even if you don’t agree, you have to be open to listening to the other side of things. It’s really easy to get cagey about the code that you’ve worked so hard to write. I still have to fight the urge to be really protective of system test code… but I also have to remember that it’s no longer mine, and never was mine, it now belongs to everyone that uses Rails…

I knew that if I tried to please all groups and build system tests by consensus, then I would end up pleasing no one. Everyone would end up unhappy because consensus is the enemy of vision. Sure, you end up adding everything everyone wants, but the feature will lose focus, and the code will lose style…


Filed under: General

OCLC Dev Network: Label Printing & July 9 WMS API Install

Thu, 2017-05-18 13:00

As OCLC first notified the community in April, WMS APIs will be upgraded on July 9, 2017 to add security enhancements, which will affect libraries that rely on the label printing application from the University of New Mexico. The changes to the APIs are not backward-compatible with the current UNM application and will break this functionality.  OCLC and UNM have worked together to update the code and ensure its compatibility with the scheduled changes to the WMS APIs.

Open Knowledge Foundation: What is the difference between budget, spending and procurement data?

Thu, 2017-05-18 12:18

Fiscal data is a complex topic. It comes in many different kinds of formats and languages, its availability cannot be taken for granted, and its complexity means special skills and knowledge are needed to unlock and fully understand it. The Global Open Data Index (GODI) assesses three fiscal areas of national government: budgets, spending, and procurement.

Our team repeatedly receives questions about why some countries rank low on budgets, public procurement, or spending even though fiscal data is openly communicated. The quick answer: we often find information that is related to this data but does not exactly match the GODI data requirements. It appears to us that some clarification of the different types of fiscal data is needed. This blogpost is dedicated to shedding light on some of these questions.

As part of our public dialogue phase, we also want to address our experts in the community. How should we continue to measure the status of these three key datasets in the future? Your input counts! Should we set the bar lower for GODI and avoid measuring transactional spending data at all? Is our assessment of transactional spending useful for you? You can leave us your feedback or join the discussion on this topic in our forum.

The different types of fiscal data

A government budget year produces different fiscal data types.

Budgeting is the process by which a government body sets its priorities for how it intends to spend an amount of money over a specific time period (usually annually or semi-annually). Throughout the budgeting cycle (the process of defining the budget), an initial budget can undergo revisions that result in a revised budget.

Spending is the process of paying out money. This means the money might be given as a subsidy, a contract payment, a refundable tax credit, a pension, or a salary.

Procurement is the process of selecting services from the supplier who best fits the need. That might involve selecting vendors, establishing payment terms, and running a strategic tender or other vetting mechanism meant to prevent corruption.

Not only are the processes linked to each other, the data describing these processes can be linked too (e.g. in cases where identifiers exist linking spending to government budgets and public contracts). For laypersons, it might be difficult to tell the difference when they are confronted with a spending or procurement dataset: Is the money I see in a dataset spending, or part of contracting? The following paragraphs explain the differences.

Budget

As mentioned above, budgeting is the process by which a government body decides how to spend money over a certain time period. The amount is broken into smaller amounts (budget items) which can be classified as follows:

  • Administrative (which government sub-unit gets the money)
  • Functional (what the money is going to be used for)
  • Economic (how the money is going to be used, e.g., procurement, subsidies, salaries etc.)
  • Financing source (where the money should come from).

After the budget period ends, we know how much money was actually spent on each item – in theory. The Global Open Data Index assesses budget information at the highest administrative level (e.g. national government, federal government), broken down by one of these classifications. Here is an example of fully open budget data from Argentina’s national government.

Example of Argentina’s national government budget 2017 (table shortened and cleaned)

The image shows the government entity and expenditures split by economic classification (how the money is used). At the far right, we can see a column describing the total amount of money effectively spent against each planned budget expenditure; it essentially compares allocated and paid money. This column must not be confused with spending information at the transactional level (which displays each single transaction from a government unit to a recipient).
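
To make the distinction concrete, here is a small, hypothetical pandas sketch that compares allocated and effectively spent amounts per government entity. The file and column names are invented for illustration (the real Argentine dataset uses its own labels), and note that this is still budget data, not transactional spending:

    # Hypothetical sketch: comparing allocated vs. effectively spent amounts in a
    # budget table like the one shown above. The file and column names are
    # invented for illustration; the real dataset uses its own (Spanish) labels.
    import pandas as pd

    budget = pd.read_csv("national_budget_2017.csv")

    # Budget execution rate per government entity: share of the allocation
    # that was actually paid out by the end of the budget period.
    summary = budget.groupby("government_entity")[["allocated", "effectively_spent"]].sum()
    summary["execution_rate"] = summary["effectively_spent"] / summary["allocated"]
    print(summary.sort_values("execution_rate"))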

Spending

The Spending Data Handbook describes spending as “data relating to the specific expenditure of funds from the government”. Money might be given as a subsidy, as payment for a provided service, a salary (although salaries will seldom be published at a transactional level), a pension fund payment, a contract or a loan, to name just a few.
GODI focuses on transactions of service payments (often resulting from a prior procurement process). Monetary transactions are our baseline for spending data. GODI assesses the following information:

  • The amount that was transferred
  • The recipient (an entity external to the government unit)
  • When the transaction took place
  • Government office paying the transaction
  • Data split into individual transactions

GODI exclusively looks for single payment transfers. The reason why we are looking at this type of data is that spending patterns can be detected, and fraud or corruption uncovered. Some of the questions one might be able to address include: Who received what amount of money? Could government get its services from a cheaper service provider? Is government contracting to a cluster of related companies (supporting cartels)?

GODI’s definition of spending data, even though ambitious in scope, does not consider the entire spectrum of transactional spending data. Because it is produced by many agencies, spending data is scattered across different places online. We usually pick samples of specific spending data, such as government payments to external suppliers (e.g. the single payments made through a procurement process). Other types of payment, such as grants, loans or subsidies, are then left aside.

Our assessment is also ‘generous’ because we accept spending data that is only published above a certain threshold. The British Cabinet Office, a forerunner in disclosing spending data, only publishes data above £25,000. GODI accepts this as valid, even though we are aware that spending data below this amount remains opaque. There are also many more ways to expand GODI’s definition of spending data. For instance, we could ask if each transaction can be linked to a budget item or procurement contract so that we understand the spending context better.

Example image of British Spending data (Cabinet Office spending over £25,000)

Above is an example image of Great Britain’s Cabinet Office spending. You can see the date and the amount paid by the government entity. Using the supplier name, we can track how much money was paid to each supplier. However, this data provides no contract ID or contract name that would allow us to fully understand which contracts these payments were made under.
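
As a rough illustration of the kind of analysis this transactional data supports, a few lines of pandas are enough to total payments per supplier. The column names below are assumptions, not the actual headings of the published CSV files:

    # Hypothetical sketch: totalling transactional spending per supplier, in the
    # spirit of the Cabinet Office example above. Column names are assumed and
    # are not the actual headings used in the published files.
    import pandas as pd

    spend = pd.read_csv("cabinet_office_spend_over_25k.csv",
                        parse_dates=["payment_date"])

    per_supplier = (spend.groupby("supplier")["amount"]
                         .agg(total="sum", transactions="count")
                         .sort_values("total", ascending=False))
    print(per_supplier.head(20))

If a contract ID column were present, the same grouping could tie each payment back to the contract it fulfils, which is exactly the context the current data lacks.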

Procurement

When purchasing goods and services from an external source, government units require a certain process for choosing the supplier who best fits the need. This process is called procurement and includes planning, tendering, awarding, contracting, and implementation. The goals are to enable fair competition among service providers and to prevent corruption.

Many data traces can shed light on each procurement stage. For example, one might want to understand from which budget a service will be paid, or what amount of money has been awarded (with some negotiation possible) or finally contracted to a supplier. This blogpost by the Open Contracting Partnership illustrates how each of the procurement stages can be understood through different data.

GODI focuses on two essential stages that are considered a good proxy for understanding procurement. These, however, do not capture all information.

Tender phase

  • Tenders per government office
  • Tender name
  • Tender description
  • Tender status

Award phase

  • Awards per government office
  • Award title
  • Award description
  • Value of the award
  • Supplier’s name

Any payment resulting from government contracts with external suppliers (sometimes only one, sometimes more) has to be captured in government spending. For example, there might be a construction contractor that is paid by milestone, or an office supplies dealer that is chosen as a supplier. Each spending transaction is then for a specific item purchased through a procurement process.

Below you can see a procurement database from Thailand. It displays procurement phases, but does not display the individual transactions that follow from them. This particular database does not represent actual spending data (monetary transactions), but rather the preceding stages of the contracting process. Despite this, the platform is misleadingly called “Thailand Government Spending”.

Procurement database in Thailand

Another example is a procurement database indicating how much money has been spent on a contract:

Example for the procurement website ‘Cuánto y a quién se contrató’ (Colombia)

The road ahead – how to measure spending data in the future

Overall, there is slow but steady progress around the openness of fiscal data. Increasingly, budget and procurement data is provided in machine-readable formats or openly licensed, and it is sometimes presented on interactive government portals or as raw data (for more detail, see for example the most recent blogpost by the Open Contracting Partnership on open procurement data).

Yet there is a long way to go for transactional spending data. Governments are taking laudable first steps by creating budget or procurement websites which show how much money will be or has been spent in total. These may be confusingly named ‘spending’ portals because in fact they are linked to other government processes such as budgeting (e.g. how much money should be spent) or procurement (how much money it has been agreed to pay for an external service). The actual spending, in the form of single monetary transactions, is missing. And to date there is no coherent standard or specification that would make it easier to document transactional spending.

We want to address our experts in the community. How should we continue to measure the status of these three key datasets in the future? Your input counts!  You can leave us your feedback and discuss this topic in our forum.

 

This blog was jointly written by Danny Lämmerhirt and Diana Krebs (Project Manager for Fiscal Projects at Open Knowledge International)

Open Knowledge Foundation: Open Data Day events, MyData Japan 2017 and other OK Japan updates

Thu, 2017-05-18 11:43

This blog post is part of our on-going Network series featuring updates from chapters across the Open Knowledge Network and was written by the Open Knowledge Japan team.

International Open Data Day

We had a lot of localities joining International Open Data Day (IODD) – the international website for IODD shows 42 localities in Japan, but our listing shows 65. OK Japan members helped promote the event via a pre-event, social media, and the Japanese website.

We saw a lot of discussions, hackathons, and some mapping parties, among other activities. Many ‘Code for’ groups were involved in hosting events.

Open Knowledge Japan Award at VLED

Annually, OK Japan joins a group of other organisations celebrating major and noteworthy achievements in open data in Japan by issuing unsolicited awards to whoever we think deserves them. We are happy to share that this year OK Japan’s award went to the digitisation project for classic Japanese materials by the National Institute of Japanese Literature and the Center for Open Data in the Humanities. Their dataset includes some cookbooks from the Edo period, and some recipes have been adapted into modern Japanese and released in the Edo-period recipe section of Cookpad, the largest recipe-sharing site in Japan.

This year’s awardees (in Japanese) include the legislators who worked on the basic law for government and private sector data use promotion, which now provides a legal basis for open data (see below) and which received the top award; health-related open data by the Ministry of Health, Labor, and Welfare; one-stop search of meeting minutes and transcripts of prefectural and major city legislatures by Bitlet and Yasuo Oda; and many more.

Basic law to promote data use, including open data

The Japanese Parliament passed a law on data use in early December 2016. Under the law, the Japanese government creates a new high-level committee to promote data usage. National and prefectural governments are required under this law to develop their plans to disseminate easily usable data online. Municipal governments and private sector businesses are also expected to make efforts to help the cause. The goal is to gain economic benefits.

MyData Japan 2017

Inspired by the event hosted by OK Finland, MyData 2016, some attendees and others interested in the proper and active use of personal data have decided to hold MyData Japan 2017. The OK Japan Chapter will serve as the host and organiser of this whole-day event, which takes place on 19 May 2017 in Tokyo.

Contact Tomoaki Watanabe [tomoaki.watanabe@gmail.com], the coordinator of Open Knowledge Japan for more information regarding their events and activities. 

Ed Summers: Infradata

Thu, 2017-05-18 04:00

If you enjoy reading about the relationship between data and metadata and/or the history of the Internet you’ll definitely want to check out Bradley Fidler and Amelia Acker’s excellent paper Metadata, infrastructure, and computer-mediated communication in historical perspective (Fidler & Acker, 2016). If you need a copy drop me an email (ehs at pobox dot com).

In this paper Acker and Fidler develop the idea of infradata, a specific kind of metadata that is required to maintain infrastructure. They use this idea to examine the evolution of the Arpanet by taking a close look at the development of the Host/Host Protocol that allowed computers to connect to each other. The source for this history is found in the IETF RFCs, many of which were originally circulated in hard copy but have since been digitized and made available online.

In my work with metadata as a software developer I’ve always been biased to metadata that is immediately useful. I’ve found focusing on use helps ground and scope discussions about what metadata standards should be. For example during my work on the National Digital Newspaper Project I had to work with a data specification that was quite complicated, and (I felt) raised the bar unreasonably high for data producers (awardees). I worked on a team that built a website that provided access to the historical newspaper content. In building that application we only used a small fraction of the metadata that awardees were required to produce. The specification struck me as unnecessarily complicated at the time, and perhaps it still is.

But maybe I was overly focused on the infradata, or the data that was required for me to build an access system, and not seeing the larger picture that includes the (sometimes unknown) requirements of digital preservation. Future scenarios when knowing what camera was used to image the microfilm frame was actually important are easy to talk about, but they can also expand to fill up all available space.

At the time I comforted myself with the idea that “digital preservation is access in the future” (Summers, 2013) and therefore focusing on access in the present was the best way to ensure preservation. But now I’m not so sure. This idea of infradata highlights that while some metadata is actively used to maintain the system it is a part of, not all of it is, or necessarily should be.

Acker and Fidler got me thinking about the difficulties of studying these systems historically. The IETF has done such a great job of publishing RFCs over its history. But I wonder how easy it is to get at the stories around these specifications, and their various versions?

Since I’m actively engaged in a project to think about the preservation of social media, I began thinking about how the metadata in a tweet has changed over the years. Projects like Hitch make it possible to look at how APIs like Twitter’s change over time. Documentation is sometimes available in the Internet Archive where it can be used to bring historical snapshots of documentation back to life. I thought it could be useful to create a bot that watches the Twitter sample stream and notices any new or changed metadata in the JSON for a Tweet. If you are interested you can follow it at @tweetjson.

Here’s how it works. The bot watches the Twitter sample stream, and for each tweet it creates a blueprint of the data. It then compares this blueprint against a master blueprint, and announces any new or changed data properties on Twitter. The master blueprint is really just a snapshot of all the previous data fields the bot has seen, which is currently 1229 fields that you can see here.

The blueprint uses a jq-like syntax to represent each path in the JSON data. It’s a bit more difficult to notice when fields are removed, because not all tweets contain all fields: just because a given field isn’t present in a tweet doesn’t mean it has been removed. I guess the bot could keep a timestamp associated with each field, and if that timestamp grows really stale (like months?) it could assume that the field has been removed? That was a bit too adventurous for the time I had available to test this idea out.
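
The bot’s actual source code is not reproduced here, but the core idea can be sketched in a few lines of Python (the function, variable, and field names below are invented for illustration): flatten each tweet’s JSON into jq-like paths and report any path the master blueprint has not seen before.

    # Hypothetical sketch of the blueprint idea described above: flatten a tweet's
    # JSON into jq-like paths and report any path not already in the master set.
    # This is not the bot's actual code; names are invented for illustration.
    import json

    def blueprint(value, path="."):
        # Yield a jq-like path for every field in a nested JSON structure.
        if isinstance(value, dict):
            for key, child in value.items():
                sep = "" if path == "." else "."
                yield from blueprint(child, f"{path}{sep}{key}")
        elif isinstance(value, list):
            for child in value:
                yield from blueprint(child, f"{path}[]")
        else:
            yield path

    def new_paths(tweet_json, master):
        # Return the paths in this tweet that the master blueprint hasn't seen.
        return sorted(set(blueprint(json.loads(tweet_json))) - master)

    master = set()  # in practice, loaded from the saved master blueprint
    tweet = '{"id": 1, "user": {"screen_name": "example"}, "entities": {"hashtags": []}}'
    for path in new_paths(tweet, master):
        print("new field:", path)  # a real bot would announce this in a tweet
        master.add(path)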

Anyway, I thought I’d write this up briefly here as a small example of how research can be generative for software projects. I hope that it can work the other way sometimes too.

References

Fidler, B., & Acker, A. (2016). Metadata, infrastructure, and computer-mediated communication in historical perspective. Journal of the Association for Information Science and Technology. Retrieved from http://onlinelibrary.wiley.com/doi/10.1002/asi.23660/abstract

Summers, E. (2013). The web as a preservation medium. Retrieved from https://inkdroid.org/2013/11/26/the-web-as-a-preservation-medium/

DuraSpace News: Recording Available–Powering Linked Data and Hosted Solutions with Fedora

Thu, 2017-05-18 00:00

On May 16, 2017 Fedora community members presented a webinar entitled “Powering Linked Data and Hosted Solutions with Fedora.” David Wilcox, Fedora Product Manager with DuraSpace, provided an overview of Fedora with a focus on its native linked data capabilities. Hannah Frost, Manager, Digital Library Product and Service Management with Stanford University, presented Hyku, the Hydra-in-a-box repository product, which provides a hosted option for Fedora-based repositories.

DuraSpace News: DSpace UK and Ireland User Group Meeting Slides, Links and More

Thu, 2017-05-18 00:00

From Sarah Molly, Chair, DSpace UK and Ireland User Group, Research Support Manager, Library and Archives, Queen Mary University of London

London, UK: Notes, jottings, slides and links from the May 17, 2017 DSpace UK and Ireland User Group meeting are available here.

DuraSpace News: INVITATION: Fedora and Hydra Camp at Oxford

Thu, 2017-05-18 00:00

DuraSpace and Data Curation Experts are pleased to invite you to attend the Fedora and Hydra Camp at Oxford University, September 4-8, 2017. The camp will be hosted by Oxford University in Oxford, UK, and is supported by Jisc.

District Dispatch: Internet Association debunks claims that strong Net Neutrality protections hurt internet investment

Wed, 2017-05-17 20:07

Some opponents of the FCC’s 2015 Open Internet Order claim the order created a regulatory environment that kept Internet Service Providers from investing and building better broadband. Today, the Internet Association (IA)’s Chief Economist responded, finding that ISPs continue to invest and innovate at similar or greater levels in the current regulatory environment, including after Title II reclassification of internet services. IA will release its full research paper on internet service provider (ISP) investment in the coming months. Using multiple sources, IA demonstrates that strong net neutrality protections have NOT harmed investment or innovation in our broadband networks. Some key findings include:

  • SEC filings show a 5.3% or $7.3 billion increase in telecom investment among publicly traded companies from 2013-14 to 2015-16;
  • OECD and U.S. Telecom data show a 5.1% or $4.7 billion increase in telecom investment from 2014 to 2015; and
  • SNL Kagan and NCTA: the Internet and Television Association data show a 56% or $89.9 billion increase in cable investment from 2009 to 2016.

The Internet Association represents many of the largest and most rapidly growing internet companies. Find IA’s Net Neutrality fact sheet here.

Tomorrow, the FCC will vote on a proposed rulemaking that would begin to undo strong net neutrality protections. The ALA has and will continue to advocate for strong, enforceable net neutrality protections. You can watch the FCC’s Open Meeting live here beginning at 10:30 a.m. EDT.

The post Internet Association debunks claims that strong Net Neutrality protections hurt internet investment appeared first on District Dispatch.

Karen Coyle: Two FRBRs, Many Relationships

Wed, 2017-05-17 17:13
There is tension in the library community between those who favor remaining with the MARC21 standard for bibliographic records and those who are promoting a small number of RDF-based solutions. This is the perceived conflict, but in fact both camps are looking at the wrong end of the problem - that is, they are looking at the technology solution without having identified the underlying requirements that a solution must address. I contend that the key element that must be taken into account is the role of FRBR in cataloging and catalogs.

Some background:  FRBR is stated to be a mental model of the bibliographic universe, although it also has inherent in it an adherence to a particular technology: entity-relation analysis for relational database design. This is stated fairly clearly in the  introduction to the FRBR report, which says:

The methodology used in this study is based on an entity analysis technique that is used in the development of conceptual models for relational database systems. Although the study is not intended to serve directly as a basis for the design of bibliographic databases, the technique was chosen as the basis for the methodology because it provides a structured approach to the analysis of data requirements that facilitates the processes of definition and delineation that were set out in the terms of reference for the study. 
The use of an entity-relation model was what led to the now ubiquitous diagrams that show separate entities for works, expressions, manifestations and items. This is often read as a proposed structure for bibliographic data, where a single work description is linked to multiple expression descriptions, each of which in turn link to one or more manifestation descriptions. Other entities like the primary creator link to the appropriate bibliographic entity rather than to a bibliographic description as a whole. In relational database terms, this would create an efficiency in which each work is described only once regardless of the number of expressions or manifestations in the database rather than having information about the work in multiple bibliographic descriptions. This is seen by some as a potential efficiency also for the cataloging workflow as information about a work does not need to be recreated in the description of each manifestation of the work.

Two FRBRs
What this means is that we have (at least) two FRBRs: the mental model of the bibliographic universe, which I'll refer to as FRBR-MM; and the bibliographic data model based on an entity-relation structure, which I'll refer to as FRBR-DM. These are not clearly separated in the FRBR final report, and there is some ambiguity in statements from members of the FRBR working group about whether both models are intended outcomes of the report. Confusion arises in many discussions of FRBR when we do not distinguish which of these functions is being addressed.

FRBR-Mental Model
FRBR-MM is the thinking behind the RDA cataloging rules, and the conceptual entities define the structure of the RDA documentation and workflow. It instructs catalogers to analyze each item they catalog as being an item or manifestation that carries the expression of a creative work. There is no specific data model associated with the RDA rules, which is why it is possible to use the mental model to produce cataloging that is entered into the form provided by the MARC21 record; a structure that approximates the catalog entry described in AACR2.

In FRBR-MM, some entities can be implicit rather than explicit. FRBR-MM does not require that a cataloguer produce a separate and visible work entity. In the RDA cataloging coded in MARC, the primary creator and the subjects are associated with the overall bibliographic description without there being a separate work identity. Even when there is a work title created, the creator and subjects are directly associated with the bibliographic description of the manifestation or item. This doesn't mean that the cataloguer has not thought about the work and the expression in their bibliographic analysis, but the rules do not require those to be called out separately in the description. In the mental model you can view FRBR as providing a checklist of key aspects of the bibliographic description that must be addressed.

The FRBR report defines bibliographic relationships more strongly than previous cataloging rules. For her PhD work, Barbara Tillett (a principal on both the FRBR and RDA work groups) painstakingly viewed thousands of bibliographic records to tease out the types of bibliographic relationships that were noted. Most of these were implicit in free-form cataloguer-supplied notes and in added entries in the catalog records. Previous cataloging rules said little about bibliographic relationships, while RDA, using the work of Tillett which was furthered in the FRBR final report, has five chapters on bibliographic relationships. In the FRBR-MM encoded in MARC21,  these continue to be cataloguer notes ("Adapted from …"), subject headings ("--adaptations"), and added entry fields. These notes and headings are human-readable but do not provide machine-actionable links between bibliographic descriptions. This means that you cannot have a system function that retrieves all of the adaptations of a work, nor are systems likely to provide searches based on relationship type, as these are buried in text. Also, whether relationships are between works or expressions or manifestations is not explicit in the recorded data. In essence, FRBR-MM in MARC21 ignores the separate description of the FRBR-defined Group 1 entities (WEMI), flattening the record into a single bibliographic description that is very similar to that produced with AACR2.

FRBR-Data Model
FRBR-DM adheres to the model of separate identified entities and the relationships between them. These are seen in the diagrams provided in the FRBR report, and in the section on bibliographic relationships from that report. The first thing that needs to be said is that the FRBR report based its model on an analysis that is used for database design. There is no analysis provided for a record design. This is significant because databases and records used for information exchange can have significantly different structures. In a database there could be one work description linked to any number of expressions, but when exchanging information about a single  manifestation presumably the expression and work entities would need to be included. That probably means that if you have more than one manifestation for a work being transmitted, that work information is included for each manifestation, and each bibliographic description is neatly contained in a single package. The FRBR report does not define an actual database design nor a record exchange format, even though the entities and relations in the report could provide a first step in determining those technologies.

FRBR-DM uses the same mental model as FRBR-MM, but adds considerable functionality that comes from the entity-relationship model. FRBR-DM implements the concepts in FRBR in a way that FRBR-MM does not. It defines separate entities for work, expression, manifestation and item, where MARC21 has only a single entity. FRBR-DM also defines relationships that can be created between specific entities. Without actual entities some relationships between the entities may be implicit in the catalog data, but only in a very vague way. A main entry author field in a MARC21 record has no explicit relationship to the work concept inherent in the bibliographic description, but many people's mental model would associate the title and the author as being a kind of statement about the work being described. Added entries may describe related works but they do not link to those works.

The FRBR-DM model was not imposed on the RDA rules, which were intended to be neutral as to the data formats that would carry the bibliographic description. However, RDA was designed to support the FRBR-DM by allowing for individual entity descriptions with their own identifiers and for there to be identified relationships between those entities. FRBR-DM proposes the creation of a work entity that can be shared throughout the bibliographic universe where that work is referenced. The same is true for all of the FRBR entities. Because each entity has an identified existence, it is possible to create relationships between entities; the same relationships that are defined in the FRBR report, and more if desired. FRBR-DM, however, is not supported by the MARC21 model because MARC21 does not have a structure that would permit the creation of separately identified entities for the FRBR entities. FRBR-DM does have an expression as a data model in the RDA Registry. In the registry, RDA is defined as an RDF vocabulary in parallel with the named elements in the RDA rule set, with each element associated with the FRBR entity that defines it in the RDA text. This expression, however, so far has only one experimental system implementation in RIMMF. As far as I know, no libraries are yet using this as a cataloging system.

The replacement proposed by the Library of Congress for the MARC21 record, BIBFRAME, makes use of entities and relations similar to those defined in FRBR, but does not follow FRBR to the letter. The extent to which it was informed by FRBR is unclear but FRBR was in existence when BIBFRAME was developed. Many of the entities defined by FRBR are obvious, however, and would be arrived at by any independent analysis of bibliographic data: persons, corporate bodies, physical descriptions, subjects. How BIBFRAME fits into the FRBR-MM or the FRBR-DM isn't clear to me and I won't attempt to find a place for it in this current analysis. I will say that using an entity-relation model and promoting relationships between those entities is a mainstream approach to data, and would most likely be the model in any modern bibliographic data design.

MARC v RDF? 
The decision we are facing in terms of bibliographic data is often couched as "MARC vs. RDF"; however, that is not the actual question underlying the decision. Instead, the question should be couched as: entities and relations, or not? If you want to share entities like works and persons, and if you want to create actual relationships between bibliographic entities, something other than MARC21 is required. What that "something" is should be an open question, but it will not be a "unit record" like MARC21.

For those who embrace the entity-relation model, the perceived "rush to RDF" is not entirely illogical; RDF is the current technology that supports entity-relation models. RDF is supported by a growing number of open source tools, including database management and indexing. It is a World Wide Web Consortium (W3C) standard, and is quickly becoming a mainstream technology used by communities like banking, medicine, and academic and government data providers. It also has its downsides: there is no obvious support in the current version of RDF for units of data that could be called "records" - RDF only recognizes open graphs; and RDF is bad at retaining the order of data elements, something that bibliographic data often relies upon. These "faults" and others are well known to the W3C groups that continue to develop the standard, and remedies for some are currently under development as additions to the standard.
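
As a purely illustrative sketch of what separately identified entities and explicit relationships look like in RDF (the vocabulary below is made up for the example; it is not the actual RDA Registry or BIBFRAME vocabulary), a tiny graph can be built with the rdflib library:

    # Illustrative sketch only: separately identified work, expression, and
    # manifestation entities linked by explicit, machine-actionable relationships.
    # The namespace and property names are invented; they are not the actual
    # RDA Registry or BIBFRAME vocabularies.
    from rdflib import Graph, Literal, Namespace

    EX = Namespace("http://example.org/vocab/")   # hypothetical vocabulary
    DATA = Namespace("http://example.org/data/")  # hypothetical local identifiers

    g = Graph()
    g.bind("ex", EX)

    work = DATA["work/moby-dick"]
    expression = DATA["expression/moby-dick-english-text"]
    manifestation = DATA["manifestation/moby-dick-1851-harper"]

    g.add((work, EX.title, Literal("Moby-Dick")))
    g.add((work, EX.creator, DATA["person/melville-herman"]))
    g.add((expression, EX.realizes, work))
    g.add((manifestation, EX.embodies, expression))
    # A relationship between works as a link, not a free-text note:
    g.add((DATA["work/moby-dick-adaptation"], EX.adaptationOf, work))

    print(g.serialize(format="turtle"))

Because each entity has its own identifier, a catalog built on such a graph could answer a query like "retrieve all adaptations of this work" directly, something that free-text notes and added entries in a unit record cannot support.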

At the same time, leaping directly to a particular solution is bad form. Data development usually begins with a gathering of use cases and requirements, and technology is developed to meet the gathered requirements. If it is desired to take advantage of some or all of the entity-relation capabilities of FRBR, the decision about the appropriate replacement for MARC21 should be based on a needs analysis. I recall seeing some use cases in the early BIBFRAME work, but I also recall that they seemed inadequate. What needs to be addressed is the extent to which we expect library catalogs to make use of bibliographic relationships, and whether those relationships must be bound to specific entities.

What we could gain by developing use cases would be a shared set of expectations that could be weighed against proposed solutions. Some of the aspects of what catalogers like about MARC may feed into those requirements, as well as what we wish for in the design of the future catalog. Once the set of requirements is reasonably complete, we have a set of criteria against which to measure whether the technology development is meeting the needs of everyone involved with library data.

Conclusion: It's the Relationships
The disruptive aspect of FRBR is not primarily that it creates a multi-level bibliographic model between works, expressions, manifestations, and items. The disruption is in the definition of relationships between and among those entities, which requires those entities to be separately identified. Even the desire to separately share work and expression descriptions can most likely be met by identifying the pertinent data elements within a unit record. But the bibliographic relationships defined in FRBR and RDA, if they are to be actionable, require a new data structure.

The relationships are included in RDA but are not implemented in RDA in MARC21, basically because they cannot be implemented in a "unit record" data format. The key question is whether those relationships (or others) are intended to be included in future library catalogs. If they are, then a data format other than MARC21 must be developed. That data format may or may not implement FRBR-defined bibliographic relationships; FRBR was a first attempt to redefine a long-standing bibliographic model and should be considered the first, not the last, word in bibliographic relationships.

If we couch the question in terms of bibliographic relationships, not warring data formats, we begin to have a way to go beyond emotional attachments and do a reasoned analysis of our needs.

Open Knowledge Foundation: Hospital Waiting List – Open Knowledge Ireland Workshop #1

Wed, 2017-05-17 13:30

This blog post is part of our on-going Network series featuring updates from chapters across the Open Knowledge Network and was written by the Open Knowledge Ireland team.

This post was first published on 12th April 2017 by Flora Fleischer on OK Ireland’s website: https://openknowledge.ie/hwl1/

On the sunny Saturday of March 25th, Open Knowledge Ireland held a workshop powered by citizens which focused on discovering how Open Data can help the ever present Hospital Waiting List problem. With the workshop, we created a space to build engagement around open data and hospital waiting lists and offered participants a practical way to get involved. The workshop was possible because, in December 2016, the National Treatment Purchase Fund (NTPF) published Hospital Waiting List Data on data.gov.ie as machine readable data for the first time. Hospital Waiting List data can now be found here, here, and here.

Hospital Waiting List Workshop #1 focused on identifying and discovering the patient journey, the data that is available, an operating model for use case creation using open data, and a long list of potential use cases for prioritisation at Hospital Waiting List Citizen Workshop #2.

The workshop benefited from having experienced professionals from a range of new and disruptive industries and areas of expertise. On the day, OK Ireland facilitated Data Analysts, Customer Experience SMEs, Technology Solution Consultants, Digital Natives, Students, and Coders. We also provided Open Data insights from Ireland and abroad and framed the topic for the day: ways of using open data to address the growing Hospital Waiting Lists in Ireland.

Here is an account from Piush Vaish, a citizen participant at the 1st Hospital Waiting List workshop, of how the day went. The post first appeared on his LinkedIn page.

Ways to Improve Hospital Waiting List Using Open Data

Ireland has one of the worst hospital waiting lists of any developed country. We have all experienced, or know someone who has experienced, the uncertainty of how long the wait will be before seeing a specialist. We constantly wonder about our health while we wait, which affects not only our physical but also our mental health. For instance, I had to wait overnight to be seen by a specialist at Beaumont hospital.

Therefore, when an opportunity came to tackle the problem of hospital waiting list using data, I had to do something. That chance came through a workshop/hackathon organised by Open Knowledge Ireland on 25th March 2017. It was the first in a series of hospital waiting list focused workshops held at Guinness Enterprise Center. Open Knowledge Ireland is a part of Open Knowledge International with the goal of opening all essential public interest information. It is a non-profit organisation dedicated to promoting open data and open content in all forms to create insights that drive change and benefit the public at large.

When I arrived at the venue, there was a short session where we got to know the other participants over a cup of tea and biscuits. The participants came from different backgrounds, with various skill sets and industry experience. Some of them were UX designers, web/application developers, statisticians, past participants, and data scientists. However, we all had one reason to be at the workshop.

The motivation was to tackle a very real social problem as a group of experts and for our citizens by using public data about hospital waiting lists to make that information easily accessible for everybody.

Afterwards, we moved into a specially set-up meeting room to learn about the work of Open Knowledge Ireland, what open data is, and the reasons why we should be interested in the hospital waiting list data.

Open Knowledge Ireland explained their mission, vision, and values. The hospital waiting list datasets are produced by the NTPF. Since July 2012, the NTPF has been responsible for the publication of outpatient and inpatient waiting lists. However, they originally published this data in PDF format, which is not considered an ‘open’ data format and limits the usability of the data.

Hence, Open Knowledge Ireland has worked over the last two years to create examples of how the Out-Patient Waiting List and Inpatient/Day Case Waiting List can be published in an easily accessible format. They also worked together with the NTPF and the Department of Public Expenditure and Reform to get this data published in machine-readable format. In December 2016, hospital waiting list data was for the first time made available in machine-readable format on data.gov.ie. This now enables anyone to download the datasets and do any sort of analysis on them.

The format of the workshop was an unconference, or open space conference. It was my first time attending such a conference. We were given a problem statement, but we were free to tackle it in any way the group thought most useful for understanding the problem better. The agenda was driven by the participants, their expertise in technology, user experience design, and analytics, and their backgrounds in various industries.

No narrow topics were pre-determined, no keynote speakers were invited, and no panel had been arranged – so the workshop was very interactive and driven by the participants themselves. The topics to be discussed were refined through the participation of the attendees into problem statements that could be tackled and looked at in one day. If a session did not inspire an attendee, or the attendee was not contributing, then they were free to get up and find a different group. This enabled everyone to play to their strengths, do research, and contribute to understanding the problem statement based on their own experience.

We convened at the individual breakout sessions to discuss the progress of each working group and share learnings between the groups. In my opinion, this process helped to apply ideas and empowered participants to share their abilities. It offered an opportunity for an unfiltered exchange of creative ideas.

My first working group was mapping the patient's journey, from first getting a symptom to being diagnosed by a specialist. The aim was to document the end-to-end experience of patients from their perspective, understand how patients interact with their general practitioner or hospital, find pain points, identify areas for improvement, and improve the experience moving forward.

mapping a patient’s journey: from getting a symptom to being diagnosed by a specialist

The visualisation inspired us to seek value-driven decisions based on a patient’s experience model of performance.

There was another group who mapped a patient’s journey from A&E, how this journey is currently tracked and how the data is collated by one specific hospital. This was to understand the pain points that hospitals may come across when gathering and providing the data.

Later, we swapped our findings to create a complete picture of the patient’s journey.

I then swapped from the journey mapping group to another group that was working on data validation. It was essential for the long-term success of the project that the data is open, correct and useful.

We ensured that the data gathered by NTPF was using data/statistical standards. While I was engaging with different groups, the other participants were engaged in data analysis, creating an API and researching the problem in other countries. The figure below shows an early view of the type of insights that can be generated using the hospital waiting list data that is available on data.gov.ie today.

We also had a short video presentation by Bob Harper from Detail Data who created the Hospital Waiting List Dashboards that are available for Northern Ireland. He explained how he is using the data provided by NHS on his website to present information in a way that is more easily accessible to and understandable by the public in Northern Ireland.

At the end of the day, we all presented our findings to the group and decided what we’ll focus on during the next workshop.

Some of the points we aim to discuss in the next workshop are:

  • Understand existing Hospital Wait Time data publicly available in the Republic of Ireland
  • Understand and highlight data gaps
  • Recommend additional data points required to build tools useful to citizens (suggest via data.gov.ie)
  • Identify quick-win use cases and begin prototyping
  • Identify more complex use cases and next steps

If you are inspired by what we have achieved and interested to continue the journey to empower the public please register your interest by attending the next workshop: Hospital Waiting List Citizen Workshop #2.

Contact: flora.fleischer@openknowledge.ie

In the Library, With the Lead Pipe: Preservation in Practice: A Survey of New York City Digital Humanities Researchers

Wed, 2017-05-17 13:25

In Brief

Digital Humanities (DH) describes the emerging practice of interpreting humanities content through computing methods to enhance data gathering, analysis, and visualization. Due to factors including scale, complexity, and uniqueness, the products of DH research present unique challenges in the area of preservation. This study collected data with a survey and targeted interviews given to New York City metro area DH researchers intended to sketch a picture of the methods and philosophies that govern the preservation efforts of these researchers and their institutions. Due to their familiarity with evolving preservation principles and practices, librarians are poised to offer expertise in supporting the preservation efforts of digital humanists. The data and interviews described in this report help explore some of the current practices in this area of preservation, and suggest inroads for librarians as preservation experts.

By Malina Thiede (with significant contributions from Allison Piazza, Hannah Silverman, and Nik Dragovic)

Introduction

If you want a definition of Digital Humanities (DH), there are hundreds to choose from. In fact, Jason Heppler’s whatisdigitalhumanities.com alone offers 817 rotating definitions of the digital humanities, pulled from participants in the Day of DH between 2009 and 2014. A few of these definitions are listed below:

Digital Humanities is the application of computer technology to make intellectual inquiries in the humanities that either could not be made using traditional methods or are made significantly faster and easier with computer technology. It can include both using digital tools to make these inquiries or developing these tools for others to use. –Matthew Zimmerman

DH is the study, exploration, and preservation of, as well as education about human cultures, events, languages, people, and material production in the past and present in a digital environment through the creation and use of dynamic tools to visualize and analyze data, share and annotate primary sources, discuss and publish findings, collaborate on research and teaching, for scholars, students, and the general public. –Ashley Sanders

For the purposes of this article, digital humanities will be defined as an emerging, cross-disciplinary field in academic research that combines traditional humanities content with technology focused methods of display and interpretation. Most DH projects are collaborative in nature with researchers from a variety of disciplines working together to bring these complex works to fruition. DH projects can range from fairly traditional research papers enhanced with computing techniques, such as text mining, to large scale digital archives of content that include specialized software and functionality.

Due to the range of complexity in this field and the challenges of maintaining certain types of digital content, long-term preservation of DH projects has become a major concern of scholars, institutions, and libraries in recent years. While in the sciences, large scale collaborative projects are the norm and can expect to be well funded, DH projects are comparatively lacking in established channels for financial and institutional support over the long term, which can add another layer of difficulty for researchers. As librarians at academic institutions take on responsibility for preserving digital materials, they certainly have a role in ensuring that these DH projects are maintained and not lost.

For the purposes of this paper, a digital humanities project will be broadly defined as cross-disciplinary collaboration that manifests itself online (i.e. via a website) as both scholarly research and pedagogical resource using digital method(s). Methods can include, but are not limited to, digital mapping, data mining, text analysis, visualization, network analysis, and modeling.

Literature Review

The Library of Congress’s (n.d.) catchall definition of digital preservation is “the active management of digital content over time to ensure ongoing access.” Hedstrom (1998) offers a more specific definition of digital preservation as “the planning, resource allocation, and application of preservation methods and technologies necessary to ensure that digital information of continuing value remains accessible and usable.”
Digital preservation is a complex undertaking under the most favorable conditions, requiring administrative support, funding, personnel, and often specialized software and technology expertise.

Kretzschmar and Potter (2010) note that digital preservation, and, in particular, digital humanities preservation, faces a “stand-still-and-die problem” because it is necessary to “continually…change media and operating environments just to keep our information alive and accessible.” This is true of preserving most digital objects, but the complex, multi-faceted nature of many DH projects adds additional layers of complexity to the already challenging digital preservation process. Zorich (2008) lists other components of the “digital ecosystem” that must be preserved in addition to the actual content itself: “software functionality, data structures, access guidelines, metadata, and other…components to the resource.”

Kretzschmar and Potter (2010) lay out three seemingly simple questions about preserving digital projects whose answers are often difficult to pin down: “How will we deal with changing media and operating environments? Who will pay for it? And who will do the work?” When working with DH projects, ‘what exactly are we preserving?’ may also be an important question because, as Smith (2004) notes, “there are…nagging issues about persistence that scholars and researchers need to resolve, such as…deciding which iteration of a dynamic and changing resource should be captured and curated for preservation.” In 2009, Digital Humanities Quarterly published a cluster of articles dedicated to the question of “doneness” in DH projects. Kirschenbaum (2009) notes in the introduction to the cluster that “digital humanities…[is] used to deriving considerable rhetorical mileage and the occasional moral high-ground by contrasting [its] radical flexibility and mutability with the glacial nature of scholarly communication in the fixed and frozen world of print-based publication.” Unlike some digital assets that undergo preservation, DH projects and the components thereof are often in a state of flux and, indeed, may never truly be finished. This feature of DH projects makes their preservation a moving target. Kretzschmar (2009) detailed the preservation process for the Linguistic Atlas Project, a large-scale DH project that spanned decades, explaining “we need to make new editions all the time, since our idea of how to make the best edition changes as trends in scholarship change, especially now in the digital age when new technical possibilities keep emerging.” Another example of a DH project that has undergone and continues to undergo significant revisions is described in Profile #5 below.

In addition to the particular technological challenges of preserving often iterative and ever-evolving DH projects, there are structural and administrative difficulties in supporting their preservation as well. Maron and Pickle (2014) identified preservation as a particular risk factor for DH projects, with faculty naming a wide range of entities on campus as being responsible for supporting their projects’ preservation needs, which suggested “that what preservation entails may not be clear.” Bryson, Posner, St. Pierre, and Varner (2011) also note that “The general lack of policies, protocols, and procedures has resulted in a slow and, at times, frustrating experience for both library staff and scholars.” Established workflows and procedures are still not easily found in the field of DH preservation, leading scholars, librarians, and other support staff to attempt to reinvent the wheel with each new project. Other difficult-to-avoid problems noted across the literature are those of staff attrition and siloing.

Although rife with challenges, the preservation of DH projects is far from a lost cause, and libraries have a crucial role to play in ensuring that, to some degree, projects are successfully maintained. The data and interviews summarized in this paper reveal how some of these projects are being preserved as well as their particular difficulties. There are certainly opportunities for librarians to step in and offer their preservation expertise to help scholars formulate and achieve their preservation goals.

Methodology

The methodology for this project was influenced by time frame and logistics. Initially the project was slated to be completed within five months, but the deadline was later extended to nine months. Because it would have been difficult to interview multiple individuals across New York City within the original time frame, we decided on a two-phase approach, similar to Zorich’s (2008) methodology, in which an information-gathering phase was followed by interviews. This approach involved (1) conducting an online survey of NYC faculty members engaged in digital humanities, and (2) performing in-person or phone interviews with those who agreed to additional questioning. The survey provided a broad, big-picture overview of the practices of our target group, and the interviews supplemented that data with anecdotes about specific projects and their preservation challenges. The interviews also provided more detailed insight into the thoughts of some DH scholars about the preservation of their projects and about digital preservation in general.

The subjects of our survey and interviews were self-selected faculty members and PhD candidates engaged in digital humanities research and affiliated with an academic institution within the New York City area. This population of academics was specifically targeted to reach members of the DH community who had access to an institutional library and its resources. We limited our scope to the New York City area for geographic convenience.

We targeted survey respondents using the NYC Digital Humanities website as a starting point. As of October 2015, when the selection process for this project was underway, there were 383 members listed in the NYC Digital Humanities online directory. An initial message was sent to the NYCDH listserv on June 3, 2015, and individual emails were sent to a subset of members on June 15, 2015. We approached additional potential survey respondents whom we knew fit our criteria via email and Twitter.

Figure 1: NYC Digital Humanities Logo

Survey

The survey tool was a 34-item online Qualtrics questionnaire asking multiple choice and short answer questions about the researchers’ work and their preservation strategies and efforts to date. The survey questions were developed around 5 specific areas: background information about the projects and their settings, tools used, staff/management of preservation efforts, future goals, and a query about their availability for follow up interviews. As all DH projects are unique, respondents were asked to answer the questions as they pertain to one particular project for which they were the Principal Investigator (PI).

Interviews

Interviewees were located for the second phase of the research by asking survey respondents to indicate if they were willing to participate in a more in-depth interview about their work. Interested parties were contacted to set up in-person or conference call interviews. The interviews were less formal and standardized than the survey, allowing for interviewees to elaborate on the particular issues related to the preservation of their projects. Each interview was recorded but not fully transcribed. Team members reviewed the recordings and took detailed notes for the purpose of comparing and analyzing the results.

Limitations

Although the scope of this project was limited to a particular geographic area with a large population base, the sample size of the survey respondents was fairly small. The institutions of all but three respondents are classified as moderate to high research activity institutions according to the Carnegie Classifications. These types of institutions are by no means the only ones involved in DH work, but the high concentration of respondents from research institutions may indicate that there is greater support for DH projects at these types of institutions. As a result, this paper does not provide much discussion of DH preservation practices at smaller baccalaureate or masters institutions with a stronger emphasis on undergraduate education.

A Note about Confidentiality

Individuals who participated in the online survey were asked to provide their names and contact information so we could follow-up with them if they chose to participate in the interview. Individuals who took part in the interviews were guaranteed confidentiality to encourage open discussion. All findings are reported here anonymously.

Survey Results

The survey was live from June 3, 2015 to July 10, 2015. In total, 18 respondents completed the survey.

Demographics of the Faculty Engaged in Digital Humanities

Our survey respondents represented 10 New York City academic institutions, with the most responses coming from Columbia University. Department affiliations and professional titles are listed below (figure 2).

Institutional Affiliation (# of respondents)
Columbia University: 5
CUNY Graduate Center: 3
New York University: 2
Bard Graduate Center: 1
Hofstra University: 1
Jozef Pilsudski Institute of America: 1
New York City College of Technology: 1
Queensborough Community College: 1
St. John’s University: 1
The New School: 1

Department Affiliation (# of respondents)
Library/Digital Scholarship Lab: 7
English: 4
History: 3
Art History: 2
Linguistics: 1
Unreported: 1

Academic Title (# of respondents)
Professor: 4
Assistant Professor: 3
Associate Professor: 2
Adjunct/Lecturer: 2
Digital Scholarship Coordinator or Specialist: 2
PhD Candidate: 2
Director: 2
Chief Librarian: 1

Figure 2: Survey respondent demographics (n=18)

We asked respondents where they received funding for their projects (figure 3). Responses were split, with some respondents utilizing two funding sources.

Funding Source (% of respondents)
Institutional funding: 28%
Grant funding: 22%
Personal funds: 17%
Institutional and grant funding: 17%
No funding: 11%
Institutional and personal funds: 6%

Figure 3: Funding Source

DH Project Characteristics

As previously mentioned, respondents were asked to choose one digital humanities project about which to answer the survey questions. Questions were asked to determine the number of people collaborating on the project and the techniques and software used. The majority of respondents (78%) were working collaboratively with one or more colleagues (figure 4).

Number of Collaborators (% of respondents)
2-3 collaborators: 33%
6+ collaborators: 33%
0 collaborators: 22%
4-5 collaborators: 11%

Figure 4: Collaborators involved in DH project (n=18)

The techniques utilized are listed in figure 5, with 61% of projects utilizing more than one of these techniques.

Technique (% of projects)
Data Visualizations: 39%
Other*: 32%
Data Mining and Text Analysis: 28%
Geospatial Information Systems (GIS): 22%
Network Analysis: 17%
Text Encoding: 11%
3-D Modeling: 6%

*maps, interactive digital museum exhibition, audio (2), software code analysis, data analysis tools, OHMS (Oral History Metadata Synchronizer)

Figure 5: Techniques used in DH project (n=18)

The techniques mentioned above are created with software or code, which can be proprietary, open-source, or custom. Respondents utilized a mix of these software types: 33% of respondents reported using proprietary software in their projects, 89% reported using open-source software, and 33% used custom software. A list of software examples can be found in figure 6.

Proprietary Software
Adobe Photoshop (2)
Adobe Dreamweaver
Adobe Lightroom
Google Maps
TextLab
SketchUp
Weebly

Open-Source Software
WordPress (6)
Omeka (3)
Python (2)
MySQL (2)
Timeline.js (2)
QGIS (2)
DSpace

Figure 6: Software utilized by respondents

Knowledge of Preservation

33% of respondents reported that they had formal training in digital preservation, which the authors defined as academic coursework or continuing education credit. Informally, respondents have consulted numerous resources to inform the preservation of their projects (figure 7).

Source (% of respondents)
Published scholarly research: 72%
Colleagues or informal community resources: 66%
Digital Humanities Center, library/librarian, archivist: 50%
Grey literature: 44%
Conferences: 33%
Professional or scholarly association sponsored events: 22%
Campus workshops or events: 11%
None: 6%

Figure 7: Sources consulted to inform Preservation

Project Preservation Considerations

Preservation of their DH project was considered by the majority (72%) of respondents. When asked who first mentioned preservation of their project, 93% of those who had considered preservation said either they or one of their collaborators brought up the issue. In only one instance did a librarian first suggest preservation, and there were no first mentions by either a funder or a host department.

The majority of initial preservation discussions (53%) took place during the project, with 39% taking place before the project began, and 8% after project completion.

When asked to consider how many years into the future they see their project being usable and accessible, the majority (56%) said 5+ years, followed by 3-4 years (22%), and 17% were unsure. One respondent noted they were not interested in preservation of the project.

Preservation Strategy

Version control, migration, metadata creation, emulation, durable persistent media, and bit stream preservation are just a few strategies for preserving digital materials. We asked respondents to rate each strategy by importance (figure 8).

Figure 8: Preservation strategies by importance

All respondents reported that they back up their work in some capacity. Most respondents (78%) are using cloud services. Half report the use of institutional servers, and 44% use home computers. GitHub was mentioned by two respondents as a safe storage solution for their projects. The majority of respondents (66%) use more than one method of backing up their work.

Interview Findings

Through follow-up interviews with five respondents, we delved into several of these projects in greater detail. Interviewees gave us more information about their projects and about their partnerships, processes, and policies for preserving the work.

Profile #1: DH Coordinator

Interview conducted and summarized by Nik Dragovic

Respondent 1 was a coordinator in a Digital Humanities Center at their institution and had undertaken the work in collaboration with librarian colleagues because the library works closely with researchers on DH projects at this particular institution.

This initiative was unique in that no preservation measures were being undertaken, a strategy that resulted from discussion during the conception of the project. The resulting life expectancy for the project, comprising a geography-focused, map-intensive historical resource incorporating additional digital content, was three to four years. The de-emphasis of preservation stemmed from a shared impression that the complexity of preservation planning acts as a barrier to initiating a project. Given their intention to produce a library-produced exemplar work rather than a traditional faculty portfolio piece, the initiative was well suited to this approach. The technical infrastructure of the project included a PHP stack used to dynamically render the contents of a MySQL database. The general strategy incorporated elements of custom software and open source technologies including Neatline and Omeka.

The unique perspective of the respondent as an institutional DH liaison as well as a practitioner made the interview more amenable to a general discussion of the issues facing a broad set of digital humanists and their interaction with library services. The overriding sentiment of the respondent echoed, to a large extent, existing literature’s assertion that DH preservation is nascent and widely variable.

Specifically, the interviewee opined that no one framework, process, or solution exists for those seeking to preserve DH outputs, and that every project must have its own unique elements taken into account. This requires an individual consultation with any project stakeholder concerned with the persistence of their work. A primary element of such conversations is expectation management. In the respondent’s experience, many practitioners have the intention of preserving a fully functional interface in perpetuity. In most cases, the time, cost, and effort required to undertake such preservation measures is untenable.

The variegated and transformative code stack environments currently underpinning DH projects are a leading obstacle to permanent maintenance of the original environment of a DH project. As a result, the respondent advocated for a “minimal computing” approach to preservation, in which more stable formats such as HTML are used to render project elements in a static format, predicated on a data store instead of a database, with languages like JavaScript as a method for coordinating the front-end presentation. This technique allows not only for a simpler and more stable preservation format, but also enables storage on GitHub or Apache servers, which are generally within institutional resources.
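
The following is a minimal sketch, in Python, of the “minimal computing” idea described above: project items are kept in a flat JSON data store and rendered once into static HTML pages that could then sit on GitHub Pages or a plain Apache server. The file names and record fields are illustrative assumptions, not details of the respondent’s actual project.

import json
import html
from pathlib import Path

def build_static_site(data_store="items.json", out_dir="site"):
    # Read every project item from a flat JSON file (a list of dicts with
    # "id", "title", and optional "description" keys -- assumed structure).
    items = json.loads(Path(data_store).read_text(encoding="utf-8"))
    out = Path(out_dir)
    out.mkdir(exist_ok=True)
    links = []
    for item in items:
        page = (
            "<html><body>"
            f"<h1>{html.escape(item['title'])}</h1>"
            f"<p>{html.escape(item.get('description', ''))}</p>"
            "</body></html>"
        )
        # One static page per item; no database is needed to serve these.
        (out / f"{item['id']}.html").write_text(page, encoding="utf-8")
        links.append(f"<li><a href='{item['id']}.html'>{html.escape(item['title'])}</a></li>")
    index = "<html><body><h1>Project items</h1><ul>" + "".join(links) + "</ul></body></html>"
    (out / "index.html").write_text(index, encoding="utf-8")

if __name__ == "__main__":
    build_static_site()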

Another preservation solution the respondent explained was the dismantling of a DH project into media components. Instead of migrating the system into a static representation, one leverages an institutional repository to store elements such as text, images, sound, video, and data tables separately. The resulting elements would then require a manifest to be created, perhaps in the format of a TAR file, to explain the technology stack and how the elements can be reassembled. An Internet Archive snapshot is also a wise element to help depict the user interface and further contextualize the assets.
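
A minimal sketch of that dismantling approach, assuming a simple directory of project files: each component is listed in a JSON manifest with a checksum and a note about the original technology stack, and the whole set is bundled into a TAR file for repository deposit. The function and file names here are hypothetical, not part of any standard workflow.

import hashlib
import json
import tarfile
from pathlib import Path

def bundle_components(project_dir, stack_note, out_tar="project_bundle.tar"):
    project = Path(project_dir)
    files = [p for p in project.rglob("*") if p.is_file()]
    manifest = {
        # Free-text note about the original environment, e.g. "PHP / MySQL / Omeka" (assumed).
        "technology_stack": stack_note,
        "components": [
            {
                "path": str(p.relative_to(project)),
                "sha256": hashlib.sha256(p.read_bytes()).hexdigest(),
            }
            for p in files
        ],
    }
    # The manifest travels with the components so the project can be reassembled later.
    (project / "manifest.json").write_text(json.dumps(manifest, indent=2), encoding="utf-8")
    with tarfile.open(out_tar, "w") as tar:
        tar.add(project, arcname=project.name)
    return out_tar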

In the experience of the respondent, helping digital humanists understand strategic and scaled approaches to preservation is one of the greatest challenges of acting as a library services liaison. Students and faculty have an astute understanding of the techniques underpinning the basic functionality of their work, but not of the landscape of current preservation methodologies. Not only is the learning curve steep for these more library-oriented topics, but the ambitions of the library and the practitioner often diverge. Whereas the scholar’s ambition is often to generate and maintain a body of their own work, the library focuses more on standardization and interoperability. This creates a potential point of contention between library staff and those they attempt to counsel. Often the liaison must exercise sensitivity in their approach to users, who themselves are experts in their field of inquiry.

The broader picture also includes emerging funding considerations for national grants. When asked about the intentions of the National Endowment for the Humanities to incorporate preservation and reusability into funding requirements, the respondent expressed skepticism of the agency’s conceptualization of preservation, stating that a reconsideration and reworking of the term’s definition was in order.

To apply too exhaustive a standard would encourage a reductive focus on the resource-intensive preservation methods that the respondent generally avoids. Like most facets of the DH preservation question, this warrants further inquiry from practical and administrative standpoints. In a general sense, realistic expectations and practical measures ruled the overall logic of the respondent, as opposed to adherence to any given emerging standard presently available.

Profile #2: Library Director

The impetus behind respondent 2’s project was not to advance scholarship in a particular subject, so the preservation strategy and goals differed from projects that had a more explicitly scholarly purpose. The idea was hatched by a team of librarians as a means to help librarians learn and develop new skills in working with digital research with the ultimate goal of enhancing their ability to collaborate and consult with researchers on their projects. The learning and training focus of this project informed the team’s preservation strategy.

A number of tools were used to plan, document, and build out this project, and some levels of the production were designed to be preserved, while others were intended to be built out but then left alone rather than migrated as updates became available. The process was documented on a WordPress blog, and the ultimate product was built on Omeka. The team did preservation and versioning of code on GitHub, but they do not intend to update the code, even if that means the website will ultimately become unusable.

What was very important to this team was to preserve the “intellectual work” and the research that went into the project. To accomplish that, they decided to use software, such as Microsoft Word and Excel, that creates easy to preserve files, and they are looking into ways to bundle the research files together and upload them to the institution’s repository. Respondent 2 expressed that an early problem they had with the technology team was that they “wanted everything to be as well thought out as our bigger digital library projects, and we said that DH is a space for learning, and sometimes I could imagine faculty projects where we don’t keep them going. We don’t keep them alive. We don’t have to preserve them because what was important was what happened in the process of working out things.”

This team encountered some challenges working with Omeka. At one point they had not updated their version of Omeka and ended up losing quite a bit of work, which was frustrating. “We need to be thinking about preservation all along the way” to guard against these kinds of losses of data. Working with the IT department also posed challenges because “technology teams are about security and about control” and are not always flexible enough to support the evolving technology needs of a DH project. The project had to be developed on an outside server and moved to the institutional server, where the code could not be changed.

Profile #3: Art Professor

Respondent 3’s institution has set up a DH center with an institutional commitment to preserving the materials for its projects in perpetuity. The center relies on an institutional server and has a broad policy of downloading files in order to maintain them indefinitely on the back end. Front-end production of the project was outsourced to another institution, and the preservation of that element of the project had not been considered at the time of the interview.

This researcher’s main challenge was that although many of the artworks featured in the project are quite old and not subject to copyright, certain materials (namely photographs of 3D objects) are copyrighted and can only be licensed for a period of 10 years. The front-end developer expressed that 10 years was a long time in the lifetime of a website (which would make that limitation of little concern), but being able to license items for only a decade at a time clashes with the institutional policy of maintaining materials indefinitely on the server and raises questions about who will be responsible for this content over the long term if the original PI were to move on or retire.

Profile #4: Archivist

Interview conducted and summarized by Hannah Silverman

Respondent 4, who has developed a comprehensive set of open source tools for the purpose of archiving documents and resources related to a specific historical era, sees their work within the sphere of Digital Humanities. The sense that their archival work was essentially related to the Digital Humanities came about over a period of time as their technical needs required them to connect with a larger set of people, first with the librarians and archives community through the Metropolitan New York Library Council (METRO), then as a DH activity introduced at a METRO event. “I myself am writing a [DH] blog which originally was a blog by archivists and librarians…So, the way I met people who are doing similar things is at METRO. We are essentially doing DH because we are on the cross of digital technologies and archives. It is just a label, we never knew we were doing DH, but it is exactly that.”

The respondent goes on to describe the value of developing tools that can read across the archive, allowing researchers to experience a more contextual feel for a person described within the material – adding dimensionality and a vividness to the memory of that person:

What I am struggling with is essentially one major way of presenting the data and that is the library way. The libraries see everything as an object, a book is an object, and everything else is as an object. So they see objects. And if you look at the NY Public Library…you can search and you can find the objects which can be a page of an archive but it is very difficult to see the whole archive, the whole collection; it’s not working this way. If you search for an object you will find something that is much in the object but it is not conducive to see the context and the archives are the context, so what I am trying to see if we can expand this context space presentation. We spent very little money on this project product which we use to display the data. There is a software designer…who built it for us, but if we could get more funding I would work on [creating] a better view for visualizing the data. Several projects [like this] are waiting in line for funding here…We collect records, records are not people. Records are just names. We would like to put the records in such a way that all the people are listed and then give the information about this person who was in this list because he was doing something, and in this list because he was doing something else, and in this document because he traveled from here to here and so on. That would be another way of sort of putting all the soldiers and all the people involved in these three (volunteer) uprisings for which we have complete records of in part of the archive. We have complete records of all the people in such a way that you could follow a story of a person and also maybe his comrades in arms. It may be the unit in which he worked, and so on.

The respondent has addressed preservation with multiple arrays of hard drives that are configured with redundancy schemes and daily scrubbing programs for replacing any corrupted digital bits. Copies stored on tape are also routinely managed in multiple offsite locations, and quality assurance checks occur via both analog and digital processes.
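
For readers unfamiliar with “scrubbing,” the following is a minimal sketch of that kind of routine fixity check: current file checksums are compared against a previously recorded manifest, and anything missing or altered is flagged. The directory and manifest names are illustrative assumptions, not the respondent’s actual setup.

import hashlib
import json
from pathlib import Path

def check_fixity(archive_dir, manifest_file="checksums.json"):
    # The manifest maps relative file paths to previously recorded SHA-256 values.
    recorded = json.loads(Path(manifest_file).read_text(encoding="utf-8"))
    problems = []
    for rel_path, expected in recorded.items():
        path = Path(archive_dir) / rel_path
        if not path.exists():
            problems.append((rel_path, "missing"))
        elif hashlib.sha256(path.read_bytes()).hexdigest() != expected:
            problems.append((rel_path, "checksum mismatch"))
    return problems

# Run on a schedule (e.g. daily from cron) and alert on any non-empty result.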

Profile #5: English and Digital Humanities Professor

Interview conducted by Hannah Silverman and summarized by Malina Thiede.

The project discussed in this interview began as a printed text for which an interactive, online platform was later created. The online platform includes data visualizations from user feedback (such as highlights) and a crowdsourced index, as no index was included in the original print text. The code for the project is preserved and shared on GitHub, which the interviewee sees as a good thing. The visualizations of the data are not being preserved, but the data itself is. There is an intent to create and preserve new visualizations, but the preservation plan was not set at the time of the interview.

The initial project was conceived and executed in a partnership between an academic institution and a university press on a very short timeline (one year from call for submissions to a printed volume) with very rigid deadlines. Due to the rapid and inflexible timeline, preservation was not considered from the outset of the project, but a data curation specialist was brought in between the launch of the site and the first round of revisions to review the site and give advice on issues of preservation and sustainability. The institution supporting the project has strong support for digital initiatives; however, an informal report from the data curation specialist tasked with reviewing the project indicated that “precarity in the institutional support for the project could result in its sudden disappearance.”

The interviewee stated that “we are less focused on preservation than we should be” because “we’re looking towards the next iteration. Our focus has been less on preserving and curating and sustaining what we have” than on expanding the project in new directions. At the time of the interview, this project was entering a new phase in which the online platform was going to be adapted into a digital publishing platform that would support regular publications. The interviewee indicated several times that more of a focus on preservation would be ideal but that the digital elements of this project are experimental and iterative. The priority for this project is moving ahead with the next iteration rather than using resources to preserve current iterations.

Analysis & Conclusion

Through this survey of NYC librarians, scholars, and faculty, our aim was to capture a sample of the work being done in the digital humanities, paying close attention to this population’s preservation concerns, beliefs, and practices. Based on this research, we offer the following observations regarding DH content creators and preservation:

1. Preservation is important to the researchers working on these projects, but it is often not their main focus.
2. Scholars working on DH projects are looking for advice and support for their projects (including their project’s preservation).
3. Librarians and archivists are already embedded in teams working on DH projects.

Preservation Challenges

We noticed through textual responses and follow-up interviews that preservation rarely came up in the earliest stages of the project – sometimes due to tight deadlines, and other times simply because preservation is not generally in the conversation during the onset of a project. Researchers are typically not accustomed to thinking about how their work will be preserved. The workflows for traditional published research leave preservation in the hands of the consumer of the research, which is often the library. However, DH and other digital projects often have less clearly defined workflows and audiences, making it less obvious who should be responsible for preservation and when the preservation process should begin. Our data indicates that most planning about preservation occurs sometime during the course of the project or after its completion, rather than at the beginning. Best practices for digital projects state that preservation should be a consideration as close to the beginning of the project as possible, but researchers may not be aware of that until they have done significant work on a project.

It is also noteworthy that just over half of our survey respondents set a goal of preserving their work for five or more years, and significant percentages (22% and 17%, respectively) set goals of three to four years or were unsure of how long they wanted their work to be preserved. This indicates that not all projects are intended to be preserved for the long term, but that does not mean that preservation planning and methods should be disregarded for such projects.

As these projects go forward, respondents who do want their projects to be available long term grapple with the difficulties that surround preservation of digital content and the added time commitment it demands.

The following survey respondent illustrates this potential for complexity:

Unlike many digital humanities projects this project exists/existed in textual book format, online, and in an exhibition space simultaneously. All utilize different aspects of digital technologies and are ideally experienced together. This poses much more complicated preservation problems since preserving a book is different from preserving an exhibition which is different from preserving an online portion of a project. What is most difficult to preserve is the unified experience (something I am well aware of being a theatre scholar who has studied similar issues of ephemerality and vestigial artifacts) and is something that we have not considered seriously up to this point. However, because books have an established preservation history, the exhibition was designed to tour and last longer than its initial five-month run, and the online component will remain available to accompany the tour and hopefully even beyond, the duration of the project as a whole has yet to be truly determined and I am sure that considerations of preservation and version migration will come up in the near future for both the physical materials and the digital instantiations of the project. It promises to provide some interesting conundrums as well as fascinating revelations.

And another survey respondent:

I feel like I should unpack the perpetuity question. Our project is text (and) images (and) data visualizations on a website. The text (and) images I’d hope would be accessible for a long time, the data (visualization) relies on specific WordPress plugins/map applications and may not be accessible for a long time. Since we’re self-administering everything we will take things forward with updates as long as we can, but…

Roles for Librarians and Archivists

As one librarian interviewee explained, preservation is a process that needs to be considered as a project is developed and built out, not a final step to be taken after a project is completed. Hedstrom noted as far back as 1998 that preservation is often only considered at a project’s conclusion or after a “sensational loss,” and this remains a common problem nearly 20 years later. Therefore, librarians and archivists should try to provide preservation support starting at the inception of a project. Considering preservation at an early stage can inform the process of selecting tools and platforms; prevent data loss as the project progresses; and help to clarify the ultimate goals and products of a project.

Nowviskie (2015) posed the question: “is [digital humanities] about preservation, conservation, and recovery—or about understanding ephemerality and embracing change?” Humanists have to grapple with this question as it regards their own work, but librarians and archivists can provide support and pragmatic advice to practitioners as they navigate these decisions. Sometimes this may mean that information professionals have to resist their natural urge to advocate for maximal preservation and instead to focus on a level of preservation that will be sustainable using the resources at hand. Librarians and archivists would do well to consider this advice from Nowviskie (2015):

We need to acknowledge the imperatives of graceful degradation, so we run fewer geriatric teen-aged projects that have blithely denied their own mortality and failed to plan for altered or diminished futures. But alongside that, and particularly in libraries, we require a more robust discourse around ephemerality—in part, to license the experimental works we absolutely want and need, which never mean to live long, get serious, or grow up.

Profiles #1 and #2 exemplified the ‘graceful degradation’ approach to DH preservation by building websites that were intended to be ephemeral, with the idea that the content created for the sites could be packaged in stable formats and deposited in an institutional repository for permanent preservation. The project discussed in profile #5, while not explicitly designed as an ephemeral project, has a fast moving, future focused orientation, such that any one particular iteration of the project may not exist indefinitely, or even for very long. Of course, an ephemeral final product may not be an acceptable outcome in some cases, but advice from librarians can inform the decision making process about what exactly will be preserved from any project and how to achieve the level of preservation desired.

Due to variations in the scale and aims of individual DH projects and the resources available in different libraries, it would be virtually impossible to dictate a single procedure that librarians should follow in order to provide preservation support for DH projects, but based on our data and interviews, librarians who want to support preservation of DH research can take the following steps:

1. Keep up with existing, new, or potential DH research projects on campus. Depending on the type of institution, those projects may be anything from large scale projects like the Linguistic Atlas mentioned above to undergraduate student work.

2. Offer to meet with people doing DH on campus to talk about their projects. Begin a discussion of preservation at an early stage even if long term preservation is not a goal of the researchers. Establishing good preservation practices early can help to prevent painful data losses like the one mentioned in profile #2 as the project progresses.

3. Work with the researchers to develop preservation plans for their projects that will help them meet their goals and that will be attainable given the resources available at your institution/library; a minimal sketch of what such a plan might look like as a simple structured record follows this list.

– In developing a plan, some of the questions from our survey (see Appendix I) may be helpful, particularly questions about the nature of the project and the intended timeline for preservation.

– Also keep in mind what resources are available at your library or institution. Kretzschmar and Potter (2010) took advantage of a large, extant media archive at their library to support preservation of the Linguistic Atlas. The interviewees in profiles #1 and #2 also mentioned the institutional repository (IR) as a possible asset in preserving some of the components of their work. (While useful for providing access, IRs are not a comprehensive preservation solution, especially at institutions that use a hosting service.)

– Coordinate with other librarians/staff that may have expertise to help with preservation such as technology or intellectual property experts. As discussed in profile #3, copyright can pose some challenges for DH projects, especially those that include images. Many libraries have staff members that are knowledgeable about copyright who could help find solutions to copyright related problems.

– For doing preservation work with limited resources, The Library of Congress Digital Preservation site has a lot of information about file formats and digitization. Another good, frequently updated source from the Library of Congress is the digital preservation blog The Signal. Although created in 2013 and not updated, the POWRR Tool Grid could be a useful resource for learning about digital preservation software and tools.
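
To make step 3 above concrete, here is a minimal sketch, assuming a simple JSON record, of the kind of lightweight preservation plan a librarian and researcher might draft together. The fields and values are illustrative assumptions rather than a formal standard or a template used by any of the projects profiled here.

import json
from pathlib import Path

preservation_plan = {
    "project": "Example DH mapping project",   # hypothetical project name
    "preservation_goal_years": 5,               # matches the 5+ year horizon most respondents chose
    "components": [
        {"item": "source data (CSV)", "strategy": "deposit in institutional repository"},
        {"item": "website code", "strategy": "version control on GitHub with tagged releases"},
        {"item": "user interface", "strategy": "Internet Archive snapshot for context"},
    ],
    "responsible_parties": ["principal investigator", "library"],
    "review_date": "to be agreed at the first consultation",
}

# Write the plan somewhere both the researcher and the library can find and revise it.
Path("preservation_plan.json").write_text(json.dumps(preservation_plan, indent=2), encoding="utf-8")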

Conclusion

DH projects are well on their way to becoming commonplace at all types of institutions and among scholars at all levels, from undergraduates to full professors. The data and interviews presented here provide a snapshot of how some digital humanists are preserving their work and of their attitudes toward preservation of DH projects in general. They show that there are opportunities for librarians to help define the preservation goals of DH projects and to work with researchers on developing preservation plans to ensure that those goals are met, whether the goal is long-term preservation or allowing a project to fade over time.

Acknowledgements

Although this article is published under a single author’s name, the survey and interviews were created and conducted by a team of four that also included Allison Piazza, Nik Dragovic, and Hannah Silverman. Allison, Nik, Hannah, and I all worked together to write and conduct the survey, analyze the results, and present our findings in an ALA poster session and to the Metropolitan New York Library Council (METRO). Writing and conducting the interviews was likewise a group effort, and all of them contributed to writing our initial report, although it was never fully completed. The contributions of these team members were so substantial that they should really be listed as authors of this paper alongside me, but they declined when I offered.

This project was initially sponsored by the Metropolitan New York Library Council (METRO). Tom Nielsen was instrumental in shepherding this project through its early phases.

Special thanks also to the Pratt Institute School of Information for funding the poster of our initial results that was displayed at the 2015 ALA Annual Conference.

Additional thanks to Chris Alen Sula, Jennifer Vinopal, and Monica McCormick for their advice and guidance during the early stages of this research.

Finally, thanks to publishing editor Ian Beilin, and to reviewers Ryan Randall and Miriam Neptune. Their suggestions were immensely helpful in bringing this paper into its final form.

References

Bryson, T., Posner, M., St. Pierre, A., & Varner, S. (2011, November). SPEC Kit 326:
Digital Humanities. Retrieved from
http://www.arl.org/storage/documents/publications/spec-326-web.pdf

Carnegie Classifications | Basic Classification. (n.d.). Retrieved from http://carnegieclassifications.iu.edu/classification_descriptions/basic.php

Hedstrom, M. (1997). Digital preservation: a time bomb for digital libraries. Computers
and the Humanities, 31(3), 189–202.

Kirschenbaum, M. G. (2009). Done: Finishing Projects in the Digital Humanities, Digital Humanities Quarterly, 3(2). Retrieved from http://www.digitalhumanities.org/dhq/vol/3/2/000037/000037.html

Kretzschmar, W. A. (2009). Large-Scale Humanities Computing Projects: Snakes Eating Tails, or Every End is a New Beginning? Digital Humanities Quarterly, 3(2). Retrieved from http://www.digitalhumanities.org/dhq/vol/3/2/000038/000038.html

Kretzschmar, W. A., & Potter, W. G. (2010). Library collaboration with large digital
humanities projects. Literary & Linguistic Computing, 25(4), 439–445.

Library of Congress. (n.d.). About – Digital Preservation. Retrieved from
http://www.digitalpreservation.gov/about/

Maron, N. L., & Pickle, S. (2014, June 18). Sustaining the Digital Humanities: host
institution support beyond the start-up phase. Retrieved from
http://www.sr.ithaka.org/publications/sustaining-the-digital-humanities/

Nowviskie, B. (2015). Digital Humanities in the Anthropocene. Digital Scholarship in the
Humanities, 30(suppl_1), i4–i15. https://doi.org/10.1093/llc/fqv015

Smith, A. (2004). Preservation. In S. Schreibman, R. Siemens, & J. Unsworth (Eds.). A
Companion to Digital Humanities. Oxford: Blackwell. Retrieved from
http://www.digitalhumanities.org/companion/view?docId=blackwell/9781405103213/9781405103213.xml&chunk.id=ss1-5-7&toc.depth=1&toc.id=ss1-5-7&brand=default

Walters, T., & Skinner, K. (2011, March). New roles for new times: digital curation for
preservation. Retrieved from
http://www.arl.org/storage/documents/publications/nrnt_digital_curation17mar11.pdf

What is digital humanities? (2015, January). Retrieved from
http://whatisdigitalhumanities.com/

Zorich, D. M. (2008, November). A survey of digital humanities centers in the US. Retrieved from http://f-origin.hypotheses.org/wp-content/blogs.dir/1834/files/2013/08/zorich_2008_asurveyofdigitalhumanitiescentersintheus2.pdf

Appendix: Survey

Preservation in Practice: A Survey of NYC Academics Engaged in Digital Humanities

Thanks for clicking on our survey link! We are a group of four information professionals affiliated with the Metropolitan New York Library Council (METRO) researching the digital preservation of DH projects. Contextual information is available at the myMETRO Researchers page. Our target group is New York City digital humanists working in academia (such as professors or PhD candidates) who have completed or done a significant amount of work on a DH project. If you meet these criteria, we’d appreciate your input. The survey will take less than 15 minutes. The information we gather from this survey will be presented at a METRO meeting, displayed on a poster at the annual conference of the American Library Association, and possibly included as part of a research paper. Published data and results will be de-identified unless prior approval is granted. Please note that your participation is completely voluntary. You are free to skip any question or stop at any time.

You can reach the survey administrators with any questions or comments:
Nik Dragovic, New York University, nikdragovic@gmail.com
Allison Piazza, Weill Cornell Medical College, allisonpiazza.nyc@gmail.com
Hannah Silverman, JDC Archives, hannahwillbe@gmail.com
Malina Thiede, Teachers College, Columbia University, malina.thiede@gmail.com

Is your project affiliated with a New York City-area institution or being conducted in the New York City area?
Yes
No

Title or working title of your DH project:

Does your project have an online component?
Yes (Please provide link, if available):
To be determined
No

What techniques or content types have you used or will you use in your project? Select all that apply.
Data visualizations
Data mining and text analysis
Text encoding
Network analysis
GIS (Geospatial Information Systems)
3-D modeling
Timelines

What date did you begin work on this project (MM/YY)

Approximately how many people are working on this project?
2-3
4-5
6+
I am working on this project alone

Has preservation been discussed in relation to this project?
Yes
No

Who first mentioned the preservation of your project?
Self
Librarian
DH center staff
Project member
Funder
Host department
Other:

At what stage in the project was preservation first discussed?
Before the project began
During the project
After project completion

Who is/will be responsible for preserving this project? Select up to two that best apply.
Self (PI)
Library
Host department
Another team member
Institution
Person or host to be determined
Campus IT
Another institution

How important are each of these processes to your overall preservation strategy for this project?
Bit-stream preservation or replication (making backup copies of your work)
Durable persistent media (storing data on tapes, discs, or another physical medium)
Emulation (using software and hardware to replicate an environment in which a program from a previous generation of hardware or software can run)
Metadata creation
Migration (to copy or convert data from one form to another)
Version control

Are there any other preservation strategies essential to your work that are not listed in the above question? If so, please list them here.

Do you have defined member roles/responsibilities for your project?
Yes
No
Not applicable, I am working on this project alone.

What is your main contribution to this project team? Select all that apply.
Technical ability
Subject expertise
Project management skills

Is there a specific member of your team that is responsible for preservation of the technical infrastructure and/or display of results?
Yes
No

Is there a DH center at your institution?
Yes
No

How often have you consulted with the DH center for your project?
Never
Once
A few times
Many times
DH center staff member is a collaborator on this project
My institution does not have a DH center

How is this project funded? Select all that apply
Institutional funding
Grant funding
Personal funds

Were you required to create a preservation plan for a funding application?
Yes
No

What kinds of resources have you consulted to inform the preservation of your project? Select all that apply.
Published scholarly research (such as books or journal articles)
Guides, reports, white papers and other grey literature
Professional or scholarly association sponsored events or resources (such as webinars)
Conferences
Campus workshops or events
Colleagues or informal community resources
None
DH Center, Library/librarian, archivist

Have you had any training in digital preservation?
Yes
No

How many years into the future do you see your project being usable/accessible?
1-2 years
3-4 years
5+ years
Not sure

Is your resource hosted at your own institution?
Yes
No

If no, where is it hosted?

How are you backing up your work? Select all that apply.
Cloud service
Institutional server
Home computer
DAM tools
Not currently backing up work
Other

Which of the following types of software have you used to create your project? Select all that apply.
Proprietary software (Please list examples)
Open-source software (Please list examples)
Custom software

If you would like to add any perspectives not captured by the previous questions, or clarify your answers, please use the comment box below:

Your full name

Email address

Institutional affiliation

Primary department affiliation

Academic title

If applicable, when did/will you complete your PhD?

Would you be willing to be the subject of an approximately 45-minute interview with a member of our team to talk more in-depth about your project and preservation concerns?

Islandora: iCampEU Instructors Announced

Wed, 2017-05-17 13:17

Islandora Camp is going to Delft, Netherlands from June 13 - 15. This will be our only stop in Europe in 2017, and we'll be holding our traditional three day camp, with two days of sessions bookending a day of hands-on training from experienced Islandora instructors. We are very pleased to announce that the instructors for our training workshop will be:

Rosie Le Faive started with Islandora in 2012 while creating a trilingual digital library for the Commission for Environmental Cooperation. With experience and - dare she say - wisdom gained from creating highly customized sites, she's now interested in improving the core Islandora code so that everyone can use it. Her interests are in mapping relationships between objects, and intuitive UI design. She is the Digital Infrastructure and Discovery librarian at UPEI. This is her third Islandora Camp as an instructor.

Diego Pino is an experienced Islandora developer and an official Committer. This is his second camp as an instructor and he has been helping folks learn how to get the most out of Islandora on our community listserv since he joined up. Diego started with Islandora in the context of handling biodiversity data for REUNA Chile and has transitioned over to develop and support the many Islandora sites of the Metropolitan New York Library Council.

Melissa Anez has been working with Islandora since 2012 and has been the Community and Project Manager of the Islandora Foundation since it was founded in 2013. She has been a frequent instructor in the Admin Track and developed much of the curriculum, refining it with each new Camp.

Frits van Latum worked at TU Delft Library on several subjects until his retirement. He has been working with Islandora at both the admin and developer level since 2014. He built a considerable part of http://colonialarchitecture.eu/ and also worked on http://repository.tudelft.nl/. At the moment he is a freelance consultant and co-organiser of iCamp Europe 2017.

Open Knowledge Foundation: Our learning from the Open Data Day mini grants scheme

Wed, 2017-05-17 10:01

2017 was the third year of OKI’s Open Data Day mini-grants scheme. Although we have been working on it for a while, we never had the time or capacity to write up our learnings from the last two schemes. This year, we decided to take more time to learn about the project and improve it, so we looked at the data and are sharing our learnings so that the Open Data Day community can use them in the future.

This year, we used some of our Hewlett grant to team up with groups all over the world who were organising Open Data Day events. We were also lucky to find more funding thanks to Hivos, Article 19, the Foreign and Commonwealth Office and SPARC. Each partner organisation had their own criteria for giving the mini-grants. This blog post refers only to the OKI scheme – Open Data for Environment and Open Data for Human Rights. We do include some figures about the other grants, but we cannot speak to their rationale for how the money was distributed.

How did we decide on the themes for the scheme?

In past years, we awarded the mini-grants without any clear geographical or thematic criteria. We simply selected events that looked interesting to us or that we thought could spark discussion around open data in places where it was not yet happening. We also gave priority to our network members as recipients.

This year, we decided to be more systematic and to test some assumptions. We set up a staff-wide call to discuss the scheme and how it would be built. We decided that Open Data Day is a great opportunity to see how data can be used, and we wanted to limit the scheme to specific topics so we could see this use. Themes like education and health were suggested, but we decided to focus on the environment and human rights – two fields where we had seen some use of open data, but not a lot of examples. We tried to gather everything we knew in a doc, which then became a staff-wide collaborative document.

We also set other criteria in the meeting. We wanted to see small, tangible events rather than big ideas that cannot be implemented in one day. We also wanted to see actual use, or the promotion of use, rather than a general presentation of open data.

After speaking to David Eaves, Open Data Day spiritual father, we decided to add also a Newbie fund, to support events in places where open data is a new thing.

See all of the details that we gathered here.

 

What themes did people apply to?

 

(Note that FCO joined the grant after the submissions phase closed, and therefore there is no dedicated track for their grant)  

 

Who applied for the grant?

In the 2.5 weeks that applications were open, we got 204 applications, the majority from the Global South. Just to compare, in the 2016 scheme we got 61 applications, the majority of them from the Global North. This means that this year we had more than three times as many applications to deal with.

View Open Data Day 2017 Mini-Grant Applications in a full screen map

As you can see in the map (made by our talented developer advocate Serah Rono), more than half of the applications (104 if we want to be precise) came from the African continent. Our staff members Serah Rono, David Opoku and Stephen Abbott Pugh have good networks in Africa and promoted the call through them. We believe that their active outreach, along with promotion by other individuals who champion open data in Africa, is the reason for the increase in applications from there.

In both of our tracks, human rights and environment, around 25% of the applications we got were from groups who didn’t work with open data or didn’t suggest an activity on the theme: 15 in the human rights track and 13 in the environment track.

 

How did we choose who will get the grant?

Four of our staff members – Serah, David, Oscar and Mor – gave a score to each application:

-1 – the application did not meet the criteria
0 – the submission met the criteria but did not offer anything unique
1 – the submission met the criteria and offered a new perspective on data use on the topic.

We tried to minimise bias by having a committee that was diverse in gender and location. We decided not to take into consideration where an application was coming from geographically, or the gender of the applicant.

In our final list, when we had two applications from the same country, we tried to give the money only to one group.

 

What should we have paid attention to?

Gender. Our friends from SPARC checked that they distributed the grant equitably between men and women. We tried to do the same.

We then decided to investigate the gender of the applicants further. Since we didn’t ask for the applicant’s gender in the application form, we determined gender from applicants’ names and validated our guesses through a Google search. Out of 202 applications, 140 were made by men, and only one was a joint application from applicants of different genders. (See visualisation).

We don’t know why more men than women applied for the grants, and it would be good to hear whether other organisations have had the same experience. If so, it is important to understand why women are not applying for these opportunities.

 

Who received the grant?

Unlike previous years, this year we took the time to reply to all the applicants about their status as fast as we could. However, we realised that answering back takes longer than we expected.

Also, we published all the winners in a blog post before Open Data Day and tried to keep the process as transparent as we could. See our announcement blog post here. However, during the last couple of months, some groups could not organise their events and asked us to give the money to someone else. These groups were from Costa Rica, Morocco, Uganda, Zimbabwe and Brazil. We therefore decided to give the grant to another group, Open Knowledge Philippines, for their annual Open Data Day event.

Newbie category

Since some of the groups that applied had no experience with open data, we wanted to give the grant to two of them so we could build capacity and see how open data could become part of their work. However, since we announced the winners only a week before Open Data Day, we didn’t have enough time to work with them to make their events meaningful. We are currently looking at how we can cooperate with them in the future.

 

What were the outcomes?

All of the learnings from the grant recipients are on our blog, where you can see different types of data use and the challenges that the community faces in getting quality data to work with. Some of our recipients have started to inquire about the OK network and how to participate and create more events. We would like to hear more from you about how to improve the next Open Data Day by writing to the Open Data Day mailing list.

DuraSpace News: Fedora 4 in Production at Penn State ScholarSphere

Wed, 2017-05-17 00:00

Deploying Fedora 4, or Migrating from Fedora 3 to Fedora 4 is a challenge with built-in rewards. This series of articles, “Fedora 4 in Production” looks into why and how community members are working with Fedora 4 to enhance both collections and workflow at their institutions.

In this article Dan Coughlin, IT Manager, Digital Scholarship and Repository Development, Penn State Libraries, describes Fedora 4 in production at Penn State ScholarSphere.

Cynthia Ng: Write the Docs Day 2: Afternoon Talks

Tue, 2017-05-16 22:58
Notes from the afternoon talks. Lightning Talks Top 10 Myths: True or False? A lot of things to learn. Top 10 myths: Nobody reads documentation. Docs like code. Just let the developer do it. You only need Microsoft Word. FALSE Docs maintain themselves. TRUE Technical writing is easy. You don’t need documentation because products are … Continue reading Write the Docs Day 2: Afternoon Talks
