Feed aggregator

District Dispatch: Top 10 things to know (and do) about saving library funding

planet code4lib - Thu, 2017-03-30 16:54

The talk of Washington and the library community (when people aren’t talking about the President’s tweets, anyway) is the recent recommendation by the President to completely eliminate funding for the Institute of Museum and Library Services (IMLS), including their library funding implementing the Library Services and Technology Act (LSTA) and the Innovative Approaches to Literacy program within the Department of Education. Here are the top ten things that you need to know about the fight to save IMLS and more than $210 million in annual federal library funding, a fight that will be going on all year.

10. Exactly how much money are we talking about?

LSTA received $183.4 million in FY 2016 while IAL received $27 million. These funding levels are essentially the same for FY 2017 as the year before because Congress failed to enact nearly all of the twelve individual appropriations bills that fund specific parts of the federal government and is keeping the government’s doors open under a series of temporary authorizations called Continuing Resolutions, or “CRs” in Beltway-speak. Under the terms of a CR, programs are funded at the previous year’s levels (though this year the CR includes a de minimis across-the-board cut of less than 0.5%). If Congress returns from its upcoming April recess on April 24 and figures out how to pass 11 of the 12 unfinished FY 2017 appropriations bills in less than a week, funding levels for FY 2018 could change. However, that narrow window for Congressional action makes another CR running through the end of the current fiscal year (September 30, 2017) vastly more likely.
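For readers who want to check the arithmetic behind the "$210 million" figure, here is a minimal Python sketch. The dollar amounts and the 0.5% ceiling on the CR cut come from the text above; the variable and function names are illustrative only, and the exact size of the cut is not stated, so the result is a lower bound rather than an actual appropriation.

```python
# Figures from the text, in millions of US dollars.
LSTA_FY2016 = 183.4
IAL_FY2016 = 27.0

# The CR cut is described only as "less than 0.5%", so treat 0.5% as an upper bound.
MAX_CR_CUT = 0.005


def combined_funding(lsta: float, ial: float) -> float:
    """Total federal library funding across the two programs."""
    return lsta + ial


def floor_after_cut(amount: float, max_cut: float) -> float:
    """Smallest amount left if the full (upper-bound) cut were applied."""
    return amount * (1 - max_cut)


total = combined_funding(LSTA_FY2016, IAL_FY2016)
floor = floor_after_cut(total, MAX_CR_CUT)

print(f"Combined FY 2016 funding: ${total:.1f} million")
print(f"FY 2017 level under the CR: more than ${floor:.1f} million")
```

Because the cut is below 0.5%, the FY 2017 level stays above the computed floor, which is why the two years are described as essentially the same.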

9. Why this doesn’t matter:

Congress almost always treats the President’s budget submission, which has no force of law, as “DOA.” As noted above, to actually fund the government, Congress is supposed to pass 12 separate appropriations bills. While that process can be messy and protracted, the key thing is that it’s controlled by the Appropriations Committees of the House and Senate which are free to ignore the President (and often do). Many Members of Congress and their staffs tell us that they are not giving serious consideration to the President’s “skinny budget.” So what’s all the fuss about?!?

8. Why this DOES matter:

The Republican party is now in control of both chambers of Congress and the Administration. As you may have seen during last week’s debacle surrounding efforts to repeal the Affordable Care Act, Republicans (at least in the House) are hardly a unified party. However, there are a significant number of conservatives in Congress who would love to deconstruct much of the Federal government and are looking for programs, such as LSTA and IAL, to “zero out.” While libraries have strong allies on Capitol Hill, the President’s request provides ‘red meat’ to others and likely signals that this battle will continue beyond the current fight for FY 2018 funding for the duration of the Administration.

7. What is LSTA and the Grants to States program?

Of the $183.4 million last appropriated by Congress for LSTA, $155 million is dedicated to the Grants to States program. By law, every state gets a portion of that sum per a population-based formula (and subject to a 34% state matching requirement). Each state librarian or library agency then determines the best use of those funds in their state. In some states, the funds go for the development and maintenance of statewide lending, shared databases, or other statewide tools. In others, funding is provided for specific programs to, for example: assist veterans transitioning to civilian life, help small businesses to develop an on-line presence, underwrite summer reading programs, foster programs for families with disabled children or even help a small library replace part of their collection lost in a flood. Note: because states are required to match 34% of the Federal funding commitment, a cut on the Federal level necessarily means a cut in state library funding too.

6. What is the Innovative Approaches to Literacy program and why are we talking about it separately from IMLS and LSTA?

IAL grants, which are awarded by the Department of Education rather than IMLS, are the only dedicated source of Federal funding for school libraries and are not part of what was approved by Congress under the LSTA. School libraries can use IAL grant funds received to buy books and other materials in support of early literacy efforts directed to the nation’s neediest children. Of the $27 million appropriated for the program overall, at least half is reserved exclusively for school libraries while the remainder goes to non-profits (which may partner with libraries). Individual schools or school districts apply every two years for funding to the Department of Education. Awards typically go to schools for book distribution, literacy-technology tools and literacy training for teachers and families.

5. What is the ALA Washington Office doing to save IMLS and library funding?

The Washington Office is working early, late and every minute in between to ensure that no stone is left unturned and to keep IMLS, LSTA, IAL and other library funds from being eliminated in FY 2018. As we do every year, the Office of Government Relations’ staff regularly lobbies directly for library funding to Congressional offices individually and/or with multiple coalitions. (Coalitions allow ALA to share information with peers in Washington and amplify our message.)

This year, however, we are working even more closely with ALA leadership and the Association’s state Chapters to “raise the heat” on every Member of Congress to tangibly – not just broadly or rhetorically – support libraries. Specifically, at OGR’s request, ALA President Julie Todaro has conducted hour-long “call to arms” conference calls with all ALA Division Presidents, Executive Directors, caucus leaders, the Executive Board and all 51 Chapter Presidents to brief them on the current threats to library funding and how they can help mobilize library supporters to save IMLS and library funding.

At President Todaro’s request, we have even built a new one-stop-shopping webpage called “Fight for Libraries!” where library advocates can quickly contact Congress, sign up for alerts, share their own stories of what LSTA and IAL have meant in their libraries and communities, learn how to write a quick letter to the Editor of a local paper and access a wealth of other advocacy resources.

We are also coordinating closely, as always, with the Chief Officers of State Library Agencies and have – with special and invaluable help from Cengage Learning, Inc. – launched a new pro-library business coalition called “Libraries Mean Business” to help make libraries’ profound value to society and the economy clear to all Representatives and Senators.

IMPORTANT: All of these efforts are focused first on getting as many Members of the House of Representatives – and soon thereafter as many Senators as possible – to sign two incredibly important letters, one supporting funds for LSTA and the other IAL, that will be delivered to their colleagues on the House and Senate Appropriations Committees. The number of Members of Congress who sign these two “Dear Appropriator” letters can spell the difference between LSTA and IAL being eliminated or left unscathed by the budget ax!

4. Why are the Dear Appropriator letters so important?

Every year, ALA makes a strong push in March/April to ALA members to urge Representatives and Senators to sign the letters in support of LSTA and IAL (two letters in each chamber). The more signatures these letters gain—and the more bi-partisan— the stronger the message to the Appropriations Committees that these programs have wide support. A powerful Senator or Representative will ask the staff to find a program to cut to benefit their favored program. Staff has been known—off the record of course—to look at how many signatures in support various programs have garnered. Cutting a program with 15 signatures means fewer upset members than axing a program backed by 75 or 100 Members of Congress. It’s that simple.

3. Are the letters more important than ever this year?

YES! YES! YES! Need we say more? OK, we will. The President has sent a message to Congress that LSTA and IAL are not important, but we and our members know just how vital these funds and programs are. With so many programs on the chopping block, supporters of programs of all kinds are trying to ward off cuts with as many signatures as they can on their own letters. We have to win that competition. The letters for LSTA and IAL have always received solid support, but in the face of especially stiff headwinds, we need to double the number of LSTA and IAL Dear Appropriator signers that we most recently received.

2. So, specifically what can ALA members and other library lovers do to save LSTA and IAL?

Whether you can spare 5 minutes or 5 hours to help save $210+ million, here’s how you can get involved…today:

1. The #1 thing you need to know about the fight to save LSTA and IAL funding?

It will not succeed without you – and the friends, colleagues, neighbors, patrons, customers, relatives and complete strangers that you actively recruit – getting involved TODAY and staying involved all through 2017. Without that sustained commitment on a scale that we have never achieved before, we will not succeed, and the cost will be much greater than money to the hundreds of millions of people and businesses who depend on libraries.

Don’t let them down. Act now!

The post Top 10 things to know (and do) about saving library funding appeared first on District Dispatch.

LITA: Call for Writers

planet code4lib - Thu, 2017-03-30 15:54

The LITA Blog wants your content. Are you an avid technologist? An early career librarian? Are you a coder, an explorer, or just a dabbler? All of these perspectives are welcome.

We are big on creativity and embracing new post formats, and 2016 saw the first LITA Vlog series, Begin Transmission. Whether you want to write a traditional blog post, create a podcast, host a Twitter chat, or stream a post with Periscope, we are eager to see what you make. Possible post formats could include interviews, how-tos, hacks, and beyond. Your posts can have an immediate impact, and you’ll find an audience of readers in the thousands. 

We embrace diverse formats and diverse voices. Library students and members of underrepresented groups are particularly encouraged to apply.

Writers contribute one post per month. A buddy system means you’ll be paired with an experienced blogger to help you draft and review your posts. You’ll join an active, supportive, and talented cadre of writers on whom you can rely for feedback. The average time commitment is between one and two hours a month.

If that sounds like too much of a time commitment, our guest contributor option may be a good option for you.

To apply, send an email to lindsay dot cronk at gmail dot com by Friday, November 11th. Please include the following information:

  • A brief bio
  • Your professional interests, including 2-3 example topics you would be interested in writing about
  • If possible, links to writing samples, professional or personal, to get a feel for your writing style

Send any and all questions my way!

Lindsay Cronk, LITA blog editor


Open Knowledge Foundation: Open Data Day Sudan 2017: Openness for Advancing Research and Discovery

planet code4lib - Thu, 2017-03-30 13:00

This blog is part of the event report series on International Open Data Day 2017. On Saturday 4 March, groups from around the world organised over 300 events to celebrate, promote and spread the use of open data. 44 events received additional support through the Open Knowledge International mini-grants scheme, funded by SPARC, the Open Contracting Program of Hivos, Article 19, Hewlett Foundation and the UK Foreign & Commonwealth Office. This event was supported through the mini-grants scheme under the Open Research theme.

Open Sudan’s Open Data Day event was a conference that took place in Khartoum on the 11th of March. The event was held on this date, as requested by many local institutions, to ensure inclusiveness and broader representation.

The conference gathered stakeholder groups and representatives from academia to debate and study the future of research practice in Sudan, with a strong emphasis on the use of open data. The event also introduced the concept of open research data as a tool to improve research communication methods in Sudan.

We had discussions on and assessed the local research behaviour, the environment and available infrastructure to determine the need for the implementation of such policies. The conference provided a significant space for networking and collaboration with the objective of stimulating the progress of Sudanese research data output leading to the acceleration of discoveries.

In addition to our goals in convening an event that sought to foster knowledge development and intellectual exchange, we worked towards creating an inclusive, innovative, and empowering event by inviting 85 representatives from different academic institutions, research centres, government agencies, research groups and civil society organisations.

We also invited a number of talented young researchers who not only have fresh views but also have profound knowledge, experience and vision in the area of scientific research.

The conference held a one-hour exhibition with some of the most influential groups and organisations in the area of research promotion. This featured some exceptional student groups as well.

Exhibitors were able to present their projects in advancing research practice locally with their use of open data in research. They could introduce their related future projects and form collaborations with one another.

One of the speakers – Prof. Ahmed M Musa from the Institute of Endemic Diseases – spoke about his project that openly publishes Leishmania-related research data from Sudan. He showed how beneficial this practice would be for the advancement of Leishmania-related research.

One major outcome of this talk was shedding light on the importance of open data in medical research in particular and scientific research in general.

A large number of young career researchers who attended the conference are now committed to opening their research and data after excellent talks from Dr. Iman Abuel Maaly and Dr. Rania Baleela. Some have also expressed their interest in advocating for openness among their peers. Conversations around potential future collaborations between different early career researchers’ groups took place throughout the conference.

Open Knowledge Foundation: Gamification of Open Data to Empower School Learners in Pretoria

planet code4lib - Thu, 2017-03-30 10:00

This blog is part of the event report series on International Open Data Day 2017. On Saturday 4 March, groups from around the world organised over 300 events to celebrate, promote and spread the use of open data. 44 events received additional support through the Open Knowledge International mini-grants scheme, funded by SPARC, the Open Contracting Program of Hivos, Article 19, Hewlett Foundation and the UK Foreign & Commonwealth Office. This event was supported through the mini-grants scheme under the Open contracting and tracking public money flows theme.

The Centre for Geoinformation Science (CGIS) at the University of Pretoria in South Africa hosted an event on 4 March 2017 for school learners to celebrate International Open Data Day.

The aim was to introduce school learners to South African open data on public spending through gamification. This provided the learners with an interactive way of engaging with the data and exploring national budgets and spending.

The day began with a brief introduction to open data and its benefits to our local community and South Africa as a whole. Gamification was used to encourage active participation during the event, by providing the learners with questions relating to public funds and spending. For example, participants could exchange points for hints to assist in answering the questions. For the quiz, 15 questions based on various provided datasets were developed. Datasets were acquired from the South African National Treasury and the Code4SA open data portal. Below are some examples of questions asked:

  1. Which University had the highest income in 2014?
  2. How much was the Chief Justice paid in the year 2014?
  3. What was the average salary per month before tax of a member of Parliament in 2014?

The open source TCexam computer-based assessment system was used for the quiz, allowing a user-friendly system.


Working with computers was a highlight for the learners, as they not only learned about open data but also had the opportunity to display their computer literacy skills. CGIS students assisted learners during the day with any software issues and provided additional guidance during the quiz.

The day ended with prizes for the top 5 groups. The prizes included scientific calculators, notebooks and other stationery. The general feedback from the learners was very positive and they were generally excited about open data and its many possibilities in helping deal with societal, political and economic issues.

We would like to thank the sponsors of this event, Open Knowledge International, Hivos and Article 19. Without their support, this event would not have been possible.


District Dispatch: She is our rock star

planet code4lib - Thu, 2017-03-30 03:24

Dr. Carla Hayden speaking to more than 3,000 librarians at the 2017 Association of College and Research Libraries (ACRL) conference in Baltimore.

The grand finale to the informative (and well-attended) 2017 Association of College and Research Libraries (ACRL) conference was the closing keynote address by Dr. Carla Hayden. Dr. Hayden arrived 45 minutes early to individually greet librarians queued in the aisles with smart phones in hand for a photo with the Librarian of Congress. For the more than 3,000 librarians in attendance, Hayden’s comments about her plans to make the Library of Congress “America’s Library,” to greatly increase access to the treasures of the Library and to open the Library on Sundays (gasp!) struck a chord. One excited librarian said: “She is our rock star.” Yes, she is.

Already a cultural hero for keeping the Enoch Pratt Free Library branch open to the public during the Freddie Gray riots in Baltimore, for standing up to U.S. Attorney General Ashcroft and the surveillance provisions of the USA Patriot Act and for mentoring scores of librarians throughout her career, Dr. Hayden’s humility shone through when she talked about her decision to serve as the Librarian of Congress. It was not an easy decision. She wondered if she could effect change in the lives of people every day, like she felt she could as a librarian. She wanted to maintain her public spirit in this new role and not be just a figurehead who raised funds and attended posh dinners. She wanted to serve.

Hayden accepted her appointment with the goal to increase access to the Library’s collections across the nation through digitization and by making the Library more public facing. You still won’t be able to borrow books, but you will soon see some of the largest collection of comic books at Comic Con because the Library will be there. Improvements to the National Library Service for the Blind and Handicapped will greatly increase access for people with disabilities. The Library will be a service hub to the Digital Public Library of America. And yes, the Library will open for storytelling.

Dr. Hayden encouraged the audience to revel in the public’s perception of librarians as trustworthy in the era of fake news. She repeated a conversation she had with a library patron who said that at present, “librarians are having a moment.” Whether you call it cyber hygiene or information literacy, librarians are having a moment to make the world a better place.

During the Q-and-A session, a retired librarian who attended the first ACRL conference in 1978 and many thereafter said Dr. Hayden was the liveliest speaker that ACRL has ever had and that her presence was inspirational and invigorating. It was that kind of closing session—smiles, laughter, a couple of standing ovations and a group of librarians dancing on the stage as the audience left the auditorium.

The post She is our rock star appeared first on District Dispatch.

District Dispatch: The next move for net neutrality

planet code4lib - Thu, 2017-03-30 02:27

Threats to net neutrality continue to be a serious concern for libraries, concurrent with the extraordinary pressures around the Institute of Museum and Library Services, created by the President’s Budget Blueprint. (For more on the FY 2018 appropriations process, please go to Fight for Libraries!, a new advocacy portal.) Today, however, the American Library Association (ALA) and the Association of College & Research Libraries (ACRL) made our next move to defend the hard-won Open Internet Order adopted by the Federal Communications Commission (FCC) in 2015.

In a letter to FCC Chairman Ajit Pai, Commissioners Mignon Clyburn and Michael O’Rielly, ALA and ACRL – joined by eight other higher education and library organizations – articulated net neutrality principles that should form the basis of any review of the FCC’s 2015 Open Internet Order. The letter was also hand-delivered by ALA’s Office of Government Relations today to the leaders of Congress’ Commerce Committees in both the Senate (Senators John Thune (R-SD) and Bill Nelson (D-FL)) and the House (Representatives Greg Walden (R-OR-2) and Frank Pallone (D-NJ-6)), and shared with other key Members of Congress.

This week’s letter follows hard on the heels of a broader but similar letter sent on March 7 to Chairman Pai and his colleagues, signed by ALA and 170 other organizations.

In a news release regarding our most recent communication to the Commission, ALA President Julie Todaro stated:

“America’s libraries collect, create, curate, and disseminate essential information to the public over the Internet, and enable our users to build and distribute their own digital content and applications. Network neutrality is essential to ensuring open and nondiscriminatory access to information for all. The American Library Association is proud to stand with other education and learning organizations in outlining core principles for preserving the open Internet as a vital platform for free speech, innovation, and civic engagement.”

ACRL President Irene Herold affirmed:

“In the modern era, the Internet is the primary open platform for information exchange, intellectual discourse, research, civic engagement, teaching, and learning. College and university libraries are prolific providers and users of content, services and applications in which a privileged—or non-neutral—Internet would create a significant detrimental barrier. Having this innovative content openly available upholds our values of academic freedom and serves the public interest and common good.”

We appreciate the engagement of ALA and ACRL members on this issue. We will need even stronger interest and support in the coming weeks. Hope to see you at National Library Legislative Day.

The post The next move for net neutrality appeared first on District Dispatch.

David Rosenthal: EU report on Open Access

planet code4lib - Wed, 2017-03-29 07:28
The EU's ambitious effort to provide immediate open access to scientific publications as the default by 2020 continues with the publication of Towards a competitive and sustainable open access publishing market in Europe, a report commissioned by the OpenAIRE 2020 project. It contains a lot of useful information and analysis, and concludes that:
Without intervention, immediate OA to just half of Europe's scientific publications will not be achieved until 2025 or later.
The report:
considers the economic factors contributing to the current state of the open access publishing market, and evaluates the potential for European policymakers to enhance market competition and sustainability in parallel to increasing access.
Below the fold, some quotes, comments, and an assessment.

The report acknowledges that this is a strange "market":
It has long been recognised that the scholarly publishing market has, in economic terms, an unusual relationship between sellers and buyers. Essentially a supply-driven market, scholarly publishing serves the needs of researchers engaged in:
  • Conducting research
  • Writing publications
  • Reviewing the quality of other research, and
  • Constituting the main readership for scholarly work.
However, while researchers are both producers and consumers of scholarly publications, their purchase is typically undertaken by academic libraries. Under the dominant subscription model (variously described as 'reader-pays' or 'toll-access'), this results in an 'intermediated market' which weakens the price sensitivity of consumers, be they authors or readers.
Their estimates of the current total market and the open access part are:
The market for scholarly journals alone is worth some $10 billion per year, with scientific, technical and medicine (STM) publications accounting for the vast majority of this figure.
The global open access market is approaching $500 million in size, but accounts for only 5% of the journals market. The proportion of immediate open access content is substantially higher, at almost 17% of global articles in 2014. The wide discrepancy between open access's share of revenues and articles reflects both the use of non-market based mechanisms to deliver open access content, and the lower cost of open access publication.
Between a quarter and a third of the $10 billion journals market, $2-3 billion, is siphoned off into the profits of the publishers which, as the report underlines, means that resistance to change is fierce.
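The market figures above can be cross-checked with a few lines of Python. This is only a sketch under stated assumptions: the market sizes are the rounded values quoted from the report, the profit fractions are the "quarter" and "third" mentioned in the commentary, and the variable names are made up for illustration.

```python
# Rounded figures from the text, in billions of US dollars per year.
JOURNALS_MARKET_USD_BN = 10.0
OA_MARKET_USD_BN = 0.5  # "approaching $500 million"

# Open access revenue as a share of the whole journals market.
oa_share = OA_MARKET_USD_BN / JOURNALS_MARKET_USD_BN

# Publisher profit, assumed to be between a quarter and a third of the market.
profit_low = JOURNALS_MARKET_USD_BN / 4
profit_high = JOURNALS_MARKET_USD_BN / 3

print(f"OA share of journal revenue: {oa_share:.0%}")
print(f"Estimated publisher profit: ${profit_low:.1f}-{profit_high:.1f} billion")
```

The revenue share matches the 5% quoted from the report, and the profit range is consistent with the rounded "$2-3 billion" in the commentary.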

The report identifies four types of open access:
  • Open access archiving ('Green OA') - the practice of archiving a version of an article for free public use in an institutional or subject repository.
  • Gold-Hybrid - peer-reviewed articles within a subscription-based journal are made immediately open access, typically on payment of an article publication charge (or APC) to the publisher or through an offsetting agreement.
  • Gold-APC - publication in journals that make all of their content OA via payment of an APC, and do not rely on subscriptions.
  • Gold no-APC - publication in fully open-access journals which do not charge an APC.
Any large-scale move to these models would reduce the $2-3 billion significantly:
The commercial incentives for subscription publishers to move to APC-based open access remain weak. Mean APCs are approximately €1,500 (Gold-APC) and €2,500 (Gold-Hybrid), while average subscription revenues are double this, at €4-5,000 per article. Open access would also jeopardise licensing revenues and corporate subscriptions, estimated at some 20% of STM publishers' current income. Recent initiatives aimed at repurposing existing library subscription budgets for open access, such as the Open Access 2020 movement, assert that there is sufficient money in the system to make the transition. Publishers have also recognised the opportunity for OA to generate additional revenue streams. However, unless the gap between per article revenues under the OA and subscription models narrows significantly, or threats to the sustainability of the subscription model increase, progress towards a large-scale transition is likely to remain slow.
Thus, absent some major intervention, it won't happen. What does the world get in return for the $10 billion per year?
In most cases, researcher-authors freely transfer copyright in their work to publishers, or grant them an exclusive right to publish the final version of their manuscript, also known as the 'version of record'. In return, journals perform four auxiliary but essential functions of scientific communication:
  • Registration: establishing the author's precedence and ownership of an idea
  • Dissemination: communicating the findings to its intended audience
  • Certification: ensuring quality control by managing the peer review process
  • Archival record: preserving a fixed version of the paper for future reference and citation.
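Returning to the per-article figures quoted earlier (mean APCs of about €1,500 and €2,500 against subscription revenues of €4-5,000 per article), the gap that publishers would face can be sketched as follows. The numbers are the report's rounded values; the dictionary and function names are hypothetical and exist only for this illustration.

```python
# Rounded per-article figures quoted from the report, in euros.
MEAN_APC_EUR = {"Gold-APC": 1500, "Gold-Hybrid": 2500}
SUBSCRIPTION_REVENUE_EUR = (4000, 5000)  # low and high estimates


def revenue_gap(apc: int, subscription_range: tuple[int, int]) -> tuple[int, int]:
    """Revenue lost per article when an APC replaces subscription income."""
    low, high = subscription_range
    return (low - apc, high - apc)


for model, apc in MEAN_APC_EUR.items():
    gap_low, gap_high = revenue_gap(apc, SUBSCRIPTION_REVENUE_EUR)
    print(f"{model}: publisher forgoes EUR {gap_low}-{gap_high} per article")
```

Even under the more expensive Gold-Hybrid model the shortfall is at least €1,500 per article, which illustrates why the report expects the transition to remain slow without intervention.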
The report recommends six main areas of intervention:
Our work has identified six main roadblocks to open access that should be addressed through appropriate policies and measures. These are:
  1. Weak author incentives for open access: The single greatest barrier to wider uptake of open access is cultural resistance within the academic community. Until there are sufficient incentives for researchers to actively choose open access publication and archiving, demand will remain muted - and publisher support for open access will mirror this. If Europe is serious about increasing access, then its mechanisms for research assessment, grant funding, academic promotion, and institutional funding need to reflect this.
  2. Unclear route to transition for subscription publishers: The gap between per article revenues under a subscription model and those available under an APC or Gold no-APC model still appears unbridgeable for many commercial and society publishers. More must be done to show that the transition can be made without irreparable damage to publishers' business models - whether through offsetting mechanisms, acceptance of higher APCs, or increased adoption of Gold no-APC models like FAIR OA. However, this must be accompanied by increased expectations of the service provided by publishers, including licensing and machine-readability.
  3. Lack of transparency in the market: The lack of transparency in the subscription market compounds the problem of journal non-substitutability, and results in a dysfunctional market which serves neither researchers, institutions nor the public interest effectively. Piecemeal attempts to improve transparency through Freedom of Information requests, often undertaken only by students and grassroots activists, must give way to a concerted policy-led effort to deliver transparency and improve competition in both the subscription and pure open access markets.
  4. Disparate national and disciplinary contexts: There is no single pathway to open access that finds support from a clear majority of stakeholders. Therefore the adoption of a pathway should not preclude also adopting other OA strategies and pathways. Different approaches are needed depending on the national and disciplinary context, and policy interventions must therefore promote and enable flexibility. Funders and institutions need to proactively support not only the entry of new players into the market, but also their development at scale.
  5. Suboptimal infrastructure: The administrative burden associated with open access models remains too high for all stakeholders - whether authors, institutions, publishers or funders. Improved processing, payment, deposit and reporting mechanisms, built on common standards and infrastructure, are needed to allow open access to scale rapidly and efficiently.
  6. Inadequate monitoring and reporting: Europe's ability to track progress and assess the effect of interventions in the market is greatly inhibited by fragmented and underdeveloped monitoring and reporting mechanisms. Investment is needed in standards and tools to track compliance with open access policies, reliably determine the aggregate proportion of the scientific literature which is available in open access form, and monitor sustainability on both the demand and supply-side.
Despite all this valuable content, in my view the report suffers from three significant flaws, one fairly minor and two really major.

The minor one is the issue of what "open access" actually means and whether Gold OA delivers it. The report writes:
The European Commission has defined open access (OA) as 'the practice of providing on-line access to scientific information that is free of charge to the user and that is re-usable'.

Legally binding definitions of 'open access' and 'access' in this context do not exist, but authoritative definitions of open access can be found in the Budapest Declaration and the Berlin Declaration. These definitions define 'open access' as including not only basic elements such as the right to read, download and print, but also the right to copy, distribute, search, link, crawl, and mine. This is commonly achieved through application of a Creative Commons Attribution License (CC-BY). The report fails to observe that, while these may be the definitions of OA for the EC and OA advocates, these are not the definitions of OA in use by the oligopolistic publishers. As Poynder has observed some are using their own bogus "OA" licenses which are far more restrictive than CC-BY and specifically exclude most of the rights of the Budapest and Berlin Declarations, those necessary for the EC's goal that information be "re-usable".

Even when they do label content CC-BY, they can place technological barriers in the way of attempts to "copy, distribute, search, link, crawl, and mine". Publishers' ability to place such barriers is about to be greatly increased by the advent of Web DRM.

In addition, the Wellcome Trust and others have identified many cases in which an APC was charged but the article remains behind a paywall, illustrating the problem that neither the author nor the publisher has any interest in policing compliance. The report's economic analysis is thus skewed, because the APCs paid to publishers using poor compliance, bogus "OA" licenses or technological protection measures do not obtain the benefits the report assumes. Not to mention the many tricky legal issues with the use of CC-BY that Poynder identifies.

The first major flaw is that the report not merely fails to assess the value obtained in return for APCs, but more importantly fails to assess the value obtained in return for the much larger subscription payments. In each of the four "auxiliary but essential functions" the value obtained has been rapidly decreasing:
  • Registration: the long, variable and unpredictable delays in the publication pipeline mean that submission first is no guarantee of publication first. For a clear example, see DNA as a storage medium. A friend whose paper in Nature took 22 months from submission to publication wondered "whether one of the referees had been trying to delay it so they could get something out"; gaming the system in these and similar ways is endemic.
  • Dissemination: the ineffectiveness of the current system at "communicating the findings to its intended audience" is aptly illustrated by the large proportion of "convenience" accesses via Sci-Hub documented by Kramer: Overall, 75% of Utrecht Sci-Hub downloads would have been available either through our library subscriptions (60%) or as Gold Open Access/free from publisher (15%). In so far as these downloads were requested by academic users (i.e. affiliated with Utrecht University), use of Sci-Hub for these materials could be seen as ‘convenience’. For the remaining 25%, Sci-Hub use could be seen as a way of getting access to articles that are not freely available through publishers. As I wrote in Elsevier and the Streisand Effect: What the readers want is a single portal, preferably Google, that provides free, instant access to the entire corpus of published research. ... Sci-Hub is as close as anyone has come to providing what the readers want. None of the big publishers can provide it, not merely because doing so would destroy their business model, but also because none of them individually control enough of the content. And the publishers' customers don't want them to provide it, because doing so would reduce even further the libraries' role in their institutions. No-one would need "personal shoppers who moonlight as border guards". These are all problems even if you do not (as the EU does) believe that the public, which pays for the research, has a right to access the results.
  • Certification: It has been clear for many years that the publishers aren't "ensuring quality control by managing the peer review process". They have abdicated their role as gatekeepers by proliferating journals, so that the question is no longer whether a paper is published, only where it is published. The peer-reviewed research on peer-review shows that the journal in which it was published provides no information as to the quality of the paper. But the higher the bogus "impact factor" of the journal, the more likely the paper is to be retracted.
  • Archival record: As I've recently written, the idea that the owner of some content is best suited to preserve it is a counsel of despair. Not merely do paywalls make it impossible for archives to preserve journals without permission, the DMCA makes it impossible for them to preserve non-CC open access content, or content carrying CC-licenses protected by even ineffective DRM. The publishers will not allow archiving by entities over which they lack adequate control, because that risks their $2-3 billion per year profits. In addition, publishers currently personalize, watermark and re-format content that they do deliver, making it extremely difficult to establish that the content eventually received by readers is the same as that approved at the end of the review and editing process.
These deficiencies add another major economic argument in favor of OA, in that the vast sums being spent on non-OA content are not providing the benefits the recipients claim.

The other major flaw is that the report completely fails to address the fundamental role that copyright ownership and transfer plays in the multifarious problems of academic publishing. The sole mention of anything related that I could find was in Figure 8, where "Enable retention of copyright by author" was one of a laundry list of "Recommendations on promoting the transition to open access (sourced from 20 published studies)". The report fails to understand that it is precisely copyright ownership by authors that is at the root of the problems they describe, and that the problems will not be solved while authors retain copyright.

The tradition and practice, of somewhat dubious legality, that copyright in works by faculty and staff belongs to the authors rather than to the institution or the funder of the research ensures that the interests of the institution, the funders, and the public are never represented in the fundamental transaction of academic publishing, which is between one or more individuals and a corporation (the publisher):
  • The individual authors bear none of the economic or societal costs of their decisions. Even if they could not divert project funds to pay APCs, the career benefits to be obtained by publication would persuade them to pay out of their own pocket. They gain no benefit from negotiating with the publisher to refuse copyright transfer or exclusivity in licensing.
  • The publishers have massive incentives to, and vast resources to devote to, ensuring that they do have copyright ownership or license exclusivity. Even when they have been paid an APC, they can simply ignore the OA requirement, redefine it to mean exclusivity, or use technological means to implement exclusivity. Expecting authors, even if they had motivation to do so, to emerge victorious from such an unequal negotiation is futile.
There are only two effective means of ensuring open access to future works by publicly- or charitably-funded research. Either:
  • The author's institution must change the terms of employment for faculty and staff to make it clear that either their writings are works-for-hire, and thus the copyright belongs to the institution, or that these writings are not copyrightable, as is the case for works by employees of the United States. Or:
  • Funding institutions must change the terms and conditions of their grants to make it clear that copyright in works derived from the grant belongs to the funder.
In neither case would authors' signatures on copyright transfer or license forms be valid. In both cases the institutions that bear the economic and societal costs would be across the table from the publishers, and would be both motivated to, and able to, prevent the publisher acquiring exclusivity. Without exclusivity, the publishers' extortionate margins could not be sustained.

Ed Summers: CL

planet code4lib - Wed, 2017-03-29 04:00

Corpus Linguistics (CL) involves the study of language using machine readable texts or corpora. It’s interesting to run across CL in the context of discourse analysis. I first ran up against corpus linguistics as an undergraduate student, where I remember writing a term paper using a printed concordance to analyze the use of the word “orange” in Hemingway’s A Farewell to Arms. I don’t quite remember what I concluded there, but I do remember thinking how interesting it was that there were books that contained lists of words that were in other books. I guess nowadays there might be quite a bit of overlap between corpus linguistics and computational linguistics – which relies heavily on corpora.

CL in the context of Discourse Analysis focuses on written and spoken text, and rests on a particular theory that sees language variation as systematic, functional and tied to a particular social context. I think this view is fundamental to discourse analysis and is shared by the other types of discourse analysis we’ve looked at this semester such as pragmatics, conversational analysis, ethnography of communication, critical discourse analysis, and interactional sociolinguistics. But what is different in corpus linguistics is the attention to the corpus, both the way it is assembled and the way that it is used. CL is also distinguished by taking a quantitative or statistical approach to the study of language. It is not purely statistical, however, since the interpretation of words and their significance often involves social context and cultural factors.

While there are several well known examples of large corpora (BASE, BAWE, TOEFL, LSWE), corpora don’t necessarily have to be super large. Specialized corpora focused on specific areas can be very useful, even if they don’t contain many documents. Results from specialized corpora can also be compared to results from larger corpora for context. Paltridge (2012) outlines some key issues to keep in mind when building a corpus of text:

  • authenticity: what language and/or dialects are present in the corpus. Which ones are missing? How does this impact the types of questions that can be asked of the corpus?
  • time: What time period are they taken from? How often is it updated? How does this influence the types of questions that can be asked?
  • size: the number and length of documents needed can vary depending on the research question. Making assertions about a larger population requires a sample that reflects the size of the population of texts. Is the size of that population even known?
  • balance: sampling should reflect the distribution of texts within the corpus

Some examples of structures that researchers have looked for in corpora of written text include:

  • non-clausal units: utterances that lack a subject or verb
  • personal pronouns and ellipsis: where items are left out of conversation because they are part of the context
  • repetition: words that repeat in particular types of conversation to add emphasis
  • lexical bundles: formulaic multi-word sequences

Corpora can also be made of text from conversation, where researchers can look at:

  • pauses: gaps in utterances and the flow of conversation
  • prefaces: patterns for introducing conversation
  • tags: phrases added to the end of utterances, such as questions, or repeating concepts with different wording
  • informality or casualness of words

Maybe I’m not being imaginative enough but it seems like it could be difficult to locate some of these features automatically using computational methods. Identifying repetitions and lexical bundles seems like it could be fairly easy once the text has been modeled as n-grams and collocation statistics are generated so that they can be browsed. But to programmatically identify pauses seems to require some kind of pre-existing markup for the time gaps like what we saw in CA transcription. I guess these could be determined by a computer if audio recordings are available and digitized. But it seems like it could be difficult to identify turns in conversation (changes in speaker), and whether the pause occurs there or within the flow of a turn. Also identifying where contextual features are being elided in conversation seems like it would require some degree of understanding of an utterance, which is notoriously difficult for computers (McCarthy & Hayes, 1969).
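
As a rough illustration of why repetitions and lexical bundles are the easy case, here is a minimal Python sketch that models a text as n-grams and keeps the ones that recur. The whitespace tokenizer, the function names and the `min_count` threshold are all assumptions made for the example, not anything taken from Paltridge or from a particular CL tool.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def lexical_bundles(text, n=3, min_count=2):
    """Count n-grams and keep those that recur at least min_count times."""
    tokens = text.lower().split()  # naive whitespace tokenizer (assumption)
    counts = Counter(ngrams(tokens, n))
    return {gram: c for gram, c in counts.items() if c >= min_count}


sample = "on the other hand it works and on the other hand it fails"
bundles = lexical_bundles(sample, n=3)
for gram, count in sorted(bundles.items()):
    print(" ".join(gram), count)
```

A real study would want a proper tokenizer and a much larger corpus, but the counting step itself stays about this simple, which is the contrast with pauses and elided context.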

However some of the features CL enables you to study are quite down to earth and useful. Collocation looks at what words appear most in particular texts, and across genres. Analysis can be top-down where a particular discourse structure is identified beforehand and then the corpus is examined looking for that structure. It can also be bottom-up where lower level shifts and repetitions in word usage are used to identify discourse structures. I imagine that a given study could cycle back and forth between these modes: bottom up leading to discourse structure that is then examined top-down?

The main criticism of corpus studies is that their quantitative focus on words tends to reduce the focus on the social context that the words are used in. Ways of countering this are to do qualitative interviews and surveys to provide this extra dimension, which is what Hyland (2004) did in his study of academic discourse.

Tribble (2002) offers a framework for using corpus studies to look at contextual features such as: cultural values, communicative purpose and grammatical features that are stylistically salient. The framework works better on a corpus that has a genre focus than a register focus – or one that is more tightly scoped. The difference between register and genre seems significant and a little bit difficult to grasp. I guess it’s a matter of scale or abstraction– but also of materiality perhaps? A genre suggests a particular embodiment of text. It could be a useful distinction to make if I am going to study collection development policies which are embodied in a particular way as documents on the Web.

Mautner (2016) has several interesting things to say about using CL as a methodology in support of Critical Discourse Analysis (CDA).

  • CL is built on a theoretical foundation that positions language variation as systematic, functional and tied to social context (see Firth)
  • CL allows for the analysis of larger amounts of data
  • CL provides a different view of data that can be useful in triangulation.
  • CL can bring some measure of quantification that can temper potential researcher bias and subjectivity.
  • CL provides some methodological support for qualitative analysis of data, such as browsing collocational information.

If words, or lexical items, are an important measure in your research then CL is a good tool to use. In many ways it allows CDA researchers, who are usually focused beyond the text, to ground their analysis on the text itself.

Mautner points out how collocation lists with their t-score and Mutual Information (MI) score can be used in CDA. The MI score is a ratio that measures the observed number of co-occurring words against the expected number of co-occurring words. The expected number is known because you know how many times a word occurs in the entire text. The t-score is a complementary statistic that weighs the MI score based on the number of times the word appears in the corpus.
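
To make the two statistics a little more concrete, here is a small Python sketch using one common formulation from the corpus linguistics literature: MI as the base-2 log of observed over expected co-occurrences, and the t-score as observed minus expected, divided by the square root of observed. Mautner's chapter may define them slightly differently, so treat the formulas, the function names and the `span` parameter as assumptions.

```python
import math


def expected_cooccurrences(freq_node, freq_collocate, corpus_size, span=1):
    """Co-occurrences expected by chance if the two words were independent."""
    return freq_node * freq_collocate * span / corpus_size


def mi_score(observed, expected):
    """Mutual Information: log2 of observed over expected co-occurrences."""
    return math.log2(observed / expected)


def t_score(observed, expected):
    """t-score: observed minus expected, scaled by sqrt(observed)."""
    return (observed - expected) / math.sqrt(observed)


# Made-up counts: a 1,000-word corpus, a node word seen 100 times,
# a collocate seen 40 times, and 16 observed co-occurrences.
expected = expected_cooccurrences(100, 40, 1000)
print(expected, mi_score(16, expected), t_score(16, expected))
```

Because MI is a ratio it can be very high for rare pairs that co-occur only a handful of times, while the t-score grows with the raw number of co-occurrences, which is why the two are usually read together.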

CL tools allow analysts to scan and examine word lists, or examples of co-occurring words, which can suggest qualitative or contextual factors. CDA encourages people not to jettison context in making texts machine readable. Features such as textual layout and emphasis, or video/audio from recorded speech, are important to retain.

The concept of saturation, which is foundational in qualitative methods, is somewhat at odds with CL, because how common a feature is matters to measurements such as MI and the t-score, so stopping at saturation artificially limits them. Ideally you want to sample the entire population of texts. I guess this presupposes that the texts need to be digitized in some way. At any rate some judgment about the entire population of texts needs to be made, and this is a really important decision to be made in CL studies.

It is interesting to see that a skill-gap was identified as a criticism of CL. It reminds me of criticisms of digital humanities. Realistically I imagine the same critique could be directed at CA or EC, as conversation transcription and field studies are learned skills. Disciplinary boundaries between computational and social scientists seem a little irrelevant in the deeply interdisciplinary space that we’re in now – at least in most information studies departments.

I found the criticisms of the embarrassment of riches that the Web offers really fascinating, especially in light of how I’ve been collecting Twitter data. Texts should still be reflexively selected because they speak to the research being performed, and should not simply be slurped up mindlessly just because it’s easy to do. The nature of the documents that make up the corpora, and the means and manner in which they were selected, are of key significance. It’s also important to remember that context still matters, and that some patterns in text will be invisible to the quantitative measures provided in CL.


Hyland, K. (2004). Disciplinary discourses: Social interactions in academic writing. University of Michigan Press.

Mautner, G. (2016). In R. Wodak & M. Meyer (Eds.), Methods of critical discourse studies. Sage.

McCarthy, J., & Hayes, P. J. (1969). Some philosophical problems from the standpoint of artificial intelligence. Readings in Artificial Intelligence, 431–450.

Paltridge, B. (2012). Discourse analysis: An introduction. Bloomsbury Publishing.

Tribble, C. (2002). Corpora and corpus analysis: New windows on academic writing. Academic Discourse, 131–149.

District Dispatch: An appropriations update from the field

planet code4lib - Tue, 2017-03-28 22:09

After almost a week of activity around ALA’s intensive appropriations campaign and less than a week until the House “Dear Appropriator” letter deadline, we wanted to check in and let you know how things are going.

As you may remember, this campaign is part of our annual push to get as many Members of Congress to sign “Dear Appropriator” letters supporting the Library Services and Technology Act (LSTA) and the Innovative Approaches to Literacy (IAL) program. These programs add up to over $210 million in library funding that meets a variety of needs throughout the library world. (For more information about LSTA, IAL, and the appropriations process, check out the recent article written by Committee on Legislation Chair Ann Dutton Ewbank.)

The good news is that the campaign has received a lot of support and interest from the library world and beyond. In fact, as of March 28, 2017, over 6,300 emails to 426 (out of 435) Representatives have been processed through the ALA Action Center. This number does not include anyone who sent their emails through one of the ALA State Chapter sites, or who sent their email independent of the Action Center, so we hope this number is only a small reflection of the number of contacts Congress has likely received about the “Dear Appropriator” letters. Additionally, 8,949 tweets have been posted using the #SaveIMLS tag, and 224 of those tweets were specifically directed at Representatives through the Action Center.

The bad news is that we still have a long way to go. In response to this action, there are currently 98 signatures on the IAL letter (we had 124 last year), and 42 on the LSTA letter (88 last year).  Keep those calls and emails going!

We have until April 3rd to get signatures from the House. If you want to know if your Representative has signed the LSTA and IAL letters, check out this handy database. If your Representative hasn’t signed, give them a call and ask them! And if they have, a thank you call or email never hurts.

Time is running out to get the support we need for LSTA and IAL. Head over to the ALA Legislative Action Center to find talking points and email templates, or visit our friends at and use their script to make a call!

The post An appropriations update from the field appeared first on District Dispatch.

LibUX: Listen: Google makes JPEGs faster, and the Digital Divide persists (5:44)

planet code4lib - Tue, 2017-03-28 21:09

W3 Radio is a bite-sized newscast recapping the week in web design in 10 minutes or less. It is exclusive as a pilot project to LibUX Patreon supporters – for a little while longer, anyway. We are super close to making this a real thing. So, if you’re in a position to support LibUX, access to these podcasts is just $5 per month. Your support goes a long way.

Download on Patreon.

Jonathan Rochkind: Monitoring your Rails apps for professional ops environment

planet code4lib - Tue, 2017-03-28 16:25

It’s 2017, and I suggest that operating a web app in a professional way necessarily includes 1) finding out about problems before you get an error report from your users (or at best, before they even affect your users at all), and 2) having the right environment to be able to fix them as quickly as possible.

Finding out before an error report means monitoring of some kind, which is also useful for diagnosis to get things fixed as quickly as possible. In this post, I’ll be talking about monitoring.

If you aren’t doing some kind of monitoring to find out about problems before you get a user report, I think you’re running a web app like a hobby or side project, not like a professional operation. And if your web app is crucial to your business/organization/entity’s operations, mission, or public perception, it means you don’t look like you know what you’re doing. Many library ‘digital collections’ websites get relatively little use, and a disproportionate amount of that use comes from non-affiliated users who may not be motivated to figure out how to report problems; they just move on. If you don’t know about problems until they get reported by a human, a problem could have existed for literally months before you find out about it. (Yes, I have seen this happen.)

What do we need from monitoring?

What are some things we might want to monitor?

  • error logs. You want to know about every fatal exception (resulting in a 500 error) in your Rails app. This means someone was trying to do or view something, and got an error instead of what they wanted.
    • even things that are caught without 500ing but represent something not right that your app managed to recover from, you might want to know about. These often correspond to error or warn (rather than fatal) logging levels.
    • In today’s high-JS apps, you might want to get these from JS too.
  • Outages. If the app is totally down, it’s not going to be logging errors, but it’s even worse. The app could be down because of a software problem on the server, or because the server or network is totally borked, or whatever, you want to know about it no matter what.
  • pending danger or degraded performance.
    • Disk almost out of capacity.
    • RAM excessively swapping, or almost out of swap. (or above quota on heroku)
    • SSL certificates about to expire
    • What’s your median response time? 90th or 95th percentile?  Have they suddenly gotten a lot worse than typical? This can be measured on the server (time server took to respond to HTTP request), or on the browser, and actual browser measurements can include actual browser load and time-to-onready.
  • More?
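
The response-time item in the list above is the one that most needs a little arithmetic, so here is a minimal Python sketch of it: compute a nearest-rank percentile over a window of response times and flag the window as degraded when it is well above a baseline. The sample numbers, the `factor` threshold and the function names are invented for illustration; a real monitoring service will have its own way of doing this.

```python
import math
import statistics


def percentile(samples, pct):
    """Nearest-rank percentile of a list of numbers (pct between 0 and 100)."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def is_degraded(recent_ms, baseline_ms, pct=95, factor=2.0):
    """True when the recent percentile exceeds factor times the baseline."""
    return percentile(recent_ms, pct) > factor * percentile(baseline_ms, pct)


# Hypothetical server-side response times in milliseconds.
baseline = [120, 130, 110, 150, 140, 125, 135, 145, 115, 160]
recent = [130, 900, 140, 1200, 150, 135, 1100, 145, 125, 950]

print("median:", statistics.median(baseline), "->", statistics.median(recent))
print("95th:", percentile(baseline, 95), "->", percentile(recent, 95))
print("degraded:", is_degraded(recent, baseline))
```

Comparing against your own typical numbers rather than a fixed limit is what lets you notice "suddenly a lot worse than typical" without hand-tuning a threshold per app.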

Some good features of a monitoring environment:

  • It works, there are minimal ‘false negatives’ where it misses an event you’d want to know about. The monitoring service doesn’t crash and stop working entirely, or stop sending alerts without you knowing about it. It’s really monitoring what you think it’s monitoring.
  • Avoid ‘false positives’ and information overload. If the monitoring service is notifying or logging too much stuff, and most of it doesn’t really need your attention after all — you soon stop paying attention to it altogether. It’s just human nature, the boy who cried wolf. A monitoring/alerting service that staff ignores doesn’t actually help us run professional operations.
  • Sends you alert notifications (emails, SMS, whatever) when things look screwy, configurable and at least well-tuned to give you what you need and not more.
  • Some kind of “dashboard” that can give you overall project health, or an at a glance view of the current status, including things like “are there lots of errors being reported”.
  • Maybe uptime or other statistics you can use to report to your stakeholders.
  • Low “total cost of ownership”, you don’t want to  have to spend hours banging your head against the wall to configure simple things or to get it working to your needs. The monitoring service that is easiest to use will get used the most, and again, a monitoring service nobody sets up isn’t doing you any good.
  • public status page of some kind, that provides your users (internal or external) a place to look for “is it just me, or is this service having problems” — you know, like you probably use regularly for the high quality commercial services you use. This could be automated, or manual, or a combination.

Self-hosted open source, or commercial cloud?

While one could imagine things like a self-hosted proprietary commercial solution, I find that in reality people are usually choosing between ‘free’ open source self-hosted packages, and for-pay cloud-hosted services.

In my experience, I have come to unequivocally prefer cloud-hosted solutions.

One problem with open-source self-hosted solutions is that it’s very easy to misconfigure them so they aren’t actually working. (Yes, this has happened to me.) You are also responsible for keeping them working: a monitoring service that is down does not help you. Do you need a monitoring service for your monitoring service now? What if a network or data center event takes down your monitoring service at the same time it takes down the service it was monitoring? Again, now it’s useless and you don’t find out. (Yes, this has happened to me too.)

There exist a bunch of high-quality commercial cloud-hosted offerings these days. Their prices are reasonable (if not cheap; but compare to your staff time in getting this right), their uptime is outstanding (let them handle their own monitoring of the monitoring service they are providing to you, avoid infinite monitoring recursion); many of them are customized to do just the right thing for Rails; and their UIs are often great, they just tend to be more usable software than the self-hosted open source solutions I’ve seen.

Personally, I think if you’re serious about operating your web apps and services professionally, commercial cloud-hosted solutions are an expense that makes sense.

Some cloud-hosted commercial services I have used

I’m not using any of these at my current gig, and I have more experience with some than others. I’m definitely still learning about this stuff myself, and developing my own preferred stack and combinations. But these are all services I have at least some experience with, and a good opinion of.

There’s no one service (that I know of) that does everything I’d want in the way I’d want it, so it probably does require using (and paying for) multiple services. But price is definitely something I consider.

Bugsnag

  • Captures your exceptions and errors, and that’s about it. I think Rails was their first target and it’s especially well tuned for Rails, although you can use it for other platforms too.
  • Super easy to include in your Rails project, just add a gem, pretty much.
  • Gives you stack traces and other (configurable) contextual information (logged in user)
  • But additional customization is possible in reasonable ways.
  • Including manually sending ‘errors’
  • Importantly, groups the same error repeatedly together as one line (you can expand), to avoid info overload and crying wolf.
  • Has some pretty graphs.
  • Lets you prioritize, filter, and ‘snooze’ errors (so they pop up only if they happen again after the ‘snooze’ time), plus other features that again let you avoid info overload and actually respond to what needs responding to.
  • Email/SMS alerts in various ways including integrations with services like PagerDuty
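
To show why grouping repeated errors matters for avoiding info overload, here is a small Python sketch that collapses error events into one line per group with a count. Fingerprinting on the exception class plus the top stack frame is an assumed heuristic for illustration only, not a description of how any of these services actually group errors, and the event dictionaries and file paths are made up.

```python
from collections import Counter


def fingerprint(error):
    """Group key: exception class plus the top stack frame (assumed heuristic)."""
    return (error["class"], error["backtrace"][0])


def group_errors(errors):
    """Collapse repeated errors into one entry per fingerprint with a count."""
    return Counter(fingerprint(e) for e in errors)


# Hypothetical error events, shaped loosely like exception reports.
errors = [
    {"class": "NoMethodError",
     "backtrace": ["app/models/item.rb:42", "app/controllers/items_controller.rb:10"]},
    {"class": "NoMethodError",
     "backtrace": ["app/models/item.rb:42", "app/controllers/search_controller.rb:7"]},
    {"class": "ActiveRecord::RecordNotFound",
     "backtrace": ["app/controllers/items_controller.rb:15"]},
]

grouped = group_errors(errors)
for (cls, frame), count in grouped.most_common():
    print(f"{count}x {cls} at {frame}")
```

Three events become two lines here; on a busy app the same idea turns thousands of identical exceptions into a single entry you can actually act on.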

Honeybadger

Very similar to bugsnag, it does the same sorts of things in the same sorts of ways. From some quick experimentation, I like some of its UX better, and some worse. But it does have more favorable pricing for many sorts of organizations than bugsnag, and offers free accounts “for non-commercial open-source projects”; I’m not sure how many library apps would qualify.

Also includes optional integration with the Rails uncaught exception error page, to solicit user comments on the error that will be attached to the error report, which might be kind of neat.

Honeybadger also has some limited HTTP uptime monitoring functionality, ability to ‘assign’ error log lines to certain staff, and integration with various issue-tracking software to do the same.

All in all, if you can only afford one monitoring provider, I think honeybadger’s suite of services and price make it a likely contender.

(disclosure added in retrospect 28 Mar 2017. Honeybadger sponsors my rubyland project at a modest $20/month. Writing this review is not included in any agreement I have with them. )

New Relic

New Relic really focuses on performance monitoring, and is quite popular in Rails land, with easy integration into Rails apps via a gem. New Relic also has javascript instrumentation, so it can measure actual perceived-by-user browser load times, as affected by network speed, JS and CSS efficiency, etc. New Relic has a sophisticated alerting setup that allows you to get alerted when performance is below the thresholds you’ve set as acceptable, or below typical long-term trends (I think I remember this last one can be done, although I can’t find it now).

The New Relic monitoring tool also includes some error and uptime/availability monitoring; for whatever reasons, many people using New Relic for performance monitoring seem to use another product for these features, and I haven’t spent enough time with New Relic to know why. (These choices were already made for me at my last gig, and we didn’t generally use New Relic for error or uptime monitoring.)


Statuscake

Statuscake isn’t so much about error log monitoring or performance, as it is about uptime/availability.

But Statuscake also includes a Linux daemon that can be installed to monitor internal server state, not just HTTP server responsiveness, including RAM, CPU, and disk utilization. It gives you pretty graphs, and alerting if metrics look dangerous.

This is especially useful on a non-heroku/PaaS deployment, where you are fully responsible for your machines.

Statuscake can also monitor looming SSL cert expiry, SMTP and DNS server health, and other things in the realm of infrastructure-below-the-app environment.

Statuscake also optionally provides a public ‘status’ page for your users — I think this is a crucial and often neglected piece, that really makes your organization seem professional and meet user needs (whether internal staff users or external). But I haven’t actually explored this feature myself.

At my previous gig, we used Statuscake happily, although I didn’t personally have need to interact with it much.


Librato

My previous gig at the Friends of the Web consultancy used Librato on some projects happily, so I’m listing it here, but I honestly don’t have much personal experience with it, and don’t entirely understand what it does. It really focuses on graphing over time though, I think. I think its graphs sometimes helped us notice when we were under a malware bot attack of various sorts, or otherwise were getting unusual traffic (in volume or nature) that should be taken account of. It can use its own ‘agents’, or accept data from other open source agents.

Heroku metrics

Haven’t actually used this too much either, but if you are deploying on heroku, with paid heroku dynos, you already have some metric collection built in, for basic things like memory, CPU, server-level errors, deployment of new versions, server-side response time (not client-side like New Relic), and request timeouts.

You’ve got em, but you’ve got to actually look at them — and probably set up some notification/alerting on thresholds and unusual events — to get much value from this! So just a reminder that it is there, and one possibly budget-conscious option if you are already on heroku.

Phusion Union Station

If you already use Phusion Passenger for your Rails app server, then Union Station is Phusion’s non-free monitoring/analytics solution that integrates with Passenger. I don’t believe you have to use the enterprise (paid) edition of Passenger to use Union Station, but Union Station is not free.

I haven’t been in a situation using Passenger and prioritizing monitoring for a while, and don’t have any experience with this product. But I mention it because I’ve always had a good impression of Phusion’s software quality and UX, and if you do use Passenger, it looks like it has the potential to be a reasonably-priced (although priced based on number of requests, which is never my favorite pricing scheme) all-in-one solution monitoring Rails errors, uptime, server status, and performance (don’t know if it offers Javascript instrumentation for true browser performance).


It would be nice if there were a product that could do all of what you need, for a reasonable price, so you just need one. Most actual products seem to start focusing on one aspect of monitoring/notification — and sometimes try to expand to be ‘everything’.

Some products (New Relic, Union Station) seem to be trying to provide an “all your monitoring needs” solution, but my impression of the general ‘startup sector’ is that most organizations still put together multiple services to give them the complete monitoring/notification they need.

I’m not sure why. I think some of this is just that few people want to spend time learning/evaluating/configuring a new solution, and if they have something that works, or that a trusted friend has told them works, they stick with it. Also perhaps the ‘all in one’ solutions don’t provide as good UX and functionality for particular areas they weren’t originally focusing on as other tools that were originally focusing on those areas. And if you are a commercial entity making money (or aiming to do so), even an ‘expensive’ suite of monitoring services is a small percentage of your overall revenue/profit, and worth it to protect that revenue.

Personally, if I were getting started from virtually no monitoring, in a very budget-conscious non-commercial environment, I’d start with Honeybadger (including making sure to use its HTTP uptime monitoring), and then consider adding one or both of New Relic or Statuscake next.

Anyone, especially from the library/cultural/non-commercial sector, want to share your experiences with monitoring? Do you do it at all? Have you used any of these services? Have you tried self-hosted open source monitoring solutions? And found them great? Or disastrous? Or somewhere in between?  Any thoughts on “Total cost of ownership” of self-hosted solutions, do you agree or disagree with me that they tend to be a losing proposition? Do you think you’re providing a professionally managed web app environment for your users now?

One way or another, let’s get professional on this stuff!

Filed under: General

Open Knowledge Foundation: How Open Data Can Fight Waste – Open Data Day 2017, Akure, Nigeria

planet code4lib - Tue, 2017-03-28 14:48

This blog is part of the event report series on International Open Data Day 2017. On Saturday 4 March, groups from around the world organised over 300 events to celebrate, promote and spread the use of open data. 44 events received additional support through the Open Knowledge International mini-grants scheme, funded by SPARC, the Open Contracting Program of Hivos, Article 19, Hewlett Foundation and the UK Foreign & Commonwealth Office. This event was supported through the mini-grants scheme under the Open Data for Environment theme.

This blog has been reposted from

On the 4th of March, 2017 we at Space Club FUTA held an open data day event where we used OpenStreetMap, TeachOSM Tasking Manager, Remote Sensing and GIS tools to map garbage sites in Akure, Nigeria and track the exact location, size and type of waste. The data collected will be handed over to the agency in charge of clean up to help them organise the necessary logistics.

The event started at exactly 10:01 am on Saturday, 4th of March, 2017 with a total of 48 participants: 42 students and 6 organisers. By 10:20 am the President of Space Club FUTA, Dennis Irorere, gave a detailed explanation of the concept of open data. He was assisted by the past president, Oniosun Temidayo, who talked about how to acquire data.

We listed the goals of the open data day event during those presentations which were:

  1. What really is open data (for those who know little about it)
  2. Where you can get free and licensed data, to prevent copyright violations
  3. Environmental awareness and how open data can help
  4. Demonstration of how open data can help and make the city cleaner
  5. Keynote presentations to provide examples of open data.

Subsequently, participants were asked to get their hands dirty and actually work with data to solve one or more challenges within the environment sector.

Urban waste is one of the major threats to the global environment today, and it’s also one of the causes of flooding and airborne diseases. As global civilisation keeps advancing, commercial, residential and infrastructure development increases with population growth, which has a negative impact on the environment if not properly planned, monitored and managed.

The participants were later divided into 4 groups, which worked on different elements of OpenStreetMap:

  • Group 1 and Group 2: in charge of digitising all the buildings in the area.
  • Group 3: in charge of entering the roads leading to the various areas.
  • Group 4: making appropriate edits.

We know there is a lot of trash lying around, and in order to clean it up, we must first get a proper picture of the situation. This was ultimately the main goal of the open data day event. With this project, the data can give us the exact location, amount and type of garbage. This is essential for organising the logistics and handling of the clean-up by the government, other agencies and NGOs. Using the base map as a reference, we saw some trends and the locations affected by this trash disposal.

We also had an instance in the area we visited where we saw how badly waste is dumped, which in turn blocks the drainage. This is how bad the trash disposal in Akure really is.

Urban waste in Nigeria

But that’s not all, perhaps the most important reason to map is to spread the message. If seeing is believing, then looking at a virtual map of overflowing trash sites will help open our eyes to the problem. And, hopefully, inspire people around to join us and do something about it.

Every year, 20 million tons of garbage is added to the ocean, 80% of it from mainland waste. That’s like dumping over 710,000 Boeing 737 aeroplanes into the sea each year. So, what are we going to do about it?

We are all living in an era full of trash, and it’s clear the problem is massive, is growing steadily and varies considerably by region. With this urban waste mapping of Akure, better plans and a more structured waste system can be made and enforced. This is one of the major steps to take in promoting a better environment and atmosphere, which is also a step in combatting global warming.

The urban mapping of Akure project is still in the development phase and is currently online. The organiser was excited to see the participants working with the data provided by the government! A website was developed for the club, with the aim of enhancing publication and communication. This website has really helped us share our views and show the world what we have been up to.

DPLA: Join the DPLA Community Reps! Application for 5th Class Now Open

planet code4lib - Tue, 2017-03-28 14:45

We are pleased to announce that the application for the fifth class of the DPLA Community Reps program is now open! Learn more below, tell a friend or colleague, and complete the application form to apply by April 24, 2017.

What is the DPLA Community Reps program?

In brief, we are looking for enthusiastic volunteers who are willing to help us bring DPLA to their local communities through outreach activities. A local community could be a school, library, academic campus, professional network, or another group of folks who you think might be interested in DPLA and what it has to offer. Reps give a small commitment of time to community engagement, and check-ins with DPLA staff. We have worked with hundreds of terrific reps from diverse places and professions so far.

Community Reps are…

    • enthusiastic about DPLA. Some reps are broadly interested in what DPLA is up to, while others choose to focus on specific aspects of DPLA’s work, such as our education resources, using the research portal for family research, initiatives around ebooks, or technical development using our API.
    • committed to connecting to their communities. The single most important factor in selection is the applicant’s ability to clearly identify communities they can serve, such as a school, a neighborhood, or a professional network, and plan relevant outreach activities for them.
    • a network of volunteers across the country and around the world. Community Reps generously offer their time to plan at least one DPLA-related outreach activity for their community in the course of one year. We welcome community reps from all fifty states and countries beyond the US.
    • an essential element of DPLA’s outreach network. Community Reps help us connect DPLA with new audiences across the country and around the world. Outreach can take a variety of forms, both in person and online, such as a workshop for teachers, a presentation at a local conference, a blog post series, a hackathon, or a webinar.

To answer general inquiries about what type of work reps normally engage in and to provide information about the program in general, we’re offering an open information and Q&A session with DPLA staff members and current community reps.  

Reps Info Session: Tuesday, April 11 at 5:00pm Eastern. If you would like to join this webinar, please register.

For more information about the DPLA Community Reps program, please contact

Community Reps Application Form


HangingTogether: OCLC Research Library Partners meet in Sydney

planet code4lib - Tue, 2017-03-28 12:00

Last month representatives from twelve of the fifteen OCLC Research Library Partners in Australia, Hong Kong and New Zealand met together at the University of Sydney. Before the two-hour meeting, participants had submitted their “top three priorities”. Most (25 of 36) fell under the rubric of “research collections and support”, for example:

  • Research processes: Addressing researcher/learner gaps and needs; building sustainable services to support researchers; models that focus on proactive outreach and faculty liaisons; new collaborations for better learning environments
  • Collections: Understanding how users utilize the scholarly content provided; capitalizing on the decreasing use of physical collections and relocating low-use print sources off-site to transform library spaces to better meet student and researcher needs; enhancing access to collections through digital services; implementing Open Access policies.
  • Research Information Management: The library’s role in tracking research impact; building better research support infrastructure to capture, disseminate and promote research and its impact; improving support for Excellence in Research for Australia (ERA), Australia’s national research evaluation framework.

Most of the discussion focused on research information management and specifically the library’s role – from advisory (not managing) to various degrees of involvement and collaboration with the university’s research office. Participants use a variety of research information management systems: Converis, Pure, Symplectic Elements and in-house developed systems. These are supplemented by other systems such as DSpace, Figshare, Omeka and VIVO.

It was noted that there are many stakeholders in the research life cycle and no one system offers end-to-end support throughout the research process. Managing and preserving born-digital materials is considered a particularly thorny issue.

Many of the issues reflect those of OCLC Research Library Partners elsewhere. Indeed, several of our Australian colleagues are collaborating with other Partner staff on the working group led by my colleague Rebecca Bryant to investigate the institutional requirements and activities for adopting research information management systems (see Rebecca’s January 2017 post, An increasing role for libraries in research information management). This work is an example of how the OCLC Research Library Partnership offers a venue for addressing common concerns among geographically-dispersed institutions.


About Karen Smith-Yoshimura

Karen Smith-Yoshimura, senior program officer, works on topics related to creating and managing metadata with a focus on large research libraries and multilingual requirements.


Open Knowledge Foundation: Data to control, Data for participation – Open Data Day 2017 in Chernivtsi, Ukraine

planet code4lib - Tue, 2017-03-28 10:25

This blog is part of the event report series on International Open Data Day 2017. On Saturday 4 March, groups from around the world organised over 300 events to celebrate, promote and spread the use of open data. 44 events received additional support through the Open Knowledge International mini-grants scheme, funded by SPARC, the Open Contracting Program of Hivos, Article 19, Hewlett Foundation and the UK Foreign & Commonwealth Office. This event was supported through the mini-grants scheme under the Open contracting and tracking public money flows theme.

This blog has been reposted from

It was a meeting of friends who care about the city. We gathered in the cosy 28/33 Home Bar, which is situated in the historical part of Chernivtsi (a regional centre in the western part of Ukraine). The slightly partisan, hipster atmosphere inside and the hot drinks took us away from the spring cold and brought us into the world of ideas.

The idea of open data is just starting to be talked about in Ukraine. On the one hand, this worldwide trend is influencing the situation in our country. On the other hand, there are only a few organisations, cities and groups of people who have enough capacity not only to speak but also to do something about open data. Unfortunately, our city is still at the beginning of this process. Nevertheless, we want to move further.

So we figured out several spheres we could try to start from. First of all, we discussed the idea that there are two different “groups of data”:

Data to control [national and local government, officials]

Data to participate and to make city more efficient

Data to control

As the great majority of our group were NGO activists, the “control” and “watchdog” functions are important for us. The best examples of data to control we want to work with are the e-declarations, budget spending data and results of voting of local deputies.

Last year a big portal of e-declarations was launched in Ukraine. All officials must fill in declarations online, which then become open to citizens. This portal has an open API with data in JSON format. This is a good opportunity to make some tools for automatic analysis of that data. We played a little with this API and decided to hold a separate meeting before or after the 1st of April deadline to work with the declarations of deputies of the local city council. Also, we understood that we need to learn how to work with the JSON format.
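For anyone in the same position of needing to learn how to work with JSON, here is a minimal Python sketch of loading JSON records and filtering them. The sample data and every field name in it (`name`, `office`, `declared_income`) are hypothetical and invented for illustration; the real e-declarations API will have its own structure, which should be checked against its documentation.

```python
import json

# Hypothetical sample: the real portal's field names and layout may differ.
SAMPLE = """
[
  {"name": "Deputy A", "office": "City council", "declared_income": 120000},
  {"name": "Deputy B", "office": "City council", "declared_income": 450000},
  {"name": "Official C", "office": "Regional administration", "declared_income": 300000}
]
"""


def council_declarations(raw_json, office="City council"):
    """Parse a JSON array and keep only the records for the given office."""
    records = json.loads(raw_json)
    return [r for r in records if r.get("office") == office]


def highest_income(records):
    """Return the record with the largest declared income, or None if empty."""
    return max(records, key=lambda r: r["declared_income"], default=None)
```

Once parsed, JSON objects are ordinary Python dictionaries and lists, so the usual tools for sorting, filtering and counting apply directly.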

One more thing that we spoke about was open contracting and budget spending data. In Ukraine, at this moment two perfect instruments are working: the open contract system ProZorro and (all the budget transactions are published there). Among us that night we had an admin of the investigative media project Procurement in Chernivtsi, so she shared her experience on how to use the information to carry out successful investigations on the theme. We also spoke a little about how to work with the API.

Lastly, I presented the national project Rada4You. This project is the Ukrainian replication of the Australian “They vote for you” project. The replication was made by the Civil Network OPORA activists in 2016. The main idea is to scrape voting results from the Ukrainian parliament and to use these data for analysis. For example, at this moment it is possible to unite several voting results into one policy and to use this instrument for fact checking (to compare the public speeches of MPs with how they vote on different policies). It is also possible to analyse the compatibility of different MPs, as the project includes an instrument called “Friends for voting”. Finally, the project shares all its data through an API in JSON format. During our meeting, we decided to work on replicating this tool at the local level and to make a similar tool for the Chernivtsi city council. That evening we had an adviser to the city mayor among us, which helped us to understand whether we could rely on support from that side.
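The two kinds of analysis described above, grouping votes into a policy and measuring how compatible two MPs are, can be sketched in Python as follows. This is a rough illustration under assumed data shapes (a dictionary of division id to vote), not the method Rada4You or “They vote for you” actually uses; all names here are hypothetical.

```python
def agreement_rate(votes_a, votes_b):
    """Share of divisions where two members voted the same way.

    votes_a, votes_b -- dicts mapping a division id to "yes", "no" or "abstain".
    Only divisions in which both members voted are compared.
    Returns None when they have no divisions in common.
    """
    shared = votes_a.keys() & votes_b.keys()
    if not shared:
        return None
    same = sum(1 for d in shared if votes_a[d] == votes_b[d])
    return same / len(shared)


def policy_support(votes, policy):
    """Share of a policy's divisions where the member voted the policy's way.

    policy -- dict mapping a division id to the vote that supports the policy.
    Returns None when the member voted in none of the policy's divisions.
    """
    relevant = [d for d in policy if d in votes]
    if not relevant:
        return None
    return sum(1 for d in relevant if votes[d] == policy[d]) / len(relevant)
```

A fact check then amounts to comparing what a member says about a policy with their `policy_support` score for it.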

Data to participate

We spoke a lot about how open data can be used by the local authorities as an instrument for participation. To be honest, we understand that the city council lacks the technical and financial capacity to work deeply and intensively on open data. Also, we know that there are not a lot of open data specialists in the city. Nevertheless, there are some spheres where we can and must talk about the open data approach. So we identified these crucial directions.

Transport data. At this moment we have some kind of transport crisis in the city. The city council is working on some ideas for improving the situation. So we need to speak with all the stakeholders to make sure that transport data won’t be closed off from the community. In addition, we understand that this type of data is not easy to work with, so we need to learn how to use it.

Data from the education system. We talked about how the education system accumulates a lot of data and does not share them. These data could be used to build relevant tools that help parents choose schools for their children.

Data from the health care system. From our point of view, this type of data should also be in focus, as Ukraine is currently undergoing health care reform. The datasets dealing with free-of-charge medicines, lists of hospitals and pharmacies, and medical equipment should be opened.

GIS (geographic information system). In Chernivtsi, the city council is working on GIS implementation. The situation is the same as with transport data: there is a risk that information from the GIS will be closed off from the community. So we need an advocacy campaign to make it open.

This Open Data Day meeting was possible thanks to the Open Knowledge International support. And I hope it is only the first but not the last. We have some plans, they are not clear now but we are ready to make them not only clear but also real.

DuraSpace News: REGISTRATION Open for the VIVO 2017 Conference

planet code4lib - Tue, 2017-03-28 00:00

From the organizers of VIVO 2017

Registration is now open for the 2017 VIVO Conference, August 2-4.

District Dispatch: Bill to make Copyright Register a Presidential appointment “mystifying”

planet code4lib - Mon, 2017-03-27 20:27

Late last Thursday, in a relatively rare bicameral announcement, five senior members of the House and Senate Judiciary Committees endorsed legislation to transfer the power to appoint the Register of Copyrights from the Librarian of Congress to the President. The Register of Copyrights Selection and Accountability Act (H.R. 1695) was authored by House Judiciary Committee Chairman Bob Goodlatte (R-VA6). It also was cosponsored on its introduction by the Committee’s Ranking Member, John Conyers (D-MI13), and 29 other members of the House (21 Republicans and 8 Democrats). Senate supporters currently are Judiciary Committee Chairman Charles Grassley (R-IA), Ranking Member Dianne Feinstein (D-CA) and Sen. Patrick Leahy (D-VT).


The bill was referred to Mr. Goodlatte’s Committee for consideration and is widely expected to be voted upon (at least in Committee, if not the full House of Representatives) prior to the upcoming spring recess beginning April 10. No parallel Senate bill yet has been introduced and the pace of H.R. 1695’s or a similar bill’s review in that chamber, as well as which committee or committees will have jurisdiction over it, is currently unclear.

In a sharply worded statement, the Library Copyright Alliance (LCA) unqualifiedly opposed the bill on multiple grounds, particularly that it would politicize the Register’s position to the public’s detriment and inevitably slow the critically needed modernization of the Copyright Office. LCA, comprised of ALA, ACRL and ARL, also called the bill “mystifying” given that – if passed – Congress would voluntarily give up its power to appoint its own copyright advisor to the President, to whom the bill also grants the power to fire the appointee at any time (despite the bill also confusingly specifying a 10-year renewable term of office for the Register)! Further, while the Senate would at least retain the power to confirm the nominee, the House would no longer have any influence on the selection process.

LCA’s statement was quoted at length by the widely read Beltway publications Washington Internet Daily (behind a paywall) and Broadcasting & Cable. ALA and its LCA partners will be monitoring H.R. 1695’s progress in the House and Senate closely.

The post Bill to make Copyright Register a Presidential appointment “mystifying” appeared first on District Dispatch.

Islandora: Meet Your Developer: Jonathan Green

planet code4lib - Mon, 2017-03-27 13:07

It has been a while since we had an entry for Meet Your Developer, but there is no better person to re-launch the series than longtime contributor Jonathan Green. The architect of Tuque and a part of the Islandora community since 2010, Jonathan Green is a Committer on both the 7.x and CLAW branches of Islandora. He returns to the community as a DevOps Engineer at LYRASIS, after a hiatus in another industry. Here's what Jon had to say:

Please tell us a little about yourself. What do you do when you’re not at work?

When I’m not at work I’m often still tinkering with computers in one way or another. I’ve always been interested in hacking on both hardware and software. Recently I’ve been playing with the Rust programming language and machine learning.

My other hobby is brewing beer, building things to brew beer and being an amateur beer snob. Recently I converted an old refrigerator into a keg fridge for my homebrew. Right now, I have a stir plate going in my kitchen to grow some yeast for a brew on Saturday.

How long have you been working with Islandora? How did you get started?

After moving back to PEI in late 2009, I started working at discoverygarden, Inc. in January 2010 and quickly started hacking on the 6.x version of Islandora. I spent a few years at DGI, working on the 6.x and 7.x versions of Islandora. In my time at DGI I was involved in building the 7.x version of Islandora and wrote the initial version of the Tuque library.

Then I took a bit of a detour in my Islandora experience and spent a couple years working on embedded software for the marine industry, primarily control systems for power distribution on oil rigs. After a few years floating around on oil rigs in Korea and the Gulf of Mexico, I joined LYRASIS as a contract developer and have been getting back into Islandora development.

Sum up your area of expertise in three words:

All the things.

What are you working on right now?

Right now, I’m focusing on two things primarily. I am working on updating and improving the LYRASIS hosting environment for Islandora 7.x. We are always working on continually rolling out improvements for our Islandora hosting clients, so they can use the latest and greatest Islandora features.

The most exciting thing I’ve been working on is Islandora CLAW. LYRASIS has been generous enough to donate a day or two of my time every week to the CLAW project, so I’ve been jumping into that stack and trying to help with development of the MVP. Recently I committed the CLAW equivalent of the D6 and D7 Drupal filter. This time we are using JSON Web Tokens to provide authentication against external services like Fedora. I’m very excited about CLAW and I feel privileged to be involved in its development.

What contribution to Islandora are you most proud of?

I’m really proud of the work I did at DGI while developing the architecture for Islandora 7.x. It was a huge team effort when moving from Islandora 6.x to Islandora 7.x, and I was a very small part of it, but it’s been great to see how the initial small kernel of Islandora 7.x has grown into an amazing collection of modules and features, and to see the open source community grow around the Islandora project.

What new feature or improvement would you most like to see?

Usability and focus on user experience, especially new user experience. I think that we could do a much better job making the software work as one would expect out of the box.

What’s the one tool/software/resource you cannot live without?

There are so many fundamental pieces of open source software that I couldn’t develop as efficiently without, it’s hard to name just one. I spend my days standing on the shoulders of open source giants.

If you could leave the community with one message from reading this interview, what would it be?

Jump in and contribute, an open source community like Islandora depends on its members. Breaking things is the best way to learn.

Jonathan Green - LYRASIS

Open Knowledge Foundation: Open Data got to Paraguay to stay – Open Data Day 2017

planet code4lib - Mon, 2017-03-27 12:47

This blog is part of the event report series on International Open Data Day 2017. On Saturday 4 March, groups from around the world organised over 300 events to celebrate, promote and spread the use of open data. 44 events received additional support through the Open Knowledge International mini-grants scheme, funded by SPARC, the Open Contracting Program of Hivos, Article 19, Hewlett Foundation and the UK Foreign & Commonwealth Office. This event was supported through the mini-grants scheme under the Follow Public Money theme.

The original post was published on the Girolabs blog in Spanish and was translated by Oscar Montiel.

Open Data got to Paraguay to stay! This was proven in this year’s edition of Open Data Day which took place at the Loffice Terrace.


The event brought together more than 40 people and friends of the data community, from Civil Society and Government. The event was organized by Girolabs, TEDIC, CIRD and SENATICs. We started with a small snack to warm up and then we continued with lightning talks and discussions.


Maps and Open Data


The first ones to speak were José Gonzalez and the creators of the TOPA app. José introduced the OpenStreetMap (OSM) Paraguay community and how everyone can contribute to open data and this big repository of geodata. Then came the TOPA team, who introduced an app based on OSM to create a Traffic and Transport Information System; an integral, sustainable, collaborative platform where users, transit drivers and government offices collaborate to gather and process information about mobility, traffic and transport in real time, from crowdsourced and their own data.


Open Data and Civil Society


In this edition of #ODD17, civil society generated and presented relevant data for better transparency and control by the citizens.


This was the case of CIRD, who presented their project Ñañomoiru, where they try to improve the services provided by the Social Action Secretariat and make them more transparent. This office’s goal is to improve the quality of life of people living in a state of poverty and vulnerability, providing easy access to nourishment, health and education, by increasing access to these basic services and strengthening relations to reduce intergenerational poverty. In another CIRD project, called “A quienes Elegimos”, they released their tools for monitoring authorities and a data set of municipal authorities that they gathered.

Techo presented their project Relevamiento de Asentamientos Informales (RAP) and their platform, which makes data about the living conditions of settlements in Asunción available as open data.

Gabriela Gaona told us about her experience working in many apps based on open data and how citizens can request information through


Where’s our money?


One of the main subjects of the night was government data about public money. Federico Sosa, the director of Open Government in Paraguay showed the progress of the government’s open data.


Right now, everybody can monitor what is done with public money. All the data from the Tax Office about the budgeting, public debt and spending are available in their portal.

Let’s request data!


To end the night, SENATICs, using Open Data Day as a platform, in the presence of Leticia Romero and the Minister David Campos, launched the Ideathon InnovandoPY challenge, where they want citizens to help government, companies and civil society organizations to know which data should be available. The challenge will be open until March 31, 2017. SENATICs will provide mentorship to show participants how to open data.

This was a relaxed event but full of information, debate and sharing between people committed to transparency, innovation and citizen participation. We also gave people stickers and t-shirts from the event. We want to thank the Open Data Day organizers for the support and for making Paraguay visible on the map of open data communities.

Library of Congress: The Signal: Centralized Digital Accessioning at Yale University

planet code4lib - Mon, 2017-03-27 12:33

This is a guest post from Alice Prael, Digital Accessioning Archivist for Yale Special Collections at the Beinecke Rare Book & Manuscript Library at Yale University.

Photo by Alice Prael

As digital storage technology progresses, many archivists are left with boxes of obsolete storage media, such as floppy disks and ZIP disks.  These physical storage media plague archives that struggle to find the time and technology to access and rescue the content trapped in the metal and plastic. The Digital Accessioning Service was created to fix this problem across Yale University Libraries and Museums for special collections units that house unique and rare materials and require specialized preservation and supervised access. Nine of Yale’s special collections units participate in the Digital Accessioning Service.

The goal of the Service is to centralize the capture of content off physical media and eliminate the backlog of born-digital material that has not yet been captured for preservation. Until now, this work was completed in an ad hoc fashion (often in response to researcher requests), which has led to a large backlog of disks that may have been described and separated from the collection but have never been fully processed. By centralizing digital accessioning, Yale Libraries leverages its hardware, software and expertise to make the Service available to special collections units that may lack the resources to capture and preserve born digital collections.

The Di Bonaventura Digital Archaeology and Preservation Lab, shared by the Beinecke Rare Book and Manuscript Library and the YUL Preservation Department, hosts the Digital Accessioning workstations. There are two custom-built computers created to capture content from storage media such as floppy disks, CDs and hard drives. One non-networked computer is used to scan media for viruses prior to capturing the content (it is disconnected from the network so that viruses cannot get loose and spread throughout the network). Another computer has specialized software to scan the content for private information as well as for other in-depth processing tasks. These machines form the technological base of the Digital Accessioning Service.

The Service is mainly staffed by me (with guidance from Gabby Redwine, Beinecke’s Digital Archivist) and the Born Digital Archives Working Group, which is made up of practitioners from across Yale University Libraries and Museums. The Service also employs student assistants to help with disk imaging and data entry.

Before Drafting Policies and Procedures
Before we drafted policies and procedures, the Digital Archivist and I met with the participating special collections units and talked with each unit about the collections they hold and their expectations for future born-digital acquisitions. (It’s important that the Service be able to provide services for all the major media types within our collections.) We completed an informal environmental scan prior to the creation of the Service to determine what media types the Service should be ready for and how much storage would be necessary to preserve all the captured content. Once the challenges began to take shape, I consulted with the Born Digital Archives Working Group and began building workflows and testing tools.

The Service uses a variety of software and hardware tools, including Kryoflux, Forensic Toolkit, IsoBuster and the BitCurator environment. More details about our usage of these tools are available in the Disk Imaging and Content Capture Manual on our Digital Accessioning Service Libguide. I tested the workflow with dummy media, mostly using software-installation disks. In an effort to stay as transparent as possible to special collections units and the larger digital-archives community, I published much of the Service’s documentation — including workflows, manuals and frequently asked questions — on the Born Digital Archives Working Group Libguide.

The main steps of the workflow are:

  1. Complete a submission form (done by the special collections unit) and deliver media securely to the Lab
  2. Confirm that the boxes of media that arrived match the content described by the special collections unit
  3. Photograph the disks
  4. Scan the disks for viruses
  5. Connect to writeblockers (which block archivists from making any changes — accidentally or deliberately — to the original disk) and attempt to create an exact copy, called a disk image, of the content
  6. If disk-image creation fails, attempt to transfer files off storage media
  7. Scan captured content for personally identifiable information
  8. Package all captured content, photographs and associated metadata files for ingest into the preservation system.
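
As a rough illustration of steps 5 and 6 above, the fallback from disk imaging to file transfer could be sketched as follows. This is only a minimal sketch, not the Service's actual tooling: `capture_media`, `CaptureResult` and the two callables passed in are hypothetical names standing in for whatever imaging and transfer tools are really used.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class CaptureResult:
    """Outcome of one capture attempt (hypothetical record, for illustration)."""
    media_id: str
    method: str  # "disk_image", "file_transfer", or "failed"
    log: List[str] = field(default_factory=list)


def capture_media(
    media_id: str,
    create_disk_image: Callable[[str], bool],
    transfer_files: Callable[[str], bool],
) -> CaptureResult:
    """Try a disk image first; fall back to a file transfer if imaging fails.

    Both callables are assumptions: each takes a media identifier and
    returns True on success.
    """
    result = CaptureResult(media_id=media_id, method="failed")

    # Step 5: attempt an exact copy of the media.
    if create_disk_image(media_id):
        result.method = "disk_image"
        result.log.append("disk image created")
        return result

    # Step 6: imaging failed, so try copying the files off instead.
    result.log.append("disk image failed; attempting file transfer")
    if transfer_files(media_id):
        result.method = "file_transfer"
        result.log.append("files transferred")
    else:
        result.log.append("file transfer failed")
    return result
```

Recording which method succeeded, along with a short log, means the later packaging step can carry that information forward as metadata.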

Some record creators use every inch of their labels, leaving little room for archivists to apply their own naming conventions. Photo by Alice Prael.

In creating the Service, I encountered some unexpected challenges, many of which I documented on the Saving Digital Stuff blog. One challenge was determining a standard method for labeling the storage media. It is important that media is labeled in order to correctly identify content and ensure that the description is permanently associated with the storage media. Each special collections unit labels storage media prior to submission to the Service. We had challenges in labeling media that were already covered with text from the original record creator. We also faced difficulties labeling fragile media such as CDs and DVDs. Another challenge was the need for different tools for handling Compact Disc Digital Audio discs, or CD-DAs, which have a higher error rate than CDs that contain other data. The Service ultimately decided to use Exact Audio Copy, a software tool created for capturing content from CD-DAs.

The Digital Accessioning Service is only one piece of a larger digital preservation and processing environment. The Service requires that special collections units provide a minimum level of description via spreadsheets that get imported into ArchivesSpace, the archival description management system adopted at Yale University Libraries. However, not all of the special collections units have fully implemented ArchivesSpace yet. By using the spreadsheets as an intermediate step, the Service can accommodate all special collections units’ needs regardless of their current stage of ArchivesSpace implementation.
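
To make the idea of a "minimum level of description" concrete, here is a small sketch of how a submitted spreadsheet might be checked before import. The column names in `REQUIRED_COLUMNS` and the function `find_incomplete_rows` are invented for illustration; the post does not say which fields the Service actually requires.

```python
import csv
import io

# Hypothetical minimum fields; the real spreadsheet columns are not given in the post.
REQUIRED_COLUMNS = ("media_id", "collection", "media_type", "label_text")


def find_incomplete_rows(csv_text, required=REQUIRED_COLUMNS):
    """Return (row_number, [empty columns]) for rows missing required values."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing_columns = [c for c in required if c not in (reader.fieldnames or [])]
    if missing_columns:
        raise ValueError("spreadsheet lacks columns: " + ", ".join(missing_columns))

    problems = []
    # Row 1 is the header, so data rows start at 2.
    for row_number, row in enumerate(reader, start=2):
        empty = [c for c in required if not (row[c] or "").strip()]
        if empty:
            problems.append((row_number, empty))
    return problems


sample = (
    "media_id,collection,media_type,label_text\n"
    "D001,MS 123,floppy,Drafts 1994\n"
    "D002,MS 123,,\n"
)
print(find_incomplete_rows(sample))
```

A check like this could run on the spreadsheet alone, so it would work the same way for a unit that has not yet adopted ArchivesSpace.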

Once the Service’s disk processing is complete, the disk image, photographs, log files and other associated files get moved into the Library’s digital-preservation system, Preservica. Yale University Libraries’ implementation of Preservica is integrated with ArchivesSpace descriptions, which will aid future archivists in locating digital material described in our finding aids. Content from each disk is ingested into Preservica and listed as a digital object in ArchivesSpace, associated with the item-level description for the disk.

After Drafting Policies and Procedures
After drafting and revising the policies and procedures in collaboration with the Born Digital Archives Working Group, the Digital Archivist and I returned to the special collections units to make sure that our workflows would be sufficient for their materials.

One concern was the immediate ingest of material into Preservica. Since many special collections units do not have the hardware to preview disks prior to submission for accessioning, the files themselves have not yet been appraised to determine their archival value. Once content is ingested for preservation, deletion is possible but much more onerous. For special collections units that require appraisal post-accessioning, the Service decided to use the SIP Creator tool, developed by Preservica, to package content and maintain the integrity of the files, then move the packaged content onto a shared network storage folder. Special collections units may then access and appraise their content prior to ingest for long-term preservation.
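
For readers unfamiliar with what maintaining file integrity involves, a common general technique is to record a checksum for every file and compare it again later. The sketch below shows that technique with SHA-256. It is not a description of how SIP Creator works internally, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large disk images need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 checksum."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(root, manifest):
    """List files that were added, removed or altered since the manifest was built."""
    current = build_manifest(root)
    names = set(manifest) | set(current)
    return sorted(n for n in names if manifest.get(n) != current.get(n))
```

In a setup like the one described, a manifest could be built before content is moved to the shared folder and checked again after appraisal, so any accidental change would show up before ingest.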

The focus of the Service at this point is to address the significant backlog of material that has been acquired but not yet captured for preservation. The Service is currently funded as a two-year project. As we approach the eight-month mark, we are using this time to determine the ongoing needs for special collections units at Yale. I hope that, as the backlog is diminished, the existence of the Service will aid in future born-digital collection development. Some special collections units have noted that in the past they were hesitant to accept certain donated material because they could not ensure the capture and preservation of the content. By removing this barrier, I hope that donors, curators and archivists across Yale University will be more comfortable working with born-digital material.

