
Feed aggregator

District Dispatch: ALA president welcomes FY2018 federal budget agreement, applauds library advocates

planet code4lib - Fri, 2018-02-09 18:42

Today Congress passed and the president signed an FY2018 budget deal that will likely fund federal library programs at no less than FY2017 levels. Below is the statement of ALA President Jim Neal:

We are pleased that Congress has passed an FY2018 spending agreement that includes an increase in federal funding for domestic priorities, which, we hope, will include library funding. Most of all, we are pleased that Congress rejected the president’s call to eliminate many important programs for libraries, programs that benefit millions of Americans in every corner of our country – students, seniors, children, job seekers, veterans, entrepreneurs and researchers. While this budget agreement is a positive step towards resolving the FY2018 budget, ALA continues to call on Congress to include funding for libraries as it writes the final spending bills in the coming weeks.

One lesson from this long FY2018 appropriations process is that when libraries speak, decision-makers listen. At critical points in the process last year, ALA members from every U.S. congressional district responded to our calls to action. As a result, a record number of representatives and senators signed our annual “Dear Appropriator” letters in the spring. As the House and Senate Appropriations Committees worked on their respective bills last summer, ALA members made more targeted phone calls and visits and leveraged their local media to tell their library stories. Our advocacy earned bipartisan support in both chambers of Congress, preserving full funding in the House and even garnering an increase in the Senate.

The persistence of library advocates has paid off for every single community in our nation, from the most populous metropolitan centers to rural areas where the local library offers the only no-fee access to the internet. This is a time to honor the power of our advocacy.

This is also a time to strengthen our resolve. The White House will soon release its FY2019 budget, and we’re expecting even deeper cuts than proposed last year. To protect federal library funding, we need to remind Congress that libraries bring leaders and experts together to solve difficult problems, that we deliver opportunities – from academic success, work-readiness and literacy to housing stability and historical preservation. We need to invite elected leaders into our libraries to see what we do and what we can do for their constituents with a small investment of federal dollars.

The FY2018 budget passage represents a major win for libraries, and that win will fuel even more aggressive efforts to advocate for federal library funding in FY2019. ALA’s Washington Office will continue to provide the expertise, strategy and resources that have helped make our advocacy so effective. For a preview of our new federal advocacy tools, visit:

The post ALA president welcomes FY2018 federal budget agreement, applauds library advocates appeared first on District Dispatch.

District Dispatch: States Push Back on Net Neutrality

planet code4lib - Fri, 2018-02-09 10:45

A new front is emerging in the fight to save net neutrality: states and localities. On December 14, 2017, the FCC voted to gut the net neutrality protections that limit the power of internet service providers (ISPs) to block, throttle, degrade, or assign preference to some online content and services over others, and we predicted potential Congressional action and legal challenges. Less clear at the time was the role state or local policymakers might take.

Now, we are beginning to see some trends emerge:

Some state lawmakers have argued that they have an obligation to protect consumers with net neutrality rules and that local governments can approve or deny requests by telecommunications providers to operate in their states. At least 17 states have introduced net neutrality legislation, and more are expected soon. So far, the bills have been falling into three categories:

  1. Some states have passed bills that state a general sense of disapproval or that provide bright-line net neutrality rules for the whole state. A joint Senate resolution in Alaska urges Congress to overturn the FCC order, while a bill in Nebraska seeks rules against blocking, throttling and degrading of internet traffic while also banning paid prioritization of content. These bills may meet with a challenge because of a section of the final order from December, in which the FCC asserts authority to prevent states from pursuing laws inconsistent with the net neutrality repeal.
  2. Other states are introducing measures that would require ISPs to certify they are abiding by net neutrality rules in order to do business with the state, such as the bill in Rhode Island and one in New York State. Last week, Montana’s governor signed the first Executive Order that would achieve the same goal.
  3. And the third category of potential action includes bills or ordinances that would give preference or incentives to ISPs that comply with net neutrality rules for services purchased or used by public entities.

It’s worth noting that bills like those in Nebraska appear most likely to be challenged based on the FCC Order’s language pre-empting state or local net neutrality protections. Legal challenges to those in categories two and three are less clear as they are more in the realm of contract law.

In addition to legislation, states can take the FCC to court. Twenty-two states have signed on to a lawsuit led by New York Attorney General Eric Schneiderman, who had encouraged the FCC to delay their vote after discovering that 2 million identities may have been stolen for fake comments submitted during the FCC’s public comment period. When the FCC refused to cooperate, a multistate lawsuit was filed on the grounds that the public comment process was corrupted.

Other Opportunities
Local governments also have power over ISPs because of the access and rights of way they grant them. When it comes time to negotiate a new agreement, the city or state generally contracts an independent third-party evaluator to conduct a needs assessment. That report can serve as a starting point for negotiation, and from there the city or state can require the company to commit to its requests. Some policymakers have suggested making net neutrality conditions part of this process.

What’s next?
We will track these activities and keep ALA members informed about local and state actions as they come up. In the meantime, library advocates can:

  1. Stay informed via District Dispatch
  2. Sign up for alerts from ALA so we can reach you quickly when direct action is needed. Right now, you can email your members of Congress and ask them to use a Joint Resolution of Disapproval under the Congressional Review Act (CRA) to repeal the recent FCC action and restore the 2015 Open Internet Order protections.

The post States Push Back on Net Neutrality appeared first on District Dispatch.

Library of Congress: The Signal: New Audio Digitization Performance Testing Guidelines and Software from FADGI

planet code4lib - Thu, 2018-02-08 15:34

The following is a guest post by Kate Murray, organizer of the FADGI Audio-Visual Working Group and Digital Projects Coordinator at the Library of Congress.

The Federal Agencies Digital Guidelines Initiative is pleased to announce the release of ADCTest, an open source software application designed to facilitate performance testing of analog-to-digital converters (ADC) used in audio digitization preservation workflows. Developed by AVP in collaboration with Christian Landone, ADCTest automates a series of tests to measure the performance of ADC devices against FADGI’s 2017 Audio Analog-to-Digital Converter Performance Specification and Test Method Guideline (Low Cost).

Federal Agencies Digital Guidelines Initiative

A complement to the high-quality ADC performance test guidelines approved in summer 2016, the pass/fail specifications recommended in the low-cost guideline are for modified performance testing that targets minimum levels of performance, with lower equipment costs and more modest engineering skills required. This low-cost test guideline, approved by FADGI in September 2017, is not intended for absolute, high-precision testing for the purpose of comparing the performance of multiple ADCs or analyzing nuanced aspects of ADC performance. Instead, its primary purpose is to provide users with a low-cost and simple test tool that will be employed routinely. In FADGI’s ADC performance testing work, we found that one of the biggest issues was that performance testing wasn’t being performed routinely, or at all, due to cost and complexity. Routine performance testing, even with lower-performance test tools, enables the identification of significant failures, the relative comparison of ADCs in certain cases, and the ability to establish a benchmark for a given ADC to track trends in performance over time.

For more background and history of FADGI’s ADC performance testing project, see Avoid Jitter! Measuring the Performance of Audio Analog-to-Digital Converters.

The ADCTest software, available for download on the Library of Congress GitHub site, automates the testing protocol for all the low cost guidelines with a goal towards answering three basic questions:

  • Is my ADC/system failing?
  • How does my calibrated, healthy ADC perform relative to the guideline and other ADCs?
  • Is my ADC/system performing optimally relative to its own specifications?

It provides a flexible approach – with the ability to enable and disable individual tests and support for signals up to 24-bit, 96 kHz – combined with easy-to-interpret pass/fail results, allowing users to customize the tool based on their needs.

Selecting and deselecting tests to run in ADCTest.

ADCTest evaluates the performance of the ADC device against the seven criteria specified in the FADGI low-cost guidelines, including frequency response, dynamic range, and crosstalk. The reports for each test are clearly identified as pass/fail against the FADGI criteria spelled out in the low-cost guidelines.
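
To make the idea of a performance check concrete, the sketch below estimates one of these criteria – frequency response deviation – from recordings of known, equal-level test tones captured through an ADC. It is only a minimal, hypothetical illustration, not ADCTest's implementation, and it does not apply the FADGI pass/fail thresholds; the file names and helper functions are assumptions.

```python
# Illustrative only: estimate an ADC's frequency-response deviation from
# recordings of known, equal-level test tones, relative to a 1 kHz reference.
# This is NOT ADCTest's code and does not apply the FADGI pass/fail limits.
import numpy as np
import soundfile as sf  # any WAV reader would work here


def rms_dbfs(samples):
    """RMS level of a mono signal in dB relative to digital full scale."""
    samples = np.asarray(samples, dtype=np.float64)
    rms = np.sqrt(np.mean(samples ** 2))
    return 20.0 * np.log10(max(rms, 1e-12))


def frequency_response(captures, reference_hz=1000):
    """Level of each captured tone relative to the reference-frequency tone."""
    levels = {hz: rms_dbfs(sf.read(path)[0]) for hz, path in captures.items()}
    reference = levels[reference_hz]
    return {hz: level - reference for hz, level in levels.items()}


# Hypothetical recordings of tones played through the ADC under test.
captures = {20: "tone_0020hz.wav", 1000: "tone_1000hz.wav", 20000: "tone_20000hz.wav"}
for hz, deviation_db in sorted(frequency_response(captures).items()):
    print(f"{hz:>6} Hz: {deviation_db:+.2f} dB relative to 1 kHz")
```

A real test run also needs a calibrated signal source, many more measurement frequencies, and the other six criteria, which is exactly the tedium ADCTest automates.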

Pass/fail test results.

Double clicking on a given test will open up a detailed view of the test results to allow further investigation and interpretation. Users can even create their own test suite XML file and open it from the file menu to replace the FADGI test suite represented by default.xml in the application directory.

Inserting a user-defined test suite.

ADCTest will continue to be refined with support from AVP and feedback is welcome through GitHub or directly to FADGI.

In keeping with FADGI’s goal to provide wide access to improved tools and workflows, ADCTest carries a BSD-3 license and the low cost guidelines carry a CC0 1.0 Universal license for worldwide use and reuse.

For more information about the FADGI ADC Performance Testing project, see

ADCTest software is available for download on the Library of Congress GitHub site.

DuraSpace News: DuraSpace Accepted into Cohort of Mozilla Open Leaders

planet code4lib - Thu, 2018-02-08 00:00

DuraSpace is thrilled to announce its acceptance into the latest cohort of the Mozilla Open Leaders program, with its project Open Sourcing DuraCloud: Beyond the License.

Mozilla Open Leaders is a 12-week, online mentorship & training course. It focuses on growing a culture of openness in innovation and research, as well as fueling the Internet Health movement by engaging contributors in their work and connecting with other leaders in the Mozilla Network.

DuraSpace News: New Lower Prices for DuraCloud Subscriptions with 20+ TB of Storage

planet code4lib - Thu, 2018-02-08 00:00

DuraSpace is pleased to announce new pricing for DuraCloud subscriptions of 20 or more terabytes. The new pricing structure passes on storage savings and offers greater flexibility and more predictable pricing for the long-term preservation storage of growing digital collections. 

DuraCloud will continue to offer the same cost-effective pricing for subscription plans under 20 TB, including Preservation subscriptions starting at 1 TB of storage for an annual cost under $2,000. 

LITA: Jobs in Information Technology: February 7, 2018

planet code4lib - Wed, 2018-02-07 20:03

New vacancy listings are posted weekly on Wednesday at approximately 12 noon Central Time. They appear under New This Week and under the appropriate regional listing. Postings remain on the LITA Job Site for a minimum of four weeks.

New This Week

Dayton Metro Library, Technology Development Manager, Dayton, OH

Hobart and William Smith Colleges, Systems Librarian, Geneva, NY

MIT Libraries, Head of Digital Library Engineering, Cambridge, MA

Medical University of South Carolina, Director for Digital Strategies and Innovation, Charleston, SC

Visit the LITA Job Site for more available jobs and for information on submitting a job posting.

Lucidworks: Omnichannel Retailer Personalization Data Sources

planet code4lib - Wed, 2018-02-07 16:03

Every omnichannel retailer seeks to provide better search and recommendations to influence customers and increase conversions. What sources of data should retailers use when generating recommendations?

Online retailers have only web-based events like clicks and queries to use for customer behavior modeling. An omnichannel retailer can combine online data with data from other channels, like loyalty programs and in-store interactions, resulting in more comprehensive intelligence. Research shows that most customers want to be recognized across the channels through which they encounter a brand. Customers are also more loyal to retailers that personalize their experience.

Every action a customer takes is a “signal” that tells us more about who they are, what they like, and what they (and users like them) are likely to do next. Those signals include every customer action, from making a purchase to visiting a retailer website without buying anything. When a customer comes into a store, picks something off the shelf, and doesn’t buy anything, that too is information. These “signals” can be used to make recommendations and influence the customer.

Here are some examples of signals that we should consider capturing:

Channel | Action                | Data                             | Source
Mobile  | Starts or opens app   | Time, location                   | App, GPS
Mobile  | Searches              | Time, location, query            | App, GPS
Mobile  | Taps item             | Time, location, query, item      | App, GPS
Mobile  | Adds item to cart     | Time, location, item             | App, GPS
Mobile  | Abandons item in cart | Time, location, item             | App, GPS
Mobile  | Purchases item        | Time, location, item             | App, GPS
Web     | Visit                 | Time                             | Web tag, Appstudio
Web     | Search                | Time, query                      | Appstudio
Web     | Clicks on item        | Time, item                       | Appstudio
Web     | Adds an item to cart  | Time, item                       | Appstudio, commerce suite
Web     | Abandons item in cart | Time, item                       | Appstudio, commerce suite
Web     | Purchases item        | Time, item                       | Appstudio, commerce suite
Web     | Clicks for service    | Time, item, reason               | Appstudio, CSR
Phone   | Customer calls        | Time, item, reason               | CSR
Store   | Purchase              | Time, item, store                | Checkout
Store   | Return                | Time, item, store                | Checkout
Store   | Interest              | Time, item, store, aisle         | Mobile, camera
Store   | Uses coupon           | Time, item, store, publication   | Checkout
Store   | Employee contact      | Time, employee, location, store  | Mobile, camera
Email   | Opens                 | Time, topic, item                | Lead gen tool
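
As a rough sketch of what capturing one of these signals might look like on the backend, the snippet below assembles a web click event and posts it to a signal-collection endpoint. The endpoint URL and field names are hypothetical placeholders for illustration only, not the actual App Studio or Fusion signal API.

```python
# Purely illustrative: recording a "clicked item" signal from the web channel.
# The endpoint and field names are placeholders, not a real Lucidworks API.
from datetime import datetime, timezone

import requests

signal = {
    "type": "click",                      # click, query, add-to-cart, ...
    "channel": "web",                     # web, mobile, store, phone, email
    "timestamp": datetime.now(timezone.utc).isoformat(),
    "customer_id": "cust-12345",          # e.g., a loyalty ID shared across channels
    "item_id": "sku-98765",
    "query": "running shoes",             # the search that led to the click
}

# Hypothetical signal-collection endpoint operated by the retailer.
response = requests.post("https://example.com/signals", json=signal, timeout=5)
response.raise_for_status()
```

The important point is that events from every channel land in one store, keyed to the same customer, so that recommendations can draw on the whole picture.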


AI Technology for the Omnichannel Retailer 

Even capturing purchase information alone reveals a lot. However, customers are unlikely to use our app when they are in the store. Research shows that instead of using an app, people text their friends or talk while shopping. As retailers, we have to embrace this behavior. The technology to recognize a customer by appearance is already available and has already been deployed.

Where a customer goes within a store says a lot about them. If a customer spends a lot of time in the sporting goods section they probably have an interest in sports. If they pick up everything with Nike stamped on it, then they probably have an affinity for that brand. If they only buy things in the store after talking to an employee, then we may want to make sure they talk to someone.

The right technology on the backend is critical to any successful retail store of the future: technology to receive customer signals, and AI technology in the middle to make recommendations and influence customers. Every customer’s visual experience is critical, whether it be in-store merchandising, the layout of our app, or the search on our website.

And in the web…

For web and mobile web applications, Lucidworks App Studio is our best friend. We can rapidly deploy a search application and improve it just as rapidly as our business evolves. With App Studio, you don’t need to write or maintain any of the code involved in signal capture. App Studio can also automatically connect many of the UI features necessary to influence customers!

For the omnichannel retailer, there are signals on the web, in the store, in a mobile app, and in any other customer interaction. In order to use AI technology and provide personalized service to modern customers, we need to deploy best-of-breed technologies. Lucidworks is the only company that offers a complete AI-driven search solution that helps win and retain customers.

Get Started

The post Omnichannel Retailer Personalization Data Sources appeared first on Lucidworks.

LITA: #LITAchat – LITA Membership Benefits

planet code4lib - Wed, 2018-02-07 15:17

Interested in joining LITA or already a member and want a refresher on LITA membership benefits?

Join members of the Membership Development Committee and other LITA members on Friday, February 23, 1:00-2:00pm EST on Twitter to discuss and ask questions about LITA membership.

To participate, launch your favorite Twitter mobile app or web browser, search for the #LITAchat hashtag, and select “Latest” to follow along and reply to questions asked by the moderator or other participants. When replying to discussion or asking questions, add or incorporate the hashtag #LITAchat.

See you there!

District Dispatch: YX librarians at MidWinter 2018: Confidence and facilitation is key for youth tech programs

planet code4lib - Wed, 2018-02-07 14:43

Shannon Lake, Tori Ogawa, and Heather Thompson are a few of the outstanding and innovative librarians who are part of the inaugural cohort of YX Librarians, a post-graduate certificate program offered by the University of Maryland. Alongside one of their instructors, Dr. Mega Subramaniam, these front-line librarians will discuss implementing new technology programs for and with today’s youth. In their presentation, they will discuss the importance of building confidence when executing tech-infused programs and incorporating youth voice into planning and implementation processes. 

YX Librarians present Confidence and Facilitation is Key: Infusing Technology into Youth Programs on Sunday, February 11, 2018 at 10:30 a.m.

The YX Librarians will share ways to run tech-infused programs and address how reaching out to the youth and community can make this possible, even when individuals aren’t technology experts. Attendees of this presentation will learn about a replicable program that integrates unplugged and high-tech activities to teach youth how to code in HTML, CSS, and JavaScript. They will additionally learn K-5 activities that teach the fundamentals of coding without the use of computers, as well as how to blend high and low tech into one program with Sphero robots. Attendees of Confidence and Facilitation is Key will walk away with a toolkit of ideas and examples to facilitate rich youth technology programs at their own libraries.

Come see the YX Librarians present Confidence and Facilitation is Key: Infusing Technology into Youth Programs on Sunday, February 11, 2018 at 10:30 a.m. in the Colorado Convention Center, Rm 104.

Interested in becoming part of the next cohort of YX librarians? This online post-masters certificate gives youth service librarians and school librarians the skills they need to integrate the latest research on learning, technology, community partnerships, and design thinking into their practice with youth ages 0-18. Applications are being accepted through February 28th, 2018. More information can be found at:

About the Presenters:

Shannon Lake is the Teen Educator/Librarian at Providence Public Library.

Tori Ogawa is a Children’s Librarian and the Harold W. McGraw, Jr. Fellow at Darien Library.

Heather Thompson is the Youth Services Programming Librarian at the Kenosha Public Library and co-creator of the blog

Mega Subramaniam is an Associate Professor in the College of Information Studies at the University of Maryland.

The post YX librarians at MidWinter 2018: Confidence and facilitation is key for youth tech programs appeared first on District Dispatch.

In the Library, With the Lead Pipe: Spotlight on Digital Government Information Preservation: Examining the Context, Outcomes, Limitations, and Successes of the DataRefuge Movement

planet code4lib - Wed, 2018-02-07 12:48

In Brief:
Access to and preservation of online government data and information has been a long-standing and complex issue in government information librarianship, but it has recently started to receive attention on a larger level from the media, the public, and libraries in general. The most recent initiative to archive digital government data was the DataRefuge movement in 2016 and 2017, which sponsored DataRescue events where people came together to capture static webpages and harvest dynamic online content for preservation purposes. This article examines the history and context of print and digital government information preservation initiatives and then focuses on the DataRefuge movement to discuss its outcomes, limitations, and successes in light of long-term preservation and public access.

by Eric Johnson and Alicia Kubas


When publicly useful datasets and information under government control are no longer freely available, when they are hidden, or when the public does not know they exist (or existed), those datasets are effectively lost. The restriction of data availability can occur for many reasons, including political bias, temporary government shutdowns, link rot, and routine redesigning of websites. In the worst case, not only is access affected, but data collection itself may be halted. While some might view a reduction in data collection favorably, many datasets are integral to research, improving health, mitigating crime, community building, and other areas important to improving quality of life in the United States.

While the loss of information can be considered a nuisance or inconvenience to people who use that information on a regular basis, introducing political bias into the information that remains available has a more significant consequence. If information is no longer available to be read, our freedom to effectively govern our democracy is curtailed. “The right of free inquiry and the equally important right of forming their own opinions…is of the utmost importance to the continued existence of democracy…and that freedom of the press in all forms of public communication be defended and preserved” (Texas Library Association 2013). That includes not just political opinion statements, but also the data and information citizens can use to make informed decisions about political leaders as well as daily choices. For instance, knowledge about climate change can persuade a person to make personal behavioral changes as well as inform their decisions about local and national leadership.

Furthermore, government information is not just important to the general population: it is used by policymakers, academics, researchers, and many others for decision-making and critical discussion. The U.S. government collects, aggregates, and disseminates a large volume of information. The authors believe that this information should be freely available without bias based on political views or motivations.

Accessibility of information, including government information, has been of concern to libraries since their inception (although, unfortunately, there were times in their history where access was limited to certain groups of people). The term “free” library or “public” library shows a philosophical difference from private subscription libraries of the 1700s and 1800s. Localities decided to make books freely available to the populace, and thereby increase societal good (American Library Association n.d.). Libraries participating in the Federal Depository Library Program (FDLP), a program run by the U.S. Government Publishing Office (GPO) for federal publication dissemination to the public, agree to provide access to government publications and information free of charge to anyone who requests it.

Within the profession, librarians generally believe that freedom of information is essential (American Library Association 2006). Freedom of information is an extension of freedom of speech and is supported not only by the U.S. Constitution’s Bill of Rights but also by the Universal Declaration of Human Rights, adopted by 48 countries (United Nations n.d.). All of these ideas underscore librarians’ philosophy of universal access to information of all types, including data, and show why the topic of access to government information and data is so pertinent to libraries.

The presidential debates and political campaign statements in 2016 showed that one of the candidates scorned and rejected evidence that the world’s climate has been changing as the result of human activities. The New York Times reported that “Mr. Trump has called human-caused climate change a ‘hoax.’ He has vowed to dismantle the Environmental Protection Agency in almost every form.” (Davenport 2016). After President Trump was elected, information under his control and in conflict with his own viewpoints was seen to be at risk and of concern to the public. Consequently, scientists and concerned citizens began collecting and archiving data from government websites (Dennis 2016), and thus began the DataRefuge movement and its DataRescue events where academics, librarians, coders, concerned citizens, students, and many others developed new models of cooperation to assist in the collecting and archiving of government information. These events also helped raise awareness of the longstanding issue of online government information preservation. Libraries and other non-profit organizations contributing to this effort would then manage and provide access to that information into the future.

Within the context of DataRescue efforts and this article, government data is defined as data, metadata, and information under government control and generally accepted to be in the public’s best interest to be made publicly available. This includes data that is collected or created by the government, or required to be submitted to the government for public dissemination or organized access. The organization and access to data and summary information along with metadata that is provided by government websites is essential to retrieving and understanding the underlying data. This does not include sensitive or secure information.

To fully appreciate the efforts, goals, and impact of the DataRefuge movement and DataRescue events as well as the roles librarians can play, it is important to understand the context in which DataRescue was born. This article includes a brief history and current status of government information dissemination and access through the Federal Depository Library Program as well as current initiatives to make government information and data accessible long-term and preserved for public access. Throughout, we will discuss challenges and barriers to preservation and future access of digital government information and data that come up with each program or initiative. Finally, we will examine the impact as well as the limitations of DataRescue events and U.S. government data preservation as a whole.

Overview of the Current Landscape

The Federal Depository Library Program (FDLP)

Government information in the United States has been disseminated to the American people since 1813, when Congress authorized House and Senate Journals as well as other Congressional publications to be distributed to select universities, state libraries, and other organizations (McGarr 2000). Since then, the federal government has taken major strides toward ensuring public accessibility of government information of all types and in all formats through the Government Publishing Office, formerly known as the Government Printing Office, whose public dissemination arm took on the role of the Federal Depository Library Program in 1895, depositing tangible government materials into depository libraries across the United States and its territories. The height of paper document dissemination by GPO was in the 1980s, when the number of publications distributed to depository libraries increased by more than one-third (U.S. Government Printing Office 2011). While this program seeks to disseminate to libraries all government information of public interest, there have always been documents, even in the print era, that have not been funneled through the program. What further complicates this issue is that federal agencies began to question the statutory authority and constitutionality of requiring agencies to contract with GPO for printing after a 1983 Supreme Court decision called similar issues into question (Petersen 2017). Today, this continues to be a contentious issue, with agencies deciding to skip over GPO altogether in favor of other publishing means and providers. Naturally, this problem is even more dire in a digital-heavy publishing environment where dissemination and deposit do not occur through the FDLP. In the print era, a category of participating depository libraries, called regional depositories, agreed to retain all disseminated documents to ensure long-term access to tangible government documents.

With the era of electronic government information, the system became more complex. By the 1990s, government information had an electronic presence that eventually came under GPO’s purview with passage of the 1993 GPO Electronic Information Access Enhancement Act (Government Printing Office Electronic Information Access Enhancement Act of 1993 1993). GPO was then required to provide “a system of online access” to various prominent government publications deemed appropriate for electronic access by the Superintendent of Documents. A year later, GPO rolled out GPO Access, its database for online access to government publications–primarily Congressional materials–and thus began its efforts to provide permanent public access to online government materials published in conjunction with GPO. This does not mean that libraries stopped receiving many publications duplicated in print or available only in a physical format. The push for online access was precipitated by rising user preference for digital access as well as the cost savings that digital publishing provided over print, and the publishing format was often decided by what would be the least expensive to produce (McGarr 2000). By the 2000s, many depository libraries were moving toward digital access instead of tangible access, precipitated by a variety of factors, including user preference, availability of publications in print, and availability of physical space in libraries. This phenomenon was also reflected in the decline in tangible publication distribution through the FDLP, from 12.2 million items in 2000 to 3.2 million in 2009 (U.S. Government Printing Office 2011). Today, depositories that receive few to no print documents are becoming more common, and the amount of tangible distribution of government publications is declining as many agencies publish exclusively online content.

Although GPO Access provided a major step forward in public access to electronic government information, by the early 2000s an update was needed. In 2009, GPO’s Federal Digital System, or FDsys, was released to the public. This new system offered an improved interface for online access to mostly post-1996 Congressional and Executive government publications, both digitized and born-digital, and it also served as an archive for long-term preservation and access to this digital content. In 2016, GPO unveiled govinfo, its newly redesigned beta interface, for improved user experience and enhanced functionality with a purpose to “ingest, authenticate, preserve, and to provide version control to the information products of the Legislative, Judicial, and Executive branches of the U.S. Government” (U.S. Government Publishing Office, Office of the Superintendent of Documents 2017, 6). With the overall move to a majority of government publications being published digitally, not deposited in libraries but rather linked from a local catalog to GPO’s catalog or system, digital preservation and access have become a prominent concern for GPO, the depository community, and users.

Born-Digital Government Information and Data

It is clear that GPO has a vested interest in preservation of digital information, whether digitized from a physical copy or born-digital, although they have limited statutory authority to act on this interest. One of the major problems, though, is that GPO can only preserve the publications and information that it receives from federal agencies and organizations. The failure of many agencies to submit all of their published or distributed information is a long-standing problem for GPO. In a 1996 report, GPO had already identified notification and compliance from agencies as a major issue in the context of a more electronic government landscape: “With the increasing emphasis on electronic dissemination and decreasing compliance with statutory requirements for agencies to print through GPO, identifying and obtaining information for the FDLP is becoming increasingly difficult” (U.S. Government Printing Office 1996, ii). Another major concern is the cost-saving move to reduce publishing of government products through GPO in general. Today, access to agency publications has become more difficult as the landscape is almost exclusively digital and more agencies are simply posting data, reports, and information on their webpages and not submitting any content to GPO to be disseminated or archived.

To assist with this problem, GPO has a Document Discovery Program through which lost or fugitive documents–online or tangible materials published by a federal agency that were not distributed through GPO–can be submitted for potential inclusion in GPO’s catalog and the FDLP (Medsger and Webb 2016). These are usually instances where an agency has posted a report on its website believing that posting alone satisfies public access requirements, when federal mandate also requires agencies to submit this information to GPO. Organized access to these materials by depository libraries, and their preservation, becomes more tenuous the more disaggregated they become, especially as digital publishing of federal publications continues to increase. In 2012, federal agencies published approximately 92% of government publications in a digital format (Federal Depository Library Program 2012, sec. 13.3), and today the share is likely much higher, with only a relatively small amount still published in print since the Obama administration’s 2012 call for expanded digital offerings of agency information (Office of Management and Budget, The Executive Office of the President 2012).

While government-collected data and databases are included in the FDLP’s purview (U.S. Government Publishing Office, Office of the Superintendent of Documents 2017), data portals and raw datasets are content that GPO itself neither actively preserves nor deposits to depository libraries for local preservation. The U.S. federal government is the largest collector of data in the world, making this a wealth of information to which the public does not have guaranteed long-term access. Mistakenly, some assume that the bulk of this published data is accessible via Data.gov, the federal government’s open data portal, but this is a metadata portal without hosting capabilities or preservation practices for the data itself. GPO does, however, archive some government webpages that are in scope of the FDLP and not duplicated in print, although it is not clear that this is done in a consistent or comprehensive manner (Dahlen 2017). Furthermore, it is not able to capture content that is not easily crawled or imaged, like video, datasets, portals, and other more complicated information types. This overall issue is why libraries, both depository and non-depository, as well as others, have turned their attention to this daunting problem through various initiatives and efforts.

Initiatives to Preserve or Replicate Government Information

With the many issues and gaps in existing efforts by GPO and federal agencies to provide long-term public access to born-digital government data, a variety of libraries, consortia, federal agencies, and others have stepped up to make progress on this large and complex issue of preserving digital federal government data. The initiatives discussed below have approached the issue in a variety of ways. Some have sought to archive individual webpages while others have focused on content specific to GPO’s online systems. Still others are seeking to raise awareness of and advocacy around the issue, which is a step forward in finding more interested parties and resources to assist with this growing problem. The efforts are ordered chronologically from their inception to show the progress that has been made and built upon throughout the last decade.

LOCKSS-USDOCS
One partnership that has been active for many years is the LOCKSS-USDOCS network, a privately-run network also known as the Digital Federal Depository Library Program. This network began working on replication of digital federal content in 2010, building on the work done by libraries already using LOCKSS to preserve documents harvested from GPO Access between 1991 and 2007 (Jacobs and Reich 2010).

LOCKSS stands for “lots of copies keep stuff safe” and this is the main principle on which this Stanford-developed preservation network leans. The LOCKSS-USDOCS private network consists of 36 libraries, universities, the Library of Congress, and GPO, each of which host copies of ingested digital content from GPO’s FDsys (LOCKSS n.d.).

A major advantage of this model is that access is always guaranteed to member libraries in the case of a government shutdown during which GPO’s servers may be offline, the disappearance of digital content from government websites, or tampering with the authenticity of a document (Jacobs and Reich 2010). However, the downside is that access is limited to those libraries that are LOCKSS members in the USDOCS private network; it is free to be a member of LOCKSS-USDOCS if a library is already a member of the larger LOCKSS community, which requires a fee based on institution size and affords access to a variety of preserved content on top of government documents (LOCKSS n.d.). However, there is an option to join LOCKSS-USDOCS without being a member of the LOCKSS group, and this entails a smaller fee to cover support costs (LOCKSS Program n.d.). Although this initiative is admirable in working to mirror GPO’s FDsys content for future public access, it has not been able to address born-digital content on agency websites that has not been funneled through the FDLP, and it is only available to those who are members.

End of Term Projects

There are a variety of initiatives underway to preserve government information and data for future public access, with the End of Term Project being one of the first. This collaboration among the Library of Congress, California Digital Library, University of North Texas Libraries, Internet Archive, George Washington University Libraries, Stanford University Libraries, GPO, and the public nominated 11,382 web pages from executive, judicial, and legislative government sites to “harvest and preserve” around the conclusion of the presidential administration ending on January 20, 2017 (Federal Depository Library Program 2016; University of North Texas 2016). It actually took several months to capture the data.

Previous end of term web collections are available at the End of Term Web Archive, with collections for 2009 and 2013. The information collected is meant to be an archive of web pages and has expanded to include some of the underlying datasets (End of Term Web Archive 2016). However, these datasets have not been comprehensively captured in the past. The disadvantage of this project is that it cannot capture dynamic data from backend databases, since those databases are not directly exposed on the internet. To access that data, a user typically fills out and submits a web form, and the server uses those parameters to query a database; the results are displayed on another web page. Downloading the whole underlying database is difficult or impossible. Programmatically accessing the data through the web requires someone to analyze the website’s database interface and program a tool to methodically query for data from that database, as sketched below.
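
To give a sense of what such a purpose-built harvesting tool involves, here is a minimal sketch that walks through the parameter space a search form would normally submit and saves each response. The endpoint, parameter names, and response format are hypothetical; a real harvester has to be written against the specific site being rescued, and analyzing that site's interface is usually the hard part.

```python
# Illustrative only: methodically query a hypothetical form-backed dataset by
# iterating over the parameters its search form would normally submit.
import json
import time

import requests

BASE_URL = "https://example.gov/data/search"  # placeholder endpoint
STATES = ["AK", "AL", "AR"]                   # one small slice of the parameter space
YEARS = range(2010, 2017)


def harvest():
    for state in STATES:
        for year in YEARS:
            params = {"state": state, "year": year, "format": "json"}
            response = requests.get(BASE_URL, params=params, timeout=30)
            response.raise_for_status()
            with open(f"harvest_{state}_{year}.json", "w", encoding="utf-8") as fh:
                json.dump(response.json(), fh)
            time.sleep(1)  # be polite to the server


if __name__ == "__main__":
    harvest()
```

Even a simple loop like this has to respect the server and record exactly what was requested, which feeds directly into the chain-of-custody concerns discussed below.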

Preservation of Electronic Government Information (PEGI) Project

The Preservation of Electronic Government Information (PEGI) Project is a new venture into digital government preservation which brings together information professionals from universities and federal agencies, including GPO and the National Archives and Records Administration. They recommend an environmental scan and a registry of what data are being collected and preserved (Halbert 2016). In addition, the PEGI Project will focus on stakeholder outreach and awareness through national forums to be held in 2018 and funded by an IMLS National Leadership Grant (Preservation of Electronic Government Information Project n.d.). Overall, the goal is to educate and take measure of the overall status of government data preservation and then make recommendations for future work in this area.

DataRefuge Project by University of Pennsylvania Program in the Environmental Humanities (PPEH DataRefuge Project)

The DataRefuge project was a collaboration between the PPEH, Penn Libraries, the Environmental Data & Governance Initiative (EDGI), and other groups concerned with climate change data that in 2016 launched what they called DataRescue events. These events consist of downloading and archiving at-risk data into a refuge, or repository, mirrored in multiple locations (PPEH Lab-DataRefuge 2016). The collaboration focused on addressing five concerns about federal climate and environmental data:

  • What are the best ways to safeguard data?
  • How do federal agencies play crucial roles in data collection, management, and distribution?
  • How do government priorities impact data’s accessibility?
  • Which projects and research fields depend on federal data?
  • Which datasets are of value to research and local communities, and why?

On top of these concerns, one of the most difficult and tedious issues in an initiative like this is keeping track of the chain of custody of the data–namely, “where the data comes from originally, who copied them and how, and then who and how they are re-distributed” (Allen 2017). This would not be an issue for content that is preserved and made accessible by the originating authority, but for any harvesting done by an external actor it is an important concern. DataRefuge points to this as the “cornerstone” of the entire initiative. Documenting how data were archived preserves integrity, ensures verifiability, and contributes to the overall usability of the data in the future. Trying to predict how data may be used in the future and by whom is tricky at best, but consulting with those who already use data can help those preserving this data to do an optimal job of creating metadata for future users.

The collaborators kept these larger issues in mind when they created the event workflow and social network to help facilitate DataRescue events to nominate and download datasets for preservation. Nominated websites were copied into the Internet Archive–a non-profit digital library with more than 15 petabytes of information (Internet Archive n.d.). After more than 200 terabytes of government website data were downloaded, the next step was to create metadata records and provide access to the data in the DataRefuge repository (Environmental Data & Governance Initiative 2017). The focus of the DataRefuge project has also expanded to suggesting other possible event ideas to help communities build data durability (PPEH Lab- DataRefuge 2017d).

PPEH, EDGI, and their partners attempted to crowdsource the identification and collection of government information. They assisted in the collection of large amounts of information by creating and modifying data collection workflows. Collected webpage data are available through keyword searches. Additional descriptive metadata and indexing are being created to facilitate access when browsing by government agency (Bailey 2017).
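
For static pages, nomination and capture can be as lightweight as asking the Wayback Machine to take a snapshot. The sketch below assumes the Internet Archive's public "Save Page Now" feature can be triggered by requesting web.archive.org/save/ followed by the page URL; it illustrates the general idea only and is not the Chrome extension or seeding workflow actually used at DataRescue events.

```python
# Illustrative sketch: ask the Wayback Machine to snapshot a nominated page.
# Assumes the public "Save Page Now" feature responds to web.archive.org/save/<url>;
# this is not the Chrome extension or seeding workflow used at DataRescue events.
import requests


def nominate(url):
    response = requests.get(f"https://web.archive.org/save/{url}", timeout=120)
    response.raise_for_status()
    # The captured copy is normally reachable under /web/<timestamp>/<url>.
    return response.url


if __name__ == "__main__":
    print(nominate("https://www.epa.gov/climatechange"))
```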

After the large wave of DataRescue events swept the country, the momentum continued with a meeting of representatives from universities, research libraries, federal agencies, repositories, academic consortia, non-profits, and other stakeholders in Washington, D.C., in May 2017 to discuss the next steps towards a larger scale approach to this initiative. This meeting of the nascent Libraries+ Network culminated in a report summarizing the events of the meeting, but unfortunately, no significant progress has been made by the group, with the most recent activity a blog post from June 2017 (Libraries+ Network 2017).

DataRescue Events

While there was concern for data preservation prior to the Trump administration coming into power, data loss occurrences immediately after the inauguration prompted organizations such as the Humane Society of the United States (HSUS) to act. For example, the Washington Post reported on February 3, 2017, that animal welfare data had been removed from U.S. Department of Agriculture (USDA) websites, including inspection reports on the treatment of animals at labs, zoos, breeding operations, and other sites (Brulliard 2017). In response, the HSUS notified the USDA that it would initiate legal action (Wadman 2017). The USDA responded that the website change was not the result of the new presidential administration, but part of an ongoing website review. On August 18, 2017, the Public Search Tool for Animal Welfare Act compliance records was reinstated on the USDA website (USDA APHIS n.d.). Some organizations reacted to the threat of removed data by hosting “DataRescue” events (Harmon 2017).

Beginning in Toronto in late December 2016, and followed by Philadelphia, Chicago, Indianapolis, Los Angeles, and Ann Arbor in January 2017, the DataRescue movement continued across the country with many universities sponsoring and organizing events. The number of public events peaked in March 2017.

The number of DataRescue events per month, December 2016-June 2017 (graphic created by the authors)

More than 50 events were publicized on the PPEH website, and nearly 200 newspaper and magazine articles were published related to DataRefuge, with 173 listed as of September 18, 2017 (PPEH Lab- DataRefuge 2017b, 2017c). The events were spread across the United States with a concentration around the coasts and the Midwest (PPEH Lab- DataRefuge 2017a).

A map of DataRescue locations (graphic created by the authors)

After hundreds of datasets and thousands of webpages were downloaded, the work continued with cataloging the rescued data. Metadata needed to be created to make the data findable. PPEH DataRefuge organized a workflow for making data uploaded to the DataRefuge repository accessible (PPEH Lab- DataRefuge 2017e).
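
What such descriptive metadata might capture, at a minimum, is the source, the retrieval context, and a checksum for later fixity checks; the sketch below illustrates this with hypothetical field names, not the DataRefuge repository's actual schema.

```python
# Illustrative only: a minimal metadata record for a rescued file, with a
# checksum to support later fixity checks. Field names are hypothetical and
# are not the DataRefuge repository's actual schema.
import hashlib
import json
from datetime import date


def sha256_of(path):
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()


record = {
    "title": "Example agency dataset (rescued copy)",
    "source_url": "https://example.gov/data/example.csv",  # placeholder
    "source_agency": "Example Agency",
    "retrieved": date.today().isoformat(),
    "retrieved_by": "DataRescue volunteer (name, affiliation)",
    "sha256": sha256_of("example.csv"),
    "custody_notes": "Downloaded directly from the agency site; no modifications made.",
}

with open("example.metadata.json", "w", encoding="utf-8") as fh:
    json.dump(record, fh, indent=2)
```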

DataRescue Limitations and Challenges

It is difficult to determine the success of the DataRescue movement, and indicators of success may lie in the organizers’ goals for each DataRescue event. Many event organizers reported the number of websites they archived or gigabytes of data downloaded (Torres-Saez 2017), but much of the data downloaded is still available on government websites.

The motivations for government website changes are difficult to ascertain, and this makes the issue more complicated because there is no one root cause for disappearing data. Broken links and modified text can happen without malice or intent as institutions routinely re-organize their websites for a variety of reasons. However, each administration has policies that agencies are expected to follow. One summer intern on a communications team at the EPA was asked to draft tweets they could post to promote environmentally friendly lifestyles. She was also instructed, “Don’t mention climate change or going ‘green.’” Links to EPA’s information on climate change had become dead and she wondered, while policies may change, “could science, based on years of research, really become outdated, too?” (Miller 2018). No matter what the cause of data disappearance, the need still remains to make preservation copies of data so that they can be accessed by future researchers.

Additionally, it will take time for downloaded data and websites to be indexed and made fully available. Terabytes of data were downloaded during the events, and it is not clear whether all of that data made it through the curation phase to be placed in the DataRefuge repository. Moreover, deficiencies in the data stemming from how the data were collected may not be obvious until later.

Another challenge is that by the time the DataRescue movement was in full swing, it was already too late and critical data may already have gone missing without anyone realizing it. Not being able to get to the data at the source (i.e., federal agencies) causes this problem and is an inherent issue in the DataRescue workflow.

The most glaring downside of the DataRefuge initiative and DataRescue events is that it is questionable whether long-term preservation of government data was actually accomplished. The main goal of DataRescue was to save government data for the future in case it ever disappears. However, viewing the datasets indexed and archived in the DataRefuge repository through a lens of data curation for reuse and long-term usability, the metadata and documentation are generally lacking for preservation purposes. Much of this can be ascribed to the workflow itself, which emphasized chain of custody and authenticity over usability and the ability to understand the data’s context if the website where it resided disappears in the future. The goal of the initiative was to save the data, and the curated datasets can indeed be found in the DataRefuge repository, but the larger picture of long-term use was not addressed. Part of this issue lies in relying upon community-driven work completed by individuals who lack expertise in data curation and archival practices.

Public outcry around this issue stemmed from fear and concern over the disappearance of government data, which is why the events spurred such interest and coverage from the media. DataRescue attendees felt productive and accomplished in contributing to this cause and were satisfied with the initiative taken up by DataRescue organizers and groups. But even though attendees were fulfilled by the idea of contributing to this lofty goal, the feeling is akin to contributing to a charity to make oneself feel good while ignoring the fact that much more needs to be done to solve the problem.

Furthermore, only a fraction of government data was harvested. EDGI reported that 63,076 pages were seeded to the Internet Archive during DataRescue events using its Chrome extension and workflow, with 21,798 of these pages containing datasets or other dynamic content. While this is positive at a surface level, over 194 million unique URLs were captured for the 2012 End of Term harvest through human-contributed URLs and web crawlers that automatically snapshotted pages (Phillips and Tarver 2016). It would be nearly impossible for humans to go through every agency webpage looking for dynamic content or datasets that need to be specially harvested for preservation.

Lastly, this initiative has come to a standstill after its wave of initial interest and involvement from stakeholders, leaving no overall systemic plan to continue these efforts into the future. Frankly, federal agencies are not empowered by the current presidential administration to work on this issue with well-meaning academics and non-profits.

Positive Outcomes of DataRescue

Despite the lack of meaningful preservation achieved under the DataRefuge banner, the opportunity for a library to showcase its services and its ideals was valuable. Furthermore, depending upon participants’ roles in the event, each person likely had different takeaways and appreciations for the goals of DataRescue. For example, those nominating pages to the Internet Archive or End of Term Project could see the vast amount of government information and the many different types of information the government publishes, which gave librarians working with government publications collections an opportunity to speak about their role in government and civic literacy and the preservation of historical print government content, and to promote research and reference services. Many libraries that hosted these events were also able to get positive press coverage about these efforts and highlight their roles in civic and community engagement. This is especially important to those that are land-grant universities with their “democratic mandate for openness, accessibility, and service to people” (Association of Public and Land-Grant Universities 2012). Highlights in the local newspaper, student newspaper, local radio, and even national print and radio media were widespread during the height of the DataRescue events early in 2017. While these events did not precipitate a deep shift in government data preservation, they are a step in the right direction in raising awareness of the issue.

In addition, DataRescue events gave university libraries a reason to collaborate with other campus groups and external partners. This most often included a digital humanities group, political science and government departments, environmental advocacy groups, and open data and hacker groups, among others. A few events included companion events such as speakers, workshops, discussions, and teach-ins. For example, at the two-day DataRescueDC event at Georgetown University, one day focused on a round table discussion about open data and data vulnerability as well as a teach-in focusing on the importance of climate data while the second was spent on data archiving. Both talks featured speakers and guests from around campus as well as the outside community (DataRescueDC 2017).

The additional context around government transparency was another positive outcome of these events. Encouraging civic engagement and literacy in light of what the federal government can do with access to information was also important. At NYU’s DataRescue event in February 2017, Jerome Whitington of NYU showed changes to the EPA website, which removed the text “and mitigate[d] the effects of climate change” from a section on wastewater energy use (Phiffer 2017). Starting these sorts of discussions and highlighting the overall issue of access to government information were the successes of these events, even if long-term preservation was not achieved with this movement.


Regardless of the limitations of the DataRefuge project and DataRescue events, this initiative brought widespread awareness to a complex and longstanding issue. The need to create a historical record of government websites and protect access to data during and between every presidential administration remains, as this data will continue to disappear through a variety of means, including government shutdowns, broken links, and policy changes. As organizations gain more experience and funding, the processes for pre-emptive collection and metadata creation are becoming more sophisticated; in the future this will hopefully pay off, although partnering with federal agencies would be the ideal step forward in getting ahead of the problem.

There is a need for more institutional support through organized, well-funded programs, and for tasking GPO with perpetual archiving of, and access to, all public government data and websites. With this in mind, there is also a need to advocate for adequate funding for GPO to do this work. While there is no one answer to this overwhelmingly complex issue, it is clear that libraries in particular view it as an opportunity to contribute their expertise and resources toward making forward progress. Although there are many private, non-profit organizations that can help with the issue of disappearing or inaccessible government data and information, many libraries and librarians are uniquely positioned as publicly funded and public-focused centers that have a vested interest in these efforts because of their overall mission to provide universal access to information. Even though the DataRefuge movement did not precipitate a long-term solution or continue its momentum, there is no way to succeed in addressing this problem without making some attempts and seeing what happens. Even if some of those attempts are less than successful, sometimes it is the missteps that pave the way for more successful endeavors.


Many thanks to reviewers Shari Laster and Denisse Solis, and to publishing editor Amy Koester, for their insight and feedback.


Allen, Laurie. 2017. “Data Refuge Rests on a Clear Chain of Custody.” PPEH Lab. February 1, 2017.

American Library Association. 2006. “Library Bill of Rights.” Advocacy, Legislation & Issues. June 30, 2006.

———. n.d. “Social Role of the Library | Libraries Matter.”

Association of Public and Land-Grant Universities. 2012. “The Land-Grant Tradition.”

Bailey, Jefferson. 2017. “2016 End of Term Web Archive,” November 30, 2017.

Brulliard, Karin. 2017. “USDA Abruptly Purges Animal Welfare Information from Its Website.” Washington Post, February 3, 2017, sec. Animalia.

Dahlen, Ashley. 2017. “Time Machine for Federal Info – Using Web Archive Content…” WebEx streaming video.

DataRescueDC. 2017. “DataRescueDC.” DataRescueDC. 2017.

Davenport, Coral. 2016. “Donald Trump Could Put Climate Change on Course for ‘Danger Zone.’” New York Times, November 10, 2016, sec. Politics.

Dennis, Brady. 2016. “Scientists Are Frantically Copying U.S. Climate Data, Fearing It Might Vanish under Trump.” Washington Post, December 13, 2016, sec. Energy and Environment.

End of Term Web Archive. 2016. “U.S. Government Websites.” End of Term Web Archive: U.S. Government Websites. 2016.

Environmental Data & Governance Initiative. 2017. “Archiving Data.” EDGI (blog). May 2017.

Federal Depository Library Program. 2012. “Federal Depository Library Handbook.”

———. 2016. “End of Term Presidential Harvest 2016.” September 1, 2016.

Government Printing Office Electronic Information Access Enhancement Act of 1993. “Public Law 103-40, 107 Stat 112-114.” June 8, 1993.

Halbert, Martin. 2016. “Digital Preservation of Federal Information Summit: Reflections.” Report. Digital Preservation of Federal Information Summit, April 3-4, 2016, San Antonio, Texas. April 2016.

Harmon, Amy. 2017. “Activists Rush to Save Government Science Data — If They Can Find It.” New York Times, March 6, 2017, sec. Science.

Internet Archive. n.d. “Internet Archive: Digital Library of Free Books, Movies, Music & Wayback Machine.”

Jacobs, James R., and Victoria Reich. 2010. “Preservation for All: LOCKSS-USDOCS and Our Digital Future.” Documents to the People (DttP) 38 (3).

Libraries+ Network. 2017. “May Meeting — Libraries+ Network.” May 8, 2017.

LOCKSS. n.d. “Digital Federal Depository Library Program | LOCKSS.”

———. n.d. “How to Join | LOCKSS.”

LOCKSS Program. n.d. “The LOCKSS Program, Digital Federal Depository Library Program, Frequently Asked Questions.”

McGarr, Sheila M. 2000. “Snapshots of the FDLP (August 2000).” August 2000.

Medsger, Melanie, and Ben Webb. 2016. “Poster Presentation: Wanted – Fugitives, Lostdocs, and Document Discovery at GPO.” In Depository Library Council. Arlington, VA.

Miller, Katie. 2018. “Perspective | As an EPA Intern, I Was Barred from Mentioning Climate Change.” Washington Post, January 3, 2018, sec. Outlook.

Office of Management and Budget, The Executive Office of the President. 2012. “Digital Government: Building a 21st Century Platform to Better Serve the American People.”

Petersen, R. Eric. 2017. “Government Printing, Publications, and Digital Information Management: Issues and Challenges.” 7–5700, R45014. Congressional Research Service.

Phiffer, Dan. 2017. “Grabbing Government Data Before It’s Destroyed.” February 14, 2017.

Phillips, Mark, and Hannah Tarver. 2016. “End of Term Publications Metadata Guide.” 2016.

PPEH Lab- DataRefuge. 2016. “DataRefuge.” PPEH Lab. November 2016.

———. 2017a. “Data Rescue Events.” 2017.

———. 2017b. “DataRefugePress.” PPEH Lab. 2017.

———. 2017c. “DataRescue Events.” PPEH Lab. 2017.

———. 2017d. “DataRescue Workflow.” PPEH Lab. 2017.

———. 2017e. “Clean Up Datarefuge.Org.” PPEH Lab. 2017.

Preservation of Electronic Government Information Project. n.d. “Objectives.” PEGI Project.

Torres-Saez, Joan. 2017. “Data Rescue Boulder Saved Actual Science Data, Lots and Lots of Data.” March 9, 2017.

United Nations. n.d. “Universal Declaration of Human Rights | United Nations.”

University of North Texas. 2016. “Nomination Tool: Project URL Nomination.” 2016.

U.S. Government Printing Office. 1996. “Study to Identify Measures Necessary for a Successful Transition to a More Electronic Federal Depository Library Program.” June 1996.

———. 2011. “Keeping America Informed.” U.S. Government Printing Office.

U.S. Government Publishing Office, Office of the Superintendent of Documents. 2017. “Public Policy Statement 2016-1: Scope of Government Information Products Included in the Cataloging and Indexing Program and Disseminated Through the Federal Depository Library Program.”

———. 2017. “GPO’s System of Online Access: Collection Development Plan.”

USDA APHIS. n.d. “USDA APHIS | AWA Inspection and Annual Reports.”

Wadman, Meredith. 2017. “Updated: USDA Responds to Outcry over Removal of Animal Welfare Documents, Lawsuit Threats.” Science | AAAS. February 6, 2017.

Open Knowledge Foundation: Open Budget Survey 2017: global comparison of budget transparency comes at a critical time

planet code4lib - Tue, 2018-02-06 08:00

On 30 January 2018, the International Budget Partnership (IBP) published the Open Budget Survey 2017 with an interactive Data Explorer developed for the IBP by Open Knowledge International and updated for the 2017 survey.

Launched in 2006, the Open Budget Survey (OBS) is the world’s only independent, comparative assessment of the three pillars of public budget accountability: transparency, oversight and public participation. The sixth round of this biennial assessment in 2017 evaluated 115 countries across six continents. The coverage of the survey was expanded to include 13 countries for the first time, including some advanced economies such as Japan and Australia, emerging economies such as Côte d’Ivoire and Paraguay, and fragile states such as Somalia and South Sudan.

Open Budget Survey Data Explorer, map view

The results of the survey show that many governments around the world are making less information available about how they raise and spend public money. After 10 years of steady progress by countries, the 2017 survey shows a modest decline in average global budget transparency scores, from 45 in 2015 to 43 in 2017 for the 102 countries that were surveyed in both rounds (scores are out of a possible 100). This is in stark contrast to the average increase of roughly two points documented among comparable countries in each round of the survey between 2008 and 2015. The reversal of transparency gains is particularly discouraging given that roughly three-quarters of the countries assessed do not publish sufficient budget information (a score of 61 or higher), seriously undermining the ability of citizens worldwide to hold their government accountable for using public funds efficiently and effectively.

The Data Explorer, built by Open Knowledge International in 2006 and updated for this week’s release, allows users to visualise the data from current and previous surveys in a number of different ways. A map view shows the changing geography of openness over the six surveys, while a timeline shows the movements of individual countries over the same period. A more detailed page of rankings shows graphically how each country’s score is calculated. A datasheet for each country presents the full data, letting the user see how it has performed on each test in every survey. Users can also generate custom reports, or download the entire dataset. Another useful feature allows users to see how a country’s score might change for the next survey: you can click to decide what changes to make to your chosen country’s budget systems, and the resulting change to its openness score is shown.

Open Budget Survey Data Explorer, timeline view

The Open Budget Survey 2017 could not come at a more critical juncture as we look to reinvigorate democratic practice, re-engage the disaffected, and restore public trust in public institutions. Around the world, there has been a decline in public trust in government, in part due to instances of corruption but also because of dramatic increases in inequality. In a number of countries, leaders who have disguised their intolerant and reactionary agendas with populist rhetoric have been swept into power by those who’ve been left behind. These political shifts have driven out many government champions of transparency and accountability — especially those from countries in the global south. More broadly across countries, there has been a shrinking of civic space, rollbacks of media freedoms, and a crackdown on those who seek to hold government to account, including individual activists, civil society organizations, and journalists.

Because open and accountable public budgeting is at the center of democratic practice and equity, it is the first place we should look for ways to strengthen the interaction between governments and citizens. Ensuring that the budgeting process is characterized by high levels of transparency, appropriate checks and balances, and opportunities for public participation is key to stemming the decline in confidence in government and representative democracy.

In the face of the spread of profound threats to active, informed public participation, and thus the ability of citizens to ensure their governments will pursue policies that improve their lives, the Open Budget Survey 2017 provides essential data on the state of budget transparency and accountability around the world. The survey answers two fundamental questions to assess whether national governments produce and disseminate key budget documents to the public in a timely, comprehensive and accessible manner:

  1. Are the basic conditions needed for representative democracy to function — the free flow of information and opportunities for public participation in government decision making and oversight — being met in the budget sphere?
  2. Are empowered oversight institutions in place that can ensure adequate checks and balances?

In addition, the 2017 survey includes a newly enhanced evaluation of whether governments are providing formal opportunities for citizens and their organizations to participate in budget decisions and oversight, as well as emerging models for public engagement from a number of country innovators. It also examines the role and effectiveness of legislatures and supreme audit institutions in the budget process.

Open Knowledge International helps governments meet the goal of budget transparency through OpenSpending, a project to unlock public fiscal data. OpenSpending offers a platform and a fiscal data standard that make publishing and visualising budget data easy and efficient. In collaboration with the Global Initiative for Fiscal Transparency (GIFT), OKI supports several federal ministries of finance around the globe in using the OpenSpending tools and the Fiscal Data Package, a lightweight fiscal data schema built on the Frictionless Data Specifications that allows the data to be visualised and analysed.
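
As a rough illustration of the kind of descriptor this approach relies on, the sketch below assembles a minimal, Fiscal Data Package-style descriptor in Python using only the standard library. The package name, file name, column names, and the measures/dimensions mapping are illustrative assumptions, not the exact properties required by the specification; consult the Fiscal Data Package documentation for the authoritative schema.

    # Sketch of a minimal, Fiscal Data Package-style descriptor for a budget CSV.
    # Property names below are illustrative; the actual specification defines
    # the required keys and their exact semantics.
    import json

    descriptor = {
        "name": "example-national-budget",        # hypothetical package name
        "title": "Example National Budget 2017",
        "resources": [
            {
                "name": "budget",
                "path": "budget-2017.csv",        # hypothetical data file
                "schema": {
                    "fields": [
                        {"name": "year", "type": "integer"},
                        {"name": "ministry", "type": "string"},
                        {"name": "amount", "type": "number"},
                    ]
                },
            }
        ],
        # Illustrative fiscal modelling block mapping columns to the roles
        # tools like OpenSpending expect (spending measures vs. dimensions).
        "model": {
            "measures": {"amount": {"currency": "USD"}},
            "dimensions": {"ministry": {"fields": ["ministry"]}},
        },
    }

    with open("datapackage.json", "w") as f:
        json.dump(descriptor, f, indent=2)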

The full report of the Open Budget Survey is available from the International Budget Partnership, and the data visualisations can be viewed through the Data Explorer site.

DuraSpace News: Instructors Announced for Fedora Camp at NASA

planet code4lib - Tue, 2018-02-06 00:00
From David Wilcox, Fedora Product Manager  

Code4Lib Journal: Editorial: Musing on learning to be a selfish librarian

planet code4lib - Mon, 2018-02-05 20:44
One of the perks of being the coordinating editor is you get to write the opening editorial for the issue.  It’s an opportunity to think broadly about the community, the journal…current events.  And if you look back over the past year or so, those that have taken on this role have been more than up […]

Code4Lib Journal: Approaching the largest ‘API’: extracting information from the Internet with Python

planet code4lib - Mon, 2018-02-05 20:44
This article explores the need for libraries to algorithmically access and manipulate the world’s largest API: the Internet. The billions of pages on the ‘Internet API’ (HTTP, HTML, CSS, XPath, DOM, etc.) are easily accessible and manipulable. Libraries can assist in creating meaning through the datafication of information on the world wide web. Because most information is created for human consumption, some programming is required for automated extraction. Python is an easy-to-learn programming language with extensive packages and community support for web page automation. Four packages (Urllib, Selenium, BeautifulSoup, Scrapy) in Python can automate almost any web page for projects of all sizes. An example warrant data project is explained to illustrate how well Python packages can manipulate web pages to create meaning through assembling custom datasets.
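
The abstract does not include code, but a minimal flavour of the approach, using two of the four packages it names, might look like the following sketch, which fetches a page and pulls out link text and targets. The URL is a placeholder, and a real project would add error handling, politeness delays, and respect for robots.txt.

    # Minimal sketch of web-page extraction with urllib and BeautifulSoup.
    # The URL is a placeholder; substitute the pages your project targets.
    from urllib.request import urlopen
    from bs4 import BeautifulSoup   # pip install beautifulsoup4

    url = "https://example.org/records"          # hypothetical page
    html = urlopen(url).read()
    soup = BeautifulSoup(html, "html.parser")

    # Turn every hyperlink on the page into a (text, href) row -- the kind of
    # simple "datafication" step the article describes.
    rows = [(a.get_text(strip=True), a.get("href")) for a in soup.find_all("a")]
    for text, href in rows:
        print(text, href)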

Code4Lib Journal: Using R and the Tidyverse to Generate Library Usage Reports

planet code4lib - Mon, 2018-02-05 20:44
Gathering, analyzing, and communicating library usage data provides a foundation for thoughtful assessment. However, the amount of time and expertise required creates a barrier to actually using this data. By using the statistical programming language R and the tools and approach of the Tidyverse, the process of gathering, analyzing, and communicating data can be automated in ways that reduce the amount of time and energy required. At the same time, this approach increases staff capacity for other data science projects and creates a shareable model and framework for other libraries. This article focuses on electronic resource usage reports – especially COUNTER DB1 reports – but this approach could be extended to other data sources and needs.
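
The article’s examples are written in R with the Tidyverse; as a very rough analogue (not the authors’ code), the pandas sketch below gathers a COUNTER-style usage CSV, summarises searches by database, and writes a shareable report file. The file name and column names are assumptions for illustration only.

    # Rough pandas analogue of the gather/summarise/report cycle described in
    # the article (the original uses R and the Tidyverse, not Python).
    # File and column names are illustrative assumptions.
    import pandas as pd

    usage = pd.read_csv("db1_report_2017.csv")   # hypothetical COUNTER DB1 export

    summary = (
        usage.groupby("Database", as_index=False)["Searches"]
             .sum()
             .sort_values("Searches", ascending=False)
    )

    summary.to_csv("db1_summary_2017.csv", index=False)   # shareable report
    print(summary.head())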

Code4Lib Journal: Archidora: Integrating Archivematica and Islandora

planet code4lib - Mon, 2018-02-05 20:44
“Archidora” is shorthand for the publicly available integration between the open source software packages Archivematica and Islandora. Sponsored by the University of Saskatchewan Library, this integration enables the automated ingest into Archivematica of objects created in Islandora. This will allow institutions that use Islandora as a digital asset management system, particularly for digitized material, to take advantage of Archivematica’s standards-based digital preservation functionality, without requiring staff doing digitization to interact with Archivematica. This paper outlines the basic functionality and workflow of archidora; provides an overview of the development process including challenges and lessons learned; and discusses related initiatives and possible future directions for development.

Code4Lib Journal: Microdata in the IR: A Low-Barrier Approach to Enhancing Discovery of Institutional Repository Materials in Google

planet code4lib - Mon, 2018-02-05 20:44
Georgetown University Library curates a multitude of open access resources in its institutional repository and digital collections portal, DigitalGeorgetown. Over the last several years, the Library has experimented with methods for making these items increasingly visible in search engine search results. This article describes the Library’s low-barrier approach to applying vocabulary to its DSpace institutional repository using microdata, as well as the challenges with and strategies used for assessing this work. The effects of the application of microdata to DigitalGeorgetown on Google search results were tracked over time using three different metrics, providing new insights about its impact.
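One low-effort way to spot-check that microdata actually appears in the rendered repository pages (a hedged sketch under assumed names, not the workflow described in the article) is to parse an item page and list its itemtype and itemprop attributes:

    # Sketch: verify that schema.org microdata attributes appear in a rendered
    # repository page. The URL is a placeholder for a DSpace item page.
    from urllib.request import urlopen
    from bs4 import BeautifulSoup

    url = "https://repository.example.edu/handle/12345"   # hypothetical item page
    soup = BeautifulSoup(urlopen(url).read(), "html.parser")

    for scope in soup.find_all(attrs={"itemtype": True}):
        print("itemtype:", scope["itemtype"])

    for prop in soup.find_all(attrs={"itemprop": True}):
        print("  itemprop:", prop["itemprop"], "->", prop.get_text(strip=True)[:60])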

Code4Lib Journal: Getting Real in the Library: A Case Study at the University of Florida

planet code4lib - Mon, 2018-02-05 20:44
In the fall of 2014, the University of Florida (UF) Marston Science Library, in partnership with UF IT, opened a new computer lab for students to learn and develop mobile applications. The Mobile Application Development Environment (MADE@UF) features both software and circulating technology for students to use in an unstructured and minimally-staffed environment. As the technological landscape has shifted in the past few years, virtual and augmented reality have become more prominent and prevalent, signaled by companies like Facebook, Google, and Microsoft making significant financial investments in these technologies. During this evolution, MADE@UF has migrated to focus more on virtual and augmented reality, and we will discuss the opportunities and challenges that hosting and managing such a space has provided to the science library and its staff.

Code4Lib Journal: Accio e-Libri: Magically Delivering Digital Resources to Patrons Using NFC Technology

planet code4lib - Mon, 2018-02-05 20:44
To coincide with the 20th anniversary of the publication of Harry Potter and the Philosopher’s/Sorcerer’s Stone, our library created a Happee Birthdae Harry display incorporating Near Field Communication (NFC) technology alongside print materials in order to magically place electronic resources in our users’ hands. The display was a spellbinding success, increasing usage of both print and electronic items, and helping our students become familiar with this innovative technology in an engaging manner. This article will provide step-by-step instructions on the materials and procedures librarians need to implement NFC technology in their own libraries, and will discuss the challenges and opportunities associated with this rapidly spreading technology.
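The article promises step-by-step instructions; purely as a hedged illustration of what ends up on a tag, the sketch below builds the raw NDEF bytes for a single URI record, the kind of record an NFC-enabled phone reads to open an e-book link when tapped. The link is a placeholder, and in practice most libraries would write tags with an off-the-shelf phone app rather than by hand.

    # Sketch: raw NDEF message containing one URI record, the payload typically
    # written to an NFC tag so a phone opens the link when tapped.
    # The link below is a placeholder for an e-book or catalogue record URL.
    uri = "example.org/ebooks/philosophers-stone"   # hypothetical target (scheme omitted)

    URI_PREFIX_HTTPS = 0x04                 # NFC Forum URI abbreviation for "https://"
    payload = bytes([URI_PREFIX_HTTPS]) + uri.encode("utf-8")

    header = 0xD1                           # MB | ME | SR flags + TNF=well-known
    record = bytes([header, 0x01, len(payload)]) + b"U" + payload

    print(record.hex())                     # bytes a tag-writing tool would store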

