planet code4lib


District Dispatch: Connecting with your members of Congress

Tue, 2017-08-01 18:46

Guest post by: Eileen M. Palmer, NJLA Public Policy Committee Chairs (July 2016-June 2017)

We’ve all heard it before but it is nonetheless true: effective advocacy is about building relationships. Building strong relationships is more than the occasional call to an elected official’s office requesting support for a bill or funding. Learning who your officials are and understanding their interests and concerns is at the heart of building that relationship and should be ongoing.

Members of the NJ delegation at National Library Legislative Day 2017 with Congressman Leonard Lance (NJ-7).

The New Jersey Library Association (NJLA) has worked to develop strong relationships with our congressional delegation through training for advocates provided by our Public Policy Committee, during our annual NJ Library Advocacy Week and at ALA’s National Library Legislative Day. And over the last several months we’ve seen the benefits of relationship building in our work supporting the ALA Washington Office’s advocacy efforts for federal funding.

As 2017 began we learned that the House Committee on Appropriations would be chaired by a representative from New Jersey. Rodney Frelinghuysen represents the 11th district, one rich with libraries and passionate library advocates, from library staff to trustees to mayors. When ALA reached out to us we were ready, willing and able to get to work taking our message to Rep. Frelinghuysen and his staff. Our NJLA Public Policy Committee was the key link in communications between ALA, NJLA and selected advocates from the 11th district. By working together, we were able to develop and execute a plan that has been successful on several fronts. Our plan included:

  • Repeatedly requesting a meeting with the congressman. Though we were unsuccessful in securing a face to face meeting, these communications were critical opportunities to convey our messages on both library funding and access to Congressional Research Service (CRS) reports, an issue also included in the appropriations legislation.
  • Making sure all local advocates in the 11th district knew our issues and the need to make their own contacts with Rep. Frelinghuysen’s local office. Parsippany Library Director Jayne Beline has had a longstanding relationship with the Congressman and his office, which was invaluable in communicating our message when he was in her library. Building relationships also includes making sure your local congressional office knows if your library has a meeting room they can use for events!
  • Working with local stakeholders – trustees, local officials and even patrons – to convey our message about how federal library funding impacts local library patrons. This message is so much more powerful when delivered locally with local examples.

ALA chapters play an indispensable role in ALA’s advocacy efforts. Coordinating our chapter efforts with the ALA Washington Office has amplified our message and ensured that each member of our NJ congressional delegation knows not just how much money we are requesting but, even more importantly, how those funds impact their constituents.

At this point in the legislative process we have reached a significant milestone. The House Committee on Appropriations has passed a bill that holds IMLS, LSTA and IAL funding at current levels and includes a provision to make CRS reports available to all. But we are not close to being done. To move forward, we must work with the Senate to support similar funding as their process begins in earnest this fall. I encourage all chapters to take an active role in working with ALA on these issues. Here are some specific ways to do that:

  • Get friends from inside and outside the library world to sign up for alerts and to act. The ALA Action Center or your local Chapter Action Center makes this very easy.
  • Offer your library for a town hall, tour, summer reading or other program visit by members and/or their staff.
  • Write a brief, personal letter-to-the-editor about the issues we care about. ALA has resources to help you.
  • Ask to meet with your representative and senator (or their staff) over the summer. Don’t be discouraged if you are turned down. Use the opportunity to convey your concern about library funding. Also, ask to be included on the invitation list for any telephone town halls.

Each of these activities can help to build the lasting relationships we need to effectively tell our story to every member of Congress. We’ve seen a very positive impact in New Jersey, not only with the optimistic budget outlook, but also in the further development of our relationship with our legislators and their staff. The benefits of advocacy are well worth the effort of all of us.

The post Connecting with your members of Congress appeared first on District Dispatch.

David Rosenthal: Disk media market update

Tue, 2017-08-01 15:34
It's time for an update on the disk media market, based on reporting from The Register's Chris Mellor here and here and here.

WD hard disk shipments

WD's hard disk shipments were basically flat in unit terms but sharply up in exabytes:
Financial statements revealed 39.3 million disk drives were shipped, slightly down on the 40.1 million a year ago. But that's 81.2 disk exabytes shipped, much more than the year-ago total of 66.1. The average selling price per drive stayed the same at $63.

A look at the disk segment splits shows the long-term slump in disk drive sales as flash takes over in PCs, notebooks and the high-performance enterprise drive areas.

Note the graph showing a kickup in "Consumer electronics". This may represent more large customers deciding that cheaper consumer drives are "good enough" for bulk storage use.

Seagate hard disk shipments were flat in exabyte terms, meaning a decline in unit terms:
In terms of exabytes shipped – mostly in disk drives – Seagate said enterprise mission-critical exabyte shipments were flat year-on-year, and there was 4.5 per cent growth on the previous quarter.

Nearline high capacity enterprise capacity shipped declined 14 per cent, while PC exabyte shipments were up 14.3 per cent year-over-year. Non-compute exabyte ships were down quarter-on-quarter.

The reason is that Seagate was late delivering 10TB helium drives, a favorite in the bulk storage market:
Luczo has Seagate focused on bulk storage of enterprise data on high-capacity disk drives, yet shipments of such drives fell in the quarter as Seagate missed a switchover to 10TB helium-filled drives. Stifel analyst and MD Aaron Rakers sees Western Digital having an 80 per cent ship share in this market.

This failure, and even more Seagate's failure in the flash market, had a big impact on Seagate's revenues and their position against WD:

WD vs. Seagate revenue

The right-hand side of the chart shows the $2.4bn gap in revenues that is the result of Seagate boss Steve Luczo's failure to break into the flash drive business and being late to helium-filled disk drives. Seagate is now a shrinking business while WD is growing.

Seagate's response has been to kick Luczo upstairs:
Steve Luczo will go upstairs to become executive chairman of the board on October 1, with president and chief operating officer Dave Mosley taking over the CEO spot and getting a board slot.

He's supposed to focus on the long term, but this doesn't seem to be his forte. Mellor writes:
Seagate says Luczo will focus on longer-term shareholder value creation, whatever that means. As he’s so far avoided Seagate getting any more involved in the NAND business than if it were playing Trivial Pursuit, we don’t have high hopes for moves in that direction.

Seagate's poor performance poses a real problem for the IT industry, similar to problems it has faced in other two-vendor areas, such as AMD's historically poor performance against Intel, and ATI's historically poor performance against Nvidia. The record shows that big customers, reluctant to end up with a single viable supplier of critical components, will support the weaker player by strategic purchases of less-competitive product.

HDD shipments

The even bigger problem for the IT industry is that flash vendors cannot manufacture enough exabytes to completely displace disk, especially in the bulk storage segment:
NAND capacity shipped in the second quarter, including for phones and other smart devices (some 40 per cent of capacity shipped), and enterprise storage, was about 35 exabytes. The total HDD capacity shipped number was 159.5 exabytes, almost five times larger, with some 58 exabytes constituting nearline/high-capacity enterprise disk drives.

So bulk storage could consume nearly twice the entire flash production, leaving none for the higher-value uses such as phones. Note that these numbers, combined with Aaron Rakers' revenue estimates:

              Revenues in 2nd Quarter   Annual Change   Quarter Change
Flash         c$13.2 Bn                 55%             8%
Disk Drives   c$5.7 Bn                  -5.5%           -4%

imply that flash averages $0.38/GB where HDD averages $0.036/GB, making disk roughly ten times cheaper per byte.
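The per-gigabyte figures follow from dividing each revenue estimate by the capacity shipped. Here is a minimal Python sketch of that arithmetic. It assumes decimal units (1 exabyte = 10^9 GB) and treats the approximate "c$" revenues as exact; the function and variable names are illustrative only.

```python
# Decimal units: 1 exabyte = 10**18 bytes = 10**9 gigabytes.
GB_PER_EXABYTE = 10**9


def dollars_per_gb(revenue_usd, exabytes_shipped):
    """Average revenue per gigabyte shipped in the quarter."""
    return revenue_usd / (exabytes_shipped * GB_PER_EXABYTE)


# Figures quoted above; the revenues are approximate ("c$").
flash_per_gb = dollars_per_gb(13.2e9, 35.0)
disk_per_gb = dollars_per_gb(5.7e9, 159.5)
price_ratio = flash_per_gb / disk_per_gb

# Nearline/high-capacity disk exabytes relative to all flash exabytes.
nearline_vs_flash = 58.0 / 35.0

print(f"flash: ${flash_per_gb:.2f}/GB")
print(f"disk:  ${disk_per_gb:.3f}/GB")
print(f"flash costs {price_ratio:.1f}x as much per GB")
print(f"nearline disk capacity is {nearline_vs_flash:.2f}x flash capacity")
```

Dividing revenue by capacity gives a blended average across all product classes, so it says nothing about the price of any particular drive.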

HDD revenue & exabytes

So the industry needs disk vendors to stay in business and continue to invest in increasing density, despite falling unit shipments. Because hard disk is a volume manufacturing business, falling unit shipments tend to put economies of scale into reverse, and reduce profit margins significantly.

Kryder's Law implies that capacity shipped will increase faster than revenues. The graph shows capacity shipped increasing while revenues decrease. The IT industry must hope that this trend continues without killing the goose that is laying this golden egg.

LibUX: Writing for the User Experience with Rebecca Blakiston

Tue, 2017-08-01 12:15


Writing for the User Experience is our second ever Library User Experience Community webinar. Ours is a community of designers, developers, librarians, info architecture people, content strategists, marketing folks, accessibility enthusiasts, and others, started by me — Michael Schofield (@schoeyfeld) — and Amanda L. Goodman (@godaisies). We do our best to make content that pushes the conversation around the user experience and libraries, non-profits, and higher-ed forward.

In this session, Rebecca Blakiston — author of books on usability testing and writing with clarity; Library Journal mover and shaker — talks shop and makes the case for content strategy, using active and authentic voice, removing unnecessary words, writing meaningful titles/headings, using parallelism, and more.

A [really good] transcript is being made and will be available to supporters and patreons.

Rebecca is super

Rebecca Blakiston (@blakistonr) is the team lead for Web Design & User Experience at the University of Arizona Libraries. She is the author of two books: Usability Testing: a Practical Guide for Librarians, and Writing Effectively in Print and on the Web: a Practical Guide for Librarians. She’s also the former Chair of the University Libraries Section, Association of College and Research Libraries (ACRL ULS). In 2016, she was named a Library Journal Mover and Shaker.


These events are bootstrapped with your help. Whether through donations, old-fashioned street-teaming, or talking these events up to the boss, it all goes a long way.

Specific and heartfelt thanks to Lauren Seaton, Stephen Bateman, Emma Boettcher, Emily King, Rum Rubinstein, Kelly Sattler, Anna Stackeljahn, Alyssa Hanson, Amanda Brite, Amy Croft, Tobias Treppmann, Stephanie Van Ness, and Angie Chan-Geiger.

Novare Library Services provides our webinar space, and records and archives our video. They specialize in IT solutions for libraries and small businesses. In addition to our Library User Experience Community webinars, they’re behind a bunch of other events, too.


Islandora: New Partner: Born-Digital

Tue, 2017-08-01 12:11

The Islandora Foundation is very happy to announce that long time member Born-Digital (also known as Common Media) has become a Partner in the Foundation. A service company that specializes in support for open source digital tools in the cultural preservation field, Born-Digital has been an active member of the Islandora community for several years, with particularly notable contributions in the Dev-Ops Interest Group, the Islandora ISLE project, and a big presence at both of our conferences. As a part of this new Partner membership, the Islandora Foundation also welcomes Noah Smith to our Board of Directors.

District Dispatch: New report explores rural library technology access

Mon, 2017-07-31 19:24

A new report from the Office for Information Technology Policy focuses attention on the capacity of rural public libraries to deploy Internet-enabled computing technologies and other resources to meet the needs of their residents.

“Rural Libraries in the United States: Recent Strides, Future Possibilities, and Meeting Community Needs” explores nuances of rurality, details challenges rural libraries face in maximizing their community impacts and describes how existing collaborative regional and statewide efforts help rural libraries and their communities.

Authors Brian Real and Norman Rose combine data from the final Digital Inclusion Survey with Public Libraries Survey data from the Institute of Museum and Library Services to find:

  • Sixty percent of rural libraries have a single location as part of their administrative system, hampering economies of scale.
  • Rural libraries furthest from population centers (“rural remote”) are most likely to be single-outlet entities and lag rural counterparts (“rural distant” and “rural fringe”) in most measures of operational capacity.
  • Rural library broadband capacity falls short of benchmarks set for U.S. home access, which is 25 Mbps download and 4 Mbps upload speeds. By contrast, rural fringe libraries average 13/8.6 Mbps, rural distant is 7.7/2.2 Mbps and rural remote is 6.7/1 Mbps.
  • Overall, one in 10 rural libraries report their internet speeds rarely meet patron needs.
  • Rural libraries are on par with colleagues in larger communities in terms of public wi-fi access and assisting patrons with basic computer and internet training, but more specialized training and resources can lag.
  • More than half of all rural libraries offer programs that help local residents apply for jobs and use job opportunity resources (e.g., online job listings, resume software), and rural libraries are comparable to their peers in providing work space for mobile workers.
  • Significant proportions of all rural libraries (even the most remote) offer programs and services related to employment, entrepreneurship, education, community engagement and health and wellness.
  • The level of programming and services is particularly noteworthy in light of staffing levels: 4.2 median FTE for rural fringe, 2.0 for rural distant and just 1.3 for rural remote libraries.
  • Rural libraries were the least likely to report renovations had taken place in the past five years; about 15 percent, compared with a national average of 21 percent. The Digital Inclusion Survey noted a relationship between facility updates and services and library program offerings.

Finally, the authors consider the roles of state and regional cooperation in adding capacity and resources for rural libraries, looking at examples from Maryland and Iowa.

One-third of all U.S. public libraries serve areas with populations of 2,500 or fewer people, and this new report provides one of the most detailed looks at their services available to date.

The post New report explores rural library technology access appeared first on District Dispatch.

District Dispatch: Net neutrality, e-rate hot topics again in Washington

Mon, 2017-07-31 14:00

Telecommunications policy has figured prominently in the Washington Office’s work recently. Most visibly, ALA participated actively with scores of other organizations, companies and trade associations in a nationwide “Day of Action” on July 12 to let the Federal Communications Commission know that we strongly oppose its pending anti-net neutrality proposal and filed initial comments (joined by American Association of Law Libraries and COSLA) with the FCC to that effect. Recently, both the House and Senate held committee hearings at which we anticipated ALA priority issues – most notably net neutrality and potential changes in the E-rate program – being prominently discussed, as they were. We worked with key members of Congress serving on these committees to submit questions and background material ahead of the hearings to be placed in their official records. More on these strategic committee meetings follows:

Senate Holds Nominations Hearings for Three FCC Commissioners

The Senate Commerce Committee recently held a hearing on three nominees to the FCC who would fill out the current vacancies at the Commission. Two of those tapped, FCC Chairman Ajit Pai and former FCC Commissioner Jessica Rosenworcel, are well known to the Senate and ALA. The third nominee, Brendan Carr, has not previously served as a Commissioner though he has been an attorney at the Commission since 2012. All three nominees are expected to be confirmed by the Senate.

ALA noted with interest the dialogue surrounding E-rate and net neutrality at the hearing. While all three nominees agreed that E-rate continues to be an important conduit for affordable broadband to libraries and schools, Chairman Pai and nominee Carr declined to commit to maintaining its present funding level or to taking a “hands-off” approach to changing E-rate modernization orders just adopted in 2015 and not yet fully implemented. Rosenworcel, a longtime supporter of the E-rate program, noted that “the future belongs to the connected. No matter who you are or where you live in the country, you need access to modern communication for a fair shot at 21st century success.”

Chairman Pai also declined to commit to any firm position on net neutrality, as the Commission has only just begun reviewing the millions of public comments submitted on his proposal to effectively reverse the current net neutrality law that ALA strongly backs.

FCC Oversight and Reauthorization at House Subcommittee

The three sitting FCC Commissioners – Chairman Ajit Pai, Commissioners Mignon Clyburn and Mike O’Rielly – appeared last week before the House Energy and Commerce Subcommittee on Communications and Technology. They addressed a range of telecommunications issues with net neutrality figuring especially prominently in the hearing. The Commissioners received numerous questions on the issue from both Republicans and Democrats on the Subcommittee. As noted above, ALA continues to oppose any legislation that would reverse the 2015 FCC Open Internet Order.

At the hearing, full Committee Chairman Greg Walden (R-OR) expressed interest in bi-partisan legislation to address net neutrality. Chairman Walden noted that “it’s time for Congress to call a halt on the back-and-forth and set clear net neutrality ground rules for the internet.” There appears, however, to be very little interest among Democratic members in joining the Chairman.

Several senior Subcommittee Democrats criticized the FCC’s proposed rule to reverse the 2015 Order. Rep. Peter Welch (D-VT) asked, “Why change the existing regime where everyone agrees that there is an open internet?” Rep. Mike Doyle (D-PA) criticized Chairman Pai for proceeding on “an agenda that is anti-consumer, anti-small business, anti-competition, anti-innovation, and anti-opportunity.” Also echoing ALA’s position on net neutrality were senior Democrats Rep. Frank Pallone (D-NJ) and Rep. Anna Eshoo (D-CA).

Congresswoman Eshoo recently hosted a net neutrality roundtable in her California district. Director of Redwood City Public Library Derek Wolfgram joined the panel to discuss the importance of net neutrality for libraries.

The House Subcommittee also questioned the FCC commissioners on a discussion draft of legislation that would reauthorize the Commission. The Republican draft, not yet introduced, would reauthorize the FCC through 2022 and implement procedural changes at the Commission. The FCC was last reauthorized in 1990.

ALA will continue to work with E-rate and net neutrality supporters in the House and Senate over the coming months. Stay tuned as these issues develop.

The post Net neutrality, e-rate hot topics again in Washington appeared first on District Dispatch.

Hugh Rundle: Identify. Not too much. Mostly collections.

Sun, 2017-07-30 04:17

This week I’ve been using one of the tools I learned about at VALA Tech Camp to clean up membership data as part of a migration we’re doing at MPOW from the aging Amlib system to Koha ILS. OpenRefine is a powerful data cleaning tool that can be thought of as Excel with regex. As we’ve been working on getting data migration right, we hit a bit of a snag. In Koha, each discrete part of a member’s address is in a different field:

  • street number
  • street address1
  • street address2
  • city (suburb)
  • state
  • postcode


This is quite typical of most databases storing address data. Amlib, however, is a fairly old system with some data architecture that is ...interesting. For whatever reason, the address data coming out of Amlib is all stored in a single field. Combine that with multiple possibilities for how addresses can appear, and literally twenty years of data entry into a single free text field, and I’m sure you can imagine how consistent the data is. Working out how to split it out in a sane way that minimises work later has been time consuming. Part of the problem is having to consider all the possible ways an address may have been entered. Even when library staff have followed procedures correctly, the data is still inconsistent - for example some records list a phone number first, then the street address, whereas others include no phone number. Consider all of the following possibilities, which are all ‘correct’:

0401 234 567, 1 Main St, Booksville, 3020

0401234567, 1 Main St, Booksville, 3020

1 Main St, Booksville, 3020

90123456, 1 Main St, Booksville, 3020

9012 3456, 1 Main St, Booksville, 3020

9012-3456, 1 Main St, Booksville, 3020

There are thousands of examples of all these types of records within our 130,000 member records. Initially, it looked like these were the major differences. Urban and suburban Australia tends to have very few street numbers above 999, partially because streets often change their name when they hit a new suburb. I wrote a quick regex query in OpenRefine to find every record where the first four characters didn’t include a space, and created a new column with the part before the first comma for records matching that query. That was fine until I realised that “P.O. Box 123” would appear to be a phone number under this rule, so I adjusted to exclude anything with a space or a full stop. That was the easy bit. Addresses aren’t as simple as you might think:

1 Main St, Booksville, 3020

Unit 1, 10 Main St, Booksville, 3020

1/10 Main St, Booksville, 3020

F1/10 Main St, Booksville, 3020

F1 10 Main St, Booksville, 3020

Unit 1, The Mews, 1 Main St, Booksville, 3020

1 Main St, Booksville, Vic, 3020

Welcome to regex hell. After a bit of trial and error, I eventually split out the ‘number’ from each address. There are some edge cases where the address information somehow ended up with no commas at all or was incomplete, that we will need to clean up manually, but that’s probably about 4000 out of 130,000, which isn’t so bad. I’ll post something on GitHub at some point with some of the formulas I used to clean the data up for import - for when all you Amlib libraries move over to Koha amirite?
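The kind of splitting described above can be sketched in Python. This is not the OpenRefine formula used in the migration. It is a minimal illustration that swaps the "first four characters" heuristic for a digit-count check, and every rule in it is an assumption: a phone number is 8 digits, or 10 digits starting with 0, once spaces and hyphens are removed; postcodes are four digits; and the last remaining segment is the street. All function and field names are hypothetical.

```python
import re

# Assumption: a leading segment is a phone number if, with spaces and hyphens
# removed, it is 8 digits (landline) or 10 digits starting with 0 (mobile).
PHONE = re.compile(r"\d{8}|0\d{9}")

# Assumption: a street segment is an optional unit ("1/", "F1/" or "F1 "),
# then the street number, then the street name.
STREET = re.compile(
    r"(?:(?P<unit>[A-Za-z]?\d+)[/ ])?(?P<number>\d+[A-Za-z]?)\s+(?P<street>\D.*)"
)
UNIT_SEGMENT = re.compile(r"unit\s+(\w+)", re.IGNORECASE)
STATES = {"VIC", "NSW", "QLD", "SA", "WA", "TAS", "NT", "ACT"}


def split_member_address(raw):
    """Split one free-text address line into discrete fields (hypothetical names)."""
    parts = [p.strip() for p in raw.split(",") if p.strip()]
    rec = {"phone": None, "unit": None, "number": None, "street": None,
           "extra": [], "city": None, "state": None, "postcode": None}

    # Peel known pieces off the front and back of the comma-separated list.
    if parts and PHONE.fullmatch(re.sub(r"[\s-]", "", parts[0])):
        rec["phone"] = parts.pop(0)
    if parts and re.fullmatch(r"\d{4}", parts[-1]):
        rec["postcode"] = parts.pop()
    if parts and parts[-1].upper() in STATES:
        rec["state"] = parts.pop()
    if len(parts) > 1:
        rec["city"] = parts.pop()
    if parts:
        unit = UNIT_SEGMENT.fullmatch(parts[0])
        if unit and len(parts) > 1:
            rec["unit"] = unit.group(1)
            parts.pop(0)

    # Whatever is left is the street segment, optionally preceded by
    # building names such as "The Mews".
    if parts:
        rec["extra"] = parts[:-1]
        street = STREET.fullmatch(parts[-1])
        if street:
            rec["unit"] = rec["unit"] or street.group("unit")
            rec["number"] = street.group("number")
            rec["street"] = street.group("street")

    # Anything that did not parse cleanly is left for manual clean-up.
    rec["needs_review"] = None in (rec["number"], rec["city"], rec["postcode"])
    return rec
```

Under these assumptions, "F1/10 Main St, Booksville, 3020" comes out as unit F1, number 10 and street Main St, while a record with no commas at all is flagged for manual review rather than guessed at.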

A need to know basis

Going through this process has helped me to keep top of mind something that all librarians, indeed everyone working with any database of personal information, needs to constantly question:

What data are we storing about people, and do we need to store it?

For example, public libraries generally record a member’s sex or gender (no distinction is generally made, though usually it’s labelled as ‘sex’ but actually means ‘gender’). Why? Do I need to know the gender of a member in order to provide information, advice or assistance? The only real argument I’ve heard about this is that it assists in finding members in the database when they do not have their membership card, but that seems to be a fairly weak argument for storing a sometimes intensely personal data point that isn’t always readily ascertained, and can change over time. Of course, most public libraries, in Australia at least, aren’t necessarily able to make decisions like this alone. National, state and local government standards about ‘minimum data sets’ define what must or at least should be collected, sometimes seemingly in contradiction of privacy standards. Once we ask this question of whether we need to store certain data at all, however, another one pops up, in some ways just as important.

How are we storing this data about people?

I don’t mean databases vs index cards here. What was frustrating me about migrating user address data was the process of normalising it. Koha wants address data to be chopped into discrete data points - street number, street name, city/suburb, etc. Amlib just stores it as one field, so I need to ‘normalise’ the Amlib data to fit into Koha’s database model. These questions of course feed into each other. Why you want the data affects how you record it. How you record it affects how it can be used. In the case of postal addresses this is pretty innocuous. The fact Koha chops it up like this makes it much easier to correctly format postal addresses on library notices, and allows the system to conform to many different postal service standards in terms of whether the street number is listed first, or the state before the postcode or after it, for example.
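A small sketch shows why discrete fields make notices easier to format. The field names here are hypothetical stand-ins, not Koha's actual column names, and the two layouts are illustrative conventions rather than any postal service's official standard.

```python
from dataclasses import dataclass


@dataclass
class MemberAddress:
    # Hypothetical field names for illustration only.
    street_number: str
    street: str
    city: str
    state: str
    postcode: str


def format_number_first(a):
    """Street number before the street name, state before the postcode."""
    return f"{a.street_number} {a.street}\n{a.city} {a.state} {a.postcode}"


def format_number_last(a):
    """Street number after the street name, postcode before the city."""
    return f"{a.street} {a.street_number}\n{a.postcode} {a.city}"


address = MemberAddress("1", "Main St", "Booksville", "VIC", "3020")
print(format_number_first(address))
print(format_number_last(address))
```

With a single free-text field, neither layout could be produced reliably without first parsing the text back into these parts.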

But normalising, by definition, smooths out inconvenient differences in how information is turned into data. Consider the gender data point - the overwhelming majority of systems (and the Australian national government data standards) allow, at most, three options - male, female, or ‘not known’. O'Reilly's Introduction to SQL book even uses gender as an example of a data point that only has two possible options. Note that the assumption here is that if someone’s gender is known, then it must be binary - either male or female, so if it is known that you identify as something else it has to be recorded incorrectly. This is why Tim Sherratt cautioned in A Map and Some Pins that even “open data” needs to be viewed critically and its biases interrogated: data must always, to some extent, be closed. Categories have been determined, data has been normalised, decisions made about what is significant and why. There is power embedded in every CSV file, arguments in every API. This is inevitable. There is no neutral position.

There is no neutral position. This is the case whether we are describing people in our user databases or people within the books that line our shelves. Under pressure from librarians and the ALA, the Library of Congress decided in March 2016 to replace the term “Illegal aliens” with the terms “Noncitizens” and “Unauthorized immigration”. In the middle of a nasty Presidential election campaign, this was inevitably controversial.

When we classify items in our collections, we are deciding as much what terms not to use as we are deciding what terms to use. When Sherratt says we are determining categories, he is also pointing out that we have determined what categories are not used, not appropriate, not valid. When we decide what is significant, we also decide what is insignificant. Every act of classification is an act of erasure. Every ‘finding aid’ also helps to hide things.

Never normalise

Discussions about and changes in how people are described in library collections - whether due to their sexuality and gender, their ethnicity, or their religion - are important, but insufficient. The terms we use to classify people within our collections can affect the broader discourse. But it isn’t just in our collections that we classify, catalogue, and define. Every piece of data recorded about library users is a potential landmine. “People who are Jewish”, “People whose gender identity doesn't match their biological sex”, and “People who read books about socialism” are all identities that have been a death sentence at various times and places. As former NSA chief, General Michael Hayden, put it so clearly: “We kill people based on metadata”. If you’re keeping data about users, you need to think about the worst case scenario, and mitigate against it.

Jarrett M Drake, addressing the British Columbia Library Association earlier this year and seeing the danger, had a simple piece of advice: “Never normalize”:

...the rising tide of fascism should offer pause regarding the benefits of normalized data that can easily be piped from system to another. Local languages, taxonomies, and other forms of knowledge that only people within specific communities can decipher might well be a form of resistance in a country where a president not only advocates for a Muslim database but also for “a lot of systems… beyond databases.” In other words, a world of normalized data is a best friend to the surveillance state that deploys the technologies used to further fascist aspirations.

Identities can be exciting, empowering, and comforting. But they can also be stifling, exclusive, or dangerous. An identity can be something you embrace, accept, or create, but it can just as easily be something that is given to or stamped upon you - sometimes literally. Identity is Pride March and St Patrick’s Day, but it’s also the Cronulla riots and Auschwitz tattoos. In libraries, as well as other cultural and memory institutions like archives and museums, we must take care in how we identify objects and people.

In these public institutions there is no neutral position. Every identity is dangerous. Every database is a site of erasure. Every act is a political act.

District Dispatch: What’s NAFTA got to do with it?

Sat, 2017-07-29 16:45

Many may not realize that trade treaties can impact copyright law, by not including exceptions that are important for library services, research, user access, and fair use. So, when the U.S. Trade Representative (USTR) asked for comments before negotiations to re-write the North American Free Trade Agreement (NAFTA) get underway, the Library Copyright Alliance (LCA) took the opportunity to provide our perspective in a letter. Our message hasn’t changed: Congress put exceptions in the copyright law for a reason, so trade negotiators, don’t mess around with our copyright law, even when interested parties urge you to do so.

“The Trading Post” by Clinton Steeds is licensed under CC BY 2.0

During Trans-Pacific Partnership (TPP) negotiations in 2012, LCA was happy to see balanced copyright recognized as a desirable element of the treaty, with library exceptions, including fair use, written into the treaty language:

“Each Party shall endeavor to achieve an appropriate balance in its copyright and related rights system, inter alia by means of limitations or exceptions that are consistent with Article 18.65 (Limitations and Exceptions), including those for the digital environment, giving due consideration to legitimate purposes such as, but not limited to: criticism; comment; news reporting; teaching, scholarship, research, and other similar purposes; and facilitating access to published works for persons who are blind, visually impaired, or otherwise print disabled.” (TPP Article 18.66).

The Library Copyright Alliance recommended to NAFTA negotiators that this same language be included in the treaty. In addition, LCA asked that first sale or “exhaustion” be addressed. This is the U.S. exception that allows librarians to lend books, and more broadly allows consumers with lawfully acquired copies of a work the right to distribute that work without authorization. Without exhaustion, there would be no eBay, no Salvation Army collection centers and no second-hand book stores. If it were included in the treaty, first sale policy would advance into the international realm, which would be interesting because many countries do not have first sale in their copyright laws. Of course, that would be a baby step.

LCA also submitted comments on intermediary safe harbors that ensure libraries will not be held liable for the actions of library users. Additionally, LCA addressed copyright term, the public domain, and DRM (digital rights management).

This is just the beginning of a trade negotiation process that will be hidden from the public, unless parts of the treaty are leaked (which often occurs). Only private sector players can negotiate, so it is extremely important to have library concerns that represent the public interest on record. Once the treaty is approved, it will still have to pass in the Senate by a two-thirds vote. The Senate’s option will be “take it or leave it” because the treaty cannot be modified without going back to the drawing board to seek each country’s approval. Because the current administration has made trade a priority, we may see a trade treaty negotiated more quickly than usual. LCA will follow its developments.

The post What’s NAFTA got to do with it? appeared first on District Dispatch.

Tara Robertson: UBC’s Open Dialogues Series: How to make open content accessible

Fri, 2017-07-28 19:23

A couple of months ago I had the pleasure of chatting with the folks from the Centre for Teaching and Learning at UBC about accessibility, universal design for learning and inclusion. I’m really happy with how this video turned out. I love that captioning is now part of their production workflow, and not an afterthought. Yay born accessible content!

I’m also thrilled that the Accessibility Toolkit I co-wrote with Sue Doner and Amanda Coolidge has been remixed by UBC  for their guide on creating accessible resources.

Evergreen ILS: Evergreen 2.11.7 and 2.12.4 released

Fri, 2017-07-28 19:05

The Evergreen community is pleased to announce two maintenance releases of Evergreen: 2.11.7 and 2.12.4.

Evergreen 2.12.4 has the following changes improving on Evergreen 2.12.3:
  • A fix to a web client bug where adding copies through the Add Volumes and Copies menu item could fail silently.
  • A fix to a bug that allowed users to access some web client admin interfaces without a login.
  • A fix to the display of the loan duration and fine level fields in the web client Item Status Detail view.
  • A fix to the display of duplicate data on the bib record View Holds page when toggling between the holds and OPAC view.
  • A fix to a bug that prevented the web client patron registration page from loading.
  • Support for Org Unit Includes alert text, notice text, event text, header text, and footer text in the web client print templates.
  • A fix to make the web client MARC Editor’s flat text editor selection sticky.
  • A fix to make the Patron Search library selector sticky.
  • A fix to a bug in the web client that prevented the user from saving a new copy after using the MARC Edit Add Item option.
  • A fix to a patron registration bug that did not require the entry of a required user statistical category for stat cats that do not allow free-text entries.
  • The addition of the bad barcode image file in the web client.
  • An improvement to the MARC Batch Edit progress indicator to reduce the likelihood of system backlogs.
  • Downloading checkout history as a CSV from My Account has been fixed for users with a large circulation history. Previously, this would time out for patrons with more than 100 or so circulations.
  • A fix to syntax in the Spanish lang.dtd file that was creating an error when using the Closed Date Editor.
  • Improvements to CSS to silence some Mozilla extension warnings.
  • A fix to a failure to update targeted circulations when utilizing the recall functionality.
  • The addition of text wrapping in the copy details table on the bib record to prevent contents from falling off the page.
  • A fix to the adjust to zero option so that it can be applied correctly to multiple billings.
  • A fix to the “Hold/Copy Ratio per Bib and Pickup Library (and Descendants)” data source so that it will now include counts of eligible copies at locations that are not a pickup library for bib’s holds.
  • A fix to the XUL client Item Status → Alternate View → Holds / Transit tab so that it properly refreshes all data when switching between copies.

Note that any report templates using the “Hold/Copy Ratio per Bib and Pickup Library (and Descendants)” reporting source will need to be recreated for the change to be effective.

Evergreen 2.11.7 includes the following changes improving on 2.11.6:
  • Improvements to CSS to silence some Mozilla extension warnings.
  • A fix to a failure to update targeted circulations when utilizing the recall functionality.
  • The addition of text wrapping in the copy details table on the bib record to prevent contents from falling off the page.
  • A fix to the adjust to zero option so that it can be applied correctly to multiple billings.
  • A fix to the “Hold/Copy Ratio per Bib and Pickup Library (and Descendants)” data source so that it will now include counts of eligible copies at locations that are not a pickup library for bib’s holds.

Please visit the downloads page to view the release notes and retrieve the server software and staff clients.

Brown University Library Digital Technologies Projects: Python 2 => 3

Fri, 2017-07-28 17:58

We’ve recently been migrating our code from Python 2 to Python 3. There is a lot of documentation about the changes, but these are changes we had to make in our code.


First, the print statement had to be changed to the print function:

print 'message'

became

print('message')

Text and bytes

Python 3 changed bytes and unicode text handling, so here are some changes related to that:

json.dumps required a unicode string, instead of bytes, so




basestring was removed, so

isinstance("", basestring)

became

isinstance("", str)

This change to explicit unicode and bytes handling affected the way we opened files. In Python 2, we could open and use a binary file, without specifying that it was binary:


In Python 3, we have to specify that it’s a binary file:

open('', 'rb')
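As a minimal sketch of what this looks like in practice (the temporary file and its contents are placeholders for illustration, not taken from our code): in Python 3, opening with 'rb' gives back bytes, and turning them into text is an explicit decoding step.

```python
import os
import tempfile

# Write a few non-ASCII bytes to a throwaway file (placeholder data).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write("t\u00ebst".encode("utf-8"))
    path = tmp.name

# Mode 'rb' yields bytes in Python 3; nothing is decoded implicitly.
with open(path, "rb") as f:
    raw = f.read()

# Decoding to text is a separate, explicit step.
text = raw.decode("utf-8")

os.remove(path)
```

Here raw is a bytes object and text is a str, which is the distinction Python 3 forces us to make.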

Some functions couldn’t handle unicode in Python 2, so in Python 3 we don’t have to encode the unicode as bytes:




Of course, Python 3 reorganized parts of the standard library, so the last line would actually be:

urllib.parse.quote('tëst')

Dicts

There were also some changes to Python dicts. The keys() method now returns a view object, so

dict.keys()

became

list(dict.keys())

dict.iteritems()

also became

dict.items()

Virtual environments

Python 3 has virtual environments built in, which means we don’t need to install virtualenv anymore. There’s no in Python 3 environments, though, so we switched to using django-dotenv instead.
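The built-in support is the standard-library venv module, usually run from the command line as python3 -m venv followed by a directory name. As a rough sketch (not our actual deployment setup; the directory name is a placeholder), the same module can also be called from Python:

```python
import os
import tempfile
import venv

# Create an environment in a throwaway directory (placeholder location).
with tempfile.TemporaryDirectory() as tmp:
    env_dir = os.path.join(tmp, "env")
    # with_pip=False keeps this sketch quick and offline.
    venv.create(env_dir, with_pip=False)
    # Every environment made by venv gets a pyvenv.cfg file at its root.
    has_config = os.path.isfile(os.path.join(env_dir, "pyvenv.cfg"))
```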


Some more changes we made include imports:

from base import * => from .base import *

function names:

func.func_name => func.__name__

and exceptions:

exception.message => str(exception)

except Exception, e => except Exception as e

Optional

Finally, there were optional changes we made. Python 3 uses UTF-8 encoding for source files by default, so we could remove the encoding line from the top of files. Also, the unicode u'' prefix is allowed in Python 3, but not necessary.
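Several of the changes listed in this post can be checked directly in a Python 3 interpreter. The snippet below is only a sketch with made-up names and data (counts, greet), not code from our projects:

```python
from urllib.parse import quote

counts = {"a": 1, "b": 2}  # placeholder data

# keys() returns a view; wrap it in list() when a real list is needed.
keys_view = counts.keys()
keys_list = list(counts.keys())

# The view tracks later changes to the dict; the list does not.
counts["c"] = 3
view_len = len(keys_view)

# items() replaces the removed iteritems().
pairs = sorted(counts.items())


def greet():
    # Hypothetical function used only to raise an exception.
    raise ValueError("bad value")


try:
    greet()
except ValueError as e:  # "as" replaces the old comma syntax
    message = str(e)     # str(e) replaces e.message

func_name = greet.__name__  # replaces func.func_name

# quote() accepts a str directly; no manual encoding to bytes.
quoted = quote("t\u00ebst")
```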

District Dispatch: Email privacy protection measures introduced in Senate

Fri, 2017-07-28 15:06

Nearly six months ago, the Email Privacy Act (H.R. 387) was approved overwhelmingly in the House. Now, bipartisan legislation just introduced in the Senate goes further. It fully incorporates and significantly expands the protections laid out in H.R. 387 to comprehensively update the 1986 Electronic Communications Privacy Act (ECPA). The “ECPA Modernization Act of 2017” was co-authored by Sens. Mike Lee (R-UT) and Patrick Leahy (D-VT). It will be referred to the Senate Judiciary Committee on which both serve.

ALA has long been a staunch supporter of comprehensive ECPA reform, which has been proposed but failed to pass in the past several Congresses. President James Neal greeted the milestone introduction with this public statement:


“No freedoms are more vital, and important to librarians, than those of inquiry and speech. Without real privacy, Americans effectively have neither. Current law that allows our government to get and view the full content of our most private electronic communications without a search warrant isn’t just outdated, it’s dangerous in a democracy. ALA strongly supports the bipartisan Lee/Leahy “ECPA Modernization Act” to finally and fully bring the Electronic Communications Privacy Act – and with it our fundamental rights to privacy, inquiry and speech – into the modern era.”

Like the House’s bill, the ECPA Modernization Act will for the first time require a warrant for authorities to access the content of many forms of electronic communications not now protected. It also goes further to impose a similar requirement for “geo-location” information from cell phones. In addition, among other important new measures outlined on Sen. Lee’s website, the bill puts “teeth” in the cell phone location clause by permitting courts to suppress such evidence if acquired in an illegal warrantless search.

No action in the Judiciary Committee is anticipated on the bill before the Senate recesses for its August break. ALA and fellow public and private sector members of the Digital Due Process coalition collectively will be pushing hard in the fall, however, for adoption of this potentially landmark legislation. (You can read many of their statements of support for the bill here.)

As a hedge against this ambitious reform package stalling, supporters also introduced a second bill identical to H.R. 387 as adopted by the House in February. Were the Senate to pass that more limited but still valuable measure, it would move directly to the President’s desk for signature. The broader ECPA Modernization Act, if passed by the Senate, would require further consideration and approval by the House. Its currently broader scope could make that difficult.

The post Email privacy protection measures introduced in Senate appeared first on District Dispatch.

District Dispatch: New legislation would protect your right to research

Thu, 2017-07-27 20:53

ALA applauds the introduction of the Fair Access to Science and Technology Research Act (FASTR). Reps. Mike Doyle (D-PA), Kevin Yoder (R-KS), and Zoe Lofgren (D-CA) introduced the bipartisan legislation as H.R. 3427 yesterday.

FASTR would ensure that, when taxpayers fund scientific research, they are able to freely access the results of that research. Every federal agency that significantly funds research would have to adopt a policy to provide for free, online public access to research articles resulting from that public funding.

As our colleagues at SPARC explain:

The government funds research with the expectation that new ideas and discoveries resulting from that research will advance science, stimulate innovation, grow the economy, and improve the lives and welfare of Americans. The Internet makes it possible to advance these goals by providing public online access to federally funded research and has revolutionized information sharing by enabling prompt sharing of the latest advances with every scientist, physician, educator, entrepreneur and citizen.

FASTR would build on the law, first signed by then-President George W. Bush, that created the National Institutes of Health’s Public Access Policy. Subsequently, the White House Office of Science and Technology Policy under then-President Barack Obama directed other agencies to adopt similar plans to make their research transparent. FASTR would codify and strengthen that directive and speed up public access to this important information.

ALA welcomes the growing bipartisan recognition that public access to information accelerates innovation and encourages Congress to “move FASTR.”

The post New legislation would protect your right to research appeared first on District Dispatch.

LITA: Technical Debt: that escalated quickly

Thu, 2017-07-27 19:00

If you’re not familiar with the term “technical debt”, it’s an analogy coined by Ward Cunningham[1] to describe what happens when, rather than following best practices and standards, we take shortcuts on technical projects to get a quick fix. Debt occurs when we take on a long-term burden in order to gain something in the short term.

I want to note that we will inevitably take on some sort of debt, often unknowingly and usually while learning. The phrase “hindsight is 20/20” comes to mind: we see where we went wrong after the fact. There is also inherited technical debt, the bit that you can’t control. In all of my jobs, current and past, I’ve inherited technical debt; it happens, and I still need to learn how to deal with it. This piece aims to share some guidelines and bits I’ve learned over the years in dealing with technical debt and doing my best to manage it, because really, it’s unavoidable and ignoring it doesn’t make it go away. Believe me, I’ve tried.

Technical debt can refer to many different things including, but not limited to: infrastructure, software, design/UX, or code. Technical debt reduces the long term agility of a team; it forces us to rely on short term solution thinking and make trade-offs for short term agility. When taken on haphazardly and left unmanaged, technical debt can shut down a team’s long term agility: its ability to move forward on a project.

It accrues quickly, and often we don’t realize just how quickly. For example, I’d been tasked with implementing single sign-on (SSO) for a multitude of applications in our library. While mapping out a plan of action, I learned that most of the applications needed to be updated before we could implement the pieces required for SSO. The newer versions weren’t compatible with the version of PHP running on our servers, and moving to a compatible version of PHP meant a major upgrade of our server, which in turn led to a full server upgrade and migration. Needless to say, SSO has not yet been implemented. This technical debt accrued from a previous admin’s decisions not to stay on top of the upgrades for many of our applications, because short term hacks were put in place and the upgrades would break those hacks. These decisions to take on technical debt ultimately caught up with us and halted our ability to move forward on a project. Whether the debt is created under your watch or inherited, it will eventually need to be addressed.

The decisions that result in technical debt should be made from a strategic engineering perspective. Debt can be intentional or unintentional; ideally it should only be accrued on purpose, because it enables some business goal. Steve McConnell’s talk on Managing Technical Debt [2] does a good job of laying out the business and technical aspects of taking on technical debt. Following that, there should be a plan in place for reasonably reducing the debt down the road. If technical debt is left unaddressed, at some point the only light at the end of the tunnel is to declare bankruptcy or, to carry the analogy through, just blow it up and start over.

Technical debt is always present. It’s not always bad either, but it’s always on the verge of getting worse. It is important to have ways of hammering through it, as well as having preventative measures in place to keep debt to a minimum and manageable for as long as possible.

So how do you deal with it?

Tips for dealing with inherited technical debt:

  • Define it. What counts as technical debt? Why is it important to do something about it?
  • Take inventory, know what you’re working with.
  • Prioritize your payoffs. Pick your technical battles carefully, which bits need addressing NOW and which bits can be addressed at a later date?
  • Develop a plan on what and how you’re going to address and ultimately tidy up the debt.
  • Track technical debt. However you track it, make sure you capture enough detail to identify the problem and why it needs to be fixed.

Preventative tips for avoiding technical debt (as much as you can):

  • Before taking on debt ask yourself…
    • Do we have estimates for the debt and non-debt options?
    • How much will the quick & dirty option cost now? What about the clean options?
    • Why do we believe that it is better to incur the effort later than to incur it now? What is expected to change to make taking on that effort more palatable in the future?
    • Have we considered all the options?
    • Who’s going to own the debt?
  • Define initial requirements in a clear and consistent style. A good example of this is Gherkin.
  • Create best practices. Some examples:  KISS (Keep It Simple Stupid), DRY (Don’t Repeat Yourself), YAGNI (You Aren’t Gonna Need it)
  • Have a standard, an approved model of taking shortcuts, and stick to it. Remember to also reevaluate that standard periodically, what once was the best way may not always be the best way.
  • Documentation. A personal favorite: the “why-and” approach. If you take a temporary (but necessary) shortcut, make note of it and explain why you did what you did and what needs to be done to address it. Your goal is to avoid having someone look at your code/infrastructure/digital records/etc and ask “why is it like that?” Also for documentation, a phenomenal resource (and community) is Write The Docs.
  • Allow for gardening. Just as you would with a real garden, you want to tidy up things in your projects sooner rather than later. These are general maintenance tasks that can be done now to improve code, systems, etc., rather than filed on the low priority “to-do” list.
  • TESTS! Write/use automated tests that will catch bugs and issues before your users do. I’m a fan of using tools like Travis CI, Cucumber, Fiddler and Nagios for testing and monitoring. Another resource recommended to me (thanks Andromeda!) is Obey the Testing Goat.
  • Remember to act slower than you think. Essentially, think through how something should be done before actually doing it.

And my final thought, commonly referred to as the boy scout rule: when you move on from a project or team and someone else inherits what you leave behind, do your best to leave it better than you found it.

  1. Ward Cunningham, Explaining Debt Metaphor [Video]
  2. Managing Technical Debt by Steve McConnell (slides)
Extra Reading/Tools:

How to deal with technical debt? by Vlad Alive

Obey the Testing Goat by Harry Percival

How to write a good bug report? Tips and Tricks

Tools & Services list

Don’t take the technical debt metaphor too far 

David Rosenthal: Decentralized Long-Term Preservation

Thu, 2017-07-27 16:46
Lambert Heller is correct to point out that:
name allocation using IPFS or a blockchain is not necessarily linked to the guarantee of permanent availability, the latter must be offered as a separate service.

Storage isn't free, and thus the "separate services" need to have a viable business model. I have demonstrated that increasing returns to scale mean that the "separate service" market will end up being dominated by a few large providers just as, for example, the Bitcoin mining market is. People who don't like this conclusion often argue that, at least for long-term preservation of scholarly resources, the service will be provided by a consortium of libraries, museums and archives. Below the fold I look into how this might work.

These institutions would act in the public interest rather than for profit, and thus somehow be exempt from the effects of increasing returns to scale. Given the budget pressures these institutions are under, I'm skeptical. But let's assume that they are magically exempt.

The whole point of truly decentralized peer-to-peer systems is that they cannot be centrally managed; for example by a consortium of libraries. A system of this kind needs management that arises spontaneously by the effect of its built-in incentives on each individual participant. Among the functions that this spontaneous management needs to perform for a long-term storage service is to ensure that:
  • the storage resources needed to meet the demand are provided,
  • they are replaced as they fail or become obsolete,
  • each object is adequately replicated to ensure its long-term viability,
  • the replicas maintain suitable geographic and organizational diversity,
  • the software is maintained to fix the inevitable vulnerabilities,
  • and the software is upgraded as the computing infrastructure evolves through time.

Note that these are mostly requirements on the network as a whole rather than on individual peers. The SEC's report on Initial Coin Offerings recognizes similar needs:
Investors in The DAO reasonably expected Slock.it and its co-founders, and The DAO’s Curators, to provide significant managerial efforts after The DAO’s launch. The expertise of The DAO’s creators and Curators was critical in monitoring the operation of The DAO, safeguarding investor funds, and determining whether proposed contracts should be put for a vote. Investors had little choice but to rely on their expertise.

By contract and in reality, DAO Token holders relied on the significant managerial efforts provided by Slock.it and its co-founders, and The DAO’s Curators, as described above.

Even in the profit-driven world of crypto-currencies, the incentive from profit doesn't always lead to consensus (see the issue of increasing the Bitcoin block size, and the DAO heist), or to the provision of resources to meet the demand (see Bitcoin's backlog of unconfirmed transactions). Since we have assumed away the profit motive, and all we have left is a vague sense of the public interest, the built-in incentives powering the necessary functions will be weak.

This lack of effective governance is a problem in the short-term world of crypto-currency speculation (see the surplus GPUs flooding the market as Ethereum miners drop out). It is a disaster in digital preservation, where the requirement is to perform continuously and correctly over a time-scale of many technology generations. Human organizations can survive much longer time-scales; 8 years ago my University celebrated its 800-th birthday. Does anybody believe we'll be using Bitcoin or Ethereum 80 years from now as it celebrates its 888-th?

We have experience in these matters. Seventeen years ago we published the first paper describing the LOCKSS peer-to-peer digital preservation system. At the software level it was, and has remained through its subsequent evolution, a truly decentralized system. All peers are equal, no peer trusts any other, peers discover others through gossip-style communication. At the management and organizational level, however, formal structures arose such as the LOCKSS Alliance, the MetaArchive and the CLOCKSS Archive to meet real-world demand for the functions above to be performed in a reliable and timely fashion.

Trying by technical means to remove the need to have viable economics and governance is doomed to fail in the medium- let alone the long-term. What is needed is a solution to the economic and governance problems. Then a technology can be designed to work in that framework. Blockchain is a technology in search of a problem to solve, being pushed by ideology into areas where the unsolved problems are not technological.

District Dispatch: The 2017 Congressional App Challenge is live!

Thu, 2017-07-27 14:55

The 2017 Congressional App Challenge is live!

The App Challenge is an annual congressional initiative to encourage student engagement in coding and computer science through local events hosted by the Members of Congress.

Between now and November 1, high school students from across the country will be busy creating an app for mobile, tablet or computer devices.

This year, there are over 165 Members of Congress signed up to participate in the launch! Check to see if your district is participating. If not, we encourage you to connect with your Representative to make sure that s/he does sign up. The App Challenge website also has a library letter template you can use to send to your Member of Congress.

How does it work?
Students work solo and in teams to turn a personal interest or social issue into an app that solves a problem or adds another layer to something they are interested in. In past years students developed apps that help reduce the impact of disease in developing countries; guide you through choosing the best soccer cleats online; allow chemistry students to learn the history of atoms in a virtual reality; translate American sign language into other languages; monitor allergies by scanning product barcodes; and organize your recipe collection.

Every participating district has a winner who is recognized by their Member of Congress and many come to Washington to exhibit their winning app and meet with their Member during the #HouseofCode celebration. The Challenge is sponsored by the Internet Education Foundation and supported by ALA as part of our Libraries Ready to Code (RtC) initiative.

Why code at the library?
Through the Libraries Ready to Code work, we have heard from libraries all over the country about the many ways they facilitate coding programs for youth. The programs are as varied as the libraries and the communities they serve. What we have learned (and what our current RtC Phase III grant program is now promoting!) is that library coding programs should incorporate basic RtC concepts. The App Challenge is a perfect way to bring coding into your library and expose kids to the opportunities coding can open up.

Whether you already have coding programs at your library or not, you can get teens excited about the App Challenge. In addition to building an app, the Challenge introduces teens to the idea of connecting with their elected officials in a fun and creative way. Participating in the Challenge can pave the way for future civic engagement on issues that matter to the teens you work with. At last year’s #HouseofCode event, three young men had designed a climate change strategy game, Code Carbon, and were very excited to talk to their Representative about where she stands on climate change.

There are lots of ways libraries can encourage students to participate in the Challenge! Host an App Challenge event, an “app-a-thon,” a game night for teens to work on their apps, or start an app building club. Students wishing to participate work through their Member of Congress who must sign up.

Again, check to see if your district is participating and connect with your Representative to make sure that s/he does sign up.

If you do participate we want to hear about it! Share using the App Challenge hashtag #CAC17 and ALA’s hashtag #readytocode. The App Challenge runs through November 1.

The post The 2017 Congressional App Challenge is live! appeared first on District Dispatch.

Open Knowledge Foundation: Open Data for Tax Justice design sprint: building a pilot database of public country-by-country reporting

Thu, 2017-07-27 14:36

Tax justice advocates, global campaigners and open data specialists came together this week from across the world to work with Open Knowledge International on the first stages of creating a pilot country-by-country reporting database. Such a database may enable anyone to understand the activities of multinational corporations and uncover potential tax avoidance schemes. 

This design sprint event was part of our Open Data for Tax Justice project to create a global network of people and organisations using open data to improve advocacy, journalism and public policy around tax justice in line with our mission to empower civil society organisations to use open data to improve people’s lives. In this post my colleague Serah Rono and I share our experiences and learnings from the sprint. 


What is country-by-country reporting?

Image: Financial Transparency Coalition

Country-by-country reporting (CBCR) is a transparency mechanism which requires multinational corporations to publish information about their economic activities in all of the countries where they operate. This includes information on the taxes they pay, the number of people they employ and the profits they report. Publishing this information can bring to light structures or techniques multinational corporations might be using to avoid paying tax in certain jurisdictions by shifting their profits or activities elsewhere.

In February 2017, Open Knowledge International published a white paper co-authored by Alex Cobham, Jonathan Gray and Richard Murphy which examined the prospects for creating a global public database on the tax contributions and economic activities of multinational companies as measured by CBCR.

The authors found that such a public database was possible and concluded that a pilot database could be created by bringing together the best existing source of public CBCR information – disclosures made by European Union banking institutions in line with the Capital Requirements Directive IV (CRD IV) passed in 2013.  The aim of our design sprint was to take the first steps towards the creation of this pilot database.


What did we achieve?

From left to right: Tim Davies (Open Data Services), Jonathan Gray (University of Bath/Public Data Lab), Tommaso Faccio (University of Nottingham/BEPS Monitoring Group), Oliver Pearce (Oxfam GB), Elena Gaita (Transparency International EU), Dorcas Mensah (University of Edinburgh/Tax Justice Network – Africa) and Serah Rono (Open Knowledge International). Photo: Stephen Abbott Pugh

A design sprint is intended to be a short and sharp process bringing together a multidisciplinary team in order to quickly prototype and iterate on a technical product.

On Monday 24th and Tuesday 25th July 2017, Open Knowledge International convened a team of tax justice, advocacy, research and open data experts at Friends House in London to work alongside developers and a developer advocate from our product team. This followed three days of pre-sprint planning and work on the part of our developers. All the outputs of this event are public on Google Drive, Github and

To understand more from those who had knowledge of trying to find and understand CRD IV data, we heard expert presentations from George Turner of Tax Justice Network on the scale of international tax avoidance, Jason Braganza of Tax Justice Network – Africa and Financial Transparency Coalition on why developing countries need public CBCR (see report for more details) and Oliver Pearce of Oxfam Great Britain on the lessons learned from using CRD IV data for the Opening the vaults and Following the money reports. These were followed by a presentation from Adam Kariv and Vitor Baptista of Open Knowledge International on how they would be reusing open-source tech products developed for our Open Spending and OpenTrials projects to help with Open Data for Tax Justice.

Next we discussed the problems and challenges the attendees had experienced when trying to access or use public CBCR information before proposing solutions to these issues. This led into a conversation about the precise questions and hypotheses which attendees would like to be able to answer using either CRD IV data or public CBCR data more generally.

From left to right: Georgiana Bere (Open Knowledge International), Adam Kariv (Open Knowledge International), Vitor Baptista (Open Knowledge International). Photo: Stephen Abbott Pugh

As quickly as possible, the Open Knowledge International team wanted to give attendees the knowledge and tools they needed to be able to answer these questions. So our developers Georgiana Bere and Vitor Baptista demonstrated how anyone could take unstructured CRD IV information from tables published in the PDF version of banks’ annual reports and follow a process set out on the Github repo for the pilot database to contribute this data into a pipeline created by the Open Knowledge International team.

Datapackage-pipelines is a framework – developed as part of the Frictionless Data toolchain – for defining data processing steps to generate self-describing Data Packages. Once attendees had contributed data into the pipeline via GitHub issues, Vitor demonstrated how to write queries against this data using Redash in order to get answers to the questions they had posed earlier in the day.


Storytelling with CRD IV data

Evidence-based, data-driven storytelling is an increasingly important mechanism used to inform and empower audiences, and encourage them to take action and push for positive change in the communities they live in. So our sprint focus on day two shifted to researching and drafting thematic stories using this data.

Discussions around data quality are commonplace in working with open data. George Turner and Oliver Pearce noticed a recurring issue in the available data: the use of hyphens to denote both nil and unrecorded values. The two spent part of the day thinking about ways to highlight the issue and guidelines that can help overcome this challenge so as to avoid incorrect interpretations.
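One way to avoid misreading these cells is to keep the ambiguity explicit instead of silently turning a hyphen into zero or into a blank. The snippet below is a minimal sketch of that idea, not the approach the sprint team settled on: the `Cell` type, the `parse_cell` function and the set of hyphen characters are all hypothetical names chosen for illustration, and it assumes figures are written with commas as thousands separators.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed set of "hyphen-like" characters a report might use for an empty cell:
# ASCII hyphen, en dash, em dash. Adjust to whatever the source PDFs contain.
HYPHENS = {"-", "\u2013", "\u2014"}


@dataclass(frozen=True)
class Cell:
    value: Optional[float]  # parsed number, or None when no number is known
    ambiguous: bool         # True when the source cell was a bare hyphen


def parse_cell(raw: str) -> Cell:
    """Parse one table cell, flagging bare hyphens instead of guessing."""
    text = raw.strip()
    if text in HYPHENS:
        # Could mean nil or unrecorded; record that we cannot tell which.
        return Cell(value=None, ambiguous=True)
    if text == "":
        # Genuinely empty cell: unknown, but not the hyphen ambiguity.
        return Cell(value=None, ambiguous=False)
    # Assumes commas are thousands separators, e.g. "1,234".
    return Cell(value=float(text.replace(",", "")), ambiguous=False)


if __name__ == "__main__":
    for raw in ["-", "0", "1,234", "-12.5", ""]:
        print(repr(raw), parse_cell(raw))
```

Rows flagged as ambiguous can then be counted or excluded explicitly in any later query, so that a hyphen is never mistaken for a reported zero.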

Open data from a single source often has gaps, so combining it with data from additional sources can help with verification and build a stronger narrative. In light of this, Elena Gaita, Dorcas Mensa and Jason Braganza narrowed their focus to a single organisation to see whether this bank changed its policy towards using tax havens following a 2012 investigative exposé by a British newspaper. They did this by comparing data from the investigation with the bank’s 2014 CRD IV disclosures. In the coming days, they hope to publish a blogpost detailing their findings on the extent to which the new transparency requirements have changed the bank’s tax behaviour.


Visual network showing relations between the top 50 banks and financial institutions that comply with Capital Requirements Directive IV (CRD IV) and the countries in which they report profits. Image: Public Data Lab

To complement these story ideas, we explored visualisation tools which could help draw insights and revelations from the assembled CRD IV data. Visualisations often help to draw attention to aspects of the data that would have otherwise gone unnoticed. Oliver Pearce and George Turner studied the exploratory visual network of CRD IV data for the EU’s top 50 banks created by our friends at Density Design and the Public Data Lab (see screengrab above) to learn where banks were recording most profits and losses. Pearce and Turner quickly realised that one bank in particular recorded losses in all but one of its jurisdictions. In just a few minutes, the finding from this visual network sparked their interest and encouraged them to ask more questions. Was the lone profit-recording jurisdiction a tax haven? How did other banks operating in the same jurisdiction fare on the profit/loss scale in the same period? We look forward to reading their findings as soon as they are published.


What happens next?

The Open Data for Tax Justice network team are now exploring opportunities for collaborations to collect and process all available CRD IV data via the pipeline and tools developed during our sprint. We are also examining options to resolve some of the data challenges experienced during the sprint, such as the perceived lack of an established codelist of tax jurisdictions, and are searching for a standard exchange rate source which could be used across all recorded payments data.

In light of the European Union Parliament’s recent vote in favour of requiring all large multinational corporations to publish public CBCR information as open data, we will be working with advocacy partners to join the ongoing discussion about the “common template” and “open data format” for future public CBCR disclosures which will be mandated by the EU.

Having identified extractives industry data as another potential source of public CBCR to connect to our future database, we are also heartened to see the ongoing project between the Natural Resource Governance Institute and Publish What You Pay Canada, and will liaise further with the team working on extracting data from these new disclosures.

Please email if you’d like to be added to the project mailing list or want to join the Open Data for Tax Justice network. You can also follow the #OD4TJ hashtag on Twitter for updates.


Thanks to our partners at Open Data for Development, Tax Justice Network, Financial Transparency Coalition and Public Data Lab for the funding and support which made this design sprint possible.




In the Library, With the Lead Pipe: Editorial: Recent Reads

Wed, 2017-07-26 16:17

It’s summer in the northern hemisphere, and your editors at In the Library with the Lead Pipe are busy keeping up with the influx of patrons, with improving our instruction programs, and with other joys of summer. As always, we’re also thinking of ways librarians can improve our profession.

Here are a few recent articles that we’ve been revisiting and think you might also enjoy reading or revisiting. If you have other reading recommendations, feel free to suggest them in the comments.


Terry Reese: MarcEdit Updates (all)

Wed, 2017-07-26 16:09

I’ve posted updates for all versions: Windows and Linux updates for 6.3.x went out Sunday evening, and MacOS updates for 2.5.x on Wednesday morning. Change log below:


* Bug Fix: MarcEditor: Convert clipboard content to….: The change in control caused this to stop working – mostly because the data container that renders the content is a rich object, not plain text like the function was expecting.  Missed that one.  I’ve fixed this in the code.
* Enhancement: Extract Selected Records:  Connected the exact match to the search by file
* Bug Fix: MarcEditor: Right to left flipping wasn’t working correctly for Arabic and Hebrew if the codes were already embedded into the file.
* Update: Cleaned up some UI code.
* Update: Batch Process MarcXML: respecting the native versus the XSLT options.

MacOS Updates:

* Bug Fix: MarcEditor: Right to left flipping wasn’t working correctly for Arabic and Hebrew if the codes were already embedded into the file.
* Update: Cleaned up some UI code.
* Update: Batch Process MarcXML: respecting the native versus the XSLT options.
* Enhancement: Exact Match searching in the Extract, Delete Selected Records tool
* Enhancement: Exact Match searching in the Find/Replace Tool
* Enhancement: Work updates in the Linked data tool to support the new MAC proposal