Planet Code4Lib - http://planet.code4lib.org

Open Knowledge Foundation: Europe in the age of Tr… Transparency

Tue, 2017-02-21 10:01

For the past few years, the USA has been an example of how governments can manage open government initiatives and open data in particular. They have done this by introducing positions like federal chief information officer and chief data officer. Datasets opened on a massive scale in standardised formats laid the ground for startups and citizen apps to flourish. Now, when referring to the example of the US, it is common to add ‘under Obama’s administration’ with a sigh. Initiatives to halt data collection put the narrative on many sensitive issues such as climate change, women’s rights or racial inequality under threat. Now, more than ever, the EU should take a global lead with its open data initiatives.

One of these initiatives just took place last week: developers of civic apps from all over Europe went on a Transparency Tour of Brussels. Participants were the winners of the app competition that was held at TransparencyCamp EU in Amsterdam last June. In the run-up to the final event, 30 teams submitted their apps online, while another 40 teams were formed in a series of diplohacks that Dutch embassies organised in eight countries. If you just asked yourself ‘what is a diplohack?’, let me explain.

ConsiliumVote team pitching their app at TCampEU, by EU2016NL

Diplohacks are hackathons where developers meet diplomats – with initial suspicion from both sides. Gradually, both sides understand how they can benefit from this cooperation. As much as the word ‘diplohack’ itself brings two worlds together, the event was foremost an ice breaker between the communities. According to the survey of participants, direct interaction is what both sides enjoyed the most. Diplohacks helped teams to find and understand the data, and also enabled data providers to see points for improvement, such as a better interface or adding relevant data fields to their datasets.

Experience the diplohack atmosphere by watching this short video:

All winners of the app competition were invited last week for the transparency tour at the EU institutions. The winning teams were Citybik.es, which makes use of bike data; Harta Banilor Publici (Public Spending Map) in Romania; and ConsiliumVote, a visualization tool of the votes in the Council of the EU. Developers were shown the EU institutions from the inside, but the most exciting part of it was a meeting with the EU open data steering committee.

Winners of the app competition at the Council of EU, by Open Knowledge Belgium

Yet again, it proved how important it is to meet face to face and discuss things. Diplomats encouraged coders to use their data more. Tony Agotha, a member of the cabinet of First Vice-President Frans Timmermans, reminded coders of the social relevance of their work and praised them for it. Developers, in turn, provided feedback with both specific comments, like making search on the Financial Transparency website possible across years, and general ideas, such as making the European Data Portal platform open source so that regional and municipal portals can build on it.

‘Open data is not a favour, it’s a right,’ said one of the developers. To use this right, we need more meetings between publishers and re-users, we need community growth, we need communication of data and ultimately, more data. TransparencyCamp Europe and last week’s events in Brussels were good first steps. However, both EU officials and European citizens using data should keep the dialogue going if we want to take up the opportunity for the EU to lead on open data. Your comments and ideas are welcome. Join the discussion here.

 

 

Terry Reese: MarcEdit Mac Update

Tue, 2017-02-21 06:34

It seems like I’ve been making a lot of progress wrapping up some of the last major features missing from the Mac version of MarcEdit.  The previous update introduced support for custom/user defined fonts and font sizes which I hope went a long way towards solving accessibility issues.  Today’s update brings plugin support to MarcEdit Mac.  This version integrates the plugin manager and provides a new set of templates for interacting with the program.  Additionally, I’ve migrated one of the Windows plugins (Internet Archive to HathiTrust Packager) to the new framework.  Once the program is updated, you’ll have access to the current plugins.  I have 3 that I’d like to migrate, and will likely be doing some work over the next few weeks to make that happen.

Interested in seeing what the plugin support looks like? See: https://youtu.be/JM-0i5KLm74

You can download the file from the downloads page (http://marcedit.reeset.net/downloads) or via the automatic updating tool in the program.

Questions?  Let me know.

–tr

DuraSpace News: VIVO Updates for Feb 19–Camp, Wiki, Ontology

Tue, 2017-02-21 00:00

From Mike Conlon, VIVO Project Director

Islandora: Islandoracon General Sessions Schedule

Mon, 2017-02-20 17:01

The full schedule is now available for Islandoracon's general sessions, happening May 16th & 17th. We previously shared the slate of workshops and will soon be publishing a schedule for post-conference events on May 19th.

OCLC Dev Network: Can A Worldcat Query Collection Solve Your Collection Loading Problems?

Mon, 2017-02-20 14:00

Learn how Lawrence Technological University Library used WorldCat® query collections to load our records and holdings into a statewide interlibrary loan (ILL) system.

Open Knowledge Foundation: Open Knowledge International receives $1.5 million from Omidyar Network

Mon, 2017-02-20 13:00

We’ve recently received funding from Omidyar Network, which will allow us to further our commitment to civil society organisations!

Open Knowledge International has received a two-year grant amounting to $1.5 million from Omidyar Network to support the development and implementation of our new civil society-focused strategy. Running until the end of December 2018, this grant reflects Omidyar Network’s confidence in our shared vision to progress openness in society and we are looking forward to using the funds to strengthen the next phase of our work.

With over a decade’s experience opening up information, we will be turning our attention and efforts to focus on realising the potential of data for society. The unrestricted nature of the funding will help us to build on the successes of our past, work with new partners and implement effective systems to constructively address the challenges before us.

2017 certainly presents new challenges to the open data community. Increased access to information simply is not enough to confront a shrinking civic space, the stretched capacities of NGOs, and countless social and environmental issues. Open Knowledge International is looking to work with new partners on these areas to use open data as an effective tool to address society’s most pressing issues. Omidyar Network’s support will allow us to work in more strategic ways, to develop relationships with new partners and to embed our commitment to civil society across the organisation.

Pavel Richter, Open Knowledge International’s CEO, underlines the impact that this funding will have on the organisation’s continued success: “Given the expertise Open Knowledge International has amassed over the years, we are eager to employ our efforts to ensure open data makes a real and positive impact in the world. Omidyar Network’s support for the next two years will allow us to be much more strategic and effective with how we work.”

Of course implementing our strategic vision will take time. Long-term funding relationships like the one we have with Omidyar Network play an instrumental role in boosting Open Knowledge International’s capacity as they provide the space to stabilise and grow. For the past six years, Omidyar Network has been an active supporter of Open Knowledge International, and this has allowed us to cultivate and refine the strong vision we have today. More recently Omidyar Network has provided valuable expertise for our operational groundwork, helping to instil a suitable structure for us to thrive. Furthermore, our shared vision of the transformative impact of openness has allowed us to scale our community and grow our network of committed change-makers and activists around the world.

“We are proud to continue our support for Open Knowledge International, which plays a critical role in the open data ecosystem,” stated Martin Tisné, Investment Partner at Omidyar Network. “Open Knowledge International has nurtured several key developments in the field, including the Open Definition, CKAN and the School of Data, and we look forward to working with Open Knowledge International as it rolls out its new civil society-focused strategy.”

As we continue to chart our direction, Open Knowledge International’s work will focus on three areas to unlock the potential value of open data for civil society organisations: we will demonstrate the value of open data for the work of these organisations, we will provide organisations with the tools and skills to effectively use open data, and we will work to make government information systems more responsive to the needs of civil society. Omidyar Network’s funding ensures Open Knowledge International has the capacity to address these three areas. We are grateful for the support and we welcome our new strategic focus to empower civil society organisations to use open data to improve people’s lives.

Further information:

Open Knowledge International
www.okfn.org

Open Knowledge International is a global non-profit organisation focused on realising open data’s value to society by helping civil society groups access and use data to take action on social problems. Open Knowledge International does this in three ways: 1) we show the value of open data for the work of civil society organisations; 2) we provide organisations with the tools and skills to effectively use open data; and 3) we make government information systems responsive to civil society.

Omidyar Network 

Omidyar Network is a philanthropic investment firm dedicated to harnessing the power of markets to create opportunity for people to improve their lives. Established in 2004 by eBay founder Pierre Omidyar and his wife Pam, the organization invests in and helps scale innovative organizations to catalyze economic and social change. Omidyar Network has committed more than $1 billion to for-profit companies and nonprofit organizations that foster economic advancement and encourage individual participation across multiple initiatives, including Education, Emerging Tech, Financial Inclusion, Governance & Citizen Engagement, and Property Rights.

To learn more, visit www.omidyar.com, and follow on Twitter @omidyarnetwork

#PositiveReturns

Open Knowledge Foundation: Excel is threatening the quality of research data — Data Packages are here to help

Mon, 2017-02-20 10:15

This week the Frictionless Data team at Open Knowledge International will be speaking at the International Digital Curation Conference #idcc17 on making research data quality visible. Dan Fowler looks at why the popular file format Excel is problematic for research and what steps can be taken to ensure data quality is maintained throughout the research process.

Our Frictionless Data project aims to make sharing and using data as easy and frictionless as possible by improving how data is packaged. The project is designed to support the tools and file formats researchers use in their everyday work, including basic CSV files and popular data analysis programming languages and frameworks like R and Python Pandas.  However, Microsoft Excel, both the application and the file format, remains very popular for data analysis in scientific research.

It is easy to see why Excel retains its stranglehold: over the years, an array of convenience features for visualizing, validating, and modeling data have been developed and adopted across a variety of uses.  Simple features, like the ability to group related tables together, are a major advantage of the Excel format over, for example, single-table formats like CSV.  However, Excel has a well-documented history of silently corrupting data in unexpected ways, which leads some, like data scientist Jenny Bryan, to compile lists of “Scary Excel Stories” advising researchers to choose alternative formats, or at least, treat data stored in Excel warily.

“Excel has a well-documented history of silently corrupting data in unexpected ways…”

With data validation and long-term preservation in mind, we’ve created Data Packages, which provide researchers with an alternative format to Excel by building on simpler, well-understood text-based file formats like CSV and JSON and adding advanced features.  Added features include providing a framework for linking multiple tables together; setting column types, constraints, and relations between columns; and adding high-level metadata like licensing information.  Transporting research data with open, granular metadata in this format, paired with tools like Good Tables for validation, can be a safer and more transparent option than Excel.
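To make this concrete, here is a minimal sketch of what a datapackage.json might look like for a small study (the package name, file paths and field names are invented for illustration): package-level licensing metadata, two CSV resources, and a link between them via a foreign key.

{
  "name": "example-study",
  "licenses": [{"name": "CC0-1.0"}],
  "resources": [
    {
      "name": "samples",
      "path": "data/samples.csv",
      "schema": {
        "fields": [
          {"name": "sample_id", "type": "string"},
          {"name": "collected_on", "type": "date"}
        ]
      }
    },
    {
      "name": "measurements",
      "path": "data/measurements.csv",
      "schema": {
        "fields": [
          {"name": "sample_id", "type": "string"},
          {"name": "value", "type": "number"}
        ],
        "foreignKeys": [
          {"fields": "sample_id", "reference": {"resource": "samples", "fields": "sample_id"}}
        ]
      }
    }
  ]
}

Everything here is plain JSON sitting next to the CSV files, so the package can be versioned, validated and preserved without any proprietary software.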

Why does open, granular metadata matter?

With our “Tabular” Data Packages, we focus on packaging data that naturally exists in “tables”—for example, CSV files—a clear area of importance to researchers illustrated by guidelines issued by the Wellcome Trust’s publishing platform Wellcome Open Research. The guidelines mandate:

Spreadsheets should be submitted in CSV or TAB format; EXCEPT if the spreadsheet contains variable labels, code labels, or defined missing values, as these should be submitted in SAV, SAS or POR format, with the variable defined in English.

Guidelines like these typically mandate that researchers submit data in non-proprietary formats; SPSS, SAS, and other proprietary data formats are accepted because they provide important contextual metadata that hasn’t been supported by a standard, non-proprietary format. The Data Package specifications—in particular, our Table Schema specification—provide a method of assigning functional “schemas” for tabular data.  This information includes the expected type of each value in a column (“string”, “number”, “date”, etc.), constraints on the value (“this string can only be at most 10 characters long”), and the expected format of the data (“this field should only contain strings that look like email addresses”). The Table Schema can also specify relations between tables, strings that indicate “missing” values, and formatting information.
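A Table Schema is itself just a short JSON document. The sketch below (field names invented for illustration) declares the kinds of expectations described above: a type for each column, a length constraint, an email format, and the strings that should be read as missing values.

{
  "fields": [
    {"name": "participant_id", "type": "string", "constraints": {"maxLength": 10}},
    {"name": "contact_email", "type": "string", "format": "email"},
    {"name": "enrolled_on", "type": "date"},
    {"name": "score", "type": "number"}
  ],
  "missingValues": ["", "NA"]
}

A validator such as Good Tables can then check every row of a CSV file against these declarations before the data is analysed or archived.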

This information can prevent incorrect processing of data at the loading step.  In the absence of these table declarations, even simple datasets can be imported incorrectly in data analysis programs given the heuristic (and sometimes, in Excel’s case, byzantine) nature of automatic type inference.  In one example of such an issue, Zeeberg et al. and later Ziemann, Eren and El-Osta describe a phenomenon where gene expression data was silently corrupted by Microsoft Excel:

A default date conversion feature in Excel (Microsoft Corp., Redmond, WA) was altering gene names that it considered to look like dates. For example, the tumor suppressor DEC1 [Deleted in Esophageal Cancer 1] [3] was being converted to ’1-DEC.’ [16]

These errors didn’t stop at the initial publication.  As these Excel files are uploaded to other databases, these errors could propagate through data repositories, an example of which took place in the now-replaced “LocusLink” database. At a time when data sharing and reproducible research are gaining traction, the last thing researchers need is file formats leading to errors.

Much like Boxed Water, Packaged Data is better because it is easier to move.

Zeeberg’s team described various technical workarounds to avoid Excel problems, including using Excel’s text import wizard to manually set column types every time the file is opened.  However, the researchers acknowledge that this requires constant vigilance to prevent further errors, attention that could be spent elsewhere.   Rather, a simple, open, and ubiquitous method to unambiguously declare types in column data—columns containing gene names (e.g. “DEC1”) are strings not dates and “RIKEN identifiers” (e.g. “2310009E13”) are strings not floating point numbers—paired with an Excel plugin that reads this information may be able to eliminate the manual steps outlined above.
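In schema terms, the fix for the gene expression case amounts to a few lines declaring that these columns hold strings rather than dates or numbers, for example (column names invented for illustration):

{
  "fields": [
    {"name": "gene_symbol", "type": "string"},
    {"name": "riken_id", "type": "string"},
    {"name": "expression_level", "type": "number"}
  ]
}

Any tool that honours the schema, whether a validation service or the kind of Excel import plugin suggested above, would then load “DEC1” and “2310009E13” as plain text instead of converting them.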

Granular Metadata Standards Allow for New Tools & Integrations

By publishing this granular metadata with the data, both users and software programs can use it to import data into Excel automatically and correctly, and this benefit also accrues when similar integrations are created for other data analysis software packages, like R and Python.  Further, these specifications (and specifications like them) allow for the development of whole new classes of tools to manipulate data without the overhead of Excel, while still including data validation and metadata creation.

For instance, the Open Data Institute has created Comma Chameleon, a desktop CSV editor.  You can see a talk about Comma Chameleon on our Labs blog.  Similarly, Andreas Billman created SmartCSV.fx to solve the issue of broken CSV files provided by clients.  While initially this project depended on an ad hoc schema for data, the developer has since adopted our Table Schema specification.

Other approaches that bring spreadsheets together with Data Packages include Metatab which aims to provide a useful standard, modeled on the Data Package, of storing metadata within spreadsheets.  To solve the general case of reading Data Packages into Excel, Nimble Learn has developed an interface for loading Data Packages through Excel’s Power Query add-in.

For examples of other ways in which Excel mangles good data, it is worth reading through Quartz’s Bad Data guide and checking over your data.  Also, see our Frictionless Data Tools and Integrations page for a list of integrations created so far.   Finally, we’re always looking to hear more user stories for making it easier to work with data in whatever application you are using.

This post was adapted from a paper we will be presenting at the International Digital Curation Conference (IDCC), where our Jo Barratt will be presenting our work to date on Making Research Data Quality Visible.

Dan Scott: Truly progressive WebVR apps are available offline!

Sun, 2017-02-19 22:15

I've been dabbling with the A-Frame framework for creating WebVR experiences for the past couple of months, ever since Patrick Trottier gave a lightning talk at the GDG Sudbury DevFest in November and a hands-on session with AFrame in January. The @AFrameVR Twitter feed regularly highlights cool new WebVR apps, and one that caught my attention was ForestVR - a peaceful forest scene with birds tweeting in the distance. "How nice would it be", I thought, "if I could just escape into that little scene wherever I am, without worrying about connectivity or how long it would take to download?"

Then I realized that WebVR apps are a great use case for Progressive Web App (PWA) techniques that allow web apps to be as fast, reliable, and engaging as native Android apps. With the source code for ForestVR at my disposal, I set out to add offline support. And it turned out to be surprisingly easy to make this work on Android in both the Firefox and Chrome browsers.

If you just want to see the required changes for this specific example, you can find the relevant two commits at the tip of my branch. The live demo is at https://stuff.coffeecode.net/forestvr/.

ForestVR with "Add to Home Screen" menu on Firefox for Android 51.0.3

ForestVR with "Add" prompt on Chrome for Android 57

In the following sections I've written an overview of the steps you have to take to turn your web app into a PWA:

Describe your app with a Web App Manifest

ForestVR already had a working Web App Manifest (Mozilla docs / Google docs), a simple JSON file that defines metadata about your web app such as the app name and icon to use when it is added to your home screen, the URL to launch, the splash screen to show when it is loading, and other elements that enable it to integrate with the Android environment.

The web app manifest for ForestVR is named manifest.json and contains the following code:

{
  "name": "Forest VR",
  "icons": [
    {
      "src": "./assets/images/icons/android-chrome-144x144.png",
      "sizes": "144x144",
      "type": "image/png"
    }
  ],
  "theme_color": "#ffffff",
  "background_color": "#ffffff",
  "start_url": "./index.html",
  "display": "standalone",
  "orientation": "landscape"
}

You associate the manifest with your web app through a simple <link> element in the <head> of your HTML:

<link rel="manifest" href="manifest.json">

Create a service worker to handle offline requests

A service worker is a special chunk of JavaScript that runs independently from a given web page, and can perform special tasks such as intercepting and changing browser fetch requests, sending notifications, and synchronizing data in the background (Google docs / Mozilla docs). While implementing the required networking code for offline support would be painstaking, bug-prone work, Google has fortunately made the sw-precache node module available to support generating a service worker from a simple configuration file and any static files in your deployment directory.

The configuration I added to the existing gulp build file (gulpfile) uses runtime caching for assets that are hosted at a different hostname or that, in the case of the background soundtrack, are not essential for the experience at launch and can thus be loaded and cached after the main experience has been prepared. The staticFileGlobs list, on the other hand, defines all of the assets that must be cached before the app can launch.

swConfig = {
  cacheId: packageJson.name,
  runtimeCaching: [{
    urlPattern: /^https:\/\/cdn\.rawgit\.com\//,
    handler: 'cacheFirst'
  },{
    urlPattern: /^https:\/\/aframe\.io\//,
    handler: 'cacheFirst'
  },{
    urlPattern: /\/assets\/sounds\//,
    handler: 'cacheFirst'
  }],
  staticFileGlobs: [
    'assets/fonts/fabrica-webfont.woff',
    'assets/images/bg.jpg',
    'assets/images/tree_icon.png',
    'assets/models/**.dae',
    'bundle.css',
    'bundle.js',
    'index.html'
  ]
}

I defined the configuration inside a new writeServiceWorkerFile() function so that I could add it as a build task to the gulpfile:

function writeServiceWorkerFile(callback) {
  swConfig = {...}
  swPrecache.write('service-worker.js', swConfig, callback);
}

In that gulp task, I declared the 'scripts' and 'styles' tasks as prerequisites for generating the service worker, as those tasks generate the bundle.js and bundle.css files. If the files are not present in the build directory when sw-precache runs, then it will simply ignore their corresponding entry in the configuration, and they will not be available for offline use.

gulp.task('generate-service-worker', ['scripts', 'styles'], function(callback) {
  writeServiceWorkerFile(callback);
});

I added the generate-service-worker task to the deploy task so that the service worker will be generated every time we build the app:

gulp.task('deploy',['scripts','styles','generate-service-worker'])

Register the service worker

Just like the Web App Manifest, you need to register your service worker--but it's a little more complex. I chose Google's boilerplate service worker registration script because it contains self-documenting comments and hooks for adding more interactivity, and added it in a <script> element in the <head> of the HTML page.
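Stripped of its comments and update hooks, the registration boils down to a few lines. This is a minimal sketch rather than Google's full boilerplate, and it assumes the generated service-worker.js sits alongside the page:

if ('serviceWorker' in navigator) {
  // Wait for the page to load so registration doesn't compete with rendering,
  // then register the service worker generated by sw-precache.
  window.addEventListener('load', function () {
    navigator.serviceWorker.register('service-worker.js').then(function (registration) {
      console.log('Service worker registered with scope:', registration.scope);
    }).catch(function (error) {
      console.log('Service worker registration failed:', error);
    });
  });
}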

Host your app with HTTPS

PWAs--specifically service workers--require the web app to be hosted on an HTTPS-enabled site, due to the potential for mischief that service workers could cause if replaced via a man-in-the-middle attack, which would be trivial on a non-secure site. Fortunately, my personal VPS already runs HTTPS thanks to free TLS certificates generated by Let's Encrypt.

Check for success with Lighthouse

Google has made Lighthouse, their PWA auditing tool, available as both a command-line oriented node module and a Chrome extension for grading the quality of your efforts. It runs a separate instance of Chrome to check for offline support, responsiveness, and many other required and optional attributes and generates succinct reports with helpful links for more information on any less-than-stellar results you might receive.

Check for success with your mobile web browser

Once you have satisfied Lighthouse's minimum requirements, load the URL in Firefox or Chrome on Android and try adding it to your home screen.

  • In Firefox, you will find the Add to Home Screen option in the browser menu under the Page entry.
  • In Chrome, the Add button (Chrome 57) or Add to Home Screen button (Chrome 56) will appear at the bottom of the page when you have visited it a few times over a span of five minutes or more; a corresponding entry may also appear in your browser menu.

Put your phone in airplane mode and launch the app from your shiny new home screen button. If everything has gone well, it should launch and run successfully even though you have no network connection at all!

Conclusion

As a relative newbie to node projects, I spent most of my time figuring out how to integrate the sw-precache build steps nicely into the existing gulp build, and making the app relocatable on different hosts and paths for testing purposes. The actual service worker itself was straightforward. While I used ForestVR as my proof of concept, the process should be similar for turning any other WebVR app into a Progressive WebVR App. I look forward to seeing broader adoption of this approach for a better WebVR experience on mobile!

As an aside for my friends in the library world, I plan to apply the same principles to making the My Account portion of the Evergreen library catalogue a PWA in time for the 2017 Evergreen International Conference. Here's hoping more library software creators are thinking about improving their mobile experience as well...

LibUX: Today, I learned about the Accessibility Tree

Sun, 2017-02-19 05:53

Today, I learned about the “accessibility tree.”

I am not sure who to attribute this diagram to, but I borrowed it from Marcy Sutton.

The accessibility tree and the DOM tree are parallel structures. Roughly speaking the accessibility tree is a subset of the DOM tree. It includes the user interface objects of the user agent and the objects of the document. Accessible objects are created in the accessibility tree for every DOM element that should be exposed to an assistive technology, either because it may fire an accessibility event or because it has a property, relationship or feature which needs to be exposed. Generally if something can be trimmed out it will be, for reasons of performance and simplicity. For example, a <span> with just a style change and no semantics may not get its own accessible object, but the style change will be exposed by other means.

(W3C Core Accessibility Mappings 1.1)

Basically, when a page renders in the browser, there is the Document Object Model (DOM) that is the underlying structure of the page that the browser interfaces with. It informs the browser that such-and-such is the title, what markup to render, and so on. It’s hierarchically structured, kind of like a tree: there’s a root and a bunch of branches.

At the same time, there is an accessibility tree that is created. Browsers make them to give assistive technology something to latch on to.

When we use ARIA attributes, we are in part giving instructions to the browser about how to render that accessibility tree.
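As a rough sketch of what that looks like in practice (the element and labels below are invented for illustration), a generic <div> pressed into service as a button shows up in the accessibility tree as little more than a container with text until ARIA gives it a role and a name:

// A native <button> would get all of this for free; the ARIA attributes tell the
// browser to expose this div in the accessibility tree as a button object with a name.
var closeControl = document.createElement('div');
closeControl.textContent = '×';
closeControl.setAttribute('role', 'button');      // exposed with a button role
closeControl.setAttribute('aria-label', 'Close'); // accessible name for screen readers
closeControl.tabIndex = 0;                        // reachable from the keyboard
document.body.appendChild(closeControl);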

There’s a catch: not all browsers create accessibility trees in the same way; not all screen readers interpret accessibility trees in the same way; and not all screen readers even refer to the accessibility tree — some scrape the DOM directly, and some do both.

LITA: The Space Age: Library as Location

Fri, 2017-02-17 18:47

On the surface, a conversation about the physical spaces within libraries might not seem relevant to technology in libraries, but there’s a trend I’ve noticed — not only in my own library, but in other libraries I’ve visited in recent months: user-supplied tech in library landscapes.

Over the course of the last decade, we’ve seen a steady rise in the use of portable personal computing devices. In their Evolution of Technology survey results, Pew Research Center reports that 51% of Americans own a tablet, and 77% own smartphones. Library patrons seem to be doing less browsing and more computing, and user-supplied technology has become ubiquitous — smartphones, and tablets, and notebooks, oh my! Part of the reason for this BYO tech surge may be explained by a triangulation of high demand for the library’s public computer stations, decreased cost of personal devices, and the rise of telecommuting and freelance gig-work in the tech sector. Whatever the reasons, it seems that a significant proportion of patrons are coming to the library to use the wi-fi and the workspace.

I recently collected data for a space-use analysis at my library, and found that patrons who used our library for computing with personal devices outnumbered browsers, readers, and public computer users 3:1. During the space use survey, I noted that whenever our library classrooms are not used for a class, they’re peopled with multiple users who “camp” there, working for 2 – 4 hours at a time. Considering elements of these more recently constructed rooms that differ from the space design in the rest of the 107-year-old building offers a way into thinking about future improvements. Below are a few considerations that may support independent computers and e-commuters in the library space.

Ergonomic Conditions

Furnish work spaces with chairs designed to provide lumbar support and encourage good posture, as well as tables that match the chairs in terms of height ratio to prevent wrist- and shoulder-strain.

Adequate Power

A place to plug in at each surface allows users to continue working for long periods. It’s important to consider not only the number of outlets, but their position: cords stretched across spaces between tables and walls could result in browsers tripping, or knocking laptops off a table.

Reliable Wireless Signal

It goes without saying that telecommuters need the tele– to do their commuting. Fast, reliable wi-fi is a must-have.

Concentration-Inducing Environment

If possible, a library’s spaces should be well-defined, with areas for users to meet and talk, and areas of quiet where users can focus on their work without interruption. Sound isn’t the only environmental consideration. A building that’s too hot or too cold can be distracting. High-traffic areas — such as spaces near doors, teens’ and children’s areas, or service desks — aren’t the best locations for study tables.

Relaxed Rules

This is a complex issue; it’s not easy to strike a balance. For instance, libraries need to protect community resources — especially the expensive electronic ones like wiring — from spills; but we don’t want our patrons to dehydrate themselves while working in the library! At our library, we compromise and allow beverages, as long as those beverages have a closed lid, e.g., travel mugs, yes; to-go cups (which have holes that can’t be sealed), no.

As library buildings evolve to accommodate digital natives and those whose workplaces have no walls, it’s important to keep in mind the needs of these library users and remix existing spaces to be useful for all of our patrons, whether they’re visiting for business or for pleasure.

 

Do you have more ideas to create useful space for patrons who bring their own tech to the library? Any issues you’ve encountered? How have you met those challenges?

 

Evergreen ILS: 2018 Evergreen International Conference – Host Site Selected

Fri, 2017-02-17 18:09

The 2018 Evergreen Conference Site Selection Committee has chosen the next host and venue for the 2018 conference.  The MOBIUS consortium will be our 2018 conference host and St. Charles, Missouri will be the 2018 location.  Conference dates to be determined.

Congratulations, MOBIUS!  

LITA: LITA Personas Task Force

Fri, 2017-02-17 17:53

Coming soon to the LITA blog: the results of the LITA Personas Task Force. The initial report contains a number of useful persona types and was submitted to the LITA Board at the ALA Midwinter 2017 conference. Look for reports on the process and each of the persona types here on the LITA blog starting in March 2017.

As a preview, go behind the scenes with this short podcast presented as part of the LibUX Podcast series, on the free tools the Task Force used to do their work.

Metric: A UX Podcast
@metricpodcast
Metric is a #libux podcast about #design and #userExperience. Designers, developers, librarians, and other folks join @schoeyfield and @godaisies to talk shop.

The work of the LITA Personas Task Force
https://overcast.fm/+DVQCBwh2o

In this podcast Amanda L. Goodman (@godaisies) gives you a peek into the work of the LITA Persona Task Force, who are charged with defining and developing personas that are to be used in growing membership in the Library and Information Technology Association.

The ten members of the task force were from academic, public, corporate, and special libraries located in different timezones. With such challenges, the Task Force had to use collaborative tools that were easy for everyone to use. Task Force member Amanda L. Goodman presented this podcast originally on LibUX’s Metric podcast.

Open Knowledge Foundation: How could a global public database help to tackle corporate tax avoidance?

Fri, 2017-02-17 10:00

A new research report published today looks at the current state and future prospects of a global public database of corporate accounts.

Shipyard of the Dutch East India Company in Amsterdam, 1750. Wikipedia.

The multinational corporation has become one of the most powerful and influential forms of economic organisation in the modern world. Emerging at the bleeding edge of colonial expansion in the seventeenth century, entities such as the Dutch and British East India Companies required novel kinds of legal, political, economic and administrative work to hold their sprawling networks of people, objects, resources, activities and information together across borders. Today it is estimated that over two thirds of the world’s hundred biggest economic entities are corporations rather than countries.

Our lives are permeated by and entangled with the activities and fruits of these multinationals. We are surrounded by their products, technologies, platforms, apps, logos, retailers, advertisements, publications, packaging, supply chains, infrastructures, furnishings and fashions. In many countries they have assumed the task of supplying societies with water, food, heat, clothing, transport, electricity, connectivity, information, entertainment and sociality.

We carry their trackers and technologies in our pockets and on our screens. They provide us not only with luxuries and frivolities, but the means to get by and to flourish as human beings in the contemporary world. They guide us through our lives, both figuratively and literally. The rise of new technologies means that corporations may often have more data about us than states do – and more data than we have about ourselves. But what do we know about them? What are these multinational entities – and where are they? What do they bring together? What role do they play in our economies and societies? Are their tax contributions commensurate with their profits and activities? Where should we look to inform legal, economic and policy measures to shape their activities for the benefit of society, not just shareholders?

At the moment these questions are surprisingly difficult to answer – at least in part due to a lack of publicly available information. We are currently on the brink of a number of important policy decisions (e.g. at the EU and in the UK) which will have a lasting effect on what we are able to know and how we are able to respond to these mysterious multinational giants.

Image from report on IKEA’s tax planning strategies. Greens/EFA Group in European Parliament.

A wave of high-profile public controversies, mobilisations and interventions around the tax affairs of multinationals followed in the wake of the 2007-2008 financial crisis. Tax justice and anti-austerity activists have occupied high street stores in order to protest multinational tax avoidance. A group of local traders in Wales sought to move their town offshore, in order to publicise and critique legal and accountancy practices used by multinationals. One artist issued fake certificates of incorporation for Cayman Island companies to highlight the social costs of tax avoidance. Corporate tax avoidance came to epitomise economic globalisation with an absence of corresponding democratic societal controls.

This public concern after the crisis prompted a succession of projects from various transnational groups and institutions. The then-G8 and G20 committed to reducing the “misalignment” between the activities and profits of multinationals. The G20 tasked the OECD with launching an initiative dedicated to tackling tax “Base Erosion and Profit Shifting” (BEPS). The OECD BEPS project surfaced different ways of understanding and accounting for multinational companies – including questions such as what they are, where they are, how to calculate where they should pay money, and by whom they should be governed.

For example, many industry associations, companies, institutions and audit firms advocated sticking to the “arms length principle” which would treat multinationals as a group of effectively independent legal entities. On the other hand, civil society groups and researchers called for “unitary taxation”, which would treat multinationals as a single entity with operations in multiple countries. The consultation also raised questions about the governance of transnational tax policy, with some groups arguing that responsibility should shift from the OECD to the United Nations  to ensure that all countries have a say – especially those in the Global South.

Exhibition of Paolo Cirio’s “Loophole for All” in Basel, 2015. Paolo Cirio.

While many civil society actors highlighted the shortcomings and limitations of the OECD BEPS process, they acknowledged that it did succeed in obtaining global institutional recognition for a proposal which had been central to the “tax justice” agenda for the previous decade: “Country by Country Reporting” (CBCR), which would require multinationals to produce comprehensive, global reports on their economic activities and tax contributions, broken down by country. But there was one major drawback: it was suggested that this information should be shared between tax authorities, rather than being made public. Since the release of the OECD BEPS final reports in 2015, a loose-knit network of campaigners have been busy working to make this data public.

Today we are publishing a new research report looking at the current state and future prospects of a global database on the economic activities and tax contributions of multinationals – including who might use it and how, what it could and should contain, the extent to which one could already start building such a database using publicly available sources, and next steps for policy, advocacy and technical work. It also highlights what is involved in the making of data about multinationals, including the social and political processes of classification and standardisation that this data depends on.

New report on why we need a public database on the tax contributions and economic activities of multinational companies

The report reviews several public sources of CBCR data – including from legislation introduced in the wake of the financial crisis. Under the Trump administration, the US is currently in the process of repealing and dismantling key parts of the Dodd-Frank Wall Street Reform and Consumer Protection Act, including Section 1504 on transparency in the extractive industry, which Oxfam recently described as the “brutal loss of 10 years of work”. Some of the best available public CBCR data is generated as a result of the European Capital Requirements Directive IV (CRD IV), which gives us an unprecedented (albeit often imperfect) series of snapshots of multinational financial institutions with operations in Europe. Rapporteurs at the European Parliament just published an encouraging draft in support of making country-by-country reporting data public.

While the longer term dream for many is a global public database housed at the United Nations, until this is realised civil society groups may build their own. As well as being used as an informational resource in itself, such a database could be seen as form of “data activism” to change what public institutions count – taking a cue from citizen and civil society data projects to take measure of issues they care about – from migrant deaths to police killings, literacy rates, water access or fracking pollution.

A civil society database could play another important role: it could be a means to facilitate the assembly and coordination of different actors who share an interest in the economic activities of multinationals. It would thus be not only a source of information, but also a mechanism for organisation – allowing journalists, researchers, civil society organisations and others to collaborate around the collection, verification, analysis and interpretation of this data. In parallel to ongoing campaigns for public data, a civil society database could thus be viewed as a kind of democratic experiment opening up space for public engagement, deliberation and imagination around how the global economy is organised, and how it might be organised differently.

In the face of an onslaught of nationalist challenges to political and economic world-making projects of the previous century – not least through the “neoliberal protectionism” of the Trump administration – supporting the development of transnational democratic publics with an interest in understanding and responding to some of the world’s biggest economic actors is surely an urgent task.

Launched in 2016, supported by a grant from Omidyar Network and the FTC, and coordinated by TJN and OKI, Open Data for Tax Justice is a project to create a global network of people and organisations using open data to improve advocacy, journalism and public policy around tax justice. More details about the project and its members can be found at datafortaxjustice.net.

This piece is cross-posted at OpenDemocracy.

Evergreen ILS: Security releases: OpenSRF 2.4.2 and 2.5.0-alpha2, Evergreen 2.10.10, and Evergreen 2.11.3

Thu, 2017-02-16 21:52

OpenSRF 2.4.2 and 2.5.0-alpha2, Evergreen 2.10.10, and Evergreen 2.11.3 are now available. These are security releases; the Evergreen and OpenSRF developers strongly urge users to upgrade as soon as possible.

The security issue fixed in OpenSRF has to do with how OpenSRF constructs keys for use by memcached; under certain circumstances, attackers would be able to exploit the issue to perform denial of service and authentication bypass attacks against Evergreen systems. Users of OpenSRF 2.4.1 and earlier should upgrade to OpenSRF 2.4.2 right away, while testers of OpenSRF 2.5.0-alpha should upgrade to 2.5.0-alpha2.

If you are currently using OpenSRF 2.4.0 or later, you can update an Evergreen system as follows:

  • Download OpenSRF 2.4.2 and follow its installation instructions up to and including the make install and chown -R opensrf:opensrf /<PREFIX> steps.
  • Restart Evergreen services using osrf_control.
  • Restart Apache

If you are running a version of OpenSRF older than 2.4.0, you will also need to perform the make and make install steps in Evergreen prior to restarting services.

Please visit the OpenSRF download page to retrieve the latest releases and consult the release notes.

The security issue fixed in Evergreen 2.10.10 and 2.11.3 affects users of the Stripe credit card payment processor and entails the possibility of attackers gaining access to your Stripe credentials. Users of Evergreen 2.10.x and 2.11.x can simply upgrade as normal, but if you are running Evergreen 2.9.x or earlier, or if you cannot perform a full upgrade right away, you can apply the fix by running the following two SQL statements in your Evergreen database:

UPDATE config.org_unit_setting_type
SET view_perm = (SELECT id FROM permission.perm_list
    WHERE code = 'VIEW_CREDIT_CARD_PROCESSING' LIMIT 1)
WHERE name LIKE 'credit.processor.stripe%' AND view_perm IS NULL;

UPDATE config.org_unit_setting_type
SET update_perm = (SELECT id FROM permission.perm_list
    WHERE code = 'ADMIN_CREDIT_CARD_PROCESSING' LIMIT 1)
WHERE name LIKE 'credit.processor.stripe%' AND update_perm IS NULL;

In addition, Evergreen 2.10.10 has the following fixes since 2.10.9:

  • A fix to correctly apply floating group settings when performing no-op checkins.
  • A fix to the HTML coding of the temporary lists page.
  • A fix of a problem where certain kinds of requests for information about the organizational unit hierarchy could consume all available open-ils.cstore backends.
  • A fix to allow staff to use the place another hold link without running into a user interface loop.
  • A fix to the Edit Due Date form in the web staff client.
  • A fix to sort billing types and non-barcoded item types in alphabetical order in the web staff client.
  • A fix to the return to grouped search results link in the public catalog.
  • A fix to allow pre-cat checkouts in the web staff client without requiring a circulation modifier.
  • Other typo and documentation fixes.

Evergreen 2.11.3 has the following additional fixes since 2.11.2:

  • A fix to correctly apply floating group settings when performing no-op checkins.
  • An improvement to the speed of looking up patrons by their username; this is particularly important for large databases.
  • A fix to properly display the contents of temporary lists (My List) in the public catalog, as well as a fix of the HTML coding of that page.
  • A fix to the Spanish translation of the public catalog that could cause catalog searches to fail.
  • A fix of a problem where certain kinds of requests for information about the organizational unit hierarchy could consume all available open-ils.cstore backends.
  • A fix to allow staff to use the place another hold link without running into a user interface loop.
  • A fix to the Edit Due Date form in the web staff client.
  • A fix to the definition of the stock Full Overlay merge profile.
  • A fix to sort billing types in alphabetical order in the web staff client.
  • A fix to the display of the popularity score in the public catalog.
  • A fix to the return to grouped search results link in the public catalog.
  • A fix to allow pre-cat checkouts in the web staff client without requiring a circulation modifier.
  • A fix to how Action/Trigger event definitions with nullable grouping fields handle null values.
  • Other typo and documentation fixes.

Please visit the Evergreen download page to retrieve the latest releases and consult the release notes.

District Dispatch: New amicus briefs on old copyright cases

Thu, 2017-02-16 20:32

The American Library Association (ALA), as a member of the Library Copyright Alliance (LCA), joined amicus briefs on Monday in support of two landmark copyright cases on appeal.

Photo credit: Anita Hart, flickr

The first (pdf) is the Georgia State University (GSU) case—yes, that one—arguing that GSU’s e-reserves service is a fair use. The initial complaint was brought back in 2008 by three academic publishers and has been bankrolled by the Copyright Clearance Center and the Association of American Publishers ever since.
Appeals and multiple requests for injunction from the publishers have kept this case alive for eight years. (The long history of the ins and outs of these proceedings can be found here, and the briefs filed by the Library Copyright Alliance (LCA) can be found here.) Most recently, in March 2016, a federal appeals court ruled in GSU’s favor and many thought that would be the end of the story. The publishers appealed again, however, demanding in part that the court conduct a complicated market effect analysis and reverse its earlier ruling.

While not parties to the case, LCA and co-author the Electronic Frontier Foundation (EFF) make three principal points in their “friend of the court” (or “amicus”) brief:

  • First, they note that that GSU’s e-reserve service is a fair use of copyrighted material purchased by its library, underscoring that the service was modeled on a broad consensus of best practices among academic libraries.
  • Second, and more technically, the brief explains why the district court should have considered the goals of faculty and researchers who wrote most of the works involved to disseminate works broadly as a characteristic of the “nature of the use” factor of fair use.
  • Third, and finally, the brief addresses the fourth factor of the statutory fair use test: the effect of the material’s use on the market for the copyrighted work.

Libraries and EFF note that the content loaned by GSU through its e-reserve service is produced by faculty compensated with state funds. Accordingly, they contend, “A ruling against fair use in this case will create a net loss to the public by suppressing educational uses, diverting scarce resources away from valuable educational investments, or both. This loss will not be balanced by any new incentive for creative activity.”

Photo credit: Pixabay

The second amicus brief just filed by ALA and its LCA allies, another defense of fair use, was prepared and filed in conjunction with the Internet Archive on behalf of ReDigi in its ongoing litigation with Capitol Records. ReDigi is an online business that provides a cloud storage service capable of identifying lawfully acquired music files. Through ReDigi, the owner of the music file can electronically distribute it to another person. When they do, however, the ReDigi service is built to automatically and reliably delete the sender’s original copy. ReDigi originally maintained that this “one copy, one user” model and its service should have been considered legal under the “first sale doctrine” in U.S. copyright law. That’s the statutory provision which allows libraries to lend copies that they’ve lawfully acquired or any individual to, for example, buy a book or DVD and then resell or give it away. Written long before materials became digital, however, that part of the Copyright Act refers only to tangible (rather than electronic) materials. The Court thus originally rejected ReDigi’s first sale doctrine defense.

In their new amicus brief on ReDigi’s appeal, LCA revives and refines an argument that it first made way back in 2000 when ReDigi’s automatic delete-on-transfer technology did not exist. Namely, that digital first sale would foster more innovative library services and, for that and other reasons, should be viewed as a fair use that is appropriate in some circumstances.

With the boundaries of fair use or first sale unlikely to be productively changed in Congress, ALA and its library and other partners will continue to participate in potentially watershed judicial proceedings like these.

The post New amicus briefs on old copyright cases appeared first on District Dispatch.

District Dispatch: Look Back, Move Forward: librarians combating misinformation

Thu, 2017-02-16 17:38

Librarians across the field have always been dedicated to combating misinformation. TBT to 1987, when the ALA Council passed the “Resolution on Misinformation to Citizens” on July 1 in San Francisco, California. (The resolution is also accessible via the American Library Association Institutional Repository here.)

Resolution on Misinformation to Citizens, passed on July 1, 1987, in San Francisco, California.

In response to the recent dialogue on fake news and news literacy, the ALA Intellectual Freedom Committee crafted the “Resolution on Access to Accurate Information,” adopted by Council on January 24.

Librarians have always helped people sort reliable sources from unreliable ones. Here are a few resources to explore:

  • IFLA’s post on “Alternative Facts and Fake News – Verifiability in the Information Society”
  • Indiana University East Campus Library’s LibGuide, “Fake News: Resources”
  • Drexel University Libraries’ LibGuide, “Fake News: Source Evaluation”
  • Harvard Library’s LibGuide, “Fake News, Misinformation, and Propaganda”
  • ALA Office for Intellectual Freedom’s “Intellectual Freedom News,” a free biweekly compilation of news related to (among other things!) privacy, internet filtering and censorship.
  • This Texas Standard article on the “CRAAP” (Currency, Relevance, Authority, Accuracy & Purpose) test.

If you are working on or have encountered notable “fake news” LibGuides, please post links in the comments below!

The post Look Back, Move Forward: librarians combating misinformation appeared first on District Dispatch.

Evergreen ILS: Upcoming Evergreen and OpenSRF security releases

Thu, 2017-02-16 16:26

Later today we will be releasing security updates for Evergreen and OpenSRF. We recommend that Evergreen users be prepared to install them as soon as possible.

The Evergreen security issue only affects users of a certain credit card payment processor, and the fix can be implemented by running two SQL statements; a full upgrade is not required.

The OpenSRF security issue is more serious and can be used by attackers to perform a denial of service attack and potentially bypass standard authentication.  Consequently, we recommend that users upgrade to OpenSRF 2.4.2 as soon as it is released.

If you are currently using OpenSRF 2.4.0 or OpenSRF 2.4.1, the upgrade will consist of the following steps:

  • downloading and compiling OpenSRF 2.4.2
  • running the ‘make install’ step
  • restarting Evergreen services

If you are currently running a version of OpenSRF that is older than 2.4.0, we strongly recommend upgrading to 2.4.2; note that it will also be necessary to recompile Evergreen.

There will also be a second beta release of OpenSRF 2.5 that will include the security fix.

David Rosenthal: Postel's Law again

Thu, 2017-02-16 16:00
Eight years ago I wrote:
In RFC 793 (1981) the late, great Jon Postel laid down one of the basic design principles of the Internet, Postel's Law or the Robustness Principle:

"Be conservative in what you do; be liberal in what you accept from others."

It's important not to lose sight of the fact that digital preservation is on the "accept" side of Postel's Law.

Recently, discussion on a mailing list I'm on focused on the downsides of Postel's Law. Below the fold, I try to explain why most of these downsides don't apply to the "accept" side, which is the side that matters for digital preservation.

Two years after my post, Eric Allman wrote The Robustness Principle Reconsidered, setting out the reasons why Postel's Law isn't an unqualified boon. He writes that Postel's goal was interoperability:
The intent of the Robustness Principle was to maximize interoperability between network service implementations, particularly in the face of ambiguous or incomplete specifications. If every implementation of some service that generates some piece of protocol did so using the most conservative interpretation of the specification and every implementation that accepted that piece of protocol interpreted it using the most generous interpretation, then the chance that the two services would be able to talk with each other would be maximized.

But:
In recent years, however, that principle has been challenged. This isn't because implementers have gotten more stupid, but rather because the world has become more hostile. Two general problem areas are impacted by the Robustness Principle: orderly interoperability and security.

Allman argues, based on his experience with SMTP and Kirk McKusick's with NFS, that interoperability arises in one of two ways, the "rough consensus and running code" that characterized NFS (and TCP), or from detailed specifications:
the specification may be ambiguous: two engineers build implementations that meet the spec, but those implementations still won't talk to each other. The spec may in fact be unambiguous but worded in a way that some people misinterpret. ... The specification may not have taken certain situations (e.g., hardware failures) into account, which can result in cases where making an implementation work in the real world actually requires violating the spec. ... the specification may make implicit assumptions about the environment (e.g., maximum size of network packets supported by the hardware or how a related protocol works), and those assumptions may be incorrect or the environment may change. Finally, and very commonly, some implementers may find a need to enhance the protocol to add new functionality that isn't defined by the spec.

His arguments here are very similar to those I made in Are format specifications important for preservation?:
I'm someone with actual experience of implementing a renderer for a format from its specification. Based on this, I'm sure that no matter how careful or voluminous the specification is, there will always be things that are missing or obscure. There is no possibility of specifying formats as complex as Microsoft Office's so comprehensively that a clean-room implementation will be perfect. Indeed, there are always minor incompatibilities (sometimes called enhancements, and sometimes called bugs) between different versions of the same product.

The "rough consensus and running code" approach isn't perfect either. As Allman relates, it takes a lot of work to achieve useful interoperability:
The original InterOp conference was intended to allow vendors with NFS (Network File System) implementations to test interoperability and ultimately demonstrate publicly that they could interoperate. The first 11 days were limited to a small number of engineers so they could get together in one room and actually make their stuff work together. When they walked into the room, the vendors worked mostly against only their own systems and possibly Sun's (since as the original developer of NFS, Sun had the reference implementation at the time). Long nights were devoted to battles over ambiguities in the specification. At the end of those 11 days the doors were thrown open to customers, at which point most (but not all) of the systems worked against every other system. The primary reason is that even finding all the corner cases is difficult, and so is deciding for each whether the sender needs to be more conservative or the receiver needs to be more liberal.

The security downside of Postel's Law is even more fundamental. The law requires the receiver to accept, and do something sensible with, malformed input. Doing something sensible will almost certainly provide an attacker with the opportunity to make the receiver do something bad.

Encrypted protocols such as SSL provide an example. They typically allow the initiator to negotiate with the receiver the specifics of the encryption to be used. Liberal receivers can be negotiated down to an obsolete algorithm, vitiating the security of the conversation. Allman writes:
Everything, even services that you may think you control, is suspect. It's not just user input that needs to be checked—attackers can potentially include arbitrary data in DNS (Domain Name System) results, database query results, HTTP reply codes, you name it. Everyone knows to check for buffer overflows, but checking incoming data goes far beyond that.

Security appears to demand receivers be extremely conservative, but that would kill off interoperability; Allman argues that a balance between these conflicting goals is needed.
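
To make the downgrade risk concrete, here is a minimal Python sketch (mine, not from Allman or Rosenthal) of a receiver that is deliberately conservative about protocol versions: it refuses to negotiate anything older than TLS 1.2, so an initiator cannot talk it down to an obsolete algorithm. The port number and certificate paths are placeholders.

    import socket
    import ssl

    # Conservative receiver: pin a minimum protocol version instead of
    # liberally accepting whatever the initiator proposes.
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    context.minimum_version = ssl.TLSVersion.TLSv1_2   # reject downgrade attempts
    context.load_cert_chain(certfile="server.crt", keyfile="server.key")  # placeholder paths

    with socket.create_server(("0.0.0.0", 8443)) as sock:          # placeholder port
        with context.wrap_socket(sock, server_side=True) as tls_sock:
            # The TLS handshake happens during accept(); clients limited to
            # SSLv3 or TLS 1.0/1.1 are refused rather than accommodated.
            conn, addr = tls_sock.accept()
            print("Negotiated", conn.version(), "with", addr)

The cost of this conservatism is exactly the interoperability loss described above: clients that only speak older protocol versions simply cannot connect.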

Ingest and dissemination in digital preservation are more restricted cases of both interoperability and security. As regards interoperability:
  • Ingest is concerned with interoperability between the archive and the real world. As digital archivists we may be unhappy that, for example, one of the consequences of Postel's Law is that in the real world almost none of the HTML conforms to the standard. But our mission requires that we observe Postel's Law and not act on this unhappiness. It would be counter-productive to go to websites and say "if you want to be archived you need to clean up your HTML". (A short sketch of liberal parsing follows this list.)
  • Dissemination is concerned with interoperability between the archive and an eventual reader's tools. Traditionally, format migration has been the answer to this problem, whether preemptive or on-access. More recently, emulation-based strategies such as Ilya Kreymer's oldweb.today avoid the problem of maintaining interoperability through time by reconstructing a contemporaneous environment.
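
As a concrete illustration of the "accept" side during ingest, here is a small Python sketch (my own, not from the post) that pulls links out of deliberately malformed HTML using the standard library's liberal parser; the sample markup is invented.

    from html.parser import HTMLParser

    # html.parser is deliberately liberal: it does not reject malformed
    # markup, which is what an archival crawler needs.
    class LinkExtractor(HTMLParser):
        def __init__(self):
            super().__init__()
            self.links = []

        def handle_starttag(self, tag, attrs):
            if tag == "a":
                for name, value in attrs:
                    if name == "href" and value:
                        self.links.append(value)

    # Unquoted attribute, unclosed tags: a strict validator would reject this,
    # but a liberal parser still recovers the link.
    messy = "<p>Hello <a href=https://example.org>example<p>no closing tags"
    parser = LinkExtractor()
    parser.feed(messy)
    print(parser.links)   # ['https://example.org']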
As regards security:
  • Ingest. In the good old days, when Web archives simply parsed the content they ingested to find the links, the risk to their ingest infrastructure was minimal. But now that the Web has evolved from inter-linked static documents into a programming environment, the risk to the ingest infrastructure from executing the content is significant. Precautions are needed, such as sandboxing the ingest systems.
  • Dissemination. Many archives attempt to protect future readers by virus-scanning on ingest. But, as I argued in Scary Monsters Under The Bed, this is likely to be both ineffective and counter-productive. As digital archivists we may not like the fact that the real world contains malware, but our mission requires that we not deprive future scholars of the ability to study it. Optional malware removal on access is a suitable way to mitigate the risk to scholars not interested in malware (cf. the Internet Archive's Malware Museum).
Thus, security considerations for digital preservation systems should not focus on being conservative by rejecting content for suspected malware, but instead focus on taking reasonable precautions so that content can be accepted despite the possibility that some might be malicious.

FOSS4Lib Recent Releases: ArchivesSpace - 1.5.3

Thu, 2017-02-16 14:21

Last updated February 16, 2017. Created by Peter Murray on February 16, 2017.

Package: ArchivesSpace
Release Date: Wednesday, February 15, 2017

Open Knowledge Foundation: Mapping open data governance models: Who makes decisions about government data and how?

Thu, 2017-02-16 11:00

Different countries have different models to govern and administer their open data activities. Ana Brandusescu, Danny Lämmerhirt and Stefaan Verhulst call for a systematic and comparative investigation of the different governance models for open data policy and publication.

The Challenge

An important value proposition behind open data involves increased transparency and accountability of governance. Yet little is known about how open data itself is governed. Who decides, and how? How accountable are data holders to both the demand side and policy makers? How do data producers and other actors assure the quality of government data? Who, if anyone, are the data stewards within government tasked with making its data open?

Getting a better understanding of open data governance is not only important from an accountability point of view. With better insight into the diversity of decision-making models and structures across countries, the implementation of common open data principles, such as those advocated by the International Open Data Charter, can be accelerated.

In what follows, we seek to develop the initial contours of a research agenda on open data governance models. We start from the premise that different countries have different models to govern and administer their activities – in short, different ‘governance models’. Some countries are more devolved in their decision making, while others seek to organize “public administration” activities more centrally. These governance models clearly impact how open data is governed – providing a broad patchwork of different open data governance across the world and making it difficult to identify who the open data decision makers and data gatekeepers or stewards are within a given country.  

For example, if one wants to accelerate the opening up of education data across borders, in some countries this may fall under the authority of sub-national government (such as states, provinces, territories or even cities), while in other countries education is governed by central government or implemented through public-private partnership arrangements. Similarly, transportation or water data may be privatised in some countries, while elsewhere it may be the responsibility of municipal or regional government. Responsibilities are therefore often distributed across administrative levels and agencies, affecting how (open) government data is produced and published.

Why does this research matter? Why now?

A systematic and comparative investigation of the different governance models for open data policy and publication has been missing to date. To steer the open data movement toward its next phase of maturity, there is an urgent need to understand these governance models and their role in open data policy and implementation.

For instance, the International Open Data Charter states that government data should be "open by default" across entire nations. But the variety of governance systems makes it hard to understand the different levers that could be used to enable nationwide publication of open government data by default. Who effectively holds the power to decide what gets published and what does not? By identifying the strengths and weaknesses of governance models, the global open data community (along with the Open Data Charter) and governments can work together to identify the most effective ways to implement open data strategies and to understand what works and what doesn't.

In the next few months we will seek to increase our comparative understanding of the mechanisms of decision making as it relates to open data within and across government and map the relationships between data holders, decision makers, data producers, data quality assurance actors, data users and gatekeepers or intermediaries. This may provide for insights on how to improve the open data ecosystem by learning from others.

Additionally, our findings may identify the “levers” within governance models used to provide government data more openly. And finally, having more transparency about who is accountable for open data decisions could allow for a more informed dialogue with other stakeholders on performance of the publication of open government data.

We are interested in how different governance models affect open data policies and practices – including the implementations of global principles and commitments. We want to map the open data governance process and ecosystem by identifying the following key stakeholders, their roles and responsibilities in the administration of open data, and seeking how they are connected:

  • Decision makers – Who leads/asserts decision authority on open data in meetings, procedures, conduct, debate, voting and other issues?
  • Data holders – Which organizations / government bodies manage and administer data?
  • Data producers – Which organizations / government bodies produce what kind of public sector information?
  • Data quality assurance actors – Who are the actors ensuring that produced data adhere to certain quality standards, and does this conflict with their publication as open data?
  • Data gatekeepers/stewards – Who controls open data publication?

We plan to research the governance approaches to the following types of data:

  • Health: mortality and survival rates, levels of vaccination, levels of access to health care, waiting times for medical treatment, spend per admission
  • Education: test scores for pupils in national examinations, school attendance rates, teacher attendance rates
  • National Statistics: population, GDP, unemployment
  • Transportation: times and stops of public transport services – buses, trains
  • Trade: import and export of specific commodities, balance of trade data against other countries
  • Company registers: list of registered companies in the country, shareholder and beneficial ownership information, lobbying register(s) with information on company and association representatives at parliamentary bodies
  • Legislation: national legal code, bills, transcripts of debates, finances of parties
Output of research

We will use different methods to get rapid insights. This includes interviews with stakeholders such as government officials, as well as open government initiatives from various sectors (e.g. public health services, public education, trade). Interviewees may be open data experts, as well as policymakers or open data champions within government.

The types of questions we will seek to answer, beyond the broad topic of "who is doing what", include:

  • Who holds power to assert authority over open data publication? What roles do different actors within government play to design policies and to implement them?
  • What forms of governance models can be derived from these roles and responsibilities? Can we see a common pattern of how decision-making power is distributed? How do these governance models differ?
  • What are the criteria to evaluate the "performance" of the observed governance models? How do they, for instance, influence open data policy and implementation?
Call for contributions

We invite all interested in this topic to contribute their ideas and to participate in the design and execution of one or more case studies. Have you done research on this? If so, we would also like to hear from you!

Contact one or all of the authors at:

Ana Brandusescu: ana.brandusescu@webfoundation.org

Danny Lämmerhirt: danny.lammerhirt@okfn.org

Stefaan Verhulst: stefaan@thegovlab.org
