Planet Code4Lib - http://planet.code4lib.org

Library of Congress: The Signal: IEEE Big Data Conference 2016: Computational Archival Science

Mon, 2017-02-13 14:32

This is a guest post by Meredith Claire Broadway, a consultant for the World Bank.

Jason Baron, Drinker Biddle & Reath LLP, “Opening Up Dark Digital Archives Through The Use of Analytics To Identify Sensitive Content,” 2016. Photo by Meredith Claire Broadway.

Computational Archival Science can be regarded as the intersection between the archival profession and “hard” technical fields, such as computer science and engineering. CAS applies computational methods and resources to large-scale records and archives processing, analysis, storage, long-term preservation and access. In short: big data is a big deal for archivists, particularly because old-school pen-and-paper methodologies don’t apply to digital records. To keep up with big data, the archival profession is called upon to open itself up to new ideas and collaborate with technological professionals.

Naturally, collaboration was the theme of the IEEE Big Data Conference ’16: Computational Archival Science workshop. Many speakers presented projects that drew on the spirit of collaboration by applying computational methods — such as machine learning, visualization and natural language processing — to archival problems. Subjects ranged from improving optical-character-recognition efforts with topic modeling to utilizing vector-space models so that archives can better anonymize PII and other sensitive content.

For example, “Content-based Comparison for Collections Identification” was presented by a team led by Maria Esteva of the Texas Advanced Computing Center. Maria and her team created an automated method of discovering the identity of datasets that appear to be similar or identical but may be housed in two different repositories or parts of different collections. This service is important to archivists because datasets often exist in multiple formats and versions and in different stages of completion. Traditionally, archives determine issues such as these through manual effort and metadata entry. A shift to automation of content-based comparison allows archivists to identify changes, connections and differences between digital records with greater accuracy and efficiency.

The team’s algorithm operates in a straightforward manner. First, the two collections are analyzed to determine the types of records they contain, and a list is generated for each collection. Next, the analysis pairs records from the two collections for comparison. Finally, a summary report is created to show the differences between the collections.
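To make the workflow concrete, here is a minimal sketch of that three-step pattern in Python. It is not the team’s published method: file hashes stand in for the much richer content features their algorithm uses, and the function names are illustrative only.

    import hashlib
    from collections import defaultdict
    from itertools import product

    def digest(path):
        # Content fingerprint of one record; a real system would use richer features than a raw hash.
        with open(path, "rb") as fh:
            return hashlib.sha256(fh.read()).hexdigest()

    def compare_collections(coll_a, coll_b):
        # coll_a / coll_b: lists of (record_type, path) tuples describing each collection.
        by_type_a, by_type_b = defaultdict(list), defaultdict(list)
        for rtype, path in coll_a:
            by_type_a[rtype].append(path)
        for rtype, path in coll_b:
            by_type_b[rtype].append(path)

        # Step 1: record-type lists; step 2: candidate pairs; step 3: summary report.
        report = {"identical_pairs": [], "differing_pairs": [],
                  "types_only_in_a": [], "types_only_in_b": []}
        for rtype in set(by_type_a) | set(by_type_b):
            if rtype not in by_type_b:
                report["types_only_in_a"].append(rtype)
            elif rtype not in by_type_a:
                report["types_only_in_b"].append(rtype)
            else:
                for a, b in product(by_type_a[rtype], by_type_b[rtype]):
                    key = "identical_pairs" if digest(a) == digest(b) else "differing_pairs"
                    report[key].append((a, b))
        return report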

Weijia Xu, Ruizhu Huang, Maria Esteva, Jawon Song, Ramona Walls, “Content-based Comparison for Collections Identification,” 2015.

To briefly summarize Maria’s findings, metadata alone isn’t enough for the content-based comparison algorithm to determine whether a dataset is unique. The algorithm needs more information from datasets to make improved comparisons.

Automated collection-based comparison is the future of digital archives. Naturally, it raises questions, among them, “What is the best way for archivists to meet automated methods?” and “How can current archival workflows be aligned with computational efforts?”

The IEEE Computational Archival Science session ended on a contemplative note. Keynote speaker Mark Conrad, of the National Archives and Records Administration, asked the gathering about what skills they thought the new generation of computational archival scientists should be taught. Topping the list were answers such as “coding,” “text mining” and “history of archival practice.”

What interested me most was the ensuing conversation about how CAS deserves its own academic track. The assembly agreed that CAS differs enough from the traditional Library and Information Science and Archival tracks, in both the United States and Canada, that it qualifies as a new area of study.

CAS differs from the LIS and Archival fields in large part due to its technology-centric nature. “Hard” technical skills take more than two years (the usual time it takes to complete an LIS master’s program) to develop, a fact I can personally attest to as a former LIS student and R beginner. It makes sense, then, that for CAS students to receive a robust education they should have a unique curriculum.

If the CAS, LIS and Archival Science fields were merged, the assumption is that they would run the risk of taking an “inch-deep, mile-wide” approach to studies. Our assembly agreed that, in this case, “less is more” if it allows students to cultivate fully developed skills.

Of course these were just the opinions of those present at the IEEE workshop. As the session emphasized, CAS encourages collaboration, discussion and differing opinions. If you have something to add to any of my points, please leave a comment.

DuraSpace News: VIVO Updates for Feb 12–VIVO 1.9.2, Helping each other, VIVO Camp

Mon, 2017-02-13 00:00

From Mike Conlon, VIVO Project Director

VIVO 1.9.2 Released: VIVO 1.9.2 is a maintenance release addressing several bugs. Upgrading to 1.9.2 should be straightforward; there are no ontology or functional changes. Bugs fixed:

Dan Scott: schema.org, Wikidata, Knowledge Graph: strands of the modern semantic web

Sun, 2017-02-12 21:05

My slides from Ohio DevFest 2016: schema.org, Wikidata, Knowledge Graph: strands of the modern semantic web

And the video, recorded and edited by the incredibly amazing Patrick Hammond:

In November, I had the opportunity to speak at Ohio DevFest 2016. One of the organizers, Casey Borders, had invited me to talk about schema.org, structured data, or something in that subject area based on a talk about schema.org and RDFa he had seen me give at the DevFest Can-Am in Waterloo a few years prior. Given the Google-oriented nature of the event and the 50-minute time slot, I opted to add in coverage of the Google Knowledge Graph and its API, which I had been exploring from time to time since its launch in late 2014.

Alas, the Google Knowledge Graph Search API is still quite limited; it returns quite minimal data in comparison to the rich cards that you see in regular Google search results. The JSON results only include links for an image, a corresponding Wikipedia page, and for the ID of the entity. I also uncovered errors that had lurked in the documentation for quite some time; happily, the team quickly responded to correct those problems.

So I dug back in time and also covered Freebase, the database of linked and structured data that had both allowed individual contributions and made its database freely available, until it was purchased by Google, fed into the Knowledge Graph, and shut down. Not many people knew what we had once had until it was gone (Ed Summers did, for one), but such is the way of commercial entities.

In that context, Wikidata looks something like the Second Coming of an open (for contribution and access) linked and structured database, with sustainability derived financially from the Wikimedia Foundation and structurally by its role in underpinning Wikipedia and Wikimedia Commons. Google also did a nice thing by putting resources into adding the appropriately licensed data they could liberate from Freebase: approximately 19 million statements and IDs.

The inclusion of Google Knowledge Graph IDs in Wikidata means that we can use the Knowledge Graph Search API to find an entity ID, then pull the corresponding, richer data from Wikidata for that ID to populate relationships and statements. You can get there from here! Ultimately, my thesis is that Wikidata can and will play a very important role in the modern (much more pragmatic) semantic web.
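As a rough illustration of that round trip, here is a minimal Python sketch (not from the talk): it queries the public Knowledge Graph Search API for an entity ID and then asks the Wikidata SPARQL endpoint for the matching item. The assumption that the Freebase ID (P646) or Google Knowledge Graph ID (P2671) property will hold that identifier is mine, and you need your own Google API key.

    import requests

    def kg_entity_id(query, api_key):
        # Return the first Knowledge Graph entity ID (e.g. "kg:/m/0dl567") for a search term.
        resp = requests.get(
            "https://kgsearch.googleapis.com/v1/entities:search",
            params={"query": query, "key": api_key, "limit": 1},
        )
        resp.raise_for_status()
        items = resp.json().get("itemListElement", [])
        return items[0]["result"]["@id"] if items else None

    def wikidata_statements(kg_id):
        # Find the Wikidata item carrying this identifier and pull back a sample of its statements.
        mid = kg_id.replace("kg:", "")  # "/m/..." (Freebase-era) or "/g/..." IDs
        sparql = """
        SELECT ?item ?itemLabel ?prop ?value WHERE {
          { ?item wdt:P646 "%s" } UNION { ?item wdt:P2671 "%s" }
          ?item ?prop ?value .
          SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
        } LIMIT 50
        """ % (mid, mid)
        resp = requests.get(
            "https://query.wikidata.org/sparql",
            params={"query": sparql, "format": "json"},
            headers={"User-Agent": "kg-wikidata-sketch/0.1"},
        )
        resp.raise_for_status()
        return resp.json()["results"]["bindings"]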

Terry Reese: Fonts, Font-sizes and the MacOS

Sun, 2017-02-12 05:49

So, one of the requests I’ve occasionally been getting from Mac users is that they would really like the ability to change the font and font sizes of the program’s interface. If you’ve used the Windows version of MarcEdit, this has been available for some time, but I hadn’t put it into the Mac version, in part because I didn’t know how. The Mac UI is definitely different from what I’m used to, and the way that AppKit exposes controls, and the way controls are structured as a collection of views and subviews, complicates some of the sizing and layout options. But I’ve been wanting to provide something, because on really high-resolution screens the application was definitely getting hard to read.

Anyway, I’m not sure if this is the best way to do it, but this is what I’ve come up with. Essentially, it’s a function that can determine whether an element has text or an image, and then perform the font scaling, control resizing and, ultimately, window resizing needed to take advantage of Apple’s Auto Layout features. The code is below.

 

public void SizeLabels(NSWindow objW, NSControl custom_control = null)
{
    string test_string = "THIS IS MY TEST STRING";
    string val = string.Empty;
    string font_name = "";
    string font_size = "";
    NSStringAttributes myattribute = new NSStringAttributes();

    cxmlini.GetSettings(XMLPath(), "settings", "mac_font_name", "", ref font_name);
    cxmlini.GetSettings(XMLPath(), "settings", "mac_font_size", "", ref font_size);

    if (string.IsNullOrEmpty(font_name) && string.IsNullOrEmpty(font_size))
    {
        return;
    }

    NSFont myfont = null;
    if (String.IsNullOrEmpty(font_name))
    {
        myfont = NSFont.UserFontOfSize((nfloat)System.Convert.ToInt32(font_size));
    }
    else if (String.IsNullOrEmpty(font_size))
    {
        font_size = "13";
        myfont = NSFont.FromFontName(font_name, (nfloat)System.Convert.ToInt32(font_size));
    }
    else
    {
        myfont = NSFont.FromFontName(font_name, (nfloat)System.Convert.ToInt32(font_size));
    }

    if (custom_control == null)
    {
        CoreGraphics.CGSize original_size = NSStringDrawing.StringSize(test_string, myattribute);
        myattribute.Font = myfont;
        CoreGraphics.CGSize new_size = NSStringDrawing.StringSize(test_string, myattribute);

        CoreGraphics.CGRect frame = objW.Frame;
        frame.Size = ResizeWindow(original_size, new_size, frame.Size);
        objW.MinSize = frame.Size;
        objW.SetFrame(frame, true);
        objW.ContentView.UpdateConstraints();
        //objW.ContentView.UpdateTrackingAreas();
        //MessageBox(objW, objW.Frame.Size.Width.ToString() + ":" + objW.Frame.Size.Height.ToString());

        foreach (NSView v in objW.ContentView.Subviews)
        {
            if (v.IsKindOfClass(new ObjCRuntime.Class("NSControl")))
            {
                NSControl mycontrol = ((NSControl)v);
                switch (mycontrol.GetType().ToString())
                {
                    case "AppKit.NSTextField":
                    case "AppKit.NSButtonCell":
                    case "AppKit.NSBox":
                    case "AppKit.NSButton":
                        if (mycontrol.GetType().ToString() == "AppKit.NSButton")
                        {
                            if (((NSButton)mycontrol).Image != null)
                            {
                                break;
                            }
                        }
                        mycontrol.Font = myfont;
                        //if (!string.IsNullOrEmpty(mycontrol.StringValue))
                        //{
                        //    mycontrol.SizeToFit();
                        //}
                        mycontrol.UpdateConstraints();
                        break;
                }

                if (mycontrol.Subviews.Length > 0)
                {
                    SizeLabels(objW, mycontrol);
                }
            }
            else if (v.IsKindOfClass(new ObjCRuntime.Class("NSTabView")))
            {
                NSTabView mytabview = ((NSTabView)v);
                foreach (NSTabViewItem ti in mytabview.Items)
                {
                    foreach (NSView tv in ti.View.Subviews)
                    {
                        if (tv.IsKindOfClass(new ObjCRuntime.Class("NSControl")))
                        {
                            SizeLabels(objW, (NSControl)tv);
                        }
                    }
                }
            }
        }
    }
    else
    {
        if (custom_control.Subviews.Length == 0)
        {
            if (custom_control.GetType().ToString() != "AppKit.NSButton" ||
                (custom_control.GetType().ToString() == "AppKit.NSButton" && ((NSButton)custom_control).Image == null))
            {
                custom_control.Font = myfont;
                custom_control.UpdateConstraints();
            }
        }
        else
        {
            foreach (NSView v in custom_control.Subviews)
            {
                NSControl mycontrol = ((NSControl)v);
                switch (mycontrol.GetType().ToString())
                {
                    case "AppKit.NSTextField":
                    case "AppKit.NSButtonCell":
                    case "AppKit.NSBox":
                    case "AppKit.NSButton":
                        if (mycontrol.GetType().ToString() == "AppKit.NSButton")
                        {
                            if (((NSButton)mycontrol).Image != null)
                            {
                                break;
                            }
                        }
                        mycontrol.Font = myfont;
                        //if (!string.IsNullOrEmpty(mycontrol.StringValue))
                        //{
                        //    mycontrol.SizeToFit();
                        //}
                        mycontrol.UpdateConstraints();
                        break;
                    default:
                        if (mycontrol.Subviews.Length > 0)
                        {
                            SizeLabels(objW, mycontrol);
                        }
                        break;
                }
            }
        }
    }
}

And that was it. I’m sure there might be better ways, but this is (crossing my fingers) working for me right now.

Terry Reese: MarcEdit KBart Plugin

Sun, 2017-02-12 05:42

Last year, I had the opportunity to present at NASIG, and one of the questions that came up was related to the KBart format and whether MarcEdit could generate it. I’ll be honest: I’d never heard of KBart, and this was the first time it had come up. Well, fast-forward a few months and I’ve heard the name a few more times, and since I’ll be making my way to NASIG again later this year to speak, I figured this time I’d come bearing new gifts. So, I spent about 20 minutes this evening wrapping up a kbart plugin. The interface is very basic:

It has essentially been designed to allow a user to take a MARC or MarcEdit mnemonic file and output a kbart file in either tab- or comma-delimited format.
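For anyone who, like me until recently, hadn’t seen a KBART file: it is essentially a flat, tab- or comma-delimited table of title-level holdings. The sketch below is not the MarcEdit plugin (which lives inside MarcEdit itself); it is just a hedged Python illustration of the kind of mapping involved, using pymarc, a simplified and partly assumed field-to-column mapping, and hypothetical file names.

    import csv
    from pymarc import MARCReader

    # A small subset of KBART-style columns, for illustration only.
    COLUMNS = ["publication_title", "print_identifier", "online_identifier",
               "title_url", "publisher_name"]

    def subfield(record, tag, code):
        # First occurrence of a subfield, or "" when the field/subfield is absent.
        field = record[tag]
        return field[code] if field and field[code] else ""

    with open("records.mrc", "rb") as marc, \
         open("titles_kbart.txt", "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out, delimiter="\t")  # use delimiter="," for the comma-delimited variant
        writer.writerow(COLUMNS)
        for record in MARCReader(marc):
            writer.writerow([
                subfield(record, "245", "a"),  # title
                subfield(record, "022", "a"),  # print ISSN
                "",                            # online ISSN: real mapping depends on the records
                subfield(record, "856", "u"),  # URL
                subfield(record, "260", "b"),  # publisher
            ])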

Now, a couple of caveats — I still don’t really have a great idea of why folks want to create kbart files — this isn’t my area.  But the documentation on the working group’s website was straightforward, so I believe that the files generated will be up to par.  Though, I’m hoping that prior to NASIG, a few of the folks that may actually find something like this interesting may be willing to give it a spin and provide a little feedback.

Again, this will be available after the next update is posted so that I can allow it to take advantage of some of the new plugin management features being added to the tool.

–tr

Terry Reese: MarcEdit Workstreams: MacOS and Windows/Linux

Sat, 2017-02-11 02:33

Over the past couple of months, I’ve had some interesting questions that have led me to go back and take another look at how a handful of things work within MarcEdit. To that end, I’m hoping to complete the following two workstreams this weekend.

MacOS

Two of the features most often asked for at this point deal with accessibility options and plugin support. The creation of the AddPinyin plugin for Windows (https://library.princeton.edu/eastasian/addpinyin-plugin-marcedit) has people asking if this will show up for Mac users as well. My guess is that it could, but in order for that to happen, I need to implement plugin support on the Mac. The challenge is figuring out how, since the process I used with Windows and Linux simply won’t work with the Mac UI thread model. So, I’ve been thinking on this, and this weekend I’ll be including the first parts of code that should allow me to start making this happen. Ideally, I’ll start by migrating some of the current MarcEdit plugins, probably the Internet Archive 2 HathiTrust Packager first, and then go from there.

The other change I’m working on that will show up in this update is the ability to control the application font and font sizes. You can see the start of this work here: https://www.youtube.com/watch?v=k00OL7QVcI4  Like the Windows version, I’ll eventually add language support, which will enable the use of language files to set the text in the application. But for now, I’ll be enabling the ability to modify the application font and change the size of the fonts within the application and editor.

Windows/Linux

The interest in plugins has made me take another look at how they are managed. Currently the process is clunky: users get no notification when plugins change, and updating them takes multiple steps. That will change. I’ve been restructuring how plugins are managed so that the tool will now automatically notify users when a plugin has changed, as well as offer the ability to download the update. Additionally, I’ve extended the plugin manager so that it can manage access to plugins hosted outside of the MarcEdit website, so I’ll be including links to the AddPinyin plugin and be able to include it in the automated management (i.e., update notification). Overall, I believe this will make plugins easier to use, and much, much easier to manage.
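As a language-agnostic illustration of that “notify and offer the update” pattern (MarcEdit itself is a .NET application, and none of this is its actual code), here is a small Python sketch that compares locally installed plugin versions against a remote manifest; the manifest URL and its fields are hypothetical.

    import json
    from urllib.request import urlopen

    # Hypothetical manifest: {"AddPinyin": {"version": "1.2.0", "url": "https://..."}, ...}
    MANIFEST_URL = "https://example.org/marcedit/plugins/manifest.json"

    def updates_available(installed):
        # installed: dict of plugin name -> locally installed version string.
        with urlopen(MANIFEST_URL) as resp:
            remote = json.load(resp)
        pending = []
        for name, local_version in installed.items():
            entry = remote.get(name)
            # Naive inequality check; a real manager would parse and compare version components.
            if entry and entry["version"] != local_version:
                pending.append((name, local_version, entry["version"], entry.get("url")))
        return pending

    # Notify the user about anything that changed upstream.
    for name, old, new, url in updates_available({"AddPinyin": "1.1.0"}):
        print(f"{name}: update {old} -> {new} available at {url}")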

–tr

Harvard Library Innovation Lab: LIL talks: Becky

Fri, 2017-02-10 21:05

Today Becky taught us about the lifetime of a star, and all of our minds were blown.

Harvard Library Innovation Lab: LIL talks: Andy

Fri, 2017-02-10 20:59

We started a weekly series where members and friends of the LIL team teach us about something they are interested in.

Last Friday, Andy showed us how to make homemade mayo and aioli:

Cynthia Ng: Adding Custom Reports to Horizon

Fri, 2017-02-10 16:58
I have been putting a lot of our reports into Horizon so that staff can take a look at them any time and do not need an intermediary. While not even SirsiDynix could help me put all of them in Horizon, I have a few in and I am adding more as I encounter them. …

District Dispatch: Buy it, own it you do! “YODA” is back!

Thu, 2017-02-09 21:37

Blogging about Congress and legislation can sometimes be, well, a little challenging. How to make it exciting, engaging . . . or at least interesting to a non-policy-wonk audience? Then there are those happy times when we here at District Dispatch just can’t improve on the original. This week was one of those times. Reps.

Source: http://comicbook.com/2015/06/27/analyst-predicts-star-wars-the-force-awakens-to-make-nearly-2b/

Blake Farenthold (R-TX) and Jared Polis (D-CO) have teamed up for the third Congress in a row to re-re-introduce their signature bill. This key copyright legislation will assure that when consumers in the age of the Internet of Things buy devices with built-in software – everything from cars to tractors to toasters – they won’t have to get permission from the owner of that software to sell or give away the device in which it’s embedded. The bill’s called . . . wait for it . . . the You Own Devices Act (yup, “YODA”) and on Rep. Farenthold’s own press release, improve we cannot:

Congressman Farenthold, Polis reintroduce YODA Act

Improve copyright system, YODA would

WASHINGTON, February 7, 2017 | Elizabeth Peace (202-225-7742) | 0 comments

Today, introduced was You Own Devices Act (YODA), by Rep. Blake Farenthold (R-Texas) and Rep. Jared Polis (D-Colo.). In 2015, was when the pair introduced the legislation together.

Allowing essential software to travel with physical devices, like computers, YODA will do. By modernizing copyright law in this way, a consumer can sell, lease or give away a computer, as well as the licenses for essential software that allows the computer to operate, it will.

“The YODA bill, today, I did file,” said Congressman Farenthold. “YODA simply states that your device belongs to you. If you wish to sell that device, the software that enables it to work is transferred along with it, and that any right you have to security and bug fixing of that software is transferred as well.”

“Old copyright laws stifle innovation and block consumers from being able to access more affordable products and technology,” said Congressman Polis. “YODA will cut red tape and allow people to sell or give away devices without being strangled by paperwork and empty legal threats.”

Last session, waylaid in committee YODA was.

The post Buy it, own it you do! “YODA” is back! appeared first on District Dispatch.

LITA: Jobs in Information Technology: February 8, 2017

Thu, 2017-02-09 16:27

New vacancy listings are posted weekly on Wednesday at approximately 12 noon Central Time. They appear under New This Week and under the appropriate regional listing. Postings remain on the LITA Job Site for a minimum of four weeks.

New This Week

City of Los Angeles, Department of Water and Power, Director of Cyber Security, Los Angeles, CA

California Digital Library (CDL), Metadata Analyst (7584), Oakland, CA

University of Vermont, Science and Data Librarian, Burlington, VT

Yale University, Director of Preservation Services, New Haven, CT

Visit the LITA Job Site for more available jobs and for information on submitting a job posting.

William Denton: Don't forget Aaron Swartz

Thu, 2017-02-09 15:04

Today’s Data Scraping Mini Episode of the Partially Derivative podcast appalled me. It’s about scraping data off web sites, and there’s lots of laughter and joking as Chris Albon relates how in preparation for his PhD comps he wrote a script to download thousands of articles from JSTOR. And sure enough he got in a little bit of trouble because he hadn’t read the terms of service, and the next day the university librarian passed the word down that all that had to stop immediately or there would be serious repercussions. More wry laughter at having escaped without anything worse.

All that and no mention, not one, of Aaron Swartz. I don’t know when Albon was doing the downloading, but from the dates on his MA and PhD it looks like it might have been after Swartz got arrested in January 2011 (and before he committed suicide in 2013). Even if it was before that, it astounds me that anyone now could talk about doing a mass download from JSTOR without mentioning Swartz. I can understand there wasn’t time to get into how messed up everything about scholarly publishing is, but a joking reminder about how it’s important to read terms of service trivializes important issues that everyone in this field should know and discuss.

Aaron Swartz, slightly blurry in lower left, February 2008

I met Swartz very briefly one day in 2008, and wrote it up after he died.

State Library of Denmark: SolrWayback Machine

Thu, 2017-02-09 14:13

Another ‘Google innovation week’ at work has produced the SolrWayback Machine. It works similarly to the Internet Archive’s Wayback Machine (https://archive.org/web/) and can be used to show harvested web content (WARC files). The Danish Internet Archive has over 20 billion harvested web objects and takes up 800 petabytes of storage.

The SolrWayback engine requires that you have indexed the WARC files using the warc-indexer tool from the British Library (https://github.com/ukwa/webarchive-discovery/tree/master/warc-indexer).

It is quite fast and comes with some additional features as well:

  •  Image search similar to Google Images
  •  Link graphs showing incoming and outgoing links for domains, using the D3 JavaScript framework.
  •  Raw download of any harvested resource from the binary ARC/WARC file.
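To give a sense of what sits behind these features, here is a minimal Python sketch of querying a Solr collection populated by the warc-indexer. It is not SolrWayback’s actual API; the core name and the field names (url, crawl_date, content_type_norm, domain) are assumptions about a typical warc-indexer schema.

    import requests

    SOLR = "http://localhost:8983/solr/netarchive/select"  # core name is an assumption

    def search_archive(text, domain=None, rows=10):
        # Full-text search over indexed WARC records, optionally restricted to one domain.
        params = {"q": text, "rows": rows, "wt": "json",
                  "fl": "url,crawl_date,content_type_norm,domain"}  # assumed warc-indexer field names
        if domain:
            params["fq"] = f"domain:{domain}"
        resp = requests.get(SOLR, params=params)
        resp.raise_for_status()
        return resp.json()["response"]["docs"]

    for doc in search_archive("solrwayback", domain="kb.dk"):
        print(doc.get("crawl_date"), doc.get("url"))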

Unfortunately the collection is not available to the public, so I cannot show you the demo. But here are a few pictures from the SolrWayback machine.

 

SolrWayback at GitHub: https://github.com/netarchivesuite/solrwayback/


LibUX: Design Committee Power-Play: “You all get an opinion, but only three of you get a say.”

Thu, 2017-02-09 13:46

Design committees are part and parcel of the kind of work we do. Design in education and not-for-profits tends to lend itself to bureaucracy, where stakeholders’ opinions arbitrarily trump otherwise sound user experience decisions because they feel it’s important to represent this department or that service.

I mentioned in another post that I rely on defining almost contractual — sometimes literally contractual — axioms at the start of any project. Come to think of it, I think this deserves its own post. Anyway, until then, here’s the rundown:

These are guiding decisions, parts of the contract, that shape all the choices that follow. They’re useful with multiple stakeholders to prevent eventual scope creep, or to bake-in user-centricity from the beginning if you anticipate the upper ranks to make decisions based off what they like as opposed to what works.

I often pair this tactic with another: identifying lead project stakeholders from the larger committee or organization. These are three individuals who have two important attributes:

  1. they aren’t department heads – hopefully
  2. they are imbued with the authority by their department heads / bosses to make directional decisions about a project

The idea is that they are responsible for both synthesizing input from users and their respective stakeholders into decision making, hopefully while not prone to inter-organizational one-upping. This doesn’t mean that others aren’t part of workshops and can’t share opinions, but when I need a decision there are just a few at the table.

I use this tactic if I feel I need to disempower the larger body and reduce the number of decision-making stakeholders. As you probably know, too much input from too many masters is just noise. This affords me a lot of benefit, particularly:

  1. I am able to honestly welcome feedback and criticism from everyone and respond to patterns rather than the departmental weight of the opinion
  2. When I need to defend a decision of my own, I only need to convince a few rather than many. This relationship is also more intimate, and a smaller group of stakeholders tends to be more receptive to my expertise – which in a larger body might be eschewed in favor of treating me as just an implementer of wills.

Anyway, this is definitely a power play. You sell it as part of your process, but you only bust this out if it appears the organization is dysfunctional.

Open Knowledge Foundation: Data and the City: New report on how public data is fostering civic engagement in urban regions

Thu, 2017-02-09 12:00

How can city data infrastructures support public participation in local governance and policy-making? Research by Jonathan Gray and Danny Lämmerhirt examines the new relationships and public spaces emerging between public institutions, civil society groups, and citizens.

The development of urban regions will significantly affect the lives of millions of people around the world. Urbanization poses challenges including housing shortages, the growth of slums and urban decay, inadequate provision of infrastructure and public services, poverty, or pollution. At the same time, cities around the world publish a wide variety of data, reflecting the diversity and heterogeneity of the information systems used in local governance, policy-making and service delivery.

Full report PDF

These “data infrastructures” are commonly considered as a “raw” resource, a trove for mining data composed of databases, APIs, cables, and servers. Opening up these data infrastructures to the public is said to advance progress towards a range of goals – including transparency, accountability, public participation, public service delivery, technological innovation, efficiency and economic growth. However, knowledge is scarce over how the public sphere, and civil society in particular, engage with data infrastructures to advance progress around urban issues.

To shed light on this question, Jonathan Gray and I have published a new report titled “Data And The City: How Can Public Data Infrastructures Change Lives in Urban Regions?” We are interested in how city data infrastructures can support public participation around local governance and policy-making. The report demonstrates how public data infrastructures create new kinds of relationships and public spaces between public institutions, civil society groups, and citizens.

In contrast to more supply-oriented ideas around opening (government) data, we argue that data infrastructures are not a mere “raw” resource that can be exploited. Instead they are best conceived as a lively network or ecosystem in which publics creatively use city data to engage with urban institutions.

We intend to spark imagination and conversation about the role that public data infrastructures may play in civic life – not just as neutral instruments for creating knowledge, but also as devices to organise publics and evidence around urban issues; creating shared spaces for public participation and deliberation around official processes and institutions; and securing progress around major social, economic and environmental challenges that cities face.

image credit: San Francisco, California (CC BY 4.0)

Our report describes six case studies from cities around the world to demonstrate civil society’s vast action repertoire to engage with urban data infrastructures. One case study demonstrates how a British civil society organisation gathered budget data through freedom of information requests from municipal government. This information was fed into an open database and made accessible to finance experts and scholars in order to allow them to run a “public debt audit”. This audit enabled government officials and the larger public to debate the extent of public debt in British cities and to uncover how a lack of public scrutiny increased profits of financial institutes while putting a strain on the public purse.

Official data is often creatively repurposed, aggregated or augmented with other data sources in the context of evolving data infrastructures which are attuned to the specific needs and interests of civil society actors.

Another case shows how a research group re-appropriated data on incarceration to highlight structural issues in urban neighbourhoods and to reform criminal justice in the United States. The case studies highlight how official data is often creatively repurposed, aggregated or augmented with other data sources in the context of evolving data infrastructures which are attuned to the specific needs and interests of civil society actors.

Image credit: Oklahoma City (CC BY 4.0)

In detail, civic actors can engage with data infrastructures to:

  • Identify spaces for intervention. Having cadastral data at hand helped civic actors to identify vacant publicly-owned land, to highlight possibilities for re-using it and to foster community building in neighbourhoods around its re-use.
  • Open spaces for accountability. Using government’s own accounting measurements may provide civil society with evaluation criteria for the effectiveness of public sector programs. Civil society actors may develop a ‘common ground’ or ‘common language’ for engaging with institutions around the issues that they care about.
  • Enable scrutiny of official processes, institutional mechanisms and their effects. By opening public loan data, civil society was able to identify how decentralised fiscal audit mechanisms may have negative effects on public debt.
  • Change the way an issue is framed or perceived. By using aggregated, anonymized data about home addresses of inmates, scholars could shift focus from crime location to the origin of an offender – which helped to address social re-entry programs more effectively.
  • Mobilise community engagement and civic activism. Including facilitating the assembly and organisation of publics around issues.

The report makes the claim that a broader vision on public data infrastructures is needed beyond technicalities such as technical and legal openness. Drawing on ongoing research around participatory data infrastructures, our report foregrounds how governments may take steps to make public information systems responsive to the interests and concerns of different publics.

You can find the full report here.

Ed Summers: Social Facts

Thu, 2017-02-09 05:00

@congressedits turned up one of the more interesting edits from the House of Representatives today:

United States Attorney General Wikipedia article edited anonymously from US House of Representatives https://t.co/qZAjzOKrHV

— congress-edits (@congressedits) February 9, 2017

If you follow the link in the tweet and go and look at the edit you can see that this text was removed (emphasis mine):

Alabama Senator Jeff Sessions has been nominated, as of January 2017, to succeed Loretta Lynch as the 84th Attorney General of the United States. Dana J. Boente of Virginia was appointed on January 30, 2017 to serve as acting Attorney General. The first acting Attorney General under President Trump – Sally Yates – was dismissed by Trump after Yates told justice department lawyers not to defend his executive order restricting entry for people from seven countries. Sessions was confirmed by the Senate on February 8, 2017.

In case you were wondering what the rationale for the edit was, they boldly stated:

Then three minutes later another anonymous user reverted the change saying:

The thing that I find interesting here is that the person who reverted the edit clearly knew the user was accessing Wikipedia from the US House of Representatives.

Now it’s possible that they found out by some other means, but I think it’s likely that they found out from @congressedits itself. Maybe someone else would have noticed the edit independently, but it’s significant how the provenance of this edit mattered.
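Bots like @congressedits watch the stream of anonymous Wikipedia edits and match each editing IP against published institutional address ranges. Here is a minimal Python sketch of that matching step; the CIDR block below is a placeholder from the documentation range, not the House’s actual range.

    import ipaddress

    # Placeholder institutional ranges; a real bot is configured with published address blocks.
    KNOWN_RANGES = {
        "US House of Representatives": [ipaddress.ip_network("192.0.2.0/24")],  # TEST-NET, not real
    }

    def attribute_edit(editor_ip):
        # Return the institution whose range contains this anonymous editor's IP, if any.
        ip = ipaddress.ip_address(editor_ip)
        for institution, networks in KNOWN_RANGES.items():
            if any(ip in network for network in networks):
                return institution
        return None

    print(attribute_edit("192.0.2.42"))  # -> "US House of Representatives"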

I think it’s a great example of how social media can work to establish and maintain facts rather than just be mechanism for spreading alternate facts and so called fake news.

District Dispatch: Potpourri in the swamp, but it is all good!

Wed, 2017-02-08 19:20

OITP staff at Stratton Library, a branch of the Cobb County Public Library system.

Yes, this is a weird title — but I was involved in several activities over the last couple of weeks that I wanted to communicate. One or two of these could merit a full post of their own. Given Midwinter and the frenetic pace of Washington at the moment, I did not get around to that, unfortunately. So, here is the recent anthology.

First, the U.S. State Department released the Open Licensing Playbook, a tool for federal agencies and others that want to develop grant-supported open licensing projects, especially for educational purposes. The Playbook was developed by the Global Education Subcommittee (of which I was a member) of the State Department’s Advisory Committee on Strategic Dialogue with Civil Society.

Last week, I had the pleasure of attending a reception for the participants of the National Digital Stewardship Residency (NDSR) program. The Library of Congress and the Institute of Museum and Library Services established the NDSR program to provide hands-on experience to complement graduate education. I met participants from a broad range of institutions such as the Chicago Botanical Garden, Library of Congress, Smithsonian Institution, Harvard University, and the World Bank. The participants spend a full year in residency and I was impressed with their intellectual capability and enthusiasm. I learned a lot about this leading-edge digital preservation program and had the opportunity to relay a bit about ALA’s work in Washington on behalf of the nation’s libraries.

On the topic of fellows, I spent some time at the American Association for the Advancement of Science (AAAS) serving on a selection committee for one of their many fellows programs. I can’t provide details, but merely suggest it to characterize another dimension of the kinds of engagement here in Washington.

I conclude this post with a commentary on our recent visit to the Cobb County Public Library system (CCPLS). You may have seen our earlier post about this visit, focusing on CCPLS’s coding and computational thinking programming. To add to that summary of their good work, I would like to acknowledge how essential these visits are for us in Washington. They help us to retain a continuing tie to the profession and ensure that our advocacy at the national level is in sync with the needs and directions of library communities across the country. Reading articles and collaborating with committee members help, but there is nothing like seeing libraries in person and talking with staff on-site.

And we saw a lot! Always instructive for me is the enormous range that a system must manage, from the central library with its array of services, to a small branch library. CCPLS has new programming that uses virtual reality technology as well as offering basic bread-and-butter services such as job search and literacy. Putting my political hat back on, I also admit that it is especially useful to know more about Georgia libraries at this point in time, as a number of prominent Georgians are headed for leading roles in the government. For example, Rep. Tom Price, who represents Cobb County, is the nominee to head the U.S. Department of Health and Human Services.

I am really grateful to both library director Helen Poyer and branch manager (and long time OITPer) Pat Ball for arranging this visit for us.

See? All good news from the swamp.

The post Potpourri in the swamp, but it is all good! appeared first on District Dispatch.

Library of Congress: The Signal: The University of Richmond’s Digital Scholarship Lab

Wed, 2017-02-08 13:30

Screenshot of American Panorama website.

In November 2016, staff from the Library of Congress’s National Digital Initiatives division visited the University of Richmond’s Digital Scholarship Lab as part of NDI’s efforts to explore data librarianship, computational research and digital scholarship at other libraries and cultural institutions.

Like many university digital labs, the DSL is based in the library, which DSL Director Robert Nelson said is “…a logical, conventional place to put a Digital Humanities center.”

The DSL takes up a small physical space and it has a small staff. Nelson, a historian, produces content and writes the JavaScript code for DSL’s projects. Nathanial Ayers creates the visualizations and web designs and Justin Madron oversees the GIS encoding. Other contributors include Edward Ayers, former University of Richmond president, senior research fellow and professor of history (and co-host of the Back Story history podcast) and Lauren Tilton, visiting assistant professor of Digital Humanities. The DSL hires students to help perform such labor-intensive tasks as scanning documents and data entry.

Despite the small space and staff, DSL’s content-rich projects make a large impact. The Lab’s specialty is interactive maps and timeline-based visualizations of historic events, using public-domain maps and data, much of it from the U.S. National Archives.

Screenshot of the Virginia Secession Convention

A sample of DSL’s projects includes:

Screenshot from “Mining the Dispatch.”

The open-source software that DSL uses to build its projects includes:

For years, Nelson has been using data computation in his own research. In a May 29, 2011, New York Times op-ed piece, Nelson wrote about using topic modeling to research vast amounts of newspaper text published during the Civil War. He wrote, “No historian has yet to display the patience and attention to detail to read through the more than 100,000 articles and nearly 24 million words of the wartime (Richmond Daily) Dispatch, let alone conduct the sophisticated statistical analysis necessary to draw conclusions from the data…Topic modeling is a probabilistic, statistical method that can uncover themes and categories in amounts of text so large that they cannot be read by any individual human being. Applied to the Dispatch for the entirety of the war, topic modeling enables us to see both broad and subtle patterns in the Civil War news that we would otherwise be unable to detect. It also helps historians quickly detect the larger themes addressed in individual articles and then trace those themes in other documents…”
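For readers who haven’t run a topic model, here is a tiny, self-contained Python sketch using gensim’s LDA implementation on a toy corpus. It is only an illustration of the technique Nelson describes, not the pipeline behind the Dispatch project.

    from gensim import corpora, models

    # Toy corpus: each document is a tokenized article; real input would be tens of thousands of articles.
    documents = [
        "conscription troops regiment enlistment army".split(),
        "cotton prices market trade merchants".split(),
        "deserters conscription army camp soldiers".split(),
        "market flour prices scarcity trade".split(),
    ]

    dictionary = corpora.Dictionary(documents)               # word <-> id mapping
    corpus = [dictionary.doc2bow(doc) for doc in documents]  # bag-of-words vectors

    # Fit a small LDA model; each topic is a probability distribution over the vocabulary.
    lda = models.LdaModel(corpus, num_topics=2, id2word=dictionary, passes=20, random_state=1)

    for topic_id, words in lda.print_topics(num_words=4):
        print(topic_id, words)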

The current Digital Scholarship Lab projects are self-contained, browse-and-click multimedia scholarly resources, but they demonstrate only a fraction of the Lab’s potential. DSL staff are pondering ways to make the Lab more service oriented, to address the needs and expectations of the students, researchers and scholars that use the University of Richmond library. Nelson pointed to the Scholars’ Lab at the University of Virginia Library and the Hunt Library at North Carolina State as inspirational models.

Whatever direction the Digital Scholarship Lab grows in, partnerships and teamwork will always be essential to its progress. Whether the contributions come from in-house or from visiting scholars or colleagues at other universities, the DSL’s multimedia scholarly projects are evidence that collaboration benefits everyone.
