planet code4lib


FOSS4Lib Recent Releases: Evergreen - 2.12.0

Fri, 2017-03-24 13:36
Package: Evergreen
Release Date: Wednesday, March 22, 2017

Last updated March 24, 2017. Created by gmcharlt on March 24, 2017.

With this release, we strongly encourage the community to start using the new web client on a trial basis in production. All current Evergreen functionality is available in the web client with the exception of serials and offline circulation. The web client is scheduled to be available for full production use with the September 3.0 release.
Other notable new features and enhancements for 2.12 include:

Ed Summers: Teaching Networks

Fri, 2017-03-24 04:00

Yesterday I had the good fortune to speak with Miriam Posner, Scott Weingart and Thomas Padilla about their experiences teaching digital humanities students about network visualization, analysis and representation. This started as an off the cuff tweet about teaching Gephi, which led to an appointment to chat, and then to a really wonderful broader discussion about approaches to teaching networks:

(???) (???) have either of you taught a DH class about how to use Gephi? Or do you know of someone else who has?

— Ed Summers ((???)) March 10, 2017

Scott suggested that other folks who teach this stuff in a digital humanities context might be interested as well so we decided to record it, and share it online (see below).

The conversation includes some discussion of tools (such as Gephi, Cytoscape, NodeXL, Google Fusion Tables, DataBasic, R) but also some really neat exercises for learning about networks with yarn, balls, short stories and more.

A particularly fun part of the discussion focuses on approaches to teaching graph measurement and analytics, as well as humanistic approaches to graph visualization that emphasize discovery and generative creativity.

During the email exchange that led up to our chat Miriam, Scott and Thomas shared some of their materials which you may find useful in your own teaching/learning:

I’m going to be doing some hands-on exercises about social media, networks and big data in Matt Kirschenbaum’s Digital Studies class this Spring – and I was really grateful for Miriam, Scott and Thomas’ willingness to share their experiences with me.

Anyhow, here’s the video! If you want to get to the good stuff skip to 8:40 where I stop selfishly talking about the classes we’re teaching at MITH.

PS. this post was brought to you by the letter B since (as you will see) Thomas thinks that blogs are sooooo late 2000s :-) I suspect he is right, but I’m clearly still tightly clutching to my vast media empire.

Eric Hellman: Reader Privacy for Research Journals is Getting Worse

Thu, 2017-03-23 17:22
Ever hear of Grapeshot, Eloqua, Moat, Hubspot, Krux, or Sizmek? Probably not. Maybe you've heard of Doubleclick, AppNexus, Adsense or Addthis? Certainly you've heard of Google, which owns Doubleclick and Adsense. If you read scientific journal articles on publisher websites, these companies that you've never heard of will track and log your reading habits and try to figure out how to get you to click on ads, not just at the publisher websites but also at websites like and the Huffington Post.

Two years ago I surveyed the websites of 20 of the top research journals and found that 16 of the top 20 journals placed trackers from ad networks on their web sites. Only the journals from the American Physical Society (2 of the 20) supported secure (HTTPS) connections, and even now APS does not default to being secure.

I'm working on an article about advertising in online library content, so I decided to revisit the 20 journals to see if there had been any improvement. Over half the traffic on the internet now uses secure connections, so I expected to see some movement. One of the 20 journals, Quarterly Journal of Economics, now defaults to a secure connection, significantly improving privacy for its readers. Let's have a big round of applause for Oxford University Press! Yay.

So that's the good news. The bad news is that reader privacy at most of the journals I looked at got worse. Science, which could be loaded securely 2 years ago, has reverted to insecure connections. The two Annual Reviews journals I looked at, which were among the few that did not expose users to advertising network tracking, now have trackers for AddThis and Doubleclick. The New England Journal of Medicine, which deployed the most intense reader tracking of the 20, is now even more intense, with 19 trackers on a web page that had "only" 14 trackers two years ago. A page from Elsevier's Cell went from 9 to 16 trackers.

Despite the backwardness of most journal websites, there are a few signs of hope. Some of the big journal platforms have begun to implement HTTPS. Springer Link defaults to HTTPS, and Elsevier's Science Direct is delivering some of its content with secure connections. Both of them place trackers for advertising networks, so if you want to read a journal article securely and privately, your best bet is still to use Tor.
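As a rough illustration of what a tracker count like the ones above involves, the sketch below lists the third-party hosts that a page's HTML loads scripts, images and frames from. It is not the method used for the survey: the function name, the choice of tags and the example hosts are all assumptions, and real tracker-detection tools also look at cookies, requests issued by scripts and known tracker lists.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class ResourceCollector(HTMLParser):
    """Collects the hostnames of resources referenced via src attributes."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.hosts = set()

    def handle_starttag(self, tag, attrs):
        # Only a few common embedding tags are inspected (an assumption).
        if tag not in {"script", "img", "iframe"}:
            return
        src = dict(attrs).get("src")
        if not src:
            return
        # Resolve relative and protocol-relative URLs against the page URL.
        host = urlparse(urljoin(self.page_url, src)).hostname
        if host:
            self.hosts.add(host)


def third_party_hosts(page_url, html):
    """Return sorted hosts that are neither the page host nor a subdomain of it."""
    page_host = urlparse(page_url).hostname or ""
    collector = ResourceCollector(page_url)
    collector.feed(html)
    return sorted(
        h for h in collector.hosts
        if h != page_host and not h.endswith("." + page_host)
    )
```

Feeding it the saved HTML of an article page and the page's URL gives a lower bound on the third parties involved, since anything loaded later by JavaScript is invisible to a static parse.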

David Rosenthal: Threats to stored data

Thu, 2017-03-23 15:32
Recently there's been a lively series of exchanges on the pasig-discuss mail list, sparked by an inquiry from Jeanne Kramer-Smyth of the World Bank as to any additional risks posed by media such as disks that did encryption or compression. It morphed into discussion of the "how many copies" question and related issues. Below the fold, my reflections on the discussion.

The initial question was pretty clearly based on a misunderstanding of the way self-encrypting disk drives (SED) and hardware compression in tape drives work. Quoting the Wikipedia article Hardware-based full disk encryption:
The drive except for bootup authentication operates just like any drive with no degradation in performance.
The encrypted data is never visible outside the drive, and the same is true for the compressed data on tape. So as far as systems using them are concerned, whether the drive encrypts or not is irrelevant. Unlike disk, tape capacities are quoted assuming compression is enabled. If your data is already compressed, you likely get no benefit from the drive's compression.

SED have one additional failure mode over regular drives; they support a crypto erase command which renders the data inaccessible. The effect as far as the data is concerned is the same as a major head crash. Archival systems that fail if a head crashes are useless, so they must be designed to survive total loss of the data on a drive. There is thus no reason not to use self-encrypting drives, and many reasons why one might want to.

But note that their use does not mean there is no reason for the system to encrypt the data sent to the drive. Depending on your threat model, encrypting data at rest may be a good idea. Depending on the media to do it for you, and thus not knowing whether or how it is being done, may not be an adequate threat mitigation.

Then the discussion broadened but, as usual, it was confusing because it was about protecting data from loss, but not based on explicit statements about what the threats to the data were, other than bit-rot.

There was some discussion of the "how many copies do we need to be safe?" question. Several people pointed to research that constructed models to answer this question. I responded:
Models claiming to estimate loss probability from replication factor, whether true replication or erasure coding, are wildly optimistic and should be treated with great suspicion. There are three reasons:
  • The models are built on models of underlying failures. The data on which these failure models are typically based are (a) based on manufacturers' reliability claims, and (b) ignore failures upstream of the media. Much research shows that actual failures in the field are (a) vastly more likely than manufacturers' claims, and (b) more likely to be caused by system components other than the media.
  • The models almost always assume that the failures are un-correlated, because modeling correlated failures is much more difficult, and requires much more data than un-correlated failures. In practice it has been known for decades that failures in storage systems are significantly correlated. Correlations among failures greatly raise the probability of data loss.
  • The models ignore almost all the important threats, since they are hard to quantify and highly correlated. Examples include operator error, internal or external attack, and natural disaster.
For replicated systems, three replicas is the absolute minimum IF your threat model excludes all external or internal attacks. Otherwise four (see Byzantine Fault Tolerance).

For (k of n) erasure coded systems the absolute minimum is three sites arranged so that k shards can be obtained from any two sites. This is because shards in a single site are subject to correlated failures (e.g. earthquake).

This is a question I've blogged about in 2016 and 2011 and 2010, when I concluded:
  • The number of copies needed cannot be discussed except in the context of a specific threat model.
  • The important threats are not amenable to quantitative modeling.
  • Defense against the important threats requires many more copies than against the simple threats, to allow for the "anonymity of crowds".
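Two of the points above can be made concrete with a little arithmetic. The sketch below is a toy, not a risk estimate: the per-replica loss probability `p` and the common-cause probability `c` are invented numbers, and the function names are hypothetical. It shows how even a small chance of a correlated event swamps the optimistic independent-failure figure, and it checks the "k shards from any two sites" layout rule for an erasure-coded system.

```python
from itertools import combinations


def loss_independent(p, n):
    """Probability that all n replicas are lost if failures were independent."""
    return p ** n


def loss_with_common_cause(p, n, c):
    """Same, plus a single event (probability c) that destroys every replica."""
    return c + (1 - c) * p ** n


def any_two_sites_can_rebuild(shards_per_site, k):
    """True if there are at least three sites and every pair holds >= k shards."""
    if len(shards_per_site) < 3:
        return False
    return all(a + b >= k for a, b in combinations(shards_per_site, 2))


if __name__ == "__main__":
    p, n, c = 0.01, 3, 0.001  # illustrative values only
    print("independent model:  ", loss_independent(p, n))
    print("with common cause:  ", loss_with_common_cause(p, n, c))
    print("layout [3, 3, 3], k=6:", any_two_sites_can_rebuild([3, 3, 3], 6))
```

With these made-up inputs the common-cause term dominates the result by about three orders of magnitude, which is the sense in which models that assume uncorrelated failures are optimistic.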
In the discussion Matthew Addis of Arkivum made some excellent points, and pointed to two interesting reports:
  • A report from the PrestoPrime project. He wrote:
    There’s some examples of the effects that bit-flips and other data corruptions have on compressed AV content in a report from the PrestoPRIME project. There’s some links in there to work by Heydegger and others, e.g. impact of bit errors on JPEG2000. The report mainly covers AV, but there are some references in there about other compressed file formats, e.g. work by CERN on problems opening zips after bit-errors. See page 57 onwards.
  • A report from the EU's DAVID project. He wrote:
    This was followed up by work in the DAVID project that did a more extensive survey of how AV content gets corrupted in practice within big AV archives. Note that bit-errors from storage, a.k.a bit rot was not a significant issue, well not compared with all the other problems!
Matthew wrote the 2010 PrestoPrime report, building on among others Heydegger's 2008 and 2009 work on the effects of flipping bits in compressed files (Both links are paywalled but the 2008 paper is available via the Wayback Machine). The 2013 DAVID report concluded:
It was acknowledged that some rare cases or corruptions might have been explained by the occurrence of bit rot, but the importance and the risk of this phenomenon was at the present time much lower than any other possible causes of content losses.
On the other hand, they were clear that:
Human errors are a major cause of concern. It can be argued that most of the other categories may also be caused by human errors (e.g. poor code, incomplete checking...), but we will concentrate here on direct human errors. In any complex system, operators have to be in charge. They have to perform essential tasks, maintaining the system in operation, checking that resources are sufficient to face unexpected conditions, and recovering the problems that can arise. However vigilant an operator is, he will always make errors, usually without consequence, but sometimes for the worst. The list is virtually endless, but one can cite:
  • Removing more files than wanted
  • Removing files in the wrong folder
  • Pulling out from a RAID a working disk instead of the faulty one
  • Copying and editing a configuration file, not changing all the necessary parameters
  • Editing a configuration file into a bad one, having no backup
  • Corrupting a database
  • Dropping a data tape / a hard disk drive
  • Introducing an adjustment with unexpected consequences
  • Replacing a correct file or setup from a wrong backup.
Such errors have the potential for affecting durably the performances of a system, and are not always reversible. In addition, the risk of error is increased by the stress introduced by urgency, e.g. when trying to make some room on in storage facilities approaching saturation, or introducing further errors when trying to recover using backup copies.
We agree, and have been saying so since at least 2005. And the evidence keeps rolling in. For example, on January 31st Gitlab suffered a major data loss. Simon Sharwood at The Register wrote:
Source-code hub Gitlab.com is in meltdown after experiencing data loss as a result of what it has suddenly discovered are ineffectual backups. ... Behind the scenes, a tired sysadmin, working late at night in the Netherlands, had accidentally deleted a directory on the wrong server during a frustrating database replication process: he wiped a folder containing 300GB of live production data that was due to be replicated.
Commendably, Gitlab made a Google Doc public with a lot of detail about the problem and their efforts to mitigate it:
  1. LVM snapshots are by default only taken once every 24 hours. YP happened to run one manually about 6 hours prior to the outage
  2. Regular backups seem to also only be taken once per 24 hours, though YP has not yet been able to figure out where they are stored. According to JN these don’t appear to be working, producing files only a few bytes in size.
    1. SH: It looks like pg_dump may be failing because PostgreSQL 9.2 binaries are being run instead of 9.6 binaries. This happens because omnibus only uses Pg 9.6 if data/PG_VERSION is set to 9.6, but on workers this file does not exist. As a result it defaults to 9.2, failing silently. No SQL dumps were made as a result. Fog gem may have cleaned out older backups.
  3. Disk snapshots in Azure are enabled for the NFS server, but not for the DB servers.
  4. The synchronisation process removes webhooks once it has synchronised data to staging. Unless we can pull these from a regular backup from the past 24 hours they will be lost
  5. The replication procedure is super fragile, prone to error, relies on a handful of random shell scripts, and is badly documented
    1. SH: We learned later the staging DB refresh works by taking a snapshot of the gitlab_replicator directory, prunes the replication configuration, and starts up a separate PostgreSQL server.
  6. Our backups to S3 apparently don’t work either: the bucket is empty
  7. We don’t have solid alerting/paging for when backups fails, we are seeing this in the dev host too now.
So in other words, out of 5 backup/replication techniques deployed none are working reliably or set up in the first place. => we're now restoring a backup from 6 hours ago that worked

The operator error revealed the kind of confusion and gradual decay of infrastructure processes that is common when procedures are used only to recover from failures, not as a routine. Backups that are not routinely restored are unlikely to work when you need them. The take-away is that any time you reach for the backups, you're likely already in big enough trouble that your backups can't fix it.

I was taught this lesson in the 70s. The early Unix dump command failed to check the return value from the write() call. If you forgot to write-enable the tape by inserting the write ring the dump would appear to succeed, the tape would look like it was spinning, but no data would be written to the backup tape.
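The two lessons in this story, checking what write() returns and reading a backup back before trusting it, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a backup tool: the function names are hypothetical, the whole file is held in memory, and a real system would restore into a separate environment rather than just compare checksums.

```python
import hashlib
import os


def write_all(fd, data):
    """Write every byte of data to fd, checking each write() return value."""
    total = 0
    while total < len(data):
        written = os.write(fd, data[total:])
        if written <= 0:
            raise OSError("write() made no progress; backup is incomplete")
        total += written
    return total


def sha256_of(path):
    """Hash a file by reading it back from storage in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_and_verify(src, dst):
    """Copy src to dst, then re-read dst and compare checksums."""
    with open(src, "rb") as handle:
        data = handle.read()
    fd = os.open(dst, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        write_all(fd, data)
        os.fsync(fd)  # ask the OS to push the data to the device
    finally:
        os.close(fd)
    expected = hashlib.sha256(data).hexdigest()
    if sha256_of(dst) != expected:
        raise OSError("backup verification failed for " + dst)
    return expected
```

A verification step like this would have flagged backup files "only a few bytes in size" on the day they were first produced, rather than on the day they were needed.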

Fault injection should be, but rarely is, practiced at all levels of the system. The results of not doing so are shown by UW Madison's work injecting faults into file systems and distributed storage. My blog posts on this topic include Injecting Faults in Distributed Storage, More bad news on storage reliability, and Forcing Frequent Failures.

Update: much as I love Kyoto, as a retiree I can't afford to attend iPRES2017. Apparently, there's a panel being proposed on the "bare minimum" for digital preservation. If I were on this panel I'd be saying something like the following.

We know the shape of the graph of loss probability against cost - it starts at one at zero cost and is an S-curve that gets to zero at infinite cost. Unfortunately, because the major threats to stored data are not amenable to quantitative modeling (see above), and technologies differ in their cost-effectiveness, we cannot actually plot the graph. So there are no hard-and-fast answers.

The real debate here is how to distinguish between "digital storage" and "digital preservation". We do have a hard-and-fast answer for this. There are three levels of certification; the Data Seal of Approval (DSA), NESTOR's DIN31644, and TRAC/ISO16363. If you can't even pass DSA then what you're doing can't be called digital preservation.

Especially in the current difficult funding situation, it is important NOT to give the impression that we can "preserve" digital information with ever-decreasing resources, because then what we will get is ever-decreasing resources. Because there will always be someone willing to claim that they can do the job cheaper. Their short-cuts won't be exposed until it's too late. That's why certification is important.

We need to be able to say "I'm sorry, but preserving this stuff costs this much. Less money, no preservation, just storage."

Open Knowledge Foundation: The Global Open Data Index – an update and the road ahead

Thu, 2017-03-23 14:00

The Global Open Data Index is a civil society collaborative effort to track the state of open government data around the world. The survey is designed to assess the openness of specific government datasets according to the Open Definition. Through this initiative, we want to provide a civil society audit of how governments actually publish data with input and review from citizens and organisations. This post describes our future timeline for the project.


Here at Open Knowledge International, we see the Global Open Data Index (aka GODI) as a community effort. Without community contributions and feedback there is no index. This is why it is important for us to keep the community involved in the index as much as we can (see our active forum!). However, in the last couple of months, lots has been going on with GODI. In fact so much was happening that we neglected our duty to report back to our community. So based on your feedback, here is what is going on with GODI 2016:


New Project Management

Katelyn Rogers, who managed the project until January 2017, is now leading the School of Data program. I have stepped in to manage the Index until its launch this year. I am an old veteran of GODI, having been its research and community lead for 2014 and 2015, so this is a natural fit for me and the project. I am doing this alongside my work as the International Community Coordinator and the Capacity team lead, but fear not, GODI is a priority!


This change in project management allowed us to take some time and modify the way we manage the project internally. We moved all of our current and past tasks (code, content and research) to the public Github account, where you can follow our progress on the project.


Project timeline

Now, after the handover is done, it is easier for us to decide on the road forward for GODI (in coordination with colleagues at the World Wide Web Foundation, which publishes the Open Data Barometer). We are happy to share with you the future timeline and approach of the Index:

  • Finalising review: In the last 6 weeks, we have been reviewing the different index categories of 94 places. Like last year, we took the thematic reviewer approach, in which each reviewer checked all the countries under one category. We finished the review by March 20th, and we are now running quality assurance for the reviewed submissions, mainly looking for false positives of datasets that have been defined as complying with the Open Definition.


  • Building the GODI site: This year we paid a lot of attention to the development of our methodology and changed the survey site to reflect it and allow easy customization (see Brook’s blog). We are now finalising the result site so it will have an even better user experience than in past years.
  • Launch! The critical piece of information that many of you wanted! We will launch the Index on May 2nd, 2017! And what a launch it is going to be!
    Last year we gave a three-week period for government and civil society to review and suggest corrections for our assessment of the Index on the survey app, before publishing the permanent index results. This was not obvious to many, and we got many requests for corrections or clarifications after publishing the final GODI.
    This year, we will publish the index results, and data publishers and civil society will have the opportunity to contest the results publicly through our forum for 30 days. We will follow the discussions to decide if we should change some results or not. The GODI team believes that if we are aspiring to be a tool for not only measuring but also for learning open data publication, we need to allow civil society and government to engage around the results in the open. We already see the great engagement of some governments in the review process of GODI (See Mexico and Australia), and we would like to take this even one step further, making this a tool that can help and improve open data publication around the world.
  • Report: After fixing the Index result, we will publish a report on our learnings from GODI 2016. This is the first time that we will write a report on the Global Open Data Index findings, and we hope that this will help us not only in creating a better GODI in the future but also in promoting and publishing better datasets.


Have any questions? Want to know more about the upcoming GODI? Have ideas for improvements? Start a topic in the forum.


Open Knowledge Foundation: Open data day 2017 in Uganda: Open contracting, a key to inclusive development

Thu, 2017-03-23 13:56

This blog is part of the event report series on International Open Data Day 2017. On Saturday 4 March, groups from around the world organised over 300 events to celebrate, promote and spread the use of open data. 44 events received additional support through the Open Knowledge International mini-grants scheme, funded by SPARC, the Open Contracting Program of Hivos, Article 19, Hewlett Foundation and the UK Foreign & Commonwealth Office. This event was supported through the mini-grants scheme under the Open contracting and tracking public money flows theme.

On Friday 3rd March 2017, the Anti-Corruption Coalition Uganda (ACCU) commemorated the International Open Data Day 2017 with a meetup of 37 people from Civil Society Organizations (CSOs), development partners, the private sector and the general public. The goal of this meetup was to inform Ugandan citizens, media and government agencies on the importance of open data in improving public service delivery.


The process started with an overview of open data since the concept seemed to be new to most participants. Ms. Joy Namunoga, Advocacy Officer at ACCU, highlighted the benefits of open data, including value for money for citizens and taxpayers, knowing government transactions, holding leaders accountable, constructive criticism to influence policy, boosting transparency, reducing corruption and increasing social accountability.

With such a background, participants observed that in Uganda 19% of people have access to the internet, hence the need to embrace media as a third party to interpret data and take the information closer to citizens. Participants noted that, while Uganda has an enabling policy framework for information sharing, the Access to Information Act and regulations require information to be paid for, namely $6, yet the majority of Ugandans live below $2 a day. The financial requirement denies a percentage of Ugandans their right to know. It was also noted that CSOs and government agencies equally do not make all the information available on their websites, which further underscores this fact.

Issues discussed

Open contracting

Mr. Agaba Marlon, Communications Manager at ACCU, took participants through the process of open contracting as highlighted below:

Figure 1: Open contracting process

He showcased ACCU’s Open Contracting platform commonly known as USER (Uganda System for Electronic open data Records), implemented in partnership with Kampala Capital City Authority (KCCA), a government agency, and funded by the United Nations Development Programme. This platform created a lively conversation amongst the participants, and the following issues were generated to strengthen open contracting in Uganda:

  • Popularizing open data and contracting in Uganda by all stakeholders.
  • Mapping people and agencies in the open contracting space in Uganda to draw lines on how to complement each other.
  • Lobbying and convincing government institutions to embrace the open contracting data standards.
  • Stakeholders like civil society should be open before making the government open up.
  • Simplification of Uganda’s procurement laws for easier understanding by citizens.
  • Faming and shaming of the best and worst contractors, as well as advocating for penalties for those who break the rules.
  • Initiating and strengthening of information portals i.e., both offline and online media.

Bringing new energy and people to the open data movement

Mr. Micheal Katagaya, an open data activist, chaired this session. Some suggestions were made that can bring new energy to the open data movement, such as renegotiating open data membership with the government, co-opting celebrities (especially musicians) to advocate for open data, simplifying data and packaging it in user-friendly formats, and linking data to problem-solving principles. Also, thematic days like International Women’s Day, Youth Day or AIDS Day could be used to spread a message on open data, and local languages could be used to localise the space for Ugandans to embrace open data. Finally, it was seen as important to understand audiences and package messages accordingly, and to identify open data champions and ambassadors.

Sharing open data with citizens who lack internet access

This session was chaired by Ms. Pheona Wamayi, an independent media personality. Participants agreed that civil society and government agencies should strengthen community interfaces between government and citizens because these enable citizens to know of government operations. ACCU was encouraged to use its active membership in Uganda to penetrate the information flow and disseminate it to the citizens. Other suggestions included:

  • Weekly radio programs on open data and open contracting should be held. Programs should be well branded to suit the intended audiences.
  • Simplified advocacy materials, i.e. leaflets and posters, should be produced for community members to inform citizens on open data. Community notice boards could be used to disseminate information on open data.
  • Civil society and government should liaise with telecom companies to provide citizens with the internet.
  • Edutainment through music and forum theatre should be targeted to reach citizens on open data.

Way forward

Ms. Ephrance Nakiyingi, Environmental Governance Officer at ACCU, took participants through the action planning process. The following points were suggested as key steps to pursue as stakeholders:

  • Consider offline strategies like SMS to share data with citizens
  • Design massive open data campaigns to bring new energy to the movement
  • Develop a multi-media strategy based on consumer behaviour
  • Create synergies between different open data initiatives
  • Embrace open data communication
  • Map out other actors in the open data fraternity
  • In-house efforts to share information/stakeholder openness

pinboard: Twitter

Thu, 2017-03-23 12:45
Have not read the full report but based on the abstract seems useful to those involved in the #code4lib incorporati…

Terry Reese: MarcEdit and Alma Integration: Working with holdings data

Thu, 2017-03-23 11:52

Ok Alma folks,

 I’ve been thinking about a way to integrate holdings editing into the Alma integration work with MarcEdit.  Alma handles holdings via MFHDs, but honestly, the process for getting to holdings data seems a little quirky to me.  Let me explain.  When working with bibliographic data, the workflow to extract records for edit and then update, looks like the following:


  1. Records are queried via Z39.50 or SRU
  2. Data can be extracted directly to MarcEdit for editing
  3. Data is saved, and then turned into MARCXML
  4. If the record has an ID, I have to query a specific API to retrieve specific data that will be part of the bib object
  5. Data is assembled in MARCXML, and then updated or created.


Essentially, an update or create takes 2 API calls.

For holdings, it’s a much different animal.


  1. Search via Z39.50/SRU
  2. Query the Bib API to retrieve the holdings link
  3. Query the holdings link API to retrieve a list of holding ids
  4. Query each holdings record API individually to retrieve a holdings object
  5. Convert the holdings object to MARCXML and then into a form editable in the MarcEditor
    1. As part of this process, I have to embed the bib_id and holding_id into the record (I’m using a 999 field) so that I can do the update


For Update/Create

  1. Convert the data to MARCXML
  2. Extract the ids and reassemble the records
  3. Post via the update or create API


Extracting the data for edit is a real pain.  I’m not sure why so many calls are necessary to pull the data.
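To show why the extraction side adds up, here is a rough sketch of the call sequence described above, written against a hypothetical `fetch` function rather than the real Alma client. The URL paths and the dictionary keys (`holdings_link`, `holding_ids`) are made up for illustration and are not the actual Alma response layout; the point is only the shape of the loop.

```python
def fetch_holdings(fetch, bib_id):
    """Collect every holdings record for one bib, keeping both ids with it.

    fetch is any callable that takes a path and returns a parsed response
    (hypothetical; substitute whatever HTTP client wrapper is in use).
    """
    # Call 1: the bib record, which only tells us where the holdings live.
    bib = fetch("/bibs/" + bib_id)
    holdings_link = bib["holdings_link"]

    # Call 2: the list of holdings ids attached to that bib.
    listing = fetch(holdings_link)

    results = []
    for holding_id in listing["holding_ids"]:
        # Calls 3..N+2: one request per holdings record.
        record = fetch(holdings_link + "/" + holding_id)
        # Neither id is inside the holdings object, so carry them alongside.
        results.append(
            {"bib_id": bib_id, "holding_id": holding_id, "record": record}
        )
    return results
```

For a bib with N holdings records this makes N + 2 requests after the initial Z39.50/SRU search, compared with the two calls needed to update or create.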

 Anyway – Let me give you an idea of the process I’m setting up.

First – you query the data:

Couple things to note – to pull holdings, you have to click on the download all holdings link, or right click on the item you want to download.  Or, select the items you want to download, and then select CTRL+H.

When you select the option, the program will prompt you to ask if you want it to create a new holdings record if one doesn’t exist. 


The program will then either download all the associated holdings records or create a new one.

Couple things I want you to notice about these records.  There is a 999 field added, and you’ll notice that I’ve created this in MarcEdit.  Here’s the problem…I need to retain the BIB number to attach the holdings record to (it’s not in the holdings object), and I need the holdings record number (again, not in the holdings object).  This is a required field in MarcEdit’s process.  I can tell if a holdings item is new or updated by the presence or lack of the $d. 
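A small sketch of that new-versus-updated check, for anyone who wants to script against the exported records. It assumes the 999 field appears as a line in MarcEdit's mnemonic format, that $d carries the holdings record number, and that a missing $d means the record is new; the $a subfield for the bib number and the function names are guesses for illustration only.

```python
def parse_subfields(mnemonic_line):
    """Split a mnemonic field line such as '=999  \\\\$a991$d221' into subfields."""
    # Everything before the first '$' is the tag and indicators.
    _, _, data = mnemonic_line.partition("$")
    subfields = {}
    for chunk in data.split("$"):
        if chunk:
            # First character is the subfield code, the rest is its value.
            subfields[chunk[0]] = chunk[1:]
    return subfields


def is_new_holding(mnemonic_line):
    """Assumed rule: no $d (holdings record number) means a new holdings record."""
    return "d" not in parse_subfields(mnemonic_line)
```

The parser is deliberately naive: it would mis-split a value that itself contains a dollar sign, which is fine for numeric ids but not for general MARC data.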


Anyway – this is the process that I’ve come up with…it seems to work.  I’ve got a lot of debugging code to remove because I was having some trouble with the Alma API responses and needed to see what was happening underneath.  If you are an Alma user, I’d be curious if this process looks like it will work.  As I say – I have some cleanup left to do before anyone can use this, but I think that I’m getting close.



Open Knowledge Foundation: Code for Ghana celebrates Open Data Day tracking public money flows

Thu, 2017-03-23 11:14

This blog is part of the event report series on International Open Data Day 2017. On Saturday 4 March, groups from around the world organised over 300 events to celebrate, promote and spread the use of open data. 44 events received additional support through the Open Knowledge International mini-grants scheme, funded by SPARC, the Open Contracting Program of Hivos, Article 19, Hewlett Foundation and the UK Foreign & Commonwealth Office. This event was supported through the mini-grants scheme under the Open contracting and tracking public money flows theme.

This year, Code for Ghana organised their Open Data Day event at Mobile Web Ghana. The theme for the event was “Open Contracting and tracking public money flows”. Open contracting involves analysing government contract data to have a better understanding of how governments spend public funds. We had a lot of open contracting resources from This helped the entire team to understand the concept and its importance in increasing transparency and accountability in the governance of a country.

Florence Toffa, project coordinator of Code for Ghana, did an introductory presentation on Open Contracting. To about 98% of the attendees, open contracting was a new concept and this was the first time they tried their hands on datasets related to open contracting. Participants were introduced to the ‘what’, the benefits and ‘why’ open contracting should be embraced by everyone if we want to get rid of corruption in our society. Moreover, about 15 out of the 20 attendees were new to data scraping, data analysis and data visualisation.

Introduction to D3.JS

The participants were taken through a training session in D3.js by David Lartey (software developer). D3.js is a JavaScript library for manipulating documents based on data. They were taught the basics of the library and how to make some interesting visualisations.

Data Scraping

Shadrack Boadu (software developer and data enthusiast) also taught data scraping. He introduced the participants to two ways of scraping data using Google sheets and tabular. He talked about the importance of cleaning the data and converting it into a useable format to facilitate accurate data analysis and representations.

Before breaking out into various groups, Code for Ghana provided datasets on Government budget (2015 – 2016), Developmental projects procurement and Ghana health service data. The next task was for the participants to combine their skills to come up with relevant insights and visualisations.

The Open Data Day Projects 

The first team (Washington) presented a visualisation (pie chart) on the procurement of the Ghana health Service for the year 2016. Their visualisation gave insights on the volumes of procurement of the Ghana health service. See visualisation:

The second team (Terrific) designed a series of visualisations. These visualisations included the state of developmental projects in Ghana and sources of developmental projects in Ghana. See images below:


Team Meck, the third team, developed a database [web platform] for all the government projects from 2002 to 2016. From the database, one could easily key in a few keywords and bring up a particular result. Unfortunately, the team was not able to complete the web platform on the day.

The fourth team, team Rex, after cleaning their data, came up with a visualisation representing the overview of developmental projects. Their project focused on government project success, sources of government funding and project allocations that are done by consultants.

The final team, team Enock, developed a web app that visualised government contracts. They focused on analysing procurement contracts from the Ghana health service.

After the presentations, the judges for the event, Mr Nehemiah Attigah (Co-founder of Odekro) and Mr Wisdom Donkor from the National Information Technology Agency (NITA), gave their verdicts. The judges spoke about the importance of open data and the role it plays in the development of transparency and accountability in Ghanaian society. They also emphasised the need for the participants to always present data in a way that paints an accurate picture and to visualise information so that it can be easily digested by society. The best three projects were awarded prizes.


Our takeaway from the event is: one day is usually too short to develop a sustainable project. So some of the teams are still working on their projects. For some of the youths, it was an eyeopener. They never knew the importance of data and how it shapes the future of development in the country. To these youth, the event was a success because they gained valuable skills that they would build on.

Alf Eaton, Alf: Symfony Forms

Thu, 2017-03-23 10:28

At the end of several years working with Symfony, I’m taking a moment to appreciate its strongest points: in particular, the way it lets users apply mutations to objects via HTML forms.

The output of a Symfony endpoint (controller action) is usually either a data view (displaying data as HTML, rendered with Twig), or a form view (displaying a form as HTML and receiving the submitted form).

Symfony isn’t particularly RESTful, though you can use collection + resource-style URLs if you like:

  • /articles/ - GET the collection of articles
  • /articles/_create - GET/POST a form to create a new article
  • /articles/{id}/ - GET the article
  • /articles/{id}/_edit - GET/POST a form to edit the article
  • /articles/{id}/_delete - GET/POST a form to delete the article

The entity, controller, form and voter for creating and editing an article look something like this:

// Namespace declarations and use statements are omitted from each file for brevity.

// ArticleBundle/Entity/Article.php
class Article
{
    /**
     * @var int
     *
     * @ORM\Id
     * @ORM\GeneratedValue
     * @ORM\Column(type="integer")
     */
    private $id;

    /**
     * @var string
     *
     * @ORM\Column(type="string")
     * @Assert\NotBlank
     */
    private $title;

    /**
     * @var string
     *
     * @ORM\Column(type="text")
     * @Assert\NotBlank
     * @Assert\Length(min=100)
     */
    private $description;

    // The owner property, with getOwner() and setOwner(), is used by the
    // controller and voter below but is not shown here.

    /**
     * @return int
     */
    public function getId()
    {
        return $this->id;
    }

    /**
     * @return string
     */
    public function getTitle()
    {
        return $this->title;
    }

    /**
     * @param string $title
     */
    public function setTitle($title)
    {
        $this->title = $title;
    }

    /**
     * @return string
     */
    public function getDescription()
    {
        return $this->description;
    }

    /**
     * @param string $description
     */
    public function setDescription($description)
    {
        $this->description = $description;
    }
}

// ArticleBundle/Controller/ArticleController.php
class ArticleController extends Controller
{
    /**
     * @Route("/articles/_create", name="create_article")
     * @Method({"GET", "POST"})
     *
     * @param Request $request
     *
     * @return Response
     */
    public function createArticleAction(Request $request)
    {
        $article = new Article();
        $this->denyAccessUnlessGranted(ArticleVoter::CREATE, $article);

        $article->setOwner($this->getUser());

        $form = $this->createForm(ArticleType::class, $article);
        $form->handleRequest($request);

        // Only validate once the form has actually been submitted.
        if ($form->isSubmitted() && $form->isValid()) {
            $entityManager = $this->getDoctrine()->getManager();
            $entityManager->persist($article);
            $entityManager->flush();

            $this->addFlash('success', 'Article created');

            return $this->redirectToRoute('articles');
        }

        return $this->render('ArticleBundle/Article/create.html.twig', [
            'form' => $form->createView()
        ]);
    }

    /**
     * @Route("/articles/{id}/_edit", name="edit_article")
     * @Method({"GET", "POST"})
     *
     * @param Request $request
     * @param Article $article
     *
     * @return Response
     */
    public function editArticleAction(Request $request, Article $article)
    {
        $this->denyAccessUnlessGranted(ArticleVoter::EDIT, $article);

        $form = $this->createForm(ArticleType::class, $article);
        $form->handleRequest($request);

        if ($form->isSubmitted() && $form->isValid()) {
            // The entity is already managed, so flushing is enough.
            $this->getDoctrine()->getManager()->flush();

            $this->addFlash('success', 'Article updated');

            return $this->redirectToRoute('articles', [
                'id' => $article->getId()
            ]);
        }

        return $this->render('ArticleBundle/Article/edit.html.twig', [
            'form' => $form->createView()
        ]);
    }
}

// ArticleBundle/Form/ArticleType.php
class ArticleType extends AbstractType
{
    /**
     * {@inheritdoc}
     */
    public function buildForm(FormBuilderInterface $builder, array $options)
    {
        $builder->add('title');

        $builder->add('description', null, [
            'attr' => ['rows' => 10]
        ]);

        $builder->add('save', SubmitType::class, [
            'attr' => ['class' => 'btn btn-primary']
        ]);
    }

    /**
     * {@inheritdoc}
     */
    public function configureOptions(OptionsResolver $resolver)
    {
        $resolver->setDefaults([
            'data_class' => Article::class,
        ]);
    }
}

// ArticleBundle/Security/ArticleVoter.php
class ArticleVoter extends Voter
{
    const CREATE = 'CREATE';
    const EDIT = 'EDIT';

    private $decisionManager;

    public function __construct(AccessDecisionManagerInterface $decisionManager)
    {
        $this->decisionManager = $decisionManager;
    }

    // Only vote on Article subjects and the two attributes defined above.
    protected function supports($attribute, $subject)
    {
        return $subject instanceof Article
            && in_array($attribute, [self::CREATE, self::EDIT], true);
    }

    protected function voteOnAttribute($attribute, $article, TokenInterface $token)
    {
        $user = $token->getUser();

        if (!$user instanceof User) {
            return false;
        }

        switch ($attribute) {
            case self::CREATE:
                // Any user with the author role may create articles.
                return $this->decisionManager->decide($token, ['ROLE_AUTHOR']);

            case self::EDIT:
                // Only the owner may edit an article.
                return $user === $article->getOwner();
        }

        return false;
    }
}

{# ArticleBundle/Resources/views/Article/create.html.twig #}
{{ form(form) }}

{# ArticleBundle/Resources/views/Article/edit.html.twig #}
{{ form(form) }}

The combination of Symfony’s Form, Voter and ParamConverter allows you to define who (Voter) can update which properties (Form) of a resource, and when.

The Doctrine annotations allow you to define validations for each property, which are used in both client-side and server-side form validation.

LibreCat/Catmandu blog: Metadata Analysis at the Command-Line

Thu, 2017-03-23 09:09

Last week I was at the ELAG 2016 conference in Copenhagen and attended the excellent workshop by Christina Harlow of Cornell University on migrating digital collections metadata to RDF and Fedora4. One of the important steps required to migrate and model data to RDF is understanding what your data is about. Often the systems that need to be converted are old ones for which little or no documentation is available. Instead of manually processing large XML or MARC dumps, tools like metadata breakers can be used to find out which fields are available in the legacy system and how they are used. Mark Phillips of the University of North Texas recently wrote a very inspiring article in Code4Lib on how this could be done in Python. In this blog post I’ll demonstrate how this can be done using a new Catmandu tool: Catmandu::Breaker.

To follow the examples below, you need to have a system with Catmandu installed. The Catmandu::Breaker tools can then be installed with the command:

$ sudo cpan Catmandu::Breaker

A breaker is a command that transforms data into a line format that can be easily processed with Unix command line tools such as grep, sort, uniq, cut and many more. If you need an introduction to Unix tools for data processing please follow the examples Johan Rolschewski of Berlin State Library and I presented as an ELAG bootcamp.

As a simple example let’s create a YAML file and demonstrate how this file can be analysed using Catmandu::Breaker:

$ cat test.yaml
---
name: John
colors:
  - black
  - yellow
  - red
institution:
  name: Acme
  years:
    - 1949
    - 1950
    - 1951
    - 1952

This example has a combination of simple name/value pairs, a list of colors and a deeply nested field. To transform this data into the breaker format execute the command:

$ catmandu convert YAML to Breaker < test.yaml
1    colors[]    black
1    colors[]    yellow
1    colors[]    red
1    institution.name    Acme
1    institution.years[]    1949
1    institution.years[]    1950
1    institution.years[]    1951
1    institution.years[]    1952
1    name    John

The breaker format is a tab-delimited output with three columns:

  1. A record identifier: read from the _id field in the input data, or a counter when no such field is present.
  2. A field name. Nested fields are separated by dots (.) and lists are indicated by square brackets ([])
  3. A field value

When you have a very large JSON or YAML file and need to find all the values of a deeply nested field you could do something like:
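
The three-column layout described above can be sketched in a few lines of Python. This is only an illustration of the format, not how Catmandu::Breaker is implemented; the function name `to_breaker` and the alphabetical key ordering are assumptions made for the example.

```python
def to_breaker(record, record_id):
    """Flatten one nested record into (id, field, value) rows."""
    rows = []

    def walk(node, path):
        if isinstance(node, dict):
            # Nested fields are joined with dots.
            for key in sorted(node):
                walk(node[key], f"{path}.{key}" if path else key)
        elif isinstance(node, list):
            # Lists are marked with [] after the field name.
            for item in node:
                walk(item, path + "[]")
        else:
            rows.append((str(record_id), path, str(node)))

    walk(record, "")
    return rows


record = {
    "name": "John",
    "colors": ["black", "yellow", "red"],
    "institution": {"name": "Acme", "years": [1949, 1950, 1951, 1952]},
}

for row in to_breaker(record, 1):
    print("\t".join(row))
```

A list of lists, which is how the MARC example further down is parsed, would come out of this sketch with a doubled marker such as `record[][]`.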

$ catmandu convert YAML to Breaker < data.yaml | grep "institution.years"

Using Catmandu you can do this analysis on input formats such as JSON, YAML, XML, CSV, XLS (Excel). Just replace the YAML by any of these formats and run the breaker command. Catmandu can also connect to OAI-PMH, Z39.50 or databases such as MongoDB, ElasticSearch, Solr or even relational databases such as MySQL, Postgres and Oracle. For instance to get a breaker format for an OAI-PMH repository issue a command like:

$ catmandu convert OAI --url to Breaker

If your data is in a database you could issue an SQL query like:

$ catmandu convert DBI --dsn 'dbi:Oracle' --query 'SELECT * from TABLE WHERE ...' --user 'user/password' to Breaker

Some formats, such as MARC, don’t provide a great breaker format. In Catmandu, MARC files are parsed into a list of lists. Running a breaker on a MARC input gives you this:

$ catmandu convert MARC to Breaker < t/camel.usmarc | head
fol05731351    record[][]    LDR
fol05731351    record[][]    _
fol05731351    record[][]    00755cam 22002414a 4500
fol05731351    record[][]    001
fol05731351    record[][]    _
fol05731351    record[][]    fol05731351
fol05731351    record[][]    082
fol05731351    record[][]    0
fol05731351    record[][]    0
fol05731351    record[][]    a

The MARC fields are part of the data, not part of the field name. This can be fixed by adding a special ‘marc’ handler to the breaker command:

$ catmandu convert MARC to Breaker --handler marc < t/camel.usmarc | head
fol05731351    LDR    00755cam 22002414a 4500
fol05731351    001    fol05731351
fol05731351    003    IMchF
fol05731351    005    20000613133448.0
fol05731351    008    000107s2000 nyua 001 0 eng
fol05731351    010a    00020737
fol05731351    020a    0471383147 (paper/cd-rom : alk. paper)
fol05731351    040a    DLC
fol05731351    040c    DLC
fol05731351    040d    DLC

Now all the MARC subfields are visible in the output.

You can use this format to find, for instance, all unique values in a MARC file. Let’s try to find all unique 008 values:

$ catmandu convert MARC to Breaker --handler marc < camel.usmarc | grep "\t008" | cut -f 3 | sort -u
000107s2000 nyua 001 0 eng
000203s2000 mau 001 0 eng
000315s1999 njua 001 0 eng
000318s1999 cau b 001 0 eng
000318s1999 caua 001 0 eng
000518s2000 mau 001 0 eng
000612s2000 mau 000 0 eng
000612s2000 mau 100 0 eng
000614s2000 mau 000 0 eng
000630s2000 cau 001 0 eng
00801nam 22002778a 4500
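
The last line of that output looks like a leader value rather than an 008 value, which is what you would expect when the pattern matches anywhere on the line. If the match needs to be on the field-name column only, a rough Python equivalent of the pipeline could look like the sketch below. The function name `unique_values` and the sample record identifiers `rec2` and `rec3` are made up for the example.

```python
def unique_values(lines, field):
    """Return the sorted distinct values of `field` in breaker-format lines."""
    values = set()
    for line in lines:
        # Columns are: record id, field name, value (tab-delimited).
        parts = line.rstrip("\n").split("\t", 2)
        if len(parts) == 3 and parts[1] == field:
            values.add(parts[2])
    return sorted(values)


# Hypothetical sample in breaker format; ids other than the first are invented.
sample = [
    "fol05731351\tLDR\t00801nam 22002778a 4500\n",
    "fol05731351\t008\t000107s2000 nyua 001 0 eng\n",
    "rec2\t008\t000203s2000 mau 001 0 eng\n",
    "rec3\t008\t000107s2000 nyua 001 0 eng\n",
]

for value in unique_values(sample, "008"):
    print(value)
```

Because an open file is an iterable of lines, the same function should also work on a saved breaker file, for example `unique_values(open("result.breaker"), "008")`.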

Catmandu::Breaker doesn’t only break input data into an easy format for command line processing, it can also do a statistical analysis on the breaker output. First process some data into the breaker format and save the result in a file:

$ catmandu convert MARC to Breaker --handler marc < t/camel.usmarc > result.breaker

Now, use this file as input for the ‘catmandu breaker’ command:

$ catmandu breaker result.breaker | name | count | zeros | zeros% | min | max | mean | median | mode | variance | stdev | uniq | entropy | |------|-------|-------|--------|-----|-----|------|--------|--------|----------|-------|------|---------| | 001 | 10 | 0 | 0.0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 10 | 3.3/3.3 | | 003 | 10 | 0 | 0.0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 0.0/3.3 | | 005 | 10 | 0 | 0.0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 10 | 3.3/3.3 | | 008 | 10 | 0 | 0.0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 10 | 3.3/3.3 | | 010a | 10 | 0 | 0.0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 10 | 3.3/3.3 | | 020a | 9 | 1 | 10.0 | 0 | 1 | 0.9 | 1 | 1 | 0.09 | 0.3 | 9 | 3.3/3.3 | | 040a | 10 | 0 | 0.0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 0.0/3.3 | | 040c | 10 | 0 | 0.0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 0.0/3.3 | | 040d | 5 | 5 | 50.0 | 0 | 1 | 0.5 | 0.5 | [0, 1] | 0.25 | 0.5 | 1 | 1.0/3.3 | | 042a | 10 | 0 | 0.0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 0.0/3.3 | | 050a | 10 | 0 | 0.0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 0.0/3.3 | | 050b | 10 | 0 | 0.0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 10 | 3.3/3.3 | | 0822 | 10 | 0 | 0.0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 0.0/3.3 | | 082a | 10 | 0 | 0.0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 3 | 0.9/3.3 | | 100a | 9 | 1 | 10.0 | 0 | 1 | 0.9 | 1 | 1 | 0.09 | 0.3 | 8 | 3.1/3.3 | | 100d | 1 | 9 | 90.0 | 0 | 1 | 0.1 | 0 | 0 | 0.09 | 0.3 | 1 | 0.5/3.3 | | 100q | 1 | 9 | 90.0 | 0 | 1 | 0.1 | 0 | 0 | 0.09 | 0.3 | 1 | 0.5/3.3 | | 111a | 1 | 9 | 90.0 | 0 | 1 | 0.1 | 0 | 0 | 0.09 | 0.3 | 1 | 0.5/3.3 | | 111c | 1 | 9 | 90.0 | 0 | 1 | 0.1 | 0 | 0 | 0.09 | 0.3 | 1 | 0.5/3.3 | | 111d | 1 | 9 | 90.0 | 0 | 1 | 0.1 | 0 | 0 | 0.09 | 0.3 | 1 | 0.5/3.3 | | 245a | 10 | 0 | 0.0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 9 | 3.1/3.3 | | 245b | 3 | 7 | 70.0 | 0 | 1 | 0.3 | 0 | 0 | 0.21 | 0.46 | 3 | 1.4/3.3 | | 245c | 9 | 1 | 10.0 | 0 | 1 | 0.9 | 1 | 1 | 0.09 | 0.3 | 8 | 3.1/3.3 | | 250a | 3 | 7 | 70.0 | 0 | 1 | 0.3 | 0 | 0 | 0.21 | 0.46 | 3 | 1.4/3.3 | | 260a | 10 | 0 | 0.0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 6 | 2.3/3.3 | | 260b | 10 | 0 
| 0.0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 5 | 2.0/3.3 | | 260c | 10 | 0 | 0.0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 2 | 0.9/3.3 | | 263a | 6 | 4 | 40.0 | 0 | 1 | 0.6 | 1 | 1 | 0.24 | 0.49 | 4 | 2.0/3.3 | | 300a | 10 | 0 | 0.0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 5 | 1.8/3.3 | | 300b | 3 | 7 | 70.0 | 0 | 1 | 0.3 | 0 | 0 | 0.21 | 0.46 | 1 | 0.9/3.3 | | 300c | 4 | 6 | 60.0 | 0 | 1 | 0.4 | 0 | 0 | 0.24 | 0.49 | 4 | 1.8/3.3 | | 300e | 1 | 9 | 90.0 | 0 | 1 | 0.1 | 0 | 0 | 0.09 | 0.3 | 1 | 0.5/3.3 | | 500a | 2 | 8 | 80.0 | 0 | 1 | 0.2 | 0 | 0 | 0.16 | 0.4 | 2 | 0.9/3.3 | | 504a | 1 | 9 | 90.0 | 0 | 1 | 0.1 | 0 | 0 | 0.09 | 0.3 | 1 | 0.5/3.3 | | 630a | 2 | 9 | 90.0 | 0 | 2 | 0.2 | 0 | 0 | 0.36 | 0.6 | 2 | 0.9/3.5 | | 650a | 15 | 0 | 0.0 | 1 | 3 | 1.5 | 1 | 1 | 0.65 | 0.81 | 6 | 1.7/3.9 | | 650v | 1 | 9 | 90.0 | 0 | 1 | 0.1 | 0 | 0 | 0.09 | 0.3 | 1 | 0.5/3.3 | | 700a | 5 | 7 | 70.0 | 0 | 2 | 0.5 | 0 | 0 | 0.65 | 0.81 | 5 | 1.9/3.6 | | LDR | 10 | 0 | 0.0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 10 | 3.3/3.3

As a result you get a table listing the usage of subfields in all the input records. From this output we can learn:

  • The ‘001’ field is available in 10 records (see: count)
  • One record doesn’t contain a ‘020a’ subfield (see: zeros)
  • The ‘650a’ subfield is available in all records, at least once and at most 3 times (see: min, max)
  • The 9 ‘100a’ subfields hold only 8 unique values (see: count, uniq)
  • The last column ‘entropy’ gives a measure of how interesting the field is for search engines. The higher the entropy, the more unique content can be found.
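
To make these columns a little more concrete, here is a rough Python sketch that computes a few of them from breaker rows. The entropy calculation is an assumption inferred from the numbers in the table: Shannon entropy over the observed values, with every record that lacks the field counted as one "missing" observation, shown against the maximum possible entropy. It is not taken from the Catmandu::Breaker source, and the function name `field_stats` is made up.

```python
import math
from collections import Counter, defaultdict


def field_stats(rows):
    """rows: iterable of (record_id, field, value). Returns {field: stats}."""
    record_ids = set()
    per_field = defaultdict(lambda: defaultdict(list))  # field -> record -> values
    for record_id, field, value in rows:
        record_ids.add(record_id)
        per_field[field][record_id].append(value)

    stats = {}
    for field, by_record in per_field.items():
        # How often the field occurs in each record (0 when absent).
        occurrences = [len(by_record.get(r, [])) for r in record_ids]
        values = [v for vs in by_record.values() for v in vs]
        zeros = occurrences.count(0)

        # Assumption: each record without the field adds one "missing" observation.
        observed = Counter(values)
        if zeros:
            observed[None] = zeros
        total = len(values) + zeros
        entropy = sum(
            (n / total) * math.log2(total / n) for n in observed.values()
        )

        stats[field] = {
            "count": len(values),
            "zeros": zeros,
            "min": min(occurrences),
            "max": max(occurrences),
            "mean": sum(occurrences) / len(occurrences),
            "uniq": len(set(values)),
            "entropy": round(entropy, 1),
            "max_entropy": round(math.log2(total), 1),
        }
    return stats


# Tiny made-up sample: four records, two of which lack an 040d subfield.
rows = [
    ("r1", "001", "a"), ("r2", "001", "b"), ("r3", "001", "c"), ("r4", "001", "d"),
    ("r1", "040d", "DLC"), ("r2", "040d", "DLC"),
]
for name, s in sorted(field_stats(rows).items()):
    print(name, s)
```

Under this reading, a field whose value is different in every record reaches the maximum entropy, while a field that always holds the same value scores zero.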

I hope these tools are of some use in your projects!

Evergreen ILS: Evergreen 2.12.0 is released

Wed, 2017-03-22 23:24

The Evergreen community is pleased to announce the release of Evergreen 2.12.  The release is available on the Evergreen downloads page.

With this release, we strongly encourage the community to start using the new web client on a trial basis in production. All current Evergreen functionality is available in the web client with the exception of serials and offline circulation. The web client is scheduled to be available for full production use with the September 3.0 release.

Other notable new features and enhancements for 2.12 include:

  • Overdrive and OneClickdigital integration. When configured, patrons will be able to see ebook availability in search results and on the record summary page. They will also see ebook checkouts and holds in My Account.
  • Improvements to metarecords that include:
    • improvements to the bibliographic fingerprint to prevent the system from grouping different parts of a work together and to better distinguish between the title and author in the fingerprint;
    • the ability to limit the “Group Formats & Editions” search by format or other limiters;
    • improvements to the retrieval of e-resources in a “Group Formats & Editions” search;
    • and the ability to jump to other formats and editions of a work directly from the record summary page.
  • The removal of advanced search limiters from the basic search box, with a new widget added to the results page where users can see and remove those limiters.
  • A change to topic, geographic and temporal subject browse indexes that will display the entire heading as a unit rather than displaying individual subject terms separately.
  • Support for right-to-left languages, such as Arabic, in the public catalog. Arabic has also become a new officially-supported language in Evergreen.
  • A new hold targeting service supporting new targeting options and runtime optimizations to speed up targeting.
  • In the web staff client, the ability to apply merge profiles in the record bucket merge and Z39.50 interfaces.
  • The ability to display copy alerts when recording in-house use.
  • The ability to ignore punctuation, such as hyphens and apostrophes, when performing patron searches.
  • Support for recognition of client time zones,  particularly useful for consortia spanning time zones.

Evergreen 2.12 also requires PostgreSQL 9.3, with a recommendation that sites upgrade to PostgreSQL 9.4. It also requires the 2.5 release of OpenSRF. The full feature set for this release is available in the 2.12 Release Notes.

As with all Evergreen releases, many hands contributed to a successful release process. The release is a result of code, documentation, and translation contributions from 46 people representing 23 organizations in the community, along with financial contributions from nine Evergreen sites that commissioned development. Many thanks to everyone who helped make this release happen.

Jonathan Rochkind: Use capistrano to run a remote rake task, with maintenance mode

Wed, 2017-03-22 21:52

So the app I am now working on is still in its early stages, not even live to the public yet, but we’ve got an internal server. We periodically have to change a bunch of data in our (non-rdbms) “production” store. (First devops unhappiness, I think there should be no scheduled downtime for planned data transformation. We’re working on it. But for now it happens).

We use capistrano to deploy. Previously/currently, the process for one of these scheduled-downtime maintenance runs looked like:

  • on your workstation, do a cap production maintenance:enable to start some downtime
  • ssh into the production machine, cd to the cap-installed app, and run a rake task with bundle exec. Which could take an hour+.
  • Remember to come back when it’s done and `cap production maintenance:disable`.

A couple more devops unhappiness points here: 1) In my opinion you should ideally never be ssh’ing to production, at least in a non-emergency situation.  2) You have to remember to come back and turn off maintenance mode, and if I start the task at 5pm to avoid disrupting internal stakeholders, I gotta come back after business hours to do that! I also think every thing you have to do ‘outside business hours’ that’s not an emergency is a sign of a not-yet-done ops environment.

So I decided to try to fix this. Since the existing maintenance mode stuff was already done through capistrano, and I wanted to do it without a manual ssh to the production machine, capistrano seemed a reasonable tool. I found a plugin to execute rake via capistrano, but it didn’t do quite what I wanted, and its implementation was so simple that I saw no reason not to copy-and-paste it and make it do just what I wanted.

I’m not gonna maintain this for the public at this point (make a gem/plugin out of it, nope), but I’ll give it to you in a gist if you want to use it. One of the tricky parts was figuring out how to get “streamed” output from cap, since my rake tasks use ruby-progressbar (it’s got decent non-TTY output already) and I wanted to see it live in my workstation console. I managed to do that! Although I never figured out how to get a cap recipe to require files from another location (I have no idea why I couldn’t make it work), so the custom class is ugly inlined in.

I also ordinarily want maintenance mode to be turned off even if the task fails, but still want a non-zero exit code in those cases (anticipating future further automation — really what I need is to be able to execute this all via cron/at too, so we can schedule downtime for the middle of the night without having to be up then).
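
The shape of that behaviour (always turn maintenance mode back off, but still report the task's failure) can be sketched outside of capistrano. The Python below is not the cap recipe from the gist, just an illustration of the control flow; the function names are made up, and the two `cap` commands are the ones mentioned earlier in this post.

```python
import subprocess
import sys

ENABLE = ["cap", "production", "maintenance:enable"]
DISABLE = ["cap", "production", "maintenance:disable"]


def run_with_maintenance(enable_cmd, task_cmd, disable_cmd):
    """Run task_cmd inside a maintenance window and return its exit code."""
    # If maintenance mode cannot be enabled, stop before running the task.
    subprocess.run(enable_cmd, check=True)
    try:
        # A failing task does not raise here; its exit code is passed through.
        return subprocess.run(task_cmd).returncode
    finally:
        # Runs whether the task succeeded, failed, or could not be started.
        subprocess.run(disable_cmd, check=True)


def main(task_cmd):
    """Hypothetical entry point: the task command comes from the caller."""
    sys.exit(run_with_maintenance(ENABLE, task_cmd, DISABLE))
```

Something like `main(sys.argv[1:])` from a small script would then be callable from cron or at, with a non-zero exit status whenever the task failed.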

Anyway here’s the gist of the cap recipe. This file goes in ./lib/capistrano/tasks in a local app, and now you’ve got these recipes. Any tips on how to organize my cap recipe better quite welcome.


District Dispatch: After calling Congress, write a letter to the editor

Wed, 2017-03-22 21:20

The single most impactful action you can take to save funding for libraries right now is to contact your member of Congress directly. Once you’ve done that, there is another action you can take to significantly amplify your voice and urge public support for libraries: writing a letter to the editor of your local newspaper.

Each newspaper has its own guidelines for submitting letters to the editor. Source:

If you’ve never done it, don’t let myths get in the way of your advocacy:

Myth 1: My local newspaper is really small, so I don’t want to waste my time. It’s true that the larger the news outlet, the more exposure your letter gets. But it’s also true that U.S. representatives care about the opinions expressed in their own congressional district, where their voters live. For example, if you live in the 15th district of Pennsylvania, your U.S. representative cares more about the Harrisburg Patriot-News and even smaller local newspapers than he does about the Philadelphia Inquirer.

Myth 2: I have to be a state librarian to get my letter printed in the newspaper. Newspaper editorial boards value input from any readers who have specific stories to share about how policies affect real people on a daily basis. Sure, if you’re submitting a letter to the New York Times, having a title increases your chances of getting published. The larger the news outlet, the more competitive it is to get published. But don’t let your title determine the value of your voice. Furthermore, you can encourage your library patrons to write letters to the editor. Imagine the power of a letter written by a veteran in Bakersfield, CA, who received help accessing benefits through the state’s veteransconnect@thelibrary initiative – especially when their U.S. representative is on the Veterans Affairs subcommittee of the House Appropriations Committee.

Myth 3: I don’t have anything special to say in a letter. You don’t need to write a masterpiece, but you need to be authentic. Letters in response to material recently published (within a couple days) stand a better chance of getting printed. How did you feel about a story you read about, for example, the elimination of library programs in the Trump budget? Was there a missing element of the story that needs to be addressed? What new information (statistics) or unique perspective (anecdotes) can you add to what was printed? Is there a library angle that will be particularly convincing to one of your members of Congress (say, their personal interest in small business development)? Most importantly, add a call to action. For example, “We need the full support of Senators {NAME and NAME} and Representative {NAME} to preserve full federal funding for libraries so they can continue to…” Be sure to check our Legislative Action Center for current language you can use.

Ready to write? Here are a few practical tips about how to do it:

Tip 1: Keep it short – in general, maximum 200 words. Every news outlet has its own guidelines for submitting letters to the editor, which are normally published on their website. Some allow longer letters, others shorter. In any case, the more concise and to-the-point, the better.

Tip 2: When you email your letter, paste it into the body of the text and be sure to include your name, title, address and phone number so that you can be contacted if the editor wants to verify that you are the author. Do not send an attachment.

Tip 3: If your letter gets published, send a copy to your representative and senators to reinforce your message (emailing a hyperlink is best). Also, send a copy to the Washington Office; we can often use the evidence of media attention when we make visits on Capitol Hill.

Finally, get others involved. Recruit patrons, business leaders and other people in your community to write letters to the editor (after they have called their members of Congress, of course!). Editors won’t publish every single letter they get, but the more letters they receive on a specific topic, the more they realize that it is an issue that readers care deeply about – and that can inspire editors to further explore the impact of libraries for themselves.


Terry Reese: Truncating a field by a # of words in MarcEdit

Wed, 2017-03-22 20:34

This question came up on the listserv, and I thought that the answer might be generally useful and that other folks might find it interesting.  Here’s the question:

I’d like to limit the length of the 520 summary fields to a maximum of 100 words and adding the punctuation “…” at the end. Anyone have a good process/regex for doing this? Example: =520  \\$aNew York Times Bestseller Award-winning and New York Times bestselling author Laura Lippman&#x2019;s Tess Monaghan&#x2014;first introduced in the classic Baltimore Blues&#x2014;must protect an up-and-coming Hollywood actress, but when murder strikes on a TV set, the unflappable PI discovers everyone&#x2019;s got a secret. {esc}(S2{esc}(B[A] welcome addition to Tess Monaghan&#x2019;s adventures and an insightful look at the desperation that drives those grasping for a shot at fame and those who will do anything to keep it.{esc}(S3{esc}(B&#x2014;San Francisco Chronicle When private investigator Tess Monaghan literally runs into the crew of the fledgling TV series Mann of Steel while sculling, she expects sharp words and evil looks, not an assignment. But the company has been plagued by a series of disturbing incidents since its arrival on location in Baltimore: bad press, union threats, and small, costly on-set “accidents” that have wreaked havoc with its shooting schedule. As a result, Mann’s creator, Flip Tumulty, the son of a Hollywood legend, is worried for the safety of his young female lead, Selene Waites, and asks Tess to serve as her bodyguard. Tumulty’s concern may be well founded. Recently, a Baltimore man was discovered dead in his home, surrounded by photos of the beautiful&#x2014;if difficult&#x2014;aspiring star. In the past, Tess has had enough trouble guarding her own body. Keeping a spoiled movie princess under wraps may be more than she can handle since Selene is not as naive as everyone seems to think, and instead is quite devious. Once Tess gets a taste of this world of make-believe&#x2014;with their vanities, their self-serving agendas, and their remarkably skewed visions of reality&#x2014;she&#x2019;s just about ready to throw in the towel. 
But she&#x2019;s pulled back in when a grisly on-set murder occurs, threatening to topple the wall of secrets surrounding Mann of Steel as lives, dreams, and careers are scattered among the ruins.

So, there isn’t really a true regular expression that can break on a number of words, in part because how we define word boundaries will vary between different languages.  Likewise, the MARC formatting can cause a challenge.  So, the best approach is to look for good enough – and in this case, good enough is likely breaking on spaces.  My suggestion is to look for 100 spaces, and then truncate.

In MarcEdit, this is easiest to do using the Replace function.  The expression would look like the following:

Find: (=520.{4})(\$a)(?<words>([^ ]*\s){100})(.*)
Replace: $1$2${words}…

Check the use regular expressions option. (image below).

So why does this work?  Let’s break it down.

Find:

  • (=520.{4}) – this matches the field number, the two spaces related to the mnemonic format, and then the two indicator values.
  • (\$a) – this matches on the subfield a
  • (?<words>([^ ]*\s){100}) – this is where the magic happens.  You’ll notice two things about this.  First, I use a nested expression, and second, I name one.  Why do I do that?  Well, the reason is because the group numbering gets wonky once you start nesting expressions.  In those cases, it’s easier to name them.  So, in this case, I’ve named the group that I want to retrieve, and then have created a subgroup that matches on characters that aren’t a space, and then a space.  I then use the quantifier {100}, which means the subgroup must match exactly 100 times.
  • (.*) – match the rest of the field.

Now when we do the replace, putting the field back together is really easy.  We know we want to reprint the field number, the subfield code, and then the group that captured the 100 units.  Since we named the 100 units, we call that directly by name.  Hence,

Replace:

  • $1 – prints out =520  \\
  • $2 – prints $a
  • ${words} – prints the 100 words
  • … – the literal ellipsis

And that’s it.  Pretty easy if you know what you are looking for.

–tr
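
For anyone who wants to try the same idea outside of MarcEdit, here is a small Python sketch of the expression. It is an adaptation and not part of MarcEdit: Python spells named groups as (?P<words>…) rather than (?<words>…), and the function name `truncate_520` and its `limit` parameter are made up for the example.

```python
import re


def truncate_520(line, limit=100):
    """Keep the first `limit` space-delimited words of a 520 $a and add an ellipsis."""
    pattern = re.compile(
        r"(=520.{4})(\$a)(?P<words>(?:[^ ]*\s){%d})(.*)" % limit
    )
    # Fields with fewer than `limit` words do not match and are returned unchanged.
    return pattern.sub(r"\1\2\g<words>…", line)


field = r"=520  \\$aone two three four five six"
print(truncate_520(field, limit=3))
```

As with the MarcEdit expression, the captured words keep their trailing space, so the ellipsis follows a space rather than sitting directly against the last word.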

LITA: Jobs in Information Technology: March 22, 2017

Wed, 2017-03-22 19:35

New vacancy listings are posted weekly on Wednesday at approximately 12 noon Central Time. They appear under New This Week and under the appropriate regional listing. Postings remain on the LITA Job Site for a minimum of four weeks.

New This Week

Columbus Metropolitan Library, IT Enterprise Applications Manager, Columbus, OH

Auburn University Libraries, Research Data Management Librarian, Auburn, AL

Valencia College, Emerging Technology Librarian, Orlando, FL

Western Carolina University/Hunter Library, Web Development Librarian, Cullowhee, NC

Computercraft Corporation, Online Content Specialist, McLean, VA

Visit the LITA Job Site for more available jobs and for information on submitting a job posting.

Access Conference: Peer reviewers needed!

Wed, 2017-03-22 15:28

Access 2017 is seeking peer reviewers to help select presentations for the conference. Peer reviewers will be responsible for reading Access Conference session proposals and reviewing them for selection.

Peer reviewers will be selected by the Program Committee and will ideally include a variety of individuals across disciplines including Access old-timers and newbies alike. You do not need to be attending the conference to volunteer as a peer reviewer.

The peer review process is double-blind. Those who have submitted Access Conference session proposals are welcome and encouraged to become peer reviewers as well. You won’t receive your own proposal to review. Please note that the peer reviewing activity will take place between April 6 and April 25.

To be considered as a peer reviewer, please attach your abridged CV (max. 5 pages) and provide the following information in the body of an email:

  • Name
  • Position and affiliation
  • A few sentences that explain why you want to be a peer reviewer for Access 2017

Please submit this information to by April 5, 2017.

District Dispatch: House library champions release FY18 “Dear Appropriator” letters

Wed, 2017-03-22 15:19

Your limited-time-only chance to ask for your House Member’s backing for LSTA and IAL begins now.

Where does your Representative stand on supporting FY 2018 library funding? Against the backdrop of the President’s proposal last week to eliminate the Institute of Museum and Library Services and virtually all other library funding sources, their answer this year is more important than ever before.

Every spring, library champions in Congress ask every Member of the House to sign two separate “Dear Appropriator” letters directed to the Appropriations Committee: one urging full funding for LSTA (which benefits every kind of library), and the second asking the same for the Innovative Approaches to Literacy program. This year, the LSTA support letter is being led by Rep. Raul Grijalva (D-AZ3). The IAL support letter is being jointly led by Reps. Eddie Bernice Johnson (D-TX30), Don Young (R-AK), and Jim McGovern (D-MA2).

The first “Dear Appropriator” letter asks the Committee to fully fund LSTA in FY 2018 and the second does the same for IAL. When large numbers of Members of Congress sign these letters, it sends a strong signal to the House Appropriations Committee to reject requests to eliminate IMLS, and to continue funding for LSTA and IAL at least at current levels.

Members of the House have only until April 3 to let our champions know that they will sign the separate LSTA and IAL “Dear Appropriator” letters now circulating, so there’s no time to lose. Use ALA’s Legislative Action Center today to ask your Member of Congress to sign both the LSTA and IAL letters. Many Members of Congress will only sign such a letter if their constituents ask them to. So it is up to you to help save LSTA and IAL from elimination or significant cuts that could dramatically affect hundreds of libraries and potentially millions of patrons.

Five minutes of your time could help preserve over $210 million in library funding now at risk.

Soon, we will also need you to ask both of your US Senators to sign similar letters not yet circulating in the Senate, but timing is key. In the meantime, today’s the day to ask your Representative in the House for their signature on both the LSTA and IAL “Dear Appropriator” letters that must be signed no later than April 3.

Whether you call, email, tweet or all of the above (which would be great), the message to the friendly office staff of your Senators and Representative is all laid out at the Legislative Action Center and it’s simple:

“Hello, I’m a constituent. Please ask Representative  ________ to sign both the FY 2018 LSTA and IAL ‘Dear Appropriator’ letters circulating for signature before April 3.”

Please take five minutes to call, email, or tweet at your Members of Congress, and watch this space throughout the year for more on how you can help preserve IMLS and federal library funding. We need your help this year like never before.

Supporting Documents:

IAL Approps FY18 Letter

IAL FY18 Dear Colleague

Dear Appropriator letter for LSTA FY2018

Dear Colleague letter FY2018

Open Knowledge Foundation: Open Data Day 2017: Tracking money flows on development projects in the Accra Metropolis

Wed, 2017-03-22 14:00

This blog is part of the event report series on International Open Data Day 2017. On Saturday 4 March, groups from around the world organised over 300 events to celebrate, promote and spread the use of open data. 44 events received additional support through the Open Knowledge International mini-grants scheme, funded by SPARC, the Open Contracting Program of Hivos, Article 19, Hewlett Foundation and the UK Foreign & Commonwealth Office. This event was supported through the mini-grants scheme under the Open contracting and tracking public money flows theme.

Open data is becoming popular in policy discussions, media discourse and everyday conversations in Ghana, and this year TransGov Ghana had the opportunity, as one of two organisations in the country, to organise Open Data Day 2017. It was under the theme: “Following the money: Tracking money flows on development projects in the Accra Metropolis”. The objective for this year’s event was to clean up and standardise datasets on development projects for deployment to the Ghana Open Data Portal and to give the participants insights into how the government spends public funds on development projects in their local communities.

Who was at the event?

Open Data Day provided an opportunity for various stakeholders within the open data ecosystem to meet for the first time and to network. In attendance were Mohammed Awal, Research Officer at the Center for Democratic Development Ghana (CDD-Ghana); Jerry Akanyi-King, CEO of TransGov Ghana, a startup that enhances civic engagement with government by enabling citizens to monitor and track development projects in their local communities; and Paul Damalie, CEO of Inclusive, a local startup that provides a single identity verification API that connects unbanked Africans to the global economy.

Participants at the Open Data Day event

Also in attendance were Adela Gershon, Project Manager at Oil Journey, a Civil Society Organization (CSO) that uses Open Data to give Ghanaian citizens insights into how oil revenues are spent; and Joel Funu, Chief Product Officer at SynCommerce, an avid open data proponent; and many others including journalists, students from the Computer Science department of the University of Ghana, open data enthusiasts and members of the local developer community.

The state of Open Data in Ghana

The event kicked off at 10:00 am with a discussion on open data in Ghana, its application, the challenges facing the Ghana Open Data Initiative (GODI) and Civil Society Organisations (CSOs) involved in open data work, and the future of the open data ecosystem in Ghana. The discussion also sought to gather consensus on what the key stakeholders in the sector can do to facilitate the passage of Ghana’s Right to Information Bill currently before Parliament. It was an open discussion which involved all participants.  

The discussions were moderated by Paul Damalie, and panellists included Jerry Akanyi-King, CEO of TransGov Ghana, Adela Gershon, Project Manager at Oil Journey and Mohammed Awal, Research Officer at CDD-Ghana.

Guest panellists included (from right) Adela Gershon (OilJourney) and Jerry Akanyi-King (TransGov)

Mohammed Awal spoke about an open data project initiated by CDD-Ghana known as “I’m Aware”, which collects, analyses, archives, and disseminates user-friendly socio-economic data on the state of public goods and public service delivery in 216 districts across all ten regions of Ghana. He agreed with the other panellists on the difficulty of getting data from government and suggested a few strategies that can help other organisations that rely on government for data.

Mr Adela Gershon also spoke about his experiences from working at Oil Journey. He observed that the Ghana Open Data Initiative (GODI) has not been helpful to their cause thus far, and he called for closer collaboration between GODI and CSOs. Jerry Akanyi-King also chimed in with experiences from TransGov. Panellists and participants stressed that CSOs have tended to work in parallel in the past, often solving similar problems whilst totally oblivious of each other, and that they should begin to collaborate more to share knowledge and to enhance open data work in Ghana.

Diving into the Datasets

After the discussions, the attendees formed two teams and were given a dataset of development projects. The teams were assigned task leaders and were introduced to the process of converting the data from PDF format, cleaning it up, and finally visualising it. Both teams produced visualisations that provided insights into how public funds were spent on development projects.

The first group were able to come up with visualisations on how much was spent on development projects in total from 2012 to 2015. They also visualised how these development projects are distributed across the Accra Metropolis.

Visualisation showing how development projects are distributed across 10 major suburbs in Accra Metropolis

The second group also visualised the number of projects across the city and which contractors got the most contracts within specific time periods. From their analysis, we also found out which development partner organisations provided the most support to the government within the period.

Visualisation displaying the distribution of contracts among top 18 contractors in the Accra Metropolis

Lessons learned 

The developer community in Ghana has not fully embraced open data because there is a yawning knowledge and skills gap. More work has to be done to educate both government and the general public about open data and its social, political and economic benefits. Furthermore, capacity building for CSOs engaged in open data work will go a long way to enhance their work and strengthen the open data ecosystem in Ghana.

What’s Next?

We’ve submitted the cleaned up dataset to the technical team at the Ghana Open Data Initiative and await deployment to the Ghana Open Data portal. We’re also providing support to the student reps from the University of Ghana Computer Science department to form Open Data Clubs at the University to help them build capacity and hopefully grow the ecosystem across other tertiary institutions in Ghana.

A big thanks to Open Knowledge International and the UK Foreign & Commonwealth Office for providing the mini-grant to make this possible.