
DuraSpace News: June DSpace User Group Meeting in Geneva

planet code4lib - Mon, 2017-03-27 00:00

From Atmire  

Heverlee, Belgium: DuraSpace and Atmire invite you to take part in a free DSpace User Group Meeting on June 20th, prior to the OAI10 conference in Geneva.

DuraSpace News: The Spanish Institute of Oceanography Repository Launches New Service

planet code4lib - Mon, 2017-03-27 00:00

From Emilio Lorenzo, Arvo Consulting The Spanish Institute of Oceanography is a public body devoted to multidisciplinary research into oceanographic and marine sciences in order to attain scientific knowledge of the oceans and sustainability of fishery resources and the marine environment.

District Dispatch: We’re only as good as our members are engaged

planet code4lib - Fri, 2017-03-24 20:29

This week at the Association of College and Research Libraries (ACRL) 2017 Conference, academic librarians and information professionals convened around the emerging issues challenging higher education due to federal funding cuts and new regulations.

On Thursday morning, ALA and ACRL jointly hosted a Postcards to Lawmakers town hall, during which member leaders Emily Drabinski, coordinator of library instruction at Long Island University in Brooklyn; Clay Williams, deputy chief librarian at Hunter College in Manhattan; and our very own Lisa Lindle, grassroots communication specialist at the ALA Washington Office, offered advice and encouragement to those seeking to advocate effectively for libraries in the face of drastic cuts. The panel also explained how to sign up for and use ALA’s Legislative Action Center and collected questions from the audience. American Libraries magazine covered their talk here.

At Friday morning’s Academic Libraries and New Federal Regulations session, Corey Williams, a federal lobbyist at the National Education Association (and formerly an ALA lobbyist in the Washington Office), again urged members to step up to the plate. Corey made two illustrative points: ALA has 3 lobbyists for our nearly 60,000 members, and one person is only one voice. In other words: lobbyists are only as good as our members are engaged.

Advocacy is akin to a muscle; you can flex it once by sending a tweet or an email. But we are one mile into a marathon, and advocacy is a muscle that needs to be exercised constantly. Both town halls offered some immediate steps you can take in this next leg of the race.

Do Your Reps
• Did you write a postcard? Great. Now tweet a picture of that postcard to your representatives with the line: “No cuts to funding for museums and libraries. #SaveIMLS”

Short Sprints
• Sign up for ALA’s Legislative Action Center.
• Then, prepare a talking point about why IMLS is important to your community and share it with a friend or patron so you can customize your emails to Congress.

• Invite your representatives to visit your library (ProTip: Work with your organization’s government relations office to coordinate.)
• Attend a constituent coffee when your reps are home during the weeks of April 10 and April 17 (Note: this period that they’re home also happens to be National Library Week. If that time is not possible, other times are good, too; any time the Member is at home works.)
• Think about whom you can partner or create a coalition with in your community.
• Pair your data (e.g., how much LSTA funding you receive) with an anecdote (e.g., how that money made a transformative difference for your patrons).

In response to other questions that came up, here are two other helpful references:
• Here’s what the National Archives and Records Administration says about irradiated mail
• Here’s where you can look up your representative’s social media handle

The post We’re only as good as our members are engaged appeared first on District Dispatch.

District Dispatch: Look Back, Move Forward: Library Services and Technology Act

planet code4lib - Fri, 2017-03-24 18:01

Thank you to everyone for sharing your #SaveIMLS stories. Please keep them coming – more than 7,700 tweets (nearly double our count since last Thursday). As we prepare for the appropriations process, here’s a look back at how ALA Council resolved to support the Library Services and Technology Act in June 1995.

As we move forward into the “Dear Appropriator” letters, be sure to sign up for our Legislative Action Center today.

The post Look Back, Move Forward: Library Services and Technology Act appeared first on District Dispatch.

FOSS4Lib Recent Releases: Evergreen - 2.12.0

planet code4lib - Fri, 2017-03-24 13:36
Package: Evergreen
Release Date: Wednesday, March 22, 2017

Last updated March 24, 2017. Created by gmcharlt on March 24, 2017.

With this release, we strongly encourage the community to start using the new web client on a trial basis in production. All current Evergreen functionality is available in the web client with the exception of serials and offline circulation. The web client is scheduled to be available for full production use with the September 3.0 release.
Other notable new features and enhancements for 2.12 include:

Ed Summers: Teaching Networks

planet code4lib - Fri, 2017-03-24 04:00

Yesterday I had the good fortune to speak with Miriam Posner, Scott Weingart and Thomas Padilla about their experiences teaching digital humanities students about network visualization, analysis and representation. This started as an off the cuff tweet about teaching Gephi, which led to an appointment to chat, and then to a really wonderful broader discussion about approaches to teaching networks:

(???) (???) have either of you taught a DH class about how to use Gephi? Or do you know of someone else who has?

— Ed Summers ((???)) March 10, 2017

Scott suggested that other folks who teach this stuff in a digital humanities context might be interested as well so we decided to record it, and share it online (see below).

The conversation includes some discussion of tools (such as Gephi, Cytoscape, NodeXL, Google Fusion Tables, DataBasic, R) but also some really neat exercises for learning about networks with yarn, balls, short stories and more.

A particularly fun part of the discussion focuses on approaches to teaching graph measurement and analytics, as well as humanistic approaches to graph visualization that emphasize discovery and generative creativity.

During the email exchange that led up to our chat Miriam, Scott and Thomas shared some of their materials which you may find useful in your own teaching/learning:

I’m going to be doing some hands-on exercises about social media, networks and big data in Matt Kirschenbaum‘s Digital Studies class this Spring – and I was really grateful for Miriam, Scott and Thomas’ willingness to share their experiences with me.

Anyhow, here’s the video! If you want to get to the good stuff, skip to 8:40 where I stop selfishly talking about the classes we’re teaching at MITH.

PS. this post was brought to you by the letter B since (as you will see) Thomas thinks that blogs are sooooo late 2000s :-) I suspect he is right, but I’m clearly still tightly clutching to my vast media empire.

Eric Hellman: Reader Privacy for Research Journals is Getting Worse

planet code4lib - Thu, 2017-03-23 17:22
Ever hear of Grapeshot, Eloqua, Moat, Hubspot, Krux, or Sizmek? Probably not. Maybe you've heard of Doubleclick, AppNexus, Adsense or Addthis? Certainly you've heard of Google, which owns Doubleclick and Adsense. If you read scientific journal articles on publisher websites, these companies that you've never heard of will track and log your reading habits and try to figure out how to get you to click on ads, not just at the publisher websites but also at websites like the Huffington Post.

Two years ago I surveyed the websites of 20 of the top research journals and found that 16 of the top 20 journals placed trackers from ad networks on their web sites. Only the journals from the American Physical Society (2 of the 20) supported secure (HTTPS) connections, and even now APS does not default to being secure.
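A survey like this can be approximated by scanning each page's resource URLs against a list of known ad-network domains (a rough sketch; the domain list is illustrative, and a real audit would use a maintained blocklist such as EasyPrivacy plus a headless browser to catch dynamically injected trackers):

```python
import re

# Illustrative subset of third-party tracking domains; a real survey
# would use a maintained blocklist.
TRACKER_DOMAINS = {"doubleclick.net", "addthis.com", "krux.net", "moatads.com"}

def count_trackers(html: str) -> int:
    # Pull the host out of every src/href attribute and count how many
    # distinct known tracker domains appear on the page.
    hosts = re.findall(r'(?:src|href)=["\']https?://([^/"\']+)', html)
    found = {d for h in hosts for d in TRACKER_DOMAINS if h.endswith(d)}
    return len(found)

page = '''<script src="https://stats.g.doubleclick.net/dc.js"></script>
<script src="https://s7.addthis.com/js/300/addthis_widget.js"></script>'''
print(count_trackers(page))   # 2
```

Static scanning like this undercounts, since many trackers are injected by other trackers at load time, which is one reason the counts in surveys of this kind only grow on revisits.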

I'm working on an article about advertising in online library content, so I decided to revisit the 20 journals to see if there had been any improvement. Over half the traffic on the internet now uses secure connections, so I expected to see some movement. One of the 20 journals, Quarterly Journal of Economics, now defaults to a secure connection, significantly improving privacy for its readers. Let's have a big round of applause for Oxford University Press! Yay.

So that's the good news. The bad news is that reader privacy at most of the journals I looked at got worse. Science, which could be loaded securely 2 years ago, has reverted to insecure connections. The two Annual Reviews journals I looked at, which were among the few that did not expose users to advertising network tracking, now have trackers for AddThis and Doubleclick. The New England Journal of Medicine, which deployed the most intense reader tracking of the 20, is now even more intense, with 19 trackers on a web page that had "only" 14 trackers two years ago. A page from Elsevier's Cell went from 9 to 16 trackers.

Despite the backwardness of most journal websites, there are a few signs of hope. Some of the big journal platforms have begun to implement HTTPS. Springer Link defaults to HTTPS, and Elsevier's Science Direct is delivering some of its content with secure connections. Both of them place trackers for advertising networks, so if you want to read a journal article securely and privately, your best bet is still to use Tor.

David Rosenthal: Threats to stored data

planet code4lib - Thu, 2017-03-23 15:32
Recently there's been a lively series of exchanges on the pasig-discuss mail list, sparked by an inquiry from Jeanne Kramer-Smyth of the World Bank as to any additional risks posed by media such as disks that did encryption or compression. It morphed into discussion of the "how many copies" question and related issues. Below the fold, my reflections on the discussion.

The initial question was pretty clearly based on a misunderstanding of the way self-encrypting disk drives (SED) and hardware compression in tape drives work. Quoting the Wikipedia article Hardware-based full disk encryption:
The drive except for bootup authentication operates just like any drive with no degradation in performance. The encrypted data is never visible outside the drive, and the same is true for the compressed data on tape. So as far as systems using them are concerned, whether the drive encrypts or not is irrelevant. Unlike disk, tape capacities are quoted assuming compression is enabled. If your data is already compressed, you likely get no benefit from the drive's compression.

SEDs have one additional failure mode over regular drives: they support a crypto erase command which renders the data inaccessible. The effect as far as the data is concerned is the same as a major head crash. Archival systems that fail if a head crashes are useless, so they must be designed to survive total loss of the data on a drive. There is thus no reason not to use self-encrypting drives, and many reasons why one might want to.

But note that their use does not mean there is no reason for the system to encrypt the data sent to the drive. Depending on your threat model, encrypting data at rest may be a good idea. Depending on the media to do it for you, and thus not knowing whether or how it is being done, may not be an adequate threat mitigation.

Then the discussion broadened but, as usual, it was confusing because it was about protecting data from loss, but not based on explicit statements about what the threats to the data were, other than bit-rot.

There was some discussion of the "how many copies do we need to be safe?" question. Several people pointed to research that constructed models to answer this question. I responded:
Models claiming to estimate loss probability from replication factor, whether true replication or erasure coding, are wildly optimistic and should be treated with great suspicion. There are three reasons:
  • The models are built on models of underlying failures. The data on which these failure models are typically based are (a) based on manufacturers' reliability claims, and (b) ignore failures upstream of the media. Much research shows that actual failures in the field are (a) vastly more likely than manufacturers' claims, and (b) more likely to be caused by system components other than the media.
  • The models almost always assume that the failures are un-correlated, because modeling correlated failures is much more difficult, and requires much more data than un-correlated failures. In practice it has been known for decades that failures in storage systems are significantly correlated. Correlations among failures greatly raise the probability of data loss.
  • The models ignore almost all the important threats, since they are hard to quantify and highly correlated. Examples include operator error, internal or external attack, and natural disaster.
For replicated systems, three replicas is the absolute minimum IF your threat model excludes all external or internal attacks. Otherwise four (see Byzantine Fault Tolerance).

For (k of n) erasure coded systems the absolute minimum is three sites arranged so that k shards can be obtained from any two sites. This is because shards in a single site are subject to correlated failures (e.g. earthquake).

This is a question I've blogged about in 2016 and 2011 and 2010, when I concluded:
  • The number of copies needed cannot be discussed except in the context of a specific threat model.
  • The important threats are not amenable to quantitative modeling.
  • Defense against the important threats requires many more copies than against the simple threats, to allow for the "anonymity of crowds".
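The effect of the independence assumption is easy to illustrate with a toy model (a sketch only; the per-replica loss probability p and the common-mode probability c are invented for illustration, not taken from any study):

```python
# Toy model of annual data-loss probability for n full replicas.
# p: probability an individual replica is lost in a year (assumed).
# c: probability of a common-mode event destroying all replicas at
#    once (correlated failure: operator error, attack, disaster).

def loss_independent(p: float, n: int) -> float:
    # The optimistic model: all n replicas must fail independently.
    return p ** n

def loss_correlated(p: float, n: int, c: float) -> float:
    # Either the common-mode event strikes, or all replicas happen
    # to fail independently anyway.
    return c + (1 - c) * p ** n

p, c = 0.01, 1e-4
for n in (3, 4):
    print(n, loss_independent(p, n), loss_correlated(p, n, c))
```

With p = 0.01, three independent replicas give a headline loss probability of one in a million, but a common-mode probability of just 1e-4 dominates the total, and going from three to four replicas barely moves it: once failures are correlated, the extra copies stop buying reliability, which is why the important threats demand many more copies than the simple ones.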
In the discussion Matthew Addis of Arkivum made some excellent points, and pointed to two interesting reports:
  • A report from the PrestoPrime project. He wrote:
    There’s some examples of the effects that bit-flips and other data corruptions have on compressed AV content in a report from the PrestoPRIME project. There’s some links in there to work by Heydegger and others, e.g. impact of bit errors on JPEG2000. The report mainly covers AV, but there are some references in there about other compressed file formats, e.g. work by CERN on problems opening zips after bit-errors. See page 57 onwards.
  • A report from the EU's DAVID project. He wrote:
    This was followed up by work in the DAVID project that did a more extensive survey of how AV content gets corrupted in practice within big AV archives. Note that bit-errors from storage, a.k.a bit rot was not a significant issue, well not compared with all the other problems!
Matthew wrote the 2010 PrestoPrime report, building on, among others, Heydegger's 2008 and 2009 work on the effects of flipping bits in compressed files (both links are paywalled, but the 2008 paper is available via the Wayback Machine). The 2013 DAVID report concluded:
It was acknowledged that some rare cases or corruptions might have been explained by the occurrence of bit rot, but the importance and the risk of this phenomenon was at the present time much lower than any other possible causes of content losses. On the other hand, they were clear that:
Human errors are a major cause of concern. It can be argued that most of the other categories may also be caused by human errors (e.g. poor code, incomplete checking...), but we will concentrate here on direct human errors. In any complex system, operators have to be in charge. They have to perform essential tasks, maintaining the system in operation, checking that resources are sufficient to face unexpected conditions, and recovering the problems that can arise. However vigilant an operator is, he will always make errors, usually without consequence, but sometimes for the worst. The list is virtually endless, but one can cite:
  • Removing more files than wanted
  • Removing files in the wrong folder
  • Pulling out from a RAID a working disk instead of the faulty one
  • Copying and editing a configuration file, not changing all the necessary parameters
  • Editing a configuration file into a bad one, having no backup
  • Corrupting a database
  • Dropping a data tape / a hard disk drive
  • Introducing an adjustment with unexpected consequences
  • Replacing a correct file or setup from a wrong backup.
Such errors have the potential for affecting durably the performances of a system, and are not always reversible. In addition, the risk of error is increased by the stress introduced by urgency, e.g. when trying to make some room in storage facilities approaching saturation, or introducing further errors when trying to recover using backup copies.

We agree, and have been saying so since at least 2005. And the evidence keeps rolling in. For example, on January 31st GitLab suffered a major data loss. Simon Sharwood at The Register wrote:
Source-code hub GitLab is in meltdown after experiencing data loss as a result of what it has suddenly discovered are ineffectual backups. ... Behind the scenes, a tired sysadmin, working late at night in the Netherlands, had accidentally deleted a directory on the wrong server during a frustrating database replication process: he wiped a folder containing 300GB of live production data that was due to be replicated.

Commendably, GitLab made a Google Doc public with a lot of detail about the problem and their efforts to mitigate it:
  1. LVM snapshots are by default only taken once every 24 hours. YP happened to run one manually about 6 hours prior to the outage
  2. Regular backups seem to also only be taken once per 24 hours, though YP has not yet been able to figure out where they are stored. According to JN these don’t appear to be working, producing files only a few bytes in size.
    1. SH: It looks like pg_dump may be failing because PostgreSQL 9.2 binaries are being run instead of 9.6 binaries. This happens because omnibus only uses Pg 9.6 if data/PG_VERSION is set to 9.6, but on workers this file does not exist. As a result it defaults to 9.2, failing silently. No SQL dumps were made as a result. Fog gem may have cleaned out older backups.
  3. Disk snapshots in Azure are enabled for the NFS server, but not for the DB servers.
  4. The synchronisation process removes webhooks once it has synchronised data to staging. Unless we can pull these from a regular backup from the past 24 hours they will be lost
  5. The replication procedure is super fragile, prone to error, relies on a handful of random shell scripts, and is badly documented
    1. SH: We learned later the staging DB refresh works by taking a snapshot of the gitlab_replicator directory, prunes the replication configuration, and starts up a separate PostgreSQL server.
  6. Our backups to S3 apparently don’t work either: the bucket is empty
  7. We don’t have solid alerting/paging for when backups fails, we are seeing this in the dev host too now.
So in other words, out of 5 backup/replication techniques deployed none are working reliably or set up in the first place. => we're now restoring a backup from 6 hours ago that worked

The operator error revealed the kind of confusion and gradual decay of infrastructure processes that is common when procedures are used only to recover from failures, not as a routine. Backups that are not routinely restored are unlikely to work when you need them. The take-away is that any time you reach for the backups, you're likely already in big enough trouble that your backups can't fix it. I was taught this lesson in the 70s. The early Unix dump command failed to check the return value from the write() call. If you forgot to write-enable the tape by inserting the write ring, the dump would appear to succeed, the tape would look like it was spinning, but no data would be written to the backup tape.
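GitLab's backups "producing files only a few bytes in size" is exactly the failure mode that routine test-restores catch. A minimal sketch of such a check (the copy call stands in for a real restore step, and the paths are illustrative):

```python
import hashlib
import os
import shutil
import tempfile

def sha256(path: str) -> str:
    # Stream the file so arbitrarily large backups can be checked.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_backup(source: str, backup: str) -> bool:
    # A backup you have never restored is just a hope: restore it
    # somewhere disposable and compare checksums against the source.
    with tempfile.TemporaryDirectory() as tmp:
        restored = os.path.join(tmp, "restored")
        shutil.copy(backup, restored)   # stand-in for the real restore
        return sha256(restored) == sha256(source)

# Tiny demonstration with throw-away files.
with tempfile.TemporaryDirectory() as d:
    src, bak = os.path.join(d, "data"), os.path.join(d, "data.bak")
    with open(src, "wb") as f:
        f.write(b"production data")
    shutil.copy(src, bak)
    print(verify_backup(src, bak))      # a good backup verifies
    with open(bak, "wb") as f:
        f.write(b"")                    # a few-bytes "backup", as at GitLab
    print(verify_backup(src, bak))      # and the check catches it
```

The essential property is that the check exercises the restore path itself, not just the existence of the backup file; a silently failing pg_dump produces a file that exists but cannot pass this test.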

Fault injection should be, but rarely is, practiced at all levels of the system. The results of not doing so are shown by UW Madison's work injecting faults into file systems and distributed storage. My blog posts on this topic include Injecting Faults in Distributed Storage, More bad news on storage reliability, and Forcing Frequent Failures.
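A toy version of storage-layer fault injection (a sketch, not any particular system's API): flip one bit in a stored copy and confirm that the integrity audit notices:

```python
import hashlib

def store(data: bytes):
    # Store data alongside its checksum (the "fixity" record).
    return data, hashlib.sha256(data).hexdigest()

def inject_bit_flip(data: bytes, bit: int) -> bytes:
    # Simulate bit rot: flip a single bit at the given offset.
    buf = bytearray(data)
    buf[bit // 8] ^= 1 << (bit % 8)
    return bytes(buf)

def audit(data: bytes, checksum: str) -> bool:
    # The integrity audit must detect the injected fault.
    return hashlib.sha256(data).hexdigest() == checksum

blob, fixity = store(b"archival object payload")
print(audit(blob, fixity))              # intact copy passes
damaged = inject_bit_flip(blob, 42)
print(audit(damaged, fixity))           # the single-bit flip is detected
```

Running such injections routinely, at every layer from media to application, is what separates systems that are believed to detect corruption from systems that are known to.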

Update: much as I love Kyoto, as a retiree I can't afford to attend iPRES2017. Apparently, there's a panel being proposed on the "bare minimum" for digital preservation. If I were on this panel I'd be saying something like the following.

We know the shape of the graph of loss probability against cost - it starts at one at zero cost and is an S-curve that gets to zero at infinite cost. Unfortunately, because the major threats to stored data are not amenable to quantitative modeling (see above), and technologies differ in their cost-effectiveness, we cannot actually plot the graph. So there are no hard-and-fast answers.

The real debate here is how to distinguish between "digital storage" and "digital preservation". We do have a hard-and-fast answer for this. There are three levels of certification; the Data Seal of Approval (DSA), NESTOR's DIN31644, and TRAC/ISO16363. If you can't even pass DSA then what you're doing can't be called digital preservation.

Especially in the current difficult funding situation, it is important NOT to give the impression that we can "preserve" digital information with ever-decreasing resources, because then ever-decreasing resources are what we will get: there will always be someone willing to claim that they can do the job cheaper, and their short-cuts won't be exposed until it's too late. That's why certification is important.

We need to be able to say "I'm sorry, but preserving this stuff costs this much. Less money, no preservation, just storage.".

Open Knowledge Foundation: The Global Open Data Index – an update and the road ahead

planet code4lib - Thu, 2017-03-23 14:00

The Global Open Data Index is a civil society collaborative effort to track the state of open government data around the world. The survey is designed to assess the openness of specific government datasets according to the Open Definition. Through this initiative, we want to provide a civil society audit of how governments actually publish data, with input and review from citizens and organisations. This post describes our future timeline for the project.


Here at Open Knowledge International, we see the Global Open Data Index (aka GODI) as a community effort. Without community contributions and feedback there is no index. This is why it is important for us to keep the community involved in the index as much as we can (see our active forum!). However, in the last couple of months, lots has been going on with GODI. In fact so much was happening that we neglected our duty to report back to our community. So based on your feedback, here is what is going on with GODI 2016:


New Project Management

Katelyn Rogers, who managed the project until January 2017, is now leading the School of Data program. I have stepped in to manage the Index until its launch this year. I am an old veteran of GODI, having been its research and community lead for 2014 and 2015, so this is a natural fit for me and the project. This comes alongside my work as the International Community Coordinator and the Capacity team lead, but fear not, GODI is a priority!


This change in project management allowed us to take some time and modify the way we manage the project internally. We moved all of our current and past tasks (code, content and research) to the public GitHub account, where you can see our progress on the project.


Project timeline

Now, after the handover is done, it is easier for us to decide on the road forward for GODI (in coordination with colleagues at the World Wide Web Foundation, which publishes the Open Data Barometer). We are happy to share with you the future timeline and approach of the Index:

  • Finalising review: In the last 6 weeks, we have been reviewing the different index categories for 94 places. Like last year, we took the thematic reviewer approach, in which each reviewer checked all the countries under one category. We finished the review by March 20th, and we are now running quality assurance on the reviewed submissions, mainly looking for false positives: datasets that have been wrongly assessed as complying with the Open Definition.


  • Building the GODI site: This year we paid a lot of attention to the development of our methodology and changed the survey site to reflect it and allow easy customization (see Brook’s blog). We are now finalising the results site so it will offer an even better user experience than in past years.
  • Launch! The critical piece of information that many of you wanted! We will launch the Index on May 2nd, 2017! And what a launch it is going to be!
    Last year we gave a three-week period for government and civil society to review and suggest corrections to our assessment of the Index on the survey app, before publishing the permanent Index results. This was not obvious to many, and we got many requests for corrections or clarifications after publishing the final GODI.
    This year, we will publish the Index results, and data publishers and civil society will have the opportunity to contest the results publicly through our forum for 30 days. We will follow the discussions to decide whether we should change some results. The GODI team believes that if we aspire to be a tool not only for measuring open data publication but also for learning about it, we need to allow civil society and government to engage with the results in the open. We already see great engagement from some governments in the review process of GODI (see Mexico and Australia), and we would like to take this one step further, making GODI a tool that can help improve open data publication around the world.
  • Report: After fixing the Index results, we will publish a report on our learnings from GODI 2016. This is the first time we will write a report on the Global Open Data Index findings, and we hope it will help us not only create a better GODI in the future but also promote the publication of better datasets.


Have any questions? Want to know more about the upcoming GODI? Have ideas for improvements? Start a topic in the forum:


Open Knowledge Foundation: Open data day 2017 in Uganda: Open contracting, a key to inclusive development

planet code4lib - Thu, 2017-03-23 13:56

This blog is part of the event report series on International Open Data Day 2017. On Saturday 4 March, groups from around the world organised over 300 events to celebrate, promote and spread the use of open data. 44 events received additional support through the Open Knowledge International mini-grants scheme, funded by SPARC, the Open Contracting Program of Hivos, Article 19, Hewlett Foundation and the UK Foreign & Commonwealth Office. This event was supported through the mini-grants scheme under the Open contracting and tracking public money flows theme.

On Friday 3rd March 2017, the Anti-Corruption Coalition Uganda (ACCU) commemorated the International Open Data Day 2017 with a meetup of 37 people from Civil Society Organizations (CSOs), development partners, the private sector and the general public. The goal of this meetup was to inform Ugandan citizens, media and government agencies on the importance of open data in improving public service delivery.


The process started with an overview of open data since the concept seemed to be new to most participants. Ms. Joy Namunoga, Advocacy Officer at ACCU, highlighted the benefits of open data, including value for money for citizens and taxpayers, knowing governments transactions, holding leaders accountable, constructive criticism to influence policy, boosting transparency, reducing corruption and increasing social accountability.

With such a background, participants observed that only 19% of people in Uganda have access to the internet; hence the need to embrace the media as a third party to interpret data and take the information closer to citizens. Participants noted that, while Uganda has an enabling policy framework for information sharing, the Access to Information Act and its regulations require information to be paid for ($6 per request), yet the majority of Ugandans live on less than $2 a day. This financial requirement denies a percentage of Ugandans their right to know. It was also noted that CSOs and government agencies equally do not make all their information available on their websites, which further underscores this fact.

Issues discussed

Open contracting

Mr. Agaba Marlon, Communications Manager at ACCU, took participants through the process of open contracting, as highlighted below:

Figure 1: Open contracting process

He showcased ACCU’s Open Contracting platform commonly known as USER (Uganda System for Electronic open data Records), implemented in partnership with Kampala Capital City Authority (KCCA), a government agency, and funded by the United Nations Development Programme. This platform created a lively conversation amongst the participants, and the following issues were generated to strengthen open contracting in Uganda:

  • Popularizing open data and contracting in Uganda by all stakeholders.
  • Mapping people and agencies in the open contracting space in Uganda to draw lines on how to complement each other.
  • Lobbying and convincing government institutions to embrace the open contracting data standards.
  • Stakeholders like civil society should themselves be open before demanding that government open up.
  • Simplification of Uganda’s procurement laws for easier understanding by citizens.
  • Faming and shaming the best and worst contractors, as well as advocating for penalties for those who flout the rules.
  • Initiating and strengthening information portals in both offline and online media.
Bringing new energy and people to the open data movement

Mr. Micheal Katagaya, an open data activist, chaired this session. Some suggestions were made that could bring new energy to the open data movement, such as renegotiating open data membership with the government, co-opting celebrities (especially musicians) to advocate for open data, simplifying data and packaging it in user-friendly formats, and linking data to problem-solving principles. Also, thematic days like International Women’s Day, youth day or AIDS day could be used to spread a message on open data, and local languages could be used to localise the space for Ugandans to embrace open data. Finally, it was seen as important to understand audiences and package messages accordingly, and to identify open data champions and ambassadors.

Sharing open data with citizens who lack internet access

This session was chaired by Ms. Pheona Wamayi, an independent media personality. Participants agreed that civil society and government agencies should strengthen community interfaces between government and citizens, because these enable citizens to know of government operations. ACCU was encouraged to use its active membership across Uganda to penetrate the information flow and disseminate it to citizens. Other suggestions included:

  • Weekly radio programs on open data and open contracting should be held. Programs should be well branded to suit the intended audiences.
  • Simplified advocacy materials (e.g. leaflets and posters) should be produced to inform citizens about open data. Community notice boards could be used to disseminate information on open data.
  • Civil society and government should liaise with telecom companies to provide citizens with the internet.
  • Edutainment through music and forum theatre should be targeted to reach citizens on open data.

Way forward

Ms. Ephrance Nakiyingi, Environmental Governance officer-ACCU took participants through the action planning process. The following points were suggested as key steps to pursue as stakeholders:

  • Consider offline strategies like SMS to share data with citizens
  • Design  massive open data campaigns to bring new energy to the movement
  • Develop a multi-media strategy based on consumer behaviour
  • Create synergies between different open data initiatives
  • Embrace open data communication
  • Map out other actors in the open data fraternity
  • In-house efforts to share information/stakeholder openness

pinboard: Twitter

planet code4lib - Thu, 2017-03-23 12:45
Have not read the full report but based on the abstract seems useful to those involved in the #code4lib incorporati…

Terry Reese: MarcEdit and Alma Integration: Working with holdings data

planet code4lib - Thu, 2017-03-23 11:52

Ok Alma folks,

 I’ve been thinking about a way to integrate holdings editing into the Alma integration work with MarcEdit.  Alma handles holdings via MFHDs, but honestly, the process for getting to holdings data seems a little quirky to me.  Let me explain.  When working with bibliographic data, the workflow to extract records for edit and then update, looks like the following:


  1. Records are queried via Z39.50 or SRU
  2. Data can be extracted directly to MarcEdit for editing
  3. Data is saved, and then turned into MARCXML
  4. If the record has an ID, I have to query a specific API to retrieve specific data that will be part of the bib object
  5. Data is assembled in MARCXML, and then updated or created.


Essentially, an update or create takes 2 API calls.

For holdings, it’s a much different animal.


  1. Search via Z39.50/SRU
  2. Query the Bib API to retrieve the holdings link
  3. Query the holdings link api to retrieve a list of holding ids
  4. Query each holdings record API individually to retrieve a holdings object
  5. Convert the holdings object to MARCXML and then into a form editable in the MarcEditor
    1. As part of this process, I have to embed the bib_id and holding_id into the record (I’m using a 999 field) so that I can do the update


For Update/Create

  1. Convert the data to MARCXML
  2. Extract the ids and reassemble the records
  3. Post via the update or create API


Extracting the data for edit is a real pain.  I’m not sure why so many calls are necessary to pull the data.
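To make the contrast concrete, the holdings sequence above can be sketched as the series of REST URLs involved. The endpoint paths follow the pattern of the Ex Libris Alma API documentation, but the host and IDs below are made-up placeholders, and no network call is made:

```python
# Sketch of the holdings retrieval sequence as Alma-style REST URLs.
# Paths follow the documented /almaws/v1/bibs/... pattern; the host and
# the IDs below are placeholders, not real records.
BASE = "https://api-na.hosted.exlibrisgroup.com/almaws/v1"

def holdings_urls(mms_id, holding_ids):
    """Return the bib URL, the holdings-list URL, and one URL per holding."""
    bib = f"{BASE}/bibs/{mms_id}"              # query the bib object
    holdings_list = f"{bib}/holdings"          # retrieve the list of holding ids
    # each holdings record then has to be fetched individually
    records = [f"{holdings_list}/{hid}" for hid in holding_ids]
    return bib, holdings_list, records

bib, listing, records = holdings_urls("9912345", ["2233", "2244"])
print(len(records) + 2)  # 4
```

Counting the initial Z39.50/SRU search as well, that is five round trips before a single holdings record reaches the editor, which is the pain point described above.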

 Anyway – Let me give you an idea of the process I’m setting up.

First – you query the data:

Couple things to note – to pull holdings, you have to click on the download all holdings link, or right click on the item you want to download.  Or, select the items you want to download, and then select CTRL+H.

When you select the option, the program will prompt you to ask if you want it to create a new holdings record if one doesn’t exist. 


The program will then either download all the associated holdings records or create a new one.

Couple things I want you to notice about these records.  There is a 999 field added, and you’ll notice that I’ve created this in MarcEdit.  Here’s the problem…I need to retain the BIB number to attach the holdings record to (it’s not in the holdings object), and I need the holdings record number (again, not in the holdings object).  This is a required field in MarcEdit’s process.  I can tell if a holdings item is new or updated by the presence or lack of the $d. 


This is the process that I’ve come up with, and it seems to work. I’ve got a lot of debugging code to remove because I was having some trouble with the Alma API responses and needed to see what was happening underneath. If you are an Alma user, I’d be curious whether this process looks like it will work for you. As I say, I have some cleanup left to do before anyone can use this, but I think that I’m getting close.



Open Knowledge Foundation: Code for Ghana celebrates Open Data Day tracking public money flows

planet code4lib - Thu, 2017-03-23 11:14

This blog is part of the event report series on International Open Data Day 2017. On Saturday 4 March, groups from around the world organised over 300 events to celebrate, promote and spread the use of open data. 44 events received additional support through the Open Knowledge International mini-grants scheme, funded by SPARC, the Open Contracting Program of Hivos, Article 19, Hewlett Foundation and the UK Foreign & Commonwealth Office. This event was supported through the mini-grants scheme under the Open contracting and tracking public money flows theme.

This year, Code for Ghana organised their Open Data Day event at Mobile Web Ghana. The theme for the event was “Open Contracting and tracking public money flows”. Open contracting involves analysing government contract data to get a better understanding of how governments spend public funds. We had a lot of open contracting resources on hand, which helped the entire team to understand the concept and its importance in increasing transparency and accountability in the governance of a country.

Florence Toffa, project coordinator of Code for Ghana, gave an introductory presentation on open contracting. For about 98% of the attendees, open contracting was a new concept, and this was the first time they tried their hands on datasets related to it. Participants were introduced to the ‘what’ of open contracting, its benefits, and why it should be embraced by everyone if we want to get rid of corruption in our society. Moreover, about 15 out of the 20 attendees were new to data scraping, data analysis and data visualisation.

Introduction to D3.JS

The participants were taken through a training session on D3.js by David Lartey (software developer). D3.js is a JavaScript library for manipulating documents based on data. They were taught the basics of the library and how to make some interesting visualisations.

Data Scraping

Shadrack Boadu (software developer and data enthusiast) also taught data scraping. He introduced the participants to two ways of scraping data, using Google Sheets and Tabula, and talked about the importance of cleaning the data and converting it into a usable format to facilitate accurate data analysis and representations.
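As a rough illustration of what such scraping does under the hood, here is a minimal table scraper written with nothing but the Python standard library. It is not one of the tools used at the workshop, and the HTML snippet is invented for the example:

```python
# Minimal table scraper using only the standard library, to show the idea
# behind tools like Google Sheets' import functions or Tabula.
from html.parser import HTMLParser

class TableScraper(HTMLParser):
    def __init__(self):
        super().__init__()
        self.rows, self._row, self._in_cell = [], [], False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "tr" and self._row:
            self.rows.append(self._row)
        elif tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell and data.strip():
            self._row.append(data.strip())

html = """<table>
<tr><th>Project</th><th>Cost</th></tr>
<tr><td>Clinic renovation</td><td>120000</td></tr>
</table>"""

scraper = TableScraper()
scraper.feed(html)
print(scraper.rows)  # [['Project', 'Cost'], ['Clinic renovation', '120000']]
```

The cleaning step the trainer stressed would happen after this point: normalising numbers, fixing encodings, and dropping empty rows before any analysis.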

Before breaking out into various groups, Code for Ghana provided datasets on Government budget (2015 – 2016), Developmental projects procurement and Ghana health service data. The next task was for the participants to combine their skills to come up with relevant insights and visualisations.

The Open Data Day Projects 

The first team (Washington) presented a visualisation (pie chart) on the procurement of the Ghana health Service for the year 2016. Their visualisation gave insights on the volumes of procurement of the Ghana health service. See visualisation:

The second team (Terrific) designed a series of visualisations. These visualisations included the state of developmental projects in Ghana and sources of developmental projects in Ghana. See images below:


Team Meck, the third team, developed a database [web platform] for all the government projects from 2002 to 2016. From the database, one could easily key in a few keywords and bring up a particular result. Unfortunately, the team was not able to complete the web platform on the day.

The fourth team, team Rex, after cleaning their data came up with a visualisation representing an overview of developmental projects. Their project focused on government project success, sources of government funding and project allocations done by consultants.

The final team, team Enock developed a web app that visualised government contracts. They focused on analysing procurement contracts from the Ghana health service.

After the presentations, the judges for the event, Mr Nehemiah Attigah (co-founder of Odekro) and Mr Wisdom Donkor from the National Information Technology Agency (NITA), gave their verdicts. The judges spoke about the importance of open data and the role it plays in promoting transparency and accountability in Ghanaian society. They also emphasised the need for the participants to always present data in a way that paints an accurate picture, and to visualise information so that it can be easily digested by society. The best three projects were awarded prizes.


Our takeaway from the event: one day is usually too short to develop a sustainable project, so some of the teams are still working on their projects. For some of the youths it was an eye-opener: they never knew the importance of data and how it shapes the future of development in the country. To these youths the event was a success, because they gained valuable skills that they can build on.

Alf Eaton, Alf: Symfony Forms

planet code4lib - Thu, 2017-03-23 10:28

At the end of several years working with Symfony, I’m taking a moment to appreciate its strongest points. In particular, allowing users to apply mutations to objects via HTML forms.

The output of a Symfony endpoint (controller action) is usually either a data view (displaying data as HTML, rendered with Twig), or a form view (displaying a form as HTML and receiving the submitted form).

Symfony isn’t particularly RESTful, though you can use collection + resource-style URLs if you like:

  • /articles/ - GET the collection of articles
  • /articles/_create - GET/POST a form to create a new article
  • /articles/{id}/ - GET the article
  • /articles/{id}/_edit - GET/POST a form to edit the article
  • /articles/{id}/_delete - GET/POST a form to delete the article

The entity, controller, form and voter for creating and editing an article look something like this:

// ArticleBundle/Entity/Article.php
class Article
{
    /**
     * @var int
     *
     * @ORM\Id
     * @ORM\GeneratedValue
     * @ORM\Column(type="integer")
     */
    private $id;

    /**
     * @var string
     *
     * @ORM\Column(type="string")
     * @Assert\NotBlank
     */
    private $title;

    /**
     * @var string
     *
     * @ORM\Column(type="text")
     * @Assert\NotBlank
     * @Assert\Length(min=100)
     */
    private $description;

    /**
     * @return int
     */
    public function getId()
    {
        return $this->id;
    }

    /**
     * @return string
     */
    public function getTitle()
    {
        return $this->title;
    }

    /**
     * @param string $title
     */
    public function setTitle($title)
    {
        $this->title = $title;
    }

    /**
     * @return string
     */
    public function getDescription()
    {
        return $this->description;
    }

    /**
     * @param string $description
     */
    public function setDescription($description)
    {
        $this->description = $description;
    }
}

// ArticleBundle/Controller/ArticleController.php
class ArticleController extends Controller
{
    /**
     * @Route("/articles/_create", name="create_article")
     * @Method({"GET", "POST"})
     *
     * @param Request $request
     *
     * @return Response
     */
    public function createArticleAction(Request $request)
    {
        $article = new Article();

        $this->denyAccessUnlessGranted(ArticleVoter::CREATE, $article);

        $article->setOwner($this->getUser());

        $form = $this->createForm(ArticleType::class, $article);
        $form->handleRequest($request);

        if ($form->isValid()) {
            $entityManager = $this->getDoctrine()->getManager();
            $entityManager->persist($article);
            $entityManager->flush();

            $this->addFlash('success', 'Article created');

            return $this->redirectToRoute('articles');
        }

        return $this->render('ArticleBundle/Article/create.html.twig', [
            'form' => $form->createView()
        ]);
    }

    /**
     * @Route("/articles/{id}/_edit", name="edit_article")
     * @Method({"GET", "POST"})
     *
     * @param Request $request
     * @param Article $article
     *
     * @return Response
     */
    public function editArticleAction(Request $request, Article $article)
    {
        $this->denyAccessUnlessGranted(ArticleVoter::EDIT, $article);

        $form = $this->createForm(ArticleType::class, $article);
        $form->handleRequest($request);

        if ($form->isValid()) {
            $this->getDoctrine()->getManager()->flush();

            $this->addFlash('success', 'Article updated');

            return $this->redirectToRoute('articles', [
                'id' => $article->getId()
            ]);
        }

        return $this->render('ArticleBundle/Article/edit.html.twig', [
            'form' => $form->createView()
        ]);
    }
}

// ArticleBundle/Form/ArticleType.php
class ArticleType extends AbstractType
{
    /**
     * {@inheritdoc}
     */
    public function buildForm(FormBuilderInterface $builder, array $options)
    {
        $builder->add('title');

        $builder->add('description', null, [
            'attr' => ['rows' => 10]
        ]);

        $builder->add('save', SubmitType::class, [
            'attr' => ['class' => 'btn btn-primary']
        ]);
    }

    /**
     * {@inheritdoc}
     */
    public function configureOptions(OptionsResolver $resolver)
    {
        $resolver->setDefaults([
            'data_class' => Article::class,
        ]);
    }
}

// ArticleBundle/Security/ArticleVoter.php
class ArticleVoter extends Voter
{
    const CREATE = 'CREATE';
    const EDIT = 'EDIT';

    public function vote($attribute, $article, TokenInterface $token)
    {
        $user = $token->getUser();

        if (!$user instanceof User) {
            return false;
        }

        switch ($attribute) {
            case self::CREATE:
                if ($this->decisionManager->decide($token, array('ROLE_AUTHOR'))) {
                    return true;
                }

                return false;

            case self::EDIT:
                if ($user === $article->getOwner()) {
                    return true;
                }

                return false;
        }
    }
}

// ArticleBundle/Resources/views/Article/create.html.twig
{{ form(form) }}

// ArticleBundle/Resources/views/Article/edit.html.twig
{{ form(form) }}

The combination of Symfony’s Form, Voter and ParamConverter allows you to define who (Voter) can update which properties (Form) of a resource, and when.

The Doctrine annotations allow you to define validations for each property, which are used in both client-side and server-side form validation.

LibreCat/Catmandu blog: Metadata Analysis at the Command-Line

planet code4lib - Thu, 2017-03-23 09:09

Last week I was at the ELAG 2016 conference in Copenhagen and attended the excellent workshop by Christina Harlow of Cornell University on migrating digital collections metadata to RDF and Fedora4. One of the important steps required to migrate and model data to RDF is understanding what your data is about. Often old systems need to be converted for which little or no documentation is available. Instead of manually processing large XML or MARC dumps, tools like metadata breakers can be used to find out which fields are available in the legacy system and how they are used. Mark Phillips of the University of North Texas recently wrote a very inspiring article in Code4Lib on how this could be done in Python. In this blog post I’ll demonstrate how this can be done using a new Catmandu tool: Catmandu::Breaker.

To follow the examples below, you need to have a system with Catmandu installed. The Catmandu::Breaker tools can then be installed with the command:

$ sudo cpan Catmandu::Breaker

A breaker is a command that transforms data into a line format that can be easily processed with Unix command line tools such as grep, sort, uniq, cut and many more. If you need an introduction into Unix tools for data processing please follow the examples Johan Rolschewski of Berlin State Library and I presented as an ELAG bootcamp.

As a simple example, let's create a YAML file and demonstrate how it can be analysed using Catmandu::Breaker:

$ cat test.yaml
---
name: John
colors:
  - black
  - yellow
  - red
institution:
  name: Acme
  years:
    - 1949
    - 1950
    - 1951
    - 1952

This example has a combination of simple name/value pairs, a list of colors and a deeply nested field. To transform this data into the breaker format, execute the command:

$ catmandu convert YAML to Breaker < test.yaml
1	colors[]	black
1	colors[]	yellow
1	colors[]	red
1	institution.name	Acme
1	institution.years[]	1949
1	institution.years[]	1950
1	institution.years[]	1951
1	institution.years[]	1952
1	name	John

The breaker format is a tab-delimited output with three columns:

  1. A record identifier: read from the _id field in the input data, or a counter when no such field is present.
  2. A field name. Nested fields are separated by dots (.) and lists are indicated by square brackets ([]).
  3. A field value.
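Catmandu::Breaker itself is Perl, but the output convention is easy to emulate. As an illustration (not Catmandu's actual code), a few lines of Python can flatten nested data into the same three-column format, using "1" as the counter-style record identifier:

```python
# A Python sketch of the breaker idea: flatten nested data into
# tab-separated (id, field, value) lines, with dots for nesting and []
# for list entries, matching the three columns described above.
def breaker(data, record_id="1", prefix=""):
    lines = []
    if isinstance(data, dict):
        for key, value in data.items():
            path = f"{prefix}.{key}" if prefix else key
            lines += breaker(value, record_id, path)
    elif isinstance(data, list):
        for value in data:
            lines += breaker(value, record_id, prefix + "[]")
    else:
        lines.append(f"{record_id}\t{prefix}\t{data}")
    return lines

record = {"name": "John", "colors": ["black", "yellow", "red"]}
for line in breaker(record):
    print(line)
```

Each emitted line is one fact about one record, which is exactly what makes the format so friendly to grep, cut and sort.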

When you have a very large JSON or YAML file and need to find all the values of a deeply nested field you could do something like:

$ catmandu convert YAML to Breaker < data.yaml | grep "institution.years"

Using Catmandu you can do this analysis on input formats such as JSON, YAML, XML, CSV and XLS (Excel). Just replace YAML with any of these formats and run the breaker command. Catmandu can also connect to OAI-PMH, Z39.50 or databases such as MongoDB, ElasticSearch, Solr, or even relational databases such as MySQL, Postgres and Oracle. For instance, to get a breaker format for an OAI-PMH repository, issue a command like:

$ catmandu convert OAI --url to Breaker

If your data is in a database you could issue an SQL query like:

$ catmandu convert DBI --dsn 'dbi:Oracle' --query 'SELECT * from TABLE WHERE ...' --user 'user/password' to Breaker

Some formats, such as MARC, don’t provide a great breaker format out of the box. In Catmandu, MARC files are parsed into a list of lists. Running a breaker on MARC input you get this:

$ catmandu convert MARC to Breaker < t/camel.usmarc | head
fol05731351	record[][]	LDR
fol05731351	record[][]	_
fol05731351	record[][]	00755cam 22002414a 4500
fol05731351	record[][]	001
fol05731351	record[][]	_
fol05731351	record[][]	fol05731351
fol05731351	record[][]	082
fol05731351	record[][]	0
fol05731351	record[][]	0
fol05731351	record[][]	a

The MARC fields are part of the data, not part of the field name. This can be fixed by adding a special ‘marc’ handler to the breaker command:

$ catmandu convert MARC to Breaker --handler marc < t/camel.usmarc | head
fol05731351	LDR	00755cam 22002414a 4500
fol05731351	001	fol05731351
fol05731351	003	IMchF
fol05731351	005	20000613133448.0
fol05731351	008	000107s2000 nyua 001 0 eng
fol05731351	010a	00020737
fol05731351	020a	0471383147 (paper/cd-rom : alk. paper)
fol05731351	040a	DLC
fol05731351	040c	DLC
fol05731351	040d	DLC

Now all the MARC subfields are visible in the output.

You can use this format to find, for instance, all unique values in a MARC file. Let's try to find all unique 008 values:

$ catmandu convert MARC to Breaker --handler marc < camel.usmarc | grep "\t008" | cut -f 3 | sort -u
000107s2000 nyua 001 0 eng
000203s2000 mau 001 0 eng
000315s1999 njua 001 0 eng
000318s1999 cau b 001 0 eng
000318s1999 caua 001 0 eng
000518s2000 mau 001 0 eng
000612s2000 mau 000 0 eng
000612s2000 mau 100 0 eng
000614s2000 mau 000 0 eng
000630s2000 cau 001 0 eng
00801nam 22002778a 4500

Catmandu::Breaker doesn’t only break input data into an easy format for command line processing; it can also do a statistical analysis of the breaker output. First, process some data into the breaker format and save the result in a file:

$ catmandu convert MARC to Breaker --handler marc < t/camel.usmarc > result.breaker

Now, use this file as input for the ‘catmandu breaker’ command:

$ catmandu breaker result.breaker
| name | count | zeros | zeros% | min | max | mean | median | mode   | variance | stdev | uniq | entropy |
|------|-------|-------|--------|-----|-----|------|--------|--------|----------|-------|------|---------|
| 001  | 10    | 0     | 0.0    | 1   | 1   | 1    | 1      | 1      | 0        | 0     | 10   | 3.3/3.3 |
| 003  | 10    | 0     | 0.0    | 1   | 1   | 1    | 1      | 1      | 0        | 0     | 1    | 0.0/3.3 |
| 005  | 10    | 0     | 0.0    | 1   | 1   | 1    | 1      | 1      | 0        | 0     | 10   | 3.3/3.3 |
| 008  | 10    | 0     | 0.0    | 1   | 1   | 1    | 1      | 1      | 0        | 0     | 10   | 3.3/3.3 |
| 010a | 10    | 0     | 0.0    | 1   | 1   | 1    | 1      | 1      | 0        | 0     | 10   | 3.3/3.3 |
| 020a | 9     | 1     | 10.0   | 0   | 1   | 0.9  | 1      | 1      | 0.09     | 0.3   | 9    | 3.3/3.3 |
| 040a | 10    | 0     | 0.0    | 1   | 1   | 1    | 1      | 1      | 0        | 0     | 1    | 0.0/3.3 |
| 040c | 10    | 0     | 0.0    | 1   | 1   | 1    | 1      | 1      | 0        | 0     | 1    | 0.0/3.3 |
| 040d | 5     | 5     | 50.0   | 0   | 1   | 0.5  | 0.5    | [0, 1] | 0.25     | 0.5   | 1    | 1.0/3.3 |
| 042a | 10    | 0     | 0.0    | 1   | 1   | 1    | 1      | 1      | 0        | 0     | 1    | 0.0/3.3 |
| 050a | 10    | 0     | 0.0    | 1   | 1   | 1    | 1      | 1      | 0        | 0     | 1    | 0.0/3.3 |
| 050b | 10    | 0     | 0.0    | 1   | 1   | 1    | 1      | 1      | 0        | 0     | 10   | 3.3/3.3 |
| 0822 | 10    | 0     | 0.0    | 1   | 1   | 1    | 1      | 1      | 0        | 0     | 1    | 0.0/3.3 |
| 082a | 10    | 0     | 0.0    | 1   | 1   | 1    | 1      | 1      | 0        | 0     | 3    | 0.9/3.3 |
| 100a | 9     | 1     | 10.0   | 0   | 1   | 0.9  | 1      | 1      | 0.09     | 0.3   | 8    | 3.1/3.3 |
| 100d | 1     | 9     | 90.0   | 0   | 1   | 0.1  | 0      | 0      | 0.09     | 0.3   | 1    | 0.5/3.3 |
| 100q | 1     | 9     | 90.0   | 0   | 1   | 0.1  | 0      | 0      | 0.09     | 0.3   | 1    | 0.5/3.3 |
| 111a | 1     | 9     | 90.0   | 0   | 1   | 0.1  | 0      | 0      | 0.09     | 0.3   | 1    | 0.5/3.3 |
| 111c | 1     | 9     | 90.0   | 0   | 1   | 0.1  | 0      | 0      | 0.09     | 0.3   | 1    | 0.5/3.3 |
| 111d | 1     | 9     | 90.0   | 0   | 1   | 0.1  | 0      | 0      | 0.09     | 0.3   | 1    | 0.5/3.3 |
| 245a | 10    | 0     | 0.0    | 1   | 1   | 1    | 1      | 1      | 0        | 0     | 9    | 3.1/3.3 |
| 245b | 3     | 7     | 70.0   | 0   | 1   | 0.3  | 0      | 0      | 0.21     | 0.46  | 3    | 1.4/3.3 |
| 245c | 9     | 1     | 10.0   | 0   | 1   | 0.9  | 1      | 1      | 0.09     | 0.3   | 8    | 3.1/3.3 |
| 250a | 3     | 7     | 70.0   | 0   | 1   | 0.3  | 0      | 0      | 0.21     | 0.46  | 3    | 1.4/3.3 |
| 260a | 10    | 0     | 0.0    | 1   | 1   | 1    | 1      | 1      | 0        | 0     | 6    | 2.3/3.3 |
| 260b | 10    | 0     | 0.0    | 1   | 1   | 1    | 1      | 1      | 0        | 0     | 5    | 2.0/3.3 |
| 260c | 10    | 0     | 0.0    | 1   | 1   | 1    | 1      | 1      | 0        | 0     | 2    | 0.9/3.3 |
| 263a | 6     | 4     | 40.0   | 0   | 1   | 0.6  | 1      | 1      | 0.24     | 0.49  | 4    | 2.0/3.3 |
| 300a | 10    | 0     | 0.0    | 1   | 1   | 1    | 1      | 1      | 0        | 0     | 5    | 1.8/3.3 |
| 300b | 3     | 7     | 70.0   | 0   | 1   | 0.3  | 0      | 0      | 0.21     | 0.46  | 1    | 0.9/3.3 |
| 300c | 4     | 6     | 60.0   | 0   | 1   | 0.4  | 0      | 0      | 0.24     | 0.49  | 4    | 1.8/3.3 |
| 300e | 1     | 9     | 90.0   | 0   | 1   | 0.1  | 0      | 0      | 0.09     | 0.3   | 1    | 0.5/3.3 |
| 500a | 2     | 8     | 80.0   | 0   | 1   | 0.2  | 0      | 0      | 0.16     | 0.4   | 2    | 0.9/3.3 |
| 504a | 1     | 9     | 90.0   | 0   | 1   | 0.1  | 0      | 0      | 0.09     | 0.3   | 1    | 0.5/3.3 |
| 630a | 2     | 9     | 90.0   | 0   | 2   | 0.2  | 0      | 0      | 0.36     | 0.6   | 2    | 0.9/3.5 |
| 650a | 15    | 0     | 0.0    | 1   | 3   | 1.5  | 1      | 1      | 0.65     | 0.81  | 6    | 1.7/3.9 |
| 650v | 1     | 9     | 90.0   | 0   | 1   | 0.1  | 0      | 0      | 0.09     | 0.3   | 1    | 0.5/3.3 |
| 700a | 5     | 7     | 70.0   | 0   | 2   | 0.5  | 0      | 0      | 0.65     | 0.81  | 5    | 1.9/3.6 |
| LDR  | 10    | 0     | 0.0    | 1   | 1   | 1    | 1      | 1      | 0        | 0     | 10   | 3.3/3.3 |

As a result you get a table listing the usage of subfields in all the input records. From this output we can learn:

  • The ‘001’ field is available in 10 records (see: count)
  • One record doesn’t contain a ‘020a’ subfield (see: zeros)
  • The ‘650a’ subfield is available in all records, at least once and at most 3 times (see: min, max)
  • Only 8 out of 10 ‘100a’ subfields have unique values (see: uniq)
  • The last column, ‘entropy’, provides a measure of how interesting the field is for search engines: the higher the entropy, the more unique content can be found.
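The entropy column is essentially the Shannon entropy of the field's value distribution, which is easy to reproduce. This is a generic calculation, not Catmandu::Breaker's exact implementation:

```python
# Shannon entropy of a field's values, in bits: the higher the entropy,
# the more distinct content the field carries. A generic sketch, not
# Catmandu::Breaker's exact code.
from collections import Counter
from math import log2

def entropy(values):
    counts = Counter(values)
    total = len(values)
    return sum(-(n / total) * log2(n / total) for n in counts.values())

# Ten identical values carry no information; ten distinct ones carry log2(10).
print(round(entropy(["DLC"] * 10), 1))                 # 0.0
print(round(entropy([str(i) for i in range(10)]), 2))  # 3.32
```

That is why a field like 040a (always "DLC") scores 0.0 while 001 (all unique) scores the maximum 3.3 in the table above.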

I hope these tools are of some use in your projects!

Evergreen ILS: Evergreen 2.12.0 is released

planet code4lib - Wed, 2017-03-22 23:24

The Evergreen community is pleased to announce the release of Evergreen 2.12.  The release is available on the Evergreen downloads page.

With this release, we strongly encourage the community to start using the new web client on a trial basis in production. All current Evergreen functionality is available in the web client with the exception of serials and offline circulation. The web client is scheduled to be available for full production use with the September 3.0 release.

Other notable new features and enhancements for 2.12 include:

  • OverDrive and OneClickdigital integration. When configured, patrons will be able to see ebook availability in search results and on the record summary page. They will also see ebook checkouts and holds in My Account.
  • Improvements to metarecords that include:
    • improvements to the bibliographic fingerprint to prevent the system from grouping different parts of a work together and to better distinguish between the title and author in the fingerprint;
    • the ability to limit the “Group Formats & Editions” search by format or other limiters;
    • improvements to the retrieval of e-resources in a “Group Formats & Editions” search;
    • and the ability to jump to other formats and editions of a work directly from the record summary page.
  • The removal of advanced search limiters from the basic search box, with a new widget added to the results page where users can see and remove those limiters.
  • A change to topic, geographic and temporal subject browse indexes that will display the entire heading as a unit rather than displaying individual subject terms separately.
  • Support for right-to-left languages, such as Arabic, in the public catalog. Arabic has also become a new officially-supported language in Evergreen.
  • A new hold targeting service supporting new targeting options and runtime optimizations to speed up targeting.
  • In the web staff client, the ability to apply merge profiles in the record bucket merge and Z39.50 interfaces.
  • The ability to display copy alerts when recording in-house use.
  • The ability to ignore punctuation, such as hyphens and apostrophes, when performing patron searches.
  • Support for recognition of client time zones, particularly useful for consortia spanning time zones.
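The punctuation-insensitive patron search in the list above can be sketched as a simple normalisation step; this is a generic illustration of the idea, not Evergreen's actual implementation:

```python
# Normalise names so that "O'Brien", "Obrien" and "O-Brien" all match.
# A generic sketch of punctuation-insensitive matching, not Evergreen's code.
import re

def normalize(name):
    return re.sub(r"[^a-z0-9 ]", "", name.lower())

candidates = ["O'Brien", "Smith-Jones", "Obrien"]
query = "obrien"
matches = [n for n in candidates if normalize(n) == normalize(query)]
print(matches)  # ["O'Brien", 'Obrien']
```

Applying the same normalisation to both the stored value and the query is what makes hyphens and apostrophes invisible to the search.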

Evergreen 2.12 also requires PostgreSQL 9.3, with a recommendation that sites upgrade to PostgreSQL 9.4. It also requires the 2.5 release of OpenSRF. The full feature set for this release is available in the 2.12 Release Notes.

As with all Evergreen releases, many hands contributed to a successful release process. The release is a result of code, documentation, and translation contributions from 46 people representing 23 organizations in the community, along with financial contributions from nine Evergreen sites that commissioned development. Many thanks to everyone who helped make this release happen.

Jonathan Rochkind: Use capistrano to run a remote rake task, with maintenance mode

planet code4lib - Wed, 2017-03-22 21:52

So the app I am now working on is still in its early stages, not even live to the public yet, but we’ve got an internal server. We periodically have to change a bunch of data in our (non-rdbms) “production” store. (First devops unhappiness: I think there should be no scheduled downtime for planned data transformation. We’re working on it, but for now it happens.)

We use capistrano to deploy. Previously/currently, the process for one of these scheduled-downtime maintenance windows looked like:

  • on your workstation, do a cap production maintenance:enable to start some downtime
  • ssh into the production machine, cd to the cap-installed app, and run a rake task with bundle exec. Which could take an hour+.
  • Remember to come back when it’s done and `cap production maintenance:disable`.

A couple more devops unhappiness points here: 1) In my opinion you should ideally never be ssh’ing to production, at least in a non-emergency situation. 2) You have to remember to come back and turn off maintenance mode, and if I start the task at 5pm to avoid disrupting internal stakeholders, I gotta come back after business hours to do that! I also think anything you have to do ‘outside business hours’ that’s not an emergency is a sign of a not-yet-done ops environment.

So I decided to try to fix this. Since the existing maintenance mode stuff was already done through capistrano, and I wanted to do it without a manual ssh to the production machine, capistrano seemed a reasonable tool. I found a plugin to execute rake via capistrano, but it didn’t do quite what I wanted, and its implementation was so simple that I saw no reason not to copy-and-paste it and make it do just what I wanted.

I’m not gonna maintain this for the public at this point (no gem/plugin), but I’ll give it to you in a gist if you want to use it. One of the tricky parts was figuring out how to get “streamed” output from cap, since my rake tasks use ruby-progressbar; it’s got decent non-TTY output already, and I wanted to see it live in my workstation console. I managed to do that! Although I never figured out how to get a cap recipe to require files from another location (I have no idea why I couldn’t make it work), so the custom class is inlined, ugly as that is.

I also ordinarily want maintenance mode to be turned off even if the task fails, but still want a non-zero exit code in those cases (anticipating future further automation — really what I need is to be able to execute this all via cron/at too, so we can schedule downtime for the middle of the night without having to be up then).
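The shape of that requirement, always disable maintenance mode but still surface a failing exit status, can be sketched outside of Capistrano like this (a generic Python illustration with made-up function names, not the actual cap recipe):

```python
# Cleanup pattern: maintenance mode is always switched off via finally,
# but a failing task still yields a non-zero exit code for future
# cron/at automation. Function names here are illustrative stand-ins.
import sys

events = []  # records what happened, standing in for real side effects

def enable_maintenance():
    events.append("maintenance on")

def disable_maintenance():
    events.append("maintenance off")

def with_maintenance_mode(task):
    """Run task inside maintenance mode; the finally clause guarantees
    maintenance mode is turned off, while a failure still returns 1."""
    enable_maintenance()
    try:
        task()
        return 0
    except Exception as err:
        print(f"task failed: {err}", file=sys.stderr)
        return 1
    finally:
        disable_maintenance()

def failing_task():
    raise RuntimeError("boom")

status = with_maintenance_mode(failing_task)
print(status, events)  # 1 ['maintenance on', 'maintenance off']
```

In Ruby/Capistrano the same effect comes from an ensure block plus an explicit exit status, which is what makes unattended middle-of-the-night runs safe.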

Anyway here’s the gist of the cap recipe. This file goes in ./lib/capistrano/tasks in a local app, and now you’ve got these recipes. Any tips on how to organize my cap recipe better quite welcome.

Filed under: General

District Dispatch: After calling Congress, write a letter to the editor

planet code4lib - Wed, 2017-03-22 21:20

The single most impactful action you can take to save funding for libraries right now is to contact your member of Congress directly. Once you’ve done that, there is another action you can take to significantly amplify your voice and urge public support for libraries: writing a letter to the editor of your local newspaper.

Each newspaper has its own guidelines for submitting letters to the editor.

If you’ve never done it, don’t let myths get in the way of your advocacy:

Myth 1: My local newspaper is really small, so I don’t want to waste my time. It’s true that the larger the news outlet, the more exposure your letter gets. But it’s also true that U.S. representatives care about the opinions expressed in their own congressional district, where their voters live. For example, if you live in the 15th district of Pennsylvania, your U.S. representative cares more about the Harrisburg Patriot-News and even smaller local newspapers than he does about the Philadelphia Inquirer.

Myth 2: I have to be a state librarian to get my letter printed in the newspaper. Newspaper editorial boards value input from any readers who have specific stories to share about how policies affect real people on a daily basis. Sure, if you’re submitting a letter to the New York Times, having a title increases your chances of getting published. The larger the news outlet, the more competitive it is to get published. But don’t let your title determine the value of your voice. Furthermore, you can encourage your library patrons to write letters to the editor. Imagine the power of a letter written by a veteran in Bakersfield, CA, who received help accessing benefits through the state’s veteransconnect@thelibrary initiative – especially when their U.S. representative is on the Veterans Affairs subcommittee of the House Appropriations Committee.

Myth 3: I don’t have anything special to say in a letter. You don’t need to write a masterpiece, but you need to be authentic. Letters in response to material recently published (within a couple days) stand a better chance of getting printed. How did you feel about a story you read about, for example, the elimination of library programs in the Trump budget? Was there a missing element of the story that needs to be addressed? What new information (statistics) or unique perspective (anecdotes) can you add to what was printed? Is there a library angle that will be particularly convincing to one of your members of Congress (say, their personal interest in small business development)? Most importantly, add a call to action. For example, “We need the full support of Senators {NAME and NAME} and Representative {NAME} to preserve full federal funding for libraries so they can continue to…” Be sure to check our Legislative Action Center for current language you can use.

Ready to write? Here are a few practical tips about how to do it:

Tip 1: Keep it short – in general, maximum 200 words. Every news outlet has its own guidelines for submitting letters to the editor, which are normally published on their website. Some allow longer letters, others shorter. In any case, the more concise and to-the-point, the better.

Tip 2: When you email your letter, paste it into the body of the text and be sure to include your name, title, address and phone number so that you can be contacted if the editor wants to verify that you are the author. Do not send an attachment.

Tip 3: If your letter gets published, send a copy to your representative and senators to reinforce your message (emailing a hyperlink is best). Also, send a copy to the Washington Office; we can often use the evidence of media attention when we make visits on Capitol Hill.

Finally, get others involved. Recruit patrons, business leaders and other people in your community to write letters to the editor (after they have called their members of Congress, of course!). Editors won’t publish every single letter they get, but the more letters they receive on a specific topic, the more they realize that it is an issue that readers care deeply about – and that can inspire editors to further explore the impact of libraries for themselves.

The post After calling Congress, write a letter to the editor appeared first on District Dispatch.


Subscribe to code4lib aggregator