Feed aggregator

Open Knowledge Foundation: The future of the Global Open Data Index: assessing the possibilities

planet code4lib - Wed, 2017-11-01 09:30

In the last couple of months we have received questions regarding the status of the new Global Open Data Index (GODI) from a few members of our Network. This blogpost is to update everyone on the status of GODI and what comes next.

But first, some context: GODI is one of the biggest assessments of the state of open government data globally, alongside the Web Foundation’s Open Data Barometer. Year after year we see persistent obstacles to open data. High-income countries regularly secure top rankings, yet in many countries there is little to no progress. As our latest State Of Open Government Data in 2017 report shows, data is often not made available publicly at all; when it is, we see many issues around findability, quality, processability, and licensing. Individual countries are notable exceptions to the rule. The Open Data Barometer made similar observations in its latest report, mentioning slow uptake of policy as well as persistent data quality issues in countries that do provide open data. So there is still a lot of work to be done.

To address issues like community engagement, we started to explore alternative paths for GODI. This includes a shift in focus: from a mere measurement tool to a device for conversation between our user groups throughout the process. We understand that we need to speak to new audiences and to treat measurement as a tool for real-world applications. We want to understand the use cases of the Open Data Survey (the tool that powers GODI and the Open Data Census) in different contexts and with different goals. We have seen only a few of the possible uses of the tool in the open data sphere, and we want to see even more.

In order to learn more about how GODI is taken up by different user groups, we are also currently exploring GODI’s effects on open data policy and publication. We wish to understand more systematically how individual elements of the GODI interface (such as the country ranking, dataset results, and discussion forum entries) help mobilise support for open data among different user groups. Our goal is to understand how to improve our survey design and workflow so that they more directly support action around open data policy and publication. In addition, we are developing a new vision for the Open Data Index that would measure open data either at a regional and city level or by topical area. We will elaborate on this vision in a follow-up blogpost soon.

Taking all of this into account, we have decided to focus during 2018 on the aforementioned use cases and a regional Index. In the meantime, we will keep working with our community to define a vision that will make GODI a sustainable measurement tool: we understand that tracking changes in government data publication is crucial for activists and for governments themselves. We know that progress around open data is slower than we would like it to be, which is all the more reason to ensure that discussions around open data do not end. Please do not hesitate to open new discussions around country entries on our forum, or reach out to us if you have any ideas on how to take GODI forward and improve it.

If you’re running an Open Data Census, we’ll continue to support the measurement you’re currently working on, whether it’s local, regional, or a new kind of Census you’d like to try. If you want to run your own Census, you can request it here, or send us an email to see how we could collaborate further.

Galen Charlton: Scaling the annual Code4Lib conference

planet code4lib - Tue, 2017-10-31 22:38

One of the beautiful things about Code4Lib qua banner is that it can easily be taken up by anyone without asking permission.

If I wanted to, I could have lunch with a colleague, talk about Evergreen, and call it a Code4Lib meetup, and nobody could gainsay me — particularly if I wrote up a summary of what we talked about.

Three folks in a coffeehouse spending an afternoon hacking together a connection between digital repository Foo and automatic image metadata extractor Bar, then tossing something up on the Code4Lib Wiki? Easy-peasy.

Ten people for dinner, plotting to take over the world and replace MARC once and for all? Probably should make a reservation at the restaurant.

Afternoon workshop for 20 in your metro area? Well, most libraries have meeting rooms, integral classrooms, or computer labs — and directors willing to let them be used for the occasional professional development activity.

Day and a half conference for 60 from your state, province, or region? That’s probably a bit more than you can pull off single-handedly, and you may well simply not have the space for it if you work for a small public library. You at least need to think about how folks will get meals and find overnight accommodations.

The big one? The one that nowadays attracts over four hundred people from across the U.S. and Canada, with a good sprinkling of folks from outside North America — and expects that for a good chunk of the time, they’ll all be sitting in the same room? And that also expects that at least half of them will spend a day scattered across ten or twenty rooms for pre-conference workshops? That folks unable to be there in person expect to live-stream? That tries in more and more ways to lower barriers to attending it?

Different kettle of fish entirely.

The 2017 conference incurred a tick under $240,000 in expenses. The 2016 conference: a bit over $207,000. This year? At the moment, projected expenses are in the neighborhood of $260,000.

What is this going towards? Convention center or hotel conference space rental and catering (which typically need to be negotiated together, as guaranteeing enough catering revenue and/or hotel nights often translates into “free” room rental). A/V services, including projectors, sound systems, and microphones. Catering and space rental for the reception. For the past few years, the services of a professional event management firm — even with 50+ people volunteering for Code4Lib conference committees, we need the professionals as well. Diversity scholarships, including travel expenses, forgone registration fees, and hotel nights. T-shirts. Gratuities. Live transcription services.

How is this all getting paid for? Last year, 49% of the income came from conference and pre-conference registrations, 31% from sponsorships and exhibitor tables, 5% from donations and sponsorships for scholarships, and 3% from hotel rebates and room credits.

The other 12%? That came from the organizers of the 2016 conference in Philadelphia, who passed along a bit under $33,000 to the 2017 LPC. The 2017 conference in turn was able to pass along a bit over $25,000 to the organizers of the forthcoming 2018 conference.

In other words, the 2017 conference effectively operated at a loss of a bit under $8,000, although fortunately there was enough of a cushion that from UCLA’s perspective, the whole thing was a wash — if you ignore some things. Things like the time that UCLA staff who were members of the 2017 local planning committee spent on the whole effort — and time spent by administrative staff in UCLA’s business office.

What are their names? I have no clue.

But something I can say much more confidently: every member of the 2017 LPC and budget committees lost sleep pondering what might happen if things went wrong. If we didn’t get enough sponsorships. If members of the community balked at the registration fee — or simply couldn’t afford it — and we couldn’t meet our hotel room night commitments.

I can also say, without direct knowledge this time, but equally confidently, that members of the 2016 organizers lost sleep. And 2015. And so on down the line.

While to my knowledge no Code4Lib member has ever been personally liable for the hotel contracts, I leave it to folks to consider the reputational consequence of telling their employer, were a conference to fail, that that institution is on the hook for potentially tens of thousands of dollars.

Of course, somebody could justly respond by citing an ancient joke. You know, the one that begins like this: “Doctor, it hurts when I do this!”.

And that’s a fair point. It is both a strength and weakness of Code4Lib that it imposes no requirement that anybody do anything in particular. We don’t have to have a big annual conference; a lot of good can be done under the Code4Lib banner via electronic communications and in-person meetups small enough that it’s of little consequence if nobody happens to show up.

But I also remember the days when the Code4Lib conference would open registration, then close it a couple of hours later because capacity had been reached. Based on the attendance trends, we know that we can reasonably count on at least 400 people being willing to travel to attend the annual conference. If a future LPC manages to make the cost of attending the conference significantly lower, I could easily see 500 or 600 people showing up (although I would then wonder if we might hit some limits on how large a single-track conference can be and still remain relevant for all of the attendees).

I think there is value in trying to put on a conference that brings in as many practitioners (and yes, managers) in the GLAM technology space together in person as can come while also supporting online participation — but puts control of the program in the hands of the attendees via a process that both honors democracy and invites diversity of background and viewpoint.

Maybe you agree with that—and maybe you don’t. But even if you don’t agree, please do acknowledge the astonishing generosity of the people and institutions that have put their money and reputation on the line to host the annual conference over the years.

Regardless, if Code4Lib is to continue to hold a large annual conference while not being completely dependent on the good graces of a small set of libraries that are in a position to assume $250,000+ liabilities, the status quo is not sustainable.

That brings me to the Fiscal Continuity Interest Group, which I have helped lead. If you care about the big annual conference, please read the report (and if you’re pressed for time, start with the summary of options), then vote. You have until 23:59 ET on Friday, November 3 to respond to the survey.

The survey offers the following options:

  • maintain the status quo, meaning that each potential conference host is ultimately responsible for deciding how the liability of holding the conference should be managed
  • set up a non-profit organization
  • pick among four institutions that have generously offered to consider acting as ongoing fiscal sponsors for the annual conference

I believe that moving away from the status quo will help ensure that the big annual Code4Lib conference can keep happening while broadening the number of institutions that would be able to physically host it. Setting up some kind of ongoing fiscal existence for Code4Lib may also solve some problems for the folks who have been running the Code4Lib Journal.

I also believe that continuing with the status quo necessarily means that the Code4Lib community must rethink the annual conference: whether to keep having it at all; to accept the fact that only a few institutions are nowadays capable of hosting it at the scale we’re accustomed to; and to accept that if an institution is nonetheless willing to host it, that we should scale back expectations that the community is entitled to direct the shape of the conference once a host has been selected.

In other words, it boils down to deciding how we wish to govern ourselves. This doesn’t mean that Code4Lib needs to embrace bureaucracy… but we must either accept some ongoing structure or scale back.

Choose wisely.

Lucidworks: 7 Search-Related Halloween Horrors

planet code4lib - Tue, 2017-10-31 17:17

I’m not saying that your search platform is bad if it doesn’t fully address these issues… I’m saying it is horrifying! I’m saying you should dress it up on Halloween and put it in your yard to scare children away (so you can eat all the candy yourself)!

Here are 7 ghoulish limitations that no search app should tolerate:

Security Not Enabled

Years ago, I worked for a company with poorly tuned, insecure intranet search. At that company, I found files showing a plan which involved laying off the entire department I was in – months before it was intended to go into effect. I was the last guy who really knew Oracle and the company really needed me there until it could phase out our department. However, there was nothing in the plan about a long and generous severance and I had rent to pay, so I found a new job months in advance.

Security integration and security trimming allow you to ensure that users do not see documents in search results they don’t have permission to access. This also applies to content repositories like SharePoint, Box, or Google Drive. You configure this at index time to capture permissions from the repository and supply a username parameter at query time in order to filter the search results.
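As a toy illustration of that query-time trimming (the documents, fields, and group names here are invented for the example, not any particular engine's schema):

```python
# Sketch of security trimming: permissions are captured alongside each
# document at index time, and results are filtered by the querying
# user's group memberships at query time.

documents = [
    {"id": 1, "title": "Company picnic flyer", "acl": {"everyone"}},
    {"id": 2, "title": "Reorg plan (confidential)", "acl": {"hr", "execs"}},
    {"id": 3, "title": "Oracle runbook", "acl": {"dba", "it"}},
]

def search(query, user_groups):
    """Return only the matching documents the user may see."""
    hits = [d for d in documents if query.lower() in d["title"].lower()]
    return [d for d in hits if d["acl"] & user_groups]

# An engineer in "it" never sees the confidential reorg plan:
print([d["id"] for d in search("plan", {"it", "everyone"})])  # []
print([d["id"] for d in search("plan", {"hr"})])              # [2]
```

In a real deployment the ACL field is populated by the repository connector and the group list comes from the directory service, but the filtering principle is the same.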

No Idea What Terms to Start With

When I see a blank search box and type the letter “A” and nothing comes up, it is rather jarring. In the early days of search, this was expected behavior. Even Google worked that way. However with auto-complete on a smartphone or auto-suggest on Google and Amazon, everyone expects some kind of direction as they start typing in the search box.

Suggestions and typeahead supply keywords to users as they type their query. This feature is frequently combined with category suggestions (facets) to help tune search to exactly what the user is looking for. There is no reason to fly blind.
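The idea behind typeahead can be sketched in a few lines, ranking completions drawn from a hypothetical log of past queries (the queries below are made up):

```python
# Minimal prefix-based suggester: as the user types, offer the most
# popular past queries that start with what they have typed so far.

from collections import Counter

past_queries = ["annual report", "annual leave", "audit checklist",
                "annual report", "apache solr"]

counts = Counter(past_queries)

def suggest(prefix, limit=3):
    """Most popular past queries starting with the typed prefix."""
    matches = [(q, n) for q, n in counts.items() if q.startswith(prefix)]
    return [q for q, _ in sorted(matches, key=lambda t: -t[1])][:limit]

print(suggest("an"))  # ['annual report', 'annual leave']
```

Production suggesters use tries or dedicated index structures rather than a linear scan, but the user-facing behavior is this.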

No Categorization

In a modern Fortune 1000 company there is no shortage of data. It is insights that matter. Data consolidation and being able to search across data sources is absolutely essential to answering the kinds of hard business questions teams must answer every day. However, a keyword search that isn’t tuned to any particular area or domain will tend to return noise.

Faceting allows users to limit their search to some subdivision of data that is related to them. If you have a field in your data that lends itself to that naturally then this is easy to do. However, if you don’t then tools like classifiers and clustering are essential. These can either assist a human at categorizing data (supervised learning) or just find which data is related to which without a human getting involved (unsupervised learning).
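Faceting in miniature, with an invented result set: count the matching documents per field value, so users can see the subdivisions and drill into one:

```python
# Toy facet computation: for a set of search hits, tally how many
# fall under each value of a chosen field.

from collections import Counter

results = [
    {"title": "Q3 sales figures", "dept": "finance", "year": 2017},
    {"title": "Q2 sales figures", "dept": "finance", "year": 2017},
    {"title": "Hiring plan",      "dept": "hr",      "year": 2016},
]

def facet(hits, field):
    return Counter(h[field] for h in hits)

print(facet(results, "dept"))  # Counter({'finance': 2, 'hr': 1})
```

When no such field exists in the data, that is where the classifiers and clustering mentioned above come in: they generate the field before faceting can happen.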

No Measure of Success

There are plenty of companies that deploy search and don’t actually measure, or have no way to measure, whether users are successfully finding… well… anything. Unless users complain, IT just assumes everything is fine. Meanwhile the users who aren’t complaining abandoned the company’s search long ago and either use a series of emailed bookmarks or Google for public sites.

A good search solution includes dashboards with statistics like the top 10 searches with no results. A good search solution shows you statistically significant phrases and helps find spelling errors. A good solution talks to IT and helps IT tune it.
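One such statistic, a "top searches with no results" report, falls straight out of the query log. A sketch with an invented log (each entry is a query and the number of results it returned):

```python
# Compute the queries that most often returned zero results -- the
# kind of number a search analytics dashboard should surface, since
# it exposes both content gaps and user spelling errors.

from collections import Counter

query_log = [
    ("expenses policy", 14), ("expnses policy", 0), ("vpn setup", 0),
    ("vpn setup", 0), ("holiday calendar", 6),
]

zero_hits = Counter()
for query, result_count in query_log:
    if result_count == 0:
        zero_hits[query] += 1

print(zero_hits.most_common(10))  # [('vpn setup', 2), ('expnses policy', 1)]
```

Note how the misspelled "expnses policy" shows up immediately; that is exactly the kind of fix this report lets IT make.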

No Data Introspection

Any idiot can put out a search app that shows some of the data. Years ago I put a simple (now defunct) free software tool called htDig in front of a bunch of Word documents. I read them by running the Unix “strings” command against them. It was a simple enough keyword search for a very small corpus, so long as you weren’t looking for something too specific, and it did nothing for some of the more complicated documents because it couldn’t REALLY see inside them. Oddly, this was part of the motivation for creating a project that eventually became Apache POI. POI (via Apache Tika) eventually became the way Fusion reads Office documents. htDig sorta worked, but it couldn’t “really” look into my Word documents, let alone other file types.

If your search tool can’t “really” see all the way inside your documents it can’t really index them properly. It took years to develop the tools that could really do this. A good search solution is extensible – there is always a new file format or data source – and ideally uses well thought out, proven open source technologies.

Nothing Personal(ized)

Back in the day you had to learn to construct a search for Google. Now Google kinda knows enough about your past behavior, social networks, and geography to predict what you’re actually looking for. This personalization of search is no longer a luxury but essential to satisfy modern users.

Your search solution needs to track signals like clickstreams and which queries the user entered in the past. Your search solution should allow you to inform future queries with actual purchases or how long a user stayed on a page.
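A simplified sketch of what signal-based re-ranking looks like: historical click counts (invented here) are blended into a base relevance score, so documents users actually chose in the past rise in the results. The 0.1 weight is an arbitrary illustration:

```python
# Re-rank search hits by blending a click signal into the base score.
# Documents that users clicked for this query in the past get boosted.

click_counts = {  # (query, doc_id) -> historical click count
    ("expense form", "doc-2017"): 42,
    ("expense form", "doc-2009"): 3,
}

def rerank(query, hits):
    """hits: list of (doc_id, base_score) pairs; blend in click signals."""
    return sorted(
        hits,
        key=lambda h: h[1] + click_counts.get((query, h[0]), 0) * 0.1,
        reverse=True,
    )

hits = [("doc-2009", 1.2), ("doc-2017", 1.0)]
print(rerank("expense form", hits))  # doc-2017 ranks first despite a lower base score
```

Real systems aggregate many signal types (clicks, dwell time, purchases) and learn the weights, but the blend-then-sort shape is the same.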

Like No Other

The “more like this” feature on Google isn’t all that useful. It usually obscures results Google hid from you because they were mostly repetitive. That’s an area Google got wrong. It is frequently helpful to find something and explore its kin in order to find just the thing you want. In retail this looks like similar items (another shoe in a different style). In enterprise search these are similar documents (maybe a form from a different year or tax type or similar blueprint).

A good search solution lets you find “more like this” based on keywords but also allows you to automatically cluster or classify like items. Users should be able to navigate their way to their perfect result. They shouldn’t have to get there the first time.
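“More like this” reduced to its essence: score the other documents by the terms they share with a seed document. The documents and scoring here are a toy illustration, not how any particular engine implements it:

```python
# Toy "more like this": rank other documents by term overlap with a
# seed document, so a user can hop from one result to its kin.

def terms(text):
    return set(text.lower().split())

docs = {
    "w2-2016": "tax form wages 2016",
    "w2-2017": "tax form wages 2017",
    "memo":    "office party memo",
}

def more_like_this(seed_id):
    seed = terms(docs[seed_id])
    scored = [(d, len(seed & terms(t))) for d, t in docs.items() if d != seed_id]
    return [d for d, s in sorted(scored, key=lambda t: -t[1]) if s > 0]

print(more_like_this("w2-2016"))  # ['w2-2017'] (the same form, a different year)
```

Real implementations weight terms by rarity (TF-IDF) rather than counting raw overlap, which keeps common words from dominating the similarity.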

The Tricks That Make Fusion a Treat

Fusion is built on Apache Solr, Apache Tika, Apache Spark and a whole lot of other open source tools that have been in the works for decades. A lot of R&D and thought went into them and made them into a complete information management solution for finding your stuff! Lucidworks wove these separate pieces into a total solution and added more functionality like machine learning that mere mortals can implement and an Index Workbench that allows you to tune your data ingestion and see the results in real-time.


The post 7 Search-Related Halloween Horrors appeared first on Lucidworks.

David Rosenthal: Storage Failures In The Field

planet code4lib - Tue, 2017-10-31 15:00
It's past time for another look at the invaluable hard drive data that Backblaze puts out quarterly. As Peter Bright notes at Ars Technica, despite being based on limited data, the current stats reveal two interesting observations:
  • Backblaze is seeing reduced rates of infant mortality for the 10TB and 12TB drive generations:
    The initial data from the 10TB and 12TB disks, however, has not shown that pattern. While the data so far is very limited, with 1,240 disks and 14,220 aggregate drive days accumulated so far, none of these disks (both Seagate models) have failed.
  • Backblaze is seeing no reliability advantage from enterprise as against consumer drives:
    the company has now accumulated 3.7 million drive days for the consumer disks and 1.4 million for the enterprise ones. Over this usage, the annualized failure rates are 1.1 percent for the consumer disks and 1.2 percent for the enterprise ones.
Below the fold, some commentary.

The first thing to note is that devoting engineering effort to reducing infant mortality can have a significant return on investment. A drive that fails early will be returned under warranty, costing the company money. A drive that fails after the warranty expires cannot be returned. Warranty costs must be reserved against in the company's accounts. Any reduction in the rate of early failures goes straight to the company's bottom line.

Thus engineering devoted to reducing infant mortality is much more profitable than engineering devoted to extending the drives' service life. Extending service life beyond the current five years is wasted effort, because unless Kryder's law slows even further, the drives will be replaced to get more capacity in the same slot. Backblaze is replacing drives for this reason:
You’ll also notice that we have used a total of 85,467 hard drives. But at the end of 2016 we had 71,939 hard drives. Are we missing 13,528 hard drives? Not really. While some drives failed, the remaining drives were removed from service due primarily to migrations from smaller to larger drives.

The first observation makes it look as though the disk manufacturers have been following this strategy. It also explains the second observation. The goal is zero infant failures for both enterprise and consumer drives. To the extent that this goal is met, failure rates for both types in the first two years would be the same: zero. It might be that after the first two years, when the consumer drives were out of warranty, they would start to fail where the enterprise drives, still in warranty, would not.

But my guess is that both drive types will continue to fail at about the same rate because they share so much underlying technology. Backblaze has a long history of using consumer drives, and their stats show some models are reliable over 4-5 years, others not. A significant part of the enterprise drives' higher price is the cost of the five-year warranty.
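For reference, the annualized failure rates quoted above follow from a simple formula: failures per drive-day, scaled up to a drive-year. A sketch with hypothetical numbers (not Backblaze's actual failure counts):

```python
# Annualized failure rate (AFR): the fraction of drives expected to
# fail in a year of continuous operation, computed from the observed
# failures over the accumulated drive-days.

def annualized_failure_rate(failures, drive_days):
    """AFR as a percentage: failures per drive-day, scaled to a year."""
    return failures / drive_days * 365 * 100

# Hypothetical example: 30 failures observed over 1,000,000 drive-days
print(round(annualized_failure_rate(30, 1_000_000), 1))  # 1.1
```

This is why large aggregate drive-day counts matter: with only 14,220 drive-days on the new 10TB and 12TB disks, even a true 1% AFR would be expected to produce well under one failure, so zero failures so far says little yet.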

Open Knowledge Foundation: OKI wins funds from ODI to create Open Data publication toolkit

planet code4lib - Tue, 2017-10-31 10:47

Open Knowledge International (OKI) has been awarded funds by the Open Data Institute (ODI) as part of a project to enhance and increase adoption of tools and services for open data publishers in the private and public sectors, reducing barriers to publication. OKI’s focus in this programme will be to create better open data publication workflows by building on our earlier work on the Frictionless Data initiative.

We will be implementing significant incremental improvements to a range of code libraries and tools that are loosely aligned around our Frictionless Data project, in which we are working on removing the friction in working with data by developing a set of tools, standards, and best practices for publishing data. The work will be presented as part of a new toolkit which will be specifically targeted at both technical and non-technical users of data, within the public sector, businesses, and the data community.

We will perform additional user research in government and non-governmental contexts, design and enhance user interfaces for non-technical users, implement integrations of tooling with existing workflows as well as working towards new ones. The reports, research and tools produced will become practical assets that can be used and added to by others, to continue to explore how data can and should work in our societies and economies.

Innovate UK, the UK’s innovation agency, is providing £6 million over three years to the ODI, to advance knowledge and expertise in how data can shape the next generation of public and private services, and create economic growth. The work on improving the conditions for data publishing is one of six projects, chosen by the ODI, in this first year of the funding.

Olivier Thereaux, Head of Technology at the ODI said:

‘Our goals in this project are to truly understand what barriers exist to publishing high quality data quickly and at reasonable cost. We’re happy to be working with OKI, and to be building on its Frictionless Data initiative to further the development of simpler, faster, higher quality open data publishing workflows.’

On announcing the funding on 17th October, Dr Jeni Tennison, CEO at the ODI said:

‘The work we are announcing today will find the best examples of things working well, so we can share and learn from them. We will take these learnings and help businesses and governments to use them and lead by example.’

A major focus for the Product Team at Open Knowledge International over the last two years has been data quality and the automation of data processing. Data quality is arguably the greatest barrier to useful and usable open data, and we’ve been directly addressing it via the specifications and tooling of Frictionless Data.

Our focus in this project will be to develop ways for non-technical users to employ tools for automation, reducing the potential for manual error, and increasing productivity. We see speed of publication and lowering costs of publication as two areas that are directly enhanced by having better tooling and workflows to address quality and automation and this is something which the development of this toolkit will directly address.

People are fundamental to quality, curated, open data publication workflows. However, by automating more aspects of the “publication pipeline”, we not only reduce the need for manual intervention, we also can increase the speed at which open data can be published.

To keep up to date on our progress, join the Frictionless Data Discuss forum, or ask the team a direct question on the gitter channel.

Harvard Library Innovation Lab: Overheard in LIL (the podcast!)

planet code4lib - Tue, 2017-10-31 00:00

Our first podcast, about what our podcast should be about, is out! Starring Adam Ziegler, Anastasia Aizman, Andy Silva, and Brett Johnson.

DuraSpace News: DSpace-CRIS and DSpace-GLAM: The Community is Expanding

planet code4lib - Tue, 2017-10-31 00:00

From Claudio Cortese, 4Science: There is growing interest in our open source solutions based on DSpace, with more and more institutions adopting DSpace-CRIS and DSpace-GLAM worldwide. Slides from a recent webinar entitled “DSpace-CRIS: How it Works and how to Leverage it” were presented by Susanna Mornati, COO, and Andrea Bollini, CTIO, of 4Science.

DuraSpace News: Open Repository Revamp: The Premium and Affordable DSpace Service

planet code4lib - Tue, 2017-10-31 00:00

From Bram Luyten, Atmire  As the most widely adopted repository platform, supported by a vast international community, DSpace is an open source marvel. However, using it to build and operate a world class repository service has not been a trivial task for many institutions.

Islandora: 7.x-1.10 Release POSTPONED to Nov 7

planet code4lib - Mon, 2017-10-30 18:56

Lovely Islandorans!

We are postponing the release by a week.

I’m afraid you’re going to have to wait until November 7th before getting the 7.x-1.10 release.

Why? We’re not done yet! The testing/documenting sheet is still showing a lot of red.

A note on Travis and PHP 5.3.3: If you’re looking at pull requests and Travis, you know we’ve had a problem with Travis for over a week. A change of theirs broke how the Travis machine loads, and it’s syntax that’s unfriendly to our dear beloved PHP 5.3.3. Jon Green filed a bug and there’s a pull request already. Unfortunately, that PR hasn’t been merged so it’s still failing.

Despite the warnings that “Checks have failed”, we are in the habit of merging our code anyway. But this does not mean that we have given up on compatibility with PHP 5.3.  So please don’t use any of the features introduced in PHP 5.4.

We need you!  If your name’s on the release spreadsheet and you haven’t finished your tasks, please hop on that!

New release date for 7.x-1.10: November 7, 2017.

District Dispatch: Release of Kennedy records is one small step for transparency

planet code4lib - Mon, 2017-10-30 15:17

On Oct. 26, the National Archives and Records Administration (NARA) publicly released thousands of records regarding the assassination of President John F. Kennedy. However, thousands of additional records were withheld from the public pending further review.

Photo by Wknight94

The President John F. Kennedy Assassination Records Collection Act required federal agencies to provide all records related to the assassination to NARA. The law further required all such records to be disclosed to the public within 25 years unless the President certifies it necessary to postpone disclosure. The law was signed on Oct. 26, 1992, which set the 25-year deadline on Oct. 26 of this year.

The legislation was a response to the decades-long delays that often occur before previously-classified information is made available to the public. As then-Sen. John Glenn (D-OH) noted at the time, “Although certain records related to the assassination of President Kennedy have been made available over time to the public, the legislation will create opportunities for the public to review records which might otherwise not be possible for several decades” (142 Cong. Rec. 19499, 1992).

The legislation has resulted in vast amounts of information being made available to the public. According to NARA, the collection “consists of approximately five million pages of records. The vast majority of the collection (88 percent) has been open in full and released to the public since the late 1990s.” ALA recognized the significance of this reform effort by presenting its James Madison Award to the bill’s sponsors and those implementing the law.

As the 25-year deadline approached, only a small percentage of information remained withheld. Perhaps, as the saying goes, nothing focuses the mind like a deadline: On Oct. 4, several members of Congress, led by Rep. Walter Jones (R-NC) and Sen. Chuck Grassley (R-IA), introduced resolutions calling for the remaining records to be disclosed. On Oct. 21, President Donald Trump tweeted that he would not prevent the disclosure of the remaining records “subject to the receipt of further information.”

Ultimately, though, the deadline would bring only partial disclosure. On Oct. 26, President Trump issued a memorandum, stating:

I am ordering today that the veil finally be lifted. At the same time, executive departments and agencies (agencies) have proposed to me that certain information should continue to be redacted because of national security, law enforcement, and foreign affairs concerns. … To further address these concerns, I am also ordering agencies to re-review each and every one of those redactions over the next 180 days. At the end of that period, I will order the public disclosure of any information that the agencies cannot demonstrate meets the statutory standard for continued postponement of disclosure…

It seems appropriate to recognize this partial release as partial progress for transparency. By the end of President Trump’s additional six-month review period, hopefully further records will be released. If librarians and the public continue to demand transparency, eventually, it may rewrite the history books.

The post Release of Kennedy records is one small step for transparency appeared first on District Dispatch.

Casey Bisson: Camera advice: a film camera for a novice

planet code4lib - Mon, 2017-10-30 13:55

A friend of mine sent me a question about a good film camera to get started with:

My partner has been thinking for some time about her first camera and she likes the idea of film photography. Her birthday is coming up and I’m thinking of buying a camera as a surprise gift to bring on an upcoming backpacking trip. It’s just a thought. We don’t buy each other a lot of stuff because we’re big on experiences, and we save our money so we can travel to see each other.

Anyway, as a man of the camera, is there a fun, entry level film camera that you know of? Obviously I have to double check that supplies for it are available internationally, so it couldn’t be too crazy, and it has to be something that would fit a very part time hobby for someone who doesn’t know cameras yet. Any ideas?

tl;dr: get an Olympus Stylus Epic, it takes 35mm film and sports auto exposure and focus with a built-in flash. However, avoid those with a zoom and go for the one with the fixed 35mm lens. Read more below if you’re still curious.

You know I’m going to totally geek out on this question, right?

You didn’t say it, but it seems the camera should be smaller if it’s going to fit into a backpack. That size constraint probably excludes some otherwise solid choices, including Fuji Instax, the newly re-invigorated Polaroid, and 120 cameras like the Fuji GW690II or Holga (or any of Fuji’s 120 rangefinders, but I did just get a GW690III, so I like that one). And both size and timing probably exclude the rather interesting Lomo’Instant Square.

It’s possible, however, that the smaller Instax Mini or Instax Square might be good choices. They might be small enough to fit your backpack, and instant film is a wonderful and low-risk way to try analog photography, but I have to admit I’ve been frustrated with scanning and sharing it in a way that works for me (Jean Andre Antoine and Cromwell Schubarth are among those who do it well).

Instead, you should probably be looking at 35mm and maybe 110 format cameras. And, honestly, I can’t recommend 110, since both film and processing are harder to find. In that space you might consider a few cameras. If you’re looking for an SLR (and you shouldn’t, honestly), I personally loved the Olympus OM system, but lenses are a bit overpriced compared to Nikon, Pentax, and Canon alternatives, any of which might be a reasonable choice. I recommend against an SLR because, well, if you’re going to invest that much space and complexity, I figure you should either get a 120 camera or go DSLR.

So what I really would recommend are 35mm compacts. I have been doing a lot of shooting with my Olympus Stylus Epic. I think all my photos with 35mm sprocket holes on Instagram are with that camera; the exceptions are my panoramics and half frame shots, and the 120 stuff has no sprocket holes. The Stylus Epic is compact, nicely priced, and has the most perfect exposure and autofocus system you’ll find in such a camera, but there are other choices. Some people prefer the rather more expensive Contax T2 or T3 cameras; others prefer all-manual compacts like the Canonet.

For me, there’s huge joy in getting a roll of film or stack of prints back long after a trip. You can appreciate instant film in the moment, but it also changes the moment. Sometimes it’s OK to change the moment and focus it on a photo, but handing an instant print around can also feel like people staring at their phones instead of engaging in the moment. A compact 35mm camera allows you to capture moments with minimal interruption, and then re-live them all over again when you get the film back. Garry Winogrand is said to have waited a year between shooting and developing, but I’m more likely to wait just a month or two. I carry a Stylus Epic with me at all times specifically so I can do that, and that’s why I’d recommend it to you as well.

Library of Congress: The Signal: October Innovator-in-Residence Update

planet code4lib - Mon, 2017-10-30 13:33

Library of Congress Innovator-in-Residence, Jer Thorp, has started diving into the collections at the Library. We’ve rounded up some of his activities in October and how he is sharing his process in this post.

Jer has created a “text-based exploration of Library of Congress @librarycongress‘ MARC records, specifically of ~9M books & the names of their authors.” He started by asking what would happen if you and he were to wander the Library of Congress stacks and collect every book from a given year. After piling them up, what if you selected 40 titles at random to represent each year, then gathered the first names of the authors? What might you see in those names across time and space?

Now you can explore this thought experiment with Jer as he takes authors’ first names from MARC records and remixes them in glitch.

Library of Names – an experiment extracting author first names from Library of Congress MARC records

You can find code for the front end and processing via this tweet from Jer.
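To get a feel for the extraction step, here is a rough Python sketch of the idea (my own illustration, not Jer’s actual code, and the function names are mine): MARC 100$a headings usually take the form “Last, First”, so a first name can be pulled from the second comma-separated part, and then a fixed number of names sampled per year.

```python
import random

def first_name(heading):
    """Pull a first name from a MARC 100$a heading like 'Twain, Mark, 1835-1910.'"""
    parts = heading.split(",")
    if len(parts) < 2:
        return None  # a single-part heading like 'Homer' has no separate first name
    name = parts[1].strip().rstrip(".")
    return name or None

def sample_first_names(headings_by_year, per_year=40, seed=1):
    """For each year, pick up to `per_year` headings at random and keep their first names."""
    rng = random.Random(seed)
    sampled = {}
    for year, headings in headings_by_year.items():
        picks = rng.sample(headings, min(per_year, len(headings)))
        sampled[year] = [n for n in (first_name(h) for h in picks) if n]
    return sampled
```

Real MARC headings are messier than this (dates, titles, corporate names), so a production version would need more cleanup, but the shape of the experiment is the same: names in, a random sample per year out.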

Jer is sharing his work in several ways. First, he’s documenting his research and thoughts via Open Science Framework. You can dig into his wiki, activity, and tags on his Library of Congress Residency 2017/2018.

He has also created a Github repository of his code, data, and miscellanea related to his residency. You can comment and share your ideas with him there and via Twitter.

Earlier this month, we shared our experience touring Library of Congress divisions with Jer. Take a look at the collections we explored via this Twitter Moment. We visited with curators, reference librarians, and archives specialists from Manuscripts, Geography & Maps, Rare Books, Prints & Photographs, American Folklife Center, and Web Archiving.

@LC_Labs Tweet from 06 October during American Folklife Center tour


Want to create something new with Library of Congress collections data? Download a cleaned MARC records data set for yourself from our LC for Robots page from the MARC Open-Access section. And if turning data into a thought experiment is your game, you might consider showcasing your skills in the Congressional Data Challenge (details here).

Open Knowledge Foundation: Sensitizing Nepal’s digital natives: Open Data Awareness Program announced

planet code4lib - Mon, 2017-10-30 09:07

To support Nepal’s growing open data movement and increase its network of data-savvy practitioners, Open Knowledge Nepal has announced the Open Data Awareness Program, which aims to sensitize more than 300 students and youth from seven districts.

The Open Data Awareness Program aims to raise awareness of the concept and usage of open data among Nepal’s digital natives: the current youth population and the potential future decision-makers and leaders of Nepal. As part of the program, hands-on training sessions are being organized at different colleges and youth organizations, where participants will be provided with a compiled Open Data Manual to aid their understanding of open data. After the conclusion of all the workshops, a hackathon event will be organized in Kathmandu, bringing together selected participants from the colleges and youth organizations where the training sessions were held to work collaboratively on opening up data currently in a closed format. At the hackathon, the participants will share their experiences of learning and working with open data and discuss ways to engage more young people in the open data movement in Nepal. The awareness program is also an opportunity for increased civic awareness, engagement and participatory governance.

The program hopes to strengthen the open data momentum in Nepal, especially among the youth population. Through the sessions, young people’s understanding of open data, and the chances of their joining the open data ecosystem, will increase, helping groom Nepal’s future leaders and decision-makers to be data-driven. The ongoing work and data-driven initiatives conducted by different organizations will also reach more people through this program. As a result, a significant number of people will become aware of the current data revolution in Nepal. More importantly, the program will improve the chances of more open data startups and initiatives emerging in the near future. Likewise, increased capacity and awareness of open data in Nepal’s present and future generations will also increase the potential for open-data-informed development decisions and the accountability of responsible bodies.

The objective of the awareness program is to make the youth of Nepal more aware of the benefits of open data, to fill the data literacy gap and to better prepare young people for a rapidly changing data landscape. Through the program, university students and youth will be empowered to:

  • Use open data for research and new projects
  • Conduct data analysis and reporting
  • Use new data tools and programming languages
  • Build innovative solutions to tackle development challenges

Throughout the workshops, participants will discuss:

  • What open data is and why it matters
  • The background and history of open data
  • Open data sources and stories
  • Technical processes to extract, analyze, clean and visualize the available data in Nepal

The awareness program will be based on the Open Data Curriculum and the Open Data Manual, both prepared by Open Knowledge Nepal as a reference and recommended guide for university students, civil society, and the private sector.

This project is supported by the Data for Development Programme, implemented by The Asia Foundation in partnership with Development Initiatives, with funding from the UK Department for International Development to improve the sharing and use of data as evidence for development in Nepal.

For more project updates regarding workshops, hackathon and resources, please visit the project webpage:

Eric Hellman: Turning the page on ereader pagination

planet code4lib - Mon, 2017-10-30 00:50
Why bother paginating an ebook? Modern websites encourage you to "keep on swiping" but if you talk to people who read ebooks, they rather like pages. I'll classify their reasons into "backward looking" and "practical".

Backward looking reasons that readers like pagination
  • pages evoke the experience of print books
  • a tap to turn a page is easier than swiping
Practical reasons that readers like pagination
  • pages divide reading into easier to deal with chunks
  • turning the page gives you a feeling of achievement
  • the thickness of the turned pages helps the reader measure progress
Reasons that pagination sucks
  • sentences are chopped in half
  • paragraphs are chopped in half
  • figures and such are sundered from their context
  • footnotes are ... OMG footnotes!
How would you design a long-form reading experience for computer screens if you weren't tied to pagination? Despite the entrenchment of Amazon and iPhones, people haven't stopped taking fresh looks at the reading experience.

Taeyoon Choi and his collaborators at the School for Poetic Computation recently unveiled their "artistic intervention" into the experience of reading. (Choi and a partner founded the Manhattan-based school in 2013 to help artists learn and apply technology.) You can try it out at

On viewing the first chapter, you immediately see two visual cues that some artistry is afoot. On the right side, you see something that looks like a stack of pages. On the left is some conventional-looking text, and to its right is some shrunken text. Click on the shrunken text to expand references for the now-shrunken main text. This conception of long-form text as existing in two streams seems much more elegant than the usual pop-up presentation of references and footnotes in ebook readers. Illustrations appear in both streams, and when you swipe one stream up or down, the other stream moves with it.

The experience of the poetic computation reader on a smartphone adapts to the smaller screen. One or other of the two streams is always off-screen, and little arrows, rather than shrunken images, indicate the other's existence.

 * * *
On larger screens, something very odd happens when you swipe down a bit. You get to the end of the "page". And then it starts moving the WRONG way, sideways instead of up and down. Keep swiping, and you've advanced the page! The first time this happened, I found it really annoying. But then, it started to make sense. "Pages" in the Poetic Computation Reader are intentional, not random breaks imposed by the size of the reader's screen and the selected typeface. The reader gets a sense of achievement, along with an indication of progress.

In retrospect, this is a completely obvious thing to do. In fact, authors have been inserting intentional breaks into books since forever. Typesetters call these breaks "asterisms" after the asterisks that are used to denote them. They look rather stupid in conventional ebooks. Turning asterisms into text-breaking animations is a really good idea. Go forth and implement them, ye ebook-folx!

On a smart phone, Poetic Computation Reader ignores the "page breaks" and omits the page edges. Perhaps a zoom animation and a thickened border would work.

Also, check out the super-slider on the right edge. Try to resist sliding it up and down a couple of times. You can't!

 * * *
Another interesting take on the reading experience is provided by Slate, the documentation software written by Robert Lord. On a desktop browser, Slate also presents text in parallel streams. The center stream can be thought of as the main text. On the left is the hierarchical outline (i.e. a table of contents), on the right is example code. I like the way you can scroll either the outline or the text stream and the other stream follows. The outline expands and contracts accordion-style as you scroll, resulting in effortless navigation. But Slate uses a responsive design framework, so on a smartphone, the side streams reconfigure into inline figures or slide-aways.

"Clojure by Example", generated by Slate.
There are no "pages" in Slate. Instead, the animated outline is always aware of where you are and indicates your progress. The outline is a small improvement on the static outline produced by documentation generators like Sphinx, but the difference in navigability and usability is huge.

As standardization and corporate hegemony seem to be ossifying digital reading experiences elsewhere, independent experiments and projects like these give me hope that a next generation of ebooks will put some new wind in the sails of our digital reading journey.

  1. The collaborators on the Poetic Computation Reader include Molly Kleiman, Shannon Mattern, Taeyoon Choi and HAWRAF. Also, these footnotes are awkward.

Peter Sefton: How to save a PowerPoint or Impress presentation as a PDF with slide notes from the command line using the descendants of StarOffice; OpenOffice, LibreOffice, soffice

planet code4lib - Sun, 2017-10-29 23:00

tl;dr: The magic incantation you need is:

soffice --headless --convert-to pdf:"impress_pdf_Export:ExportNotesPages=True" 'My presentation.pptx'

Where ‘My presentation.pptx’ is the path to, you know, your presentation. It can be in Powerpoint or OpenDocument presentation format (.odp).

It’s possible to turn presentations into PDF and many other formats using the command line, but most of the flags you need to change the behaviour of the various converters are completely undocumented at both OpenOffice and the forked project LibreOffice. I had a very hard time figuring out how to save a presentation as PDF with slide notes, hence this post to help others.

NOTE ALSO: the result of saving a presentation as PDF with slide notes in StarOffice derivatives is a bit odd - you get twice as many pages of PDF as you have slides. For a presentation with p slides, the first p pages are the slides without notes and the second p pages have both the slide and its notes (unless the notes are too long, in which case they are cut off). This doesn’t bother me; I don’t actually want the PDF as such. I am working on a script to turn the PDF into a series of images, extract the slide and notes text, and wrap them in a markdown document so I can publish presentations here. I’ll put that up on github soon. If anyone knows other flags with better behaviour, then please let me know. A complete list of conversion flags would be great, like for example how to trigger all the HTML export options. Ta.
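Since I want to drive this from a script anyway, the incantation can be wrapped in a little Python. This is just a sketch (the helper names are mine, and it assumes soffice is on your PATH):

```python
import subprocess

def notes_pdf_cmd(presentation, outdir="."):
    """Build the soffice argv for exporting slides plus notes pages to PDF."""
    return [
        "soffice", "--headless",
        # the quoting in the shell version just protects this argument;
        # as a list element it is passed through as-is
        "--convert-to", "pdf:impress_pdf_Export:ExportNotesPages=True",
        "--outdir", outdir,
        presentation,
    ]

def export_notes_pdf(presentation, outdir="."):
    """Run the conversion; raises CalledProcessError if soffice fails."""
    subprocess.run(notes_pdf_cmd(presentation, outdir), check=True)
```

Building the argument list separately from running it also makes the command easy to print or test before letting LibreOffice loose on a directory of presentations.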

The key, hard to find part of this is the flag: “impress_pdf_Export:ExportNotesPages=True”.

NOTE: on some platforms the binary might be libreoffice or some other variant rather than soffice.

To get set up OS X so you can just type soffice
  • Install LibreOffice
  • Type this in a terminal: sudo ln -s /Applications/ /usr/local/bin/soffice

There’s a handy tool called unoconv (installable with brew install unoconv) that wraps the soffice binary and provides an abstracted interface with a few extra features over typing soffice, but at the moment it doesn’t work on OS X with recent versions of LibreOffice.

Hugh Rundle: How I ended up learning to code

planet code4lib - Sun, 2017-10-29 19:18

This is 'Part 2' of my October GLAM Blog Club posts.

I've been asked by a few librarians about my software programming, how I learned, and what I worked on first. I'm an extremely mediocre dilettante coder. I'm pretty comfortable with that. I'd definitely like to learn more, and I'm open to a future job/career in software development, but at this point I'm pretty happy with what I'm doing at work and where I've gotten to with coding. Nevertheless, I'm sometimes asked how I got to where I am when it comes to coding, and given that it was a much longer and more winding road than it needed to be, I thought it might be useful to explain, so you can avoid my mistakes.

Eco Warrior

My father was a secondary school teacher, and our neighbour was an electronics enthusiast. As part of some professional development program, Dad procured a BBC Micro PC in the late 1980s. I loved it. I taught myself BASIC and started writing text-based adventure games. In Grade 6 my primary school friend Christopher and I made a game called Eco Warrior and sold it to our high school for $2.00. I can't remember the specifics of the game, or why anyone agreed to pay us anything for it, but that was the last program I wrote for nearly 25 years.

While studying for my Information Management degree we were taught a little HTML, but it was when I became a Systems Librarian in 2011 that I started to get interested again in code and how software was written rather than just what it did. I sweated over XSLT files and played with the CSS in our library catalogue. I attended training on Crystal Reports and SQL queries but found the way it was explained simply left me more confused. Over the next couple of years I kept thinking I really should learn to code, but I wasn't sure how, or where to start, and it all seemed very complicated. I did the first couple of activities in a bunch of different free online coding tutorials: Python, Ruby, JavaScript, even Go. It was incredibly frustrating and confusing. Which languages should I learn? In what order? To what extent? It all seemed impossible, yet so necessary. In my more melancholy moments I thought I really needed to create a backup career plan as I saw UK libraries decimated by a simple series of government decisions.

Ask Why

The breakthrough came when I frustratedly 'asked Twitter' which language would be best to start with if I wanted to code for libraries. Several people gave the same answer, which initially seemed frustratingly unhelpful, but turned out to be exactly right: don't choose a language, choose a project. Then learn whatever language that project is written in. It was frustratingly unhelpful because I wasn't really sure I wanted to work on any existing library software project: most of the things I knew about seemed to be very academic-focussed or just mindbendingly confusing. Telling me to find a project just moved the problem from 'which language' to 'which project' - but this advice turned out to be really useful in an unexpected way. What had been holding me back ultimately was that I didn't have a reason to learn to code. Having a goal of 'learn to code' is a bit like having a goal to 'read more books' - it's not helpful if you don't know why you want to do it. The key to learning to code is to have a particular project you want to work on. It could be an existing project that you want to contribute to, like Koha ILS, Blacklight, or Hoodie. But it could just as easily be a really small personal project - Misty De Méo told us at newCardigan's August 2016 Cardi Party that her first projects were to automate some particularly boring archiving tasks. In my case, it was an absurdly over-ambitious thing. I had an idea for a 'zero knowledge' library management system that facilitates library circulation and further reading suggestions whilst protecting reader privacy with client-side encryption. Making a prototype was my stretch goal. It was probably too big a project, but now I had a goal in mind, which was the key.

The second golden rule that most people gave me when asked was that whatever I learned first, it should not be JavaScript. I took on board the first rule - pick a project - but I rejected the second. Partially this was because I was lazy and figured I was probably going to make web apps, and that would mean using JavaScript, so I may as well learn it. And I quickly discovered nodejs, which seemed to promise that I could learn one language and use it for both server and client code. As I've learned more, I've come to grudgingly think perhaps the advice was right, but I also surprise myself sometimes with how comfortable I've become with JavaScript (acknowledging, of course, that I'm still not particularly good at it.) The 'pick a project' advice also conflicted with the 'no JavaScript' rule. As I looked around for possibilities, I found Mylar, designed to run on Meteor - a node/JavaScript framework. It seemed to be the perfect technology for my encrypted circulation system. I was sold, and began learning JavaScript, nodejs, and Meteor almost simultaneously. It was everyone's worst nightmare, and yet somehow, it's mostly worked. Now that I've learned more, I somewhat understand why I was advised not to learn JavaScript first. It does have some weird patterns and has probably created some bad habits. But on the other hand, JavaScript is experiencing something of a revival with the emergence of nodejs and the release of ES6, ES7 and ES8, which have introduced some really useful improvements to the language. If I was starting again, I'd probably begin with Python rather than JavaScript, but the original advice stands: choose a project first.

Do the Reading

Something that had tripped me up early was the tendency for online coding tutorials to use mathematical examples, assume a level of familiarity with maths concepts, and focus on telling you to do something rather than explaining how it works. It wasn't that I couldn't work it out, but these examples had no context and weren't particularly interesting for people who aren't maths geeks. I did pretty well in school because generally speaking our schooling system is designed for people who are good at book learning, rather than learning by doing. When trying to learn via online coding tutorials, I found the tables had turned, and I really struggled because I wanted to know why typing that particular command had that particular result. These tutorials also reinforce the erroneous idea that one has to understand high level mathematics to be an effective software developer. My friend and fellow librarian Katie Haden helped me out enormously by recommending Jon Duckett's web programming books, HTML & CSS and JavaScript & jQuery (you can buy them as a pair). These books use simple examples and a wonderful design that makes them easy to understand - they both also have really good indexes, which is very helpful - thanks indexers, your work is often under-appreciated! Duckett's books don't go into great detail, but they are fantastic for getting to the point where you're ready for the next resource: whether it's the hugely helpful Mozilla Developer Network documents, Ethan Brown's Learning JavaScript - which was particularly helpful for understanding ES6 better - or even Douglas Crockford's JavaScript: the good parts - though that one is really for once you've had a bit more experience. I've also found both the nodejs docs and the Meteor Guide really useful for nodejs and Meteor respectively. No matter what language or project you're using, Stack Overflow will soon become your friend. Thousands of developers have provided great advice there that can save you hours. Thousands have also provided terrible advice there, but usually it's clear which is which.


Once I'd chosen a goal (building what became Tinfoil) I had a language - JavaScript - and a framework - Meteor. This actually complicated things, because I worked out how to do things in Meteor by reading the Guide and documentation, but I didn't really know enough JavaScript to understand what I was doing or why things worked. I got things done but later I had to re-learn some concepts - even things as fundamental as callbacks. If I had my time again I'd definitely try to learn as much 'vanilla' JavaScript as I could before using any frameworks like Meteor. Learning JavaScript primarily as a server-side language by using nodejs, on the other hand, actually really helped. It meant I could start by making small stand-alone scripts, and gradually build more functionality in as I learned more. Being able to build something from scratch in a weekend gave me a sense of achievement, and being able to join these mini projects together like 'modules' gave me a sense of progress. I set myself a challenge to build twelve twitter bots in twelve months and whilst I didn't quite make it to twelve, working on discrete projects like this really helped me to keep learning. And Twitter bots can be really fun, which is absolutely crucial if you're going to learn (and why I still haven't learned to do SQL queries: I can't find a way to make it not seem tedious, though I know it's something I really should know how to do). Building functionality in separate modules also really helps when troubleshooting problems. The Aus GLAM Blogs app is basically a combination of web app for discovery and search, RSS reader to ingest new posts, and Twitter bot to tweet when new posts are found. These three main functions are mostly independent of each other, and I've simply stitched them together. When I added Pocket integration, this architecture made it quite easy to add the function without having to change much existing code and risk breaking something.

So, that's a kind of rambling explanation of how I got here with coding. I'm still learning, and there are some things I definitely need to know more about - automated testing, for example. I think all information workers should learn a bit of coding - not because we should all become software developers, but because understanding the basics of why software works in particular ways is empowering. If you're wanting to learn a bit of code, I thoroughly recommend Jon Duckett's books, and really any of the O'Reilly introductory books.

John Miedema: Play with Phaedrus, Version 2 of my Chatbot. Try it.

planet code4lib - Sat, 2017-10-28 15:11


My friend, “Kerrumba,” said that my chatbot needed a name. He suggested “Phaedrus.” Kerrumba groks this. Maybe it also needs a new picture?

I recently posted the “Hello World” version of my chatbot. It did not do much. The new version is much more functional. Keep in mind that a chatbot is not a full-blown artificial intelligence. It is designed to facilitate a chat-like dialog about a specific range of subjects. Phaedrus can do the following:

  • Present some basic options in button format. E.g., an About button can be clicked to learn more about Phaedrus.
  • Handle chat text for the same, e.g., enter “Tell me about yourself”
  • Show a gallery of some After Reading images
  • Show samples of my After Reading essays
  • Handle subscription requests
  • Send an email directly to the real me
  • Fun. Basic language capabilities. It can recognize and reply to compliments and bad language.
  • It is available on my website — see lower right icon; and on my Facebook page — click the Get Started button. Or click here.
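For the curious, the shallow intent matching that powers features like these can be sketched in a few lines of Python. This is purely illustrative (not Phaedrus’s actual code, and the rules are invented):

```python
def reply(message):
    """Toy intent matcher: route a chat message to a canned response."""
    text = message.lower()
    if "about" in text:
        return "I'm Phaedrus, a chatbot for the After Reading project."
    if any(word in text for word in ("thanks", "great", "love")):
        return "Thank you! Glad you like it."
    if any(word in text for word in ("stupid", "damn")):
        return "Let's keep things friendly."
    # fall back to a help prompt when no intent matches
    return "Sorry, I don't understand that yet. Try the About button."
```

Real chatbot platforms layer trained language models on top of rules like these, but the basic shape, match an intent and pick a reply, is the same.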

The next step will be to incorporate more sophisticated language handling. Please try it out and let me know how you like it. Thanks.

District Dispatch: Apply for the ALA Policy Corps by November 3

planet code4lib - Fri, 2017-10-27 21:53

This is a guest post by Christine Lind Hage, director of Rochester Hills (MI) Public Library and a member of the ALA Policy Corps Working Group. She is a past president of the Public Library Association and of United for Libraries. She served on ALA Council for 10 years and is currently a member of the Office for Information Technology Policy (OITP) Advisory Committee.

As technology and innovation develop and as the political landscape shifts, we need experts who can clearly articulate the contributions, value and concerns of America’s libraries to policymakers at every level. ALA needs professional librarians who not only understand the impact of federal legislation on libraries and the millions of people they serve, but also who have the depth of knowledge it takes to help shape policies on the front end.

Maine State Librarian Linda Lord at a congressional hearing

To expand ALA’s ability to advocate on key policy issues, ALA President Jim Neal and ALA’s Office for Information Technology Policy are working with library leaders from AASL, ACRL, PLA and UFL to create the ALA Policy Corps. Announced earlier this month, the ALA Policy Corps will consist of 10 to 12 advocates who will receive training and opportunities to participate in targeted policy advocacy work primarily at the national level.

Jim Neal himself has modeled our vision for the ALA Policy Corps. He has represented the American library community numerous times in testimony on copyright matters before congressional committees, in the U.S. Copyright Office Section 108 Study Group (2005-2008) and as an advisor to the U.S. delegation at World Intellectual Property Organization (WIPO) meetings on copyright.

ALA member and retired Maine State Librarian Linda Lord is another example of a librarian who has used her expertise and in-depth understanding to help shape federal programs. She has been an effective voice for libraries regarding the E-rate program. Linda’s real-life experience in rural and small libraries was instrumental in getting the E-rate process streamlined and critical additional funding secured. Linda’s advocacy included testifying before the Senate Committee on Commerce, Science and Transportation on the importance of E-rate for libraries that serve rural America. Upon her retirement, Senator Angus King (ME) said, “Her leadership at the national level on the E-rate program and other issues has been a huge benefit to Maine.”

Just this month four librarians testified at a hearing of the House Administration Committee about the Federal Depository Library Program (FDLP). They highlighted the enduring value of the FDLP in ensuring that Americans can access the documents of their government, not just today but in the future. Their real-life experience lent credibility both to them and to librarianship as a profession.

We do not expect all Policy Corps applicants to be seasoned experts at giving congressional testimony. If you have past advocacy experience at any level, a commitment to keeping current on policy-related library issues, and the desire to develop your policy advocacy skills and apply them over a period of years – you should apply to be part of the ALA Policy Corps.

Whether it is the importance of copyright, E-rate, FDLP or federal library funding (to name only a few issues!), it is essential that library professionals share our experiences and provide our expertise to elected leaders and other decision-makers. You can learn more about the ALA Policy Corps and apply at our website. The deadline for submitting applications is Friday, November 3.


The post Apply for the ALA Policy Corps by November 3 appeared first on District Dispatch.

District Dispatch: Music copyright: what do students know? what do we do about it?

planet code4lib - Fri, 2017-10-27 20:00

Join us for CopyTalk on November 2nd with Kathleen DeLaurenti, Open Access Editor for the Music Library Association and Head Librarian of the Arthur Friedheim Library at the Peabody Institute of the Johns Hopkins University. DeLaurenti will give an overview of a research project to identify how college-aged students perceive music copyright and the process of figuring out what to do about it.

Photo credit: trophygeek

DeLaurenti will share the findings of a phenomenological research study of undergraduate students aimed at uncovering their understanding of music copyright. The study provided students an opportunity to share how they understand the phenomenon of music copyright in their lives and how it influences their interactions in the digital and analog world.

While assumptions of ongoing rampant piracy remain the cultural norm, student participants in the study expressed complicated feelings about copyright and a lot of confusion about how to act ethically and legally in an environment of frictionless access to music. The talk will also discuss the follow-up work of a team of librarians and students to co-create open educational resources to introduce high school and college-aged students to the basic concepts of copyright. This discussion should be very revealing!

Tune in on Thursday, November 2 at 2 p.m. Eastern/11 a.m. Pacific time. Go to and sign in as a guest. You’re in.

CopyTalks are FREE and brought to you by OITP’s copyright education subcommittee. Archived webinars can be found here.

The post Music copyright: what do students know? what do we do about it? appeared first on District Dispatch.

