Feed aggregator

District Dispatch: Libraries across the U.S. are Ready to Code

planet code4lib - Fri, 2017-06-23 19:09

This post was originally published on Google’s blog The Keyword.

“It always amazes me how interested both parents and kids are in coding, and how excited they become when they learn they can create media on their own–all by using code.” – Emily Zorea, Youth Services Librarian, Brewer Public Library

Emily Zorea is not a computer scientist. She’s a Youth Services Librarian at the Brewer Public Library in Richland Center, Wisconsin, but when she noticed that local students were showing an interest in computer science (CS), she started a coding program at the library. Though she didn’t have a CS background, she understood that coding, collaboration and creativity were critical skills for students to approach complex problems and improve the world around them. Because of Emily’s work, the Brewer Public Library is now Ready to Code. At the American Library Association, we want to give librarians like Emily the opportunity to teach these skills, which is why we are thrilled to partner with Google on the next phase of the Libraries Ready to Code initiative — a $500,000 sponsorship from Google to develop a coding toolkit and make critical skills more accessible for students across 120,000 libraries in the U.S.

Libraries will receive funding, consulting expertise, and operational support from Google to pilot a CS education toolkit that equips any librarian with the ability to implement a CS education program for kids. The resources aren’t meant to transform librarians into expert programmers but will support them with the knowledge and skills to do what they do best: empower youth to learn, create, problem solve, and develop the confidence and future skills to succeed in their future careers.

For libraries, by libraries
Librarians and staff know what works best for their communities, so we will rely on them to help us develop the toolkit. This summer a cohort of libraries will receive coding resources, like CS First, a free video-based coding club that doesn’t require CS knowledge, to help them facilitate CS programs. Then we’ll gather feedback from the cohort so that we can build a toolkit that is useful and informative for other libraries who want to be Ready to Code. The cohort will also establish a community of schools and libraries who value coding, and will use their knowledge and expertise to help that community.

Critical thinking skills for the future
Though not every student who studies code will become an engineer, critical thinking skills are essential in all career paths. That is why Libraries Ready to Code also emphasizes computational thinking: a basic set of problem-solving skills that, beyond code itself, sits at the heart of connecting libraries' mission of fostering critical thinking with computer science.

“Ready to Code means having the resources available so that if someone is interested in coding or wants to explore it further they are able to. Knowing where to point youth can allow them to begin enjoying and exploring coding on their own.”- Jason Gonzales, technology specialist, Muskogee Public Library

Many of our library educators, like Jason Gonzales, a technology specialist at the Muskogee Public Library, already have exemplary programs that combine computer science and computational thinking. His community is located about 50 miles outside of Tulsa, Oklahoma, so the need for new programming was crucial, given that most youth are not able to travel to the city to pursue their interests. When students expressed an overwhelming interest in video game design, he knew what the focus of a new summer coding camp would be. Long-term, he hopes students will learn more digital literacy skills so they are comfortable interacting with technology and applying it to other challenges now and in the future.

From left to right: Jessie ‘Chuy’ Chavez of Google, Inc. with Marijke Visser and Alan Inouye of ALA’s OITP at the Google Chicago office.

When the American Library Association and Google announced the Libraries Ready to Code initiative last year, it began as an effort to learn about CS activities, like the ones that Emily and Jason led. We then expanded to work with university faculty at Library and Information Science (LIS) schools to integrate CS content into their tech and media courses. Our next challenge is scaling these successes to all our libraries, which is where our partnership with Google, and the development of a toolkit, becomes even more important. Keep an eye out in July for a call for libraries to participate in developing the toolkit. We hope it will empower any library, regardless of geography, expertise, or affluence, to provide access to CS education and, ultimately, skills that will make students successful in the future.

The post Libraries across the U.S. are Ready to Code appeared first on District Dispatch.

LITA: Apply to be the next ITAL Editor

planet code4lib - Fri, 2017-06-23 17:01

Applications and nominations are invited for the position of editor of Information Technology and Libraries (ITAL), the flagship publication of the Library and Information Technology Association (LITA).

LITA seeks an innovative, experienced editor to lead its top-tier, open access journal with an eye to the future of library technology and scholarly publishing. The editor is appointed for a three-year term, which may be renewed for an additional three years. Duties include:

  • Chairing the ITAL Editorial Board
  • Managing the review and publication process:
    • Soliciting submissions and serving as the primary point of contact for authors
    • Assigning manuscripts for review, managing review process, accepting papers for publication
    • Compiling accepted and invited articles into quarterly issues
  • Liaising with service providers including the journal publishing platform and indexing services
  • Marketing and promoting the journal
  • Participating as a member of and reporting to the LITA Publications Committee

Some funding for editorial assistance, plus a $1,500/year stipend, is provided.

Please express your interest or nominate another person for the position using this online form:

Applications and nominations received by July 21 will receive first consideration. Applicants and nominees will be contacted by the search committee, and an appointment will be made by the LITA Board of Directors upon the recommendation of the search committee and the LITA Publications Committee. Applicants must be members of ALA and LITA at the time of appointment.

Contact with any questions.

Information Technology and Libraries (ISSN 2163-5226) publishes material related to all aspects of information technology in all types of libraries. Topic areas include, but are not limited to, library automation, digital libraries, metadata, identity management, distributed systems and networks, computer security, intellectual property rights, technical standards, geographic information systems, desktop applications, information discovery tools, web-scale library services, cloud computing, digital preservation, data curation, virtualization, search-engine optimization, emerging technologies, social networking, open data, the semantic web, mobile services and applications, usability, universal access to technology, library consortia, vendor relations, and digital humanities.

HangingTogether: Seeking a Few Brass Tacks: Measuring the Value of Resource Sharing

planet code4lib - Fri, 2017-06-23 04:19

At the two most recent American Library Association conferences, I’ve met with a small ad hoc group of librarians to discuss how we might measure and demonstrate the value that sharing our collections delivers to various stakeholders: researchers, library administrators, parent organizations, service/content providers.

First we described our current collection sharing environment and how it is changing (Orlando, June 2016).

Then we walked through various ways in which, using data, we might effectively measure and demonstrate the value of interlending – and how some in our community are already doing it (Atlanta, January 2017).

Our next logical step will be to settle on some concrete actions we can take – each by ourselves, or working among the group, or collaborating with others outside the group – to begin to measure and demonstrate that value in ways that tell a meaningful and compelling story.

As the group prepares to meet for a third time – at ALA Annual in Chicago this weekend – I thought it might be useful to share our sense of what some of these actions might eventually look like, and what group members have been saying about the possibilities during our conversations.

“Value of Resource Sharing” discussion topics: Round III

We demonstrate value best by documenting the value we deliver to our patrons.

o “One could fruitfully explore how what patrons value (speed, convenience, efficiency, ease) determines whether resource sharing is ultimately perceived as valuable.”
o “Rather than focusing on systems and exploring the life cycle of the request, we should look at that of the learner.”
o “We need to support our value not just with numbers, which are important, but with human examples of how we make a difference with researchers.”
o “We are now sharing this [citation study] work with our faculty and learning a lot, such as their choice not to use the best, but most accessible material.”
o “Did they value what we provided, and, if so, why?”
o “We know that resource sharing supports research, course completion, and publishing, but it is usually a one-way street: we provide information on demand but don’t see the final result, the contribution of that material to the final product.”
o “We need to collect and tell the stories of how the material we obtain for our users transforms their studies or allows them to succeed as researchers.”
o “I think we need to explore how we can make the process smoother for both the patrons and library staff. We talk about the cost of resource sharing a lot but we haven’t really talked about how it could be easier or how policies get in the way or how our processes are so costly because they make so much busy work.”

Common methods of measuring and demonstrating value include determining how much it costs a library to provide a service, or how much a library service would cost if the patron had to purchase it from a commercial provider.

o Q: “How much did you spend on textbooks?”  A: “None! ILL!”
o “Why not measure that expense [of providing access to academic databases to students]?”
o “Build an equation to calculate the costs of various forms of access: shelve/retrieve on campus, shelve/retrieve remotely, etc.”
o “Paul Courant did a study of what it cost to keep a book on the shelf on campus as opposed to in offsite storage….Are the numbers in the Courant study still right?”

Collections have long been a way for libraries to demonstrate value – by counting them and publicizing their size. The number of volumes to which a library has access via consortia is becoming a more useful metric. Collections can have different values for an organization, depending upon context: where they are housed, how quickly they can be provided to users, and who wants access to them.

o “How can access to legacy print in high density storage be monetized? Perhaps a change in mindset is in order – to lower costs for institutions committed to perpetual preservation and access, and raise costs for institutions that do not.”
o “What would be the cost to retain a last copy in a secure climate controlled environment? Would we then be counting on ARLs to do the work of preserving our cultural heritage? We already know there are unique materials not held by ARLs, so how do the pieces fit together? How do we incorporate public libraries which also have many unique materials in their collections? How do we equitably share the resources and costs?”
o “We rely on redundancy…65% of…requests are for things…already owned.”

We can demonstrate value by providing new services to patrons that make their experience more like AmaZoogle.

o “How do we create delivery predictability models like everyone in e-commerce already offers? Are we just afraid to predict because we don’t want to be wrong? Or do we really not know enough to offer delivery information to users?”
o “I’m interested in focusing on the learning moments available throughout the resource sharing workflows and integrating stronger information literacy into the users’ experience…’We’ve begun processing your request for a dissertation…Did you know your library provides access to these peer reviewed journal articles that you might find helpful?’ or ‘You can expect this article to hit your inbox within 24 hours – are you ready to evaluate and cite it? You might find these research guides helpful…'”

What ideas do you have for measuring the value of sharing collections?  We’d love to hear from you about this.  Please leave us a comment below.

I’ll report out about takeaways from the group’s third meeting soon after ALA.

Ed Summers: Implications/Questions

planet code4lib - Fri, 2017-06-23 04:00

… we are concerned with the argument, implicit if not explicit in many discussions about the pitfalls of interdisciplinary investigation, that one primary measure of the strength of social or cultural investigation is the breadth of implications for design that result (Dourish, 2006). While we have both been involved in ethnographic work carried out for this explicit purpose, and continue to do so, we nonetheless feel that this is far from the only, or even the most significant, way for technological and social research practice to be combined. Just as from our perspective technological artifacts are not purely considered as “things you might want to use,” from their investigation we can learn more than simply “what kinds of things people want to use.” Instead, perhaps, we look to some of the questions that have preoccupied us throughout the book: Who do people want to be? What do they think they are doing? How do they think of themselves and others? Why do they do what they do? What does technology do for them? Why, when, and how are those things important? And what roles do and might technologies play in the social and cultural worlds in which they are embedded?

These investigations do not primarily supply ubicomp practitioners with system requirements, design guidelines, or road maps for future development. What they might provide instead are insights into the design process itself; a broader view of what digital technologies might do; an appreciation for the relevance of social, cultural, economic, historical, and political contexts as well as institutions for the fabric of everyday technological reality; a new set of conceptual resources to bring to bear within the design process; and a new set of questions to ask when thinking about technology and practice.

Dourish & Bell (2011), p. 191-192

I’m very grateful to Jess Ogden for pointing me at this book by Dourish and Bell when I was recently bemoaning the fact that I struggled to find any concrete implications for design in Summers & Punzalan (2017).


Dourish, P. (2006). Implications for design. In Proceedings of the sigchi conference on human factors in computing systems (pp. 541–550). ACM. Retrieved from

Dourish, P., & Bell, G. (2011). Divining a digital future: Mess and mythology in ubiquitous computing. MIT Press.

Summers, E., & Punzalan, R. (2017). Bots, seeds and people: Web archives as infrastructure. In Proceedings of the 2017 acm conference on computer supported cooperative work and social computing (pp. 821–834). New York, NY, USA: ACM.

Dan Cohen: Irrationality and Human-Computer Interaction

planet code4lib - Thu, 2017-06-22 20:07

When the New York Times let it be known that their election-night meter—that dial displaying the real-time odds of a Democratic or Republican win—would return for Georgia’s 6th congressional district runoff after its notorious November 2016 debut, you could almost hear a million stiff drinks being poured. Enabled by the live streaming of precinct-by-precinct election data, the dial twitches left and right, pauses, and then spasms into another movement. It’s a jittery addition to our news landscape and the source of countless nightmares, at least for Democrats.

We want to look away, and yet we stare at the meter for hours, hoping, praying. So much so that, perhaps late at night, we might even believe that our intensity and our increasingly firm grip on our iPhones might affect the outcome, ever so slightly.

Which is silly, right?

*          *          *

Thirty years ago I opened a bluish-gray metal door and entered a strange laboratory that no longer exists. Inside was a tattered fabric couch, which faced what can only be described as the biggest pachinko machine you’ve ever seen, as large as a giant flat-screen TV. Behind a transparent Plexiglas front was an array of wooden pegs. At the top were hundreds of black rubber balls, held back by a central gate. At the bottom were vertical slots.

A young guy—like me, a college student—sat on the couch in a sweatshirt and jeans. He was staring intently at the machine. So intently that I just froze, not wanting to get in the way of his staring contest with the giant pinball machine.

He leaned in. Then the balls started releasing from the top at a measured pace and they chaotically bounced around and down the wall, hitting peg after peg until they dropped into one of the columns at the bottom. A few minutes later, those hundreds of rubber balls had formed a perfectly symmetrical bell curve in the columns.

The guy punched the couch and looked dispirited.

I unfroze and asked him the only phrase I could summon: “Uh, what’s going on?”

“I was trying to get the balls to shift to the left.”

“With what?”

“With my mind.”

*          *          *

This was my first encounter with the Princeton Engineering Anomalies Research program, or PEAR. PEAR’s stated mission was to pursue an “experimental agenda of studying the interaction of human consciousness with sensitive physical devices, systems, and processes,” but that prosaic academic verbiage cloaked a far cooler description: PEAR was on the hunt for the Force.

This was clearly bananas, and also totally enthralling for a nerdy kid who grew up on Star Wars. I needed to know more. Fortunately that opportunity presented itself through a new course at the university: “Human-Computer Interaction.” I’m not sure I fully understood what it was about before I signed up for it.

The course was team-taught by prominent faculty in computer science, psychology, and engineering. One of the professors was George Miller, a founder of cognitive psychology, famous for observing that short-term memory can hold only about seven items, plus or minus two. And it included engineering professor Robert Jahn, who had founded PEAR and had rather different notions of our mental capacity.

*          *          *

One of the perks of being a student in Human-Computer Interaction was that you were not only welcome to stop by the PEAR lab, but you could also engage in the experiments yourself. You would just sign up for a slot and head to the basement of the engineering quad, where you would eventually find the bluish-gray door.

By the late 1980s, PEAR had naturally started to focus on whether our minds could alter the behavior of a specific, increasingly ubiquitous machine in our lives: the computer. Jahn and PEAR’s co-founder, Brenda Dunne, set up several rooms with computers and shoebox-sized machines with computer chips in them that generated random numbers on old-school red LED screens. Out of the box snaked a cord with a button at the end.

You would book your room, take a seat, turn on the random-number generator, and flip on the PC sitting next to it. Once the PC booted up, you would type in a code—as part of the study, no proper names were used—to log each experiment. Then the shoebox would start showing numbers ranging from 0.00 to 2.00 so quickly that the red LED became a blur. You would click on the button to stop the digits, and then that number was recorded by the computer.

The goal was to try to stop the rapidly rotating numbers on a number over 1.00, to push the average up as far as possible. Over dozens of turns the computer’s monitor showed how far that average diverged from 1.00.

That’s a clinical description of the experiment. In practice, it was a half-hour of tense facial expressions and sweating, a strange feeling of brow-beating a shoebox with an LED, and some cursing when you got several sub-1.00 numbers in a row. It was human-computer interaction at its most emotional.

Jahn and Dunne kept the master log of the codes and the graphs. There were rumors that some of the codes—some of the people those codes represented—had discernible, repeatable effects on the random numbers. Over many experiments, they were able to make the average rise, ever so slightly but enough to be statistically significant.

In other words, there were Jedis in our midst.

Unfortunately, over several experiments—and a sore thumb from clicking on the button with increasing pressure and frustration—I had no luck affecting the random numbers. I stared at the graph without blinking, hoping to shift the trend line upwards with each additional stop. But I ended up right in the middle, as if I had flipped a coin a thousand times and gotten 500 heads and 500 tails. Average.

*          *          *

Jahn and Dunne unsurprisingly faced sustained criticism and even some heckling, on campus and beyond. When PEAR closed in 2007, all the post-mortems dutifully mentioned the editor of a journal who said he could accept a paper from the lab “if you can telepathically communicate it to me.” It’s a good line, and it’s tempting to make even more fun of PEAR these many years later.

The same year that PEAR closed its doors, the iPhone was released, and with it a new way of holding and touching and communicating with a computer. We now stare intently at these devices for hours a day, and much of that interaction is—let’s admit it—not entirely rational.

We see those three gray dots in a speech bubble and deeply yearn for a good response. We open the stocks app and, in the few seconds it takes to update, pray for green rather than red numbers. We go to the New York Times on election eve and see that meter showing live results, and more than anything we want to shift it to the left with our minds.

When asked by what mechanism the mind might be able to affect a computer, Jahn and Dunne hypothesized that perhaps there was something like an invisible Venn diagram, whereby the ghost in the machine and the ghost in ourselves overlapped ever so slightly. A permeability between silicon and carbon. An occult interface through which we could ever so slightly change the processes of the machine itself and what it displays to us seconds later.

A silly hypothesis, perhaps. But we often act like it is all too real.

Harvard Library Innovation Lab: at IIPC

planet code4lib - Thu, 2017-06-22 19:54

At IIPC last week, Jack Cushman (LIL developer) and Ilya Kreymer (former LIL summer fellow) shared their work on security considerations for web archives, including a sandbox for developers interested in exploring web archive security.

Slides: repo:

David Rosenthal of Stanford also has a great write-up on the presentation:

LITA: Megan Ozeran wins 2017 LITA / Ex Libris Student Writing Award

planet code4lib - Thu, 2017-06-22 17:30

Megan Ozeran has been selected as the winner of the 2017 Student Writing Award sponsored by Ex Libris Group and the Library and Information Technology Association (LITA) for her paper titled “Managing Metadata for Philatelic Materials.” Ozeran is a MLIS candidate at the San Jose State University School of Information.

“Megan Ozeran’s paper was selected as the winner because it takes a scholarly look at an information technology topic that is new and unresolved. Ms. Ozeran’s discussion offers a thorough examination of the current state of cataloging stamps and issues related to their discoverability,” said Rebecca Rose, the Chair of this year’s selection committee.

The LITA/Ex Libris Student Writing Award recognizes outstanding writing on a topic in the area of libraries and information technology by a student or students enrolled in an ALA-accredited library and information studies graduate program. The winning manuscript will be published in Information Technology and Libraries (ITAL), LITA’s open access, peer reviewed journal, and the winner will receive $1,000 and a certificate of merit.

The Award will be presented during the LITA Awards Ceremony & President’s Program at the ALA Annual Conference in Chicago (IL), on Sunday, June 25, 2017.

The members of the 2017 LITA/Ex Libris Student Writing Award Committee are: Rebecca Rose (Chair), Ping Fu, and Mary Vasudeva.

Thank you to Ex Libris for sponsoring this award.

David Rosenthal: WAC2017: Security Issues for Web Archives

planet code4lib - Thu, 2017-06-22 15:00
Jack Cushman and Ilya Kreymer's Web Archiving Conference talk Thinking like a hacker: Security Considerations for High-Fidelity Web Archives is very important. They discuss 7 different security threats specific to Web archives:
  1. Archiving local server files
  2. Hacking the headless browser
  3. Stealing user secrets during capture
  4. Cross site scripting to steal archive logins
  5. Live web leakage on playback
  6. Show different page contents when archived
  7. Banner spoofing
Below the fold, a brief summary of each to encourage you to do two things:
  1. First, view the slides.
  2. Second, visit the sandbox site, which hosts a local version of Webrecorder that has not been patched to fix known exploits, and a number of challenges to help you learn how they might apply to web archives in general.
Archiving local server files

A page being archived might have links that, when interpreted in the context of the crawler, point to local resources that should not end up in public Web archives. Examples include:
  • http://localhost:8080/
  • file:///etc/passwd
It is necessary to implement restrictions in the crawler to prevent it collecting from local addresses or from protocols other than http(s). It is also a good idea to run the crawler in an isolated container or VM to maintain control over the set of resources local to the crawler.
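As a sketch, such a crawler-side check might look like the following (the function name and the exact address classes blocked are illustrative; a real crawler also has to handle redirects and DNS rebinding, which this sketch does not):

```python
from ipaddress import ip_address
from urllib.parse import urlparse
import socket

ALLOWED_SCHEMES = {"http", "https"}

def is_safe_to_fetch(url):
    """Reject non-HTTP(S) schemes and URLs whose host resolves to a
    loopback, private, or link-local address."""
    parts = urlparse(url)
    if parts.scheme not in ALLOWED_SCHEMES:
        return False  # blocks file://, ftp://, gopher://, etc.
    host = parts.hostname
    if host is None:
        return False
    try:
        # Resolve the host and inspect every address it maps to.
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        return False  # unresolvable: refuse rather than guess
    for info in infos:
        addr = ip_address(info[4][0])
        if addr.is_loopback or addr.is_private or addr.is_link_local:
            return False  # blocks localhost, 10.x.x.x, 192.168.x.x, ...
    return True
```

Running the crawler inside a container or VM remains necessary even with such a check, since the check itself can be subverted.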
Hacking the headless browser

Nowadays collecting many Web sites requires executing the content in a headless browser such as PhantomJS. They all have vulnerabilities, only some of which are known at any given time. The same is true of the virtualization infrastructure. Isolating the crawler in a VM or a container does add another layer of complexity for the attacker, who now needs exploits not just for the headless browser but also for the virtualization infrastructure. But both layers then need to be kept up to date. This isn't a panacea, just risk reduction.
Stealing user secrets during capture

User-driven Web recorders place user data at risk, because they typically hand URLs to be captured to the recording process as suffixes to a URL for the Web recorder, thus vitiating the normal cross-domain protections. Everything, including login pages and third-party ads, is regarded as part of the Web recorder domain.

Mitigating this risk is complex, potentially including rewriting cookies, intercepting Javascript's access to cookies, and manipulating sessions.
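One small piece of such a mitigation, rescoping captured cookies to a single recording session, might look like this sketch (the function and the session-prefix scheme are illustrative assumptions, not Webrecorder's actual implementation):

```python
def rewrite_set_cookie(header_value, session_prefix):
    """Rescope a captured Set-Cookie header so the cookie is confined to one
    recording session's URL prefix instead of the whole recorder domain."""
    parts = [p.strip() for p in header_value.split(";")]
    rewritten = []
    for p in parts:
        attr = p.split("=", 1)[0].lower()
        if attr == "domain":
            continue  # drop Domain=: it would apply recorder-wide
        if attr == "path":
            # Nest the original path under the session's URL prefix.
            orig_path = p.split("=", 1)[1]
            rewritten.append(f"Path={session_prefix.rstrip('/')}{orig_path}")
            continue
        rewritten.append(p)
    return "; ".join(rewritten)
```

For example, `Set-Cookie: sid=abc; Domain=example.com; Path=/` captured during session `/record/123/` would come back as `sid=abc; Path=/record/123/`, so the cookie no longer leaks into other sessions on the same recorder host.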
Cross site scripting to steal archive logins

Similarly, the URLs used to replay content must be carefully chosen to avoid the risk of cross-site scripting attacks on the archive. The archive must serve all preserved content from a different top-level domain than the one users log in to, and from which the archive serves the parts of a replay page (e.g. the Wayback Machine's timeline) that are not preserved content. The preserved content should be isolated in an iframe. For example:
  • Archive domain:
  • Content domain:
Live web leakage on playback

Especially with Javascript in archived pages, it is hard to make sure that all resources in a replayed page come from the archive, not from the live Web. If live Web Javascript is executed, all sorts of bad things can happen. Malicious Javascript could exfiltrate information from the archive, track users, or modify the content displayed.

Injecting the Content-Security-Policy (CSP) header into replayed content can mitigate these risks by preventing compliant browsers from loading resources except from the specified domain(s), which would be the archive's replay domain(s).
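A sketch of what that injection might look like on the replay path (the directives shown are illustrative; a real archive would tune the policy to its own replay domains and content types):

```python
def add_csp_header(response_headers, replay_domain):
    """Add a Content-Security-Policy header restricting a replayed page to
    loading resources only from the archive's replay domain, so stray
    references to the live web are blocked by compliant browsers."""
    policy = (
        f"default-src 'self' https://{replay_domain}; "
        f"script-src 'self' 'unsafe-inline' https://{replay_domain}; "
        f"frame-ancestors https://{replay_domain}"
    )
    # Discard any CSP the captured page originally carried: on replay it is
    # the archive's policy, not the live site's, that must govern loading.
    headers = {k: v for k, v in response_headers.items()
               if k.lower() != "content-security-policy"}
    headers["Content-Security-Policy"] = policy
    return headers
```

Note that CSP is a mitigation, not a guarantee: it depends on browser compliance, and overly strict policies can break high-fidelity replay of pages that legitimately span multiple archived domains.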
Show different page contents when archived

I wrote previously about the fact that these days the content of almost all web pages depends not just on the browser, but also on the user, the time, the state of the advertising network, and other things. Thus it is possible for an attacker to create pages that detect when they are being archived, so that the archive's content will be unrepresentative and possibly hostile. Alternately, the page can detect that it is being replayed, and display different content or attack the replayer.

This is another reason why both the crawler and the replayer should be run in isolated containers or VMs. The bigger question of how crawlers can be configured to obtain representative content from personalized, geolocated, advert-supported web-sites is unresolved, but out of scope for Cushman and Kreymer's talk.
Banner spoofing

When replayed, malicious pages can overwrite the archive's banner, misleading the reader about the provenance of the page.

LibreCat/Catmandu blog: Introducing FileStores

planet code4lib - Thu, 2017-06-22 12:35

Catmandu is always our tool of choice when working with structured data. Using the Elasticsearch or MongoDB Catmandu::Store-s it is quite trivial to store and retrieve metadata records. Storing and retrieving a YAML, JSON (and by extension XML, MARC, CSV,…) files can be as easy as the commands below:

$ catmandu import YAML to database < input.yml
$ catmandu import JSON to database < input.json
$ catmandu import MARC to database <
$ catmandu export database to YAML > output.yml

A catmandu.yml  configuration file is required with the connection parameters to the database:

$ cat catmandu.yml
---
store:
  database:
    package: ElasticSearch
    options:
      client: '1_0::Direct'
      index_name: catmandu
...

Given these tools to import and export and even transform structured data, can this be extended to unstructured data? In institutional repositories like LibreCat we would like to manage metadata records and binary content (for example PDF files related to the metadata).  Catmandu 1.06 introduces the Catmandu::FileStore as an extension to the already existing Catmandu::Store to manage binary content.

A Catmandu::FileStore is a Catmandu::Store where each Catmandu::Bag acts as a “container” or a “folder” that can contain zero or more records describing File content. The files records themselves contain pointers to a backend storage implementation capable of serialising and streaming binary files. Out of the box, one Catmandu::FileStore implementation is available Catmandu::Store::File::Simple, or short File::Simple, which stores files in a directory.

Some examples. To add a file to a FileStore, the stream command needs to be executed:

$ catmandu stream /tmp/myfile.pdf to File::Simple --root /data --bag 1234 --id myfile.pdf

In the command above: /tmp/myfile.pdf is the file to be uploaded to the File::Store. File::Simple is the name of the File::Store implementation, which requires one mandatory parameter, --root /data, the root directory where all files are stored. The --bag 1234 is the “container” or “folder” which contains the uploaded files (with a numeric identifier 1234). And the --id myfile.pdf is the identifier for the newly created file record.

To download the file from the File::Store, the stream command is executed in the opposite direction:

$ catmandu stream File::Simple --root /data --bag 1234 --id myfile.pdf to /tmp/file.pdf

or, equivalently, by redirecting standard output:

$ catmandu stream File::Simple --root /data --bag 1234 --id myfile.pdf > /tmp/file.pdf

On the file system the files are stored in a deeply nested directory structure so that the File::Store can be spread out over many disks:

/data
  `--/000
      `--/001
          `--/234
              `--/myfile.pdf
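The mapping from a numeric container id to that nested path can be sketched in a few lines of bash. This is an illustration of the layout shown above (zero-pad the id to nine digits, then split it into groups of three), not Catmandu's actual implementation:

```shell
# Build the on-disk path for container id 1234 and file myfile.pdf,
# mirroring the /data/000/001/234/myfile.pdf layout above.
id=1234
file=myfile.pdf
padded=$(printf '%09d' "$id")    # 1234 -> 000001234
path="/data/${padded:0:3}/${padded:3:3}/${padded:6:3}/$file"
echo "$path"                     # /data/000/001/234/myfile.pdf
```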

A listing of all “containers” can be retrieved by requesting an export of the default (index) bag of the File::Store:

$ catmandu export File::Simple --root /data to YAML
_id: 1234

A listing of all files in the container “1234” can be done by adding the bag name to the export command:

$ catmandu export File::Simple --root /data --bag 1234 to YAML
_id: myfile.pdf
_stream: !!perl/code '{ "DUMMY" }'
content_type: application/pdf
created: 1498125394
md5: ''
modified: 1498125394
size: 883202

Each File::Store implementation supports at least the fields presented above:

  • _id: the name of the file
  • _stream: a callback function to retrieve the content of the file (requires an IO::Handle as input)
  • content_type: the MIME-Type of the file
  • created: a timestamp when the file was created
  • modified: a timestamp when the file was last modified
  • size: the byte length of the file
  • md5: an optional MD5 checksum

We envision that many FileStore implementations can be created for Catmandu, to store files in GitHub, BagIt archives, Fedora Commons and more backends.

Using Catmandu::Plugin::SideCar, Catmandu::FileStore-s and Catmandu::Store-s can be combined into one endpoint. Using Catmandu::Store::Multi and Catmandu::Store::File::Multi, many different implementations of Stores and FileStores can be combined.

This is a short introduction, but I hope you will experiment a bit with the new functionality and provide feedback to our project.

Open Knowledge Foundation: OKI Agile: Scrum and sprints in open data

planet code4lib - Thu, 2017-06-22 12:00

This is the third in a series of blogs on how we are using the Agile methodology at Open Knowledge International. Originating from software development, the Agile manifesto describes a set of principles that prioritise agility in work processes: for example, through continuous development, self-organised teams with frequent interactions and quick responses to change. In this blogging series we go into the different ways Agile can be used to work better in teams and to create more efficiency in how to deliver projects. The first posts dealt with user stories and methodologies; this time we go into using scrum and sprints to manage delivery of projects.

Throughout my time as a project manager of open data projects at the Public Knowledge Workshop in Israel and at Open Knowledge International, I have used various tools and methods to manage the delivery of software and content development. I have used Trello, Asana and even a Google spreadsheet, but at the end of the day I always go back to GitHub to run all of the project tasks, assisted by Waffle.

Many people I have spoken to are afraid of using GitHub for project management. To be fair, I am still afraid of Git, but GitHub is a different concept: it is not a programming language, it is a repository site, and it has really good functions and a very friendly user interface. So do not fear the Octocat!

Why Github?
  • As an open source organisation with community-facing products, our code is always managed on GitHub. Adding another platform to deal with non-code tasks just adds more complication and syncing.
  • It is open to the community to contribute and see the progress and does not need permissions management (like Trello).
  • Unlike what people think, it is really easy to learn how to use the GitHub web version, and its labels and milestones features are helpful for delivery.
Why Waffle?
  • It syncs with GitHub and allows the tasks to be shown as a Kanban board.
  • It allows writing estimates of the hours of work for each task.

So far, working on GitHub for the project has shown the following:

  1. Better cooperation between different streams of work
    Having one platform helps the team to understand what each function in the project is doing. I believe that the coder should understand the content strategy and the community lead should understand the technical constraints while working on a project. It gives back better feedback and ideas for improving the product.
  2. Better documentation
    Having everything in one place allows us to create better documentation for the future.
So what did we do for GODI (the Global Open Data Index) 2016?
  • Firstly, I gathered all the tasks from Trello and moved them to GitHub.
  • I created tags that allow us to differentiate between different types of tasks – content, design, code and community.
  • I added milestones and sorted all tasks into their respective project milestones. I also created a “backlog” for all tasks that are not prioritised for the project but need to be done one day in the future. Each milestone got a deadline that corresponds to the project's general deadlines.
  • I made sure that all the team members are part of the repository.
  • I organised Waffle to create columns – we use the default Waffle ones: Backlog, Ready, In Progress and Done.

Using one system and changing the work culture means that I needed to be strict about how the team communicates. It is sometimes unpleasant and required me to be the “bad cop”, but it is a crucial part of the process of enforcing a new way of working. It means repetitive reminders to document issues on the issue tracker, ignoring issues that are not on GitHub and commenting on GitHub when issues are not well documented.

Now, after all is in one system, we can move to the daily management of tasks.

  • Before the sprint call
    • Make sure all issues are clear – before each sprint, the scrum master (in this case, also the project manager) makes sure that all issues are clear and not vague. The SM will also add tasks that they think are needed for this sprint.
    • Organise issues – at this stage, prior to the sprint call, use Waffle to move tasks to represent where you as a project manager think they currently are.
  • During the sprint call:
    • Explain to the team the main details about the sprint:  
      • Length of the milestone or how many weeks this milestone will take
      • Length of the sprint
      • Team members – who are they? Are they working part time or not?
      • Objectives for the sprint – these derive from the milestone
      • Potential risks and mitigation
      • Go through the issues: yes, you did it before, but going through the issues with the team helps you as PM or SM to understand where the team is, what blocks them and creates a true representation of the tasks for the delivery team.
      • Give time estimates – Waffle allows rough time estimates of between 1 and 100 hours. Use them to forecast work for the project.
      • Create new tasks – speaking together gets the creative juices going. This will lead to creation of new issues. This is a good thing. Make sure they are labeled correctly.
      • Make sure that everyone understands their tasks: In the last 10 minutes of the sprint call, repeat the division of work and who is doing what.
  • After the sprint call and during the sprint:
    • Make sure to have regular stand ups – I have 30-minute stand ups, to allow the team more time to share issues. However, make sure not to go over 30 minutes. If an issue demands more time to discuss, it needs its own dedicated call to untangle it, so set up a call with the relevant team members for that issue.
    • Create issues as they arise – don't wait for the stand up or sprint kick-off call to create issues. Encourage the team and the community to create issues as well.
    • Always have a look at the issue tracker – making sure all issues are there is a key action in agile work. I start every day by checking the issues to make sure that I don't miss critical work.
    • Hyper communicate – since we are a remote team, it is better to repeat a message than not say it at all. I use Slack to make sure that the team knows that a new issue has arisen or if there is an outside blocker. I will repeat it in the team stand ups to make sure all team members are up-to-date.

    How do you manage your sprints and projects? Leave us a comment below!


    Open Knowledge Foundation: New open energy data portal set to spark innovation in energy efficiency solutions

    planet code4lib - Thu, 2017-06-22 11:11

    Viderum spun off as a company from Open Knowledge International in 2016 with the aim of providing services and products to further expand the reach of open data around the world. Last week they took a great step in this direction by powering the launch of the Energy Data Service portal, which will make Denmark's energy data available to everyone. This press release has been reposted from Viderum's website.

    Image credit: Jürgen Sandesneben, Flickr CC BY

    A revolutionary new online portal, which gives open access to Denmark's energy data, is set to spark innovation in smart, data-led solutions for energy efficiency. The Energy Data Service, launched on 17 June 2017 by the CEO of Denmark's state-owned gas and electricity provider Energinet, and the Minister for Energy, Utilities and Climate, will share near real-time aggregated energy consumption data for all Danish municipalities, as well as data on CO2 emissions, energy production and the electricity market.

    Developers, entrepreneurs and companies will be able to access and use the data to create apps and other smart data services that empower consumers to use energy more efficiently and flexibly, saving them money and cutting their carbon footprint.

    Viderum is the technology partner behind the Energy Data Service. It developed the portal using CKAN, the leading data management platform for open data, originally developed by non-profit organisation Open Knowledge International.

    Sebastian Moleski, CEO of Viderum said: “Viderum is excited to be working with Energinet at the forefront of the open data revolution to make Denmark’s energy data available to everyone via the Energy Data Service portal. The portal makes a huge amount of complex data easily accessible, and we look forward to developing its capabilities further in the future, eventually providing real-time energy and CO2 emissions data.”

    Energinet hopes that the Energy Data Service will be a catalyst for the digitalisation of the energy sector and for green innovation and economic growth, both in Denmark and beyond.

    “As we transition to a low carbon future, we need to empower consumers to be smarter with how they use energy. The Energy Data Service will enable the development of innovative data based solutions to make this possible. For example, an electric car that knows when there is spare capacity on the electricity grid, making it a good time to charge itself. Or an app that helps local authorities understand energy consumption patterns in social housing, so they can make improvements that will save money and cut carbon”, said Peder Ø. Andreasen, CEO of Energinet.

    The current version of the Energy Data Service includes the following features:

    • API (Application Programming Interface) access to all raw data, which makes it easy to use in data applications and services
    • Downloadable data sets in regular formats (CSV and Excel)
    • Helpful user guides
    • Contextual information and descriptions of data sets
    • Online discussion forum for questions and knowledge sharing

    Galen Charlton: What makes an anti-librarian?

    planet code4lib - Thu, 2017-06-22 11:07

    Assuming the order gets made and shipped in time (update 2017-06-22: it did), I’ll be arriving in Chicago for ALA Annual carrying a few tens of badge ribbons like this one:

    Am I hoping that the librarians made of anti-matter will wear these ribbons to identify themselves, thereby avoiding unpleasant explosions and gamma ray bursts? Not really. Besides, there’s an obvious problem with this strategy, were anti-matter librarians a real constituency at conferences.

    No, in a roundabout way, I’m mocking this behavior by Jeffrey Beall:

    Seriously, dude?

    I suggest reading Rachel Walden’s tweets for more background, but suffice it to say that even if you were to discount Walden’s experience as a medical library director (which I do not), Beall’s response to her is extreme. (And for even more background, John Dupuis has an excellent compilation of links on recent discussions about Open Access and “predatory” journals.)

    But I’d like to unpack Beall’s choice of the expression “anti-librarian.” What exactly makes for an anti-librarian?

    We already have plenty of names for folks who oppose libraries and librarians. Book-burners. Censors. Austeritarians. The closed-minded. The tax-cutters-above-all-else. The drowners of governments in bathtubs. The fearful. We could have a whole taxonomy, in fact, were the catalogers to find a few spare moments.

    “Anti-librarian” as an epithet doesn’t fit most of these folks. Instead, as applied to a librarian, it has some nasty connotations: a traitor. Somebody who wears the mantle of the profession but opposes its very existence. Alternatively: a faker. A purveyor of fake news. One who is unfit to participate in the professional discourse.

    There may be some librarians who deserve to have that title — but it would take a lot more than being mistaken, or even woefully misguided to earn that.

    So let me also protest Beall’s response to Walden explicitly:

    It is not OK.

    It is not cool.

    It is not acceptable.

    Evergreen ILS: Evergreen 2.11.6 and 2.12.3 released

    planet code4lib - Thu, 2017-06-22 00:11

    The Evergreen community is pleased to announce two maintenance releases of Evergreen: 2.11.6 and 2.12.3.

    Evergreen 2.12.3 includes the following bugfixes:

    • Web staff client fixes
      • The receipt on payment checkbox now prints a receipt at time of payment.
      • The Items Out count in the patron screen now includes long overdue items.
      • A fix was added to prevent values from a previously-edited patron from appearing in the edit form of a subsequent patron.
      • User notification preferences now save correctly in the patron registration and edit forms.
      • The UPDATE_MARC permission is no longer requested when performing a search from the staff catalog.
      • Non-cataloged circulations now display in the Items Out screen without requiring a refresh.
      • Required statistical categories are now required to be entered in the copy editor. (A similar bug for the patron editor was fixed in the 2.12.1 release).
      • Voiding bills now requires confirmation.
      • Staff can no longer use the copy editor to put items into or out of the following statuses: checked out, lost, in transit, on holds shelf, long overdue, and canceled transit.
      • The contrast is improved for alert text showing the amount a patron owes in bills.
      • Circ modifiers now sort alphabetically in the copy editor.
    • Other bugfixes
      • Code to prevent a hold already on the Holds Shelf from being transferred to another title.
      • A fix to a bug that prevented users from scheduling reports with a relative month if the report template used a date that applied the Year Month transform with the On or After (>=) operator.
      • A fix to a bug where the max fines threshold was reached prematurely due to the presence of account adjustments.
      • A check that prevents an SMS message from attempting to send when the SMS carrier is null.
      • For systems that provide search format as a filter on the advanced search page, a fix so that the format selected in the search bar when launching a new search from the results page overrides any previously-set formats.
      • The addition of an optional new Apache/mod_perl configuration variable for defining the port Apache listens on for HTTP traffic. This resolves an issue where added content lookups attempting HTTP requests on the local Apache instance on port 80 failed because Apache was using non-standard ports.
      • A fix to the public catalog’s My List page responsive design so that it now displays properly on mobile devices and allows users to place holds from My List.
      • A fix to a bug where the second (and subsequent) pages of search results in the public catalog (when group formats and editions is in effect) did not correctly generate links to hits that are not part of a multi-bib metarecord.

    Evergreen 2.11.6 includes the following fixes:

    • Code to prevent a hold already on the Holds Shelf from being transferred to another title.
    • A fix to a bug that prevented users from scheduling reports with a relative month if the report template used a date that applied the Year Month transform with the On or After (>=) operator.
    • A fix to a bug where the max fines threshold was reached prematurely due to the presence of account adjustments.
    • A check that prevents an SMS message from sending if the SMS carrier is null.

    Please visit the downloads page to view the release notes and retrieve the server software and staff clients.

    DuraSpace News: THE Research Networking Event–Register for the 2017 VIVO Conference by June 30 and SAVE $100

    planet code4lib - Thu, 2017-06-22 00:00

    From the organizers of the 2017 VIVO Conference

    The 2017 VIVO Conference is all about research networking! If this topic and creating an integrated record of the scholarly work of your organization are of interest, then the 2017 VIVO Conference is the place to be, Aug 2-4 in New York City. Institutions with production VIVOs, as well as those considering implementing VIVO, will be in attendance, present their work, and/or offer workshops.

    DuraSpace News: DuraSpace Launches New Web Site

    planet code4lib - Thu, 2017-06-22 00:00

    DuraSpace has a lot to celebrate in 2017. Our community-supported open source technologies continue to contribute to advancing the access and preservation goals of our member organizations and beyond. The DuraSpace hosted services team is onboarding new customers, while at the same time contributing to efforts to offer new technologies that provide full, hosted access to, control of, and protection for your content.

    District Dispatch: House expected to approve CTE reauthorization

    planet code4lib - Wed, 2017-06-21 20:34

    Perkins CTE Program helps library patrons thrive in 21st Century Economy

    Libraries play numerous roles in communities across the country, working generally to meet the needs of their patrons at every life stage. Whether providing high-speed broadband access to rural and urban communities alike, running youth reading sessions and book clubs, teaching computer literacy to patrons seeking to learn new skills or aiding small businesses, libraries serve as learning centers helping patrons along their career paths.

    Libraries also play a valuable and specific role in supporting and working to improve secondary and postsecondary Career and Technical Education (CTE) programs funded by the Carl D. Perkins Career and Technical Education Act (“Perkins Act”), the federal bill which governs the more than $1 billion in federal funding for career and technical education activities across the country. Such programs help equip youth and adults with the academic, technical and employability skills and knowledge needed to secure employment in today’s high-growth industries. In so doing, libraries help close the “skills gap” and expand economic opportunity to more communities across the nation. Some libraries work directly with their state labor and employment offices to implement CTE programs which receive Federal funding.

    Libraries and certified librarians also provide valuable CTE resources, equipment, technology, instructional aids, and publications designed to strengthen and foster academic and technical skills achievement. In many communities, libraries play a significant role in career and technical development. Often the library is the only place where patrons can access the high-speed broadband vital to those working to apply for jobs, research careers, and work towards enhanced certification and training.

    As early as this week, the House of Representatives is expected to pass legislation reauthorizing the Perkins Act, which was originally adopted in 1984. ALA recently submitted a letter to the House Committee on Education and Workforce supporting this bi-partisan legislation: the Career and Technical Education for the 21st Century Act (H.R. 2353), which was approved by the Committee on June 6.

    The House timed the vote on the reauthorization to occur during the National Week of Making spearheaded by the Congressional Maker Caucus. The week highlights the growing maker movement across the country.

    We’ve been here before, however, as the House passed similar legislation in 2016 only to see reauthorization of the Perkins program stall in the Senate, where a companion bill has yet to be introduced. Unfortunately, the President’s budget seeks to cut $168.1 million from the Perkins CTE State Grant program, which had previously received $1.118 billion in funding for FY15, FY16 and FY17. ALA will continue work to support robust funding for CTE programs and, if the House acts favorably, to urge the Senate to follow its lead and promptly reauthorize the Perkins Act.

    The post House expected to approve CTE reauthorization appeared first on District Dispatch.

    LITA: Jobs in Information Technology: June 21, 2017

    planet code4lib - Wed, 2017-06-21 19:39

    New vacancy listings are posted weekly on Wednesday at approximately 12 noon Central Time. They appear under New This Week and under the appropriate regional listing. Postings remain on the LITA Job Site for a minimum of four weeks.

    New This Week

    Purchase College – State University of New York, Client Systems Administrator, Purchase, NY

    EBSCO Information Services, Product Owner, FOLIO, Ipswich, MA

    Pacific States University, University Librarian, Los Angeles, CA

    Visit the LITA Job Site for more available jobs and for information on submitting a job posting.

    Galen Charlton: Mashcat at ALA Annual 2017 + shared notes

    planet code4lib - Wed, 2017-06-21 19:34

    I’m leaving for Chicago tomorrow to attend ALA Annual 2017 (and to eat some real pizza), and while going over the schedule I found some programs that may be of interest to Mashcat folk:

    As a little experiment, I’ve started a Google Doc for shared notes about events and other goings-on at the conference. There will of course be a lot of coverage on social media about the conference, but the shared notes doc might be a way for Mashcatters to identify common themes.

    LITA: Ellyssa Kroski Receives 2017 LITA/Library Hi Tech Award

    planet code4lib - Wed, 2017-06-21 18:01

    Ellyssa Kroski has been named the winner of the 2017 LITA/Library Hi Tech Award for Outstanding Communication in Library and Information Technology.

    The Library and Information Technology Association (LITA) and Emerald Group Publishing sponsor the Award, which recognizes outstanding individuals or institutions for their long-term contributions in the area of Library and Information Science technology and its application. The winner receives a citation and a $1,000 stipend.

    The Award Committee selected Kroski because of her passion for teaching and technology, as well as her impact in public libraries, academic libraries, and with library school students. As a technology expert, an editor, contributor, and compiler of books, she has helped make technology and makerspaces accessible for many different types of institutions.

    Kroski is the Director of Information Technology at the New York Law Institute as well as an award-winning editor and author of 36 books including Law Librarianship in the Digital Age for which she won the AALL’s 2014 Joseph L. Andrews Legal Literature Award. Her ten-book technology series, The Tech Set, won the ALA’s Best Book in Library Literature Award in 2011. She is a librarian, an adjunct faculty member at Drexel and San Jose State University, and an international conference speaker.

    When notified she had won the Award, Kroski said, “I am incredibly honored and very pleased to receive this award from LITA. I have been fortunate enough to collaborate with and be inspired by many amazingly talented librarians throughout my professional development activities and to all of those colleagues I extend my thanks and share this honor as well.”

    Members of the 2017 LITA/Library Hi-Tech Award Committee are: Vanessa L. Ames (Chair) and Robert Wilson.

    Thank you to Emerald Publishing and Library Hi Tech for sponsoring this award.

    District Dispatch: ALA celebrates World Wi-Fi Day

    planet code4lib - Wed, 2017-06-21 16:10

    Among all their other functions in our communities, libraries are critical spaces for people to access the internet, and they are increasingly doing so wirelessly via Wi-Fi.

    Virtually all public libraries in the U.S. provide Wi-Fi to patrons. By doing so, libraries serve as community technology hubs that enable digital opportunity and full participation in the nation’s economy. Wi-Fi is a critical part of how libraries are transforming our programs and services in the digital age.

    June 20th was World Wi-Fi Day, a global initiative helping to bridge the digital divide as well as recognizing and celebrating the role of Wi-Fi in cities and communities around the world. In Washington, D.C., the WifiForward Coalition—of which ALA is a founding member—held a kick off celebration at the Consumer Technology Association’s Innovation House off of Capitol Hill. Congressman Darrell Issa (R-CA) and Federal Communications Commissioner Michael O’Rielly were on hand to expound on the wonders of Wi-Fi and to voice their support for policies that would help its growth and success.

    ALA added the following statement to materials for World Wi-Fi Day:

    “With Wi-Fi, our nation’s 120,000 libraries are able to dramatically increase our capacity to connect people of all incomes and backgrounds to the Internet beyond our public desktop computers. Wi-Fi allows us to serve more people anywhere in the library, as well as enabling mobile technology training labs, roving reference, access to diverse digital collections and pop-up library programs and services. Library wi-fi is essential to support The E’s of Libraries®—Education, Employment, Entrepreneurship, Empowerment and Engagement—on campuses and in communities nationwide. The American Library Association is proud to be a supporter of World Wi-Fi Day.”

    The post ALA celebrates World Wi-Fi Day appeared first on District Dispatch.


    Subscribe to code4lib aggregator