Feed aggregator

Open Knowledge Foundation: OKI Agile: Scrum and sprints in open data

planet code4lib - Thu, 2017-06-22 12:00

This is the third in a series of blog posts on how we are using the Agile methodology at Open Knowledge International. Originating from software development, the Agile manifesto describes a set of principles that prioritise agility in work processes: for example, through continuous development, self-organised teams with frequent interactions, and quick responses to change. In this series we look at the different ways Agile can be used to work better in teams and to deliver projects more efficiently. The first posts dealt with user stories and methodologies; this time we look at using scrum and sprints to manage the delivery of projects.

Throughout my time as a project manager of open data projects at The Public Knowledge Workshop in Israel and at Open Knowledge International, I have used various tools and methods to manage the delivery of software and content development. I have used Trello, Asana and even a Google spreadsheet, but at the end of the day I always go back to GitHub to run all of the project tasks, assisted by Waffle.

Many people I have spoken to are afraid of using GitHub for project management. To be fair, I am still afraid of Git, but GitHub is a different concept: it is not a programming language but a repository hosting site, with really good features and a very friendly user interface. So do not fear the Octocat!

Why GitHub?
  • Because we build open source, community-facing products, our code is always managed on GitHub. Adding another platform for non-code tasks just adds more complication and syncing.
  • It is open for the community to contribute and follow progress, and does not need permissions management (as Trello does).
  • Contrary to what people think, the GitHub web version is really easy to learn, and its labels and milestones features are helpful for delivery.
Why Waffle?
  • It syncs with GitHub and displays the tasks as a Kanban board.
  • It lets you record an estimate of the hours of work for each task.

So far, working on GitHub for the project has shown the following benefits:

  1. Better cooperation between different streams of work
    Having one platform helps the team understand what each function in the project is doing. I believe that the coder should understand the content strategy, and the community lead should understand the technical constraints, while working on a project. This leads to better feedback and ideas for improving the product.
  2. Better documentation
    Having everything in one place makes it possible to create better documentation for the future.
So what did we do for GODI (the Global Open Data Index) 2016?
  • First, I gathered all the tasks from Trello and moved them to GitHub.
  • I created labels that differentiate between types of tasks: content, design, code and community.
  • I added milestones and sorted all tasks into their respective project milestones. I also created a “backlog” for tasks that are not prioritised for the project but need to be done one day in the future. Each milestone got a deadline that corresponds to the general project deadlines.
  • I made sure that all the team members are part of the repository.
  • I organised Waffle into columns. We use the default Waffle ones: Backlog, Ready, In Progress and Done.
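
For teams that prefer to script this setup rather than click through the web interface, the sketch below shows one possible way to prepare the requests for GitHub's REST API, which offers endpoints for creating labels and milestones. It only builds the request paths and JSON payloads and does not send anything. The repository name, label colours, milestone title and due date are all hypothetical placeholders, not the actual GODI configuration.

```python
import json
from datetime import datetime, timezone

# Hypothetical repository; replace with a real "owner/name".
REPO = "example-org/example-project"

# Label names follow the four task types in the post; colours are arbitrary.
LABELS = {
    "content": "0e8a16",
    "design": "d93f0b",
    "code": "1d76db",
    "community": "fbca04",
}


def label_requests(repo, labels):
    """Return (path, payload) pairs for creating one label per task type."""
    path = f"/repos/{repo}/labels"
    return [(path, {"name": name, "color": color}) for name, color in labels.items()]


def milestone_request(repo, title, due):
    """Return the (path, payload) pair for creating a milestone with a deadline."""
    payload = {
        "title": title,
        "state": "open",
        # GitHub expects an ISO 8601 timestamp for due_on.
        "due_on": due.strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
    return f"/repos/{repo}/milestones", payload


requests_to_send = label_requests(REPO, LABELS)
requests_to_send.append(
    # Hypothetical milestone title and deadline.
    milestone_request(REPO, "Example milestone 1", datetime(2017, 1, 31, tzinfo=timezone.utc))
)

for path, payload in requests_to_send:
    print("POST", path, json.dumps(payload))
```

Each printed line corresponds to one authenticated POST request against the GitHub API. Sending them is left out here so that the sketch runs without credentials.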

Using one system and changing the work culture meant that I needed to be strict about how the team communicates. It is sometimes unpleasant and required me to be the “bad cop”, but it is a crucial part of enforcing a new way of working. It means repeated reminders to document issues on the issue tracker, ignoring issues that are not on GitHub, and commenting on GitHub when issues are not well documented.

Now that everything is in one system, we can move on to the daily management of tasks.

  • Before the sprint call
    • Make sure all issues are clear – Before each sprint, the scrum master (SM; in this case, also the project manager) makes sure that all issues are clear and not vague. The SM also adds any tasks that they think are needed for this sprint.
    • Organise issues – At this stage, prior to the sprint call, use Waffle to move tasks to where you, as project manager, think they currently are.
  • During the sprint call
    • Explain to the team the main details of the sprint:
      • Length of the milestone, i.e. how many weeks this milestone will take
      • Length of the sprint
      • Team members – who are they? Are they working part time or not?
      • Objectives for the sprint – these derive from the milestone
      • Potential risks and mitigation
    • Go through the issues – Yes, you did it before, but going through the issues with the team helps you as PM or SM to understand where the team is and what blocks them, and creates a true representation of the tasks for the delivery team.
    • Give time estimates – Waffle allows rough time estimates of between 1 and 100 hours. Use them to forecast work for the project.
    • Create new tasks – Speaking together gets the creative juices going. This will lead to the creation of new issues. This is a good thing. Make sure they are labelled correctly.
    • Make sure that everyone understands their tasks – In the last 10 minutes of the sprint call, repeat the division of work and who is doing what.
  • After the sprint call and during the sprint
    • Make sure to have regular stand-ups – I hold 30-minute stand-ups, to give the team more time to share issues. However, make sure they do not run longer than 30 minutes. If an issue demands more time to discuss, it needs its own dedicated call to untangle it, so set up a call with the relevant team members for that issue.
    • Create issues as they arise – Don’t wait for the stand-up or sprint kick-off call to create issues. Encourage the team and the community to create issues as well.
    • Always have a look at the issue tracker – Making sure all issues are there is a key action in agile work. I start every day by checking the issues to make sure that I don’t miss critical work.
    • Hyper-communicate – Since we are a remote team, it is better to repeat a message than not to say it at all. I use Slack to make sure that the team knows when a new issue arises or if there is an outside blocker. I repeat it in the team stand-ups to make sure all team members are up to date.
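
The forecasting step can be made concrete with a little arithmetic: add up the hour estimates per person and compare them with the hours each person can actually give during the sprint. The sketch below is one possible way to do that. The issue titles, names, estimates and capacities are invented for illustration, and it does not use Waffle's or GitHub's API.

```python
from collections import defaultdict

# Hypothetical sprint issues: (title, assignee, estimate in hours).
ISSUES = [
    ("Write survey guidelines", "dana", 8),
    ("Design results page", "lee", 20),
    ("Fix submission form bug", "sam", 5),
    ("Draft community call agenda", "dana", 3),
    ("Import last year's data", "sam", 30),
]

# Hypothetical hours each person can give this sprint (part-time members get fewer).
CAPACITY = {"dana": 20, "lee": 16, "sam": 40}


def forecast(issues, capacity):
    """Sum estimates per assignee and return {name: (planned, available)}."""
    planned = defaultdict(int)
    for _title, assignee, hours in issues:
        planned[assignee] += hours
    return {name: (planned[name], available) for name, available in capacity.items()}


def overbooked(report):
    """Return the sorted names whose planned hours exceed their capacity."""
    return sorted(name for name, (planned, available) in report.items() if planned > available)


report = forecast(ISSUES, CAPACITY)
for name, (planned, available) in sorted(report.items()):
    print(f"{name}: {planned}h planned / {available}h available")
print("Overbooked:", overbooked(report))
```

Anyone listed as overbooked is a prompt to move issues back to the backlog or reassign them before the sprint call ends.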

    How do you manage your sprints and projects? Leave us a comment below!


    Open Knowledge Foundation: New open energy data portal set to spark innovation in energy efficiency solutions

    planet code4lib - Thu, 2017-06-22 11:11

    Viderum spun off as a company from Open Knowledge International in 2016 with the aim of providing services and products to further expand the reach of open data around the world. Last week they took a great step in this direction by powering the launch of the Energy Data Service portal, which will make Denmark’s energy data available to everyone. This press release has been reposted from Viderum’s website at

    Image credit: Jürgen Sandesneben, Flickr CC BY

    A revolutionary new online portal, which gives open access to Denmark’s energy data, is set to spark innovation in smart, data-led solutions for energy efficiency. The Energy Data Service, launched on 17 June 2017 by the CEO of Denmark’s state-owned gas and electricity provider Energinet, and the Minister for Energy, Utilities and Climate, will share near real-time aggregated energy consumption data for all Danish municipalities, as well as data on CO2 emissions, energy production and the electricity market.

    Developers, entrepreneurs and companies will be able to access and use the data to create apps and other smart data services that empower consumers to use energy more efficiently and flexibly, saving them money and cutting their carbon footprint.

    Viderum is the technology partner behind the Energy Data Service. It developed the portal using CKAN, the leading data management platform for open data, originally developed by non-profit organisation Open Knowledge International.

    Sebastian Moleski, CEO of Viderum said: “Viderum is excited to be working with Energinet at the forefront of the open data revolution to make Denmark’s energy data available to everyone via the Energy Data Service portal. The portal makes a huge amount of complex data easily accessible, and we look forward to developing its capabilities further in the future, eventually providing real-time energy and CO2 emissions data.”

    Energinet hopes that the Energy Data Service will be a catalyst for the digitalisation of the energy sector and for green innovation and economic growth, both in Denmark and beyond.

    “As we transition to a low carbon future, we need to empower consumers to be smarter with how they use energy. The Energy Data Service will enable the development of innovative data based solutions to make this possible. For example, an electric car that knows when there is spare capacity on the electricity grid, making it a good time to charge itself. Or an app that helps local authorities understand energy consumption patterns in social housing, so they can make improvements that will save money and cut carbon”, said Peder Ø. Andreasen, CEO of Energinet.

    The current version of the Energy Data Service includes the following features:

    • API (Application Programming Interface) access to all raw data, which makes it easy to use in data applications and services
    • Downloadable data sets in regular formats (CSV and Excel)
    • Helpful user guides
    • Contextual information and descriptions of data sets
    • Online discussion forum for questions and knowledge sharing

    Galen Charlton: What makes an anti-librarian?

    planet code4lib - Thu, 2017-06-22 11:07

    Assuming the order gets made and shipped in time (update 2017-06-22: it did), I’ll be arriving in Chicago for ALA Annual carrying a few tens of badge ribbons like this one:

    Am I hoping that the librarians made of anti-matter will wear these ribbons to identify themselves, thereby avoiding unpleasant explosions and gamma ray bursts? Not really. Besides, there’s an obvious problem with this strategy, were anti-matter librarians a real constituency at conferences.

    No, in a roundabout way, I’m mocking this behavior by Jeffrey Beall:

    Seriously, dude?

    I suggest reading Rachel Walden’s tweets for more background, but suffice it to say that even if you were to discount Walden’s experience as a medical library director (which I do not), Beall’s response to her is extreme. (And for even more background, John Dupuis has an excellent compilation of links on recent discussions about Open Access and “predatory” journals.)

    But I’d like to unpack Beall’s choice of the expression “anti-librarian”. What exactly makes for an anti-librarian?

    We already have plenty of names for folks who oppose libraries and librarians. Book-burners. Censors. Austeritarians. The closed-minded. The tax-cutters-above-all-else. The drowners of governments in bathtubs. The fearful. We could have a whole taxonomy, in fact, were the catalogers to find a few spare moments.

    “Anti-librarian” as an epithet doesn’t fit most of these folks. Instead, as applied to a librarian, it has some nasty connotations: a traitor. Somebody who wears the mantle of the profession but opposes its very existence. Alternatively: a faker. A purveyor of fake news. One who is unfit to participate in the professional discourse.

    There may be some librarians who deserve to have that title — but it would take a lot more than being mistaken, or even woefully misguided, to earn that.

    So let me also protest Beall’s response to Walden explicitly:

    It is not OK.

    It is not cool.

    It is not acceptable.

    Evergreen ILS: Evergreen 2.11.6 and 2.12.3 released

    planet code4lib - Thu, 2017-06-22 00:11

    The Evergreen community is pleased to announce two maintenance releases of Evergreen: 2.11.6 and 2.12.3.

    Evergreen 2.12.3 includes the following bugfixes:

    • Web staff client fixes
      • The receipt on payment checkbox now prints a receipt at time of payment.
      • The Items Out count in the patron screen now includes long overdue items.
      • A fix was added to prevent values from a previously-edited patron from appearing in the edit form of a subsequent patron.
      • User notification preferences now save correctly in the patron registration and edit forms.
      • The UPDATE_MARC permission is no longer requested when performing a search from the staff catalog.
      • Non-cataloged circulations now display in the Items Out screen without requiring a refresh.
      • Required statistical categories are now required to be entered in the copy editor. (A similar bug for the patron editor was fixed in the 2.12.1 release).
      • Voiding bills now requires confirmation.
      • Staff can no longer use the copy editor to put items into or out of the following statuses: checked out, lost, in transit, on holds shelf, long overdue, and canceled transit.
      • The contrast is improved for alert text showing the amount a patron owes in bills.
      • Circ modifiers now sort alphabetically in the copy editor.
    • Other bugfixes
      • Code to prevent a hold already on the Holds Shelf from being transferred to another title.
      • A fix to a bug that prevented users from scheduling reports with a relative month if the report template used a date that applied the Year + Month transform with the On or After (>=) operator.
      • A fix to a bug where the max fines threshold was reached prematurely due to the presence of account adjustments.
      • A check that prevents an SMS message from attempting to send when the SMS carrier is null.
      • For systems that provide search format as a filter on the advanced search page, a fix so that the format selected in the search bar when launching a new search from the results page overrides any previously-set formats.
      • The addition of an optional new Apache/mod_perl configuration variable for defining the port Apache listens on for HTTP traffic. This resolves an issue where added content lookups attempting HTTP requests on the local Apache instance on port 80 failed because Apache was using non-standard ports.
      • A fix to the public catalog’s My List page responsive design so that it now displays properly on mobile devices and allows users to place holds from My List.
      • A fix to a bug where the second (and subsequent) pages of search results in the public catalog (when group formats and editions is in effect) did not correctly generate links to hits that are not part of a multi-bib metarecord.

    Evergreen 2.11.6 includes the following fixes:

    • Code to prevent a hold already on the Holds Shelf from being transferred to another title.
    • A fix to a bug that prevented users from scheduling reports with a relative month if the report template used a date that applied the Year + Month transform with the On or After (>=) operator.
    • A fix to a bug where the max fines threshold was reached prematurely due to the presence of account adjustments.
    • A check that prevents an SMS message from sending if the SMS carrier is null.

    Please visit the downloads page to view the release notes and retrieve the server software and staff clients.

    DuraSpace News: THE Research Networking Event–Register for the 2017 VIVO Conference by June 30 and SAVE $100

    planet code4lib - Thu, 2017-06-22 00:00

    From the organizers of the 2017 VIVO Conference

    The 2017 VIVO Conference is all about research networking! If this topic, and creating an integrated record of the scholarly work of your organization, are of interest, then the 2017 VIVO Conference is the place to be Aug 2-4 in New York City. Institutions with production VIVOs, as well as those considering implementing VIVO, will be in attendance, presenting their work and/or offering workshops.

    DuraSpace News: DuraSpace Launches New Web Site

    planet code4lib - Thu, 2017-06-22 00:00

    DuraSpace has a lot to celebrate in 2017. Our community-supported open source technologies continue to contribute to advancing the access and preservation goals of our member organizations and beyond. The DuraSpace hosted services team is onboarding new customers, while at the same time contributing to efforts to offer new technologies that provide full, hosted access to, control of, and protection for your content.

    District Dispatch: House expected to approve CTE reauthorization

    planet code4lib - Wed, 2017-06-21 20:34

    Perkins CTE Program helps library patrons thrive in 21st Century Economy

    Libraries play numerous roles in communities across the country, working generally to meet the needs of their patrons at every life stage. Whether providing high-speed broadband access to rural and urban communities alike, running youth reading sessions and book clubs, teaching computer literacy to patrons seeking to learn new skills or aiding small businesses, libraries serve as learning centers helping patrons along their career paths.

    Libraries also play a valuable and specific role in supporting and working to improve secondary and postsecondary Career and Technical Education (CTE) programs funded by the Carl D. Perkins Career and Technical Education Act (“Perkins Act”), the federal bill which governs the more than $1 billion in federal funding for career and technical education activities across the country. Such programs help equip youth and adults with the academic, technical and employability skills and knowledge needed to secure employment in today’s high-growth industries. In so doing, libraries help close the “skills gap” and expand economic opportunity to more communities across the nation. Some libraries work directly with their state labor and employment offices to implement CTE programs which receive Federal funding.

    Libraries and certified librarians also provide valuable CTE resources, equipment, technology, instructional aids, and publications designed to strengthen and foster academic and technical skills achievement. In many communities, libraries play a significant role in career and technical development. Often the library is the only place where patrons can access the high-speed broadband vital to those applying for jobs, researching careers, and working toward enhanced certification and training.

    As early as this week, the House of Representatives is expected to pass legislation reauthorizing the Perkins Act, which was originally adopted in 1984. ALA recently submitted a letter to the House Committee on Education and Workforce supporting this bi-partisan legislation: the Career and Technical Education for the 21st Century Act (H.R. 2353), which was approved by the Committee on June 6.

    The House timed the vote on the reauthorization to occur during the National Week of Making spearheaded by the Congressional Maker Caucus. The week highlights the growing maker movement across the country.

    We’ve been here before, however, as the House passed similar legislation in 2016 only to see reauthorization of the Perkins program stall in the Senate, where a companion bill has yet to be introduced. Unfortunately, the President’s budget seeks to cut $168.1 million from the Perkins CTE State Grant program, which had previously received $1.118 billion in funding for FY15, FY16 and FY17. ALA will continue work to support robust funding for CTE programs and, if the House acts favorably, to urge the Senate to follow its lead and promptly reauthorize the Perkins Act.

    The post House expected to approve CTE reauthorization appeared first on District Dispatch.

    LITA: Jobs in Information Technology: June 21, 2017

    planet code4lib - Wed, 2017-06-21 19:39

    New vacancy listings are posted weekly on Wednesday at approximately 12 noon Central Time. They appear under New This Week and under the appropriate regional listing. Postings remain on the LITA Job Site for a minimum of four weeks.

    New This Week

    Purchase College – State University of New York, Client Systems Administrator, Purchase, NY

    EBSCO Information Services, Product Owner, FOLIO, Ipswich, MA

    Pacific States University, University Librarian, Los Angeles, CA

    Visit the LITA Job Site for more available jobs and for information on submitting a job posting.

    Galen Charlton: Mashcat at ALA Annual 2017 + shared notes

    planet code4lib - Wed, 2017-06-21 19:34

    I’m leaving for Chicago tomorrow to attend ALA Annual 2017 (and to eat some real pizza), and while going over the schedule I found some programs that may be of interest to Mashcat folk:

    As a little experiment, I’ve started a Google Doc for shared notes about events and other goings-on at the conference. There will of course be a lot of coverage on social media about the conference, but the shared notes doc might be a way for Mashcatters to identify common themes.

    LITA: Ellyssa Kroski Receives 2017 LITA/Library Hi Tech Award

    planet code4lib - Wed, 2017-06-21 18:01

    Ellyssa Kroski has been named the winner of the 2017 LITA/Library Hi Tech Award for Outstanding Communication in Library and Information Technology.

    The Library and Information Technology Association (LITA) and Emerald Group Publishing sponsor the Award, which recognizes outstanding individuals or institutions for their long-term contributions in the area of Library and Information Science technology and its application. The winner receives a citation and a $1,000 stipend.

    The Award Committee selected Kroski because of her passion for teaching and technology, as well as her impact in public libraries, academic libraries, and with library school students. As a technology expert, an editor, contributor, and compiler of books, she has helped make technology and makerspaces accessible for many different types of institutions.

    Kroski is the Director of Information Technology at the New York Law Institute as well as an award-winning editor and author of 36 books including Law Librarianship in the Digital Age for which she won the AALL’s 2014 Joseph L. Andrews Legal Literature Award. Her ten-book technology series, The Tech Set, won the ALA’s Best Book in Library Literature Award in 2011. She is a librarian, an adjunct faculty member at Drexel and San Jose State University, and an international conference speaker.

    When notified she had won the Award, Kroski said, “I am incredibly honored and very pleased to receive this award from LITA. I have been fortunate enough to collaborate with and be inspired by many amazingly talented librarians throughout my professional development activities and to all of those colleagues I extend my thanks and share this honor as well.”

    Members of the 2017 LITA/Library Hi Tech Award Committee are: Vanessa L. Ames (Chair) and Robert Wilson.

    Thank you to Emerald Publishing and Library Hi Tech for sponsoring this award.

    District Dispatch: ALA celebrates World Wi-Fi Day

    planet code4lib - Wed, 2017-06-21 16:10

    Among all their other functions in our communities, libraries are critical spaces for people to access the internet, and they are increasingly doing so wirelessly via Wi-Fi.

    Virtually all public libraries in the U.S. provide Wi-Fi to patrons. By doing so, libraries serve as community technology hubs that enable digital opportunity and full participation in the nation’s economy. Wi-Fi is a critical part of how libraries are transforming our programs and services in the digital age.

    June 20th was World Wi-Fi Day, a global initiative helping to bridge the digital divide as well as recognizing and celebrating the role of Wi-Fi in cities and communities around the world. In Washington, D.C., the WifiForward Coalition—of which ALA is a founding member—held a kick off celebration at the Consumer Technology Association’s Innovation House off of Capitol Hill. Congressman Darrell Issa (R-CA) and Federal Communications Commissioner Michael O’Rielly were on hand to expound on the wonders of Wi-Fi and to voice their support for policies that would help its growth and success.

    ALA added the following statement to materials for World Wi-Fi Day:

    “With Wi-Fi, our nation’s 120,000 libraries are able to dramatically increase our capacity to connect people of all incomes and backgrounds to the Internet beyond our public desktop computers. Wi-Fi allows us to serve more people anywhere in the library, as well as enabling mobile technology training labs, roving reference, access to diverse digital collections and pop-up library programs and services. Library wi-fi is essential to support The E’s of Libraries®—Education, Employment, Entrepreneurship, Empowerment and Engagement—on campuses and in communities nationwide. The American Library Association is proud to be a supporter of World Wi-Fi Day.”

    The post ALA celebrates World Wi-Fi Day appeared first on District Dispatch.

    Access Conference: Opening Keynote – Dr. Kimberly Christen

    planet code4lib - Wed, 2017-06-21 15:41

    We are excited to announce that Dr. Kimberly Christen will be the opening keynote speaker for Access 2017. Join us for her aptly titled talk “The Trouble with Access”.

    Dr. Kimberly Christen is the Director of the Digital Technology and Culture Program and the co-Director of the Center for Digital Scholarship and Curation at Washington State University.

    She is the founder of Mukurtu CMS, an open source community archive platform designed to meet the needs of Indigenous communities; the co-Director of the Sustainable Heritage Network, a global community providing educational resources for stewarding digital heritage; and co-Director of the Local Contexts initiative, an educational platform to support the management of intellectual property, specifically using Traditional Knowledge Labels.

    More of her work can be found at her website: and you can follow her on Twitter @mukurtu.

    Open Knowledge Foundation: Always Already Computational Reflections

    planet code4lib - Wed, 2017-06-21 14:07

    Always Already Computational is a project bringing together a variety of different perspectives to develop “a strategic approach to developing, describing, providing access to, and encouraging reuse of collections that support computationally-driven research and teaching” in subject areas relating to library and museum collections.  This post is adapted from my Position Statement for the initial workshop.  You can find out more about the project at

    Earlier this year, I spent two and a half days in beautiful University of California Santa Barbara at a workshop speaking with librarians, developers, and museum and library collection managers about data.  Attendees at this workshop represented a variety of respected cultural institutions including the New York Public Library, the British Library, the Internet Archive, and others.

    Our task was to build a collective sense of what it means to treat library and museum “collections”—the (increasingly) digitized catalogs of their holdings—as data for analysis, art, research, and other forms of re-use.  We gathered use cases and user stories in order to start the conversation on how to best publish collections for these purposes.  Look for further outputs on the project website: .  For the moment, here are my thoughts on the experience and how it relates to work at Open Knowledge International, specifically, Frictionless Data.

    Always Already Computational

    Open Access to (meta)Data

    The event organizers—Thomas Padilla (University of California Santa Barbara), Laurie Allen (University of Pennsylvania), Stewart Varner (University of Pennsylvania), Sarah Potvin (Texas A&M University), Elizabeth Russey Roke (Emory University), Hannah Frost (Stanford University)—took an expansive view of who should attend.  I was honored and excited to join, but decidedly new to Digital Humanities (DH) and related fields.  The event served as an excellent introduction, and I now understand DH to be a set of approaches toward interrogating recorded history and culture with the power of our current tools for data analysis, visualization, and machine learning.  As part of the Frictionless Data project at Open Knowledge International, we are building apps, libraries, and specifications that support the basic transport and description of datasets to aid in this kind of data-driven discovery.  We are trialling this approach across a variety of fields, and are interested to determine the extent to which it can improve research using library and museum collection data.

    What is library and museum collection data?  Libraries and museums hold physical objects which are often (although not always) shared for public view on the stacks or during exhibits.  Access to information (metadata) about these objects—and the sort of cultural and historical research dependent on such access—has naturally been somewhat technologically, geographically, and temporally restricted.  Digitizing the detailed catalogues of the objects libraries and museums hold surely lowered the overhead of day-to-day administration of these objects, but also provided a secondary public benefit: sharing this same metadata on the web with a permissive license allows a greater variety of users in the public—researchers, students of history, and others—to freely interrogate our cultural heritage in a manner they choose.  

    There are many different ways to share data on the web, of course, but they are not all equal.  A low impact, open, standards-based set of approaches to sharing collections data that incorporates a diversity of potential use cases is necessary.  To answer this need, many museums are currently publishing their collection data online, with permissive licensing, through GitHub: The Tate Galleries in the UK, Cooper Hewitt, Smithsonian Design Museum and The Metropolitan Museum of Art in New York have all released their collection data in CSV (and JSON) format on this popular platform normally used for sharing code.  See A Nerd’s Guide To The 2,229 Paintings At MoMA and An Excavation Of One Of The World’s Greatest Art Collections both published by FiveThirtyEight for examples of the kind of exploratory research enabled by sharing museum collection data in bulk, in a straightforward, user-friendly way.  What exactly did they do, and what else may be needed?

    Packages of Museum Data

    Our current funding from the Sloan Foundation enables us to focus on this researcher use case for consuming data.  Across fields, the research process is often messy, and researchers, even if they are asking the right questions, possess a varying level of skill in working with datasets to answer them.  As I wrote in my position statement:

    Such data, released on the Internet under open licenses, can provide an opportunity for researchers to create a new lens onto our cultural and artistic history by sparking imaginative re-use and analysis.  For organizations like museums and libraries that serve the public interest, it is important that data are provided in ways that enable the maximum number of users to easily process it.  Unfortunately, there are not always clear standards for publishing such data, and the diversity of publishing options can cause unnecessary overhead when researchers are not trained in data access/cleaning techniques.

    My experience at this event, and some research beforehand, suggested that there is a spectrum of data release approaches ranging from a basic data “dump” as conducted by the museums referenced above to more advanced, though higher investment, approaches such as publishing data as an online service with a public “API” (Application Programming Interface).  A public API can provide a consistent interface to collection metadata, as well as an ability to request only the needed records, but comes at the cost of having the nature of the analysis somewhat preordained by its design.  In contrast, in the data dump approach, an entire dataset, or a coherent chunk of it, can be easier for some users to access and load directly into a tool like R (see this UK Government Digital Service post on the challenges of each approach) without needing advanced programming.  As a format for this bulk download, CSV is the best choice, as MoMA reflected when releasing its collection data online:

    CSV is not just the easiest way to start but probably the most accessible format for a broad audience of researchers, artists, and designers.  

    This, of course, comes at the cost of a less consistent interface for the data, especially in the case of the notoriously underspecified CSV format.  The README file will typically go into some narrative detail about how best to use the dataset and note some expected “gotchas” (e.g. “this UTF-8 file may not work well with Excel on a Mac”).  It might also list the columns in a tabular data file stored in the dataset, along with the expected types and formats for values in each column (e.g. the date_acquired column should, hopefully, contain dates in one or another international format).  This information is critical for actually using the data, and the automated export process that generates the public collection dataset from the museum’s internal database may try to ensure that the data matches expectations, but bugs exist, and small errors may go unnoticed in the process.

    The Data Package descriptor (described in detail on our specifications site), used in conjunction with Data Package-aware tooling, is meant to somewhat restore the consistent interface provided by an API by embedding this “schema” information with the data.  This allows the user or the publisher to check that the data conforms to expectations without requiring modification of the data itself: a “packaged” CSV can still be loaded into Excel as-is (though without the benefit of type checking enabled by the Data Package descriptor).  The Carnegie Museum of Art, in its release of its collection data, follows the examples set by the Tate, the Met, MoMA, and Cooper Hewitt as described above, but opted to also include a Data Package descriptor file to help facilitate online validation of the dataset through tools such as Good Tables.  As tools come online for editing, validating, and transforming Data Packages, users of this dataset should be able to benefit from those, too.
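
    To make the idea of embedded “schema” information concrete, here is a minimal sketch in Python using only the standard library.  It is not the Data Package tooling itself and not the Carnegie Museum of Art’s actual descriptor: the dataset name, file path, and the id and title columns are invented for illustration (only date_acquired is borrowed from the example above), and the check covers just three types.

```python
import csv
import io
import json
from datetime import date

# Hypothetical descriptor: the dataset name, path, and columns are invented
# for illustration and do not describe any real museum's release.
DESCRIPTOR = json.loads("""
{
  "name": "example-museum-collection",
  "resources": [
    {
      "path": "collection.csv",
      "schema": {
        "fields": [
          {"name": "id", "type": "integer"},
          {"name": "title", "type": "string"},
          {"name": "date_acquired", "type": "date"}
        ]
      }
    }
  ]
}
""")

# Casting functions for the few types used above; each raises ValueError
# when a value does not fit the declared type.
CASTS = {
    "integer": int,
    "string": str,
    "date": date.fromisoformat,  # assumes dates are written as YYYY-MM-DD
}


def check_rows(csv_text, schema):
    """Return (row_number, column, value) for every value that fails the schema."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader, start=2):  # row 1 is the header
        for field in schema["fields"]:
            value = row.get(field["name"])
            try:
                if value is None:
                    raise ValueError("missing column")
                CASTS[field["type"]](value)
            except ValueError:
                problems.append((row_number, field["name"], value))
    return problems


# Invented sample data: the second record has a non-numeric id and a free-text date.
SAMPLE_CSV = "id,title,date_acquired\n1,Untitled,1965-03-02\ntwo,Landscape,c. 1900\n"

SCHEMA = DESCRIPTOR["resources"][0]["schema"]
PROBLEMS = check_rows(SAMPLE_CSV, SCHEMA)
for problem in PROBLEMS:
    print(problem)
```

    The CSV itself is untouched by the check, which is the point: the descriptor sits alongside the data, and a user who ignores it can still open the file in a spreadsheet.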

    We are a partner in the Always Already Computational: Collections as Data project, and as part of this work, we are working with Carnegie Museum of Art to provide a more detailed look at the process that went into the creation of the CMOA dataset, as well as sketching potential ways in which the Data Package might help enable re-use of this data.  In the meantime, check out our other case studies on the use of Data Package in fields as diverse as ecology, cell migration, and energy data.

    Also, pay your local museum or library a visit.

    Library Tech Talk (U of Michigan): Software contributions reduce our debt

    planet code4lib - Wed, 2017-06-21 00:00

    Contributing to software projects can be harder and more time consuming than coding customized solutions. But over the long term, writing generalized solutions that can be used and contributed to by developers from around the world reduces our dependence on ourselves and our organizational resources, thus drastically reducing our technical debt.

    DuraSpace News: WATCH New Hyku, Hyrax, and the Hydra-in-a-Box Project Demo

    planet code4lib - Wed, 2017-06-21 00:00

    From Michael J. Giarlo, Technical Manager, Hydra-in-a-Box Project, Software Architect, on behalf of the Hyku tech team

    Stanford, CA: Here's the latest demo of advances made on Hyku, Hyrax, and the Hydra-in-a-Box project.

    DuraSpace News: Visit

    planet code4lib - Wed, 2017-06-21 00:00

    DuraSpace is pleased to announce that the new HykuDirect web site is up and running, and ready to field inquiries about the exciting new hosted service currently in development:

    • The site features Hyku background information, a complete list of key features, a timeline that lays out the steps towards availability of a full-production service, and a contact form.

    LITA: Timothy Cole Wins 2017 LITA/OCLC Kilgour Research Award

    planet code4lib - Tue, 2017-06-20 22:20

    Timothy Cole, Head of the Mathematics Library and Professor of Library and Information Science at the University of Illinois Urbana-Champaign, has been selected as the recipient of the 2017 Frederick G. Kilgour Award for Research in Library and Information Technology, sponsored by OCLC and the Library and Information Technology Association (LITA). Professor Cole also holds appointments in the Center for Informatics Research in Science and Scholarship (CIRSS) and the University Library.

    The Kilgour Award is given for research relevant to the development of information technologies, especially work which shows promise of having a positive and substantive impact on any aspect(s) of the publication, storage, retrieval and dissemination of information, or the processes by which information and data is manipulated and managed. The winner receives $2,000, a citation, and travel expenses to attend the LITA Awards Ceremony & President’s Program at the 2017 ALA Annual Conference in Chicago (IL).

    Over the past 20 years, Professor Cole’s research in digital libraries, metadata design and sharing, and interoperable linked data frameworks has significantly enhanced discovery of and access to scholarly content, which embodies the spirit of this prestigious Award. His extensive publication record includes research papers, books, and conference publications, and he has earned more than $11 million in research grants during his career.

    The Award Committee also noted Professor Cole’s significant contributions to major professional organizations including the World Wide Web Consortium (W3C), Digital Library Federation, and Open Archives Initiative, all of which help set the standards in metadata and linked data practices that influence everyday processes in libraries. We believe his continuing work on Linked Open Data will further improve how information is discovered and accessed. With all of Professor Cole’s research and service contributions, the Committee unanimously found him to be the ideal candidate to receive the 2017 Frederick G. Kilgour Award.

    When notified he had been selected, Professor Cole said, “I am honored and very pleased to accept this Award. Fred Kilgour’s recognition more than 50 years ago of the ways that computers and computer networks could improve both library services and workflow efficiencies was remarkably prescient, and his longevity and consistent success in this dynamic field was truly amazing. Many talented librarians have built on his legacy, and over the course of my career, I have found the opportunity to meet, learn from, and work with many of these individuals, including several prior Kilgour awardees, truly rewarding. I have been especially fortunate in my opportunities and colleagues at Illinois — notably (to name but three) Bill Mischo, Myung-Ja Han, and Muriel Foulonneau — as well as in my collaborations with other colleagues across the globe. It is these collaborations that account in large measure for the modest successes I have enjoyed. I am humbled by and most appreciative of the Award Committee for giving me this opportunity to join the ranks of Kilgour awardees.”

    Members of the 2017 Kilgour Award Committee are: Tabatha Farney (Chair), Ellen Bahr, Matthew Carruthers, Zebulin Evelhoch, Bohyun Kim, Colby Riggs, and Roy Tennant (OCLC Liaison).

    Thank you to OCLC for sponsoring this award.

    Library of Congress: The Signal: Hack-to-Learn at the Library of Congress

    planet code4lib - Tue, 2017-06-20 20:52

    When hosting workshops, such as Software Carpentry, or events, such as Collections As Data, our National Digital Initiatives team made a discovery: there is an appetite among librarians for hands-on computational experience. That’s why we created an inclusive hackathon, or a “hack-to-learn,” taking advantage of the skills librarians already have and pairing them with programmers to mine digital collections.

    Hack-to-Learn took place on May 16-17 in partnership with George Mason and George Washington University Libraries. Over the two days, 61 attendees used low or no-cost computational tools to explore four library collections as data sets. You can see the full schedule here.

    Day two of the workshop took place at George Washington University Libraries. Here, George Oberle III, History Librarian at George Mason University, gives a Carto tutorial. Photo by Justin Littman, event organizer.

    The Data Sets

    The meat of this event was our ability to provide library collections as data to explore, and with concerted effort we were able to make a diverse set available and accessible.

    In the spring, the Library of Congress released 25 million of its MARC records for free bulk download. Some have already been working with the data – Ben Schmidt was able to join us on day one to present his visual hacking history of MARC cataloging and Matt Miller made a list of 9 million unique titles. We thought these cataloging records would also be a great collection for hack-to-learn attendees because the format is well-structured and familiar for librarians.

    The Eleanor Roosevelt Papers Project at George Washington University shared its “My Day” collection: Roosevelt’s daily syndicated newspaper column and the closest thing we have to her diary. George Washington University Libraries contributed their Tumblr End of Term Archive: text and metadata from 72 federal Tumblr blogs harvested as part of the End of Term Archive project.

    Topic modelling in MALLET with the Eleanor Roosevelt “My Day” collection. MALLET generates a list of topics from a corpus and keywords composing those topics. An attendee suggested it would be a useful method for generating research topics for students (and we agree!).

    As excitement for hack-to-learn grew, the Smithsonian joined the fun by providing their Phyllis Diller Gag file. Donated to the Smithsonian American History Museum, the gag file is a physical card catalog containing 52,000 typewritten joke cards the comedian organized by subject. The Smithsonian Transcription Center put these joke cards online, and they were transcribed by the public in just a few weeks. Our event was the first time these transcriptions were used.

    Gephi network analysis visualization of the Phyllis Diller Gag file. The circles (or nodes) represent joke authors and their relationship to each other based on their joke subjects.

    To encourage immediate access to the data and tools, we spent a significant amount of time readying these four data sets so ready-to-load versions were available. For the MARC records to be amenable to the mapping tool Carto, for example, Wendy Mann, Head of George Mason University Data Services, had to reduce the size of the set, then convert the 1,000-row files to CSV using MarcEdit, map the MARC fields as column headings, create load files for MARC fields in each file, and then mass edit column names in OpenRefine so that each field name began with a letter rather than a number (a Carto requirement).
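
    For readers who would rather script the last of those steps, the column renaming might be sketched in Python as below.  This is an illustration only, not the workflow used for the event (that was done in OpenRefine): the "marc_" prefix, the sample headers, and the sample row are all assumptions made for the example, and the only rule applied is the one described above, that a column name should not begin with a number.

```python
import csv
import io


def letter_first_headers(headers, prefix="marc_"):
    """Prefix every header that does not already start with a letter.

    The "marc_" prefix is an arbitrary choice made for this sketch.
    """
    return [h if h[:1].isalpha() else prefix + h for h in headers]


def rename_csv_headers(csv_text, prefix="marc_"):
    """Return the CSV text with only its header row rewritten."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return csv_text
    rows[0] = letter_first_headers(rows[0], prefix)
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


# Invented sample: headers that look like MARC field tags, plus one data row.
SAMPLE = "001,245a,260c\n12345,An example title,1999\n"
RENAMED = rename_csv_headers(SAMPLE)
print(RENAMED)
```

    Only the header row changes; the data rows pass through as they were, so the renamed file can be compared line for line against the MarcEdit output.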

    We also wanted to be transparent about this work so attendees could re-create these workflows after hack-to-learn. We bundled the data sets in their multiple versions of readiness, README files, a list of resources, a list of brainstorming ideas of what possible questions to ask of the data, and install directions for the different tools all in a folder that was available for attendees a week before the event. We invited attendees to join a Slack channel to ask questions or report errors before and during the event, and opened day one with a series of lightning talks about the data sets from content and technical experts.

    What Was Learned

    Participants were largely librarians, faculty or students from our three partner organizations. Twelve seats were opened to the public and quickly filled by librarians, faculty or students from universities or cultural heritage institutions. Based on our registration survey, the majority of participants trended towards little or no experience. Almost half reported experience with OpenRefine, while 44.8% reported having never used any of the tools before. 49.3% wanted to learn about “all” methodologies (data cleaning, text mining, network analysis, etc.), and 46.3% reported interest specifically in text mining.

    31.3% of hack-to-learn registrants were curious about computational research and wanted an introduction, and 28.4% were familiar with some tools but not all. 14.9% thought it sounded fun!

    Twenty-one attendees responded to our post-event survey. Participants confirmed that collections as data work felt less “intimidating” and the tools more “approachable.” Respondents reported a recognition of untapped potential in their data sets and requested more events of this kind.

    “I was able to get results using all the tools, so in a sense everything worked well. Pretty sure my ‘success’ was related to the scale of task I set for myself; I viewed the work time as time for exploring the tools, rather than finishing something.”

    Many appreciated the event’s diversity: the diversity of data sets and tools, the mixture of subject matter and technical experts, and the mix between instructional and problem-solving time.

    “The tools and datasets were all well-selected and gave a good overview of how they can be used. It was the right mix of easy to difficult. Easy enough to give us confidence and challenging enough to push our skills.”

    The Phyllis Diller team works with OpenRefine at Hack-to-Learn, May 17, 2017. Photo by Shawn Miller.

    When asked what could be improved, many felt that identifying what task to do or question to ask of the data set was difficult, and attendees often underestimated the data preparation step. We received suggestions such as adding guided exercises with the tools before independent work and more time for digging deeper into a particular methodology or research question.

    “It was at first overwhelming but ultimately hugely beneficial to have multiple tools and multiple data sets to choose from. All this complexity allowed me to think more broadly about how I might use the tools, and having data sets with different characteristics allowed for more experimentation.”

    Most importantly, attendees identified what still needed to be learned. Insights from the event related to the limitations of the tools. For example, attendees recognized that graphical interfaces were accessible and useful for surface-level investigation of a data set, but that command-line knowledge was needed for deeper investigation or, in some cases, for working with a larger data set. Several participants in the post-event survey showed interest in learning Python as a result.

    Recognizing what they didn’t know was not discouraging. In fact, one point we heard from multiple attendees was the desire for more hack-to-learn events.

    “If someone were to host occasional half-day or drop-in hack-a-thons with these or other data sets, I would like to try again. I especially appreciate that you were welcoming of people like me without a lot of programming experience … Your explicit invitation to people with *all* levels of experience was the difference between me actually doing this and not doing it.”

    We’d like to send a big thank you again to our partners at George Washington and George Mason University Libraries, and to the Smithsonian American History Museum and Smithsonian Transcription Center for your time and resources to make Hack-to-Learn a success! We encourage anyone reading this to consider doing one at your library, and if you do, let us know so we can share it on The Signal!



    LITA: Learn about Contextual Inquiry, after ALA Annual

    planet code4lib - Tue, 2017-06-20 18:42

    Sign up today for 

    Contextual Inquiry: Using Ethnographic Research to Impact your Library UX

    This new LITA web course begins July 6, 2017, shortly after ALA Annual. Use the excitement generated by the conference to further explore new avenues to increase your user engagement. The contextual inquiry research methodology helps to better understand the intents and motivations behind user behavior. The approach involves in-depth, participant-led sessions where users take on the role of educator, teaching the researcher by walking them through tasks in the physical environment in which they typically perform them.

    Instructors: Rachel Vacek, Head of Design & Discovery, University of Michigan Library; and Deirdre Costello, Director, UX Research, EBSCO Information Services
    July 6 – August 10, 2017
    Register here; courses are listed by date, and you need to log in

    In this session, learn what’s needed to conduct a Contextual Inquiry and how to analyze the ethnographic data once collected. We’ll talk about getting stakeholders on board, the Institutional Review Board (IRB) process, and scalability for different-sized library teams. We’ll cover how to synthesize and visualize your findings as sequence models and affinity diagrams that directly inform the development of personas and common task flows. Finally, learn how this process can help guide your design and content strategy efforts while constructing a rich picture of the user experience.

    View details and Register here.

    This is a blended format web course

    The course will be delivered as separate live webinar lectures, one per week. You do not have to attend the live lectures in order to participate. The webinars will be recorded for later viewing.

    Check the LITA Online Learning web page for additional upcoming LITA continuing education offerings.

    Questions or Comments?

    For all other questions or comments related to the course, contact LITA at (312) 280-4268 or Mark Beatty,

    Karen Coyle: Pray for Peace

    planet code4lib - Tue, 2017-06-20 17:56
    This is a piece I wrote on March 22, 2003, two days after the beginning of the second Gulf war. I just found it in an old folder, and sadly have to say that things have gotten worse than I feared. I also note an unfortunate use of terms like "peasant" and "primitive" but I leave those as a recognition of my state of mind/information. Pray for peace.

    Saturday, March 22, 2003
    Gulf War II
    The propaganda machine is in high gear, at war against the truth. The bombardments are constant and calculated. This has been planned carefully over time.
    The propaganda box sits in every home showing footage that it claims is of a distant war. We citizens, of course, have no way to independently verify that, but then most citizens are quite happy to accept it at face value.
    We see peaceful streets by day in a lovely, prosperous and modern city. The night shots show explosions happening at a safe distance. What is the magical spot from which all of this is being observed?
    Later we see pictures of damaged buildings, but they are all empty, as are the streets. There are no people involved, and no blood. It is the USA vs. architecture, as if the city of Baghdad itself is our enemy.
    The numbers of casualties, all of them ours, all of them military, are so small that each one has an individual name. We see photos of them in dress uniform. The families state that they are proud. For each one of these there is the story from home: the heavily made-up wife who just gave birth to twins and is trying to smile for the camera, the child who has graduated from school, the community that has rallied to help re-paint a home or repair a fence.
    More people are dying on the highways across the USA each day than in this war, according to our news. Of course, even more are dying around the world of AIDS or lung cancer, and we aren't seeing their pictures or helping their families. At least not according to the television news.
    The programming is designed like a curriculum with problems and solutions. As we begin bombing, the networks show a segment in which experts explain the difference between the previous Gulf War's bombs and those used today. Although we were assured during the previous war that our bombs were all accurately hitting their targets, word got out afterward that in fact the accuracy had been dismally low. Today's experts explain that the bombs being used today are far superior to those used previously, and that when we are told this time that they are hitting their targets it is true, because today's bombs really are accurate.
    As we enter and capture the first impoverished, primitive village, a famous reporter is shown interviewing Iraqi women living in the USA who enthusiastically assure us that the Iraqi people will welcome the American liberators with open arms. The newspapers report Iraqis running into the streets shouting "Peace to all." No one suggests that the phrase might be a plea for mercy by an unarmed peasant facing a soldier wearing enough weaponry to raze the entire village in an eye blink.
    Reporters riding with US troops are able to phone home over satellite connections and show us grainy pictures of heavily laden convoys in the Iraqi desert. Like the proverbial beasts of burden, the trucks are barely visible under their packages of goods, food and shelter. What they are bringing to the trade table is different from the silks and spices that once traveled these roads, but they are carrying luxury goods beyond the ken of many of Iraq's people: high tech sensor devices, protective clothing against all kinds of dangers, vital medical supplies and, perhaps even more important, enough food and water to feed an army. In a country that feeds itself only because of international aid -- aid that has been withdrawn as the US troops arrive -- the trucks are like self-contained units of American wealth motoring past.
    I feel sullied watching any of this, or reading newspapers. It's an insult to be treated like a mindless human unit being prepared for the post-war political fall-out. I can't even think about the fact that many people in this country are believing every word of it. I can't let myself think that the propaganda war machine will win.
    Pray for peace.

