
planet code4lib


Open Knowledge Foundation: FutureTDM: The Future of Text and Data Mining

Fri, 2017-03-03 14:46

Blog written by Freyja van den Boom (FutureTDM researcher) and Lieke Ploeger.

Since September 2015 Open Knowledge International has been working on finding new ways to improve the uptake of text and data mining in the EU, as part of the FutureTDM project. Text and data mining (TDM) is the process of extracting relevant information from large amounts of machine-readable data (such as scientific papers) and recombining this to unlock new knowledge and power innovation (see ‘Techniques, Tools & Technologies for TDM in Europe’). Project partners include libraries, publishers and universities, but also the non-profit organisation ContentMine that advocates for the right to mine content. Open Knowledge International leads the work on communication, mobilisation and networking and undertakes the research into best practices and methodologies.
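As a toy illustration of what "extracting and recombining" can mean in practice (the corpus and function names below are invented for this sketch, not part of any FutureTDM tooling), a minimal term-frequency miner over a set of machine-readable papers might look like:

```python
from collections import Counter
import re

def mine_terms(documents, top_n=3):
    """Count word frequencies across a corpus of machine-readable texts."""
    counts = Counter()
    for doc in documents:
        # Naive tokenisation on letter runs; real pipelines use NLP toolkits.
        counts.update(re.findall(r"[a-z]+", doc.lower()))
    return counts.most_common(top_n)

papers = [
    "Gene expression in zebrafish embryos",
    "Zebrafish gene knockout screening",
]
print(mine_terms(papers))  # → [('gene', 2), ('zebrafish', 2), ('expression', 1)]
```

Real TDM pipelines layer proper tokenisation, entity recognition and licensing checks on top of this idea; the point is only that mining turns many documents into one queryable view.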

A practical example explaining the use of TDM

Because the use of TDM is significantly lower in Europe than in some countries in the Americas and Asia, FutureTDM actively engages with stakeholders in the EU such as researchers, developers, publishers and SMEs to help pinpoint why uptake is lower, raise awareness of TDM and develop solutions. This is especially important now, because an exception for TDM under copyright law is being discussed at the European level. Such an exception would make copyright law less restrictive for TDM carried out under certain circumstances.

Throughout 2016 we organised Knowledge Cafés across Europe as an informal opportunity to gather feedback on text and data mining from researchers, developers, publishers, SMEs and other stakeholder groups working in the field, and held stakeholder consultations with the various communities. In September 2016 we held the first of two workshops to discuss the project’s findings in Brussels, where many MEPs and policymakers were present. In early 2017 a roundtable was organised at the Computer Privacy and Data Protection (CPDP) conference in Brussels, where the impact of data protection regulations on the uptake of advanced data analysis technologies like TDM was discussed.

MEP Julia Reda discussing the upcoming copyright reform at the FutureTDM workshop

Below are some of the insights we have gained through our research so far, including the main barriers for different TDM stakeholder communities. In the upcoming months we will publish more of the results and proposed solutions for overcoming these barriers.

Education and skills
There is a need for more education on the benefits and practical use of TDM for researchers: working together with industry, the publishing community and academia to develop effective courses aimed at different levels, depending on the discipline and type of research likely to use TDM. We are currently working on TDM education and looking for feedback on what the learning outcomes should be. If you are interested in getting involved, contact us!

Legal and policy
There is no clarity about the legal status of TDM practices or of the results gained through TDM. Barriers include uncertainty about the scope of copyright, database protection, and privacy and data protection regulations. See for example our guest blog here.

The current copyright reform discussions focus partly on a TDM exception, which could help provide more clarity. Under discussion is, for example, what data and what use fall under copyright, and whether there should be a distinction between commercial and non-commercial use. FutureTDM partners are monitoring these developments.

We have recently published the FutureTDM policy framework introducing high level principles that should be the foundation of every stakeholder action that aims to promote TDM. These high level principles are:

  • Awareness and Clarity: actions should improve certainty on the use of TDM technologies. Information and clear actions are crucial for a flourishing TDM environment in Europe.
  • TDM without Boundaries: insofar as appropriate, boundaries should be removed to prevent and reduce fragmentation in the TDM landscape.
  • Equitable Access: access to TDM tools and technologies, as well as to sources (such as datasets), is indispensable for a successful uptake of TDM, but usually comes at a price. While the broadest possible access to tools and data should be the aspiration, their providers also have a legitimate interest in restricting access, for example to protect their investments or privacy-related interests.

Technical and infrastructure
The main concerns are access to and the quality of available data. There is confidence that technological development will deliver more reliable and easier-to-use tools and services, although the documentation and findability of relevant tools and services are currently reported as barriers.

Developing standards for data quality is seen as useful but most likely infeasible given the diversity of projects and requirements, which would make the standards too complex to comply with.

Economy and Incentives
Barriers mentioned include the lack of a single European market, the problems of having multiple languages and a lack of enforcement for US companies.

Further research
The interviews and the case studies have provided evidence of and insight into the barriers that exist in Europe. To what extent these barriers can be solved given the different interests of the stakeholders involved remains a topic for further research within the FutureTDM project.

We will continue to work on recommendations, guidelines and best practices for improving the uptake of TDM in Europe, focused on addressing the barriers reported by the main stakeholders. All findings, including policy recommendations, guidelines, case studies, best practices, practical tutorials and how-to guides to increase TDM uptake, are shared through the project platform. The FutureTDM awareness sheets, for example, cover a range of factors that have an impact on TDM uptake and were created from our expert reports, expert interviews and discussions at our Knowledge Café events. The reports completed so far are available from the Knowledge Library.

In the final six months of the FutureTDM project, there are many opportunities to find out more about the results and give your feedback on the situation around TDM in Europe. On 29 March, the second FutureTDM workshop will take place at the European Parliament in Brussels, where your input on TDM experiences on the ground is very welcome. With EU copyright reform now in progress, we are bringing together policymakers and stakeholder groups to share FutureTDM’s findings and our first expert-driven policy recommendations that can help increase TDM uptake in the EU. To find out more and sign up, please check the event page. We will showcase the final project results during the final FutureTDM symposium, organised in conjunction with the International Data Science Conference (12-13 June 2017, Salzburg, Austria).

Our animation explaining TDM and the importance of stakeholder engagement

District Dispatch: OITP takes over @YALSA for Teen Tech Week #TTW17

Thu, 2017-03-02 22:17

ALA’s Young Adult Library Services Association (YALSA) kicks off its annual Teen Tech Week (#TTW17) this Monday. The Office for Information Technology Policy will be participating throughout the week and round it off with a Twitter Takeover on Friday, March 10.

This year’s theme, Be the Source of Change, is perfect for the Libraries Ready to Code project. #TTW17 challenges teens and the libraries that serve them to think about how digital resources add to their creativity, encourage computational thinking and help teens make a positive change in their communities. Libraries #ReadytoCode builds on that by asking library staff to imagine how they can design and facilitate coding programs for youth that connect coding to their interests and passions; build community partnerships; positively influence youth and community perceptions of who can code; and inspire more youth to code.

Sound like a lot? Not really. #ReadytoCode concepts start where libraries are already strong (as in facilitating programs, understanding community demographics, connecting youth to relevant resources), but during #TTW17 we will have a series of guest bloggers who will talk about projects that will inspire you to get started or to try something new if you are already coding. Two things you must try!

  • Check out The Tessera. Tweet the characters through an “Enchanted Crate.” Ada Lovelace (a.k.a. @horselytowers) and others are waiting to communicate with you. Be sure to watch out for S (a.k.a. @otpreny), the enemy of the Tessera society. The design team will be blogging about the game, so check back on Wednesday and follow #thetessera for more.
  • Check out the 2017 Congressional App Challenge and mix coding and civics into one activity. You will hear more about how to get involved next week.

Finally, join the #ReadytoCode team on Friday for a Twitter chat at 3:30 p.m. EST. Tell us what you’ve been doing during #TTW17 and what you do throughout the year to get your teens creative through coding.

Want to learn more about Libraries #ReadytoCode? Listen to this Public Library Association podcast and visit ala.org/librariesreadytocode.

The post OITP takes over @YALSA for Teen Tech Week #TTW17 appeared first on District Dispatch.

District Dispatch: Next CopyTalk on copyright first responders

Thu, 2017-03-02 21:48

Photo by Toshihiro Oimatsu.

Please join us for our next CopyTalk on April 6 with Kyle K. Courtney, Copyright Advisor at Harvard University, who will discuss his innovative copyright service model.

Kyle Courtney of Harvard University’s Office for Scholarly Communication describes how library patrons, faculty, students and staff need more guidance than ever on copyright issues such as fair use, open access, MOOCs, repositories and digitization. These questions are arriving at the library with greater frequency, and Kyle believes a modern, 21st-century library should be equipped to answer them.

The Copyright First Responders (CFR) program has developed the first decentralized model of copyright expertise in an academic setting, relying on a hub-and-spoke model to answer questions from the communities associated with certain libraries. The librarians — each with their own focus, specialty, degrees and training — are in the best position to be trained to answer copyright questions from their respective communities. Copyright training should therefore be layered on top of that subject expertise, resulting in a systemic shift in copyright knowledge throughout the academic setting: the library becomes the focus of copyright inquiry and policy. The presentation will examine the types of copyright questions received, note the thematic uniformity of larger copyright questions, present success metrics on questions answered and lessons learned, and share best practices for creating a CFR program.

Day: Join us Thursday, April 6, for an hour-long free webinar
Time: 2 p.m. Eastern / 11 a.m. Pacific
Link: Go to and sign in as a guest. You’re in!

This program is brought to you by OITP’s copyright education subcommittee. An archive of previous webinars is available.

The post Next CopyTalk on copyright first responders appeared first on District Dispatch.

OCLC Dev Network: DEVCONNECT Speakers Announced

Thu, 2017-03-02 21:00

Speakers announced for first annual OCLC DevCONnect conference to be held at OCLC headquarters in Dublin, Ohio on May 8 and 9, 2017

District Dispatch: And the James Madison Award goes to… Senator Jon Tester

Thu, 2017-03-02 20:07

Senator Jon Tester, MT, is this year’s  James Madison Awardee. The award honors individuals or groups who have championed, protected and promoted public access to government information and the public’s right to know.

The American Library Association Washington Office is pleased to announce that the winner of the 2017 James Madison Award is Senator Jon Tester of Montana. The award will be presented by ALA Past President Sari Feldman in a ceremony at the Newseum in Washington, D.C., on Wednesday, March 15. Even if you can’t join us in Washington, you will be able to watch the program wherever you have an internet connection! This year the event will be streamed live from the Knight TV Studio in the Newseum at

The James Madison Award is presented annually near the anniversary of President Madison’s birth. The award honors individuals or groups who have championed, protected and promoted public access to government information and the public’s right to know.

This year’s award will be presented during Sunshine Week, the annual nationwide celebration of open access to public information and what it means for communities across the country, as part of an event sponsored by our partners: the Project On Government Oversight (POGO), the Newseum Institute and the American Society of News Editors. Following the award ceremony, a panel discussion on “Accessing Information in the Trump Administration” will offer practical advice for journalists, advocates and others who rely on access to government data for informing the public and holding the government accountable.

You are invited to celebrate the free flow of information with us. Space is limited, so be sure to RSVP if you plan to join us in person.

WHAT: 2017 James Madison Award presentation

WHEN: Wednesday, March 15, 2017, 1:30 p.m. EST

WHERE: Knight TV Studio, Newseum, 555 Pennsylvania Ave NW, Washington, DC 20001


The program will be live streamed and available at 1:30 p.m. EST, March 15, at

The post And the James Madison Award goes to… Senator Jon Tester appeared first on District Dispatch.

Islandora: Report from a release dimension: 7.x-1.9 Code freeze is a reality

Thu, 2017-03-02 18:53

Good Morning inhabitants of Islandora Land,

After a long 1st of March that involved a bit of 2nd of March of re: auditing audits, pull request reviews, last minute fixes, cleaning, pulling, IRC, coffee (hot and cold) and merging, 7.x-1.9 Code Freeze is a reality! 

All thanks to a wonderful group of committed committers, maintainers and volunteers!

Some conclusions (skip if more interested in the next steps):
README and LICENSE audits happened. Green marks on the spreadsheet required some interpretation, but we did it. Many licences were normalized, and even submodules and sub-submodules that had been missing pieces since forever got our attention.

Many enhancements and a few new things made it into this 7.x-1.9 release branch. A group of involved committers worked hard yesterday to test pulls, answer all review questions and address issues. Travis-CI played its part in making this more difficult, even requiring some small bending of the rules. That will probably lead to a future discussion: maybe it would be a good idea not to submit pull requests that are not part of the release, or not urgent, on the day of the code freeze itself, to avoid a long wait in a single queue. Just some morning ideas.

All modules (except tuque, which uses a different versioning convention) got a 7.x-1.9RC1 provisional version in their .info files, to make sure nobody installs them "thinking" they are final. This is different from other releases, where versioning was added later in the game. But this is a different release! Things like auditing being done before code freeze allowed us some new approaches.

Next step:
Everything reported in JIRA as a Bug (and proven so) can still be fixed and can eventually make it into our release. Wrong documentation can also be fixed, meaning files, even when already audited, will be part of the next process, and any missing info can be added to them. Again, to make any of this happen we will require two pull requests: one to the 7.x and one to the 7.x-1.9 branch of the affected modules. No code tasks, enhancements or improvements are permitted in 7.x-1.9 branches from now on. That is why we call it a frozen branch.

I will create an updated and downloadable VM in the next 10 days (deadline March 13) to allow testers to do what they are famous for: test.

Again, I know for most of you there is nothing new here, but just wanted to let you know this happened (again). And again, for those new or just curious, don't hesitate to reach out or get motivated to get involved.

Component Manager, Auditors, Committers:
Please give your pet modules a look to see if I did my job correctly. You should see a 7.x-1.9 branch in your module, and .info files should have the 7.x-1.9RC1 version inside. If not, I'm a human being: ping me and I will fix that.


Thanks to all of you for making this important part of our Islandora 7.x-1.9 release happen! Special mention to Melissa Anez, who not only served as communications facilitator and translator but also as an active committer, doing a huge amount of pulling and merging, helping me stay sane, and extending her workday to be around in case something exploded. Also to all those who fixed their pulls to address issues yesterday, those who reviewed those pull requests, and those who merged last-minute pulls. You are all great! Proud to be part of this community.

FOSS4Lib Recent Releases: veraPDF - 1.2

Thu, 2017-03-02 16:19

Last updated March 2, 2017. Created by Peter Murray on March 2, 2017.

Package: veraPDF
Release Date: Thursday, March 2, 2017

District Dispatch: Look Back, Move Forward: Five women who stood up for the public’s right to know.

Thu, 2017-03-02 16:13

ALA President Patricia Schuman speaking at the Rally for America’s Libraries at the 1991 ALA Annual Conference in Atlanta, Georgia. Photo from the American Library Association Archives.

Sunshine Week—a national initiative to promote a dialogue about the importance of open government and freedom of information—is on the horizon (pun most certainly intended). Held on March 12 through March 18, Sunshine Week occurs annually, coinciding with James Madison’s birthday and National Freedom of Information Day on March 16.

During Sunshine Week, hundreds of libraries, media organizations, civic groups, nonprofits, schools, federal agencies and other participants engage in public discussion on the importance of open government. The purpose? To highlight the fact that government functions best when it operates transparently and that the public has the right to access information from the government.

In honor of this upcoming library holiday—and the start of women’s history month—TBT to these five women who advocated for the public’s right to know.

1. Rep. Zoe Lofgren (D-CA)
Over the course of her long career, Lofgren has consistently sponsored legislation that strengthens the public’s right to access information and has opposed legislation that impedes First Amendment rights. Among her many activities, Rep. Lofgren is well regarded for her work on patent reform, copyright issues, digital rights and net neutrality. She successfully fought to initiate the E-rate program that provides affordable internet access for schools, libraries and rural health centers, and she is the author of legislation that would allow the unlocking of cellular phones and other digital devices to give owners more control over their devices.

2010-2011 ALA President Roberta Stevens presents Meredith Fuchs, former General Counsel of the National Security Archive, with the Madison Award.

2. Meredith Fuchs
As Vice President and General Counsel of the National Security Archive at George Washington University, Fuchs led the National Security Archive through the litigation of Armstrong v. Executive Office of the President in 1989—the first White House e-mail lawsuit. The case led to numerous important precedents, including: e-mail can be defined as a record; certain contextual data associated with e-mail records must be preserved; and the public has a right to enforce the Federal Records Act. In her words: “Records of decision making at the White House, in both the federal record and presidential record components, have long been viewed as some of the most valuable records to historians and researchers.”


3. Hazel Reid O’Leary
Nominated by President Bill Clinton in 1993, O’Leary became the seventh United States Secretary of Energy and the first African American woman to serve in that office. As Secretary, O’Leary changed the department’s Office of Classification to the Office of Declassification, created an Openness Advisory Panel and encouraged the Clinton administration to end nuclear testing in the United States (nbd).

Eileen D. Cooke, second from the right, helping to hoist a “Welcome ALA” banner at the Library of Congress.

4. Eileen D. Cooke
Cooke began her career with ALA in 1964 as assistant director of the Washington Office and assumed the director position in 1972. For thirty years, she led the organization and, during her tenure, played a major role in the development, renewal and funding of key library legislation, including the Library Services and Construction Act, the Higher Education Act, the Elementary and Secondary Education Act, the Medical Library Assistance Act, the Copyright Revision Act and the establishment of the National Commission on Libraries and Information Science. Cooke was instrumental in planning ALA’s National Library Legislative Day, helping to ensure that library voices were heard loud and clear by our elected officials.

5. Patricia Glass Schuman
As the 1991-92 president-elect of ALA, Schuman launched a nationwide media campaign to focus public attention on threats to the public’s right to know—including library funding cuts, censorship and restricted access to government information—and the need to support libraries and librarians. More than 500,000 Americans called a special toll-free number or signed petitions to Congress supporting full funding for libraries. During her presidency, Schuman implemented a program of media training for ALA chapters and division leaders and founded ALA’s first Speaker’s Network.

Join us on March 15 for the 2017 James Madison Award Ceremony!
Named for President James Madison, this ALA award is presented annually near the anniversary of his birth to honor individuals or groups who have championed, protected and promoted public access to government information and the public’s right to know at the national level. The 2017 James Madison Award will be presented by ALA Past President Sari Feldman at the Knight Studio at the Newseum in Washington, D.C. You can find more details about the event here.

The post Look Back, Move Forward: Five women who stood up for the public’s right to know. appeared first on District Dispatch.

David Rosenthal: Injecting Faults in Distributed Storage

Thu, 2017-03-02 16:00
I'll record my reactions to some of the papers at the 2017 FAST conference in a subsequent post. But one of them has significant implications for digital preservation systems using distributed storage, and deserves a post to itself. Follow me below the fold as I try to draw out these implications.

Four of the 27 papers at this year's FAST conference, and both the Best Paper awards, were from UW Madison. Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions by Aishwarya Ganesan et al. from UW Madison wasn't a Best Paper, but I would have voted for it. And kudos to the primary author for her very clear presentation. The abstract reads:
We analyze how modern distributed storage systems behave in the presence of file-system faults such as data corruption and read and write errors. We characterize eight popular distributed storage systems and uncover numerous bugs related to file-system fault tolerance. We find that modern distributed systems do not consistently use redundancy to recover from file-system faults: a single file-system fault can cause catastrophic outcomes such as data loss, corruption, and unavailability.

Previous work studied the response of local file systems to media faults. At the 2005 SOSP a team from UW Madison presented IRON File Systems. They injected device faults below state-of-the-art file systems and studied the response, finding that:
commodity file system failure policies are often inconsistent, sometimes buggy, and generally inadequate in their ability to recover from partial disk failures.

The experiments described in the 2017 paper were based on injecting realistic media-caused faults into the local file systems underlying the distributed storage systems in question. They inserted a FUSE-based stacked file system between the local file system and the distributed storage system under study to inject these faults. This code is being released for others to use.

The faults they injected each affected only a single data block in the local file system of only one server in the distributed system. They did not inject faults into local file system metadata. The injected faults were corrupted data, read errors and write errors. It is notable that a single fault of this kind at a single node of the distributed storage system, which did not itself affect the replicas at other nodes, could still cause significant problems, including corrupting those replicas.
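As a simplified stand-in for that setup (the paper's injector interposes at the FUSE layer, whereas this sketch just edits a scratch file directly, with a hypothetical block size), corrupting a single block while leaving every other block intact might look like:

```python
import os

BLOCK_SIZE = 4096  # assumed block size, for illustration only

def corrupt_block(path, block_index):
    """Flip one byte inside a single block of a file, simulating a
    media-level corruption that touches no other block."""
    with open(path, "r+b") as f:
        f.seek(block_index * BLOCK_SIZE)
        byte = f.read(1)
        if not byte:
            raise ValueError("block lies beyond end of file")
        f.seek(block_index * BLOCK_SIZE)
        f.write(bytes([byte[0] ^ 0xFF]))  # invert the first byte of the block

# Build a two-block scratch file, then corrupt only its second block.
with open("scratch.dat", "wb") as f:
    f.write(b"A" * (2 * BLOCK_SIZE))
corrupt_block("scratch.dat", 1)
```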

The distributed storage systems studied were Redis, ZooKeeper, Cassandra, Kafka, RethinkDB, MongoDB, LogCabin, and CockroachDB. None looked good; all failed in some ways and none effectively exploited the redundancy they maintained to recover from the failures. The paper drew five general conclusions:
  1. Systems employ diverse data integrity strategies, ranging from trusting the underlying file systems (RethinkDB and Redis) to using checksums (ZooKeeper, MongoDB, CockroachDB).
  2. Local Behavior: Faults are often undetected; even if detected, crashing is the most common local reaction.
  3. Redundancy is underutilized: A single fault can have disastrous cluster-wide effects.
  4. Crash and corruption handling are entangled.
  5. Nuances in commonly used distributed protocols can spread corruption or data loss.
This work has two kinds of implication for digital preservation systems:
  • These systems frequently use stored checksums to detect corruption, which they repair from a replica. They may therefore exhibit some of the bad behaviors of the similar processes in the distributed storage systems the paper studied.
  • Systems using a distributed storage layer for long-term storage frequently place some level of trust in it, for example they may trust it to store the checksums. They do so because they believe the replication of the distributed storage system renders it fault-tolerant. This paper destroys that belief.
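A minimal sketch of the detect-and-repair loop such preservation systems rely on (the helper names and in-memory "stores" are hypothetical; real systems keep checksums in separate metadata rather than alongside the data):

```python
import hashlib

def checksum(data):
    return hashlib.sha256(data).hexdigest()

def read_with_repair(local, replica, key):
    """Return the object at `key`, verifying its stored checksum and
    repairing the local copy from the replica on a mismatch."""
    data, stored = local[key]
    if checksum(data) == stored:
        return data
    # Corruption detected: verify the replica's copy before trusting it,
    # so one bad copy cannot silently overwrite the good one.
    rdata, rstored = replica[key]
    if checksum(rdata) != rstored:
        raise IOError("both copies of %r are corrupt" % key)
    local[key] = (rdata, rstored)  # repair in place
    return rdata

good = b"archival object"
local = {"obj": (b"c0rrupted bytes", checksum(good))}  # bit-rotted copy
replica = {"obj": (good, checksum(good))}              # intact copy
print(read_with_repair(local, replica, "obj"))
```

Verifying the replica before repairing is the kind of step whose absence lets a single fault spread, as the paper observed in several of the systems studied.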
There was related work in the Work In Progress session. On Fault Resilience of File System Checkers by Om Rameshwar Gatla and Mai Zheng of New Mexico State was inspired by a real data-loss incident at a high-performance computing center. A power failure triggered their Lustre file system checker, a run that would take several days. During the check, a second power failure interrupted it. The checker uses the underlying local file system checkers, in this case e2fsck. The interrupted e2fsck runs had already made changes to their file systems that could not be repaired by a subsequent e2fsck.

The paper used similar fault injection techniques to replicate and study the problem. The take-away is that file system checkers need to be transactional, a conclusion which seems obvious once it has been stated. This is especially important since the probability of a power outage is highest immediately after power restoration. Until the checkers are transactional, their vulnerability to crashes and power failures, and the inability of the distributed storage systems above them to respond correctly, pose a significant risk.
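The transactional pattern the authors call for can be illustrated with the classic write-to-a-temp-file-then-rename idiom (a generic sketch, not how e2fsck or the Lustre checker is actually built): staged changes become visible only through one atomic commit, so an interruption leaves either the old state or the new state, never a half-repaired mix.

```python
import os
import tempfile

def transactional_repair(path, repair_fn):
    """Apply repair_fn to a file so a crash mid-repair leaves either the
    old contents or the new contents on disk, never a mix of the two."""
    with open(path, "rb") as f:
        data = f.read()
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(repair_fn(data))  # stage the repaired image
            f.flush()
            os.fsync(f.fileno())      # make the staged copy durable
        os.replace(tmp, path)         # atomic commit (POSIX rename)
    except BaseException:
        os.unlink(tmp)
        raise

with open("fs_image.bin", "wb") as f:
    f.write(b"dirty state")
transactional_repair("fs_image.bin", lambda d: d.replace(b"dirty", b"clean"))
```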

Open Knowledge Foundation: Museum collections and the stories we tell about data

Thu, 2017-03-02 14:30

This week Frictionless Data will be at Always Already Computational: Library Collections as Data. Sierra Williams looks at the containers for museum and library collections data and why storytelling is important for understanding these collections in the digital age.

One of the great things about working at Open Knowledge International is being surrounded by fascinating projects and a community of committed individuals working at the intersection of innovative tech, social good, and underlying principles of openness in society. Working in communications, my interest and focus is on the stories we are telling about these projects. Data as subject material – whether big, small, open, or closed – does not typically come across as human-focused and if it elicits any emotive response it is usually fear of those unknown algorithms ruling our lives in unseen ways. How do we talk about data that extends beyond the fear?

One of the most powerful stories I’ve come across at the organisation is from our Frictionless Data team. They build standards, tools and best practices designed to make sharing data as easy, or frictionless, as possible. When the team discuss this work, they talk about how their Data Packages act as containers for making it easier to transport data, as shipping containerisation does for the movement of goods. Data Packages identify the metadata necessary for data to be transferred from one researcher to another, making the whole process of getting access to quality data more efficient.
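Concretely, a Data Package is little more than a datapackage.json descriptor sitting next to the data it describes. A minimal example (the package name, file path and fields below are invented for illustration), written with nothing but the standard library:

```python
import json

# A minimal datapackage.json: the resource's path plus the schema a
# recipient needs to interpret the CSV without guesswork.
descriptor = {
    "name": "museum-objects",        # hypothetical package name
    "resources": [{
        "name": "objects",
        "path": "data/objects.csv",  # hypothetical data file
        "schema": {
            "fields": [
                {"name": "id", "type": "integer"},
                {"name": "title", "type": "string"},
                {"name": "date_acquired", "type": "date"},
            ]
        },
    }],
}

with open("datapackage.json", "w") as f:
    json.dump(descriptor, f, indent=2)
```

The receiving researcher can read the schema before touching the data file, which is what makes the shipping-container metaphor work.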

Dan Fowler will be discussing Data Packages for library and museum collections data this week at the Always Already Computational: Library Collections as Data conference, designed to foster a strategic approach to developing, describing, providing access to, and encouraging reuse of museum and library collections that support computationally-driven research and teaching in areas including but not limited to Digital Humanities, Public History, Digital History, data-driven Journalism, Digital Social Science, and Digital Art History.

Librarians get the importance of structured metadata. Digital humanities scholars get that how we share our data will have lasting implications for the future. We are thrilled to see the Carnegie Museum of Art in Pittsburgh, Pennsylvania is providing its museum collections data as a downloadable Data Package and we look forward to learning more about how this data will be used.

Over the course of this year, with the continued support of a grant from the Sloan Foundation, the Frictionless Data team will be working with researchers to pilot these Data Packages and the associated tooling further. Through this process, Frictionless Data specifically hope to provide useful tools for researchers in digital humanities.

There is so much we in the open data community can learn from the history and development of museums and libraries. Some of the best storytelling I’ve read on the key role of data management in society is by Abby Smith Rumsey in her recent book When We Are No More: How Digital Memory is Shaping Our Future. She charts the growth of the Library of Alexandria and the influence it had advancing scholarship, culture and civilisation. In fact, the word museum, I learned, comes from the Temple of the Muses, or Mouseion, which housed the great Library. Rumsey also writes on the technical responsibilities and innovation of the Alexandria librarians:

“Good work requires good order. Before scholars could begin their work, the librarians had to have finished theirs. Their task was twofold: to create and maintain an intellectual organization so it would be possible to find things, and to provide physical stewardship, to keep the scrolls secure and usable over time…

Greater efficiencies in the production of scrolls over time meant the proliferation of physical objects that were valuable, fragile and once rolled up, identical in appearance…But a better solution was a technically advanced format — the codex. This is the book form we know today…”

As we look to tell stories about data, we should continue to engage with and be inspired by the work of museums and libraries, especially in how they have articulated the meaning of information processing over the years. Information has been managed for thousands of years – yielding insight from data has been and continues to be a very tangible, human issue.

Our mission at Open Knowledge International is to empower civil society organisations to use open data to improve people’s lives. Our approach to technology is not an end in itself, but rather a gateway to making a difference in the world and collaborating. Drawing inspiration from the librarians of Alexandria, we know that good work requires good order. From Data Packages to goodtables, the Frictionless Data team is building tools to help bring order to an otherwise chaotic information landscape.

Make sure to follow the work of the Frictionless Data team and our other projects at Open Knowledge International for more insights on how to make sense of data.

For updates from the Always Already Computational conference, follow the hashtag #aacdata on Twitter.

LITA: Who are LITA members? LITA Personas

Thu, 2017-03-02 14:00
LITA Persona Task Force

The Persona Task Force was appointed by the LITA (Library and Information Technology Association) Board in January 2016 to better capture the varied interests, needs, and expectations of current, future, and past LITA members. These personas were developed as a tool to help LITA create and refine tangible benefits and actionable strategies that directly address those interests, needs, and expectations.


Based on data analysis from existing data and user research (see below in the Data Collection section), the Task Force came up with 9 personas.

1. New Member: Jason Creek

New Member Persona


2. LITA Involvement Member: Rachel Stein

LITA Involvement Persona


3. Admin-Level Member: Jennifer Torre

Admin-Level Persona


4. Techie Librarian: Amy Chuck

Techie Librarian Persona


5. Librarian: Michael Zhang

Librarian Persona


6. Non member – Previous Member: Andrew Young

Previous Member Persona


7. Non Member – LITA Events Participant: Steve Gonzalez

Non Member Event Participant Persona


8. Non Member – Online Follower: Rebecca Miller

Non Member Online Follower Persona


9. Non Member – No Connection: Carol English

Non Member No Connection with LITA Persona


Next Step: How the Personas Will Be Used

The LITA membership personas were developed to improve the overall user experience of LITA members. In February 2017, the LITA Board asked LITA committees and interest groups to select and rank three LITA member personas that are most relevant to them and to submit a concrete action suggestion(s), which each committee or interest group would like to pursue. This will allow the LITA Board to collect a good number of concrete use case scenarios for these personas and review those use cases in relation to the LITA 2016-2018 strategic plan. The LITA Board plans to select, prioritize, and implement high-impact use cases through committees and interest groups and continue to rely on the LITA personas in order to improve the overall LITA experience for its members.

Timeline and Task Force Membership

The Task Force held its first virtual meeting in February 2016 and ran through Midwinter 2017. It had 11 members, including liaisons to the following LITA committees:

  • Membership
  • Assessment and Research
  • Communications and Marketing
  • Board

Task Force members:

  • Hong Ma (Co-chair)
  • Yoo Young Lee (Co-chair)
  • TJ Lamanna
  • Frank Skornia
  • Callan Bignoli
  • Lynne Edgar
  • Amanda L. Goodman
  • Bohyun Kim (Liaison to the LITA Board)
  • Isabel Gonzalez-Smith
  • Nadaleen Tempelman-Kluit
  • Mark A. Beatty, Staff Support
Data Collection

1. Gathering Existing Data

The Task Force gathered as much existing data as possible. The data sets included sanitized LITA 2016 membership data (containing only job title, city, state, organization type, work area, position, and membership type); sanitized LITA Forum attendee data from 2011 to 2015 (containing only job title, city, state, organization type, and membership type); annual reports; tech-related job postings; surveys; evaluations; and work reports such as the 2011 LITA Emerging Leaders Project “Branding LITA: A Market Identity for the 21st Century” and the President’s Message: UX Thinking and the LITA Member Experience written by Rachel Vacek.

2. User Research

The Task Force conducted 11 in-person interviews with current LITA members at the 2016 ALA Annual Conference in Orlando, FL from June 23 to June 28 for preliminary research. With help from the LITA staff, the Task Force reached out to attendees for voluntary participation in the interviews. Each interview lasted between 30 and 70 minutes.

The Task Force also developed an online survey and distributed it to the following listservs from September 1 to October 3, aiming to collect opinions from current and past LITA members as well as non-members:

  • LITA-L
  • LITA Blog
  • LITA Social Media (Twitter and Facebook)
  • Code4Lib-L
  • Alma-L
  • Primo-L
  • Voyager-L
  • NMRT-L
  • LibUX Slack Channel
  • Local Boston/New England Library Tech Groups
  • New England Chapter of ASIS&T
  • Greater Boston LibTech’rs Meetup Group

There were 477 respondents in total, but only 379 responses were valid for further analysis. Of the 379 respondents, 155 were current LITA members, while 224 were either previous members or non-members. In addition, 73 volunteers were willing to participate in a follow-up study.

The Task Force identified some missing areas, such as personal characteristics. A customized short follow-up survey was sent to the 73 volunteers from November 28 to December 12, and 16 of them replied.


ZBW German National Library of Economics: New version of multi-lingual JEL classification published in LOD

Thu, 2017-03-02 13:51

The Journal of Economic Literature Classification Scheme (JEL) was created and is maintained by the American Economic Association. The AEA provides this widely used resource freely for scholarly purposes. Thanks to André Davids (KU Leuven), who has translated the originally English-only labels of the classification into French, Spanish and German, we provide a multi-lingual version of JEL. Its latest version (as of 2017-01) is published in RDFa and as RDF download files. These formats and translations are provided "as is" and are not authorized by the AEA. In order to make changes in JEL easier to trace, we have created lists of inserted and removed JEL classes in the context of the skos-history project.
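The inserted/removed lists mentioned above amount to set differences between two versions of the scheme. The skos-history project computes them with SPARQL over versioned SKOS graphs, but the underlying idea can be sketched in plain Python (the notation codes below are invented examples, not the actual 2017-01 changes):

```python
# Sketch: deriving "inserted" and "removed" class lists between two
# versions of a classification scheme by comparing their notation codes.
old_version = {"C88", "Q54", "L86", "O33"}   # codes present in the old version
new_version = {"C88", "Q54", "Q55", "O33"}   # codes present in the new version

inserted = sorted(new_version - old_version)  # classes added in the new version
removed = sorted(old_version - new_version)   # classes dropped since the old version

print("Inserted:", inserted)
print("Removed:", removed)
```

Publishing each version as a named graph makes this comparison repeatable for any pair of versions, which is what makes scheme changes traceable over time.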


Open Knowledge Foundation: Sneak peek of Open Data Day: where will you be?

Thu, 2017-03-02 10:51

This Saturday 4 March marks the 7th International Open Data Day. Groups from around the world are organising events to celebrate, promote and spread the use of open data. It is a great opportunity to show the benefits of open data and encourage the adoption of open data policies in government, business and civil society. This year the focus is on four key areas: Open research data, Tracking public money flows, Open data for the environment and Open data for human rights.

Open Data Day is a bottom-up initiative coordinated through a collaborative mailing list. With over 300 events currently registered on the map (a record high), it is clear that the global movement around open data is growing. Events range from meetups, talks and seminars to more hands-on demonstrations, hackathons and training sessions: anyone interested in learning more about open data is invited to participate.  All outputs are open for everyone to use and re-use.

A total of 44 events received additional support through the Open Knowledge International mini-grants scheme, funded by SPARC, the Open Contracting Program of Hivos, Article 19, Hewlett Foundation and the UK Foreign & Commonwealth Office. The focus was on funding events in the four dedicated categories, as well as supporting places that will be hosting an Open Data Day event for the first time. An overview of these events is available from this blog.

It is close to impossible to summarise all the activity that will be going on this Saturday: in the list below we highlight some events happening on the different continents.


Africa

  • Egypt: the Big Data, Big Questions: From Data to Knowledge in the MENA Region conference will offer a space for multidisciplinary, critical engagement, with questions formulated around knowledge and big data in the Middle East and North Africa region.
  • Ghana: The Open Data Day event in Accra focuses on “Following the money: Open Data for tracking money flows on development projects in the Accra Metropolis”. Participants will dive in with data experts to gain insights into how the government spends public funds on development projects in communities in the Accra Metropolitan Area.
  • Ivory Coast: Ovillage is hosting a conference about open data and a Hackathon for the creation of an open data website by and for citizens.
  • Namibia: the Open & Big data Innovation Hackathon invites software developers, statisticians, tech experts and citizens to work on co-creating innovative mobile and web applications that use open and big data to enhance public service delivery, solve challenges facing communities and keep citizens informed.
  • Tunisia: the workshop “Open data for Smart cities” takes place during the yearly Forum ENET’Com Feee’17


Asia

  • China: Open Data Shanghai is a data-focused hackathon where participants will be given datasets and 24 hours to “hack it”.  For people who are new to Data Science, there will also be educational talks by experienced data analysts. The hackathon will provide participants with a diverse range of datasets, containing a lot of data relevant to China.
  • India: Datameet Hyderabad is organizing a community get together to work on generating new open data and using any existing data to make some useful tools.
  • Indonesia: the Goethe-Institut, Jakarta Smart City and Web Foundation’s Open Data Lab Jakarta are organising Open is the New Smart: Making cities work for citizens, a day focused on how open data could be a key driver of the participation of citizens to jointly address urban problems more quickly and effectively.
  • Iran: Transparency4Iran will host the first session in Iran on “Open Data and its good applications for government”.
  • Malaysia: During an Open Data Exhibition in Subang Jaya, participants will search for data and put the resulting data to use.
  • Thailand: The Open Data and Big Data Forum, hosted by the Electronic Government Agency and the Social Technology Institute of Thailand, expects between 150 and 200 participants.


Oceania

  • Australia: ODI Queensland is inviting people to an afternoon of updates & feedback on their training program and priority Open Data projects. You can also play their International Open Data Day Datopolis Game.
  • French Polynesia: the Open Data Day event in Tahiti is organised by the Institut de la Statistique de la Polynésie Française and focuses on government data specifically.
  • New Zealand: Open Data NZ hosts a mini-hack event at the National Library in Wellington, with a focus on environmental data.


Europe

  • Estonia: Open Data Day Tartu centres around open data in science and how to use data to establish facts in the age of post-truth problematics. There will be workshops on many different tracks, from linked data methods to using data in humanities.
  • Germany: Many events are taking place across Germany: a good overview is available from this Open Knowledge Germany blog. During the Heilbronn Open Data Day for example, you can work with various datasets and interfaces, such as weather data, and the organisers will provide hardware for the assembly of fine dust sensors.
  • Italy: At Open Data Day Palermo, there is a full program of talks on open data applications and portals, followed by a workshop on web scraping in the afternoon.
  • Spain: In Granada, the Oficina de Software Libre will organise different workshops, activities, and an open data hackathon using the data of the city and province of Granada and the open data portal of the University of Granada. Participants can work with data linking university, gender and geographical origin, air pollution data, electoral data and contracts and cultural heritage data.
  • Sweden: At Brainhack Global in Stockholm, researchers from across the globe and a myriad of disciplines come together to work on innovative projects related to neuroscience and open collaboration.

North America

  • Canada: in Halifax, Nova Scotia you can participate in an Open Data Contest focused on promoting tourism, assisting new immigrants and effective management of Nova Scotia’s natural resources and protection of the environment.
  • Mexico: The Failure Institute is organising the first Datatón del Fracaso (Datathon of Failure), where participants will work in teams to analyse and visualise various datasets of the institute, such as that of business closures, to better understand entrepreneurial failure.
  • Mexico: Open Data Day Xalapa + Mapatón Ciudadano invites students, entrepreneurs, professionals, graphic designers, technology-based companies and anyone with an interest in open data to a full program of a Codeando Mexico workshop, talks about Social Entrepreneurship and Mobility and the presentation of the results of the first Citizen Mapatón in Xalapa.
  • USA: the California Data Collaborative hosts the Open water rate spectacular event to find out more about local water prices, and whether these vary by ethnicity, income, or location. Participants will help document water prices from agencies across the state to better understand water use in California and enable regional water agencies to prepare for an uncertain future.
  • USA: Code for America will hold its sixth annual “CodeAcross”: a weekend of civic hacking events hosted by over 100 cities in the Code for America network where Brigades (local communities) are located. Code for America is a national network of community organizers & technologists who are putting technology to work for the benefit of their local communities. They recently put out their Brigade Rally Cry reflecting their mission. Events are happening all over the USA, from Code for Hawaii to Code for Burlington, Vermont, but everyone interested can also hack from home or join or start another brigade (more information on this page).

South America

  • Argentina: the city of Rosario hosts their first Open Data Day this year around open government data. Different organisations have been invited to work for the advancement of data and open governments in the 21st century, working on themes such as social inclusion, open justice, sustainable mobility, responsible consumption and access to public information.
  • Brazil: The Hackers Club of Teresina in conjunction with APISOL and the Open Knowledge Brazil will hold a hackathon about transparency.
  • Colombia: GANA, the open government program of the Department of Nariño, hosts the NariñoVis exhibition of data visualizations that have been made using the data from the Open Data Portal of the Government of Nariño.
  • Ecuador: Datalat and MediaLabUIO are organizing Open Data Day Quito, an event that includes workshops, talks and hands-on activities on opening and using open data, as well as a workshop to use local data related to the environment.
  • Uruguay: the DATA Cafes community in Montevideo is organizing a meeting to renew and improve the project, which allows you to visualise the streets of Montevideo that bear the name of a woman and to know more about those women and their stories.
  • A summary on more events happening in South America is available from this blog.

Montevideo has more than 5000 streets: only 100 are named after a woman

More information on Open Data Day itself is available from the Open Data Day website. You can also follow news on the day itself on Twitter through the hashtags #opendataday and #ODD17. If you are unable to join an event but would still like to participate virtually, check Why you should take 10 minutes to look at the Open Data Roadmap this Open Data Day.

District Dispatch: ALA in NYC (Part 1): Working together, sharing solutions

Thu, 2017-03-02 01:31

Last week, OITP Director Alan Inouye and I escaped the DC swamp momentarily for a trip to NYC. We accompanied the Digital Content Working Group (DCWG) co-chairs Carolyn Anthony and Erika Linke, ALA President Julie Todaro and President-elect Jim Neal as part of ALA’s DCWG leadership contingent that meets periodically with publishers in New York City. ALA started these meetings several years ago, initially to convince trade publishers to sell e-books to public libraries. Since that time, public libraries have gained the ability to buy e-books, but there are still ongoing conversations with publishers about business models, contract terms and prices, as well as common interests and potential collaborations.

Left to right: Julie Todaro, Carrie Russell, Erika Linke, Jim Neal, Carolyn Anthony Photo credit: Alan Inouye

During this trip, we met with representatives from Penguin Random House and Hachette. We discussed e-book prices and audio books as well as the health of the publishing market. In general, library e-book sales have remained stable, print sales have gone up and audiobooks have skyrocketed. We also discussed hybrid business models and the reasons why publishers were not ready to offer more than one model. New aggregator platforms are coming to the library market, and some platforms do not have the capability (yet) to serve up more than one model. Of note, Hachette has recently acquired Perseus Book Group, a large mid-size trade publisher, and is working through that transition.

The ALA delegation also met with Brooklyn Public Library (BPL) leadership at their Central Library. Hosted by Director and Chief Librarian Lisa Rosenblum, our meeting also included BPL President & CEO Linda Johnson as well as two members of their Board of Trustees and senior staffers from around the Library. Our tour of the facilities and the subject-specific centers was the highlight of the trip for me. (My next blog post will describe a lot of exciting things that Brooklyn Public is doing to engage and help the community!)

During our meeting with the senior staffers, we learned more about the Brooklyn community and how the library and its 60 branches are serving them. Linda Johnson said “Our mission is literacy, in the broadest sense of the word.” One focal point, however, was the future and the improvement of the infrastructure at both the Central Library and the branches. The hallmark of renovation plans must be flexibility, given how libraries are evolving, and the path of change is far from certain. ALA provided updates on several initiatives, including the new ALA Book Club Central and our work on national public policy and lobbying. We found the meeting to be quite useful in getting a better appreciation for the priorities, needs and perspectives of a large, urban library system—essential as Washington enters into discussions about a possible infrastructure bill (given libraries’ vital role in our nation’s digital infrastructure by providing internet access) and repealing/reforming the Affordable Care Act. With major changes in the program, people will come to the library for help.

Photo credit: KelliParker369

In addition, we met with the Metropolitan New York Library Council (METRO) staff, who have moved into their new building, which is still under renovation. (Grab a hard hat and keep your coat on – no heat after 4 pm.) METRO is a membership-based non-profit that serves all types of libraries in the five boroughs of NYC as well as Westchester County. The issue of space came up again as Director Nate Hill reiterated the need for flexible space. Particularly in New York City, space is at a premium, and you have to make the most out of what you have. METRO will offer new production services and programs to its members.

Another important meeting was with staff and members of the board of the Book Industry Study Group (BISG) and its new director, Brian O’Leary, a well-known publishing consultant. BISG’s mission is “to create a more informed, empowered and efficient book industry.” They are the problem solvers: through standards development or other collaborations with the book industry and libraries, they identify glitches in the supply chain from the publishers, intermediaries and libraries and then collaborate with these communities to fix problems. Fixing problems depends on “shared solutions,” so collaboration among the players is vital. With an ambitious schedule before them, ALA can expect to see more programs, research and helpful products such as BISG’s recently published BISG Quick Start Guide to Accessible Publishing. There is also a greater opportunity for ALA to partner with BISG.

So many projects and new developments, and so little time and resources! The New York City trip was a bit of a whirlwind, especially when I was trying to meet the airport shuttle at my hotel in an insanely short period of time. But I made it—back to the swamp.

The post ALA in NYC (Part 1): Working together, sharing solutions appeared first on District Dispatch.

DuraSpace News: On the Road with DSpace 7 Collaborators

Thu, 2017-03-02 00:00

From the DSpace 7 UI outreach group

Austin, TX  DSpace community members will be attending conferences and meetings this spring where DSpace 7 UI development will be discussed. If you will be attending as well please reach out to them, "ask about DSpace 7," and find out how to get involved with community efforts around the development of DSpace 7. 

LITA: Jobs in Information Technology: March 1, 2017

Wed, 2017-03-01 18:54

New vacancy listings are posted weekly on Wednesday at approximately 12 noon Central Time. They appear under New This Week and under the appropriate regional listing. Postings remain on the LITA Job Site for a minimum of four weeks.

New This Week

Vassar College, Digital Technologies Librarian, Poughkeepsie, NY

EBSCO Information Services, Technical Product Manager EBSCO Books, Ipswich, MA

Saint Meinrad Archabbey Library, Associate Director and Director of Digital Resources, Saint Meinrad, IN

MARCIVE, Inc., Director of Metadata Analysis and Management, San Antonio, TX

Visit the LITA Job Site for more available jobs and for information on submitting a job posting.

HangingTogether: Five Years of Revealing How MARC is Used

Wed, 2017-03-01 17:05

This October it will be 15 years since I wrote the oft-cited “MARC Must Die” column for Library Journal. At the time I took a lot of heat for that sentiment (and speaking engagements to defend it) but over time the profession has pretty much come around to realizing that it is time to do something different.

When I joined OCLC in 2007, and discovered the kind of computing power available to me as well as the single largest aggregation of library metadata in the world, I was like a kid in a candy store. What, pray tell, is actually recorded in MARC, I wondered? And now I could find out.

So in 2013 I began the “MARC Usage in WorldCat” project to reveal exactly that. I would, for particular subfields, report exactly what is contained therein, with no filtering and no normalization. This is because I felt that if we were going to move to some new way of recording and sharing our data we needed to know what we have to work with.

And often what we have to work with is not…well, standard. The same concept is often registered using many different strings which means somewhere down the line translation and normalization must happen. And of course typographical errors complicate that as well as any other automated procedure we may seek to undertake. But that’s the state of play, and only by knowing what we have to work with, in reality, will we be properly armed to make the right decisions.
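The raw, unnormalized tallies described above can be sketched in a few lines. The subfield values below are invented examples (loosely styled on 300 $a extent statements), not actual WorldCat data:

```python
from collections import Counter

# Sketch of the kind of report the "MARC Usage in WorldCat" project
# produces: count every distinct string found in a subfield, with no
# filtering or normalization, so variant and misspelled forms stay
# visible alongside the standard ones.
subfield_values = [
    "1 online resource", "1 online resource", "1 onlin resource",
    "vi, 234 p.", "vi, 234 pages", "1 online resource",
]

tally = Counter(subfield_values)
for value, count in tally.most_common():
    print(f"{count}\t{value}")
```

A real run would stream values out of MARC records (e.g. with a MARC parsing library) rather than a hard-coded list, but the point is the same: the typo "1 onlin resource" and the variant "vi, 234 pages" each get their own row, exposing exactly how messy the data is before any cleanup decisions are made.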

It has been interesting to see changes over time as well. Some fields are new, and their use has climbed over the years. Others have been deprecated and have declined in use. Once I even detected a subfield that had gone into decline after being made obsolete and then oddly shot back up again. My colleagues traced the problem to one particular record contributor and asked them to desist in using it.

As we move to a linked data world we will need to do a lot of computational maneuvering to create links where none previously existed. This work is complicated by a diverse, non-standard, and error-riddled past. That’s why we need to know exactly what we are dealing with, and this project aims to make it possible to move into the future from a position of knowledge and understanding. As always, if you would like to see a report on a particular subfield just let me know.

About Roy Tennant

Roy Tennant works on projects related to improving the technological infrastructure of libraries, museums, and archives.


Jonathan Rochkind: Return to libraryland

Wed, 2017-03-01 16:49

I’m excited to announce this week is my first week working for the Othmer Library division at the Chemical Heritage Foundation. CHF’s name isn’t necessarily optimal at explaining what the organization does: It’s actually an independent history of science institute (not in fact focusing exclusively on chemistry), with a museum, significant archival collection, and ongoing research activities. As they say on their home page, “The Chemical Heritage Foundation is dedicated to explaining a simple truth: science has a past and our future depends on it.” That’s a nice way to put it.

I’ll be working, at least initially, mostly on the Hydra/Sufia stack. CHF has been a significant contributor to this open source stack already (with existing developer Anna Headley, who’s still here at CHF, fortunately for us all!), and I am happy to be at a place that prioritizes open source contributions. CHF has some really interesting collections (medieval alchemical manuscripts? Oral histories from scientists? that and more), which aren’t available on the web yet — but we’re working on it.

CHF is located in Philadelphia, but I’ll still be in Baltimore, mostly working remotely, with occasional but regular visits. (Conveniently, Philadelphia is only about 100 miles from Baltimore).

And I’m very happy to be back in the library community. It’s a little bit less confusing now if I tell people I’m “a librarian”. Just a little.  I definitely missed being in the library world, and the camaraderie and collaboration of the library open source tech community in my year+ I was mostly absent from it — it really is something special.

I have nothing but good things to say about Friends of the Web, where I spent the past 15 months or so. I’ll miss working with my colleagues there and many aspects of the work environment. They’re really a top-notch design and Rails/React/iOS dev firm, and if you’re looking for high-quality design or app implementation work, if you need something done in either web design or app development (or both!) that you don’t have in-house resources to do, I don’t hesitate to recommend them.

Filed under: General

Evergreen ILS: Conference 2017 – Filling Up Fast

Wed, 2017-03-01 16:40

If you haven’t registered for the Evergreen International Conference 2017 and reserved a hotel room, do so soon. The rooms at the lovely Marriott Cincinnati Rivercenter are quickly getting taken and will run out soon!


DPLA: Application for 2017-2018 Education Advisory Committee Now Open

Wed, 2017-03-01 15:30

The Digital Public Library of America is looking for excellent instructors in higher education to join its Education Advisory Committee for 2017-2018. We recently announced a new grant from the Teagle Foundation that funds the creation of a professional development curriculum and new set development for our Primary Source Sets project.  

We are currently recruiting a small group of enthusiastic educators in higher education to collaborate with our current team of K-12 and higher ed instructors on this project. Specifically, we are interested in applicants working in humanities fields related to US culture, including English, Composition, History, Area Studies, Music, and Art History, and in education fields including Secondary Education, School Librarianship, and Curriculum and Instruction. Applicants can be employed in various roles in higher education, including administration or as professors, lecturers, instructors, or adjuncts. Members of this group will:

  • develop and deliver curriculum for professional development using DPLA and its resources for inquiry-based instruction
  • collaborate with K-12 and higher ed members of the Education Advisory Committee to develop best practices for implementing the Primary Source Sets
  • write new primary source sets following DPLA’s format and participate in peer review and revision (3 in 15 months)
  • provide input on online professional development modules

If selected, participants are committing to:

  • attend a 2-day in-person meeting on June 29-30 2017 (arriving the night of June 28 if traveling from far away) in Boston, Massachusetts
  • attend six virtual meetings approximately quarterly (September 2017-November 2018)
  • participate in no more than one professional development or conference presentation as scheduled

Participants will receive a $1,200 stipend (upon completion of the project in Dec 2018) for participation as well as full reimbursement for travel costs. The deadline for applications for the 2017-2018 Education Advisory Committee is March 31, 2017.

Application Form

To learn more about DPLA’s work with education, please read our Education page. For questions, please contact

About the Digital Public Library of America

The Digital Public Library of America is a free, online library that provides access to millions of primary and secondary sources (books, photographs, audiovisual materials, maps, etc.) from libraries, archives, and museums across the United States. Since launching in April 2013, it has aggregated over 15 million items from over 2,100 institutions. The DPLA is a registered 501(c)(3) non-profit.