Feed aggregator

D-Lib: Workshop Report: CAQDAS Projects and Digital Repositories' Best Practices

planet code4lib - Wed, 2017-03-15 12:13
Workshop Report by Sebastian Karcher and Christiane Page, Syracuse University

D-Lib: ReplicationWiki: Improving Transparency in Social Sciences Research

planet code4lib - Wed, 2017-03-15 12:13
Article by Jan H. Hoeffler, University of Gottingen, Germany

D-Lib: Research Data Challenges

planet code4lib - Wed, 2017-03-15 12:13
Editorial by Laurence Lannom, CNRI

Open Knowledge Foundation: OK Belgium welcomes Dries van Ransbeeck as new project coordinator… and other updates from quarter 4

planet code4lib - Wed, 2017-03-15 11:00

This blog post is part of our on-going Network series featuring updates from chapters across the Open Knowledge Network and was written by the Open Knowledge Belgium team.

A lot has happened over the past few months at Open Knowledge Belgium. First, we welcomed Dries Van Ransbeeck as the new project coordinator. His previous experience ranges from data modelling and civic engagement to crowdsourcing. He also has a keen interest in open innovation and the power of many intrinsically motivated individuals contributing to projects with social and societal impact, serving the interests of the many rather than the happy few.

Dries’ mission is to bring Open Knowledge and Open Data to a level playing field, where people with all sorts of backgrounds, technical and non-technical, can use, reuse and create knowledge in a sustainable way. Read more on Dries here. We also moved to a new office space in Brussels and welcomed three new interns: Chris, Umut and myself [Sarah], who will be helping the team for the next few months. Below are the latest updates on our activities:

Open Belgium Conference

Our annual conference was held on March 6 in Brussels, with the theme: ‘Open Cities and Smart Data’. There were talks and discussions about how Open Data can contribute to smart urban development, the rise of smart applications and the shift from raising the quantity of data to raising the quality of data. About 300 industry, research, government and citizen stakeholders were expected at the conference to discuss various open efforts in Belgium. More will be shared later.

Open Education

An interesting Kickoff meeting about Open Education was held in February to discuss the possibilities of opening up educational data in Belgium. This bottom-up action is needed in order to make things possible and keep the discussion alive. Business owners, data providers, data users and problem owners sat together and discussed the possibilities concerning open educational resources (OER) and open educational practices.

While students and staff want information that is up to date and easy to find, most colleges are unwilling to open up their data. This is very problematic, because opening up educational data to build applications would make things a lot easier for students as well as for the colleges themselves. Another issue that was identified is that every educational institution works with a different database.

It was, therefore, interesting to discuss an open data standard for every college or university. We can solve these problems by giving educational institutions more concrete information about what data they have to open up and what the consequences are. Therefore, this working group could contribute to the discussions on the possibilities of Open Education and create extra pressure on colleges and universities to open up their data and provide them with more information.

OpenStreetMap Belgium

The biggest achievement for OSM Belgium in 2016 was co-organizing SOTM (State of the Map) in Brussels, which is the yearly international conference on OpenStreetMap. Our community for OpenStreetMap is growing and thanks to the help of many enthusiastic volunteers, SOTM was a great success. 

For 2017, OSM plans to formalise their organisation by setting up a membership and some very basic governance rules. By doing this, they want to provide some structure and support for future projects. OSM will continue to stay an open community as it always has been. The main goal of OSM is to communicate in a better way about their ongoing projects so as to attract sponsorships for the new year. They’re also collaborating more closely with other organisations which share the same goals.

Open Badges Belgium

We have also recently started a new working group that wants to help spread the use of Open Badges in Belgium. An Open Badge is a digital platform where you can showcase the talents you have acquired and share them with the labour market. Open Badges are visual tokens of achievement or of another trust relationship, given by training centres, companies and selection agencies, and are shareable across the web.

Open Badges are more detailed than a CV as they can be used in all kinds of combinations, creating a constantly evolving picture of a person’s lifelong learning. To learn more about how Open Badges work, watch our introductory video here.


Oasis is the acronym for ‘Open Applications for Semantically Interoperable Services’. This is a cooperation between the city of Ghent and the region of Madrid to increase the accessibility of public services and public transport. Both cities work together and publish linked open data to prove that new technologies can lead to economies of scale, such as the creation of cross-country applications.

To read up on Open Knowledge team and our activities please visit or follow us on Twitter, Facebook and Linkedin.


Ed Summers: IS

planet code4lib - Wed, 2017-03-15 04:00

This week’s readings were focused on Interactional Sociolinguistics (IS), which is a field of discourse analysis situated at the intersection of anthropology, sociology and linguistics. At a high level Gordon (2011) defines IS this way:

IS offers theories and methods that enable researchers to explore not only how language works but also to gain insights into the social processes through which individuals build and maintain relationships, exercise power, project and negotiate identities, and create communities.

IS typically uses recordings obtained through ethnography to identify signaling mechanisms or contextual cues. Breakdowns in communication happen when participants don’t share contextualization conventions, which can contribute to social problems such as stereotyping and differential access. Looking at the role of discourse in creating and reinforcing social problems was a specific theme in Gumperz’s work. It seems much in keeping with the goals of CDA, to interrogate power relationships, but perhaps without such an explicit theoretical framework, like what Marx or Foucault provide.

If this sounds similar to previous weeks’ focus on Ethnography of Communication and Critical Discourse Analysis that’s because, well, they are pretty similar. But there are some notable differences. The first notable one is something Gordon (2011) highlights: things start with John Gumperz. Gumperz was trained as a linguist, but his research and work brought him into close contact with some of the key figures in sociology and anthropology at the time.

Gumperz’s work grew out of developing “replicable methods of qualitative analysis that account for our ability to interpret what participants intend to convey in everyday communicative practices”. He drew on and synthesized several areas of theoretical and methodological work:

  • structural linguistics: ideas of communicative competence, when (and when not) to speak, what to talk about, who to talk to, what manner to talk in, subconscious/automatic speech, and regional linguistic diversity.
  • ethnography of communication: the use of participant observation and interviewing and data collection as “thick description” (from Geertz)
  • ethnomethodology: the nature of social interaction, and the background knowledge needed to participate. Garfinkeling experiments, where the researcher breaks social norms in order to discover unknown social norms.
  • conversation analysis: interaction order, frames, face saving, how conversation represents and creates social organization, and a focus on “micro features of interaction” while also allowing for cultural context (from Goffman).

Gumperz developed the idea of contextualization cues and how indexical signs offer a way of discovering how discourse is framed. Bateson calls these metamessages, or messages about how to interpret messages.

He also established the concept of conversational inference, which is how people assess what other people say in order to create meaning. It is an idea that bears a lot of resemblance to Grice’s idea, from CA, of the cooperative principle, and how implicatures are sent by following or breaking maxims.

The ideas of indirectness, linguistic politeness and face saving from Robin Lakoff also factor into IS, as do the choices speakers make (rate, pitch, amplitude) that affect an utterance’s interpretation. Tannen’s IS work also demonstrated how cultural background, race, ethnicity, gender and sexual orientation were factors in conversational style. IS admits generative grammar (Chomsky) as a theory, but does not limit study of language to just grammar, and allows for issues of context and culture to have a role. I basically think of IS saying “yes, and” to CA: it recognizes all the patterns and structures that CA identifies, but doesn’t limit work to just the text, and provides a framework where context is relevant and important to understanding.

The other article we read by Gordon took a deep dive into an empirical study that uses IS methods (Gordon & Luke, 2012). The authors examine email correspondence between school counselors in training and their supervising professor. They make the point that not much work has been done on supervision in the medium of email. Specifically they examine how identity development can be seen as ongoing face negotiation. Their research draws on work around face negotiation and politeness theory from Goffman (1967), as well as Arundale (2006)’s idea of face as a co-constructed understanding of self in relation to others.

Specifically their work is centered on the notion that face, politeness and identity are interconnected:

  • face is the social value a person can claim by the way people receive that person’s verbal and nonverbal acts.
  • they are connected through Lave and Wenger (1991)’s idea of community of practice.
  • “The notion of face is crucial to understanding how novices develop professional identities within a community of practice.” (p. 114)
  • politeness strategies are employed in online communications (email)

They collected the email of 8 (6F, 2M) participants, who sent at least one email per week over 14 weeks to their supervisor. These participants were learning how to be school counselors. The authors used data-driven discourse analysis with IS and computer-mediated discourse analysis (Herring, 2004). Among their findings they discovered that:

  • constructed dialogue or reported speech and metadiscourse are used to raise face which they argue is part of identity construction
  • first person plural pronouns (we, us and our) are used to create shared alignment as well as give advice while saving face
  • discourse markers (Schiffrin, 1988) are used to structure ideas, meanings and interaction. For example “that being said”, which is used by the supervisors to offer criticism while also saving face
  • repetition is used to construct conversational worlds and establish communities of practice. It is possible that it’s used more in email because previous word usage can easily be recalled and copied, as compared with spoken words.

Tannen (2007) seems to be cited a fair bit in this paper as a source for these patterns. Perhaps it could be a good source of types of patterns to look for in my own data? I particularly like the angle on community of practice, which is something I’m looking to explore in my own research into web archiving.

Ironically it is another book by Tannen (2005) that is included as the next set of readings, specifically two chapters that analyze a set of conversations that happen over a Thanksgiving dinner. The rich description of this conversation (in which the author is a participant) offers many methodological jumping off points for IS work. Tannen does a pretty masterful job of weaving the description of the conversation, often with detailed transcription, with the reflections from her and the other participants, and observations from discourse analysis. It is clear that she recorded the conversations and then reviewed them with participants afterwards. Here’s a list of some of the methodological tools she used when analyzing the conversation (there were a lot!):

  • conversations as a continuous stream that are punctuated into separate events that respond to each other (Bateson, 1972)

  • machine gun questions: rapid fire questions which are meant to show enthusiasm and interest, but can lead to defensiveness.

  • latching: when responses follow directly on from each other (Scheinkein, 1978)

  • dueting: jointly holding one side of the conversation (Falk, 1979)

  • buffering talk: for example “and all that” which can be used to save face when positioning.

  • back channel responses which serve meta-conversational purposes when the mode of communication is mostly one way (Yngve, 1970)

  • deviations from answer/question as adjacency pair (Sacks, Schegloff, & Jefferson, 1974)

  • sharing in conversational form and pacing is participation, even when specific knowledge is not being exhibited

  • formulaic or repeated phrases

  • shared revelations: to create personal connections

  • reduced syntactic form: “You survive?” instead of “Do you survive?”

  • intonational contours (interesting mode of illustration)

  • metaphorical code switching (Blom & Gumperz, 1972)


Generally speaking I enjoyed the readings this week, in particular the piece by Tannen which does a really nice job of exhibiting the various discourse features and explaining their significance in understanding the Thanksgiving dinner conversation. The ultimate realization of cultural differences that explain why some of the conversations played out the way they did, and why they were remembered by participants in particular ways, seemed to be what made this an IS study. The fact that this contextual information (nationality, ethnicity) has a place when understanding the language seems like an important distinction for IS work. It also speaks to making the analysis relevant: one isn’t merely identifying patterns in the discourse but also casting those patterns in a light where greater insight is achieved. This seems like an important dimension for research to have. Even the Gordon & Luke (2012) piece seemed to draw some pretty sound inferences and conclusions about the research. This speaks to the pragmatist in me.

I also liked the mode of data collection and analysis since it seemed to strike a good balance between the detail and rigor of Conversational Analysis and the openness to context and thick description offered by Ethnography of Communication. I will admit to feeling a bit overwhelmed with the number of discourse features that were covered, and worry that I wouldn’t be able to draw on them as successfully as the authors. But I guess this must come with practice.

With regard to my own research, the discussion of IS got me wondering if it might be fruitful to examine my interviews for segments where participants talked about how they understood web archiving processes like crawling or external systems like CMS. Specifically I’d like to see where their own experience and knowledge was shaped or formed by a community of practice. I’m thinking of GWU’s understanding of the data center, or Stanford’s idea about how a particular CMS worked, or NCSU’s understanding of DNS. I’m still really interested in this idea of a community of practice and it seems like using discourse as a window into this realm might be something other folks have done before. What are the methods that could best yield insights into this in my data?


Arundale, R. B. (2006). Face as relational and interactional: A communication framework for research on face, facework, and politeness. Journal of Politeness Research. Language, Behaviour, Culture, 2(2), 193–216.

Bateson, G. (1972). Steps to an ecology of mind: Collected essays in anthropology, psychiatry, evolution, and epistemology. University of Chicago Press.

Blom, J.-P., & Gumperz, J. J. (1972). Directions in sociolinguistics. Holt, Rinehart, & Winston.

Falk, J. L. (1979). The duet as a conversational process (PhD thesis). Princeton University.

Goffman, E. (1967). Interaction ritual: Essays in face to face behavior. Doubleday.

Gordon, C. (2011). In R. Wodak, B. Johnstone, & P. E. Kerswill (Eds.), The SAGE handbook of sociolinguistics. Sage Publications.

Gordon, C., & Luke, M. (2012). Discursive negotiation of face via email: Professional identity development in school counseling supervision. Linguistics and Education, 23(1), 112–122.

Herring, S. C. (2004). Online communication: Through the lens of discourse. Internet Research Annual, 1, 65–76.

Sacks, H., Schegloff, E. A., & Jefferson, G. (1974). A simplest systematics for the organization of turn-taking for conversation. Language, 696–735.

Scheinkein, J. (1978). Studies in the organization of conversational interaction. Academic Press.

Schiffrin, D. (1988). Discourse markers. Cambridge University Press.

Tannen, D. (2005). Conversational style: Analyzing talk among friends (2nd ed.). Oxford University Press.

Tannen, D. (2007). Talking voices: Repetition, dialogue, and imagery in conversational discourse (Vol. 26). Cambridge University Press.

District Dispatch: NLLD 2017 keynote announced

planet code4lib - Tue, 2017-03-14 18:10

We are happy to announce the keynote speaker for National Library Legislative Day 2017! Hina Shamsi, director of the ACLU National Security Project, will be joining us in Washington, D.C. on May 1, 2017. The National Security Project is dedicated to ensuring that U.S. national security policies and practices are consistent with the Constitution, civil liberties, and human rights. According to the ACLU website, Shamsi has:

litigated cases upholding the freedoms of speech and association, and challenging targeted killing, torture, unlawful detention, and post-9/11 discrimination against racial and religious minorities.

Her work includes a focus on the intersection of national security and counter-terrorism policies with international human rights and humanitarian law. She previously worked as a staff attorney in the ACLU National Security Project and was the acting director of Human Rights First’s Law & Security Program. She also served as senior advisor to the U.N. Special Rapporteur on Extrajudicial Executions. You can find her on Twitter @HinaShamsi and on the ACLU blog. In addition to a review of current legislation and issue briefs, provided by ALA Washington staff, we will also be joined by the team from The Campaign Workshop for an hour of advocacy training. Christian Zabriskie, Executive Director of Urban Librarians Unite, and Jennifer Manley, Managing Director of Prime Strategies NYC, will also be leading a 30-minute breakout session called “The Political Dance.” Other afternoon sessions will be announced as the schedule is finalized.

If you are interested in library advocacy and are unfamiliar with National Library Legislative Day, you can find out more by visiting our website, or reading previous articles about the event. Online registration will remain open till April and registrations are accepted at the door.

Photo Credit: Adam Mason

For folks looking for funding, check out the WHCLIST award, which is still accepting submissions until April 2nd. You can also check out, which offers librarians the chance to fund-raise for Professional Development events, collection development projects, and other library needs.

Unable to join us in D.C. in May? Register to participate virtually – in May, we’ll send you a list of talking points, background information, and other resources, so that you can call, email, or Tweet at your Members of Congress about legislative issues that are important to you and your patrons. We’ll also send you a link to our webcast, so you can watch our keynote speaker and the issue briefs live!

The post NLLD 2017 keynote announced appeared first on District Dispatch.

Jonathan Rochkind: “Polish”; And, What makes well-designed software?

planet code4lib - Tue, 2017-03-14 16:34

Go check out schneems’ post on “polish”. (No, not the country).

Polish is what distinguishes good software from great software. When you use an app or code that clearly cares about the edge cases and how all the pieces work together, it feels right. Unfortunately, this is the part of the software that most often gets overlooked, in favor of more features or more time on another project…

…When we say something is “polished” it means that it is free from sharp edges, even the small ones. I view polished software to be ones that are mostly free from frustration. They do what you expect them to and are consistent.…

…In many ways I want my software to be boring. I want it to harbor few surprises. I want to feel like I understand and connect with it at a deep level and that I’m not constantly being caught off guard by frustrating, time stealing, papercuts.

I definitely have experienced the difference between working with and on a project that has this kind of ‘polish’ and, truly, experiencing a deep-level connection with the code that lets me be crazy effective with it, and working on or with projects that don’t have this.  And on projects that started out with it, but lost it! (An irony is that it takes a lot of time, effort, skill, and experience to design an architecture that seems like the only way it would make sense to do it, obvious, and as schneems says, “boring”!)

I was going to say “We all have experienced the difference…”, but I don’t know if that’s true. Have you?

What do you think one can do to work towards a project with this kind of “polish”, and keep it there?  I’m not entirely sure, although I have some ideas, and so does schneems. Tolerating edge-case bugs is a contraindication — and even though I don’t really believe in ‘broken windows theory’ when it comes to neighborhoods, I think it does have an application here. Once the maintainers start tolerating edge case bugs and sloppiness, it sends a message to other contributors, a message of lack of care and pride. You don’t put in the time to make a change right unless the project seems to expect, deserve, and accommodate it.

If you don’t even have well-defined enough behavior/architecture to have any idea what behavior is desirable or undesirable, what’s a bug– you’ve clearly gone down a wrong path incompatible with this kind of ‘polish’, and I’m not sure if it can be recovered from. A Fred Brooks “Mythical Man Month” quote I think is crucial to this idea of ‘polish’: “Conceptual integrity is central to product quality.”  (He goes on to say that having an “architect” is the best way to get conceptual integrity; I’m not certain, I’d like to believe this isn’t true because so many formal ‘architect’ roles are so ineffective, but I think my experience may indeed be that a single or tight team of architects, formal or informal, does correlate…).

There’s another Fred Brooks quote that now I can’t find and I really wish I could cause I’ve wanted to return to it and meditate on it for a while, but it’s about how the power of a system is measured by what you can do with it divided by the number of distinct architectural concepts. A powerful system is one that can do a lot with few architectural concepts.  (If anyone can find this original quote, I’ll buy you a beer or a case of it).

I also know you can’t do this until you understand the ‘business domain’ you are working in — programmers as interchangeable cross-industry widgets is a lie. (‘business domain’ doesn’t necessarily mean ‘business’ in the sense of making money, it means understanding the use-cases and needs of your developer users, as they try to meet the use cases and needs of their end-users, which you need to understand too).

While I firmly believe in general in the caution against throwing out a system and starting over, a lot of this caution is about losing the domain knowledge encoded in the system (really, go read Joel’s post). But if the system was originally architected by people (perhaps past you!) who (in retrospect) didn’t have very good domain knowledge (or the domain has changed drastically?), and you now have a team (and an “architect”?) that does, and your existing software is consensually recognized as having the opposite of the quality of ‘polish’, and is becoming increasingly expensive to work with (“technical debt”) with no clear way out, that sounds like a time to seriously consider it. (Although you will have to be willing to accept it’ll take a while to get feature parity, if those were even the right features).  (Interestingly, Fred Brooks was I think the originator of the ‘build one to throw away’ idea that Joel is arguing against. I think both have their place, and the place of domain knowledge is a crucial concept in both).

All of these are more vague hand wavy ideas than easy to follow directions, I don’t have any easy to follow directions, or know if any exist.

But I know that the first step is being able to recognize “polish”, a well-designed parsimoniously architected system that feels right to work with and lets you effortlessly accomplish things with it.  Which means having experience with such systems. If you’ve only worked with ball-of-twine difficult to work with systems, you don’t even know what you’re missing or what is possible or what it looks like. You’ve got to find a way to get exposure to good design to become a good designer, and this is something we don’t know how to do as well with computer architecture as with classic design (design school consists of exposure to design, right?)

And the next step is desiring and committing to building such a system.

Which also can mean pushing back on or educating managers and decision-makers.  The technical challenge is already high, but the social/organizational challenge can be even higher.

Because it is harder to build such a system than not to. Designing and implementing good software is not easy; it takes care, skill, and experience.  Not every project deserves or can have the same level of ‘polish’. But if you’re building a project meant to meet complex needs, and to be used by a distributed (geographically and/or organizationally) community, and to hold up for years, this is what it takes. (Whether that’s a polished end-user UX, or developer-user UX, which means API, or both, depending on the nature of this distributed community).


Islandora: Report from a release perspective: Islandora 7.x-1.9RC1 VM is available

planet code4lib - Tue, 2017-03-14 15:57

Good day dearest Islandora Folks,

Our Islandora 7.x-1.9 Release Candidate One Virtual Machine and updated islandora_vagrant github branch are available for immediate download and testing.

Summoning all Testers and Documenters, Maintainers, Committers and their friends: give this humble virtualised (or american virtualized) box of dreams a test.

Before: Give this a look again:,-Document,-or-Test-an-Islandora-Release

What to expect: just a working, clean, vanilla Islandora 7.x-1.9 VM machine.

What is new?: not much really

How to use:

Passwords, which URL, other Questions, are answered here.

VirtualBox

Download the OVA file (3.4 Gbytes)

( md5 hash of that ova file is: 6kpTJwCyWNXLG17ExmuUrw== )

and import as usual into your VirtualBox app.

Vagrant + VirtualBox as provider

Open your favourite Terminal and execute (won't go into too much detail here)

 git clone -b 7.x-1.9
 cd islandora_vagrant
 vagrant up

  (coffee, black no sugar, and look for some warnings... will open some jira tickets today =)
  (when finished)

 vagrant ssh

Enjoy 7.x-1.9 on fire ;) remember: if it's not on fire, it's not a RC1

Please don't hesitate to reach out, contact me or Melissa Anez if you have questions (email, IRC or Skype). This week is our Release call (Thursday 16, 3PM ADT, 2PM NYC time); please join us if you signed up for some release role or have ideas/questions/concerns about what is happening here.

We are here to help and to facilitate this one being another successful and trusty Islandora release.

Thanks again for making Islandora happen.
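For testers who want to check the downloaded OVA against the md5 value quoted above, here is a minimal sketch in Python. It assumes the published value is the base64 encoding of the raw MD5 digest rather than the usual hex string; its 24-character length and `==` padding suggest this, but the announcement does not say so. The function names and the file name in the usage note are hypothetical and are not part of any Islandora tooling.

```python
import base64
import hashlib


def md5_base64(path, chunk_size=1024 * 1024):
    """Return the base64-encoded MD5 digest of the file at `path`.

    The file is read in chunks so a multi-gigabyte image does not
    have to fit in memory.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return base64.b64encode(digest.digest()).decode("ascii")


def matches_published_hash(path, published):
    """Compare a file's base64 MD5 with a published value (assumed base64)."""
    return md5_base64(path) == published.strip()
```

Calling `matches_published_hash("islandora.ova", "6kpTJwCyWNXLG17ExmuUrw==")`, with the file name replaced by wherever the download was saved, returns `True` only if the digests agree. A mismatch can mean a corrupted download, or that the assumption about the encoding is wrong.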

Library of Congress: The Signal: A Library of Congress Lab: More Use and More Users of Digital Collections

planet code4lib - Tue, 2017-03-14 14:15

Mass digitization — coupled with new media, technology and distribution networks — has transformed what’s possible for libraries and their users. The Library of Congress makes millions of items freely available on and other public sites like HathiTrust and DPLA. Incredible resources — like digitized historic newspapers from across the United States, the personal papers of Rosa Parks and Sigmund Freud and archived web sites of U. S. election candidates — can be accessed anytime and anywhere by researchers, Congress and the general public.

The National Digital Initiatives division of the Library of Congress seeks to facilitate even more use of the Library’s digital collections. Emerging disciplines that take advantage of new computing tools and infrastructure, like data science, data journalism and digital humanities, provide a model for creating new levels of access to library collections. Visualizing historical events and relationships on maps and in network diagrams, and analyzing thousands of texts for the occurrence of words and phrases, are a few examples of what’s possible. NDI is actively exploring how to support these and other kinds of interactions with the Library’s vast digital holdings.

A visualization of links between web sites extracted from an October 2015 Library of Congress crawl of news site feeds. This diagram was created as part of the demonstration pilot for the Library of Congress Lab report.

Michelle Gallinger and Daniel Chudnov were asked by NDI to study how libraries and other research centers have developed services that use computational analysis, design and engagement to enable new kinds of discovery and outreach. Their report, Library of Congress Lab (PDF), was just released. For the report, they interviewed researchers and managers of digital scholarship labs and worked with Library staff on a pilot project that demonstrated how the collections could be used in data analysis. This work resulted in concrete recommendations to the Library on how to approach setting up a Lab at the Library of Congress. These recommendations could also be helpful to other organizations who may be thinking of establishing their own centers for digital scholarship and engagement.

Michelle, Dan, thanks for the report, and thank you for talking with me more about it. How do you think digital labs are addressing a need or a gap in how digital collections are served by libraries and archives?

Michelle Gallinger

Michelle: The value proposition for digital collections has always been their usefulness to researchers, scholars, scientists, artists, as well as others. However, use was limited in the past because substantial computational analysis was something that an individual needed a great deal of specialized knowledge to pursue. That’s changing now. Tools have become more ubiquitous and labs have been established to support users in their analysis of digital collections. Where labs are supporting the work of users to delve deeply into the digital collections, we’re seeing computational analysis being used as another tool in areas of scholarship that haven’t benefited from it in the past. We are seeing that the support labs provide helps address the pent-up demand in a wide variety of fields to use digital content in meaningful ways.  And as this computational work is published, it’s creating new demand for additional support.

Dan: We were particularly impressed by the breadth of answers to this question shared by the colleagues we interviewed who lead and support digital scholarship services in Europe, Canada and the U.S. They have each molded their skills and services to fit these new and unique combinations of service demands coming from their own communities.  In university settings, labs fill a growing role supporting teaching and learning with workshops and consultations for younger students, graduate students, and early-career researchers alike.  In labs connected with large collections, they are enabling advanced researchers to perform large-scale computational techniques and finding ways — based on the services they are providing to scholars — to rethink and revise institutional workflows to enable more innovative uses of collections.  Each of these success stories represents a need- or a services-gap filled and presents an opportunity to consider doing more at our respective institutions.

Why do you think this is a good time for the Library of Congress to consider establishing a Lab?

Michelle: It’s a great time to be engaged in addressing the needs of scholars to work with digital collections. As I mentioned before, there really is a demand from users for support in performing digital scholarship. The Library of Congress receives regular requests for this support and it’s my opinion the number of those requests will continue to grow. Concepts of “big data” and data analytics have permeated society. Everyone knows about it, everyone wants to be working with digital scholarship techniques and tools. A Lab is an opportunity for the Library of Congress to start addressing these requests for support with routine workflows, regular access permissions, consistent legal counsel and predictable guidelines. This support not only helps further the transformative influence of digital scholarship, it also makes the Library of Congress more efficient and able to respond and serve the needs of its 21st century scholars.

Dan Chudnov

Dan: As Michelle highlights, better tools and increased demand to work with much greater volumes of materials have changed the equation.  The pilot project we performed, working with Library of Congress Web Archive collections not directly available to the public, demonstrated this well.  We used a third-party cloud services platform to securely transfer and process several terabytes of data from the Library to the cloud.  Using tools included in the cloud services platform for cluster computing, we defined access controls for this data where it was stored, then automated file format transformations, extracted focused derivative data, and ran parallel algorithms on a cluster with two dozen virtual machines performing network analysis on a quarter of a billion web links.  Once the extracted data was ready, it took less than five minutes to run a half-dozen of these queries over the entire dataset, and after just a few minutes more to verify the results, we shut the cluster down, having spent no more than a few dollars to rent that computing power for under an hour.  Back in the early 2000s, I worked in a medical informatics research center and helped to support cluster computing there with expensive, custom-designed racks full of fickle servers that gobbled up power and taxed our building cooling systems beyond reason.  Today, any ambitious high school student or not-yet-funded junior researcher can perform that same scale of computation and more, much more easily, all for the price of a cup of coffee.  To do this, they need the kinds of support Michelle describes: tool training, a solid legal framework with reasonable guidelines and routine workflows for enabling access, all of which the Library of Congress is ideally suited to develop and deliver right now.
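To give a feel for what "network analysis on web links" means at a very small scale, here is a minimal Ruby sketch. It is not the code used in the pilot, which ran on a cloud cluster; the input format (pairs of source and target URLs) and the helper names `host_of`, `site_link_counts` and `inbound_counts` are assumptions made for illustration only.

```ruby
require "uri"

# Hypothetical helper: return the lowercased host of a URL,
# or nil when the string cannot be parsed as a URI.
def host_of(url)
  URI.parse(url).host&.downcase
rescue URI::InvalidURIError
  nil
end

# Count directed site-to-site links from [source_url, target_url] pairs.
# Links within one site and unparseable URLs are skipped.
def site_link_counts(link_pairs)
  counts = Hash.new(0)
  link_pairs.each do |source, target|
    from = host_of(source)
    to = host_of(target)
    next if from.nil? || to.nil? || from == to
    counts[[from, to]] += 1
  end
  counts
end

# Sum inbound links per site: a simple in-degree measure.
def inbound_counts(edge_counts)
  edge_counts.each_with_object(Hash.new(0)) do |((_from, to), n), totals|
    totals[to] += n
  end
end
```

The same counting step, run in parallel over partitions of the extracted links, is roughly the shape of job that a cluster would spread across many machines.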

How could a Lab help to serve audiences beyond the typical scholarly or academic user?

Michelle: I loved [the new Librarian of Congress] Dr. Hayden’s quote in the recent New Yorker article when she asked herself: “How can I make this library that relevant, and that immediate?” I think a Lab supporting digital scholarship will help her achieve that vision of increasing the relevance and immediacy of the Library of Congress. The Lab offers a new way for users to access and get support in analyzing the Library’s digital collections. But it is also an opportunity for the Library to reach out to underrepresented groups and engage with those groups in new ways — coding, analytics, scholarly networks, and more. Unique perspectives help the Lab in its efforts to transform how the Library’s digital collections are used. The Lab becomes a controlled access point for users that might not be able to get to the Library in person.

One of the reasons Dan and I think that the Lab should have an open-ended name (rather than something more specific like “Digital Scholars Lab” or “Digital Research Lab”) is that we both feel strongly that the Lab should be as inclusive as possible. A specific name encourages a small group of people who identify with that name to come. Researchers look at a research lab. Scholars look to a scholarly lab. But a really transformative Lab environment gives anyone the tools to use digital collections for their work — whether that’s scholarship, research, data analytics, art, history, social science, creative expression, or anything else they can imagine. We think that there is significant value to making the Lab a space where anyone can imagine working — even if they aren’t a typical Library of Congress researcher. Everyone should be able to see themselves at the lab, engaging with the Library of Congress digital collections in a myriad of ways.

Dan: I agree on all counts.  That focus from Dr. Hayden resonates with something we heard from a scholar at the Collections as Data event last fall, that the sheer size of Library of Congress collections can sometimes overwhelm. Anyone approaching LC collections for the first time should be able to find and work with material at a scale that meets their needs and abilities. It is most important to provide access to collections and services at a ‘human scale’, whether that means one item at a time, or millions of items at a time, or some scale in between which best fits the needs of the individual coming to the Library.  For example, UCLA’s Miriam Posner engages humanities students with collections at the scale of a few thousand items, which challenges them to use automated tools and techniques but is still small enough that they can “get to know” the materials over the course of a project.  Another critical aspect of this focus is representation.  To make the Library relevant and immediate, anyone visiting its collections should be able to see themselves and to recognize stories of people like them reflected and featured among digital collections, at every scale.  The breadth and variety of collections at the Library of Congress reflects our wonderfully diverse culture, and that means all of us and all of our histories.

What other opportunities do you see in establishing a Lab at the Library of Congress?  

Michelle: The Library of Congress is a powerful convener. It has always been able to get people to come together around a table and talk through controversial or challenging topics — from copyright restrictions to stewardship responsibilities and many others. The Lab community is still emerging. There are some extraordinarily strong players that have a lot to share and there are a lot of opportunities for labs that haven’t yet been developed. The Library of Congress could provide valuable leadership by convening the full spectrum of this community to make sure that emerging successes are circulated and pitfalls are documented. It could really help move things to another level.

Dan: I agree, the possibilities of building communities around opening up access to digital collections, connecting students with collections and subject expertise across institutions, and convening practitioners to share what works by building networks of potential collaborators across disciplines and distances are compelling.  We heard from many people that public goodwill toward the Library of Congress is strong, which affords that ability to draw people with mutual interests together.  When the Library puts an event together, people will travel great distances and tune in from all over the net, as the recent #asdata event demonstrated. Similarly, when Library staff show up and participate in community initiatives and events, people take notice and take their contributions to heart.  A Lab at the Library of Congress could be a great new conduit for this kind of leadership, amplifying the great service innovations of many great peer institutions while assembling a mix of services that fit the unique possibilities and constraints at LC.

Thank you both again for the time and effort you put into the report (PDF). NDI is excited to work toward establishing a Library of Congress Lab in the coming year, and we’ll keep you all posted on our progress.

Open Knowledge Foundation: Three ways ROUTETOPA promotes Transparency

planet code4lib - Tue, 2017-03-14 11:00

Data sharing has come a long way over the years. With open source tools, improvements and new features are always quickly on the rise. Serah Rono looks at how ROUTETOPA, a Horizon2020 project, advocates for transparency.

From as far back as the age of enlightenment, the human race has worked hard to keep authorities accountable. Long term advocates of open data agree that governments are custodians, rather than owners, of data in their keep and should, therefore, avail the information they are charged with safekeeping for public scrutiny and use. Privacy and national security concerns are some of the most common barriers to absolute openness in governments and institutions in general around the world.

As more governments and organisations embrace the idea of open data, some end up, inadvertently, holding back on releasing data they believe is not ready for the public eye, a phenomenon known as ‘data-hugging’. In other instances, governments and organisations end up misleading the general public about the actual quantity and quality of information they have made public. This is usually a play at politics, a phenomenon referred to as ‘open-washing’, and it is very frustrating to the open data community. It does not always stop here: some organisations are known to exaggerate the impact of their open data work, a phenomenon Andy Nickinson refers to as ‘open-wishing’.

The Horizon2020 project, Raising Open and User-Friendly Transparency Enabling Technologies for Public Administrations (ROUTETOPA), works to bridge the gap between open data users and open data publishers. You can read the project overview in this post and find more information on the project here.

In an age of open-washing and data-hugging, how does ROUTETOPA advocate for transparency?

  1. ROUTETOPA leads by example!

The source code for ROUTETOPA tools is open source and lives in this repository. ROUTETOPA also used CKAN, a renowned data portal platform, as the basis for its Transparency Enabling Toolkit (TET). TET provides public administrators in ROUTETOPA’s pilot cities with a platform to publish and open up their data to the public. You can read more about it here.

  2. Data publishers as pilot leads

ROUTETOPA pilots are led by public administrators. This ensures that public administrators are publishing new data regularly and that they are also at hand to answer community questions, respond to community concerns and spearhead community discussions around open data in the five pilot cities.

  3. Use of online and offline communication channels

Not only does ROUTETOPA have an active social media presence on Facebook, Twitter and YouTube, it also has its own social media platform, the Social Platform for Open Data (SPOD), which provides a much-needed avenue for open data discourse between data publishers and users. The pilots in Prato, Groningen, Dublin, Issy and Den Haag also hold regular workshops, focus groups and tool test parties. Offline engagement is more relatable, creates rapport between public administrations and citizens, and is also a great avenue for making data requests.

The ROUTETOPA consortium also runs an active blog that features project updates and lessons learnt along the way. Workshops and focus groups are a key part of the success of this project, as user feedback informs the development process of ROUTETOPA tools.

ROUTETOPA partners also attend and spread the word at open data conferences and seminars, to keep the open data community across Europe in the know, and as an avenue to invite the community to test the tools, give feedback, and, if it suits, adapt the tools for use in their organizations, institutions and public administrations.

Need clarification, or want to plug in and be a part of ROUTETOPA’s progress? Write to Stay open!

DuraSpace News: PARTICIPATE in Endangered Data Week: April 17-21

planet code4lib - Tue, 2017-03-14 00:00

From Bethany Nowviskie, Director of the Digital Library Federation (DLF) at CLIR and Research Associate Professor of Digital Humanities, University of Virginia 

Jonathan Rochkind: Command-line utility to visit github page of a named gem

planet code4lib - Mon, 2017-03-13 20:45

I’m working in a Rails stack that involves a lot of convolutedly inter-related dependencies, which I’m not yet all that familiar with.

I often want to go visit the github page of one of the dependencies, to check out the README, issues/PRs, generally root around in the source, etc.  Sometimes I want to clone and look at the source locally, but often in addition to that I’m wanting to look at the github repo.

So I wrote a little utility to let me do gem-visit name_of_gem and it’ll open up the github page in a browser, using the macOS open utility to open your default browser window.

The gem doesn’t need to be installed, it uses the API (hooray!) to look up the gem by name, and look at its registered “Homepage” and “Source Code” links. If either one is a link, it’ll prefer that. Otherwise, it’ll just send you to the Homepage, or failing that, to the page.
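The lookup logic described above might be sketched roughly as follows. This is not the author's script: the text here does not say which API or which kind of link is preferred, so the sketch assumes the public RubyGems.org v1 API (with its `homepage_uri` and `source_code_uri` fields), assumes the preferred link is a github.com one (going by the post's title), and assumes the last-resort page is the gem's rubygems.org page. The names `present`, `pick_url` and `fetch_gem_info` are hypothetical.

```ruby
require "json"
require "net/http"
require "uri"

# Assumed endpoints: the post does not spell these out.
API_URL  = "https://rubygems.org/api/v1/gems/%s.json"
PAGE_URL = "https://rubygems.org/gems/%s"

# Return the value if it is a non-blank string, otherwise nil.
def present(value)
  value.is_a?(String) && !value.strip.empty? ? value : nil
end

# Choose the URL to open from a parsed API response (a Hash):
# a GitHub link if either field has one, else the homepage,
# else the gem's page on rubygems.org.
def pick_url(info, gem_name)
  homepage = present(info["homepage_uri"])
  source = present(info["source_code_uri"])
  github = [homepage, source].compact.find { |url| url.include?("github.com") }
  github || homepage || format(PAGE_URL, gem_name)
end

# Fetch gem metadata; returns an empty Hash when the lookup fails.
def fetch_gem_info(gem_name)
  response = Net::HTTP.get_response(URI(format(API_URL, gem_name)))
  response.is_a?(Net::HTTPSuccess) ? JSON.parse(response.body) : {}
end

# Run as a script: gem-visit name_of_gem (uses the macOS `open` command).
if __FILE__ == $PROGRAM_NAME && ARGV.first
  name = ARGV.first
  system("open", pick_url(fetch_gem_info(name), name))
end
```

Keeping the URL choice in a pure function such as `pick_url` means it can be checked without touching the network, and the script still needs nothing beyond the Ruby standard library.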

It’s working out well for me.

I wrote it in ruby (naturally), but with no dependencies, so you can just drop it in your $PATH, and it just works. I put ~/bin on my $PATH, so I put it there.

I’ll give you a gist of it (yes, I’m not maintaining it at present), but thinking about the best way to distribute this if I wanted to maintain it…. Distributing as a ruby gem doesn’t actually seem great, with the way people use ruby version switchers, you’d have to make sure to get it installed for every version of ruby you might be using to get the command line to work in them all.

Distributing as a homebrew package seems to actually make a lot of sense, for a very simple command-line utility like this. But homebrew doesn’t really like niche/self-submitted stuff, and this doesn’t really meet the requirements for what they’ll tolerate. But homebrew theoretically supports a third-party ‘tap’, and I think a github repo itself can be such a ‘tap’, and hypothetically I could set things up so you could cask tap my_github_url, and install (and later upgrade or uninstall!) with brew… but I haven’t been able to figure out how to do that from the docs/examples, just verified that it oughta be possible!  Anyone want to give me any hints or model examples?  Of a homebrew formula that does nothing but install a simple one-doc bash script, and is a third-party tap hosted  on github?


HangingTogether: Linked data panel at ALIA Information Online 2017 Conference

planet code4lib - Mon, 2017-03-13 17:18

I had the honor of being invited to participate in a linked data panel at the ALIA (Australian Library and Information Association) Information Online conference held in Sydney last month. We panelists were pleasantly surprised by the turnout—about 200 people attended our session, more than anticipated (standing room only). The conference organizers were amazingly prompt at uploading all presentations for the entire conference on the ALIA website.

Here’s my summary of the session’s highlights, reinforced by what was tweeted (#online17).

Monika Szunejko (National Library of Australia): “The links effect: The laws of attraction, linked data, and the national union catalogues of France and Britain” (Rather than speaker notes on the PowerPoint, see the accompanying paper.) Monika highlighted key findings from her case studies of the British Library’s and Bibliothèque nationale de France’s releases of their respective national bibliographies as linked data to inform the National Library of Australia’s own plans.

  • The BL wanted to move away from library-specific formats and adopt standards that would reach audiences outside the library world; data must be open and re-used. It also wanted to enrich professional practice, leveraging staff skills in scripting languages and data conversion. Its success was affirmed by unlocking the value of its national bibliography to broader audiences, influencing Google rankings for its unique material and developing in-house expertise in linked data principles and data modeling. It also built an infrastructure that could be shared with other libraries.
  • The BnF’s main impetus was to connect different databases, breaking down their silos—“a wedding planner between data sets”—enhancing discovery across collections and making them visible on the web to reach non-library audiences. It already had a deep engagement with identifiers as one of the founders of both VIAF and ISNI, and saw linked data as a means to expand their professional practices.

Both organizations see disseminating information as part of their mission, purpose and value and linked data helps them achieve that goal, while reaching new audiences and maximizing their data assets.  Monika concluded that it was important for each library to understand its own context—your own reasons for moving to linked data—and your own capacity and capabilities. She encouraged others to join communities of practice to inform their own implementations.

Jenny Langenstrass & Katrina McAlpine (U. Sydney): “Good data bad data: getting ready for linked data” (the PowerPoint includes speaker notes). This presentation focused on the steps libraries could take now to prepare their legacy MARC data for future linked data implementations, such as data remediation and using identifiers. The University of Sydney has focused on authority work, and particularly disambiguating personal names (over 40 entries for “Robert Scott”). Research information management systems already use identifiers like DOIs and ORCIDs and those for grants so that researchers can be linked to their papers, institutions, grants, datasets and collaborators. Katrina pointed to the Research Data Switchboard as a successful example of linking researchers, organizations, datasets and papers across the data collections in Research Data Australia, Dryad, CERN and other international repositories.

Mine on “Linked data – bringing the world closer together” (PowerPoint includes speaker notes). My presentation focused on the opportunities of embedding library data into the “web of data”, not just the “web of documents” we’ve become familiar with. I focused on where a large portion of our communities live and learn—Google and Wikipedia—as platforms where we could leverage the work libraries have done to expose relationships among works, translations and their creators. I also showed some linked data examples from the 2015 International Linked Data Survey for Implementers to illustrate how library data could bridge domains and languages.

There were many good and inspiring sessions during the ALIA Information Online 2017 conference and I encourage you to look at the full program.


About Karen Smith-Yoshimura

Karen Smith-Yoshimura, senior program officer, works on topics related to creating and managing metadata with a focus on large research libraries and multilingual requirements.


Open Knowledge Foundation: Open Data Day 2017 in Kisumu, Kenya – Can we tell stories about our local environment with no open data?

planet code4lib - Mon, 2017-03-13 15:51

This blog is part of the event report series on International Open Data Day 2017. On Saturday 4 March, groups from around the world organised over 300 events to celebrate, promote and spread the use of open data. 44 events received additional support through the Open Knowledge International mini-grants scheme, funded by SPARC, the Open Contracting Program of Hivos, Article 19, Hewlett Foundation and the UK Foreign & Commonwealth Office. This event was supported through the mini-grants scheme under the Environment theme.

This post was originally published on the LakeHub website:

On Saturday the 4th of March we gathered at the hub for Open Data Day 2017. It’s the third time we are hosting this important workshop, the previous times being in 2014 and 2015. Open Data Day events are a great opportunity to show the benefits of open data and encourage the adoption of open data policies in government, business and civil society.

The initial plan for Open Data Day Kisumu was to host a datathon on the climatic implications for jobs and the economy around the city. Planned topics included the falling water levels of Lake Victoria, visualising how the lake has been receding, and the fish species of Lake Victoria. We wanted to know, for instance, whether fish are dying or becoming extinct. Visualising the destruction of the Mau Forest would have been another great eye-opener – Global Forest Watch provides several data sets on forest coverage, use, loss, and indigenous peoples. However, due to time constraints, much of this data was not accessible. This brought us to the realisation of how much open data is needed and, even more so, how much leadership is required on initiatives to popularise it.

This year Open Data Day Kisumu was a bit different from the past. Unlike previous events, where data experts presented informative visuals with little contribution from the participants, this time our participants, consisting primarily of teenage developers, benefited from a walk-through of how Open Data could be useful in their work.

This young group of people, who were mainly girls from a local school, built mobile apps to solve some challenges in the city. In one such solution, a group of girls in the primary school mapped locations known as ‘black spots’, which are prone to road carnage. At the workshop, the girls got to learn how to access publicly available data from the government website on the number of traffic accidents that occur each month. The data, however, did not turn out to be very useful in the form in which it was presented, so the girls also learned strategies for cleaning it.

Primary school girls present an app they are building. Young learners should be made aware of the benefits of Open Data.

Some other interesting solutions the girls are working on, which could greatly benefit from open data, include a game that educates the public on safe sex and the dangers of HIV/AIDS and non-adherence to medication. Another solution, dubbed “Kiraia”, is a children’s app that encourages them to take part in governance. The group of girls working on this solution want to make it possible for children to participate in day-to-day activities regarding civic action and government. The team was mentored on which data they could find useful for their work, and how they could leverage existing data to create Open Data for the masses.

Data Scientist Dominic Mutai from PATH, Kisumu leading the session

Even though we had to divert from our original plan, we are still confident that, moving forward, one of our primary goals as a community interested in Open learning will be to use data to highlight the effects of climate change and deforestation on Lake Victoria, the second largest freshwater body in the world. Water levels in the lake have been dropping drastically over the years. As a community, we want to build visualisations that tell a story about the lake.

