Feed aggregator

FOSS4Lib Upcoming Events: VuFind Summit 2017

planet code4lib - Thu, 2017-07-13 14:50
Date: Monday, October 9, 2017 - 08:00 to Tuesday, October 10, 2017 - 17:00
Supports: VuFind

Last updated July 13, 2017. Created by Peter Murray on July 13, 2017.

VuFind Summit 2017 conference page

Open Knowledge Foundation: FutureTDM symposium: sharing project findings, policy guidelines and practitioner recommendations

planet code4lib - Thu, 2017-07-13 14:16

The FutureTDM project, in which Open Knowledge International participates, actively engages with stakeholders in the EU such as researchers, developers, publishers and SMEs to help improve the uptake of text and data mining (TDM) in Europe. Last month, we held our FutureTDM Symposium at the International Data Science Conference 2017 in Salzburg, Austria. With the project drawing to a close, we shared the project findings and our first expert driven policy recommendations and practitioner guidelines. This blog report has been adapted from the original version on the FutureTDM blog.

The FutureTDM track at the International Data Science Conference 2017 started with a speech by Bernhard Jäger from SYNYO, who gave a brief introduction to the project and explained the purpose of the Symposium – bringing together policy makers and stakeholder groups to share with them FutureTDM’s findings on how to increase TDM uptake.

Introduction to the FutureTDM project from FutureTDM

This was followed by a keynote speech on the Economic Potential of Data Analytics by Jan Strycharz from Fundacja Projekt Polska, a FutureTDM project partner. It was estimated that automated (big) data and analytics – if developed properly – will add over 200 billion euros to the European GDP by 2020. This means that algorithms (not to say robots) will by then be responsible for 1.9% of the European GDP. You can read more on the impact of TDM on the economy in our report Trend analysis, future applications and economics of TDM.

Dealing with the legal bumps

The plenary session with keynote speeches was followed by the panel: Data Analytics and the Legal Landscape: Intellectual Property and Data Protection. As an introduction to this legal session, Freyja van den Boom from Open Knowledge International presented our findings on the legal barriers to TDM uptake, which relate mainly to the type of content and the applicable regime (IP or Data Protection). Having gathered evidence from the TDM community, FutureTDM has identified three types of barriers – uncertainty, fragmentation and restrictiveness – and developed guideline recommendations on how to overcome them. We have summarised these in our awareness sheet Legal Barriers and Recommendations.

This was followed by statements from the panelists. Prodromos Tsiavos (Onassis Cultural Centre/IP Advisor) stressed that with the recent changes in the European framework, the law faces significant issues, and balancing industrial interests is becoming necessary. He added that a different approach is certainly needed to initiate uptake by industry, because industry will otherwise continue with licensing arrangements.

Duncan Campbell (John Wiley & Sons, Inc.) concentrated on copyright and IP issues. How do we deal with all the knowledge created? What influence does copyright law have? He spoke about the EU Commission’s proposal and the UK TDM exception – how can an exception be made to work?

Marie Timmermann (Science Europe) also focused on the TDM exception and its positive and negative sides. On the positive side, she noted that the TDM exception has moved from being optional to mandatory and cannot be overridden. On the negative side, she stated that the exception is very limited in scope: startups and SMEs do not fall under it. Thus, Europe risks losing promising researchers to other parts of the world.

This statement was also supported by Romy Sigl (AustrianStartups). She confirmed that anybody can create a startup today, but if startups are not supported by legislation, they move to another country where they see more potential.

The right to read is the right to mine

The next panel was devoted to an overview of FutureTDM case studies: Startups to Multinationals. Freyja van den Boom (OKI) gave an overview of the highlights of the stakeholder consultations, which cover different areas and stakeholder groups within the TDM domain. Peter Murray-Rust (ContentMine) presented a researcher’s view and stressed that the right to read is the right to mine, but that we have no legal certainty about what a researcher is and is not allowed to do.

Petr Knoth from CORE added that he believes we need data infrastructure to support TDM. Data scientists are very busy cleaning data and have little time for the actual mining. He added that the infrastructure should not be operated by the publishers, though they should provide support for it.

Donat Agosti from PLAZI focused on how to make data accessible so that everybody can use it. He mentioned the case of the PLAZI repository, TreatmentBank, which is open, extracts data from each article, and makes that data citable. Once you have the data, you can disseminate it.

Kim Nilsson from PIVIGO spoke about support for academics – PIVIGO has already worked with 70 companies and provided TDM support for 400 PhD academics. She mentioned how important data analytics, and the ability to see all the connections and correlations, is for sectors such as medicine. She stressed that data analytics is also extremely important for startups – gaining access is critical for them.

Data science is the new IT

The next panel was devoted to universities, TDM and the need for strategic thinking on educating researchers. FutureTDM project officer Kiera McNeice (British Library) gave an overview of the skills and education barriers to TDM. She stressed that many people say they need a great deal of knowledge to use TDM, and that there is a skills gap between academia and industry. The barriers to entry are also still high, because using TDM tools often requires programming knowledge.

We have put together a series of guidelines to help stakeholders overcome the barriers we have identified. Our policy guidelines include encouraging universities to support TDM through both their research and education arms, for example by helping university senior management understand the needs of researchers around TDM and the potential benefits of supporting it. You can read more in our Baseline report of policies and barriers of TDM in Europe, or walk through them via our Knowledge Base.

Kim Nilsson from PIVIGO stressed that the main challenge is software skills. The fact is that if you can do TDM, you have fantastic options: startups, healthcare, charity. PIVIGO’s task is to offer proper career advice, help people understand what kinds of skills are valued, and assist them in building on those skills.

Claire Sewell (Cambridge University Library) elaborated on skills from the perspective of an academic librarian. What is important is a basic understanding of copyright law, along with keeping up with technical skills and data skills. “We want to make sure that if a researcher comes into the library we are able to help him,” she concluded.

Jonas Holm from Stockholm University Library highlighted the fact that very little strategic thinking is going on in the TDM area. “We have struggled to find much strategic thinking in the TDM area. Who is strategically looking at improving uptake at the universities? We couldn’t find much around Europe,” he said.

Stefan Kasberger (ContentMine) stressed that the social part of the education is also important – meaning inclusion and diversity.

Infrastructure for Technology Implementation

The last session was dedicated to technologies and infrastructures supporting Text and Data Analytics: challenges and solutions. FutureTDM Project Officer Maria Eskevich (Radboud University) delivered a presentation on the TDM landscape with respect to infrastructure for technical implementation.

Stelios Piperidis from OpenMinTeD stressed the need for an infrastructure. “Following on from what we have discussed, it looks like a TDM infrastructure has to respond to three key questions: How can I get hold of the data that I need? How can I find the tools to mine the data? How can I deploy the work carried out?”

Mihai Lupu from Data Market Austria brought up the issue of data formats: for example, there is a lot of data in CSV files that people don’t know how to deal with.

Maria Gavrilidou (clarin:el) highlighted that not only are the formats a problem, but so are identifying the source of the data and putting in place lawful procedures for using it. Metadata is also problematic, because very often it does not exist.

Nelson Silva (Know-Center) focused on using the proper tools for mining data. Very often there is no particular tool that meets your needs, and you have to either develop one or search for open source tools. Another challenge is the quality of the data: how much can you rely on it, and how do you visualise it? And finally, how can you be sure that people will take away the right message?


The closing session was conducted by Kiera McNeice (British Library), who presented A Roadmap to promoting greater uptake of Data Analytics in Europe.  Finally, we also had a Demo Session with flash presentations by:

  • Stefan Kasberger (ContentMine),
  • Donat Agosti (PLAZI), Petr Knoth (CORE),
  • John Thompson-Ralf Klinkenberg (Rapidminer),
  • Maria Gavrilidou (clarin:el),
  • Alessio Palmero Aprosio (ALCIDE)

You can find all FutureTDM reports in our Knowledge Library, or visit our Knowledge Base: a structured collection of resources on Text and Data Mining (TDM) that has been gathered throughout the FutureTDM project.


OCLC Dev Network: DEVCONNECT 2017: Dashboards and Artificial Intelligence in Libraries

planet code4lib - Thu, 2017-07-13 13:00

Wayne State University Libraries has been working on several projects aimed at better understanding library material usage using OCLC products and APIs. 

Terry Reese: MarcEdit 7: MARC Tools Wireframe

planet code4lib - Thu, 2017-07-13 06:03

The changes aren’t big – they are really designed to make the form a little more compact and add common topics to the screen. The big changes are related to integrations. In MarcEdit 6.x, when you run across an error, you have to open the validator, pick the correct validation option, etc. This won’t be the case any longer. When the tool determines that a problem may be related to the record structure, it will simply offer you the option to check for errors in your file – no opening the validator, no picking options. This should make it easier to get immediate feedback on any structural processing errors that the tool may run up against.

MARC Tools Window Wireframe #1:

The second wireframe collapses the list into an autocomplete/autosuggest option, moves data around, and demonstrates some of the potential integration options. I like this one as well – though I’m not sure whether having the items in a drop-down list with autocomplete would be more difficult to use than the current drop-down list. I also used this as an opportunity to get rid of the Input File and Output File labels. I’m not sure these are always necessary, and I honestly hate seeing them. But I know that iconography maybe isn’t the best way to express meaning. I think attaching tooltips to each button and textbox might allow me to finally let these labels go.

MARC Tools Wireframe #2:


Based on feedback, it sounds like the labels are still desired.  So here is wireframe #3 with a slight modification to allow for labels in the window.

MARC Tools Wireframe #3:


District Dispatch: Breaking #SaveIMLS news from ALA Pres. Jim Neal

planet code4lib - Wed, 2017-07-12 23:25

Today, ALA President Jim Neal sent this #SaveIMLS campaign update to the ALA membership regarding tomorrow’s vote on part of the House Appropriations Committee bill:


I’m pleased, but with important cautions, to tell you that all of our collective work to Fight for Libraries! is poised to pay off dramatically. Key parts of the House Appropriations Subcommittee bill that is scheduled to be voted on tomorrow afternoon at 4:30 EDT were released late this afternoon. The bill does NOT cut last year’s funding to the Institute of Museum and Library Services. Once final, that would mean no cuts to LSTA in this critical first vote stage! (We’ll know about Innovative Approaches to Literacy tomorrow.)

Now the cautions… While unlikely, an amendment could be offered to the bill that changes the IMLS appropriation. In addition, after tomorrow’s vote, there will be at least two further procedural opportunities for the bill to be amended. After the House acts, of course, the Senate will take its turn, though probably not for some months.

As the phrase goes, therefore, this is definitely NOT over until it’s over. We will report immediately on the results of tomorrow’s Subcommittee vote. I hope very much that the next thing for us to do will be to thank our House supporters.

Until early evening tomorrow in Washington, please join me in crossing your fingers.

Jim Neal
ALA President

Further updates will be sent out as we get them. In the meantime, if you are a constituent of a Representative working on the Labor HHS subcommittee, give them a call!

The post Breaking #SaveIMLS news from ALA Pres. Jim Neal appeared first on District Dispatch.

David Rosenthal: Is Decentralized Storage Sustainable?

planet code4lib - Wed, 2017-07-12 19:49
There are many reasons to dislike centralized storage services. They include business risk, as we see in le petit musée des projets Google abandonnés, monoculture vulnerability and rent extraction. There is thus naturally a lot of enthusiasm for decentralized storage systems, such as MaidSafe, DAT and IPFS. In 2013 I wrote about one of their advantages in Moving vs. Copying. Among the enthusiasts is Lambert Heller. Since I posted Blockchain as the Infrastructure for Science, Heller and I have been talking past each other. Heller is talking technology; I have some problems with the technology but they aren't that important. My main problem is an economic one that applies to decentralized storage irrespective of the details of the technology.

Below the fold is an attempt to clarify my argument. It is a re-statement of part of the argument in my 2014 post Economies of Scale in Peer-to-Peer Networks, specifically in the context of decentralized storage networks.

To make my argument I use a model of decentralized storage that abstracts away the details of the technology. The goal is a network with a large number of peers each providing storage services. This network is:
  • decentralized in the sense that no single entity, or small group of entities, controls the network (the peers are independently owned and operated), and
  • sustainable, in that the peers do not lose financially by providing storage services to the network.
I argue that this network is economically unstable and will, over time, become centralized. This argument is based on work from the 80s by the economist W. Brian Arthur1.

Let us start by supposing that such a decentralized storage network has, by magic, been created:
  • It consists of a large number of peers, initially all providing the same amount of storage resource to the network.
  • Users submit data to be stored to the network, not to individual peers. The network uses erasure coding to divide the data into shards and peers store shards.
  • Each peer incurs costs to supply this resource, in the form of hardware, bandwidth, power, cooling, space and staff time.
  • The network has no central organization which could contract with the peers to supply their resource. Instead, it rewards the peers in proportion to the resource they supply by a token, such as a crypto-currency, that the peers can convert into cash to cover their costs.
  • The users of the network rent space in the network by buying tokens for cash on an exchange, setting a market price at which peers can sell their tokens for cash. This market price sets the $/TB/month rent that users must pay, and that peers receive as income. It also ensures that users do not know which peers store their data.
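As an aside, the erasure-coding step in this model can be illustrated with a toy scheme of our own devising (real networks use Reed-Solomon-style codes with many shards): split the data into two shards plus an XOR parity shard, so that any two of the three shards suffice to reconstruct the original.

```python
# Toy (k=2, m=1) erasure code: two data shards plus one XOR parity shard.
# Illustrative only; production systems use far more robust codes.

def encode(data: bytes) -> list:
    """Split even-length data into shards [a, b, a XOR b]."""
    assert len(data) % 2 == 0, "toy scheme: even-length input only"
    half = len(data) // 2
    a, b = data[:half], data[half:]
    parity = bytes(x ^ y for x, y in zip(a, b))
    return [a, b, parity]

def decode(a, b, parity):
    """Recover the original data from any two of the three shards."""
    if a is None:
        a = bytes(x ^ y for x, y in zip(b, parity))
    if b is None:
        b = bytes(x ^ y for x, y in zip(a, parity))
    return a + b

shards = encode(b"hello!")
# Lose shard b entirely; recover it from shard a and the parity shard.
recovered = decode(shards[0], None, shards[2])
```

The point for the model is simply that no single peer holds the whole file, and the loss of some peers does not lose the data.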
Although the income each peer receives per unit of storage is the same, as set by the market, their costs differ. One might be in Silicon Valley, where space, power and staff time are expensive. Another might be in China, where all these inputs are cheap. So providing resources to the network is more profitable in China than in Silicon Valley.

Suppose the demand for storage is increasing. That demand will preferentially be supplied from China, where the capital invested in adding capacity can earn a greater reward. Thus peers in China will add capacity faster than those in Silicon Valley and will enjoy not merely a lower cost base because of location, but also a lower cost base from economies of scale. This will increase the cost differential driving the peers to China, and create a feedback process.

Competition among the peers and decreasing hardware costs will drive down the $/TB/month rent to levels that are uneconomic for Silicon Valley peers, concentrating the storage resource in China (as we see with Bitcoin miners).

Let’s assume that all the peers in China share the same low cost base. But some will have responded to the increase in demand before others. They will have better economies of scale than the laggards, so they will in turn grow at the laggards’ expense. Growth may come from increasing the capacity of existing peers, or from adding peers controlled by the entity with the economies of scale.

The result of this process is a network in which the aggregate storage resource is overwhelmingly controlled by a small number of entities, controlling large numbers of large peers in China. These are the ones which started with a cost base advantage and moved quickly to respond to demand. The network is no longer decentralized, and will suffer from the problems of centralized storage outlined above.
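To see how strong this increasing-returns feedback is, here is a toy simulation of our own (not part of Rosenthal's post, and with purely illustrative numbers): two peers with different cost bases, where each round of new demand flows to the cheaper peer, whose added scale then lowers its unit cost further.

```python
# Toy model of Arthur-style increasing returns in a storage network.
# All numbers are illustrative assumptions, not drawn from the post.

def unit_cost(base_cost, capacity):
    """Unit cost falls slowly as capacity grows (economies of scale)."""
    return base_cost / (1.0 + 0.001 * capacity)

def simulate(rounds=50, new_demand=100.0):
    # (capacity in TB, base $/TB/month cost) for two stylized peers
    peers = {"silicon_valley": [100.0, 10.0], "china": [100.0, 6.0]}
    for _ in range(rounds):
        # Each round, new demand is supplied by the cheapest peer,
        # which then enjoys still-better economies of scale.
        cheapest = min(peers, key=lambda p: unit_cost(peers[p][1], peers[p][0]))
        peers[cheapest][0] += new_demand
    total = sum(cap for cap, _ in peers.values())
    return {name: cap / total for name, (cap, _) in peers.items()}

shares = simulate()
# The low-cost peer ends up with the overwhelming share of capacity.
```

Under these assumptions the low-cost peer captures essentially all new demand, and the network's capacity concentrates just as the argument describes.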

This should not be a surprise. We see the same winner-take-all behavior in most technology markets. We see this behavior in the Bitcoin network.

I believe it is up to the enthusiasts to explain why this model does not apply to their favorite decentralized storage technology, and thus why it won't become centralized. Or, alternatively, why they aren't worried that their decentralized storage network isn't actually decentralized after all.


  1. Arthur, W. Brian. Competing technologies and lock-in by historical small events: the dynamics of allocation under increasing returns. Center for Economic Policy Research, Stanford University, 1985. in Arthur, W. Brian. Increasing Returns and Path Dependence in the Economy, Michigan University Press, 1994.

Jodi Schneider: QOTD: Working out scientific insights on paper, Lavoisier case study

planet code4lib - Wed, 2017-07-12 19:04

…language does do much of our thinking for us, even in the sciences, and rather than being an unfortunate contamination, its influence has been productive historically, helping individual thinkers generate concepts and theories that can then be put to the test. The case made here for the constitutive power of figures [of speech] per se supports the general point made by F.L. Holmes in a lecture addressed to the History of Science Society in 1987. A distinguished historian of medicine and chemistry, Holmes based his study of Antoine Lavoisier on the French chemist’s laboratory notebooks. He later examined drafts of Lavoisier’s published papers and discovered that Lavoisier wrote many versions of his papers and in the course of careful revisions gradually worked out the positions he eventually made public (Holmes, 221). Holmes, whose goal as a historian is to reconstruct the careful pathways and fine structure of scientific insights, concluded from his study of Lavoisier’s drafts

We cannot always tell whether a thought that led him to modify a passage, recast an argument, or develop an alternative interpretation occurred while he was still engaged in writing what he subsequently altered, or immediately afterward, or after some interval during which he occupied himself with something else; but the timing is, I believe, less significant than the fact that the new developments were consequences of the effort to express ideas and marshall supporting information on paper (225).

– page xi of Rhetorical Figures in Science by Jeanne Fahnestock, Oxford University Press, 1999.

She is quoting Frederich L. Holmes. 1987. Scientific writing and scientific discovery. Isis 78:220-235. DOI:10.1086/354391

As Moore summarizes,

Lavoisier wrote at least six drafts of the paper over a period of at least six months. However, his theory of respiration did not appear until the fifth draft. Clearly, Lavoisier’s writing helped him refine and understand his ideas.

Moore, Randy. Language—A Force that Shapes Science. Journal of College Science Teaching 28.6 (1999): 366.
(which I quoted in a review I wrote recently)

Fahnestock adds:
“…Holmes’s general point [is that] there are subtle interactions ‘between writing, thought, and operations in creative scientific activity’ (226).”

District Dispatch: Speak up on Day of Action to Save Net Neutrality

planet code4lib - Wed, 2017-07-12 15:06

Today, ALA joins close to 200 organizations participating in a Day of Action to Save Net Neutrality. Websites, internet users and online communities are coming together to sound the alarm about the Federal Communications Commission (FCC)’s attack on net neutrality. You can add your voice to a growing and powerful chorus.

ALA and America’s libraries believe protecting and preserving the open internet is essential for ensuring the free flow of information to all, which underpins free speech, research and learning, economic empowerment and digital innovation. We have historically supported, and continue to support, strong, enforceable net neutrality rules from the FCC as a matter of our values and ethics, our public mission, and our professional practice as broadband consumers and advocates.

I often get the question—from librarians, net neutrality advocates and adversaries alike—about the specific stake that ALA and libraries have in this issue. I’ve thought about it a lot over the last 8+ years working on this issue, and here are a few of the things I see through a library lens.

As a matter of principle, library professionals commit to professional values of intellectual freedom, equitable access to information and diversity. Consider:

  • Intellectual freedom and free expression are as fundamental to the Internet as the First Amendment is to American democracy. These also are the core values of America’s public, K-12 school, higher education and all libraries. Commercial ISPs should not be enabled to serve as gatekeepers for the information people may freely access online.
  • Equitable access to information online depends on the open internet. Prioritized access to some content over others is antithetical to librarian and democratic values. It also runs counter to the innovative and “permissionless” nature of the internet that enables creators to reach global audiences by the quality of their offerings rather than the size of their wallet.
  • Embedded in both of the above is a commitment to the need to foster and share a diversity of voices, ideas and experiences.

As a matter of practice, libraries collect, create, provide access to and disseminate essential information to the public over the internet. Consider:

  • Will libraries—largely funded through public dollars—be able to compete for priority access to share diverse digital collections that range from community newspaper and photo archives to downloadable local music to veterans’ oral history projects to documentary video?
  • Will libraries be able to pay increased fees to vendors that may pass along the costs of paying for prioritized access to their streaming and downloadable media resources?
  • Will libraries be required to consider the affiliated (and therefore likely prioritized) content available through a commercial ISP when selecting their broadband provider(s)? And will they be forced to pay multiple ISPs to provide service to enable public access to affiliated content?
  • Will libraries’ ability to provide no-fee public internet access to support bandwidth-intensive services ranging from high-definition video conferencing and distance learning to big data sharing to telemedicine be compromised if these services are throttled in favor of commercial content preferred by commercial ISPs?
  • How will libraries educate their public internet users to these choices and limitations related to prioritized content—both in terms of patrons’ access and their ability to contribute their own cultural and commercial products to other internet users? How transparent will ISP practices be to libraries and the campuses, communities and individuals we serve?

America’s libraries and librarians are rightly recognized as essential democratic institutions and leading advocates for people’s rights to read and express themselves freely. The internet is today’s most essential platform for this speech and expression.

An open internet in which commercial ISPs are prohibited from blocking, throttling, degrading, discriminating or prioritizing among online content and services is essential to free expression and equitable access and contribution to online information.

To preserve the open internet, we must have legal protections and the ability of the expert government agency to enforce these protections. The 2015 Open Internet Order enables these protections, creates a framework for addressing future conduct concerns, and allows for the flexibility to forbear where needed. The FCC must retain and use as needed its lawful authority and not entrust the future of the internet to entities whose financial interests may vary significantly from the public interests.

Today is a big day, but one among many ahead to preserve the gains ALA and other network neutrality advocates made in 2014 and 2015. Please join us now in sending the strongest message possible to the FCC about the value of an open internet for libraries and the people we serve. And please stay tuned and share your questions and ideas with us in the coming weeks and months.

The post Speak up on Day of Action to Save Net Neutrality appeared first on District Dispatch.

In the Library, With the Lead Pipe: Following the Yik Yak: Using Social Media Observations to Understand Student Needs on College Campuses

planet code4lib - Wed, 2017-07-12 12:00

In Brief:

What can social media tell us about our patrons? We look at Yik Yak through the lens of library ethnography, suggesting that anonymous social media can reveal not only complaints and commentary about library services and spaces, but also uncover students’ emotional and social experience with research and study space. Looking at social media posts on Yik Yak, we uncover common threads of social dynamics, expectations of quiet space, and frustration with studying, all of which increase our understanding of student experiences in US and international libraries.

by Celia Emmelhainz and Miriam Rigby

Parseltongue in the Library, screenshot by Celia Emmelhainz, CC-BY


As librarians, we always want to better understand and connect with our patrons—and to use our understanding to set up more useful spaces and services. In the past 10 years, librarians have moved from relying on user surveys to working with anthropologists and ethnographically-trained librarians. These collaborations let us learn not just what our users say to our face, but what they actually do—and how they live their lives in relation to the library as a space and social place.

Ethnographic methods provide librarians with a series of approaches that let us see our libraries in their social context.1 These could include things such as inhabiting study spaces and watching the social habits of patrons in our library, observing spatial usage and leading mapping exercises with students, embedding in research methods or capstone seminar courses to observe how library research is taught, and even inviting students to document their own experiences through photos, videos, and research diaries.2 Such ethnographic methods differ from the “interviews and surveys” baseline of qualitative user experience research in libraries, as ethnographic methods can allow us to embed over time in our users’ social and virtual spaces. Inhabiting the same world that our students inhabit can lead us to much deeper understanding than just recording their reported experiences through surveys, focus groups, or interviews alone.  

In this article, we take another ethnographic angle, that of digital ethnography. By collecting and reflecting on changing social media posts, we as researchers use the internet to observe–in real time–how patrons discuss their engagement with the library. Anonymous, location-based services like Yik Yak let us practice trace ethnography, observing the times, places, and digital traces by which students express their relationship to the library.3

One puzzle for us was that, whether we run qualitative or quantitative studies, our libraries still find it hard to gain feedback from those who do not currently use our services.4 How can we know what people are thinking if they don’t share their thoughts? How can we learn what students need if they don’t darken our revolving doors, use our website, or volunteer for a research study with a side incentive of pizza?

And even when patrons complete an interview or survey, do they realize they’re editing their thoughts—perhaps based on a desire to be polite and helpful to us as librarians?

Using Social Media Apps to Reach Out and Listen to Students

With this in mind, we chose to gain a new angle on our libraries by analyzing anonymous and ephemeral social media posts on Yik Yak. Public, school, and academic librarians already use social media such as Twitter and Facebook to gain feedback from tech-savvy users.5 Yet trend-responsive patrons are always moving from one network to another. A prior generation left Tribe, Friendster, and MySpace for Facebook, while today’s students may leave shell profiles on Facebook and use other apps to avoid parents, teachers, and corporate advertisers.6 One student at UT Austin describes Facebook as “dead to us” and an “awkward family dinner party,”7 noting that his friends now enjoy Medium, GroupMe, WhatsApp, Tumblr, and Snapchat instead.

As younger patrons leave networks like Facebook, they move to ephemeral social media such as Snapchat, where messages disappear after viewing;8 Instagram, where they curate daily life for a wide audience;9 Whisper, where they anonymously share their “secrets” with those nearby; and Yik Yak, a source of anonymous chatter on many college campuses.  

While Instagram remains strong, most of these apps are ephemeral. They move in and out of popularity, with posts fading out over minutes, hours, or days. In fact, Yik Yak itself announced in April 2017 that it was ceasing operations.10 We argue, though, that capturing these passing streams of student experiences in the library continues to have value. Like culture as a social construct, discourse on social media apps is both passing and long-lasting in impact. So why pay attention to passing trends, and how can we capture lasting value from these ephemeral posts?

Listening to Campus Yaks

At the time we reviewed each of these apps in 2015, Yik Yak was reaching the height of its popularity. Discussions of the pros and cons of social media were appearing regularly in the Chronicle of Higher Education and Inside Higher Ed. Among those apps existing at the time, we found Yik Yak’s comments on campus life to be the most revealing in understanding our patrons.

Started by two graduates of Furman University, Yik Yak was a mobile app that launched in 2013 and gained rapid popularity on and near college campuses, as well as some high schools.11 It let users within a few miles12 share anonymous, wry comments on daily life, and allowed other users to upvote or downvote existing posts, as well as add their own responses and commentary. As we found, anyone with a smartphone or tablet could view campus chatter, from hookup requests and dorm party announcements, to bathroom humor, complaints about homework, and anguish over life and relationships.13

Bonded With the People Near Me, screenshot by Celia Emmelhainz, CC-BY

Lessons from Yaks in the Wild

The authors had a unique opportunity for research when we found ourselves snowbound together during ALA Midwinter 2015. As blizzard-like winds blew across Chicago, we ensconced ourselves indoors and surveyed over 10,000 “Yaks” posted at 60 public, private, and religious universities across North America. Using Yik Yak’s “peek” feature to see campuses besides our own, we collected anonymous, public posts by students in the vicinity of major campuses.

Methodological discussions of the ethics of performing ethnography online differ in whether to inform subjects, often depending on the nature of the research, topics of study, and expectations of privacy involved. In this case we determined that the brief nature of comments on the library, full anonymity of users, ephemeral nature of posts, and public availability to any smartphone user worldwide all reduced student expectations of privacy as well as the risk of harm. Further, we found no way to directly contact users, and are now quoting posts not tied to any specific user, and which no longer exist except in our screenshots.14

This sample was collected on Sunday, 1 February 2015, between 5pm and 2am EST, and included posts about academic libraries from Hawai’i to London, although primarily in the United States. Because the app lacked a search feature, we scrolled through both the 100 most “recent” Yaks, which last until voted down, and the 100 most popular (liked) Yaks, which last several days until they time out of the system.15 While users can save images to their phones, most posts disappear within two to three days.

The timing for our survey was excellent: on some campuses, it was the Sunday before midterms; on others students were returning from winter break or anticipating a snow day, and many were balancing studying with parties for the Super Bowl. These varied conditions gave us a wide cross-section of student experiences as they prepared for a coming week on campus.

We searched by name for campuses that would give us a range of insights, and content varied widely. On some campuses we found just one or two Yaks referencing the library among 200 recent and popular posts. At others, the library was not mentioned (and so they are not included in our sample).

In other words, libraries comprise a small slice of youthful online commentary—less popular than hookups and pizza, but important enough to occasionally bubble up in student chatter.

In reviewing sixty library-related Yaks, we found forty that fell into four thematic clusters: the library as study space, the library as a site of suffering, the library as social space, and posts that suggested (to us as librarians!) a need for research support and information literacy training.

1. Libraries as a quiet study space

Many students framed their local library as a quiet space in which to study. For example:

  • Pomona: Shoutout to the people rushing to the library after Super Bowl ends #workhardplayhard
  • Endicott: Sunday is supposed to be a day to relax and yet it’s my busiest and most stressful day
  • Oregon: The library should have an adderall vending machine
  • Michigan: Are any libs open tomorrow? Where are we supposed to study if the libraries are closed [for a snow day]?
  • Oberlin: Is Mudd open? I actually have stuff to prepare for the first day of class and reading Marxian philosophy is really hard with chatter and rap in the background
  • New Mexico: Where can I get free pizza today? Since I’m stationed in the library today
  • Bates: The library is nice and quiet. Thanks Super Bowl!

Yet this desire for a focused study space led others to express frustration with patrons who were talking… or breathing loudly.

  • Boston: please don’t breathe so loudly in the library
  • British Columbia: Why are u here, girls that come to koerner to socialize and talk super loud?
  • Bates: That glare you give when you’re in the library and you hear people outside your window having fun
  • Brigham Young: Since when is it okay to hum in the library?
  • Carleton: The person behind me on 3rd Libe is panting profusely, wtf?
  • UCL, London: When the guy next to you is whispering his essay aloud in the library, and it sounds like parseltongue

In other words, in contrast to libraries’ attempts to open up interactive study spaces, many students were frustrated by the noise levels and distracting patrons around them. Students’ highly emotional responses to these issues online suggest the need for more quiet student study spaces–something that libraries such as Colby College have implemented as the result of library observations, interviews, and ethnographic projects.

2. Libraries as a site of suffering

While students appreciated the library as a quiet study space, they also expressed ambivalent feelings about studying in the context of a university library. Of course, we attempt to make the library a welcoming space, yet students still feel a range of internal affective experiences as they study. Because this momentary experience is not captured in library surveys, we find social media all the more useful to help us grasp the quiet struggle going on in front of us, in carrels and at tables where students wrestle with the tension between getting work done and wanting to relax:

  • Colby: Nothing worse than seeing other people from your class leaving Miller while you’re stuck there with so much left to do
  • Pittsburgh: Opens up CourseWeb and sees 25 notifications, closes laptop and cries
  • Oxford, UK: Hate it when a procession of tourists come through the library. Makes me feel like an exhibit in the world’s most boring zoo
  • Charleston College: I’ll be suffering in the library studying for my cellular biology exam tomorrow if you need me
  • Berkeley: Frustrated guy next to me in the library just threw down his pen and sighed so hard he farted. Me too, bruh
  • Carleton: Things the Libe doesn’t have: heat, wifi. things the Libe does have: books on the origins of hats
  • Ohio State: The walk to the library is the boulevard of broken dreams
  • Cambridge, UK: Oh God, the library is getting spooky. No one else is here (I think) and my lack of sleep isn’t helping…
  • Georgetown: Is there anyone still in library?

Adderall Vending Machine, screenshot by Celia Emmelhainz, CC-BY. For example of an actual library vending machine with medications and other supplies, see

3. Libraries as social space

We also found that many students use Yik Yak to comment on studying in relation to other students. The social aspect of studying in the library is highlighted by the insider language used (“the Libe”) as well as the shared humor and social experience of students sleeping, sighing, or simply being in proximity to each other:

  • Charleston: Whenever I’ve been sitting in the same spot in the library for a long time, I always feel like I’ve bonded with the people near me
  • U Chicago: Nothing reminds me of the interconnectedness of humanity like realizing that everyone in the Harper chairs naps in the exact same position
  • British Columbia: When you open your laptop in a library and Beyoncé is still playing
  • Sarah Lawrence: SLC life: smoothies in the library

Social media posts can also capture how the library is perceived both socially and spatially, at a more granular level than observed with other assessment measures:

  • U Washington: Library logic: *Sees a table with only one person* *Goes in search of another table*
  • Miami U: What floor you study on in King says a lot about you. What’s your favorite?
  • Texas A&M: Why do foreign people ALWAYS talk in the library? GTFO

As in the quote from Texas A&M above, we should here note that the anonymity of many social media apps also brings up critical concerns of racism and exclusion, highlighting how our libraries are still not a safe space for everyone. Although we did not encounter other racist or sexist comments in this brief survey, we have seen them over time on our home campuses. And while most student users quickly down-vote such posts out of existence, their presence still creates a hostile space of social exclusion, and mirrors the microaggressions minority students already experience offline on college campuses.16 This darker side of student sociality on the web should prompt us to check in with vulnerable populations and make sure we host a safer and more inclusive space in our campus libraries.

4. The ongoing need for study aids and information literacy

Finally, students used Yik Yak to comment on their information habits or seek assistance from others. We wondered, here, if they were either too intimidated to approach a teacher, or didn’t realize that a librarian could be of use in helping them access related resources. Whether asking about assignments, offering flashcards, or commenting on how they seek information, these posts hint at ways librarians could improve the cultural relevance of our marketing and services for students:

  • Alabama: What’s the format of Econ 110 tests?
  • Oxford: The great feeling when you open a journal article and half the pages are references
  • UNC: Writing a scientific journal article is about as useful as the magic scarf I got three years ago
  • Grinnell: Anyone in intro sociology that wants pretty comprehensive flashcards? It feels like a shame to waste them.
  • Ohio Dominican: Can’t find study aids anywhere [crying emoticon]

And a few more students raise issues of copyright and intellectual property, grousing about others copying their Yik Yak posts, or asking if they can get away with illegal downloads on their campus:

  • Ohio State: Reposts [of popular Yik Yaks] are academic misconduct
  • Montana: Can you get in trouble for using Youtube to MP3 on campus?
  • New Mexico: I read yik yak in the mornings like it’s the morning paper
  • Calgary: An update on my game: 3 weeks and still have not opened a book

Bringing it Home

As librarians, then, we found social media posts eye-opening for the light they shine on our campus and student experiences. In library ethnography, even a small study can be insightful,17 and these posts gave us one unobtrusive view into students’ digital lives.

The Floor You Study On, screenshot by author Celia Emmelhainz, CC-BY

In talking with others, though, we also found that comments on social media can be used to directly improve our services. At the University of Oregon Libraries, Lesli Larson (Director of Communications and Marketing) used Yik Yak to actually change library outreach, reporting:

“I do monitor Yik Yak to find out what undergrads are saying in raw/unvarnished terms. The key takeaway from Yik Yak for me is how student chatter reinforces wayfinding findings about the library. Students prize quiet study spaces, compete for study spots during finals, and proudly identify themselves as frequent or first time library users.”

This student chatter helped Lesli keep displays relevant, and sparked her “program to create a display system for our reservable study rooms.” As she wrote to us, “I definitely invoked Yak feedback about the scramble for study spaces” as she planned library improvements. (Her strategy here was to frame improvements as pilot projects or ongoing iterations of existing projects, thus avoiding the need for extensive bureaucratic review before implementation.)

While specific social media services rise and fall, Lesli’s experience suggests that attending to these apps helps us see another angle of campus life. However, we have two caveats: first, that reading a hundred posts often resulted in only one about the library (and in Celia’s case at Colby College, checking in daily for a semester netted only 100 comments on the library). This suggests social media may be more useful for getting a sense of trends and conversations on campus as a whole than for learning about the library specifically.

In addition, we recommend that librarians practice self-care when viewing negative comments, either about campus in general or about the library and vulnerable student populations in particular.

Because Yik Yak is an “offstage” space,18 we warn you to take it with a grain of salt. Students post cynical comments to win the upvotes of their peers, and not all students are equally represented. We believe it can be a useful counterweight to the positive feedback we receive on library surveys, but it certainly isn’t the whole picture. When students write with an edge, it often does not reflect the deep respect they have for their peers, their campus, and the libraries and librarians who help them in their daily life.

Further, we suggest librarians avoid directly advertising on anonymous social media. Faculty do intervene, as when Colgate University faculty popped up on Yik Yak to encourage students during finals,19 but one-time posts by popular faculty are the exception to the rule. Even Lesli, who used Yik Yak for “stealth communication” and promoting free coffee at finals, chose to communicate as if student-to-student, rather than as an institutional advertiser.

Screenshot by Celia Emmelhainz, CC-BY

Instead, we recommend you let social media apps spark insight into how students relate to each other, and how they communicate wryly about the library among their peers. In ephemeral online spaces like Yik Yak, we learn best by quietly observing as ethnographers and listening in to learn how patrons relate to their library, their campus, and their world.

Acknowledgments: The authors would like to thank Erin Pappas (external peer reviewer) and Annie Pho (internal peer reviewer) for reviewing drafts of this article. We really appreciate their constructive comments. We would also like to thank our Publishing Editor, Amy Koester, for her guidance in this process. Finally, we would like to thank all the members of ANSS (the Anthropology and Sociology Section of ACRL) for their collegiality and for the opportunity to develop as anthropology librarians among a great cohort of peers.


Below is a list of colleges we reviewed, in alphabetical order. Although we did not sample systematically from a list of known colleges, we sought diversity across urban and rural environments, selective liberal arts colleges, major research universities, and smaller campus environments. We sampled for geographic diversity across the United States as well as, in a more limited fashion, Canada and the UK. (We checked several university locations in France but did not find Yik Yak in operation there.) This list is not comprehensive: when we did not find any library-related posts at a college, we did not keep a record of the search.

Bates College

Boston College

BYU Hawai’i

Cambridge University (UK)

Carleton College

Carroll College

Charleston College

Colby College

Duke University

Endicott College

Georgetown University

Grinnell College

Harvard University

Lewis and Clark College

Macalester College

Miami University (Ohio)

New York University

Oberlin College

Ohio Dominican University

Ohio State University

Oxford University (UK)

Pomona College

Sarah Lawrence College

Stanford University

Texas A&M University

University College London (UK)

University of Alabama

University of British Columbia (Canada)

University of Calgary (Canada)

University of California Berkeley

University of California Irvine

University of Chicago

University of Hawai’i at Hilo

University of Michigan

University of Montana

University of New Mexico

University of North Carolina–Chapel Hill

University of Oregon

University of Pittsburgh

University of Washington

  1. Celia Emmelhainz, Interviews, focus groups, and social media: lessons from collaborative library ethnographies in America and Kazakhstan, conference paper for IFLA 2016,
  2. Nancy Fried Foster and Susan Gibbons, eds., Studying Students: The Undergraduate Research Project at the University of Rochester (Chicago: Association of College and Research Libraries, 2007), and Nancy Fried Foster, ed. Studying Students: a Second Look (Chicago: Association of College and Research Libraries, 2013).
  3. R. Stuart Geiger and David Ribes, “Trace ethnography: following coordination through documentary practices,” 44th Hawaii International Conference on System Sciences (HICSS)/IEEE (2011): 1-10 (accessed March 20, 2014).
  4. Leah Emary, “The why, what and how of using ethnography for designing user experience in libraries (and a few pitfalls to avoid),” in User Experience in Libraries: Applying Ethnography and Human-Centred Design, ed. Andy Priestner and Matt Borg (Routledge, 2016, p. 68-93).
  5. Andy Burkhardt, “Social media: a guide for college and university libraries,” C&RL 71, no. 1 (2010): 10-24; Elizabeth Price and Rebecca Richardson, “Eavesdropping on the user experience,” C&RL News, 78, no. 1 (2017): 16-10, (accessed February 9, 2017).
  6. Jillian D’Onofro, “What teens say about facebook, instagram, and snapchat,” Business Insider (2013), (accessed February 7, 2015).
  7. Andrew Watts, “A Teenager’s View on Social Media,” Medium (2015), (accessed February 2, 2015).
  8. Will Oremus, “Is Snapchat Really Confusing, or Am I Just Old?” Slate (2015), (accessed February 7, 2015).
  9. Danielle Salomon, “Moving on from Facebook: Using Instagram to connect with undergraduates and engage in teaching and learning,” C&RL News 74, no. 8 (2013): 408-412,
  10. Emma Hinchliffe, “It’s the End of the Line for Yik Yak,” Mashable, 29 April 2017, accessed 30 May, 2017.
  11. Amy Carlson, “Ignite Session: Outreach, Yik Yak, and Reality TV,” The Scoop, American Libraries, (2015), (accessed 20 February, 2016).
  12. Evelyn M. Rusli, “Yik Yak, Big in Schools, is a Hit With Investors Too,” Wall Street Journal (2014), (accessed February 8, 2015).
  13. However, as participation is based within a one-mile radius, some distance-education students may be limited to observing without the ability to directly participate, hinting at the unequal access to campus resources and dialogue that they face.
  14. For a further critique of the ethics of online ethnography, see Lisa Sugiura, Rosemary Wiles, and Catherine Pope, “Ethical Challenges in Online Research: Public/Private Perception,” Research Ethics (2016), 1-16.
  15. See our addendum for a list of colleges and more on our sampling methodology.
  16. Scott Jaschik, “Rally at Colby Against Racist Yik Yak Comments,” Inside Higher Ed, April 17, 2015, accessed 30 May 2015.
  17. Sims Kline, “The librarian as ethnographer: an interview with David Green,” C&RL News 74, no. 9 (2013): 488-491,
  18. Erving Goffman, The Presentation of Self in Everyday Life (New York: Doubleday Anchor Books, 1959).
  19. Kaitlin Mulhere, “Yik Yak Take Back: professors turn Yik Yak into happy space,” Inside Higher Ed (2014), (accessed 15 February 2015).

Ed Summers: Assemblages of Appraisal

planet code4lib - Wed, 2017-07-12 04:00

I had the opportunity to put together a poster for AERI this year. The poster presents a paper that I recently gave at CSCW (Summers & Punzalan, 2017). Creating it was a surprisingly useful process of distilling the paper to its essentials while re-presenting it visually. It occurred to me that the poster session audience and the typical web audience have something in common: limited attention. So I reworked the poster content here as a blog post to try to make my research findings a bit more accessible.

Even after over 20 years of active web archiving we know surprisingly little about how archivists appraise and select web content for preservation. Since we can’t keep it all, how we decide what to keep from the web is certain to shape the historical record (Cook, 2011). In this context, we ask the following research questions:

  1. How are archivists deciding what to collect from the web?

  2. How do technologies for web archiving figure in their appraisal decisions?

  3. Are there opportunities to design more useful systems for the appraisal of content for web archives?


To answer these questions, I conducted a series of ethnographic interviews with 29 individuals involved in the selection of web content. Participants included web archivists as well as researchers, managers, local government employees, volunteers, social activists, and entrepreneurs. The field notes from these interviews were analyzed using inductive thematic analysis.

Analysis began with reading all the field notes together, followed by line-by-line coding. While coding was done without reference to an explicit theoretical framework, it was guided by an interest in understanding archival appraisal as a sociotechnical and algorithmic system (Botticelli, 2000; Kitchin, 2016).


Coding and analysis surfaced six interconnected and interdependent themes that fell into two categories, the social and the technical, which are illustrated here in green and yellow respectively.

Appraisal in the context of web archiving is a complex interplay between the following:

Crawl Modalities: The selection strategies designed into tools and chosen by archivists in their work: domains, websites, documents, topics, and events.

Information Structures: Specific formations of web content that archivists interacted with during appraisal: hierarchies, networks, streams, and lists.

Tools: Configurations of tools that were used: archiving services, storage, spreadsheets, email, social media, content management systems.

People: Field archivists, managers, technicians, journalists, volunteers, software developers, groups (activists, professional), and institutions.

Time: How long to collect, how often to collect, how quickly web content needed to be gathered, perceptions of change in content.

Money: Grants from foundations and agencies to support collection activities, staffing, subscription fees, relationship between money and storage.
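As a toy illustration of the bookkeeping behind this kind of analysis (the coded excerpts below are invented, and this is not the study’s actual tooling), line-by-line codes from field notes can be tallied to see which themes recur:

```python
# Toy sketch only: invented field-note excerpts, each tagged with one of
# the six themes, tallied to see which themes recur. Real inductive
# coding is iterative and interpretive; this shows only the counting step.
from collections import Counter

coded_lines = [
    ("uses a spreadsheet to track seed URLs", "tools"),
    ("crawl scoped to the whole city domain", "crawl modalities"),
    ("grant deadline limits how often they crawl", "money"),
    ("recrawls the site weekly to catch changes", "time"),
    ("emails a colleague to confirm a selection", "people"),
    ("site is a hierarchy of department pages", "information structures"),
    ("subscription fee rises with storage used", "money"),
]

theme_counts = Counter(theme for _, theme in coded_lines)
for theme, n in theme_counts.most_common():
    print(f"{theme}: {n}")
```

A tally like this only surfaces candidate themes; the interpretive work of relating them (and spotting breakdowns between them) remains manual.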


The findings highlighted sites of breakdown that are illustrated by the red lines in the thematic diagram. These breakdowns are examples of infrastructural inversion (Bowker & Star, 2000), or sites where the infrastructure of web archiving became legible.

Breakdowns between People and Tools were seen in the use of external applications such as email, spreadsheets and forms to provide missing communication features for documenting provenance and appraisal decisions.

Breakdowns between Crawl Modalities, Information Structures and Tools were also evident when archivists improvised communication tools to coordinate selection decisions when geopolitical boundaries complicated collection policies.

Breakdowns in Money, Crawl Modalities and Information Structures occurred when archivists could not determine how much it would cost to archive a website, and attempted to estimate the size of websites.

Appraisal decisions depend on visualizations of the material archive.

While our chosen research methodology and findings do not suggest specific implications for design (Dourish & Bell, 2011), they do highlight rich sites for repair work as well as improvisational and participatory design (Jackson, 2014).


Thank you to Ricky Punzalan for much guidance during the planning and execution of the study. Leah Findlater and Jessica Vitak also helped in the selection of research methods. Nicholas Taylor, Jess Ogden and Samantha Abrams provided lots of useful feedback on early drafts, as well as pointers into the literature that were extremely helpful.

I also want to thank the Maryland Institute for Technology in the Humanities and the Documenting the Now project (funded by the Mellon Foundation) who provided generous support for this research. My most heartfelt thanks are reserved for the members of the web archiving community who shared their time, expertise and wisdom with me.

Noun Project images by Nirbhay, il Capitano, Creative Stall, Setyo Ari Wibowo, Agni, and Shuaib Usman Yusuf.


Botticelli, P. (2000). Records appraisal in network organizations. Archivaria, 1(49), 161–191.

Bowker, G. C., & Star, S. L. (2000). Sorting things out: Classification and its consequences. MIT Press.

Cook, T. (2011). We are what we keep; we keep what we are: Archival appraisal past, present and future. Journal of the Society of Archivists, 32(2), 173–189.

Dourish, P., & Bell, G. (2011). Divining a digital future: Mess and mythology in ubiquitous computing. MIT Press.

Jackson, S. J. (2014). Rethinking repair. In T. Gillespie, P. J. Boczkowski, & K. A. Foot (Eds.), Media technologies: Essays on communication, materiality, and society. MIT Press.

Kitchin, R. (2016). Thinking critically about and researching algorithms. Information, Communication & Society, 20(1), 1–16.

Summers, E., & Punzalan, R. (2017). Bots, seeds and people: Web archives as infrastructure. In Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing (pp. 821–834). New York, NY, USA: ACM.

District Dispatch: Ready to Code Faculty get ready to go at ALA Annual!

planet code4lib - Tue, 2017-07-11 20:31

During ALA’s recent Annual Conference in Chicago, the Libraries Ready to Code (RtC) Faculty Fellows and the Phase II project team met in person for the first time, in a full day-and-a-half workshop. The purpose: to get deep into defining computational thinking in a way that resonates with the library community, parsing out RtC concepts, and deconstructing faculty syllabi with these things in mind.

Ready to Code Phase II Faculty Fellows and project team met at Google Chicago during ALA’s Annual Conference in June. Photo credit: Emily Wagner

ALA’s Office for Information Technology Policy (OITP) launched Phase II of the RtC project along with our partner, Google, Inc., in January 2017. Phase II focuses on Library and Information Science graduate programs and consists of a faculty cohort of six RtC Fellows that will redesign one of their current technology/media courses based on RtC concepts (i.e., increasing access and exposure to CS, changing perceptions of who does CS, and connecting CS to youth interests or CS+X). Faculty will pilot the redesigned courses at their institutions this fall.

RtC Faculty Fellows are: Dr. Colette Drouillard, Valdosta State University (GA); Dr. Melissa Johnston, University of West Georgia; Dr. Rachel Magee, University of Illinois at Urbana-Champaign; Dr. Jennifer Moore, Texas Woman’s University; Dr. Joe Sanchez, City University of New York; and Dr. Natalie Taylor, University of South Florida. Phase II RtC project team members are: Marijke Visser (OITP), Linda Braun (Librarians & Educators Online), Mega Subramaniam (University of Maryland) and Caitlin Martin (Stanford University).

Ready to Code Phase II Faculty Fellows brainstormed content for LIS course syllabi with sticky notes and marker scribbles. Photo credit: Caitlin Martin

RtC Faculty Fellow Rachel Magee describes the workshop this way:

Attending the Ready to Code Workshop at this year’s ALA Annual Meeting was a unique opportunity to collaborate with other faculty who teach “pre-service librarians,” or students currently completing library school. Our group included professors from all over the country, and we were able to work together to develop our understandings of computational thinking and brainstorm ways to incorporate it into classes for students specifically focused on youth services.

Our classes range from in-person courses to online classes that meet both asynchronously and synchronously. We’re all focused on youth services broadly, but each class has its own flavor. My course is built on service learning and requires students to volunteer in an organization that serves or supports youth. Bringing Ready to Code concepts into this course will include in-depth discussions of ways these organizations engage with computational thinking, and give students the opportunity to plan these kinds of programs themselves.

At the end of the workshop, Fellows left not only with a framework and timeline for redesigning their syllabi, but also with a strong commitment to the importance of their work as RtC Fellows. Throughout the summer, the Fellows will continue to connect with their cohort colleagues as they fine-tune the approaches they will take in embedding RtC into their courses.

While the end of the fall semester seems like a ways off, Phase II will culminate in graduate level course models that equip MLIS students to deliver coding programs through public and school libraries that foster computational thinking skills among the nation’s youth. Well-trained MLIS graduates will enable libraries around the country to broaden and diversify access to computer science education. Faculty Fellows will share revised syllabi and course models with colleagues across the LIS community and serve as ambassadors to encourage other faculty to embed RtC concepts in curriculum.

Additional information is available on the Libraries Ready to Code website.


The post Ready to Code Faculty get ready to go at ALA Annual! appeared first on District Dispatch.

District Dispatch: Protecting public access to earth science information

planet code4lib - Tue, 2017-07-11 15:32

This week, ALA sent a letter calling for continued funding for the U.S. Geological Survey (USGS) Library and public access to its collections and services. 

USGS is a scientific agency within the Interior Department. The USGS Library bills itself as the “largest earth science library in the world.”

President Trump’s budget proposes to cut $137.8 million from USGS, a 13% reduction from the current year. Out of that total, $3 million would come from the USGS Library – reportedly, a 52% cut. Such a drastic and disproportionate cut would close at least three, and potentially all four, of the library’s locations, and would eliminate public access to its collections.

Implementing these proposed library cuts would be penny wise and pound foolish. Investments in the USGS Library provide at least a 7:1 return, according to a group of trade and science organizations. Industries ranging from energy to mining to insurance widely use the maps, publications, and other resources that the USGS Library provides.

The House Interior Appropriations subcommittee is scheduled to begin consideration of its funding bill for USGS this week, with the Senate expected to follow at a later date. Amidst the larger “Fight for Libraries!” campaign to preserve federal funding for libraries, we hope that Congress will also remember the value of the USGS Library to America’s economic and scientific enterprise.

The post Protecting public access to earth science information appeared first on District Dispatch.

Open Knowledge Foundation: Are Indian laws really ‘open’?

planet code4lib - Tue, 2017-07-11 14:07

Opening up India’s laws – the journey of Nyaaya, a non-profit legal tech organization in India

India seems to fare reasonably well on ‘open data’ and ‘open government’ indicators. However, its abysmal record with respect to enforcement of rights and its civil and criminal justice systems points towards an inability to use this information meaningfully and for the purpose of actual legal empowerment. Nyaaya, a new legal tech organization in India, is attempting to make an intervention by applying standard legislative markup tools for the first time in the country and by developing plain language explainers and guides.

Surveys conducted in other jurisdictions point towards a direct relationship between people’s awareness of the law and legal services, and their problem resolution strategies. For example, consider the case of a senior citizen who has been abandoned by her children and needs financial help in supporting herself. There is more than one law she can look at to address her problem in India – provisions in the criminal procedure code, religious laws of certain communities or a special law on the maintenance of senior citizens. Even to take the first step of approaching a lawyer, she needs to already know that she has a right to maintenance under various laws or be able to find this information easily. Assuming she has taken this first step, the senior citizen is very likely to be misinformed or not be presented with her full range of options, given the poor quality of legal representation one is likely to find in most Indian courts. Access to accurate and easy-to-comprehend legal information is a critical enabler in the realization of various constitutional and legal rights.

The Global Open Data Index 2016, published by Open Knowledge International, describes India’s national laws as 45% ‘open’. Yet, as we discuss in this post, the way in which these laws have been made available does not allow citizens to extract even simple legal information easily.

In its effort to publish laws digitally, the central government of India put together a portal known as IndiaCode in the mid-1990s. The design and search functions of this portal were rudimentary – laws are published without more recent amendments and in completely unreadable formats. For example, the image below indicates how the primary and most important criminal law of the country (i.e., the Indian Penal Code, 1860) continues to be published on the portal.

India has a quasi-federal structure – laws are issued by the central government and its various states. At the state level, there are only a few state governments or courts which have actually made complete sets of their laws available online. Even then, most states continue to publish laws in formats which are not machine readable. For example, the eastern state of Odisha publishes its laws in this format:

A third very important body of statutory law in the country is subordinate law, in the form of rules, regulations, circulars, notifications and guidelines. These are again published in an ad-hoc manner by both central and state governments. While the central government’s Gazette has made more recent rules and regulations available, these again take the form of PDFs, very often scanned PDFs.

These different resources suffer from numerous problems, including the fact that they are not searchable, updated to reflect amendments, hyperlinked or published in user-friendly formats. Consider one such problem – the lack of updated or consolidated laws makes the laws unreliable for laypersons, who are generally unaware that they must look at a law with all its amendments to understand the position of law as it stands today. The laws passed by the Central Government are among the most important laws regulating a person’s life in India – yet even these were unavailable in a consolidated version until 2016. A decision by the Chief Information Commission (the authority set up under the freedom of information law in India) in 2015 recognized the responsibility of the government to make consolidated versions of the law available. Following this decision, the Central Government has, as of date, made most laws between 1947 and 2016 available on IndiaCode.

However, the government has approached this exercise without giving much thought to best practices and without actually being interested in ‘opening’ laws in the way it has approached opening other forms of data: the consolidation has been done manually, at a slow pace, with the output taking the form of PDFs. Recognizing that this approach is not sustainable in the long run, the Delhi High Court has very recently directed the government to consider developing technical standards for the drafting and publication of laws.

The unavailability of laws passed by different levels of government, the inaccessible format of the available documents and the text of the law itself have essentially allowed the understanding of laws and legal information to remain the privilege of a few (lawyers and judges). Locking up laws in this manner contributes significantly to the alienation of the justice system and public services from the public.

To understand how we could use laws and legal information in bridging this gap, a small team of lawyers and technologists got together with the support of a visionary philanthropist in 2016 to form Nyaaya. It was started with the simple goal of being a single resource for national and state laws along with plain language explainers. Nyaaya is a digital resource that is trying to bridge the gap between the availability of laws and legal literacy by making laws machine readable and by providing simple legal information that answers everyday legal queries and explains complex legal processes.

It aims to be a single resource where one can connect the dots between different types of laws and use them fruitfully in understanding one’s rights under the law. Our constant endeavour is to use design principles to connect unstructured and disaggregated legal information in different forms such as Acts, Rules, Regulations and Notifications. While the linking of legal information is currently being done by an internal team, we are also building systems which will allow external volunteers and contributors to make the connections and create content.

A key component in this integrated platform will be presentation and tagging of annotations to individual legal provisions. These annotations can help not only serve the purpose of an explainer or commentary for the section, but also help in understanding the interplay between the primary law with subordinate legislation and state amendments to federal laws (a unique form of statutory instrument in India).

To make legal information consumable and empowering, the information must also be machine readable. Making laws machine readable can monumentally improve users’ searches for accurate legal information. Going back to the example, technology that is able to identify maintenance provisions across the gamut of Indian laws will deliver faster and more accurate results for the senior citizen who is using the internet as her first source of information. Nyaaya uses Indigo, a tool developed by OpenUp, a civic-tech organisation in South Africa, to capture and manage laws in Akoma Ntoso, a popular XML schema for describing legislative documents. The laws marked up in Akoma Ntoso which are published on Nyaaya are available on GitHub.
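As a rough illustration of what such markup enables, the sketch below parses a tiny, hypothetical Akoma Ntoso fragment (the section number, eId and heading are invented for illustration) using Python's standard library, with the OASIS LegalDocML namespace:

```python
import xml.etree.ElementTree as ET

# Hypothetical Akoma Ntoso fragment: a single section of an act.
AKN = "{http://docs.oasis-open.org/legaldocml/ns/akn/3.0}"
fragment = """<akomaNtoso xmlns="http://docs.oasis-open.org/legaldocml/ns/akn/3.0">
  <act>
    <body>
      <section eId="sec_125">
        <num>125.</num>
        <heading>Order for maintenance of wives, children and parents</heading>
      </section>
    </body>
  </act>
</akomaNtoso>"""

root = ET.fromstring(fragment)
# Collect (number, heading) pairs for every section in the document.
sections = [(s.find(AKN + "num").text, s.find(AKN + "heading").text)
            for s in root.iter(AKN + "section")]
```

Because every section, number and heading is an addressable element rather than a run of PDF text, tools can link annotations, amendments and subordinate legislation to precise provisions.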

Example from the Guide to Marriage, Divorce and Maintenance

The biggest challenge when it comes to making laws machine readable is the parsing and extraction of laws. The long legacy of private publishing has continued to influence the presentation of laws published online. Government bodies publish laws in the manner in which they are extracted – as scanned PDFs, which are highly unstructured data. Further, the traditional structure of marginal notes and footnotes (indicating amendments) makes it difficult for machines to identify the components of a law correctly and also makes publishing the law as it stood at different points in time (prior to and after an amendment) next to impossible.
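Once a law is available as plain text rather than a scanned PDF, even a simple pass can recover its structure. This is not Nyaaya's actual pipeline, just a minimal sketch assuming sections are numbered in the common "1.", "2A." style:

```python
import re

# Hypothetical section pattern: a number (optionally with a letter suffix,
# since inserted sections like "2A" carry one), a dot, then the heading.
SECTION_RE = re.compile(r"^(\d+[A-Z]?)\.\s+(.+)$", re.MULTILINE)

def find_sections(text):
    """Return (number, heading) pairs for each section-like line."""
    return [(m.group(1), m.group(2).strip()) for m in SECTION_RE.finditer(text)]

act_text = """1. Short title and extent.
2. Punishment of offences committed within India.
511. Punishment for attempting to commit offences."""

sections = find_sections(act_text)
```

Real Acts are far messier (marginal notes, footnoted amendments, OCR noise), which is exactly why scanned PDFs defeat this kind of extraction.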

In a country with 29 states and 7 union territories, there is an entire body of law that hasn’t even been published digitally, let alone in machine-readable and open formats. Open data enthusiasts in India have several things going their way – a government which is (at least superficially) enthusiastic about embracing technology, a robust freedom of information law and a growing open data movement. However, the opening up of laws is only a very recent focus for a limited section of civil society and government.

While the initial inspiration behind this initiative was to make laws accessible to laypersons, our experience this last year has helped us realize how critical this is to access to justice and public services. The journey of having to re-imagine laws and legal information has also helped us realize that the need of the hour is an India-specific legislative markup language. We’re smitten by the transformative potential that a combination of laws, design and technology offers, and hope to use the next year to make giant leaps in the manner in which simple and useful legal information is delivered to the general public (including through regional languages!).


Zotero: New Features for Chrome and Safari Connectors

planet code4lib - Mon, 2017-07-10 23:51

We are excited to announce major improvements to the Zotero Connectors for Chrome and Safari.


The Zotero Connector for Chrome now includes functionality that was previously available only in Zotero for Firefox.

Automatic Institutional Proxy Detection

Many institutions provide a way to access electronic resources while you are off-campus by signing in to a web-based proxy system. The Zotero Connector for Chrome makes this more convenient by automatically detecting your institutional proxy. Once you’ve accessed a site through the proxy, the connector will automatically redirect future requests to that site through the proxy.

Proxy detection does not require manual configuration. You can disable or customize it from the connector preferences.
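The connector's own detection logic is more involved, but the core rewriting such proxies perform can be sketched. The example below assumes a hypothetical proxy at proxy.example.edu that folds the original hostname into a subdomain by replacing dots with dashes (one common scheme; other proxies rewrite URLs differently):

```python
from urllib.parse import urlparse, urlunparse

def proxied(url, proxy_suffix="proxy.example.edu"):
    # Rewrite a URL so the request is routed through the institutional proxy:
    # dots in the original hostname become dashes, the proxy's own domain is
    # appended, and the scheme, path and query are left untouched.
    parts = urlparse(url)
    host = parts.hostname.replace(".", "-") + "." + proxy_suffix
    return urlunparse(parts._replace(netloc=host))
```

For example, `proxied("https://www.example.com/article/42")` yields a hostname of the form `www-example-com.proxy.example.edu`, which is the shape of redirect the connector performs for you automatically.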

Style Installation and Automatic File Importing

Zotero for Firefox has long included support for installing citation styles and importing reference manager files such as BibTeX or RIS when opened in Firefox. This functionality is now supported by the Zotero Connector for Chrome.

Clicking on a link to a CSL file will display a prompt to install the style into Zotero. Clicking on a BibTeX or RIS file that is served correctly by the website will prompt to import its data into Zotero. If you choose ‘Cancel’, you can download the file normally.

Note that this feature requires Zotero 5.0.
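Part of why connectors can offer this is that RIS is a simple line-oriented tagged format. The following is not the connector's code, just a minimal sketch of how such a record can be read:

```python
def parse_ris(text):
    # Each RIS line is "TG  - value": a two-character tag, two spaces,
    # a dash, a space, then the value. Repeated tags (e.g. AU) accumulate.
    record = {}
    for line in text.strip().splitlines():
        if len(line) >= 6 and line[2:6] == "  - ":
            tag, value = line[:2], line[6:].strip()
            record.setdefault(tag, []).append(value)
    return record

sample = """TY  - JOUR
TI  - An Example Article
AU  - Doe, Jane
AU  - Roe, Richard
ER  - """

record = parse_ris(sample)
```

A real importer also maps tags like TY and AU onto item types and creator fields, but the per-line structure is this simple.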


If you have the latest version of Chrome installed, you should be automatically updated to version 5.0.5 or later of the Zotero Connector. If you’re not currently using Zotero with Chrome, you can install the extension from the downloads page. Make sure you’ve installed Zotero 5.0 as well for the best experience.


We’ve brought several features to Safari that were previously available only in Chrome and Firefox.

Streamlined Saving

Previously, on pages where Zotero could not detect any high-quality data, the Zotero save button was inactive, and you could save a basic webpage item and snapshot by right-clicking on the page and selecting “Save Page to Zotero”. In the latest version of the Zotero Connector for Safari, it’s possible to save any page to Zotero with the click of a button.

When high-quality data is available, the button will show the same icon as before: book, newspaper, etc. On all other pages, you’ll see an empty page icon, and clicking on it will create a basic webpage item and snapshot in Zotero. Hovering over the icon will tell you which translator, if any, Zotero would use to save the page.

One-click PDF saving

You can also now save PDFs to Zotero with a single click.

Previously, saving a PDF to Zotero from Safari required dragging from the address bar into Zotero or saving the PDF to disk and adding it to Zotero manually. Now, when you’re viewing a PDF in Safari, simply click the toolbar button to save the PDF as a top-level attachment in Zotero, after which you can use “Retrieve Metadata from PDF” or “Create Parent Item” to create a full bibliographic item. Note that, when possible, it’s still preferable to save from a webpage with full metadata.

Secondary Translators

On some webpages, Zotero can save data using multiple translators, but up until now it hasn’t been possible to choose a secondary translator from Safari. You can now long-press the Zotero button to see additional options for saving from the current page, including saving as a regular webpage instead of using a translator.

Secondary translators may provide different data for the page itself or data for other sources referenced in the page (for example, cited references on Wikipedia, as in the example above).


You can get the new version of the Safari extension from the downloads page or by updating to 5.0.0 or later from the Extensions pane of the Safari preferences. Make sure you’ve installed Zotero 5.0 as well for the best experience.

Zotero: A Unified Zotero Experience

planet code4lib - Mon, 2017-07-10 23:50

Since the introduction of Zotero Standalone in 2011, Zotero users have had two versions to choose from: the original Firefox extension, Zotero for Firefox, which provides deep integration into the Firefox user interface, and Zotero Standalone, which runs as a separate program and can be used with any browser.

Starting with the release of Zotero 5.0, Zotero for Firefox and Zotero Standalone have been replaced by a single standalone application for users of all browsers. A new Zotero Connector for Firefox, similar to the extensions available for Chrome and Safari, allows saving to the Zotero application, or to your zotero.org library, in a single click.

If you’re using Zotero for Firefox, you’ll be automatically upgraded to the new Zotero Connector for Firefox in the next few weeks, and you’ll need to install the standalone Zotero application — either 4.0 or 5.0 — to access your data going forward. The Zotero application offers the same interface and runs off the same database as Zotero for Firefox, so you’ll be able to pick up right where you left off.

If you’re already using Zotero Standalone, you can continue using Zotero as you were before — with some new features available in the Zotero Connector for your browser.

We know that many people rely on Zotero for Firefox, so we wanted to take a moment to explain why this is happening and what it means for the future of Zotero.

The primary reason for this change is technical, and, unfortunately, out of our control: Mozilla is discontinuing the powerful extension framework on which Zotero for Firefox is based in favor of WebExtensions, a new framework based on the Chrome extension model. WebExtensions offer many advantages, including improved browser performance, improved security, a fine-grained permissions model, and the ability to create a single extension that runs in both Chrome and Firefox (as we’ve done with the Zotero Connector). The trade-off is that it’s no longer possible to create an extension like Zotero for Firefox that affects the browser in more profound ways (adding an entirely new pane or tab, creating native-looking windows, adding options to the open/save dialog, etc.) or that uses low-level features of Firefox to manage a database, access the filesystem, or run local programs.

But while we’re not able to continue offering the full version of Zotero for Firefox, we think that this change will ultimately benefit the Zotero ecosystem going forward. The Zotero interface will no longer need to fit into a small browser pane, allowing for a much richer user experience. Offering a single version will mean that documentation and instruction can be greatly simplified. And most importantly, Zotero developers will be able to spend less time maintaining separate versions and responding to Firefox changes and more time improving Zotero for everyone.

In the lead-up to Zotero 5.0, we’ve worked hard to add features that were previously available only in Zotero for Firefox, such as institutional proxy support, to the existing Chrome and Safari connectors and the new Zotero Connector for Firefox. Those changes are documented in a separate post. As Mozilla and Chrome add additional capabilities to their extension frameworks, we’ll continue to add new features to the connectors.

When we launched Zotero a little over 10 years ago, the Firefox extension framework allowed us to create a new type of research tool that lived where people worked, in the browser itself. Today, we think we can offer the best of both worlds — unparalleled extensions for every browser, all connecting to a powerful standalone app. While it’s tough to say goodbye to the original version of Zotero, we couldn’t be more excited about Zotero’s future, beginning with Zotero 5.0. Here’s to the next 10 years.

Zotero: Zotero 5.0

planet code4lib - Mon, 2017-07-10 23:50

We’re delighted to announce the release of Zotero 5.0, the next major version of Zotero and the biggest upgrade in Zotero’s history. Zotero 5.0 brings many new features, as well as a huge number of changes under the hood to improve Zotero’s responsiveness and stability and lay the groundwork for other big developments coming soon. We’ll be highlighting some of the new features in upcoming posts, but for now see the changelog for more details on all that’s new.

Download Zotero 5.0 now to get started with the new version.

If you’re already using Zotero Standalone 4.0, you’ll be offered the choice to upgrade within the next few weeks, or you can update now via Help -> Check for Updates. (Windows users may wish to reinstall from the download page instead to rename the program from “Zotero Standalone” to “Zotero”.) Your database will be automatically upgraded to work with the new version.

If you’re using Zotero 4.0 for Firefox, be aware that Zotero 5.0 now runs only as a standalone application, and a new Zotero Connector for Firefox replaces the full Firefox extension. We’ve written a separate post explaining this change. Existing Zotero for Firefox users will be automatically upgraded to the Zotero Connector for Firefox within the next few weeks and will need to install Zotero 5.0 to continue accessing their Zotero data locally. If you install Zotero 5.0 now, be sure to install the Zotero Connector for Firefox from the download page as well.

Thanks to everyone from the Zotero community who has helped test Zotero 5.0 over the last year and get it ready for today. We’re excited to finally share it with the world!

District Dispatch: First key vote on FY18 library funding set for July 13

planet code4lib - Mon, 2017-07-10 23:06

Earlier today, the House Appropriations Subcommittee that oversees library funding announced that it will meet this Thursday afternoon, July 13 at 4:30 p.m., to consider a large spending bill that will save, trim or totally wipe out funding for the Institute of Museum and Library Services (IMLS), the Library Services and Technology Act (LSTA) and Innovative Approaches to Literacy (IAL). That means that if you happen to live in one of the 13 congressional districts (in 12 states) represented by a member of this powerful and pivotal Subcommittee, you also have incredible power for the next 48 hours to help determine what happens to IMLS, LSTA and IAL funding for FY 2018.

This Thursday will be the first key vote in the House of Representatives related to federal library funding in FY2018. Photo credit: CT Health Notes blog

Please, use that power now. To find out if you’re a key constituent, enter your zip code at ALA’s Legislative Action Center. If you are, the Center is set to help you quickly send an email to or Tweet at your Representative. (If you’re not in one of these 13 congressional districts, please don’t be discouraged; we will definitely need you later!)

The message is simple: “Save IMLS. Give LSTA $186.6 million and IAL $27 million in FY18. Thank you!”

That’s it. There will be more votes in other committees later, but this first one could set the stage for all others. We have until Thursday afternoon to convince 13 key members of Congress to do the right thing. Now’s really the time to fight for libraries!

The post First key vote on FY18 library funding set for July 13 appeared first on District Dispatch.

Islandora: Islandora CLAW: Check out the MVP Sandbox!

planet code4lib - Mon, 2017-07-10 16:38

At Open Repositories 2017 in Brisbane, Australia, the Islandora project unveiled a brand new sandbox site where you can take Islandora CLAW for a test drive, with a few test collections and theming set up by our awesome Islandora UI Interest Group. Check it out and log in with:

  • User: sandbox
  • Pass: Cl@wS@ndb0x

This sandbox is, like Islandora CLAW itself, still something of a work in progress, so you can expect to see new functions and configurations appear over the next few weeks, and a lot of changes to come as we work with the community to determine development priorities post-MVP (for more background on that process, check out the CLAW FAQ). But for now, please come in and kick the tires and get a feel for Drupal 8 with Fedora 4. Please let us know about bugs, improvements, and new features you'd like to see by filing an issue in GitHub (not sure how? Check out the documentation for instructions).

Some differences you'll notice from the old Islandora:

  • "Add Content" - just like you're making a Drupal page. There is no special menu to add new Islandora objects in CLAW. Add them from any part of the site and indicate the parent, instead of having to navigate to the collection you want the object created in.
  • Your objects are nodes. Don't like the standard collection view? Go ahead and use Drupal Views to build whatever you like.
  • No XML Form Builder. Want to edit your metadata forms? You can use the regular Drupal form builder for each content type to add and edit fields, manage displays, and even map fields to RDF.

A few known issues that aren't quite as we'd like them yet:

  • Derivative creation is still being developed, so you have to add your own thumbnails and medium-sized files.
  • You can only use JPEGs for the MEDIUM_SIZE datastream. This is a quick fix, but it didn't quite make it into the sandbox, so expect to see this corrected in the near future.

We will periodically wipe and restore the sandbox, so don't put anything there that you want to keep. If you would like your own enduring copy of Islandora CLAW to play with, you can build your own virtual machine with the Islandora CLAW Vagrant.

You can see slides from the sessions where we launched the next phase of CLAW.

Eric Hellman: Creative Works *Ascend* into the Public Domain

planet code4lib - Mon, 2017-07-10 16:26

It's a Wonderful Life, the movie, became a public domain work in 1975 when its copyright registration was not renewed. It had been a disappointment at the box office, but became a perennial favorite in the 80s as television stations began to play it (and play it again, and again) at Christmas time, partly because it was inexpensive content. Alas, copyright for the story it was based on, The Greatest Gift by Philip Van Doren Stern, HAD been renewed, and the movie was thus a derivative work on which royalties could be collected. In 1993, the owners of the story began to cash in on the film's popularity by enforcing their copyright on the story.

I learned about the resurrection of Wonderful Life from a talk by Krista Cox, Director of Public Policy Initiatives for ARL (Association of Research Libraries) during June's ALA Annual Conference. But I was struck by the way she described the movie's entry into the public domain. She said that it "fell into the public domain". I'd heard that phrase used before, and maybe used it myself. But why "fall"? Is the public domain somehow lower than the purgatory of being forgotten but locked into the service of a copyright owner? I don't think so. I think that when a work enters the public domain, it's fitting to say that it "ascends" into the public domain.

If you're still fighting this image in your head, consider this example: what happens when a copyright owner releases a poem from the chains of intellectual property? Does the poem drop to the floor, like a jug of milk? Or does it float into the sky, seen by everyone far and wide, and so hard to recapture?

It is a sad quirk of the current copyright regime that the life cycle of a creative work is yoked to the death of its creator. That seems wrong to me. Wouldn't it be better to use the creator's birth date? We could then celebrate an author's birthday by giving their books the wings of an angel. Wouldn't that be a wonderful life?

Harvard Library Innovation Lab: LIL Talks: Comedy

planet code4lib - Mon, 2017-07-10 16:12

This is a guest post by our LIL interns — written by Zach Tan with help from Anna Bialas and Doyung Lee

This week, LIL's resident comic (and staff member) Brett Johnson taught a room full of LIL staff, interns, and fellows the finer intricacies of stand-up comedy, including the construction of a set, joke writing, and the challenges and high points of the craft.

As one example, Brett showed and broke down multiple jokes into the core structure of setup and punch line (or, platform and dismount) for analysis. We were also given an insight into an industry where the sheer amount of work, honing, and refining that goes into a set is often taken for granted.

We also explored what it meant to be a comic, and how the immediacy of audience reaction and enjoyment means that stand-up comedy is one of the only art forms with an extremely evident (and sometimes brutal) line between success and failure.

Though the talk was littered with choice jokes and funny bits, we definitely came away with a refreshing look into some aspects of stand-up comedy that rarely get noticed.


Subscribe to code4lib aggregator