
Feed aggregator

Terry Reese: MarcEdit 7 visual styles: High Contrast

planet code4lib - Fri, 2017-07-14 15:50

An interesting request made while reviewing the wireframes was whether MarcEdit 7 could support a kind of high-contrast, or “Dark”, theme mode. An example of this is Office:

Some people find this interface easier on the eyes, especially when working on a screen all day.

Since MarcEdit uses its own GUI engine to handle font sizing, scaling, and styling, this seemed like a pretty easy request, so I did some experimentation. Here’s MarcEdit 7 using the conventional UI:

And here it is under the “high contrast” theme:

Since theming falls into general accessibility options, I’ve put this in the language section of the options:

However, I should point out that in MarcEdit 7 I will be changing this layout to include a dedicated settings area for accessibility options, and this will likely move into that area.

I’m not sure this is an option I’d personally use, since a “Dark” or high-contrast theme isn’t my cup of tea. But with the new GUI engine added to MarcEdit 7 alongside the removal of XP support, this option really took only about 5 minutes to turn on.

Questions, comments?


D-Lib: RARD: The Related-Article Recommendation Dataset

planet code4lib - Fri, 2017-07-14 11:42
Article by Joeran Beel, Trinity College Dublin, Department of Computer Science, ADAPT Centre, Ireland; Zeljko Carevic and Johann Schaible, GESIS - Leibniz Institute for the Social Sciences, Germany; Gabor Neusch, Corvinus University of Budapest, Department of Information Systems, Hungary

D-Lib: Massive Newspaper Migration - Moving 22 Million Records from CONTENTdm to Solphal

planet code4lib - Fri, 2017-07-14 11:42
Article by Alan Witkowski, Anna Neatrour, Jeremy Myntti and Brian McBride, J. Willard Marriott Library, University of Utah

D-Lib: The Best Tool for the Job: Revisiting an Open Source Library Project

planet code4lib - Fri, 2017-07-14 11:42
Article by David J. Williams and Kushal Hada, Queens College Libraries, CUNY, Queens, New York

D-Lib: Ensuring and Improving Information Quality for Earth Science Data and Products

planet code4lib - Fri, 2017-07-14 11:42
Article by Hampapuram Ramapriyan, Science Systems and Applications, Inc. and NASA Goddard Space Flight Center; Ge Peng, Cooperative Institute for Climate and Satellites-North Carolina, North Carolina State University and NOAA's National Centers for Environmental Information; David Moroni, Jet Propulsion Laboratory, California Institute of Technology; Chung-Lin Shie, NASA Goddard Space Flight Center and University of Maryland, Baltimore County

D-Lib: Trends in Digital Preservation Capacity and Practice: Results from the 2nd Bi-annual National Digital Stewardship Alliance Storage Survey

planet code4lib - Fri, 2017-07-14 11:42
Article by Michelle Gallinger, Gallinger Consulting; Jefferson Bailey, Internet Archive; Karen Cariani, WGBH Media Library and Archives; Trevor Owens, Institute of Museum and Library Services; Micah Altman, MIT Libraries

D-Lib: The End of an Era

planet code4lib - Fri, 2017-07-14 11:42
Editorial by Laurence Lannom, CNRI

D-Lib: Explorations of a Very-large-screen Digital Library Interface

planet code4lib - Fri, 2017-07-14 11:42
Article by Alex Dolski, Independent Consultant; Cory Lampert and Kee Choi, University of Nevada, Las Vegas Libraries

District Dispatch: Library funding bill passes Labor HHS

planet code4lib - Thu, 2017-07-13 23:08

In response to today’s House subcommittee vote, ALA President Jim Neal sent ALA members the following update:

(AP Photo/Carolyn Kaster)


I am pleased to report that, this evening, the House Appropriations subcommittee that deals with library funding (Labor, Health & Human Services, Education and Related Agencies) voted to recommend level funding in FY2018 for the Institute of Museum and Library Services (IMLS, $231 million), likely including $183 million for the Library Services and Technology Act, as well as $27 million for the Innovative Approaches to Literacy program.

Four months ago, President Trump announced that he wanted to eliminate IMLS and federal funding for libraries. Since then, all of us have been communicating with our members of Congress about the value of libraries. This evening’s Subcommittee vote, one important step in the lengthy congressional appropriations process, shows that our elected officials are listening to us and recognize libraries’ importance in the communities they represent. We are grateful to the leaders of the Subcommittee, Chairman Tom Cole (R-OK-4) and Ranking Member Rosa DeLauro (D-CT-3), and all Subcommittee members, for their support.

We have not saved FY18 federal library funding yet. Hurdles can arise at each stage of the appropriations process, which will continue into the fall. But the fact that federal library funding was not cut at this particular stage shows what can be accomplished when ALA members work together. We expect the full House Appropriations Committee to vote on the subcommittee bills as early as next Wednesday, July 19. I will send an update as soon as we have the results of the full committee’s actions.

In the meantime, I encourage you to stay informed and stay involved. Libraries and the millions of people we serve are in a better position today because of your advocacy.

Thank you,

Jim Neal

The post Library funding bill passes Labor HHS appeared first on District Dispatch.

Jonathan Rochkind: on hooking into sufia/hyrax after file has been uploaded

planet code4lib - Thu, 2017-07-13 15:53


Our app (not yet publicly accessible) is still running on sufia 7.3, a digital repository framework based on Rails (also known, in other versions or depending on where you draw the lines, as hydra, samvera, and hyrax).

I needed to hook into the point after a file has been added to fedora, to do some post-processing there.

(Specifically, we are trying to run a riiif instance on another server without a shared file system, since shared file systems are expensive and/or tricky on AWS. So the riiif server needs to copy the original image asset down from fedora. Since our original images are uncompressed TIFFs averaging around 100MB, this is somewhat slow, and we want the riiif server to “pre-load” at least the originals, if not the derivatives it will create. So after a new image is uploaded, we want to ‘ping’ the riiif server with an info request. That causes it to download the original, so the original is already there waiting for conversion requests and at least that step is done. But riiif can’t pull down the file until it’s in fedora, so we need to wait until fedora has it before pinging. Phew.)
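The post doesn't show how the ‘ping’ is made. Here is a minimal sketch, assuming the riiif server exposes the standard IIIF Image API `info.json` endpoint. `RIIIF_BASE`, the host, and the identifier format are hypothetical placeholders, not values from the original setup. A failed ping is logged and treated as a missed warm-up rather than an error.

```ruby
require "net/http"
require "uri"
require "erb"

# Hypothetical riiif mount point; substitute your own server and path.
RIIIF_BASE = "https://riiif.example.org/image-service"

# IIIF Image API info.json URL for an identifier. The identifier must be
# URI-encoded, so ERB::Util.url_encode turns "/" into "%2F".
def info_url(base, identifier)
  URI("#{base.chomp('/')}/#{ERB::Util.url_encode(identifier)}/info.json")
end

# Request info.json so riiif fetches the original from fedora now, rather than
# on the first real image request. The generous read timeout allows for large
# originals; any failure is logged and reported as false.
def preload_original(identifier, base: RIIIF_BASE, read_timeout: 300)
  uri = info_url(base, identifier)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https",
                  open_timeout: 10, read_timeout: read_timeout) do |http|
    http.request(Net::HTTP::Get.new(uri)).is_a?(Net::HTTPSuccess)
  end
rescue StandardError => e
  warn "riiif preload failed for #{identifier}: #{e.message}"
  false
end

info_url(RIIIF_BASE, "ab/cd").to_s
# => "https://riiif.example.org/image-service/ab%2Fcd/info.json"
# preload_original("ab/cd") # would make a real HTTP request
```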

Here are the cooperating objects in Sufia 7.3 that lead to actual ingest in Fedora, as far as I can tell. Many thanks to @jcoyne for giving me some pointers on where to start looking to figure this out.

Keep in mind that I believe “actor” is just hydra/samvera’s name for a service object involved in handling ‘update a persisted thing’. Don’t confuse it with the concurrency notion of an ‘actor’; it’s just an ordinary, fairly simple ruby object (although it can, and often does, queue up an ActiveJob for further processing).
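To make the ‘actor’ idea concrete, here is a minimal plain-Ruby sketch of the pattern as described: a chain of small service objects, each doing one step and then handing off to the next. The class names and the `create(attributes)`/`next_actor` shape are illustrative assumptions, not the real Sufia or CurationConcerns API.

```ruby
# Minimal "actor stack" sketch: a chain of ordinary Ruby service objects.
# Class names and the create/next_actor shape are illustrative, not Sufia's API.
class BaseActor
  attr_reader :next_actor

  def initialize(next_actor = nil)
    @next_actor = next_actor
  end

  # Default behavior: nothing local, just hand off to the next actor (if any).
  def create(attributes)
    next_actor ? next_actor.create(attributes) : true
  end
end

class ApplyTitleActor < BaseActor
  def create(attributes)
    attributes[:log] << "title set to #{attributes[:title]}"
    super # continue down the chain
  end
end

class AttachFilesActor < BaseActor
  def create(attributes)
    # A real stack would typically enqueue a background job here instead.
    attributes[:log] << "queued attach job for #{attributes[:files].size} file(s)"
    super
  end
end

# Wire the chain so the first class in the list runs first.
def build_stack(actor_classes)
  actor_classes.reverse.inject(nil) { |inner, klass| klass.new(inner) }
end

attrs = { title: "Map of Utah", files: ["map.tif"], log: [] }
build_stack([ApplyTitleActor, AttachFilesActor]).create(attrs) # => true
attrs[:log] # => ["title set to Map of Utah", "queued attach job for 1 file(s)"]
```

Each class stays tiny, but understanding the behavior as a whole means reading the wiring in addition to every link in the chain.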

The sufia default actor stack at ActorFactory includes the Sufia::CreateWithFilesActor.


  • The AttachFilesToWork job does some stuff, but then calls out to CurationConcerns::Actors::FileSetActor#create_content (we are using curation_concerns 1.7.7 with sufia 7.3), at least if it was a direct file upload (which I think is what this means). If the file was a `CarrierWave::Storage::Fog::File` (I’m not totally sure in what circumstances it would be), it instead kicks off an ImportUrlJob. But we’ll ignore that for now; I think the FileSetActor is the one my code path is following.





  • We are using hydra-works 0.16.0. AddFileToFileSet, I believe, actually finishes things off synchronously without calling out to anything else related to ‘get this thing into fedora’, although honestly I don’t totally understand what the code does.
    • It does call out to Hydra::PCDM::AddTypeToFile, which is confusingly defined in a file called add_type.rb, not add_type_to_file.rb. (I’m curious how that doesn’t break things terribly, but didn’t look into it).


So, in summary, we have six cooperating objects involved in following the code path of “how does a file actually get added to fedora”. They span 3-4 different gems (sufia, curation_concerns, hydra-works, and maybe hydra-pcdm, although that one might not be relevant here). Some of the classes involved inherit from, mix in, or hold references to classes from other gems. The path involves at least two background jobs (sometimes more, in some paths?): a background job that queues up another background job (and maybe more).
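A job that queues another job complicates hooking in, because the end of the outer job does not mean the file is in fedora yet. The toy sketch below uses a hypothetical in-memory queue (not ActiveJob) to show that ordering. Only the job names come from the post; nothing else about the real jobs is modeled.

```ruby
# Hypothetical in-memory stand-in for a background queue (not ActiveJob itself):
# jobs run first-in, first-out, and a running job may enqueue further jobs.
class FakeQueue
  def initialize
    @jobs = []
  end

  def enqueue(name, &work)
    @jobs << [name, work]
  end

  # Run until empty, calling after_each with each job's name once it finishes.
  def drain(&after_each)
    until @jobs.empty?
      name, work = @jobs.shift
      work.call
      after_each&.call(name)
    end
  end
end

queue = FakeQueue.new
fedora = [] # stands in for "files fedora has received"

queue.enqueue("AttachFilesToWork") do
  # The outer job only queues the ingest, so it finishes before fedora has the file.
  queue.enqueue("IngestFileJob") { fedora << "image.tif" }
end

snapshots = {}
queue.drain { |name| snapshots[name] = fedora.dup }
snapshots["AttachFilesToWork"] # => []
snapshots["IngestFileJob"]     # => ["image.tif"]
```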

That’s just following the path involved in “get this uploaded file into fedora”. Some of those cooperating objects also call out to other cooperating objects (and maybe queue background jobs?) to do other things, involving a half-dozen-ish additional cooperating objects and maybe one or two more gem dependencies. But I didn’t trace those; this was enough!

I’m not certain how much this has changed in hyrax (1.0 or 2.0). At the very least there’d be one fewer gem dependency involved, since Sufia and CurationConcerns were combined into Hyrax. But I kind of ran out of steam for comparing and contrasting here, although it would be good to prepare for the future with whatever I do.

Oh yeah, what was I trying to do again?

Hook into the point “after the thing has been successfully ingested in fedora” and put some custom code there.

So… I guess… that would mean hooking into ::IngestFileJob (located in CurationConcerns) and doing something after it has completed. It might be nice to use ActiveJob’s after_perform callback for this. I actually hadn’t known about that callback and haven’t used it before. We’d need to get at least the file_set arg passed into it, which the docs say you can maybe get from the passed-in job.arguments? That’s a weird way to do things in ruby (why don’t ActiveJob instances keep their state as ordinary instance state? I dunno), but okay! Or, of course, we could just monkey-patch override-and-call-super on perform to get a hook.
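The override-and-call-super option could look roughly like this, using `Module#prepend`. `IngestFileJob` here is a hypothetical stand-in, because the real class’s `perform` signature may differ by version. The patch forwards arguments unchanged with a bare `super`, and it runs its callbacks only if the original `perform` returns without raising. With ActiveJob proper, the alternative is a class-level `after_perform` callback whose block receives the job, with the original arguments in `job.arguments`. This sketch sticks to plain Ruby so it runs without Rails.

```ruby
# Hypothetical stand-in for CurationConcerns' ::IngestFileJob; the real class's
# perform signature may differ, so the hook below just forwards whatever it gets.
class IngestFileJob
  def perform(file_set, filename)
    "#{filename} ingested for #{file_set}"
  end
end

# Local override-and-call-super hook. Callbacks run only if the original perform
# returned without raising, i.e. after the (stand-in) ingest has finished.
module AfterIngestHook
  def self.callbacks
    @callbacks ||= []
  end

  def perform(*args)
    result = super # bare super forwards the original arguments unchanged
    AfterIngestHook.callbacks.each { |callback| callback.call(args.first) }
    result
  end
end

# prepend puts the hook ahead of the class in method lookup, so `super` reaches the original.
IngestFileJob.prepend(AfterIngestHook)

pinged = []
AfterIngestHook.callbacks << ->(file_set) { pinged << file_set } # e.g. ping riiif here
IngestFileJob.new.perform("fs-1", "image.tif") # => "image.tif ingested for fs-1"
pinged # => ["fs-1"]
```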

Or we could maybe hook into Hydra::Works::AddFileToFileSet instead; I think that does the actual work. There are no callbacks there, so that’d just be monkey-patch-and-call-super on #call, I guess.
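Hooking `AddFileToFileSet` would work much the same way, with one twist. If `call` is a class method there (an assumption: check your hydra-works version, and note that the stand-in below is hypothetical), the module has to be prepended to the class’s singleton class rather than to the class itself.

```ruby
# Hypothetical stand-in for Hydra::Works::AddFileToFileSet, assuming `call` is a
# class method; the real gem's arguments may differ.
class AddFileToFileSet
  def self.call(file_set, file, type, update_existing: true)
    "#{type} #{file} -> #{file_set} (update_existing=#{update_existing})"
  end
end

module AfterAddFileHook
  # Records file sets the hook has seen; a real app would post-process here instead.
  def added_file_sets
    @added_file_sets ||= []
  end

  def call(*args, **opts, &block)
    result = super # forwards positional args, keywords, and block as received
    added_file_sets << args.first if result
    result
  end
end

# Class methods live on the singleton class, so prepend the hook there.
AddFileToFileSet.singleton_class.prepend(AfterAddFileHook)

AddFileToFileSet.call("fs-2", "image.tif", :original_file, update_existing: false)
# => "original_file image.tif -> fs-2 (update_existing=false)"
AddFileToFileSet.added_file_sets # => ["fs-2"]
```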

This definitely seems a little bit risky, for a couple different reasons.

  • There’s at least one place where a potentially different path is followed, if you’re uploading a file that ends up as a CarrierWave::Storage::Fog::File instead of a CarrierWave::SanitizedFile.  Maybe there are more I missed? So configuration or behavior changes in the app might cause my hook to be ignored, at least in some cases.


  • Forward-compatibility seems unreliable. Will this complicated graph of cooperating instances get refactored? Has it already been refactored in future versions of Hyrax? If it gets refactored, will the object I hook into no longer exist (not even under a different namespace/name), or exist but not be called in the same way? In some of those failure modes, it might be an entirely silent failure where no error is ever raised; the code I’m trying to insert just never gets called. Which is sad. (Sure, one could try to write a spec for this to warn you… think about how you’d do that. I still am.) Between IngestFileJob and AddFileToFileSet, is one ‘safer’ to hook into than the other? Hard to say. Doing some research in the hyrax master branch might give me some clues.
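Here is one partial answer to ‘how would you write a spec for this?’: a check, run at boot or in a test, that fails loudly in three cases:

- the patched class has been renamed or removed;
- the patched method has been renamed or removed;
- the hook is no longer first in method lookup.

`verify_hook!` and `HookMissing` are hypothetical names. The check cannot catch the subtler failure where the method still exists but the pipeline stops calling it. That needs an end-to-end test that runs an actual upload.

```ruby
# Raised when a local monkey-patch can no longer attach where we expect.
HookMissing = Class.new(StandardError)

# Hypothetical boot-time (or spec) check that a prepended hook is still wired up.
# It catches a renamed/removed class or method, or a hook that never got prepended;
# it cannot prove the pipeline still calls that method.
def verify_hook!(const_name, method_name, hook_module)
  unless Object.const_defined?(const_name)
    raise HookMissing, "#{const_name} no longer exists; #{hook_module} will never run"
  end

  target = Object.const_get(const_name)
  unless target.method_defined?(method_name)
    raise HookMissing, "#{const_name}##{method_name} no longer exists"
  end

  # With prepend, the first definition found in lookup should belong to the hook.
  owner = target.instance_method(method_name).owner
  unless owner == hook_module
    raise HookMissing, "#{const_name}##{method_name} resolves to #{owner}, not #{hook_module}"
  end

  true
end
```

For example, a spec might call `verify_hook!("IngestFileJob", :perform, LocalIngestHook)`, where `LocalIngestHook` stands for whatever module the app prepended.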

I guess I’ll probably still do one of these things, or find another way around it. (A colleague suggested there might be an entirely different place to hook into instead, not the ‘actor stack’, but maybe in other code around the controller’s update action).

What are the lessons?

I don’t mean to cast any aspersions on the people who put in a lot of work, very well-intentioned work, conscientious work, to get hydra/samvera/sufia/hyrax where it is, being used by lots of institutions. I don’t mean to say that I could or would have done differently if I had been there when this code was written — I don’t know that I could or would have.

And, unfortunately, I’m not saying I have much idea of what changes to make to this architecture now. There’s the present environment, there’s backwards compatibility, and there’s the fact that I’m still on code one or two major versions (and a name change) behind current development. (That makes the local benefit from any work I put into careful upstream PRs a lot more delayed, for a lot more work. I’m not alone here: there’s a lot of dispersion in which versions of these shared dependencies people are using, which adds a lot of cost to our shared development.) I really don’t! My brain is pretty tired after investigating what it’s already doing. Trying to make a shared architecture that is easily customizable like this is hard; there’s no way around it. (ActiveSupport::Callbacks try to do something pretty analogous to the ‘actor stack’, and are one of the most maligned parts of Rails.)

But I don’t think that should stop us from doing some evaluation. Going forward, building architecture that works well for us is aided immensely by understanding how things have worked out in what we’ve done before.

If the point of the “Actor stack” was to make it easy/easier to customize code in a safe/reliable way (meaning reasonably forward-compatible)–and I believe it was–I’m not sure it can be considered a success. We gotta start with acknowledging that.

Is it better than what it replaced? I’m not sure; I wasn’t there for what it replaced. It’s probably harder to modify in the shared codebase going forward than the presumably simpler thing it replaced, though… I can say I’d personally much rather have just one or two methods, or one ActiveJob, that I could hackily monkey-patch to do what I want. If that broke in a future version, it would break in a simple way, and it would take less time and brain to figure out what’s going on anyway. That wouldn’t be a great architecture, but I think I’d prefer it to what’s there now. Of course, it’s a pendulum, and the grass is always greener. If I had that, I’d probably be wanting something cleaner, and maybe arrive at something like the ‘actor stack’. But we’re all here now with what we’ve got, so we can at least consider that this may have gone in some unuseful directions.

What are those unuseful directions? I think, not just in the actor stack but in many parts of hydra, there’s an ethos that breaking things into many very small single-purpose classes/instances is the way to go, then wiring them all together, ideally with lots of dependency injection so you can switch them in and out. This reminds me of what people often satirize and demonize in stereotypical Java community architecture, and there’s a reason it’s satirized and demonized. It doesn’t… quite work out.

To pull this off well, you’ve got to have fairly stable APIs, which are also consistent, easily comprehensible, and semantically reasonable. This matters especially in a shared library/gem codebase. I think that has different considerations than a local bespoke codebase, mainly that API stability is more important, because you can’t just search-and-replace all consumers in one codebase when an API changes. With stable APIs, you can replace or modify one part and have some confidence that you know what it’s doing, when it will be called, and that it will keep doing this for at least a few months of future versions. To have fairly stable and comfortable APIs, you need to actually design them carefully and think about developer use cases. How are developers intended to intervene here to customize? And you’ve got to document those. Those use cases also give you something to evaluate later: did the design work for them?
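As a hypothetical illustration of what designing for developer use cases might mean here: a service that publishes a small, documented set of named hooks, so local apps register behavior instead of finding and monkey-patching internal classes. `FileIngestService` and its hook names are invented for this sketch; they are not part of hydra/samvera.

```ruby
# Hypothetical service with a small, documented extension API
# (not part of hydra/samvera).
class FileIngestService
  # Public contract: the only hook names local apps may register, and when they fire.
  #   before_ingest(file_set, file): before the file is stored
  #   after_ingest(file_set, file):  after the file has been stored
  HOOKS = %i[before_ingest after_ingest].freeze

  def initialize(storage)
    @storage = storage
    @hooks = Hash.new { |hash, name| hash[name] = [] }
  end

  # Register a block for a named hook; unknown names fail loudly, not silently.
  def on(hook, &block)
    unless HOOKS.include?(hook)
      raise ArgumentError, "unknown hook #{hook.inspect}; expected one of #{HOOKS.inspect}"
    end
    @hooks[hook] << block
    self
  end

  # Internals can change freely, as long as hooks keep firing at these points.
  def ingest(file_set, file)
    run(:before_ingest, file_set, file)
    @storage[file_set] = file
    run(:after_ingest, file_set, file)
    file_set
  end

  private

  def run(hook, *args)
    @hooks[hook].each { |callback| callback.call(*args) }
  end
end

# Usage: a local app registers its post-processing without patching anything.
storage = {}
pinged = []
service = FileIngestService.new(storage).on(:after_ingest) { |file_set, _file| pinged << file_set }
service.ingest("fs-9", "image.tif") # => "fs-9"
pinged # => ["fs-9"]
```

The design choice is that the documented hook names, not the internal class graph, form the contract. That contract is what gives local code a chance at forward-compatibility.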

It’s just not borne out by experience that if you make everything into as small single-purpose classes as possible and throw them all together, you’ll get an architecture which is infinitely customizable. You’ve got to think about the big picture. Simplicity matters, but simplicity of the architecture may be more important than simplicity of the individual classes. Simplicity of the API is definitely more important than simplicity of internal non-public implementation. 

When you’re not sure you’ve got a solid, stable, comfortable API, fewer cooperating classes with clearly defined interfaces may be preferable to more classes that each have only a few lines. In this regard, rubocop-based development may steer us wrong: too much attention to the micro, not enough to the forest.

To do this, you’ve got to be careful and intentional, think things through, consider developer use cases, and maybe go slower and support fewer use cases. Otherwise you wind up with an architecture that not only doesn’t easily support customization, but is very hard to change or improve. There are so many interrelated, coupled, cooperating parts that changing any of them requires changes to lots of them, and breaks lots of dependent code in local apps in the process. You can actually make forward-compatible code harder to write, not easier.

And this gets even worse when the cooperating objects in a data flow are spread across multiple gem dependencies, as they often are in the hydra/samvera stack. If a change in one requires a change in another, now you’ve got dependency compatibility nightmares to deal with too. That makes it even harder (rather than easier, as was the original goal) for existing users to upgrade to new versions of dependencies, and harder to maintain all these dependencies. Small dependencies that can work together is a nice idea, but again, it only works if they have very stable and comfortable APIs, which again requires care and consideration of developer use cases. (Just as the Java community gives us a familiar cautionary lesson about over-architecture, I think the Javascript community gives us a familiar cautionary lesson about ‘dependency hell’. The path to code hell is often paved with good intentions.)

The ‘actor stack’ is not the only place in hydra/samvera that suffers from some of these challenges, as I think most developers in the stack know.  It’s been suggested to me that one reason there’s been a lack of careful, considered, intentional architecture in the stack is because of pressure from institutions and managers to get things done, why are you spending so much time without new features?  (I know from personal experience this pressure, despite the best intentions, can be even stronger when working as a project-based contractor, and much of the stack was written by those in that circumstance).

If that’s true, that may be something that has to change. Either a change to those pressures — or resisting them by not doing rearchitectures under those conditions. If you don’t have time to do it carefully, it may be better not to commit the architectural change and new API at all.  Hack in what you need in your local app with monkey-patches or other local code instead. Counter-intuitively, this may not actually increase your maintenance burden or decrease your forward-compatibility!  Because the wrong architecture or the wrong abstractions can be much more costly than a simple hack, especially when put in a shared codebase. Once a few people have hacked it locally and seen how well it works for their use cases, you have a lot more evidence to abstract the right architecture from.

But it’s still hard!  Making a shared codebase that does powerful things, that works out of the box for basic use cases but is still customizable for common use cases, is hard. It’s not just us. I worked last year with spree/solidus, which has an analogous architectural position to hydra/samvera, also based on Rails, but in ecommerce instead of digital repositories. And it suffers from many of the same sorts of problems, even leading to the spree/solidus fork, where the solidus team thought they could do better… and they have… maybe… a little.  Heck, the challenges and setbacks of Rails itself can be considered similarly.

Taking account of this challenge may mean scaling back our aspirations a bit, and going slower. It may not be realistic to think you can be all things to all people. It may not be realistic to think you can make something that can be customized safely both by experienced developers and by non-developers just writing config files (that last one is a lot harder). Not every use case that a participant or would-be participant has can be officially or comfortably supported by the codebase. Use cases and goals have to be identified, and lines have to be drawn. That means there also has to be a decision-making process for who draws those lines and how they are drawn, re-drawn, and adjudicated. That could be a single “benevolent dictator” person or institution, as many open source projects have (for good or ill), or something else. (And it’s still hard to do that; it’s just that there’s no way around it.)

And finally, the touchiest evaluation of all for the hydra/samvera project. The hydra project is 5-7 years old, long enough to evaluate some basic premises. I’m talking about the twin, closely related requirements that have been more or less assumed by the community for most of the project’s history:

1) That the stack has to be based on fedora/fcrepo, and

2) that the stack has to be based on native RDF/linked data, or even coupled to RDF/linked data at all.

I believe these were uncontroversial assumptions rather than entirely conscious decisions (edit 13 July, this may not be accurate and is a controversial thing to suggest among some who were around then. See also @barmintor’s response.), but I think it’s time to look back and wonder how well they’ve served us, and I’m not sure it’s well.  A flexible powerful out-of-the-box-app shared codebase is hard no matter what, and the RDF/fedora assumptions/requirements have made it a lot harder, with a lot more uncharted territory to traverse, best practices to be invented with little experience to go on, more challenging abstractions, less mature/reliable/performant components to work with.

I think a lot of the challenges and breakdowns of the stack are attributable to those basic requirements — I’m again really not blaming a lack of skill or competence of the developers (and certainly not to lack of good intentions!). Looking at the ‘actor stack’ in particular, it would need to do much simpler things if it was an ordinary ActiveRecord app with paperclip (or better yet shrine), it would be able to lean harder on mature shared-upstream paperclip/shrine to do common file handling operations, it would have a lot less code in it, and less code is always easier to architect and improve than more code. And meanwhile, the actually realized business/institutional/user benefits of these commitments — now after several+ years of work put into it — are still unclear to me.  If this is true or becomes consensual, and an evaluation of the fedora/rdf commitments and foundation does not look kindly upon them… where does that leave us, with what options?


FOSS4Lib Upcoming Events: VuFind Summit 2017

planet code4lib - Thu, 2017-07-13 14:50
Date: Monday, October 9, 2017 - 08:00 to Tuesday, October 10, 2017 - 17:00
Supports: VuFind

Last updated July 13, 2017. Created by Peter Murray on July 13, 2017.

VuFind Summit 2017 conference page

Open Knowledge Foundation: FutureTDM symposium: sharing project findings, policy guidelines and practitioner recommendations

planet code4lib - Thu, 2017-07-13 14:16

The FutureTDM project, in which Open Knowledge International participates, actively engages with stakeholders in the EU such as researchers, developers, publishers and SMEs to help improve the uptake of text and data mining (TDM) in Europe. Last month, we held our FutureTDM Symposium at the International Data Science Conference 2017 in Salzburg, Austria. With the project drawing to a close, we shared the project findings and our first expert-driven policy recommendations and practitioner guidelines. This blog report has been adapted from the original version on the FutureTDM blog.

The FutureTDM track at the International Data Science Conference 2017 started with a speech by Bernhard Jäger from SYNYO, who gave a brief introduction to the project and explained the purpose of the Symposium: bringing together policy makers and stakeholder groups to share with them FutureTDM’s findings on how to increase TDM uptake.


This was followed by a keynote speech on the Economic Potential of Data Analytics by Jan Strycharz from Fundacja Projekt Polska, a FutureTDM project partner. It was estimated that automated (big) data and analytics, if developed properly, will add over 200 billion euros to European GDP by 2020. This means that algorithms (not to say robots) will then be responsible for 1.9% of European GDP. You can read more about TDM’s economic impact in our report Trend analysis, future applications and economics of TDM.

Dealing with the legal bumps

The plenary session with keynote speeches was followed by the panel: Data Analytics and the Legal Landscape: Intellectual Property and Data Protection. As an introduction to this legal session, Freyja van den Boom from Open Knowledge International presented our findings on the legal barriers to TDM uptake. These mainly relate to the type of content and the applicable regime (IP or Data Protection). Having gathered evidence from the TDM community, FutureTDM has identified three types of barriers (uncertainty, fragmentation, and restrictiveness) and developed guidelines recommending how to overcome them. We have summarised this in our awareness sheet Legal Barriers and Recommendations.

This was followed by statements from the panelists. Prodromos Tsiavos (Onassis Cultural Centre / IP Advisor) stressed that, with the recent changes in the European framework, the law faces significant issues and balancing industrial interests is becoming necessary. He added that a different approach is certainly needed to drive uptake by industry, because industry will continue with licensing arrangements.

Duncan Campbell (John Wiley & Sons, Inc.) concentrated on copyright and IP issues. How do we deal with all the knowledge created? How do copyright rules influence it? He spoke about the EU Commission proposal and the UK TDM exception: how do you make an exception work?

Marie Timmermann (Science Europe) also focused on the TDM exception and its positive and negative sides. On the positive side, she noted that the TDM exception has moved from being optional to mandatory, and that it cannot be overridden. On the negative side, she stated that the exception is very limited in scope: startups and SMEs do not fall under it. Thus, Europe risks losing promising researchers to other parts of the world.

This point was also supported by Romy Sigl (AustrianStartups). She confirmed that anybody can create a startup today, but if startups are not supported by legislation, they move to another country where they see more potential.

The right to read is the right to mine

The next panel was devoted to an overview of FutureTDM case studies: Startups to Multinationals. Freyja van den Boom (OKI) gave an overview of the highlights of the stakeholder consultations, which cover different areas and stakeholder groups within the TDM domain. Peter Murray-Rust (ContentMine) presented a researcher’s view. He stressed that the right to read is the right to mine, but we have no legal certainty about what a researcher is and is not allowed to do.

Petr Knoth from CORE added that he believed we need data infrastructure to support TDM. Data scientists are very busy cleaning data and have little time to do the real mining. He added that the infrastructure should not be operated by publishers, but that publishers should provide support.

Donat Agosti from PLAZI focused on how you can make data accessible so that everybody can use it. He mentioned the case of PLAZI’s repository, TreatmentBank, which is open, extracts each article, and creates citable data. Once you have the data, you can disseminate it.

Kim Nilsson from PIVIGO spoke about support for academics: they have already worked with 70 companies and provided TDM support for 400 PhD academics. She mentioned how important data analytics, and the ability to see all the connections and correlations, are for sectors such as medicine. She stressed that data analytics is also extremely important for startups, for whom gaining access is critical.

Data science is the new IT

The next panel was devoted to universities, TDM, and the need for strategic thinking on educating researchers. FutureTDM project officer Kiera McNeice (British Library) gave an overview of the skills and education barriers to TDM. She stressed that many people say they need quite a lot of knowledge to use TDM, and that there is a skills gap between academia and industry. Also, the barriers to entry are still high, because using TDM tools often requires programming knowledge.

We have put together a series of guidelines to help stakeholders overcome the barriers we have identified. Our policy guidelines include encouraging universities to support TDM through both their research and education arms. For example, universities can help senior management understand the needs of researchers around TDM and the potential benefits of supporting it. You can read more in our Baseline report of policies and barriers of TDM in Europe, or walk through them via our Knowledge Base.

Kim Nilsson from PIVIGO stressed that the main challenge is software skills. If you can do TDM, you have fantastic options: startups, healthcare, charity. Their task is to offer proper career advice, help people understand what kinds of skills are valued, and help them build on those skills.

Claire Sewell (Cambridge University Library) elaborated on skills from the perspective of an academic librarian. What is important is a basic understanding of copyright law, keeping up with technical skills, and data skills. “We want to make sure that if a researcher comes into the library we are able to help him,” she concluded.

Jonas Holm from Stockholm University Library highlighted the fact that very little strategic thinking is going on in the TDM area. “We have struggled to find much strategical thinking on TDM area. Who is strategically looking for improving the uptake at the universities? We couldn’t find much around Europe,” he said.

Stefan Kasberger (ContentMine) stressed that the social part of the education is also important – meaning inclusion and diversity.

Infrastructure for Technology Implementation

The last session was dedicated to technologies and infrastructures supporting Text and Data Analytics: challenges and solutions. FutureTDM Project Officer Maria Eskevich (Radboud University) delivered a presentation on the TDM landscape with respect to infrastructure for technical implementation.

Stelios Piperidis from OpenMinTed stressed the need for an infrastructure. “Following more on what we have discussed, it looks that TDM infrastructure has to respond to 3 key questions: How can I get hold on the data that I need? How can I find the tool to mine the data? How can I deploy the work carried out?”

Mihai Lupu from Data Market Austria brought up the issue of data formats: for example, there is a lot of data in CSV files that people don’t know how to deal with.

Maria Gavrilidou (clarin:el) highlighted the fact that formats are not the only problem. Identifying the source of data, and putting in place lawful procedures for handling it, are problems too. Metadata is also problematic because it very often does not exist.

Nelson Silva (know-centre) focused on using proper tools for mining the data. Very often no existing tool meets your needs, and you have to either develop one or search for open source tools. Another challenge is data quality: how much can you rely on the data, and how should you visualise it? And finally, how can you be sure that people get the right message?


The closing session was conducted by Kiera McNeice (British Library), who presented A Roadmap to promoting greater uptake of Data Analytics in Europe.  Finally, we also had a Demo Session with flash presentations by:

  • Stefan Kasberger (ContentMine)
  • Donat Agosti (PLAZI)
  • Petr Knoth (CORE)
  • John Thompson and Ralf Klinkenberg (Rapidminer)
  • Maria Gavrilidou (clarin:el)
  • Alessio Palmero Aprosio (ALCIDE)

You can find all FutureTDM reports in our Knowledge Library, or visit our Knowledge Base: a structured collection of resources on Text and Data Mining (TDM) that has been gathered throughout the FutureTDM project.


OCLC Dev Network: DEVCONNECT 2017: Dashboards and Artificial Intelligence in Libraries

planet code4lib - Thu, 2017-07-13 13:00

Wayne State University Libraries has been working on several projects that use OCLC products and APIs to better understand how library materials are used.

Terry Reese: MarcEdit 7: MARC Tools Wireframe

planet code4lib - Thu, 2017-07-13 06:03

The changes aren’t big – they are really designed to make the form a little more compact and add common topics to the screen. The big changes are related to integrations. In MarcEdit 6.x, when you run across an error, you have to open the validator, pick the correct validation option, etc. This won’t be the case any longer. When the tool determines that the problem may be related to the record structure, it will just offer you the option to check for errors in your file…no opening the validator, no picking options. This should make it easier to get immediate feedback on any structural processing errors that the tool may run up against.
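The post does not describe how MarcEdit decides that an error is structural, so the following is only a hypothetical sketch, not MarcEdit’s code. It shows the kind of check a tool could run on a binary MARC 21 record before offering to validate the file: leader positions 00-04 declare the record length, positions 12-16 give the base address of data, the directory must end with a field terminator (0x1E) just before that address, and the record must end with a record terminator (0x1D). The function name `structural_problems` is invented for illustration.

```python
FIELD_TERMINATOR = b"\x1e"
RECORD_TERMINATOR = b"\x1d"
LEADER_LENGTH = 24


def structural_problems(record: bytes) -> list:
    """Return human-readable structural problems found in one MARC 21 record.

    Hypothetical helper: it checks only the leader, the end of the directory
    and the record terminator, not the contents of individual fields.
    """
    if len(record) < LEADER_LENGTH:
        return ["record is shorter than the 24-byte leader"]

    problems = []
    leader = record[:LEADER_LENGTH]

    # Leader/00-04: logical record length, counted in bytes.
    declared_length = leader[0:5]
    if not declared_length.isdigit():
        problems.append("leader/00-04 (record length) is not numeric")
    elif int(declared_length) != len(record):
        problems.append(
            f"leader declares {int(declared_length)} bytes, record has {len(record)}"
        )

    # Leader/12-16: base address of data; the byte just before it must be
    # the field terminator that closes the directory.
    base_address = leader[12:17]
    if not base_address.isdigit():
        problems.append("leader/12-16 (base address of data) is not numeric")
    else:
        base = int(base_address)
        if base <= LEADER_LENGTH or record[base - 1:base] != FIELD_TERMINATOR:
            problems.append("directory does not end with a field terminator")

    if not record.endswith(RECORD_TERMINATOR):
        problems.append("record does not end with the record terminator (0x1D)")
    return problems


if __name__ == "__main__":
    # A minimal one-field record: leader + one directory entry + an 001 field.
    leader = b"00042nam a2200037 a 4500"
    directory = b"001" + b"0004" + b"00000"  # tag, field length, start position
    record = (leader + directory + FIELD_TERMINATOR
              + b"123" + FIELD_TERMINATOR + RECORD_TERMINATOR)
    print(structural_problems(record))       # prints []
    print(structural_problems(record[:-5]))  # length mismatch + missing terminator
```

A check like this is cheap enough to run automatically whenever a processing error occurs, which is the kind of integration the wireframe describes; full validation of field content would still be a separate step.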

MARC Tools Window Wireframe #1:

The second wireframe collapses the list into an autocomplete/autosuggest option, moves data around and demonstrates some of the potential integration options. I like this one as well, though I’m not sure whether a dropdown list with autocomplete would be more difficult to use than the current dropdown list. I also use this as an opportunity to get rid of the Input File and Output File labels. I’m not sure these are always necessary, and I honestly hate seeing them. But I know that iconography maybe isn’t the best way to express meaning. I think attaching tooltips to each button and textbox might allow me to finally let these labels go.

MARC Tools Wireframe #2:


Based on feedback, it sounds like the labels are still desired.  So here is wireframe #3 with a slight modification to allow for labels in the window.

MARC Tools Wireframe #3:


District Dispatch: Breaking #SaveIMLS news from ALA Pres. Jim Neal

planet code4lib - Wed, 2017-07-12 23:25

Today, ALA President Jim Neal sent this #SaveIMLS campaign update to the ALA membership regarding tomorrow’s vote on part of the House Appropriations Committee bill:


I’m pleased, but with important cautions, to tell you that all of our collective work to Fight for Libraries! is poised to pay off dramatically. Key parts of the House Appropriations Subcommittee bill that is scheduled to be voted on tomorrow afternoon at 4:30 EDT were released late this afternoon. The bill does NOT cut last year’s funding to the Institute of Museum and Library Services. Once final, that would mean no cuts to LSTA in this critical first vote stage! (We’ll know about Innovative Approaches to Literacy tomorrow.)

Now the cautions… While unlikely, an amendment could be offered to the bill that changes the IMLS appropriation. In addition, after tomorrow’s vote, there will be at least two further procedural opportunities for the bill to be amended. After the House acts, of course, the Senate will take its turn, though probably not for some months.

As the phrase goes, therefore, this is definitely NOT over until it’s over. We will report immediately on the results of tomorrow’s Subcommittee vote. I hope very much that the next thing for us to do will be to thank our House supporters.

Until early evening tomorrow in Washington, please join me in crossing your fingers.

Jim Neal
ALA President

Further updates will be sent out as we get them. In the meantime, if you are a constituent of a Representative working on the Labor HHS subcommittee, give them a call!

The post Breaking #SaveIMLS news from ALA Pres. Jim Neal appeared first on District Dispatch.

David Rosenthal: Is Decentralized Storage Sustainable?

planet code4lib - Wed, 2017-07-12 19:49
There are many reasons to dislike centralized storage services. They include business risk, as we see in le petit musée des projets Google abandonnés, monoculture vulnerability and rent extraction. There is thus naturally a lot of enthusiasm for decentralized storage systems, such as MaidSafe, DAT and IPFS. In 2013 I wrote about one of their advantages in Moving vs. Copying. Among the enthusiasts is Lambert Heller. Since I posted Blockchain as the Infrastructure for Science, Heller and I have been talking past each other. Heller is talking technology; I have some problems with the technology but they aren't that important. My main problem is an economic one that applies to decentralized storage irrespective of the details of the technology.

Below the fold is an attempt to clarify my argument. It is a re-statement of part of the argument in my 2014 post Economies of Scale in Peer-to-Peer Networks, specifically in the context of decentralized storage networks.

To make my argument I use a model of decentralized storage that abstracts away the details of the technology. The goal is a network with a large number of peers each providing storage services. This network is:
  • decentralized in the sense that no single entity, or small group of entities, controls the network (the peers are independently owned and operated), and
  • sustainable, in that the peers do not lose financially by providing storage services to the network.
I argue that this network is economically unstable and will, over time, become centralized. This argument is based on work from the 1980s by the economist W. Brian Arthur [1].

Let us start by supposing that such a decentralized storage network has, by magic, been created:
  • It consists of a large number of peers, initially all providing the same amount of storage resource to the network.
  • Users submit data to be stored to the network, not to individual peers. The network uses erasure coding to divide the data into shards and peers store shards.
  • Each peer incurs costs to supply this resource, in the form of hardware, bandwidth, power, cooling, space and staff time.
  • The network has no central organization which could contract with the peers to supply their resource. Instead, it rewards the peers in proportion to the resource they supply by a token, such as a crypto-currency, that the peers can convert into cash to cover their costs.
  • The users of the network rent space in the network by buying tokens for cash on an exchange, setting a market price at which peers can sell their tokens for cash. This market price sets the $/TB/month rent that users must pay, and that peers receive as income. It also ensures that users do not know which peers store their data.
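The model says only that the network “uses erasure coding to divide the data into shards” that peers store. A minimal sketch of the idea, assuming the simplest possible code (k data shards plus a single XOR parity shard, so the data survives the loss of any one shard), is below; the function names are hypothetical and not taken from MaidSafe, DAT or IPFS. Practical systems generally use stronger codes, such as Reed-Solomon, that tolerate several lost shards, but the principle is the same: no single peer holds a complete copy, and the data can be rebuilt from the shards that remain.

```python
from __future__ import annotations

from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> list[bytes]:
    """Split data into k equal shards (zero-padded) plus one XOR parity shard."""
    shard_len = max(1, -(-len(data) // k))  # ceiling division, at least 1 byte
    padded = data.ljust(shard_len * k, b"\x00")
    shards = [padded[i * shard_len:(i + 1) * shard_len] for i in range(k)]
    parity = reduce(xor_bytes, shards)
    return shards + [parity]


def decode(shards: list[bytes | None], original_length: int) -> bytes:
    """Rebuild the data when at most one of the k + 1 shards is missing (None)."""
    missing = [i for i, shard in enumerate(shards) if shard is None]
    if len(missing) > 1:
        raise ValueError("a single parity shard can only replace one lost shard")
    shards = list(shards)
    if missing:
        # XOR of all surviving shards (data and parity) equals the lost one.
        survivors = [shard for shard in shards if shard is not None]
        shards[missing[0]] = reduce(xor_bytes, survivors)
    # Drop the parity shard, join the data shards and strip the padding.
    return b"".join(shards[:-1])[:original_length]


if __name__ == "__main__":
    message = b"decentralized storage"
    shards = encode(message, k=4)  # 4 data shards + 1 parity, one per peer
    shards[2] = None               # one peer goes offline
    assert decode(shards, len(message)) == message
```

Because the network, not the user, decides which peers hold which shards, users pay a single market rent without knowing where their data lives, which is the property the model relies on.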
Although the income each peer receives per unit of storage is the same, as set by the market, their costs differ. One might be in Silicon Valley, where space, power and staff time are expensive. Another might be in China, where all these inputs are cheap. So providing resources to the network is more profitable in China than in Silicon Valley.

Suppose the demand for storage is increasing. That demand will preferentially be supplied from China, where the capital invested in adding capacity can earn a greater reward. Thus peers in China will add capacity faster than those in Silicon Valley and will enjoy not merely a lower cost base because of location, but also a lower cost base from economies of scale. This will increase the cost differential driving the peers to China, and create a feedback process.

Competition among the peers and decreasing hardware costs will drive down the $/TB/month rent to levels that are uneconomic for Silicon Valley peers, concentrating the storage resource in China (as we see with Bitcoin miners).

Let's assume that all the peers in China share the same low cost base. But some will have responded to the increase in demand before others. They will have better economies of scale than the laggards, so they will in turn grow at the laggards' expense. Growth may be by increasing the capacity of existing peers, or adding peers controlled by the entity with the economies of scale.

The result of this process is a network in which the aggregate storage resource is overwhelmingly controlled by a small number of entities, controlling large numbers of large peers in China. These are the ones which started with a cost base advantage and moved quickly to respond to demand. The network is no longer decentralized, and will suffer from the problems of centralized storage outlined above.
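The feedback loop described above can be made concrete with a toy simulation. The sketch below is a hypothetical illustration, not Rosenthal’s model or Arthur’s formal one: the cost bases, the 0.2 exponent standing in for economies of scale, the 10% per-round demand growth and the 5% per-round fall in rent are all assumptions chosen only to make the mechanism visible. Each round, every peer receives the same market rent, peers reinvest in proportion to their profit, and larger peers get cheaper per terabyte.

```python
def simulate_shares(base_costs, initial_capacity, rounds=15, price=10.0,
                    demand_growth=0.10, price_decline=0.95, scale_exponent=0.2):
    """Toy increasing-returns model of peers in a storage network.

    All parameters are illustrative assumptions. Returns each peer's final
    share of total network capacity.
    """
    capacity = list(initial_capacity)
    for _ in range(rounds):
        # Economies of scale: unit cost ($/TB/month) falls as a peer grows.
        unit_costs = [c * cap ** -scale_exponent
                      for c, cap in zip(base_costs, capacity)]
        # Every peer earns the same market rent; only margins differ.
        margins = [max(price - u, 0.0) for u in unit_costs]
        profits = [cap * m for cap, m in zip(capacity, margins)]
        total_profit = sum(profits)
        if total_profit == 0:
            break  # rent is below every peer's cost; nobody expands
        # Peers reinvest profits, so new capacity is added in proportion to profit.
        new_capacity = demand_growth * sum(capacity)
        capacity = [cap + new_capacity * p / total_profit
                    for cap, p in zip(capacity, profits)]
        # Competition and cheaper hardware push the rent down.
        price *= price_decline
    total = sum(capacity)
    return [cap / total for cap in capacity]


if __name__ == "__main__":
    # Two high-cost peers ("Silicon Valley") and three low-cost peers ("China");
    # the last low-cost peer responded to demand early and starts slightly larger.
    base_costs = [8.0, 8.0, 4.0, 4.0, 4.0]
    initial = [1.0, 1.0, 1.0, 1.0, 1.2]
    names = ["SV-1", "SV-2", "CN-1", "CN-2", "CN-early"]
    for name, share in zip(names, simulate_shares(base_costs, initial)):
        print(f"{name}: {share:.1%}")
```

In this toy model, any peer whose margin is above the capacity-weighted average gains share every round, so the high-cost peers shrink as a fraction of the network and the low-cost early mover ends up largest; the assumed parameters mostly affect how quickly that concentration happens.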

This should not be a surprise. We see the same winner-take-all behavior in most technology markets. We see this behavior in the Bitcoin network.

I believe it is up to the enthusiasts to explain why this model does not apply to their favorite decentralized storage technology, and thus why it won't become centralized. Or, alternatively, why they aren't worried that their decentralized storage network isn't actually decentralized after all.


  1. Arthur, W. Brian. Competing technologies and lock-in by historical small events: the dynamics of allocation under increasing returns. Center for Economic Policy Research, Stanford University, 1985. Also in Arthur, W. Brian. Increasing Returns and Path Dependence in the Economy, University of Michigan Press, 1994.

Jodi Schneider: QOTD: Working out scientific insights on paper, Lavoisier case study

planet code4lib - Wed, 2017-07-12 19:04

…language does do much of our thinking for us, even in the sciences, and rather than being an unfortunate contamination, its influence has been productive historically, helping individual thinkers generate concepts and theories that can then be put to the test. The case made here for the constitutive power of figures [of speech] per se supports the general point made by F.L. Holmes in a lecture addressed to the History of Science Society in 1987. A distinguished historian of medicine and chemistry, Holmes based his study of Antoine Lavoisier on the French chemist’s laboratory notebooks. He later examined drafts of Lavoisier’s published papers and discovered that Lavoisier wrote many versions of his papers and in the course of careful revisions gradually worked out the positions he eventually made public (Holmes, 221). Holmes, whose goal as a historian is to reconstruct the careful pathways and fine structure of scientific insights, concluded from his study of Lavoisier’s drafts

We cannot always tell whether a thought that led him to modify a passage, recast an argument, or develop an alternative interpretation occurred while he was still engaged in writing what he subsequently altered, or immediately afterward, or after some interval during which he occupied himself with something else; but the timing is, I believe, less significant than the fact that the new developments were consequences of the effort to express ideas and marshall supporting information on paper (225).

– page xi of Rhetorical Figures in Science by Jeanne Fahnestock, Oxford University Press, 1999.

She is quoting Frederic L. Holmes. 1987. Scientific writing and scientific discovery. Isis 78:220-235. DOI:10.1086/354391

As Moore summarizes,

Lavoisier wrote at least six drafts of the paper over a period of at least six months. However, his theory of respiration did not appear until the fifth draft. Clearly, Lavoisier’s writing helped him refine and understand his ideas.

Moore, Randy. Language—A Force that Shapes Science. Journal of College Science Teaching 28.6 (1999): 366. (which I quoted in a review I wrote recently)

Fahnestock adds:
“…Holmes’s general point [is that] there are subtle interactions ‘between writing, thought, and operations in creative scientific activity’ (226).”

District Dispatch: Speak up on Day of Action to Save Net Neutrality

planet code4lib - Wed, 2017-07-12 15:06

Today, ALA joins close to 200 organizations participating in a Day of Action to Save Net Neutrality. Websites, internet users and online communities are coming together to sound the alarm about the Federal Communications Commission (FCC)’s attack on net neutrality. You can add your voice to a growing and powerful chorus.

ALA and America’s libraries believe protecting and preserving the open internet is essential for ensuring the free flow of information to all, which underpins free speech, research and learning, economic empowerment and digital innovation. We have historically supported, and currently support, strong, enforceable net neutrality rules from the FCC as a matter of our values and ethics, our public mission and our professional practice as broadband consumers and advocates.

I often get the question (from librarians, net neutrality advocates and adversaries) about the specific stake that ALA and libraries have in this issue. I’ve thought about it a lot over the last 8+ years of working on this issue, and here are a few of the things I see through a library lens.

As a matter of principle, library professionals commit to professional values of intellectual freedom, equitable access to information and diversity. Consider:

  • Intellectual freedom and free expression are as fundamental to the Internet as the First Amendment is to American democracy. These also are the core values of America’s public, K-12 school, higher education and all libraries. Commercial ISPs should not be enabled to serve as gatekeepers for the information people may freely access online.
  • Equitable access to information online depends on the open internet. Prioritized access to some content over others is antithetical to librarian and democratic values. It also runs counter to the innovative and “permissionless” nature of the internet that enables creators to reach global audiences by the quality of their offerings rather than the size of their wallet.
  • Embedded in both of the above is a commitment to the need to foster and share a diversity of voices, ideas and experiences.

As a matter of practice, libraries collect, create, provide access to and disseminate essential information to the public over the internet. Consider:

  • Will libraries—largely funded through public dollars—be able to compete for priority access to share diverse digital collections that range from community newspaper and photo archives to downloadable local music to veterans’ oral history projects to documentary video?
  • Will libraries be able to pay increased fees to vendors that may pass along the costs of paying for prioritized access to their streaming and downloadable media resources?
  • Will libraries be required to consider the affiliated (and therefore likely prioritized) content available through a commercial ISP when selecting their broadband provider(s)? And will they be forced to pay multiple ISPs to provide service to enable public access to affiliated content?
  • Will libraries’ ability to provide no-fee public internet access to support bandwidth-intensive services ranging from high-definition video conferencing and distance learning to big data sharing to telemedicine be compromised if these services are throttled in favor of commercial content preferred by commercial ISPs?
  • How will libraries educate their public internet users about these choices and limitations related to prioritized content, both in terms of patrons’ access and their ability to contribute their own cultural and commercial products to other internet users? How transparent will ISP practices be to libraries and the campuses, communities and individuals we serve?

America’s libraries and librarians are rightly recognized as essential democratic institutions and leading advocates for people’s rights to read and express themselves freely. The internet is today’s most essential platform for this speech and expression.

An open internet in which commercial ISPs are prohibited from blocking, throttling, degrading, discriminating or prioritizing among online content and services is essential to free expression and equitable access and contribution to online information.

To preserve the open internet, we must have legal protections and the ability of the expert government agency to enforce these protections. The 2015 Open Internet Order enables these protections, creates a framework for addressing future conduct concerns, and allows for the flexibility to forbear where needed. The FCC must retain and use as needed its lawful authority and not entrust the future of the internet to entities whose financial interests may vary significantly from the public interests.

Today is a big day, but one among many ahead to preserve the gains ALA and other network neutrality advocates made in 2014 and 2015. Please join us now in sending the strongest message possible to the FCC about the value of an open internet for libraries and the people we serve. And please stay tuned and share your questions and ideas with us in the coming weeks and months.

The post Speak up on Day of Action to Save Net Neutrality appeared first on District Dispatch.

In the Library, With the Lead Pipe: Following the Yik Yak: Using Social Media Observations to Understand Student Needs on College Campuses

planet code4lib - Wed, 2017-07-12 12:00

In Brief:

What can social media tell us about our patrons? We look at Yik Yak through the lens of library ethnography, suggesting that anonymous social media can reveal not only complaints and commentary about library services and spaces, but also uncover students’ emotional and social experience with research and study space. Looking at social media posts on Yik Yak, we uncover common threads of social dynamics, expectations of quiet space, and frustration with studying, all of which increase our understanding of student experiences in US and international libraries.

by Celia Emmelhainz and Miriam Rigby

Parseltongue in the Library, screenshot by Celia Emmelhainz, CC-BY


As librarians, we always want to better understand and connect with our patrons—and to use our understanding to set up more useful spaces and services. In the past 10 years, librarians have moved from relying on user surveys to working with anthropologists and ethnographically-trained librarians. These collaborations let us learn not just what our users say to our face, but what they actually do—and how they live their lives in relation to the library as a space and social place.

Ethnographic methods provide librarians with a series of approaches that let us see our libraries in their social context.1 These could include things such as inhabiting study spaces and watching the social habits of patrons in our library, observing spatial usage and leading mapping exercises with students, embedding in research methods or capstone seminar courses to observe how library research is taught, and even inviting students to document their own experiences through photos, videos, and research diaries.2 Such ethnographic methods differ from the “interviews and surveys” baseline of qualitative user experience research in libraries, as ethnographic methods can allow us to embed over time in our users’ social and virtual spaces. Inhabiting the same world that our students inhabit can lead us to much deeper understanding than just recording their reported experiences through surveys, focus groups, or interviews alone.  

In this article, we take another ethnographic angle, that of digital ethnography. By collecting and reflecting on changing social media posts, we as researchers use the internet to observe–in real time–how patrons discuss their engagement with the library. Anonymous, location-based services like Yik Yak let us practice trace ethnography, observing the times, places, and digital traces by which students express their relationship to the library.3

One puzzle for us was that whether we run qualitative or quantitative studies, our libraries still find it hard to gain feedback from those who do not currently use our services.4 How can we know what people are thinking if they don’t share their thoughts? How can we learn what students need if they don’t darken our revolving doors, use our website, or volunteer for a research study with a side incentive of pizza?

And even when patrons complete an interview or survey, do they realize they’re editing their thoughts—perhaps based on a desire to be polite and helpful to us as librarians?

Using Social Media Apps to Reach Out and Listen to Students

With this in mind, we chose to gain a new angle on our libraries by analyzing anonymous and ephemeral social media posts on Yik Yak. Public, school, and academic librarians already use social media such as Twitter and Facebook to gain feedback from tech-savvy users.5 Yet trend-responsive patrons are always moving from one network to another. A prior generation left Tribe, Friendster, and MySpace for Facebook, whereas today’s students may leave shell profiles on Facebook while using other apps to avoid parents, teachers, and corporate advertisers.6 One student at UT Austin describes Facebook as “dead to us” and an “awkward family dinner party,”7 noting that his friends now enjoy Medium, GroupMe, WhatsApp, Tumblr, and Snapchat instead.

As younger patrons leave networks like Facebook, they move to ephemeral social media such as Snapchat, where messages disappear after viewing;8 Instagram, where they curate daily life for a wide audience;9 Whisper, where they anonymously share their “secrets” with those nearby; and Yik Yak, a source of anonymous chatter on many college campuses.  

While Instagram remains strong, though, most apps are ephemeral. They move in and out of popularity, with posts fading out over minutes, hours, or days. In fact, Yik Yak itself announced in April 2017 that it is ceasing operations.10 We argue, though, that capturing these passing streams of student experiences in the library continues to have value. Like culture as a social construct, social discourse on social media apps is both passing and long-lasting in impact. So why pay attention to passing trends, and how can we capture lasting value from these ephemeral posts?

Listening to Campus Yaks

At the time we reviewed each of these apps in 2015, Yik Yak was reaching the height of its popularity. Discussions of the pros and cons of social media were appearing regularly in the Chronicle of Higher Education and Inside Higher Ed. Among those apps existing at the time, we found Yik Yak’s comments on campus life to be the most revealing in understanding our patrons.

Started by two graduates of Furman University, Yik Yak was a mobile app that launched in 2013 and gained rapid popularity on and near college campuses, as well as some high schools.11 It let users within a few miles12 share anonymous, wry comments on daily life, and allowed other users to upvote or downvote existing posts, as well as add their own responses and commentary. As we found with Yik Yak, anyone with a smartphone or tablet can view campus chatter, from hookup requests and dorm party announcements, to bathroom humor, complaints about homework, and anguish over life and relationships.13

Bonded With the People Near Me, screenshot by Celia Emmelhainz, CC-BY

Lessons from Yaks in the Wild

The authors had a unique opportunity for research when we found ourselves snowbound together during ALA Midwinter 2015. As blizzard-like winds blew across Chicago, we ensconced ourselves indoors and surveyed over 10,000 “Yaks” posted at 60 public, private, and religious universities across North America. Using Yik Yak’s “peek” feature to see campuses besides our own, we collected anonymous, public posts by students in the vicinity of major campuses.

Methodological discussions of the ethics of performing ethnography online differ in whether to inform subjects, often depending on the nature of the research, topics of study, and expectations of privacy involved. In this case we determined that the brief nature of comments on the library, full anonymity of users, ephemeral nature of posts, and public availability to any smartphone user worldwide all reduced student expectations of privacy as well as the risk of harm. Further, we found no way to directly contact users, and are now quoting posts not tied to any specific user, and which no longer exist except in our screenshots.14

This sample was collected on Sunday, 1 February, 2015 between 5pm-2am EST, and included posts about academic libraries from Hawai’i to London, although primarily in the United States. Because the app lacked a search feature, we scrolled through both the 100 most “recent” Yaks, which last until voted down, and the 100 most popular (liked) Yaks, which last several days until they time out of the system.15 While users can save images to their phones, most posts disappear within two to three days.

The timing for our survey was excellent: on some campuses, it was the Sunday before midterms; on others students were returning from winter break or anticipating a snow day, and many were balancing studying with parties for the Super Bowl. These varied conditions gave us a wide cross-section of student experiences as they prepared for a coming week on campus.

We searched by name for campuses that would give us a range of insights, and content varied widely. On some campuses we found just one or two Yaks referencing the library among 200 recent and popular posts. At others, the library was not mentioned (and so they are not included in our sample).

In other words, libraries comprise a small slice of youthful online commentary—less popular than hookups and pizza, but important enough to occasionally bubble up in student chatter.

In reviewing sixty library-related Yaks, we found forty that fell into four thematic clusters: the library as study space, the library as site of suffering, the library as social space, and comments suggesting (to us as librarians!) a need for research support and information literacy training.

1. Libraries as a quiet study space

Many students framed their local library as a quiet space in which to study. For example:

  • Pomona: Shoutout to the people rushing to the library after Super Bowl ends #workhardplayhard
  • Endicott: Sunday is supposed to be a day to relax and yet it’s my busiest and most stressful day
  • Oregon: The library should have an adderall vending machine
  • Michigan: Are any libs open tomorrow? Where are we supposed to study if the libraries are closed [for a snow day]?
  • Oberlin: Is Mudd open? I actually have stuff to prepare for the first day of class and reading Marxian philosophy is really hard with chatter and rap in the background
  • New Mexico: Where can I get free pizza today? Since I’m stationed in the library today
  • Bates: The library is nice and quiet. Thanks Super Bowl!

Yet this desire for a focused study space led others to express frustration with patrons who were talking… or breathing loudly.

  • Boston: please don’t breathe so loudly in the library
  • British Columbia: Why are u here, girls that come to koerner to socialize and talk super loud?
  • Bates: That glare you give when you’re in the library and you hear people outside your window having fun
  • Brigham Young: Since when is it okay to hum in the library?
  • Carleton: The person behind me on 3rd Libe is panting profusely, wtf?
  • UCL, London: When the guy next to you is whispering his essay aloud in the library, and it sounds like parseltongue

In other words, in contrast to libraries’ attempts to open up interactive study spaces, many students were frustrated by the noise levels and distracting patrons around them. Students’ highly emotional responses to these issues online suggest the need for more quiet student study spaces–something that libraries such as Colby College have implemented as the result of library observations, interviews, and ethnographic projects.

2. Libraries as a site of suffering

While students appreciated the library as a quiet study space, they also expressed ambivalent feelings about studying in the context of a university library. Of course, we attempt to make the library a welcoming space, yet students still feel a range of internal affective experiences as they study. Because this momentary experience is not captured in library surveys, we find social media all the more useful in helping us grasp the quiet struggle going on in front of us, in carrels and at tables where students wrestle with the tension between getting work done and wanting to relax:

  • Colby: Nothing worse than seeing other people from your class leaving Miller while you’re stuck there with so much left to do
  • Pittsburgh: Opens up CourseWeb and sees 25 notifications, closes laptop and cries
  • Oxford, UK: Hate it when a procession of tourists come through the library. Makes me feel like an exhibit in the world’s most boring zoo
  • Charleston College: I’ll be suffering in the library studying for my cellular biology exam tomorrow if you need me
  • Berkeley: Frustrated guy next to me in the library just threw down his pen and sighed so hard he farted. Me too, bruh
  • Carleton: Things the Libe doesn’t have: heat, wifi. things the Libe does have: books on the origins of hats
  • Ohio State: The walk to the library is the boulevard of broken dreams
  • Cambridge, UK: Oh God, the library is getting spooky. No one else is here (I think) and my lack of sleep isn’t helping…
  • Georgetown: Is there anyone still in library?

Adderall Vending Machine, screenshot by Celia Emmelhainz, CC-BY. For an example of an actual library vending machine with medications and other supplies, see

3. Libraries as social space

We also found that many students use Yik Yak to comment on studying in relation to other students. The social aspect of studying in the library is highlighted by the insider language used (“the Libe”) as well as the shared humor and social experience of students sleeping, sighing, or simply being in proximity to each other:

  • Charleston: Whenever I’ve been sitting in the same spot in the library for a long time, I always feel like I’ve bonded with the people near me
  • U Chicago: Nothing reminds me of the interconnectedness of humanity like realizing that everyone in the Harper chairs naps in the exact same position
  • British Columbia: When you open your laptop in a library and Beyoncé is still playing
  • Sarah Lawrence: SLC life: smoothies in the library

Social media posts can also capture how the library is perceived both socially and spatially, at a more granular level than observed with other assessment measures:

  • U Washington: Library logic: *Sees a table with only one person* *Goes in search of another table*
  • Miami U: What floor you study on in King says a lot about you. What’s your favorite?
  • Texas A&M: Why do foreign people ALWAYS talk in the library? GTFO

As the Texas A&M quote above shows, the anonymity of many social media apps also raises critical concerns about racism and exclusion, highlighting how our libraries are still not a safe space for everyone. Although we did not encounter other racist or sexist comments in this brief survey, we have seen them over time on our home campuses. And while most student users quickly down-vote such posts out of existence, their presence still creates a hostile space of social exclusion, and mirrors the microaggressions minority students already experience offline on college campuses.16 This darker side of student sociality on the web should prompt us to check in with vulnerable populations and make sure we host a safer and more inclusive space in our campus libraries.

4. The ongoing need for study aids and information literacy

Finally, students used Yik Yak to comment on their information habits or to seek assistance from others. We wondered whether they were too intimidated to approach a teacher, or didn't realize that a librarian could help them access related resources. Whether asking about assignments, offering flashcards, or commenting on how they seek information, these posts hint at ways librarians could improve the cultural relevance of our marketing and services for students:

  • Alabama: What’s the format of Econ 110 tests?
  • Oxford: The great feeling when you open a journal article and half the pages are references
  • UNC: Writing a scientific journal article is about as useful as the magic scarf I got three years ago
  • Grinnell: Anyone in intro sociology that wants pretty comprehensive flashcards? It feels like a shame to waste them.
  • Ohio Dominican: Can’t find study aids anywhere [crying emoticon]

A few students also raise issues of copyright and intellectual property, grousing about others copying their Yik Yak posts or asking whether they can get away with illegal downloads on campus:

  • Ohio State: Reposts [of popular Yik Yaks] are academic misconduct
  • Montana: Can you get in trouble for using Youtube to MP3 on campus?
  • New Mexico: I read yik yak in the mornings like it’s the morning paper
  • Calgary: An update on my game: 3 weeks and still have not opened a book

Bringing It Home

As librarians, then, we found social media posts eye-opening for the light they shine on our campus and student experiences. In library ethnography, even a small study can be insightful,17 and these posts gave us one unobtrusive view into students’ digital lives.

The Floor You Study On, screenshot by author Celia Emmelhainz, CC-BY

In talking with others, though, we also found that comments on social media can be used to directly improve our services. At the University of Oregon Libraries, Lesli Larson (Director of Communications and Marketing) used Yik Yak to actually change library outreach, reporting:

I do monitor Yik Yak to find out what undergrads are saying in raw/unvarnished terms. The key takeaway from Yik Yak for me is how student chatter reinforces wayfinding findings about the library. Students prize quiet study spaces, compete for study spots during finals, and proudly identify themselves as frequent or first time library users.

This student chatter helped Lesli keep displays relevant, and sparked her “program to create a display system for our reservable study rooms.” As she wrote to us, “I definitely invoked Yak feedback about the scramble for study spaces” as she planned library improvements. (Her strategy here was to frame improvements as pilot projects or ongoing iterations of existing projects, thus avoiding the need for extensive bureaucratic review before implementation.)

While specific social media services rise and fall, Lesli's experience suggests that attending to these apps helps us see another angle of campus life. However, we have two caveats. First, reading a hundred posts often turned up only one about the library (in Celia's case at Colby College, checking in daily for a semester netted only 100 comments on the library). This suggests social media may be more useful for getting a sense of trends and conversations on campus as a whole than for learning about the library specifically.

Second, we recommend that librarians practice self-care when viewing negative comments, whether about campus in general or about the library and vulnerable student populations in particular.

Because Yik Yak is an "offstage" space,18 we advise taking it with a grain of salt. Students post cynical comments to win the upvotes of their peers, and not all students are equally represented. We believe it can be a useful counterweight to the positive encouragement we receive on our library surveys, but it certainly isn't the whole picture. When students write with an edge, their posts often do not reflect the deep respect they have for their peers, their campus, and the libraries and librarians who help them in their daily lives.

Further, we suggest librarians avoid directly advertising on anonymous social media. Faculty do sometimes intervene, as when Colgate University faculty popped up on Yik Yak to encourage students during finals,19 but one-time posts by popular faculty are the exception rather than the rule. Even Lesli, who used Yik Yak for "stealth communication" and promoting free coffee at finals, chose to communicate as if student-to-student rather than as an institutional advertiser.

Screenshot by Celia Emmelhainz, CC-BY

Instead, we recommend you let social media apps spark insight into how students relate to each other, and how they communicate wryly about the library among their peers. In ephemeral online spaces like Yik Yak, we learn best by quietly observing as ethnographers and listening in to learn how patrons relate to their library, their campus, and their world.

Acknowledgments: The authors would like to thank Erin Pappas (external peer reviewer) and Annie Pho (internal peer reviewer) for reviewing drafts of this article. We really appreciate their constructive comments. We would also like to thank our Publishing Editor, Amy Koester, for her guidance in this process. Finally, we would like to thank all the members of ANSS (the Anthropology and Sociology Section of ACRL) for their collegiality and for the opportunity to develop as anthropology librarians among a great cohort of peers.


Below is a list of colleges we reviewed, in alphabetical order. Although we did not sample systematically from a list of known colleges, we sought diversity across urban and rural settings, selective liberal arts colleges, major research universities, and smaller campus environments. We also sampled for geographic diversity across the United States and, in a more limited fashion, Canada and the UK. (We checked several university locations in France but did not find Yik Yak in operation there.) This list is not comprehensive: when we found no library-related posts at a college, we did not keep a record of the search.

Bates College

Boston College

BYU Hawai’i

Cambridge University (UK)

Carleton College

Carroll College

Charleston College

Colby College

Duke University

Endicott College

Georgetown University

Grinnell College

Harvard University

Lewis and Clark College

Macalester College

Miami University (Ohio)

New York University

Oberlin College

Ohio Dominican University

Ohio State University

Oxford University (UK)

Pomona College

Sarah Lawrence College

Stanford University

Texas A&M University

University College London (UK)

University of Alabama

University of British Columbia (Canada)

University of Calgary (Canada)

University of California Berkeley

University of California Irvine

University of Chicago

University of Hawai’i at Hilo

University of Michigan

University of Montana

University of New Mexico

University of North Carolina–Chapel Hill

University of Oregon

University of Pittsburgh

University of Washington

  1. Celia Emmelhainz, "Interviews, focus groups, and social media: lessons from collaborative library ethnographies in America and Kazakhstan," conference paper, IFLA 2016.
  2. Nancy Fried Foster and Susan Gibbons, eds., Studying Students: The Undergraduate Research Project at the University of Rochester (Chicago: Association of College and Research Libraries, 2007), and Nancy Fried Foster, ed. Studying Students: a Second Look (Chicago: Association of College and Research Libraries, 2013).
  3. R. Stuart Geiger and David Ribes, "Trace ethnography: following coordination through documentary practices," 44th Hawaii International Conference on System Sciences (HICSS)/IEEE (2011): 1-10 (accessed March 20, 2014).
  4. Leah Emary, "The why, what and how of using ethnography for designing user experience in libraries (and a few pitfalls to avoid)," in User Experience in Libraries: Applying Ethnography and Human-Centred Design, ed. Andy Priestner and Matt Borg (Routledge, 2016), 68-93.
  5. Andy Burkhardt, "Social media: a guide for college and university libraries," C&RL 71, no. 1 (2010): 10-24; Elizabeth Price and Rebecca Richardson, "Eavesdropping on the user experience," C&RL News 78, no. 1 (2017): 16-10 (accessed February 9, 2017).
  6. Jillian D'Onfro, "What teens say about Facebook, Instagram, and Snapchat," Business Insider (2013) (accessed February 7, 2015).
  7. Andrew Watts, “A Teenager’s View on Social Media,” Medium (2015), (accessed February 2, 2015).
  8. Will Oremus, “Is Snapchat Really Confusing, or Am I Just Old?” Slate (2015), (accessed February 7, 2015).
  9. Danielle Salomon, "Moving on from Facebook: Using Instagram to connect with undergraduates and engage in teaching and learning," C&RL News 74, no. 8 (2013): 408-412.
  10. Emma Hinchliffe, "It's the End of the Line for Yik Yak," Mashable, April 29, 2017 (accessed May 30, 2017).
  11. Amy Carlson, "Ignite Session: Outreach, Yik Yak, and Reality TV," The Scoop, American Libraries (2015) (accessed February 20, 2016).
  12. Evelyn M. Rusli, "Yik Yak, Big in Schools, is a Hit With Investors Too," Wall Street Journal (2014) (accessed February 8, 2015).
  13. However, as participation is based within a one-mile radius, some distance-education students may be limited to observing without the ability to directly participate, hinting at the unequal access to campus resources and dialogue that they face.
  14. For a further critique of the ethics of online ethnography, see Lisa Sugiura, Rosemary Wiles, and Catherine Pope, "Ethical Challenges in Online Research: Public/Private Perception," Research Ethics (2016): 1-16.
  15. See our addendum for a list of colleges and more on our sampling methodology.
  16. Scott Jaschik, "Rally at Colby Against Racist Yik Yak Comments," Inside Higher Ed, April 17, 2015 (accessed May 30, 2015).
  17. Sims Kline, "The librarian as ethnographer: an interview with David Green," C&RL 74, no. 9 (2013): 488-491.
  18. Erving Goffman, The Presentation of Self in Everyday Life (New York: Doubleday Anchor Books, 1959).
  19. Kaitlin Mulhere, "Yik Yak Take Back: professors turn Yik Yak into happy space," Inside Higher Ed (2014) (accessed February 15, 2015).