Feed aggregator

LITA: Take Altmetrics to the Next Level with this LITA webinar

planet code4lib - Mon, 2017-10-16 15:23

Sign up today for 

Taking Altmetrics to the Next Level in Your Library’s Systems and Services

Instructor: Lily Troia, Engagement Manager, Altmetric
October 31, 2017, 1:00 pm – 2:30 pm Central time

Altmetrics are traditionally viewed as the realm of evaluation and assessment, but altmetric data can offer valuable insights throughout the research lifecycle. It can help inform researcher decisions around where to publish or self-deposit, with whom (and where) to collaborate, and provide those tasked with facilitating the scholarly process a richer, more complete view of research attention and influence.

Register here, courses are listed by date

This 90-minute webinar will bring participants up to speed on the current state of altmetrics and focus on changes across the scholarly ecosystem. Through sharing of use cases, tips, and open discussion, this session will help participants develop a nuanced, strategic framework for incorporating and promoting wider adoption of altmetrics throughout the research lifecycle at their institution and beyond.

View details and Register here.

Discover upcoming LITA webinars and web courses

Building Services Around Reproducibility & Open Scholarship
Offered: November 1, 2017 – November 22, 2017

Introduction to and JSON-LD
Offered: November 15, 2017

Diversity and Inclusion in Library Makerspace
Offered: December 6, 2017

Digital Life Decoded: A user-centered approach to cyber-security and privacy
Offered: December 12, 2017

Questions or Comments?

For all other questions or comments related to the course, contact LITA at (312) 280-4268 or Mark Beatty,

LITA: LITA Guides: Call for Proposals

planet code4lib - Mon, 2017-10-16 15:16

LITA is looking to expand its popular LITA Guide series. Rowman & Littlefield Publishers would like to offer a $250 gift card for the best LITA book proposal. 

Proposals must be submitted by DECEMBER 1st, 2017.

Topics for consideration include:

  • Tools for big data
  • Developing in-house technology expertise
  • Budgeting for technology
  • Writing a technology plan
  • K-12 technology
  • Applications of agile development for libraries
  • Grant writing for library technology
  • Security for library systems

Questions or comments can be sent to Marta Deyrup, LITA Acquisitions Editor. Proposals can be submitted to the Acquisitions Editor using this link.

Open Knowledge Foundation: Pavel Richter stands down as Chief Executive Officer

planet code4lib - Mon, 2017-10-16 12:31

It is with a mixture of sadness and gratitude that the Board of Directors announces that Pavel Richter will be leaving his position as Chief Executive Officer of Open Knowledge International as of today. He will be available to support the Board and Leadership Team until the end of 2017, to ensure a smooth transition.

Since May 2015, Pavel has played a critical role in the development and success of the organisation. While we will miss him and his thoughtful leadership, we wish him the best of luck in his new endeavours. We want to thank him for more than 2 years of dedicated service which involved many significant accomplishments. He managed the successful transition from our founder to a new Leadership Team. He maintained the position of Open Knowledge International as one of the key stakeholders in the movement for more openness. Together with his Leadership Team, Pavel defined a clear strategy and partnership approach for Open Knowledge International to succeed in an ever-changing funding environment, and stabilized the organisation.

Over the coming months, we have asked Mark Gibbs and Paul Walsh to continue to lead the organisation. Mark and Paul will work closely with the Board to deliver the strategy that will carry Open Knowledge International’s mission forward in the coming years. During this transition period, it is the Board’s priority to maintain a stable and effective organisation. Mark will continue to lead the day-to-day operations of the organisation, and Paul will continue to lead and develop product offerings and the commercial strategy for the organisation. Mark and Paul have asked both Stephen Abbott Pugh and Katelyn Rogers to work with them and the Board in supporting our future fundraising efforts, as we define our future strategy.

Again, we thank Pavel for the dedication, enthusiasm and motivation he has given to Open Knowledge International over the past couple of years. He will be missed by the staff, Board members and partners alike. We look forward to following the success of his career, and are hopeful and excited for the next chapter of Open Knowledge International as we continue to support our Community, our Chapters and Civil Society Organisations in their important work to open up data and knowledge.

If you have any questions or concerns during this transition process, please do not hesitate to reach out to me directly at

Terry Reese: MarcEdit 7 alpha weekly build

planet code4lib - Mon, 2017-10-16 04:33

The following changes were made:

  • Bug Fix: Export Tab Delimited Records (open file and save file buttons not working)
  • Enhancement: XML Crosswalk wizard — enabled root element processing
  • Bug Fix: XML Crosswalk wizard — some elements not being picked up, all descendants should now be accounted for
  • Bug Fix: Batch Process Function – file collisions in subfolders would result in overwritten results.
  • Enhancement: Batch Processing Function: task processing uses the new task manager
  • Enhancement: Batch Processing Function: tasks can be processed as subdirectories

I had intended to move the program into beta this Sunday, but the above issues made me decide to keep it in alpha for one more week while I finish checking legacy forms/code.

Downloads can be retrieved from:


HangingTogether: “MARC Must Die” 15 Years On

planet code4lib - Sun, 2017-10-15 16:48

Fifteen years ago to the day I declared that “MARC Must Die” in Library Journal. It sparked a firestorm of criticism, mostly from the cataloging community, and several invitations to speak. Over the years many would misquote the title or misunderstand my point, and claim that I said “MARC is dead”. I didn’t, and there is a key difference. I simply knew that MARC remained a sacred cow in libraryland and I was intent on blowing it up (how’s that for mixing metaphors?).

I wanted librarianship to wake up to the fact that our foundational standard was no longer serving us like it should. 

Unfortunately, some seemed to take it as a blow directed against catalogers, not the standard that they were accustomed to working with. One AUTOCAT-L subscriber even wrote a parody of my piece titled “Why web services departments must die” (I was a web manager at the time). I wasn’t calling for catalogers to go away. I just wanted something better for them to work with.

There was also the feeling that MARC had everything we needed but that our automated systems were not doing enough with it. I agree with that to a point, but for example, how does one specify in MARC that a given URL will take someone to the openly available full-text of an item? That seems like a laughably simple thing to do and also very important, but there is no way for us to assert this unambiguously in a form that computers can easily use. Still. Metadata situations like this are completely indefensible in the world of the web.

As I re-read some of the discussion on AUTOCAT-L from back then I must admit that many of the posts were at least civil, and there were useful points brought to the debate. There were a couple people who wondered if I was a librarian (a fact easily Googled), despite having worked in libraries my entire adult life and writing a monthly column for five years in one of the most widely read professional magazines, but things like that come with the territory when you set out to be provocative. If you can’t take the heat, stay out of the kitchen. I’ve somehow always found myself in the kitchen. Starting fires, even.

I also feel a need to point out that I hadn’t come to this realization lightly, or quickly, that MARC was standing in our way more than helping. In the lead-up to that controversial assertion I had written columns like “Twenty-First Century Cataloging,” “The Art and Science of Digital Bibliography,” “The Consequences of Cataloging,” “Metadata as if Libraries Depended Upon It,” and “The Importance of Being Granular”, which all had something to say about library metadata and the challenges we faced with our existing tools. Those columns and others are collected in the book Managing the Digital Library, which I am requesting that the HathiTrust make open access (they have a form you can submit for just this purpose). [Update: They did! Here is a direct link to it.]

So what has happened over the last 15 years? For starters, no one seems to think it’s controversial anymore. The Library of Congress has not only admitted that MARC’s days are indeed numbered, they are actively working to develop a linked data replacement. I don’t by any means think that we are out of the woods of making this transition yet, and I also believe it will take many years.

The good thing is that we can finally move on from anger, denial, and hurt to a more constructive place of trying to figure out how to more effectively replace it. It’s just that it has unfortunately taken half of my professional career to get here.

Lucidworks: Segment Merging, Deleted Documents and Why Optimize May Be Bad For You

planet code4lib - Fri, 2017-10-13 03:26

Solr merge policy and deleted docs

During indexing, whenever a document is deleted or updated, it’s not really removed from the index immediately; it’s just “marked as deleted” in its original segment. It doesn’t show in search results (or the new version is found in the case of an update). This leads to some percentage of “waste”; your index may consist of, say, 15%-20% deleted documents.

In some situations, the amount of wasted space is closer to 50%. And there are certain situations where the percentage of deleted documents can be even higher, as determined by the ratio of numDocs to maxDocs in Solr’s admin UI.
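As a rough illustration of that ratio, the deleted fraction can be derived from the two counters. This is only a minimal sketch: the function name and the sample numbers are made up for illustration, and the only thing taken from the text is that numDocs counts live documents while maxDocs also counts the deleted ones.

```python
def deleted_fraction(num_docs: int, max_docs: int) -> float:
    """Fraction of the index occupied by documents marked as deleted.

    num_docs counts live documents; max_docs counts live plus deleted ones.
    """
    if max_docs <= 0:
        return 0.0  # empty index: nothing is wasted
    return (max_docs - num_docs) / max_docs


# Hypothetical counters, as they might be read off the admin UI.
print(f"{deleted_fraction(850_000, 1_000_000):.0%} deleted")
```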

Having half or more of your index “wasted” isn’t ideal. This article explains how these conditions arise and what to do about them (spoiler: not a lot).

NOTE: We talk about “deleted” documents. For the purposes of this article, “deleted” includes both explicit deletes and updates to existing documents. This latter is implemented as a delete of the original document followed by an add.

Good news, bad news

The good news is it’s easy to avoid having more than 50% of your index consist of deleted documents: do not optimize. Later, we’ll talk about why optimize (aka forceMerge) can allow more than 50% of the index to consist of deleted documents.

The bad news is there’s no configuration setting as of Solr 7.0.1 that will guarantee that no more than 50% of your index consists of deleted documents. This topic is under active discussion on the Apache JIRA list at LUCENE-7976.

Lucene segments are “write once”:

Since about forever, Lucene indexes have been composed of “segments”. A single segment consists of multiple files sharing the same root name but different extensions, such as _0.fdt, _0.fdx, _0.tim and the like.

Lucene has a “write once” policy with respect to segments. Whenever a “hard commit” happens, the current segment is closed and a new one opened. That segment is never written to again [1]. So say a segment contains 10,000 documents. Once that segment is closed, that segment will always contain 10,000 documents, even if some are marked as deleted.

This can’t go on forever; you’d run out of file handles if nothing else.

Merging to the rescue:

The solution is “merging”. We mentioned above that when a commit happens, the segment is closed. At that point Lucene examines the index for segments that can be merged. There are several “merge policies”, all of which decide that some segments can be combined into a new segment and that the old segments that have been merged can be removed. The critical point is that when segments are merged, the result does not contain the deleted documents.

Say there are two segments, each consisting of 10,000 docs, 2,500 of which are deleted. These two segments are combined into a single new segment consisting of 15,000 documents; the 2,500 deleted documents from each original segment are purged during the merge.
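The arithmetic of that example can be sketched in a few lines. This is a toy model, not Lucene code: the `Segment` class and `merge` function are hypothetical names introduced here, and a segment is reduced to two document counts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    total_docs: int    # documents written when the segment was closed
    deleted_docs: int  # documents since marked as deleted

    @property
    def live_docs(self) -> int:
        return self.total_docs - self.deleted_docs


def merge(segments: list[Segment]) -> Segment:
    """Combine segments, carrying over only the live documents."""
    live = sum(s.live_docs for s in segments)
    return Segment(total_docs=live, deleted_docs=0)


merged = merge([Segment(10_000, 2_500), Segment(10_000, 2_500)])
print(merged.total_docs)  # 15000
```

The merged segment starts with no deleted documents, which is why merging is the only point at which the space is actually reclaimed.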

Anyway so far so good. And the default TieredMergePolicy (TMP) usually keeps the number of deleted documents in an index around 10-15%.

It turns out, though, that there are situations where the various merge policies can leave up to 50% deleted documents in an index.

Why is merging done like this?

It’s always a balancing act when creating merge policies. Some options are:

  • Merge any segments with deleted documents. I/O would go through the roof and indexing (and querying while merging) performance would plummet. In the worst case scenario you would rewrite your entire index after deleting 0.01% of your documents.
  • Reclaim the data from the segments when documents were deleted. Unfortunately that would be equivalent to rewriting the entire index. These are very complex structures and just reaching in and deleting the information associated with one doc is prohibitively expensive.

What? I can’t afford 50% “wasted” space:

Indeed. So how does that happen in the first place? Well, it gets a little tricky. Note that there is a “max segment size” that defaults to 5G that I’m using here, which can be configured higher or lower.

The root problem is that only segments whose “live” documents total < 50% of “max segment size” are eligible for merging.

Let’s take an idealized 200G index consisting of exactly 40 segments each 5G in size. Further assume that docs in the corpus are updated randomly. None of these segments are eligible for merging until they contain < 2.5G “live” documents. Eventually all of the original 40 segments will have 2.51G “live” documents (or close enough to 50% for me).
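The idealized case can be worked through numerically. The sketch below uses the simplified eligibility rule exactly as stated in this article, not the actual TieredMergePolicy logic; the function and variable names are illustrative.

```python
MAX_SEGMENT_GB = 5.0  # default "max segment size" cited in the text


def eligible_for_merge(live_gb: float, max_segment_gb: float = MAX_SEGMENT_GB) -> bool:
    """Simplified rule from the text: live data must be below half the max size."""
    return live_gb < 0.5 * max_segment_gb


# Idealized index: 40 segments of 5G each, every one stuck at 2.51G live.
segment_size_gb = 5.0
live_per_segment_gb = [2.51] * 40

total_gb = segment_size_gb * len(live_per_segment_gb)
live_gb = sum(live_per_segment_gb)
wasted = (total_gb - live_gb) / total_gb

print(any(eligible_for_merge(g) for g in live_per_segment_gb))  # False
print(f"{wasted:.1%} of the index is deleted documents")
```

No segment qualifies for merging, yet just under half of the 200G on disk is deleted data.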

You can see Michael McCandless’ writeup here, with graphs! He’s been deeply involved in the merging code.

What can I do about this?

Unfortunately, not much. Here are some ideas people have had:

  • forceMerge (aka optimize) or expungeDeletes. These operations will, indeed, remove deleted documents: forceMerge removes all of them, and expungeDeletes merges all segments with more than a certain percentage of deleted documents. However, the downside here is that once you optimize, you have to keep optimizing or live with lots of deleted documents [2]. The reason for this is that forceMerge (and expungeDeletes for that matter) creates a large segment as a result [3]. In a 100G index example that single segment will be 100G despite the (default) 5G max segment size. Now, to become eligible for merging, that single large segment must have < 2.5G “live” documents, so it can have up to 97.5% wasted space.
  • Make the max segment size bigger. We mentioned that this defaults to 5G; it can be changed through a setting in solrconfig.xml. Changing the max segment size makes no difference, since the problem isn’t how big each segment can be, it’s the fact that the segment is not considered for merging until it has < 50% of max segment size undeleted documents.
  • Dig into the arcana of the merge policy and tweak some of the lower-level parameters. There are properties like “reclaimDeletesWeight” that can be configured by tweaking TieredMergePolicy in solrconfig.xml. None of those parameters matter since they don’t come into play until the segment has < 50%(max segment size) live documents, and there’s no option to configure this.
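To see why an oversized segment is so much worse than a normal one, the bound from the first bullet can be written as a function of segment size. This is a sketch under the article's simplified rule (a segment is only considered once its live data drops below half of max segment size); the function name is hypothetical.

```python
def max_wasted_fraction(segment_gb: float, max_segment_gb: float = 5.0) -> float:
    """Upper bound on the deleted share of a segment before it is merge-eligible.

    Assumes the simplified rule from the text: a segment is only considered
    once its live data is below 50% of max_segment_gb.
    """
    threshold_gb = 0.5 * max_segment_gb
    if segment_gb <= threshold_gb:
        return 0.0  # already small enough to be considered for merging
    return (segment_gb - threshold_gb) / segment_gb


for size_gb in (5.0, 100.0):
    print(f"{size_gb:>5.0f}G segment: up to {max_wasted_fraction(size_gb):.1%} wasted")
```

The threshold is fixed by max segment size, so the larger the segment produced by forceMerge, the larger the share of it that can be dead weight.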

So where exactly does that leave us?

Lucene indexing has served well for years, so this is an edge case. Unless you are running into resource problems, it’s best to leave merging alone. If you do need to address this there are limited options at present:

  • Optimize/forceMerge/expungeDeletes. These options will remove some or all of the deleted documents from your index. However, as outlined above, these operations will create segments much larger than the maximum considered for future merges and you’ll have to perform these operations routinely. OPTIMIZING IS NOT RECOMMENDED FOR NRT INDEXES! These are very heavy-weight operations, generally suitable only for very slowly changing indexes.
  • Cleverly execute optimize/forceMerge. You can optionally tell the optimize/forceMerge operation to merge into N segments instead of just a single one, where N is something like (fudge factor) + (total index size)/(max segment size). Theoretically, that would not run into the pathological situation where > 50% of your index consisted of deleted documents. This is not explicitly supported behavior so I’d be very reluctant to predict the behavior.
  • Change the code. Either create a patch for LUCENE-7976 or propose a new merge policy.
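For the second option, the segment count could be computed along these lines. This is a hedged sketch only: the host, the core name `mycore`, and the fudge factor of 2 are assumptions made up for illustration, and the code merely builds the request URL (using the `optimize` and `maxSegments` update parameters) without sending it. As noted above, the resulting behavior is not explicitly supported.

```python
import math
from urllib.parse import urlencode


def target_segment_count(index_gb: float, max_segment_gb: float = 5.0, fudge: int = 2) -> int:
    """N = (fudge factor) + (total index size) / (max segment size), rounded up."""
    return fudge + math.ceil(index_gb / max_segment_gb)


def optimize_url(base_url: str, core: str, max_segments: int) -> str:
    """Build (but do not send) an optimize request limited to max_segments."""
    query = urlencode({"optimize": "true", "maxSegments": max_segments})
    return f"{base_url}/{core}/update?{query}"


n = target_segment_count(index_gb=200.0)
# Hypothetical host and core name.
print(optimize_url("http://localhost:8983/solr", "mycore", n))
```

Keeping the URL construction separate from any HTTP call makes it easy to inspect what would be requested before running such a heavy operation.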

This article looks scary, but do remember that TieredMergePolicy has been around for a long time and has served admirably. We created this document to provide a reference for those situations where users notice that their index consists of a greater percentage of deleted documents than expected. As Solr and Lucene move into ever-larger document sets, it may be time to tweak TieredMergePolicy and/or create a new merge policy and that discussion is already under way.

[1] We’re simplifying a little here. The documents in that segment are marked as deleted in a separate file associated with that segment, so in that sense the segment is written to after it’s closed. But that’s unimportant for this discussion.

[2] We’re simplifying again here. If your index is unchanging, or if you always add new documents so no documents are ever updated or deleted, optimizing doesn’t have this downside, since there won’t be deleted documents in your index, and it may show some improved response time.

[3] You can specify that the result of forceMerge will be more than one segment, but that doesn’t change the overall situation.

The post Segment Merging, Deleted Documents and Why Optimize May Be Bad For You appeared first on Lucidworks.

Tim Ribaric: Eduroam on Raspberry Pi

planet code4lib - Thu, 2017-10-12 17:47

So it is possible to connect a Raspberry Pi to Eduroam.

David Rosenthal: Crowdfunding

planet code4lib - Thu, 2017-10-12 15:00
ExoLife Finder
I've been a fairly enthusiastic crowdfunder for the past 5 years; I started with the Raspberry Pi. Most recently I backed the ExoLife Finder, a huge telescope using innovative technology intended to directly image the surfaces of nearby exoplanets. Below the fold, some of my history with crowdfunding to establish my credentials before I review some recent research on the subject.

The Lowline
My Kickstarter account shows I've backed 32 funded and 5 unfunded projects. The funded ones include:
Scanadu Scout
I've also backed 8 projects on Indiegogo, such as:
  • Scanadu Scout, a pocket-size gadget that measured blood pressure, pulse, temperature and blood oxygenation. Alas, it didn't get through the FDA into production, and my now-irreplaceable unit has just expired after about four years of daily use.
  • CODE: Debugging the Gender Gap, a wonderful movie about gender issues in technology.
Chibitronics Circuit Stickers
And 5 on Crowd Supply, such as:
In most cases, my reaction to the completed projects has ranged from OK to Wow! My ORWL showed up yesterday, a little later than promised. I haven't yet had time to explore its features, but it looks like a Wow! I'd guess that about 1/6 of my projects were disappointing, about half OK, and about 1/3 Wow! Which, for a venture capitalist, would be a great track record.

Ethan Mollick's 2013 paper The dynamics of crowdfunding: An exploratory study showed that:
the vast majority of founders seem to fulfill their obligations to funders, but that over 75% deliver products later than expected, with the degree of delay predicted by the level and amount of funding a project receives.
Mollick found that:
The majority of products were delayed, some substantially, and may, ultimately, never be delivered. Of the 247 projects that delivered goods, the mean delay was 1.28 months (sd = 1.56). Of the 126 projects that were delayed, the mean delay to date was 2.4 months (sd = 1.97). Only 24.9% of projects delivered on time, and 33% had yet to deliver.
I find strong evidence that project size and the increased expectations around highly popular projects are related to delays. ... even controlling for project size, the degree to which projects are overfunded also predicts delays. Projects that are funded at 10× their goal are half as likely to deliver at a given time, compared to projects funded at their goal. ... project delays were attributed to a range of problems associated with unexpected success: manufacturing problems, the complexity of shipping, changes in scale, changes in scope, and unanticipated certification issues were all listed as primary causes of delays.
ORWL
ORWL was funded at four times its relatively modest $25K goal, so some delay is normal. In my opinion, overfunding delays are related to Bill Joy's Law of Startups: "success is inversely proportional to the amount of money". What Bill meant was that tight funding forces teams to take decisions quickly and stick with them, ensuring that if they are going to fail they fail fast. Lavish funding enables analysis paralysis, and pursuit of multiple options simultaneously, both of which detract from focus.

Nerdwax
The recently published research is Does the Crowd Support Innovation? Innovation Claims and Success on Kickstarter by Anirban Mukherjee et al., with an overview here:
we arrive at the startling conclusion that novelty and usefulness are not viewed as synergistic by the crowd. While crowdfunding pledges are boosted when the project is said to be useful (or alternatively, novel), claiming that it is both reduces the total amount of pledges by 26 percent.

Our data show that claims of novelty or usefulness, taken separately, do increase the total pledge amount. As a matter of fact, they have a very large initial effect, meaning that even one claim for usefulness (or novelty) greatly boosted the total pledged sum (as compared with projects devoid of either claim). However, it is also important to pick one or the other, not combine them. This conclusion is based on analyzing the text, video and images of over 50K Kickstarter projects in product-oriented categories such as Hardware and Technology using machine-learning tools:
The resulting number of occurrences of the word “novel” and its synonyms served as proxy for novelty claims. Conversely, the sum of occurrences of the word “useful” and its synonyms became the measure for claimed usefulness.
The authors ask:
our findings are consistent with the literature on idea screening but not that on consumer evaluation of innovation, as modest innovations are more likely to get funded than more extreme innovations, i.e., innovations that are high on both novelty and usefulness. What is a possible reason for this inconsistency, given that backers in a crowdfunding context typically receive the product in exchange for their support, thus making their decision more like a product choice decision than a typical idea screening decision?
The authors speculate that:
this may be due to the high degree of uncertainty associated with the choice in a crowdfunding context, compared to a consumer purchase context. In the prototypical purchase context, consumer protection laws guarantee receipt of the purchased product. In the crowdfunding context, however, there is much greater uncertainty regarding (a) receiving the product and (b) features of the product, than in purchasing, for the following reasons. First, a project may not successfully reach its funding goal. In this case, backers are refunded but do not receive the product. Second, a successfully funded project may be delayed or may fail (the creator may be unable to follow-through). For example, a recent study ... found that more than three-quarters of successfully funded projects (on Kickstarter) are either delayed or failed. In this case, backers are neither guaranteed refunds – they may lose the entire amount pledged – nor guaranteed receipt of the product. Third, projects on Kickstarter are proposed blueprints, rather than descriptions, of the final product. ... we speculate that the higher level of uncertainty in the crowdfunding context drives backers to choose modest innovations and shy away from more extreme innovations, i.e., innovations that are high on both novelty and usefulness.
HexBright
I agree that for product-oriented projects the extra uncertainty over a purchase tends to make backers conservative. But I may be an outlier. Overall, my experience for product-oriented projects is much better than Mollick's numbers; 1 failure to deliver and 2 long delays in 21 projects. Maybe I'm better than average at assessing projects. I have sometimes favored novelty over usefulness. For example, who really needs an open-source Arduino-compatible flashlight? But HexBright turned out to be a really good flashlight even ignoring the Arduino inside, which I have never found time to program.

USB Condom
On the other hand, I've sometimes favored usefulness over innovation. There's not a great deal of innovation in a USB Condom, which is simply two USB connectors on a circuit board lacking the data connections. You can check this by looking at the traces. But it is very useful even for people less paranoid than I.

It is important to note that, as far as I can see, almost all the research on crowdfunding is restricted to product-oriented projects. Products are only about 2/3 of my backings.

Rosie's Fine Food
But I, like many backers, also fund scientific and engineering research, arts projects, even restaurants in return for T-shirts, meal coupons and other tchotchkes. In these others we are not buying an expensive T-shirt, we are supporting research, art, urban recovery and countless other worthy endeavors. With a less easily measured result, research is more difficult, and there seems to be little of it.

Open Knowledge Foundation: AbreLatam / Condatos: after the first 5 years

planet code4lib - Thu, 2017-10-12 13:38

This is a somewhat belated entry about Abrelatam and Condatos, the regional open data conference of Latin America. It comes more than a month after the conference took place in San José, Costa Rica, but the questions raised there are still relevant and super important for advancing open data in Latin America and working towards truly open states.

After five years, the discussions have shifted. We don’t only talk about open data and how to make it happen but about, for example: privacy and how we can make sure our governments will guarantee the right to privacy in open data work; data standards and how to make them interoperable; and business models and how to be a sustainable organization that can last beyond project funding.

These discussions are crucial in the current context in Latin America, with cases of corruption like Lava Jato or #GobiernoEspía in Mexico. They are particularly important if we want open data to not only be a bunch of good intentions, but rather infrastructure that is there for and because of citizens.

Still, we have a big challenge ahead. As was often commented in various sessions, we need to systematize all the knowledge we have gathered in these 5 years. We also need to be able to share it with the newcomers and open it up to organizations that aren’t traditionally in the open data sphere. This will help us avoid the echo chamber, keep the work focused on important matters and make open data a valuable asset in the construction of open states.

At the same time, we need to learn from our mistakes, understand what has worked and what hasn’t, and continue improving the work. We should not only go to conferences and speak about the amazing work we do, but also talk about where we make mistakes and help others avoid them.

This won’t be an easy task, but I think we have the right ingredients to make it happen: we have a mature community that is eager to share its experiences and learnings. We’re ready to take on the next five years and construct an open region.


Ed Summers: Twitter and Tear Gas

planet code4lib - Thu, 2017-10-12 04:00

Twitter and Tear Gas: The Power and Fragility of Networked Protest by Zeynep Tufekci. New Haven, Yale University Press, 2017, xxxi+ 326 pp. (hardcover), 978-0-300-21512-0.

Originally published in Internet Histories.

In August 2014 I took part in a panel conversation at the Society of American Archivists meeting in Washington DC that focused on the imperative for archivists to interrogate the role of power, ethics and regulation in information systems. The conference itself stands out in my memory, because it began on 10 August, the day after Michael Brown was killed by police officer Darren Wilson in Ferguson, Missouri. I distinctly remember the hand that shot up immediately during the Q&A period to ask what, if anything, we will remember of the voices from Ferguson in social media, that raised awareness of the injustice that had occurred there. Before anyone had much of a chance to respond another voice asked whether anyone had seen the blog post about how radically different Twitter and Facebook’s presentations of Ferguson were. The topics of power, ethics and regulation were not simply academic subjects for discussion, they were demands for understanding from information professionals actively engaged in the work of historical production.

The blog post mentioned that day was What Happens to #Ferguson Affects Ferguson by Zeynep Tufekci. It was published on the social media platform Medium, as the sustained protests in Ferguson began to propel the hashtag #BlackLivesMatter into many Twitter timelines and newsrooms around the world. Like so much of her work, Tufekci’s post asked her readers to think critically about the algorithmic shift we have been witnessing in our media and culture since the advent of the web and the rise of social media. Tufekci is a consummate public scholar, who uses online spaces like her blog, Twitter, Medium, TED talks and editorials in The Atlantic and The New York Times to advance a crucial discussion of how the affordances of information technology are both shaped, and being shaped, by social movements and political infrastructures. It is a pivotal time for scholars to step out from the pages of academic journals and into the World Wide Web spaces that are grappling with the impact of post-truth politics and fake news. It is into this time and place that Tufekci’s first book Twitter and Tear Gas: The Power and Fragility of Networked Protest is launched.

Tufekci’s book is divided into three main parts: 1) Making a Movement, 2) A Protester’s Tools, and 3) After the Protests. While these suggest a chronological ordering to the discussion, the different parts, and the ten chapters found within them, reflect a shifting attention to the specifics of networked social movements. Part 1 provides the reader with a general discussion of how the networked public sphere operates with respect to social movements. This is followed by Part 2 which takes a deeper dive into the specific affordances and sociotechnical logics of social media platforms such as Twitter, Facebook and Google. And finally, Part 3 integrates the previous discussion by articulating a theory for how social movements function in, and through, online spaces.

Throughout the book Tufekci focuses on the specifics of protest and counter-protest, while stressing that social media spaces are not disembodied and virtual phenomena, but are actual, contingent configurations of people, technology and power. In teasing out the dimensions of the networked public sphere Tufekci reminds me of Kelty’s concept of a recursive public in which the public’s participants are actively engaged in the maintenance, modification and design of the technical and material means that sustain the public itself (Kelty, 2008). In many ways Twitter and Tear Gas hacks the sociopolitical systems that it describes. It’s no mistake that the book is licensed with the Creative Commons and is freely downloadable from its companion website.

2013 Taksim Square by Fleshstorm

Prior to her academic career, Tufekci worked as a software developer at IBM where she first encountered the information infrastructure we call the Internet. You can sense this training and engagement with practice in her work which always seems to be pushing up against, but not overstepping, the art of what is possible. As a sociologist she brings the eye of an ethnographer to her study of protest. Tufekci is not a distant observer, but a participant, with actual stakes in the political outcomes she describes. She is pictured on the dust jacket wearing a helmet to protect her from tear gas canisters that were shot into the crowd that she was a part of in the Gezi Park protests. The book sits on the solid foundations of her own experience as well as the experiences of activists and organisers that she interviews. But Twitter and Tear Gas also significantly engages with sociological theories to bring clarity and understanding to how social media and social movements are co-produced.

In the pages of Twitter and Tear Gas you will find scenes of protests from around the world that are put into conversation with each other. From Zapatista solidarity networks, to the disruption of the World Trade Organization in Seattle, the global anti-war protests after 9/11, to Occupy in Zuccotti Park, the Egyptian Revolution in Tahrir Square, the Gezi Park protests in Istanbul, the Indignados in the Plaza del Sol, the Umbrella Movement in Hong Kong, and BlackLivesMatter in Ferguson, Missouri. Twitter and Tear Gas functions as a historical document that describes how individuals engaged in political action were empowered and inextricably bound up with social media platforms. While it provides a useful map of the terrain for those of us in the present, I suspect that Twitter and Tear Gas will also be an essential text for future historians who are trying to reconstruct how these historical movements were entangled with information technology, when the applications, data sources and infrastructures no longer exist, or have been transformed by neglect, mergers and acquisitions, or the demands for something new, into something completely different. Even if we have web archives that preserve some “sliver of a sliver” of the past web (Harris, 2002) we still need to remember the stories of interaction and the technosocial contingencies that these dynamic platforms provided. Despite all the advances we have seen in information technology a book is still a useful way to do this.

One of the primary theoretical contributions of this text is the concept of capacity, or a social movement’s ability to marshal and effect narrative, electoral and disruptive change. Tufekci outlines how the affordances of social media platforms make possible the leaderless adhocracy of just-in-time protests, and how these compare to our historical understanding of the African-American Civil Rights Movement in the United States. The use of hashtags in Twitter allows protesters to communicate at a great speed and distance to mobilise direct action in near real time. Planning the Civil Rights Movement took over a decade, and involved the development of complex communication networks to support long term strategic planning.

Being able to skip this capacity building phase allows networked social movements to respond more quickly and in a more decentralised fashion. This gives movements currency and can make them difficult for those in power to control. But doing so can often land these agile protests in what Tufekci calls a tactical freeze, where, after an initial successful march, the movement is unable to make further collective decisions that will advance their cause. In some ways this argument recalls Gladwell (2010) who uses the notion of weak ties (Granovetter, 1973) to contend that social media driven protests are fundamentally unable to produce significant activism on par with what was achieved during the civil rights era. But Tufekci is making a more nuanced point that draws upon a separate literature to make her argument, notably the capacity in development work of Sen (1993) and the capability theory of justice of Nussbaum (2003). Tufekci’s application of these concepts to social movements, and her categories of capacity, combined with the mechanics of signaling by which capacities are observed and responded to, operate as a framework for understanding why we cannot use simple outcome measures, such as numbers of people who attend a protest, when trying to understand the impact of networked social movements. For those who are listening she is also pointing to an area that is much in need of innovation, experimentation and study: tools and practices for collective decision-making that will allow people to thaw these tactical freezes.

Another significant theoretical thread running through Twitter and Tear Gas concerns the important role that attention plays in understanding the dynamics of networked protest. Social media is well understood as an attention economy, where users work for likes and retweets to get eyes on their content. Social and financial rewards can follow from this attention. While networked protest operates in a similar fashion, the dynamics of attention can often work against its participants, as they criticise each other in order to distinguish themselves. Tufekci also relates how the affordances of advertising platforms such as Google and Facebook made it profitable for Macedonian teenagers to craft and spread fake news stories that would draw attention away from traditional news sources, generate clicks and ad revenue, and as a side effect, profoundly disrupt political discourse.

Perhaps most significant is the new role that attention denial plays in online spaces, as a tactic employed by the state and other actors seeking to shape public opinion. Tufekci calls this the Reverse-Streisand Effect, since it uses the Internet to funnel attention to topics other than a particular topic at hand. She highlights the work of King, Pan, & Roberts (2013) that analysed how China’s so-called 50 Cent Army of web commenters shapes public opinion not simply by censoring material on the web, but by drawing attention elsewhere at key moments. Social media platforms are geo-political arenas, where bot armies are deployed to drown out hashtags and thwart communication, or attack individuals with threats and volumes of traffic that severely disrupt the target’s use of the platform. When people’s eyes can be guided, or pushed away, censorship is no longer needed. It is truly chilling to consider the lengths that those in power, or seeking power, might stoop to, in order to provide these events when needed.

Friday, Day 14 of Occupy Wall Street by David Shankbone

As significant as these theoretical contributions are, it is Tufekci’s personal voice combined with flashes of insight that I remember most from Twitter and Tear Gas. A detail such as the use of Occupy’s human microphone to amplify speakers’ voices and shape speech is a poignant metaphor for Twitter’s capacity for amplifying short message bursts that cascade through the network as retweets. In another Tufekci considers why so many protest camps set up libraries, and connects the work being done in social media to the work of pamphleteers throughout history. She describes the surreal experience of watching pastel hearts float across Periscope videos from Turkish Parliamentarians that were preparing to be bombed during an attempted coup. Near the end of the book she draws an analogy between the rise of fake news fueled by social media, and the ways in which Gutenberg’s printing press escalated the Catholic Church’s distribution of indulgences, opening itself up to the criticism found in Luther’s 95 theses–which were also printed. The work these stories do is generative of a humanistic outlook that does not deny or celebrate big data:

There is no perfect, ideal platform for social movements. There is no neutrality or impartiality–ethics, norms, identities, and compromise permeate all discussions and choices of design, affordances, policies, and algorithms on online platforms. And yet given the role of these platforms in governance and expression, acknowledging and exploring these ramifications and dimensions seems more important than ever. (p. 185)

In fact, saying that Tufekci’s book has an explicit narrative arc is an oversimplification. It functions more like a fabric that weaves theory, observation and story, as topics are introduced and returned to later; there is no set chronology or teleology that is being pursued. On finishing the book it is clear how the concepts of attention and capacity are present throughout. But Tufekci makes these theoretical connections not with over-abstraction and heavy citation, but by presenting scenes of protest where these concepts are being enacted. While there are certainly references to the supporting literature, the text is not densely packed with them. Finer theoretical manoeuvres are reserved for the endnotes, and do not overwhelm the reader as they move through the text. If you are teaching a course that surveys communications, sociology or the politics of social media platforms and information infrastructures more generally, Twitter and Tear Gas belongs on your syllabus. Your students will thank you: they can download the book for free, they can follow Tufekci on Twitter and Facebook, and her book speaks directly to the socio-political moment we are all living in.


Gladwell, M. (2010). Small change: Why the revolution will not be tweeted. The New Yorker. Retrieved from

Granovetter, M. S. (1973). The strength of weak ties. American Journal of Sociology, 78(6), 1360–1380.

Harris, V. (2002). The archival sliver: Power, memory, and archives in South Africa. Archival Science, 2(1-2), 63–86.

Kelty, C. M. (2008). Two bits: The cultural significance of free software. Duke University Press. Retrieved from

King, G., Pan, J., & Roberts, M. E. (2013). How censorship in China allows government criticism but silences collective expression. American Political Science Review, 107(2), 326–343.

Nussbaum, M. (2003). Capabilities as fundamental entitlements: Sen and social justice. Feminist Economics, 9(2-3), 33–59.

Sen, A. (1993). The quality of life. In M. Nussbaum & A. Sen (Eds.), The quality of life. Oxford: Clarendon Press.

District Dispatch: Higher education reauthorization on Congressional fall agenda

planet code4lib - Wed, 2017-10-11 20:25

On the fall agenda for Congress is the reauthorization of the Higher Education Act (HEA). HEA was originally enacted in 1965 during the Lyndon Johnson Administration and was last reauthorized in 2008. While HEA has received significant interest in the past few sessions of Congress, its passage has stalled under partisan rancor.

U.S. President Lyndon B. Johnson signs the Higher Education Act Nov. 8, 1965. (AP Photo)

Even though HEA has been operating without being reauthorized, its reauthorization is important because it sends a message to the Appropriators that the program is a priority for Congress. Under the powers of Congress, a program or agency is “authorized” to operate and exist. Most program authorizations are designed to expire every few years and must be reauthorized. The process to reauthorize a program allows Congress an opportunity to examine if the program needs to be changed, modernized or possibly sunset (e.g., the Board of Tea Appeals, Board of Economic Warfare, etc.).

Authorizations can also include long-term spending plans for a program. However, an authorization is not necessary for a program to receive federal funding, nor does it guarantee a level of funding. Appropriations bills determine independently the level of funding a program is to receive in a given year. An unauthorized program may continue to receive funding, but some “fiscal hawks” in Congress are increasingly threatening to sunset unauthorized programs.

Many of the provisions, or Titles, of HEA will have minimal direct impact on libraries, but a few key areas warrant attention from the library community. How Congress views these programs may impact libraries at colleges and universities, particularly in two areas:

Title IV of HEA authorizes a broad array of aid programs to assist students in financing a higher education. The programs authorized under this title are the primary sources of federal aid to support higher education. Students who work in libraries or are enrolled in degree programs such as Master of Library and Information Science (MLIS) programs may qualify for loan relief. The two most impactful HEA authorized programs for libraries are the Public Service Loan Forgiveness (allows debt forgiveness for borrowers working in public service careers for 10 years, including libraries), and the Perkins Loan Cancellation (allows loan forgiveness for qualifying borrowers who work in school or tribal libraries or other educational settings).

Titles III and V authorize grants to higher education institutions that serve a high number of low-income and minority students (including Historically Black Colleges, Tribal Colleges and Universities, and Hispanic-Serving Institutions). These schools can utilize federal grants to meet a range of needs, including the purchase of library books and materials and the construction, renovation and improvement of classrooms and libraries. ALA opposes any efforts to reduce support for underserved students.

Programs to support higher education libraries and MLIS students are valuable assets at colleges and universities and support the mission of ALA. HEA reauthorization is likely to consume much of the higher education agenda for months, and the ALA Washington Office will keep you informed as these issues develop.


The post Higher education reauthorization on Congressional fall agenda appeared first on District Dispatch.

Meredith Farkas: The ballad of the sad instruction librarian

planet code4lib - Wed, 2017-10-11 19:22

It’s been a busy Fall term so far and I haven’t had much time to spend on Twitter, but I usually check it first thing every morning. When I did one day last week, this thread caught my eye:

Sitting in a FB thread of professors complaining (nicely) about unqualified librarians doing shitty instruction sessions. They’re not wrong.

— Archivist Wasp (@nnschiller) October 5, 2017

Of course I want to “NOT ALL LIBRARIANS!” but defensiveness never won me an argument. Plus, they’re not wrong.

— Archivist Wasp (@nnschiller) October 5, 2017

So I apologize & say most of us have developed reflective & skilled pedagogical practices, but I’m full of shit, aren’t I?

— Archivist Wasp (@nnschiller) October 5, 2017

This just made me feel really sad, particularly that Nick felt he had to apologize for us and that he has so little confidence in librarians’ ability to teach. I’m certainly not going to deny that there are bad library instructors, but I think it’s a lot more complicated than that. I also find it funny that when people talk about the quality of library instruction, they always assume that they are the good ones (not just you Nick, but all of us). How do we really know that? I have never assumed that I’m great at teaching. I know that I’ve improved, based on assessments I’ve done and how students and faculty respond to my teaching, but I want to keep improving. If you think you’re a great instructor already and don’t need to improve, maybe you’re the problem.

And even great instruction librarians have awful sessions. This happens to disciplinary faculty too; I’ve had conversations with friends who teach outside of libraries and we all have horror stories. It sucks that one bad session can sour a disciplinary faculty member on library instruction entirely, especially when they should recognize that they’ve probably had bad one-off teaching experiences too. We’re all human.

But, still, I agree that there are librarians who are bad at teaching and bad at engaging students. There are also plenty of librarians who never wanted to teach in the first place. At my first job, everyone taught, from the Head of Tech Services to the ILL Librarian to the Systems Librarian. There are a lot of libraries like that. But I also think that library schools don’t make it clear that teaching is part of being a librarian in so many library jobs, especially in academia. And in this job market, people will take jobs that include things they really don’t want to do so that they’re employed. If a librarian doesn’t want to teach, how motivated will they be on their own to try to improve?

Looking at my alma mater, Florida State, here is the recommended coursework if you’re going to focus on academic librarianship:

LIS 5603 Introduction to Information Services
LIS 5511 Management of Information Collections
LIS 5442 Information Leadership
LIS 5602 Marketing of Library and Information Services
LIS 5603 Introduction to Information Services
LIS 5485 Introduction to Information Technologies
LIS 5105 Communities of Practice (variable content areas)
LIS 5203 Assessing Information Needs
LIS 5241 International & Comparative Information Service
LIS 5260 Information Science
LIS 5263 Theory of Information Retrieval
LIS 5270 Evaluating Networked Information Services & Systems
LIS 5271 Research in Information Studies
LIS 5442 Information Leadership
LIS 5417 Introduction to Legal Resources
LIS 5474 Business Information Needs and Sources
LIS 5590 Museum Informatics
LIS 5602 Marketing Library and Information Services
LIS 5661 Government Information
LIS 5736 Indexing and Abstracting
LIS 5787 Fundamentals of Metadata Theory and Practice

Their only instruction-focused class, LIS 5524 Instructional Role of the Informational Specialist, is recommended for people focusing on “Reference” and “Youth Services,” not academic librarianship (yet somehow we all need Museum Informatics??? WTF FSU?). When I was at FSU, the class was 100% geared toward students planning to become Library Media Specialists so I didn’t take it. Based on the courses offered at FSU, I had NO IDEA instruction was a huge part of library work. I’m tremendously disappointed to see that they STILL aren’t doing more to promote courses on instruction and instructional design. Talk about out of touch!

So I think about the people who want to improve, but don’t have the time within their work day to develop professionally and improve or just don’t know where to start. Not everyone has the luxury of time and money to support their professional development. If you’re doing so much teaching and working at the reference desk that you don’t even have time to reflect on how classes went, how are you going to get better? And the fault for that does not lie with the librarian, but with the institution that doesn’t support their improvement.

In response to what Nick Schiller tweeted, my collaborator and friend Lisa Hinchliffe wrote:

I wish librs would stop hiring ppl to teach it aren't good at teaching. Hurts lib reputation+traumas librns. Align hiring w duties!

— Lisa Hinchliffe (@lisalibrarian) October 5, 2017

Here was my response to that:

There’s not being good and there’s being green (which oft. looks the same). Most libraries throw ppl into the deep end w/o training/support.

— Meredith Farkas (@librarianmer) October 5, 2017

I did a little informal survey on Twitter to get a sense of how many librarians were prepared in any way — either by their LIS programs or by their workplaces — to teach information literacy.

Did you receive training on effective #infolit instruction before you were expected to start teaching?

— Meredith Farkas (@librarianmer) October 5, 2017

That is tremendously depressing. I have worked at three different academic libraries and at none of them did I receive any training in how to teach. I could understand that more in my second and third jobs, because they had some expectation that I knew how to teach (though I really had to relearn how to teach when I came to PCC and started working with community college classes). In my first job, I was thrown into the deep end with zero support and am sure I did a crappy job early on, especially since all of my classes in college had been lecture-focused so I didn’t have any models for active learning-style classes. Over time, I read books and articles and tried to learn as much as I could about how to be an effective instructor. I started to incorporate more activities into my teaching so students were actually doing (and sometimes teaching!) instead of me being a sage on the stage all the time. But I got no help from my colleagues because, though they had more experience, they had not been taught how to teach effectively either. We were all just fumbling around.

When you think about how few workplaces actually prepare librarians to teach, it makes me wonder whether those places think teaching is something anyone can do or if they just don’t value instruction. Reference and instruction positions are usually seen as entry-level, which is ironic, since they have the most contact with our students and faculty. They, to a large extent, determine how the library is viewed by faculty, which is hugely important! Administrators who don’t have a formal training program for library instruction, do you think this work is something anyone off the street can do? Or do you not value it? If neither of those things is true, then why are you not setting your library staff/faculty up to succeed?

I think having a formal training program around information literacy instruction for all librarians who teach when they are new to an institution is critically important and I urge every library director, dean, and AUL to consider why they don’t have something to on-board librarians for teaching at their institution. If it’s for all new hires who teach, it then becomes something that is supportive and not accusatory. Even experienced librarians have something to learn and instruction looks different at different institutions with different goals and different student populations.

As a former head of instruction at two institutions, I know how much ego and defensiveness can crop up around efforts to support instruction librarians with their teaching. It can feel like a threat to some, like an accusation that they are doing a shitty job. I’ve written about my own efforts as an instruction coordinator to support instructional improvement and there are a lot of ways to approach this. But, really, we’re no different than disciplinary faculty who are often equally uncomfortable being observed and/or critiqued. The difference is that we always have an instructor watching us when we teach, while they don’t.

Sometimes it’s less about the quality of the instructor and more about the approach the instructor takes. Every librarian has their own style; their own way of teaching certain concepts that may be more or less embraced by the people for whom we are teaching. My colleagues are all great teachers, but we all have widely varying approaches. At my library, each of us has instructors who request us specifically. I’ve been warned about instructors I loved teaching for and I’ve had classes go badly with instructors other colleagues love working with. I know instructors are sometimes frustrated that they will get a different approach to the outcomes depending on who is assigned to the class, but, again, we’re no different than they are. They don’t all teach the same either.

Any librarian who teaches information literacy also knows that there are things completely out of their control that impact how the class goes. Sometimes it’s the culture of the class. I remember once working with three sections of a criminal justice class in a row with the same instructor. Two of them went really well and one just was flat. The students were really low-energy and didn’t want to participate in activities. The instructor told me that class was like that with her as well. For some classes, I get to sit in on part of their class before I provide instruction, which gives me an interesting little window into the culture of the class. I’ve seen instructors who keep their students in rapt attention and instructors whose students look comatose. Not surprisingly, the students in classes where they are more engaged by their instructor are also usually more engaged when I’m teaching them. The instructor can really set the tone. Of course, we can still screw it up and I have, but how the instructor manages their own classroom makes a big difference. I’ve sometimes felt like a rock star leaving a classroom when, really, so much of the credit for how it went should have gone to their regular instructor.

We often walk into classes with incorrect or incomplete information about what the students are working on, where they are, and what they struggle with because their instructor doesn’t communicate the information to us. We walk into classes where students know nothing about the assignment even though the instructor told us they’d have selected topics by then. Sometimes they are doing their assignment later in the term, but the instructor requested that day because they couldn’t be in class. Sometimes in response to asking about their goals, instructors just tell us to do the “usual library spiel” or the “usual library tour” as if such a thing existed. Some instructors make it really difficult for us to create a tailored lesson plan for their class and sometimes we end up having to throw that plan out the window because we were misinformed. I recently wrote numerous times to an instructor who’d requested instruction to find out what they were working on and never received a response to any of my inquiries!

Our time and expertise is sometimes disrespected. We get instructors who request instruction because they’re going to be out that day. Often, we don’t find that out until the last minute when the instructor doesn’t show up. We have instructors who sit in the back and check email instead of participating in the class or even just being present. We get instructors who have noisy one-on-one conferences with students in the classroom while we are teaching (which isn’t at all distracting, right?). We get instructors who don’t give us enough time to cover the outcomes they want us to focus on or that give us time but then take the first 20 minutes to cover class stuff without warning us in advance. I’ve had instructors show up on the wrong day with their classes. One got angry at me about it, even after I showed them the confirmation email I’d sent. I ended up teaching the class (totally unprepared) and she never requested instruction again. Back in my first job, I was just starting a jigsaw activity with an English 101 class when the instructor said “I don’t want them doing that. That doesn’t sound useful.” Can you imagine how demoralizing it is to be contradicted in that way when you are teaching?

These stories do not represent the majority of the classes I work with. I also work with plenty of fantastic instructors who I love to work with year after year. I have instructors who really collaborate with me around determining the shape of library support for their classes. I have instructors who are totally game to try new things, even if they don’t always go well (and they’re kind enough to sympathize when things don’t go well). I have instructors who adequately prepare the students for what I’m going to cover in the information literacy session — they set the table for me. I have instructors who are active participants in my information literacy sessions. I have instructors who show they appreciate what I do. And those classes, not surprisingly, tend to go better than the ones where the instructors are checked out, disrespectful, or dismiss the work we put into tailoring a session to their students.

As I’ve mentioned before, I teach a class for San Jose State University’s iSchool on library embedment, which is mainly focused on embedding information literacy instruction and support into the curriculum and beyond the curriculum. A lot of what we read early on is focused on librarian-faculty collaboration and students always notice that there is often a lot of misunderstanding and also ego on both sides (am I the only person who now hears Donald Trump every time I hear or write “on both sides?” — barf). Librarians often assume that instructors are not teaching information literacy themselves and if they are, they’re certainly not doing it well. Instructors often underestimate librarians, seeing them more as service providers who demo databases rather than as instructors, experts, or collaborators. You can see it in the language both groups use. I witness that disconnect every time I see someone requesting a “library tour” when they don’t mean “walking students around the library” but actually mean information literacy instruction.

I think both librarians and disciplinary faculty should try to better understand and respect what the other does. I think we should cut each other some slack when it doesn’t always go well and also be willing to offer feedback, which I know is difficult (both for librarians and disciplinary faculty), but makes things better. I have saved many students from bad and unclear assignments by gently questioning the instructor about them and I would love to know what I can do to make their class’ experience better.

The problem of people who are poor instructors or lack motivation can only be solved by the Library. More resources should go toward training and on-boarding librarians to teach. The library should be set up to support the continuing development of veteran instruction librarians too; we all have more to learn. This won’t fix everything — there will always be people who just don’t care and aren’t motivated to improve — but everyone I have worked with earnestly wants to teach well and really cares about students. If we all had better support, the vast majority of us would be better instructors; and that includes our disciplinary colleagues.

Image credit: UIUC admissions blog

LITA: Jobs in Information Technology: October 11, 2017

planet code4lib - Wed, 2017-10-11 18:53

New vacancy listings are posted weekly on Wednesday at approximately 12 noon Central Time. They appear under New This Week and under the appropriate regional listing. Postings remain on the LITA Job Site for a minimum of four weeks.

New This Week

University of North Florida – Thomas G. Carpenter Library, Systems Librarian, Jacksonville, FL

Dayton Metro Library, Technology Development Manager, Dayton, OH

Metropolitan State University, Electronic Resources and Discovery Librarian, St Paul, MN

Visit the LITA Job Site for more available jobs and for information on submitting a job posting.

Terry Reese: MarcEdit Delete Field by Position documentation

planet code4lib - Wed, 2017-10-11 16:13

I was working through the code and found an option that, quite honestly, I didn’t even know existed.  Since I’m creating new documentation for MarcEdit 7, I wanted to pin this somewhere so I wouldn’t forget again.

A number of times on the list, folks will ask if they can delete, say, the second field in a field group.  Apparently, you can.  In the MarcEditor, select the Add/Delete field tool.  To delete by position, you would enter {#} to denote the position to delete in the find.

Obviously, this is pretty obscure – so in MarcEdit 7, this function is exposed as an option.

To delete multiple field positions, you just add a comma.  So, say I wanted to delete fields 2-5, I would enter: 2,3,4,5 into the Field Data box and check this option.  One enhancement that I would anticipate a request for is the ability to delete just the last option – this is actually harder than you’d think – in part, because it means I can’t process data as it comes in, but have to buffer it first, then process, and there are some reasons why this complicates things due to the structure of the function.  So for now, it’s by direct position.  I’ll look at what it might take to allow for more abstract options (like last).
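For anyone who wants to see the idea outside of the tool, here is a rough Python sketch of deleting by position within a field group. It is not MarcEdit's code: the record model (a list of tag/data pairs) and the function names parse_positions, delete_by_position and delete_last are all made up for illustration. It also shows why "last" is the awkward case: direct positions only need a running count while the data streams through, but the last position isn't known until the whole record has been read.

```python
from typing import Iterable, Iterator, List, Set, Tuple

# Hypothetical simplified record model: (tag, field data) pairs in record order.
Field = Tuple[str, str]


def parse_positions(spec: str) -> Set[int]:
    """Turn a comma-separated spec such as "2,3,4,5" into 1-based positions."""
    return {int(part) for part in spec.split(",") if part.strip()}


def delete_by_position(
    fields: Iterable[Field], tag: str, positions: Set[int]
) -> Iterator[Field]:
    """Yield fields in order, skipping the listed occurrences of `tag`.

    This works in a single streaming pass: only a running count is needed.
    """
    seen = 0
    for field_tag, data in fields:
        if field_tag == tag:
            seen += 1
            if seen in positions:
                continue  # drop this occurrence
        yield field_tag, data


def delete_last(fields: Iterable[Field], tag: str) -> List[Field]:
    """Delete the last occurrence of `tag`.

    The record has to be buffered first, because the last position is
    unknown until every field has been read.
    """
    buffered = list(fields)
    for index in range(len(buffered) - 1, -1, -1):
        if buffered[index][0] == tag:
            del buffered[index]
            break
    return buffered


record = [
    ("245", "Title"),
    ("650", "A"),
    ("650", "B"),
    ("650", "C"),
    ("650", "D"),
    ("650", "E"),
    ("650", "F"),
]

# Delete the 2nd through 5th 650 fields, as in the 2,3,4,5 example above.
kept = list(delete_by_position(record, "650", parse_positions("2,3,4,5")))
without_last = delete_last(record, "650")
```

Here kept holds the 245 field plus the first and sixth 650 fields, and without_last holds everything except the final 650.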


Dan Cohen: Roy’s World

planet code4lib - Wed, 2017-10-11 15:03

In one of his characteristically humorous and self-effacing autobiographical stories, Roy Rosenzweig recounted the uneasy feeling he had when he was working on an interactive CD-ROM about American history in the 1990s. The medium was brand new, and to many in academia, superficial and cartoonish compared to a serious scholarly monograph.

Roy worried about how his colleagues and others in the profession would view the shiny disc on the social history of the U.S., and his role in creating it. After a hard day at work on this earliest of digital histories, he went to the gym, and above his treadmill was a television tuned to Entertainment Tonight. Mary Hart was interviewing Fabio, fresh off the great success of his “I Can’t Believe It’s Not Butter” ad campaign. “What’s next for Fabio?” Hart asked him. He replied: “Well, Mary, I’m working on an interactive CD-ROM.”

Ten years ago today Roy Rosenzweig passed away. Somehow it has now been longer since he died than the period of time I was fortunate enough to know him. It feels like the opposite, given the way the mind sustains so powerfully the memory of those who have had a big impact on you.

The field that Roy founded, digital history, has also aged. So many more historians now use digital media and technology to advance their discipline that it no longer seems new or odd like an interactive CD-ROM.

But what hasn’t changed is Roy’s more profound vision for digital history. If anything, more than ever we live in Roy’s imagined world. Roy’s passion for open access to historical documents has come to fruition in countless online archives and the Digital Public Library of America. His drive to democratize not only access to history but also the historical record itself – especially its inclusion of marginalized voices – can be seen in the recent emphasis on community archive-building. His belief that history should be a broad-based shared enterprise, rather than the province of the ivory tower, can be found in crowdsourcing efforts and tools that allow for widespread community curation, digital preservation, and self-documentation.

It still hurts that Roy is no longer with us. Thankfully his mission and ideas and sensibilities are as vibrant as ever.

Open Knowledge Foundation: Remix public domain artworks: join the GIF IT UP 2017 competition

planet code4lib - Wed, 2017-10-11 13:00

This blogpost has been adapted from the press release by Europeana.

Open Knowledge International has for many years advocated for the importance of open cultural data, which enables citizens from across the world to enjoy this material, understand their cultural heritage and re-use this material to produce new works of art. Some examples of this work include the OpenGLAM initiative that promotes free and open access to digital cultural heritage held by Galleries, Libraries, Archives and Museums, and The Public Domain Review, an online journal and not-for-profit project dedicated to promoting and celebrating the public domain in all its abundance and variety. Another great initiative encouraging the reuse of openly licensed cultural data is the GIF IT UP competition, which is open for contributions this month.

From 1-31 October, all GIF-makers, history nuts, cultural heritage enthusiasts and lovers of the internet are invited to take part in the fourth annual GIF IT UP competition. The competition encourages people to create new, fun and unique artworks from digitized cultural heritage material. A GIF is an image, video or text that has been digitally manipulated to become animated. Throughout the month, they can create and submit their own, using copyright-free digital video, images or text from Europeana Collections, Digital Public Library of America (DPLA), Trove, or DigitalNZ.

All entries help promote public domain and openly licensed collections to a wider audience, and increase the reuse of material from these four international digital libraries, including Europeana Collections. The contest is supported by GIPHY, the world’s largest library of animated GIFs.

The 2017 competition will have a special focus on first-time GIF-makers and introduce them to openly licensed content. A GIF-making workshop, providing tools and tutorials to help visitors create their first artworks, will be held on 14-15 October in cooperation with THE ARTS+, the creative business festival at the Frankfurt Book Fair.

One of this year’s contributions, via GIPHY

The jury, made up of representatives from GIPHY, DailyArt and Public Domain Review, will be awarding one grand prize winner with an Electric Object – a digital photo frame especially for GIFs – sponsored by GIPHY. Prizes of online gift cards will go to three runners-up as well as winners in a first-time GIF-makers category. Special prizes will be allocated in thematic categories: transport, holidays, animals and Christmas cards.

People are also invited to take part in the People’s Choice Award and vote on the competition website for their favourite GIF, which will receive a Giphoscope. All eligible entries will be showcased on the GIPHY channel dedicated to the competition, and promoted on social media with the hashtag #GIFITUP2017.

GIF IT UP started in 2014 as an initiative by the Digital Public Library of America (DPLA) and DigitalNZ, and has since become a cultural highlight. 368 entries from 33 countries are featured on the GIF IT UP Tumblr. In 2016, the grand prize was awarded to ‘The State Caterpillar’, created by Kristen Carter and Jeff Gill from Los Angeles, California, using source material from the National Library of France via Europeana. Nono Burling, who was awarded the 2016 People’s Choice Award for ‘Butterflies’, said: “I adore animated GIFs made from historic materials and have for many years. The first contest in 2014 inspired me to make them myself, and every year I try to improve my skills.”

Results of the 2017 competition will be announced in November on the GIF IT UP website and related social media.

DuraSpace News: Reminder: Board-Member-at-Large Nominations Accepted through Friday!

planet code4lib - Wed, 2017-10-11 00:00
DuraSpace invites the Community to nominate the next Board-Member-at-Large!  Nominations accepted through Friday, October 13th.

Karen Coyle: Google Books and Mein Kampf

planet code4lib - Tue, 2017-10-10 17:43
I hadn't looked at Google Books in a while, or at least not carefully, so I was surprised to find that Google had added blurbs to most of the books. Even more surprising (although perhaps I should say "troubling") is that no source is given for the book blurbs. Some at least come from publisher sites, which means that they are promotional in nature. For example, here's a mildly promotional text about a literary work, from a literary publisher:

This gives a synopsis of the book, starting with:

"Throughout a single day in 1892, John Shawnessy recalls the great moments of his life..." 
It ends by letting the reader know that this was a bestseller when published in 1948, and calls it a "powerful novel."

The blurb on a 1909 version of Darwin's The Origin of Species is mysterious because the book isn't a recent publication with an online site providing the text. I do not know where this description comes from, but because the entire thrust of this blurb is about the controversy of evolution versus the Bible (even though Darwin did not press this point himself), I'm guessing that the blurb post-dates this particular publication.

"First published in 1859, this landmark book on evolutionary biology was not the first to deal with the subject, but it went on to become a sensation -- and a controversial one for many religious people who could not reconcile Darwin's science with their faith."
That's a reasonable view to take of Darwin's "landmark" book but it isn't what I would consider to be faithful to the full import of this tome.

The blurb on Hitler's Mein Kampf is particularly troubling. If you look at different versions of the book you get both pro- and anti-Nazi sentiments, neither of which really belongs on a site that claims to be a catalog of books. Also note that because each book entry has only one blurb, the tone changes considerably depending on which publication you happen to pick from the list.

First on the list:
"Settling Accounts became Mein Kampf, an unparalleled example of muddled economics and history, appalling bigotry, and an intense self-glorification of Adolf Hitler as the true founder and builder of the National Socialist movement. It was written in hate and it contained a blueprint for violent bloodshed."
Second on the list:
"This book has set a path toward a much higher understanding of the self and of our magnificent destiny as living beings part of this Race on our planet. It shows us that we must not look at nature in terms of good or bad, but in an unfiltered manner. It describes what we must do if we want to survive as a people and as a Race."
That's horrifying. Note that both books are self-published, and the blurbs are the ones that I find on those books in Amazon, perhaps indicating that Google is sucking up books from the Amazon site. There is, or at least there once was, a difference between Amazon and Google Books. Google, after all, scanned books in libraries and presented itself as a search engine for published texts; Amazon will sell you Trump's tweets on toilet paper. The only text on the Google Books page still claims that Google Books is about search: "Search the world's most comprehensive index of full-text books." Libraries partnered with Google with lofty promises of gains in scholarship:
"Our participation in the Google Books Library Project will add significantly to the extensive digital resources the Libraries already deliver. It will enable the Libraries to make available more significant portions of its extraordinary archival and special collections to scholars and researchers worldwide in ways that will ultimately change the nature of scholarship." Jim Neal, Columbia University
I don't know how these folks now feel about having their texts intermingled with publications they would never buy and described by texts that may come from shady and unreliable sources.

Even leaving aside the grossest aspects of the blurbs and Google's hypocrisy about its commercialization of its books project, adding blurbs to the book entries with no attribution and clearly not vetting the sources is extremely irresponsible. It's also very Google to create sloppy algorithms that illustrate their basic ignorance of the content they are working with -- in this case, the world's books.

David Rosenthal: IPRES 2017

planet code4lib - Tue, 2017-10-10 15:00
Kyoto Railway Museum

Much as I love Kyoto, now that I'm retired with daily grandparent duties (and no-one to subsidize my travel) I couldn't attend iPRES 2017.

I have now managed to scan both the papers, and the very useful "collaborative notes" compiled by Micky Lindlar, Joshua Ng, William Kilbride, Euan Cochrane, Jaye Weatherburn and Rachel Tropea (thanks!). Below the fold I have some notes on the papers that caught my eye.

I have appreciated the Dutch approach to addressing problems ever since the late 70s, when I worked with Paul ten Hagen and Rens Kessner on the Graphical Kernel System standard. This approach featured in two of the papers:
  • How the Dutch prepared for certification by Barbara Sierman and Kees Waterman describes how six large cultural heritage organizations worked together to ease each of their paths up the hierarchy of repository certification from DSA to Nestor. The group added two preparatory stages before DSA (Initial Self-Assessment, and Exploratory Phase), comprising activities that I definitely recommend as a starting point. They also translated the DSA and Nestor standards into Dutch, enhanced some of the available tools, and conducted surveys and awareness-raising.
  • A Dutch approach in constructing a network of nationwide facilities for digital preservation together by Joost van der Nat and Marcel Ras reported that:
    In November 2016, the NCDD research on the construction of a cross-domain network of facilities for long-term access to digital Cultural Heritage in the Netherlands was rewarded the Digital Preservation Award 2016 in the category Research and Innovation. According to the judges the research report presents an outstanding model to help memory institutes to share facilities and create a distributed, nationwide infrastructure network for Digital Preservation.
    The NCDD didn't go all-out for either centralization or distribution, but set out to find the optimum balance for infrastructure spanning diverse institutions:
    Under the motto “Joining forces for our digital memory”, a research project was started in 2014 ... This project had the purpose to find out what level of differentiation between the domains offers the best balance for efficiency. Without collaboration, inefficiencies loom, while individual institutes continue to expand their digital archives and may be reinventing the same wheel over and over again. The project’s objective was and is to avoid duplication of work, and to avoid wasting time, money, and energy. Economies of scale make it easier for the many smaller Dutch institutes to profit from available facilities, services, and expertise as well. Policy makers can now ponder the question “The same for less money, or more for the same money?”.
I've blogged before about the important work of the Software Heritage Foundation. Software Heritage: Why and How to Preserve Software Source Code by Roberto Di Cosmo and Stefano Zacchiroli provides a comprehensive overview of their efforts. I'm happy to see them making two justifications for preserving open-source software that I've been harping on for years:
Source code is clearly starting to be recognized as a first class citizen in the area of cultural heritage, as it is a noble form of human production that needs to be preserved, studied, curated, and shared. Source code preservation is also an essential component of a strategy to defend against digital dark age scenarii in which one might lose track of how to make sense of digital data created by software currently in production.
But they also provide other important justifications, such as these two:
First, Software Heritage intrinsic identifiers can precisely pinpoint specific software versions, independently of the original vendor or intermediate distributor. This de facto provides the equivalent of “part numbers” for FOSS components that can be referenced in quality processes and verified for correctness ....

Second, Software Heritage will provide an open provenance knowledge base, keeping track of which software component – at various granularities: from project releases down to individual source files – has been found where on the Internet and when. Such a base can be referenced and augmented with other software-related facts, such as license information, and used by software build tools and processes to cope with current development challenges.

Considering Software Heritage's relatively short history, the coverage statistics in Section 9 of the paper are very impressive, illustrating the archive-friendly nature of open-source code repositories.
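
To make the idea of an "intrinsic identifier" concrete: it is an identifier computed from the content itself, so anyone holding the same bytes derives the same identifier without consulting a vendor or registry. The excerpt quoted here does not spell out the hashing scheme, so the sketch below uses Git's well-known blob hashing as an assumed stand-in; the function name `git_blob_id` is hypothetical.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Compute a Git-style blob identifier: SHA-1 over a 'blob <length>\\0' header plus the content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# Two parties hashing the same bytes independently arrive at the same identifier,
# which is what lets it act like a "part number" for a source file.
source_file = b"print('hello')\n"
print(git_blob_id(source_file))
print(git_blob_id(source_file) == git_blob_id(bytes(source_file)))
```

Because the identifier depends only on the bytes, a copy of the file found on any mirror or distributor can be checked against it.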

Emulation featured in two papers:
  • Adding Emulation Functionality to Existing Digital Preservation Infrastructure by Euan Cochrane, Jonathan Tilbury and Oleg Stobbe is a short paper describing how Yale University Library (YUL) interfaced bwFLA, Freiburg's emulation-as-a-service infrastructure, to their Preservica digital preservation system. The goal is to implement their policy:
    YUL will ensure access to hardware and software dependencies of digital objects and emulation or virtualization tools by [...] Preserving, or providing access to preserved software (applications and operating systems), and pre-configured software environments, for use in interacting with digital content that depends on them.
    Yale is doing important work making Freiburg's emulation infrastructure easy-to-use in libraries.
  • Trustworthy and Portable Emulation Platform for Digital Preservation by Zahra Tarkhani, Geoffrey Brown and Steven Myers:
    provides a technological solution to a fundamental problem faced by libraries and archives with respect to digital preservation – how to allow patrons remote access to digital materials while limiting the risk of unauthorized copying. The solution we present allows patrons to execute trusted software on an untrusted platform; the example we explore is a game emulator which provides a convenient prototype to consider many fundamental issues.
    Their solution depends on Intel's SGX instruction set extensions, meaning it will work only on Skylake and future processors. I would expect it to be obsoleted by the processor-independent, if perhaps slightly less bullet-proof, W3C Encrypted Media Extensions (EME) available in all major browsers. Of course, if SGX is available, implementations of EME could use it to render the user even more helpless.
Always on the Move: Transient Software and Data Migrations by David Wilcox is a short paper describing the import/export utility developed to ease the data migration between versions 3 and 4 of Fedora. This has similarities with the IMLS-funded WASAPI web archive interoperability work with which the LOCKSS Program is involved.

Although they caught my eye, I have omitted here two papers on identifiers. I plan a future post about identifiers into which I expect they will fit:

HangingTogether: Beyond the Authorized Access Point?

planet code4lib - Tue, 2017-10-10 13:00

That was the topic discussed recently by OCLC Research Library Partners metadata managers, initiated by Stephen Hearn of the University of Minnesota. Can we still insist on using the authorized access point as the primary identifier? It is scary to imagine that we have to build authorized access points for titles in a “work” focused environment. Other communities are putting together separate pieces of information to help select the correct name or title. Dates are not always the most informative choice for the user. Libraries receive an influx of records where we have no control over the authorized form of the name anyway. Other environments make use of identifiers. Wikipedia, IMDb and MusicBrainz differentiate entities and then prompt you for more information. We have an opportunity to work with a larger community.

Do we still need an authorized access point as a “primary identifier”?  Let’s distinguish identifiers from their associated labels. Access points rely on unique text strings to distinguish them from other access points. A unique identifier could be associated with an aggregate of other attributes that would enable users to distinguish one entity from another. Ideally, we could take advantage of the identifiers and attributes from other, non-library sources. Wikidata, for example, aggregates a variety of identifiers as well as labels in different languages, as pictured above.

The library community has started to move towards the use of identifiers by adding identifiers in the $0 of heading fields. OCLC algorithmically added FAST (Faceted Application of Subject Terminology) headings, with their identifiers, to all WorldCat records that had an LC subject heading. Other communities have started adding VIAF (Virtual International Authority File) cluster identifiers to their entity descriptions. Providing contextual information is more important than providing one unique label. Labels could differ depending on communities – such as various spellings of names and terms, different languages and writing systems, and different disciplines – without requiring that one form be preferred over another.

Catalogers have long added value by supplying information about relationships. RDA attributes have spurred libraries to move toward contextualization. We now have ways of making that information more understandable to users. As those capabilities continue to evolve, the need for unique strings could diminish.

NACO is a valuable program but not everyone is able to contribute. Even in institutions that are NACO contributors, only staff who have received the requisite training can create LC/NACO authority records. The volume of names without authority control is increasing, especially as academic institutions commit to providing a comprehensive overview of their researchers’ output, often stored in separate local databases or scholar profile systems. NACO-level work isn’t sustainable beyond MARC records.

Could Wikidata be an alternative to contribute information about entities? Adding names or information about entities into Wikidata could be a very low-barrier way for non-NACO staff to supplement NACO contributions. For example, the University of Miami’s RAMP (Remixing Archival Metadata Project) generates Wikipedia pages out of archival descriptions (discussed in the 2014 OCLC Research Webinar, Beyond EAD). Encouraging contributions to Wikidata could also tap the expertise within our communities.

Envisioning the future:  The authorized access point was designed for a closed, MARC-based environment. Its time has come and gone.  We already see examples of “identifier hubs” that aggregate multiple identifiers referring to the same entity. More work is needed to establish “same as” relationships among different identifiers and to add identifiers to our large legacy databases that can point to one or more of these “identifier hubs.” We need technology that can integrate the metadata from all the sources that generated the identifiers, filtered according to the context. We could start by focusing on identifiers rather than labels as a means to concatenate result sets. Greater functionality for identifiers would drive the value proposition for datasets that merge them and provide correlations among the various sources.
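
One way to picture an “identifier hub” is as a structure that records “same as” assertions between identifiers from different sources and can return every identifier known for an entity. The Python sketch below is only an illustration under stated assumptions: the class name `IdentifierHub`, its methods, and the placeholder identifiers are all hypothetical, and it uses a standard union-find structure rather than any particular system's design.

```python
from typing import Dict, Set


class IdentifierHub:
    """Groups identifiers that have been asserted to refer to the same entity."""

    def __init__(self) -> None:
        self._parent: Dict[str, str] = {}  # each identifier points toward its cluster's root

    def _find(self, ident: str) -> str:
        """Return the root identifier of the cluster containing `ident`."""
        self._parent.setdefault(ident, ident)
        root = ident
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every identifier on the path straight at the root.
        while self._parent[ident] != root:
            next_ident = self._parent[ident]
            self._parent[ident] = root
            ident = next_ident
        return root

    def same_as(self, first: str, second: str) -> None:
        """Record that two identifiers refer to the same entity."""
        root_first, root_second = self._find(first), self._find(second)
        if root_first != root_second:
            self._parent[root_second] = root_first

    def cluster(self, ident: str) -> Set[str]:
        """Return every identifier known to refer to the same entity as `ident`."""
        root = self._find(ident)
        return {other for other in list(self._parent) if self._find(other) == root}


# Placeholder identifiers, not real records.
hub = IdentifierHub()
hub.same_as("viaf:EXAMPLE-1", "wikidata:EXAMPLE-Q1")
hub.same_as("wikidata:EXAMPLE-Q1", "lcnaf:EXAMPLE-n1")
print(sorted(hub.cluster("lcnaf:EXAMPLE-n1")))
```

Note that the hub stores no preferred label at all: labels in any language or script could be attached to the cluster afterwards, which is the shift from unique strings to identifiers discussed above.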

