Planet Code4Lib

Karen Coyle: 2 Mysteries Solved!

Mon, 2016-09-26 16:57
One of the disadvantages of a long tradition is that the reasons behind certain practices can be lost over time. This is definitely the case with many practices in libraries, and in particular in practices affecting the library catalog. In U.S. libraries we tend to date our cataloging practices back to Panizzi, in the 1830's, but I suspect that he was already building on practices that preceded him.

A particular problem with this loss of history is that without the information about why a certain practice was chosen it becomes difficult to know if or when you can change the practice. This is compounded in libraries by the existence of entries in our catalogs that were created long before us and by colleagues whom we can no longer consult.

I was recently reading through volume one of the American Library Journal from the year 1876-1877. The American Library Association had been founded in 1876 and had its first meeting in Philadelphia in September, 1876. U.S. librarianship finally had a focal point for professional development. From the initial conference there were a number of ALA committees working on problems of interest to the library community. A Committee on Cooperative Cataloguing, led by Melvil Dewey, (who had not yet been able to remove the "u" from "cataloguing") was proposing that cataloging of books be done once, centrally, and shared, at a modest cost, with other libraries that purchased the same book. This was realized in 1902 when the Library of Congress began selling printed card sets. We still have cooperative cataloging, 140 years later, and it has had a profound effect on the ability of American libraries to reduce the cost of catalog creation.

Other practices were set in motion in 1876-1877, and two of these can be found in that inaugural volume. They are also practices whose rationales have not been obvious to me, so I was very glad to solve these mysteries.
Title case

Some time ago I asked on Autocat, out of curiosity, why libraries use sentence case for titles. No one who replied had more than a speculative answer. In 1877, however, Charles Ammi Cutter reports on The Use of Capitals in library cataloging and defines a set of rules that can be followed. His main impetus is "readability;" that "a profusion of capitals confuses rather than assists the eye...." (He also mentions that this is not a problem with the Bodleian library catalog, as that is written in Latin.)

Cutter would have preferred that capitals be confined to proper names, eschewing their use for titles of honor (Rev., Mrs., Earl) and initialisms (A.D). However, he said that these uses were so common that he didn't expect to see them changed, and so he conceded them.

All in all, I think you will find his rules quite compelling. I haven't looked at how they compare to any such rules in RDA. So much still to do!
Centimeters

I have often pointed out, although it would be obvious to anyone who has the time to question the practice, that books are measured in centimeters in Anglo-American catalogs, although there are few cultures as insistent on measuring in inches and feet as those. It is particularly unhelpful that books in libraries are cataloged with a height measurement in centimeters while the shelves that they are destined for are measured in inches. It is true that the measurement forms part of the description of the book, but at least one use of that is to determine on which shelves those books can be placed. (Note that in some storage facilities, book shelves are more variable in height than in general library collections and the size determination allows for more compact storage.) If I were to shout out to you "37 centimeters" you would probably be hard-pressed to reply quickly with the same measurement in inches. So why do we use centimeters?

The newly-formed American Library Association had a Committee on Sizes. This committee had been given the task of developing a set of standard size designations for books. The "size question" had to do with the then current practice to list sizes as folio, quarto, etc. Apparently the rise of modern paper making and printing meant that those were no longer the actual sizes of books. In the article by Charles Evans (pp. 56-61) he argued that actual measurements of the books, in inches, should replace the previous list of standard sizes. However, later, the use of inches was questioned. At the ALA meeting, W.F. Poole (of Poole's indexes) made the following statement (p. 109):
"The expression of measure in inches, and vulgar fractions of an inch, has many disadvantages, while the metric decimal system is simple, and doubtless will soon come into general use."The committee agreed with this approach, and concluded:
"The committee have also reconsidered the expediency of adopting the centimeter as a unit, in accordance with the vote at Philadelphia, querying whether it were really best to substitute this for the familiar inch. They find on investigation that even the opponents of the metric system acknowledge that it is soon to come into general use in this country; that it is already adopted by nearly every other country of importance except England; that it is in itself a unit better adapted to our wants than the inch, which is too large for the measurement of books." (p. 180)
The members of the committee were James L. Whitney, Charles A. Cutter, and Melvil Dewey, the latter having formed the American Metric Bureau in July of 1876, both a kind of lobbying organization and a sales point for metric measures. My guess is that the "investigation" was a chat amongst themselves, and that Dewey was unmovable when it came to using metric measures, although he appears not to have been alone in that. I do love the fact that the inch is "too large," and that its fractions (1/16, etc.) are "vulgar."

Dewey and cohort obviously weren't around when compact discs came on the scene, because those are measured in inches ("1 sound disc : digital ; 4 3/4 in"). However, maps get the metric treatment: "1 map : col. ; 67 x 53 cm folded to 23 x 10 cm". Somewhere there is a record of these decisions, and I hope to come across them.

It would have been ideal if the U.S. had gone metric when Dewey encouraged that move. I suspect that our residual umbilical cord linking us to England is what scuppered that. Yet it is a wonder that we still use those too large, vulgar measurements. Dewey would be very disappointed to learn this.

So there it is, two of the great mysteries solved in the record of the very first year of the American library profession. Here are the readings; I created separate PDFs for the two most relevant sections:

American Library Journal, volume 1, 1876-1877 (from the Internet Archive)
Cutter, Charles A. The use of capitals. American Library Journal, v.1, n. 4-5, 1877. pp. 162-166
The Committee on Sizes of Books, American Library Journal, v.1, n. 4-5, 1877, pages 178-181

Also note that beginning on page 92 there is a near verbatim account of every meeting at the first American Library Association conference in Philadelphia, September, 1876. So verbatim that it includes the mention of who went out for a smoke and missed a key vote. And the advertisements! Give it a look.

Library of Congress: The Signal: 2016-2017 Class of National Digital Stewardship Residents Selected

Mon, 2016-09-26 15:15

Five new National Digital Stewardship Residents will be joining the Library in late September 2016. Selected from a competitive pool and representing five different library schools, the residents bring a range of skills and experience in working with digital and archival collections. The NDSR program offers recent graduates an opportunity to gain professional experience under the guidance of a mentor. They will acquire hands-on knowledge and skills in the collection, selection, management, long-term preservation and accessibility of digital assets.

Throughout the year, residents and their mentors will attend digital stewardship workshops at the Library of Congress and at one of their five host institutions in the greater Washington, D.C. region.

  • Meredith Broadway of Dallas, Texas, has a Master of Science in Data Curation and Certificate in Special Collections from the University of Illinois at Urbana-Champaign, and a bachelor’s degree from Rhodes College. Meredith will be a resident at the World Bank Group focusing on an assessment framework and appraisal guidelines for identification of data for permanent preservation; a set of analytic process document guidelines to enable documentation of processes used in the collection and analysis of data; and guidelines for linking datasets to related documents and analytical reports.
  • Joseph Carrano of Middlebury, Connecticut, has dual Master’s degrees from the University of Maryland in History and Library Science, and a bachelor’s degree from the University of Connecticut. Joe will be part of a team at the Georgetown University Library developing open-source project guidelines, documentation and workflows for different preservation platforms. He will be involved in all stages of the process of inventory, selection, curation, preparation and ingest of files of all formats.
  • Elizabeth England of Washington, DC, has a master’s degree in Library and Information Science from the University of Pittsburgh, and a bachelor’s degree from Drew University. Elizabeth will be a resident in the University Archives at the Johns Hopkins University Sheridan Libraries, applying core archival functions such as appraisal, accessioning, processing, preservation, description, and provision of access to a 50 terabyte collection of born-digital photographs, using scripting languages and tools that are vital to manipulating large data sets.
  • Amy Gay of Binghamton, New York, has a master’s degree in Library and Information Science from Syracuse University, and a bachelor’s degree from the State University of New York, Oneonta. Amy will be a resident at the Food & Drug Administration, Office of Science & Engineering Laboratories, Center for Devices & Radiological Health, working on the “CDRH Science Data Catalogue Pilot,” a joint project to develop a searchable digital catalog for data sets, software code, computational models, images and more as part of federally mandated public access efforts. She will lead catalog content acquisition and curation, as well as refining the metadata schema and taxonomy.
  • Megan Potterbusch of Nashville, Tennessee, has a master’s degree from the School of Library and Information Science at Simmons College, and a bachelor’s degree from Earlham College. Megan will serve as a resident at the Association of Research Libraries working in partnership with the George Washington University Libraries and the Center for Open Science to prototype the process of linking the output from a university research unit to a library digital repository through the Open Science Framework — an open source tool that integrates and supports research workflow.

David Rosenthal: The Things Are Winning

Mon, 2016-09-26 15:00
More than three years ago my friend Jim Gettys, who worked on One Laptop Per Child, and on the OpenWrt router software, started warning that the Internet of Things was a looming security disaster. Bruce Schneier's January 2014 article The Internet of Things Is Wildly Insecure — And Often Unpatchable and Dan Geer's April 2014 Heartbleed as Metaphor were inspired by Jim's warnings. That June Jim gave a talk at Harvard's Berkman Center entitled (In)Security in Home Embedded Devices. That September Vint Cerf published Bufferbloat and Other Internet Challenges, and Jim blogged about it. That Christmas a botnet running on home routers took down the gaming networks of Microsoft's Xbox and Sony's Playstation. That wasn't enough to motivate action to fix the problem.

As I write this on 9/24/16 the preceding link doesn't work, although the Wayback Machine has copies. To find out why the link isn't working and what it has to do with the IoT, follow me below the fold.

The insecurity of the IoT has been a theme of many of my blog posts since 2014, pointing out that it was handing the bad guys, even relatively unskilled bad guys, a weapon that could render the Internet unusable. But nothing has been done to fix the problems and defuse the weapon. Dan Goodin's Why the silencing of KrebsOnSecurity opens a troubling chapter for the ‘Net tells us that we are running out of time:
KrebsOnSecurity, arguably the world's most intrepid source of security news, has been silenced, presumably by a handful of individuals who didn't like a recent series of exposés reporter Brian Krebs wrote. ... On Thursday morning, ... he reported that a sustained attack was bombarding his site with as much as 620 gigabits per second of junk data. ... At 4 pm, Akamai gave Krebs two hours' notice that it would no longer assume the considerable cost of defending KrebsOnSecurity. Krebs opted to shut down the site to prevent collateral damage hitting his service provider and its customers. ... In 2013, attacks against anti-spam organization Spamhaus generated headlines because the 300Gb torrents were coming uncomfortably close to Internet-threatening size. The assault against KrebsOnSecurity represents a much greater threat for at least two reasons. First, it's twice the size. Second and more significant, ... the attacks against KrebsOnSecurity harness so-called Internet-of-things devices—think home routers, webcams, digital video recorders, and other everyday appliances that have Internet capabilities built into them.

Go read the whole article.

This is asymmetric warfare. It doesn't take much skill or many resources to build a DDOS weapon of this kind. But defending against it is beyond the reach of most websites:
Krebs said he has explored the possibility of retaining a DDoS mitigation service, but he found that the cost—somewhere between $100,000 and $200,000 per year for the type of always-on protection he needs against high-bandwidth attacks—is more than he can afford.

So, unless you're seriously wealthy, any time you publish something on the net the bad guys don't like, they can blow your web presence away. Krebs' conclusion is sad:
"Free speech in the age of the Internet is not really free," he said. "We're long overdue to treat this threat with a lot more urgency. Unfortunately, I just don't see that happening right now."And don't think that knocking out important individual Web sites like KrebsOnSecurity is the limit of the bad guys capabilities. Everyone seems to believe that the current probing of the root servers' defenses is the work of China but, as the Moon Worm showed, careful preparation isn't necessarily a sign of a state actor. There are many bad guys out there who could take the Internet down; the only reason they don't is not to kill the goose that lays the golden eggs.

Pastor Martin Niemöller had it right:
First they came for the security gurus, and I did not speak out -
Because I was not a security guru.

This is probably yet another reason why we need to evolve to a Decentralized Internet (not just a Decentralized Web), perhaps Named Data Networking (NDN). Although, as I wrote, I'm not aware of a major "black hat" analysis of these decentralized proposals, the argument is very plausible.

Why can a large number of small, compromised devices with limited bandwidth upstream bring down a large, powerful Web site, even one defended by an expensive DDOS mitigation service? Two reasons:
  • In today's centralized Internet, the target Web site will be at one, or a small number of IP addresses. The network focuses the traffic from all the compromised devices on to those addresses, consuming massive resources at the target.
  • In today's centralized Web, the target Web site will be one tenant sharing the resources of a data center, so the focused traffic inflicts collateral damage on the other tenants. It was the cost in resources and the risk to other customers that caused Akamai to kick out KrebsOnSecurity.
In NDN, a request for a resource only travels as far as one of the nearest copies. And in the process it creates additional copies along the path, so that a subsequent request will travel less far. Thus, instead of focusing traffic, large numbers of requests defocus the traffic. They spread the responsibility for satisfying the request out across the infrastructure instead of concentrating it. By moving the load caused by bad behavior closer to the bad actors, it creates incentives for the local infrastructure to detect and prevent the bad behavior.
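
To make the defocusing effect concrete, here is a toy sketch. It is illustrative only, not NDN and not code from the original post: just a chain of caching routers in front of a single origin, with made-up names and numbers, showing why repeated requests stop concentrating load on one address.

// Toy model (not real NDN): a request walks a chain of routers toward the
// origin, is answered by the first cached copy it finds, and leaves a copy
// at every router it passed. All names and numbers here are made up.
function requestThroughPath(routers, origin, name) {
  var hops = 0;
  for (var i = 0; i < routers.length; i++) {
    hops += 1;
    if (routers[i].cache[name]) {
      return { servedBy: 'router ' + i, hops: hops };  // nearest copy answers
    }
  }
  hops += 1;
  origin.load += 1;  // only a cache miss along the whole path reaches the origin
  routers.forEach(function (r) { r.cache[name] = true; });
  return { servedBy: 'origin', hops: hops };
}

var routers = [{cache: {}}, {cache: {}}, {cache: {}}, {cache: {}}, {cache: {}}];
var origin = { load: 0 };
for (var n = 0; n < 1000; n++) {
  requestThroughPath(routers, origin, '/some/popular/post');
}
console.log(origin.load);  // 1 -- repeated requests are absorbed by the path, not the origin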

Denial-of-service attacks are possible in NDN. They take the form of flooding requests for resources that are known not to exist; flooding requests for resources that do exist, such as posts that you don't like, won't work. But both local and cooperative detection and mitigation techniques seem likely to be effective, see for example:
The fundamental problems, as in so many areas, are that the thinking is short-term and the incentives are misaligned. Iain Thomson at The Register reports on a parallel example:
A study by the RAND Corporation, published in the Journal of Cybersecurity, looked at the frequency and cost of IT security failures in US businesses and found that the cost of a break-in is much lower than thought – typically around $200,000 per case. With top-shelf security systems costing a lot more than that, not beefing up security looks in some ways like a smart business decision.

Romanosky analyzed 12,000 incident reports and found that typically they only account for 0.4 per cent of a company's annual revenues. That compares to billing fraud, which averages at 5 per cent, or retail shrinkage (ie, shoplifting and insider theft), which accounts for 1.3 per cent of revenues.

Of course, if because of the insecurity of IoT devices the Internet becomes unusable, or even merely uninteresting once the bad guys have driven anything interesting away, everyone, from the ISPs to the DDOS mitigation services to the IoT device vendors, will be out of business. But right now the money is rolling in and it doesn't cost anything to just kick off the targets of the bad guys' wrath. Actually fixing things is someone else's problem.

Update 9/25/16: Cory Doctorow writes:
Meanwhile, Krebs was eventually bailed out by Google's Project Shield, one of Jigsaw's anti-"surveillance, extremist indoctrination, and censorship" tools. That right there is another sign of the times: the attacks launched by state-level actors and those who can muster comparable firepower are no match for Google -- so far.

He quotes a post by Krebs called The Democratization of Censorship:
But what we’re allowing by our inaction is for individual actors to build the instrumentality of tyranny. And to be clear, these weapons can be wielded by anyone — with any motivation — who’s willing to expend a modicum of time and effort to learn the most basic principles of its operation.

Krebs' post is long but important - go read it now, before it goes away again. If it does, the Wayback Machine has it.

District Dispatch: ALA asks presidential candidates about broadband plans

Mon, 2016-09-26 13:35

Tonight, candidates for president Hillary Clinton and Donald Trump will face one another in the first presidential debate of this election in Hempstead, New York, at Hofstra University, moderated by NBC’s Nightly News anchor Lester Holt. The theme of tonight’s discussion will be “the direction of America, achieving prosperity, and securing America.”

ALA and other groups have addressed an open letter to debate moderators, calling on them to ask candidates about broadband access in their infrastructure plans.

Both candidates have expressed that updating our country’s infrastructure is critical to economic development and America’s global competitiveness. We believe our digital infrastructure—broadband to homes, schools, libraries, and other community anchor institutions and businesses—should be part of that conversation. That’s why today we have joined a number of groups on an open letter to the 2016 presidential debate moderators, calling on them to ask candidates about how they’ll address broadband in their infrastructure plans. The letter outlines our shared position that many Americans lack access to digital infrastructure and calls on the debate moderators to ask the following question of candidates:

“Home broadband internet access has become an essential tool for education, employment, civic engagement, and even healthcare. Yet 34 million people still lack access to affordable high­speed internet. What will you do as president to help expand access to affordable high­speed internet for everyone in America?”

The debate will run from 9:00 to 10:30 p.m. (Eastern time).

The post ALA asks presidential candidates about broadband plans appeared first on District Dispatch.

FOSS4Lib Recent Releases: VuFind - 3.1

Mon, 2016-09-26 12:56

Last updated September 26, 2016. Created by Demian Katz on September 26, 2016.

Package: VuFind
Release Date: Monday, September 26, 2016

Access Conference: Dine-Arounds

Mon, 2016-09-26 12:18

Thursday night, hang out with friends and try some of the best dining (and brewing!) Fredericton has to offer with Dine-Arounds. Sign-up sheets will be available at the registration desk on Wednesday and Thursday (Oct. 5 and 6), but you can get a sneak peek at the dining selection now. In addition to our dine-around reservations, you’ll find recommendations and links to menus for a great selection of restaurants, pubs, and bistros to sample from during your visit to Fredericton!

Karen G. Schneider: A scholar’s pool of tears, Part 1

Sun, 2016-09-25 14:58

This is Part 1 of the origin story of the following scholarly article. In this blog post I review how this article was produced and accepted for publication, and why I chose a non-OA journal.

Schneider, K.G. (in press). To Be Real: Antecedents and Consequences of Sexual Identity Disclosure by Academic Library Directors, The Journal of Academic Librarianship, Available online 13 August 2016, ISSN 0099-1333.

Chapter 1: Somehow, I write this thing

To be Real is a heavily-remastered version of the qualifying paper I wrote for the doctoral program I’m in. This article was a hurdle I had to pass on the way to becoming a doctoral candidate in a program, now ended, for organizational leadership in libraries (officially called Managerial Leadership in the Information Professions). This program has produced PhDs at the leadership level now working in executive roles in dozens of library organizations, and I look forward to donning the tam that will invest me in their ranks.

To be Real was just one hurdle before I could begin working on my dissertation (now in progress). Some of you are now itching to tell me that “the best dissertation is a done dissertation,” or that “the most important page is the one with the signatures on it.” O.k., consider it said. Also insert any joke you like about the quality of dissertations; I’ve heard it. In the end, I still need to produce a redoubtable piece of original scholarship that meets the expectations of my program and my committee.  Now let’s move on.

There were other milestones in the program. I needed to stump through two years of classes, including 6 residential intensives in Boston or other East Coast locations; a heavy reading schedule; coursework, also known as endless hours flailing at a keyboard; a $500 moving violation incurred when I was headed to SFO to fly to Boston for my second semester and wearily zombied through a right turn without stopping; about 30 red Sharpie Ultra Fine Point markers (aka RPOD, or Red Pens of Death); and my “comps,” which were two four-hour closed-book exams requiring copious quantities of memorization, a feat at any age, no comment on what that meant for me.

What has kept me going is a mixture of pride, stubbornness, encouragement from others, good executive skills, and a keen interest in the topic. I have also benefited from the advantage of what is known in life course theory as the principle of time and place. (Note how I can no longer just say “I had lucky timing.” Hopefully, with a good intervention team, I can be deprogrammed post-dissertation.)

To be real, known as the “680” (for its course number), was not the first or the second, but my third attempt at producing scholarly research on the path to my doctorate. The first two efforts were technically solid, featuring all the structural elements of a good research paper. But the more I learned, the more I felt they were topically dubious, and I issued cease-and-desists after they made it through the IRB process.

Yes, I picked the topics, then watched myself outgrow them, which was a good process in itself. It was hard to wave goodbye to the earlier projects, but the value of earning an MFA in writing is that I don’t feel bad about discarding weak work. “Reduce, reuse, recycle” is my battle cry.

Once my committee accepted To be real, I began developing my doctoral topic, which builds on the work in this dissertation but goes into bold new areas–or so I comfort myself when I am spending lovely early-autumn weekend days analyzing 900 minutes of interviews and drafting chapters. I defended my topic to my committee, then squired my dissertation proposal through institutional review, and kaboom! I was finally ABD.

At several key points in my proposal, I cite To be Real, which was gathering metaphorical dust in a metaphorical drawer in my real-world office. Rather than have my dissertation lean at key points on an unpublished paper, my ever-patient dissertation advisor suggested that I actually try publishing To be Real. Frankly, as I trudged through milestones toward the doctorate while balancing huge day jobs and Life Issues, I had entirely forgotten this was something I should do.

Chapter 2: In which I seek publication

Publish my research–what a positively brill idea! I asked someone whose insights I deeply respect where I should send it, and was given a list of six LIS journals  to consider for the first round. Yes, that’s how I made the first cut, which is similar to how I have determined where to send literary essays: by referrals from people I trust.

From that list of peer-reviewed LIS journals, the key factors I considered were:

  1. Prestige of the publication
  2. How much work I had to do to have my paper considered for publication
  3. How likely it was my article would be published before I finished my dissertation
  4. Open access was a plus, but not a requirement.

You might be surprised to learn how much #2 and #3 drove my decision-making. At least for the first round of submissions, I rejected journals that require authors to reformat citations from APA to another citation schema simply to submit a paper for consideration. No other principle was at stake than “I do not have the time for this.” Nevertheless, learning that some journals do indeed require this investment of personal effort on a highly speculative venture made me greatly sympathetic to the thousands of tenure-track librarians jumping through hoops of fire to try to get at least an in-press citation in time to show scholarly production in their annual review.

Also, time was of the essence, since I wanted the article to at least be accepted before my dissertation was finished, and I’m a writing banshee these days, trying to get ‘er done. We all at least nominally subscribe to the myth of scrupulously avoiding simultaneous submissions to multiple journals. Indeed, I was faithful to this practice simply because I didn’t have the bandwidth to submit to more than one journal at a time. But that ruled out journals that might take a couple of years to reject my article, let alone accept it.

I was open to paying subvention fees (the cost to make an article Gold OA), noting that they ranged from $1100 to $2500 for the journals I was considering–something that would be prohibitive on a junior faculty’s salary. In the same vein, I would have paid an author’s fee to publish in an OA journal that used that funding model. But not everyone has that kind of scratch.

In any event, the paper went out to the first journal on the list, and very quickly I heard back from the editor with feedback from two reviewers. The paper was accepted, provided I made some changes. I hadn’t planned on being accepted by the first journal I submitted to, but to paraphrase Yogi Berra, I saw a fork in the road, and I took it.

Chapter 3: In which I encounter the peer review process

Yet another advantage of having gone through an MFA program is understanding that Anne Lamott’s writing about “shitty first drafts” is an artful understatement; for most of my writing, I can only tell if a piece is a keeper by the fifth or so draft, if that.

I had faith in my research, and my paper had all the right components, well-executed, but I questioned my writing. It felt turgid, dense, and remote–characteristics belying its subject matter or the very interesting interviews that were its primary data. I know good writers feel that way pretty much all the time, but I had a persistent sense of unease about my paper, without quite being able to determine what to do about it. It did not help that when I showed it to peers their response was… silence. Above all, I wanted my research not simply to be published, but to be read.

I have written in the past how much I love a good editor. It’s like working with a great hair stylist. You are you, and yet, so much better. With that in mind, we’ll scoot quickly past the feedback from Reviewer 1, a living parody of the peer review process.

You know those jokes about reviewers who blithely object to the research direction on which the paper is based? Yes, Reviewer 1 was that kind of reviewer.  “The authors really only present the viewpoints of those who are ‘out.'” I don’t even know how to respond to that, other than to say that’s my area of research. Reviewer 1 also ruminated aloud–painfully, considering this person lives and breathes among us in higher education–that he or she did not understand the term “antecedent.” (The “antecedent and consequences” framework is classic and well-understood in qualitative research; and in any event, the word “antecedent” is hardly obscure.) And so on.

If Reviewer 2 had been like Reviewer 1, I would have pushed on to another journal. There is a difference between knowing that my work needs improvement and radically redesigning a valid and important research project from the ground up based on reviewers' whims, nor was there a middle ground where I could have simultaneously satisfied Reviewer 1 and Reviewer 2. As much as I wanted to publish To be Real in a timely manner, my career wasn't hanging in the balance if I didn't.

But Reviewer 2 not only respected my research direction, but also provided some of the best writing feedback I have received since, indeed, the MFA program–advice that I fully believe not only improved this paper tenfold, but is helping my dissertation. In close to 1,000 words, Reviewer 2 commented on the value and quality of my research, but gently advised me: to use pseudonyms or labels for the research participants; to extend quotations more fully; and to do a better job of summing up paragraphs and linking the findings to the literature review (“divergence and convergence”). Reviewer 2 ever so delicately observed that the conclusion had “too much context” and that all that blurbage (my term, not the reviewer’s) blurred the main points. There was more, all of it worthwhile.

I summarized Reviewer 2’s advice, taped it to the wall over my desk, and got to work. Indeed, once I labeled participants (Leader A, Leader B, etc.) and extended their quotations, I felt vastly better about my article. Doing this moved my writing from being an over-long jumble of “data analysis” to a paper about real people and their lived experiences. Following the other recommendations from Reviewer 2–expand, chop, link, add, tighten, clarify; Reduce! Reuse! Recycle!–also improved the paper to the point where I no longer felt apologetic about inflicting it on the scholarly canon.

Several more editorial go-rounds quickly followed, largely related to citations and formatting. The editors were fast, good, and clear, and when we had moments of confusion, we quickly came to agreement. In the last go-round, with a burst of adrenaline I looked up every single citation in my article and found that five had the wrong pagination; each one of these errors, for the record, was mine alone. Correcting these errors felt like a victory lap.

I then tried to follow the guidance for green OA, and the reason this blog post doesn't link to the author's final corrected proof, and indeed the reason I broke this post in two, is that three weeks and two days after the first of three help desk inquiries with very pleasant people, I'm still not entirely sure which version of To be Real that document represents.

Part 2 of A Scholar’s Pool of Tears will have a link to the author’s final corrected proof of To be Real and will discuss the intricacies of navigating the liminal world of OA that is not born OA; the OA advocacy happening in my world; and the implications of the publishing environment scholars now work in.


Jason Ronallo: Closing in on Client-side IIIF Content Search

Sun, 2016-09-25 13:33

It sounds like client-side search inside may at some point be feasible for a IIIF-compatible viewer, so I wanted to test the idea a bit further. This time I’m not going to try to paint a bounding box over an image like in my last post, but just use client-side search results to create IIIF Content Search API JSON that could be passed to a more capable viewer.

This page is a test for that. Some of what I need in a Presentation manifest I’ve only deployed to staging. From there this example uses an issue from the Nubian Message. First, you can look at how I created the lunr index using this gist. I did not have to use the manifest to do this, but it seemed like a nice little reuse of the API since I’ve begun to include seeAlso links to hOCR for each canvas. The manifest2lunr tool isn’t very flexible right now, but it does successfully download the manifest and hOCR, parse the hOCR, and create a data file with everything we need.

In the data file are included the pre-created lunr.js index and the documents including the OCR text. What was extracted into documents and indexed is the text of each paragraph. This could be changed to segment by lines or some other segment depending on the type of content and use case. The id/ref/key for each paragraph combines the identifier for the canvas (shortened to keep index size small) and the x, y, w, h that can be used to highlight that paragraph. We can just parse the ref that is returned from lunr to get the coordinates we need. We can’t get back from lunr.js what words actually match our query so we have to fake it some. This limitation also means at this point there is no reason to go back to our original text for anything just for hit highlighting. The documents with original text are still in the original data should the client-side implementation evolve some in the future.

Also included with the data file is the URL for the original manifest the data was created from and the base URLs for creating canvas and image URLs. These base URLs could have a better, generic implementation with URL templates but it works well enough in this case because of the URL structure I’m using for canvases and images.
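
As a rough illustration of that step, here is a minimal sketch of turning lunr.js hits into Content Search-style JSON-LD. The ref shape ("canvasId:x,y,w,h"), the base canvas URL, and the function name are assumptions for illustration only; the data file produced by manifest2lunr may use a different layout.

// Hedged sketch, not the demo's actual code: build a Content Search-style
// AnnotationList from lunr.js results. The ref format and base URL below are
// assumed for illustration.
var baseCanvasUrl = '';  // placeholder

function contentSearchResponse(query, hits) {
  var resources = (hit) {
    // e.g. hit.ref === "0001:100,200,300,50" -> ".../canvas/0001#xywh=100,200,300,50"
    var parts = hit.ref.split(':');
    return {
      '@type': 'oa:Annotation',
      motivation: 'sc:painting',
      resource: { '@type': 'cnt:ContentAsText', chars: query },
      on: baseCanvasUrl + parts[0] + '#xywh=' + parts[1]
    };
  });
  return {
    '@context': '',
    '@type': 'sc:AnnotationList',
    resources: resources
  };
}

// Usage, given the lunr index built from the data file:
// contentSearchResponse('mission','mission'));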


Now we can search and see the results in the textareas below.

Raw results that lunr.js gives us are in the following textarea. The ref includes everything we need to create a canvas URI with a xywh fragment hash.

Resulting IIIF Content API JSON-LD:

Since I use the same identifier part for canvases and images in my implementation, I can even show matching images without going back to the presentation manifest. This isn’t necessary in a fuller viewer implementation since the content search JSON already links back to the canvas in the presentation manifest, and each canvas already contains information about where to find images.

I’ve not tested if this content search JSON would actually work in a viewer, but it seems close enough to begin fiddling with until it does. I think in order for this to be feasible in a IIIF-compatible viewer the following would still need to happen:

  • Some way to advertise this client-side service and data/index file via a Presentation manifest.
  • A way to turn on the search box for a viewer and listen to events from it.
  • A way to push the resulting Content Search JSON to the viewer for display.

What else would need to be done? How might we accomplish this? I think it’d be great to have something like this as part of a viable option for search inside for static sites while still using the rest of the IIIF ecosystem and powerful viewers like UniversalViewer.

Ed Summers: Nicolini (6)

Sun, 2016-09-25 04:00

This chapter focuses on Ethno-Methodology (EM) which Nicolini characterizes as practice-oriented much like the earlier praxeology of Bourdieu, but more interested in description and less in theory building and particularly the correctness of the descriptions. Garfinkel (1967) is cited as codifying EM around the idea of accountability or making activities legible to others. It’s interesting that Garfinkel originally went to school to study accounting, at least according to Wikipedia. There are several characteristics of accountability:

  • social activities have an order
  • the order is public (observable)
  • the order is mundane, banal, witnessed by anyone
  • orders are oriented to each other
  • the order makes sense to the people performing
  • experts in the order can describe it, they have language for it

This attention to rules is borrowed to some extent from Husserl and Schutz, but comes very close to Wittgenstein’s notion of rules, and rule following. His idea of reflexivity is different from Bourdieu and Giddens in that reflexivity is connected with accountability: people make their practices accountable by making them reflexive. Similarly Garfinkel uses the idea of indexicality to talk about the way meanings are embedded in actions, and much of EM’s work can be found in the study of how people work with this indexicality when it pushes up against the way things work in the world: How do people do it?

EM is also concerned with how people perform their competency in an activity, and their membership in a group of other competent people. EM inspired two lines of research: work studies and conversation analysis. It’s interesting and curious that Nicolini says that these ideas of accountability, indexicality, membership and reflexivity are used just to open the space for research, and are abandoned as concepts when it comes to doing the work of EM.

It is important for the descriptions to embody just-thisness or haecceity (a new word for me) – they are told from the perspective of the people involved in the action, using their distinctive words and motivations. To do this the researcher must immerse themself in the domain under study. They must become a legitimate participant to understand the language, rules and activities. This idea is known as unique adequacy. It can require the researcher to dedicate their life to becoming a proficient member of a community. Mehan & Wood (1975) goes so far as to claim that EM isn’t so much a method or a theory but a form of life (recalling Wittgenstein again). This strong version of EM can lead to researchers giving up their research as their membership in the community under study takes over. It feels like there must be some pretty strong parallels with this approach to Bourdieu’s habitus.

EM was apparently the subject of fierce debate in the sociology community, and EM practitioners found it difficult to get academic jobs. In the 1990s EM practices got new life in the work of Suchman, Dourish (who are on my reading list for later in the semester) and others who conducted workplace studies, examining the effects of technology in order to inform design.

EM-orientated workplace studies are not limited, in fact, to claiming in principle—as other theories do—that actors are bricoleurs, that they improvise and construct their world, that there is power and hierarchy. Rather, such studies go to great length to present living instances of bricolaging, improvisation, and power and hierarchy making and unmaking. They are not limited to claiming that organizations and society are socially constructed, that decisions and conducts are context-dependent, and that knowledge is practical and situated. Instead, they set out to provide evidentiary empirical substantiation to these claims, describing in detail how ordered and organized scenes of action are accomplished, how members build on contextual clues for accomplishing their activities, how knowing is visibly exhibited in what members actually do or refrain from doing. (p. 148)

EM feels like the most humanistic of the practice theories reviewed so far. It doesn’t attempt to make a theory, but instead embodies a sensibility, or a set of questions, and a way of approaching study, rather than a formula for conducting the research. EM is data driven and at the same time it is a literary genre.


Garfinkel, H. (1967). Studies in ethnomethodology. Prentice Hall.

Mehan, H., & Wood, H. (1975). The reality of ethnomethodology. John Wiley & Sons Inc.

Cynthia Ng: Yet Another Tips Post on Job Applications

Sat, 2016-09-24 22:28
There is so much literature out there already on how to write job applications (namely cover letters and resumes) that I wasn't sure I was going to write this post, but based on the job applications that I was looking over, I'm almost amazed at how many glaring errors people still make. Applying for jobs …

Villanova Library Technology Blog: Sravanthi Adusumilli – New Library Technology Development Graduate Assistant

Sat, 2016-09-24 17:00

Sravanthi (Sravs) Adusumilli, a graduate of Acharya Nagarjuna University, Guntur, India, joined the Library Technology Development team in August. She reports to Demian Katz, team leader. She is currently working on redesigning “Finding Augustine.” “Finding Augustine” is “[a] rich and readily accessible biographical collection concerning Augustine of Hippo and his legacy;” it is sponsored by the Augustinian Institute at Villanova University.

Adusumilli has a bachelor’s degree in computer science engineering and is now enrolled in the Master of Science in Computer Engineering program with an anticipated graduation in May 2018. She plans to work as a data scientist.

Her hometown is Machilipatnam, India, a city on the southeast coast. Adusumilli’s hobbies are cooking and gardening.


Jason Ronallo: Client-side Search Inside for Images with Bounding Boxes

Sat, 2016-09-24 15:26

It is possible to create a Level 0 IIIF Image API implementation with just static images and an info.json. And some institutions are probably pre-creating Presentation API manifests or even hand-crafting them. All that’s required then is to put those files up behind any web server with no other application code running and you can provide the user with a great viewing experience.

The one piece that currently requires a server-side application component is the IIIF Content Search API. This usually involves a search index like Solr as well as application code in front of it to convert the results to JSON-LD. I’ve implemented search inside using the content search via Ocracoke. With decent client-side search from libraries like lunr.js it ought to be possible to create a search inside experience even for a completely static site.

Here’s a simple example:

This works first of all because the page has been OCR’d with Tesseract which outputs hOCR. (I developed Ocracoke in part to help with automating an OCR workflow.) The hOCR output is basically HTML that also includes the bounding boxes of sections of the page based on the size of the digitized image. We can then use this information to draw boxes over top of the corresponding portion of the image. So how do we use search to find the section of the page to highlight?

The first step in this case, for simplicity's sake, was to use an image of known size. It is possible to do hit highlighting in a tiling pan/zoom viewer like OpenSeadragon, as evidenced by UniversalViewer. The page image at 20% of the original fits within the width of this site.

I then used some code from Ocracoke to rescale the original hOCR to create bounding box coordinates that would match on the resized page image. I parsed that resized hOCR file to find all the paragraphs and recorded their position and text in a JSON file.
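
The rescaling itself is simple arithmetic. Here is a minimal sketch of that step (not the Ocracoke code), assuming hOCR's usual "bbox x0 y0 x1 y1" title values and a 20% target size:

// Minimal sketch of the rescaling step (not the Ocracoke code itself):
// hOCR stores a box as "bbox x0 y0 x1 y1" in pixels of the full-size scan,
// so matching a 20%-size image is just multiplying every coordinate by 0.2.
function scaleBbox(bboxTitle, factor) {
  // e.g. bboxTitle = "bbox 120 340 960 410"
  var parts = bboxTitle.split(' ').slice(1).map(Number);
  var x0 = parts[0] * factor, y0 = parts[1] * factor;
  var x1 = parts[2] * factor, y1 = parts[3] * factor;
  // return it in the x, y, width, height form used for drawing
  return { x: x0, y: y0, w: x1 - x0, h: y1 - y0 };
}

scaleBbox('bbox 120 340 960 410', 0.2); // => { x: 24, y: 68, w: 168, h: 14 }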

At this point I could have created the lunr.js index file ahead of time to save the client some work. In this example the client requests the JSON file and adds each document to the index. The Fabric.js library is used to create an HTML canvas, add the page image as a background, and draw and remove rectangles for matches over top of the relevant section. Take a look at the JavaScript to see how this all works. Pretty simple to put all these pieces together to get a decent search inside experience.
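
For the drawing side, a hedged sketch of how Fabric.js can be used this way is below. The element id, image path, and the boxesByRef lookup are placeholders for illustration, not the files used in this demonstration.

// Hedged sketch of the highlight step, assuming Fabric.js and a lunr index
// whose refs map to the scaled { x, y, w, h } boxes computed above.
var canvas = new fabric.Canvas('page-canvas');
fabric.Image.fromURL('/images/page-1-20pct.jpg', function (img) {
  canvas.setBackgroundImage(img, canvas.renderAll.bind(canvas));
});

var highlights = [];
function showMatches(query, index, boxesByRef) {
  highlights.forEach(function (rect) { canvas.remove(rect); });  // clear old boxes
  highlights = (hit) {
    var box = boxesByRef[hit.ref];
    var rect = new fabric.Rect({
      left: box.x, top: box.y, width: box.w, height: box.h,
      fill: 'yellow', opacity: 0.4, selectable: false
    });
    canvas.add(rect);
    return rect;
  });
}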

If you give this a try, you'll notice that this implementation does not highlight words but sections of the page. It might be possible to make this work for individual words, but it would increase the total size of the documents as the bounding boxes for each word would need to be retained. Indexing each word separately would also disrupt the ability to do phrase searching. There's some discussion in lunr.js issues about adding the ability to get term start positions within a text that may make this possible in the future without these drawbacks. I had originally considered just trying to get the user to the correct page, but I think targeting some level of segment of the page is a reasonable compromise.

I don't use the IIIF Content Search API in this demonstration, but it ought to be enough of a proof of concept to show the way towards a viewer that can support a completely static site including search inside. Anyone have ideas or thoughts on how a static version of content search could be identified in a IIIF Presentation manifest? Without a URL service point what might this look like?

District Dispatch: Pew: A generation gap for digital readiness

Fri, 2016-09-23 22:09

“Digital Readiness Gaps,” a new Pew Research Center report, explores a spectrum of digital readiness, from digitally ready to unprepared. Researcher John Horrigan finds that more than half (52%) of U.S. adults may be considered “relatively hesitant” and the least likely to use digital tools for personal learning.

The research explores three dimensions of digital readiness: (1) the digital skills required to use the internet; (2) trust, namely, people’s ability to assess the trustworthiness of information found online and to protect their personal information; and (3) use, that is, the extent to which people use digital tools to complete online activities (e.g., personal learning or online courses).

“Digital Readiness Gaps,” a new Pew Research Center report, explores a spectrum of digital readiness.

The analysis identifies five distinct groups on the spectrum:

Relatively more prepared

  • Digitally Ready (17%): Have technology resources and are confident in their digital skills and capacity to determine the trustworthiness of online information. The Digitally Ready enjoy high-income and education levels, and are likely to be in their 30s or 40s.
  • Cautious Clickers (31%): Have strong levels of tech ownership, are confident in their digital skills, and are relatively knowledgeable about new online learning concepts. Unlike the Digitally Ready, they are less likely to use the internet for personal learning. The Cautious Clickers have above average educational and income levels, and are usually in their 30s or 40s.

Relatively hesitant

  • The Reluctant (33%): Have below average confidence in their digital skills, little concern about their ability to trust information online, and very low awareness of online learning concepts. The Reluctant are middle-aged and have relatively lower levels of income and education.
  • Traditional Learners (5%): Are active learners and have technology, but are unlikely to use the internet for learning purposes, tend to need help with using digital devices, and express above average concern about the trustworthiness of information online. This group is more likely to be middle-aged, ethnically diverse, and lower- to lower-middle income.
  • The Unprepared (14%): Have relatively lower levels of tech adoption, very low confidence in their digital skills, and a high degree of difficulty determining whether online information is trustworthy. The Unprepared are older, with relatively low income and educational levels.

By examining digital readiness, rather than the “digital divide,” Pew’s research highlights the fact that people’s lack of digital skills and trust in technology may, in turn, impact their use of digital resources. In other words, digital literacy and trust may boost meaningful internet use.

As the report observes, libraries understand that digital readiness involves digital skills combined with the digital literacy tools to enable people to assess the trustworthiness of online information. The report also notes that library users and the highly wired are more likely to use the internet for personal learning (55% and 60%, respectively, compared with 52% of all personal learners) and more likely to have taken an online course.

Horrigan notes that the research focuses on online learning, and may not project to people’s capacity (or lack of capacity) to perform health-related web searches or use mobile apps for civic activities, for instance. There also is some fluidity among the groups identified, and the findings represent a snapshot in time that may change in coming years as e-learning evolves.

Unsurprisingly, libraries have long been at the forefront of digital literacy efforts in their communities, as ALA documented in 2013. As the recent Digital Inclusion Survey indicated, all public libraries provide free public access to the internet, and most offer diverse digital content and services, as well as formal and informal technology training.

What’s more, the public trusts libraries to teach digital literacy skills. In a prior report, Pew found that 47 percent of American adults agree that libraries contribute “a lot” to providing a trusted place for people to learn about new technologies. Another Pew report revealed that 80 percent of adults believe that libraries should “definitely” offer programs to teach people how to use digital tools.

This newest report is an important addition to the body of research conducted by the Pew Research Center (including a previous public library engagement typology) and fodder for planning related to digital inclusion efforts, including work underway at the Federal Communications Commission.

Note: OITP Deputy Director Larra Clark will interview Pew researcher John Horrigan for a Public Libraries Online podcast interview, which will be posted in coming weeks.

The post Pew: A generation gap for digital readiness appeared first on District Dispatch.

Jason Ronallo: Client-side Video Search Inside

Fri, 2016-09-23 21:42

Below the video use the input to search within the captions. This is done completely client-side. Read below for how it was done.

As part of thinking more about how to develop static websites without losing functionality, I wanted to be able to search inside a video.

To create the WebVTT captions file I used random words and picked 4 randomly to place as captions every 5 seconds throughout this 12+ minute video. I used an American English word list, randomly sorted it and took the top 100 words. Many of them ended with “’s” so I just removed all those for now. You can see the full word list, look at the WebVTT file, or just play the video to see the captions.

sort -R /usr/share/dict/american-english | head -n 100 > random-words.txt

Here’s the script I used to create the WebVTT file using our random words.

#!/usr/bin/env ruby

random_words_path = File.expand_path '../random-words.txt', __FILE__
webvtt_file_path = File.expand_path '../search-webvtt.vtt', __FILE__

def timestamp(total_seconds)
  seconds = total_seconds % 60
  minutes = (total_seconds / 60) % 60
  hours = total_seconds / (60 * 60)
  format("%02d:%02d:%02d.000", hours, minutes, seconds)
end

# NOTE: the word-list read and the receiver of .open below were lost when this
# post was aggregated; these are plausible reconstructions.
words = File.readlines(random_words_path).map(&:chomp)

cue_start = 0
cue_end = 0, 'w') do |fh|
  fh.puts "WEBVTT\n\nNOTE This file was automatically generated by\n\n"
  144.times do |i|
    cue_words = words.sample(4)
    cue_start = i * 5
    cue_end = cue_start + 5
    fh.puts "#{timestamp(cue_start)} --> #{timestamp(cue_end)}"
    fh.puts cue_words.join(' ')
    fh.puts
  end
end

The markup including the caption track looks like:

<video preload="auto" autoplay poster="" controls>
  <source src="" type="video/mp4">
  <source src="" type="video/webm">
  <track id="search-webvtt" kind="captions" label="captions" lang="en" src="/video/search-webvtt/search-webvtt.vtt" default>
</video>
<p><input type="text" id="search" placeholder="Search the captions..." width="100%" autocomplete='off'></p>
<div id="result-count"></div>
<div class="list-group searchresults"></div>
<script type="text/javascript" src="/javascripts/search-webvtt.js"></script>

In the browser we can get the WebVTT cues and index each of the cues into lunr.js:

var index = null;
// store the cues with a key of start time and value the text
// this will be used later to retrieve the text as lunr.js does not
// keep it around.
var cue_docs = {};
var video_elem = document.getElementsByTagName('video')[0];
video_elem.addEventListener("loadedmetadata", function () {
  var track_elem = document.getElementById("search-webvtt");
  var cues = track_elem.track.cues;
  index = lunr(function () {
    this.field('text')
    this.ref('id')
  });
  for (var i = 0; i <= cues.length - 1; i++) {
    var cue = cues[i];
    cue_docs[cue.startTime] = cue.text;
    index.add({
      id: cue.startTime,
      text: cue.text
    });
  }
});

We can set things up that when a result is clicked on we’ll get the data-seconds attribute and make the video jump to that point in time:

$(document).on('click', '.result', function(){
  video_elem.currentTime = this.getAttribute('data-seconds');
});

We create a search box and display the results. Note that the searching itself just becomes one line:

$('input#search').on('keyup', function () {
  // Get query
  var query = $(this).val();
  // Search for it (the call on the next line was garbled in the aggregated
  // copy; is the obvious reconstruction)
  var result =;
  var searchresults = $('.searchresults');
  var resultcount = $('#result-count');
  if (result.length === 0) {
    searchresults.hide();
  } else {
    resultcount.html('results: ' + result.length);
    searchresults.empty();;  // re-show results after any earlier zero-result search
    // Makes more sense in this case to sort by time than relevance
    // The ref is the seconds
    var sorted_results = result.sort(function(a, b){
      if (a.ref < b.ref) {
        return -1;
      } else {
        return 1;
      }
    });
    // Display each of the results
    for (var item in sorted_results) {
      var start_seconds = sorted_results[item].ref;
      var text = cue_docs[start_seconds];
      var seconds_text = start_seconds.toString().split('.')[0];
      var searchitem = '<a class="list-group-item result" data-seconds="'+ start_seconds +'" href="#t='+ start_seconds + '">' + text + ' <span class="badge">' + seconds_text + 's</span></a>';
      searchresults.append(searchitem);
    }
  }
});

And that’s all it takes to create search within for a video for your static website.

Video from Boiling Process with Sine Inputs–All Boiling Methods.

Jason Ronallo: Site Search with Middleman and lunr.js

Fri, 2016-09-23 21:00

One of the tasks I have for this year is to review all the applications I've developed and consider how to lower their maintenance costs. Even for applications that aren't being actively fed new content, they need to be updated for security vulnerabilities in the framework and libraries. One easy way to do that is to consider shutting them down, and I wish more applications I have developed were candidates for sunsetting.

We have some older applications that are still useful and can't be shut down. They are largely static but occasionally do get an update. We've thought about how to "pickle" certain applications by taking a snapshot of them and just making that static representation live on without the application code running behind it, but we've never pursued that approach as making changes that need to be applied across the site can be annoying.

I’m considering migrating a couple of these applications to a static site generator. That would allow us to make changes, not worry about updating dependencies, and remove concerns about security. One feature, though, that seemed difficult to replace without a server-side component is search. So I’m newly interested in the problem of site search for static sites. Here’s how I added site search to this blog as a way to test out site search without a server-side component.

Before making this change I was just pointing to a Google site search, which isn’t the kind of thing I could do for one of our sites at work. What I’m doing now is certainly more complex than a simple search box like that, but the middleman-search gem made it rather simple to implement. There were a few things that took me a little time to figure out, so I’m sharing snippets here to maybe save someone else some time.

First, if you’re using this with Middleman 4, using the master version might help:

gem 'middleman-search', github: 'manastech/middleman-search'

Then the code to activate the plugin in config.rb was updated for the structure of my blog. The pages for tagging polluted the index so I added a very rudimentary way to skip over some paths from getting indexed. I also added a way to store the section of the site (as “group”) in order to be able to display that along with any search result.

activate :search do |search|
  search.resources = ['about/', 'blog/', 'bots/', 'bots-blog/', 'demos/', 'experience/', 'presentations/', 'projects/', '/writing']

  search.fields = {
    title:   {boost: 100, store: true, required: true},
    content: {boost: 50},
    url:     {index: false, store: true}
  }

  # skip pages (like the tag listings) whose titles pollute the index
  search_skip = ['Articles Tagged', 'Posts by Tag']
  search.before_index = Proc.new do |to_index, to_store, resource|
    if search_skip.any? {|ss| ss == resource.data.title }
      throw(:skip)
    end
    # store the first path segment as the section ("group") of the site
    to_store[:group] = resource.path.split('/').first
  end
end

When the site is built it creates a search.json file at the root (unless you tell it to put it somewhere else). In order to encourage the client to cache it, we’ll set our ajax request to cache it. As the site gets updated we’ll want to bust the cache, so we need to add “.json” to the list of extensions that Middleman will create a digest hash for and properly link to. The way of doing this that is in all of the documentation did not work for me. This did, but it required spelling out each of the extensions to create a hash for rather than just appending “.json” to asset_hash.exts.

activate :asset_hash do |asset_hash|
  asset_hash.ignore = [/demos/]
  asset_hash.exts = %w[ .css .js .png .jpg .eot .svg .ttf .woff .json ]
end

Now I just created a simple erb file (with frontmatter) to make up the search page. I’ve added a form to fall back to a Duck Duck Go site search.

---
title: Search
---

<%= javascript_include_tag 'search' %>
<h1>Search</h1>
<p>
  <input type="text" id="search" placeholder="Search..." width="100%">
</p>
<div id="result-count"></div>
<div class="list-group searchresults"></div>
<div id="duckduckgo-fallback-search">
  <p>If you can't find what you're looking for try searching this site via Duck Duck Go:</p>
  <form action="" method="get" role="search">
    <div class="form-group">
      <input class="search form-control" type="text" name="q" value=" " autocomplete="off">
    </div>
  </form>
</div>

And here’s the JavaScript, the beginnings of it borrowed from the middleman-search readme and this blog post. Unfortunately the helper search_index_path provided by middleman-search did not work–the method was simply never found. One magic thing that took me a long time to figure out was that using this helper was completely unnecessary. It is totally fine to just include the URL as /search.json and Middleman will convert it to the asset hash name when it builds the site.

The other piece that I needed to open the console for was to find out why the search results only gave me back documents with a ref and score like this: { ref: 6, score: 0.5273936305006518 }. The data packaged into search.json includes both the index and the documents. Once we get the reference to the document, we can retrieve the document to give us the url, title, and section for the page.
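To make the lookup concrete, here is a small sketch of that ref-to-document step. The index and docs key names (and the example document) are assumptions based on the description above rather than anything taken from the middleman-search documentation:

// Sketch only: the "index" and "docs" keys and the sample fields below are
// assumptions based on the description of search.json above.
// search.json (assumed shape):
//   { "index": { ...serialized lunr index... },
//     "docs":  { "6": { "url": "/blog/some-post/", "title": "Some Post", "group": "blog" } } }
function buildSearch(data) {
  var idx = lunr.Index.load(data.index); // rebuild the serialized index
  return function (query) {
    // each hit is { ref: ..., score: ... }; the ref keys into the stored documents
    return idx.search(query).map(function (hit) {
      return data.docs[hit.ref]; // -> { url, title, group }
    });
  };
}
// usage, once the ajax call below has loaded the data:
//   var search = buildSearch(lunrData);
//   var docs = search('middleman');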

Updated 2016-09-23 to use Duck Duck Go as the fallback search service.

var lunrIndex = null;
var lunrData = null;

// Download index data
$.ajax({
  url: "/search.json",
  cache: true,
  method: 'GET',
  success: function (data) {
    lunrData = data;
    lunrIndex = lunr.Index.load(lunrData.index);
  }
});

$(document).ready(function () {
  var duckduckgosearch = $('#duckduckgo-fallback-search');
  duckduckgosearch.hide();
  $('input#search').on('keyup', function () {
    // Get query
    var query = $(this).val();
    // Search for it
    var result = lunrIndex.search(query);
    // Output it
    var searchresults = $('.searchresults');
    var resultcount = $('#result-count');
    if (result.length === 0) {
      // Hide results
      searchresults.hide();
      resultcount.hide();
      if (query.length == 0) {
        duckduckgosearch.hide();
      } else {
        // No hits for a non-empty query, so show the Duck Duck Go fallback
        duckduckgosearch.show();
      }
    } else {
      // Show results
      resultcount.html('results: ' + result.length);
      searchresults.empty();
      for (var item in result) {
        // A result only gives us a reference to a document
        var ref = result[item].ref;
        // Using the reference get the document
        var doc = lunrData.docs[ref];
        // Get the section of the site
        var group = " <span class='badge'>" + doc.group + '</span>';
        var searchitem = '<a class="list-group-item" href="' + doc.url + '">' + doc.title + group + '</a>';
        searchresults.append(searchitem);
      }
    }
  });
});

That’s it. Solr-like search for a completely static site. Try it.

Brown University Library Digital Technologies Projects: Workflows and Tools

Fri, 2016-09-23 16:34

Digital preservation is simultaneously a new and old topic. So many libraries and archives are only now dipping their toes into these complicated waters, even though the long-term preservation of our born-digital and digitized holdings has been a concern for a while now. I think it is often forgotten that trustworthy standard-bearers, like the Digital Preservation Management Workshop and The Open Archival Information System (OAIS) Model, have been around for over a decade. The OAIS Reference Model in particular is a great resource, but it can be intimidating. Full implementation requires a specific set of resources, which not all institutions have. As a result, comparing one’s own program to another that is further along, in an attempt to emulate its progress, is often a frustrating endeavor.

I’ve witnessed this disparity most notably at conferences. Conferences, unconferences, and colloquia can be really helpful in that people are (thankfully) very open with their workflows and documentation. It’s one of my favorite things about working in a library; there aren’t trade secrets, and there isn’t an attitude of competition. We celebrate each other’s successes and want to help one another. With that said, some of the conversations at these events are often diluted with tool comparison and institution-specific jargon. The disparity of resources can make these conversations frustrating. How can I compare our fledgling web archiving initiative with other institutions who have entire web archiving teams? Brown has a robust and well-supported Fedora repository, but what about institutions who are in the early stages of implementing a system like that? How do we share and develop ideas about what our tools should be doing if our conversations center around the tools themselves?

For our digital accession workflow, I’ve taken a different approach than what came naturally at first. I initially planned our workflow around the implementation of Artefactual’s Archivematica, but I could never get a test instance installed adequately. This, of course, did not stop the flow of digitized and born-digital material in need of processing. I realized I was trying to plan around the tool when I wasn’t even sure what I needed the tool to do. Technology will inevitably change, and unless we have a basis for why a tool was implemented, it will be very difficult to navigate that change.

For this reason, I’ve been working on a high-level born-digital accessioning workflow where I can insert or take out tools as needed (see above). This workflow outlines the basic procedures of stabilizing, documenting, and packaging content for long-term storage. It has also been a good point of discussion among both internal and external colleagues. For example, after sharing this diagram on Twitter, someone suggested creating an inventory before running a virus scan. When I talked about this in our daily stand-up meeting, one of the Library’s developers mentioned that compressed folders may in fact strengthen that argument: unless both the inventory and the virus scan account for items within a compressed folder, there is a risk that the scan might miss something. This is one example of the type of conversations I’d like to be having. It’s great to know which tools are available, but focusing strictly on tool implementation keeps us from asking some hard questions.
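As a minimal sketch of what that inventory step might look like (this is only an illustration, assuming Node.js with the third-party adm-zip package; the directory path and function name are hypothetical), the inventory can walk an accession directory and record the members of any zip packages alongside the regular files, so the virus scan results can be checked against a complete list:

// Illustration only: list every file in an accession, including the members
// of any .zip packages, so an inventory and a virus scan can be compared
// item for item. Assumes Node.js and the adm-zip package (npm install adm-zip).
const fs = require('fs');
const path = require('path');
const AdmZip = require('adm-zip');

function inventory(dir, items = []) {
  for (const name of fs.readdirSync(dir)) {
    const full = path.join(dir, name);
    if (fs.statSync(full).isDirectory()) {
      inventory(full, items); // recurse into sub-folders
    } else if (path.extname(full).toLowerCase() === '.zip') {
      items.push(full); // record the package itself...
      for (const entry of new AdmZip(full).getEntries()) {
        if (!entry.isDirectory) {
          items.push(full + '!' + entry.entryName); // ...and each member inside it
        }
      }
    } else {
      items.push(full);
    }
  }
  return items;
}

// usage: console.log(inventory('/path/to/accession').join('\n'));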

Access Conference: Crowne Plaza Shuttle

Fri, 2016-09-23 15:18

For Access and Hackfest attendees who are staying at the Crowne Plaza Fredericton Lord Beaverbrook, daily shuttle runs between the Crowne Plaza and the Wu Conference Centre have been arranged.

Tuesday, Oct. 4

  • Crowne to Wu: 7:30am & 7:50am
  • Wu to Crowne: 4:15pm & 4:35pm

Wednesday, Oct. 5

  • Crowne to Wu: 7:30am & 7:50am
  • Wu to Crowne: 4:30pm & 4:50pm

Thursday, Oct. 6

  • Crowne to Wu: 7:30am & 7:50am
  • Wu to Crowne: 5:10pm & 5:30pm

Friday, Oct. 7

  • Crowne to Wu: 7:30am & 7:50am
  • Wu to Crowne: 4:45pm

See? We love you that much! Thanks to the Crowne Plaza and UNB Conference Services for helping make this service available!

In the Library, With the Lead Pipe: The Collective Approach: Reinventing Affordable, Useful, and Fun Professional Development

Fri, 2016-09-23 13:30

In Brief:  In 2014, a small group of librarians at the University of Tennessee set out to redefine the library conference landscape. Frustrated by the high cost and lack of tangible skills and takeaways at professional organization gatherings, they conceived of a low-cost, high-value symposium where academic librarians might learn, create, and collaborate together. The resulting event, The Collective, first took place in February 2015 and is now an annual opportunity for librarians from around the US and the globe to redefine scholarly communication and professional development in a fun and creative platform. The Collective model offers a practical and repeatable blueprint for other librarians or academics looking to further reinvent and revolutionize their continuing education and convocation.

by Ashley Maynor and Corey Halaychik



Current Professional Development Landscape

There are a number of professional organizations that serve library professionals, many of which offer annual conferences where librarians come together to share knowledge and skills and learn about new products. These gatherings, however, tend to be costly for participants, rely heavily on trade industry sponsorships (which may influence programming decisions), and tend towards “show and tell” or “sage on a stage” presentations with little time dedicated to audience interaction. Few, if any, offer transparency in their review process (i.e. sharing reviewer names or qualifications, disclosing the procedure or nature of the review process, sharing feedback with submitters, etc.). There is also often a large span of time between the proposal process and the conference itself; as just one example, the American Library Association solicits proposals fifteen months before the annual conference.

At their worst, the result is stale programming, bait-and-switch session descriptions, and undue corporate influence on panels and program content, with a low return on the registration fee and cost of attendance for conference goers. Discounts are often tiered or offered only to select individuals: it is common for conferences to offer “early bird” registrations and special rates for presenters, organizers, or other privileged individuals. Furthermore, many conferences highlight differences among attendee types using paraphernalia, such as ribbons, to designate organizers, presenters, sponsors, committee members, and the like as “special” attendees. The gatherings are often large (400+), and that size, combined with the typical presentation format, often translates into an intimidating environment for connecting with new people.

Figure 1: 2015 Registration and lodging costs and conferred benefits by conference. Data taken from official conference websites.

The Collective Mission & Values

The Collective is designed as an alternative to these large, expensive, traditional professional gatherings that compose the professional development landscape. Figure 1 above shows how The Collective measures up in terms of costs and conferred benefits compared to some of the most well-known conferences for academic librarians. Its mission is to support learning, networking, and kick-starting new ideas among next-generation librarians and library stakeholders, with the participants determining the content.

The Collective seeks to achieve the following goals:

  • To dissolve the traditional conference divide between “presenters” and “attendees” by making everyone active participants.
  • To make a low-cost event where all participant costs are subsidized and everyone, even conference organizers, pays an equally low registration fee. We believe participants should receive more value than the registration fee paid, as opposed to the traditional profit-generating conference.
  • To eliminate vendor expo areas and create an event climate where vendors are treated as colleagues who can learn and collaborate with us to better serve our users. We believe developing relationships is far more effective than hard sales and we think sessions should contain content, not sales pitches.
  • To have a practitioner-oriented gathering—aimed at librarians on the front lines rather than highlighting administrators or those in positions of top-level leadership.
  • To offer interactive sessions—no “sage on a stage,” with an emphasis on tangible takeaways, networking, conversation, hands-on activities, and collaboration.
  • To offer transparency and fairness in the proposal review process. Session content is solicited through a public forum (see Figure 2) with a public voting process combined with a blind peer review and committee review. We offer feedback on all submissions, including all non-accepted proposals. To help keep our content relevant, we shorten the lag between proposals and the event; ours is less than six months.

To help librarians “learn, create and collaborate” as our tagline professes, we have carefully designed our programming process to support these goals.


The quality of a conference and its utility to participants are often correlated with the quality of its programming, so we sought to program The Collective in a new way to ensure high-quality content. The overall style of The Collective draws on the best of conferences and unconferences alike, including THATCamp and The Conversation (a unique coming together of the film industry in 2008 and 2010). We aim for a balance of the flexibility, surprise, and active participation of an unconference combined with the organization, programming rigor, and geographic diversity of a national or international conference.

For the main program, conference organizers solicit session topics, ideas, and feedback through a transparent, inclusive three-step process. Rather than create a niche conference to serve a limited type of librarians, we solicit ideas each year around a broad theme that encourages cross-pollination among attendee types/librarian roles. Past themes include Libraries as Curators & Creators (2015), Adopt, Adapt, Evolve: Reinvigorating & Rejuvenating Our Libraries (2016), and Make It Beautiful, Make It Usable (2017).

First, ideas for conference sessions are solicited through a public “Session Picker,” an online, public idea-generation, commenting, and voting platform inspired by the SXSW Interactive Conference PanelPicker (Figure 2). First-round submissions are quick and easy: all that’s required is a title, a three-sentence description, and an indication of session format. The formats encouraged include, but are not limited to, lightning talks, pecha kucha, dork shorts, interactive plenaries, interactive panels, roundtable discussions, hands-on workshops, Q&A sessions, small group breakouts, skill-building workshops, and make, hack, and play sessions.

Figure 2: A screenshot from the 2017 Session Picker.

While some conferences include workshops for additional fees pre- or post-conference, we aim to make every single session a worthwhile opportunity for hands-on learning, discussion, skill-building, and/or networking, with no special fees or paid events. Proposals are solicited in the summer and fall before the gathering through dozens of listservs, on The Collective website, through social media, and through strategic partnerships. At this early proposal stage, all presenters in a session do not have to be known; in fact, we encourage prospective attendees to use the platform to network outside their known circle and find additional presenters to collaborate with. Collective organizers will also assist with finding session collaborators via a call for participation. Lastly, unlike many conferences, individuals are free to suggest sessions they might find interesting but won’t themselves be directly involved in organizing.

When the picker closes, a programming committee of academic librarians reviews the proposals for feasibility, diversity, interest (based in part on public votes), and quality. At this stage, some submitters might be encouraged to combine or collaborate on proposals if theirs are similar. Most proposals are invited to round two – peer review. Those that do not make it to the second round are rejected due to content overlap, lack of interactivity, or otherwise failing to meet the spirit of The Collective motto (i.e. “learn, create, collaborate”).

In the second phase, invitations are sent to round-one applicants, who are asked to submit more detailed proposals, roughly 200-500 words, with special attention to the format and interactivity of their session. Submitters are encouraged to outline any handouts/tip sheets, tangible takeaways or skills, or anticipated outcomes of their session. Each of these proposals is then scored on a rubric (Figure 3) and commented on by at least two (and usually three) outside reviewers. The review panel is made up of a rotating combination of academic librarians, library product designers/vendors, and independent information professionals. While reviewers do not know the identity of the submitters, the reviewer names and qualifications are posted on the conference website. We also screen all submissions for any obvious conflicts of interest and assign reviews according to the topic of the session (as designated by the submitter) vis-à-vis a particular reviewer’s qualifications (Figure 4).

Reviewers are asked to evaluate submissions on relevance to the upcoming Collective’s theme, whether or not the topic is timely and current, the interest or novelty of the session’s approach, whether or not the presentation is conducive to participation, and evidence that the idea can and will be further developed before the actual event (Figure 3). Sessions are then ranked according to their public vote and peer review score before the final phase, program committee review.

Figure 3: A scoring rubric from the 2015 Collective peer review.

Figure 4: Topic areas for proposals for the 2016 Collective and for reviewer expertise.

The programming committee carefully examines each proposal, its ranking, and the balance among session types or focus for making final selections. Regardless of whether or not a session is selected for inclusion, each submitter receives their rubric scores, vote tally, and anonymized comments from the peer review. Often, non-accepted proposal submitters are invited to collaborate with accepted sessions or may reply to open calls for participation for accepted sessions. Because of the emphasis on interactive sessions, the traditional hierarchy of presenter and non-presenter is subverted; even those not presenting will have much to gain from attending The Collective.

Finally, the organizers of all accepted sessions are required to participate in a planning phone call with a member of The Collective’s programming team. This phone call is used to assist in further development of interactive components, to discuss technical needs, decide on space requirements, and to troubleshoot any issues that session organizers are having. This personal, one-on-one contact helps ensure a smooth event, creates a personal connection before the occasion, and ensures that what is proposed on paper can be translated in person.

We strive to treat programming as an ongoing relationship with professionals in our field rather than a simple “submit and forget it” process. The programming committee endeavors to develop relationships of support that begin with the proposal submission, continue with the planning call, and extend beyond the gathering as a long-term network of peers.

Importantly, The Collective’s programming is not limited to sessions from the open call. We also include two non-keynote plenary sessions. In the past, these have included an interactive discussion of library leadership and a board-game networking mashup. Day one of The Collective traditionally closes with the “Failure Confessions” – a series of lightning talks, followed by an open mic, where librarians share stories about spectacular failures we can all learn from.

To encourage informal networking and collaboration, our meeting space also features various pop-up “unconference” spaces, which include the IdeaLibrary, TinkerLab, and Shhh Room. The IdeaLibrary (Figure 5) is a space for impromptu brainstorming, networking, and discussion. We provide inspirational books on creativity, notepads, and other supplies to help participants kick-start conversations. The TinkerLab (Figure 6) is a mobile makerspace with some simple tools and kits for hands-on exploration and demonstrations of gaming, conductivity and DIY electronics, music-making, 3-D printing, and prototyping. Past available equipment included Creopop pens, Ozobot robots, 3Doodlers, Makey-Makeys, and LittleBits electronic kits. The Shhh Room is a space free of cell phones and computers and dedicated to quiet reflection. The space is also equipped with yoga mats, seat cushions, and meditative coloring books.

Figure 5: Books and a postcard mailing station in the 2016 IdeaLibrary.

Figure 6: Photos from the 2016 TinkerLab.

Because of the highly interactive, hands-on nature of the sessions, we do not stream or record them; we emphasize real-time, face-to-face interaction. To encourage takeaways to travel home with the attendees, we aim to document and share information from each year through community notetaking. Each session has a designated volunteer notetaker who takes notes in an open Google document, which is open for editing and additions from all participants. Documents such as slides, handouts, and takeaways are shared through both Sched, our online schedule manager, and Google Docs post-conference.

The conference closes with a door prize raffle—open to everyone who completes our conference feedback survey. Immediately following the raffle, we host an open mic for sharing best conference lessons and feedback. The unedited, anonymous survey results are made public each year and are used for continuous re-thinking and improving of future events.

Networking & Local Connections

A major focus of The Collective is providing multiple opportunities for attendees to network with their peers. This is an important aspect of The Collective which builds a sense of community among attendees and, more importantly, presents opportunities for future collaboration outside of the annual gathering. We believe that the best networking events are those that build camaraderie through informal, shared experiences. We also use our networking events as a way to introduce attendees to the city of Knoxville as a great place to live and work by highlighting the best of Knoxville’s local dining and entertainment scenes. Professional gatherings often seem to take place in a vacuum, confined to the grounds of sterile conference centers. At The Collective, however, we’ve made concerted efforts to emphasize the place of our gathering—Knoxville, Tennessee, and just minutes from the University of Tennessee—and conduct the business of the meeting so as to benefit the community we call home.

Rather than host lunch or dinner at a hotel or conference center, we make it easy for participants to explore the city by locating our gathering within a few minutes’ walk of downtown, providing custom maps of the city with recommended dining spots, and hosting our social events outside of the meeting space. Our first social activity offering is optional dine-arounds the evening before the conference: Dutch-style dinners at downtown restaurants arranged by local hosts. These small group dinners provide an easy way to dine well in Knoxville and get to know attendees ahead of the main event. Each group of diners is led by a local host who not only leads the group’s walk to and from dinner but also answers questions about Knoxville attractions, what to expect at The Collective, etc.

We also partner with our local tourism board, VisitKnoxville, to use uncommon spaces for our complimentary reception and dinner for all attendees. In 2015, we hosted a Blues and BBQ night atop the iconic Sunsphere, built for the 1982 World’s Fair. In 2016, we organized a Southern Speakeasy at The Standard event venue in downtown Knoxville where the fifteen-piece Streamliners Jazz Orchestra performed while participants had photos taken in our photo booth, and enjoyed cocktails, a catered dinner from locally owned Rickard Ridge Barbeque, and nationally renowned Mag-Pies cakes.

Other community efforts include working with local artists to design artwork for our conference tote, hiring local musicians for our reception, sourcing branded swag from local suppliers, and using locally owned restaurants and bakers to cater our receptions. We’ve also scheduled our event during a typical low tourist season for our city. This scheduling not only provides a benefit to our community but also to our participants: hotels offer great room rates during this time of year to help our attendees maximize their travel budgets.

Finally, each closing night, we invite attendees to join in a “Literary Libations” pub crawl, organized in partnership with VisitKnoxville, our local tourism bureau. Local establishments offer appetizer specials and literary-themed cocktails to entice Collective attendees to come out and socialize; in 2016, the closing night outing wrapped up with an exclusive Collective attendee-only party at the Oliver Royale restaurant and bar. This self-directed socializing showcases our city and also helps to boost our local economy in the off-season, which makes it an attractive partnership opportunity for venues.

Incentives & Partnerships

In the same way we’re breaking the mold of how to host a gathering, we also seek to redefine how an organization works with event sponsors. First, we don’t exchange money for programming, directly or implicitly. Instead, we gladly work with vendors who wish to submit session proposals to move beyond the “show and tell” and sales demonstrations that are so common in the conference landscape. We require that vendor sessions adhere to the same standards of participation as other sessions and that they not skew towards a sales pitch.

Rather than looking for one or two sponsors to foot the bill of the conference, we ask for small amounts of money from many different organizations. We keep it simple by offering sponsorship opportunities set at two funding levels. Each level provides a package of benefits including registrations, sponsorship acknowledgment, and, in the case of the higher funding tier, opportunity to include marketing materials in the conference tote. We aim to foster interactions between vendor representatives and librarians who normally wouldn’t interact with one another, which helps to redefine vendors as not “other” than librarians but instead as thoughtful partners who share the same commitment to getting the right information to the right user at the right time.

In this spirit, we do not provide a vendor expo area, nor marquee or sponsored programming sessions. This approach helps ensure that we don’t privilege one product or service over another and also means smaller businesses and start-ups have a chance to participate in our unique event, which we believe fosters meaningful, long-term relationships with librarians instead of elevator pitches as folks walk the conference floor.

We keep our conference bag design free of any advertising and our commemorative drink bottle only displays The Collective logo. Furthermore, we highly encourage sponsoring organizations to send technical experts, product development managers, and user experience employees in the place of sales staff as a way to connect product users directly to those designing the products.

Additionally, we create opportunities for all sponsor representatives to participate as equals during the gathering. All attendees (regardless of status) are invited and encouraged to participate in any session they’d like to. By removing the sales-focused channels that frequently occur at conferences (expos, product demos, etc.), we believe we also remove much of the tension between vendors and librarians. Because vendors aren’t tied up focusing on sales, they are free to participate in the sessions. Both of these factors help create an environment of idea exchange where everyone has the ability to redefine the relationship between vendor and librarian from an adversarial one to a collaborative partnership.

Finally, The Collective was started in large part to relieve discontent with the state of many contemporary conferences. While we are excited about what the Collective provides, we believe the movement to change the landscape is more important than any one particular conference or “brand.” We therefore welcome organizers from other conferences to participate in The Collective and willingly share our documentation and provide advice to anyone interested in starting their own non-profit, regional gatherings.

Logistics & Funding

The size, scheduling, and event overlap can greatly color the participant’s experience of a conference. So, while we encourage and receive registrations from more than 36 states and 140 organizations, we intentionally limit the size of the gathering (170 in 2015, 225 in 2016, 300 in 2017) and don’t plan to grow beyond an annual attendance of 350, so that attendees can easily meet new people and not feel intimidated by the environment. We use software to make our session schedule easy to use and have no more than four concurrent sessions happening at any one time. The programming committee takes great pains to distribute session content so that like-themed sessions are not competing in the same time slots and that there’s something for everyone during each session period. We also keep the pop-up spaces open for the duration of the gathering, so that there is always an informal unconference alternative to any planned session.

The cost of any professional development can be significant and how funds are used equally shapes the conference goers’ experience. Our smaller sponsorship level approach requires more entities to be contacted about funding, so we dedicate a significant amount of time to fundraising with the campaign typically starting around the time of ALA Annual (June/July preceding The Collective) and continuing until the first day of the event. Further complicating this funding challenge is our pledge to be an affordable alternative to large mega-conferences. The cost of attendance is heavily subsidized by sponsorship funds with the registration fees covering roughly half of the actual cost of the event per person, so we must raise more funds than the typical event since costs are not passed on to attendees.

While the amount and source of funding received matters to us, equally important is how funds are used. The Collective employs a number of strategies for being good stewards of sponsorship dollars so that we can do more with less. First, we use a mixed approach to procuring goods and services; when possible, we borrow material from other institutions, such as the University of Tennessee, the University of Dayton, or North Carolina State University libraries, especially for our TinkerLab. Borrowing materials such as laptop carts, additional microphones, etc., significantly cuts down on the expense of buying or renting overpriced AV technology.

We also think carefully about our food and beverage costs at the meeting venue. Rather than pay exorbitant amounts for soft drinks (a canned cola at one venue was $4.00 per can consumed) or bottled water, we provide endless coffee and supply each attendee with a beautiful commemorative water bottle, which has the added benefit of being eco-friendly. We also minimize the size of our printed program (to save paper and printing costs) and instead encourage attendees to use the free Sched mobile app for our schedule.

These simple savings free up funds to purchase supplies that directly support presenters, inspire creativity, and help offset total costs. We also strive to purchase materials that can be used beyond the event as a way to demonstrate value to our institutional sponsor and promote The Collective year-round. For example, many of the demo tools and equipment we purchase for the TinkerLab can be used by the University of Tennessee Libraries, our primary institutional sponsor, throughout the year in their studio and makerspaces. And while it’s a hassle to store reception décor items, we’ve found that the cost of purchasing linens (and paying to have them laundered) and reusable items such as LED candles is significantly lower than the rental cost for these supplies.

Our choice of host city is also key to keeping our costs down. We are able to leverage the existing community relationships of The Collective co-directors, resulting in discounts, in-kind donations of goods and services, and unique experiences for attendees. As mentioned earlier, making The Collective a “Knoxville event” allowed us to partner with the local VisitKnoxville tourism office. VisitKnoxville has assisted in negotiating conference and reception venues, resulting in lower room rates, discounted reception expenses, and free meeting venue space with a reasonable minimum food and beverage spend.

We also strive to keep our marketing efforts affordable by using a combination of low- or no-cost marketing channels. Low-cost efforts involve using printed marketing material sparingly and ensuring that material is not tied to a specific year so that it may be reused. We also take advantage of free modes of communication including social media and listservs to advertise both calls for proposals and to share registration updates.

We rely on an energetic volunteer staff of individuals as passionate as we are about revolutionizing the professional development landscape. Our collective, unpaid work means we can program, plan logistics, and maintain our website and marketing at no cost to the conference participant.

Building A Community

Building a community requires an online presence and identity, social media and sharing, and year-round activities. When we created The Collective, we spent many hours working with a group to develop an inspiring logo and website that represents our mission: to learn, create, and collaborate. As our logo’s Venn diagram illustrates, we look for connections, overlap, and opportunities among seemingly disparate groups and ideas.

During the event, we highly encourage the use of Twitter, Instagram, and Facebook posts to share information and takeaways. Additionally, our collaborative note-taking documents live on via our website for those unable to attend in person or those wishing to review the content of past gatherings.

We also design our conference swag to provide year-round brand presence and awareness. Our unique conference totes offer fun designs that are free of any marketing and sure to lead to conversation. Our complimentary drink bottles are high quality and branded only with The Collective logo to help attendees remember their great experience throughout the year.

While The Collective’s unique approach to professional development offers plenty of opportunities for networking and collaboration during the gathering, we believe that the good work born from it shouldn’t end when the event does. We have therefore focused on building a year-round community among our attendees: we use social media both to keep our alumni and future participants informed and as a way for them to connect once the annual event has concluded, to promote the work of past attendees, and to celebrate their successes. Social media is also used to advertise meetups between Collective attendees at other conferences. Finally, this article itself is an attempt to create documentation and share it with like-minded individuals interested in hosting a similar event.

The focus we have given to community building has paid off in terms of an increasing number of attendees and increased involvement from them year after year. In our first year, we were surprised to have attendance from over 170 individuals from 31 states. (We would have been pleased with a regional gathering of 70!) In 2016, we moved to a larger venue, and attendance increased by 40% over 2015, with participants hailing from over 140 institutions. This diversity has been especially helpful in gaining wider points of view with regard to programming preferences, and we are especially excited to see a growing range of geographic diversity, with our first international participants from Canada and Sweden in 2016.


With two successful gatherings completed and a third in planning, we believe we have found a recipe for organizing and executing a successful and useful professional development event. Those wishing to revitalize or start their own event can employ the following tips to ensure their own success:

  1. There’s no substitute for excellent content. Make programming the main focus of the event; aim to attract and offer sessions that allow for learning, creating, and collaboration. Keep activities fresh, ensure participants walk away with tangible skills, and open the door to the sharing of ideas and future collaboration. We strongly suggest that traditional “sage on the stage” style conference presentations be eschewed in favor of hands-on or highly interactive sessions that make the participants the focus instead of the presenter. This interactivity brings more people into the conversation, opens the door for better discovery and higher interaction, and builds a stronger sense of community.
  2. Make the event about everyday practitioners. Administrators can certainly bring a wealth of knowledge, level of expertise, and unique point of view to sessions, but we believe that the ratio of administrators to practitioners at a conference should be reflective of the real-world landscape. This ensures that those individuals who are in the field have an opportunity to both share their experiences and learn from colleagues who face the same daily challenges. Furthermore, all sessions should offer an opportunity for the free exchange of ideas to occur. No one should be harassed or put down for their ideas; while dissent is an important aspect of discussion it should be done in a respectful manner and in an atmosphere of openness.
  3. Because librarians don’t work in a vacuum, we believe professional development shouldn’t either. Conferences planned around a broad theme rather than a job specialization facilitate cross-pollination among various groups and stakeholders, which can lead to better understanding of personal roles and creative solutions to common challenges. It also opens the door for broader collaboration between the multitude of specializations that exist in today’s universities.

Finally, work hard to keep costs down. Professional development shouldn’t feel like tithing and participants will be more energized – and likely to return – if they feel the value to cost ratio is high. Keeping registration rates affordable also lowers the entry threshold for librarians with small or non-existent travel budgets. This creates a broader range of experiences, opinions, and points of view during sessions which improves the overall quality of idea exchanges taking place.


Many thanks to the peer reviewers for this article, Bethany Messersmith and Kathy Hart, and publishing editor Sofia Leung for their contributions. Thanks also to the many volunteers and attendees of The Collective who have made our dream of better professional development a reality.

Works Cited & Further Reading

THATCamp – The Humanities and Technology Camp – is “an open, inexpensive meeting where humanists and technologists of all skill levels learn and build together in sessions proposed on the spot.” Read more about this inexpensive unconference format here:

The Conversation – This grass-roots gathering was “developed by a group of filmmakers, investors, entrepreneurs, journalists, and consultants interested in the new creative and business opportunities — and new ways of connecting with audiences.” It took place at a pivotal moment for film distribution in 2008 and 2010. See for more information.

SXSW Festival PanelPicker – South by Southwest uses a three-part process to select content for its annual Interactive Conference that combines public votes, an advisory board, and staff. This format helped inspire our three-part programming process. Read about it here:

The Collective –

Brown University Library Digital Technologies Projects: New Theses & Dissertations site

Fri, 2016-09-23 12:54

Last week, we went live with a new site for Electronic Theses and Dissertations.

My part in the planning and coding of the site started back in January, and it was nice to see the site go into production (although we do have more work to do with the new site and shutting down the old one).

Old Site

The old site was written in PHP and only allowed PhD dissertations to be uploaded. It was a multi-step process to ingest the dissertations into the BDR: use a PHP script to grab the information from the database and turn it into MODS, split and massage the MODS data as needed, map the MODS data files to the corresponding PDFs, and run a script to ingest the dissertation into the BDR. The process worked, but it could be improved.

New Site

The new site is written in Python and Django. It now allows Masters theses as well as PhD dissertations to be uploaded. Ingesting them into the BDR will be a simple process of selecting the theses/dissertations in the Django admin when they are ready and running the ingest admin action; the site will know how to ingest them into the BDR in the correct format.

Ed Summers: Nicolini (5)

Fri, 2016-09-23 04:00

In Chapter 5 Nicolini takes a look at how practice theories have been informed by activity theory. Activity theory was pioneered by the psychologist Lev Vygotsky in the 1920s and 1930s. Since Vygotsky, activity theory has grown and evolved in a variety of directions, all characterized by attention to the role of objects and to the role of conflict or dialectic in human activity. Nicolini spends most of the chapter looking specifically at cultural and historical activity theory.

He starts off by recalling the previous discussion of Marx, particularly his description of work in Das Kapital, where work is broken up into a set of interdependent components:

  1. the worker
  2. the material upon which the worker works
  3. the instruments used to carry out the work
  4. the actions of the worker
  5. the goal towards which the worker works
  6. the product of the work

The identity of the worker is a net effect of this process. Vygotsky and other activity theorists took these rough categories and refined them. Vygotsky in particular focused attention on mediation, or how we as humans typically interact with our environments using cultural artifacts (things designed by people), and noted that language itself is an example of such an artifact. These artifacts transform the person using them, and the environment: workers are transformed by their tools.

Instead of focusing on individual behavior, activity theorists often examine how actions are materially situated at various levels: actions, activities and operations, which are a function of thinking about the collective effort involved. This idea was introduced by Leont’ev (1978). Kuutti (1996) is cited a few times, which is interesting because Kuutti & Bannon (2014) is how I found out about Nicolini in the first place (small world). To illustrate the various levels Leont’ev has an example of using the gears in a car with manual transmission, and how a person starts out performing the individual actions of shifting gears as they learn, but eventually they become automatic operations that are performed without much thinking during other activities such as speeding up, stopping, going up hills, etc. The operations can also be dismantled, reassembled, and recomposed to create new actions. I’m reminded of push starting my parent’s VW Bug when the battery was dead. The example of manual transmission is particularly poignant because of the prevalence of automatic cars today, where those shifting actions have been subsumed or embodied in the automatic transmission. The actions can no longer be decomposed, at least not by most of us non-mechanics. It makes me wonder briefly about what power dynamics are embodied in that change.

It wasn’t until Y. Engeström (1987) that the focus came explicitly to bear on the social. Yrjö Engeström (who is referenced and linked in Wikipedia but there is not an article for him yet) is credited for starting the influential Scandinavian activity theory strand of work, and helping bring it to the West. The connection to Scandinavia makes me think about participatory design which came from that region, and what connections there are between it and activity theory. Also action research seems similarly inflected, but perhaps it’s more of a western rebranding? At any rate Engeström got people thinking about an activity system which Nicolini describes as a “collective, systemic, object-oriented formation”, which is summarized with this diagram:

This makes me wonder if there might be something in this conceptual diagram from Engeström for me to use in analyzing my interviews with web archivists. It’s kind of strange to run across this idea of object-oriented again outside of the computer science context. I can’t help but wonder how much cross-talk there was between psychology/sociology and computer science. The phrase is also being deployed in humanistic circles with the focus on object oriented ontology, which is a philosophy of decentering the human. It’s kind of ironic given how object-oriented programming has fallen out of favor a bit in software development, with a resurgence of interest in functional programming. But functions can be objects, but then there is the post-functionalism move, so…but I digress, completely.

This is where Cultural and Historical Activity Theory (CHAT) come in, which is concerned with the ways in which objects are both socially and objectively constructed. It seems like an attempt at a middle path between social sciences and the physical sciences. This focus on the material and where it intersects with the social is something I really like about this line of thought coming from Marx. It’s interesting that Nicolini uses the phrase “bite back” here to talk about how objects can affect us as well. I seem to remember running across this idea in some of Ian Bogost’s work but I can’t place it right now. It’s an interesting phrase that might be fun to follow in the literature. Anyway CHAT is a relatively recent formulation (relative to activity theory) and credited to Miettinen & Virkkunen (2005). It seems like a useful thing to follow up on in the context of my Web archiving study because I do need a way to talk about the material of the archive, the archival system, the Web and the people working with it (archivists, researchers, etc)–and fitting the analysis into existing work will be helpful.


  • objects and agents emerge together and define each other (Miettinen & Virkkunen, 2005)
  • the object is inherently fragmented (never completely visible, seen in multiple ways)
  • objects evolve: they are always contestable and (often) contested

CHAT folks will often look at at least two activity systems, to show how interactions between activities embed social practices: Knotworking. The resulting networks (???) looks like an important paper to read to follow up on this idea.

Activity systems are, in fact, by definition internally fragmented and inconsistent. The tensions and conflicts emerging from such contradictions constitute the origin and the source of energy for the continuous change and expansion of activity systems and their components. (p. 114)

I’m wondering if there is a connection between the ideas of broken-ness and fragmentation and broken world thinking & repair. The idea of Knotworking (Y. Engeström, Engeström, & Vähäaho, 1999) specifically recalls Jackson, Gillespie, & Payette (2014). I like the idea of zooming in on sites of conflict or contradiction as a way of locating activities and practices, seeing them as integral features and as locations for dialectical processes and resolution (Marx). CHAT also stresses that these sites are useful as spaces for intervention and redesign. It is suggested that it might be necessary to engage at this level to truly understand the activity. Antonio Gramsci is cited here for his idea of organic intellectuals. Engeström (2000) and Engeström (2001) both look like they could be good things to read to follow up on this idea about interventions, particularly for their focus on ethnographic methods and the necessity of generating thick description. Also there is a connection back to American Pragmatism that seems like an important connection, at least for me (Miettinen, 2006).

It’s a bit early to say, but after reading this chapter about CHAT I feel like I’ve found the conceptual tool I was missing for analyzing my interview transcripts. It also situates my work on DocNow by positioning that work as an intervention for understanding, which is extremely helpful. Nicolini’s critique of a strong version of CHAT, one that turns the activity system itself into a thing in its own right, seems very apt here. Also, some Marxists have criticized CHAT for its conservative use of Marx: fixing small local problems without looking at the larger picture.


Engeström, Y. (2000). Activity theory as a framework for analyzing and redesigning work. Ergonomics, 43(7), 960–974.

Engeström, Y. (1987). Learning by expanding: An activity-theoretical approach to developmental research. Orienta-Konsultit.

Engeström, Y. (2001). Expansive learning at work: Toward an activity theoretical reconceptualization. Journal of Education and Work, 14(1), 133–156.

Engeström, Y., Engeström, R., & Vähäaho, T. (1999). In S. Chaiklin, M. Hedegaard, & U. J. Jensen (Eds.), Activity theory and social practice: Cultural-historical approaches. Aarhus, Denmark: Aarhus University Press.

Jackson, S. J., Gillespie, T., & Payette, S. (2014). The policy knot: Re-integrating policy, practice and design in CSCW studies of social computing. In Proceedings of the 17th ACM Conference on Computer Supported Cooperative Work & Social Computing (pp. 588–602). Association for Computing Machinery.

Kuutti, K. (1996). Activity theory as a potential framework for human-computer interaction research. Context and Consciousness: Activity Theory and Human-Computer Interaction, 17–44.

Kuutti, K., & Bannon, L. J. (2014). The turn to practice in HCI: Towards a research agenda. In Proceedings of the 32nd Annual ACM Conference on Human Factors in Computing Systems (pp. 3543–3552). Association for Computing Machinery.

Leont’ev, A. N. (1978). Activity, consciousness, personality. Prentice Hall.

Miettinen, R. (2006). Epistemology of transformative material activity: John Dewey’s pragmatism and cultural-historical activity theory. Journal for the Theory of Social Behaviour, 36(4), 389–408.

Miettinen, R., & Virkkunen, J. (2005). Epistemic objects, artefacts and organizational change. Organization, 12(3), 437–456.