A failed attempt at speeding up grouping in Solr, with an idea for the next attempt.

Grouping at a Statsbiblioteket project
We have 100M+ articles from 10M+ pages belonging to 700K editions of 170 newspapers in a single Solr shard. It can be accessed at Mediestream. If you speak Danish, try searching for “strudsehest”. Searches are at the article level, with the results sorted by score and grouped by edition, with a maximum of 10 articles per edition. Something like this:

q=strudsehest&group=true&group.field=edition&group.limit=10
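For readers who want to send such a request from a script rather than a browser, the same parameters can be assembled programmatically. This is a minimal sketch using only the Python standard library; the Solr URL is a placeholder (hypothetical host and core name), not the Mediestream endpoint:

```python
from urllib.parse import urlencode

# The grouping parameters from the example query above.
params = {
    "q": "strudsehest",        # free-text search
    "group": "true",           # turn on result grouping
    "group.field": "edition",  # one group per newspaper edition
    "group.limit": 10,         # at most 10 articles per edition
}
query_string = urlencode(params)

# Hypothetical endpoint: replace host and core name with your own.
SOLR_SELECT = "http://localhost:8983/solr/articles/select"
print(f"{SOLR_SELECT}?{query_string}")
```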
This works well for most searches. But for the heavier ones, response times creep into seconds, sometimes exceeding the 10 second timeout we use. Not good. So what happens in a grouped search that is sorted by document score?
1. The hits are calculated
2. A priority queue is used to find the top-X groups with the highest scores
2.1. For each hit, calculate its score
2.1.1. If the score is > the lowest score in the queue, resolve the group value and update the priority queue
3. For each of the top-X groups, a priority queue is created and filled with document IDs
3.1. For each hit, calculate its score and resolve its group value (a BytesRef)
3.1.1. If the group value matches one of the top-X groups, update that group’s queue
3.1.1.1. Updating the queue might involve resolving multiple field values for the document, depending on in-group sorting
4. Iterate the top-X groups and resolve the full documents
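The two passes can be sketched in a few lines of Python. This is a toy model, not Solr's code: hits are plain document IDs, scores and edition values come from dictionaries, and for brevity the first-pass priority queue is emulated with a dict and `min()`. The `resolved` list records every group-value lookup so the cost of step 3.1 becomes visible:

```python
import heapq

def first_pass(hits, score_of, group_of, max_groups):
    """Steps 2-2.1.1: best score for each of the top `max_groups` groups.

    A dict plus min() stands in for the priority queue to keep the sketch short.
    """
    best = {}
    for doc in hits:
        score = score_of(doc)
        if len(best) == max_groups and score <= min(best.values()):
            continue  # not competitive, so the group value is never resolved
        group = group_of(doc)
        if group in best:
            best[group] = max(best[group], score)
        elif len(best) < max_groups:
            best[group] = score
        else:
            del best[min(best, key=best.get)]  # evict the weakest group
            best[group] = score
    return best


def second_pass(hits, score_of, group_of, groups, group_limit):
    """Steps 3-3.1.1: fill one bounded min-heap of (score, doc) per top group."""
    queues = {group: [] for group in groups}
    for doc in hits:
        score = score_of(doc)
        queue = queues.get(group_of(doc))  # group resolved for *every* hit
        if queue is None:
            continue
        heapq.heappush(queue, (score, doc))
        if len(queue) > group_limit:
            heapq.heappop(queue)  # drop the lowest-scoring document
    return queues


def grouped_search(hits, score_of, group_of, max_groups, group_limit):
    best = first_pass(hits, score_of, group_of, max_groups)
    queues = second_pass(hits, score_of, group_of, best, group_limit)
    ranked = sorted(best, key=best.get, reverse=True)  # step 4
    return [(g, [doc for _, doc in sorted(queues[g], reverse=True)]) for g in ranked]


# Toy data: doc id -> score and doc id -> edition (the group field).
SCORES = {0: 0.9, 1: 0.2, 2: 0.8, 3: 0.5, 4: 0.7, 5: 0.1, 6: 0.6, 7: 0.3}
EDITIONS = {0: "e1", 1: "e2", 2: "e1", 3: "e3", 4: "e2", 5: "e3", 6: "e1", 7: "e4"}
resolved = []  # records every group-value lookup

def edition_of(doc):
    resolved.append(doc)
    return EDITIONS[doc]

result = grouped_search(list(SCORES), SCORES.get, edition_of, max_groups=2, group_limit=2)
print(result)         # [('e1', [0, 2]), ('e2', [4, 1])]
print(len(resolved))  # 13: 5 lookups in pass 1 + 8 in pass 2
```

With this toy data the first pass resolves only 5 group values because non-competitive hits are skipped, while the second pass resolves all 8.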
Observation 1: Hits are iterated twice. This is hard to avoid if we need more than 1 entry in each group. An alternative would be to keep track of all groups until all the hits have been iterated, but this would be extremely memory-costly with high-cardinality fields.
Observation 2: In step 3.1, score and group resolving is performed for all hits. It is possible to use the same logic as step 2.1, where the group is only resolved if the score is competitive.

Attempt 1: Delayed group resolving
The idea in observation 2 has been implemented as a kludge-hacky-proof-of-concept. Code is available in the group_4_10 branch on GitHub for those who like pain.
When the hits are iterated the second time, all scores are resolved but only the group values for the documents with competitive scores are resolved. So how well does it work?
Observation: Optimized (aka lazy group value resolving) grouping is a bit slower than vanilla Solr grouping for some result sets, probably the ones where most of the group values have to be resolved. For other result sets there is a clear win.
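The delayed resolving in the second pass can be sketched under the same kind of toy assumptions: dictionary-backed scores and editions, and an illustrative `lazy_second_pass` function that is not the code on the branch. A hit can only enter some group queue if its score reaches the lowest entry score across all top-X queues; while any queue still has room, that threshold is minus infinity, so early hits are always resolved:

```python
import heapq

def lazy_second_pass(hits, score_of, group_of, groups, group_limit):
    """Second pass that resolves a group value only for competitive scores."""
    queues = {group: [] for group in groups}
    if not queues:
        return queues

    def entry_threshold():
        # Lowest score that could still enter *some* queue;
        # minus infinity while any queue has room left.
        return min(q[0][0] if len(q) >= group_limit else float("-inf")
                   for q in queues.values())

    threshold = float("-inf")
    for doc in hits:
        score = score_of(doc)            # every score is still calculated
        if score < threshold:
            continue                     # cannot enter any queue: skip the lookup
        queue = queues.get(group_of(doc))
        if queue is None:
            continue                     # not one of the top-X groups
        heapq.heappush(queue, (score, doc))
        if len(queue) > group_limit:
            heapq.heappop(queue)
        threshold = entry_threshold()    # O(X) per update; fine for a sketch
    return queues


# Toy data; e1 and e2 are assumed to be the top groups found by the first pass.
SCORES = {0: 0.9, 1: 0.2, 2: 0.8, 3: 0.5, 4: 0.7, 5: 0.1, 6: 0.6, 7: 0.3}
EDITIONS = {0: "e1", 1: "e2", 2: "e1", 3: "e3", 4: "e2", 5: "e3", 6: "e1", 7: "e4"}
resolved = []

def edition_of(doc):
    resolved.append(doc)
    return EDITIONS[doc]

queues = lazy_second_pass(list(SCORES), SCORES.get, edition_of, ["e1", "e2"], 2)
ranked = {g: [doc for _, doc in sorted(q, reverse=True)] for g, q in queues.items()}
print(ranked)         # {'e1': [0, 2], 'e2': [4, 1]}
print(len(resolved))  # 7 lookups instead of 8
```

Here 7 of the 8 hits still need their edition resolved, because the threshold stays at minus infinity until every queue is full. That is consistent with the observation above: when most group values must be resolved anyway, the extra bookkeeping buys little. The skip uses a strict `<`, so a hit that ties the lowest score is handled the same way as in the vanilla pass.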
It should be possible to optimize a bit more and bring the overhead of the worst-case optimized groupings down to near-zero. However, since there are so few best-case result sets and since the win is only about a third of the response time, I do not see this optimization attempt as being worth the effort.

Idea: A new level of lazy
Going back to the algorithm for grouping we can see that “resolving the value” occurs multiple times. But what does it mean?
With DocValued terms, this is really a two-step process: The DocValue ordinal is requested for a given docID (blazingly fast) and the ordinal is used to retrieve the term (fast) in the form of a BytesRef. You already know where this is going, don’t you?
Millions of “fast” lookups add up to slow, and we don’t really need the terms as such, at least not before we have to deliver the final result to the user. What we need is a unique identifier for each group value, and the ordinal is exactly that.
But wait. Ordinals are not comparable across segments! We need to map the segment ordinals to a global structure. Luckily this is exactly what happens when doing faceting with facet.method=fc, so we can just scrounge the code from there.
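To illustrate why segment ordinals need mapping, here is a simplified Python sketch of building a segment-to-global ordinal table. It assumes each segment exposes its sorted, de-duplicated term dictionary as a list of bytes. Lucene's real machinery (the code behind facet.method=fc) merges term enumerations rather than building a Python set, so treat `build_ordinal_map` as a hypothetical stand-in:

```python
from bisect import bisect_left

def build_ordinal_map(segment_terms):
    """Map every segment ordinal to a global ordinal.

    segment_terms[i] is segment i's sorted, de-duplicated term dictionary,
    so a segment ordinal is simply an index into that list.
    """
    global_terms = sorted(set().union(*segment_terms))
    seg_to_global = [[bisect_left(global_terms, term) for term in terms]
                     for terms in segment_terms]
    return global_terms, seg_to_global


segments = [
    [b"ed-001", b"ed-007"],             # segment 0
    [b"ed-003", b"ed-005", b"ed-007"],  # segment 1
]
global_terms, seg_to_global = build_ordinal_map(segments)
print(seg_to_global)  # [[0, 3], [1, 2, 3]]
# Segment ordinal 1 means b"ed-007" in segment 0 but b"ed-005" in segment 1;
# after mapping, b"ed-007" is global ordinal 3 in both segments.
```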
With this in mind, the algorithm becomes
1. The hits are calculated
2. A priority queue is used to find the top-X groups with the highest scores
2.1. For each hit, calculate its score
2.1.1. If the score is > the lowest score in the queue, resolve the group value ordinal and update the priority queue
3. For each of the top-X groups, a priority queue is created and filled with document IDs
3.1. For each hit, resolve its group value segment-ordinal and convert that to a global ordinal
3.1.1. If the group value ordinal matches one of the top-X groups, update that group’s queue
3.1.1.1. Updating the queue might involve resolving the document score or resolving multiple field value ordinals for the document, depending on in-group sorting
4. Iterate the top-X groups and resolve the terms from the group value ordinals as well as the full documents
Note how the logic is reversed for step 3.1, prioritizing value ordinal resolving over score calculation. Experience from the facet code suggests that ordinal lookup is faster than score calculation.
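A toy Python version of the ordinal-based algorithm might look as follows. Everything here is hypothetical and simplified: segments are dicts holding a term dictionary, a segment ordinal per document and a precomputed score per document (so "calculating" a score is just a list lookup, whereas a real scorer is far costlier), and global ordinals are built with a set-and-bisect shortcut standing in for Lucene's mapping. The point is the order of operations: ordinals are compared throughout, the score is only fetched for hits in a top group, and terms are resolved once, at the end:

```python
import heapq
from bisect import bisect_left

def global_ordinals(segments):
    """Hypothetical stand-in for Lucene's segment -> global ordinal mapping."""
    terms = sorted(set().union(*(seg["terms"] for seg in segments)))
    maps = [[bisect_left(terms, t) for t in seg["terms"]] for seg in segments]
    return terms, maps


def grouped_by_ordinal(segments, max_groups, group_limit):
    terms, maps = global_ordinals(segments)

    # Pass 1 (steps 2-2.1.1): group keys are global ordinals, never terms.
    best = {}
    for seg, to_global in zip(segments, maps):
        for doc, seg_ord in enumerate(seg["doc_ords"]):
            score = seg["scores"][doc]
            if len(best) == max_groups and score <= min(best.values()):
                continue  # not competitive: no ordinal mapping needed
            group = to_global[seg_ord]
            if group in best:
                best[group] = max(best[group], score)
            elif len(best) < max_groups:
                best[group] = score
            else:
                del best[min(best, key=best.get)]
                best[group] = score

    # Pass 2 (steps 3-3.1.1.1): ordinal first, score only for top-group hits.
    queues = {group: [] for group in best}
    for seg_no, (seg, to_global) in enumerate(zip(segments, maps)):
        for doc, seg_ord in enumerate(seg["doc_ords"]):
            queue = queues.get(to_global[seg_ord])
            if queue is None:
                continue  # score never needed for this hit
            heapq.heappush(queue, (seg["scores"][doc], seg_no, doc))
            if len(queue) > group_limit:
                heapq.heappop(queue)

    # Step 4: only now are global ordinals turned back into terms.
    ranked = sorted(best, key=best.get, reverse=True)
    return [(terms[g], [(s, d) for _, s, d in sorted(queues[g], reverse=True)])
            for g in ranked]


segments = [
    {"terms": [b"ed-001", b"ed-007"], "doc_ords": [0, 1, 0], "scores": [0.9, 0.4, 0.8]},
    {"terms": [b"ed-003", b"ed-007"], "doc_ords": [1, 0, 1], "scores": [0.7, 0.5, 0.2]},
]
result = grouped_by_ordinal(segments, max_groups=2, group_limit=2)
print(result)  # [(b'ed-001', [(0, 0), (0, 2)]), (b'ed-007', [(1, 0), (0, 1)])]
```

Documents are identified as (segment number, segment-local doc ID) pairs, since local IDs are only unique within a segment.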
This idea has not been implemented yet. Hopefully it will be done Real Soon Now, but no promises.
Ariadne hits its 20th birthday, and its 75th issue.
Back in 1994 the UK Electronic Libraries Programme (eLib) was set up by the JISC, paid for by the UK's funding councils. One of the many projects funded by eLib was an experimental magazine that could help document the changes under way and give the researchers working on eLib projects a means to communicate with one another and their user communities. That magazine was called Ariadne. Originally produced in both print and web versions, it outlived the project that gave birth to it. We are now at the point where we can celebrate 20 years of the web version of Ariadne.

Editorial: Happy 20th Birthday Ariadne! Issue 75, published Sun, 01/17/2016. http://www.lboro.ac.uk/issue75/editorial
This may sound radical, but the fact is that FRBR does define some subtypes. They don't appear in the three high-level diagrams, so it isn't surprising that many people aren't aware of them. They are present, however, in the attributes. Here is the list of attributes for FRBR work:
title of the work
form of work
date of the work
other distinguishing characteristic
context for the work
medium of performance (musical work)
numeric designation (musical work)
key (musical work)
coordinates (cartographic work)
equinox (cartographic work)

I've marked those that belong to subtypes of work by naming the subtype in parentheses. There are two: musical work and cartographic work. I would also suggest that "intended termination" could be considered a subtype of "continuing resource", but this is subtle and possibly debatable.
Other subtypes in FRBR are:
Expression: serial, musical notation, recorded sound, cartographic object, remote sensing image, graphic or projected image
Manifestation: printed book, hand-printed book, serial, sound recording, image, microform, visual projection, electronic resource, remote access electronic resource

These are the subtypes that are present in FRBR today, but because sub-typing probably was not fully explored, there are likely to be others.
Object-oriented design was a response to the need to be able to extend a data model without breaking what is there. Adding a subtype should not interfere with the top-level type nor with other subtypes. It's a tricky act of design, but when executed well it allows you to satisfy the special needs that arise in the community while maintaining compatibility of the data.
Since we seem to respond well to pictures, let me provide this idea in pictures, keeping in mind that these are simple examples just to get the idea across.
The above picture models what is in FRBR today, although using the inheritance capability of OO rather than the E-R model where inheritance is not possible. Both musical work and cartographic work have all of the attributes of work, plus their own special attributes.
If it becomes necessary to add other attributes that are specific to a single type, then another sub-type is added. This new subtype does not interfere with any code that is making use of the elements of the super-type "work". It also does not alter what the music and maps librarians must be concerned with, since they are in their own "boxes." As an example, the audio-visual community did an analysis of BIBFRAME and concluded, among other things, that the placement of duration, sound content and color content in the BIBFRAME Instance entity would not serve their needs; instead, they need those elements at the work level.*
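The picture translates directly into classes. Here is a minimal Python sketch using dataclasses, with attribute names paraphrased from the FRBR list above. `MovingImageWork` and its fields are hypothetical, loosely modelled on the A-V community's wish to have duration, sound content and color content at the work level:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Work:
    title: str
    form: Optional[str] = None
    date: Optional[str] = None
    other_distinguishing_characteristic: Optional[str] = None
    context: Optional[str] = None

@dataclass
class MusicalWork(Work):
    medium_of_performance: Optional[str] = None
    numeric_designation: Optional[str] = None
    key: Optional[str] = None

@dataclass
class CartographicWork(Work):
    coordinates: Optional[str] = None
    equinox: Optional[str] = None

@dataclass
class MovingImageWork(Work):  # hypothetical subtype, not in FRBR
    duration: Optional[str] = None
    sound_content: Optional[str] = None
    color_content: Optional[str] = None

def citation(work: Work) -> str:
    """Written against Work only, so it accepts every subtype, old or new."""
    return f"{work.title} ({work.date or 'n.d.'})"

works = [
    MusicalWork("Symphony No. 5", date="1808", key="C minor", numeric_designation="Op. 67"),
    CartographicWork("Example map", coordinates="(placeholder)"),
    MovingImageWork("Example film", duration="(placeholder)", color_content="black and white"),
]
for work in works:
    print(type(work).__name__, citation(work))
```

Because `citation` only relies on attributes of `Work`, adding `MovingImageWork` required no change to it or to the musical and cartographic classes.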
This just shows work, and I don't know how/if it could or should be applied to the entire WEMI thread. It's possible that an analysis of this nature would lead to a different view of the bibliographic entities. However, using types and sub-types, or classes and sub-classes (which would be the common solution in RDF) would be far superior to the E-R model of FRBR. If you've read my writings on FRBR you may know that I consider FRBR to be locked into an out-of-date technology, one that was already on the wane by 1990. Object-oriented modeling, which has long replaced E-R modeling, is now being eclipsed by RDF, but there would be no harm in making the step to OO, at least in our thinking, so that we can break out of what I think is a model so rigid that it is doomed to fail.
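In RDF the same idea is expressed with `rdfs:subClassOf`. The sketch below uses plain Python tuples rather than an RDF library, with hypothetical `ex:` identifiers, and applies the two RDFS entailment rules that matter here (subclass transitivity and type propagation) so that a query for `ex:Work` also finds instances of its subclasses:

```python
SUBCLASS_OF = "rdfs:subClassOf"
TYPE = "rdf:type"

# Hypothetical ex: identifiers; the class names mirror the FRBR subtypes.
triples = {
    ("ex:MusicalWork", SUBCLASS_OF, "ex:Work"),
    ("ex:CartographicWork", SUBCLASS_OF, "ex:Work"),
    ("ex:symphony5", TYPE, "ex:MusicalWork"),
    ("ex:map1", TYPE, "ex:CartographicWork"),
}

def infer(triples):
    """Apply RDFS rules rdfs11 (subclass transitivity) and rdfs9 (type propagation)."""
    graph = set(triples)
    while True:
        subclass = [(s, o) for s, p, o in graph if p == SUBCLASS_OF]
        new = {(a, SUBCLASS_OF, d) for a, b in subclass for c, d in subclass if b == c}
        new |= {(x, TYPE, sup) for x, p, cls in graph if p == TYPE
                for sub, sup in subclass if cls == sub}
        if new <= graph:
            return graph  # nothing new can be derived
        graph |= new

def instances_of(cls, graph):
    return {s for s, p, o in graph if p == TYPE and o == cls}

inferred = infer(triples)
print(sorted(instances_of("ex:Work", inferred)))  # ['ex:map1', 'ex:symphony5']
```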
*This is an over-simplification of what the A-V community suggested, modified for my purposes here. However, what they do suggest would be served by a more flexible inheritance model than the model currently used in BIBFRAME.
Marieke Guy, Philip Hunter, John Kirriemuir, Jon Knight and Richard Waller look back at how Ariadne began 20 years ago as part of the UK Electronic Libraries Programme (eLib), how some of the other eLib projects influenced the web we have today and what changes have come, and may yet come, to affect how digital libraries work.
Ariadne is 20 years old this week and some members of the current editorial board thought it might be useful to look back at how it came to be, how digital library offerings have changed over the years, and maybe also peer into the near future. To do this, we’ve enlisted the help of several of the past editors of Ariadne who have marshalled their memories and crystal balls.

FIGIT, eLib, Ariadne and the Future. Authors: Marieke Guy, Philip Hunter, John Kirriemuir, Jon Knight, Richard Waller. Issue 75, published Sun, 01/17/2016. http://www.lboro.ac.uk/issue75/editorsreview
There have been a number of workshops and presentations that I’ve seen floating around that talk about ways of using MarcEdit and OpenRefine together when doing record editing. OpenRefine, for folks who might not be familiar, used to be known as Google Refine, and is a handy tool for working with messy data. While there is a lot of potential overlap between the types of edits available in MarcEdit and OpenRefine, OpenRefine's strength is that it lets you access your data via a tabular interface to easily find variations in metadata, relationships, and patterns.
For most folks working with MarcEdit and OpenRefine together, the biggest challenge is moving the data back and forth. MARC binary data isn’t supported by OpenRefine, and MarcEdit’s mnemonic format isn’t well suited to OpenRefine’s import options either. And once the data has been put into OpenRefine, getting it back out and turned into MARC can be difficult for first-time users as well.
Because I’m a firm believer that users should use the tools they are most comfortable with, I’ve been talking to a few OpenRefine users about how I could make moving data between the two systems easier. To that end, I’ll be adding to MarcEdit a toolset that will facilitate the export and import of MARC (and MarcEdit’s mnemonic) data into formats that OpenRefine can parse and easily generate. I’ve implemented this functionality in two places: one as a standalone application found on the Main MarcEdit Window, and one as part of the MarcEditor, which will automatically convert or import data directly into the MarcEditor Window.
Exporting Data from MarcEdit
As noted above, there will be two methods of exporting data from MarcEdit into one of two formats for import into OpenRefine. Presently, MarcEdit supports generating either JSON or tab-delimited files. These are two formats that OpenRefine can import to create a new project.
If I have a MARC file and I want to export it for use in OpenRefine, I would use the following steps:
- Open MarcEdit
- Select Tools/OpenRefine/Export from the menu
- Enter my Source File (either a marc or mnemonic file)
- Enter my Save File (MarcEdit supports export in json or tsv, i.e. tab delimited)
- Select Process
This will generate a file that can be used for importing into OpenRefine. A couple of notes about that process. When importing via tab-delimited format, you will want to unselect the options that do number interpretation. I’d also uncheck the option to turn blanks into nulls and make sure the option that retains blank rows is selected. These settings are useful on export and reimport into MarcEdit. When using JSON as the file format, you will want to make sure after import to order your columns as TAG, Indicators, Content. I’ve found OpenRefine will mix up this order, even though the JSON data is structured in this order.
Once you’ve made your changes to the data, select the export option in OpenRefine and choose the tab-delimited export. This is the file format MarcEdit can turn back into either MARC or the mnemonic file format. Please note: until you are comfortable with the process, I’d recommend always going back to the mnemonic file format to ensure that the import worked as you expected.
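For readers curious about what such a round trip involves, here is a rough Python sketch that converts mnemonic records to three-column TSV (TAG, Indicators, Content) and back. It is not MarcEdit's implementation: the column layout, the use of blank rows as record separators, and the rule that tags from 010 upward carry two indicator characters are assumptions made for illustration. It does show why the OpenRefine settings matter: if blank rows were dropped or blanks turned into nulls, record boundaries would be lost, and if number interpretation turned a control number like `00123` into `123` or indicators `00` into `0`, the rows would no longer map back to the same records.

```python
import csv
import io

def mnemonic_to_rows(text):
    """Split mnemonic lines ('=TAG  data') into (TAG, Indicators, Content) rows."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            rows.append(("", "", ""))  # assumed record separator: a blank row
            continue
        tag, data = line[1:4], line[6:]
        if tag == "LDR" or tag < "010":  # leader and control fields: no indicators
            rows.append((tag, "", data))
        else:
            rows.append((tag, data[:2], data[2:]))
    return rows

def rows_to_mnemonic(rows):
    return "\n".join(f"={tag}  {ind}{content}" if tag else ""
                     for tag, ind, content in rows)

def to_tsv(rows):
    out = io.StringIO()
    csv.writer(out, delimiter="\t", lineterminator="\n").writerows(rows)
    return out.getvalue()

def from_tsv(text):
    return [tuple(row) for row in csv.reader(io.StringIO(text), delimiter="\t")]

SAMPLE = "\n".join([
    "=LDR  00000nam  2200000   4500",
    "=001  00123",
    "=245  10$aSample title /$cSample author.",
    "",
    "=LDR  00000nam  2200000   4500",
    "=001  00456",
    r"=650  \0$aSample subject.",
])

tsv = to_tsv(mnemonic_to_rows(SAMPLE))
print(tsv)
assert rows_to_mnemonic(from_tsv(tsv)) == SAMPLE  # lossless round trip
```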
And that’s it. I’ve recorded a video on YouTube walking through these steps – you can find it here:
This of course just shows how to move data between the two systems. If you want to learn more about how to work with the data once it’s in OpenRefine, I’d recommend one of the many excellent workshops that I’ve been seeing put on at conferences and via webinars by a wide range of talented metadata librarians.
The VIVO Committers Group. The VIVO project now has a committers group!
Emma Tonkin discusses how the words we use, and where we use them, change over time, and how this can cause issues for digital preservation.
'Now let's take this parsnip in.'
'Parsnip, coffee. Perrin, Wellbourne. What does it matter what we call things?'
– David Nobbs, The Fall And Rise of Reginald Perrin
Lost Words, Lost Worlds. Author: Emma Tonkin. Issue 75, published Sat, 01/16/2016. http://www.lboro.ac.uk/issue75/tonkin
A few of us at Tech Connect participated in the #1Lib1Ref campaign that’s running from January 15th to the 23rd. What’s #1Lib1Ref? It’s a campaign to encourage librarians to get involved with improving Wikipedia, specifically by citation chasing (one of my favorite pastimes!). From the project’s description:
Imagine a World where Every Librarian Added One More Reference to Wikipedia.
Wikipedia is a first stop for researchers: let’s make it better! Your goal today is to add one reference to Wikipedia! Any citation to a reliable source is a benefit to Wikipedia readers worldwide. When you add the reference to the article, make sure to include the hashtag #1Lib1Ref in the edit summary so that we can track participation.
Below, we each describe our experiences editing Wikipedia. Did you participate in #1Lib1Ref, too? Let us know in the comments or join the conversation on Twitter!
I recorded a short screencast of me adding a citation to the Darbhanga article.
— Eric Phetteplace
I used the Citation Hunt tool to find an article that needed a citation. I selected the second one I found, which was about urinary tract infections in space missions. That is very much up my alley. I discovered after a quick Google search that the paragraph in question was plagiarized from a book on Google Books! After a hunt through the Wikipedia policy on quotations, I decided to rewrite the paragraph to paraphrase the quote, and then added my citation. As is usual with plagiarism, the flow was wrong, since there was a reference to a theme in the previous paragraph of the book that wasn’t present in the Wikipedia article, so I chose to remove that entirely. The Wikipedia Citation Tool for Google Books was very helpful in automatically generating an acceptable citation for the appropriate page. Here’s my shiny new paragraph, complete with citation: https://en.wikipedia.org/wiki/Astronautical_hygiene#Microbial_hazards_in_space.
— Margaret Heller
I edited the “Library Facilities” section of the “University of Maryland Baltimore” article in Wikipedia. There was an outdated link in the existing citation, and I also wanted to add two additional sentences and citations. You can see how I went about doing this in my screen recording below. I used the “edit source” option to get the source first in the Text Editor and then made all the changes I wanted in advance. After that, I copy/pasted the changes I wanted from my text file to the Wikipedia page I was editing. Then, I previewed and saved the page. You can see that I also had a typo in my text and had to fix that again to make the citation display correctly. So I had to edit the article more than once. After my recording, I noticed another typo in there, which I fixed using the “edit” option. The “edit” option is much easier to use than the “edit source” option for those who are not familiar with editing Wiki pages. It offers a menu bar on the top with several convenient options.
The recording of editing a Wikipedia article:
— Bohyun Kim
It has been so long since I’ve edited anything on Wikipedia that I had to make a new account and read the “how to add a reference” link, which is to say, if I could do it in 30 minutes while on vacation, anyone can. There is a WYSIWYG option for the editing interface, but I learned to do all this in plain text and it’s still the easiest way for me to edit. See the screenshot below for a view of the HTML editor.
I wondered what entry I would find to add a citation to…there have been so many that I’d come across but now I was drawing a total blank. Happily, the 1Lib1Ref campaign gave some suggestions, including “Provinces of Afghanistan.” Since this is my fatherland, I thought it would be a good service to dive into. Many of Afghanistan’s citations are hard to provide for a multitude of reasons. A lot of our history has been an oral tradition. Also, not insignificantly, Afghanistan has been in conflict for a very long time, with much of its history captured from the lens of Great Game participants like England or Russia. Primary sources from the 20th century are difficult to come by because of the state of war from 1979 onwards and there are not many digitization efforts underway to capture what there is available (shout out to NYU and the Afghanistan Digital Library project).
Once I found a source that I thought would be an appropriate reference for a statement on the topography of Uruzgan Province, I did need to edit the sentence to remove the numeric values that had been written, since I could not find a source that quantified the area. It’s not a precise entry, to be honest, but it does give the opportunity to link to a good map with other opportunities to find additional information related to Afghanistan’s agriculture. I also wanted to choose something relatively uncontroversial, like geographical features rather than historical or person-based, for this particular campaign.
— Yasmeen Shorish
Villanova Library Technology Blog: Martin Luther King, Jr. at Villanova University, January 20, 1965
In commemoration of Martin Luther King Day (January 18, 2016), the Rev. Dennis Gallagher, OSA, PhD, University archivist, collaborated with Joanne Quinn, graphic designer and Communication and Service Promotion team leader, to create this exhibit, “Martin Luther King, Jr., at Villanova University, January 20, 1965.” The exhibit fills two cases and features materials from the University Archives which are located in Falvey Memorial Library. All materials were selected by Father Gallagher; he also wrote the captions that accompany the objects. Quinn created the graphics and arranged the exhibit.
The first case displays two large black and white photographs from Martin Luther King’s visit to Villanova on January 20, 1965, and a typewritten copy of the speech he delivered that day. Father Gallagher describes the first photograph thus, “Reverend Doctor Martin Luther King, Jr., civil rights leader and Nobel Prize winner gives an address at Villanova on January 20, 1965.” The second photograph shows the Reverend John A. Klekotka, OSA, University president, Dr. King and Thomas J. Furst, student body president. In the center is the manuscript of King’s talk, “Challenges of the New Age.” This copy of the speech was donated to the Villanova University Archives by Thomas Bruderle in 2015. Dr. King’s speech was part of the Villanova Forum Series.
Three objects fill the second case: a 1965 Belle Air yearbook, a Villanova pennant containing an image of the Wildcat, and a January 8, 1965 issue of the Villanovan newspaper. The yearbook is opened to display four photographs of Dr. King as he gave his speech. The bound volume of the Villanovan (vol. 40, no. 11, p. 1, Jan. 8, 1965) shows the feature story, “Forum Features Dr. Martin Luther King,” and his photograph.
If this small exhibit whets your curiosity, Falvey has a multitude of books for you. The sources listed below represent just a small part of the library’s holdings.
Martin Luther King, Jr.: The Making of a Mind (1982) John J. Ansbro.
Martin Luther King (2010) Godfrey Hodgson.
The Speeches of Martin Luther King (Video) (1988) Martin Luther King.
My Life with Martin Luther King, Jr. (1994) Coretta Scott King (Martin Luther King’s wife).
The Autobiography of Martin Luther King, Jr. (1998) Martin Luther King.
King: A Biography (2012) David L. Lewis.
Villanova Library Technology Blog: Library Trials to Routledge Handbooks Online and Taylor & Francis eBooks
From January 11 to March 11, the library will be running a trial of two major e-book platforms from Taylor & Francis: Routledge Handbooks Online, and Taylor & Francis eBooks (which contains mostly Routledge titles). Both collections are strong in a wide range of humanities and social science disciplines.
Routledge Handbooks Online contains collections of scholarly review articles on commonly researched topics. More than 600 volumes (about 18,000 chapters) are included. The articles are useful for getting a general overview of a topic, and make good jumping-off points for further investigation. Each chapter can be viewed in HTML or downloaded as a PDF.
Taylor & Francis eBooks contains more than 50,000 ebooks—both single-author texts and edited collections. Many of these are recent publications, but the collection contains works spanning the last century as well. The majority of them are DRM-free, with no time limits or print limitations, though some do have restrictions.
Your feedback about these resources is valuable to us. Please send your comments to Nik Fogle at firstname.lastname@example.org. We’d particularly like to know what you found useful about them, what was lacking, and to what extent you would use this material in the future. Please be sure to tell us which of the two platforms your comments are about.
Thanks to Fiona Chambers, a student leader on the Martin Luther King Jr. Day of Service Committee, library staff did their part to draw attention to the MLK Day of Service by wearing t-shirts provided by the committee.
The Library also served as one of the MLK Day of Service Coat Drive locations on campus after being contacted by Rebecca Lin, another student leader on the MLK Day of Service Committee.
The Library will be closed on Monday, Jan. 18, to honor Dr. Martin Luther King Jr. and to allow library staff and students to participate in MLK Day of Service events.
Page via Wayback Machine

In those early days, the Web may not have had cat videos, but that didn't mean it lacked cats. Cats colonized the Web very early. Among the leaders were the twins Nijinsky and Pavlova, sadly now deceased. 21 years ago this month, on Jan 11th 1995, the late Mark Weiser created Nijinsky and Pavlova's Web page. A year later they were featured in the book Internet for Cats by Judy Heim. On Dec 1st 1998 the Internet Archive's crawler visited the page, then 1,421 days old and thus a veteran among Web pages. This was the first of what would be 39 captures of the page over the next decade, the last being on May 11th 2008.
The page achieved a Methuselah-like longevity of at least 4,870 days, or over 13 years. Fortunately, the last capture shows that Mark never updated the images. Nijinsky and Pavlova remain immortalized in all their kitten-cuteness and similarity. Nijinsky, who inherited all the brains and the energy of the twins, remained svelte and elegant to the end of his days, but in later life his lethargic sister became full-figured. Although we cannot know the precise date the web page vanished, it appears that both cats outlived their page by a year or more.
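The day counts above are easy to check with Python's `datetime` module; they come out exactly if both the first and the last day are counted:

```python
from datetime import date

created = date(1995, 1, 11)        # Mark Weiser creates the page
first_capture = date(1998, 12, 1)  # first Internet Archive capture
last_capture = date(2008, 5, 11)   # last Internet Archive capture

def age_in_days(start, end):
    # Count both the start day and the end day.
    return (end - start).days + 1

print(age_in_days(created, first_capture))  # 1421
print(age_in_days(created, last_capture))   # 4870
```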
Page via oldweb.today

Thanks to Ilya Kreymer's oldweb.today, we can view the page using Mosaic 2.2, a contemporary browser. Note the differences in the background and the fonts, and the fact that all the resources oldweb.today loaded came from the Internet Archive. This is expected with pages as old as Nijinsky and Pavlova's. In those days the Internet Archive was pretty much alone in collecting and preserving the Web.
Over the last two decades the Internet Archive has become an essential resource. Please support their work by making a donation.
As January brings a chill, sit by the fireside and read a newly digitized work! This week brings a host of new titles including:
American Catholic Historical Society
Records of the ACHS (11 articles added)
I.C.B.U. Journal (33 issues added)
Contributions from Augustinian Theologians and Scholars
Saint Augustine : monk, priest, bishop / by Luc Verheijen, O.S.A. (campus access only)
Dime Novel and Popular Literature
Old Sleuth Weekly (9 issues added)
Chicago Ledger (1 issue added)
New York Weekly (2 issues added)
New York Saturday Journal (1 issue added)
Frederick William Walter Papers (1 item added, set completed)
Joseph McGarrity Collection
The Shan Van Vocht (5 issues added)
Villanova Digital Collection
Falvey Memorial Library
Daily Doodles (2013: 29 doodles added)
Journal of Web Librarianship: The use of microblogs and social networking services: A comparison between academic libraries of the United States and China
The Izmir Institute of Technology Library team.
Martin Luther King Jr. gave his famous “I have a dream” speech on August 28, 1963 on the steps of the Lincoln Memorial as part of the March on Washington for Jobs and Freedom. To this day the speech is a key piece of oration for the Civil Rights movement. Like every great orator, MLK had some prepared speech notes for his address — notes that actually did not include the famous “I have a dream” section (which was spun on the spot from the heart) — but he did not keep them. What happened to those notes, you ask?
They came into the possession of Villanova alum and College Basketball Hall of Famer George Raveling, class of 1960.
George Raveling, 10th on Villanova’s all-time rebounding list and the second ever black basketball player at Villanova, was inducted into the College Basketball Hall of Fame in 2013 and the Basketball Hall of Fame in 2015. He was not only a talented college basketball player, but also went on to be an inspiring coach. He got his coaching start as a part-time assistant to Villanova coach Jack Kraft and later went on to coach full-time for Washington State, the University of Iowa, and the University of Southern California. Since retiring from coaching, Raveling has worked as Director for International Basketball for Nike.
So how did Raveling become the proud keeper of MLK’s speech notes? Raveling and his good friend Warren Wilson were only young men when they decided to go to Washington D.C. for the march in 1963. They were approached by one of the march’s organizers and asked to provide security–and they agreed. Raveling wound up just a few feet from MLK on the steps of the Lincoln Memorial. He was enthralled by MLK and his message of equality and civil rights. After the speech concluded and the crowd on the steps moved to disperse, he simply asked King, “Can I have that?”
And so they became his.
The notes have since been museum-treated and framed and are stored in a vault for safe-keeping. Raveling does not want to ever sell them, but is interested in their public display; he is currently in talks with various educational and museum groups.
You can read the full Sports Illustrated article on George Raveling and the MLK speech notes here. USA Today also covered the story. To learn more about Raveling’s induction to the College Basketball Hall of Fame, check out this article via VU Hoops.
Article by Michelle Callaghan, graduate assistant on the Communication and Service Promotion team. She is currently pursuing her MA in English at Villanova University.
From Morton’s challenging meditation on scale:
We need to get out of the persuasion business and start getting into the magic business, or the catalysis business, or the magnetizing business, or whatever you want to call it. Using reason isn’t wrong. But with objects this huge, this massively distributed, this counterintuitive, this transdimensional, it’s not enough simply to use art as candy coating on top of facts. We can’t just be in the PR business. Percy Shelley put it beautifully when he wrote, “We lack the creative faculty to imagine that which we know.” That was back in 1820, and it’s only gotten worse. Consider the heavy hydrocarbons that subtend the soil of the Lago Agrio oil field in Ecuador, a black fudge hyperobject that oozes into drinking water, with unknown and under-studied mutagenic and carcinogenic effects. We do not need to keep on parsing the data like Chevron, the defendants in the lawsuit on behalf of the people affected by the contaminated soil. Such parsing of data would be using the very same tactic as the gigantic corporation, the strategy of producing endless maps and graphs. (Morton, 2013, pp 181-182).
Morton, T. (2013). Hyperobjects: Philosophy and ecology after the end of the world. University of Minnesota Press.
Eric Hellman: Not using HTTPS on your website is like sending your users outside in just their underwear.
viewed from the escalator

This past weekend, I spent 3 full days talking to librarians, publishers, and library vendors about making the switch to HTTPS. The Library Freedom Project staffed a table in the exhibits at the American Library Association Midwinter meeting. We had the best location we could possibly wish for, and we (Alison Macrina, Nima Fatemi, Jennie Rose Halperin and myself) talked our voices hoarse with anyone interested in privacy in libraries, which seemed to be everyone. We had help from Jason Griffey and Andromeda Yelton (who were next to us, showing off the cutest computers in town for the "Measure the Future" project).
Badass librarians with framed @snowden tweet.

We had stickers, we had handouts. We had ACLU camera covers and 3D-printed logos. We had new business cards. We had a framed tweet from @Snowden praising @libraryfreedom and "Badass Librarians", who were invited to take selfies.
DHS fought to stop libraries from using privacy technology, but @LibraryFreedom beat them. Librarians are badass. https://t.co/5BzFywnJLn (Edward Snowden, @Snowden, October 11, 2015)

Apart from helping to raise awareness about internet privacy, talking to lots of real people can help hone a message. Some people didn't really get encryption, and a few were all "What??? Libraries don't use encrypted connections???" By the end of the first day, I had the message down to one sentence:
Not using HTTPS on your website is like sending your users outside in just their underwear.

Because if you don't use HTTPS, people can see everything, and though there's nothing really WRONG with not wearing clothes outside, we live in a society where wearing them is, by custom, the respectful thing. There are many excellent reasons to preserve our users' privacy, but many of them tend to highlight the needs of other people. The opposing viewpoint is often "Privacy is a thing of the past, just get over it" or "I don't have anything to hide, so why work hard so you can keep all your dirty secrets?" But most people don't think wearing clothes is a thing of the past; drawing a connection between encrypted connections and nice clothes just normalizes the normal.
We've previously used the analogy that HTTP is like sending postcards while HTTPS is like sending notes in envelopes. This is a harder analogy to use in a 30 second explainer because you have to make a second argument that websites shouldn't be sent on postcards.
We need to craft better slogans because there's a lot of anti-crypto noise trying to apply an odor of crime and terrorism to good privacy and security practices. The underwear argument is effective against that - I don't know anyone that isn't at least a bit creeped out by the "unclothing" done by the TSA's full body scanners.
No Pants Subway Ride 2015: cosmetic trierarchs CC BY-NC-ND by captin_nod
Maybe instead of green lock icons for HTTPS, browser software could display some sort of flesh-tone nudity icon for unencrypted HTTP connections. That might change user behavior rather quickly. I don't know about you but I never lose sleep over door locks, but I do have nightmares about going out without my pants!