
Harvard Library Innovation Lab: IIPC 2017 – Day One

planet code4lib - Thu, 2017-06-15 18:06


It's exciting to be back at IIPC this year to chat about web archives!


The conference kicked off on Wednesday, June 14, at 9:00 with coffee, snacks, and familiar faces from all parts of the world. Web archives bring us together physically!


So many people to meet. So many collaborators to greet!


Jane Winters and Nic Taylor welcomed us. It’s wonderful to converse and share in this space — grand, human, bold, warm, strong. Love the Senate House at University of London. Thank you so much for hosting us!

Leah Lievrouw, UCLA
Web history and the landscape of communication/media research

Leah told us that computers are viewed today as a medium — as human communication devices. This view is common now, but it hasn’t been for long. Computers as a medium was a very fringe idea even in the early 80s.

We walked through a history of communications to gain more understanding of computers as human communication devices and started with some history of information organization and sharing.

Paul Otlet pushed efforts forward to organize all of the world’s information in late 19th-century Belgium and France.

The Cold War Intellectuals by J. Light describes how networked information moved from the government and the military to the public.

And, how that networked information became interesting when it was push and pull — send an email and receive a response, or send a message on a UNIX terminal to another user and chat. Computers are social machines, not just calculating machines.

Leah took us through how the internet and early patterns of the web were formed by the time and the culture — in this case, the incredible activity of Stanford and Berkeley. The milieu of the Bay Area — bits and Boolean logic through psychedelics. Fred Turner’s From Counterculture to Cyberculture is a fantastic read on this scene.

Stewart Brand, Ted Nelson, the WELL online community, and so on.

We’re still talking about way before the web here. The idea of networked information was there, but we didn’t have a protocol (HTTP) or a language (HTML) being used (web browsers) at large scale (the web). Wired Cities by Dutton, Blumler, and Kraemer sounds like a fantastic read for understanding how mass wiring/communication made a massive internet/web possible!

The Computer as a Communication Device, described by J.C.R. Licklider and Bob Taylor, was a clear vision of the future — we’re still not at a place where computers understand us as humans; we’re still fairly rigid, with defined request and response patterns.

The web was designed to access and create docs — that’s it. Early search engines and browsers exchanged discrete documents — we thought about the web as discrete, linked documents.

Then, user generated content came along — wikis, blogs, tagging, social network sites. Now it’s easy for lots of folks to create content, and the network is even more powerful as a communication tool for many people!

The next big phase came with mobile — around the mid-2000s. More and more and more people!

Data subject (data cloud or data footprint) is an approach that has felt interesting recently at UCLA. Maybe it’s real-time “flows” rather than “stacks” of docs or content.

Technology as cultural material and material culture.


University of London is a fantastic space!


Jefferson Bailey, Internet Archive
Advancing access and interface for research use of web archives

Internet Archive is a massive archive! 32 Petabytes (with duplications)

And, they have search APIs!!

Holy smokes!!! Broad access to wayback without a URL!!!!!!!

IA has been working on a format called WAT. It’s about 20-25% the size of a WARC and contains just about everything (including titles, headers, and links) except the content. And it’s a JSON format!
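To make the appeal concrete, here is a toy sketch of working with WAT-style metadata. Real WAT files are WARC envelopes wrapping per-record JSON payloads; the flat list of JSON strings and the field names below (`url`, `title`, `links`) are hypothetical simplifications for illustration, not the actual WAT schema.

```python
import json

# Pretend these JSON payloads were already extracted from a WAT file.
# Field names here are simplified stand-ins for the real WAT layout.
records = [
    '{"url": "http://example.com/", "title": "Example", "links": ["http://example.com/a"]}',
    '{"url": "http://example.org/", "title": "Another page", "links": []}',
]

def outlinks(raw_records):
    """Collect (url, title, link-count) tuples without touching any page content."""
    out = []
    for raw in raw_records:
        meta = json.loads(raw)
        out.append((meta["url"], meta["title"], len(meta["links"])))
    return out

print(outlinks(records))
```

The point of the format is exactly this: you can analyze titles and link graphs at scale while moving around a fraction of the bytes a full WARC would require.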

Fun experiments when you have tons of web archives!!! and US Military powerpoints are two gems!


Digital Desolation
Tatjana Seitz

A story about a homepage can be generated from its layout elements — tables, fonts, and so on. Maybe the web counter and the alert box mark the page in time and can be used to understand the page!

Analysis of data capture cannot be purely technical; it has to be socio-technical.

Digital desolation is a term that describes abandoned sites on the web — sites that haven’t been restyled. Sites age over time. (Their wrinkles are frames and tables!! lol)

Old sites might not bubble to the top in today’s search engines — they’re likely at the long tail of what is returned. You have to work to find good old pages.


The team grabbing some morning coffee


Ralph Schroeder, Oxford Internet Institute
Web Archives and theories of the web

Ralph is looking at how information is used and pursued.

How do you seek information? Not many people ask this core question. Some interesting researcher (anyone know?) in Finland does, though. He sits down with folks and asks: “how do you think about getting information when you’re just sitting in your house? How does your mind seek information?”

Googlearchy — a few sites exist that dominate!

You can look down globally at which websites dominate the attention space. The idea that we’d all come together in one global culture hasn’t happened yet — instead, there’s been a slow crystallization of different clusters.

It used to be an anglicization of the web; now things may have moved toward South Asia — Angela Wu talks about this.

Some measurements show that Americans and Chinese devote their attention to about the same-sized bubble of websites — it might be that Americans are no more outward-looking than the Chinese are.

We need a combined quantitative and qualitative study of web attention — we don’t access the web by typing in a URL (unless you’re in the Internet Archive); we go to Google.

It’s hard to reason about the internet as a human right.
Maybe having reliable information about health could be construed as a civil right.
And unreliable, false information goes against human rights.

  London is a delightful host for post-conference wanderings


Oh, dang, it’s lunch already. It’s been a fever of web archiving!

We have coverage at this year’s IIPC! What a fantastic way to attend a conference — with the depth and breadth of much of the team!

Anastasia Aizman, Becky Cremona, Jack Cushman, Brett Johnson, Matt Phillips, and Ben Steinberg are in attendance this year.


Caroline Nyvang, Thomas Hvid Kromann & Eld Zierau
Continuing the web at large


The authors conducted a survey of 35 master’s theses from the University of Copenhagen and found 899 web references in total — 26.4 web refs on average, with a minimum of 0 and a maximum of 80.

About 80% of links in the theses were not dated or only loosely dated. Are URLs without dates reliable for citations?

Students are not consistent when they refer to web material, even if they followed well known style guides.

The speakers studied another corpus — 10 Danish academic monographs — and found similar variation in citations. Maybe we can work toward a good reference style?

The form of a suggested reference might be something like:


PWID: Persistent web identifier #WAWeek2017
(personal consideration: i feel perplexed with the idea of new identifiers)

— Raffaele Messuti (@atomotic) June 14, 2017

Where page is the content coverage — the thing the author is citing. Fantastic!

What if we were to put the content coverage in a fragment identifier (the stuff after the # in the address)? Maybe something like this: <timestamp>/<url>#<content coverage>
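That speculative <timestamp>/<url>#<content coverage> shape is easy to play with. The sketch below is not the PWID proposal itself — just an illustration of how such an identifier might be composed and taken apart, with percent-encoding so the coverage fragment survives spaces.

```python
from urllib.parse import quote, unquote

def make_ref(timestamp, url, coverage):
    """Build an archival reference of the form <timestamp>/<url>#<coverage>."""
    return f"{timestamp}/{url}#{quote(coverage)}"

def parse_ref(ref):
    """Split the reference back into (timestamp, url, coverage)."""
    head, _, frag = ref.partition("#")       # fragment comes after the first '#'
    timestamp, _, url = head.partition("/")  # timestamp ends at the first '/'
    return timestamp, url, unquote(frag)

ref = make_ref("20170614120000", "http://example.com/page", "second paragraph")
print(ref)
print(parse_ref(ref))
```

One nice property of this arrangement: because the timestamp is split off at the first slash, the URL itself can contain slashes without ambiguity.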


And totally unrelated, this fridge was spotted later that day on the streets of London. We need a fridge in LIL. Probably not worth shipping back though.


Some Author, some organization

The UK Web Archive has been actively grabbing things from the web since 2004.

Total collection of 400 TB of UK websites only, imposing a “territorial” boundary –
.uk, .scot, .cymru, etc.

Those TLDs are not everything, though — a work also qualifies if it is made available from a website with a UK domain name or if its creator is physically based in the UK.


Fantastic first day!! Post-conference toast (with a bday cheers!)!!

Recap, decompress, and keep the mind active for day two of IIPC!

The day was full of energy, ideas, and friendly folks sharing their most meaningful work. An absolute treat to be here and share our work! Two more days to soak up!


Shelley Gullikson: UXLibs III: Conference Thoughts

planet code4lib - Thu, 2017-06-15 17:24

(This was a difficult post to write and ended up being quite personal. You might just want my UXLibs III conference notes.)

I was really really looking forward to UXLibs III. I love the UXLibs conferences and this year, I was presenting with Kristin Meyer. Kristin and I wrote an article for WeaveUX last year and it was an absolutely amazing experience. We had never met and yet the partnership was so easy; we had similar ideas about deadlines and effort, and we had similar writing styles. With UXLibs III, we were able to work together again and would finally meet in person. Exciting!

And the conference was great. The content was absolutely and completely what I’m interested in. Meeting Kristin in person and presenting together was fabulous. The other people I met were really great. The people I already knew were completely lovely. Plus, there was ceilidh dancing!

And yet… coming home, I don’t feel as fired up as I have in previous years. Is my love affair with UXLibs over?

During the conference, I had a great conversation with Bernadette Carter from Birmingham City University about the Team Challenge. She was struck by how most of us wanted to fix all the problems identified in the research documents, even the things that weren’t our responsibility—like broken plugs. She loved that we all cared so much that we wanted to fix ALL THE THINGS. But we also talked about how, back in our own libraries, it can be incredibly frustrating when we can’t fix things outside of the library’s control.

I wonder if the implicit promise of the first UXLibs was that we were learning how to fix everything for our users. We just needed to observe behaviour, ask the right questions, read a number of love letters and break-up letters and we would understand what our users needed. Then it would just be a matter of designing solutions, taking them through a few iterations and voilà! Fixed!

But we can’t fix everything for our users—for any number of reasons—and that’s hard. But UXLibs is also now a community where we can talk about it being hard.

In Andy’s opening address (partly replicated here), he talked about his struggles with having UX work either ignored or undermined at his previous workplace. I didn’t take any notes during Andy’s talk, and I think that was because I was busy thinking about how similar his themes were to what I was about to say in the presentation Kristin and I were doing immediately after Andy’s talk.

In that presentation, I talked about a UX project that didn’t go well, mostly because of the organizational culture in my library. When I look at Andy’s model of UX adoption (below), I think my library would rate even worse than his—all in the red. On top of our not-great org culture, we are going through a tremendous amount of change. I don’t see (yet) how the UX work I want to do fits. I don’t see how I fit.

This year has been difficult for me professionally. I’ve felt uninspired. I’ve felt useless. I still feel a bit adrift. UXLibs was a shining beacon in my calendar that pulled me through the winter. It was supposed to save me, I think; to help me feel inspired and useful and full of purpose again.

Having been pretty open about challenges in my library on the first morning of the conference, many of the conversations I had during the rest of the conference were related to that. So I guess it’s not surprising that, post-conference, I’m not feeling fired up with inspiration. It was incredibly helpful to share feelings of struggle, but it hasn’t created momentum for what I might do next.

Thinking about the conference keynotes, my takeaways weren’t so much ideas for doing things, but rather cautions to be more careful with, and more thoughtful about the things I do. This is not at all a negative; I think it’s a sign of maturity.

In the Question Time panel on the last day, one of the questions was whether UX was a fad. I thought it was a bit of a silly question at the time and of course none of panelists agreed that UX is a fad. But thinking about it a bit more deeply now, I think for me UX was not a fad but a new lens—a shiny one! It extended my long-held interest in human-computer interaction and usability to the larger library: physical and virtual space and services. My intro to UX coincided with a change of job, and with that change, I had newfound freedom to pursue UX work. It wasn’t a fad, but it was a new love—a great and glorious infatuation. The love isn’t gone, but I’m starting to notice the snoring and farting, and really couldn’t someone else cook dinner once in a goddamned while?

UXLibs has matured in three years, and most relationships do lose a bit of fire after the first while. My more muted reaction to the conference this year is not a reflection of anything that’s wrong with UXLibs. I’ve just got my own stuff to work out. But I’m in this for the long haul. I’ll be back next year, as excited to attend as ever. These are my people. This is my place.

David Rosenthal: Emulation: Windows10 on ARM

planet code4lib - Thu, 2017-06-15 15:00
At last December's WinHEC conference, Qualcomm and Microsoft made an announcement to which I should have paid more attention:
Qualcomm ... announced that they are collaborating with Microsoft Corp. to enable Windows 10 on mobile computing devices powered by next-generation Qualcomm® Snapdragon™ processors, enabling mobile, power efficient, always-connected cellular PC devices. Supporting full compatibility with the Windows 10 ecosystem, the Snapdragon processor is designed to enable Windows hardware developers to create next generation device form factors, providing mobility to cloud computing.

The part I didn't think about was:

New Windows 10 PCs powered by Snapdragon can be designed to support x86 Win32 and universal Windows apps, including Adobe Photoshop, Microsoft Office and Windows 10 gaming titles.

How do they do that? The answer is obvious: emulation! Below the fold, some thoughts.

Because of the ubiquity of the x86 instruction set, much of the work described as emulation is more correctly described as virtualization. As discussed in my report on emulation, virtualization and emulation are end-points of a spectrum; the parts that the hardware you're running does implement are virtualized and the parts it doesn't are emulated. Because ARM and x86 are completely different instruction sets, Qualcomm is at the emulation end of the spectrum. More than two decades ago, Apple used emulation to migrate from the Motorola 68000 to the PowerPC instruction set; this isn't anything new or surprising.

It is obviously in everyone's interest, except Intel's, to have more effective competition in the market for chips to run Windows than AMD has been able to provide. This is especially true given the way PC and mobile technologies are merging. Intel's consistent failure to deliver performance competitive with ARM in the mobile market and Qualcomm's ability to integrate 5G connectivity are significant.

Now, MojoKid at /. points me to Brandon Hill's Intel Fires Warning Shot At Qualcomm And Microsoft Over Windows 10 ARM Emulation In X86 Birthday Blog Post. The Intel blog post is authored by Steven Rogers, EVP and General Counsel for Intel, and Richard Uhlig, Intel Labs Fellow and Director of Systems and Software Research, and it clearly is a warning shot:
There have been reports that some companies may try to emulate Intel’s proprietary x86 ISA without Intel’s authorization. Emulation is not a new technology, and Transmeta was notably the last company to claim to have produced a compatible x86 processor using emulation (“code morphing”) techniques. Intel enforced patents relating to SIMD instruction set enhancements against Transmeta’s x86 implementation even though it used emulation.

Transmeta vs. Intel was an unequal battle, and Transmeta lost (my emphasis):
On October 24, 2007, Transmeta announced an agreement to settle its lawsuit against Intel Corporation. Intel agreed to pay $150 million upfront and $20 million per year for five years to Transmeta in addition to dropping its counterclaims against Transmeta. Transmeta also agreed to license several of its patents and assign a small portfolio of patents to Intel as part of the deal. Transmeta also agreed to never manufacture x86 compatible processors again.

But Microsoft+Qualcomm vs. Intel is a battle of equals, especially given Intel and Microsoft's co-dependent quasi-monopoly. It is likely to go down to the wire. If it ends up in court, it is likely to clarify the legalities of using emulation significantly.

Unfortunately, the interests of preservation won't figure in any such court battle. Clearly, these interests would favor Qualcomm+Microsoft, but the favor wouldn't be returned. Their interests would have a much closer time horizon. The way this conflict plays out will have a big effect on the PC business, and on the future of emulation as a preservation strategy.

HangingTogether: Is Library IT on the Margins?

planet code4lib - Thu, 2017-06-15 14:13

Recently Ithaka S+R published an Issue Brief entitled “Finding a Way from the Margins to the Middle: Library Information Technology, Leadership, and Culture,” by Dale Askey and Lisa Janicke Hinchliffe. Their basic premise is that library information technology as an organizational unit has been pushed to the margins, and is not as centrally positioned within the library as other units. It’s a fairly provocative thesis, so I decided to bring together some IT leaders from our partner institutions to discuss it. The people who joined me were:

  • Karen Estlund, Penn State
  • Cheryl Gowing, University of Miami
  • David Lacy, Temple University
  • Jennifer Vinopal, Ohio State University
  • Evviva Weinraub, Northwestern University

We were also informed by Jennifer’s recent keynote at the OCLC DevConnect Conference (speaker notes and slides available here), as well as a recent blog post by Lauren Magnuson at ACRL TechConnect, “Decentralizing Library IT.”

The panel thought it was good that Askey and Hinchliffe had started an important conversation, and we ended with the awareness that our initial conversation is really just the beginning of a process of learning more about the issues and considering ways in which we can each work to improve how we function within our institutions.

Evviva kicked off the discussion by saying that she really wanted to be talking about how to foster collaboration. “The technology work that we’re doing is about public services,” she asserted, “All of us are really in the game of engaging with and supporting public services even if some of the technology that we’re supporting is for back-end services.”

Jennifer also wanted to reframe the discussion along similar lines. “How do we think about IT not as separate from, but as part of, the library organization? How do we behave in ways — through our organizational culture, training, and our communication strategies — that treat IT like any of the other units in libraries — as a core service and as part of the structure and service environment that we’ve always provided in libraries?” David brought up a point of agreement with the Issue Brief. “The potential for professional growth within the library IT sector is not as well defined as the traditional librarian role,” he stated, “It’s very common to have four or five different levels of librarian, but it’s uncommon to have that kind of progression in library IT.”

In reference to Lauren Magnuson’s blog post, Evviva had this to say: “Even if we still have structured IT units, there is a lot of IT work going on in other parts of the organization just naturally and these open up opportunities for more extensive communication and collaboration. This resonates a lot more with me in terms of what I’m seeing in libraries in general and the kinds of opportunities that we have across the organization to use IT innovation and experimentation as an opportunity for further engagement.”

Cheryl didn’t like the term “decentralization,” which makes it sound like you’re “breaking something up”. “Rather, she’s [Magnuson] talking about building information technology skills throughout the organization,” Cheryl said, “to make that as a baseline…The core expectations for many of our positions are now changing and people are utilizing technology in their everyday work, and that can be done in collaboration with a more formal IT department…It’s more of a building than a decentralizing.”

The idea of “product owners” was brought up and how that affects the organization. These are staff outside of IT but who collaborate with IT staff on a particular project. The product owner is typically a project’s key stakeholder. Part of the product owner responsibilities is to have a vision of what they want for the tool and how it should work. This is key to successfully starting and maintaining any software project because running and maintaining these services requires regular feedback and conversation. The product owner does this in part through prioritizing features and acting as the lead user of the tool.

Regarding this role, Karen said: “When you actually get public services staff at the table and talking about what they want and need there is a kind of excitement that comes from acknowledging that there’s something really valuable they bring and they don’t need to know how to code to be able to contribute. I think there is something really valuable, empowering, and useful for your organization when you can get people to see how they can contribute to technology without necessarily having to contribute technical skills.” But she also acknowledged that this can take a lot of cultural acclimation work to get people comfortable in such new roles.

“Askey and Hinchliffe are talking about pushing IT to the center of the organization,” Jennifer said, “but I think it’s not IT that should be in the middle of the organization, I think it’s our users and our collections, and other things that we want to serve, or highlight, or preserve. That should really be at the center of our thinking.” She also said that if she were to pick one thing that should be at the center of our thinking it would be user experience, to serve as our over-arching concern.

As you can see, we had a wide-ranging conversation during the hour we shared together virtually, and we agreed that this is just the beginning of a conversation we should have as a community to help us determine how best to integrate our important information technology work within our various organizations. In the end, the question posed in the title of this post is not the right question. It should be something more like “Is Library IT positioned effectively within the organization to address our key concerns — helping our users become as effective and empowered as we can?”

We look forward to additional conversations to answer that question within a wide range of diverse institutional environments.    

Open Knowledge Foundation: The final Global Open Data Index is now live

planet code4lib - Thu, 2017-06-15 10:59

The updated Global Open Data Index has been published today, along with our report on the state of Open Data this year. The report includes a broad overview of the problems we found around data publication and how we can improve government open data. You can download the full report here.

Also, after the Public Dialogue phase, we have updated the Index. You can see the updated edition here.

We will also keep our forum open for discussions about open data quality and publication. You can see the conversation here.


District Dispatch: ALA hosts librarians from Kazakhstan

planet code4lib - Thu, 2017-06-15 10:56

ALA Washington was delighted to welcome an international delegation of librarians from the Republic of Kazakhstan.

For the second time in my first five months on staff with ALA Washington, we welcomed an international delegation of librarians. Alan Inouye, Shawnda Hines, our summer Google Fellow Alisa Holahan and I were delighted to spend an hour of our morning with a group from the Republic of Kazakhstan, including:

Chocolate presented to the ALA Washington Office from the Kazakhstani delegation.

Especially for a new librarian like myself, these visits are unmatched opportunities to gain exposure to the wide range of priorities and experiences in the international library field. It is also an excellent prompt to learn more about countries I have not visited. (Do you know what Kazakhstan’s national anthem sounds like?) The Kazakhstani librarians also brought chocolate, which was a delicious surprise.

Delegations are invited to the U.S. via the Department of State’s International Visitor Leadership Program. For yesterday’s visit, our Kazakhstani friends indicated in advance that they wanted to cover the following topics:

  • The wide variety of U.S. libraries
  • Policy-making for programs and activities in American libraries
  • The role and functions of libraries and information specialists in U.S. society
  • Information technology in libraries, including online and digital services
  • Maker spaces

During our exchange, we touched on the variety of libraries in America as well as policy-making in today’s political climate, but we did not have time to cover maker spaces or information technology. The delegation was particularly interested in the structure and management of ALA and our state chapters. They shared their own experiences starting a federation of provincial libraries during the economic recession of 2008 and expressed interest in continuing to grow this new league of professionals.

The delegation plans to make several stops in Washington this week and will join the Annual meeting in Chicago to observe and learn more about American libraries, information professionals, and the management and structure of ALA.

I know I speak for my colleagues in saying that we thoroughly enjoyed our time together and that we look forward to next Wednesday when a delegation from Ethiopia will join us for what is sure to be another illuminating conversation.

The post ALA hosts librarians from Kazakhstan appeared first on District Dispatch.

DuraSpace News: AVAILABLE NOW: Fedora and Samvera Camp Curriculum

planet code4lib - Thu, 2017-06-15 00:00

Find out what will be covered in the Fedora and Samvera (formerly Hydra) Camp in the curriculum–now available. The Camp will be held at Oxford University, Sept 4 - 8, 2017. Register here, and remember that an early bird discount will be offered until July 10.

LITA: Jobs in Information Technology: June 14, 2017

planet code4lib - Wed, 2017-06-14 19:10

New vacancy listings are posted weekly on Wednesday at approximately 12 noon Central Time. They appear under New This Week and under the appropriate regional listing. Postings remain on the LITA Job Site for a minimum of four weeks.

New This Week

Koios, Backend Developer (Contract / Part-time, Remote OK), Washington, DC

Visit the LITA Job Site for more available jobs and for information on submitting a job posting.

Evergreen ILS: Evergreen to Highlight Innovative Catalog Features at ALA

planet code4lib - Wed, 2017-06-14 17:43

The Evergreen community will be highlighting some of the ways the Evergreen catalog helps users find the material that they need at the American Library Association Conference in Chicago later this month.

It’s a Popularity Contest — Innovation in the Evergreen Catalog is scheduled for 4:30 to 5:30 p.m. Saturday, June 24 in room W177 in McCormick Place. Elizabeth Thomsen of the North of Boston Library Exchange (NOBLE) will facilitate the session.

If you’re attending ALA, please join us for this meetup. Even if you’re not attending ALA, please share this information with others in your Evergreen library / consortium so that they know of this opportunity to learn about these features and to meet with other Evergreen users and those who are interested in using Evergreen.

In this session, we’ll look at features that help users find the material they want. The use of popularity data to enhance the relevance of search results is just one new feature in the Evergreen library system catalog. In addition to recent circulations, relevancy in Evergreen can be enhanced by taking into account the number of copies held, the number of holds, or newness. These factors offer a range of possibilities that can be customized for each library’s needs and clientele, and are available in Evergreen’s current release.
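To give a feel for how such signals might combine, here is a minimal sketch of blending the factors listed above (recent circulations, holds, copies held, newness) into a single score. The weights, field names, and normalization are invented for illustration — this is not Evergreen's actual popularity implementation.

```python
def popularity_score(record, weights):
    """Weighted sum of signals; each signal is assumed pre-normalized to 0..1."""
    return sum(weights[name] * record.get(name, 0.0) for name in weights)

# Hypothetical per-library tuning: weight recent use most heavily.
weights = {"recent_circs": 0.5, "hold_count": 0.3, "copy_count": 0.1, "newness": 0.1}

books = [
    {"title": "A", "recent_circs": 0.9, "hold_count": 0.8, "copy_count": 0.5, "newness": 0.2},
    {"title": "B", "recent_circs": 0.2, "hold_count": 0.1, "copy_count": 0.9, "newness": 0.9},
]

# Rank results by the blended popularity score, highest first.
ranked = sorted(books, key=lambda r: popularity_score(r, weights), reverse=True)
print([b["title"] for b in ranked])
```

Changing the weights is what lets each library tune the balance — a branch serving browsers might boost newness, while a research library might lean on holds.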

Also in the most recent release, Evergreen has improved on its long-standing ability to group formats and editions. Wading through long pages of search results that are primarily different formats and editions of the same work can be tedious for users. With Evergreen, the catalog is able to help the user find the work they’re seeking in fewer screens before narrowing in on the specific edition they want. Even better, if they don’t care about an edition, Evergreen will allow them to place a hold across multiple editions of a work to make the process easier for them.

Evergreen has built upon this popular feature by adding quick links in the record to other formats and editions of the work.

Attendees are also encouraged to ask other questions about Evergreen or the open-source community that supports it.

We hope to see you there!

Islandora: 2017 Islandora AGM - July 14, 2-3 PM EST

planet code4lib - Wed, 2017-06-14 10:54
The Islandora Foundation Annual General Meeting will be held July 14th from 2:00 - 3:00 PM Eastern. We invite member institutions to designate a representative to vote at the AGM. The meeting will be held electronically via the Islandora conference call line:

Dial-in Number: (641) 715-3570

All are welcome to attend, but voting will be limited to members of the Islandora Foundation. The agenda and meeting package will be forwarded later.

Shelley Gullikson: UXLibs III: Conference Notes

planet code4lib - Tue, 2017-06-13 23:00

As usual (and I love that I have an “as usual” with UXLibs), I’m going to break my experience into two posts: this one with my notes on the specific sessions I attended, and a second one with my overall thoughts about the conference. These notes aren’t a record of the conference, just the bits that stood out most for me.

Ethical UX (Matthew Reidsma’s keynote)

I am a huge and ridiculous fan of Matthew Reidsma. I love his work, I think he is a lovely and delightful person, and he is a great keynote speaker. So I feel particularly bad that my notes on his talk are a bit scanty. I think I was too absorbed to write much.

  • Your values get encoded in your work, whether you intend it or not. So we should intentionally code our values into our services and software
    • Ask “What happens in the world if we make this thing? How will it affect people?”
    • Think of Mike Monteiro: “We need to fear the consequences of our work more than we love the cleverness of our ideas” (watch the talk that this quotation comes from)
  • Matthew recommended the book “Design for the Real World” by Victor Papanek, and its call for social and moral judgment to come before design
  • When we use personas, are they always smiling? Shouldn’t some of our personas be cranky? Or sleepy? (I didn’t note down all of Matthew’s wonderful examples of real-world personas)
  • What does your library or library website look like to a student in crisis? (I can’t remember if Matthew referenced the book “Design for Real Life” by Eric Meyer and Sara Wachter-Boettcher, but it’s all about this and is really really great)
  • When we rank search results by relevance, are these actually relevant results or results that look relevant?
  • Related searches or items (such as Summon’s “Topic Explorer” feature) can create reckless associations, and whether these are intentional or not, they can cause real harm to our users. (Matthew’s article “Algorithmic Bias in Library Discovery Systems” goes into a lot more depth on this)

Matthew’s presentation slides are now available.

How White is Your UX Practice? Inclusion and Diversity in Critical UX Research (Karine Larose and Simon Barron)

Karine and Simon worked on UX research for Imperial College that was nicely documented in reports they shared online, but this talk was about what didn’t make it into the reports: what they did wrong.

Essentially, they failed to attract a representative sample of their student body, which is diverse across lines of class, gender, race, and disability.

New to me was their use of a social model of disability, where it is the social environment that makes people feel disabled (see Penny Andrews’ chapter “User experience beyond ramps: The invisible problem and the special case” in the book “User Experience in Libraries: Applying Ethnography and Human-Centred Design” for more). I really like inclusive design and this seems to make it even more imperative; we can design environments that work for people or we can design environments that make people need to look for or ask for accommodations.

Karine and Simon also made the point that UX as it is often practiced can lead to majority rule; an averaging out that can erase the experience of people on the margins. They didn’t say this bit, but it would definitely be useful to explore our edge cases and determine if we’re consistently ignoring the same group(s) of people when we choose to ignore these edge cases.

Recommendations to address diversity in UX research in:

  1. Recruitment: be aware of unconscious bias when we recruit research participants, particularly when we approach users in person for quick, guerrilla-style research. Sending messages for recruitment might be a more inclusive approach, but we still need to make sure messages are visible to the full range of users and that our language is not alienating anyone.
  2. Research design: our research question needs to be clear – “the labour of understanding should be on the researcher, not on the user.” Jargon and complex methods can increase cognitive load.
  3. Actions: we should be ready to challenge non-inclusive UX research in our institutions (brava to Karine, who did just that at Imperial!). We need to seek out more and different voices in the UX world when we’re doing our own learning.  [I also have a note of “UX inclusivity framework” but no further explanation and I’m not sure exactly what that means. Perhaps creating a structure or plan for making our UX research inclusive?]

Text of Karine and Simon’s presentation is now available.


Sponsor presentations

Deirdre from EBSCO consistently gives a presentation that I find relevant and interesting. This year, she talked about her user research group and how it’s evolving within her organization.

  • “Year of Yes”: Deirdre and her colleagues said yes to every request for user research for a year. This was part of evangelizing UX throughout the organization and creating buy-in. But it was only one year, because once the word spreads, you have to make decisions about how to best use your time.
  • As her team evolved over 3 years, there were some lessons learned.

I don’t have a team, but much of this resonates as an individual trying to do UX in my library.

My favourite bit from Deirdre’s presentation was when she talked about how draining it can be to do one-on-one research with users. She said “shared vulnerability is exhausting” and wow, that resonated with me a LOT.

Serena and Stephen from ProQuest talked about user research with their customers.

  • I didn’t find this as interesting because I know what kinds of features my colleagues ask for, and they’re often not in our users’ best interest! But of course there are features in these products that are geared toward library staff, so it’s good that library staff are involved in this part.
  • “Avoiding negative impact is as important as creating positive impact” is a great point and very relevant to Matthew’s keynote. I think we often focus on making changes that will help without also looking at whether they can harm.
Unmasking the Authentic User Experience (Amy Kimura and Heather A. Dalal)

Amy and Heather used Open Hallway to be able to conduct unmoderated user research with students. The research was my favourite kind: they asked students to search for something they needed for an assignment. They did want students to start with the library’s discovery tool, but other than that, it was completely up to the student what they searched for and how. I love watching users do searches they’re invested in; it’s so much better than forcing them through questions that library people come up with.

With Open Hallway, the students were able to record their searches on their own so they could search when and where it was convenient for them (often 1am in their own room). The students were asked to think aloud as they searched. Amy and Heather showed some video clips that seemed to indicate that students were pretty good at thinking aloud without a moderator prompting them.

The students were quite frank with their comments, and Amy and Heather said it “captured the human aspect of student research.” A few findings that caught my attention:

  • Students have an extremely low tolerance for friction. This is not new to me, but I always welcome the reminder. Students have an extremely low tolerance for friction. Especially at 1am.
  • Many students did not understand the “Get it @ [library]” links. These links lead to a link resolver that will bring them to an online version, but it was misinterpreted to mean they had to go to the physical library. (I’ve not witnessed this confusion with students at my own institution, but we do use the phrase “Get it @ Carleton U” so I’m wondering if we should look into this.)
  • When students did make it to the link resolver, they had a lot of problems understanding how to get to full text, or even if they could. (Here, I felt super smug about deciding to use Matthew Reidsma’s 360Link Reset to make our own link resolver much more user friendly. Always steal, er, use open source solutions from Reidsma!)
  • Amy and Heather said it’s reasonable to have high expectations for our students, since they are at university, but we don’t have to make it harder for them than it has to be.

I find this last point interesting, and it reminds me of discussions about how learning about accessing information is an important part of their education and that we shouldn’t “dumb it down.” To what extent is it a valuable learning experience for students to navigate our systems to find information? We don’t want them to have to become mini-librarians, but is there really learning value here? Or is this just a convenient excuse for libraries with shitty systems (i.e. pretty much all of us)? If students could easily access articles and books and other information, would that really be a detriment to their education? I can’t see that it would.

User Experience… Our Experience (Lorna Dodd filling in for Laura Connaughton)

Lorna and Laura work at Maynooth University Library, which had a big renovation 5 years ago and they wanted to learn more about how the space is being used now.

They used students in the Masters of Anthropology program to conduct ethnographic work, and they used a variety of methods (that I did not take careful note of!). The MA students (researchers) observed and interviewed 30 research participants (students). The researchers looked at the spaces where the users study, and they also looked at how the students interacted with the library space. It was interesting to me that “where they study” and “the library space” seemed to be separate entities for most of the students.

They found that users were interpreting the space differently than library staff. The renovated space included a beautiful, open entryway that is used for events and exhibits, and staff were very proud of this space. Students didn’t understand why it was there.

(I was happy to see this particular finding, but maybe that falls under confirmation bias.)

They also found that although spaces had been created to be noisier (as part of being a “modern library,” Lorna explained), this was not necessarily appreciated by students, who want more quiet space for studying. I talked about this point later with Diane Granfield from Ryerson, who is currently working on designating more quiet library space after opening a very modern, noisy library space on her campus last year. My own library’s silent spaces are very popular with our students. Quiet space can be hard to find in the world, and those who want it – or need it – value it highly.

Team Challenge

For the team challenge this year, we received reams and reams of raw data from UX research carried out at the University of Lanarkshire Library and were asked to come up with next steps.

I was getting pretty tired by this time, but even though there was a LOT of information to go through (or maybe because there was so much), the Challenge seemed easier to grasp this year because it was clearly bounded. Then again, I was obviously not thinking as creatively as the winning Exxilon team who took a step back from all the data and looked at the bigger picture (which, apparently, included Roxette).

It’s interesting (and hard!) to work so intensely with people you’ve mostly just met. I felt bad because I got a bit testy by the end of the afternoon (see: tired!), but it’s all part of the process – working as a group on a short deadline. I think we did quite well; I didn’t have any sense of panic about presenting the next morning. And we even won our heat!

No Fear: Social Responsibility and Community Archiving — Pushing Boundaries of Archival Theory: Privacy, Preserving in Real Time (Meredith Evans’ keynote)

First off, Meredith joked about the many titles of her talk. I can’t remember which was the original and which she decided suited the talk better, but I’m guessing that “No Fear” was the one she really wanted to use.

I loved Meredith’s keynote. The work she does is really inspiring – from the LGBTQ archives at the University of North Carolina at Charlotte, to Documenting Ferguson at Washington University, DocNow, and as head of the Jimmy Carter Presidential Library.

  • Look for gaps and fill them in, particularly around services for and collections about marginalized people.
  • Connect with your community most of all; if you aren’t being supported by senior management, maybe support from your community will help bring them around.
  • Be bold, be brave. In the Documenting Ferguson project, they decided to let people upload whatever items they wanted, with whatever description they wanted. There was no gate-keeping, no professional selection process. This is not how archives usually work!
  • The Ferguson archive was originally created for the Washington University community, but became the voice of the larger community (Ferguson was about 8 miles away from campus). Users of the archive included journalists and law enforcement, and this created a lot of discussion around protecting the creators of the content in the archive. My notes are a bit vague here, but it seems like those discussions helped lead to the creation of DocNow: Documenting the Now.
  • DocNow has a board that includes archivists, librarians, academics, IT folks, activists, and journalists. They talk about ethics a lot. Just because Twitter is public, is it okay to preserve tweets? To preserve handles with those tweets? Is saying something in public (speaking out loud or on Twitter) different than making a public statement (explicitly creating a public record)? Should those things all be treated equally in an archive?
  • The creators of DocNow wanted to build the tool with their users, not just for them.
  • The work can be tiring and discouraging and you have to figure out how to get yourself through and beyond this.
  • Meredith spoke a lot about the importance of ethics and protecting your users / creators. With most of her work, wanting to protect the users / creators makes the project an activist project. There is a strong sense of social responsibility.
  • The overarching message, similar to Matthew’s keynote, was the need to be (more) conscious and to take (more) care when creating tools and services for our users.
Inward UX: Workshop with Matthew Reidsma

Matthew led us through a great exercise where we paired up to plan a party. One person made suggestions and the other responded to each with “yes, but” and filled in what was usually a concern or reason why the idea wouldn’t work. After that, we did exactly the same thing, except that this time the response started with “yes, and.” “Yes, and” created a totally different, much more positive atmosphere in the room. It’s a small thing, but it made a big difference. Although of course – as someone in the room pointed out – there are times when you really have to say “yes, but.” Still, it’s good to be conscious of whether you’re but-ing or and-ing when you’re reacting to someone else’s ideas.

We then each came up with some “how might we” questions to get at issues in our organizational culture. But instead of trying to come up with ideas on our own, we gave our partners our question and they had to come up with possible answers. Working on solutions for someone else’s problem was a great way to reframe and get a new perspective. I also found it easier to generate possibilities because I was less precious about the problem and didn’t immediately discount options as unworkable.

After this, we then created a rapid prototype for one of the solutions we came up with. Rapid prototyping can help you quickly see if something might work or not, and can lead you to see what other questions still need answering.

My partner and I seemed to have a bit of an issue with a language barrier (I think we each misunderstood the other’s original question/problem), but we stayed open to each other’s ideas and in the end I was surprised to have an interesting possible solution despite the fact that it was created to solve a totally different problem! It seemed to reinforce the power of “yes, and.”

The slides for the workshop I attended are now available (also available are the slides for round 2 of the same workshop).

Experience Mapping: Workshop with Anneli Friberg and Anna Kagedal

My expectations for this workshop were unfairly high, given that it was only one hour. I was hoping for complexity that, in retrospect, could not possibly have been achieved in one hour! But Anneli and Anna were lovely and positive and their instructions were clear and I think it was a pitch-perfect intro session. Apparently I know more about experience mapping than I had thought! Which was also a useful thing to learn.

Anneli and Anna led us through completing an experience map. Once we were done, they asked us to evaluate the session using love letters/break-up letters and a technique that was new to me: the 4L method. With the 4L method, you write down what you Liked, Learned, Lacked, and/or Longed for in the session. It’s a variation on other evaluation methods I’m familiar with but I liked it quite a bit.

Panel: Question Time

I really didn’t take many notes on the panel. (So tired! Brain full!) I liked Andy talking about the need to get better at writing short summaries of our work, rather than pages and pages of reports. Meredith validated this by saying that, as a manager, she only reads the first 3 lines, so you have to have a good subject line (or title) and then make that first bit really count.

I think this was where Matthew jumped in with the very pithy “The point of the work we do is not to do the work.” The work has larger purpose: to improve things for our users. So we have to get past the documenting stage and get to the doing stage.

Andy made a comment that you don’t have to transcribe everything you record and this generated further discussion on Twitter that seemed to center around UX as practical work vs UX research (and the concomitant librarian-as-practitioner vs librarian-as-academic/researchers). Fraught! Honestly, I see and understand both sides. I think the most important thing is to be clear from the outset what your own objective is – practice or research – and then conduct yourself accordingly. Certainly not all UX work needs to be academic research, but when it is we have to do it right.


I have a lot more thoughts about the conference, beyond these notes. I’ll write those up in the next few days.

HangingTogether: 10 years, Shifting Gears

planet code4lib - Tue, 2017-06-13 16:31

In 2007, OCLC Research and the Society of American Archivists convened the Digitization Matters meeting to explore barriers preventing institutions from scaling up digitization of special collections. At that time, the Google Books project and the Open Content Alliance had opened a floodgate of digitization of books in libraries, which caused us to ask, what inhibits us from digitizing special collections materials at high volume? The report Shifting Gears: Gearing Up to Get into the Flow (written by Ricky Erway and Jennifer Schaffner) summarized those ideas for making digitized special collections more ubiquitously available.

Much has changed in the library and external landscape in the ten years since Shifting Gears was published. Amidst this change, we continue to ask how we can effectively digitize special collections and improve discovery and access, no matter the portal or platform. It is not only critical that our collections be made available on the internet; we must also provide them frictionless passage to the environments where our communities seek them.

We are republishing Shifting Gears on its tenth anniversary as OCLC Research and the OCLC Research Library Partnership reconsider our work agenda around unique and distinctive materials. Our intention is that this report, in addition to other work we produce in the coming year, will provide a framework for community action.

Over the coming months, we’ll be hosting a discussion here on HangingTogether about how Shifting Gears does or does not hold up over time. I was joined by Michelle Light (University of Nevada, Las Vegas) and Erik Moore (University of Minnesota) at the ACRL 2017 Conference to start that conversation. We’ll continue that here and hope that you will join in!

Eric Hellman: Book Chapter on "Digital Advertising in Libraries"

planet code4lib - Tue, 2017-06-13 15:44
I've written a chapter for a book, edited by Peter Fernandez and Kelly Tilton, to be published by ACRL. The book is tentatively titled Applying Library Values to Emerging Technology: Tips and Techniques for Advancing within Your Mission.

Digital Advertising in Libraries: or... How Libraries are Assisting the Ecosystem that Pays for Fake News

To understand the danger that digital advertising poses to user privacy in libraries, you first have to understand how websites of all stripes make money. And to understand that, you have to understand how advertising works on the Internet today.

The goal of advertising is simple and is quite similar to that of libraries. Advertisers want to provide information, narratives, and motivations to potential customers, in the hope that business and revenue will result. The challenge for advertisers has always been to figure out how to present the right information to the right reader at the right time. Since libraries are popular sources of information, they have long provided a useful context for many types of ads. Where better to place an ad for a new romance novel than at the end of a similar romance novel? Where better to advertise a new industrial vacuum pump than in the Journal of Vacuum Science and Technology? These types of ads have long existed without problems in printed library resources. In many cases the advertising, archived in libraries, provides a unique view into cultural history. In theory at least, the advertising revenue lowers the acquisition costs for resources that include the advertising.
On the Internet, advertising has evolved into a powerful revenue engine for free resources because of digital systems that efficiently match advertising to readers. Google's Adwords service is an example of such a system. Advertisers can target text-based ads to users based on their search terms, and they only have to pay if the user clicks on their ad. Google decides which ad to show by optimizing revenue—the price that the advertiser has bid times the rate at which the ad is clicked on. In 2016, Search Engine Watch reported that some search terms were selling for almost a thousand dollars per click. [Chris Lake, “The most expensive 100 Google Adwords keywords in the US,” Search Engine Watch (May 31, 2016).] Other types of advertising, such as display ads, video ads, and content ads, are placed by online advertising networks. In 2016, advertisers were projected to spend almost $75 billion on display ads; [Ingrid Lunden, “Internet Ad Spend To Reach $121B In 2014, 23% Of $537B Total Ad Spend, Ad Tech Boosts Display,” TechCrunch, (April 27, 2014).] Google's Doubleclick network alone is found on over a million websites. [“DoubleClick.Net Usage Statistics,” BuiltWith (accessed May 12, 2017). ]
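The revenue-optimizing selection described above (bid price times click-through rate) can be sketched in a few lines. This is an illustrative toy, not Google's actual auction; the ad names, bids, and rates are invented:

```python
def pick_ad(candidates):
    """Choose the ad with the highest expected revenue per impression.

    Expected revenue = bid price * predicted click-through rate,
    the optimization described in the text.
    """
    return max(candidates, key=lambda ad: ad["bid"] * ad["ctr"])

# Hypothetical candidates: a very expensive keyword with a low CTR
# can still lose to a cheaper ad that gets clicked more often.
ads = [
    {"name": "mesothelioma-lawyer", "bid": 900.00, "ctr": 0.002},
    {"name": "car-insurance",       "bid": 40.00,  "ctr": 0.05},
    {"name": "shoe-sale",           "bid": 0.50,   "ctr": 0.08},
]
winner = pick_ad(ads)
```

Note that the near-thousand-dollar keyword loses here: 900 x 0.002 = 1.80 expected cents-on-the-dollar versus 40 x 0.05 = 2.00 for the insurance ad, which is why CTR prediction matters as much as bid price to the network.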
Matching a user to a display ad is more difficult than matching a search-driven ad. Without a search term to indicate what the user wants, the ad networks need demographic information about the user. Different ads (at different prices) can be shown to an eighteen-year-old white male resident of Tennessee interested in sports and a sixty-year-old black woman from Chicago interested in fashion, or a pregnant thirty-year-old woman anywhere. To earn a premium price on ad placements, the ad networks need to know as much as possible about the users: age, race, sex, ethnicity, where they live, what they read, what they buy, who they voted for. Luckily for the ad networks, this sort of demographic information is readily available, thanks to user tracking.
Internet users are tracked using cookies. Typically, an invisible image element, sometimes called a "web bug," is placed on the web page. When the page is loaded, the user's web browser requests the web bug from the tracking company. The first time the tracking company sees a user, a cookie with a unique ID is set. From then on, the tracking company can record the user's web usage for every website that is cooperating with the tracking company. This record of website visits can be mined to extract demographic information about the user. A weather website can tell the tracking company where the user is. A visit to a fashion blog can indicate a user's gender and age. A purchase of scent-free lotion can indicate a user's pregnancy. [Charles Duhigg, “How Companies Learn Your Secrets,” The New York Times Magazine, (February 16, 2012).] The more information collected about a user, the more valuable a tracking company's data will be to an ad network.
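The cookie mechanics just described reduce to surprisingly little logic. Here is a minimal sketch of what a tracking company's pixel endpoint does, with the HTTP server stripped away to one hypothetical function and invented example domains; real trackers are far more elaborate:

```python
import uuid


def handle_pixel_request(cookies, referer, log):
    """Sketch of serving one web-bug request.

    If the browser presents no tracking cookie, mint a new unique ID
    (which would go out in a Set-Cookie header). Either way, record
    which page the user was reading, taken from the Referer header.
    Returns the Set-Cookie value, or None if the user is already known.
    """
    tracker_id = cookies.get("uid")
    set_cookie = None
    if tracker_id is None:
        tracker_id = uuid.uuid4().hex          # first sighting: assign an ID
        set_cookie = f"uid={tracker_id}"
    log.append((tracker_id, referer))          # the browsing profile grows
    return set_cookie


# Simulated browsing session across two cooperating sites:
log = []
first = handle_pixel_request({}, "https://weather.example/newark-nj", log)
uid = first.split("=")[1]
handle_pixel_request({"uid": uid}, "https://fashion.example/blog", log)
```

After two page views, the tracker's log already links a location signal and a demographic signal to the same ID, which is exactly the profile-building the text describes.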
Many websites unknowingly place web bugs from tracking companies on their websites, even when they don't place advertising themselves. Companies active in the tracking business include AddThis, ShareThis, and Disqus, who provide functionality to websites in exchange for website placement. Other companies, such as Facebook, Twitter, and Google, similarly track users to benefit their own advertising networks. Services provided by these companies are often placed on library websites. For example, Facebook’s “like” button is a tracker that records user visits to pages offering users the opportunity to “like” a webpage. Google’s “Analytics” service helps many libraries understand the usage of their websites, but is often configured to collect demographic information using web bugs from Google’s DoubleClick service. [“How to Enable/Disable Privacy Protection in Google Analytics (It's Easy to Get Wrong!)” Go To Hellman (February 2, 2017).]
Cookies are not the only way that users are tracked. One problem that advertisers have with cookies is that they are restricted to a single browser. If a user has an iPhone, the ID cookie on the iPhone will be different from the cookie on the user's laptop, and the user will look like two separate users. Advanced tracking networks are able to connect these two cookies by matching browsing patterns. For example, if two different cookies track their users to a few low-traffic websites, chances are that the two cookies are tracking the same user. Another problem for advertisers occurs when a user flushes their cookies. The dead tracking ID can be revived by using "fingerprinting" techniques that depend on the details of browser configurations. [Gunes Acar, Christian Eubank, Steven Englehardt, Marc Juarez, Arvind Narayanan, and Claudia Diaz, “The Web Never Forgets: Persistent Tracking Mechanisms in the Wild.” In Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security (CCS '14). ACM, New York, NY, USA, 674-689. DOI] Websites like Google, Facebook, and Twitter are able to connect tracking IDs across devices based on logins. 
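The fingerprinting idea can be sketched as hashing whatever configuration details the browser exposes into a stable identifier. The attributes below are a small illustrative subset; real fingerprinters (see the Acar et al. paper cited above) combine many more signals, such as canvas rendering and installed plugins:

```python
import hashlib


def browser_fingerprint(attrs):
    """Hash a browser's configuration details into a stable ID (sketch).

    Unlike a cookie, this ID survives a cookie flush: the same
    configuration always hashes to the same value, so a tracker can
    re-link the "new" user to the old profile.
    """
    canonical = "|".join(f"{k}={attrs[k]}" for k in sorted(attrs))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


# Hypothetical configuration details a page's JavaScript can read:
config = {
    "user_agent": "Mozilla/5.0 (X11; Linux x86_64)",
    "screen": "1920x1080",
    "timezone": "America/New_York",
    "fonts": "Arial,Helvetica,Times",
}
fp_before_flush = browser_fingerprint(config)
fp_after_flush = browser_fingerprint(dict(config))  # cookies gone, browser unchanged
```

The two fingerprints are identical, which is the whole point: deleting cookies changes nothing the hash depends on.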
Once a demographic profile for a user has been built up, the tracking profile can be used for a variety of ad-targeting strategies. One very visible strategy is "remarketing." If you've ever visited a product page on an e-commerce site, only to be followed around the Internet by advertising for that product, you've been the target of cookie-based remarketing.
Ad targeting is generally tolerated because it personalizes the user's experience of the web. Men, for the most part, prefer not to be targeted with ads for women’s products. An ad for a local merchant in New Jersey is wasted on a user in California. Prices in pounds sterling don't make sense to users in Nevada. Most advertisers and advertising networks take care not to base their ad targeting on sensitive demographic attributes such as race, religion, or sexual orientation, or at least they try not to be too noticeable when they do it.
The advertising network ecosystem is a huge benefit to content publishers. A high traffic website has no need of a sales staff—all they need to do is be accepted by the ad networks and draw users who either have favorable demographics or who click on a lot of ads. The advertisers often don't care about what websites their advertising dollars support. Advertisers also don't really care about the identity of the users, as long as they can target ads to them. The ad networks don't want information that can be traced to a particular user, such as email address, name or home address. This type of information is often subject to legal regulations that would prevent exchange or retention of the information they gather, and the terms of use and so-called privacy policies of the tracking companies are careful to specify that they do not capture personally identifiable information. Nonetheless, in the hands of law enforcement, an espionage agency, or a criminal enterprise, the barrier against linking a tracking ID to the real-world identity of a user is almost non-existent.
The amount of information exposed to advertising networks by tracking bugs is staggering. When a user activates a web tracker, the full URL of the referring page is typically revealed. The user's IP address, operating system, and browser type are sent along with a simple tracker; the JavaScript trackers that place ads typically send more detailed information. It should be noted that any advertising enterprise requires a significant amount of user information collection; ad networks must guard against click-jacking, artificial users, botnet activity and other types of fraud. [Samuel Scott, “The Alleged $7.5 Billion Fraud in Online Advertising,” Moz, (June 22, 2015).] Breitbart.com is a good example of a content site supported by advertising placed through advertising networks. A recent visit to the Breitbart home page turned up 19 advertising trackers, as characterized by Ghostery: [Ghostery is a browser plugin that can identify and block the trackers on a webpage.]
  • 33Across
  • [x+1]
  • AddThis
  • adsnative
  • Amazon Associates
  • DoubleClick
  • eXelate
  • Facebook Custom Audience
  • Google Adsense
  • Google Publisher Tags
  • LiveRamp
  • Lotame
  • Perfect Market
  • PulsePoint
  • Quantcast
  • Rocket Fuel
  • ScoreCard Research Beacon
  • Taboola
  • Tynt

While some of these will be familiar to library professionals, most of them are probably completely unknown, or at least their role in the advertising industry may be unknown. Amazon, Facebook and Google are the recognizable names on this list; each of them gathers demographic and transactional data about users of libraries and publishers. AddThis, for example, is a widget provider often found on library and publishing sites. They don't place ads themselves, but rather, they help to collect demographic data about users. When a library or publisher places the AddThis widget on their website, they allow AddThis to collect demographic information that benefits the entire advertising ecosystem. For example, a visitor to a medical journal might be marked as a target for particularly lucrative pharmaceutical advertising.
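A site owner can get a rough sense of this exposure with a crude scan of a page's HTML for resources loaded from known tracker domains. This is a toy sketch, not how Ghostery works: the page markup and the tracker list here are tiny invented examples, and real detection uses curated databases and inspects dynamically loaded scripts too:

```python
import re
from urllib.parse import urlparse

# Tiny illustrative list; real blocklists contain thousands of entries.
KNOWN_TRACKERS = {
    "doubleclick.net",
    "addthis.com",
    "taboola.com",
    "scorecardresearch.com",
}


def find_trackers(html, page_host):
    """Return third-party tracker domains referenced by a page (sketch).

    Pulls src/href URLs out of the HTML and flags any whose registered
    domain appears on the tracker list.
    """
    found = set()
    for url in re.findall(r'(?:src|href)="(https?://[^"]+)"', html):
        host = urlparse(url).hostname or ""
        domain = ".".join(host.split(".")[-2:])   # crude eTLD+1 guess
        if domain in KNOWN_TRACKERS and domain != page_host:
            found.add(domain)
    return found


page = '''<img src="https://stats.doubleclick.net/pixel.gif">
<script src="https://cdn.addthis.com/widget.js"></script>
<a href="https://news.example.com/story">story</a>'''
trackers = find_trackers(page, "example.com")
```

Running this over the invented page above flags DoubleClick and AddThis while leaving the site's own link alone, which mirrors what a Ghostery report shows at a much smaller scale.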
Another tracker found on Breitbart is Taboola. Taboola is responsible for the "sponsored content" links found even on reputable websites like Slate. Taboola links go to content that is charitably described as clickbait and is often disparaged as "fake news." The reason for this is that these sites, having paid for advertising, have to sell even more click-driven advertising. Because of its links to the Trump Administration, Breitbart has been the subject of attempts to pressure advertisers to stop putting advertising on the site. A Twitter account for "Sleeping Giants" has been encouraging activists to ask businesses to block Breitbart from placing their ads. [Osita Nwanevu, “‘Sleeping Giants’ Is Borrowing Gamergate’s Tactics to Attack Breitbart,” Slate, December 14, 2016.] While several companies have blocked Breitbart in response to this pressure, most companies remain unaware of how their advertising gets placed, or that they can block such advertising. [Pagan Kennedy, “How to Destroy the Business Model of Breitbart and Fake News,” The New York Times (January 7, 2017).]
I'm particularly concerned about the medical journals that participate in advertising networks. Imagine that someone is researching clinical trials for a deadly disease. A smart insurance company could target such users with ads that mark them for higher premiums. A pharmaceutical company could use advertising targeting researchers at competing companies to find clues about their research directions. Most journal users (and probably most journal publishers) don't realize how easily online ads can be used to gather intelligence as well as to sell products.
It's important to note that reputable advertising networks take user privacy very seriously, as their businesses depend on user acquiescence. Google offers users a variety of tools to "personalize their ad experience." [If you’re logged into Google, the advertising settings applied when you browse can be viewed and modified.] Many of the advertising networks pledge to adhere to the guidance of the Network Advertising Initiative, an industry group. [“The NAI Code and Enforcement Program: An Overview.”] However, the competition in the web-advertising ecosystem is intense, and there is little transparency about enforcement of the guidance. Advertising networks have been shown to spread security vulnerabilities and other types of malware when they allow JavaScript in advertising payloads. [Randy Westergren, “Widespread XSS Vulnerabilities in Ad Network Code Affecting Top Tier Publishers, Retailers,” (March 2, 2016).]
Given the current environment, it's incumbent on libraries and the publishing industry to understand and evaluate their participation in the advertising network ecosystem. In the following sections, I discuss the extent of current participation in the advertising ecosystem by libraries, publishers, and aggregators serving the library industry.

Publishers

Advertising is a significant income stream for many publishers providing content to libraries. For example, the Massachusetts Medical Society, publisher of the New England Journal of Medicine, takes in about $25 million per year in advertising revenue. Outside of medical and pharmaceutical publishing, advertising is much less common. However, advertising networks are pervasive in research journals.
In 2015, I surveyed the websites of twenty of the top research journals and found that sixteen of them placed ad network trackers on their websites. [“16 of the Top 20 Research Journals Let Ad Networks Spy on Their Readers,” Go To Hellman (March 12, 2015).] Recently, I revisited the twenty journals to see if there had been any improvement. Instead, most of the journals I examined had added more tracking to their websites. The New England Journal of Medicine, which employed the most intense reader tracking of the twenty, is now even more intense, with nineteen trackers on a web page that had "only" fourteen trackers two years ago. A page from Elsevier's Cell went from nine to sixteen trackers. [“Reader Privacy for Research Journals is Getting Worse,” Go To Hellman (March 22, 2017).] Intense tracking is not confined to subscription-based health science journals; I have found trackers on open access journals, economics journals, even on journals covering library science and literary studies.
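A rough first pass at this kind of survey can be scripted: the sketch below lists the third-party hosts referenced by a page's static HTML. It only sees `script`, `img`, `iframe`, and `link` tags in the delivered markup and misses trackers injected later by JavaScript, so a tool like Ghostery will generally find more; all domain names in the example are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

class ThirdPartyFinder(HTMLParser):
    """Collect hosts of scripts, images, iframes, and links that differ
    from the page's own host -- candidate trackers for closer inspection."""

    def __init__(self, page_host):
        super().__init__()
        self.page_host = page_host
        self.hosts = set()

    def handle_starttag(self, tag, attrs):
        if tag not in ("script", "img", "iframe", "link"):
            return
        for name, value in attrs:
            if name in ("src", "href") and value:
                host = urlparse(value).netloc
                if host and host != self.page_host:
                    self.hosts.add(host)

def third_party_hosts(html, page_host):
    """Return the sorted third-party hosts referenced in static HTML."""
    finder = ThirdPartyFinder(page_host)
    finder.feed(html)
    return sorted(finder.hosts)

if __name__ == "__main__":
    sample = ('<script src="https://ads.example.net/t.js"></script>'
              '<img src="/local.png">')
    print(third_party_hosts(sample, "journal.example.org"))
```

Counting the output across a set of journal home pages gives a crude but repeatable tracker tally.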
It's not entirely clear why some of these publishers allow advertising trackers on their websites, because in many cases there is no advertising. Perhaps they don't realize the impact of tracking on reader privacy. Certainly, publishers that rely on advertising revenue need to carefully audit their advertising networks and the sorts of advertising that come through them. The privacy commitments these partners make need to be consistent with the privacy assurances made by the publishers themselves. For publishers who value reader privacy and don't earn significant amounts from advertising, there's simply no good reason for them to continue to allow tracking by ad networks.

Vendors

The library automation industry has slowly become aware of how the systems it provides can be misused to compromise library patron privacy. For example, I have pointed out that cover images presented by catalog systems were leaking search data to Amazon, which has resulted in software changes by at least one systems vendor. [“How to Check if Your Library is Leaking Catalog Searches to Amazon,” Go To Hellman (December 22, 2016).] These systems are technically complex, and systems managers in libraries are rarely trained in web privacy assessment. Development processes need to include privacy assessments at both the component and system levels.

Libraries

There is a mismatch between what libraries want to do to protect patron privacy and what they are able to do. Even when large amounts of money are at stake, a library often has little leverage to change the way a publisher delivers advertising-bearing content. Nonetheless, together with cooperating IT and legal services, libraries have many privacy-protecting options at their disposal:
  1. Use aggregators for journal content rather than the publisher sites. Many journals are available on multiple platforms, and platforms marketed to libraries often strip advertising and advertising trackers from the journal content. Reader privacy should be an important consideration in selecting platforms and platform content.
  2. Promote the use of privacy technologies. Privacy Badger is an open-source browser plugin that identifies trackers and blocks them. Similar tools include uBlock Origin and the aforementioned Ghostery.
  3. Use proxy servers. Rewriting proxy servers such as EZProxy are typically deployed to serve content to remote users, but they can also be configured to remove trackers or to forcibly expire tracking cookies. This is rarely done, as far as I am aware.
  4. Strip advertising and trackers at the network level. A more aggressive approach is to enforce privacy by blocking tracker websites at the network level. Because this can be intrusive (it affects subscribed and unsubscribed content equally), it's appropriate mostly for corporate environments where competitive-intelligence espionage is a concern.
  5. Ask for disclosure and notification. During licensing negotiations, ask the vendor or publisher to provide a list of all third parties who might have access to patron clickstream data. Ask to be notified if the list changes. Put these requests into requests for proposals. Sunlight is a good disinfectant.
  6. Join together with others in the library and publishing industry to set out best practices for advertising in web resources.
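As a minimal sketch of option 4, network-level blocking can be as simple as null-routing known tracker domains in a resolver or hosts file. The domain list below is purely illustrative; a real deployment would pull from a maintained blocklist and keep it current.

```python
# Generate /etc/hosts entries that send tracker domains to a sink address.
# The domains here are placeholders, not a curated blocklist.
TRACKER_DOMAINS = [
    "ads.example.net",
    "tracker.example.com",
    "ads.example.net",  # duplicates are harmless; they are collapsed below
]

def hosts_entries(domains, sink="0.0.0.0"):
    """Build /etc/hosts lines that null-route each tracker domain."""
    return [f"{sink} {d}" for d in sorted(set(domains))]

if __name__ == "__main__":
    print("\n".join(hosts_entries(TRACKER_DOMAINS)))
```

The same list could instead feed a DNS resolver or firewall rule set, which is easier to manage centrally than per-machine hosts files.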
Conclusion

The widespread infusion of the digital advertising ecosystem into library environments presents a new set of challenges to the values that have been at the core of the library profession. Advertising trackers introduce privacy breaches into the library environment and help to sustain an information-delivery channel that operates without the grounding in values that has earned libraries and librarians a deep reserve of trust from users. The infusion has come about through a combination of commercial interest in user demographics, consumer apathy about privacy, and a general lack of understanding of a complex technology environment. The entire information industry needs to develop an understanding of that environment so that it can grow and evolve to serve users first, not the advertisers.

David Rosenthal: Crowd-sourced Peer Review

planet code4lib - Tue, 2017-06-13 15:00
At Ars Technica, Chris Lee's article "Journal tries crowdsourcing peer reviews, sees excellent results" takes off from a column at Nature by a journal editor, Benjamin List, entitled "Crowd-based peer review can be good and fast." List and his assistant Denis Höfler have come up with a pre-publication peer-review process that, while retaining what they see as its advantages, has some of the attributes of post-publication review as practiced, for example, by Faculty of 1000. See also here. Below the fold, some commentary.

List's column describes an experiment in which he compared conventional pre-publication reviewing with what he and Höfler call "selected crowd-sourced peer review" of the same papers:
I am not proposing what is sometimes referred to as crowdsourced reviewing, in which anyone can comment on an openly posted manuscript. I believe that anonymous feedback is more candid, and that confidential submissions give authors space to decide how to revise and publish their work. I envisioned instead a protected platform whereby many expert reviewers could read and comment on submissions, as well as on fellow reviewers’ comments. This, I reasoned, would lead to faster, more-informed editorial decisions.The experiment worked like this. They:
recruited just over 100 highly qualified referees, mostly suggested by our editorial board. We worked with an IT start-up company to create a closed online forum and sought authors’ permission to have their submissions assessed in this way. Conventional peer reviewers evaluated the same manuscripts in parallel. After an editorial decision was made, authors received reports both from the crowd discussion and from the conventional reviewers. ... we put up two manuscripts simultaneously and gave the crowd 72 hours to respond.

The results were encouraging:
Each paper received dozens of comments that our editors considered informative. Taken together, responses from the crowd showed at least as much attention to fine details, including supporting information outside the main article, as did those from conventional reviewers. ... So far, we have tried crowd reviewing with ten manuscripts. In all cases, the response was more than enough to enable a fair and rapid editorial decision. Compared with our control experiments, we found that the crowd was much faster (days versus months), and collectively provided more-comprehensive feedback.

The authors liked the new process. They plan to switch their journal to it, tweaking it as they gain experience.

As I've been saying since the first post to this blog more than a decade ago, conventional pre-publication review is long overdue for a revolution. Chris Lee's Ars Technica piece is well worth reading. He describes List and Höfler's experiment in the context of a broader discussion of the problems of conventional pre-publication peer review:
The utter randomness of peer review is frustrating for everyone. Papers get delayed, editors get frustrated, the responsible reviewers get overloaded. Even when everyone is trying their best, any set of three reviewers can disagree so wholeheartedly about your work that the editor has to actually think about a decision—something no editor ever wants to be faced with.

But, more interestingly, Lee looks at the peer-review process from a signal-processing viewpoint:
I'd suggest that there is a physical analog to traditional peer review, called noise. Noise is not just a constant background that must be overcome. Noise is also generated by the very process that creates a signal. The difference is how the amplitude of noise grows compared to the amplitude of signal. For very low-amplitude signals, all you measure is noise, while for very high-intensity signals, the noise is vanishingly small compared to the signal, even though it's huge compared to the noise of the low-amplitude signal. Our esteemed peers, I would argue, are somewhat random in their response, but weighted toward objectivity. Using this inappropriate physics model, a review conducted by four reviewers can be expected (on average) to contain two responses that are, basically, noise. By contrast, a review by 100 reviewers may only have 30 responses that are noise.

It might seem that this simply multiplies the work demanded from already-overloaded reviewers. But Lee ingeniously and credibly argues that this isn't the case. This argument is the best part of the piece, and I urge you to stop reading me and start reading Lee.
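Lee's noise analogy can be made concrete with a toy model (mine, not his): treat each reviewer's score as the true quality plus independent Gaussian noise. Averaging n such reviewers shrinks the spread of the panel's verdict roughly as 1/sqrt(n), which is why 100 noisy reviewers can still produce a far more reliable signal than 4.

```python
import random
import statistics

def panel_score(true_quality, n_reviewers, noise_sd, rng):
    """Average score of a panel whose reviewers each add independent noise."""
    return statistics.fmean(
        true_quality + rng.gauss(0, noise_sd) for _ in range(n_reviewers)
    )

def panel_spread(n_reviewers, trials=2000, true_quality=5.0, noise_sd=2.0, seed=1):
    """Standard deviation of the panel average across many simulated panels."""
    rng = random.Random(seed)
    return statistics.stdev(
        panel_score(true_quality, n_reviewers, noise_sd, rng) for _ in range(trials)
    )

if __name__ == "__main__":
    # The 4-reviewer spread is roughly 5x the 100-reviewer spread,
    # matching sqrt(100 / 4) = 5.
    print(panel_spread(4), panel_spread(100))
```

This is only the averaging effect; Lee's argument adds that a larger crowd also brings a broader mix of expertise, which a single noise parameter does not capture.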


For those of you too lazy to take my advice, here is the Cliff Notes version. A larger number of reviewers brings a broader range of expertise to the review. Since they can all see each other's contributions, each reviewer can focus on the part that matches their specific expertise, and thus avoid the need to cover areas with which they are less familiar, and which thus involve more work in validating references, using less familiar analysis tools, etc.

DPLA: Digital Maryland Collections Now Discoverable in DPLA

planet code4lib - Tue, 2017-06-13 14:25

Our collections and our partner network are growing! The collections of our newest hub, Digital Maryland, are now searchable in the Digital Public Library of America (DPLA) alongside millions of resources from partners across the country. The new Maryland Service Hub represents a collaborative effort between Digital Maryland, a statewide digitization program, and University System of Maryland & Affiliated Institutions (USMAI), a consortium of seventeen public colleges and universities across the state. Through the efforts of Digital Maryland and USMAI, over 83,000 new resources from public libraries, museums, historical societies, and college libraries are now available via DPLA.   

Digital Maryland offers a unique and rich array of materials that speak to the distinctive history of the state, the Chesapeake region, and its people, as well as to national history and culture. Explore the development of the nation’s earliest railroads through the B&O Railroad Museum collection, dive into the life and letters of one of American literature’s most intriguing writers with Enoch Pratt Free Library’s Edgar Allan Poe collection, and learn how women took charge of Maryland’s farms during World War I in Montgomery County Historical Society’s Woman’s Land Army of America collection–and that’s just a preview!

You will also find new materials to support your next research project, such as resources on African American life in Maryland, nineteenth-century diaries and personal correspondence, photographs documenting everyday life on Maryland’s Eastern Shore, and critical documents that help reveal the lives of enslaved people and families in the decades before the Civil War.

Take a look below at some of the new materials from Digital Maryland and start exploring today!

Interested in locomotive transportation? Check out the earliest trains of the Baltimore and Ohio Railroad, founded in 1828, which helped the growing city of Baltimore compete as a major trade and transportation hub in the nineteenth century. The B&O Railroad Museum collection contains over 150 images showcasing the development of one of the first railroads in the United States. In this photograph you can see the B&O Railroad Class A-B Locomotive, built by Baltimore foundry A. and W. Denmead and Sons in 1857, courtesy of B&O Railroad Museum via Digital Maryland.


The Views of African American Life in Maryland collection from Enoch Pratt Free Library provides insight into the social, economic, and political lives of African Americans in Maryland from the 1890s to the 1970s through over 200 photographs. In this photograph, a young boy takes every opportunity to read, even while holding the reins for the horse-drawn Enoch Pratt Free Library book wagon as it visited Dallas Street, Baltimore in 1945.




Montgomery County Historical Society’s Women’s Land Army of America, 1918-1923 collection documents the experiences of the women who chipped in and took charge at local farms during World War I. In this photograph, “Farmerette” Miss Jane McIlheney from Washington, D.C., wrangles a calf on a farm in Rockville, Maryland around 1918, courtesy of Montgomery County Historical Society via Digital Maryland.






Explore the genius of poet and author Edgar Allan Poe, who called Baltimore home for many years, through the Enoch Pratt Free Library’s collection of letters and artifacts from his life. The most Poe-etic artifact in the collection dedicated to a man best known for the macabre? Perhaps this framed lock of hair from both Poe and his young wife Virginia Clemm Poe.



Organized to commemorate the sesquicentennial of the Civil War, Digital Maryland’s Civil War in Your Attic collection represents a statewide effort to locate, digitize, and preserve treasured materials documenting the Civil War and Reconstruction, held in private collections across the state. Among a trove of letters, diaries, photographs, reports, and more, you will find the discharge record for African American Union soldier George Washington, who served with the Massachusetts 55th Regiment, the second all-black regiment from the North organized after the issuance of the Emancipation Proclamation.


Howard County Historical Society’s collection of manumissions, indentures, and bills of sale documents the lives of some of the men, women, and families enslaved in Howard County before and during the Civil War. These records include several manumission deeds for enslaved men who were granted freedom in return for military service for the Union during the Civil War, like John Anderson, who was manumitted by enslaver Arianna Banks on May 4, 1864.





Welcome, Maryland!

Open Knowledge Foundation: A Data Driven Look at Refugee Crime in Hong Kong

planet code4lib - Tue, 2017-06-13 10:00

This blog post is part of our summer series featuring updates from local groups across the Open Knowledge Network and was submitted by Open Data Hong Kong (ODHK). It was first published on Open Data Hong Kong (ODHK)’s website and has been written by Robert Porsch, a PhD student studying statistical genetics at the University of Hong Kong, and has a general interest in all kinds of data analysis.

Open Data Hong Kong participated in the 2017 International Open Data Day hackathon organised at the City University of Hong Kong. Some of the ODHK team pitched a project looking at refugee crime data (see the hackpad and our slides), which is being used as a political football in Hong Kong at the moment, despite there being a big data vacuum in this area. With no relevant data available, we’ve been forced to try and gather disparate data from FOI requests and other sources to assist local NGOs such as Justice Centre Hong Kong in their policy work. This project attempted to fill some of the gaps and visualise this data.

Arrests of Refugees in Hong Kong. Is there the “surge” the media is portraying?

Like many societies in recent times, Hong Kong is having a heated discussion about immigration, especially with regard to refugees. A common belief here is that refugees commit more crime than the general population and that most criminals are of South East Asian ethnicity. Further, some have suggested that the increase in refugees has led to a general increase in crime within Hong Kong. This has led to strong comments by some politicians (e.g. Dominic Lee in Sham Shui Po calling for internment camps). However, there is surprisingly little public data available to base these claims on.

Therefore, Open Data Hong Kong has attempted to acquire some data on the topic. Scott Edmunds in particular spent a lot of time collecting the data by contacting individual police districts and police regions in Hong Kong through Code on Access to Information requests (the Hong Kong equivalent to FOI). So here I will take a look at the data and see if I can find some answers.

What does the data say? 

It is important to note something significant about refugees in Hong Kong: I was unable to find accurate figures on the total number of asylum seekers here, although according to the immigration department, there were around 9,618 people registered and claiming asylum in Hong Kong in 2014, 10,922 in 2015, and 9,981 in 2016. Official numbers do not count the unregistered and do not provide demographic breakdowns. Hong Kong never joined the UN Convention on the Status of Refugees, and asylum seekers can only apply under the UN Convention Against Torture, or at least cite it as a reason for protection.

Furthermore, the recognition rate is very low: about 0.16% of applicants are accepted (the global average is 27%). The application process is quite slow as well. This results in many applicants staying in the city for years without the possibility of deportation, since asylum seekers whose applications have been rejected often cannot be deported because, in many cases, there is no extradition agreement with the corresponding home countries. During their stay, applicants, as well as those who have been rejected, are not allowed to work, but the government provides some minimal rental, food, and medical subsidies (for example, Hong Kong allocated HK$450 million in the 2013/2014 budget). Some have suggested that these subsidies are too low to maintain a living in Hong Kong and provide incentives to become involved in criminal activities. The majority of claimants are from South and Southeast Asia.

To assess crimes committed by refugees in Hong Kong, I took a look at the data provided by Open Data Hong Kong, as well as publicly available census data and crime statistics. Unfortunately, not all police districts in Hong Kong were able to provide criminal statistics on refugees. In fact, only the West Kowloon region was able to provide a complete picture across its districts. Furthermore, these numbers are arrest statistics, not convictions (ODHK has collected data showing that roughly 50% of arrests result in convictions). So any conclusions should be viewed with care.

Has there been an increase in arrests with the presence of refugees?

This question is relatively easy to answer and I have plotted the overall number of arrests for each region by year below.

As you can see, there seems to be no dramatic overall increase in arrests in any of the regions. There is a slight increase in Kowloon East and West, but in general the trend points downwards. This would suggest crime in Hong Kong is not increasing.

Arrests of refugees

Since I only have limited data available about refugees in Hong Kong, I was only able to look at Kowloon West Region. Hence I compared the number of arrests of refugees with the total number of arrests within this region.

Let me explain the plotted graph above in a bit more detail. I used data available for 2014 and 2015. Also, Hong Kong does not use the word ‘refugee’ because the territory has not signed the UN Refugee Convention, so the exact legal classifications are a bit vague. Nevertheless, some police stations have called refugees “Form 8” (F8) holders so I will use this phrase here as well. Thus, the graph above shows the number of F8 holders arrested between 2014 and 2015 in the Kowloon West region.

By comparison, those arrest rates look quite small. Indeed, in 2014 and 2015, the proportion of arrests of F8 holders was 4% and 5% respectively. These numbers seem rather stable and suggest no major change between 2014 and 2015, despite a slight increase in the number of refugees.
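The proportions above are simple ratios of F8 arrests to total arrests. Since the underlying FOI table is not reproduced here, the counts in this sketch are hypothetical, chosen only to match the reported ~4% and ~5% shares.

```python
def arrest_share(f8_arrests, total_arrests):
    """Fraction of all arrests in a region attributable to F8 holders."""
    if total_arrests <= 0:
        raise ValueError("total_arrests must be positive")
    return f8_arrests / total_arrests

# Hypothetical Kowloon West counts, not the real FOI figures:
by_year = {2014: (80, 2000), 2015: (100, 2000)}
for year, (f8, total) in sorted(by_year.items()):
    print(year, f"{arrest_share(f8, total):.0%}")
```

With the real counts substituted in, the same two lines reproduce the published percentages.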

Do refugees commit more crime than others?

This question turned out to be much more difficult to answer than I thought. One problem is that I do not know how many refugees live in Kowloon West; further, police districts are not the same as council districts. This makes it difficult to get a population estimate, since the census data from 2011 only looked at council districts. Thus I am unable to answer this question with the current data. Only exact arrest numbers for refugees across the whole of Hong Kong, or exact numbers of refugees living in Kowloon, would make it possible to answer this question.


There is no evidence of an increase in crime in Hong Kong (at least from the available data); the slight increase from 2014 to 2015 looks more like random noise to me. Arrests of F8 holders were relatively stable between 2014 and 2015. Intuitively, I think the proportion of arrests of F8 holders is higher than one would expect given a small population of around 10,000, but one needs to keep in mind that arrests are not convictions. In general, the data is not really sufficient to make a conclusive statement, except that Hong Kong is incredibly safe compared to other major cities (0.4 murders per 100,000 people in 2016; one of the lowest rates in the world).
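One way to check the "random noise" intuition is a two-proportion z-test on the F8 arrest shares. The counts below are hypothetical, chosen to match the 4% and 5% shares at a plausible scale; with the real FOI counts the conclusion could differ, since larger samples would make the same one-point shift statistically significant.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Z statistic for comparing two proportions, using a pooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Hypothetical counts: 80 of 2,000 arrests in 2014 vs 100 of 2,000 in 2015.
z = two_proportion_z(100, 2000, 80, 2000)
print(round(z, 2))  # |z| below 1.96 means no significant change at the 5% level
```

At this sample size the statistic stays below the 1.96 threshold, which is consistent with reading the 2014-to-2015 uptick as noise.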

For more information about the work of Open Data Hong Kong, email the team, leave a comment on their Google+ page, or join their regular meetups or events and engage with the community.

DuraSpace News: VIVO Updates for June 11–Vitro, and Why Ontology?

planet code4lib - Tue, 2017-06-13 00:00

Vitro

Vitro is the engine that makes VIVO work. We don't talk about it much. That's a shame. Everyone should know about Vitro. One could even say that VIVO is a customized Vitro.

Code4Lib: Code4Lib Journal Issue 38 Call for Papers

planet code4lib - Mon, 2017-06-12 15:17
Topic: journal

The Code4Lib Journal (C4LJ) exists to foster community and share information among those interested in the intersection of libraries, technology, and the future.

We are now accepting proposals for publication in our 38th issue. Don't miss out on this opportunity to share your ideas and experiences. To be included in the 38th issue, which is scheduled for publication in mid-October 2017, please submit articles, abstracts, or proposals by Friday, July 14, 2017. When submitting by email, please include the title or subject of the proposal in the subject line of the message.

C4LJ encourages creativity and flexibility, and the editors welcome submissions across a broad variety of topics that support the mission of the journal. Possible topics include, but are not limited to:

  • Practical applications of library technology (both actual and hypothetical)
  • Technology projects (failed, successful, or proposed), including how they were done and challenges faced
  • Case studies
  • Best practices
  • Reviews
  • Comparisons of third party software or libraries
  • Analyses of library metadata for use with technology
  • Project management and communication within the library environment
  • Assessment and user studies

C4LJ strives to promote professional communication by minimizing the barriers to publication. While articles should be of a high quality, they need not follow any formal structure. Writers should aim for the middle ground between blog posts and articles in traditional refereed journals. Where appropriate, we encourage authors to submit code samples, algorithms, and pseudo-code. For more information, visit C4LJ's Article Guidelines or browse articles from the first 36 issues published on our website:

Remember, for consideration for the 38th issue, please send proposals, abstracts, or draft articles to no later than Friday, July 14, 2017.

Send in a submission. Your peers would like to hear what you are doing.

Code4Lib Journal Editorial Committee

FOSS4Lib Recent Releases: veraPDF - 1.6.2

planet code4lib - Mon, 2017-06-12 14:42

Last updated June 12, 2017. Created by Peter Murray on June 12, 2017.

Package: veraPDF
Release Date: Monday, June 5, 2017

FOSS4Lib Recent Releases: Fedora Repository - 4.7.3

planet code4lib - Mon, 2017-06-12 13:59

Last updated June 12, 2017. Created by Peter Murray on June 12, 2017.

Package: Fedora Repository
Release Date: Wednesday, June 7, 2017


Subscribe to code4lib aggregator