planet code4lib

David Rosenthal: BITAG on the IoT

Thu, 2016-12-01 16:00
The Broadband Internet Technical Advisory Group, an ISP industry group, has published a technical working group report entitled Internet of Things (IoT) Security and Privacy Recommendations. It's a 43-page PDF including a 6-page executive summary. The report makes a set of recommendations for IoT device manufacturers:
In many cases, straightforward changes to device development, distribution, and maintenance processes can prevent the distribution of IoT devices that suffer from significant security and privacy issues. BITAG believes the recommendations outlined in this report may help to dramatically improve the security and privacy of IoT devices and minimize the costs associated with collateral damage. In addition, unless the IoT device sector—the sector of the industry that manufactures and distributes these devices—improves device security and privacy, consumer backlash may impede the growth of the IoT marketplace and ultimately limit the promise that IoT holds.

Although the report is right that following its recommendations would "prevent the distribution of IoT devices that suffer from significant security and privacy issues", there are good reasons why this will not happen, and why even if it did the problem would persist. The Department of Homeland Security has a similar set of suggestions, and so does the Internet Society, both with the same issues. Below the fold I explain, and point out something rather odd about the BITAG report. I start from an excellent recent talk.

I've linked before to the work of Quinn Norton. A Network of Sorrows: Small Adversaries and Small Allies is a must-read talk she gave at last month's examining the reasons why the Internet is so insecure. She writes:
The predictions for this year from some analysis is that we’ll hit seventy-five billion in ransomware alone by the end of the year. Some estimates say that the loss globally could be well over a trillion this year, but it’s hard to say what a real number is. Because in many ways these figures can’t touch the real cost of insecurity on the Internet. The cost of humiliation and identity theft and privacy traded away. The lost time, the worry. The myriads of tiny personal tragedies that we’ll never hear about.

These large numbers conflict with estimates from companies as to the cost of insecurity. As I mentioned in You Were Warned, Iain Thomson at The Register reported that:
A study by the RAND Corporation, published in the Journal of Cybersecurity, looked at the frequency and cost of IT security failures in US businesses and found that the cost of a break-in is much lower than thought – typically around $200,000 per case. With top-shelf security systems costing a lot more than that, not beefing up security looks in some ways like a smart business decision.

Romanosky analyzed 12,000 incident reports and found that typically they only account for 0.4 per cent of a company's annual revenues. That compares to billing fraud, which averages at 5 per cent, or retail shrinkage (ie, shoplifting and insider theft), which accounts for 1.3 per cent of revenues.

Note, however, that 0.4% of global corporate revenue is still a whole lot of money flowing to the bad guys. The reason for the apparent conflict is that, because companies are able to use Terms of Service to disclaim liability, the costs fall largely on the (powerless) end user. Norton uses an example:
One media report in the US estimated 8,500 schools in America have been hit with ransomware this year. Now, the reason why I think it’s really interesting to point out the American figures here is this is also a national system where as of last year, half of all students in US public schools qualify for poverty assistance. Those are the people paying these ransomwares. And it’s hard to get a real figure because most schools are hiding this when it happens.

Her audience was people who can fix the security problems:

most people who are pulling a paycheck in this field are not interacting with the pain that most people are experiencing from network insecurity. Because you end up working for people who pay. ... That high school can’t afford anyone in this room. And that means that so much of this pain and insecurity in the world isn’t readily visible to the people who work in the field, who are supposed to be fixing it.

The potential fixers are not putting themselves in the shoes of those suffering the problem:

Because in the end, one of the conflicts that comes up over this, one of the reasons why users are seen as a point of insecurity, is because getting the job done is more important than getting it done securely. And that will always be in conflict.

This is where Norton's talk connects to the BITAG report. The report's recommendations show no evidence of understanding how things look to either the end users, who are the ISP's customers, or to the manufacturers of IoT devices.

First, the view from the ISP's customers. They see advertising for webcam baby monitors or internet-enabled door-locks. They think it would be useful to keep an eye on baby or open their front door from wherever they are using their smartphone. They are not seeing:

WARNING: everyone on the Internet can see your baby!

or:

WARNING: this allows the bad guys to open your front door!

They may even know that devices like this have security problems, but they have no way to know whether one device is more secure than another and, let's face it, none of these devices is actually "secure" compared to things people think of as secure, such as conventional door locks. They all have vulnerabilities that, with the passage of time, will be exploited. Even if the vendor followed the BITAG recommendations, there would be windows of time between the bad guys finding the vulnerability and the vendor distributing a patch when the bad guys would be exploiting it.

They are definitely not seeing a warning on the router they got from their ISP saying:

WARNING: this router gives the bad guys the password to your bank account!

After all, they pretty much have to trust their ISP. Nor are they seeing:

WARNING: This device can be used to attack major websites!

Even if the customer did see this warning, the fate of major websites is not the customer's problem.

Customers aren't seeing these warnings because no-one in the IoT device supply chain knows that these risks exist, nor is anyone motivated to find out. Even if they did know they wouldn't be motivated to tell the end user either prior to purchase, because it would discourage the purchase, or after the purchase, because thanks to Terms of Service it is no longer the vendor's problem.

Expecting end users to expend time and effort fixing the security issues of their IoT devices before disaster strikes is unrealistic. As Norton writes:
If you are sitting in this room, to some degree people are paying you to use a long password. People are paying you to worry about key management. If you are a trash collector or radiologist or a lawyer, this takes away from your work day.

Second, the view from the IoT device manufacturer. In June 2014 my friend Jim Gettys, who gained experience in high-volume low-cost manufacturing through the One Laptop Per Child project and the OpenWrt router software effort, gave a talk at Harvard's Berkman Center entitled (In)Security in Home Embedded Devices. It set out the problems IoT device manufacturers have in maintaining system security. Both it and Bruce Schneier's January 2014 article The Internet of Things Is Wildly Insecure — And Often Unpatchable, which Jim inspired, are must-reads.

The IoT device supply chain starts with high-volume, low-margin chip vendors, who add proprietary "binary blobs" to a version of Linux. Original device manufacturers (ODMs), again a low-margin business, buy the chips and the software and build a board. The brand-name company buys the board, adds a user interface, does some quality assurance, puts it in a box and ships it. Schneier explains:
The problem with this process is that no one entity has any incentive, expertise, or even ability to patch the software once it’s shipped. The chip manufacturer is busy shipping the next version of the chip, and the ODM is busy upgrading its product to work with this next chip. Maintaining the older chips and products just isn’t a priority.

The result is:

the software is old, even when the device is new. For example, one survey of common home routers found that the software components were four to five years older than the device. The minimum age of the Linux operating system was four years. The minimum age of the Samba file system software: six years. They may have had all the security patches applied, but most likely not. No one has that job. Some of the components are so old that they’re no longer being patched.

Because the software is old, many of its vulnerabilities will have been discovered and exploited. No-one in the supply chain has the margins to support life-long software support, quality assurance and distribution. Even if it were possible to provide these functions, a competitor providing them would price themselves out of the market. The BITAG recommendations would work in a different world, but in this one the supply chain has neither the ability nor the resources to implement them.

Bruce Schneier recently testified to the House Energy & Commerce Committee, pointing out the reason why, even if the BITAG recommendations were in effect, the problem wouldn't be solved:
These devices are a lower price margin, they’re offshore, there’s no teams. And a lot of them cannot be patched. Those DVRs are going to be vulnerable until someone throws them away. And that takes a while. We get security [for phones] because I get a new one every 18 months. Your DVR lasts for five years, your car for 10, your refrigerator for 25. I’m going to replace my thermostat approximately never. So the market really can’t fix this.

There are already enough insecure IoT devices on the network to bring down the Internet. Millions more are being added every week. And they aren't going away any time soon.

So, to conclude, what is odd about the report? As far as I can see, there is nothing in the report from the Broadband Internet Technical Advisory Group about what the Broadband Internet industry can do to fix the security issues the report raises. It lays the blame for the problem squarely on the IoT device industry. Very convenient, no?

There clearly are things the broadband industry could do to help. Intel's Schrecker has made one proposal, but it is equally impractical:
As for coping with the threat we face now, courtesy of millions of pathetically insecure consumer IoT devices, Schrecker’s proposed solution sounds elegantly simple, in theory at least: “Distribute, for example, gateways. Edge gateways that can contain a DDoS and are smart enough to talk to each other and help contain them that way.”

ISPs haven't deployed even the basic BCP38 filtering, which would ensure that packets had valid source addresses, and thus make DDoS attacks traceable. But they're going to buy and deploy a whole lot of new hardware? Note that the Mirai DDoS botnet technology has recently been upgraded to spoof source addresses:
Propoet also advertised another new feature, which is the ability to bypass some DDoS mitigation systems by spoofing (faking) the bot's IP address. Previous versions of the Mirai malware didn't include this feature.

2sec4u confirmed in a private conversation that some of the newly-spawned Mirai botnets can carry out DDoS attacks by spoofing IP addresses.

The upgraded technology is used in a botnet four times bigger than the one that took out Dyn last month. It rents for $50-60K/month, nothing compared to the damage it can do. Mirai has been updated with some zero-day exploits to which somewhere between 5M and 40M home routers appear to be vulnerable. Estimating 30% utilization of the 5M resource at $50K/month suggests Mirai-based botnets are a $2.2M/year business.
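The revenue estimate above can be reproduced with a few lines of arithmetic. The Python sketch below is one reading of it, not necessarily the author's exact calculation. The botnet size of 400,000 devices is an assumption (the text gives no figure), chosen because it reproduces the stated result; the other inputs are taken from the paragraph above.

```python
# Back-of-the-envelope check of the botnet revenue estimate.
VULNERABLE_ROUTERS = 5_000_000   # lower bound quoted in the text
UTILIZATION = 0.30               # share of those routers assumed to be rented out
RENT_PER_MONTH = 50_000          # dollars per botnet per month, from the text
ASSUMED_BOTNET_SIZE = 400_000    # ASSUMPTION: devices per rentable botnet


def annual_revenue(routers, utilization, botnet_size, rent_per_month):
    """Yearly rental income if the utilized routers are split into equal botnets."""
    rented_botnets = routers * utilization / botnet_size
    return rented_botnets * rent_per_month * 12


estimate = annual_revenue(
    VULNERABLE_ROUTERS, UTILIZATION, ASSUMED_BOTNET_SIZE, RENT_PER_MONTH
)
print(f"${estimate / 1e6:.2f}M per year")
```

Under these assumptions the result is about $2.25M per year, in line with the roughly $2.2M quoted. Using the 40M upper bound for vulnerable routers would scale the figure eightfold.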

Schrecker is right about the seriousness of the DDoS threat:
If the operators behind these IoT-enabled botnets were to “point them at industry” instead of smaller targets such as individual journalists’ websites, as happened with infosec researcher Brian Krebs, the impact on the world economy could be “devastating”, he added.

ISPs could do more to secure IoT devices, for example by detecting devices with known vulnerabilities and blocking access to and from them. But this would require a much higher level of user support than current ISP business models could sustain. Again, an ISP that "did the right thing" would price themselves out of the market.
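The BCP38 filtering mentioned earlier amounts to an ISP dropping any packet arriving from a customer link whose source address is not within the prefixes assigned to that customer. A minimal Python sketch of that check follows, using the standard `ipaddress` module. The prefix (a documentation range) and the function name are hypothetical, and real deployments do this in router configuration rather than in application code.

```python
import ipaddress

# Hypothetical prefix assigned to one customer link (a documentation range).
CUSTOMER_PREFIXES = [ipaddress.ip_network("203.0.113.0/24")]


def source_is_valid(source_address, allowed_prefixes=CUSTOMER_PREFIXES):
    """Return True if the packet's source address lies inside an assigned prefix."""
    addr = ipaddress.ip_address(source_address)
    return any(addr in prefix for prefix in allowed_prefixes)


# A packet with a genuine source address would be forwarded ...
print(source_is_valid("203.0.113.42"))
# ... while one with a spoofed source would be dropped at the ISP's edge.
print(source_is_valid("198.51.100.7"))
```

Because the check only needs the prefixes the ISP itself assigned, it can be applied at the customer-facing edge without any knowledge of the attack's target.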

There is plenty of scope for finger-pointing about IoT security. Having industry groups focus on what their own industry could do would be more constructive than dumping responsibility on others whose problems they don't understand. But it appears in all cases that there are collective action and short-termism problems. Despite the potential long-term benefits, individual companies would have to take actions against their short-term interests, and would be out-competed by free-riders.

District Dispatch: Charlie Wapner to serve as Senior Research Associate

Thu, 2016-12-01 15:06

I am pleased to announce the appointment of Charlie Wapner as a Senior Research Associate in ALA’s Office for Information Technology Policy (OITP). In this role, Charlie will provide research and advice on the broad array of issues addressed by OITP, and especially as needed to advocate with the three branches of the federal government and communicate with the library community.

Charlie Wapner, newly appointed Senior Research Associate at OITP

Charlie will be familiar to District Dispatch readers as he was a Senior Information Policy Analyst here in OITP in 2014-16. His contributions to ALA included the completion of two major reports. One of these, “Progress in the Making: 3D Printing Policy Considerations Through the Library Lens,” attracted library and general press coverage (e.g., Charlie contributed to a piece by the Christian Science Monitor), and he was invited to write an article for School Library Journal.

OITP’s work on entrepreneurship was launched by Charlie through the development and publication of “The People’s Incubator: Libraries Propel Entrepreneurship” (.pdf), a 21-page white paper that describes libraries as critical actors in the innovation economy and urges decision makers to work more closely with the library community to boost American enterprise. The paper is rife with examples of library programming, activities and collaborations from across the country. Charlie’s work is the basis for our current policy advocacy and the creation of a brief on libraries and entrepreneurship and small business.

Charlie came to ALA in March 2014 from the Office of Representative Ron Barber (Ariz.), where he was a legislative fellow. Earlier, he also served as a legislative correspondent for Representative Mark Critz (Penn.). Charlie also interned in the offices of Senator Kirsten Gillibrand (N.Y.) and Governor Edward Rendell (Penn.). After completing his B.A. in diplomatic history at the University of Pennsylvania, Charlie received his M.S. in public policy and management from Carnegie Mellon University.

The post Charlie Wapner to serve as Senior Research Associate appeared first on District Dispatch.

LibUX: The Non-Reader Persona

Thu, 2016-12-01 14:52

The saga of the user experience of ebooks continues. An in-time-for-Thanksgiving breakdown by Pew Research Center’s Andrew Perrin looks at the demographics of Americans who don’t read any books whatsoever – and as bleak as that sounds, I think in the spirit of the weekend we should be thankful.

Why’s that? Well, we in libraries could do better about knowing who not to cater to.

This data helps us better understand our non-adopters.

Given the share that hasn’t read a book in the past year, it’s not surprising that 19% of U.S. adults also say they have not visited a library or a bookmobile in the past year. The same demographic traits that characterize non-book readers also often apply to those who have never been to a library. Andrew Perrin


Who are “non-adopters”?

I am on record generally thinking that personas aren’t particularly useful in design, but there are three I like:

  • First adopters perceive an immediate need for a service or product. Once offered, they’re on board.
  • Late adopters probably see your service favorably – but there’s no rush. Maybe the price isn’t right, or it doesn’t quite solve a job they need done just yet. They’ll come around.
  • Non-adopters are uninterested and aren’t likely to use your service, period.

You organize your design and development strategy around these: first adopters will adopt, generate feedback, some income — or whatever numbers matter to your organization, whether that’s foot traffic, registration, and so on — and create word-of-mouth that in time will loop-in late adopters. Each type of user values the features of your service differently, but because first adopters are core to reaching others, you prioritize your early efforts for them.

Identifying non-adopters is useful in the short-term so you don’t waste your time catering to them. It sounds crass, but features that non-adopters say they like, and that first and late adopters don’t, shouldn’t be mistaken for features that will actually engage non-adopters.

They’re red herrings.

Are non-adopters driving our decision making?

Earlier this year in an episode about library usage and trends for Metric: A UX Podcast, we observed how the support for libraries in a separate Pew survey outweighed their actual usage, and feedback about which services to provide differed noticeably between those who use libraries and those who don’t. As the trends in public libraries move toward makerspaces, 3d-printing and the like, libraries need to be very clear about who precisely is asking for these.

When asked why they visit public libraries in person, large numbers of library users cite fairly traditional reasons. These include borrowing printed books (64% of library visitors do this, down slightly from the 73% who did in 2012, but similar to the 66% who did so in 2015) or just sitting and reading, studying, or engaging with media (49%, identical to the share who did so in 2012). John B. Horrigan

It’s hard to tell whether this chart demonstrates actual interest in the use of 3d printers or other high-tech devices, or whether these services weren’t yet available in the respondents’ community. I’d guess for many it was the latter. We can probably chalk some of this up to lack of awareness.

Even so, the trend is clear.

Libraries are putting real steam behind this service category. At this time there are 730 libraries plotted in Amanda’s map of 3d printers in libraries – and growing.

The question is whether meaningful investment in these features engages users as much as or more than others. Do we know? Libraries don’t need to make profit, but there’s some concern about the impact failure might have on experimentation in the future – let alone on the overall impact on community support during election season.

Appealing to the wrong users might have gross consequences on the user experience of everyone else – especially if it knocks libraries off the user-centric bandwagon altogether.

What better way to scare library administration from iterative design thinking than going full-bore without the prerequisite user research, burning time and budget into projects that patrons don’t care about?

Non-adopters in the long-term

In the long term, non-adopters deserve a second look. They define the boundaries of our practical decision-making but they also represent potential users of new services.

For most organizations and companies, non-adopters are a lost cause. The target-audience of adopters is narrowly defined by use cases. Reaching non-adopters demands a tangential service that meets an entirely unrelated need, but the overhead for designing, developing, and supporting these can be too much.

Libraries are unique in that “disparate community services” — academic or public — are sort of what they’re about. Collecting and distributing, teaching, entertaining, and advocating exemplify this, which now defines the makeup of what people think libraries do and why there is high public support. It doesn’t seem that much of a stretch to branch into software development, W3C standards-making, the block chain, makerspaces, 3d printing, media labs, coworking, and more.

Organizationally libraries are pre-positioned to extend into new service categories more naturally than others.

The challenge is to iterate sustainably.

Non-readers are likely to not be library users

Or, more optimistically, non-readers are likely to not be library users yet. There are opportunities to engage them, but the point of this whole thread is to not make light of the risk when you are budget- or time- or talent-constrained.

Andrew determined non-readers tend to be

  • adults with a high-school degree or less
  • less likely to own smartphones or tablets
  • at or below a household income of $30,000 per year
  • potentially older: 29% of adults ages 50 and up have not read a book in the past year

How the non-reader persona impacts library design

The lack of a smartphone doesn’t rule out that non-readers use the web. In fact, we know from the kind of work we do both that the digital divide is real and, more importantly, that libraries play an important role bridging that gap by providing free internet and access (even lending devices, in some cases), having done so since the ’90s. Increasingly even reluctant internet users must become internet users when applying for work or participating in government, assistance for which also falls within the boundaries of what libraries do.

None of this really matters however if the library web presence, which is increasingly the cornerstone for even tangible library services (like circulation), isn’t designed to reach the lowest common denominator of device support. There are people who don’t own a smartphone intentionally, but for many the phone, the data plan, and internet access are cost-prohibitive. These users might have old phones, old browsers, low data thresholds, slow internet, or just lack familiarity with or comfort using the internet.

To even hope to reach these folks, our websites must

  • be reasonably backward compatible with older browsers
  • fit as many device shapes and screen sizes as possible
  • go easy on the page weight (see “What does my site cost?”)
  • be accessible

let alone emphasizing easy onboarding of new patrons in our physical spaces, ensuring here also accessibility, findability, and affordance.

This means that library websites that aren’t progressively enhanced, mobile-first, responsive, lightweight and fast (use this system of measurements) are almost guaranteed to fail to engage this group.

District Dispatch: Registration opens for National Library Legislative Day 2017

Thu, 2016-12-01 14:00

Photo Credit: Adam Mason

We are happy to announce that registration for the 43rd annual National Library Legislative Day is open. This year, the event will be held in Washington, D.C. on May 1-2, 2017, bringing hundreds of librarians, trustees, library supporters, and patrons to Washington, D.C. to meet with their Members of Congress and rally support for library issues and policies. As in previous years, participants will receive advocacy tips and training, along with important issue briefings prior to their meetings. Featured issues include:

  • Library funding
  • Privacy and surveillance reform
  • Copyright modernization
  • Access to government information
  • Affordable broadband access
  • Net neutrality protection

Participants at National Library Legislative Day have the option of taking advantage of a discounted room rate by booking at the Liaison. To register for the event and find hotel registration information, please visit the website.

Want to see a little more? Check out the photos from last year!

We also offer a scholarship opportunity to one first-time participant at National Library Legislative Day. Recipients of the White House Conference on Library and Information Services Taskforce (WHCLIST) Award receive a stipend of $300 and two free nights at a D.C. hotel. For more information about the WHCLIST Award, visit our webpage.

I hope you will consider joining us!

For more information or assistance of any kind, please contact Lisa Lindle, ALA Washington’s Grassroots Communications Specialist, at or 202-628-8140.

The post Registration opens for National Library Legislative Day 2017 appeared first on District Dispatch.

Open Knowledge Foundation: OpenTrialsFDA presents prototype as finalist for the Open Science Prize

Thu, 2016-12-01 11:54

For immediate release

Open Knowledge International is thrilled to announce that the OpenTrialsFDA team is presenting its prototype today at the BD2K Open Data Science Symposium in Washington, DC as a finalist for the Open Science Prize. The Open Science Prize is a global science competition to make both the outputs from science and the research process broadly accessible. From now until 6 January 2017, the public is asked to help select the most promising, innovative and impactful prototype from among the six finalists – of which one will receive the grand prize of $230,000.

OpenTrialsFDA is a collaboration between Dr. Erick Turner (a psychiatrist-researcher and transparency advocate), Dr. Ben Goldacre (Senior Clinical Research Fellow in the Centre for Evidence Based Medicine at the University of Oxford) and the team behind OpenTrials at Open Knowledge International.  

OpenTrialsFDA works on making clinical trial data from the FDA (the US Food and Drug Administration) more easily accessible and searchable. Until now, this information has been hidden in the user-unfriendly Drug Approval Packages that the FDA publishes via its data portal Drugs@FDA. These are often just images of pages, so you cannot even search for a text phrase in them. OpenTrialsFDA scrapes all the relevant data and documents from the FDA documents, runs Optical Character Recognition across all documents, links this information to other clinical trial data, and now presents it through a new user-friendly web interface at

Explore the OpenTrialsFDA search interface

Any user can type in a drug name, and see all the places where this drug is mentioned in an FDA document. Users can also access, search and present this information through the application programming interfaces (APIs) the team will produce. In addition, the information has been integrated into the OpenTrials database, so that the FDA reports are linked to reports from other sources, such as, EU CTR, HRA, WHO ICTRP, and PubMed.

The prototype will provide the academic research world with important information on clinical trials in general, improving the quality of research, and helping evidence-based treatment decisions to be properly informed. Interestingly, the FDA data is unbiased, compared to reports of clinical trials in academic journals. Dr. Erick Turner explains: “With journal articles everything takes place after a study has finished, but with FDA reviews, there is a protocol that is submitted to the FDA before the study has even started. So the FDA learns first of all that the study is to be done, which means it can’t be hidden later. Secondly it learns all the little details, methodological details about how the study is going to be done and how it is going to be analyzed, and that guards against outcome switching.”

Dr Ben Goldacre: “These FDA documents are hugely valuable, but at the moment they’re hardly ever used. That’s because – although they’re publicly accessible in the most literal sense of that phrase – they are almost impossible to search, and navigate. We are working to make this data accessible, so that it has the impact it deserves.”

Voting for the Open Science Prize finalists is possible through more information on OpenTrialsFDA is available from and from the team’s video below.


Editor’s notes

Dr. Ben Goldacre
Ben is a doctor, academic, writer, and broadcaster, and currently a Senior Clinical Research Fellow in the Centre for Evidence Based Medicine at the University of Oxford. His blog is at and he is @bengoldacre on twitter. Read more here. His academic and policy work is in epidemiology and evidence based medicine, where he works on various problems including variation in care, better uses of routinely collected electronic health data, access to clinical trial data, efficient trial design, and retracted papers. In policy work, he co-authored this influential Cabinet Office paper, advocating for randomised trials in government, and setting out mechanisms to drive this forwards. He is the co-founder of the AllTrials campaign. He engages with policy makers. Alongside this he also works in public engagement, writing and broadcasting for a general audience on problems in evidence based medicine. His books have sold over 600,000 copies.

Dr. Erick Turner
Dr. Erick Turner is a psychiatrist-researcher and transparency advocate. Following a clinical research fellowship at the NIH, he worked for the US Food and Drug Administration (FDA), acting as gatekeeper for new psychotropic drugs seeking to enter the US market. In 2004 he published a paper drawing researchers’ attention to the Drugs@FDA website as a valuable but underutilized source of unbiased clinical trial data. Dissatisfied with the continuing underutilization of Drugs@FDA, he published a paper in the BMJ in order to encourage wider use of this trove of clinical trial data.

Open Knowledge International   
Open Knowledge International is a global non-profit organisation focussing on realising open data’s value to society by helping civil society groups access and use data to take action on social problems. Open Knowledge International addresses this in three steps: 1) we show the value of open data for the work of civil society organizations; 2) we provide organisations with the tools and skills to effectively use open data; and 3) we make government information systems responsive to civil society.

Open Science Prize
The Open Science Prize  is a collaboration between the National Institutes of Health and the Wellcome Trust, with additional funding provided by the Howard Hughes Medical Institute of Chevy Chase, Maryland.  The Open Data Science Symposium will feature discussions with the leaders in big data, open science, and biomedical research while also showcasing the finalists of the Open Data Science Prize, a worldwide competition to harness the innovative power of open data.

Harvard Library Innovation Lab: pockets of people

Wed, 2016-11-30 21:23

we hosted a bunch of amazing visitors earlier this week (knight prototype workshop!) and we were fortunate enough to gather everyone for dinner. after drinks were served, i used my phone’s camera and swooped into each booth aka pocket of people.

swooping into these pockets of people is surprisingly meaningful and rich — i very much get a distinct sense for the vibe/mood/energy at each table. this swoop in and pan pattern is deep.

what should i do with these clips? feels like there’s some coolness here but i can’t seem to grab it. ideas?

LITA: Jobs in Information Technology: November 30, 2016

Wed, 2016-11-30 19:26

New vacancy listings are posted weekly on Wednesday at approximately 12 noon Central Time. They appear under New This Week and under the appropriate regional listing. Postings remain on the LITA Job Site for a minimum of four weeks.

New This Week

Yale University, Software Engineer, Yale Center for British Art, New Haven, CT

Visit the LITA Job Site for more available jobs and for information on submitting a job posting.

District Dispatch: ALA seeks nominations for 2017 James Madison awards

Wed, 2016-11-30 17:20

The American Library Association’s (ALA) Washington Office is calling for nominations for two awards to honor individuals or groups who have championed, protected and promoted public access to government information and the public’s right to know.

The James Madison Award, named in honor of President James Madison, was established in 1986 to celebrate an individual or group who has brought awareness to these issues at the national level. Madison is widely regarded as the Father of the Constitution and as the foremost advocate for openness in government.

The Eileen Cooke Award honors an extraordinary leader who has built local grassroots awareness of the importance of access to information. Cooke, former director of the ALA Washington Office, was a tireless advocate for the public’s right to know and a mentor to many librarians and trustees.

Both awards are presented during Freedom of Information (FOI) Day, an annual event on or near March 16, Madison’s birthday.

Nominations should be submitted to the ALA Washington Office no later than January 20, 2017. Submissions should include a statement (maximum one page) about the nominee’s contribution to public access to government information, why it merits the award and one seconding letter. Please include a brief biography and contact information for the nominee.

Send e-mail nominations to Jessica McGilvray, Deputy Director for the ALA Office of Government Relations, at Submissions can also be mailed to:

James Madison Award / Eileen Cooke Award
American Library Association
Washington Office
1615 New Hampshire Avenue, NW
Washington, D.C. 20009-2520

The post ALA seeks nominations for 2017 James Madison awards appeared first on District Dispatch.

District Dispatch: CopyTalk webinar: Section 108 video project

Wed, 2016-11-30 17:12

Starting in the late 1970s academic libraries built collections of VHS titles with an emphasis on supporting classroom teaching. On average, academic libraries have more than 3,000 VHS tapes. 

Eclipsed by robust and rapid adoption of DVDs, the VHS era is now over. But a crisis is welling for libraries. Of the hundreds of thousands of VHS recordings commercially released, a substantial number never were released on DVD or in streaming format. To compound matters, industry experts estimate that various forces converge against VHS (age of tapes, irreparable and irreplaceable equipment, retirement of VHS technicians) ultimately making the format inaccessible by 2027.

Under Section 108 of U.S. Copyright law, libraries have an available remedy to this problem. The law allows duplication of some items if, prior to duplication, a reasonable search has determined that an unused copy of the title is not available.

This session presents a cooperative database, established to capture the search efforts for current distribution of VHS video titles, and to identify titles eligible for duplication under Section 108.

Our speaker will be deg farrelly, who has been a media librarian for 40 years, the last 25 at Arizona State University. He has played instrumental roles at multiple companies in the development of streaming video collections and licensing, including the first PDA, the first subscription and the first EBA models. Co-investigator of two national studies, The Survey of Academic Library Streaming Video (2013) and Academic Library Streaming Video Revisited (2015), farrelly writes and presents frequently on issues related to streaming video.

Join us Thursday, December 1st at 2pm Eastern/11am Pacific for our hour-long free webinar.

Go to and sign in as a guest. You’re in.

This  free webinar program is brought to you by OITP’s copyright education subcommittee. Space is limited, but all CopyTalk webinars are archived.

The post CopyTalk webinar: Section 108 video project appeared first on District Dispatch.

FOSS4Lib Upcoming Events: German DSpace User Group Meeting 2017

Wed, 2016-11-30 16:44
Date: Thursday, September 21, 2017 - 09:00 to 17:00
Supports: DSpace

Last updated November 30, 2016. Created by Peter Murray on November 30, 2016.

Following meetings in 2014, 2015, and 2016, we are happy to announce that there will be a fourth German DSpace User Group Meeting in 2017. The German DSpace User Group Meeting 2017 will be organized by Fraunhofer IRB, Stuttgart University Library and The Library Code. It will take place at University of Stuttgart on Thursday, 21st September 2017. So please, save the date!  Further information will be sent out.
Information about the German DSpace User Group may be found here.

FOSS4Lib Upcoming Events: VIVO Conference 2017

Wed, 2016-11-30 16:41
Date: Wednesday, August 2, 2017 - 09:00 to Friday, August 4, 2017 - 17:00
Supports: Vivo

Last updated November 30, 2016. Created by Peter Murray on November 30, 2016.

For details see the conference homepage.

Terry Reese: MarcEdit Update

Wed, 2016-11-30 14:07

In what’s become a bit of a tradition, I took some of my time over the Thanksgiving holiday to work through a few things on my list and put together an update (posted last night).  Updates were to all versions of MarcEdit and cover the following topics:


* Enhancement: Dedup Records – addition of a fuzzy match option
* Enhancement: Linked Data tweaks to allow for multiple rules files
* Bug Fix: Clean Smart Characters can now be embedded in a task
* Enhancement: MARC Tools — addition of a MARC=>JSON processing function
* Enhancement: MARC Tools — addition of a JSON=>MARC processing function
* Behavior Change: SPARQL Browser updates — tweaks make it more simple at this point, but this will let me provide better support
* Dependency Updates: Updated Saxon XML Engine
* Enhancement: Command-Line Tool: MARC=>JSON; JSON=>MARC processes added to the command-line tool
* Enhancement: continued updates to the Automatic updater (due to my webhost continuing to make changes)
* removal of some deprecated dependencies

Mac OS

* Enhancement: Dedup Records – addition of a fuzzy match option
* Enhancement: Linked Data tweaks to allow for multiple rules files
* Enhancement: MARC Tools — addition of a MARC=>JSON processing function
* Enhancement: MARC Tools — addition of a JSON=>MARC processing function
* Behavior Change: SPARQL Browser updates — tweaks make it more simple at this point, but this will let me provide better support
* Dependency Updates: Updated Saxon XML Engine
* Enhancement: continued updates to the Automatic updater (due to my webhost continuing to make changes)
* Enhancement: Linked data enhancement — allow selective collection processing
* Enhancement: MarcEditor: Smart Character Cleaner added to the Edit ShortCuts menu
* removal of some deprecated dependencies

Couple notes about the removal of deprecated dependencies.  These were mostly related to a SPARQL library that I’d been using – but having some trouble with due to changes a few institutions have been making.  It mostly was a convenience set of tools for me, but they were big and bulky.  So, I’m rebuilding exactly what I need from core components and shedding the parts that I don’t require.

Couple other notes – I’ll be working this week on adding the Edit Shortcuts functionality into the Mac version’s task manager (that will bring the Windows and Mac versions back together).  I’ll also be working to do a little video recording on some of the new stuff just to provide some quick documentation on the changes.

You can download from the website: or, assuming my webhost hasn’t broken it, via the automatic downloader.  And I should note, the automatic downloader will now work differently – it will attempt to do a download, but if my host causes issues, it will automatically direct your browser to the file for download following this update.


DPLA: DPLA and Library of Congress Announce New Collaboration

Tue, 2016-11-29 17:55

The Library of Congress today signed a memorandum of understanding (MOU) with the Digital Public Library of America to become a Content Hub and will ultimately share a significant portion of its rich digital resources with DPLA’s database of digital content records.

The first batch of records will include 5,000 items from three major Library of Congress map collections – the Revolutionary War, Civil War, and panoramic maps.

“We are pleased to make the Digital Public Library of America a new door through which the public can access the digital riches of the Library of Congress,” said Librarian of Congress Carla Hayden. “We will be sharing some beautiful, one-of-a-kind historic maps that I think people will really love. They are available online and I hope even more people discover them through DPLA.”

“We couldn’t be more thrilled to collaborate closely with the Library of Congress, to work with them on the important mission of maximizing access to our nation’s shared cultural heritage,” said DPLA’s Executive Director Dan Cohen, “and we deeply appreciate not only the Library’s incredible collections, but also the great efforts of the Librarian and her staff.”

“The Library of Congress’s extraordinary resources will be exponentially more available to everyone in the United States through DPLA. This partnership will benefit everyone, from curious thinkers to scholars,” said Amy Ryan, President of DPLA’s Board of Directors.

“The United States of America laid down from the best authorities, agreeable to the Peace of 1783,” one of 5,000 Library of Congress maps that will soon be discoverable in DPLA.

The Digital Public Library of America, the product of a widely shared vision of a national digital library dating back to the 1990s, was launched with a planning process bringing together 40 leaders from libraries, foundations, academia and technology projects in October, 2010 followed by an intense community planning effort that culminated in 2013. Its aim was to supersede the silo effect many digitization efforts were subject to. Based in Boston, the board of directors includes leading public and research librarians, technologists, intellectual property scholars, and business experts from across the nation. Its goal is to create “an open, distributed network of comprehensive online resources that would draw on the nation’s living heritage from libraries, universities, archives, and museums in order to educate, inform, and empower everyone in current and future generations.”

The Library of Congress expects to add a significant portion of its digital items to the original trio of collections over time, covering other collections such as photos, maps and sheet music.

Library of Congress items already appear in the DPLA database. Earlier in this decade, the Library digitized more than 100,000 books in its collections as part of its membership in the HathiTrust and the Biodiversity Heritage Library, both current partners with the DPLA. As a result, those books are already in the DPLA’s collections through those partners.

The Digital Public Library of America strives to contain the full breadth of human expression, from the written word, to works of art and culture, to records of America’s heritage, to the efforts and data of science. Since launching in April 2013, it has aggregated more than 14 million items from more than 2,000 institutions. The DPLA is a registered 501(c)(3) non-profit.

The Library of Congress is the world’s largest library, offering access to the creative record of the United States—and extensive materials from around the world—both on site and online. The Library is the main research arm of the U.S. Congress and the home of the U.S. Copyright Office. Explore collections, reference services and other programs and plan a visit at, access the official site for U.S. federal legislative information at, and register creative works of authorship at

David Rosenthal: Talks at the Library of Congress Storage Architecture Meeting

Tue, 2016-11-29 16:00
Slides from the talks at last September's Library of Congress Storage Architecture meeting are now on-line. Below the fold, links to and commentary on three of them.

Fontana's 2015 analysis
Robert Fontana updated his invaluable survey of storage media trends with 2015 numbers and more detailed discussion. You need to go read the whole thing; extracts cannot do it justice.

Many of the conclusions he drew are similar to those from my post The Future of Storage and earlier posts:
  • The Kryder rates for tape, hard disk and NAND flash are similar, and in the 20%/yr range. The days of 40%/yr are gone for good.
  • The impact of this change on the costs of storing data for the long haul has yet to sink in. As Fontana says "Storage is more valuable, less replaceable, and must be reliable for longer time periods".
  • No medium's $/GB is outpacing the others by a huge margin, although over time flash is gaining ground.
  • Total Exabytes shipped is increasing linearly, not exponentially, at around 77EB/yr. Storage is not undergoing Mind-Boggling Growth; the IDC numbers for "data generated" have nothing to do with storage demand.
  • Total revenue is about flat, now with more of the dollars going to flash and less to hard disk.
  • Last year flash shipped 83EB and hard disk shipped 565EB. For flash to displace hard disk immediately would need 32 new state-of-the-art fabs at around $9B each or nearly $300B in total investment. So not going to happen.
  • But over the next 4 years Fontana projects NAND flash shipments will grow to 400EB/yr versus hard disk shipments perhaps 800EB/yr. So there will be continued gradual erosion of hard disk market share.
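
The arithmetic behind the last two bullets can be checked with a quick sketch. The inputs are the figures quoted above; the variable names, and the simplification of leaving tape out of the share calculation, are assumptions made only for illustration.

```python
# Figures quoted in the bullets above (Fontana's 2015 numbers).
flash_eb_2015 = 83       # exabytes of NAND flash shipped last year
disk_eb_2015 = 565       # exabytes of hard disk shipped last year
new_fabs = 32            # fabs needed for flash to replace disk at once
cost_per_fab_bn = 9      # approximate cost per fab, in $B

# Total investment for immediate displacement.
total_investment_bn = new_fabs * cost_per_fab_bn

# Projection four years out.
flash_eb_proj = 400
disk_eb_proj = 800
years = 4

def annual_growth(start, end, years):
    """Compound annual growth rate implied by going from start to end."""
    return (end / start) ** (1 / years) - 1

flash_growth = annual_growth(flash_eb_2015, flash_eb_proj, years)
disk_growth = annual_growth(disk_eb_2015, disk_eb_proj, years)

# Flash share of flash-plus-disk exabytes (tape ignored: an assumption).
share_now = flash_eb_2015 / (flash_eb_2015 + disk_eb_2015)
share_proj = flash_eb_proj / (flash_eb_proj + disk_eb_proj)

print(f"Investment: ${total_investment_bn}B")
print(f"Flash growth: {flash_growth:.0%}/yr, disk growth: {disk_growth:.0%}/yr")
print(f"Flash share: {share_now:.0%} now, {share_proj:.0%} projected")
```

On these numbers the $288B total matches the "nearly $300B" in the text, and flash's share of shipped exabytes would rise from roughly an eighth to roughly a third, which is erosion rather than displacement.
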
I've long admired the work of Kestutis Patiejunas on Facebook's long-term storage systems. He and Sam Merat presented An Optical Journey: Building the largest optical archival data storage system at Facebook. They described the collaboration between Facebook and Panasonic to get the prototype optical storage system that attracted attention when it was announced in 2014 into production. They deployed 10s of Petabytes of 100GB BluRay disks and found a disk failure rate of 0.015%. In 2017 they expect to deploy 100s of PB of a second-generation system with 300GB disks, and in 2018 to achieve Exabyte scale with a third generation using 500GB disks. Panasonic showed the production hardware at the meeting.

I've consistently been skeptical of the medium-term prospects for DNA storage, as in my post debunking Nature's reporting on a paper from Microsoft. Karin Strauss and Luis Ceze, from the team behind that paper, presented A DNA-Based Archival Storage System. Despite my skepticism, I believe this team is doing very important work. The reason is the same as why Facebook's work on optical storage is interesting; it is the system aspects not the media themselves that are important.

The Microsoft team are researching what a DNA-based storage system would look like, not just trying to demonstrate the feasibility of storing data in DNA. For example, they discuss how data might be encoded in DNA to permit random access. Although this is useful research, the fact remains that DNA data storage requires a reduction in relative synthesis cost of at least 6 orders of magnitude over the next decade to be competitive with conventional media, and that currently the relative write cost is increasing, not decreasing.
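
To give a sense of scale for "6 orders of magnitude over the next decade", here is a small worked calculation. It assumes the reduction is spread evenly across the ten years, and it reads the roughly 20%/yr Kryder rate quoted in the Fontana section as an annual fall in $/GB; both are simplifying assumptions, not figures from the talk.

```python
orders_of_magnitude = 6   # required reduction in relative synthesis cost
years = 10                # "over the next decade"

# Constant yearly factor by which the cost must fall to hit the target.
annual_factor = 10 ** (orders_of_magnitude / years)
annual_decline = 1 - 1 / annual_factor

# For comparison: improvement of conventional media over the same decade
# if $/GB falls by about 20% each year (assumed reading of the Kryder rate).
kryder_rate = 0.20
kryder_factor_decade = (1 / (1 - kryder_rate)) ** years

print(f"DNA synthesis cost must fall {annual_factor:.1f}x per year "
      f"({annual_decline:.0%} cheaper every year)")
print(f"Conventional media at 20%/yr improve about "
      f"{kryder_factor_decade:.1f}x over the decade")
```

Under these assumptions the relative cost would need to drop about fourfold every year for ten years running, while conventional media improve by less than one order of magnitude over the whole decade.
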

Open Knowledge Foundation: Git for Data Analysis – why version control is essential for collaboration and for gaining public trust.

Tue, 2016-11-29 11:00

Openness and collaboration go hand in hand. Scientists at PNNL are working with the Frictionless Data team at Open Knowledge International to ensure collaboration on data analysis is seamless and their data integrity is maintained.

I’m a computational biologist at the Pacific Northwest National Laboratory (PNNL), where I work on environmental and biomedical research. In our scientific endeavors, the full data life cycle typically involves new algorithms, data analysis and data management. One of the unique aspects of PNNL as a U.S. Department of Energy National Laboratory is that part of our mission is to be a resource to the scientific community. In this highly collaborative atmosphere, we are continuously engaging research partners around the country and around the world.

Image credit: unsplash (public domain)

One of my recent research topics is how to make collaborative data analysis more efficient and more impactful. In most of my collaborations, I work with other scientists to analyze their data and look for evidence that supports or rejects a hypothesis. Because of my background in computer science, I saw many similarities between collaborative data analysis and collaborative software engineering. This led me to wonder, “We use version control for all our software products. Why don’t we use version control for data analysis?” This thought inspired my current project and has prompted other open data advocates like Open Knowledge International to propose source control for data.

Openness is a foundational principle of collaboration. To work effectively as a team, people need to be able to easily see and replicate each other’s work. In software engineering, this is facilitated by version control systems like Git or SVN. Version control has been around for decades and almost all best practices for collaborative software engineering explicitly require version control for complete sharing of source code within the development team. At the moment we don’t have a similarly ubiquitous framework for full sharing in data analysis or scientific investigation. To help create this resource, we started Active Data Biology. Although the tool is still in beta-release, it lays the groundwork for open collaboration.

The original use case for Active Data Biology is to facilitate data analysis of gene expression measurements of biological samples. For example, we use the tool to investigate the changing interaction of a bacterial community over time; another great example is the analysis of global protein abundance in a collection of ovarian tumors. In both of these experiments, the fundamental data consist of two tables: 1) a matrix of gene expression values for each sample; 2) a table of metadata describing each sample. Although the original instrument files used to generate these two simple tables are often hundreds of gigabytes, the actual tables are relatively small.

After generating data, the real goal of the experiment is to discover something profoundly new and useful – for example how bacterial growth changes over time or what proteins are correlated with surviving cancer. Such broad questions typically involve a diverse team of scientists and a lengthy and rigorous investigation. Active Data Biology uses version control as an underlying technology to ease collaboration between these large and diverse groups.

Active Data Biology creates a repository for each data analysis project. Inside the repository live the data, analysis software, and derived insight. Just as in software engineering, the repository is shared by various team members and analyses are versioned and tracked over time. Although the framework we describe here was created for our specific biological data application, it is possible to generalize the idea and adapt it to many different domains.

An example repository can be found here. This dataset originates from a proteomics study of ovarian cancer. In total, 174 tumors were analyzed to identify the abundance of several thousand proteins. The protein abundance data is located in this repository. In order to more easily analyze this with our R-based statistical code, we also store the data in an Rdata file (data.Rdata). Associated with this data file is a metadata table which describes the tumor samples, e.g. age of the patient, tumor stage, chemotherapy status, etc. It can be found at metadata.tsv. (For full disclosure, and to calm any worries, all of the samples have been de-identified and the data is approved for public release.)

Data analysis is an exploration of data, an attempt to uncover some nugget which confirms a hypothesis. Data analysis can take many forms. For me it often involves statistical tests which calculate the likelihood of an observation. For example, we observe a set of genes that have a correlated expression pattern and are enriched in a biological process. What is the chance that this observation is random? To answer this, we use a statistical test (e.g. a Fisher’s exact test). As the specific implementation might vary from person to person, having access to the exact code is essential. There is no “half-way” sharing here. It does no good to describe analyses over the phone or through email; your collaborators need your actual data and code.
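
As a minimal sketch of the kind of test mentioned above, a one-sided Fisher's exact test for enrichment can be written directly from the hypergeometric distribution. This is an illustration in Python, not the project's R code; the function name and the gene counts are hypothetical.

```python
from math import comb

def fisher_enrichment_p(overlap, set_size, annotated, background):
    """One-sided Fisher's exact test p-value for enrichment.

    overlap:    genes in our set that are annotated to the process
    set_size:   genes in our correlated set
    annotated:  genes in the background annotated to the process
    background: all genes measured
    Returns P(X >= overlap) for
    X ~ Hypergeometric(background, annotated, set_size).
    """
    total = comb(background, set_size)
    most = min(set_size, annotated)
    tail = sum(
        comb(annotated, i) * comb(background - annotated, set_size - i)
        for i in range(overlap, most + 1)
    )
    return tail / total

# Hypothetical counts: 8 of 20 genes in the set hit a process that
# covers 100 of 5000 measured genes.
p = fisher_enrichment_p(overlap=8, set_size=20, annotated=100, background=5000)
print(f"p = {p:.3g}")
```

Even a function this small leaves room for choices (one-sided or two-sided, which genes count as background), which is why the exact code needs to be shared alongside the result.
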

In Active Data Biology, analysis scripts are kept in the repository. This repository had a fairly simple scope for statistical analysis. The various code snippets handled data ingress, dealt with missing data (a very common occurrence in environmental or biomedical data), performed a standard test and returned the result. Over time, these scripts may evolve and change. This is exactly why we chose to use version control, to effortlessly track and share progress on the project.

We should note that we are not the only ones using version control in this manner. Open Knowledge International has a large number of GitHub repositories hosting public datasets, such as atmospheric carbon dioxide time series measurements. Vanessa Bailey and Ben Bond-Lamberty, environmental scientists at PNNL, used GitHub for an open experiment to store data, R code, a manuscript and various other aspects of analysis. The FiveThirtyEight group, led by Nate Silver, uses GitHub to share the data and code behind their stories and statistical exposés. We believe that sharing analysis in this way is critical for both helping your team work together productively and also for gaining public trust.

At PNNL, we typically work in a team that includes both computational and non-computational scientists, so we wanted to create an environment where data exploration does not necessarily require computational expertise. To achieve this, we created a web-based visual analytic which exposes the data and capabilities within a project’s GitHub repository. This gives non-computational researchers a more accessible interface to the data, while allowing them access to the full range of computational methods contributed by their teammates. We first presented the Active Data Biology tool at Nature’s Publishing Better Science through Better Data conference. It was here that we met Open Knowledge International. Our shared passion for open and collaborative data through tools like Git led to a natural collaboration. We’re excited to be working with them on improving access to scientific data and results.

On the horizon, we are working together to integrate Frictionless Data and Good Tables into our tool to help validate and smooth our data access. One of the key aspects of data analysis is that it is fluid; over the course of investigation your methods and/or data will change. For that reason, it is important that the data integrity is always maintained. Good Tables is designed to enforce data quality; consistently verifying the accuracy of our data is essential in a project where many people can update the data.

One of our real-world problems is that clinical data for biomedical projects is updated periodically as researchers re-examine patient records. Thus the meta-data describing a patient’s survival status or current treatments will change. A second challenge discovered through experience is that there are a fair number of entry mistakes, typos or incorrect data formatting. Working with the Open Knowledge International team, we hope to reduce these errors at their origin by enforcing data standards on entry, and continuously throughout the project.
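
To make the idea of enforcing data standards on entry concrete, here is a minimal sketch of a row-level check on a tab-separated metadata table. It uses only the Python standard library and is not the Good Tables API; the column names (`sample_id`, `age`, `tumor_stage`) and the allowed values are hypothetical.

```python
import csv
import io

ALLOWED_STAGES = {"I", "II", "III", "IV"}  # hypothetical controlled vocabulary

def validate_metadata(tsv_text):
    """Return a list of (line_number, message) for rows that break the rules."""
    errors = []
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        if not (row.get("sample_id") or "").strip():
            errors.append((line_no, "missing sample_id"))
        age = (row.get("age") or "").strip()
        if not age.isdecimal() or not 0 < int(age) < 120:
            errors.append((line_no, f"implausible age: {age!r}"))
        stage = (row.get("tumor_stage") or "").strip()
        if stage not in ALLOWED_STAGES:
            errors.append((line_no, f"unknown tumor_stage: {stage!r}"))
    return errors

sample = (
    "sample_id\tage\ttumor_stage\n"
    "T001\t54\tIII\n"
    "T002\t5x\tIII\n"      # typo in age
    "T003\t61\tstage 3\n"  # wrong format for stage
)
for line_no, message in validate_metadata(sample):
    print(f"line {line_no}: {message}")
```

Running a check like this on every commit to the repository would flag typos and formatting slips when they are introduced, rather than at analysis time.
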

I look forward to data analysis having the same culture as software engineering, where openness and sharing have become the norm. Getting there will take a bit of education as well as working out some standard structures and platforms.

DuraSpace News: New add-ons: the #makeITopen campaign by 4Science

Tue, 2016-11-29 00:00

From Susanna Mornati, Head of Operations 4Science

A new opportunity for add-ons, which we call the #makeITopen campaign, could represent a turning point in the Community support services already offered by 4Science.

At 4Science we truly believe in the value of openness and we are committed to releasing everything we develop as open source. In order to do so, we need the Community’s support and participation.

DuraSpace News: Announcing the German DSpace User Group Meeting 2017

Tue, 2016-11-29 00:00

Following meetings in 2014, 2015, and 2016, we are happy to announce that there will be a fourth German DSpace User Group Meeting in 2017. The German DSpace User Group Meeting 2017 will be organized by Fraunhofer IRB, Stuttgart University Library and The Library Code. It will take place at University of Stuttgart on Thursday, 21st September 2017. So please, save the date!  Further information will be sent out.

Information about the German DSpace User Group may be found here.

DuraSpace News: VIVO Updates for Nov 27–Internationalization

Tue, 2016-11-29 00:00

From Mike Conlon, VIVO Project Director

Internationalization: VIVO is used in more than 25 countries around the world. See for a list of sites. You can use the country selector on the right hand side of the list to select sites by country.

DuraSpace News: NOW AVAILABLE: Fedora 4.6.1 and 4.7.0 Releases

Tue, 2016-11-29 00:00

From Andrew Woods, Fedora Tech Lead

Austin, TX – The Fedora Team is proud to announce two releases: Fedora 4.6.1 and 4.7.0. The Fedora 4.7.0 release includes many fixes and improvements, documented below, as well as an upgrade to the backend datastore. Therefore, Fedora 4.6.1 includes a patch to the underlying ModeShape that generates backups suitable for restore to Fedora 4.7.0.