- The Mt. Gox theft.
- Several protocol vulnerabilities.
- Bitcoin's failure to remain effectively distributed.
- The DAO theft and subsequent hard fork.
- And most recently the Bitfinex theft.
Last Friday Nathaniel Popper at the New York Times wrote in Envisioning Bitcoin’s Technology at the Heart of Global Finance:
A new report from the World Economic Forum predicts that the underlying technology introduced by the virtual currency Bitcoin will come to occupy a central place in the global financial system.
A report released Friday morning by the forum, a convening organization for the global elite, is one of the strongest endorsements yet for a new technology — the blockchain — that has become the talk of the financial industry, despite the shadowy origins of Bitcoin.

Apparently:
The 130-page report from the forum is the product of a year of research and five gatherings of executives from several major institutions, including JPMorgan Chase, Visa, MasterCard and BlackRock.
The report estimates that 80 percent of banks around the world could start distributed ledger projects by next year. Large central banks are also studying how the blockchain will alter the way money moves around the globe.

What could possibly go wrong? The idea that institutions like these would take insane risks, crash the world economy, and blackmail governments into bailing them out is ridiculous. But Popper notes:
But few real-world uses of the blockchain have come to fruition, other than Bitcoin itself. That has led to some questions about whether the blockchain is the proverbial solution looking for a problem, rather than an innovation that will be used widely.
Existing virtual currencies have continued to struggle with security problems. One of the largest Bitcoin exchanges, Bitfinex, recently lost more than $60 million worth of Bitcoin in a hacking — the latest of several such incidents.
The World Economic Forum report suggests that it will take some time for such problems to be worked out. In addition to the technology issues, the report says that the industry will have to work with governments to create standard rules and laws to govern transactions.

So that's OK then. They aren't going to rush into deploying new technology without understanding all the implications, or at least making sure that they aren't left holding the bag when something does go wrong.
Does anyone remember the last time the banks tried to replace old, shopworn record-keeping technology with spiffy new computerized systems? It was a company called MERS, a shell company owned by the banks. Replacing the paper system for recording mortgages with an electronic system saved the banks billions, led to rampant fraud by the banks, cost innocent people their homes, and enabled the derivatives that crashed the economy in 2008.
Let's look at some of the implications of global distributed ledger systems. The day before the nearly $70M theft from Bitfinex, Izabella Kaminska at the Financial Times, whose work in the area has been consistently and appropriately skeptical, posted Bitcoin’s panopticon problem, pointing out that because the average Bitcoin user needs intermediary services such as Coinbase:
the average customer needs to give up as much if not more personal data, often by more dubious means (online upload mechanisms, email or the post) to much less experienced organisations. Once in the system, meanwhile, customer transactions can be linked on a much broader and more publicly intrusive level than anything in the standing banking system. Moreover, there are no associated par value or liquidity guarantees for the customer if and when things go wrong.

So cryptocurrencies are mostly:
a giant privacy bait and switch. There are simply no money transmitter institutions of Coinbase’s size that can afford to operate in defiance of the law of the land, unless they care to be based in the sort of jurisdictions most other banking institutions won’t care to do business with.
Meanwhile, if the cost in the banking system is indeed mostly related to the cost of credit checking, due diligence and policing non-compliance, it’s worth considering how exactly the likes of Coinbase improve on these processes vis-a-vis traditional institutions?

Once the Bitfinex theft hit the headlines, Kaminska was off and running, first with Time to reevaluate blockchain hype:
The mark-to-market value of the stolen coins is roughly $70m, but again who can really tell their true worth. Bitcoin is an asset class where the liquidation of 119,756 (approximately 0.8 per cent of the total bitcoin circulation) can move the market more than 20 per cent, suggesting a certain fantastical element to the valuation.

and:
We probably won’t know what really happened at Bitfinex for a while. But what is clear is that thus far the technology which was supposed to be revolutionising finance and making it more secure (oddly, by skirting regulations) is looking awfully like the old technology which ran the system into the ground.
Either way it’s unlikely to be good news for Bitfinex. If the failing was down to a problem with the multi-signature mechanism, then the affair potentially stands to undermine many of the blockchain systems and companies which have come to rely on the system for security. On the same basis it also stands to undermine the side-chain and escrow-based solutions bitcoin developers are working on to overcome the bitcoin network’s scaling constraint.
If the failing was down to an internal security breach or poor risk management on the other hand (say due to naivety or inexperience), this creates an argument for additional capital provisioning, regulatory scrutiny and macroprudential oversight — taking away much of the cost advantage associated with the network.

Two days later Kaminska was back with Day three post Bitfinex hack: Bitcoin bailouts, liabilities and hard forks, among other interesting observations returning to the panopticon issue:
The first relates to the ongoing legal recourse rights of Bitfinex victims. Even though they may have lost their right to pursue Bitfinex for compensation, they are still going to be entitled to track the funds across the blockchain to seek recourse from whomsoever receives the bitcoins in their accounts. That’s good news for victims, but most likely very bad news for bitcoin’s fungible state and thus its status as a medium of exchange.
Just one successful claim by a victim who tracks his funds to an identifiable third party, and the precedent is set. Any exchanges dealing with bitcoin in a legitimate capacity would from then on be inclined to do much stronger due diligence on whether the bitcoins being deposited in their system were connected to ill-gotten gains. This in turn would open the door to the black-listing of funds that can not prove they were originated honestly via legitimate earnings.

This got Tim Worstall at Forbes going with Bitcoin's Latest Economic Problem - Market Ouvert Or Squatters' Rights.
Of course, people should not steal things. And yet for a currency to work it has to be possible to take the currency at its face value. Thus it may well be that the bank robber paid you for his beer with stolen money but you got it fair and square and thus the bank doesn’t get it back as and when they find out. Another way to put this is that the crime dies with the criminal. And yet the blockchain upends all of that. Because every transaction which any one bitcoin has been involved in is traceable.

Three days later, Kaminska returned with Bitfinex and a 36 percent charge from the school of life:
Publicly, the Hong Kong-based bitcoin exchange Bitfinex has lumped its users with a 36 per cent haircut on all balances to cover the $70m hack which it experienced last week.
The haircut applies to all customers irrespective of whether they were holding bitcoin balances or dollar balances or other altcoin balances. ...
Privately and anecdotally, however, customers are reporting some variance with regard to the way the haircut is being imposed. Some US customers, for example, who only had dollar balances are reporting they’ve been able to get all their money back.

Another three days and Kaminska posted How I learned to stop blockchain obsessing and love the Barry Manilow, a sustained analogy between the hype cycle of music and fashion, and the hype cycle of blockchain technology which argues:
there is some commentary emerging to suggest we are indeed in a phase transition and what’s cool isn’t the blockchain anymore but rather the defiant acknowledgement that the old operating system — for all its flaws — is built on the right regulatory, legal and trusted foundations after all and just needs some basic tweaking.and goes on to point to a number of very interesting such commentaries, starting with Credit Suisse:
The buzz surrounding blockchain is comparable to that surrounding the internet in the late 1980s – some go as far as to suggest that blockchain has the potential to reimagine and reinvent key institutions – for example, the corporation. We are less sanguine, and note eight key challenges that have the potential to limit the utility, and therefore reduce adoption, of blockchain systems.

Every one of the eight is apposite, especially:
8. A forked road, the lesson of the DAO attack… The DAO attack exposed flaws in smart contracts on Ethereum which should act as a reminder that nascent code is susceptible to bugs before it is truly tire-kicked, and even then, complete surety is never guaranteed. The ‘hard fork’ undertaken by the Ethereum community also shows that blockchains are only immutable when consensus wants them to be.

So in practice blockchains are decentralized (not), anonymous (not and not), immutable (not), secure (not), fast (not) and cheap (not). What's (not) to like?
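The "immutable only when consensus wants it to be" point is easy to see with a toy hash chain (a sketch of the general idea, not Bitcoin's actual block format): each block commits to its predecessor's hash, so quietly editing history breaks verification, but a majority that agrees to rebuild the chain from an earlier block can rewrite it anyway, which is exactly what a hard fork is.

```python
import hashlib

def block_hash(prev_hash: str, data: str) -> str:
    # Each block commits to its predecessor's hash plus its own payload.
    return hashlib.sha256((prev_hash + data).encode()).hexdigest()

def build_chain(transactions):
    chain = []
    prev = "0" * 64  # genesis
    for tx in transactions:
        h = block_hash(prev, tx)
        chain.append({"data": tx, "prev": prev, "hash": h})
        prev = h
    return chain

def verify(chain) -> bool:
    prev = "0" * 64
    for block in chain:
        if block["prev"] != prev or block_hash(prev, block["data"]) != block["hash"]:
            return False
        prev = block["hash"]
    return True

chain = build_chain(["alice->bob 5", "bob->carol 2"])
assert verify(chain)

# Tampering with an early block invalidates everything after it...
chain[0]["data"] = "alice->mallory 5"
assert not verify(chain)

# ...unless the network's majority simply rebuilds the chain from that
# point onward — a hard fork. Immutability is a social convention.
```

The cryptography only guarantees that tampering is *detectable*; whether a rewritten chain is accepted is a decision made by the people running the nodes.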
I have trouble pinpointing the exact moment when Evergreen was conceived, even though I was one of the principal agents. Let me start by covering a few acronyms: PINES is the name of a statewide inter-lending library consortium project in Georgia, one of the largest of its kind. GPLS is short for the Georgia Public Library Service, a state agency that administers PINES and many other library projects. In 2002, I was hired by GPLS for PINES as a contractor to develop an add-on reporting system to address the limitations of their then library automation system. With the success of that project, I was hired on as a full-time employee in 2003 to maintain and further develop that system, as well as create other needed software solutions to prop up their existing system. It was during that time that we lobbied for and eventually received the go-ahead to develop Evergreen, and in 2004 we hired Mike Rylander and Bill Erickson to help develop the software. In 2006, PINES went live on Evergreen and the rest is history.
But there’s a part of the story that doesn’t get told often enough, and that’s the influence of the free/libre and open-source software movements, with the likes of Richard M. Stallman, who wrote the GNU General Public License and started the Free Software Foundation; Eric S. Raymond, who wrote The Cathedral and the Bazaar; Larry Wall, the creator of the Perl programming language; and Linus Torvalds, the creator of the Linux kernel and, more recently, the Git version control system. I was (and am) a huge open source and free software advocate; I cut my teeth on Linux during college and followed the battles between open source and proprietary software very closely. There were huge forces arrayed against us: Microsoft was abusing their monopoly power, preventing OEMs from installing Linux while calling open source a cancer; the SCO lawsuit was happening in 2003; and most governments and governmental agencies were very skeptical of open source software, GPLS included. It’s funny how some of those same battles were later mirrored in our efforts.
There are some very philosophical reasons why open source software meshes well with libraries (software developers even collect code into “libraries”), and while we did use those as arguments in our appeal for Evergreen, it was really the pragmatic aspects that made it all possible.
1) The building blocks in the software world were a lot bigger than they used to be (and this trend continues to be true), and increasingly open source themselves. We didn’t have to constantly reinvent the wheel, and could use software like GCC, Linux, Apache, PostgreSQL, Ejabberd, Mozilla, CVS (and then Subversion, Bazaar, and Git), MARC::Record, Simple2ZOOM, etc. We could be informed by and share code with other open source efforts like Koha.
2) Each of these open source applications had (and continues to have) development communities and ecosystems that we could participate in (including the wider open source community as a whole). We could (and do) leverage volunteers and domain experts who just want to help out. Or pay people if we needed to (we did that too, for example, with some enhancements to PostgreSQL).
And all of this not starting from scratch actually allowed us to start from scratch, with more modern design paradigms. For example, we made real use (not mere buzzword compliance) of relational databases and a service-oriented architecture.
Most importantly, we were already using these things prior to Evergreen in our daily work, and demonstrated what just a single developer could do with modern open source tools and software. Now, with almost a dozen active committers and many more contributors of domain expertise, documentation, testing, etc.–in other words, a community of our own–we’re pretty much unstoppable. Happy Birthday Evergreen!
— Jason Etheridge, Community and Migration Manager
Austin, TX: Elections were held in July to choose members who will serve on DuraSpace community-supported open source project steering groups for three-year terms. The DSpace, Fedora and VIVO Projects are pleased to announce the following results and extend a warm welcome to new Steering Group members.
DSpace Project leaders look forward to the efforts of the following new Steering Group members in helping to guide the project.
"By reducing the complexity of natural language to manageable sets of nouns and verbs that are well-defined and unambiguous, sentence-like statements can be interpreted...."
The "he" in the first quote is John Wilkins, and the date is 1668. His goal was to create a scientifically correct language that would have one and only one term for each thing, and then would have a set of particles that would connect those things to make meaning. His one and only one term is essentially an identifier. His particles are linking elements.
The second quote is from a publication about OCLC's linked data experiments, and is about linked data, or RDF. The goals are so obviously similar that the parallel can't be overlooked. Of course there are huge differences, not the least of which is the technology of the time.*
What I find particularly interesting about Wilkins is that he did not distinguish between classification of knowledge and language. In fact, he was creating a language, a vocabulary, that would be used to talk about the world as classified knowledge. Here we are at a distance of about 350 years, and the language basis of both his work and the abstract grammar of the semantic web share a lot of their DNA. They are probably proof of some Chomskian theory of our brain and language, but I'm really not up to reading Chomsky at this point.
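Wilkins's scheme maps surprisingly directly onto RDF's model: his "one and only one term" per thing is the identifier (in RDF, a URI), and his connecting particles are the linking predicates. A minimal sketch of that abstract grammar, using plain Python tuples rather than an RDF library, with hypothetical URIs invented for illustration:

```python
# Each thing gets exactly one identifier (Wilkins's "one and only one term");
# predicates play the role of his connecting "particles".
# All URIs below are hypothetical, for illustration only.
AUTHOR = "http://example.org/id/person/john-wilkins"
WORK = "http://example.org/id/work/real-character-1668"

triples = [
    (AUTHOR, "http://example.org/prop/name", "John Wilkins"),
    (WORK, "http://example.org/prop/title",
     "Essay Towards a Real Character, and a Philosophical Language"),
    (WORK, "http://example.org/prop/creator", AUTHOR),  # a linking "particle"
]

def describe(subject, triples):
    """Collect every statement made about one identifier."""
    return {p: o for s, p, o in triples if s == subject}

print(describe(WORK, triples))
```

Because the identifiers are unambiguous, any two statement sets that mention the same URI can be merged mechanically, which is the "sentence-like statements can be interpreted" promise of the first quote.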
The other interesting note is how similar Wilkins is to Melvil Dewey. He wanted to reform language and spelling. Here's the section where he decries alphabetization because the consonants and vowels are "promiscuously huddled together without distinction." This was a fault of language that I have not yet found noted in Dewey's work. Could he have missed some imperfection?!
*Also, Wilkins was a Bishop in the Anglican church, and so his description of the history of language is based literally on the Bible, which makes for some odd conclusions.
Schulte-Albert, Hans G. Classificatory Thinking from Kinner to Wilkins: Classification and Thesaurus Construction, 1645-1668. Quoting from Vickery, B. C. "The Significance of John Wilkins in the History of Bibliographical Classification." Libri 2 (1953): 326-43.
Godby, Carol J., Shenghui Wang, and Jeffrey Mixter. Library Linked Data in the Cloud: OCLC's Experiments with New Models of Resource Description, 2015.
Wilkins, John. Essay Towards a Real Character, and a Philosophical Language. S.l.: Printed for Sa. Gellibrand, and for John Martyn, 1668.
As we count down to the annual Lucene/Solr Revolution conference in Boston this October, we’re highlighting talks and sessions from past conferences. Today, we’re highlighting Bloomberg’s Michael Nilsson and Diego Ceccarelli’s talk, “Learning to Rank in Solr”.
In information retrieval systems, learning to rank is used to re-rank the top X retrieved documents using trained machine learning models. The hope is that sophisticated models can make more nuanced ranking decisions than a standard Solr query. Bloomberg has integrated a reranking component directly into Solr, enabling others to easily build their own learning to rank systems and access the rich matching features readily available in Solr. In this session, Michael and Diego review the internals of how Solr and Lucene score documents and present Bloomberg’s additions to Solr that enable feature engineering, feature extraction, and reranking.
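The two-phase setup described above — a cheap first-pass retrieval, then a learned model re-ranking only the top X hits — can be sketched like this. The toy linear model, its feature names, and its weights are invented for illustration; they stand in for a trained model, not for Bloomberg's actual Solr plugin:

```python
# Toy two-phase ranking: a base retrieval score selects candidates,
# then a "learned" model re-ranks only the top-X of them using richer
# features. Weights and feature values here are invented.

docs = [
    {"id": "d1", "base_score": 9.1, "features": {"title_match": 1.0, "recency": 0.2}},
    {"id": "d2", "base_score": 8.7, "features": {"title_match": 0.0, "recency": 0.9}},
    {"id": "d3", "base_score": 8.5, "features": {"title_match": 1.0, "recency": 0.8}},
    {"id": "d4", "base_score": 3.0, "features": {"title_match": 1.0, "recency": 1.0}},
]

WEIGHTS = {"title_match": 2.0, "recency": 1.5}  # stand-in for a trained model

def model_score(doc):
    return sum(WEIGHTS[f] * v for f, v in doc["features"].items())

def rerank(docs, top_x=3):
    # Phase 1: take the top-X documents by the engine's base score.
    ranked = sorted(docs, key=lambda d: d["base_score"], reverse=True)
    head, tail = ranked[:top_x], ranked[top_x:]
    # Phase 2: re-order only that head with the learned model.
    return sorted(head, key=model_score, reverse=True) + tail

print([d["id"] for d in rerank(docs)])  # → ['d3', 'd1', 'd2', 'd4']
```

Note that d4 keeps its low position despite strong features: the expensive model never sees documents outside the top X, which is what makes reranking affordable at query time.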
Michael Nilsson is a software engineer working at Bloomberg LP, and has been a part of the company’s Search and Discoverability team for four years. He’s used Solr to build the company’s terminal cross-domain search application, searching through millions of people, companies, securities, articles, and more.
Diego Ceccarelli is a software engineer at Bloomberg LP, working in the News R&D team. His work focuses on improving search relevance in the news search functions. Before joining Bloomberg, Diego was a researcher in Information Retrieval at the National Council of Research in Italy, whilst completing his Ph.D. in the same field at the University of Pisa. His experience with Lucene and Solr dates back to his work on the Europeana project in 2010, and since then he has enjoyed diving into these technologies.
Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bloomberg LP from Lucidworks
Join us at Lucene/Solr Revolution 2016, the biggest open source conference dedicated to Apache Lucene/Solr on October 11-14, 2016 in Boston, Massachusetts. Come meet and network with the thought leaders building and deploying Lucene/Solr open source search technology. Full details and registration…
New vacancy listings are posted weekly on Wednesday at approximately 12 noon Central Time. They appear under New This Week and under the appropriate regional listing. Postings remain on the LITA Job Site for a minimum of four weeks.
New This Week
Visit the LITA Job Site for more available jobs and for information on submitting a job posting.
Ten years ago, something remarkable happened. A brand new open source ILS went live in over 200 libraries across the state of Georgia. While the migration happened in a matter of days, it was the culmination of two years’ worth of work by a small team.
Today, that same open source ILS is utilized by more than 1,500 libraries all over the United States, Canada, and across the world. The small team has grown into an active community, supporting and improving the software each year. That software is Evergreen and Equinox is beyond proud to be the leading provider of support and development services for it.
As we approach Evergreen’s tenth birthday–Labor Day weekend–we’ll look at each year of Evergreen’s life. Equinox Team Members will be posting a blog post each day leading up to Labor Day, beginning on Thursday, August 18 (That’s tomorrow! Yay!). Join us as we take a closer look at the software that has brought so many people together.
Open Knowledge Foundation: An interview with Rufus Pollock – Why I am Excited about MyData 2016 in Finland
A few weeks ago I sat down for a virtual interview with Molly Schwartz from Open Knowledge Finland about my thoughts on open data and mydata and why I am so excited about the MyData 2016 conference. The three-day conference is taking place from August 31 to September 2 in Helsinki and is being organized by Open Knowledge Finland in partnership with Aalto University and Fing.
You can register for MyData 2016 here. The discounted price for members of the Open Knowledge Network is just 220 EUR for the three-day conference. Ask for the discount code from (email@example.com) before registering at the MyData 2016 Holvi store. You can also still apply to be a volunteer for the conference.
This event shares many of the same organizers as the 2012 Open Knowledge Festival in Helsinki so you can expect the same spirit of fun, creativity and quality that made that such an incredible experience.

Transcript:
Molly Schwartz: So hi everybody, this is Molly Schwartz here, one of the team members helping to put on the MyData conference in Helsinki from August 31 to September 2. And I’m sitting here with one of our plenary speakers, Dr. Rufus Pollock, who is one of the founders and the president of Open Knowledge, a worldwide network working to provide access to more open and broad datasets. So we’re very excited to have him here. So, Rufus, something that not a lot of people know is that MyData is actually an initiative that was born out of the Finnish chapter of Open Knowledge (OKFFI). How do you feel about things that were kind of started by your idea springing up of their own accord?
Rufus Pollock: Well, it’s inspirational and obviously really satisfying. And not just in a personal way: it’s just wonderful to see how things flourish. Open Knowledge Finland have been an incredibly active chapter. I first went to Finland in I think it was 2010, and I was inspired then. Finland is just a place where you have a feeling you are in a wise society. The way they approach things, they’re very engaged but they have that non-attachment, a rigor of looking at things, and also trying things out. Somehow there’s not a lot of ego, people are very curious to learn and also to try things out, and I think deep down are incredibly innovative.
And I think this event is really in that tradition. I think the area of personal data and MyData is a huge issue, and one with a lot of connections to open data, even if it’s distinct. So I think it’s a very natural thing for a chapter from Open Knowledge to be taking on and looking at because it’s central to how we look at the information society, the knowledge society, of the 21st Century.
MS: Definitely. I totally agree. I like that you brought up that this concept of personal data is somewhat distinct, but it’s inevitably tied to this concept of opening data. Oftentimes opening datasets, you’re dealing with personal datasets as well. So, what are the kind of things you’re planning to speak about, loosely, at the conference, and what do you look forward to hearing from other people who will be at the MyData?
RP: Yes, that’s a great question. So, what am I looking to talk about and engage with and what am I looking forward to hearing about? Well, maybe I’ll take the second first.
What I am looking forward to
I think one of the reasons I’m really excited to participate and come is it’s the area where – even though I obviously know a lot about data and open data – this area of personal data is one where I am not as much an expert – by a long way. So I’m really curious to hear about it and especially about things like: what is the policy landscape? What do people think are the big things that are coming up? I’m really interested to see what the business sector is looking at.
There’s been quite a lot of discussion about how one could innovate in this space in a way that is both a wider opportunity for people to use data, personal data in usable ways, maybe in health care, maybe in giving people credit, I mean in all kinds of areas. But how do you do that in a way that respects and preserves people’s privacy, and so on. So, I think that’s really interesting as well, and again I’m not so up on that space. I’m looking forward to meeting and hearing from some of the people in that area.
And similarly on the policy, on the business side, and also on the civil society side and on the research side. I’ve heard about things like differential privacy and some of the breakthroughs we’ve had over the last years about how one might be able to allow people like researchers to analyse information, like genetics, like healthcare without getting direct access to the individual data and creating privacy issues. And there’s clearly a lot of value one could have from researchers being able to look at, for example, at genomic data from individuals across a bunch of them. But it’s also crucial to be able to preserve privacy there, and what are the kind of things going on there? And the research side I think would also touch on the policy side of matters as well.
What I would like to contribute
That brings me to what, for my part, I would like to contribute. I think Open Knowledge and we generally are on a journey at a policy level. We’ve got this incredible information revolution, this digital revolution, which means we’re living in a world of bits, and we need to make sure that world works for everyone. And that it works, in the sense that, rather than delivering more inequality – which it could easily do – and more exploitation, it gives us fairness and empowerment, and brings freedom rather than manipulation or oppression. And I think openness is just key there.
And this vision of openness isn’t limited to just government – we can do it for all public datasets. By public datasets I don’t just mean government datasets, I mean datasets that you can legitimately share with anyone.
Now private, personal data you can’t legitimately give to anyone, or share with anyone — or you shouldn’t be able to!
So I think an interesting question is how those two things go together — the public datasets and the private, personal data. How they go together both in overall policy, but also in the mind of the public and of citizens and so on — how are they linked?
And this issue of how we manage information in the 21st century doesn’t just stop at some line where it’s like, oh, it’s public data, you know, and therefore we can look at it this way. Those of us working to make a world of open information have to look at private data too.
At Open Knowledge we have always had this metaphor of a coin. And one side of this coin is public data, e.g. government data. Now that you can open to everyone, everyone is empowered to have access. Now the flip side of that coin is YOUR data, your personal data. And your data is yours: you should get to choose how it’s shared and how it’s used.
Now while Open Knowledge is generally focused on, if you like, the public side, and will continue to be, overall I think across the network this issue of personal data is just huge, it has this huge linkage. And I think the same principles can be applied. Just as for open data we say that people have the freedom to access, share, and use whatever data is being opened, so with YOUR data, YOU should be empowered to access, share, and use it, as YOU see fit. And right now that is just not the case. And that’s what leads to the abuses we get concerned about, but it’s also what stops some of the innovation and stops people from being empowered and able to understand and take action on their own lives — what might you learn from having, say, your last five years of shopping receipts or mobile phone location data?
Ultimately what happens to public data and what happens to personal data, they’re interconnected, both in people’s minds and, in a sense, they don’t just care about one thing or another, they care about, how is digital information going to work, how’s my data going to be managed, how’s the world’s data going to be managed.
I also think MyData raises some of the most relevant issues for ordinary people. For example, just recently I had to check if someone paid me and it was just a nightmare. I had to scroll back through endless screens on my online banking account to find ways to download different files to piece it all together. Why didn’t they let me download all the data in a convenient way, rather than having to dig forever and then only get the last three months? They’ve got that data on their servers, why can’t I have it? And, you know, maybe not only do I want it, but maybe there’s some part I would share anonymized; it could be aggregated and we could discover patterns that might be important — just as one example, we might be able to estimate inflation better. Or take energy use: I would happily share my house’s energy use data with people, even if it does tell you when I go to bed, if that lets us discover how to make things environmentally better.
The word I think at the heart of it is empowerment. We at Open Knowledge want to see people empowered in the information age to understand, to make choices, to hold power to account, and one of the fundamental things is you being empowered with the information about you that companies or governments have, and we think you should be given access to that, and you should be choosing who else has access to it, and not the company, and not the government, per se.
MS: Yes. And that’s exactly the principle behind why MyData came out of Open Knowledge, as you mentioned earlier: why can’t these principles of Open Knowledge, of the datasets we want to be receiving, also apply to our own data, which we would like to be open in the same way, back to us?
RP: Absolutely correct Molly, I mean just yes, absolutely.
MS: And that’s why it’s also so interesting, so many people have been talking about this kind of inherent tension between openness and privacy, and kind of, changing how we’re thinking about that, and seeing it actually as the same principles just being applied to individual people.
RP: Exactly, back in 2013 I wrote a post with my co-CEO Laura James about this idea and even used the term MyData. There’s an underlying unity that you’re pointing out that actually is a deep principle.
Remember openness isn’t an end in itself, right, it’s a means to an end – like money! And the purpose of having information and opening it up is to empower human beings to do something, to understand, to innovate, to learn, to discover, to earn a living, whatever it is. And that idea of empowerment, fundamentally, is common in both threads, both to MyData and personal data and access to that, and the access to public data for everyone. So I think you are totally right.
MS: Yes. So, thank you so much Rufus for joining us today, we are so looking forward to having you at the conference. You mention that you’ve been to Finland before. How long ago was that?
RP: I was there in 2012 for Open Knowledge Festival which was amazing. And then in 2010. Finland is an amazing place, Helsinki is an amazing place, and it will be an amazing event, so I really invite you to come along to the conference.
MS: I second that, and it’s many of the same people who are involved in organizing the Open Knowledge Festival who are involved in organizing MyData, so we can expect much of the same.
RP: A brilliant programme, high quality people. An incredible kind of combination of kind of joy and reliability, so you’ll have an amazing time, come join us.
MS: Yes. Ok, so thank you Rufus, and we will see you in August!
RP: See you in August!
Security issues happen for two reasons –
1. Developers who have just started and cannot really tell the difference between using MD5 and bcrypt.
2. Developers who know this stuff but forget or ignore it.
Our detailed explanations should help the first type, while we hope our checklist helps the second create more secure systems. This is by no means a comprehensive guide; it just covers stuff based on the most common issues we have discovered in the past.
Their security checklist, I think, demonstrates just how involved web security can be: first, no wonder so many mega-sites have been hacked in the last year; and second, libraries probably aren’t ready for anticipatory design.
You’ll find this list useful if:
* You’re a writer working directly with an interface
* You’re a designer who is often tasked with writing user interface copy
* You’re a content strategist working on a product and want to learn more about the words used in an interface
* You’re a copywriter and want to learn more about user experience
We — Amanda and Michael — were honored to guest-host an episode of Circulating Ideas, interviewing Cecily Walker about design thinking and project management. Steve Thomas was nice enough to let us re-broadcast our interview.
Cecily Walker is a librarian at Vancouver Public Library, where she focuses on user experience, community digital projects, digital collections, and the intersection of social justice, technology, and public librarianship. It was her frustration with the way that software was designed to meet the needs of highly technical users rather than the general public that led her to user experience, but it was her love of information, intellectual freedom, and commitment to social justice that led her back to librarianship. Cecily can be found on Twitter (@skeskali) where she frequently holds court on any number of subjects, but especially lipstick.

Show notes
I saw this at the bookstore today and bought it immediately: The Bad-Ass Librarians of Timbuktu and Their Race to Save the World’s Most Precious Manuscripts, by Joshua Hammer.
I’ll try to do a review when I’ve read it, but in the meantime, anything about bad-ass librarians needs to be shared with all the other bad-ass librarians out there.
I was in library school in 1971-72, and then again in 1978-80. In 1971 I took the required classes of cataloging (two semesters), reference, children's librarianship, library management, and an elective in law librarianship. Those are the ones I remember. There was not a computer in the place, nor do I remember anyone mentioning them in relation to libraries. I was interested in classification theory, but not much was happening around that topic in the US. In England, the Classification Research Group was very active, with folks like D.J. Foskett and Brian Vickery as mainstays of thinking about faceted classification. I wrote my first published article about a faceted classification being used by a UN agency.
In 1978 the same school had only a few traditional classes. I'd been out of the country, so the change to me was abrupt. Students learned to catalog on OCLC. (We had typed cards!) I was hired as a TA to teach people how to use DIALOG for article searching, even though I'd never seen it used, myself. (I'd already had a job as a computer programmer, so it was easy to learn the rules of DIALOG searching.) The school was now teaching "information science". Here's what that consisted of at the time: research into term frequency of texts; recall and precision; relevance ranking; database development.
I didn't appreciate it at the time, but the school had some of the bigger names in these areas, including William Cooper and M. E. "Bill" Maron. (I only just today discovered why he called himself Bill - the M. E., which is what he wrote under in academia, stands for "Melvin Earl". Even for a nerdy computer scientist, that was too much nerdity.) 1978 was still the early days of computing, at least unless you were on a military project grant or worked for the US Census Bureau. The University of California, Berkeley, did not have visible Internet access. Access to OCLC or DIALOG was via dial-up to their proprietary networks. (I hope someone has or will write that early history of the OCLC network. For its time it must have been amazing.)
The idea that one could search actual text was exciting, but how best to do it was (and still is, to a large extent) unclear. There was one paper, although I so far have not found it, that was about relevance ranking, and was filled with mathematical formulas for calculating relevance. I was determined to understand it, and so I spent countless hours on that paper with a cheat sheet beside me so I could remember what uppercase italic R was as opposed to lower case script r. I made it through the paper to the very end, where the last paragraph read (as I recall): "Of course, there is no way to obtain a value for R[elevance], so this theory cannot be tested." I could have strangled the author (one of my profs) with my bare hands.
Looking at the articles, now, though, I see that they were prescient; or at least that they were working on the beginnings of things we now take for granted. One statement by Maron especially strikes me today:
A second objective of this paper is to show that about is, in fact, not the central concept in a theory of document retrieval. A document retrieval system ought to provide a ranked output (in response to a search query) not according to the degree that they are about the topic sought by the inquiring patron, but rather according to the probability that they will satisfy that person’s information need. This paper shows how aboutness is related to probability of satisfaction.

This is from 1977, and it essentially describes the basic theory behind Google ranking. It doesn't anticipate hyperlinking, of course, but it does anticipate that "about" is not the main measure of what will satisfy a searcher's need. Classification, in the traditional sense, is the quintessence of about. Is this the crux of the issue? As yet, I don't know. More to come.
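Maron's distinction can be made concrete with a toy illustration. The documents and "satisfaction" probabilities below are invented stand-ins for the feedback data a real system would collect; the point is only that ranking by term overlap ("aboutness") and ranking by estimated probability of satisfying the user can put different documents first.

```python
# Toy contrast between "aboutness" ranking (term overlap with the query)
# and Maron-style ranking by estimated probability of satisfaction.
# The documents and p_satisfy values are invented for illustration.
def aboutness_score(query_terms, doc_terms):
    # fraction of query terms the document mentions
    return len(set(query_terms) & set(doc_terms)) / len(set(query_terms))

docs = {
    "survey":   {"terms": {"library", "catalog", "history"}, "p_satisfy": 0.2},
    "tutorial": {"terms": {"library", "catalog"},            "p_satisfy": 0.9},
}

query = ["library", "catalog", "history"]

by_aboutness = sorted(
    docs, key=lambda d: aboutness_score(query, docs[d]["terms"]), reverse=True)
by_satisfaction = sorted(
    docs, key=lambda d: docs[d]["p_satisfy"], reverse=True)

assert by_aboutness[0] == "survey"        # most "about" the query terms
assert by_satisfaction[0] == "tutorial"   # most likely to satisfy the user
```

The survey matches every query term, so it wins on aboutness; yet if feedback data says readers are far more often satisfied by the tutorial, a Maron-style ranker puts the tutorial first — which is essentially what modern search engines do.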
Coyle, Karen (1975). "A Faceted Classification for Occupational Safety and Health". Special Libraries 66 (5-6): 256–259.
Maron, M. E. (1977). "On Indexing, Retrieval, and the Meaning of About". Journal of the American Society for Information Science, January 1977, pp. 38–43.
From Mike Conlon, VIVO project director
Wikipedia runs on a set of policies that are used to help make decisions about content and to govern behavior. In a sense, this is already a very male approach, as we know from studies of boys and girls at play: boys like a sturdy set of rules, and will spend considerable time arguing whether or not rules are being followed; girls begin play without establishing a set of rules, develop agreed rules as play goes on if needed, but spend little time on discussion of rules.
If you've been on Wikipedia and have read discussions around various articles, you know that there are members of the community that like to "wiki-lawyer" - who will spend hours arguing whether something is or is not within the rules. Clearly, coming to a conclusion is not what matters; this is blunt force, nearly content-less arguing. It eats up hours of time, and yet that is how some folks choose to spend their time. There are huge screaming fights that have virtually no real meaning; it's a kind of fantasy sport.
Wiki-lawyering is frequently used to harass. It is currently going on to an amazing extent in harassment of me, although since I'm not participating, it's even emptier. The trigger was that I sent back for editing two articles about men that two wikipedians thought should not have been sent back. Given that I have reviewed nearly 4000 articles, sending back 75% of those for more work, these two are obviously not significant. What is significant, of course, is that a woman has looked at an article about a man and said: "this doesn't cut it". And that is the crux of the matter, although the only person to see that is me. It is all being discussed as violations of policy, although there are none. But sexism, as with racism, homophobia, transphobia, etc., is almost never direct (and even when it is, it is often denied). Regulating what bathrooms a person can use, or denying same sex couples marriage, is a kind of lawyering around what the real problem is. The haters don't say "I hate transsexuals"; they just try to make them as miserable as possible by denying them basic comforts. In the past, and even the present, no one said "I don't want to hire women because I consider them inferior"; they said "I can't hire women because they just get pregnant and leave."
Because wiki-lawyering is allowed, this kind of harassment is allowed. It's now gone on for two days and the level of discourse has gotten increasingly hysterical. Other than one statement in which I said I would not engage because the issue is not policy but sexism (which no one can engage with), it has all been between the wiki-lawyers, who are working up to a lynch mob. This is gamer-gate, in action, on Wikipedia.
It's too bad. I had hopes for Wikipedia. I may have to leave. But that means one less woman editing, and we were starting to gain some ground.
The best read on this topic, mainly about how hard it is to get information that is threatening to men (aka about women) into Wikipedia: WP:THREATENING2MEN: Misogynist Infopolitics and the Hegemony of the Asshole Consensus on English Wikipedia
I have left Wikipedia, and I also had to delete my Twitter account because they started up there. I may not be very responsive on other media for a while. Thanks to everyone who has shown support, but if by any chance you come across a kinder, gentler planet available for habitation, do let me know. This one's desirability quotient is dropping fast.
As we count down to the annual Lucene/Solr Revolution conference in Boston this October, we’re highlighting talks and sessions from past conferences. Today, we’re highlighting Gregg Donovan’s session, “Lessons from Sharding Solr at Etsy”.
Gregg covers the following lessons learned at Etsy while sharding Solr: How to enable SolrJ to handle distributed search fanout and merge; How to instrument Solr for distributed tracing so that distributed searches may be better understood, analyzed, and debugged; Strategies for managing latency in distributed search, including tolerating partial results and issuing backup requests in the presence of lagging shards.
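The "backup requests" strategy mentioned above can be sketched in a few lines. This is not Etsy's or SolrJ's actual code — just a generic illustration of the pattern, with stand-in functions in place of real HTTP calls to shard replicas: if the first replica hasn't answered within a short hedging delay, a duplicate request goes to a second replica, and whichever answers first wins. This caps tail latency caused by one lagging shard.

```python
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED
import time

# Hypothetical stand-ins for HTTP requests to two replicas of a shard.
def slow_replica():
    time.sleep(0.2)          # simulates a lagging shard
    return "results"

def fast_replica():
    return "results"

def hedged_query(primary, backup, backup_after=0.05):
    """Send the query to `primary`; if it has not answered within
    `backup_after` seconds, also send a backup request to `backup`,
    then return whichever response arrives first."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(primary)]
        done, _ = wait(futures, timeout=backup_after)
        if not done:
            futures.append(pool.submit(backup))  # hedge: backup request
        done, _ = wait(futures, return_when=FIRST_COMPLETED)
        return next(iter(done)).result()

assert hedged_query(slow_replica, fast_replica) == "results"
```

The trade-off, as in any hedging scheme, is extra load: every request that crosses the delay threshold is issued twice, so the threshold is usually set near a high percentile of normal response time.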
Gregg Donovan is a Senior Software Engineer at Etsy.com in Brooklyn, NY, working on the Solr and Lucene infrastructure that powers more than 120 million queries per day. Gregg spoke at Lucene/Solr Revolution 2015 in Austin, Lucene Revolution 2011 in San Francisco, Lucene Revolution 2013 in San Diego, and previously worked with Solr and Lucene at TheLadders.com.
Lessons From Sharding Solr At Etsy: Presented by Gregg Donovan, Etsy from Lucidworks
Join us at Lucene/Solr Revolution 2016, the biggest open source conference dedicated to Apache Lucene/Solr on October 11-14, 2016 in Boston, Massachusetts. Come meet and network with the thought leaders building and deploying Lucene/Solr open source search technology. Full details and registration…
In The Future of the Library: From Electric Media to Digital Media by Robert K. Logan and Marshall McLuhan, you can find this passage from Chapter 9: The Compact Library and Human Scale:
As an undergraduate at the University of Cambridge, I (McLuhan) encountered a library in the English Department that had immense advantages. I never have seen one like it since. It consisted of no more than 1,500 or 2,000 books. These books, however, were chosen from many fields of history and aesthetics, philosophy, anthropology, mathematics, and the sciences in general. The one criterion, which determined the presence of any book in this collection, was the immediate and top relevance for twentieth-century awareness. The shelf-browser could tell at a glance exactly which poets, novelists, critics, painters, and which of their individual writings were indispensable for knowing “where it’s at….”
… The library of which I spoke existed in a corner of the English Faculty Library at Cambridge, but it enabled hundreds of students to share all the relevant poets, painters, critics, musicians, and scientists of that time as a basis for an ongoing dialog. Would it not be possible to have similar libraries created by other departments in the university? Could not the History Department indicate those areas of anthropology and sociology that were indispensable to the most advanced historical studies of the hour? Could not the Department of Philosophy pool its awareness of many fields in order to create a composite image of all the relevant speculation and discovery of our time? Only now have I begun to realize that this unique library represented the meeting of both a written and oral tradition at an ancient university. It is this figure-ground pattern of the written and the oral that completes the meaning of the book and the library.
McLuhan isn’t the first scholar to recognize that something feels fundamentally different about a library collection of material selected by librarians and a working collection of material selected by practitioners. While the ideal academic library is close at hand and contains a vast amount of material relevant to one’s interests, the ideal working library is compact and at ‘human-scale.’
It is as if there are two kinds of power at hand.
Patrick Wilson’s Two Kinds of Power, published in 1968, and introduced in chapter 1, is a book that is often mentioned in library literature but whose message does not seem to have disseminated through library and cataloging thinking. If it had, our catalogs today might have a very different character. A professor of Library Science at the University of California at Berkeley, Wilson’s background was in philosophy, and his book took a distinctly philosophical approach to the question he posed, which most likely limited its effect on the practical world of librarianship. Because he approached his argument from all points of view, argued for and against, and did not derive any conclusions that could be implemented, there would need to be a rather long road from Wilson’s philosophy to actual cataloging code.
Wilson takes up the question of the goals of what he calls “bibliography,” albeit applied to the bibliographical function of the library catalog. The message in the book, as I read it, is fairly straightforward once all of Wilson’s points and counterpoints are contemplated. He begins by stating something that seems obvious but is also generally missing from cataloging theory, which is that people read for a purpose, and that they come to the library looking for the best text (Wilson limits his argument to texts) for their purpose. This user need was not included in Cutter’s description of the catalog as an “efficient instrument.” By Wilson’s definition, Cutter (and the international principles that followed) dealt only with one catalog function: “bibliographic control.” Wilson suggests that in fact there are two such functions, which he calls “powers”: the first is the evaluatively neutral description of books, which was first defined by Cutter and is the role of descriptive cataloging, called “bibliographic control”; the second is the appraisal of texts, which facilitates the exploitation of the texts by the reader. This has traditionally been limited to the realm of scholarly bibliography or of “recommender” services.
This definition pits the library catalog against the tradition of bibliography, the latter being an analysis of the resources on a topic, organized in terms of the potential exploitation of the text: general works, foundational works, or works organized by school of thought. These address what he sees as the user’s goal, which is “the ability to make the best use of a body of writings.” The second power is, in Wilson’s view, the superior capability. He describes descriptive control somewhat sarcastically as “an ability to line up a population of writings in any arbitrary order, and make the population march to one’s command” (Wilson 1968).
Karen goes on to write…
If one accepts Wilson’s statement that users wish to find the text that best suits their need, it would be hard to argue that libraries should not be trying to present the best texts to users. This, however, goes counter to the stated goal of the library catalog as that of bibliographic control, and when the topic of “best” is broached, one finds an element of neutrality fundamentalism that pervades some library thinking. This is of course irreconcilable with the fact that some of these same institutions pride themselves on their “readers’ services” that help readers find exactly the right book for them. The popularity of the readers’ advisory books of Nancy Pearl and social networks like Goodreads, where users share their evaluations of texts, show that there is a great interest on the part of library users and other readers to be pointed to “good books.” How users or reference librarians are supposed to identify the right books for them in a catalog that treats all resources neutrally is not addressed by cataloging theory.
I’m going to copy and paste that last sentence again for re-emphasis:
How users or reference librarians are supposed to identify the right books for them in a catalog that treats all resources neutrally is not addressed by cataloging theory.
As you can probably tell from my more recent posts and readings, I’ve been delving deeper into the relationship between libraries and readers. To explain why this is necessary, I’ll end with another quotation from McLuhan:
The content of a library, paradoxically is not its books but its users, as a recent study of the use of campus libraries by university faculty revealed. It was found that the dominant criterion for selection of a library was the geographical proximity of the library to the professor’s office. The depth of the collection in the researcher’s field was not as important a criterion as convenience (Dougherty & Blomquist, 1971, pp. 64-65). The researcher was able to convert the nearest library into a research facility that met his needs. In other words, the content of this conveniently located facility was its user. Any library can be converted from the facility it was designed to be, into the facility the user wishes it to become. A library designed for research can be used for entertainment, and vice-versa. As we move into greater use of electronic media, the user of the library will change even more. As the user changes, so will the library’s content or the use to which the content of the library will be subjected. In other words, as the ground in which the library exists changes, so will the figure of the library. The nineteenth-century notion of the library storing basically twentieth-century material will have to cope with the needs of twenty-first century users.
This is the third part of a series called The Hashtag Syllabus. Part One is a brief examination of the recent phenomenon of generating and capturing crowdsourced syllabi on Twitter, and Part Two is a technical description of how to use Zotero to collect and re-use bibliographies online.
U.S. libraries—120,000 strong—represent a robust national infrastructure for advancing economic and educational opportunity for all. From pre-K early learning to computer coding to advanced research, our nation’s libraries develop and deliver impactful programs and services that meet community needs and advance national policy goals.
This message is one that our Washington Office staff bring to federal policymakers and legislators every day, and we know it’s one that library directors and trustees also are hitting home in communities across the country. With Library Card Sign-up Month almost upon us, a new series of short videos (1-2 minutes) can help make the case for libraries, including one featuring school principal Gwen Abraham highlighting the important role of public libraries in supporting education. “Keep the innovation coming. Our kids benefit from it, this will affect their futures, and this is really what we need to make sure our kids are prepared with 21st century skills.”
As the nation considers our vision for the future this election year and begins to plot actionable steps to achieve that vision, we offer The E’s of Libraries® as part of the solution. Education, Employment, Entrepreneurship, Empowerment and Engagement are hallmarks of America’s libraries—but may not be as obvious to decision makers, influencers, and potential partners.
“Cleveland Public Library, like many of our colleagues, is using video more and more to share our services with more people in an increasingly visual world,” said Public Library Association (PLA) President Felton Thomas. “I know this is a catalog we need to build, and I hope these diverse videos will be used in our social media, public presentations and outreach to better reflect today’s library services and resources.”
For Employment: “The library was not a place I thought of right away, but it turned out to be the best place for my job search,” says Mike Munoz about how library programs helped him secure a job in a new city after only four months.
For Entrepreneurship: “Before I walked into the public library, I knew nothing about 3D printing,” says brewery owner John Fuduric, who used library resources to print unique beer taps for his business. “The library is a great resource, but with the technology, the possibilities are endless.”
And Kristin Warzocha, CEO of the Cleveland Food Bank, speaks to the power of partnerships to address community needs: “Hunger is everywhere, and families across our country are struggling. Libraries are ideal partners because libraries are everywhere, too. Being able to partner with libraries…is a wonderful win-win situation for us.” In dozens of communities nationwide, libraries are partnering to address food security concerns for youth as part of summer learning programs. In Cleveland, this partnership has expanded to afterschool programming and even “checking out” groceries at the library.
“Software Carpentry was originally a competition to design new software tools, not a training course. The fact that you didn’t know that tells you how well it worked.”
When I read this in a recent post on Greg Wilson’s blog, I took it as a challenge. I actually do remember the competition, although looking at the dates it was long over by the time I found it.
I believe it did have impact; in fact, I still occasionally use one of the tools it produced, so Greg’s comment got me thinking: what happened to the other competition entries?
Working out what happened will need a bit of digging, as most of the relevant information is now only available on the Internet Archive. It certainly seems that by November 2008 the domain name had been allowed to lapse and had been replaced with a holding page by the registrar.
There were four categories in the competition, each representing a category of tool that the organisers thought could be improved:
- SC Build: a build tool to replace make
- SC Conf: a configuration management tool to replace autoconf and automake
- SC Track: a bug tracking tool
- SC Test: an easy to use testing framework
I’m hoping to be able to show that this work had a lot more impact than Greg is admitting here. I’ll keep you posted on what I find!
It's that time again. Islandora has a twice-yearly release schedule, shooting to get a new version out at the end of April and October. We are now looking for volunteers to join the team for the October release of Islandora 7.x-1.8, under the guidance of Release Manager Danny Lamb.
Given how fortunate we have been to have so many volunteers on our last few releases, we are changing things up a little bit to improve the experience, both through consolidating our documentation and by adding a few new roles to the release:
- Communications Manager - Works with the Release Manager to announce release timeline milestones to the community. Reminds volunteers of upcoming deadlines and unfinished tasks. Reports to the Release Manager.
- Testing Manager - Oversees testing of the release and reports back to the Release Manager. Advises Testers on how to complete their tasks. Monitors testing status and reminds Testers to complete their tasks on time. Helps the Release Manager to assign testing tickets to Testers during the release.
- Documentation Manager - Oversees documenting the release and reports back to the Release Manager. Advises Documenters on how to complete their tasks. Monitors documentation status and reminds Documenters to complete their tasks on time.
- Auditing Manager - Oversees audit of the release and reports back to the Release Manager. Advises Auditors on how to complete their tasks. Monitors audit status and reminds Auditors to complete their tasks on time.
If you have been a Tester, Documenter, or Auditor for a previous Islandora Release, please consider taking on a little more responsibility and being a mentor to new volunteers by managing a role!
These are in addition to our existing release roles:
- Component Manager - Component Managers take responsibility for a single module or collection of modules, reviewing open pull requests and referring the results to the Release Manager. Outside of a release cycle, Component Managers act as Maintainer on their modules until the next release. Components with no Component Manager will not be included in the release.
- Tester - Installing and running the latest Islandora release candidate and testing for bugs. No programming experience required! We are looking for people with a general familiarity with Islandora to try out the latest releases and put them through their paces to look for bugs and make suggestions. Any JIRA tickets marked “Ready for Test” for a given component will also be assigned to the designated tester for a component, along with instructions on how to test.
- Documenter - Checking modules readme files and updating the Islandora Documentation Wiki to reflect new releases.
- Auditor - Each release we audit our README and LICENSE files. Auditors will be responsible for auditing a given component by verifying that these documents are current and fit into their proper templates.
- Give back to Islandora. This project survives because of our volunteers. If you've been using Islandora and want to contribute back to the project, being a part of a Release Team is one of the most helpful commitments you can make.
- There's a commitment to fit your skills and time. Do you have a strong grasp of the inner workings of a module and want to make sure bugs, improvements, and features are properly managed in its newest version? Be a Component Manager. Do you work with a module a lot as an end user and think you can break it? Be a Tester! Do you want to learn more about a module and need an excuse to take a deep dive? Be a Documenter! Do you have a busy few months coming up and can't give a lot of time to the Islandora release? Be an Auditor (small time commitment - big help!). You can take on a single module or sign up for several.
- Credit. Part of my job as inaugural Communication Manager is to create Release Team pages on our documentation so that future users can know who helped to make the release a reality.
- T-Shirts. Each member of an Islandora Release Team gets a t-shirt unique to that release. They really are quite nifty:
Tentative schedule for the release:
- Code Freeze: Monday, September 5, 2016
- Release Candidate: Monday, September 19, 2016
- Release: Monday, October 31, 2016