A last-minute change to my plans for ALA Midwinter came on Tuesday, when I was sought out to fill in for a speaker who had canceled at the ALCTS Digital Preservation Interest Group meeting. Options for outsourcing the storage and services needed to preserve digital content have been a recent interest of mine, so I volunteered to combine two earlier DLTJ blog posts with some new information and present it to the group for feedback. The reaction was great, and here are the promised slide deck, links to further information, and some thoughts from the audience response.
Slide Deck and References
In the presentation there is a Table About Costs that uses a scenario from an earlier DLTJ blog post. The text of the scenario is:
To examine the similarities and differences in costs, let’s use the OhioLINK Satellite Image collection as a prototypical example. It consists of about 2 terabytes (2TB) of high-quality images in TIFF format, with about 7.5GB of data going into the repository each month. In the interest of exploring everything that S3 can do, we assume that approximately 4GB of data will be transferred out of the archive each month; OCLC’s Digital Archive does not have an end-user dissemination component.
The point of this scenario is to show the widest range of costs, from a storage-only solution like Amazon S3 to a soup-to-nuts service like OCLC Digital Archive. A word about the redacted costs: some of the numbers in OCLC’s Digital Archive response (from 2008) came from a confidential quote, so they were removed from the public table. The numbers that are publicly listed come from Barbara Quint’s article.
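The scenario’s volumes lend themselves to a quick back-of-the-envelope calculation. The sketch below is a minimal, hypothetical Python helper: it uses only the scenario’s figures (about 2TB to start, 7.5GB ingested and 4GB transferred out per month), assumes decimal units (1TB = 1,000GB), and leaves every per-GB price as a parameter, because no actual rates are given here. The function names are illustrative, not part of any vendor’s tooling.

```python
"""Back-of-the-envelope volume and cost projection for the OhioLINK
Satellite Image scenario. Hypothetical sketch: all prices are parameters,
not figures from any vendor's price list."""

# Scenario figures; decimal units assumed (1 TB = 1,000 GB).
INITIAL_GB = 2 * 1_000        # about 2 TB of TIFF images at the start
INGEST_GB_PER_MONTH = 7.5     # new data going into the repository each month
EGRESS_GB_PER_MONTH = 4.0     # data transferred out each month (S3 case only)


def stored_gb(month: int) -> float:
    """GB held in the archive after `month` months of ingest (month 0 = initial load)."""
    return INITIAL_GB + INGEST_GB_PER_MONTH * month


def monthly_cost(month: int, storage_rate: float, ingest_rate: float,
                 egress_rate: float = 0.0) -> float:
    """Cost for one month, given hypothetical per-GB rates.

    Leave egress_rate at 0.0 for a service with no end-user
    dissemination component.
    """
    return (stored_gb(month) * storage_rate
            + INGEST_GB_PER_MONTH * ingest_rate
            + EGRESS_GB_PER_MONTH * egress_rate)


def cumulative_cost(months: int, storage_rate: float, ingest_rate: float,
                    egress_rate: float = 0.0) -> float:
    """Total over months 1..months, billing storage at each month's end size."""
    return sum(monthly_cost(m, storage_rate, ingest_rate, egress_rate)
               for m in range(1, months + 1))


if __name__ == "__main__":
    print(f"Stored after 12 months: {stored_gb(12):.1f} GB")
```

With real per-GB rates plugged in, `cumulative_cost(12, ...)` gives a first-year figure for a storage-only service; a bundled service priced per terabyte per year would need a different formula.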
The articles and blog posts I referenced in the course of the presentation were:
Iglesias, Edward and Wittawat Meesangnil (2010). Using Amazon S3 in Digital Preservation in a mid-sized academic library: A case study of CCSU ERIS digital archive system. The Code4Lib Journal, issue 12, retrieved 5-Jan-2011 from http://journal.code4lib.org/articles/4468
Murray, Peter (2008). Long-term Preservation Storage: OCLC Digital Archive versus Amazon S3. Disruptive Library Technology Jester. Retrieved 5-Jan-2011 from http://dltj.org/article/oclc-digital-archive-vs-amazon-s3/
Murray, Peter (2009). Can We Outsource the Preservation of Digital Bits? Disruptive Library Technology Jester. Retrieved 5-Jan-2011 from http://dltj.org/article/outsource-digital-bits/
Quint, Barbara (2008). OCLC Introduces High-Priced Digital Archive Service. Information Today. Retrieved 5-Jan-2011 from http://newsbreaks.infotoday.com/nbReader.asp?ArticleId=49018
At the Friday ‘Big Heads’ meeting much of the conversation revolved around Incrementalism vs. Revolution, as have so many conversations, about so many things. Someone quoted David Mamet (I can’t find the quote) as saying that what we need is sledgehammers, not chisels, and I thought it was a notion too good to pass up as a jumping-off point to discuss that meeting.
There were a lot of interesting topics discussed at the meetings, but as is my habit I’m going to focus only on the topics of interest to me. As usual there were a number of vendors in the audience, and when a few of the ‘heads’ at the main table voiced the expectation that they would be depending on the vendor community for help as they experienced additional staff reductions and resource constraints in general, the vendors came up to the microphones to respond. A couple of vendors expressed their concern that the library community in general has not been able to articulate what they want from vendors, and this has made it difficult for them to develop business plans. I hear a variation of this line when I stroll the exhibit halls and talk to vendors about what their plans are for RDA implementation. Almost always I hear that they have not heard from their customers about what they want, and they’re waiting for that before making plans. As a result, when I’m presenting to librarians about RDA, I tell them that they should be talking to their vendors, asking when and how they will be implementing, etc., etc. The problem with that approach is that a) most of the time the librarians don’t know what to ask, beyond the when and how; and b) when they get an answer they often don’t know how to interpret it. Maybe I’m slow, but I’m coming to the conclusion that I should stop telling people to talk to their vendors about RDA. I’m not sure it matters.
I went up to the microphone for one of my usual rants, after hearing quite enough of this dancing around. Here’s the reality, as I see it:
1. Libraries are unlikely to agree on what they want (this has been true in the past, and will likely be true in the future)
My rant included all three of those points and more. A little over a year ago, the R2 report on the marketplace for MARC records (upon which I blogged) assumed that there is a marketplace for MARC records which will continue and that a direct return on investment is possible (or desirable) for creators of data. I said then, and still believe, that such a viewpoint is both unrealistic and in fact destructive to the task of moving forward into a world where data is not the coin of the realm but freely available (this is the basis for linked open data), and where the investment and return on investment are around data services, not data sales. After my rant to Big Heads, one of the vendors came up to talk to me and offered some useful nuggets to support my view: a) they provide records, but don’t make much money on them; b) the realm of digital metadata is vastly more complicated than that of physical metadata. It’s a huge challenge for vendors to operate in this world, but clearly the usual answers are no longer working, even though the data revolution is not yet fully upon us. The inevitable conclusion is that vendors who wait for their customers to tell them what they want may not survive the coming revolution. This is no time for chisels.
In this context it’s good to meditate on Henry Ford’s famous statement: “If I had asked people what they wanted, they would have said faster horses.”
Despite what you might have read on the blogs, libraries show no signs of imminent ebook-induced death. The latest data from Overdrive, the dominant provider of eBooks to public libraries, shows staggering growth: looking at Overdrive alone, digital checkouts doubled in 2010 to 15 million. Based on the buzz at this weekend's American Library Association Midwinter Meeting, Overdrive should blow those numbers away in 2011. It seems that almost every librarian I've talked to here has decided to "take the plunge" into eBooks in a big way in 2011.
The ebook companies focused on academic libraries are experiencing the same growth. Ebook Library told me that over the past year their monthly sales have been double those of the year before. The biggest plunge was taken by ProQuest, which announced its acquisition of ebook provider ebrary. (I’ll have a separate story on that later.)
To some extent, most libraries have so far only been sampling the ebook water, and despite noted usability issues and e-reader device fragmentation, patrons seem to want more and more, and libraries are responding to patron demand. But not everyone is happy. One librarian told me, after a few beers, that “Overdrive sucks!” and then went on to use language unsuitable for a family-oriented blog.
As far as I can tell, there are two issues around Overdrive that are troubling libraries. One derives from the DRM system from Adobe that Overdrive uses. Adobe’s system is pretty much the only option for libraries and booksellers other than Amazon and Apple; Overdrive has no choice but to use this system in order to work with reader devices and software from Barnes & Noble, Sony and Kobo. The Internet Archive’s Brewster Kahle, in a panel on Saturday morning, slammed the Adobe system, even though it’s used by the Archive’s OpenLibrary. In OpenLibrary's experience, users were able to complete a lending transaction in only 43% of their attempts. Overdrive is working to improve the smoothness of these transactions, and is introducing new support methods to make the process easier.
The second issue was discussed by library system vendor executives at Friday’s RMG President’s Panel. According to Polaris Library Systems President Bill Schickling, many of his customers are worried that their libraries will be marginalized by ebook providers like Overdrive. Although Overdrive offers extensive customization options for its ebook lending interface, libraries are still upset that patrons have to use separate interfaces for books and ebooks, one provided by Overdrive and the other by their ILS vendor. Libraries often think of the library system as their primary "brand extension" on the internet.
It seems a bit odd that this should be an issue. For years, libraries have lived with databases and electronic journals delivered from separate systems. But books are different. Libraries want ebooks and books to live side by side. It makes little sense to force a user who wants to read a Stieg Larsson novel to check two places to see print and digital availability.
Overdrive is working overtime to address this second issue, it seems. Overdrive's CEO, Steve Potash, told me that his company is working on opening a set of APIs (application programming interfaces) that will allow system vendors, libraries and other developers to integrate Overdrive's ebook lending systems more deeply into other interfaces. Overdrive has needed these interfaces internally to build reading apps for Android, iPod and iPhone. Overdrive hopes to have an iPad-optimized reading app in Apple's iTunes store by the end of the first quarter of 2011, and will be working with selected development partners to work out many of the details. Potash hopes Overdrive will be able to unveil the APIs this summer at the ALA meeting in New Orleans.
The Overdrive APIs and the usability improvements they should bring will be welcome news to libraries and library patrons everywhere. Library system vendors and developers in libraries will have a lot of work to do over the coming year.
And library patrons will be reading a lot of ebooks.
Are you a new LITA member, or one who hasn’t quite found your place in the organization yet? Not quite sure what LITA is all about? LITA 201 is a session intended to introduce people to the organization and help them find a meaningful way to get involved and get connected. Don Lemke, chair of the Membership Development committee, headed this session and introduced the newbies to the organization, and to invited Committee chairs and Interest Group leaders.
LITA is a broad organization, encompassing librarians from all types of libraries, doing all kinds of work, involved with every type of information technology imaginable. The main goal of LITA is to help people make connections. It’s easy to get boxed into your specific work, and to cut yourself off from people doing different things, but through LITA, you can stay current and aware of things going on throughout the field. LITA is about people, and according to Andrew Pace, head of IT at NCSU Libraries and an active member of LITA, it’s one of the best professional networking organizations in ALA.
There are a lot of ways to get involved with LITA, and the best way to get started is to reach out and contact the chair of a committee or interest group that sounds compelling to you. Involvement is welcome and LITA members will do their best to help you find the right place in the organization. The first place to start is on lita.org, where you can find a list of Committees and Interest Groups. Interest Groups are an easy way to get involved: If you find something that sounds interesting, get in touch with the chair to find out more. You can start your own interest group if you don’t see your interest represented already.
Committees usually require a two-year commitment, and while some do involve travel to conferences, virtual participation is becoming easier. Committee work revolves around outreach, education, scholarship, membership, and governance. It’s easy to get involved right away, and there is a volunteer application on the ALA website. The organization seeks balanced committee membership, with a good mix of established and new LITA members, and always welcomes involvement from new members.
LITA sponsors the National Forum, Regional Institutes, and Pre-Conferences, which offer another way to get involved, and to learn interesting things that will make you a more effective librarian. The National Forum planning committee is a great way to get involved with LITA, and they are always looking for paper and presentation submissions.
Mentorship is a growing area in LITA: the organization is full of both digital natives and digital immigrants, and each group has things to teach the other. LITA is interested in connecting people who can provide each other mutual benefit. If you’re interested in being involved, contact Don Lemke with information about what you have to offer and what you are interested in learning in return.
If you are at ALA Midwinter this year and want to find out more, attend the Town Hall meeting, Monday morning from 8 am to 10 am. The meeting will be in the Convention Center room 01A, and breakfast will be provided (free food!). It’s a great chance to hear about what the organization is doing, and to contribute your ideas for the organization (I can personally attest that when I attended my first ALA Midwinter meeting as a young student who had no idea what she was doing, the people at the LITA town hall meeting were warm and welcoming, and really seemed to appreciate my input, so don’t be shy!).
LITA Happy Hour and LITA Eats are also important ways to get involved with LITA. These events offer less formal ways to get to know each other and make connections. We can let go and sit and talk with each other outside of a professional arena. This is where we really get to know each other, and it’s a great place to meet people and begin to find out how you can fit into the organization.
The most important thing to remember about LITA is that the people involved are what make it useful and fun. Your contributions will be welcome, however you can contribute. Because of its breadth, there is certain to be something you’re interested in. All you have to do is reach out and contact a committee chair or interest group head to find out the best way to be involved. LITA is happy to help you find your place, and will gladly welcome you into the fold.
Jason Griffey kicked off the session by introducing Dr. Vernor Vinge and talking about his many accolades as a science fiction writer and futurist.
Dr. Vinge then talked about how humans are the best tool-creating animal and the only animal that has figured out how to outsource its cognition, spreading its cognitive abilities into the outside world. As examples, he described writing and speaking as an outsourcing of our thinking, and money as an outsourcing of our perceived value for things.
As humans continue to outsource cognition more effectively by harnessing powerful machines and complex networks, we move closer to a point of technological singularity. At this point, where a superhuman intelligence can be achieved by machines or some combination of humans and machines, it will become too difficult for humans to fully grasp the present or to predict the future. As an example, he talked about how someone might be able to explain the present time and the fundamental changes that had led up to today to Mark Twain in an afternoon. But what if that same person was trying to explain all of this to a goldfish or a chimpanzee?
Next Dr. Vinge discussed the forces of change that he sees driving society toward this envisioned state of “technological singularity”.
He began by highlighting the power of humans together with computer networks. He said that development and expansion of cell phone networks and computer networks and the billions of people on these networks has impacted the way we see the world. In the past, there was a “life boat earth” world view where population growth, lack of resources and environmental problems would make competition among the populations of the world a negative sum game. He compared that perspective on the world to the optimism that now stems from the use of networks for communication and crowdsourcing. People can pursue their life interests with passion and enthusiasm and increasingly intelligent networks continue to get better at harnessing the power of intense creative energy for the benefit of all.
Then he talked about the concept of intelligence amplification. He indicated that users who start to develop an intimate enough interface with computers could potentially become superhuman. The computer becomes sort of a “neo neo cortex”. He offered some current examples of how computers are already creating an augmented reality such as Heads-Up Displays that enhance what is being viewed to offer the benefit of additional data and context.
He then explored the potential for fine-grained distributed systems, or “Digital Gaia.” He underscored that while the computing power and potential of vast server farms at companies like Google, Amazon, and Microsoft is highly publicized, this is not the only way, or the best way, to capitalize on available technologies. For some tasks, he said, it is much more favorable to use embedded systems of microprocessors. For example, scientists might want to monitor the metabolism of the individual cells within the human body by placing a microprocessor in every cell, an approach that would scale well with low-power nodes. The outcome of such a project would be quite strange: “where reality becomes its own database.”
Jason Griffey asked Dr. Vinge about his early experiences with libraries, and he talked about how he used to visit relatives and borrow their library cards so that he could read books from other libraries’ science fiction collections. He talked about the special role of librarians as enablers of the future and guardians of the past.
When asked about ereaders, he said that he did not currently have one and that he typically relies on a laptop or desktop for most of his reading and research, since he is consumed with finishing a book. He also expressed concern about the need to better understand DRM and the “pyramid of standards and legalities” associated with ebook devices.
Dr. Vinge then addressed whether it was possible that the technological singularity would not happen. He said that some have argued that a single neuron is more powerful than the most powerful supercomputer, and if that is true it could delay the singularity. Over the next 10 to 30 years, he said, “we will be getting answers that only great philosophers and college sophomores have talked about.” He said that for a long time it had simply been assumed that a machine could never play chess, and then there was IBM’s Deep Blue; things like this just keep getting chipped away.
Charlton, Galen: Pronounceable acronyms, or why you should come to the LITA Open Source Systems Interest Group meeting at ALA Midwinter
I just realized something — I’ve never heard anybody refer to the LITA OSS IG as the “awww-zig”. Probably just as well.
Anyway, here’s my pitch for the meeting, which is today, 9 January 2011, at the San Diego Convention Center in room 31C. By design, the IGs in LITA are perhaps the wildest and woolliest part of ALA, being forums for like-minded librarians and library techies to discuss and work on cool toys and share ideas for the benefit of their libraries. The OSS IG is no exception; one of the best parts of our meetings is going around the table and having everybody present discuss what they’ve been up to with F/OSS. Think of it as Code4Lib where ties and jackets are allowed!
There is also a business meeting, which co-chair Daniel Lovins and I will keep as short as possible, but fortunately, there’s little in the way of administrivia to talk about. The main agenda item for the business meeting: what does the OSS IG group want to do next? To toss some ideas out:
I look forward to seeing friends of F/OSS, the curious, and even the dubious for what will be a great discussion.
Friday I attended the RDA Update, organized as the “Briefings From RDA Test Participants.” The room was full (overfull, actually), and I ended up sitting in the back on a chair pulled from the main seating area towards the back wall. Beacher Wiggins provided the background and updated the group on the plan and timetable. He suggested that there were three scenarios possible for the decision: one was that the group would agree to adopt RDA, another was that they would decide not to (either for now, or presumably ever), and a third was that they would decide to implement if and when the JSC made some specified changes in the rules. I was a bit taken aback by this last option, since it seemed very heavy-handed and somewhat threatening. Of course, there will be options available for any implementing library or group of libraries (national or otherwise), but it seems a bit much to believe that among those options there might not be ways for LC/NLM/NAL to meet their specific or collective needs without holding the US RDA implementation hostage to their desires. If I were representing a non-US constituency (which in a small way I am, as the DCMI liaison to CC:DA) I would certainly take this possibility seriously, if nothing else as a gesture of US-centrism that should be repudiated by the rest of the US and international cataloging and metadata community. By all means, LC/NLM/NAL triumvirate, do what you think best, but don’t throw your considerable weight and credibility around in aid of getting what you think you want, or just to prove you can. We look bad enough to the international community as it is; please don’t make that worse!
The presentations started out well, with Chris Cronin (U. Chicago) giving a useful summary of his group’s experience. He was followed by Penny Baker (Sterling and Francine Clark Art Institute), who had a very flashy set of slides that did not work well in a room with too much light and too many people. While various people played with the lights, she tried to get through her slides, but had trouble seeing the laptop when the lights were down (and her slides were visible), and lost her place a few times. Her main point (as far as I could tell), that her group was able to show that the RDA relationships worked well in providing ways to link together the very interesting materials they chose to catalog, got lost in the shuffle. Towards the end, someone figured out how to dim the lights sufficiently to see the slides without plunging the room into darkness, and the room burst into cheers. The speaker, misinterpreting the audience response, thought she was being cued to finish up, and did so, apologizing as she left: “Sorry it took so long and was so messy.” The group in the back with me agreed that this was basically the story of RDA, though we should probably not expect a similar apology from JSC.
The remainder of the speakers plodded on with little to say that interested me: they did their testing work, gave their feedback, and determined internally whether they would continue doing ‘RDA Cataloging’ until the big decision comes down from the LC/NLM/NAL triumvirate, presumably on stone tablets for which some poor schlumpf will have to create a preservation strategy.
I have been dubious since the beginning about the usefulness of this testing regime, lately going so far as to compare it with the ‘Security Theater’ we are subjected to at airports these days (I have metal knees, so am always treated to a full, and now even more intrusive, ‘pat down’, something that makes me long for a naked scanner at my local airport). The analogy here is that ‘Security Theater’ is to real security as ‘RDA Testing Theater’ is to real testing, one that includes the FRBR part of RDA and not just a smattering of rules changes and some token relationships. I still think that it’s hard to justify the time and expense of the testing that has just concluded, which tests RDA only as used in a MARC environment, not RDA itself. From the point of view of the community, the result has been useful insofar as it has provided an avenue for some initial training and participation, but not so useful for providing any real understanding of RDA implementation. Far too many catalogers think (hope?) that RDA can be implemented without much change in what they do, which qualifies in my opinion as a very poor result indeed.
Sara and I started reading this book to each other on 2 January: Peak learning: how to create your own lifelong education program for personal enlightenment and professional success, by Ronald Gross (J.P. Tarcher/Putnam, 1999).
I read the Preface and Chap. 1 (out loud) the first evening. As of Thursday night we are through chapter 4. We have more or less alternated reading to each other since then.
This book is on my 12 Books, 12 Months Challenge list. I began reading it last April but about 3 chapters in it got interrupted by wedding planning/prep, getting married, and moving. I still wanted to read it, and hopefully apply some of it in my life, so I added it to my 12 Books list.
We are reading this book, and I mention it here, because Sara and I are trying to get a little more serious about taking our personal learning and growth into our own hands. This, of course, also includes professional learning and growth. Sara’s opportunities are a bit more limited here than where we were previously, and with me unemployed, mine are far more limited.
An apropos epigraph from the book:
“Learning is not a task or a problem—it is a way to be in the world. Man learns as he pursues goals and projects that have meaning for him.” – Sidney Jourard (44) [Info on who he is. I had to look him up. His sexist language is reportedly a product of his time, not of his beliefs.]
The book is addressed to the adult learner. It tries to show that the experiences you had in school are not applicable to learning now. And unlike school, where you were simply told to learn, it attempts to help the learner learn not only how to learn but how they as an individual learn best.
Some of the myths that it seeks to dispel are: learning is a boring, unenjoyable activity; learning deals only with the subjects and skills taught in schools; we must be passive and receptive to “absorb” knowledge; you must put yourself under the tutelage of a teacher; it has to be systematic, logical, and planned; and it needs to be thorough or it’s not worth doing (47-50).
This book was last updated in 1999 and I really wish it would be updated again. There have been big advancements made in the brain and behavioral sciences regarding learning since then. As to the resources that are now available compared to the Internet of 1998 or so one can only respond, “Oh my!”
If you can get it from a library it might be worth a look. There seems to be an awful lot of extraneous fluff between, and supposedly in support of, the actual useful bits. I already owned a used copy, so we’re making use of it.
My Topics of Interest
Some of the topics on my list are:
No doubt there are many others that I have forgotten. I will leave Sara to spell out her interests in her own time and fashion, as she sees fit.
Tools
We are trying to find the tools and software that will work for us, whether it is something one or the other of us has already been using or whether we need to find something else instead.
For instance, she’s been a big user of Evernote. Although I created an account almost 2 years ago, I never took to it. For assorted reasons we’ve been looking at DEVONthink as a replacement for Evernote and to assist in other ways. [Mac-only software] Somewhat sadly, it doesn’t have some features that we truly need. Then again, not too sad, as it saves us a fair bit of money. Still, it is worth looking at if your needs are not the same as ours, and there is an educator/student discount of 25%. They also have some free tools that look to be quite useful.
For now I am trying out Evernote a bit more seriously than the first time.
Resources
Here is a draft list of some of the books we are considering reading to each other as part of our individual personal learning plans (take your pick, goodreads or Open Library). These are titles about literature, the “Great Books” and the canon.
These are all books that we already own, and there are several others that we also already own on the same subjects that could be added. I also have plenty of books on mathematics, physics, and so on in our collection. Some are books I have meant to read for several years now. I just need to add them to the list(s).
We both have many interests and there is a plethora of quality resources available for free today, assuming one has an Internet connection. Of course, libraries will also continue to serve our needs for the more tangible products and ebooks.
Without having begun a formal probe of resources, I am aware of iTunes U, free college courses & lectures from MIT and Harvard and others, the Khan Academy, and many, many other sources. For more fully textual resources there are HathiTrust, Feedbooks, Project Gutenberg, libraries both public and academic, and other sources.
In fact, a good resource arrived in the mail this week: the spring catalog for the local community college, Western Iowa Tech Community College. They have a lifelong learning program that has a fair few interesting looking programs, many of which are free. And it costs all of $5/year. Sara found things of interest too.
This morning (Sat.) we drove over and registered as Lifelong Learning members and we signed up for some things, most free. I signed up for 2 tours, 3 lunch programs and a lecture. Since Sara works full-time she was only able to sign up for 2 things.
I am interested in the Iranian Hostage Crisis lecture as I was a young soldier in the Army when this event happened. It was a defining event in US international relations and still haunts us to this day. I also met one of the hostages much later in my career and was able to help him in a small way that seriously pushed the boundaries of what I knew at the time.
Future posts?
I hope to have some more posts about assorted issues related to our adventures in personal learning in the future. Some potential topics include:
Plans will be made (but not over made), resources compiled, topics probed, things learned.
What are some of the things you would like to learn? Any suggested resources that you would recommend?
So in response to a plaint in a BoT comment, I’ve made a glossary of often-used jargon and acronyms on Book of Trogool. It’s assuredly not done yet! Please feel free to suggest things I’ve missed in the comments, on this post or any post. Librarianship, open access, and data curation are no less prone [...]
26 institutions joined HathiTrust in 2010, doubling the size of the partnership and making a total of 52 institutions that will participate in a constitutional convention next year. In this convention, partners will review repository governance and sustainability and determine directions for the next phase of HathiTrust…
HathiTrust partners contributed 2.6 million volumes to the repository in 2010, raising the total number of volumes to more than 7.8 million. Nearly 2 million volumes are in the public domain…
The Executive Committee approved a new cost model for HathiTrust in February 2010, which will be the basis of costs for all partners beginning in 2013. The new model is based on the overlap of partner institutions’ print collections with the digital volumes in HathiTrust. Institutions that do not have large amounts of content to deposit are able to join under the new model before 2013, and more than a dozen have already done so…
Over the summer, staff at Indiana University, the University of Wisconsin, and the University of Minnesota joined in work begun at the University of Michigan to review the copyright status of works in HathiTrust published from 1923 to 1963. More than 90,000 volumes have been reviewed since the project began two years ago and approximately 55% of those reviewed have been determined to be in the public domain.
The HathiTrust continues to be a ground-breaking organization that will change how large research libraries (and likely the entire library ecosystem) build, manage, and use collections.
I don't really know how all the food gets to my table. Sure, I've gathered berries, baled hay, picked peas, baked bread and smoked fish, but I've never slaughtered a pig, (successfully) milked a cow or roasted coffee beans. In my grandparents' generation, I would have seemed rather ignorant and useless. Agriculture has become an industry as specialized as any other modern industry, increasingly inaccessible to the layperson or small business.
I do know a bit about how data gets to my browser. It gets harvested by data farmers and data miners, it gets spun into databases, and then gets woven into your everyday information diet. Although you've probably heard of the "web of data", you're probably not even aware of being surrounded by data cloth.
The dataculture industry is very diverse, reflecting the diversity of human curiosity and knowledge. Common to all corners of the industry is the structural alchemy that transmutes formless bits into precious nuggets of information.
In many cases, this structuring of information is layered on top of conventional publishing. My favorite example of this is that the publishers of "Entertainment Weekly" extract facts from their stories and structure them with an extensive ontology. Their ontologists (yes, EW has ontologists!) have defined an attribute "wasInRehabWith" so that they can generate a starlet's biography and report to you that she attended a drug rehabilitation clinic at the same time as the co-star of her current movie. Inquiring minds want to know!
If you look at location-based services such as Facebook's "Places", Foursquare, Yelp, Google Maps, etc., they will often present you with information pulled from other services. Often, a description comes from Wikipedia, reviews come from Yelp or TripAdvisor, and photos come from Panoramio or Flickr. These services connect users to data using a common metadata backbone of geotags. Data sets are pulled from source sites in various ways.
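The shared key in all of this is the geotag. The sketch below shows one way such a join could work. It assumes each source supplies items with latitude and longitude, and it attaches each item to the nearest known place within a distance threshold, using haversine great-circle distance. The place names, coordinates, source labels and the 100 m threshold are all made up for illustration. The post does not say how any of these services actually match records.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Place:
    name: str
    lat: float
    lon: float
    # source name -> values attached from that source
    extras: dict = field(default_factory=dict)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points."""
    r = 6_371_000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def attach(places, source, items, max_m=100.0):
    """Attach each geotagged item to the nearest place within max_m metres."""
    for item in items:
        best, best_d = None, max_m
        for place in places:
            d = haversine_m(place.lat, place.lon, item["lat"], item["lon"])
            if d <= best_d:
                best, best_d = place, d
        if best is not None:  # items with no nearby place are dropped
            best.extras.setdefault(source, []).append(item["value"])


# Hypothetical places and items with made-up coordinates.
places = [Place("Cafe A", 47.6097, -122.3422), Place("Museum B", 47.6205, -122.3493)]
attach(places, "wiki", [{"lat": 47.6098, "lon": -122.3421, "value": "Description of A"}])
attach(places, "reviews", [
    {"lat": 47.6204, "lon": -122.3493, "value": "Great exhibits"},
    {"lat": 48.0, "lon": -122.0, "value": "Too far from anything"},
])
for p in places:
    print(p.name, p.extras)
```

The linear scan keeps the idea visible. A service working at real scale would put places in a spatial index rather than compare every item against every place.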
Some datasets are produced in data factories. I had a chance to see one of these "factories" on my trip to India last month. Rooms full of data technicians (women do the morning shift, men the evening) sit at internet-connected computers and supervise the structuring of data from the internet. Most of the work is semi-automated: software does most of the data extraction. The technicians act as supervisors who step in when the software is too stupid to know that it's mangling things and when human input is really needed.
There's been a lot of discussion lately about how spammers are using data scraped from other websites and ruining the usefulness of Google's search results. There are plenty of companies that offer data scraping services to fuel this trend. Data scraping is the use of software that mimics human web browsing to visit thousands of web pages and capture the data that's on them. This works because large websites are generated dynamically out of databases; when machines assemble web pages, machines can disassemble them.
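A template-generated site repeats the same markup pattern for every database row, so a scraper only has to learn that pattern once. The sketch below covers the parsing half of that job using Python's standard-library `html.parser`. It assumes a hypothetical page in which each record sits in an element with `class="item"` and each field sits in a leaf element named by its class. Fetching pages, pagination and anything "stealthy" are left out.

```python
from html.parser import HTMLParser


class FieldScraper(HTMLParser):
    """Pull named fields out of a page built from a repeating template.

    Assumes each record is wrapped in an element whose class attribute is
    exactly `row_class`, and each field is a leaf element whose class is the
    field name (e.g. <span class="price">). Both are illustrative assumptions.
    """

    def __init__(self, row_class, fields):
        super().__init__()
        self.row_class = row_class
        self.fields = set(fields)
        self.records = []
        self._current = None  # field whose text is being captured

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if cls == self.row_class:
            self.records.append({})  # a new template row starts a new record
        elif cls in self.fields and self.records:
            self._current = cls
            self.records[-1][cls] = ""

    def handle_endtag(self, tag):
        self._current = None  # fields are leaf elements, so any end tag closes them

    def handle_data(self, data):
        if self._current is not None:
            self.records[-1][self._current] += data


def scrape(html, row_class, fields):
    parser = FieldScraper(row_class, fields)
    parser.feed(html)
    parser.close()
    return [{k: v.strip() for k, v in rec.items()} for rec in parser.records]


# A made-up page of the kind a database-driven site might generate.
PAGE = """
<ul>
  <li class="item"><span class="name">Widget</span> <span class="price">$4.99</span></li>
  <li class="item"><span class="name">Gadget</span> <span class="price">$12.50</span></li>
</ul>
"""
print(scrape(PAGE, "item", ["name", "price"]))
```

Production scrapers usually match on CSS selectors or XPath and must cope with multi-valued class attributes. Whenever a site changes its template, the pattern has to be relearned.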
A look at the variety of data scraping companies reveals a broad spectrum. Scraping is an essential technology for dataculture; as with any technology, it can be used to many ends. One company boasts of its "massive network of stealth scrapers capable of downloading massive amounts of data without ever getting blocked." Some companies, such as Mozenda, offer software to license. Others, such as Xtractly and Addtoit, are strictly service offerings.
I spoke to Addtoit's President, Bill Brown, about his industry. Addtoit got its start doing projects for Reuters and other firms in the financial industry; their client base has since become more "balanced". Companies such as Bloomberg, Reuters and D&B get paid premiums to provide environments rich in structured data by customers wanting a leg up on competitors. Brown's view is that the industry will move away from labor intensive operations to being completely automated, and Addtoit has developed accordingly.
A small number of companies, notably Best Buy, have realized that making their data easily available can benefit them by promoting commerce and competition. They have begun to use technologies such as RDFa to make it easy for machines to read data on their web sites; scraping becomes superfluous. RDFa is a method of embedding RDF metadata in HTML web pages; RDF is the general data model standardized by the W3C for use on the semantic web, which has been discussed much on this blog.
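With RDFa, the page author states what each value means, so a machine does not have to infer it from the layout. The sketch below again uses the standard-library `html.parser`. It collects `property`/value pairs, taking the value from a `content` attribute when one is present and otherwise from the element's text. The markup is a made-up product offer that uses GoodRelations-style terms. The sketch does not resolve subjects, `typeof` nesting, prefixes, `href`/`src` values or datatypes, so it is not a conforming RDFa processor.

```python
from html.parser import HTMLParser


class RDFaPropertyScraper(HTMLParser):
    """Collect (property, value) pairs from RDFa `property` attributes.

    Simplified: assumes elements carrying `property` without `content`
    are leaf elements, and keeps CURIEs such as "gr:name" unexpanded.
    """

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._open = None  # property whose text content is being read
        self._buf = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        prop = a.get("property")
        if prop is None:
            return
        if a.get("content") is not None:
            # An explicit content attribute overrides the visible text.
            for p in prop.split():
                self.pairs.append((p, a["content"]))
        else:
            self._open = prop
            self._buf = []

    def handle_endtag(self, tag):
        if self._open is not None:
            text = "".join(self._buf).strip()
            for p in self._open.split():
                self.pairs.append((p, text))
            self._open = None

    def handle_data(self, data):
        if self._open is not None:
            self._buf.append(data)


# Made-up markup for illustration.
SNIPPET = """
<div prefix="gr: http://purl.org/goodrelations/v1#" typeof="gr:Offering">
  <span property="gr:name">Example TV</span>
  <div typeof="gr:UnitPriceSpecification">
    <span property="gr:hasCurrency" content="USD">$</span><span property="gr:hasCurrencyValue">499.99</span>
  </div>
</div>
"""

parser = RDFaPropertyScraper()
parser.feed(SNIPPET)
parser.close()
print(parser.pairs)
```

Nothing in this extractor depends on the page's layout or CSS classes. The vocabulary terms travel with the data, which is what makes scraping superfluous for sites that adopt RDFa.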
This doesn't work for many types of data. Brown sees very slow adoption of RDFa and similar technologies but thinks website data will gradually become easier to get at. Most websites are very simple, and their owners see little need or benefit in investing in newer website technologies. If people who really want the data can hire firms like Addtoit to obtain the data, most of the potential benefits to website owners of making their data available accrue without needing technology shifts.
The library industry is slowly freeing itself from the strictures of "library data" and is broadening its data horizons. For example, many libraries have found that genealogical databases are very popular with patrons. But there is a huge world of data out there waiting to be structured and made useful. One of the most interesting dataculture companies to emerge over the last year is ShipIndex. As you'd expect from the name, ShipIndex is a vast directory of information relating to ships. Just as place information is tied together with geoposition data, ShipIndex ties together the world of information by identifying ships and their occurrence in the world's literature. The URIs in ShipIndex are very suitable for linking from other resources.
ShipIndex is proof that a "family farm" can still deliver value in the dataculture industry. The process used to build ShipIndex. Nonetheless, in coming years you should expect that technologies developed for the financial industry will see broader application and will lead to the creation of data products that you can scarcely imagine.
The business model for ShipIndex includes free access plus a fee-for-premium-access model. One question I have is how effectively libraries will be able to leverage the premium data provided with this model. Imagine, for example, the value you might get from a connection between ShipIndex and a genealogical database bound by passenger manifests. I would be able to discover the famous people who rode the same ship that my parents took to and from the US and Sweden (my mom rode the Stockholm on the crossing before it collided with the Andrea Doria). For now, though, libraries struggle to leverage the data they have; better data licensing models are way down on the list of priorities for most libraries.
ShipIndex was started by Peter and Mike McCracken, whom I've known since 2000. Their previous company (Serials Solutions) and my previous company (Openly Informatics) both had exhibit tables in the "Small Press" section of the American Library Association exhibit hall, where you'll often find the next generation of innovative companies serving the library industry. They'll be back in the Small Press section at this weekend's ALA Midwinter meeting. Peter has promised to sing a "shanty" (or was that a scupper?) for anyone who signs up for a free trial. You could probably get Mike to do a break dance if you prefer.
I'll be floating around the meeting too. If you find me and say hello, I promise not to sing anything.
Yesterday I attended the "RDA Briefings from Test Participants" session at ALA Midwinter. I only caught the tail end of Beacher Wiggins’ update from the Library of Congress, but as I understand it, LC will announce its decision regarding the results of its testing of RDA by Annual, if not sooner. One thing Beacher said struck me: regardless of the decision, we live in a world of mixed data and will have to get used to it. Of course, that’s been the status quo for years, if not decades; RDA is now just the latest player in the metadata standards dance. At least one major academic library test partner has already made its decision about adopting RDA: Christopher Cronin from the University of Chicago reported that the catalogers there made a unanimous decision to continue cataloging in RDA after the test is completed.
Besides Christopher, several other test partners relayed their experiences: Penny Baker from the Clark Art Institute Library, Richard Hasenyager from the North East Independent School District, Kathryn La Barre from the UIUC GSLIS program, and Maritta Coppieters from Backstage Library Works. Here’s my idiosyncratic summary of the testers’ experiences:
I was very sorry that none of the public library testing partners gave a briefing. However, I think Richard Hasenyager’s conclusion about when and if the NE ISD will adopt RDA applies to many public libraries: the ISD is willing to adopt RDA, but they can’t do it by themselves; the systems and materials vendors need to have full support for RDA records before it is economical for the school district to proceed. If libraries are to shift from AACR2 to RDA, this must be addressed. There is already a divide between academic and public library catalogers; having academic libraries do original cataloging in RDA while public libraries copy catalog using whatever records they can get (thereby adopting RDA by default without necessarily being fully invested or trained in RDA) would not be an ideal outcome.
“Free citizens must be able to hold big institutions and powerful individuals to account.”
I attended a speech at the Institute for Government by UK Deputy Prime Minister Nick Clegg at which he outlined the government’s stance on civil liberties. This topic is one I am particularly passionate about as a citizen of two democracies, and as a lover of history and human communication, but what was there to interest a software evangelist?
Mr Clegg’s speech is available as a transcript from his party’s site, so you can have a look at the same words I heard. If you read through a lot of the political positioning (references to “Labour”, for non-UK readers, refer to the majority party of the previous government), you get to the bit that interests me as a Talisian as well as a human.
The final point talks about citizens having the right to public information, and the right to speak out about what government (and, notably, publicly-subsidised industry) is doing. The freedom of information and freedom of speech are under the same heading. As Clegg put it:
“It is a modern right to information combined with traditional freedom of expression.”
Examples are given of current transparency measures, including the publishing of particular datasets that are already being used in innovative ways and to hold the government accountable. It’s clear from the speech that transparency is a priority, and that publishing data is seen as fundamental to this.
The theme of balancing security and freedom is repeated throughout the talk, alluding to the fact that some information in any government is clearly going to be kept secret. But the emphasis is on publishing wherever possible, and it was interesting that this felt like the most specific theme of an otherwise very high-level speech. This is an area of public policy that has been changing through the launch of data.gov.uk and the continued efforts of two successive governments (and, interestingly, all three major UK parties) to put public data online. The idea that these datasets will be used, reused, mashed up and seed innovation is at the forefront of these talks. This isn’t just data that can be seen, it’s data that can be used.
So, this government seems committed to continuing the trend for transparency through public information, and for their data to be made available online and in useful ways. The emphasis in this speech, however, adds a new dimension to the commitment, at least the way I understand it. It’s not just that data is a right of any free citizen—the Prime Minister said as much before he was PM—but that this right goes hand-in-hand with the citizen’s right to free speech.
Government publishing its data online, free to reuse and to feed applications that make it easier to interact with the information, has been a huge step. Alongside this is libel reform, a topic too big to get into here, which is about allowing scientific and journalistic investigation to proceed without fear of prosecution. (Guardian journalist Simon Singh discusses libel reform here.)
Although Mr Clegg’s talk is mostly general, discussing big ideas and leaving out specifics, I think the principles discussed were hugely important, and it is good to see a further commitment to public data. As a Talisian, it’s great because we work a lot with this kind of data, and it means we get to do more interesting things with it. As a citizen, it’s important that we can see more of what’s going on within government, and it encourages me that this is considered fundamental enough to mention alongside freedom of speech and libel reform.
What I’d like to see this year are the specifics. What specific things will make publishing public data easier and more thorough?
Join us Sunday January 9, 2011 for the Top Technology Trends panel. The session will be live-blogged by TTT committee members; the live blog will also capture any messages posted to twitter with the hashtag #alamwttt.
You can also watch the video stream here.
Our new logo reflects this exciting chapter. It combines the enduring, classic shapes of a square and circle, representing the library as a strong community anchor. The use of bold red conveys positive energy, vibrancy and enthusiasm. Yet the new image also has a touch of mystery, reflecting exploration, curiosity and discovery.
The library is an essential spot for the community. Just as our new logo means something different to everyone who sees it, the Lawrence Public Library has something different for everyone who uses it. The library is yours to explore, experience and enjoy.
PLoS One has done well. Very well. Well enough that it is spawning imitators. I was ready to write a Huge Bloggy Screed about the importance of Nature Publishing Group going in this direction with the launch of Scientific Reports, but Cameron Neylon scooped me in grand fashion. I endorse his analysis in its entirety—including, [...]
Any vendors reading this? If so: MPOW (my place of work), the Lunar and Planetary Institute (LPI), is considering a hosted solution for our catalog.
The LPI is a non-profit research center, and we are eligible for GSA pricing. The Library contains about 25,000 titles and 65,000 item records. There are about 25,000 authority records. We have about 300 cardholders.
If your company has a SaaS ILS, and I've not yet contacted you, please send me information.
If anyone is showing their product at the Texas Library Conference I'd like to examine it there and talk to you in-person.