Planet Code4Lib - http://planet.code4lib.org

PeerLibrary: PeerLibrary participated in OpenCon 2014, the student and early...

Sat, 2014-11-22 10:52


PeerLibrary participated in OpenCon 2014, the student and early career researcher conference on Open Access, Open Education, and Open Data.

It was held November 15-17, 2014 in Washington, DC, and recordings of all sessions are available online. We presented PeerLibrary at the beginning of the Project Presentations 1 session (slides).

Photo: Aloysius Wilfred Raj Arokiaraj

Nicole Engard: Bookmarks for November 21, 2014

Fri, 2014-11-21 20:30

Today I found the following resources and bookmarked them.

Digest powered by RSS Digest

The post Bookmarks for November 21, 2014 appeared first on What I Learned Today....

Related posts:

  1. Compare RSS Readers
  2. Share your code
  3. Planning a party or event?

District Dispatch: Free webinar: The latest on Ebola

Fri, 2014-11-21 20:26

Photo by Phil Moyer

As the Ebola outbreak continues, the public must sort through all of the information being disseminated via the news media and social media. In this rapidly evolving environment, librarians are providing valuable services to their communities as they assist their users in finding credible information sources on Ebola, as well as other infectious diseases.

On December 12, 2014, library leaders from the U.S. National Library of Medicine will host the free webinar “Ebola and Other Infectious Diseases: The Latest Information from the National Library of Medicine.” As a follow-up to the webinar they presented in October, the presenters will discuss how to provide effective services in this environment and will offer an update on information sources that can be of assistance to librarians.

Speakers
  • Siobhan Champ-Blackwell is a librarian with the U.S. National Library of Medicine Disaster Information Management Research Center. Champ-Blackwell selects material to be added to the NLM disaster medicine grey literature database and is responsible for the Center’s social media efforts. Champ-Blackwell has over 10 years of experience in providing training on NLM products and resources.
  • Elizabeth Norton is a librarian with the U.S. National Library of Medicine Disaster Information Management Research Center where she has been working to improve online access to disaster health information for the disaster medicine and public health workforce. Norton has presented on this topic at national and international association meetings and has provided training on disaster health information resources to first responders, educators, and librarians working with the disaster response and public health preparedness communities.

Date: December 12, 2014
Time: 2:00 PM–3:00 PM Eastern
Register for the free event

If you cannot attend this live session, a recorded archive will be available to view at your convenience. To view past webinars also done in collaboration with iPAC, please visit Lib2Gov.org.

The post Free webinar: The latest on Ebola appeared first on District Dispatch.

M. Ryan Hess: Library as Digital Consultancy

Fri, 2014-11-21 20:14

As faculty and students delve into digital scholarly works, they are tripping over the kinds of challenges that libraries specialize in overcoming, such as questions about digital project planning, improving discovery, or using quality metadata. Indeed, nobody is better suited to help scholars decide how to organize and deliver their digital works than librarians.

At my institution, we have not marketed our expertise in any meaningful way (yet), but we receive regular requests for help from faculty and campus organizations who are struggling with publishing digital scholarship. For example, a few years ago a team of librarians at my library helped researchers from the National University of Ireland, Galway migrate and restructure their online collection of annotations from the Vatican Archive to a more stable home on Omeka.net. Our expertise in metadata standards, OAI harvesting, digital collection platforms and digital project planning turned out to be invaluable in saving their dying collection and giving it a stable, long-term home. You can read more in my Saved by the Cloud post.

These kinds of requests have continued since. In recognition of this growing need, we are poised to launch a digital consultancy service on our campus.

Digital Project Planning

A core component of our jobs is planning digital projects. Over the past year, in fact, we’ve developed a standard project planning template that we apply to each digital project that comes our way. This has done wonders at keeping us all up to date on what stage each project is in and who is up next in terms of the workflow.

Researchers are often experts at planning out their papers, but they don’t normally have much experience with planning a digital project. Because metadata and preservation normally don’t come up for them, for example, they overlook planning for these aspects. More generally, I’ve found that just having a template to work with can help them understand how the experts do digital projects and give them a sense of the issues they need to consider when planning their own, whether that’s building an online exhibit or organizing their selected works in ways that will get the biggest bang for the buck.

We intend to begin formally offering project planning help to faculty very soon.

Platform Selection

It’s also our job to keep abreast of the various technologies available for distributing digital content, whether that is harvesting protocols, web content management systems, new plugins for WordPress or digital humanities exhibit platforms. Sometimes researchers know about some of these, but in my experience, their first choice is not necessarily the best for what they want to do.

It is fairly common for me to meet with campus partners who have an existing collection online, but one published on a platform that is ill-suited to what they are trying to accomplish. Currently, we have many departments moving old content based in SQL databases to plain HTML pages with no database behind them whatsoever. When I show them some of the other options, such as our Digital Commons-based institutional repository or Omeka.net, they often state they had no idea that such options existed and are very excited to work with us.

Metadata

I think people in general are becoming more aware of metadata, but there are still lots of technical considerations that your typical researcher may not be aware of. At our library, we have helped out with all aspects of metadata. We have helped researchers clean up their data to conform to authorized terms and standard vocabularies. We have explained Dublin Core. We have helped re-encode their data so that diacritics display online. We have done crosswalking and harvesting. It’s a deep area of knowledge and one that few people outside of libraries know on a suitably deep level.
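
To make the “crosswalking” concrete, here is a minimal sketch of the kind of mapping involved: renaming a collection’s local field names to Dublin Core terms. The local schema and the mapping are invented for illustration; a real crosswalk would also handle vocabulary cleanup and character encoding.

  # Hypothetical crosswalk from a local field schema to Dublin Core.
  # The local field names below are invented for this example.
  LOCAL_TO_DC = {
      "title": "dc:title",
      "photographer": "dc:creator",
      "date_taken": "dc:date",
      "keywords": "dc:subject",
  }

  def crosswalk(record):
      """Map a local record's fields onto Dublin Core terms,
      collecting values and dropping fields with no mapping."""
      out = {}
      for field, value in record.items():
          if field in LOCAL_TO_DC:
              out.setdefault(LOCAL_TO_DC[field], []).append(value)
      return out

  print(crosswalk({"title": "Old Main, 1905",
                   "photographer": "Unknown",
                   "building": "Old Main"}))
  # -> {'dc:title': ['Old Main, 1905'], 'dc:creator': ['Unknown']}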

One recommendation I would share with any budding metadata consultants is that you really need to be the Carl Sagan of metadata. This is pretty technical stuff and most people don’t need all the details. Stick to discussing the final outcome rather than the technical details and your help will be far better understood and appreciated. For example, I once presented to a room of researchers on all the technical fixes we made to a database to enhance and standardize the metadata, but this went over terribly. People later came up to me and joked that whatever it was we did, they were sure it was important, and thanked us for being there. I guess that was a good outcome since they acknowledged our contribution. But it would have been better had they understood the practical benefits for the collection and the users of that content.

SEO

Search Engine Optimization is not hard, but few people outside of the online marketing and web design world know what it is. I often find people can understand it very quickly if you simply define it as “helping Google understand your content so it can help people find you.” Simple SEO tricks like defining and then using keywords in your headers will do wonders for your collection’s visibility in the major search engines. But you can go deep with this stuff too, so I like to gauge my audience’s appetite and provide only as much detail as I think it warrants.

Discovery

It’s a sad statement on the state of libraries, but the real discovery game is in the major search engines…not in our siloed, boutique search interfaces. Most people begin their searches (whether academic or not) in Google, and this is really bad news for our digital collections since, by and large, library collections sit in the deep web, beyond the reach of the search robots.

I recently tried a search in Google.com for the title of a digital image in one of our collections and found it. Yay! Then I tried the same search in Google Images. No dice.

More librarians are coming to terms with this discovery problem now and we need to share this with digital scholars as they begin considering their own online collections so that they don’t make the mistakes libraries made (and continue to make…sigh) with our own collections.

We had one department at my institution that was sitting on a print journal they were considering putting online. Behind this was a desire to bring the publication back to life, since one researcher in Europe had told them she thought the journal had been discontinued years ago. In fact, it was still being published; it just wasn’t being indexed in Google. We offered our repository as an excellent place to publish it, especially because it would increase their visibility worldwide. Unfortunately, they opted for a very small, non-profit online publisher whose content we demonstrated was not surfacing in Google or Google Scholar. Well, you can lead a horse to water…

Still, I think this kind of understanding of the discovery universe does resonate with many. Going back to our somewhat invisible digital images, we will be pushing many to social media like Flickr with the expectation that this will boost visibility in the image search engines (and social networks) and drive more traffic to our digital collections.

Usability

This one is a tough one because people often come with pre-conceived notions of how they want their content organized or the site designed. For this reason, sometimes usability advice does not go over well. But for those instances when our experiences with user studies and information architecture can influence a digital scholarship project, it’s time well spent. In fact, I often hear people remark that they “never thought of it that way” and they’re willing to try some of the expert advice that we have to share.

Such advice includes things like:

  • Best practices for writing for the web
  • Principles of information architecture
  • Responsive design
  • Accessibility support
  • User Experience design

Marketing

It’s fitting to end on marketing. This is usually the final step in any digital project and one that often gets dropped. And yet, why do all the work of creating a digital collection only to let it go unnoticed? As digital project experts, librarians are familiar with the various channels available to promote a collection and build a following, such as social networking sites, blogs and the like.

With our own digital projects, we discuss marketing at the very beginning so we are sure all the hooks, timing and planning considerations are understood by everyone. In fact, marketing strategy will impact some of the features of your exhibit, your choice of keywords used to help SEO, the ultimate deadlines that you set for completion and the staffing time you know you’ll need post launch to keep the buzz buzzing.

Most importantly, though, marketing plans can greatly influence the decision for which platform to use. For example, one of the benefits of Omeka.net (rather than self-hosted Omeka) is that any collection hosted with them becomes part of a network of other digital collections, boosting the potential for serendipitous discovery. I often urge faculty to opt for our Digital Commons repository over, say, their personal website, because anything they place in DC gets aggregated into the larger DC universe and has built-in marketing tools like email subscriptions and RSS feeds.

The bottom line here is that marketing is an area where librarians can shine. Online marketing of digital collections really pulls together all of the other forms of expertise that we can offer (our understanding of metadata, web technology and social networks) to fulfill the aim of every digital project: to reach other people and teach them something.


David Rosenthal: Steve Hetzler's "Touch Rate" Metric

Fri, 2014-11-21 16:41
Steve Hetzler of IBM gave a talk at the recent Storage Valley Supper Club on a new, scale-free metric for evaluating storage performance that he calls "Touch Rate". He defines this as the proportion of the store's total content that can be accessed per unit time. This leads to some very illuminating graphs that I discuss below the fold.

Steve's basic graph is a log-log plot with performance increasing up and to the right. Response time for accessing an object (think latency) decreases to the right on the X-axis and the touch rate, the proportion of the total capacity that can be accessed by random reads in a year (think bandwidth) increases on the Y-axis. For example, a touch rate of 100/yr means that random reads could access the entire contents 100 times a year. He divides the graph into regions suited to different applications, with minimum requirements for response time and touch rate. So, for example, transaction processing requires response times below 10ms and touch rates above 100 (the average object is accessed about once every 3 days).

The touch rate depends on the size of the objects being accessed. If you take a specific storage medium, you can use its specifications to draw a curve on the graph as the size varies. Here Steve uses "capacity disk" (i.e. commodity 3.5" SATA drives) to show the typical curve, which varies from bandwidth-limited (for large objects, the horizontal side on the left) to response-limited (for small objects, the vertical side on the right).
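
As a rough sketch of how such a curve falls out of drive specs, the snippet below computes the touch rate as a function of object size from capacity, sustained bandwidth, and average access delay. This is my own back-of-envelope model with illustrative numbers, not Steve's actual figures.

  # Touch rate vs object size for a single drive; illustrative numbers only.
  SECONDS_PER_YEAR = 365 * 24 * 3600

  def touch_rate(object_bytes, capacity_bytes, bandwidth_bps, latency_s):
      """Fraction of total capacity readable per year via random reads."""
      response_time = latency_s + object_bytes / bandwidth_bps
      bytes_per_year = (object_bytes / response_time) * SECONDS_PER_YEAR
      return bytes_per_year / capacity_bytes, response_time

  cap, bw, lat = 4e12, 150e6, 0.015      # assumed 4TB, 150MB/s, 15ms access
  for size in (4e3, 4e6, 4e9):           # 4KB, 4MB, 4GB objects
      rate, resp = touch_rate(size, cap, bw, lat)
      print(f"{size:10.0e} B: touch rate {rate:8.1f}/yr, response {resp:.3f}s")

With these assumed numbers, small objects come out response-limited (about 2/yr) while large objects approach the bandwidth limit (about 1,200/yr), which is exactly the shape of the curve described above.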

As an example of the use of these graphs, Steve analyzed the idea of MAID (Massive Array of Idle Drives). He used HGST MegaScale DC 4000.B SATA drives, and assumed that at any time 10% of them would be spun-up and the rest would be in standby. With random accesses to data objects, 9 out of 10 of them will encounter a 15sec spin-up delay, which sets the response time limit. Fully powering-down the drives as Facebook's cold storage does would save more power but increase the spin-up time to 20s. The system provides only (actually somewhat less than) 10% of the bandwidth per unit content, which sets the touch rate limit.

Then Steve looked at the fine print of the drive specifications. He found two significant restrictions:
  • The drives have a life-time limit of 50K start/stop cycles.
  • For reasons that are totally opaque, the drives are limited to a total transfer of 180TB/yr.
Applying these gives this modified graph. The 180TB/yr limit is the horizontal line, reducing the touch rate for large objects. If the drives have a 4-year life, we would need 8M start/stop cycles to achieve a 15sec response time. But we only have 50K. To stay within this limit, the response time has to increase by a factor of 8M/50K, or 160, which is the vertical line. So in fact a traditional MAID system is effective only in the region below the horizontal line and left of the vertical line, much smaller than expected.
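
The arithmetic behind those two lines is simple enough to redo; here is a sketch using the figures quoted above (the 4TB drive capacity is my assumption):

  # Back-of-envelope check of the two fine-print limits.
  SECONDS_PER_YEAR = 365 * 24 * 3600

  capacity, transfer_cap = 4e12, 180e12   # assumed 4TB drive; 180TB/yr cap
  print(transfer_cap / capacity)          # touch rate ceiling: 45/yr

  life_years, cycle_limit = 4, 50_000
  # A 15s response time implies roughly one spin-up every 15s over the
  # drive's whole life: ~8.4M cycles in 4 years (the post rounds to 8M).
  cycles_needed = life_years * SECONDS_PER_YEAR / 15
  print(cycles_needed / cycle_limit)      # ~168; with 8M/50K, the post's 160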

This analysis suggests that traditional MAID is not significantly better than tapes in a robot. Here, for example, Steve examines configurations varying from one tape drive for 1600 LTO6 tapes, or 4PB per drive, to a quite unrealistically expensive 1 drive per 10 tapes, or 60TB per drive. Tape drives have a 120K lifetime load/unload cycle limit, and the tapes can withstand at most 260 full-file passes, so tape has a similar pair of horizontal and vertical lines.

The reason that Facebook's disk-based cold storage doesn't suffer from the same limits as traditional MAID is that it isn't doing random I/O. Facebook's system schedules I/Os so that it uses the full bandwidth of the disk array, raising the touch rate limit to that of the drives, and reducing the number of start-stop cycles. Admittedly, the response time for a random data object is now a worst-case 7 times the time for which a group of drives is active, but this is not a critical parameter for Facebook's application.

Steve's metric seems to be a major contribution to the analysis of storage systems.

Jenny Rose Halperin: Townhall, not Shopping Mall! Community, making, and the future of the Internet

Fri, 2014-11-21 15:59

I presented a version of this talk at the 2014 Futurebook Conference in London, England. They also kindly featured me in the program. Thank you to The Bookseller for a wonderful conference filled with innovation and intelligent people!

A few days ago, I was in the Bodleian Library at Oxford University, often considered the most beautiful library in the world. My enthusiastic guide told the following story:

After the Reformation when all the books in Oxford were burned, Sir Thomas Bodley decided to create a place where people could go and access all the world’s information at their fingertips, for free.

“What does that sound like?” she asked. “…the Internet?”

While this is a lovely conceit, the part of the story that resonated with me for this talk is the other big change that Bodley made, which was to work with publishers, who were largely a monopoly at that point, to fill his library for free by turning the library into a copyright library. While this seemed antithetical to the ways that publishers worked, in giving a copy of their very expensive books away, they left an indelible and permanent mark on the face of human knowledge. It was not only preservation, but self-preservation.

Bodley was what people nowadays would probably call “an innovator” and maybe even in the parlance of my field, a “community manager.”

By thinking outside of the scheme of how publishing works, he joined together with a group of skeptics and created one of the greatest knowledge repositories in the world, one that still exists more than 400 years later. This speaks to a few issues:

Sharing economies, community, and publishing should and do go hand in hand and have since the birth of libraries. By stepping outside of traditional models, you are creating a world filled with limitless knowledge and crafting it in new and unexpected ways.

The bound manuscript is one of the most enduring technologies. This story remains relevant because books are still books and people are still reading them.

At the same time, things are definitely changing. For the most part, books and manuscripts have been pretty much identifiable as books and manuscripts for the past 1000 years.

But what if I were to give Google Maps to a 16th Century Map Maker? Or what if I were to show Joseph Pulitzer Medium? Or what if I were to hand Gutenberg a Kindle? Or Project Gutenberg for that matter? What if I were to explain to Thomas Bodley how I shared the new Lena Dunham book with a friend by sending her the file instead of actually handing her the physical book? What if I were to try to explain Lena Dunham?

These innovations have all taken place within the last twenty years, and I would argue that we haven’t even scratched the surface in terms of the innovations that are to come.

We need to accept that the future of the printed word may vary from words on paper to an ereader or computer in 500 years, but I want to emphasize that in the 500 years to come, it will more likely vary from the ereader to a giant question mark.

International literacy rates have risen rapidly over the past 100 years and companies are scrambling to be the first to reach what they call “developing markets” in terms of connectivity. In the vein of Mark Surman’s talk at the Mozilla Festival this year, I will instead call these economies post-colonial economies.

Because we (as people of the book) are fundamentally idealists who believe that the printed word can change lives, we need to be engaged with rethinking the printed word in a way that recognizes power structures and does not settle for the limited choices that the corporate Internet provides (think Facebook vs WhatsApp). This is not a panacea to fix the world’s ills.

In the Atlantic last year, Phil Nichols wrote an excellent piece that paralleled Web literacy and early 20th century literacy movements. The dualities between “connected” and “non-connected,” he writes, impose the same kinds of binaries and blind cure-all for social ills that the “literacy” movement imposed in the early 20th century. In equating “connectedness” with opportunity, we are “hiding an ideology that is rooted in social control.”

Surman, who is director of the Mozilla Foundation, claims that the Web, which had so much potential to become a free and open virtual meeting place for communities, has started to resemble a shopping mall. While I can go there and meet with my friends, it’s still controlled by cameras that are watching my every move and its sole motive is to get me to buy things.

85 percent of North America is connected to the Internet and 40 percent of the world is connected. Connectivity has increased by 676 percent over the past 13 years. Studies show that literacy and connectivity go hand in hand.

How do you envision a fully connected world? How do you envision a fully literate world? How can we empower a new generation of connected communities to become learners rather than consumers?

I’m not one of these technology nuts who’s going to argue that books are going to somehow leave their containers and become networked floating apparatuses, and I’m not going to argue that the ereader is a significantly different vessel than the physical book.

I’m also not going to argue that in twenty years we’re going to have a world of people who are only Web literate and not reading books. To make any kind of future prediction would be a false prophecy, elitist, and perhaps dangerous.

Although I don’t know what the printed word will look like in the next 500 years,

I want to take a moment to think outside the book,

to think outside traditional publishing models, and to embrace the instantaneousness, randomness, and spontaneity of the Internet as it could be, not as it is now.

One way I want you to embrace the wonderful wide Web is to try to at least partially decouple your social media followers from your community.

Twitter and other forms of social media are certainly a delightful and fun way for communities to communicate and get involved, but your viral campaign, if you have it, is not your community.

True communities of practice are groups of people who come together to think beyond traditional models and innovate within a domain. For a touchstone, a community of practice is something like the Penguin Labs internal innovation center that Tom Weldon spoke about this morning and not like Penguin’s 600,000 followers on Twitter. How can we bring people together to allow for innovation, communication, and creation?

The Internet provides new and unlimited opportunities for community and innovation, but we have to start managing communities and embracing the people we touch as makers rather than simply followers or consumers.

The maker economy is here— participatory content creation has become the norm rather than the exception. You have the potential to reach and mobilize 2.1 billion people and let them tell you what they want, but you have to identify leaders and early adopters and you have to empower them.

How do you recognize the people who create content for you? I don’t mean authors, but instead the ambassadors who want to get involved and stay involved with your brand.

I want to ask you, in the spirit of innovation from the edges

What is your next platform for radical participation? How are you enabling your community to bring you to the next level? How can you differentiate your brand and make every single person you touch psyched to read your content, together? How can you create a community of practice?

Community is conversation. Your users are not your community.

Ask yourself the question Rachel Fershleiser asked when building a community on Tumblr: Are you reaching out to the people who want to hear from you and encouraging them or are you just letting your community be unplanned and organic?

There comes a point where unplanned organic growth reaches its limit. Know when you reach it.

Target, plan, be upbeat, and encourage people to talk to one another without your help and stretch the creativity of your work to the upper limit.

Does this model look different from when you started working in publishing? Good.

As the story of the Bodleian Library illustrated, sometimes a totally crazy idea can be the beginning of an enduring institution.

To repeat, the book is one of the most durable technologies and publishing is one of the most durable industries in history. Its durability has been put to the test more than once, and it will surely be put to the test again. Think of your current concerns as a minor stumbling block in a history filled with success, a history that has documented and shaped the world.

Don’t be afraid of the person who calls you up and says, “I have this crazy idea that may just change the way you work…” While the industry may shift, the printed word will always prevail.

Publishing has been around in some shape or form for 1000 years. Here’s hoping that it’s around for another 1000 more.

District Dispatch: ALA Washington Office copyright event “too good to be true”

Fri, 2014-11-21 15:59

(Left to right) ALA Washington Office Executive Director Emily Sheketoff, Jonathan Band, Brandon Butler and Mary Rasenberger.

On Tuesday, November 18th, the American Library Association (ALA) held a panel discussion on recent judicial interpretations of the doctrine of fair use. The discussion, entitled “Too Good to be True: Are the Courts Revolutionizing Fair Use for Education, Research and Libraries?” is the first in a series of information policy discussions to help us chart the way forward as the ongoing digital revolution fundamentally changes the way we access, process and disseminate information.

These events are part of the ALA Office for Information Technology Policy’s broader Policy Revolution! initiative—an ongoing effort to establish and maintain a national public policy agenda that will amplify the voice of the library community in the policymaking process and position libraries to best serve their patrons in the years ahead.

Tuesday’s event convened three copyright experts to discuss and debate recent developments in digital fair use. The experts—ALA legislative counsel Jonathan Band; American University practitioner-in-practice Brandon Butler; and Authors Guild executive director Mary Rasenberger—engaged in a lively discussion that highlighted some points of agreement and disagreement between librarians and authors.

The library community is a strong proponent of fair use, a flexible copyright exception that enables use of copyrighted works without prior authorization from the rights holder. Whether a given use is fair is determined by weighing four statutory factors. A number of court decisions issued over the last three years, such as Authors Guild v. HathiTrust, have affirmed uses of copyrighted works by libraries as fair, including the mass digitization of books held in some research libraries.

Band and Butler disagreed with Rasenberger on several points concerning recent judicial fair use interpretations. Band and Butler described judicial rulings on fair use in disputes like the Google Books case and the HathiTrust case as on-point, and rejected arguments that the reproductions of content at issue in these cases could result in economic injury to authors. Rasenberger, on the other hand, argued that repositories like HathiTrust and Google Books can in fact lead to negative market impacts for authors, and therefore do not represent a fair use.

Rasenberger believes that licensing arrangements should be made between authors and those members of the library, academic and research communities who want to reproduce the content to which authors hold rights. She takes specific issue with judicial interpretations of market harm that require authors to demonstrate proof of a loss of profits, suggesting that such harm can be established by showing that future injury is likely to befall an author as a result of the reproduction of his or her work.

Despite their differences of opinion, the panelists provided those in attendance at Tuesday’s event with some meaningful food for thought, and offered a thorough overview of the ongoing judicial debates over fair use. We were pleased that the Washington Internet Daily published an article about our session, “Georgia State Case Highlights Fair Use Disagreement Among Copyright Experts,” on November 20, 2014. ALA continues to fight for public access to information as these debates play out.

Stay tuned for the next event, planned for early 2015!

The post ALA Washington Office copyright event “too good to be true” appeared first on District Dispatch.

Cherry Hill Company: Deployment and Development workflows at Cherry Hill

Fri, 2014-11-21 15:58

Last year, we reached a milestone at Cherry Hill when we moved all of our projects into a managed deployment system. We have talked about Jenkins, one of the tools we use to manage our workflow, and there has been continued interest in what our "recipe" consists of. Since we use open source tools, and we think of ourselves as part of the (larger than Drupal) open source community, I want to share a bit more of what we use and how it is stitched together. Our hope is that this helps to spark a larger discussion of the tools others are using, so we can all learn from each other.

Git is a distributed code revision control system. While we could use any revision control system, such as CVS or Subversion (and even though this is a given at most agencies, we strongly suggest you use *some* system over nothing at all), git is fairly easy to use, has great...

Read more »

Shelley Gullikson: Weekly user tests: Finding subject guides

Fri, 2014-11-21 14:54

This week we did a guerrilla-style test to see how (or if) people find our subject guides, particularly if they are not in our main listing. We asked “Pretend that someone has told you there is a really great subject guide on the library website about [subject]. What would you do to find it?” We cycled through three different subjects not listed on our main subject guide page: Canadian History, Ottawa, and Homelessness.

Some Context

Our subject guides use a template created in-house (not LibGuides) and we use Drupal Views and Taxonomy to create our lists. The main subject guide page has an A-Z list, an autocomplete search box, a list of broad subjects (e.g. Arts and Social Sciences) and a list of narrower subjects (e.g. Sociology). The list of every subject guide is on another page. Subject specialists were not sure if users would find guides that didn’t correspond to the narrower subjects (e.g. Sociology of Sport).

Results

The 21 students we saw did all kinds of things to find subject guides. We purposely used the same vocabulary as what is on the site because it wasn’t supposed to be a test about the label “subject guide.” However, less than 30% clicked on the Subject Guides link; the majority used some sort of search.

Here you can see the places people went to on our home page most (highlighted in red), next frequently (in orange) and just once (yellow).

When people used our site search, they had little problem finding the guide (although a typo stymied one person). However, a lot of participants used our Summon search. I think there are a couple of reasons for this:

  • Students didn’t know what a subject guide was and so looked for guides the way they look for articles, books, etc.
  • Students think the Summon search box is for everything

Of the 6 students who did click on the Subject Guides link:

  • 2 used broad subjects (and neither was successful with this strategy)
  • 2 used narrow subjects (both were successful)
  • 1 used the A-Z list (with success)
  • 1 used the autocomplete search (with success)

One person thought that she couldn’t possibly find the Ottawa guide under “Subject Guides” because she thought those were only for courses. I found this very interesting because a number of our subject guides do not map directly to courses.

The poor performance of the broad subjects on the subject guide page is an issue and Web Committee will look at how we might address that. Making our site search more forgiving of typos is also going to move up the to-do list. But I think the biggest takeaway is that we really have to figure out how to get our guides indexed in Summon.


Hathi Trust Large Scale Search: Practical Relevance Ranking for 11 Million Books, Part 3: Document Length Normalization.

Thu, 2014-11-20 23:21

In Part 2 we argued that most relevance ranking algorithms used for ranking text documents are based on three fundamental features:

read more
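
The teaser breaks off before listing them, but in standard information retrieval treatments the three features are term frequency, inverse document frequency, and document length. As a general illustration of the document length normalization named in the title (textbook BM25, not code from the HathiTrust post), consider:

  # Textbook BM25 scoring for one query term, showing where document
  # length normalization enters; illustrative only.
  import math

  def bm25_term_score(tf, df, N, doc_len, avg_len, k1=1.2, b=0.75):
      # b controls length normalization: b=0 ignores document length,
      # b=1 normalizes fully by doc_len / avg_len.
      idf = math.log(1 + (N - df + 0.5) / (df + 0.5))
      norm = 1 - b + b * (doc_len / avg_len)
      return idf * tf * (k1 + 1) / (tf + k1 * norm)

  # Same term frequency, but the much longer document scores lower.
  print(bm25_term_score(tf=5, df=1000, N=11_000_000, doc_len=500, avg_len=2000))
  print(bm25_term_score(tf=5, df=1000, N=11_000_000, doc_len=80_000, avg_len=2000))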

District Dispatch: ALA welcomes Simon & Schuster change to Buy It Now program

Thu, 2014-11-20 21:34

ALA President Courtney Young

Today, the American Library Association (ALA) and its Digital Content Working Group (DCWG) welcomed Simon & Schuster’s announcement that it will allow libraries to opt into the “Buy It Now” program. The publisher began offering all of its ebook titles for library lending nationwide in June 2014, with required participation in the “Buy It Now” merchandising program, which enables library users to directly purchase a title rather than check it out from the library. Simon & Schuster ebooks are available for lending for one year from the date of purchase.

In an ALA statement, ALA President Courtney Young applauded the move:

From the beginning, the ALA has advocated for the broadest and most affordable library access to e-titles, as well as licensing terms that give libraries flexibility to best meet their community needs.

We appreciate that Simon & Schuster is modifying its library ebook program to provide libraries a choice in whether or not to participate in Buy It Now. Providing options like these allow libraries to enable digital access while also respecting local norms or policies. This change also speaks to the importance of sustaining conversations among librarians, publishers, distributors and authors to continue advancing our shared goals of connecting writers and readers.

DCWG Co-Chairs Carolyn Anthony and Erika Linke also commented on the Simon & Schuster announcement:

“We are still in the early days of this digital publishing revolution, and we hope we can co-create solutions that expand access, increase readership and improve exposure for diverse and emerging voices,” they said. “Many challenges remain, including high prices, privacy concerns, and other terms under which ebooks are offered to libraries. We are continuing our discussions with publishers.”

For more library ebook lending news, visit the American Libraries magazine E-Content blog.

The post ALA welcomes Simon & Schuster change to Buy It Now program appeared first on District Dispatch.

In the Library, With the Lead Pipe: Introducing Library Pipeline

Thu, 2014-11-20 21:30

South Coast Pipe by Colm Walsh (CC-BY)

In Brief: We’re creating a nonprofit, Library Pipeline, that will operate independently from In the Library with the Lead Pipe, but will have similar and complementary aims: increasing and diversifying professional development; improving strategies and collaboration; fostering more innovation and start-ups; and encouraging LIS-related publishing and publications. In the Library with the Lead Pipe is a platform for ideas; Library Pipeline is a platform for projects.

At In the Library with the Lead Pipe, our goal has been to change libraries, and the world, for the better. It’s on our About page: We improve libraries, professional organizations, and their communities of practice by exploring new ideas, starting conversations, documenting our concerns, and arguing for solutions. Those ideas, conversations, concerns, and solutions are meant to extend beyond libraries and into the societies that libraries serve.

What we want to see is innovation–new ideas and new projects and collaborations. Innovative libraries create better educated citizens and communities with stronger social ties.

Unfortunately, libraries’ current funding structures and the limited professional development options available to librarians make it difficult to introduce innovation at scale. As we started talking about a couple of years ago, in our reader survey and in a subsequent editorial marking our fourth anniversary, we need to extend into other areas, besides publication, in order to achieve our goals. So we’re creating a nonprofit, Library Pipeline, that will operate independently from In the Library with the Lead Pipe, but will have similar and complementary aims.

Library Pipeline is dedicated to supporting structural changes by providing opportunities, funding, and services that improve the library as an institution and librarianship as a profession. In the Library with the Lead Pipe, the journal we started in 2008, is a platform for ideas; Library Pipeline is a platform for projects. Although our mission is provisional until our founding advisory board completes its planning process, we have identified four areas in which modest funding, paired with guidance and collaboration, should lead to significant improvements.

Professional Development

A few initiatives, notably the American Library Association’s Emerging Leaders and Spectrum Scholars programs, increase diversity and provide development opportunities for younger librarians. We intend to expand on these programs by offering scholarships, fellowships, and travel assistance that enable librarians to participate in projects that shift the trajectory of their careers and the libraries where they work.

Collaboration

Organized, diverse groups can solve problems that appear intractable if participants have insufficient time, resources, perspective, or influence. We would support collaborations that last a day, following the hack or camp model, or a year or two, like task forces or working groups.

Start-ups

We are inspired by incubators and accelerators, primarily YCombinator and SXSW’s Accelerator. The library and information market, though mostly dormant, could support several dozen for-profit and nonprofit start-ups. The catalyst will be mitigating founders’ downside risk by funding six months of development, getting them quick feedback from representative users, and helping them gain customers or donors.

Publishing

Librarianship will be stronger when its practitioners have as much interest in documenting and serving our own field as we have in supporting the other disciplines and communities we serve. For that to happen, our professional literature must become more compelling, substantive, and easier to access. We would support existing open access journals as well as restricted journals that wish to become open access, and help promising writers and editors create new publications.

These four areas overlap by design. For example, we envision an incubator for for-profit and nonprofit companies that want to serve libraries. In this example, we would provide funding for a diverse group of library students, professionals, and their partners who want to incorporate, and bring this cohort to a site where they can meet with seasoned librarians and entrepreneurs. After a period of time, perhaps six months, the start-ups would reconvene for a demo day attended by potential investors, partners, donors, and customers.

Founding Advisory Board

We were inspired by the Constellation Model for our formation process, as adapted by the Digital Public Library of America and the National Digital Stewardship Alliance (see: “Using Emergence to Take Social Innovation to Scale”). Our first step was identifying a founding advisory board, whose members have agreed to serve a two-year term (July 2014–June 2016), at the end of which the board will be dissolved and replaced with a permanent governing board. During this period, the advisory board will formalize and ratify Library Pipeline’s governance and structure, establish its culture and business model, promote its mission, and define the organizational units that will succeed the advisory board, such as a permanent board of trustees and paid staff.

The members of our founding advisory board are:

The board will coordinate activity among, and serve as liaisons to, the volunteers on what we anticipate will eventually be six subcommittees (similar to DPLA’s workstreams). This is going to be a shared effort; the job is too big for ten people. Those six subcommittees and their provisional charges are:

  • Professional Development within LIS (corresponding to our “Professional Development” area). Provide professional development funding, in the form of scholarships, fellowships, or travel assistance, for librarians or others who are working on behalf of libraries or library organizations, with an emphasis on participation in cross-disciplinary projects or conferences that extend the field of librarianship in new directions and contribute to increased diversity among practitioners and the population we serve.
  • Strategies for LIS (corresponding to “Collaboration”). Bring together librarians and others who are committed to supporting libraries or library-focused organizations. These gatherings could be in-person or online, could last a day or could take a year, and could be as basic as brainstorming solutions to a timely, significant issue or as directed as developing solutions to a specific problem.
  • Innovation within LIS (corresponding to “Start-Ups”). Fund and advise library-related for-profit or nonprofit startups that have the potential to help libraries better serve their communities and constituents. We believe this area will be our primary focus, at least initially.
  • LIS Publications (corresponding to “Publishing”). Fund and advise LIS publications, including In the Library with the Lead Pipe. We could support existing open access journals or restricted journals that wish to become open access, and help promising writers and editors create new publications.
  • Governance. This may not need to be a permanent subcommittee, though in our formative stages it would be useful to work with people who understand how to create governance structures that provide a foundation that promotes stability and growth.
  • Sustainability. This would include fundraising, but it also seems to be the logical committee for creating the assessment metrics we need to have in place to ensure that we are fulfilling our commitment to libraries and the people who depend on them.
How Can You Help?

We’re looking for ideas, volunteers, and partners. Contact Brett or Lauren if you want to get involved, or want to share a great idea with us.

District Dispatch: ALA and E-rate in the press

Thu, 2014-11-20 21:21

For nearly a year-and-a-half, the FCC has been engaged in an ongoing effort to update the E-rate program for the digital age. The American Library Association (ALA) has been actively engaged in this effort, submitting comments and writing letters to the FCC and holding meetings with FCC staff and other key E-rate stakeholders.

Our work on the E-rate modernization has drawn the attention of several media outlets over the past week, as the FCC prepares to consider an order that we expect to help libraries from the most populated cities to the most rural areas meet their needs related to broadband capacity and Wi-Fi:

The FCC Plans to Increase Your Phone Bill to Build Better Internet in Schools (ALA quoted)
E-Rate Funding Would Get Major Boost Under FCC Chair’s Plan
FCC’s Wheeler Draws Fans With E-Rate Cap Hike
Is expanding Wi-Fi to 10 million more students worth a cup of coffee?

ALA was also mentioned in articles from CQ Roll Call and PoliticoPro on Monday.

The new E-rate order is the second in the E-rate modernization proceeding. The FCC approved a first order on July 11th, which focuses on Wi-Fi and internal connections. ALA applauds the FCC for listening to our recommendations throughout the proceeding. Its work reflects an appreciation for all that libraries do to serve community needs related to Education, Employment, Entrepreneurship, Empowerment, and Engagement—the E’s of Libraries.

The post ALA and E-rate in the press appeared first on District Dispatch.

Open Library: Open Library Scheduled Hardware Maintenance

Thu, 2014-11-20 17:10

Image from Automobile maintenance by Ray F. Kuns

Open Library will be down from 4:30PM to approximately 5:00PM (PST, UTC/GMT -8 hours) on Thursday November 20, 2014 due to scheduled hardware maintenance. We’ll post updates here and on @openlibrary twitter. Thank you for your cooperation.

FOSS4Lib Recent Releases: Torus - 2.30

Thu, 2014-11-20 16:42
Package: Torus
Release Date: Thursday, November 20, 2014

Last updated November 20, 2014. Created by Peter Murray on November 20, 2014.

2.30 Thu 20 Nov 2014 11:34:12 CET

- MKT-168: fix parent's 'created' lost during update

- MKT-170: bootstrap 'originDate' for non-inherit records

OCLC Dev Network: Learning Linked Data: SPARQL

Thu, 2014-11-20 16:15

One thing you realize pretty quickly is that it is very hard to work with Linked Data and just confine one’s explorations to a single site or data set. The links inevitably lead you on a pilgrimage from one data set to another and another. In the case of the WorldCat Discovery API, my pilgrimage led me from WorldCat to id.loc.gov, FAST and VIAF and from VIAF on to dbpedia. Dbpedia is an amazingly fun data set to play with. Using it to provide additional richness and context to the discovery experience has been enlightening.
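
As a taste of that kind of exploration, here is a minimal sketch of querying dbpedia’s public SPARQL endpoint from Python with the SPARQLWrapper library. The query is my own toy example, not one from the post or from the WorldCat Discovery API.

  # Toy dbpedia query via SPARQLWrapper; illustrative only.
  from SPARQLWrapper import SPARQLWrapper, JSON

  sparql = SPARQLWrapper("http://dbpedia.org/sparql")
  sparql.setQuery("""
      SELECT ?writer ?name WHERE {
        ?writer a dbo:Writer ;
                rdfs:label ?name .
        FILTER (lang(?name) = "en")
      } LIMIT 5
  """)
  sparql.setReturnFormat(JSON)

  for row in sparql.query().convert()["results"]["bindings"]:
      print(row["writer"]["value"], "-", row["name"]["value"])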

HangingTogether: Libraries & Research: Changes in libraries

Thu, 2014-11-20 14:16

[This is the fourth in a short series on our 2014 OCLC Research Library Partnership meeting, Libraries and Research: Supporting Change/Changing Support. You can read the first, second, and third posts and also refer to the event webpage that contains links to slides, videos, photos, and a Storify summary.]

And now, onward to the final session of the meeting, which focused, appropriately enough, on changes in libraries, including new roles and preparing to support future service demands. Libraries are engaging in new alliances and are restructuring themselves to prepare for change in accordance with their strategic plans.

[Paul-Jervis Heath, Lynn Silipigni Connaway, and Jim Michalko]

Lynn Silipigni Connaway (Senior Research Scientist, OCLC Research) [link to video] shared the results of several studies that identify the importance of user-centered assessment and evaluation. Lynn has been working actively in this area since 2003, looking at not only researchers but also future researchers (students!). In interviews on virtual reference, focusing on prospective users, Lynn and her team found that students use Google and Wikipedia but also rely on human resources — other students, advisers, graduate students and faculty. In looking through years of data, interviewees tend to use generic terms like “database” and refer to specific tools and sources only when they are further along in their career — this doesn’t mean they don’t use them; rather, they get used to using more sophisticated terminology as they go along. No surprise, convenience trumps everything; researchers at all levels are eager to optimize their time, so many “satisfice” if the assignment or task doesn’t warrant extra time spent. From my perspective, one of the most interesting findings from Lynn’s studies relates to students’ somewhat furtive use of Wikipedia, which she calls the Learning Black Market (students look up something in Google, find sources in Wikipedia, and copy and paste the citation into their paper!). Others use Facebook to get help. There are some interesting demographic differences — more established researchers use Twitter, and use of Wikipedia declines as researchers get more experience. In regard to the library, engagement around new issues (like data management) causes researchers to think anew about ways the library might be useful. Although researchers of all stripes will reach out to humans for help, librarians rank low on that list. Given all of these challenges, there are opportunities for librarians and library services — be engaging and be where researchers are, both physically and virtually. We should always assess what we are doing — keep doing what’s working, cut or reinvent what is not. Lynn’s presentation provides plenty of links and references for you to check out.

Paul-Jervis Heath (Head of Innovation & Chief Designer, University of Cambridge) [link to video] spoke from the perspective of a designer, not a librarian (he has worked on smart homes, for example). He shared findings from recent work with the Cambridge University libraries. Because of disruption, libraries face a perfect storm of change in teaching, funding, and scholarly communications. User expectations are formed by consumer technology. While we look for teachable moments, Google and tech companies do not — they try to create intuitive experiences. Despite all the changes, libraries don’t need to sit on the sidelines; they can be engaged players. Design research is important, and is distinguished from market research in that it doesn’t measure how people think but how they act. From observation studies, we can see that students want to study together in groups, even if they are doing their own thing. The library needs to be optimized for that. Another technique employed: asking students to use diaries to document their days. Many students prefer the convenience of studying in their room, but what propels them to the library is the desire to be with others in order to focus. At Cambridge, students have a unique geographic triangle defined by where they live, the department where they go to class, and the market they prefer to shop in. Perceptions about how far something (like the library) is outside of the triangle are relative. Depending on how far your triangle points are, life can be easy or hard. Students are not necessarily up on technology, so don’t make assumptions. It turns out that books (the regular, paper kind) are great for studying! But students use ebooks to augment their paper texts, or will use them when all paper books are gone. Shadowing (with permission) is another technique which allows you to immerse yourself in a researcher’s life and understand their mental models. Academics wear a lot of different hats, play different roles within the university and are too pressed for time to learn new systems. It’s up to the library to create efficiencies and make life easier for researchers. Paul closed by emphasizing six strategic themes: transition from physical to digital; library spaces; sustainable classic library services; supporting research and scholarly communications; making special collections more available; and creating touchpoints that will bring people back to the library seamlessly.

Jim Michalko (Vice President, OCLC Research Library Partnership) [link to video] talked about his recent work looking at library organizational structures and restructuring. (Jim will be blogging about this work soon, so I won’t give more than a few highlights.) For years, libraries have been making choices about what to do and how to do it, and libraries have been reorganizing themselves to get this (new) work done. Jim gathered feedback from 65 institutions in the OCLC Research Library Partnership and conducted interviews with a subset of those, in order to find out if structure indeed follows strategy. Do new structures represent markets or adjacent strategies (in business speak)? We see libraries developing capacities in customer relationship management, and we see this reflected in user-focused activities. Almost all institutions interviewed were undertaking restructuring based on changes external to the library, such as new constituencies and expectations. Organizations are orienting themselves to be more user centered, and to align themselves with a new direction taken by the university. We see many libraries bringing in skill sets beyond those normally found in the library package. Many institutions charged a senior position with helping to run a portion of a regional or national service. Other similarities: all had a lot of communication about restructuring, and almost all restructurings were related to a space plan.

This session was followed by a discussion session and I invite you to watch it, and also to watch this lovely summary of our meeting delivered by colleague Titia van der Werf (less than 7 minutes long and worth watching!).

If you attended the meeting or were part of the remote viewing audience for all or part of it, or if you watched any of the videos, I hope you will leave some comments with your reactions. Thanks for reading!


Library of Congress: The Signal: All the News That’s Fit to Archive

Thu, 2014-11-20 14:03

The following is a guest post from Michael Neubert, a Supervisory Digital Projects Specialist at the Library of Congress.

The Library has had a web archiving program since the early 2000s. As with other national libraries, the Library of Congress web archiving program started out harvesting the web sites of its national election campaigns, followed by some collections harvesting sites for a period of time connected with events (for example, an Iraq War web archive and a papal transition 2005 web archive), along with collecting the sites of the U.S. House and Senate and the legislative branch of government more broadly.

An American of the 1930s getting his news by reading a newspaper. These days he’d likely be looking at a computer screen. Photo courtesy of the Library of Congress Prints and Photographs division.

The question for the Library of Congress of “what else” to harvest beyond these collections is harder to answer than one might think because of the relatively small web archiving capacity of the Library of Congress (which is influenced by our permissions approach) compared to the vast immensity of the Internet. About six years ago we started a collection now known as the Public Policy Topics, for which we would acquire sites with content reflecting different viewpoints and research on a broad selection of public policy questions, including the sites of national political parties, selected advocacy organizations and think tanks, and other organizations with a national voice in America’s policy discussions that could be of interest to future researchers. We are adding more sites to Public Policy Topics continuously.

Eventually I decided to include some news web sites that contained significant discussion of policy issues from particular points of view – sites ranging from DailyKos.com to Townhall.com, from TruthDig.com to Redstate.com.  We started crawling these sites on a weekly basis to try to assure complete capture over time and to build a representation of how the site looked as different news events came and went in the public consciousness (and on these web sites).  We have been able to assess the small number of such sites that we have crawled and have decided that the results are acceptable.  But this was obviously not a very large-scale effort compared to the increasing number of sites presenting general news on the Internet -for many people, their current equivalent of a newspaper.

Newspapers – they are a critical source for historical research and the Library of Congress has a long history of collecting and providing access to U.S. (and other countries’) newspapers. Having started to collect a small number of “newspaper-like” U.S. news sites for the Public Policy Topics collection, I began a conversation with three reference librarian colleagues from the Newspaper & Current Periodical Reading Room – Amber Paranick, Roslyn Pachoca and Gary Johnson – about expanding this effort to a new collection, a “General News on the Internet” web archive. They explained to me:

Our newspaper collections are invaluable to researchers.  Newspapers provide a first-hand draft of history.  They provide supplemental information that cannot be found anywhere else.  They ‘fill in the gaps,’ so to speak. The way people access news has been changing and evolving ever since newspapers were first being published. We recognized the need to capture news published in another format.  It is reasonable to expect us to continue to connect these kinds of resources to our current and future patrons. Websites tend to be ephemeral and may disappear completely.  Without a designated archive, critical news content may be lost.

In short, my colleagues shared my interest, concern and enthusiasm for starting a larger collection of Internet-only general news sites as a web archiving collection.  I’ll let them explain their thinking further:

When we first got started on the project, we weren’t sure how to proceed.  Once we established clear boundaries on what to include, what types of news sites would be within scope for this collection, our selection process became easier. We asked for help in finding websites from our colleagues. 

We felt it was important to include sites that focus on general news with significant national presence where there are articles that have an author’s voice, such as with HuffingtonPost.com or BuzzFeed.com (even as some of these sites also contain articles that are meant to attract visitors, so-called “click bait”). We wanted to include a variety of sites that represent more cutting edge ways of presenting general news, such as Vox.com and TheVerge, and we felt sites that focus on parody such as TheOnion.com were also important to have represented. Of course, these sites are not the only sources from which people obtain their news, but we tried to choose a variety that included more trendy or popular sources as well as the conventional or traditional types. Again, the idea is to assure future users have access to a significant representation of how Americans accessed news at this time using the Internet.

The Library of Congress has an internal process for proposing new web archiving collections.  I worked with Amber, Roslyn and Gary and they submitted a “General News on the Internet” project proposal and it was approved.  Yay!  Then the work began – Amber, Roslyn and Gary describe some of the hurdles:

We understand that archiving video content is a problem. We thought websites like NowThisNews.com could be great candidates but in effect, because they contained so much video and a kind of Tumblr-like portal entry point for news, we had to reject them.  Since we do not do “one hop out” crawling, the linked-to content that is the substantive content (i.e., the news) would be entirely missed.   Also, websites like Vice.com change their content so frequently, it might be impossible to capture all of its content.

In addition, it was decided that sites chosen would not include general news sites associated primarily with other delivery vehicles, such as CNN.com or NYTimes.com.  Many of these types also have paywalls and therefore obviously would create limitations when trying to archive.

We also encountered another type of challenge with Drudgereport.com.  Since it is primarily a news-aggregator with most of the site consisting of links to news on other sites it would be tough to include the many links with the limitations in crawling (again, the “one hop” limitation – we don’t harvest links that are on a different URL).  In the end we decided to proceed in archiving The Drudge Report site since it is well known for the content that is original to that site.

The harvesting for this collection has now been underway for several months; we are examining the results.  We look forward to making an archived version of today’s news as brought to you by the Internet available to Library of Congress patrons for many tomorrows.

What news sites do you think we should collect?
