Feed aggregator

PeerLibrary: Check out our brand new screencast video of PeerLibrary 0.3! We...

planet code4lib - Wed, 2014-11-12 06:11

Check out our brand new screencast video of PeerLibrary 0.3!

We are proud to announce an updated screencast which demos the increased functionality and updated user interface of the PeerLibrary website. This screencast debuted at the Mozilla Festival in October as part of our science fair presentation. The video showcases an article by Paul Dourish and Scott D. Mainwaring entitled “Ubicomp’s Colonial Impulse” as well as the easy commenting and discussion features which PeerLibrary emphasizes. One of the MozFest conference attendees actually recognized the article which drew him towards our booth and into a conversation with our team. Check out the new screencast and let us know what you think!

DuraSpace News: The DSpace 5 Testathon is Underway!

planet code4lib - Wed, 2014-11-12 00:00

From Hardy Pottinger on behalf of the DSpace Committers

The DSpace 5.0 Testathon is going on right now, and will continue through November 21, 2014.


• Details on how to participate: see [1]

• Details about new features, bug fixes in 5.0 and release schedule: see [2]

District Dispatch: ALA welcomes President Obama’s strong affirmation of net neutrality

planet code4lib - Tue, 2014-11-11 20:49

Yesterday President Barack Obama re-affirmed his commitment to network neutrality principles and to the strongest rules to protect the open internet. The American Library Association (ALA) welcomes his statement and outline of principles that echo those of public comments filed by the ALA and a coalition of library and higher education organizations this year.

“The ALA heartily agrees with the essential elements of network neutrality affirmed by President Obama: no blocking, no throttling, increased transparency, and no paid prioritization,” said ALA Incoming President Sari Feldman. “As the President noted, these elements are ‘built into the fabric of the internet since its creation.’ In fact, the initial protocols for the internet were developed by institutions of higher education, and universities were the first to deploy private high-speed data networks that formed the test-bed for what later became the public internet.

“Since then, our nation’s libraries and institutions of higher education have become leaders in creating, fostering, using, extending and maximizing the potential of the internet for research, education and the public good. An open “neutral” internet is absolutely crucial to fulfill our missions and serve our communities.

“Further, we are heartened that both the President and recent statements from FCC Chairman Tom Wheeler reflect an understanding that network neutrality must apply to both fixed and mobile broadband. We look forward to continuing to work with the FCC to secure strong, legally enforceable rules that ensure the internet remains an open platform for information exchange, intellectual discourse, creativity, innovation and learning for all,” Feldman concluded.

The post ALA welcomes President Obama’s strong affirmation of net neutrality appeared first on District Dispatch.

District Dispatch: ALA to host copyright policy discussion in Washington, D.C.

planet code4lib - Tue, 2014-11-11 16:42

The way we access and use information in the digital age is fundamentally mediated by copyright policy. For several decades, this policy has been largely shaped by commercial interests. However, in the last three years, several court decisions have been more protective of public access to information and accommodating to the needs of the education, research, and library sectors. Is this a real trend and will it continue?

On November 18, 2014, the American Library Association (ALA) will host “Too Good to Be True: Are the Courts Revolutionizing Fair Use for Education, Research and Libraries?,” a symposium that will explore copyright policy in a digital and networked environment. During the discussion, a diverse panel of copyright policy experts from the library and publishing fields will attempt to make sense of key court cases such as UCLA v. AIME, Authors Guild v. HathiTrust, and the high profile U.S. Supreme Court case Kirtsaeng v. Wiley. These experts will discuss the prospects these decisions may create for public policy development over the next few years informed by the 2014 midterm elections and the upcoming 2016 general election. RSVP for the event.

This event is offered under the rubric of the Policy Revolution! Initiative of ALA’s Office for Information Technology Policy (OITP). Central to this initiative is strengthening the library community’s engagement and visibility in national public policy. Look for additional outreach activities in 2015.

Panelists include:

Mary Rasenberger is the newly appointed Executive Director for the Authors Guild. Mary has worked in the area of intellectual property, technology, and copyright law for 25 years. Prior to joining the Authors Guild, Mary was a partner at Cowan DeBaets Abrahams and Sheppard where she counseled publishing, media, entertainment, internet, and other technology companies, as well as authors and artists in all areas of copyright and related rights, including licensing, litigation, infringement analysis, policy, enforcement and digital rights. From 2002 to 2008, Mary worked for the U.S. Copyright Office and Library of Congress as senior policy advisor and program director for the National Digital Preservation Program. Mary also worked at other major New York law firms and for BMG Music.

Jonathan Band has represented a wide range of clients, including technology companies and library associations, on domestic and international copyright policy matters for more than 25 years. He has filed amicus briefs on behalf of the Library Copyright Alliance (LCA) in numerous important cases, such as Kirtsaeng v. Wiley, Authors Guild v. HathiTrust, Authors Guild v. Google, and the recently decided Georgia State e-reserves case. He also has represented the Library Copyright Alliance in connection with the Marrakesh Treaty for the print-disabled and the U.S. House Judiciary Committee’s ongoing review of copyright.

Brandon Butler is the practitioner-in-residence at the Glushko-Samuelson Intellectual Property Clinic at American University’s Washington College of Law (WCL). At the clinic, Professor Butler supervises student attorneys who represent clients in a variety of IP matters. Before joining the WCL faculty, Brandon was the director of Public Policy Initiatives at the Association of Research Libraries (ARL). While there, he worked on a host of issues ranging from fair use to network neutrality to the PATRIOT Act. He is a co-facilitator, with Professors Peter Jaszi and Patricia Aufderheide, of the “ARL Code of Best Practices in Fair Use for Academic and Research Libraries,” released in January 2012.

RSVP now if you would like to attend the no-cost event.

The post ALA to host copyright policy discussion in Washington, D.C. appeared first on District Dispatch.

Patrick Hochstenbach: Landmarks in Ghent Belgium

planet code4lib - Tue, 2014-11-11 15:49
Never thought I would have this much fun doodling architecture. Filed under: Doodles Tagged: architecture, belgië, belgium, booktower, Gent, ghent, msk, rijksarchief, smak, stpietersabdij, vooruit, waalse krook, wintercircus

Code4Lib: Voting for Code4Lib 2015 Prepared Talks is now open.

planet code4lib - Tue, 2014-11-11 15:39

The Code4Lib 2015 Program Committee is happy to announce that voting is now open for prepared talks.

To vote, visit, review the proposals, and assign points to those presentations you would like to see on the program this year.

You will need to log in with your account in order to vote. If you have any issues with your account, please contact Ryan Wick at

Voting will end on Tuesday, November 25, 2014 at 11:59:59 PM PT (GMT-8).

The top 10 proposals are guaranteed a slot at the conference. The Program Committee will curate the remainder of the program in an effort to ensure diversity in program content and presenters. Community votes will still weigh heavily in these decisions.

The final list of presentations will be announced in early- to mid-December.

For more information about Code4Lib 2015, visit

Patrick Hochstenbach: Creative Commons

planet code4lib - Tue, 2014-11-11 15:36
My art contribution to creative commons peeps all over the world. All images below are available under a Creative Commons Attribution 4.0 license. Filed under: Comics, Doodles Tagged: cartoon, copyright, creative commons, open access, public domain

Open Knowledge Foundation: France Prefers to Pay (twice) for Papers by Its Researchers

planet code4lib - Tue, 2014-11-11 14:39

France may not have any money left for its universities but it does have money for academic publishers.

While university presidents learn that their funding is to be reduced by EUR 400 million, the Ministry of Research has decided, under great secrecy, to pay EUR 172 million to the world leader in scientific publishing, Elsevier.

In an exclusive piece published by the French news outlet Rue89 (Le Monde press group), Open Knowledge France members and open science evangelists Pierre-Carl Langlais and Rayna Stamboliyska released the agreement between the French Ministry and Elsevier. The post originally appeared here, in French.

Flasks (Erlenmeyer) in a science classroom (Lokesh Dhakar/Flickr/CC)

The Work of Volunteers

The scientific publishing market is an unusual sector: those who create value are never remunerated. Instead, they often pay to see their work published. Authors do not receive any direct financial gain from their articles, and peer review is conducted voluntarily.

This enormous amount of work is indirectly funded by public money. Writing articles and participating in peer review are among the expected activities of researchers, activities that lead to further research funding from the taxpayer.

Scientific publishing is centred on a handful of privately-held publishing houses that own the journals where scientific research is published. Every journal has an editorial review board that receives potential contributions, which are then sent to volunteer scientists for peer review. It is on the basis of comments and feedback from the peer review process that a decision is made whether an article is to be published or rejected and returned to the author(s).

When an article is accepted, the authors usually sign their copyright over to the publisher, which sells access to the work; alternatively, they can choose to make the work available to everyone, which often involves paying a fee. In some cases journals receive income only for the service of publishing an article, which is henceforth free to the reader, but other journals follow a mixed “hybrid” model in which authors pay to publish some articles while libraries still pay to purchase the rest of the journal. This is called “double dipping”, and while publishers claim they take it into account in their journal pricing, the secrecy around publisher contracts and the lack of data make it impossible to tell where money is flowing.

Huge Profit Margins

This is important because access to these journals is rarely cheap, and publishers sell access primarily to academic libraries and research laboratories. In other words, the financial resources for publishing scientific papers come from credits granted to research laboratories, and access to the journals these papers appear in is purchased by those same institutions. In both cases, the purchases are subsidized by the public.

The main actors in scientific publishing generate considerable income. In fact, the sector is dominated by an oligopoly with “the big four” sharing most of the global pie:

  • The Dutch Elsevier
  • The German Springer
  • The American Wiley
  • The English Informa

They draw huge profits: from 30% to 40% annual net profit in the case of Elsevier and Springer.

In other words, these four major publishers resell to universities content that the institutions themselves have produced.

In this completely closed market, competition does not exist and tacit agreement is the rule: subscription prices have continued to soar for thirty years, while the cost of publishing, in the era of electronic publishing, has never been lower. For example, the annual subscription to Elsevier’s journal ‘Brain Research’ costs a whopping EUR 15,000.

The Ministry Shoulders This Policy

The agreement between France and Elsevier amounted to ca. EUR 172 million for 476 universities and hospitals.

The first payment (approximately EUR 34 million of public money) was paid in full in September 2014. In return, 476 public institutions will have access to a body of about 2,000 academic journals.

This published research was mainly financed by public funds. In the end, therefore, we will have paid Elsevier twice: once to publish, a second time to read.

This is not a blip. The agreement between Elsevier and the government is established policy. In March 2014, Geneviève Fioraso, Minister of Higher Education and Research, elaborated upon the main foci of her political agenda before the Academy of Sciences, two of which involve privileged interactions with Elsevier. This is the first time that negotiating the right to read for hundreds of public research institutions and universities has been managed at the national level.

Pre-determined Negotiations

One could argue in favour of the Ministry’s benevolence vis-à-vis public institutions, to the extent that it supports this vital commitment to research. Such an argument would, however, fail to highlight multiple issues. Among these, we would pinpoint the total opacity in the choice of supplier (why Elsevier in particular?) and the lack of a competitive pitch between several actors (for such an amount, open public tendering is required). The major problem preventing competition is the monopolistic hold of publishers over knowledge: no-one else has the right to sell that particular article on cancer research that a researcher in Paris requires for their work, so there is little choice but to continue paying the individual publishers under the current system. Their hold only expires with copyright, which lasts 70 years from the death of the last author and is therefore entirely incompatible with the timeline of scientific discovery.

Prisoners of a game with pre-set rules, the negotiators (the Couperin consortium and the Bibliographic Agency for Higher Education, ABES) have not had much room for negotiation. As mentioned above, a competitive pitch did not happen. Article 4 of the agreement is explicit:

“Market for service provision without publication and without prior competition, negotiated with a particular tenderer for reasons connected with the protection of exclusive distribution rights.”

Therefore, a strange setup materialises that lets Elsevier keep its former customers in its back pocket. Research organisations that already have a contract with the publisher can only join the national license if they accept a cost increase (of 2.5 to 3.5%). Institutions without a previous contract are not affected.

How Many Agreements of the Sort?

To inflate the bill even more, Elsevier sells bundles of journals (its ‘flagship journals’): “No title considered as a ‘flagship journal’ (as listed in Annex 5) can be withdrawn from the collection the subscribers can access” (art. 6.2). These ‘flagship journals’ cannot all claim outstanding impact factors. Moreover, they are not equally relevant across disciplines and scientific institutions.

The final price has been reduced from the estimation initially planned in February: “only” EUR 172 million instead of EUR 188 million. Yet, this discount does not seem to be a gratuitous gift from Elsevier. Numerous institutions have withdrawn from the national license: from 642 partners in February, only 476 remain in the final deal.

Needless to say, the situation is outrageous. Yet it is just one agreement with one of several vendors. A recent report by the French Academy of Science alluded to a total of EUR 105 million annually dedicated to acquiring access to scientific publications. This figure, however, falls far below the reality. Indeed, the French agreement with Elsevier grants access to publications only to some of the research institutions and universities in France, and yet in this case the publisher alone already collects EUR 33-35 million per year. The actual costs plausibly reach a total of EUR 200-300 million.

An alternative exists.

Elsewhere in Europe…

An important international movement has emerged to promote and defend free and open access to scientific publications. The overall goal is to make this content accessible and reusable by anyone.

As a matter of fact, researchers have no interest whatsoever in maintaining the current system. Copyright in scholarly publication does not remunerate authors and thus constitutes a fiction whose main goal is to perpetuate the publisher’s rights. Not only does this enclosure limit access to scientific publications; it also prevents researchers from reusing their own work, as they often concede their copyright when opting in to publication agreements.

The main barrier to opening up access to publications appears to stem from the government. No action is taken to release research from the grip of oligopolistic publishers. Assessment of publicly funded research focuses on journals referred to as “qualifying” (that is, journals mainly published by the big publishers). Some university departments even consider that open access publications are, by default, “not scientific”.

Several European Countries lead the way:

  • Germany has passed a law limiting publishers’ exclusive rights to one year. Once the embargo has expired, the researcher is free to republish their work and allow open access to it. More details here.
  • Negotiations have been halted in Elsevier’s base, the Netherlands. Even though Elsevier pays most of its taxes there, the Dutch government fully supports the demands of researchers and librarians, aiming to open up the whole corpus of Dutch scientific publications by 2020. More details here.

The most chilling potential effect of the Elsevier deal is removing, for five years, any possible collective incentive for an ambitious French open access policy. French citizens will continue to pay twice for research they cannot read. And the government will sustain a closed and archaic editorial system whose defining feature is to single-handedly limit the right to read.

District Dispatch: Libraries again fight for exemptions from “Digital Locks” copyright law

planet code4lib - Tue, 2014-11-11 14:23

For astronomers, it might be once in a few million years when a key comet comes back around. For a soccer-crazed world, it’s every four years until the World Cup is back in play. For library-focused copyright “geeks” the not-really-magic-at-all interval is 3 years. That’s how often librarians, educators, disabled persons, internet security researchers, technologists, businesses and anyone else who needs access to copyrighted digital information secured with digital locks have to apply for special exemptions from Section 1201 of the Digital Millennium Copyright Act (DMCA). That’s the part of the law that prohibits the “circumvention” of “technological protection measures” (TPMs) employed by the owners of copyrighted works to block unauthorized access to them.

Unfortunately, TPMs also block perfectly lawful uses that don’t require prior authorization, such as fair uses for education or journalism, or converting text to speech for the print disabled. Nonetheless, it’s still an actual crime under the DMCA to circumvent TPMs, even if the eventual use of the protected material is perfectly legal, unless a specific exemption to circumvent (i.e., pick the digital lock) for that otherwise lawful use has been granted! To make matters worse, even once the expensive and time-consuming case for an exemption has been made successfully, the recipient of the exemption has to make the full case to the U.S. Copyright Office all over again every three years.

ALA, in tandem with other library organizations, has actively participated in this so-called “Triennial Rulemaking” process at every opportunity since the DMCA was passed in 1998.  This time around, in conjunction with the Association of College and Research Libraries (ACRL) and Association of Research Libraries (ARL), ALA is pursuing the renewal of two critical exemptions.  One has permitted the circumvention of TPMs that otherwise would prevent educators from incorporating film excerpts into their lectures and curricula.  The second allows TPMs to be worked around so that e-reader devices may be freely used by the print disabled to convert digitally “locked” text into accessible speech.  Many others have filed for a wide range of other exemptions to facilitate all kinds of valuable commercial and non-commercial activities.  Our friends at the Electronic Frontier Foundation have helpfully created and will maintain this online repository for all filings made in the 2015 Rulemaking.

The latest Triennial Rulemaking is just getting going, but libraries will continue to lobby Congress and the Copyright Office for sensible changes to the rulemaking process that would alleviate some of the significant and senseless burdens placed upon those who seek exemption renewal, particularly in the absence of any opposition.  With luck, it won’t take until a comet comes around again for such common-sense changes to be made.



The post Libraries again fight for exemptions from “Digital Locks” copyright law appeared first on District Dispatch.

Open Knowledge Foundation: Seeking new Executive Director at Open Knowledge

planet code4lib - Tue, 2014-11-11 09:45

Today we are delighted to put out our formal announcement for a new Executive Director. In our announcement about changes in leadership in September we had already indicated we would be looking to recruit a new senior executive and we are now ready to begin the formal process.

We are very excited to have this opportunity to bring someone new on board. Please do share this with your networks and especially with anyone in particular you think would be interested. We emphasize that we are conducting a world-wide search for the very best candidates, although the successful candidate would ideally be able to commute to London or Berlin as needed.

Full role details are below – to apply or to download further information on the required qualifications, skills and experience for the role, please visit quoting reference 1841. The closing date for applications is 9am (GMT) on Monday, 8th December 2014.

Role Details

Open Knowledge is a multi-award winning international not-for-profit organisation. We are a network of people passionate about openness, using advocacy, technology and training to unlock information and enable people to work with it to create and share knowledge. We believe that by creating an open knowledge commons and developing tools and communities around this we can make a significant contribution to improving governance, research and the economy. We’re changing the world by promoting a global shift towards more open ways of working in government, arts, sciences and much more. We don’t just talk about ideas, we deliver extraordinary software, events and publications.

We are currently looking for a new Executive Director to lead the organisation through the next exciting phase of its development. Reporting into the Board of Directors, the Executive Director will be responsible for setting the vision and strategic direction for the organisation, developing new business and funding opportunities and directing and managing a highly motivated team. S/he will play a key role as an ambassador for Open Knowledge locally and internationally and will be responsible for developing relationships with key stakeholders and partners.

The ideal candidate will have strong visionary and strategic skills, exceptional personal credibility, a strong track record of operational management of organisations of a similar size to Open Knowledge, and the ability to influence at all levels both internally and externally. S/he will be an inspiring, charismatic and engaging individual, who can demonstrate a sound understanding of open data and content. In addition, s/he must demonstrate excellent communication and stakeholder management skills as well as a genuine passion for, and commitment to, the aims and values of Open Knowledge.

To apply or to download further information on the required qualifications, skills and experience for the role, please visit quoting reference 1841. The closing date for applications is 9am (GMT) on Monday, 8th December 2014.

The role is flexible in terms of location but ideally will be within commutable distance of London or Berlin (relocation is possible) and the salary will be competitive with market rate.

PeerLibrary: Check out our brand new screencast video of PeerLibrary 0.3!

planet code4lib - Tue, 2014-11-11 03:39

We are proud to announce an updated screencast which demos the increased functionality and updated user interface of the PeerLibrary website. This screencast debuted at the Mozilla Festival in October as part of our science fair presentation. The video showcases an article by Paul Dourish and Scott D. Mainwaring entitled “Ubicomp’s Colonial Impulse” as well as the easy commenting and discussion features which PeerLibrary emphasizes. One of the MozFest conference attendees actually recognized the article which drew him towards our booth and into a conversation with our team. Check out the new screencast and let us know what you think!

PeerLibrary: PeerLibrary Heads to London for MozFest 2014!

planet code4lib - Tue, 2014-11-11 03:38

Mozilla Festival brings developers, educators, and tech enthusiasts from a variety of fields together with the common goal of promoting and building the open web. Among others, some of the sessions most relevant to PeerLibrary’s goals included “Community Building” and “Science and the Web”. A delegation from the PeerLibrary team presented at the science fair on the first evening of the conference. This provided an opportunity to reconnect with some of our UK based supporters and contributors as well as introduce the platform to hundreds of MozFest attendees. We received valuable feedback from the web dev community and have a slew of new features and improvements to consider implementing in the coming months. Another phenomenal conference and we’re already looking forward to MozFest 2015!

DuraSpace News: Fedora 4 at the 2014 eResearch Australasia Conference

planet code4lib - Tue, 2014-11-11 00:00

Winchester, MA: More than 400 delegates made the trip to Melbourne, Victoria, Australia in October to learn about current best practices in research support and to share innovative examples and ideas at the eResearch Australasia Conference. The annual Conference focuses on how information and communications technologies help researchers collect, manage and reuse information.

FOSS4Lib Upcoming Events: Islandora Camp BC

planet code4lib - Mon, 2014-11-10 19:56
Date: Monday, February 16, 2015 - 08:00 to Wednesday, February 18, 2015 - 17:00. Supports: Islandora

Last updated November 10, 2014. Created by Peter Murray on November 10, 2014.

The first Islandora Camp of 2015 will be in Vancouver, BC from February 16 - 18, for our West Coast Islandorians and anyone else who would like to see beautiful British Columbia while learning about Islandora.  Many thanks to our sponsor Simon Fraser University for making this camp possible!
If you have any questions about this or future camps, please contact us.

Harvard Library Innovation Lab: Hiring! We want your design energy.

planet code4lib - Mon, 2014-11-10 19:31


The Harvard Library Innovation Lab and the Berkman Center for Internet & Society are looking for a web designer to help us build tools to explore the open Internet and define the future of libraries.

Our projects range in scope from fast-moving prototypes to long-term innovations. The best way to get a feel for what we do is by looking at some of our current efforts:

, a web archiving service that is powered by libraries


H2O, a platform for creating, sharing and adapting open course materials


Amber, a server side plugin to keep links working on blogs and websites


What you’ll do

  • Work with our multi-disciplinary team to build elegant web tools
  • Contribute to our broad vision for the Internet, libraries, and society
  • Rely on your good design sense and user-centricity
  • Create beautiful graphics and use modern web technologies to share them
  • Have fun while producing meaningful work with fantastic folks

This is a term limited position running through Spring and Summer semesters (January-August 2015).

Find details and apply for the position by searching for 34346BR in the Harvard Recruitment Management System. If you have questions, email us directly at


Code4Lib: Keynote voting for the 2015 conference is now open!

planet code4lib - Mon, 2014-11-10 19:04

All nominees have been contacted and the 19 (!) nominees included in this election are all potentially available to speak. The top two available vote recipients will be invited to be our keynote speakers this year. Voting will end on Tuesday, November 18, 2014 at 8:00 PM PT.

When rating nominees, please consider whether they are likely to be an excellent contributor in each of the following areas:

1) Appropriateness. Is this speaker likely to convey information that is useful to many members of our community?
2) Uniqueness. Is this speaker likely to cover themes that may not commonly appear in the rest of the program?
3) Contribution to diversity. Will this person bring something rare, notable, or unique to our community, through unusual experience or background?

If you have any issues with your account, please contact Ryan Wick at

Eric Lease Morgan: My first R script, wordcloud.r

planet code4lib - Mon, 2014-11-10 18:50

This is my first R script, wordcloud.r:

#!/usr/bin/env Rscript

# wordcloud.r - output a wordcloud from a set of files in a given directory
# Eric Lease Morgan <>
# November 8, 2014 - my first R script!

# configure
MAXWORDS    = 100
RANDOMORDER = FALSE
ROTPER      = 0

# require
library( NLP )
library( tm )
library( methods )
library( RColorBrewer )
library( wordcloud )

# get input; needs error checking!
input <- commandArgs( trailingOnly = TRUE )

# create and normalize corpus
corpus <- VCorpus( DirSource( input[ 1 ] ) )
corpus <- tm_map( corpus, content_transformer( tolower ) )
corpus <- tm_map( corpus, removePunctuation )
corpus <- tm_map( corpus, removeNumbers )
corpus <- tm_map( corpus, removeWords, stopwords( "english" ) )
corpus <- tm_map( corpus, stripWhitespace )

# do the work
wordcloud( corpus, max.words = MAXWORDS, random.order = RANDOMORDER, rot.per = ROTPER )

# done
quit()

Given the path to a directory containing a set of plain text files, the script will generate a wordcloud.

Like Python, R has a library well-suited for text mining: tm. Its approach to text mining (or natural language processing) is both similar and dissimilar to Python’s. They are similar in that both hope to provide a means for analyzing large volumes of text. They are dissimilar in that they use different underlying data structures to get there. R might be more for the analytic person. Think statistics. Python may be more for the “literal” person, all puns intended. I will see if I can exploit the advantages of both.
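For comparison, here is a rough, stdlib-only Python sketch of the same normalization pipeline. The stop word list is a tiny illustrative subset (tm’s stopwords("english") is far larger), and in practice one would more likely reach for NLTK or scikit-learn:

```python
import collections
import pathlib
import re
import string

# Illustrative stop word subset; tm's stopwords("english") is far larger.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "that"}

def normalize(text):
    """Mirror the tm pipeline: lowercase, strip punctuation and numbers,
    drop stop words, collapse whitespace via split()."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    text = re.sub(r"\d+", "", text)
    return [w for w in text.split() if w not in STOPWORDS]

def word_frequencies(directory):
    """Count normalized words across every .txt file in a directory."""
    counts = collections.Counter()
    for path in pathlib.Path(directory).glob("*.txt"):
        counts.update(normalize(path.read_text(encoding="utf-8",
                                               errors="ignore")))
    return counts

print(normalize("The Cat and the 2 Hats!"))   # ['cat', 'hats']
```

The resulting Counter maps words to frequencies, which is exactly the input a wordcloud routine needs.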

David Rosenthal: Gossip protocols: a clarification

planet code4lib - Mon, 2014-11-10 18:11
A recent post on the Library of Congress' Digital Preservation blog described one of its author's take-aways from the Library's Designing Storage Architectures workshop: the importance of anti-entropy protocols for preservation. The post talks about these as "a subtype of 'gossip' protocols" and cites LOCKSS as an example, saying:
Not coincidentally, LOCKSS "consists of a large number of independent, low-cost, persistent Web caches that cooperate to detect and repair damage to their content by voting in 'opinion polls'" (PDF). In other words, gossip and anti-entropy.

The main use for gossip protocols is to disseminate information in a robust, randomized way, by having each peer forward information it receives from other peers to a random selection of other peers. As the function of LOCKSS boxes is to act as custodians of copyright information, this would be a very bad thing for them to do.
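To make that forwarding behavior concrete, here is a toy push-gossip simulation. The peer count, fan-out, and seeding are invented for illustration; none of this is LOCKSS code. Each round, every peer that knows the rumor forwards it to a few randomly chosen peers:

```python
import random

def push_gossip(num_peers, fanout=2, seed=0, max_rounds=50):
    """Simulate push-style gossip: each round, every informed peer forwards
    the rumor to `fanout` randomly chosen peers. Returns the number of
    rounds until every peer is informed."""
    rng = random.Random(seed)
    informed = {0}                      # peer 0 starts with the rumor
    rounds = 0
    while len(informed) < num_peers and rounds < max_rounds:
        newly = set()
        for peer in informed:
            # forward to randomly selected peers (possibly already informed)
            for target in rng.sample(range(num_peers), fanout):
                newly.add(target)
        informed |= newly
        rounds += 1
    return rounds

# Dissemination is the whole point: roughly O(log n) rounds reach all peers.
print(push_gossip(100))
```

This spreading behavior is exactly what makes gossip robust for dissemination, and exactly what custodians of copyright content must not do with the content itself.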

It is true that LOCKSS peers communicate via an anti-entropy protocol, and it is even true that the first such protocol they used, the one I implemented for the LOCKSS prototype, was a gossip protocol in the sense that peers forwarded hashes of content to each other. Alas, that protocol was very insecure. Some of the ways in which it was insecure related directly to its being a gossip protocol.

An intensive multi-year research effort in cooperation with Stanford's CS department to create a more secure anti-entropy protocol led to the current protocol, which won "Best Paper" at the 2003 Symposium on Operating Systems Principles. It is not a gossip protocol in any meaningful sense (see below the fold for details). Peers never forward information they receive from other peers; all interactions are strictly pair-wise and private.

For the TRAC audit of the CLOCKSS Archive we provided an overview of the operation of the LOCKSS anti-entropy protocol; if you are interested in the details of the protocol this, rather than the long and very detailed paper in ACM Transactions on Computer Systems (PDF), is the place to start.

According to Wikipedia:
a gossip protocol is one that satisfies the following conditions:
  • The core of the protocol involves periodic, pairwise, inter-process interactions.
  • The information exchanged during these interactions is of bounded size.
  • When agents interact, the state of at least one agent changes to reflect the state of the other.
  • Reliable communication is not assumed.
  • The frequency of the interactions is low compared to typical message latencies so that the protocol costs are negligible.
  • There is some form of randomness in the peer selection. Peers might be selected from the full set of nodes or from a smaller set of neighbors.
  • Due to the replication there is an implicit redundancy of the delivered information.
The current LOCKSS anti-entropy protocol does not meet this definition. Peer communications are periodic and pairwise, but each pairwise communication forms part of a poll (an anti-entropy operation), not the whole of one. When peers communicate, their state may change, but the new state may not be a reflection of the state of the other. There is no implicit redundancy of the delivered information; the information delivered between two peers is specific to that pair of peers and is never shared with any other peer.

The redundancy of preserved content in a LOCKSS network is a higher-level concept than the details of individual peer communication. The current protocol is a peer-to-peer consensus protocol.
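For flavor, the damage-detection-by-voting idea behind "opinion polls" can be sketched as follows. This is a deliberate toy, not the LOCKSS protocol: the real protocol uses nonced hashing, rate limiting, and proofs of possession rather than exchanging raw copies, and never trusts any single peer. Here, replicas vote with the hash of their copy, and holders of a losing hash repair from the majority:

```python
import hashlib
from collections import Counter

def digest(content: bytes) -> str:
    return hashlib.sha256(content).hexdigest()

def opinion_poll(replicas, key):
    """Each replica 'votes' with the hash of its copy of `key`; the majority
    hash wins, and replicas holding a losing hash repair from a peer in the
    majority. Returns the number of replicas repaired."""
    votes = Counter(digest(r[key]) for r in replicas)
    winning_hash, _ = votes.most_common(1)[0]
    good_copy = next(r[key] for r in replicas if digest(r[key]) == winning_hash)
    repaired = 0
    for r in replicas:
        if digest(r[key]) != winning_hash:
            r[key] = good_copy          # repair damage detected by the poll
            repaired += 1
    return repaired

replicas = [{"a": b"content"}, {"a": b"content"}, {"a": b"c0ntent"}]  # one damaged copy
print(opinion_poll(replicas, "a"))
```

Note how the repair involves only the replicas in the poll; nothing learned is forwarded onward, which is the key difference from gossip dissemination.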

OCLC Dev Network: Learning Linked Data: Some Handy Tools

planet code4lib - Mon, 2014-11-10 17:30

I’ve been working with Linked Data off and on for a while now, but really the last year has been my deepest dive into it. Much of that dive involved writing a PHP library to interact with the WorldCat Discovery API. Since I started seeing how much could be done with Linked Data in discovery, I’ve been re-adjusting my worldview and acquiring a new skill set to work with Linked Data. This meant understanding the whole concept of triples and the subject, predicate, object nomenclature. In our recent blog posts on the WorldCat Discovery API, we touched on some of the basics of Linked Data. We also mentioned some tools for working with Linked Data in Ruby.
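The subject, predicate, object idea itself is simple enough to sketch with an in-memory set of triples. The vocabulary and the match() helper below are invented for illustration and are not part of the WorldCat Discovery API:

```python
# A triple store is, at heart, a set of (subject, predicate, object) statements.
triples = {
    ("book:1", "schema:name", "Moby Dick"),
    ("book:1", "schema:author", "person:melville"),
    ("person:melville", "schema:name", "Herman Melville"),
}

def match(s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard,
    much like a variable in a SPARQL basic graph pattern."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# "What is the name of book:1?"
print(match(s="book:1", p="schema:name"))
```

Real Linked Data tooling layers serialization (RDF/XML, Turtle, JSON-LD) and query languages on top, but the pattern-matching-over-triples model is the same.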

Library of Congress: The Signal: Digital Preservation Capabilities at Cultural Heritage Institutions: An Interview With Meghan Banach Bergin

planet code4lib - Mon, 2014-11-10 16:55

Meghan Banach Bergin, Bibliographic Access and Metadata Coordinator, University of Massachusetts Amherst Libraries.

The following is a guest post by Jefferson Bailey of the Internet Archive, co-chair of the NDSA Innovation Working Group.

In this edition of the Insights Interview series we talk with Meghan Banach Bergin, Bibliographic Access and Metadata Coordinator, University of Massachusetts Amherst Libraries. Meghan is the author of a Report on Digital Preservation Practices at 148 Institutions. We discuss the results of her research and its implications for digital preservation policies, both in general and at her institution in particular.

Jefferson: Thanks for talking with us today. Tell us about your sabbatical project.

Meghan: Thank you, I’m honored to be interviewed for The Signal blog.  The goal of my sabbatical project last year was to investigate how various institutions are preserving their digital materials.  I decided that the best way to reach a wide range of institutions was to put out a web-based survey. I was thrilled at the response. I received responses from 148 institutions around the world, roughly a third each were large academic libraries, smaller academic libraries and non-academic institutions (including national libraries, state libraries, public libraries, church and corporate archives, national parks archives, historical societies, research data centers and presidential libraries).

It was fascinating to learn what all of these different institutions were doing to preserve our cultural heritage for future generations.  I also conducted phone interviews with 12 of the survey respondents from various types of institutions, which gave me some additional insight into the issues involved in the current state of the digital preservation landscape.

Jefferson: What made you choose this topic for your sabbatical research? What specific conclusions or insight did you hope to gain in conducting the survey?

Meghan: We have been working to build a digital preservation program over the last several years at the University of Massachusetts Amherst Libraries and I thought I could help move it forward by researching what other institutions are doing in terms of active, long-term preservation of digital materials.  I was hoping to find systems or models that would work for us at UMass or for the Five Colleges Consortium.

Jefferson: How did you go about putting together the survey? Were there any specific areas that you wanted to focus on?

Meghan: I had questions about a lot of things, so I brainstormed a list of everything I wanted to know.  When I reviewed the resulting list, four main areas of inquiry emerged: solutions, services, staffing and cost.  I wanted to know what systems were being used for digital preservation and what digital preservation services were being offered, particularly at academic institutions.  Here at UMass we currently offer research data curation services and digital preservation consulting services, but we don’t have a specific budget or staff devoted to digital preservation, which was why I also wanted to know what kind of staffing other institutions had devoted to their digital preservation programs and the cost of those programs.

Jefferson: What surprised you about the responses? Or what commonalities did you find in the answers that you hadn’t considered when writing the questions?

Meghan: I was surprised at the sheer volume and variety of tools and technologies being used to preserve digital materials.  I think this shows that we are in an experimental phase and that everyone is trying to figure out what solutions will work best for different kinds of digital collections and materials, as well as what solutions will work best given the available staffing, skill sets and resources at their institutions.  It also shows that there is a lot of development happening right now, and this makes me feel optimistic about the future of the digital preservation field.

Jefferson: Did any themes or trends emerge from reading people’s responses?

Meghan: Some common themes did emerge.  Most people reported that budgets are tight and that they are trying to manage digital preservation with existing staff that also have other primary job responsibilities aside from digital preservation. Almost everyone I talked to said that they thought they needed additional staff.  Also, most of those interviewed were not completely satisfied with the systems and tools they were using. One person said, “No system is perfect right now. It’s a matter of getting a good enough system.” Others mentioned various issues such as difficulties with interoperability between systems and tools, lack of functionality such as the ability to capture technical or preservation metadata or to migrate file formats, and struggles with implementation and use of the systems. People were using multiple systems and tools in an effort to get all of the different functionality they were looking for. One archivist described their methods as “piecemeal” and said that “It would be good if we could make these different utilities more systematic. Right now every collection is its own case and we need an overall solution.”

Jefferson: Your summary report does a nice job balancing the technical and managerial issues involved with digital preservation. Could you tell us a little bit more about what those are and what your survey revealed in these areas?

Meghan: The survey, and the follow-up phone interviews, highlighted the fact that people are dealing with a wide range of technical issues, including storage cost and capacity, the complexities of web archiving and video preservation, automating processes, the need for a technical infrastructure to support long-term digital preservation, the complexity of preserving a wide variety of formats, and keeping up with standards, trends, and technology, especially when there aren’t overall agreed-upon best practices.  The managerial issues mainly centered around staffing levels, staff skill sets and funding.

Jefferson: I was curious to see that while 90% of respondents had “undertaken efforts to preserve digital materials” only 25% indicated they had a “written digital preservation policy.” What do you think accounts for this discrepancy? And, having recently contributed to writing a policy yourself, what would you say to those just starting to consider it?

Meghan: We were inspired to write our policy by Nancy McGovern’s Digital Preservation Management workshop, and we used an outline she provided at the workshop.  It was time consuming, and I think that’s why a lot of institutions have decided to skip writing a policy and just proceed straight to actually doing something to preserve their digital materials.  This approach has its merits, but we felt like writing the policy gave us the opportunity to wrap our heads around the issues, and having the policy in place provides us with a clearer path forward.

Some of the things we felt were important to define in our policy were the scope of what we wanted to preserve and the roles and responsibilities of the various stakeholders.  To those who are just starting to consider writing a digital preservation policy, I would recommend forming a small group to talk through the issues and looking at lots of examples of policies from other institutions.  Also, I would suggest looking at Library of Congress Junior Fellow Madeline Sheldon’s report Analysis of Current Digital Preservation Policies: Archives, Libraries and Museums.

Cover page of Staffing for Effective Digital Preservation: An NDSA Report

Jefferson: Your survey also delved into both staffing and services being provided by institutions. Tell us a bit about some of your findings in those areas (and, for staffing, how they compare to the NDSA Staffing Survey (PDF)).

Meghan: Almost everyone said that they didn’t have enough staff.  One librarian said, “No one is dedicated to working on digital preservation. It is hard to fulfill my main job duties and still find time to devote to working on digital preservation efforts.” Another stated that, “Digital preservation gets pushed back a lot, because our first concern is patron requests, getting collections in and dealing with immediate needs.”  My survey results echoed the NDSA staffing survey findings in that almost every institution felt that digital preservation was understaffed, and that most organizations are retraining existing staff to manage digital preservation functions rather than hiring new staff.  As far as services, survey respondents reported offering various digital preservation services such as consulting, education and outreach.  However, most institutions are at the stage of just trying to raise awareness about the digital preservation services they offer.

Jefferson: Your conclusion poses a number of questions about the path forward for institutions developing digital preservation programs. How does the future look for your institution and what advice would you give to institutions in a similar place as far as program development?

Meghan: I think the future of our digital preservation program at UMass Amherst looks very positive.  We have made great advances toward digital preservation over the last decade.  We have implemented an institutional repository to manage and provide access to the scholarly output of the University, created a digital image repository to replace the old slide library and developed a Fedora-based repository system to manage and preserve our digital special collections and archives. We wrote our digital preservation policy to guide us in our path forward.

We are planning to join a LOCKSS PLN to preserve the content in our institutional repository; we just joined the HathiTrust, which should provide digital preservation for the materials we have digitized through the Internet Archive; and we are working with the Five Colleges to test and possibly implement new digital preservation tools and technologies.  It helps to have the support of the administration at your institution, which we are very fortunate to have.  My guess is that we will see an increase in collaboration in the future, so my advice would be to pay attention to the development of national-level collaborative digital preservation initiatives and to think about state or regional opportunities to work together on digital preservation efforts.

Jefferson: Finally, after conducting the survey and writing your sabbatical report, how do you feel about the current state of digital preservation?

Meghan: I think it’s really encouraging to see institutions trying different technologies and experimenting with what will work even with limited resources and uncertainty over what the best solution might be.  Despite the many challenges, we aren’t just throwing our hands up in the air and doing nothing.  We are trying different things, sharing the results of our efforts with each other, and learning as a community.  It’s an exciting time of innovation in the digital preservation field!
