
DuraSpace News: VIVO Updates for August 6 — thanks for a great VIVO 2017 conference, new steering group members

planet code4lib - Mon, 2017-08-28 00:00

What a great conference!  The eighth annual VIVO Conference was held at Weill Cornell Medicine in New York City, August 2-4.  We had outstanding keynote presentations by Christina Pattuelli and Jodi Schneider, invited talks by Katy Frey, Jim Hendler, Dave Eichmann, Ying Ding, and Rebecca Bryant, as well as a featured presentation by Julia Trimmer and Damaris Murry.  But there was quite a bit more – five workshops, 27 contributed presentations, and sixteen posters rounded out a very full program.  Thanks to all the presenters!!

Jonathan Brinley: Augustine on the relationship between scripture and science

planet code4lib - Sun, 2017-08-27 13:55

From The Literal Meaning of Genesis by Augustine, translated by John Hammond Taylor.

Book One, Chapter 18

In matters that are obscure and far beyond our vision, even in such as we may find treated in Holy Scripture, different interpretations are sometimes possible without prejudice to the faith we have received. In such a case, we should not rush in headlong and so firmly take our stand on one side that, if further progress in the search of truth justly undermines this position, we too fall with it. That would be to battle not for the teaching of Holy Scripture but for our own, wishing its teaching to conform to ours, whereas we ought to wish ours to conform to that of Sacred Scripture.

Book One, Chapter 19

Usually, even a non-Christian knows something about the earth, the heavens, and the other elements of this world, about the motion and orbit of the stars and even their size and relative positions, about the predictable eclipses of the sun and moon, the cycles of the years and the seasons, about the kinds of animals, shrubs, stones, and so forth, and this knowledge he holds to as being certain from reason and experience. Now, it is a disgraceful and dangerous thing for an infidel to hear a Christian, presumably giving the meaning of Holy Scripture, talking nonsense on these topics; and we should take all means to prevent such an embarrassing situation, in which people show up vast ignorance in a Christian and laugh it to scorn. The shame is not so much that an ignorant individual is derided, but that people outside the household of faith think our sacred writers held such opinions, and, to the great loss of those for whose salvation we toil, the writers of our Scripture are criticized and rejected as unlearned men. If they find a Christian mistaken in a field which they themselves know well and hear him maintaining his foolish opinions about our books, how are they going to believe those books in matters concerning the resurrection of the dead, the hope of eternal life, and the kingdom of heaven, when they think their pages are full of falsehoods on facts which they themselves have learnt from experience and the light of reason?

John Miedema: Analog and Digital are Not Opposites

planet code4lib - Sat, 2017-08-26 17:33

The word, analog, is commonly used to describe a system that predates digital computers. I blink each time I hear it because I know it is not right. Formal definitions in computer science are better. They contrast analog and digital by how they store data, continuously or discretely. The terms are treated as opposites. I still blink. If the formal definition is correct, the dyad should be digital and continuous. I will explain how analog and digital are not exactly opposites.

In analog telephone lines, the fluctuations of the voice correspond to the electric vibrations in the wires. The essential waveform is preserved. In a digital voice system, the voice transmission is encoded into bytes for transmission, and later decoded back into sound. All digital computers do the same. Electricity flows through switches, representing data in binary digits, one or zero, called bits. So far so good, but as above, the dyad should be digital and continuous.

An analog clock measures time by the continuous motion of one or more hands. In a digital clock, an electric charge is passed through a crystal, causing a sound whose frequency is converted into counts of seconds and minutes and so on. The digital aspect is seen in the display of the number of hours, 1-12 or 24. The base 12 or 24 system is as digital as binary. But then, the hands of an analog clock also point to the same digits. The clock is both analog and digital.

Today, analog is used as adjective, but originally it was a noun, a comparison of one thing to another. For example, a pump is an analog of the heart. An analog is a literary device with properties. One property is proportion, a correspondence in size or quality between one thing and another. An analog can be exactly proportional to the original. If I mark the length of my finger on a ruler, the marked ruler is a one-to-one analog for the length of my finger. I can use the ruler as a record of my finger length at a particular age and compare the size again in the future.

More often, the proportionality is at a scale. A map is an analog for a real world geography, reduced in scale for analysis and portability. This brings out a second property of analogy, incompleteness. Analogy is a comparison such as a metaphor or simile, which are substitutes. My love is not literally a red red rose. A map is not literally the land. Even if I map the world with a high resolution satellite camera I cannot record every detail. This fact is what makes the map useful. By omitting detail the map becomes something I can see at a glance, measure with a compass, and carry in my pocket. But what happens when I zoom in with a magnifying glass? It depends on how the map was printed. I may see dots. A color map has a range of dots. A black and white map has two types, black or not — binary digits, bits, digital. Even a hand-drawn and painted map is not continuous like the real world. The resolution of analog is always digital.

The third property of the analog is its purpose, to explain a new or complex idea with a familiar one. We use analogy to help explain and understand. It is the same purpose for a digital representation. The bits must ultimately correspond to something meaningful in the world. Bits mean nothing without a mapping to numbers and characters and real world phenomena. A ruler represents my finger. A clock represents time. Ones and zeros decode back into voice, and images, and music.

Decoding depends partly on the continuous physical form in which the data was stored. Take language. In some ideal language we could map all the forms of a word to a single lemma. We could also strip out all punctuation and spacing. In real language each form carries unique differences in meaning that must be preserved in digital storage. Punctuation and spacing also carry meaning. All digital systems that store text also store these real world features. Digital systems require analog data.

Analog and digital are not precise opposites. Analog has an older literary meaning that still applies, comparison. My discussion of the properties of proportion and completeness showed that the analog always resolves to the digital. Also, the purpose of analog is just as applicable to the digital. None of this analysis will make a whit of difference to the operation of analog or digital systems. The analysis cannot be used to claim that analog technology is superior to digital. The main benefit is that I might stop blinking the next time someone misuses the terms analog and digital.

Terry Reese: MarcEdit 7 alpha: Introducing Clustering tools

planet code4lib - Fri, 2017-08-25 18:09

Folks sometimes ask me how I decide what kinds of new tools and functions to add to MarcEdit.  When I was an active cataloger/metadata librarian, the answer was easy – I added tools and functions that helped me do my work.  As my work has transitioned to more and more non-MARC/integrations work; I still add things to the program that I need (like the linked data tooling), but I’ve become more reliant on the MarcEdit and metadata communities to provide feedback regarding new features or changes to the program.

This is kind of how the Clustering work came about.  It started with this tweet:  There are already tools that catalogers can use to do large scale data clustering (OpenRefine); and my hope is that more and more individuals make use of them.  But in reading the responses and asking some questions, I started thinking about what this might look like in a tool like MarcEdit – and could I provide a set of lightweight functionality that would help users solve some problems, while at the same time exposing them to other tooling (like OpenRefine)…and I hope this is what I’ve done.

This work is very much still in active development, but I’ve started the process of creating a new way of batch editing records in MarcEdit.  The clustering tools will be provided as both a standalone resource and a resource integrated into the MarcEditor, and will be somewhat special in that they will require that the application extract the data out of MARC and store it in a different data model.  This will allow me to provide a different way of visualizing one’s data, and potentially make it easier to surface issues with specific data elements.

The challenge with doing clustering is that this is a very computationally expensive process.  From the indexing of the data out of MARC, to the creation of the clusters using different matching algorithms, the process can take time to generate.  But beyond performance, the question that I’m most interested in right now is how to make this function easier for users to navigate and understand.  How to create an interface that makes it simple to navigate clustered groups and make edits within or across clustered groups.  I’m still trying to think about what this looks like.  Presently, I’ve created a simple interface to test the processes and start asking those questions.

If you are interested in seeing how this function is being created and some of the assumptions being made as part of the development work – please see:

I’m interested in feedback – particularly around the questions of UI and editing options, so if you see the video and have thoughts, let me know.


District Dispatch: Congress examining Title 44 and the Federal Depository Library Program

planet code4lib - Fri, 2017-08-25 16:00

Congress’ Committee on House Administration this year began examining Title 44 of the U.S. Code, which is the authority for the Federal Depository Library Program (FDLP) and the Government Publishing Office (GPO). This process is an important opportunity for librarians to advocate for improvements to the FDLP and public access to government information.

So far, the committee has held two hearings on GPO, in May and July. There may be one or more additional hearings in the fall. The committee may also prepare legislation to amend Title 44.

In her testimony at the July hearing, GPO Director Davita Vance-Cooks commented that “there are certain provisions of Title 44 that no longer make good business sense.” She also noted that the laws underpinning the FDLP, which have gone largely unchanged since the 1960s, have “been eclipsed in some areas by technology.”

To address those issues, GPO has asked the Depository Library Council, an advisory committee to GPO on FDLP issues, to provide the office with recommendations for potential changes. The council, in turn, has invited suggestions from the community.

In response, ALA submitted comments to the FDLP on Aug. 23 stating, in part:

For decades, FDLP libraries and GPO have worked together to implement Title 44 and help the public find, use, and understand government information. The FDLP’s purpose, to ensure that the American public has access to its government’s information, remains vital. Since the major concepts of Title 44 were last revised, however, the information environment has evolved considerably: government publishing and information-seeking increasingly take place online, and libraries continue to update their services to meet patron needs. Revising Title 44 to account for these changes would keep the FDLP relevant for the next generations of information users.

ALA looks forward to working with the Depository Library Council and GPO to advance ideas that strengthen the federal government’s partnership with libraries and expand the American public’s long-term access to government information.

For Title 44 to best serve libraries and the public, it will be critical for ALA members to engage with this process and provide their ideas. If you have suggestions, please share them with me so we can consider them as this process moves forward.

The post Congress examining Title 44 and the Federal Depository Library Program appeared first on District Dispatch.

District Dispatch: My experience as a Google Public Policy Fellow

planet code4lib - Fri, 2017-08-25 13:30

This year marks the 10th anniversary of the American Library Association’s Washington Office participation in the Google Public Policy Fellow program. We were lucky to host Alisa Holahan, JD, a graduate student at the School of Information at the University of Texas at Austin, over the summer. Read on to learn more about her experience working with the Office for Information Technology Policy.

Left to right: Emily Wagner, information manager for the Washington Office, Alisa Holahan, ALA’s 2017 Google Public Policy Fellow, and Alan Inouye, director of the Office for Information Technology Policy.

I was honored to have the opportunity to serve as a Google Public Policy Fellow this summer at the ALA’s Office for Information Technology Policy (OITP) in Washington, D.C. The Google Public Policy Fellowship provides undergraduate, graduate and law students with the opportunity to work in the summer with organizations that are actively engaged with technology policy issues. OITP supports ALA’s public policy work by advocating for information technology policies that promote open, full, and fair access to electronic information.

The placement with OITP was an ideal match because it perfectly combined my interests in librarianship and information policy. In my work at the ALA, I focused on both copyright and issues related to the FCC’s Schools and Libraries Program (“E-rate”). My work on copyright focused on the recent debate over whether the Copyright Office should remain in the Library of Congress. I wrote a report for OITP arguing that based on both legislative history and the Office’s current needs, the Copyright Office should stay in the Library of Congress. I also researched a number of different aspects of the E-rate program, which provides essential funding for broadband in schools and libraries. I provided information about how various public policy organizations view the program, researched the administration of the program, and identified possible E-rate supporters.

A key component of my experience as a Google Public Policy Fellow was expanding my knowledge of technology policy issues. Google hosted bi-weekly panels for the fellows, which focused on important tech policy issues, such as free speech, the future of work and privacy. It was a privilege to listen to and learn from so many experts in the tech policy field. It was also exciting to speak with the other fellows about their work and policy interests.

Additionally, during my time as a fellow, OITP Director Alan Inouye encouraged me to take advantage of as many relevant events in D.C. as possible. I attended a number of panel presentations on topics related to tech policy, including big data, free speech and anti-SLAPP legislation. A highlight was having the opportunity to attend a congressional oversight hearing on the Library of Congress’s information technology management.

The Google Public Policy Fellowship with the ALA gave me a much deeper understanding of the many ways in which libraries and technology policy intersect. It was a wonderful experience and I am so grateful to have had the chance to spend my summer with OITP.

The post My experience as a Google Public Policy Fellow appeared first on District Dispatch.

Open Knowledge Foundation: Podcast: Pavel Richter on the value of open data

planet code4lib - Fri, 2017-08-25 09:36

This month Pavel Richter, CEO of Open Knowledge International, was interviewed by Stephen Ladek of Aidpreneur for the 161st episode of his Terms of Reference podcast. Aidpreneur is an online community focused on social enterprise, humanitarian aid and international development that runs this podcast to cover important topics in the social impact sector.

Under the title ‘Supporting The Open Data Movement’, Stephen Ladek and Pavel Richter discuss a range of topics surrounding open data, such as what open data means, how open data can improve people’s lives (including the role it can play in aid and development work) and the current state of openness in the world. As Pavel phrases it: “There are limitless ways where open data is part of your life already, or at least should be”.

Pavel Richter joined Open Knowledge International as CEO in April 2015, following five years of experience as Executive Director of Wikimedia Deutschland. He explains how Open Knowledge International has set its focus on bridging the gap between the people who could make the best use of open data (civil society organisations and activists in areas such as human rights, health or the fight against corruption) and the people who have the technical knowledge on how to work with data. OKI can make an impact by bridging this gap, empowering these organisations to use open data to improve people’s lives.

The podcast goes into several examples that demonstrate the value of open data in our everyday life, from how OpenStreetMap was used by volunteers following the Nepal earthquake to map where roads were destroyed or still accessible, to governments opening up financial data on tax returns or on how foreign aid money is spent, to projects such as OpenTrials opening up clinical trial data, so that people are able to get information on what kind of drugs are being tested for effectiveness against viruses such as Ebola or Zika.

In addition, Stephen Ladek and Pavel Richter discuss questions surrounding potential misuse of open data, the role of the cultural context in open data, and the current state of open data around the world, as measured in recent initiatives such as the Open Data Barometer and the Global Open Data Index.

Listen to the full podcast below, or visit the Aidpreneur website for more information:


LibUX: Cloudflare boots the Daily Stormer then wonders at its own might

planet code4lib - Fri, 2017-08-25 08:17

RSS | Google Play | iTunes

W3 Radio is now public. After a short while in beta available to our Patreon subscribers, you can now find it in your podcatcher of choice. W3 Radio is a fun-sized newscast for designers and developers recapping the week in web in just 10 minutes. For the first time in this episode I’m joined by Dave Gillhepsy and Dan Sims, who make the news real fun. Give it a spin, and be sure to like / star / heart / favorite / and tell your friends.

Follow @w3_radio on Twitter.

Michael Schofield is @schoeyfield.
Dave Gillhepsy is @yodasw16
Dan Sims is @danielgsims

  1. Google Search Console has started sending out notices to sites that have not yet migrated to HTTPS. 
  2. CakePHP 3.5.0 was released, and its ten users are really excited about it.
  3. After Firefox 55, Selenium IDE will no longer work
  4. Facebook is pushing a licensing model called “BSD + patents” in all their projects, including the wildly popular React.
    1. Related
      1. Squarespace says it’s removing ‘a group of sites’ as internet cracks down on hate speech
      2. Spotify removes ‘hate bands’ from its streaming library
      3. After Charlottesville, Mark Zuckerberg pledges to remove violent threats from Facebook
  5. Cloudflare reversed its long-held policy to remain content-neutral and booted The Daily Stormer

District Dispatch: New ideas in Congress for supporting libraries

planet code4lib - Wed, 2017-08-23 14:00

Support for libraries figured in several bills introduced in Congress immediately prior to the August recess. The inclusion of library provisions in these bills demonstrates that libraries are increasingly top of mind in Congress.

The outlook for these bills, however, is uncertain. None of the bill authors sit on the committees of jurisdiction considering the legislation which often dims the chances of passage. In addition, two of the bills would create new funding streams, an uphill battle in the current congressional atmosphere. Nevertheless, we are pleased to see libraries thoughtfully included in relevant legislation.

The three bills introduced this month are:

  • Senator Jack Reed (D-RI) introduced S. 1674, the School Building Improvement Act of 2017, which would authorize as much as $52 million in bonds for school renovation, repair and construction projects at elementary and secondary schools. The bill explicitly would allow the funds to be used for school library projects. S. 1674 will be considered by the Senate Finance Committee.
  • Senator Cory Booker (D-NJ) introduced S. 1689, the Marijuana Justice Act of 2017, which encourages states to focus drug enforcement strategies away from marijuana. S. 1689 would also re-direct certain funds used by states to enforce laws against marijuana use into a new Community Reinvestment Fund. This fund, authorized up to $500 million per year, would allow the Department of Housing and Urban Development to provide community grants for programs, including public libraries, job training, reentry services and other services. S. 1689 has been referred to the Senate Judiciary Committee.
  • Senator Jack Reed (D-RI) introduced S. 1694 and Rep. Ruben Kihuen (D-NV) introduced the House companion H.R. 3636. The bills, entitled Educator Preparation Reform Act, would amend the Higher Education Act of 1965 by expanding the definition of an “educator” eligible to receive job training and development to include school librarians, counselors, and paraprofessionals. In addition, under H.R. 3636, librarians would be eligible to receive specialized training to implement reading and writing instruction. S. 1694 will be considered by the Senate Health, Education, Labor and Pensions Committee. H.R. 3636 will be considered by the House Education and Workforce Committee.

Our efforts to encourage members of Congress to “think libraries” appear to be bearing fruit. If these bills gain traction, we will work with these Congressional offices to ensure the library message continues to be heard.

The post New ideas in Congress for supporting libraries appeared first on District Dispatch.

FOSS4Lib Recent Releases: Hyrax - 1.0.4

planet code4lib - Wed, 2017-08-23 13:00

Last updated August 23, 2017. Created by Peter Murray on August 23, 2017.

Package: Hyrax
Release Date: Tuesday, August 22, 2017

David Rosenthal: Economic Model of Long-Term Storage

planet code4lib - Tue, 2017-08-22 17:00
Cost vs. Kryder rate

As I wrote last month in Patting Myself On The Back, I started working on economic models of long-term storage six years ago. I got a small amount of funding from the Library of Congress; when that ran out I transferred the work to students at UC Santa Cruz's Storage Systems Research Center. This work was published here in 2012 and in later papers (see here).

What I wanted was a rough-and-ready Web page that would allow interested people to play "what if" games. What the students wanted was something academically respectable enough to get them credit. So the models accumulated lots of interesting details.

But the details weren't actually useful. The extra realism they provided was swamped by the uncertainty from the "known unknowns" of the future Kryder and interest rates. So I never got the rough-and-ready Web page. Below the fold, I bring the story up to date and point to a little Web site that may be useful.

Earlier this year the Internet Archive asked me to update the numbers we had been working with all those years ago. And, being retired with time on my hands (not!), I decided instead to start again. I built an extremely simple version of my original economic model, eliminating all the details that weren't relevant to the Internet Archive and everything else that was too complex to implement at short notice, and put it behind an equally simple Web site running on a Raspberry Pi (so please don't beat up on it).
What This Model Does

For a single Terabyte of data, the model computes the endowment, the money which deposited with the Terabyte and invested at interest would suffice to pay for the storage of the data "for ever" (actually 100 years in this model).
Assumptions

These are the less than totally realistic assumptions underlying the model:
  • Drive cost is constant, although each year the same cost buys drives with more capacity as given by the Kryder rate.
  • The interest rate and the Kryder rate do not vary for the duration.
  • The storage infrastructure consists of multiple racks, containing multiple slots for drives. I.e. the Terabyte occupies a very small fraction of the infrastructure.
  • The number of drive slots per rack is constant.
  • Ingesting the Terabyte into the infrastructure incurs no cost.
  • The failure rate of drives is constant and known in advance, so that exactly the right number of spare drives is included in each purchase to ensure that failed drives can be replaced by an identical drive.
  • Drives are replaced after their specified life although they are still working.
Some of these assumptions may get removed in the future (see below).
Parameters

This model's adjustable parameters are as follows.
Media Cost Factors
  • DriveCost: the initial cost per drive, assumed constant in real dollars.
  • DriveTeraByte: the initial number of TB of useful data per drive (i.e. excluding overhead).
  • KryderRate: the annual percentage by which DriveTeraByte increases.
  • DriveLife: working drives are replaced after this many years.
  • DriveFailRate: percentage of drives that fail each year.
Infrastructure Cost factors
  • SlotCost: the initial non-media cost of a rack (servers, networking, etc) divided by the number of drive slots.
  • SlotRate: the annual percentage by which SlotCost decreases in real terms.
  • SlotLife: racks are replaced after this many years.
Running Cost Factors
  • SlotCostPerYear: the initial running cost per year (labor, power, etc) divided by the number of drive slots.
  • LaborPowerRate: the annual percentage by which SlotCostPerYear increases in real terms.
  • ReplicationFactor: the number of copies. This need not be an integer, to account for erasure coding.
Financial Factors
  • DiscountRate: the annual real interest obtained by investing the remaining endowment.
Defaults

The defaults are my invention for a rack full of 8TB drives. They should not be construed as representing the reality of your storage infrastructure. If you want to use the output of this model, for example for budgeting purposes, you need to determine your own values for the various parameters.

Default values

  Parameter          Value    Units
  DriveCost          250.00   Initial $
  DriveTeraByte      7.2      Usable TB per drive
  KryderRate         10       % per year
  DriveLife          4        years
  DriveFailRate      2        % per year
  SlotCost           150.00   Initial $
  SlotRate           0        % per year
  SlotLife           8        years
  SlotCostPerYear    100.00   Initial $ per year
  LaborPowerRate     4        % per year
  DiscountRate       2        % per year
  ReplicationFactor  2        # of copies
Unlike the KryderRate and the SlotRate, the LaborPowerRate reflects that the real cost of staff increases over time. Of course, the capacity of the slots is typically increasing faster than the LaborPowerRate, so the per-Terabyte cost from the LaborPowerRate still decreases over time. Nevertheless, the endowment calculated is quite sensitive to the value of the LaborPowerRate.
Calculation

The model works through the 100-year duration year by year. Each year it figures out the payments needed to keep the Terabyte stored, including running costs and equipment purchases. It then uses the DiscountRate to figure out how much would have to have been invested at the start to supply that amount at that time. In other words, it computes the Net Present Value of each year's expenditure and sums them to compute the endowment needed to pay for storage over the full duration.
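As a rough illustration, the year-by-year calculation can be sketched in a few lines of Python. This is my own simplified reconstruction using the default parameter values, not the published model code; in particular, the purchase-and-replacement logic is an assumption about how the model handles drive and rack refreshes.

```python
# Illustrative reconstruction of the endowment calculation. Parameter names
# and defaults follow the post; the purchase/replacement logic is my own
# approximation, not the actual model code.

def endowment(drive_cost=250.0, drive_tb=7.2, kryder_rate=0.10,
              drive_life=4, drive_fail_rate=0.02,
              slot_cost=150.0, slot_rate=0.0, slot_life=8,
              slot_cost_per_year=100.0, labor_power_rate=0.04,
              discount_rate=0.02, replication=2, years=100):
    """Net present value ($) needed to store 1 TB for `years` years."""
    total = 0.0
    for year in range(years):
        # Capacity per drive grows at the Kryder rate, so the fraction of
        # a drive (and of a rack slot) the Terabyte occupies shrinks.
        drives_per_tb = replication / (drive_tb * (1 + kryder_rate) ** year)
        cost = 0.0
        if year % drive_life == 0:
            # Media purchase, padded by the known failure rate so spares
            # are bought up front (drive cost is constant in real dollars).
            cost += drive_cost * (1 + drive_fail_rate * drive_life) * drives_per_tb
        if year % slot_life == 0:
            # Rack (non-media) purchase, declining at the slot rate.
            cost += slot_cost * (1 - slot_rate) ** year * drives_per_tb
        # Running costs (labor, power) rise in real terms each year.
        cost += slot_cost_per_year * (1 + labor_power_rate) ** year * drives_per_tb
        # Discount this year's spending back to a present value and sum.
        total += cost / (1 + discount_rate) ** year
    return total

print(f"Endowment for 1 TB over 100 years: ${endowment():,.2f}")
```

Even in this toy version the expected sensitivities appear: raising the KryderRate or the DiscountRate shrinks the endowment, while raising the LaborPowerRate grows it.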
Usage

Sample model output

The Web site provides two ways to use the model:
The sample graph shows why adding lots of detail to the model isn't really useful, because the effects of the unknowable future DiscountRate and KryderRate parameters are so large.
Code

The code is here under an Apache 2.0 license.
What This Model Doesn't (Yet) Do

If I can find the time, some of these deficiencies in the model may be removed:
  • Unlike earlier published research, this model ignores the cost of ingesting the data in the first place, and accessing it later. Experience suggests the following rule of thumb: ingest is half the total lifetime cost, storage is one-third the total lifetime cost, and access is one-sixth. Thus a reasonable estimate of the total preservation cost of a Terabyte is three times the result of this model.
  • The model assumes that the parameters are constant through time. Historically, interest rates, the Kryder rate, labor costs, etc. have varied, and thus should be modeled using Monte Carlo techniques and a probability distribution for each such parameter. It is possible for real interest rates to go negative, disk cost per Terabyte to spike upwards, as it did after the Thai floods, and so on. These low-probability events can have a large effect on the endowment needed, but are excluded from this model. Fixing this needs more CPU power than a Raspberry Pi.
  • There are a number of different possible policies for handling the inevitable drive failures, and different ways to model each of them. This model assumes that it is possible to predict at the time a batch of drives is purchased what proportion of them will fail, and inflates the purchase cost by that factor. This models the policy of buying extra drives so that failures can be replaced by the same drive model.
  • The model assumes that drives are replaced after DriveLife years even though they are working. Continuing to use the drives beyond this can have significant effects on the endowment (see this paper).
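The Monte Carlo point above can be illustrated with a toy experiment: sample the two dominant unknowns, the Kryder rate and the discount rate, from rough distributions and look at the spread of outcomes. The one-cost model and the distributions below are purely illustrative assumptions of mine, not values from the post.

```python
import random

def toy_endowment(kryder_rate, discount_rate, annual_cost=100.0, years=100):
    # Toy per-TB model: a single annual cost that shrinks with the Kryder
    # rate, discounted back to a present value.
    return sum(annual_cost / ((1 + kryder_rate) * (1 + discount_rate)) ** y
               for y in range(years))

random.seed(42)  # deterministic for the example
# Sample uncertain rates: mean 10%/yr Kryder, mean 2%/yr real interest.
samples = sorted(toy_endowment(random.gauss(0.10, 0.05),
                               random.gauss(0.02, 0.01))
                 for _ in range(10_000))
print(f"median endowment ≈ ${samples[len(samples) // 2]:,.0f}")
print(f"95th percentile  ≈ ${samples[int(0.95 * len(samples))]:,.0f}")
```

The gap between the median and the tail is the point: with plausible uncertainty in just two parameters, the endowment varies enough to swamp any extra modeling detail.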

District Dispatch: The classics live forever

planet code4lib - Tue, 2017-08-22 13:42

This summer, Rep. Darrell Issa (R-CA) and Jerry Nadler (D-NY) introduced H.R. 3301, the “Compensating Legacy Artists for their Songs, Service, and Important Contributions to Society Act,” also known as the CLASSICS Act (impressive acronym!).

Music librarians have been talking about pre-1972 sound recordings for some time. Owing to a quirk in U.S. copyright law, sound recordings were not awarded federal copyright protection until 1971. Instead, they were protected by common law, which varied from state to state. This made it difficult to know if and how pre-1972 recordings could be used by libraries. Did federal copyright exceptions apply, for instance?

In 2010, the U.S. Copyright Office conducted “a study on the desirability of and means for bringing pre-1972 sound recordings into the federal copyright regime.” ALA and the Association of Research Libraries (ARL) argued against federalization because statutory damages for infringement would skyrocket under federal law with no guarantee that robust library exceptions for preservation, lending and public performance would be included. The Copyright Office report (which is one of my favorites) ultimately decided that pre-1972 recordings should be federalized. It also recommended a copyright term of 95 years from publication, or 120 years if the work had not been published prior to the effective date of the legislation, if enacted.

The CLASSICS Act, however, differs on copyright term. The Library Copyright Alliance noted in a letter that sound recordings published before 1972 would be protected until February 15, 2067. Why not 95 years from the date of publication, as recommended by the Copyright Office? Under this legislation, sound recordings could be protected for 137 years, and even longer if the recording was particularly old. Ironically, it is the oldest sound recordings that are most at risk—and the ones libraries want to preserve—so one wonders what the public policy justification is for protecting sound recordings longer than any other class of work. (There is none.) My guess is that the CLASSICS Act tried to make all stakeholders happy, including the heirs of famous composers who were particularly vocal in arguing for the extension of copyright term protection in 2002.
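To make the term arithmetic concrete, here is a quick sketch (the cutoff dates follow the figures quoted above; the function names are mine):

```python
# Under the CLASSICS Act, protection for every pre-1972 recording
# would run until 2067, regardless of the recording's age.
def classics_term(pub_year):
    return 2067 - pub_year

# The Copyright Office instead recommended 95 years from publication.
def copyright_office_term(pub_year):
    return 95

# A 1930 recording: 137 years under CLASSICS vs. 95 as recommended,
# and the older the recording, the longer the CLASSICS term.
```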

There is bound to be further discussion among the players if the legislation moves forward. But at this point, the heirs of dead composers who wrote the classics can anticipate that their cash cow will live a longer life than previously expected.

The classics truly are timeless.

The post The classics live forever appeared first on District Dispatch.

Open Knowledge Foundation: The Open Education Working Group: What do we do and what is coming up next

planet code4lib - Tue, 2017-08-22 10:27

The Open Education Working Group is a very active community of educators, researchers, PhD students, policy makers and advocates who promote, support and collaborate on projects advancing Open Education across fields and at an international level. The group aims to support the development of open educational projects internationally and to promote good practice in Open Education. In this blog we give an update on our recent activities.

The coordinators of the group are Paul Bacsich (@pbacsich) (Open Policies), a professor with extensive experience in educational policy and open education; Annalisa Manca (@AnnalisaManca) (Open Science), an expert in critical pedagogy currently completing her PhD in Medical Education; and Javiera Atenas (@jatenas) (Open Data), a lecturer with a PhD in Education whose interests include Open Data and media literacies.

Our ethos is to be a platform that promotes openness in education at all levels, including OER, Open Science, Open Education and Open Access, with a focus on open educational practices that democratise and enhance education. Our mission is to support organisations and individuals in implementing, supporting and developing Open Education projects, research and policies, and to support communities of open practice, working to ensure that everyone has democratic access to education.

In recent years we have done a great deal: published books, worked with international Open Education organisations, and participated in a large number of projects. Some highlights:

Publication of the Open Educator Handbook, written to provide a useful point of reference for readers in a range of different roles who want to learn more about the concept of Open Education, and to help them deal with a variety of practical situations.

Publication of the book Open Data as Open Educational Resources: Case Studies of Emerging Practice. This book contains a series of case studies on the use of open data as pedagogical material. The authors of these chapters are academics and practitioners who have been using open data in different educational scenarios, and the cases present different dynamics and approaches for using open data in the classroom.

Involvement in the POERUP policy project and the OpenMed project, aimed at opening up teaching and learning resources in the southern Mediterranean countries – in partnership with UniMed Rome.

Organisation of a pre-Open Data Day event at UCL, a round table discussing the challenges and opportunities of using open data as teaching and learning resources, held with a group of experts and practitioners and with the Latin American Open Data Initiative. We also organised a course for academics on Open Data as Open Educational Resources with the support of the Open Education Unit of the Universidad de la República, Uruguay, in partnership with A Scuola di OpenCoesione. The outcomes of the course can be read in the blog post Putting research into practice: Training academics to use Open Data as OER: An experience from Uruguay.

With regard to campaigning, we have worked with Communia in support of their campaign for better education, which collects petitions from educators throughout Europe to let Members of the European Parliament know we need better copyright rules for education. You can read more about it in this blog post.

Our blog reflects the current state of the art in Open Education around the world. We have blog posts from Croatia, Sweden, Brazil, Scotland, Finland, Germany, Italy and Spain on different topics, from Open Educational Resources toolkits and Open Education policy to Open Data and Open Education research. In our forum we have spaces for different communities of practice to interact, exchange and discuss. You can join the discussion through:

At the moment we are supporting 101openstories, a collaborative project led by a group of open practitioners that collects stories and ideas of openness from educators, researchers and learners in general. We are also supporting the development of local Open Education Working Groups, such as the Italian network of open educators, who met recently in Bologna to discuss an agenda to promote and enhance open education across all educational sectors in Italy (read more).

In this Year of Open we will be participating in a series of events and congresses, including the Latin American Open Data Conference (Con Datos) in Costa Rica in August and the OER Congress in Slovenia in September. We have also joined the Global Partnership for Sustainable Development Data to collaborate with different initiatives on improving open data literacies.

We are always open to collaborations and willing to support innovative projects on Open Education. If you would like to get in touch with us, you will find us on twitter as @okfnedu or via email at

DuraSpace News: Fedora & Samvera Camp UK: Last chance to register!

planet code4lib - Tue, 2017-08-22 00:00

DuraSpace and Data Curation Experts are set to offer the Fedora and Samvera Camp at Oxford University, Sept 4-8, 2017. The camp will be hosted by Oxford University in Oxford, UK, and is supported by Jisc.

Lucidworks: Fourth Annual Solr Developer Survey

planet code4lib - Mon, 2017-08-21 21:23

It’s that time of the year again – time for our fourth annual survey of the Solr marketplace and ecosystem. Every day, we hear from organizations looking to hire Solr talent. Recruiters want to know how to find and hire the right developers and engineers, and how to compensate them accordingly.

Lucidworks is conducting our annual global survey of Solr professionals to better understand how engineers and developers at all levels of experience can take advantage of the growth of the Solr ecosystem – and how they are using Solr to build amazing search applications.

This survey will take about 2 minutes to complete. Responses are anonymized and confidential. Once our survey and research is completed, we’ll share the results with you and the Solr community.

As a thank you for your participation, you’ll be entered in a drawing to win one of our “You Autocomplete Me” t-shirts plus copies of the popular books Taming Text and Solr in Action. Be sure to include your t-shirt size in the questionnaire.

Take the survey today

Past survey results: 2016, 2015, 2014

The post Fourth Annual Solr Developer Survey appeared first on Lucidworks.

Islandora: Islandora CLAW Install Sprint Starts Today

planet code4lib - Mon, 2017-08-21 13:15

[Edited to add some later additions to the team]


A small but mighty dream team of Islandora CLAW contributors answered our call for stakeholders and are now coming together over the next two weeks to work on the first official Islandora CLAW Sprint of 2017.

The team:

  • Danny Lamb (Islandora Foundation)
  • Bryan Brown (Florida State University)
  • Jared Whiklo (University of Manitoba)
  • Adam Soroka (Smithsonian Institution)
  • Natkeeran Kanthan (University of Toronto Scarborough)
  • Marcus Barnes (University of Toronto Scarborough)
  • Jonathan Green (LYRASIS)
  • Diego Pino (Metropolitan New York Library Council)
  • Rosie Le Faive (University of Prince Edward Island)
  • Brian Woolstrum (Carnegie Mellon)
  • Yamil Suarez (Berklee College of Music)
  • Gavin Morris (Born-Digital)

The goal: a modular installation process that meets the following criteria:

  • Capable of supporting multiple operating systems
  • Supports Vagrant, bare metal, and eventually Docker containers
  • Can be used for both all-in-one installs as well as more complex setups involving multiple servers
  • Can be used to maintain/update existing installations as the CLAW codebase evolves
  • Is well documented

Ansible has been identified as the dev/ops tool that fits best with these goals, and over the course of several sprints the existing claw_vagrant codebase will be adapted into an Ansible based solution. You can follow their work with this project in Github or by staying tuned for more updates here.

Open Knowledge Foundation: Fostering open, inclusive, and respectful participation

planet code4lib - Mon, 2017-08-21 08:28

At Open Knowledge International we have been involved with various projects with other civil society organisations aiming for the release of public interest data, so that anyone can use it for any purpose. More importantly, we focus on putting this data to use, to help it fulfil its potential of working towards fairer and more just societies.

Over the last year, we started the first phase of the project Open Data for Tax Justice, because we and our partners believe the time is right to demand that more data be made openly available to scrutinise the activities of businesses. In an increasingly globalised world, multinational corporations have tools and techniques at their disposal to minimise their overall tax bill, and many believe that this gives them an unfair advantage over ordinary citizens. Furthermore, the extent to which these practices take place is unknown, because the taxes that multinational corporations pay in the jurisdictions in which they operate are not reported publicly. By changing that, we can have a proper debate about whether the rules are fair, or whether changes need to be made to share the tax bill in a different way.

For us at Open Knowledge International, this is an entry into a new domain. We are not tax experts, but instead we rely on the expertise of our partners. We are open to engaging all experts to help shape and define together how data should be made available, and how it can be put to use to work towards tax systems that can rely on more trust from their citizens.

Unsurprisingly, in such a complex and continuously developing field, debates can get very heated. People are obviously very passionate about this, and being passionate open data advocates ourselves, we sympathise. However, we think it is crucial that the passion to strive for a better world never escalate to personal insults, ad hominem attacks, or other violations of basic norms. Unfortunately, this happened recently with a collaborator on a project. While they made clear they were not affiliated with Open Knowledge International, their actions nevertheless reflected very badly on the overall project, and we deeply condemn those actions.

Moving forward, we want to make more explicitly clear what behaviour is and is not acceptable within the context of the projects we are part of. To that end, we are publishing project participation guidelines that make clear how we define acceptable and unacceptable behaviour, and what you can do if you feel any of these guidelines are being violated. We invite your feedback on these guidelines, as it is important that these norms are shared among our community. So please let us know on our Open Knowledge forum what you think and where you think these guidelines could be improved.

Furthermore, we would like to make clear what the communities we are part of, like the one around tax justice, can expect from Open Knowledge International beyond enforcement of the basic behavioural norms set out in the guidelines linked above. Being in the business of open data, we love facts and aim to record many of them in the databases we build. However, facts can be used to reach different and sometimes even conflicting conclusions. Some partners engage heavily on social media channels like Twitter to debate conflicting interpretations, and other partners choose different channels for their work. Open Knowledge International is not, and will never be, in a position to arbitrate all the interpretations that partners make of the data that we publish. Our expertise is in building open databases, helping put the data to use, and convening communities around the work that we do. On a subject such as tax justice, we are like others who are interested in and care about the topic, but we rely on the debate being led by experts in the field. Where we spot abuse of the data published in databases we run, or obvious misrepresentation of the data, we will speak out. But we will not monitor or take a stance on all issues being debated by our partners and the wider communities around our projects.

Finally, we strongly believe that the open knowledge movement is best served by open and diverse participation. We aim for the project participation guidelines to spell out our expectations and hope these will help us move towards developing more inclusive and diverse communities, where everyone who wants to participate respectfully feels welcomed to do so. Do you think these guidelines are a step in the right direction? What else do you feel we should be doing at Open Knowledge International? We look forward to hearing from you in our forum.

LibUX: Is autocomplete on your library home page?

planet code4lib - Mon, 2017-08-21 04:31

If your library has a vendor-based discovery system, chances are you, or your systems administrator, can go straight to the settings page and enable or disable a built-in search autocomplete. If you haven’t considered that setting before or don’t know whether it exists, check it now. Is it enabled? Do you know why or why not? To begin answering some of these questions, check that your discovery system vendor provides clear documentation on how they handle user data. The ALA has strict suggested guidelines on user data for a reason; as librarians we care about privacy and its direct tie to intellectual freedom. But here is the question I want to ask: have you considered the use of autocomplete on your library homepage?

The literature and some testing I’ve done this semester convince me that autocomplete fundamentally improves the user experience, especially on the library home page. But it isn’t always easy to implement there, and I often see it used only within the discovery system interface. That is a huge missed opportunity when the library home page is likely the most heavily trafficked real estate on our website. Here’s why:

#1 Autocomplete prevents user failure

I am astounded at the level of failure I see in usability testing that results directly from users misspelling a search term. Often, the testing script (and therefore the correct spelling) is right in front of them. Regardless of any doomsday predictions that the internet is ruining our spelling and grammar habits, we need to recognize that autocomplete dependency is real. Our users rely on it to remember or correct everything from search strings and URLs to the spelling of words. In my experience, younger users use it in lieu of bookmarks. Nielsen’s heuristic #5 is “error prevention”.

Autocomplete is, in my experience, the most impactful feature library websites have at their disposal for preventing user error. Research by Ward et al. (2012) shows that using autocomplete improves search results, and it even allows dyslexic users to compensate for slow writing speeds, spelling difficulties, and reduced reading speeds (Berget and Sandnes, 2016).
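As a toy illustration of how autocomplete absorbs misspellings (the query list here is invented; real discovery systems rank suggestions from query logs and use fuzzier matching):

```python
import difflib

# Hypothetical popular queries exported from a discovery system's logs.
POPULAR_QUERIES = ["macbeth", "nietzsche", "peer review"]

def suggest(prefix, queries=POPULAR_QUERIES, n=3):
    """Exact prefix matches first; fall back to close misspellings."""
    prefix = prefix.lower()
    hits = [q for q in queries if q.startswith(prefix)]
    if hits:
        return hits[:n]
    # No prefix match: tolerate a slip such as "nietzche" for "nietzsche".
    return difflib.get_close_matches(prefix, queries, n=n, cutoff=0.8)
```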

#2 Autocomplete can combat library anxiety

Library anxiety is a well-documented phenomenon (most recently, Kwon et al., 2007). Users are susceptible to anxiety stemming from mechanical barriers (using computers or other equipment) and affective barriers: feelings of inadequacy or ineptness in “attempting library tasks, which are exacerbated by assuming that other people are more proficient”.

Our websites are complex and filled with jargon that users do not understand (see John Kupersmith’s extensive research on the subject). This compounds anxiety, especially for newer or less experienced researchers and library users. Beyond the error prevention autocomplete provides, there is a compelling argument to be made for how the feature boosts user confidence.

Documentation for Primo autocomplete says it is, “derived from a combination of your top local Primo and Primo Central queries and from other data taken from your Primo institution and the Primo Central index.” Given the repeated nature of classes, the local data captured can reflect a blueprint of the institutional research experience and help along successive groups of students.

Counter argument: could autocomplete negatively impact user behavior?

But that raises another question: are we, as librarians, comfortable knowing that users are providing the data that creates autocomplete suggestions for the users who come after them? A popular search string can still be a bad one. One study even found that:

“Far from reflecting neutrally users’ preferences, Instant Search can direct users’ attention to other searches: by attracting user curiosity, it could orient them towards searches they would have not otherwise performed” (Karapapa et al., 2015).

Unlike most corners of the internet, library websites do not run on popularity and purchase power; we are an educational tool, and we therefore seek to exercise a certain level of control over the search experience to teach good research behavior. Does autocomplete keep a library website from being the most effective educational tool it can be? This is a conversation we tech librarians need to have with our instruction and research librarian teams.

#3 We can manipulate autocomplete to improve the user experience

The article “Autocomplete as a Research Tool: A Study on Providing Search Suggestions” goes so far as to suggest influencing autocomplete with librarian-suggested terms and keywords to improve query formulation and search-term identification (Ward et al., 2012). North Carolina State University seemingly separates its influenced suggestions using the label “Best Bets”. This design allows the library to influence the autocomplete process with transparency (see screenshot below).

Screencapture from North Carolina State’s discovery system autocomplete (not available from home page…)
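A “Best Bets”-style layer could sit on top of log-derived suggestions with something as simple as the following sketch (the curated entries, labels, and storage are all invented for illustration, not NCSU's actual implementation):

```python
# Invented librarian-curated entries, keyed by the query prefixes they
# should surface for; a real system would manage these in the CMS.
BEST_BETS = {"cit": [("Citation help guide", "Best Bet")]}

def suggestions(prefix, log_queries):
    """Labelled Best Bets first, then plain log-derived completions."""
    results = []
    for key, bets in BEST_BETS.items():
        if prefix.startswith(key) or key.startswith(prefix):
            results.extend(bets)
    results += [(q, "") for q in log_queries if q.startswith(prefix)]
    return results
```

Keeping the label on each tuple is what preserves the transparency the screenshot shows: users can see which suggestions the library placed.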

Yet even while Ward et al. recommend autocomplete, they conclude that in-person “instruction on search-term formulation should include a review of autocomplete suggestions as well as practical methods for integrating these suggestions into the research process” (14). Try as librarians might to influence and improve the user experience with design and content, in-person interaction (in the classroom or one-on-one) is still an irreplaceable tool. This is especially true for use of the home page, the most highly trafficked portion of a library website.

Enabling autocomplete is worth your time

Perhaps all it takes is pressing a radio button to enable autocomplete across your library’s whole discovery system. But if your users’ primary point of contact with the discovery system is your library home page, then you have a different issue: the discovery system’s autocomplete might not be plug-and-play there.

It is worth your time, or a programmer’s time, to build a home-page autocomplete that draws on the same data as your discovery system’s, so that the two behave identically. Or it is time to reach out to your vendor and ask whether any other institutions have used their autocomplete within their CMS.
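As a sketch of what that shared-data approach might look like, assume the vendor's suggestion list can be exported and kept sorted; the home-page box then only needs a cheap prefix lookup (the data shape here is an assumption, not any vendor's actual API):

```python
import bisect

def complete(prefix, sorted_suggestions, limit=10):
    """Return up to `limit` suggestions starting with `prefix`.

    `sorted_suggestions` is a lexicographically sorted list, e.g. the
    discovery system's exported query log, so a binary search finds
    the first candidate without scanning the whole list.
    """
    out = []
    start = bisect.bisect_left(sorted_suggestions, prefix)
    for s in sorted_suggestions[start:]:
        if not s.startswith(prefix) or len(out) == limit:
            break
        out.append(s)
    return out
```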

I hope the information here has convinced you that autocomplete, specifically on the library home page, has the potential to significantly improve the user experience of your library’s discovery system.

Works Cited

Berget, Gerd and Frode Eika Sandnes. “Do Autocomplete Functions Reduce the Impact of Dyslexia on Information-Searching Behavior? The Case of Google.” Journal of the Association for Information Science & Technology, vol. 67, no. 10, Oct. 2016, pp. 2320–2328. EBSCOhost.


Breeding, Marshall. “Privacy and Security of Automation and Discovery Products.” Smart Libraries Newsletter 35.01 (2015): 2–7. Web. Jun 8, 2017.


“Library Privacy Checklist for Library Management Systems/Integrated Library Systems.” Advocacy, Legislation & Issues. 2017. Web. Jun 8, 2017 <>.


Karapapa, Stavroula, and Maurizio Borghi. “Search Engine Liability for Autocomplete Suggestions: Personality, Privacy and the Power of the Algorithm.” International Journal of Law and Information Technology 23.3 (2015): 261–89. CrossRef.


Kupersmith, John. (2012). Library Terms That Users Understand. UC Berkeley: UC Berkeley Library. Web. Jun 8, 2017 <>.


Ward, David, Jim Hahn, and Kirsten Feist. “Autocomplete as a Research Tool: A Study on Providing Search Suggestions.” Information Technology and Libraries (Online) 31.4 (2012): 6–19. ABI/INFORM Professional Advanced.


Terry Reese: MarcEdit 6.3 Updates (all versions)

planet code4lib - Sat, 2017-08-19 16:19

I spent some time this week working on a few updates for MarcEdit 6.3.  Full change log below (for all versions).


* Bug Fix: MarcEditor: When processing data with right to left characters, the embedded markers were getting flagged by the validator.
* Bug Fix: MarcEditor: When processing data with right to left characters, I’ve heard that there have been some occasions when the markers make it into the binary files (they shouldn’t).  I can’t recreate it, but I’ve strengthened the filters to make sure that these markers are removed when the mnemonic file format is saved.
* Bug Fix: Linked data tool:  When creating VIAF entries in the $0, the subfield code could be dropped.  This was missed because VIAF should no longer be added to the $0, so I assumed this was no longer a valid use case.  However, local practice in some places overrides best practice.  This has been fixed.

A note on the MarcEditor changes.  The processing of right-to-left characters is something I was aware of with regard to the validator, but in all my testing and unit tests, the data was always filtered prior to compiling.  These markers are inserted for display purposes, as noted here:  However, on the pymarc list, there was apparently an instance where these markers slipped through.  The conversation can be found here:!topic/pymarc/5zxuOh0fVuc.  I posted a long response on the list, though I think it’s being held in moderation (I’m a new member of the list).  Generally, here’s what I found: I can’t recreate the problem, but I have updated the code to ensure that it shouldn’t happen.  Once a mnemonic file is saved (and that happens prior to compiling), these markers are removed from the file.  If you find this isn’t the case, let me know.  I could push the filter down into the MARCEngine level, but I’d rather not, as there are cases where these values may legally be present…this is why the filtering happens in the Editor, where it can assess their use and, if the markers are already present, determine whether they are used correctly.
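The kind of display-marker filtering described here can be approximated as follows (a sketch only, not MarcEdit's actual implementation, which is written in .NET; the set covers the Unicode left-to-right/right-to-left marks and the directional embedding/override controls):

```python
# Display-only Unicode directionality marks that should not survive
# into a saved mnemonic record: LRM, RLM, LRE, RLE, PDF, LRO, RLO.
BIDI_MARKS = {"\u200e", "\u200f", "\u202a", "\u202b",
              "\u202c", "\u202d", "\u202e"}

def strip_bidi_marks(text):
    """Remove directionality markers while leaving the actual
    right-to-left characters (e.g. Arabic, Hebrew) untouched."""
    return "".join(ch for ch in text if ch not in BIDI_MARKS)
```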

Downloads can be picked up through the automated update tool, or via



Subscribe to code4lib aggregator