
Feed aggregator

Islandora: Cross-Community Collaboration at Hydra Dev Sprint

planet code4lib - Wed, 2016-09-28 14:58

From September 19-23, a group of 16 developers from various institutions, representing users of Hydra, Islandora, and Fedora, got together in State College, PA, to work on Fedora 4 features for the community. They worked in three teams:

  1. Workflow
  2. Fedora Import/Export
  3. Admin Dashboard

Dan Coughlin has provided a summary of their accomplishments:


Workflow

The workflow team extracted the database-backed workflow implementation from Notre Dame’s Sipity application into CurationConcerns, using Princeton’s Plum workflow as an initial target for modeling multiple configurable workflows. Once completed, this work will enable support for mediated deposit approval workflows, digitization and metadata augmentation/review workflows, and takedown/revocation workflows. There are tickets for the remaining work in the CurationConcerns GitHub repository, and a call is being set up next week between some of the folks from the workflow team and the team working on the community sprint focused on mediated deposit. You can find more about the work in the workflow branch of CurationConcerns. Members of this team were Justin Coyne (Stanford), Jeremy Friesen (Notre Dame), Kyle Lawhorn (Cincinnati), and Michael Tribone (Penn State).

Fedora Import/Export

The import/export team started working on a BagIt implementation design, including Bag Profile support for APTrust and MetaArchive. Their work included reviewing and updating documentation and squashing bugs related to importing Fedora resources from the filesystem, to prepare for an initial round of stakeholder feedback. The team also finalized the tickets assigned to Phase 1 of import/export; the requirements and phases for this work are documented separately. In December, some members of this team will begin Phase 2 of the sprint. There will be stakeholder calls in October and November to finalize the BagIt implementation design. In addition to work at the Fedora layer, support was added to CurationConcerns for running the Fedora import/export utility so that the tool can be called from the user interface. Members of this team were Esmé Cowles (Princeton), Karen Estlund (Penn State), Nick Ruest (York), Jon Stroop (Princeton), Andrew Woods (DuraSpace), and Adam Wead (Penn State).

Admin Dashboard

The administrative dashboard team added a configurable, extensible admin dashboard to CurationConcerns. The dashboard design allows flexible control over what appears in the dashboard menu and in what order, in addition to what views are rendered and what data sources are used. The current implementation includes a pie chart widget displaying information about the visibility of deposited works, as well as about embargoes and leases, allowing multiple levels of drill-down for more granular information. This early work has been merged into the master branch of CurationConcerns. To test how configurable and usable the new admin dashboard is, the team started extending the CurationConcerns dashboard in Sufia; that work currently sits in a branch. Remaining work has been ticketed using the 'admin dashboard' label in both CurationConcerns and Sufia. Members of this team were Carolyn Cole (Penn State), Mike Giarlo (Stanford), Trey Pendragon (Princeton), Lynette Rayle (Cornell), and Thomas Scherz (Cincinnati).

Further details (and a group photo!) are available here.

HangingTogether: Metadata reconciliation

planet code4lib - Wed, 2016-09-28 12:00

That was the topic discussed recently by OCLC Research Library Partners metadata managers, initiated by Myung-Ja (MJ) Han of the University of Illinois at Urbana-Champaign. She wanted to find out why libraries do metadata reconciliation work, what kinds of linked data sources libraries use, and what libraries do with reconciled identifiers. Many libraries are performing metadata reconciliation work, such as searching linked data sources for matching terms and replacing strings in metadata records with URIs (or storing the URIs), as a necessary first step toward the linked data environment or as part of metadata enhancement work. Several conferences, notably Code4Lib, have offered preconferences or workshops to introduce tools for metadata reconciliation work. Europeana’s report on metadata enhancement work outlines the benefits and challenges of the semantic reconciliation and enrichment process, based on its own experience.

Three main reasons why libraries do metadata reconciliation:

  • Efficient batch enhancement
  • Preparation for linked data by using identifiers (Uniform Resource Identifiers, or URIs) rather than text strings
  • Enhancing user services and discovery

Metadata reconciliation is done on a variety of data, including traditional MARC library data, digital collections, institutional repositories and archival materials.  Improving the quality of the data improves users’ experience in the short term, and will help with the transition to linked data later.

Most metadata reconciliation is done on personal names, subjects and geographic names. Sources used for such reconciliation include the Virtual International Authority File (VIAF); the Library of Congress’ linked data service (id.loc.gov); the International Standard Name Identifier (ISNI); the Getty’s Union List of Artist Names (ULAN), Art and Architecture Thesaurus (AAT) and Thesaurus of Geographic Names (TGN); Faceted Application of Subject Terminology (FAST); GeoNames; DBpedia; and various national authority files. Selection of the source depends on the trustworthiness of the responsible organization, the subject matter and the richness of the information.

Much metadata reconciliation is devoted to normalizing variants. The University of Auckland, for example, has encountered Maori terms with dozens of spelling variants that had to be normalized to a preferred form. Large aggregators like Libraries Australia also must both normalize variant forms and remove duplicates; each incoming file submitted from individual institutions requires some level of data cleanup.  Much of this work requires manual checking and is time-consuming. Each institution is doing similar types of reconciliation—how can this work be shared? When someone makes a correction, how can we disseminate the correction?

A number of institutions have experimented with obtaining identifiers (persistent URIs from linked data sources) to eventually replace our current reliance on text strings. Institutions have concluded that it is more efficient to include URIs in authority records at the outset rather than reconcile them later on. The University of Washington has created an experimental RDA input form that generates identifiers for various descriptors such as place of publication, edition, language and carrier. The University of Michigan has developed an LCNAF Named Entity Reconciliation program using OpenRefine (formerly Google Refine) that searches VIAF via the VIAF API for matches, looks for Library of Congress source records within a VIAF cluster and extracts the authorized heading. This results in a dataset pairing the authorized LC Name Authority File heading with the original heading, along with a link to the corresponding URI in the LCNAF linked data service. It could be modified to bring in the VIAF identifier instead; it gets fair results even though it uses string matching. A number of NACO contributors have started to include URIs of linked data sources in the 024 fields of authority records when they are confident of exact matches.
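
As a rough illustration of this style of lookup, the Python sketch below queries VIAF's public AutoSuggest endpoint for candidate matches on a personal name. This is a minimal sketch only, not the Michigan tool itself, and the JSON field names used here ("result", "term", "viafid", "lc") are assumptions based on that endpoint's typical output:

    import json
    import urllib.parse
    import urllib.request

    def viaf_suggest(name):
        """Ask VIAF's AutoSuggest service for candidate clusters matching a name."""
        url = "https://viaf.org/viaf/AutoSuggest?query=" + urllib.parse.quote(name)
        req = urllib.request.Request(url, headers={"User-Agent": "reconcile-demo/0.1"})
        with urllib.request.urlopen(req) as resp:
            data = json.load(resp)
        # 'result' is a list of candidate matches (or null when nothing matches);
        # each hit carries the display form, the VIAF cluster ID and, where VIAF
        # has one, an LC control number linking back to the LC Name Authority File.
        for hit in data.get("result") or []:
            print(hit.get("term"), "| VIAF:", hit.get("viafid"), "| LC:", hit.get("lc"))

    viaf_suggest("Stein, Gertrude")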

Some portion of terms cannot be matched to an existing entity identifier. How should libraries provide identifiers for entities not represented in any of the above sources? OCLC Research coins a “placeholder URI” that encodes the work identifier, type of entity and name (text string). For example:
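
A hypothetical placeholder URI of that shape, with the work identifier, entity type and name string as its components, might look like the following (the exact pattern and identifier are illustrative assumptions, not OCLC's actual syntax):

    http://experiment.worldcat.org/entity/work/data/1151002411#Person/stein_gertrude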

If and when this placeholder URI can be replaced with a persistent URI, the placeholder URI can be “deprecated” using owl:sameAs.

The University of Wisconsin-Madison has developed a prototype that shows how linked data sources could be incorporated into user services. It retrieves the URIs for person entities from its local Alma system, searches VIAF with the retrieved URI, and then extracts factual information from the links within a VIAF cluster, such as biographies or abstracts from the Getty’s ULAN or DBpedia and alma mater from Wikidata. See, for example, the bottom part of its catalog record for Gertrude Stein on Picasso.
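
A minimal sketch of that fan-out step, assuming VIAF's public justlinks.json service (this is illustrative only, not the Wisconsin prototype itself; the response keys vary by cluster and are assumptions here):

    import json
    import urllib.request

    def viaf_related_links(viaf_id):
        """Fetch the cross-system identifier links VIAF holds for one cluster."""
        url = f"https://viaf.org/viaf/{viaf_id}/justlinks.json"
        req = urllib.request.Request(url, headers={"User-Agent": "link-demo/0.1"})
        with urllib.request.urlopen(req) as resp:
            links = json.load(resp)
        # Keys are source systems (e.g. LC, ISNI, WKP for Wikidata); values are
        # lists of identifiers that can be followed to fetch biographies,
        # abstracts and other factual information about the entity.
        for source, ids in links.items():
            if isinstance(ids, list):
                print(source, "->", ids)

    viaf_related_links("12345")  # substitute a VIAF ID retrieved from the local system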

Such demonstrations of the value of ingesting or fetching related information from different sources to improve discovery services help make the investment in metadata reconciliation and using identifiers worthwhile.


Note on the graphic: Example is provided by colleague Janifer Gatenby. All three identifiers are for “a” Russell Thomas, but only one of the identifiers listed is for the Russell Thomas pictured. A metadata specialist would need to determine the correct one.


About Karen Smith-Yoshimura

Karen Smith-Yoshimura, senior program officer, works on topics related to creating and managing metadata with a focus on large research libraries and multilingual requirements.


Open Knowledge Foundation: Africa Open Data Collaboration Fund: Building civil society capacity for Africa’s emerging data revolution.

planet code4lib - Wed, 2016-09-28 09:00

Open Knowledge International is a member of Open Data for Development (OD4D), a global network of leaders in the open data community, working together to develop open data solutions around the world. In collaboration with the OD4D program, Open Knowledge International coordinates the Africa Open Data Collaboration Fund which is designed to provide seed support to empower Africa’s emerging open data civic organisations and entrepreneurs.

Almost a year ago in Dar es Salaam, the first Africa Open Data Conference brought together information activists, government officials and technologists to push forward an ambitious agenda for open data on the continent. The two-day conference highlighted several opportunities and challenges for open data in Africa with a general consensus being that Africa only stands to gain from both actively engaging with and shaping the data revolution.

The conference ended with members of the Open Data for Development (OD4D) network setting up the Africa Open Data Collaboration (AODC) Fund to provide seed funding to innovative civil society organisations interested in using or producing open data. Some of the areas selected to be supported by the fund were data-driven journalism, community-focused data collection using open data tools, government budgets and spending and ICT skills building.

As members of the growing Africa Open Data community met recently in Nairobi on August 17th and 18th for the East Africa Open Data Festival, it is only fitting to provide an update of the work that recipient organisations have done so far with the AODC Fund. The following projects are currently being managed by Open Knowledge International (OKI).

HeHe Labs, a mobile technology organisation focused on developing socially relevant applications, has been working on the HeHe Labs Code Clubs project, which aims to enhance research and collaboration among students by training embedded fellows. Although the fellowship training makes no direct use of open data, HeHe Labs is interested in incorporating these skills into the ongoing learning of its current and future fellows. The fellows’ technical skills also present an avenue for building scarce open data expertise across the continent. HeHe Labs has also created the InventEd platform to foster collaboration and research among youth and is currently working with both public and private organisations, including universities, to scale up its adoption.

Some of the fellows trained by HeHe Labs to lead high school Code Clubs in Rwanda

Construction Sector Transparency Initiative (CoST) Tanzania is working to improve CSO assessment of new construction in Tanzania by developing an easy-to-use assessment tool and then making this data open and available to everyone. CoST is currently working with a consultant from Uganda to replicate the assessment tool used there, with the goal of having a Tanzanian version in use by local CSOs in the coming months. Training sessions on how to use the tool will follow and will provide an opportunity to learn best practices that can be used in advocacy work.

Outbox, a civic technology hub based in Uganda, is designing low-cost urban environmental stations for Kampala. In partnership with National Information Technology Agency of Uganda (NITA-U), their locally developed sensors will measure environmental conditions in Kampala and make this data open. This project intersects with work around smart and open cities, environment and Internet of Things (IoT). Priority is being given to the partnership with NITA-U to ensure project adoption and sustainability.

Women Environmental Programme (WEP) wants to promote transparency and accountability in local governments through open data. With a deep understanding of local community contexts and partnerships with both the National Bureau of Statistics and OKI, WEP designed a survey to collect data on the availability of various public services at the community level. So far, WEP has completed a 7-day data collection exercise covering 160 local communities in 3 Abuja area councils, in addition to training 20 data collectors to use the Kobo Toolbox, a mobile data collection tool. The data will be made open and analysed to provide insights to communities and governments.

Data collectors with WEP and OKI trainers after two-day data collection training

Afro Leadership is a civil society organisation based in Cameroon that is working to fight corruption, improve local accountability and ensure effective service delivery by collecting and publishing  approved budgets and accounts for all local authorities on the OpenSpending Cameroon platform. Additionally, they will strengthen the capacity of journalists and civil society actors to understand budget data by providing a number of offline trainings and developing online resources and courses, all in collaboration with School of Data. The Afro Leadership team is in the process of collecting the 2015 budget reports from all of the regions in Cameroon.

Charlie Ngounou (4th from left in front) of AfroLeadership with members of CSOs in Cameroon

The Association of Freelance Journalists in Kenya (AFJ) is working to train 6 of its members in the skills needed to produce data stories in Kenya, and will be working with OKI to adapt School of Data materials and the World Bank Sudan Data Journalism Program resources into a curriculum. The goal of the training is to build a team capable of producing data stories as a matter of course. The training is expected to yield a minimum of 12 stories (2 per AFJ participant), which will be published on AFJ’s TalkAfrica platform.

How you can get involved

As this is the first iteration of the AODC Fund, we value the lessons these organisations accrue through their projects. The goal is not only to obtain tangible project outcomes but also to gain a better understanding of what it takes to build open data capacity in Africa. Insights into how much it costs to do data-driven journalism, to produce data on environmental conditions in cities, or to train the next generation of data practitioners in Africa will help shape how we approach open data capacity development in the future. The AODC Fund’s vision is to expose civil society organisations to the open data space, strengthen their capacity and set them on a path to becoming champions of open data in their local contexts.

In the meantime, if you’re reading this and  have any lessons to share or want to connect with any of the organisations above, please reach out. I am very active on Twitter.

An earlier version of this piece originally appeared on the OD4D website.

LibUX: Public library users are mobile

planet code4lib - Wed, 2016-09-28 02:39

Half (49%) of those who have visited a public library website in the past year used handheld mobile devices (such as smartphones or tablets). — Pew Research Center

LibUX: Usage Trends in Libraries 2016 – Pew Research Center

planet code4lib - Wed, 2016-09-28 02:31

Surprise, surprise. Searching the catalog remains the bread and butter of library web design.

LibUX: 2016 Statewide Keynote: Harper Reed

planet code4lib - Wed, 2016-09-28 01:41

So Courtney McDonald tweeted: “dude, schedule an hour to watch the #statewide16 keynote by Harper Reed.” I totally agree.

DuraSpace News: Third Annual Meeting of the German DSpace User Group

planet code4lib - Wed, 2016-09-28 00:00

The German DSpace User Group Meeting 2016 took place at ZBW – Leibniz Information Centre for Economics in Hamburg on September 27th, 2016. This was the third annual meeting for the group. With 45 participants, it was the largest meeting so far. High attendance was due in part to the momentum that DSpace has in Germany right now.  A few new institutions participated that were already working on their migration or are considering migrating to DSpace.

DuraSpace News: 4Science, A "Young" Registered Service Provider with a lot of Experience!

planet code4lib - Wed, 2016-09-28 00:00

By Susanna Mornati, 4Science. At 4Science, we are all very proud to appear for the first time in the DuraSpace Digest.

LibUX: Library Usage and Trends

planet code4lib - Tue, 2016-09-27 19:41

In this episode I’m joined by not one but two (!) guests. Carli Spina and Emily King come on the show to talk about the 2016 Libraries Usage and Trends Report published earlier this month by the Pew Research Center.

Stream and subscribe

If you like, you can download the MP3 or subscribe to LibUX on Stitcher, iTunes, YouTube, SoundCloud, Google Play Music, or just plug our feed straight into your podcatcher of choice.

LITA: LITA Forum early bird rates extended

planet code4lib - Tue, 2016-09-27 18:22
We’ve extended the LITA members early bird registration another two weeks, so there’s still time to register for the 2016 LITA Forum at the early bird rate and save $50.

Fort Worth, TX
November 17-20, 2016

LITA Forum early bird rates now will end October 14, 2016
Register Now!

Join us in Fort Worth, Texas, at the Omni Fort Worth Hotel located in Downtown Fort Worth, for the 2016 LITA Forum, a three-day education and networking event featuring 2 preconferences, 3 keynote sessions, more than 55 concurrent sessions and 25 poster presentations. It’s the 19th annual gathering of the highly regarded LITA Forum for technology-minded information professionals. Meet with your colleagues involved in new and leading edge technologies in the library and information technology field. Registration is limited in order to preserve the important networking advantages of a smaller conference. Attendees take advantage of the informal Friday evening reception, networking dinners and other social opportunities to get to know colleagues and speakers.

Why attend the LITA Forum

Tune in to #LITAchat Friday, September 30, 2016, at noon Central time to learn about the 2016 LITA Forum with guest tweeters from the Forum Planning Committee. From #litaforum, they will discuss the upcoming LITA Forum, November 17-20, in Fort Worth, Texas: why you should attend, what to expect, how to get the most out of the experience, and much more! To participate, launch your favorite Twitter client and check out the #LITAchat hashtag. On the web client, just search for #LITAchat and then click “LIVE” to follow along. Ask questions using the hashtag #LITAchat, add your own comments, and even answer questions posed by other participants.

Register now to receive the LITA members early bird discount:

  • LITA member early bird rate: $340
  • LITA member regular rate: $390

Keynote Speakers:

  • Cecily Walker, Vancouver Public Library
  • Waldo Jaquith, U.S. Open Data
  • Tara Robertson, @tararobertson

The Preconference Workshops:

  • Librarians can code! A “hands-on” computer programming workshop just for librarians
  • Letting the Collections Tell Their Story: Using Tableau for Collection Evaluation

Comments from past attendees:

“Best conference I’ve been to in terms of practical, usable ideas that I can implement at my library.”
“I get so inspired by the presentations and conversations with colleagues who are dealing with the same sorts of issues that I am.”
“After LITA I return to my institution excited to implement solutions I find here.”
“This is always the most informative conference! It inspires me to develop new programs and plan initiatives.”

Forum Sponsors:

OCLC, Yewno

Get all the details, register and book a hotel room at the 2016 Forum website.

See you in Fort Worth.

Terry Reese: Note on Automatic Updates

planet code4lib - Tue, 2016-09-27 17:45

Please note: MarcEdit’s automated update tool will notify you of the update, but you will need to download the file manually. My web host, Bluehost, has made a change to their server configuration that makes no sense but ultimately dumps traffic sent from non-web browsers (connections without a user-agent header). Right now, users will get an error message when they attempt to download using the automatic update.

I can accommodate the requirements that they have set up now, but it will mean that users will need to do manual downloads for the current update posted 9/27/2016 and the subsequent update — which I’ll try to get out tonight or tomorrow.

I apologize for the inconvenience, but after spending 8 hours yesterday and today wrangling with them and trying to explain what this breaks (because I have some personal tools that this change affects), I’m just not getting anywhere.  Maybe something will magically change, maybe not — but for now I’ll be rewriting the update process to try and protect from these kinds of unannounced changes in the future.

So again, you’ll want to download MarcEdit manually from the downloads page, since the automatic update download connection is currently being dumped by my web host.
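
For readers curious about the underlying mechanics: a host configured this way drops HTTP requests that arrive without a User-Agent header. MarcEdit itself is a .NET application, but a minimal Python sketch of the general fix (the URL and agent string below are placeholders, not MarcEdit's actual values) is simply to set the header explicitly:

    import urllib.request

    # Hypothetical URL, for illustration only; substitute the real download location.
    url = "https://example.com/marcedit/update.zip"

    # urllib sends a default "Python-urllib/x.y" User-Agent, which a host like this
    # may also reject; an explicit, descriptive value avoids the block entirely.
    req = urllib.request.Request(url, headers={"User-Agent": "MarcEdit-Updater/1.0"})
    with urllib.request.urlopen(req) as resp, open("update.zip", "wb") as out:
        out.write(resp.read())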

LITA: Volunteers needed to help with privacy initiative

planet code4lib - Tue, 2016-09-27 17:14
Are you interested in improving privacy in libraries all across the country? If so, we need your help! The recently-released ALA Library Privacy Guidelines are a great collection of the standards and practices that libraries should be putting into place to protect users’ digital information. A small group of us is now working on creating checklists and resource guides for each set of guidelines in order to help real live library staff implement these guidelines with ease. And we’re looking for volunteers to help! We need folks to help out with developing checklists for the following. We’re particularly hoping to find people with experience in school libraries and networked services, but we’ll take all willing volunteers!
  1. Library Privacy Guidelines for Public Access Computers and Networks
  2. Library Privacy Guidelines for Library Websites (social media), OPACs, and Discovery Services
  3. Library Privacy Guidelines for Library Management Systems
  4. Library Privacy Guidelines for Data Exchange Between Networked Devices and Services
  5. Library Privacy Guidelines for E-book Lending and Digital Content Vendors
  6. Library Privacy Guidelines for Students in K-12 Schools

If you’re able and interested in putting in a few hours to help us out with this project, pop me an email with what you can help out with. And thank you!

Terry Reese: MarcEdit Update (Windows/Linux)

planet code4lib - Tue, 2016-09-27 04:10

I’ve posted a new set of updates.  The initial set is for Windows and Linux.  I’ll be posting Mac updates later this week.  Here’s the list of changes:

  • Behavior Change — Windows/Linux: Intellisense turned off by default (this is the box that shows up when you start to type a diacritic) for new installs. As more folks use UTF8, this option makes less sense. Will likely make plans to remove it within the next year.
  • Enhancement: Select Extracted Records: UI Updates to the import process.
  • Enhancement: Select Extracted Records: Updates to the batch file query.
  • Behavior Change: Z39.50 Client: An override has been added to allow the Z39.50 client to exceed configured search limits. Beware: using this override is potentially problematic.
  • Update: Linked Data Rules File: Rules file updated to add databases for the Japanese Diet library, 880 field processing, and the German National Library.
  • Enhancement: Task Manager: Added a new macro/delimiter. {current_file} will print the current filename if set.
  • Bug Fix: RDA Helper – Abbreviation expansion failed to process specific fields when the config file was changed.
  • Bug Fix: MSXML Engine – In an effort to allow the xsl:strip-space element, I broke this process. The workaround has been to use the other XSLT engine. However, I’ll correct this. Information on how to emulate the xsl:strip-space element will be posted.
  • Bug Fix: Task Manager Editing – when adding the RDA Helper to a new task, it asks for file paths. This was due to some enhanced validation around files. This didn’t impact any existing tasks.
  • Bug Fix: UI changes – I’m setting default sizes for a number of forms for usability
  • Bug Fix/Enhancement: Open Refine Import – OpenRefine’s release candidate changes the tab delimited output slightly. I’ve added some code to accommodate the changes.
  • Enhancement: MarcEdit Linked Data Platform – adding enhancements to make it easier to add collections and update the rules file
  • Enhancement: MarcEdit Linked Data Platform – updating the rules file to include a number of new endpoints
  • Enhancement: MarcEdit Linked Data Platform – adding new functionality to the rules file to support the recoding of the rules file for UNIMARC.
  • Enhancement: Edit Shortcut – Adding a new edit short cut to find fields missing words
  • Enhancement: XML Platform – making it clearer that you can use either XQuery or XSLT for transformations into MARCXML
  • Enhancement: OAI Harvester – code underneath to update user agent and accommodate content-type requirements on some servers.
  • Enhancement: OCLC API Integration – added code to integrate with the validation. Not sure this makes its way into the interface yet, but code will be there.
  • Enhancement: Saxon.NET version bump
  • Enhancement: SPARQL Explorer – Updating the sparql engine to give me more access to low level data manipulation
  • Enhancement: Autosave option when working in the MarcEditor. Saves every 5 minutes. Will protect against crashes.

Downloads are available from the downloads page.


DuraSpace News: Welcome Heather Greer Klein: Hosted Services Customer Specialist

planet code4lib - Tue, 2016-09-27 00:00

Austin, TX: DuraSpace is pleased to announce that Heather Greer Klein has accepted the position of hosted services customer specialist, effective October 26, 2016. In this role, Heather will work closely with the DuraSpace hosted services team to manage the lead-to-sale process for DuraSpace hosted services (DuraCloud, DSpaceDirect, ArchivesDirect), including customer service, product pricing, new account setup, onboarding, and training.

SearchHub: Solr Distributed Indexing at WalmartLabs

planet code4lib - Mon, 2016-09-26 21:57

As we count down to the annual Lucene/Solr Revolution conference in Boston this October, we’re highlighting talks and sessions from past conferences. Today, we’re featuring Shenghua Wan’s talk, “Solr Distributed Indexing at WalmartLabs”.

As a retail giant, Walmart provides information on millions of items via its e-commerce websites, and that number grows quickly. This calls for big data technologies to index the documents. The MapReduce framework is a scalable, highly available base on top of which distributed indexing can be built. While stock Solr includes a map-reduce indexing tool, several barriers make it unable to handle Walmart’s use case easily and efficiently. In this case study, Shenghua demonstrates a way to build your own distributed indexing tool and to optimize performance by making the indexing stage a map-only job, with the partial indexes merged afterward.
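
The pattern itself is straightforward to sketch. The toy Python below is an illustrative sketch of the map-only-then-merge idea, not WalmartLabs' actual Lucene-based code: each partition is indexed independently with no shuffle or reduce step, and the partial indexes are merged at the end, analogous to merging Lucene segments:

    from collections import defaultdict

    # Toy corpus: (doc_id, text) pairs standing in for product documents.
    docs = [(0, "red shirt"), (1, "blue shirt"), (2, "red shoes"), (3, "blue jeans")]
    NUM_SHARDS = 2

    def build_shard_index(shard_docs):
        """'Map' step: index one partition independently -- no shuffle, no reduce."""
        index = defaultdict(set)
        for doc_id, text in shard_docs:
            for term in text.split():
                index[term].add(doc_id)
        return index

    # Partition by doc_id and index each shard in isolation (parallelizable).
    shard_indexes = [
        build_shard_index([d for d in docs if d[0] % NUM_SHARDS == s])
        for s in range(NUM_SHARDS)
    ]

    # Merge step: union the posting lists from all partial indexes.
    merged = defaultdict(set)
    for idx in shard_indexes:
        for term, postings in idx.items():
            merged[term] |= postings

    print(sorted(merged["red"]))  # -> [0, 2]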

Shenghua Wan is a Senior Software Engineer on the Polaris Search Team at WalmartLabs. His focus is applying big data technologies to deal with large-scale product information to be searched online.

Solr Distributed Indexing in WalmartLabs: Presented by Shenghua Wan, WalmartLabs (slides from Lucidworks)

Join us at Lucene/Solr Revolution 2016, the biggest open source conference dedicated to Apache Lucene/Solr on October 11-14, 2016 in Boston, Massachusetts. Come meet and network with the thought leaders building and deploying Lucene/Solr open source search technology. Full details and registration…


Library of Congress: The Signal: Collections as Data Tomorrow

planet code4lib - Mon, 2016-09-26 18:10

Tomorrow, September 27, 2016, NDI is hosting our Collections as Data symposium, which will be free and open to the public. We’re really excited about the speakers we have lined up for the day, and hope you can join us in person or through the live-streamed video.

In preparation for the event, our colleagues in Library Services put together a wonderful collection of resources to help orient researchers using Library of Congress collections as data:

Using Collections as Data?

Research Guidance Is Available!

Digitized Collections and Items from Library of Congress

Torch of Learning. Photo by Carol Highsmith, 2007.

LITA: Transmission #10 – Season 1 Finale

planet code4lib - Mon, 2016-09-26 17:51

In the final episode of our first season, I’m telling you about the intention and future of the program, and a little more about myself. I’m also putting out a call for bloggers, contributors, innovators and visionaries! Collaborate with me- send an email to lindsay dot cronk at gmail dot com!

Stay tuned for changes coming in two weeks on October 10th!

Karen Coyle: 2 Mysteries Solved!

planet code4lib - Mon, 2016-09-26 16:57
One of the disadvantages of a long tradition is that the reasons behind certain practices can be lost over time. This is definitely the case with many practices in libraries, and in particular in practices affecting the library catalog. In U.S. libraries we tend to date our cataloging practices back to Panizzi, in the 1830's, but I suspect that he was already building on practices that preceded him.

A particular problem with this loss of history is that without the information about why a certain practice was chosen it becomes difficult to know if or when you can change the practice. This is compounded in libraries by the existence of entries in our catalogs that were created long before us and by colleagues whom we can no longer consult.

I was recently reading through volume one of the American Library Journal from the year 1876-1877. The American Library Association had been founded in 1876 and had its first meeting in Philadelphia in September, 1876. U.S. librarianship finally had a focal point for professional development. From the initial conference there were a number of ALA committees working on problems of interest to the library community. A Committee on Cooperative Cataloguing, led by Melvil Dewey, (who had not yet been able to remove the "u" from "cataloguing") was proposing that cataloging of books be done once, centrally, and shared, at a modest cost, with other libraries that purchased the same book. This was realized in 1902 when the Library of Congress began selling printed card sets. We still have cooperative cataloging, 140 years later, and it has had a profound effect on the ability of American libraries to reduce the cost of catalog creation.

Other practices were set in motion in 1876-1877, and two of these can be found in that inaugural volume. They are also practices whose rationales have not been obvious to me, so I was very glad to solve these mysteries.
Title case

Some time ago I asked on Autocat, out of curiosity, why libraries use sentence case for titles. No one who replied had more than a speculative answer. In 1877, however, Charles Ammi Cutter reports on The Use of Capitals in library cataloging and defines a set of rules that can be followed. His main impetus is "readability;" that "a profusion of capitals confuses rather than assists the eye...." (He also mentions that this is not a problem with the Bodleian library catalog, as that is written in Latin.)

Cutter would have preferred that capitals be confined to proper names, eschewing their use for titles of honor (Rev., Mrs., Earl) and initialisms (A.D). However, he said that these uses were so common that he didn't expect to see them changed, and so he conceded them.

All in all, I think you will find his rules quite compelling. I haven't looked at how they compare to any such rules in RDA. So much still to do!
Centimeters

I have often pointed out, although it would be obvious to anyone who has the time to question the practice, that books are measured in centimeters in Anglo-American catalogs, although there are few cultures as insistent on measuring in inches and feet as those. It is particularly unhelpful that books in libraries are cataloged with a height measurement in centimeters while the shelves that they are destined for are measured in inches. It is true that the measurement forms part of the description of the book, but at least one use of it is to determine on which shelves those books can be placed. (Note that in some storage facilities, book shelves are more variable in height than in general library collections, and the size determination allows for more compact storage.) If I were to shout out to you "37 centimeters," you would probably be hard-pressed to reply quickly with the same measurement in inches. So why do we use centimeters?

The newly-formed American Library Association had a Committee on Sizes. This committee had been given the task of developing a set of standard size designations for books. The "size question" had to do with the then current practice to list sizes as folio, quarto, etc. Apparently the rise of modern paper making and printing meant that those were no longer the actual sizes of books. In the article by Charles Evans (pp. 56-61) he argued that actual measurements of the books, in inches, should replace the previous list of standard sizes. However, later, the use of inches was questioned. At the ALA meeting, W.F. Poole (of Poole's indexes) made the following statement (p. 109):
"The expression of measure in inches, and vulgar fractions of an inch, has many disadvantages, while the metric decimal system is simple, and doubtless will soon come into general use."The committee agreed with this approach, and concluded:
"The committee have also reconsidered the expediency of adopting the centimeter as a unit, in accordance with the vote at Philadelphia, querying whether it were really best to substitute this for the familiar inch. They find on investigation that even the opponents of the metric system acknowledge that it is soon to come into general use in this country; that it is already adopted by nearly every other country of importance except England; that it is in itself a unit better adapted to our wants than the inch, which is too large for the measurement of books." (p. 180)
The members of the committee were James L. Whitney, Charles A. Cutter, and Melvil Dewey, the latter having formed the American Metric Bureau in July of 1876, both a kind of lobbying organization and a sales point for metric measures. My guess is that the "investigation" was a chat amongst themselves, and that Dewey was unmovable when it came to using metric measures, although he appears not to have been alone in that. I do love the fact that the inch is "too large," and that its fractions (1/16, etc.) are "vulgar."

Dewey and cohort obviously weren't around when compact discs came on the scene, because those are measured in inches ("1 sound disc : digital ; 4 3/4 in"). However, maps get the metric treatment: "1 map : col. ; 67 x 53 cm folded to 23 x 10 cm". Somewhere there is a record of these decisions, and I hope to come across them.

It would have been ideal if the U.S. had gone metric when Dewey encouraged that move. I suspect that our residual umbilical cord linking us to England is what scuppered that. Yet it is a wonder that we still use those too-large, vulgar measurements. Dewey would be very disappointed to learn this.

So there it is, two of the great mysteries solved in the record of the very first year of the American library profession. Here are the readings; I created separate PDFs for the two most relevant sections:

American Library Journal, volume 1, 1876-1877 (from the Internet Archive)
Cutter, Charles A. The use of capitals. American Library Journal, v.1, n. 4-5, 1877. pp. 162-166
The Committee on Sizes of Books, American Library Journal, v.1, n. 4-5, 1877, pages 178-181

Also note that beginning on page 92 there is a near verbatim account of every meeting at the first American Library Association conference in Philadelphia, September, 1876. So verbatim that it includes the mention of who went out for a smoke and missed a key vote. And the advertisements! Give it a look.

Library of Congress: The Signal: 2016-2017 Class of National Digital Stewardship Residents Selected

planet code4lib - Mon, 2016-09-26 15:15

Five new National Digital Stewardship Residents will be joining the Library in late September 2016. Selected from a competitive pool and representing five different library schools, the residents bring a range of skills and experience in working with digital and archival collections. The NDSR program offers recent graduates an opportunity to gain professional experience under the guidance of a mentor. They will acquire hands-on knowledge and skills in the collection, selection, management, long-term preservation and accessibility of digital assets.

Throughout the year, residents and their mentors will attend digital stewardship workshops at the Library of Congress and at one of their five host institutions in the greater Washington, D.C. region.

  • Meredith Broadway of Dallas, Texas, has a Master of Science in Data Curation and Certificate in Special Collections from the University of Illinois at Urbana-Champaign, and a bachelor’s degree from Rhodes College. Meredith will be a resident at the World Bank Group focusing on an assessment framework and appraisal guidelines for identification of data for permanent preservation; a set of analytic process document guidelines to enable documentation of processes used in the collection and analysis of data; and guidelines for linking datasets to related documents and analytical reports.
  • Joseph Carrano of Middlebury, Connecticut, has dual master’s degrees from the University of Maryland in History and Library Science, and a bachelor’s degree from the University of Connecticut. Joe will be part of a team at the Georgetown University Library developing open-source project guidelines, documentation and workflows for different preservation platforms. He will be involved in all stages of the process of inventory, selection, curation, preparation and ingest of files of all formats.
  • Elizabeth England of Washington, DC, has a master’s degree in Library and Information Science from the University of Pittsburgh, and a bachelor’s degree from Drew University. Elizabeth will be a resident in the University Archives at the Johns Hopkins University Sheridan Libraries, applying core archival functions such as appraisal, accessioning, processing, preservation, description, and provision of access to a 50 terabyte collection of born-digital photographs, using scripting languages and tools that are vital to manipulating large data sets.
  • Amy Gay of Binghamton, New York, has a master’s degree in Library and Information Science from Syracuse University, and a bachelor’s degree from the State University of New York, Oneonta. Amy will be a resident at the Food & Drug Administration, Office of Science & Engineering Laboratories, Center for Devices & Radiological Health, working on the “CDRH Science Data Catalogue Pilot,” a joint project to develop a searchable digital catalog for data sets, software code, computational models, images and more as part of federally mandated public access efforts. She will lead catalog content acquisition and curation, as well as refining the metadata schema and taxonomy.
  • Megan Potterbusch of Nashville, Tennessee, has a master’s degree from the School of Library and Information Science at Simmons College, and a bachelor’s degree from Earlham College. Megan will serve as a resident at the Association of Research Libraries working in partnership with the George Washington University Libraries and the Center for Open Science to prototype the process of linking the output from a university research unit to a library digital repository through the Open Science Framework — an open source tool that integrates and supports research workflow.

