Sunday night there wasn’t a cloud in the sky in Boston, and so we were fortunate to get a clear view of the rare supermoon eclipse. I took a telescope out to the backyard with my kids and we worked to line up the equipment, and then we chatted about astronomy, optics, and physics, umbras and penumbras. A moment of science? Yes, but ultimately much more.
The eclipse lasted for several hours, and the science part was quickly dispatched. That left plenty of time for greater thoughts to play out, as we were awed by the spectacle. My mind drifted to Mary Shelley, Lord Byron, and their poetry and prose from the creepily dark summer of 1816; the apocalyptic paintings of John Martin; the impact of eclipses on ancient Jerusalem; and entreaties against the fearful night in the Book of Common Prayer (so evocatively described by Alan Jacobs in his recent “biography” of that Anglican prayerbook). In short, I experienced the lunar eclipse simultaneously through the lenses of the telescope and the humanities.
Undoubtedly others had different literary, artistic, philosophical, religious, or historical thoughts come to mind. (As well as less highbrow allusions: for some reason I also thought of Space: 1999.) But it’s impossible to imagine our experience of a lunar eclipse without the framing of our shared culture. We are humans, not machines, and we do not experience daily life—or awe-inspiring events—mechanically. We are constantly applying our understanding of the past, of writing and interpretations, of the spirit and art, to what we see and do.
The National Endowment for the Humanities has been supporting and broadly communicating that profound understanding for 50 years. Their anniversary website shows the incredible breadth and depth of their programs, projects, and topics. The NEH has not stood still, either; the establishment of the Office of Digital Humanities a decade ago, for instance, catalyzed an incipient field and led to productive commerce between the humanities and many other fields, including the sciences.
And the NEH has been a leading supporter of the Digital Public Library of America, which we hope will serve as a storehouse of shared—and open—culture for the next 50 years and beyond. We salute the National Endowment for the Humanities on their fiftieth, and thank them once again for underwriting the full range of human experience.
As many of you may or may not know, I collect water, and it seems as if my water collection predicts the future, sort of.
Since 1979 or so, I’ve been collecting water. The purpose of the collection is/was to enable me to see and experience different parts of the world whenever I desired. As the collection grew and my computer skills developed, I frequently used the water collection as a kind of guinea pig for digital library projects. For example, my water collection was once manifested as a HyperCard Stack complete with the sound of running water in the background. For a while my water collection was maintained in a FileMaker database that generated sets of HTML. Quite a number of years ago I migrated everything to MySQL and embedded images of the water bottles in fields of the database. This particular implementation also exploited XML and XSLT to dynamically make the content available on the Web. (There was even some RDF output.) After that I added geographic coordinates to the database. This made it easy for me to create maps illustrating whence the water came. To date, there are about two hundred and fifty waters in my collection, but active collecting has subsided in the past few years.
But alas, this past year I migrated my co-located host to a virtual machine. In the process I moved all of my Web-based applications — dating back more than two decades — to a newer version of the LAMP stack, and in the process I lost only a single application — my water collection. I still have all the data, but the library used to integrate XSLT into my web server (AxKit) simply would not work with Apache 2.0, and I have not had the time to re-implement a suitable replacement.
Concurrently, I have been negotiating a two-semester-long leave of absence from my employer. The “leave” has been granted and commenced a few weeks ago. The purpose of the leave is two-fold: 1) to develop my skills as a librarian, and 2) to broaden my experience as a person. The first part of my leave is to take a month-long vacation, and that vacation begins today. For the first week I will paint in Tuscany. For the second week I will drink coffee in Venice. During the third week I will give a keynote talk at ADLUG in Rome. Finally, during the fourth week I will learn how to make croissants in Provence. After the vacation is over I will continue to teach “XML 101” to library school graduate students at San Jose State University. I will also continue to work for the University of Notre Dame on a set of three text mining projects (EEBO, JSTOR, and HathiTrust). [4, 5, 6]
As I was getting ready for my “leave” I was rooting through my water collection, and I found four different waters, specifically from: 1) Florence, 2) Venice, 3) Rome, and 4) Nice. As I looked at the dates of when the water was collected, I realized I will be in those exact same four places, on those exact same four days, exactly thirty-three years after I originally collected them. My water collection predicted my future. My water collection is a sort of model of me and my professional career. My water collection has sent me a number of signs.
This “leave-of-absence” (which is not really a leave or a sabbatical, but instead a temporary change to adjunct faculty status) is a whole lot like going to college for the first time. “Where in the world am I going? What in the world am I going to do? Who in the world will I meet?” It is both exciting and scary at the same time. It is an opportunity I would be foolish to pass up, but it is not as easy as you might imagine. That said, I guess I am presently an artist- and librarian-at-large. I think I need new, albeit temporary, business cards to proclaim my new title(s).
Wish me luck, and “On my mark. Get set. Go!”
1. blog postings describing my water collection – http://infomotions.com/blog/2009/09/water-1-of-3/
2. ADLUG – http://www.adlug.net
3. “XML 101” at SJSU – http://ischoolapps.sjsu.edu/facultypages/view.php?fac=morgane
4. EEBO browser – https://github.com/ndlib/text-analysis-eebo
5. JSTOR browser – https://github.com/ndlib/text-analysis-jstor
6. HathiTrust browser – https://github.com/ndlib/text-analysis-htrc
This is a guest post from Richard Pietro the writer and director of Open.
If you’re reading this, you’re likely familiar with the terms Open Government, Open Data, and Open Source. You probably understand how civic engagement is being radically transformed through these movements.
Therein lies the challenge: How can we reach everyone else? The ones who haven’t heard these terms and have little interest in civic engagement.
Here’s what I think: Civic engagement is a bad brand. If we’re to capture the attention of more people, we need to change its brand for the better.
When most people think of civic engagement, they probably imagine people in a community meeting somewhere yelling at each other. Or, maybe they picture a snooze-fest municipal planning and development consultation. Who has time to fit that in with everything else going on in their lives? I think most people would prefer to invest their spare time on something they’re passionate about; not sitting in a stuffy meeting! (If stuffy meetings ARE your passion, that’s cool too!)
Civic engagement is seen as dry and boring, or meant solely for the hyper-informed, hyper-engaged, policy-wonk. Between these two scenarios, you feel your voice will never be heard – so why bother? Civic engagement has bad PR. It isn’t viewed as fun for most people. Plus, I think there’s also an air of elitism, especially when it’s spoken as a right, duty, privilege, or punishment (judges issue community service as a punishment).
That’s why I’ve adopted a different perspective: Civic Engagement as Art. This was motivated via Seth Godin’s book “Linchpin” where he suggests that art shouldn’t only be thought of as fine art. Rather, he argues that art is a product of passion; art is creating something, and that’s what civic engagement is all about – creating something in your community that comes from passion.
I’m hoping that Open will introduce Open Government, Open Data, and Open Source to new people simply because it is being done in a new way. My intention is to begin changing the civic engagement brand by having fun with it.
For example, I call myself an Open Government Fanboy, so Open uses as many pop-culture and “fanboy-type” references as we could squeeze in. As a matter of fact, I call the film a “spoofy adaptation” of The Matrix. What we did was take the scene where Morpheus is explaining to Neo the difference between the “Real World” and the “Matrix” and adapt it to the “Open World” versus the “Closed World.” We also included nods to Office Space, The Simpsons, Monty Python, and Star Trek.
As a bonus, I’m hoping that these familiar themes and references will make it easier for “newbies” to understand the Open Government, Open Data, and Open Source space.
So, without further Apu (Simpsons fans will get it), I give you Open – The World’s first short film on Open Government, Open Data, and Open Source.
THE TEAM BEHIND OPEN
Writer and Director: Richard Pietro
Screenplay: Richard Pietro & Rick Weiss
Executive Producers: Keith Loo and Bruce Chau
Cinematographers: Gord Poon & Mike Donis
Technical Lead: Brian Wong
Composer and Sound Engineer: GARU
Actors: Mish Tam & Julian Friday
I attended the 79th Annual Meeting of the Society of American Archivists (SAA) last month in Cleveland, Ohio and was invited to participate on the Research Libraries Roundtable panel on Data Management and Curation in 21st Century Archives. Dan Noonan, e-Records/Digital Resources Archivist, moderated the discussion. Wendy Hagenmaier, Digital Collections Archivist, Georgia Tech Library and Sammie Morris, Director, Archives and Special Collections & University Archivist, Purdue University Libraries joined me on the panel. Between the three of us there was a nice variety of perspectives given our different experiences and interests.
I discussed my presentation in an earlier blog post – Part 1: Managing and Curating Data with Reuse in Mind. In this post I highlight key points from Wendy and Sammie’s presentations. What made an impression on me was whether and how they and their colleagues came to value each other’s complementary skills, experience, and expertise needed to manage and curate data.
Do you value complementarities?
Wendy discussed her collaboration with Lizzy Rolando, Research Data Librarian, Georgia Tech Library. She likened their experience to Susan and Sharon from Hayley Mills’ 1961 film The Parent Trap. Wendy described herself and Lizzy as “twins separated by silly professional silos”. Working together they found several areas of convergence and divergence around workflows, copyright, data integrity, security and reusability, and funding curation. Wrestling with their differences has changed Wendy’s thoughts about archival theory and practice. She has been inspired to place more emphasis on being a proactive partner during data creation; considering what a network-based, non-exclusive ownership model of archives might look like; identifying best practices for capturing dynamic cloud-based files and systems; ensuring born-digital collections are actually reusable; and creating pathways for products of reuse to be preserved and related back to the original record. She also is wondering how to leverage federal data sharing mandates to advocate for the resources required to build repositories and systems needed to provide access to born-digital archives.
Sammie discussed strategies to convince stakeholders that archivists should actively participate in data management and curation activities given their expertise in collecting, preserving, and providing access to unique collections. Like data, archival materials are under-described, often lack context, and are frequently complex, unpublished raw primary sources that present a plethora of management issues from privacy and intellectual property rights affecting access to preservation and security needs of one-of-a-kind materials. Archivists’ experiences with creating collecting policies, selecting and appraising unique collections for long-term value, negotiating privacy and copyright issues, and creating secure and trusted repositories can prove invaluable for data curation planning and decision making. A key strategy she used was articulating how archival theory and practice could be used to help institutions meet the ISO 16363 requirements for establishing trustworthy digital repositories.
I was not surprised about the amount of convincing Sammie had to do with campus stakeholders, because my research suggests the same thing when it comes to librarians. However, as an outsider looking in, I must admit I was surprised that librarians were included in the group of campus stakeholders that needed convincing. Although archivists and librarians have different areas of expertise, I thought they would have proactively joined forces to seize on the value of their complementarities. The work archivists and librarians could accomplish together, given their areas of expertise, would seem to strengthen the argument that they have major roles in planning and implementing e-Research support on campus. Wendy’s presentation reinforced this thought, but her collaboration with Lizzy was expected as part of her job responsibilities.
It made me wonder how many archivist-librarian pairings exist on campuses engaged in e-Research support. If you are actively working together in an archivist-librarian pairing please comment or respond to this blog post. Tell us what sparked your collaboration. How has it changed your thinking about your professional practice? What have been your strategies for a successful collaboration? What value has it added? Are you finding that you’re stronger together?
About Ixchel Faniel
Ixchel M. Faniel is a Research Scientist at OCLC. She is currently working on projects examining data reuse within academic communities to identify how contextual information about the data that supports reuse can best be created and preserved. She also examines librarians' early experiences designing and delivering research data services with the objective of informing practical, effective approaches for the larger academic community.
Hidden treasures are even better when they are discovered. The VIVO community wiki pages are one of those treasures. This section of the DuraSpace wiki offers the VIVO community a wealth of information, best practices and valuable resources that can assist institutions in implementing, managing and sharing VIVO data and resources. Here are highlights of what you will find in the VIVO Community pages.
Winchester, MA If you would like to keep up with DuraSpace news, events, opportunities and initiatives in Spanish please subscribe to DuraSpace Informe. This new bi-monthly newsletter will be published for the first time in the beginning of October and will feature current strategic information of interest to Spanish speaking users of DSpace, Fedora and VIVO open source projects.
From Danielle Pokusa, Thomson Reuters
DuraSpace News: FIND OUT How to Take Advantage of DSpaceDirect: Your Fast and Affordable Repository Solution
Winchester, MA Do you need a fast, efficient and affordable hosted repository solution? DSpaceDirect is a service offered by DuraSpace based on the popular DSpace open source repository software. DSpaceDirect provides access, management, and preservation of any content or file type in a hosted repository environment making digital resources discoverable by your users and easily managed by you.
Libraries: The quintessential community organization in the digital age
Given the critical role of local communities to our nation’s economic strength, ALA’s Alan Inouye, director of the Office for Information Technology Policy (OITP), is urging the presidential candidates to make communities, and specifically libraries in their role as community anchors, a central part of the candidates’ campaign conversations.
Libraries are now digitally-enabled community spaces with an array of technology training and resources, Inouye explains, noting that “Libraries, as well as other community anchor institutions, are best positioned to effect positive change towards an economy for the future that works for everyone.”
His op-ed article is published in the Digital Beat Blog of the Benton Foundation.
Take a moment to check it out!
Today I found the following resources and bookmarked them on Delicious.
- Zulip A group chat application optimized for software development teams
Attend this important new LITA webinar:
Tuesday October 6, 2015
1:30 pm – 3:00 pm Central Time
Register Online, page arranged by session date (login required)
In the wake of Edward Snowden’s revelations about NSA and FBI dragnet surveillance, Alison Macrina started the Library Freedom Project as a way to teach other librarians about surveillance, privacy rights, and technology tools that protect privacy. In this 90 minute webinar, she’ll talk about the landscape of surveillance, the work of the LFP, and some strategies you can use to protect yourself and your patrons online. Administrators, instructors, librarians and library staff of all shapes and sizes will learn about the important work of the Library Freedom Project and how they can help their patrons.
Alison’s work for the Library Freedom Project and classes for patrons including tips on teaching patron privacy classes can be found at: https://libraryfreedomproject.org/resources/onlineprivacybasics/
Alison Macrina is a librarian, privacy rights activist, and the founder and director of the Library Freedom Project, an initiative which aims to make real the promise of intellectual freedom in libraries by teaching librarians and their local communities about surveillance threats, privacy rights and law, and privacy-protecting technology tools to help safeguard digital freedoms. Alison is passionate about connecting surveillance issues to larger global struggles for justice, demystifying privacy and security technologies for ordinary users, and resisting an internet controlled by a handful of intelligence agencies and giant multinational corporations. When she’s not doing any of that, she’s reading.
Can’t make the date but still want to join in? Registered participants will have access to the recorded webinar.
- LITA Member: $45
- Non-Member: $105
- Group: $196
Register Online, page arranged by session date (login required)
Mail or fax form to ALA Registration
call 1-800-545-2433 and press 5
Questions or Comments?
For all other questions or comments related to the course, contact LITA at (312) 280-4268 or Mark Beatty, email@example.com
We are excited to announce that the first face-to-face Mashcat event in North America will be held on January 13th, 2016, at Simmons College in Boston, Massachusetts. We invite you to save the date, and we hope to have registration and a schedule for this low-cost (less than $10), 1-day event announced in November.
At present, we are accepting proposals for talks, events, panels, workshops, or other sessions for the Mashcat event. We are open to a variety of formats, with the reminder that this will be a one-day, single-track event aiming to support the cross-pollination goals of Mashcat (see more below). We are open to proposals for sessions led virtually. Please submit your proposals using this form. All proposals must be received by November 1st, 2015, midnight, and we will respond to all proposals by November 8th, 2015.
Not sure what Mashcat is? “Mashcat” was originally an event in the UK in 2012 which was aimed at bringing together people working on the IT systems side of libraries with those working in cataloguing and metadata. Three years later, Mashcat is a loose group of metadata specialists, cataloguers, developers and anyone else with an interest in how metadata in and around libraries can be created, manipulated, used and re-used by computers and software. The aim is to work together and bridge the communications gap that has sometimes gotten in the way of building the best tools we possibly can to manage library data.
Thanks for considering, and we hope to see you in January.
Library of Congress: The Signal: Stewarding Academic and Research Content: An Interview with Bradley Daigle and Chip German about APTrust
The following is a guest post by Lauren Work, digital collections librarian, Virginia Commonwealth University.
In this edition of the Insights Interview series for the NDSA Innovation Working Group, I was excited to talk with Bradley Daigle, director of digital curation services and digital strategist for special collections at the University of Virginia, and R. F. (Chip) German Jr., program director of the APTrust, about the Academic Preservation Trust.
Lauren: Tell us about the Academic Preservation Trust and how the organization addresses the needs of member institutions.
Bradley and Chip: The APTrust is a consortium of 17 members who believe that their combined expertise and experience can provide more efficient and effective means of answering the challenges of digital stewardship. The consortium’s objective is to establish new collaborative strategies to help in addressing the complex and daunting issue of preserving the digital scholarly content produced or managed by universities. The group draws upon the deep knowledge of its members to target specific solutions that are content-, technology-, and administration-focused. Each member has representatives that work locally with their organization and then bring that knowledge back to the larger collective. This dialogic approach provides the methodology by which challenges are identified, analyzed, and then addressed in the best manner possible for the consortium.
The consortium is governed by its members, and it is operated and managed by a small staff based at the University of Virginia Library. The core APTrust team organizes and deploys the resources of the group in an open, collaborative manner. We work to guide and seek guidance from the consortium itself.
Lauren: You mentioned that members work within their organization and share what they learned with the consortium. Could you talk a bit more specifically about what members are expected to contribute to APTrust? What are some of the resources from which members can benefit?
Bradley and Chip: The APTrust seeks to provide broad, scalable solutions that identify the true costs of preservation. In this manner, we hope to provide the economic and business models for digital preservation that any level of organization can adopt and deploy locally. Working together, we hope to create solutions that anyone can use.
To that end, members play a key role in seeking out both the problems and solutions to specific preservation challenges. For example, we have a current sub group of members who are focused specifically on the requirements for becoming a Trusted Digital Repository. This qualification is highly desired by some members but not necessarily everyone at the same level. Therefore, the ability to form special interest groups who can plumb the depths of a given issue and then bring a condensed version back to the collective is one of the many ways we use engagement and need to move the entire effort forward. We also have groups that are focused on our communications efforts as well as storage security. Some of these groups will disband once the initial work is concluded–others (like the TDR) represent the ongoing need for focused attention.
Lauren: You recently confirmed your mission statement, and the word “innovative” is used. How do you define or hope to define APTrust as an innovator in the field of digital preservation?
Bradley and Chip: The APTrust sees innovation as an ongoing goal. Preservation issues are not easily solved and once solutions are determined the problem set can mutate. Innovation means that we are striving for the best solution we can identify at the time and continue to identify and adapt and solve. Innovation is ongoing and the product of a great deal of collaborative effort on the part of everyone in the APTrust. We never see solutions as final but rather structures that need constant repair.
Lauren: Digital preservation is a daunting topic for many organizations, and the effort sometimes faces a “Why try?” stance. What advice do you have for those attempting to form digital preservation guidelines for their own organizations?
Bradley and Chip: The stewardship of our digital heritage is indeed an overwhelming and daunting task. It is always a matter of perspective–the best being an acknowledgement that we will never be able to accomplish it in its entirety. As with the physical realm, we can only hope to do our best at any given time. Digital preservation requires perspective and humility.
As with most efforts at this scale, often the most effective approach is to define the problem and then create a plan that speaks to what is possible for your organization. Define the scope, choose what is important and start in small but achievable chunks. As with collections, one must define the scope and not try to collect everything. Specialize if it makes sense for your organization–content type, format type, level of preservation. We have found it most useful to create levels of preservation–mapping to what is achievable by your organization and use that as a guide. Start somewhere and you will find you can make a difference, no matter how small.
Lauren: Technology changes quickly, and keeping up with evolving hardware, software and formats is an issue. As APTrust accepts all types of formats from its institutions, what advice do you have for librarians and archivists who need to make the preservation case for funding the technologies and infrastructure to support digital preservation in their organizations?
Bradley and Chip: This goes back to defining your organization’s levels of preservation. For example, the lowest level of preservation may simply be a piece of metadata that states something existed at one time but is no longer extant. The highest level may be the management of those digital files in an emulated environment. The crux of sustainability lies in overlapping two mutable matrices: a map of what preservation levels are meant to do overlaid on a technical implementation matrix that defines how that level can be accomplished. This way you can adapt to new trends in technology. The former matrix, that of collecting or preservation levels, should change very little over time. The technical implementation, however, should adapt to evolving trends.
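As a rough illustration of the two overlaid matrices described in this answer, here is a minimal Python sketch. Only the lowest level (a metadata record that something once existed) and the highest level (management in an emulated environment) come from the interview; the middle level, every implementation value, and all names such as `PreservationLevel`, `plan_for`, and `adopt_technology` are hypothetical and are not APTrust practice.

```python
from enum import IntEnum


class PreservationLevel(IntEnum):
    """Illustrative levels; only the lowest and highest come from the interview."""
    RECORD_ONLY = 1  # metadata stating that something once existed
    BIT_LEVEL = 2    # hypothetical middle level, added for illustration
    EMULATION = 3    # files managed in an emulated environment


# Matrix 1: what each level is meant to do. Expected to change very little.
LEVEL_INTENT = {
    PreservationLevel.RECORD_ONLY: "record that the item existed",
    PreservationLevel.BIT_LEVEL: "keep the original bits intact",
    PreservationLevel.EMULATION: "keep the item usable in its original environment",
}

# Matrix 2: how each level is carried out today. Expected to change with technology.
# Every value here is a placeholder, not an APTrust practice.
level_implementation = {
    PreservationLevel.RECORD_ONLY: "catalog record",
    PreservationLevel.BIT_LEVEL: "checksummed copies in two locations",
    PreservationLevel.EMULATION: "disk image plus emulator",
}


def plan_for(level):
    """Overlay the two matrices for one level."""
    return {
        "level": level.name,
        "intent": LEVEL_INTENT[level],
        "implementation": level_implementation[level],
    }


def adopt_technology(level, new_implementation):
    """Swap the technical approach for a level without touching its intent."""
    level_implementation[level] = new_implementation
```

Calling `adopt_technology(PreservationLevel.BIT_LEVEL, "checksummed copies in three locations")` changes what `plan_for` reports as the implementation while the stated intent stays the same, which is the separation the answer argues for.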
Lauren: Digital preservation benefits are not immediate, and it can be difficult to demonstrate value, even for the immediate future. How did APTrust articulate the value of digital preservation and make the case for allocating current resources to reap long-term benefits?
Bradley and Chip: The APTrust consortium benefits from a shared belief that digital preservation is not a luxury service. We represent organizations whose mission it is to steward our cultural record–in whatever form it takes. The old adage of there being only two kinds of people: those who have lost data – and those who will lose data – applies here. Most organizations have taken on digital preservation in one manner or another. APTrust offers the ability to provide scalable services at cost–with the added benefit of collective problem solving. Certainly there are preservation solutions out there for any level of organization. However, in taking this singular approach, you are also taking the full brunt of solving each preservation problem on your own as it arises.
We believe that a consortial approach leverages the strengths of all its partners which leads to quicker, more efficient (read: cheaper) solutions. Preservation isn’t solved in a day–it is solved in many ways every day. The more people you have scanning the landscape for challenges and solutions the more effective and scalable your solution.
Lauren: There are many advantages to the consortium model for digital preservation. What advantages do you think individual institutions or smaller consortia might have in their approach to digital preservation?
Bradley and Chip: As we mentioned, the advantages to the “many mind” approach can have dramatic benefits. The ability of a group to identify an arising challenge, task a small group to investigate that challenge, and then bring that knowledge back to the collective has been proven repeatedly. Given the scale, complexity, and scope of digital preservation, doing this at any level is critical to moving us all forward in solving these issues.
Lauren: What do you see as the greatest challenge for digital preservation?
Bradley and Chip: The main challenge for preservation has always been the same: it is infrastructure and infrastructure is not sexy. If you are doing your job and doing it well, no one notices. People only notice when you fail. This fact is inculcated in our society. Witness all the home renovation shows. People don’t care about knob and tube wiring – until they have to replace it. No one wants to pay for that work, they would rather have that brushed nickel six burner stove that everyone will notice and love. That is the challenge of preservation – making the case for the cost of this endeavor is difficult because it is so resource intensive. However, the cost of failing is much higher. It is already likely that we will have a gap in our digital cultural heritage as we play catch up to operationalizing enterprise digital preservation. Let’s just hope it is not too late.
It's that time again. Let's wag the Long Tail of Islandora and have a look at some of the really great work being done out in the Islandora community and review some modules that might solve a problem you're having:
Streaming Media Solution Pack
Created by UPEI's Rosie LeFaive and born from the Bowing Down Home project, this module allows you to create and manage Islandora objects representing externally hosted streaming resources, which can be catalogued and displayed via an Islandora instance. You can also store a copy of the file as an Islandora object.
EAD Solution Pack
Islandora Context
From Simon Fraser University's Mark Jordan (Chairman of the Islandora Foundation!), this module provides a set of Context "conditions" and "reactions" for Islandora objects. Think of this module as an "if-this-then-that" configurator for Islandora repositories.
Islandora Custom Solr
A neat little tool from Jared Whiklo at the University of Manitoba, this module will replace Sparql queries with Solr queries where possible for speed improvements.
Islandora RML
A great module from Frits Van Latum of Delft University, Islandora RML gets triples from XML datastreams (e.g., MODS) and stores these triples in RELS-EXT. Triples consist of the URI of the fedoraObject as subject, a generated predicate and a generated object.
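To make the triple shape in the Islandora RML description concrete, here is a small Python sketch of the general idea. It is not the module's code: the predicate namespace, the sample MODS record, the PID `demo:1`, and the `mods_to_triples` helper are all assumptions for illustration, and the rule used to generate predicates (namespace plus element name) is invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace for generated predicates; not taken from the module.
PREDICATE_NS = "http://example.org/generated#"

# Made-up MODS record used only for this sketch.
SAMPLE_MODS = """<mods xmlns="http://www.loc.gov/mods/v3">
  <titleInfo><title>Sample title</title></titleInfo>
  <genre>sound recording</genre>
</mods>"""


def mods_to_triples(pid, mods_xml):
    """Return (subject, predicate, object) tuples for each leaf element with text."""
    # Subject: the object's URI, assuming the info:fedora/ URI form.
    subject = "info:fedora/" + pid
    root = ET.fromstring(mods_xml)
    triples = []
    for element in root.iter():
        text = (element.text or "").strip()
        # Skip container elements; only leaves carrying text become triples.
        if len(element) > 0 or not text:
            continue
        # Strip the "{namespace}" prefix ElementTree puts on tag names.
        local_name = element.tag.split("}", 1)[-1]
        triples.append((subject, PREDICATE_NS + local_name, text))
    return triples


for triple in mods_to_triples("demo:1", SAMPLE_MODS):
    print(triple)
```

Each tuple has the object's URI as subject, a predicate generated from the element name, and the element text as object. Writing the triples into a RELS-EXT datastream is left out of the sketch.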
I manage digital services and resources at a small academic library with minimal financial and human resources available. For almost a year, I served as solo librarian for fixing and optimizing the library website, library services platform, electronic resources, workflows, documentation, and other elements of technology management vital to back-end operations and front-end services. Coping with practical limitations and a vast array of responsibilities, I resorted to triage. In triage management, the primary consideration is return on investment (ROI) – how stakeholder benefits measure against time and resources expended to realize those benefits.
Condition Black: The technology must be replaced or phased out because it is dysfunctional and impossible to fix. Into this category fell our website, built with the clunky and unusable Microsoft SharePoint; our laptops running Windows XP and too old to upgrade to a more current operating system; and our technology lending service, for which we had no funds to upgrade the dated technologies on offer. Down the road we might write this last item into the budget or solicit donations from the community, but at the time, the patient was DOA.
Condition Blue: The technology is current, optimal for user needs, and can be left essentially to run itself while library technology managers focus on more urgent priorities. Into this category fell the recently upgraded hardware at one of our campus libraries, as well as LibCal, a study room booking system with faultless performance.
Condition Green: The situation requires monitoring but not immediate intervention, not until higher-order priorities have been addressed. This was the situation with OCLC WorldShare Management Services (WMS). This LSP offers only limited functionality—scandalously to my mind, subscribers still have to pull many reports via FTP. But the platform is cheap and handles the core functions of circulation, cataloging, and interlibrary loan perfectly. For us, WMS was low-priority.
Condition Yellow: The situation needs to be salvaged and the system sustained, but it is still not quite the top priority. In this category fell the OCLC knowledge base and WorldCat discovery layer, which in Hodges University’s instance experiences incessant link resolution issues and requires constant monitoring and frequent repair tickets to OCLC. A screwy discovery layer impairs users’ ability to access resources and creates a frustrating user experience. But I decided not to prioritize knowledge base optimization because the methodology was already in place for triaging the crisis. For years my colleagues had been steering students directly toward subject databases in lieu of WorldCat.
Condition Red: The system is in dire need of improvement – this is Priority 1. Into this category fell the library’s content management system, LibGuides. My first priority on taking over web services was to upgrade LibGuides to Version 2, which offers responsive design and superior features, and then to integrate the entire library website within this new-and-improved CMS. I would also argue that internal customer service falls into this category – staff must have documentation, training, and other support to do their work well before they can exceed expectations for external customer service. These are the critical priorities.
A few additional points.
1. Library technologists must revisit triage placements periodically and reassess as needed. Movement is the goal – from conditions Red to Blue.
2. Library technologists must eschew using triage as a stopgap measure. Triage is vital to long-range planning in terms of budget allocation, project management, and other responsibilities. Triage is planning.
3. Where each priority is placed in a triage system is contingent on local needs and circumstances. There is no one-size-fits-all generalization.
How do you use triage at your library? Is it a useful approach?
A few months ago I was following a conversation on Twitter that got me thinking about how much bit-for-bit duplication there was in our preservation repository and how much space that duplication amounted to.
I let this curiosity sit for a few months and finally pulled the data from the repository in order to get some answers.
Getting the data
Each of the digital objects in our repository has a METS record that conforms to the UNTL-AIP-METS Profile registered with the Library of Congress. One of the features of this METS profile (like many others) is that these records make use of the fileStruct section, and for each file in a digital object the following pieces of information exist:

| Field | Example Value |
| --- | --- |
| FileName | ark:/67531/metadc419149 |
| CHECKSUM | bc95eea528fa4f87b77e04271ba5e2d8 |
| CHECKSUMTYPE | MD5 |
| USE | 0 |
| MIMETYPE | image/tiff |
| CREATED | 2014-11-17T22:58:37Z |
| SIZE | 60096742 |
| FILENAME | file://data/01_tif/2012.201.B0389.0516.TIF |
| OWNERID | urn:uuid:295e97ff-0679-4561-a60d-62def4e2e88a |
| ADMID | amd_00013 amd_00015 amd_00014 |
| ID | file_00005 |

By extracting this information for each file in each of the digital objects, I would be able to get at the initial question I had about duplication at the file level and how much space it accounted for in the repository.
Extracted Data
At the time of writing of this post the Coda Repository that acts as the preservation repository for the UNT Libraries Digital Collections contains 1.3 million digital objects that occupy 285TB of primary data. These 1.3 million digital objects consist of 151 million files that have fixity values in the repository.
The dataset that I extracted has 1,123,228 digital objects because it was extracted a few months ago. Another piece of information that is helpful to know is that the numbers we report for “files managed by Coda” (the 151 million mentioned above) include both the primary files ingested into the repository and the metadata files added to the Archival Information Packages as they are ingested. The analysis in this post deals only with the primary data files deposited with the initial SIP and does not include the extra metadata files. This dataset contains information about 60,164,181 files in the repository.
Analyzing the Data
Once I acquired the METS records from the Coda repository, I wrote a very simple script to extract information from the File section of the METS records and format that data into a tab-separated dataset that I could use for subsequent analysis work. Because some of the data is duplicated on each row to make processing easier, this resulted in a tab-separated file that is just over 9 GB in size (1.9 GB compressed) and contains 60,164,181 rows, one for each file.
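The script itself is not shown in this post, so the following is only a minimal sketch of what such an extraction could look like, assuming the standard METS and XLink namespaces and that each file entry carries its path in an FLocat child. The function names `mets_rows` and `write_tsv` and the inline sample record are hypothetical; the attribute values are taken from the example above.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Standard METS and XLink namespace URIs.
METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"

# Attributes copied from each <file> element, in output column order.
FIELDS = ["CHECKSUM", "CHECKSUMTYPE", "USE", "MIMETYPE", "CREATED", "SIZE"]

# Hypothetical, heavily trimmed METS record used only for illustration.
SAMPLE_METS = """<mets xmlns="http://www.loc.gov/METS/"
      xmlns:xlink="http://www.w3.org/1999/xlink">
  <fileSec><fileGrp>
    <file ID="file_00005" CHECKSUM="bc95eea528fa4f87b77e04271ba5e2d8"
          CHECKSUMTYPE="MD5" USE="0" MIMETYPE="image/tiff"
          CREATED="2014-11-17T22:58:37Z" SIZE="60096742">
      <FLocat LOCTYPE="URL"
              xlink:href="file://data/01_tif/2012.201.B0389.0516.TIF"/>
    </file>
  </fileGrp></fileSec>
</mets>"""


def mets_rows(mets_name, mets_xml):
    """Yield one list of column values per <file> element in a METS document."""
    root = ET.fromstring(mets_xml)
    for file_el in root.iter(f"{{{METS_NS}}}file"):
        flocat = file_el.find(f"{{{METS_NS}}}FLocat")
        href = flocat.get(f"{{{XLINK_NS}}}href", "") if flocat is not None else ""
        yield [mets_name] + [file_el.get(name, "") for name in FIELDS] + [href]


def write_tsv(rows, out):
    """Write rows to a file-like object as tab-separated lines."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)


# Demonstration on the in-memory sample instead of the real repository.
buffer = io.StringIO()
write_tsv(mets_rows("metadc419149.aip.mets.xml", SAMPLE_METS), buffer)
print(buffer.getvalue(), end="")
```

Repeating the METS file name on every row is what makes the output larger than strictly necessary, but it keeps each line self-describing for later command-line processing.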
Here is a representation as a table for a few rows of data.

| METS File | CHECKSUM | CHECKSUMTYPE | USE | MIMETYPE | CREATION | SIZE | FILENAME |
| --- | --- | --- | --- | --- | --- | --- | --- |
| metadc419149.aip.mets.xml | bc95eea528fa4f87b77e04271ba5e2d8 | md5 | 0 | image/tiff | 2014-11-17T22:58:37Z | 60096742 | file://data/01_tif/2012.201.B0389.0516.TIF |
| metadc419149.aip.mets.xml | 980a81b95ed4f2cda97a82b1e4228b92 | md5 | 0 | text/plain | 2014-11-17T22:58:37Z | 557 | file://data/02_json/2012.201.B0389.0516.json |
| metadc419544.aip.mets.xml | 0fba542ac5c02e1dc2cba9c7cc436221 | md5 | 0 | image/tiff | 2014-11-17T23:20:57Z | 51603206 | file://data/01_tif/2012.201.B0391.0539.TIF |
| metadc419544.aip.mets.xml | 0420bff971b151442fa61b4eea9135dd | md5 | 0 | text/plain | 2014-11-17T23:20:57Z | 372 | file://data/02_json/2012.201.B0391.0539.json |
| metadc419034.aip.mets.xml | df33c7e9d78177340e0661fb05848cc4 | md5 | 0 | image/tiff | 2014-11-17T23:42:16Z | 57983974 | file://data/01_tif/2012.201.B0394.0493.TIF |
| metadc419034.aip.mets.xml | 334827a9c32ea591f8633406188c9283 | md5 | 0 | text/plain | 2014-11-17T23:42:16Z | 579 | file://data/02_json/2012.201.B0394.0493.json |
| metadc419479.aip.mets.xml | 4c93737d6d8a44188b5cd656d36f1e3d | md5 | 0 | image/tiff | 2014-11-17T23:01:15Z | 51695974 | file://data/01_tif/2012.201.B0389.0678.TIF |
| metadc419479.aip.mets.xml | bcba5d94f98bf48181e2159b30a0df4f | md5 | 0 | text/plain | 2014-11-17T23:01:15Z | 486 | file://data/02_json/2012.201.B0389.0678.json |
| metadc419495.aip.mets.xml | e2f4d1d7d4cd851fea817879515b7437 | md5 | 0 | image/tiff | 2014-11-17T22:30:10Z | 55780430 | file://data/01_tif/2012.201.B0387.0179.TIF |
| metadc419495.aip.mets.xml | 73f72045269c30ce3f5f73f2b60bf6d5 | md5 | 0 | text/plain | 2014-11-17T22:30:10Z | 499 | file://data/02_json/2012.201.B0387.0179.json |
My first step was to extract the column that stored the MD5 fixity value, sort that column, and then count the instances of each fixity value in the dataset. The command ends up looking like this:

    cut -f 2 mets_dataset.tsv | sort | uniq -c | sort -nr | head
This worked pretty well and resulted in the MD5 values that occurred the most. This represents the duplication at the file level in the repository.

| Count | Fixity Value |
| --- | --- |
| 72,906 | 68b329da9893e34099c7d8ad5cb9c940 |
| 29,602 | d41d8cd98f00b204e9800998ecf8427e |
| 3,363 | 3c80c3bf89652f466c5339b98856fa9f |
| 2,447 | 45d36f6fae3461167ddef76ecf304035 |
| 2,441 | 388e2017ac36ad7fd20bc23249de5560 |
| 2,237 | e1c06d85ae7b8b032bef47e42e4c08f9 |
| 2,183 | 6d5f66a48b5ccac59f35ab3939d539a3 |
| 1,905 | bb7559712e45fa9872695168ee010043 |
| 1,859 | 81051bcc2cf1bedf378224b0a93e2877 |
| 1,706 | eeb3211246927547a4f8b50a76b31864 |
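For readers who would rather do the same count in a script, here is a rough Python equivalent of that pipeline. It is only a sketch: the helper name `top_checksums` and the tiny in-memory sample are made up for illustration, and the checksum is assumed to sit in the second tab-separated column, as in the table of rows above.

```python
from collections import Counter


def top_checksums(lines, n=10, column=1):
    """Count the values in one tab-separated column and return the n most common."""
    counts = Counter(line.rstrip("\n").split("\t")[column] for line in lines)
    return counts.most_common(n)


# Hypothetical three-row sample; the real input would be mets_dataset.tsv.
sample_lines = [
    "a.aip.mets.xml\t68b329da9893e34099c7d8ad5cb9c940\tmd5\n",
    "b.aip.mets.xml\t68b329da9893e34099c7d8ad5cb9c940\tmd5\n",
    "c.aip.mets.xml\td41d8cd98f00b204e9800998ecf8427e\tmd5\n",
]
top = top_checksums(sample_lines)
for checksum, count in top:
    print(count, checksum)
```

Against the real dataset, passing an open file handle (for example `open("mets_dataset.tsv")`) in place of `sample_lines` would stream the rows one at a time, though the counter still has to hold every distinct checksum in memory, which the sort-based shell pipeline avoids.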
There are a few things to note here. First, because of the way that we version items in the repository, there is going to be some duplication. If you are interested in understanding the versioning process we use for our system and the overhead that occurs because of this strategy, you can take a look at the whitepaper we wrote in 2014 about the subject.
Phillips, Mark Edward & Ko, Lauren. Understanding Repository Growth at the University of North Texas: A Case Study. UNT Digital Library. http://digital.library.unt.edu/ark:/67531/metadc306052/. Accessed September 26, 2015.
To get a better idea of the kinds of files that are duplicated in the repository, the following table shows fields for the ten most repeated files.

| Count | MD5 | Bytes | Mimetype | Common File Extension |
| --- | --- | --- | --- | --- |
| 72,906 | 68b329da9893e34099c7d8ad5cb9c940 | 1 | text/plain | txt |
| 29,602 | d41d8cd98f00b204e9800998ecf8427e | 0 | application/x-empty | txt |
| 3,363 | 3c80c3bf89652f466c5339b98856fa9f | 20 | text/plain | txt |
| 2,447 | 45d36f6fae3461167ddef76ecf304035 | 195 | application/xml | xml |
| 2,441 | 388e2017ac36ad7fd20bc23249de5560 | 21 | text/plain | txt |
| 2,237 | e1c06d85ae7b8b032bef47e42e4c08f9 | 2 | text/plain | txt |
| 2,183 | 6d5f66a48b5ccac59f35ab3939d539a3 | 3 | text/plain | txt |
| 1,905 | bb7559712e45fa9872695168ee010043 | 61,192 | image/jpeg | jpg |
| 1,859 | 81051bcc2cf1bedf378224b0a93e2877 | 2 | text/plain | txt |
| 1,706 | eeb3211246927547a4f8b50a76b31864 | 200 | application/xml | xml |

You can see that most of the files that are duplicated are very small: 0, 1, 2, and 3 bytes. The largest was a JPEG that appeared 1,905 times in the dataset at 61,192 bytes per copy. The file types in these top examples are txt, xml, and jpg.
Overall we see that for the 60,164,181 rows in the dataset, there are 59,177,155 unique md5 hashes. This means that 98% of the files in the repository are in fact unique. Of the 987,026 rows in the dataset that are duplicates of other fixity values, there are 666,259 unique md5 hashes.
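The figures above can be derived from a per-checksum count. The sketch below shows one way to do that, assuming the checksums are available as a simple iterable; the function name `duplication_summary` and its dictionary keys are hypothetical, and the toy input stands in for the real 60 million rows.

```python
from collections import Counter


def duplication_summary(checksums):
    """Summarize how many rows are unique versus repeats of an earlier checksum."""
    counts = Counter(checksums)
    rows = sum(counts.values())
    unique = len(counts)
    return {
        "rows": rows,
        "unique_hashes": unique,
        # Rows beyond the first occurrence of each checksum.
        "duplicate_rows": rows - unique,
        # Distinct checksums that occur more than once.
        "hashes_with_duplicates": sum(1 for c in counts.values() if c > 1),
        "percent_unique": 100.0 * unique / rows if rows else 0.0,
    }


# Toy input: "a" appears three times, "c" twice, "b" once.
summary = duplication_summary(["a", "a", "a", "b", "c", "c"])
print(summary)

# Cross-check of the post's own numbers: total rows minus unique hashes.
print(60_164_181 - 59_177_155)
```

The last line reproduces the 987,026 duplicate rows quoted above, since every row beyond the first copy of a given checksum counts as a duplicate.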
So now we know that there is some duplication in the repository at the file level. Next I wanted to know what kind of effect this has on the storage allocated. I took the 666,259 fixity values that had duplicates and went back to pull the number of bytes for those files. I calculated the storage overhead for each of these fixity values as bytes × (instances − 1), subtracting one instance to remove the size of the initial copy and show only the duplication overhead.
Here is the table for the ten most duplicated files to show that calculation.

| Count | MD5 | Bytes per File | Duplicate File Overhead (Bytes) |
| --- | --- | --- | --- |
| 72,906 | 68b329da9893e34099c7d8ad5cb9c940 | 1 | 72,905 |
| 29,602 | d41d8cd98f00b204e9800998ecf8427e | 0 | 0 |
| 3,363 | 3c80c3bf89652f466c5339b98856fa9f | 20 | 67,240 |
| 2,447 | 45d36f6fae3461167ddef76ecf304035 | 195 | 476,970 |
| 2,441 | 388e2017ac36ad7fd20bc23249de5560 | 21 | 51,240 |
| 2,237 | e1c06d85ae7b8b032bef47e42e4c08f9 | 2 | 4,472 |
| 2,183 | 6d5f66a48b5ccac59f35ab3939d539a3 | 3 | 6,546 |
| 1,905 | bb7559712e45fa9872695168ee010043 | 61,192 | 116,509,568 |
| 1,859 | 81051bcc2cf1bedf378224b0a93e2877 | 2 | 3,716 |
| 1,706 | eeb3211246927547a4f8b50a76b31864 | 200 | 341,000 |
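As a quick check of the overhead calculation, the short sketch below recomputes the last column from the count and size columns of the table. The function name `duplication_overhead` is a hypothetical label for the "bytes times one fewer than the number of instances" rule described above; the input pairs are copied from the table.

```python
def duplication_overhead(count, size_bytes):
    """Bytes consumed by every copy of a file beyond the first one."""
    return size_bytes * (count - 1)


# (count, bytes per file) for the ten most duplicated files in the table.
TOP_TEN = [
    (72906, 1),
    (29602, 0),
    (3363, 20),
    (2447, 195),
    (2441, 21),
    (2237, 2),
    (2183, 3),
    (1905, 61192),
    (1859, 2),
    (1706, 200),
]

overheads = [duplication_overhead(count, size) for count, size in TOP_TEN]
for (count, size), overhead in zip(TOP_TEN, overheads):
    print(f"{count:>6} copies x {size:>6} bytes -> {overhead:>11,} bytes of overhead")
```

Note how the zero-byte file contributes nothing despite its 29,602 copies, while the 61,192-byte JPEG dominates these ten rows with far fewer repeats: overhead is driven by file size much more than by repeat count.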
After taking the overhead for each row of duplicates, I ended up with 2,746,536,537,700 bytes or 2.75 TB of overhead because of file duplication in the Coda repository.
Conclusion
I don’t think there is much surprise that there is going to be duplication of files in a repository. The most common file we have that is duplicated is a txt file with just one byte.
What I will do with this information I don’t really know. I think that the overall duplication across digital objects is a feature and not a bug. I like the idea of more redundancy when reasonable. It should be noted that this redundancy is often over files that, from what I can tell, carry very little information (e.g. tiff images of blank pages, or txt files with 0, 1, or 2 bytes of data).
I do know that this kind of data can be helpful when talking with vendors that integrate “de-duplication services” into their storage arrays, though that de-duplication is often at a smaller unit than the entire file. It might be interesting to see what effect different de-duplication methodologies and algorithms would have on a large collection of digital content, so if anyone has some interest and algorithms I’d be game to give it a try.
That’s all for this post, but I have a feeling I might be dusting off this dataset in the future to take a look at some other information such as filesizes and mimetype information that we have in our repository.
Organizational culture is a very real and a very powerful force in every organization. I have worked in a variety of different organizations and each had its own rituals, norms, values, and assumptions that influenced the way people worked together, shared information, and got things done. Culture is this weird, powerful, unspoken thing that both impacts and is impacted by the people within it. While organizational culture can change over time, it is usually because of major staff turnovers as culture is notoriously difficult to change.
Organizational culture can be positive and healthy or seriously maladaptive, but I think most cultures have a little from column A and a little from column B. Healthy cultures incorporate and adapt to new people and ideas. Maladaptive cultures are notoriously difficult for newcomers to feel welcome in and tend to force them to conform or leave. It’s in organizations with maladaptive cultures where I think the issue of cultural fit can be most problematic.
I know what it feels like to work at a place where you don’t fit. You feel like a second class citizen in just about every interaction. You go from participating in meetings to avoiding speaking at all costs. You feel like your perspective is not taken seriously and the projects you’re involved in are marginalized. There were a few of us at that job to whom it was made painfully clear that we were the odd men out. These were not slackers who did a crappy job, but folks who were passionate about and devoted to their work. Not fitting was torture for my psyche and made me question whether there was something inherently wrong with me.
Based on my experience, you might think I’d be suggesting that people carefully screen their applicants for “fit.” That couldn’t be further from the truth. Screening for cultural fit tends to lead to monocultures that don’t embrace diversity of any kind — racial, gender, perspective, experience, etc. Monocultures are toxic and have difficulty adapting to change. Hiring people in your own image leads to an organization that can’t see clearly beyond its navel. As expressed in the article “Is Cultural Fit a Qualification for Hiring or a Disguise for Bias?” in Knowledge @ Wharton —
Diversity in the workplace has long been valued as a way to introduce new ideas, but researchers have found other reasons for cultivating heterogeneity. Information was processed more carefully in heterogeneous groups than homogenous groups, according to “Is the Pain Worth the Gain? The Advantages and Liabilities of Agreeing With Socially Distinct Newcomers,” by Katherine W. Phillips, Katie A. Liljenquist and Margaret A. Neale, published in Personality and Social Psychology Bulletin. Social awkwardness creates tension, and this is beneficial, the study found. “The mere presence of socially distinct newcomers and the social concerns their presence stimulates among old-timers motivates behavior that can convert affective pains into cognitive gains” — or, in other words, better group problem solving.
So perhaps bringing in people who aren’t such a perfect fit, and who maybe even challenge the current structure a bit, is very good for the organization. Any time I have worked with someone who has a very different perspective and lived experience than I have, I have learned so much. When we hired an instructional designer at the PSU Library who came from outside of libraries, I found that it was much more difficult to get on the same page, but the ideas he brought to our work more than compensated for any difficulties I had as a manager. He allowed us to see beyond our myopic librarian view. I think hiring people with different cultural, racial, gender, socioeconomic, etc. backgrounds provides similar benefits to the organization.
Whether it is conscious or unconscious, hiring people who are “like you” is bias, and it tends to result in organizations that are less diverse; not only in terms of perspectives, but in terms of race/gender/religion/etc. When you’re on a hiring committee, how often do you find yourself judging candidates based on qualities you value in a colleague rather than the stated qualifications? It probably happens more than we’d all like to admit.
It’s easy to fall into the trap of considering fit without even thinking about it. I remember when I was on my first hiring committee, once we’d weeded out those candidates who didn’t meet the minimum qualifications, I felt myself basing my evaluation of the rest on whether or not they had the traits I value in a colleague. The person we hired ended up becoming a good friend and while he did a fantastic job in his role, part of me wishes I had put my personal biases aside when making that decision. I may still have championed him, but I would have done it for the right reasons.
One thing I feel strongly that we should hire for is shared values. It is critical that the person one hires doesn’t hold values antithetical to the work of the organization. I don’t care anymore if a candidate seems like they could be a friend, but I do care if they evidence and support the goals and values of my library and community college. Just having the required qualifications isn’t enough; being a community college librarian isn’t for everyone.
Unfortunately, judging from this New York Times article, “Guess Who Doesn’t Fit In at Work”, and from my own experiences, people are judged by much more than shared values, which unintentionally biases people doing hiring against folks who have different lived experiences and interests. This is discrimination, plain and simple. When I was looking for an image to use for this post, I found this blog post about how people doing hiring should look at candidates’ social media profiles to scan for cultural fit. That we should look at what restaurants candidates visit and what things they favorite on Twitter frankly scares the crap out of me. Because in doing that, you’re saying that people with different views or outside-of-work activities are not welcome in your organization.
What we need is to embrace diversity in its many forms and value contributions from everyone, but that is easier said than done. I like the suggestions the New York Times article has regarding hiring for fit without bias:
First, communicate a clear and consistent idea of what the organization’s culture is (and is not) to potential employees. Second, make sure the definition of cultural fit is closely aligned with business goals. Ideally, fit should be based on data-driven analysis of what types of values, traits and behaviors actually predict on-the-job success. Third, create formal procedures like checklists for measuring fit, so that assessment is not left up to the eyes (and extracurriculars) of the beholder.
Finally, consider putting concrete limits on how much fit can sway hiring. Many organizations tell interviewers what to look for but provide little guidance about how to weigh these different qualities. Left to their own devices, interviewers often define merit in their own image.
Clearly, the more structured the process and the less leeway there is for making decisions based on aspects of a candidate’s personality, interests, and background, the less likely the bias.
And what of those cultures that may hire for diversity but then treat people with different ideas and experiences like pariahs? Unfortunately, I get the sense that changing culture is nearly impossible without a decent amount of staff turnover. I witnessed a culture shift in my first library job, but it was because my boss had hired over half the staff over a period of about six years and was able to cultivate the right mix of values and diverse characters. I’ve also seen new administrators come into organizations with really strong, entrenched cultures and fail spectacularly at creating any kind of culture change. Fixing the problem of bias in hiring is only half the problem. We also need to embrace diversity in our organizations so that people of color or people with divergent ideas feel valued by the organization.
I feel very lucky that I work at a library that values diversity and diverse perspectives. We have a group of librarians who have different passions, different viewpoints, and very different personalities. Yet I don’t see anyone marginalizing anyone else. I don’t see anyone whose opinions are taken less seriously than anyone else’s. I don’t see people playing favorites or being cliquish. What I see is a diverse group of people who value each other’s opinions and also value consensus-building. We don’t always come to complete agreement, but we accept and respect the way things go. We have a functional adhocracy, where we feel empowered to act and where we alternate taking and sharing leadership roles organically. I feel like everyone is valued for what they bring to the group and everyone brings something very different. Even after one year, it still feels like heaven to me and it’s certainly not because everyone is like me.
We have a long way to go in building diverse libraries, but becoming keenly aware of how our unconscious preferences in hiring and our organizational cultures can help or harm diversity is a good step in the right direction.
Lately I’ve been looking back through the past of the Digital Library Production Service (DLPS) -- in fact, all the way back to the time before DLPS, when we were the Humanities Text Initiative -- to see what, if anything, we’ve learned that will help us as we move forward into a world of Hydra, ArchivesSpace, and collaborative development of repository and digital resource creation tools.
DuraSpace News: Telling DSpace Stories at the International Livestock Research Institute (ILRI) with Alan Orth
“Telling DSpace Stories” is a community-led initiative aimed at introducing project leaders and their ideas to one another while providing details about DSpace implementations for the community and beyond. The following interview includes personal observations that may not represent the opinions and views of the International Livestock Research Institute (ILRI) or the DSpace Project.