
FOSS4Lib Recent Releases: Senayan Library Management System (SLiMS) - 8 (Akasia)

planet code4lib - Fri, 2016-01-08 10:16
Package: Senayan Library Management System (SLiMS)
Release Date: Tuesday, December 1, 2015

Last updated January 8, 2016. Created by gurujim on January 8, 2016.

Additional features include:
* New OPAC template, new Admin template
* System environment display for troubleshooting
* Partial RDA implementation (transitional towards full implementation)
* Inbuilt staff-patron chat system
* Generation of citations from bibliographic entries, in a variety of common formats, using a template model which can be expanded to include other styles

Karen G. Schneider: Speaking about writing: I nominate me

planet code4lib - Fri, 2016-01-08 04:27

I have been immersed in a wonderful ordinariness: completing my first full year as dean, moving my doctoral work toward the proposal-almost-ready stage, and observing the calendar in my personal life. In November I pulled Piney III, our Christmas tree, out of his box in the garage, and he is staying up until next weekend. We missed him last year, so he gets to spend a little more time with us this season.

Meanwhile, I spent a few spare moments this week trying to wrap my head around a LibraryLand kerfuffle. An article was published in American Libraries that, according to the authors, was edited after the fact to include comments favorable to a vendor. I heard back-alley comments that this wasn’t the full story, that the authors hadn’t followed the scope (which had directed them to include this perspective), and that it was therefore really their fault for not following direction and then complaining, etc. And on the social networks, everyone got their knickers in a twist and then, as happens, moved on. But as someone with a long publishing history, this has lingered with me (and not only because someone had to mansplain to me, "Have you read the article?" Yes, I had read the article…).

Here’s my offer. I have been fairly low-key in our profession for a couple of years, while I deal with a huge new job, a doctoral program, family medical crises, household moves, and so on. My term on ALA Council ended last summer, and while I do plan to get involved in ALA governance again, it’s not immediate.

But once upon a time, I made a great pitch to American Libraries. I said, you should have a column about the Internet, and I should write it. I had to walk around the block four times before I screwed up enough courage to go into 50 East Huron and make that pitch (and I felt as if I had an avocado in my throat the whole time), but thus the Internet Librarian column was born, and lo it continues on to this day, two decades later.

My pitch these days is that American Libraries steal a page from the New York Times and appoint a Public Editor or, if you prefer, an Ombudsman (Ombudswimmin?), and that person should be me. Why me? Because I have a strong appreciation for all aspects of publishing. Because I’ve been an author and a vendor. Because I may be an iconoclast, but most people see me as fair. Because a situation like this needs adjudication before it becomes fodder for Twitter or Facebook. Because at times articles might even need discussion when no one is discussing them. Because I came up with the idea, and admit it, it’s a really good one.

A long time ago, when I was active in Democratic Party politics in Manhattan, a politician in NY made himself locally famous for saying of another pol, “He is not for sale… but he can be rented.” One thing about me, despite two books, over 100 articles, being a Pushcart nominee, being anthologized, etc.: I am not for sale or for rent. That has at times limited my ascendancy in certain circles, but it makes me perfect for this role.

If you’re on the board of American Libraries, or you know someone who is, give this some thought. We all have a place in the universe. I feel this would be perfect for me, and a boon for the profession.


Evergreen ILS: Join us in Boston!

planet code4lib - Thu, 2016-01-07 22:06

Are you going to ALA Midwinter in Boston this weekend? If so, we invite Evergreen users, enthusiasts, and anyone who is just interested in learning more about this great open-source library system to join us for a meetup at the Stan Getz Library, Berklee College of Music. The meetup is scheduled for 4:30 to 6 p.m. on Saturday, January 9.

Here are some of the activities you can look forward to at the meetup:

  • We’ll look at some of the new features that have been added to Evergreen in the most recent release.
  • 2.10 Release Manager Galen Charlton will talk about plans for the March Evergreen release.
  • We’ll share some community highlights from the past year.
  • We’ll talk about any other Evergreen or open-source issues and questions that are on people’s minds.

There are two ways to get to the Stan Getz Library:

  • Best option if coming from the convention center: Take the ALA Shuttle to the Sheraton Boston stop (Route 5). The bus will drop you off on Dalton St. Walk towards Belvidere Street, where you will take a right. Take a right when you reach Massachusetts Ave. The library is located on the right at 142 Massachusetts Avenue.
  • Public transportation: Take the MBTA green line (B, C, or D) to the Hynes Convention Center stop. As you leave the subway station, take a left on Massachusetts Avenue. The library is located on the left at 142 Massachusetts Avenue.

We also have a Google map showing walking directions from both locations.

Since the school is still on break, the library is closed, but Yamil Suarez will be available to escort everyone to the meeting room. If nobody is at the door when you arrive, call Yamil at 617-748-2617.

Feel free to send any questions along to Kathy Lussier at klussier@masslnc.org.

We look forward to seeing you at Midwinter!

 

LibX: Signed LibX Add-On Pushed for Firefox

planet code4lib - Thu, 2016-01-07 21:08

We just pushed a signed LibX add-on for Firefox.

If you want to pull in the update immediately, open the Firefox browser, select Add-Ons, then Check for Updates. It will ask you to restart the browser.

Please let us know if you see any problems.

Thank you for your patience,
Annette & Godmar

LibX: LibX, Firefox, and Signatures

planet code4lib - Thu, 2016-01-07 20:59

LibX is currently working in Google Chrome.

LibX is currently disabled in Firefox version 43.

We have edited the LibX code so that it passes Mozilla’s automated verification. We can now upload code, have it checked, get it signed, and then download it. We are still working on a bug fix and on creating an update.rdf to push to you, our users.

We will post updates on this site.

Annette & Godmar

LibraryThing (Thingology): ALAMW 2016 in Boston (and Free Passes)!

planet code4lib - Thu, 2016-01-07 20:20

Abby and KJ will be at ALA Midwinter in Boston this weekend, showing off LibraryThing for Libraries. Since the conference is so close to LibraryThing headquarters, chances are good that a few other LT staff members may appear, as well!

Visit Us. Stop by booth #1717 to meet Abby & KJ (and potential mystery guests!), get a demo, and learn about all the new and fun things we’re up to with LibraryThing for Libraries, TinyCat, and LibraryThing.

Get in Free. Are you in the Boston area and want to go to ALAMW? We have free exhibit-only passes. Click here to sign up and get one! Note: they will get you into the exhibit hall only, not the conference sessions themselves.

Open Knowledge Foundation: Open Data goes local in Nepal: Findings of Nepal Open Data Index 2015

planet code4lib - Thu, 2016-01-07 19:02

Nepal Open Data Index 2015 – White Paper

The Local Open Data Index Nepal 2015 is a crowdsourced survey that examines the availability of open data at the city level. The survey was conducted for the second time in Nepal by Open Knowledge Nepal. See our previous post announcing the local index here.

Background

To decentralize power from the central authority to the district, village, and municipality levels, the Nepal government uses the Local Self Governance Regulation, 2056 (1999), under which Village Development Committees (VDC) and District Development Committees (DDC) act as both planners and program-implementing bodies of the government. Municipalities perform the same kinds of tasks at a smaller scale, which has made it difficult to understand the layers of governing units. This overlapping of powers and roles is also found in the government data space; average citizens still don’t know which local governance units are responsible for the data they need. This highlights the importance of a survey of open data and publishing.

Global surveys such as the Global Open Data Index and the Open Data Barometer have taught us that the availability of open data and participatory governance in Nepal are not reaching their full potential in terms of everything from citizen readiness to data release and data infrastructure. Using World Wide Web Foundation terminology, in Nepal we are operating in a "capacity constrained" environment.

Furthermore, in Nepal citizen participation and the use of open data often make more sense and are more powerful at the local level, as it is local governments that handle national and international projects for citizens and generate data from them. However, open data is still a new concept in Nepal and the central government has only just started releasing data, with data even less available at the local level.

Why do we need a Local Open Data Index in Nepal?

The Local Open Data Index is intended to help put the discrepancies at the local level on the map (literally!). Peter Drucker said, "What gets measured gets managed." Mapping the gaps will aid strategic planning and help create a framework for action and citizen engagement at all levels.

For local governments to adopt openness, they need to understand the what, why and how of opening up their data. Governments need to learn that making data open is not only a means to make them accountable (or, worse, alarmed), but also a tool to help them become more efficient and effective in their work. Governments need to understand that opening data is only the beginning of participatory governance, and for them to participate they need well-defined and easy-to-adopt mechanisms.

The Local Open Data Index for Nepal will help in assessing the baseline of availability and nature of open data in Nepali cities. This will help to identify gaps, and plan strategic actions to make maximum impact.

Summary

The survey was conducted in 10 major cities of Nepal by open data enthusiasts and volunteers inside and outside of Open Knowledge Nepal. The cities chosen were Kathmandu, Bhaktapur, Butwal, Chitwan, Dolakha, Dhading, Hetauda, Kavre, Lalitpur, and Pokhara. The datasets surveyed were Annual Budget, Procurement Contracts, Crime Statistics, Business Permits, Traffic Accidents, and Air Quality.

Unsurprisingly, the largest municipality and the capital of Nepal – Kathmandu – ranked highest, followed by Pokhara and Chitwan.

Various datasets were available in digital format on government websites in all 10 cities. All available datasets are free to access. However, none of the datasets were machine readable, nor were any licensed under a standard open data licence.

Datasets regarding annual budgets and procurement contracts are easily available digitally, although not open in the standard sense of the term. Datasets for air quality are virtually nonexistent. It is not clear whether data is available in categories such as Traffic Accidents or Business Permits.

The central government of Nepal has been slowly adopting open data as a policy, and has shown commitment through projects such as the Aid Management Platform, Election Data, and the interactive visualizations available on the National Planning Commission website. The enthusiasm is growing but has not yet spread to local governing authorities.

Key Findings
  1. None of the datasets are completely open. All of them lack machine readability and standard licensing.
  2. Annual budget data is publicly available in almost all cities surveyed. Air quality data is not available in any city. Other datasets fall somewhere in between.
  3. The enthusiasm and progress shown by central government in terms of open data projects has yet to catch on at the local level.

Read more about it in the official white paper.

Library of Congress: The Signal: APIs: How Machines Share and Expose Digital Collections

planet code4lib - Thu, 2016-01-07 19:01

By DLR German Aerospace Center (Zwei Roboterfreunde / Two robot friends) [CC BY 2.0], via Wikimedia Commons.

Kim Milai, a retired school teacher, was searching on ancestry.com for information about her great grandfather, Amohamed Milai, when her browser turned up something she had not expected: a page from the Library of Congress’s Chronicling America site displaying a scan of the Harrisburg Telegraph newspaper from March 13, 1919. On that page was a story with the headline, “Prof. Amohamed Milai to Speak at Second Baptist.” The article was indeed about her great grandfather, who was an enigmatic figure within her family, but…”Professor!?,” Milai said. “He was not a professor. He exaggerated.” Whether it was the truth or an exaggeration, it was, after all, a rare bit of documentation about him, so Milai printed it out and got to add another colorful piece to the mosaic of her family history. But she might never have found that piece if it wasn’t for ancestry.com’s access to Chronicling America’s collections via an API.

Application Programming Interfaces (APIs) are not new. API-based interactions are part of the backdrop of modern life. For example, your browser, an application program, interfaces with web servers. Another example is when an ATM screen enables you to interact with a financial system. When you search online for a flight, the experience involves multiple API relationships: your travel site or app communicates with individual airlines’ sites which, in turn, query their systems and pass their schedules and prices back to your travel site or app. When you book the flight, your credit card system gets involved. But all you see during the process are a few screens, while in the background, at each point of machine-to-machine interaction, servers rapidly communicate with each other, across their boundaries, via APIs. But what exactly are they?

Chris Adams, an information technology specialist at the Library of Congress, explained to me that APIs can be considered a protocol – or a set of rules governing the format of messages exchanged between applications. This allows either side of the exchange to change without affecting other parties as long as they continue to follow the same rules.

World Digital Library, Library of Congress.

Adams created the APIs for the World Digital Library, an international project involving approximately 190 libraries, archives and museums. The World Digital Library’s API documentation describes what to expect from the APIs and explains how to build tools to access the WDL’s collections. Adams said, "The APIs declare that we publish all of our data in a certain format, at a certain location and 'here's how you can interact with it.'" Adams also said that an institution’s digital-collections systems can and should evolve over time but their APIs should remain stable in order to provide reliable access to the underlying data. This allows outside users the stability needed to build tools which use those APIs, and it frequently saves time even within the same organization as, for example, the front-end or user-visible portion of a website can be improved rapidly without the need to touch the complex back-end application running on the servers.

HathiTrust Digital Library. Hathitrust.org.

So, for us consumers, the experience of booking a flight or buying a book online just seems like the way things ought to be. And libraries, museums, government agencies and other institutions are coming around to “the way things ought to be” and beginning to implement APIs to share their digital collections in ways that consumers have come to expect.

Another example of implementation, similar to the WDL’s, is how HathiTrust uses APIs among shared collections. For example, a search of HathiTrust for the term "Civil War" queries the collections of all of its 110 or so consortium partners (the Library of Congress is among them), and the search results include a few million items, which you can filter by Media, Language, Country, and a variety of other facets. Ultimately it may not matter to you which institutions you got your items from; what matters is that you got an abundance of good results for your search. To many online researchers, it’s the stuff that matters, not so much which institution hosts the collection.

That doesn’t mean that online collaboration among cultural institutions diminishes the eminence of any individual institution. Each object in the search results — of HathiTrust, WDL and similar resources — is clearly tagged with metadata and information about where the original material object resides, and so the importance of each institution’s collections becomes more widely publicized. APIs help cultural institutions increase their value — and their web traffic — by exposing more of their collections and sharing more of their content with the world.

The increasing use of APIs does not mean that institutions who want them are required to write code for them. David Brunton, a supervisory IT specialist at the Library of Congress, said that most people are using time-tested APIs instead of writing their own, and, as a result, standardized APIs are emerging. Brunton said, “Other people have already written the code, so it’s less work to reuse it. And most people don’t have infinite programming resources to throw at something.”

Example 1. Adding the Library of Congress search engine to Firefox.

Brunton cites OpenSearch as an example of a widely used, standardized API. OpenSearch helps search engines and clients communicate, by means of a common set of formats, to perform search requests and publish results for syndication and aggregation. He gave an example of how to view it in action by adding a Library of Congress search engine to the Firefox browser.
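
Behind that kind of browser integration sits a small XML file that the site publishes. As a hedged sketch, an OpenSearch description document looks roughly like the following; the element names and namespace follow the OpenSearch 1.1 specification, but the short name and the URL template below are hypothetical, not the Library of Congress's actual values.

    <?xml version="1.0" encoding="UTF-8"?>
    <!-- A minimal OpenSearch description document. The template URL is a
         hypothetical example, not a real Library of Congress endpoint. -->
    <OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/">
      <ShortName>Example Library Search</ShortName>
      <Description>Search an example library site from the browser's search box.</Description>
      <!-- {searchTerms} is replaced with the user's query when the browser runs a search. -->
      <Url type="text/html" template="https://www.example.org/search?q={searchTerms}"/>
    </OpenSearchDescription>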

“In Firefox, go to www.loc.gov and look in the little search box at the top of the browser,” Brunton said. “A green plus sign (+) pops up next to ‘Search.’ If you click on the little green Plus sign, one of the things you see in the menu is ‘Add the Library of Congress search.’ [Example 1.] When you click on that, the Library’s search engine gets added into your browser and you can search the Library’s site from a non-Library page.”

As institutions open up more and more of their online digital collections, Chris Adams sees great potential in using another API, the International Image Interoperability Framework (IIIF), as a research tool. IIIF enables users to, among other things, compare and annotate side-by-side digital objects from participating institutions without the need for each institution to run the same applications or specifically enable each tool used to view the items. Adams points to an example of how it works by means of the Mirador image viewer. Here is a demonstration:

  1. Go to http://iiif.github.io/mirador/ and, at the top right of the page, click “Demo.” The subsequent page, once it loads, should display two graphics side by side – “Self-Portrait Dedicated to Paul Gauguin” in the left window and “Buddhist Triad: Amitabha Buddha Seated” in the right window. [Example 2.]

    Example 2. Mirador image viewer demo.

  2. Click on the thumbnails at the bottom of each window to change the graphic in the main windows.
  3. In the left window, select the grid symbol in the upper left corner and, in the drop down menu, select “New Object.” [Example 3.]

    Example 3. Select New Object.

  4. The subsequent page should display thumbnails of sample objects from different collections at Harvard, Yale, Stanford, BnF, the National Library of Wales and e-codices. [Example 4.]

    Example 4. Thumbnails from collections.

  5. Double-click a new object and it will appear in left image viewer window.
  6. Repeat the process for the right viewer window.

To see how it could work with the WDL collections:

  1. Go to http://iiif.github.io/mirador/ and click “Demo” at the top right of the page. The subsequent page will display the page with the two graphics.
  2. Open a separate browser window or tab.
  3. Open “The Sanmai-bashi Bridges in Ueno.”
  4. Scroll to the bottom of the page and copy the link displayed under "IIIF Manifest." The link URL is http://www.wdl.org/en/item/11849/manifest
  5. Go back to the Mirador graphics page, to the left window, select the grid symbol and in the drop down menus select “New Object.”
  6. In the subsequent page, in the field that says “Add new object from URL…” paste the IIIF Manifest URL. [Example 5.]

    Example 5. "Add new object from URL…"

  7. Click “enter/return” on your computer keyboard. “The Sanmai-bashi Bridges in Ueno” should appear at the top of the list of collections. Double-click one of the three thumbnails to add it to the left graphics viewer window.
  8. For the right window in the graphics viewer page use another sample from WDL, “The Old People Mill,” and copy its IIIF Manifest URL from the bottom of the page (http://www.wdl.org/en/item/11628/manifest).
  9. Return to the graphics viewer page, to the right window, select the grid symbol and in the drop down menus select “New Object.”
  10. In the subsequent page, in the field that says “Add new object from URL…,” paste the IIIF Manifest URL and click the “enter/return” key. “The Old People Mill” should appear at the top of the list of collections. Double-click to add it to the right graphics viewer window.

This process can be repeated using any tool which supports IIIF, such as the Universal Viewer, and new tools can be built by anyone without needing to learn a separate convention for each of the many digital libraries in the world which support IIIF.

Adams said that implementing an API encourages good software design and data management practices. “The process of developing an API can encourage you to better design your own site,” Adams said. “It forces you to think about how you would split responsibilities.” As programmers rush to meet deadlines, they often face the temptation of solving a problem in the simplest way possible at the expense of future flexibility; an API provides a natural point to reconsider those decisions. This encourages code which is easier to develop and test, and makes it cheaper to expand server capacity as the collections grow and user traffic increases.

Meanwhile, the APIs themselves should remain unchanged, clarifying expectations on both sides, essentially declaring, “I will do this. You must do that. And then it will work.”

APIs enable a website like the HathiTrust, Digital Public Library of America or Europeana to display a vast collection of digital objects without having to host them all. APIs enable a website like Chronicling America or the World Digital Library to open up its collections to automated access by anyone. In short, APIs enable digital collections to become part of a collective, networked system where they can be enjoyed — and used — by a vast international audience of patrons.

“Offering an API allows other people to reuse your content in ways that you didn’t anticipate or couldn’t afford to do yourself,” said Adams. “That’s what I would like for the library world, those things that let other people re-use your data in ways you didn’t even think about.”

Islandora: Islandora CLAW Community Sprint 003: January 18 - 29

planet code4lib - Thu, 2016-01-07 18:44

The Islandora community is kicking off the new year with our third volunteer sprint on the Islandora CLAW project. Continuing with our plan for monthly sprints, this third go-around will continue some of the tickets from the second sprint, put a new focus on developing a Collection service in PHP, and put more work into PCDM. To quote CLAW Committer Jared Whiklo, we shall PCDMize the paradigm.

This sprint will be developer-focused, but the team is always happy to help new contributors get up to speed if you want to take part in the project. If you have any questions about participating in the sprint, please do not hesitate to contact CLAW Project Director Nick Ruest. A sign-up sheet for the sprint is available here, and the sprint will be coordinated via a few Skype meetings and a lot of hanging around on IRC in the #islandora channel on freenode.

Villanova Library Technology Blog: Foto Friday: Reflection

planet code4lib - Thu, 2016-01-07 16:58

“Character is like a tree and reputation like a shadow.
The shadow is what we think of it; the tree is the real thing.”
— Abraham Lincoln

Photo and quote contributed by Susan Ottignon, research support librarian: languages and literatures team.



Villanova Library Technology Blog: A New Face in Access Services

planet code4lib - Thu, 2016-01-07 14:33

Cordesia (Dee-Dee) Pope recently joined Falvey’s staff as a temporary Access Services specialist reporting to Luisa Cywinski, Access Services team leader. Pope described her duties as “providing superb assistance to Falvey Memorial Library’s patrons.”

Pope, a native of Philadelphia, attended the PJA School where she earned an associate’s degree in paralegal studies and business administration. She has approximately 10 years of experience as a paralegal.

When asked about hobbies and interests she says, “I enjoy spending time with my two children, reading books of every genre, watching movies and learning new things.”



LITA: A Linked Data Journey: Interview with Julie Hardesty

planet code4lib - Thu, 2016-01-07 14:00

Image Courtesy of Marcin Wichary under a CC BY 2.0 license.

Introduction

This is part four of my Linked Data Series. You can find the previous posts in my author feed. I hope everyone had a great holiday season. Are you ready for some more Linked Data goodness? Last semester I had the pleasure of interviewing Julie Hardesty, metadata extraordinaire (and analyst) at Indiana University, about Hydra, the Hydra Metadata Interest Group, and Linked Data. Below is a bio and a transcript of the interview.

Bio:

Julie Hardesty is the Metadata Analyst at Indiana University Libraries. She manages metadata creation and use for digital library services and projects. She is reachable at jlhardes@iu.edu.

The Interview

Can you tell us a little about the Hydra platform?

Sure and thanks for inviting me to answer questions for the LITA Blog about Hydra and Linked Data! Hydra is a technology stack that involves several pieces of software – a Blacklight search interface with a Ruby on Rails framework and Apache Solr index working on top of the Fedora Commons digital repository system. Hydra is also referred to when talking about the open source community that works to develop this software into different packages (called “Hydra Heads”) that can be used for management, search, and discovery of different types of digital objects. Examples of Hydra Heads that have come out of the Hydra Project so far include Avalon Media System for time-based media and Sufia for institutional repository-style collections.

What is the Hydra Metadata Interest Group and your current role in the group?

The Hydra Metadata Interest Group is a group within the Hydra Project that is aiming to provide metadata recommendations and best practices for Hydra Heads and Hydra implementations so that every place implementing Hydra can do things the same way using the same ontologies and working with similar base properties for defining and describing digital objects. I am the new facilitator for the group and try to keep the different working groups focused on deliverables and responding to the needs of the Hydra developer community. Previous to me, Karen Estlund from Penn State University served as facilitator. She was instrumental in organizing this group and the working groups that produced the recommendations we have so far for technical metadata and rights metadata. In the near-ish future, I am hoping we’ll see a recommendation for baseline descriptive metadata and a recommendation for referring to segments within a digitized file, regardless of format.

What is the group’s charge and/or purpose? What does the group hope to achieve?

The Hydra Metadata Interest Group is interested in working together on base metadata recommendations, as a possible next step of the successful community data modeling, Portland Common Data Model. The larger goals of the Metadata Interest Group are to identify models that may help Hydra newcomers and further interoperability among Hydra projects. The scope of this group will concentrate primarily on using Fedora 4. The group is ambitiously interested in best practices and helping with technical, structural, descriptive, and rights metadata, as well as Linked Data Platform (LDP) implementation issues.

The hope is to make recommendations for technical, rights, descriptive, and structural metadata such that the Hydra software developed by the community uses these best practices as a guide for different Hydra Heads and their implementations.

Can you speak about how Hydra currently leverages linked data technologies?

This is where keeping pace with the work happening in the open source community is critical and sometimes difficult to do if you are not an active developer. What I understand is that Fedora 4 implements the W3C’s Linked Data Platform specification and uses the Portland Common Data Model (PCDM) for structuring digital objects and relationships between them (examples include items in a collection, pages in a book, tracks on a CD). This means there are RDF statements that are completely made of URIs (subject, predicate, and object) that describe how digital objects relate to each other (things like objects that contain other objects; objects that are members of other objects; objects ordered in a particular way within other objects). This is Linked Data, although at this point I think I see it as more internal Linked Data. The latest development work from the Hydra community is using those relationships through the external triple store to send commands to Fedora for managing digital objects through a Hydra interface. There is an FAQ on Hydra and the Portland Common Data Model that is being kept current with these efforts. One outcome would be digital objects that can be shared at least between Hydra applications.

For descriptive metadata, my understanding is that Hydra is not quite leveraging Linked Data… yet. If URIs are used in RDF statements that are stored in Fedora, Hydra software is currently still working through the issue of translating that URI to show the appropriate label in the end user interface, unless that label is also stored within the triple store. That is actually a focus of one of the metadata working groups, the Applied Linked Data Working Group.

What are some future, anticipated capabilities regarding Hydra and linked data?

That capability I was just referring to is one thing I think everyone hopes happens soon. Once URIs can be stored for all parts of a statement, such as “this photograph has creator Charles W. Cushman,” and Charles W. Cushman only needs to be represented in the Fedora triple store as a URI but can show in the Hydra end-user interface as “Charles W. Cushman” – that might spawn some unicorns and rainbows.

Another major effort in the works is implementing PCDM in Hydra. Implementation work is happening right now on the Sufia Hydra Head with a base implementation called Curation Concerns being incorporated into the main Hydra software stack as its own Ruby gem. This involves Fedora 4’s understanding of PCDM classes and properties on objects (and implementing Linked Data Platform and ordering ontologies in addition to the new PCDM ontology). Hydra then has to offer interfaces so that digital objects can be organized and managed in relation to each other using this new data model. It’s pretty incredible to see an open source community working through all of these complicated issues and creating new possibilities for digital object management.

What challenges has the Hydra Metadata Interest Group faced concerning linked data?

We have an interest in making use of Linked Data principles as much as possible since that makes our digital collections that much more available and useful to the Internet world. Our recommendations are based around various RDF ontologies due to Fedora 4’s capabilities to handle RDF. The work happening in the Hydra Descriptive Metadata Working Group to define a baseline descriptive metadata set and the ontologies used there will be the most likely to want Linked Data URIs used as much as possible for those statements. It’s not an easy task to agree on a baseline set of descriptive metadata for various digital object types but there is precedence in both the Europeana Data Model and the DPLA Application Profile. I would expect we’ll follow along similar lines but it is a process to both reach consensus and have something that developers can use.

Do you have any advice for those interested in linked data?

I am more involved in the world of RDF than in the world of Linked Data at this point. Using RDF like we do in Hydra does not mean we are creating Linked Data. I think Linked Data comes as a next step after working in RDF. I am coming from a metadata world heavily involved in XML and XML schemas so to me this isn’t about getting started with Linked Data, it’s about understanding how to transition from XML to Linked Data (by way of RDF). I watch for reports on creating Linked Data and, more importantly, transitioning to Linked Data from current metadata standards and formats. Conferences such as Code4Lib (coming up in March 2016 in Philadelphia), Open Repositories (in Dublin, Ireland in June 2016) and the Digital Library Federation Forum (in Milwaukee in November 2016) are having a lot of discussion about this sort of work.

Is there anything we can do locally to prepare for linked data?

Recommended steps I have gleaned so far include cleaning the metadata you have now – syncing up names of people, places, and subjects so they are spelled and named the same across records; adding authority URIs whenever possible, this makes transformation to RDF with URIs easier later; and considering the data model you will move to when describing things using RDF. If you are using XML schemas right now, there isn’t necessarily a 1:1 relationship between XML schemas and RDF ontologies so it might require introducing multiple RDF ontologies and creating a local namespace for descriptions that involve information that is unique to your institution (you become the authority). Lastly, keep in mind the difference between Linked Data and Linked Open Data and be sure if you are getting into publishing Linked Data sets that you are making them available for reuse and aggregation – it’s the entire point of the Web of Data that was imagined by Tim Berners-Lee when he first discussed Linked Data and RDF (http://www.w3.org/DesignIssues/LinkedData.html).

Conclusion

A big thank you to Julie for sharing her experiences and knowledge. She provided a plethora of resources during the interview, so go forth and explore! As always, please feel free to leave a comment or contact Julie/me privately. Until next time!

Ed Summers: Craft and Computation

planet code4lib - Thu, 2016-01-07 05:00

A paper by Cheatle and Jackson provides an interesting view into how the furniture artist Wendell Castle uses 3D scanning and digital fabrication tools in his work. Usefully (for me) the description is situated in the larger fields of human-computer interaction and computer-supported work, which I’m trying to learn more about. It’s worth checking out if you are interested in a close look at how a small furniture studio (one that has built an international reputation for craftsmanship) uses 3D scanning and robotics to do its work.

One fascinating piece of the story is the work of the studio director, Marvin Pallischeck (Marv), who adapted a CNC machine designed for pick-and-place work in the US Postal Service to serve as a milling machine. This robot is fed 3D scans of prototypes created by Castle along with material (wood) and then goes to work. The end result isn’t a completed piece, but one that a woodcarver can then work with further to get it into shape. The 3D scanning is done by an offsite firm that specializes in scanning wood. They deliver a CAD file that needs to be converted to a CAM file. The CAM file then needs to be adjusted to control the types of cutters and feed speeds that are used, to fit the particular wood being worked on.

The work is also iterative: the robot successively works on the parts of the whole piece, getting closer and closer with Marv’s help. The process resists full automation:

“At the end of the day, it’s the physical properties of the material that drives our process”, says Marv as he describes the way the wood grain of a Castle piece can be read to determine the orientation of the tree’s growth within the forest. “I always say, this tree is now dead, but its wood is not - and it’s important to know that going into this.” Bryon understands this in a similar way, “There’s a lot of tension in wood. When you start cutting it up, that tension is released, free to do as it will. And form changes. Things crack, they bend, and warp”

There is also an impact on the clients’ perception of the work: its authenticity and authorship. On the theoretical side, Cheatle and Jackson are drawing attention to how the people, their creative processes, the computers, and the materials they are working with are all part of a network. As with Object Oriented Ontology (Bogost is cited), the lines between the human and the non-human objects begin to get fuzzy, and complicated. More generally the interviews and ethnographic work point to the work of Wanda Orlikowski.

These arguments build in turn on a broader body of work around materiality and social life growing in the organizational and social sciences. Orlikowski finds that materiality is integral to organizational life and that developing new ways of dealing with material is critical if one is to understand the multiple, emergent, shifting and interdependent technologies at the heart of contemporary practice (Orlikowski, 2007). Orlikowski sees humans and technology as bound through acts of ‘recursive intertwining’ or ‘constitutive entanglement’ that eschew pre-ordered hierarchies or dualisms. Rather, human actors and technological practices are enmeshed and co-constituted, emerging together from entangled networks that are always shifting and coemerging in time.

I think this is an angle I’m particularly interested in exploring with respect to Web archiving work: the ways in which traditional archival materials (paper, film, audio, photographs, etc.) and processes are challenged by the material of the Web. With respect to this work by Cheatle and Jackson: the ways in which our automated tools (crawlers, viewers, inventory/appraisal tools) have been designed (or not) to fit the needs of archivists; how archivists, the medium of the Web, and archival tools/processes are entangled; and how an understanding of this entanglement can inform the design of new archival tools.

Orlikowski, W. J. (2007). Sociomaterial practices: Exploring technology at work. Organization Studies, 28(9), 1435–1448. http://doi.org/10.1177/0170840607081138

Terry Reese: MarcEdit updates

planet code4lib - Thu, 2016-01-07 04:30

I noted earlier today that I’d be making a couple MarcEdit updates.  You can see the change logs here:

Please note: if you use the Linked Data tools, it is highly recommended that you update. This update was done in part to make the interactions with LC more efficient on all sides.

You can get the download from the automated update mechanism in MarcEdit or from the downloads page: http://marcedit.reeset.net/downloads

Questions, let me know.

–tr

LITA: Jobs in Information Technology: January 6, 2016

planet code4lib - Thu, 2016-01-07 00:38

New vacancy listings are posted weekly on Wednesday at approximately 12 noon Central Time. They appear under New This Week and under the appropriate regional listing. Postings remain on the LITA Job Site for a minimum of four weeks.

New This Week:

Cadence Group, Metadata and Systems Librarian, Greenbelt, MD

California State University, Fullerton, Systems Librarian, Fullerton, CA

The Samuel Roberts Noble Foundation, Inc., Manager, Library & Information Management Services, Ardmore, OK

Visit the LITA Job Site for more available jobs and for information on submitting a job posting.

Villanova Library Technology Blog: The Curious ‘Cat: Favorite Falvey Memory from 2015?

planet code4lib - Wed, 2016-01-06 21:18

This week, the Curious ’Cat asks library staff, "As 2015 comes to an end, do you have a favorite Falvey memory from this past year?"

Melanie Wood—“The service dogs that were in here for the stress busters: that was definitely a highlight. … The ancient Bassett hound [“Oliver”] was my favorite.”

 

Stephen Spatz—“My favorite moment might have been when that marketer hid all those Kind bars in the stacks.”

Raamaan McBride—“Was Millicent hired this year? That would be my favorite 2015 memory.”

 

Nora Ramos—[A pipe burst in the Falvey West stacks the morning of Aug. 12, which caused water to rain down on books shelved on the first and ground floors.] “This wasn’t a very good memory. I worked very hard; it was awful. It was the worst thing that happened this year. … I think everybody—custodial—did a beautiful job, no matter what. … They were able to solve the problem in time. I hope it never happens again.”

Barb Haas—[A pipe burst in the Falvey West stacks the morning of Aug. 12, which caused water to rain down on books shelved on the first and ground floors.] "I wouldn’t say it was a favorite event, but it certainly was a memorable event. … Also, we moved [thousands of library items] from Garey Hall to our new remote-storage facility. That was huge; that was a huge project."

Dave Uspal—“I work in the tech department at Falvey and, as such, mostly come up from the basement for meals and meetings (after which it’s back down to the basement). This year, however, I got out of the basement more since I was privileged to work with several classes as part of our Aurelius Digital Scholarship initiative. Normally, working in the tech department you don’t see the bigger picture of what you do, but getting a chance to work directly with students on a series of Digital Humanities projects was a great opportunity for me to get out and see the greater campus. The students this year were great—intelligent and creative—and I thought the projects turned out fantastic (to be unveiled Spring 2016!). I would love to do this again next year.”



LibUX: The “Phablet” Will be the Dominant Form Factor

planet code4lib - Wed, 2016-01-06 19:11

This is the seventh year Flurry has reported on mobile app usage, showing continuing trends of mobile growth, the importance and conscious popularity of personalization, how media consumption is shifting from television and PCs to smartphones, and the rapid growth of mobile commerce.

Of interest to designers and web teams might be the changing size of the handheld screen, which way back in early iPhone days constituted an easy media query at something-something 320px, but now is of course a little bit more nebulous and a whole lot bigger.

At least in terms of engagement metrics, the Phablet is doing really well:

The picture got much clearer when we looked at year-over-year growth in time spent and cut that by form factor. Time spent on phablets grew 334% year-over-year (2.9 times more than the average), compared to 117% for all form factors. With time spent on mobile surpassing that on television, and phablets posting astonishing growth in media consumption, it appears that the cable industry will find in the phablet and its apps its long-awaited digital nemesis.

This post is part of a nascent library UX data collection we hope you can use as a reference to make smart decisions. If you’re interested in more of the same, follow @libuxdata on Twitter, or continue the conversation on our Facebook group. You might also think about signing up for the Web for Libraries weekly.



Patrick Hochstenbach: Sktchy Portraits

planet code4lib - Wed, 2016-01-06 18:44

Eric Lease Morgan: XML 101

planet code4lib - Wed, 2016-01-06 18:05

This past Fall I taught "XML 101" online to library school graduate students. This posting echoes the scripts of my video introductions, and I suppose this posting could also be used as a very gentle introduction to XML for librarians.

Introduction

I work at the University of Notre Dame, and my title is Digital Initiatives Librarian. I have been a librarian since 1987. I have been writing software since 1976, and I will be your instructor. Using materials and assignments created by the previous instructors, my goal is to facilitate your learning of XML.

XML is a way of transforming data into information. It is a method for marking up numbers and text, giving them context, and therefore a bit of meaning. XML includes syntactical characteristics as well as semantic characteristics. The syntactical characteristics are really rather simple. There are only five or six rules for creating well-formed XML, such as: 1) there must be one and only one root element, 2) element names are case-sensitive, 3) elements must be closed properly, 4) elements must be nested properly, 5) attributes must be quoted, and 6) there are a few special characters (&, <, and >) which must be escaped if they are to be used in their literal contexts. The semantics of XML are much more complicated; they denote the intended meaning of the XML elements and attributes. The semantics of XML are embodied in things called DTDs and schemas.

Again, XML is used to transform data into information. It is used to give data context, but XML is also used to transmit this information in a computer-independent way from one place to another. XML is also a data structure in the same way MARC, JSON, SQL, and tab-delimited files are data structures. Once information is encapsulated as XML, it can be unambiguously transmitted from one computer to another, where it can be put to use.

This course will elaborate upon these ideas. You will learn about the syntax and semantics of XML in general. You will then learn how to manipulate XML using XML-related technologies called XPath and XSLT. Finally, you will learn library-specific XML “languages” to learn how XML can be used in Library Land.

Well-formedness

In this, the second week of "XML 101 for librarians", you will learn about well-formed XML and valid XML. Well-formed XML is XML that conforms to the five or six syntactical rules: XML must have one and only one root element. Element names are case sensitive. Elements must be closed. Elements must be nested correctly. Attributes must be quoted. And there are a few special characters that must be escaped (namely &, <, and >). Valid XML is XML that is not only well-formed but also conforms to a named DTD or schema. Think of valid XML as semantically correct.
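
To make those rules concrete, here is a small sketch of a well-formed record; the record is invented purely for illustration. It has a single root element, properly nested and closed elements, a quoted attribute, and an escaped ampersand.

    <?xml version="1.0" encoding="UTF-8"?>
    <!-- A made-up record used only to illustrate well-formedness. -->
    <book id="b001">
      <title>Cats &amp; Dogs</title>
      <author>
        <name>Jane Doe</name>
      </author>
      <published>2015</published>
    </book>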

Jennifer Weintraub and Lisa McAulay, the previous instructors of this class, provide more than a few demonstrations of how to create well-formed as well as valid XML. Oxygen, the selected XML editor for this course is both powerful and full-featured, but using it efficiently requires practice. That’s what the assignments are all about. The readings supplement the demonstrations.

DTD’s and namespaces

DTD’s, schemas, and namespaces put the “X” in XML. They make XML extensible. They allow you to define your own elements and attributes to create your own “language”.

DTD’s — document type definitions — and schemas are the semantics of XML. They define what elements exist, what order they appear in, what attributes they can contain, and just as importantly what the elements are intended to contain. DTD’s are older than schemas and not as robust. Schemas are XML documents themselves and go beyond DTD’s in that they provide the ability to define the types of data that elements and attributes contain.
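
As a hedged illustration, a DTD for the invented book record sketched earlier might look something like this; the element names are, again, made up for the example.

    <!-- A hypothetical DTD for the invented book record. -->
    <!ELEMENT book (title, author, published)>
    <!ATTLIST book id ID #REQUIRED>
    <!ELEMENT title (#PCDATA)>
    <!ELEMENT author (name)>
    <!ELEMENT name (#PCDATA)>
    <!ELEMENT published (#PCDATA)>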

Namespaces allow you, the author, to incorporate multiple DTD and schema definitions into a single XML document. Namespaces provide a way for multiple elements of the same name to exist concurrently in a document. For example, two different DTD’s may contain an element called “title”, but one DTD refers to a title as in the title of a book, and the other refers to “title” as if it were an honorific.
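
A hedged sketch of that "two titles" situation might look like the following; the namespace URIs are invented for illustration and do not point at real vocabularies.

    <!-- Hypothetical namespaces: one for bibliographic terms, one for honorifics. -->
    <person xmlns:bib="http://example.org/bibliographic"
            xmlns:hon="http://example.org/honorifics">
      <hon:title>Professor</hon:title>
      <bib:title>XML for Librarians</bib:title>
    </person>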

Schemas

Schemas are a more robust and more intelligent alternative to DTDs. While DTDs define the structure of XML documents, schemas do it with more exactness. While DTDs only allow you to define elements, the number of elements, the order of elements, attributes, and entities, schemas allow you to do these things and much more. For example, they allow you to define the types of content that go into elements or attributes. Strings (characters). Numbers. Lists of characters or numbers. Boolean (true/false) values. Dates. Times. Etc. Schemas are XML documents in and of themselves, and therefore they can be validated just like any other XML document with a pre-defined structure.
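
As a hedged example, a schema fragment declaring a slightly richer, hypothetical book element might read as follows; the xs: prefix is bound to the standard W3C XML Schema namespace, while the element names and datatypes are invented for illustration.

    <?xml version="1.0" encoding="UTF-8"?>
    <!-- A hypothetical schema fragment showing typed elements and a typed attribute. -->
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <xs:element name="book">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="title" type="xs:string"/>
            <xs:element name="published" type="xs:gYear"/>
            <xs:element name="inPrint" type="xs:boolean"/>
          </xs:sequence>
          <xs:attribute name="id" type="xs:ID" use="required"/>
        </xs:complexType>
      </xs:element>
    </xs:schema>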

The reading and writing of XML schemas is very librarian-ish because it is about turning data into information. It is about structuring data so it makes sense, and it does this in an unambiguous and computer-independent fashion. It is too bad our MARC (bibliographic) standards are not as rigorous.

RelaxNG, Schematron, and digital libraries

The first is yet another technology for modeling your XML, and it is called RelaxNG. This third modeling technology is intended to be more human readable than schemas and more robust than DTDs. Frankly, I have not seen RelaxNG implemented very many times, but it behooves you to know it exists and how it compares to other modeling tools.
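
To give a flavor of it anyway, here is a hedged RelaxNG sketch (XML syntax) for the invented book record; the grammar is illustrative only.

    <!-- A hypothetical RelaxNG grammar, XML syntax, for the invented book record. -->
    <element name="book" xmlns="http://relaxng.org/ns/structure/1.0">
      <attribute name="id"/>
      <element name="title"><text/></element>
      <element name="author">
        <element name="name"><text/></element>
      </element>
      <element name="published"><text/></element>
    </element>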

The second is Schematron. This tool too is used to validate XML, but instead of returning “ugly” computer-looking error messages, its errors are intended to be more human-readable and describe why things are the way they are instead of just saying “Wrong!”
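
A hedged Schematron sketch, again written against the invented book record, shows how an assertion carries a human-readable message rather than a terse error code.

    <!-- A hypothetical Schematron pattern; the rules and messages are invented. -->
    <schema xmlns="http://purl.oclc.org/dsdl/schematron">
      <pattern>
        <rule context="book">
          <assert test="title">Every book needs a title so readers know what it is.</assert>
          <assert test="@id">Every book needs an id attribute so it can be referenced.</assert>
        </rule>
      </pattern>
    </schema>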

Lastly, there is an introduction to digital libraries and trends in their current development. More and more, digital libraries are really and truly implementing the principles of traditional librarianship, complete with collection, organization, preservation, and dissemination. At the same time, they are pushing the boundaries of the technology and stretching our definitions. Remember, it is not so much the technology (the how of librarianship) that is important, but rather the why of libraries and librarianship. The how changes quickly. The why changes slowly, albeit sometimes too slowly.

XPath

This week is all about XPath, and it is used to select content from your XML files. It is akin to navigating a computer’s filesystem from the command line in order to learn what is located in different directories.

XPath is made up of expressions which return values of true, false, strings (characters), numbers, or nodes (subsets of XML files). XPath is used in conjunction with other XML technologies, most notably XSLT and XQuery. XSLT is used to transform XML files into other plain text files. XQuery is akin to the structured query language of relational databases.
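
To make the idea concrete, here is a hedged sketch of a few XPath expressions evaluated against the invented book record from the well-formedness example, with the result of each noted alongside.

    <!-- XPath expressions against the invented book record, and what each returns:
         /book                  selects the root book element (a node)
         /book/title/text()     returns the string "Cats & Dogs"
         /book/@id              returns the attribute value "b001"
         count(//author)        returns the number 1
         /book/@id = 'b001'     returns true
    -->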

You will not be able to do very much with XML other than read or write it unless you understand XPath. An understanding of XPath is essential if you want to do truly interesting things with XML.

XSLT

This week you will be introduced to XSLT, a programming language used to transform XML into other plain text files.

XML is all about information; it is not about use or display. In order for XML to be actually useful — to be applied towards some sort of end — specific pieces of data need to be extracted from XML or the whole of the XML file needs to be converted into something else. The most common conversion (or "transformation") is from some sort of XML into HTML for display in a Web browser. For example, bibliographic XML (MARCXML or MODS) may be transformed into a sort of "catalog card" for display, or a TEI file may be transformed into a set of Web pages, or an EAD file may be transformed into a guide intended for printing. Alternatively, you may want to transform the bibliographic data into a tab-delimited text file for a spreadsheet or an SQL file for a relational database. Along with other sets of information, an XML file may contain geographic coordinates, and you may want to extract just those coordinates to create a KML file — a sort of map file.

XSLT is a programming language but not like most programming languages you may know. Most programming languages are “procedural” (like Perl, PHP, or Python), meaning they execute their commands in a step-wise manner. “First do this, then do that, then do the other thing.” This can be contrasted with “declarative” programming languages where events occur or are encountered in a data file, and then some sort of execution happens. There are relatively few declarative programming languages, but LISP is/was one of them. Because of the declarative nature of XSLT, the apply-templates command is so important. The apply-templates command sort of tells the XSLT processor to go off and find more events.
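
Here is a hedged sketch of a tiny stylesheet, invented for illustration, that turns the book record from earlier into a fragment of HTML; notice how apply-templates hands control back to the processor so it can go find more matching "events".

    <?xml version="1.0" encoding="UTF-8"?>
    <!-- A hypothetical stylesheet transforming the invented book record into simple HTML. -->
    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:output method="html"/>
      <!-- Match the root book element, emit a wrapper, and apply more templates. -->
      <xsl:template match="/book">
        <div class="record">
          <xsl:apply-templates select="title | author"/>
        </div>
      </xsl:template>
      <xsl:template match="title">
        <h1><xsl:value-of select="."/></h1>
      </xsl:template>
      <xsl:template match="author">
        <p>By <xsl:value-of select="name"/></p>
      </xsl:template>
    </xsl:stylesheet>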

Now that you are beginning to learn XSLT and combining it with XPath, you are beginning to do useful things with the XML you have been creating. This is where the real power is. This is where it gets really interesting.

TEI — Text Encoding Initiative

TEI is a granddaddy when it comes to XML "languages". It started out as a different form of mark-up, a mark-up called SGML, and SGML was originally a mark-up language designed at IBM for the purposes of creating, maintaining, and distributing internal documentation. Now-a-days, TEI is all but a hallmark of XML.

TEI is a mark-up language for any type of literature: poetry or prose. Like HTML, it is made up of head and body sections. The head is the place for administrative, bibliographic, and provenance metadata. The body is where the poetry or prose is placed, and there are elements for just about anything you can imagine: paragraphs, lines, headings, lists, figures, marginalia, comments, page breaks, etc. And if there is something you want to mark-up, but an element does not explicitly exist for it, then you can almost make up your own element/attribute combination to suit your needs.
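
A hedged TEI skeleton, greatly simplified and with invented content, looks something like this; the namespace is the standard TEI P5 namespace.

    <?xml version="1.0" encoding="UTF-8"?>
    <!-- A greatly simplified, hypothetical TEI document: header plus body. -->
    <TEI xmlns="http://www.tei-c.org/ns/1.0">
      <teiHeader>
        <fileDesc>
          <titleStmt><title>A Sample Letter</title></titleStmt>
          <publicationStmt><p>Unpublished; encoded for illustration only.</p></publicationStmt>
          <sourceDesc><p>Born-digital example.</p></sourceDesc>
        </fileDesc>
      </teiHeader>
      <text>
        <body>
          <p>Dear reader, this paragraph stands in for a transcribed letter.</p>
        </body>
      </text>
    </TEI>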

TEI is quite easily the most well-documented XML vocabulary I’ve ever seen. The community is strong and sustainable, albeit small (if not tiny). The majority of the community is academic and very scholarly. Next to a few types of bibliographic XML (MARCXML, MODS, OAIDC, etc.), TEI is probably the most commonly used XML vocabulary in Library Land, with EAD being a close second. In libraries, TEI is mostly used for the purpose of marking up transcriptions of various kinds: letters, runs of out-of-print newsletters, or parts of a library special collection. I know of no academic journals marked up in TEI, no library manuals, nor any catalogs designed for printing and distribution.

TEI, more than any other type of XML designed for literature, is designed to support the computed critical analysis of text. But marking something up in TEI in a way that supports such analysis is extraordinarily expensive in terms of both time and expertise. Consequently, based on my experience, there are relatively very few such projects, but they do exist.

XSL-FO

As alluded to throughout this particular module, XSL-FO is not easy, but despite this fact, I sincerely believe it is an under-utilized tool.

FO stands for "Formatting Objects", and it in and of itself is an XML vocabulary used to define page layout. It has elements defining the size of a printed page, margins, running headers & footers, fonts, font sizes, font styles, indenting, pagination, tables of contents, back-of-the-book indexes, etc. Almost all of these elements and their attributes use a syntax similar to the syntax of HTML's cascading stylesheets.

Once an XML file is converted into an FO document, you are expected to feed the FO document to an FO processor, and the FO processor will convert the document into something intended for printing — usually a PDF document.
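
By way of illustration, the sort of document handed to an FO processor looks something like the minimal sketch below; feeding it to a processor such as Apache FOP yields a one-page PDF. The page size, margins, and text are arbitrary choices made for the example.

  <fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
    <fo:layout-master-set>
      <!-- define a letter-sized page with one-inch margins -->
      <fo:simple-page-master master-name="letter"
                             page-height="11in" page-width="8.5in" margin="1in">
        <fo:region-body/>
      </fo:simple-page-master>
    </fo:layout-master-set>
    <!-- pour content onto pages built from the master defined above -->
    <fo:page-sequence master-reference="letter">
      <fo:flow flow-name="xsl-region-body">
        <fo:block font-family="serif" font-size="12pt">Hello, printed world.</fo:block>
      </fo:flow>
    </fo:page-sequence>
  </fo:root>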

FO is important because not everything is designed or intended to be digital; “digital everything” is a misnomer. The graphic design of a printed medium is different from the graphic design of computer screens or smart phones. In my opinion, important XML files ought to be transformed into different formats for different media. Sometimes those media are screen oriented. Sometimes it is better to print something, and printed things last a whole lot longer. Sometimes it is important to do both.

FO is another good example of what XML is all about. XML is about data and information, not necessarily presentation. XSL transforms data/information into other things — things usually intended for reading by people.

EAD — Encoded Archival Description

Encoded Archival Description (or EAD) is the type of XML file used to enumerate, evaluate, and make accessible the contents of archival collections. Archival collections are often the raw and primary materials of new humanities scholarship. They are usually “the papers” of individuals or communities. They may consist of all sorts of things: letters, photographs, manuscripts, meeting notes, financial reports, audio cassette tapes, and, now-a-days, computers, hard drives, or CDs/DVDs. One thing that is very important to understand is that these things are “collections” and are not intended to be used as individual items. MARC records are usually used as a data structure for bibliographically describing individual items — books. EAD files describe an entire set of items, and these descriptions are more colloquially called “finding aids”. They are intended to be read as intellectual works, and the finding aids transform collections into coherent wholes.

Like TEI files, EAD files are composed of two sections: 1) a header and 2) a body. The header may contain a whole lot of metadata, or very little, of various types: bibliographic, administrative, provenance, etc. Some of this metadata is in the form of lists, and some of it is in the form of narratives. More than TEI files, EAD files are intended to be displayed on a computer screen or printed on paper. This is why you will find many XSL files transforming EAD into either HTML or FO (and then to PDF).
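
A skeletal EAD (2002) file, with everything invented for the sake of the example, looks something like this; real finding aids carry much richer headers, narratives, and container lists.

  <ead>
    <eadheader>
      <eadid>example-0001</eadid>
      <filedesc>
        <titlestmt>
          <titleproper>Guide to the (Hypothetical) Jane Doe Papers</titleproper>
        </titlestmt>
      </filedesc>
    </eadheader>
    <archdesc level="collection">
      <did>
        <unittitle>Jane Doe Papers</unittitle>
        <unitdate>1900-1950</unitdate>
        <physdesc>4 linear feet</physdesc>
      </did>
      <bioghist>
        <p>A narrative about the creator of the collection goes here.</p>
      </bioghist>
      <scopecontent>
        <p>A narrative describing the collection as a coherent whole goes here.</p>
      </scopecontent>
      <dsc>
        <c01 level="series">
          <did><unittitle>Correspondence</unittitle></did>
        </c01>
      </dsc>
    </archdesc>
  </ead>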

RDF

RDF is an acronym for Resource Description Framework. It is a data model intended to describe just about anything. The data model is based on an idea called triples, and as the name implies, the triples have three parts: 1) subjects, 2) predicates, and 3) objects.

Subjects are always URIs (think URLs), and they are the things described. Objects can be URIs or literals (words, phrases, or numbers), and objects are the descriptions. Predicates are also always URIs, and they denote the relationship between the subjects and the objects.
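
For example, a single triple, expressed here in RDF/XML, might assert that a (made-up) book was created by Thoreau; the subject URI is invented, the predicate is borrowed from Dublin Core, and the object is a literal.

  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
    <!-- subject: the thing described; predicate: dc:creator; object: a literal -->
    <rdf:Description rdf:about="http://example.org/books/walden">
      <dc:creator>Henry David Thoreau</dc:creator>
    </rdf:Description>
  </rdf:RDF>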

The idea behind RDF was this: Describe anything and everything in RDF. Reuse as many of the URIs used by other people as possible. Put the RDF on the Web. Allow Internet robots/spiders to harvest and cache the RDF. Allow other computer programs to ingest the RDF, analyze it for similar uses of subjects, predicates, and objects, and in turn automatically uncover new knowledge and new relationships between things.

RDF was originally expressed as XML, but the wider community had two problems with it. First, there were no “killer” applications using RDF as input, and second, RDF expressed as XML was seen as too verbose and too confusing. Thus, the idea of RDF languished. More recently, RDF is being expressed in other forms, such as JSON (most notably JSON-LD), Turtle, and N3, but there are still no killer applications.

You will hear the term “linked data” in association with RDF, and linked data is the process of making RDF available on the Web.

RDF is important for libraries and other “memory” or “cultural heritage” institutions because the goal of RDF (describing things and sharing those descriptions as widely as possible) is very similar to the goals of libraries, archives, and museums.

MARC

The MARC standard has been the bibliographic bread & butter of Library Land since the late 1960’s. When it was first implemented, it was an innovative and effective data structure used primarily for the production of catalog cards. With the increasing availability of computers, somebody got the “cool” idea of creating an online catalog. While logical, the idea did not mature with a balance of library and computing principles. To make a long story short, library principles prevailed, and the result has been, and continues to be, painful for both the profession and the profession’s clientele.

MARCXML was intended to provide a pathway out of this morass, but since it was designed from the beginning to be “round tripable” with the original MARC standard, all of the shortcomings of the original standard have come along for the ride. The Library of Congress was aware of these shortcomings, and consequently MODS was designed. Unlike MARC and MARCXML, MODS has no character limit, and its field names are human-readable, not based on numeric codes. Given that MODS is a flavor of XML, all of this is a giant step forward.
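
The difference is easy to see when the two are placed side by side. The fragments below (not complete records) describe the same sort of title and author data, first with MARCXML’s numeric tags and subfield codes, and then with MODS’s human-readable element names.

  <!-- MARCXML: numeric tags and subfield codes carried over from MARC -->
  <record xmlns="http://www.loc.gov/MARC21/slim">
    <datafield tag="245" ind1="1" ind2="0">
      <subfield code="a">Walden, or, Life in the woods /</subfield>
      <subfield code="c">by Henry David Thoreau.</subfield>
    </datafield>
  </record>

  <!-- MODS: the same sort of data with human-readable element names -->
  <mods xmlns="http://www.loc.gov/mods/v3">
    <titleInfo>
      <title>Walden, or, Life in the woods</title>
    </titleInfo>
    <name type="personal">
      <namePart>Thoreau, Henry David</namePart>
    </name>
  </mods>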

Unfortunately, the library profession’s primary access tools — the online catalog and “discovery system” — still heavily rely on traditional MARC for input. Consequently, without a wholesale shift in library practice, the intellectual capital the profession so dearly wants to share is figuratively locked in the 1960’s.

Not a panacea

XML really is an excellent technology, and it is most certainly apropos for the work of cultural heritage institutions such as libraries, archives, and museums. This is true for many reasons:

  1. it is computing platform independent
  2. it requires a minimum of computer technology to read and write
  3. to some degree, it is self-documenting, and
  4. especially considering our profession, it is all about data, information, and knowledge

On the other hand, it does have a number of disadvantages, for example:

  1. it is verbose — not necessarily succinct
  2. while easy to read and write, it can be difficult to process
  3. like all things computer program-esque, it imposes a set of syntactical rules, which people can sometimes find frustrating
  4. its adoption as a standard has not been as ubiquitous as desired

To date you have learned how to read, write, and process XML and a number of its specific “flavors”, but you have by no means learned everything. Instead you have received a more than adequate introduction. Other XML topics of importance include:

  • evolutions in XSLT and XPath
  • XML-based databases
  • XQuery, a standardized method for querying sets of XML, similar to SQL, the standard query language of relational databases
  • additional XML vocabularies, most notably RSS
  • a very functional way of making modern Web browsers display XML files
  • XML processing instructions as well as reserved attributes like xml:lang

In short, XML is not a panacea, but it is an excellent technology for library work.

Summary

You have all but concluded a course on XML in libraries, and now is a good time for a summary.

First of all, XML is one of culture’s more recent attempts at formalizing knowledge. At its root (all puns intended) is data, such as a number like 1776. Through mark-up we might say this number is a year, thus turning the data into information. By putting the information into context, we might say that 1776 is when the Declaration of Independence was written and a new type of government was formed. Such generalizations fall into the realm of knowledge. To some degree, XML facilitates the transformation of data into knowledge. (Again, all puns intended.)
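
In TEI-like mark-up, for example, the bare datum becomes information with something as simple as:

  <date when="1776">1776</date>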

Second, understand that XML is also a data structure defined by the characteristics of well-formedness. By that I mean XML has one and only one root element. Elements must be opened and closed in a hierarchical manner. Attributes of elements must be quoted, and a few special characters must always be escaped. The X in XML stands for “extensible”, and through the use of DTDs and schemas, specific XML “flavors” can be specified.
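
A tiny, well-formed document illustrates all of these rules at once (the element names are, of course, made up):

  <?xml version="1.0" encoding="UTF-8"?>
  <!-- one root element; elements nest and close in order; attributes quoted; ampersands escaped -->
  <catalog>
    <book id="b001">
      <title>Mark-up &amp; Metadata</title>
      <year>1776</year>
    </book>
  </catalog>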

With this under your belts, you then experimented with at least a couple of XML flavors: TEI and EAD. The former is used to mark up literature. The latter is used to describe archival collections. You then learned about the XML transformation process through the application of XSL and XPath, two rather difficult technologies to master. Lastly, you made strong efforts to apply the principles of XML to the principles of librarianship by marking up sets of documents or creating your own knowledge entity. It is hoped you have made a leap from mere technology to system. It is not about Oxygen or graphic design. It is about the chemistry of disseminating data as unambiguously as possible for the purposes of increasing the sphere of knowledge. With these things understood, you are better equipped to practice librarianship in the current technological environment.

Finally, remember, there is no such thing as a Dublin Core record.

Epilogue — Use and understanding

This course in XML was really only an introduction. You were expected to read, write, and transform XML. This process turns data into information. All of this is fine, but what about knowledge?

One of the original reasons texts were marked up was to facilitate analysis. Researchers wanted to extract meaning from texts, and one way to do that is to do computational analysis against the text. To facilitate computational analysis, people thought it was necessary for the essential characteristics of a text to be delimited. (It is/was thought computers could not really do natural language processing.) How many paragraphs exist? What are the names in a text? What about places? What sorts of quantitative data can be statistically examined? What main themes does the text include? All of these things can be marked up in a text and then counted (analyzed).

Now that you have marked up sets of letters with persname elements, you can use XPath not only to find persname elements but to count them as well. Which document contains the most persnames? What are the persnames in each document? Tabulate their frequency. Do this over a set of documents to look for trends across the corpus. This is only a beginning, but it is entirely possible given the work you have already done.
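
For instance, assuming your letters really do use an element named persname, a tiny stylesheet like the one sketched below counts the names and lists them, one per line; adjust the element name to whatever your own mark-up uses.

  <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="text"/>
    <xsl:template match="/">
      <!-- how many persname elements appear anywhere in the document? -->
      <xsl:value-of select="count(//persname)"/>
      <xsl:text>&#10;</xsl:text>
      <!-- list each name, one per line, for tabulation elsewhere -->
      <xsl:for-each select="//persname">
        <xsl:value-of select="normalize-space(.)"/>
        <xsl:text>&#10;</xsl:text>
      </xsl:for-each>
    </xsl:template>
  </xsl:stylesheet>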

Libraries do not facilitate enough quantitative analysis against our content. Marking things up in XML is a good start, but let’s go to the next step. Let’s figure out how the profession can move its readership from discovery to analysis — towards use & understanding.

Eric Lease Morgan: Mr. Serials continues

planet code4lib - Wed, 2016-01-06 16:42

The (ancient) Mr. Serials Process continues to support four mailing list archives, specifically, the archives of ACQNET, Colldv-l, Code4Lib, and NGC4Lib, and this posting simply makes the activity explicit.

Mr. Serials is/was a process I developed quite a number of years ago as a method for collecting, organizing, and archiving electronic journals (serials). The process worked well for a number of years, until electronic journals were no longer distributed via email. Now-a-days, Mr. Serials only collects the content of a few mailing lists. That’s okay. Things change. No big deal.

On the other hand, from a librarian’s and archivist’s point-of-view, it is important to collect mailing list content in its original form — email. Email uses the SMTP protocol. The communication sent back and forth between email server and client is well-structured, albeit increasingly verbose. Probably “the” standard for saving email on a file system is called mbox. Given an mbox file, it is possible to use any number of well-known applications to read/write mbox data. Heck, all you need is a text editor. Increasingly, email archives are not available from mailing list applications, and if they are, then they are available only to mailing list administrators and/or in a proprietary format. For example, if you host a mailing list on Google, can you download an archive of the mailing list in a form that is easily and universally readable? I think not.

Mr. Serials circumvents this problem. He subscribes to mailing lists, saves the incoming email to mbox files, and processes the mbox files to create searchable/browsable interfaces. The interfaces are not hugely aesthetically appealing, but they are more than functional, and the source files are readily available. Just ask.

Most recently, both the ACQNET and Colldv-l mailing lists moved away from their hosting institutions to servers hosted by the American Library Association. This has not been the first time these lists have moved. It probably won’t be the last, but since Mr. Serials continues to subscribe to these lists, comprehensive archives persevere. Score a point for librarianship and the work of archives. Long live Mr. Serials.
