
Feed aggregator

Islandora: Announcing the First Annual Islandora Conference

planet code4lib - Mon, 2014-12-08 21:07

It's like an Islandora Camp, only more.

The Islandora community has been working since 2006, via email, listservs, IRC, and nearly a dozen Islandora Camps. We have grown, matured, tackled major changes in the structure of the project, and now we are ready to come together for a full conference to talk about where we've been and where we're going.

August 3 - 7, 2015, we invite Islandorians from the world over to join us in the birthplace of Islandora (Charlottetown, PEI) for a week of great food, (hopefully) beautiful weather, and all the Islandora you can handle. This full-week event will include sessions from the Islandora Foundation, interest groups, and community presentations, plus two full days of hands-on Islandora training, and it will end with an Islandora Hackfest where we invite you to make your mark in the Islandora code and work together with your fellow Islandorians to complete a project selected by the community.

Full details will be available as they develop on our Conference Website, including registration, calls for proposals, scheduling, and recommendations for accommodations and travel. We look forward to seeing you in PEI next year!


Photo: Nicolas Raymond

District Dispatch: Reminder: Free webinar on Ebola for librarians

planet code4lib - Mon, 2014-12-08 21:04

Reminder: On Friday, December 12, 2014, library leaders from the U.S. National Library of Medicine will host the free webinar “Ebola and Other Infectious Diseases: The Latest Information from the National Library of Medicine.” As a follow-up to the webinar they presented in October, librarians from the U.S. National Library of Medicine will be discussing how to provide effective services in this environment, as well as providing an update on information sources that can be of assistance to librarians.

Speakers
  • Siobhan Champ-Blackwell is a librarian with the U.S. National Library of Medicine Disaster Information Management Research Center. Champ-Blackwell selects material to be added to the NLM disaster medicine grey literature database and is responsible for the Center’s social media efforts. Champ-Blackwell has over 10 years of experience in providing training on NLM products and resources.
  • Elizabeth Norton is a librarian with the U.S. National Library of Medicine Disaster Information Management Research Center where she has been working to improve online access to disaster health information for the disaster medicine and public health workforce. Norton has presented on this topic at national and international association meetings and has provided training on disaster health information resources to first responders, educators, and librarians working with the disaster response and public health preparedness communities.

Date: December 12, 2014
Time: 2:00 PM–3:00 PM Eastern
Register for the free event

If you cannot attend this live session, a recorded archive will be available to view at your convenience. To view past webinars also done in collaboration with iPAC, please visit Lib2Gov.org.

The post Reminder: Free webinar on Ebola for librarians appeared first on District Dispatch.

Nicole Engard: Bookmarks for December 8, 2014

planet code4lib - Mon, 2014-12-08 20:30

Today I found the following resources and bookmarked them:

  • Code.org Launched in 2013, Code.org® is a non-profit dedicated to expanding participation in computer science by making it available in more schools, and increasing participation by women and underrepresented students of color. Our vision is that every student in every school should have the opportunity to learn computer science.
  • Hour of Code Join the largest learning event in history, Dec 8-14, 2014. The Hour of Code is a global movement reaching tens of millions of students in 180+ countries. Anyone, anywhere can organize an Hour of Code event. One-hour tutorials are available in over 30 languages. No experience needed.
  • HipChat Private group chat, video chat, instant messaging for teams
  • Slack Slack is a platform for team communication: everything in one place, instantly searchable, available wherever you go.

Digest powered by RSS Digest

The post Bookmarks for December 8, 2014 appeared first on What I Learned Today....

Related posts:

  1. Conversants :-) A Participatory Conversation
  2. Koha in Library School
  3. Registered

SearchHub: Noob* Notes: Fusion First Look

planet code4lib - Mon, 2014-12-08 19:27
This is a record of my coming up to speed on Fusion, starting from zero. I've just joined the Lucidworks team to write documentation and develop demos. I'd like to dedicate this, my first post, to developers who, like me, know enough about search and Lucene and/or Solr to be dangerous (make that employable), but who haven't used Fusion — yet.

Getting Started

I like to click first and read the docs later, so the first thing I do is find the Fusion download page. I download Fusion (version 1.1.1). It's a gzipped tarball with a README.txt file that points to the online documentation. Looks like I have to read the docs sooner rather than later. The installation instructions are straightforward. My Mac is running the Java 7 JDK (build 1.7.0_71-b14) but I don't have an existing Solr installation, so I need to start Fusion with the embedded Solr instance. I run the bin/fusion start command, point the Chrome web browser at http://localhost:8764, and log in. The Fusion UI shows 5 icons: Admin, Quick Start, Relevancy Workbench, Search, and Banana Dashboards. I click through each in turn. The Banana Dashboard is especially impressive. This looks very different from the Solr Admin UI, that's for sure.

The instructions in the Getting Started page start with the Admin app. Following the steps in the First 5 minutes with Fusion guide, I create a Collection named getStarted and a web Datasource named lucidworks. The concept of a Collection is familiar from Solr; it's a logical index. Datasources are used to pull data into an index. Indexing the Lucidworks web pages starting from the URL http://lucidworks.com retrieves 1180 documents. On a slow cable internet connection, this took 5 minutes.

At this point I've spent about 3 minutes staring at and clicking through the Admin UI, and 5 minutes reading the Lucidworks docs. It's always prudent to multiply a time estimate by 2 (or 3), so if I can carry out a few searches in under 2 minutes, my first 5 minutes with Fusion will have taken 10 minutes of my time, plus 5 minutes of indexing. I run a series of searches: "Lucidworks" returns 1175 documents, "Lucidworks AND Fusion" returns 1174 documents, "Java AND Python" returns 15 documents, and "unicorn" returns 0 documents. That took no time at all. I've got a Collection and the search results look sensible. By following the instructions and ignoring everything I don't understand, my first 5 minutes with Fusion have been a total success.

A Real Problem

So far I've kicked the tires and taken a drive around the parking lot. Time to hit the road and index some fresh content. My go-to test case is the content available from the National Library of Medicine. The NLM maintains databases of drugs, chemicals, diseases, genes, proteins, and enzymes, as well as MEDLINE/PubMed, a collection of more than 24 million citations for biomedical literature from MEDLINE, life science journals, and online books. NLM leases MEDLINE/PubMed to U.S. and non-U.S. individuals or organizations; it is distributed as a set of XML files whose top-level element is <MedlineCitationSet>. Each citation set contains one or more <MedlineCitation> elements. Every year, NLM releases a new version of MEDLINE, a revised DTD, and a sample data set. Can I index the MEDLINE/PubMed 2015 sample data as easily as I indexed Lucidworks.com? The answer is yes, I can index the data, but it takes a little more work, because a specialized document set requires a specialized index. I demonstrate this by failure. Working through the Fusion Admin UI, I create a new collection called medsamp2015.
As before, I create a web datasource called medsamp2015xml and point it at the MEDLINE/PubMed 2015 sample data file. Fusion processes this URL into a single document. Since there's only one document in the index, I use the wildcard search "*" to examine it. The content field contains the text of all elements in the XML file. Definitely not the indexing strategy I had in mind. The MEDLINE 2015 sample data file has one top-level element, <MedlineCitationSet>, and 165 <MedlineCitation> elements. What I want is to index each <MedlineCitation> element as its own document (a minimal sketch of that nesting follows the stage list below).

A Real Solution

A Fusion datasource is coupled with an Index Pipeline. Pipelines are powerful, but the documentation is incomplete — that's why I've been hired. At this point, with a little (make that a lot) of help from the Fusion developers, I was able to create an indexing pipeline for the Medline data. Soon the documentation will catch up to Fusion's awesome capabilities. In the interim, here's a report of what I did: what worked and what didn't.

Pipelines are composed of a sequence of stages. The conn_solr pipeline is a general-purpose document parsing pipeline composed of the following stages: an Apache Tika Parser index stage; a Field Mapper index stage; a Multi-value Resolver stage; and a Solr Indexer stage. The Tika Parser interface provides a single mechanism for extracting both metadata and data from many different sorts of documents, including HTML, XML, and XHTML. The Field Mapper index stage maps common document elements to defined fields in the default Solr schema. The Multi-value Resolver stage resolves conflicts that would otherwise arise when a document contains multiple values for a Solr field which is not multi-valued. Finally, the Solr Indexer stage sends documents to Solr for indexing.

Because there's a close connection between a datasource and the processing applied to that data, when possible the Fusion Admin UI provides a default index pipeline ID. For a web datasource, the default index pipeline is the conn_solr pipeline, which provides field mappings for common elements found on HTML pages. In the Getting Started example above, there was a one-to-one correspondence between web pages and documents in the Solr index. For a Medline XML file, additional processing is required to transform each citation into a fielded document for Solr. The indexing pipeline required looks like this:
  • Apache Tika Parser
  • XML Transform
  • Field Mapper
  • Solr Indexer
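To make the stages easier to follow, here is a minimal sketch of the XML nesting the pipeline has to flatten. The element names are the ones used in this post; the PMID and the text values are invented placeholders.

    <MedlineCitationSet>
      <MedlineCitation>
        <PMID>0000000</PMID>  <!-- placeholder identifier -->
        <Article>
          <ArticleTitle>A hypothetical article title</ArticleTitle>
          <Abstract>
            <AbstractText>A hypothetical abstract mentioning a gene.</AbstractText>
          </Abstract>
        </Article>
        <MeshHeadingList>
          <MeshHeading>
            <DescriptorName>Hypothetical Descriptor</DescriptorName>
          </MeshHeading>
        </MeshHeadingList>
      </MedlineCitation>
      <!-- ...the 2015 sample file contains 164 more MedlineCitation elements... -->
    </MedlineCitationSet>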
This pipeline looks superficially similar to the conn_solr index pipeline, but both the Tika Parser and Field Mapper stages are configured quite differently, and an XML Transform stage is used to map specific elements of the Medline XML to custom fields in the Solr document. A Multi-value Resolver stage isn't necessary because I've set up the mapping so that multi-valued elements are mapped to multi-valued fields. The configuration of the Solr Indexer remains the same.

The new Fusion Admin UI Pipelines control panel can be used to define both index and query pipelines. It's also possible to define pipelines through the Fusion REST API. As a noob, I'm sticking to the Admin UI. After clicking through to the Pipelines control panel, Index Pipelines tab, I create a new Index Pipeline named medline_xml, then add each stage in turn. When a new stage is added, the Pipeline panel displays the configuration choices needed for that stage.

Apache Tika Parser Configuration

To process the MEDLINE XML, I need to configure Tika so that it doesn't try to extract the text contents but instead passes the XML to the next stage of the indexing pipeline. In the screenshot I captured for this step, the one setting I had to change from the current default is circled in red: the control addOriginalContent is set to false, and both controls "Return parsed content as XML or HTML" and "Return original XML and HTML instead of Tika XML" are set to true. The latter two controls seem redundant, but they're not, and you'll need both set to true for this to work. Trust me.

XML Transform Configuration

The XML Transform stage does the mapping from nested XML elements into a fielded doc for Solr. After adding an XML Transform stage to my pipeline and naming it, I get down to specifying that mapping. The key configurations are these: because we want to index each MedlineCitation element as its own document, the Root XPath element is set to the full XPath "/MedlineCitationSet/MedlineCitation". XPathMappings pick out the elements that map to fields in that document. For my document fields, I use the Solr dynamic field naming conventions. Each MedlineCitation is assigned a unique integer identifier called a PMID (PubMed ID). In this example, flattening the MEDLINE XML into a Solr doc is straightforward (a small script illustrating the same flattening follows the mapping list). The XPathMappings used are:
  • “/MedlineCitationSet/MedlineCitation/Article/ArticleTitle” maps to “article-title_txt”, Multi Value false
  • “/MedlineCitationSet/MedlineCitation/Article/Abstract/AbstractText” maps to “article-abstract_txt”, Multi Value true
  • “/MedlineCitationSet/MedlineCitation/MeshHeadingList/MeshHeading/DescriptorName” maps to “mesh-heading_txt”, Multi Value true.
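Outside of Fusion, the same flattening can be sketched in a few lines of Python with lxml. This is only an illustration of what the XML Transform mappings accomplish, not how Fusion implements them; the file name is hypothetical and the field names follow the dynamic field convention above.

    # Illustrative only: reproduce the XPath mappings above with lxml,
    # yielding one fielded dict per MedlineCitation element.
    from lxml import etree

    def flatten_citations(path):
        tree = etree.parse(path)
        docs = []
        for cit in tree.xpath("/MedlineCitationSet/MedlineCitation"):
            docs.append({
                "id": cit.findtext("PMID"),
                "article-title_txt": cit.findtext("Article/ArticleTitle"),
                "article-abstract_txt": [e.text for e in cit.findall("Article/Abstract/AbstractText")],
                "mesh-heading_txt": [e.text for e in cit.findall("MeshHeadingList/MeshHeading/DescriptorName")],
            })
        return docs

    # docs = flatten_citations("medsamp2015.xml")  # hypothetical file name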
There's a lot more information to be extracted from the XML, but this is enough for now.

Field Mapper Configuration

It's complicated, but because I'm using Tika and an XML Transform, I need a Field Mapper stage to remove some of the document fields created by Tika before sending the document to Solr for indexing. On the advice of my local wizard, I create mappings for the "_raw_content_", "parsing", "parsing-time", "Content-Type", and "Content-Length" fields and set the mode to "delete".

Solr Indexer Configuration

I set the "Buffer documents" option for Solr to true. This isn't strictly necessary. It just seems like a good thing to do.

Checking Configuration with the Pipeline Result Preview Tool

This is a lot to configure, and I couldn't have done it without the "Pipeline Result Preview" tool, located on the right-hand side of the Pipeline UI panel. The Preview tool takes as input a list of documents coded up as JSON objects and runs them through the indexing pipeline. A document object has two members: id and fields. Here, our input has exactly one field, named "body", whose value is a JSON-encoded string of the raw XML (or a subset thereof). The JSON input can't split strings across lines, which means that the JSON-encoded XML is pretty much unreadable. After several tries, I get a well-formed example MedlineCitationSet consisting of three MedlineCitation elements, properly escaped for JSON, all jammed together on one line. The "view results" tab shows the result of running this input through the medline_xml indexing pipeline; scrolling down displays the input to the Solr Indexer, which consists of three documents, named doc1#0 through doc1#2.

Indexing

From the Fusion Admin UI, I return to the Collections panel for the collection named medsamp2015. As before, I create a web datasource called medsamp2015xml_v2 and point it at the MEDLINE 2015 sample data file, taking care to specify the medline_xml pipeline in the "Pipeline ID" input box. One input was processed and the index now contains 165 documents. I have managed to index fresh content!

Search and Results

As a first test, I do a keyword search on the word "gene". This search returns no documents. I do a keyword search on "*". This search returns 165 documents, the very first of which contains the word "gene" in both the article title and the article abstract. Again, the problem lies with the pipeline I'm using. The default query pipeline doesn't search the fields "article-title_txt", "article-abstract_txt", or "mesh-heading_txt". Tuning the search query parameters is done with the Query Pipeline control panel. After changing the set of search fields and return fields in "medsamp2015-default" to include these fields, I run a few more test queries. Now a search on "gene" returns 11 results and returns only the relevant fields.

In conclusion, I've managed to use the Fusion Admin UI to index and search my data. I didn't get the Enterprise up to warp speed. Maybe next week. In the meantime, I've learned a lot and I hope that you have too.

*dare to learn new things, dare to be a noob

The post Noob* Notes: Fusion First Look appeared first on Lucidworks.

Roy Tennant: One Format to Rule Them All

planet code4lib - Mon, 2014-12-08 18:46

One Format to rule them all, One Format to find them; One Format to bring them all and in the darkness bind them. – with apologies to J.R.R. Tolkien

It is now over 12 years since I wrote “MARC Must Die” in Library Journal. At the time that I wrote it, I think that I imagined a much redesigned metadata format expressed in XML. But it didn’t take long for me to realize the error of my ways. Not that we didn’t need to do something, but I was wrong to think that it required replacing. What it really required, I soon realized, was for us to not rely upon it solely. And that is a point that I feel has become lost in our discussions about our bibliographic future.

Here is how I put it in a follow-up piece in Library Hi Tech titled “A Bibliographic Infrastructure for the Twenty-First Century”:

What I am suggesting [in this article] is different in scope and structure than is implied by my “MARC Must Die” column in Library Journal, although I alluded to it in the follow-up “MARC Exit Strategies” column. What must die is not MARC and AACR2 specifically, despite their clear problems, but our exclusive reliance upon those components as the only requirements for library metadata. If for no other reason than easy migration, we must create an infrastructure that can deal with MARC (although the MARC elements may be encoded in XML rather than MARC codes) with equal facility as it deals with many other metadata standards. We must, in other words, assimilate MARC into a broader, richer, more diverse set of tools, standards, and protocols. The purpose of this article is to advance the discussion of such a possibility.

I went on to explain a number of characteristics that I felt our bibliographic infrastructure should support, as well as a fairly specific proposal for implementation. However, despite the piece being named the best article to appear in Library Hi Tech that year, that salvo basically fell on deaf ears.

And now we are here.

“Here”, being, of course, that the Library of Congress is developing a new format.

I parse a lot of data. I even fancy myself to be a Data Geek. After all of the data processing I've done, I've come to realize that there are really only three things I care about in terms of metadata: parseability, granularity, and consistency. Pretty much everything else can be dealt with. You call your author field "creator"? Fine and dandy. You record dates as MM/DD/YYYY? I can deal with that. So long as your metadata is (there's a toy example after the list):

  • Parseable. Separate fields must be delimited in some way. It doesn’t need to be XML, it can be a JSON array or a pipe symbol (“|”) or even, in many cases, a tab (I process a lot of tab-delimited text files that are saved out of Excel, for example). But there must be some way of determining via software what has been kept separate.
  • Granular. If I need first names separate from family names I want them in separate fields. Trying to break apart elements you need to be separate can be difficult, especially if the data is inconsistent. Oh, and by the way, punctuation (even ISBD punctuation) doesn’t count.
  • Consistent. When processing data, inconsistency can cause a lot of problems. Even if a mistake is made, it’s best to make it consistently so the person processing it can treat all records the same. What is difficult is having to accommodate a wide variety of edge cases.
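Here's the toy example promised above: a pipe-delimited export that is parseable (delimited), granular (family and given names kept in separate fields) and consistent (the same fields in the same order on every row). The field names are made up purely for illustration.

    # Toy illustration in Python: parse a pipe-delimited record with granular name fields.
    import csv, io

    data = "family_name|given_name|title|date\nTennant|Roy|One Format to Rule Them All|2014-12-08\n"
    for row in csv.DictReader(io.StringIO(data), delimiter="|"):
        print(row["family_name"], row["given_name"], row["date"])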

That's really all I care about, since it is very unlikely that every library will create its own format. No, we are herd animals, so we will gather around a very small number of formats, and perhaps only one. After all, that is all we have ever known.

 

Photo by David Fulmer, Creative Commons License Attribution 2.0 Generic.

Library of Congress: The Signal: Preserving Carnegie Hall’s Born-Digital Assets: An NDSR Project Update

planet code4lib - Mon, 2014-12-08 14:02

The following is a guest post by Shira Peltzman, National Digital Stewardship Resident at Carnegie Hall in New York City.

The author inside the Isaac Stern Auditorium. Photo by Gino Francesconi.

As the National Digital Stewardship Resident placed at Carnegie Hall, I have been tasked with creating and implementing policies, procedures and best practices for the preservation of our born-digital assets. Carnegie Hall produces a staggering quantity of born-digital content every year: live concert webcasts; videos of professional development and educator workshops; artist interviews; promotional videos for festivals and performances taking place at the Hall; workshops and masterclasses; and all print media produced for the Hall, including infographics, programs, and annual reports, to cite just a few examples. A sampling of this material can be found on Carnegie Hall’s blog, which averages 400 posts per year.

The first phase of my project–which I’m on track to wrap up by mid-December–has been largely focused on developing a detailed understanding of how the organization’s born-digital assets are created, stored and used. To do this, I have spent the past couple months conducting interviews with staff across a wide range of departments at Carnegie Hall that use or produce digital content.

The interview process is fundamentally important to my project because it lays the groundwork for all of my NDSR project deliverables. These include: establishing selection and acquisition criteria for the preservation of born-digital assets; developing and streamlining production workflows; and writing a digital preservation and sustainability policy document. Beyond helping me evaluate the current workflows and digital usage within the organization, I’ve found that conducting these interviews has also helped me settle into my new work environment.  Having the opportunity to talk to so many different people within the organization has allowed me to meet many more of my coworkers than I might otherwise cross paths with in the course of a normal work week, and it’s also helped me to better understand how each department fits into the ‘bigger picture’ at Carnegie Hall.

Each interview takes place in two halves: during the first half of the interview I ask questions that are designed to help me understand precisely how digital assets are created and used within each department. This usually involves asking the people I’m interviewing to walk me through the production process of the digital content they are responsible for creating. I do this so that I can take note of things like the hardware and software they use; whether or not a final version is likely to have many associated versions or drafts; and how likely it is that the audio, video or print media they create will eventually be re-purposed or re-used in the future. This information helps paint a detailed picture of how every department operates. It also allows me to recognize what assets matter most to each department, which in turn will help me establish selection and acquisition criteria further down the road.

During the interview I make it a point to not only understand the current workflows involved in creating and using digital assets, but also to gather information about how these workflows might be improved. This is important because as part of the Digital Archives Project, the Carnegie Hall Archives is in the process of configuring and implementing a new Digital Asset Management System, and the feedback I receive during the interview process will help us streamline the process of ingesting material into the DAMS.

The second half of the interview typically takes place after I’ve had a chance to write up a summary of the initial discussion, and is much more ‘hands-on.’ An important aspect of my project is to create a detailed inventory of Carnegie Hall’s born-digital assets, and so the purpose of this half of the interview is to gather all extant hard drives, thumb drives, optical media, etc. that contain digital assets and review their contents. This involves reviewing the contents of digital storage media from both internal departments and, occasionally, from external contractors of Carnegie Hall as well. My goal is to uncover unknown and overlooked digital assets that should ultimately end up in the DAMS.

A screenshot of Carnegie Hall’s website featuring content created for the UBUNTU: Music and Arts of South Africa festival, which took place from Oct. 8 to Nov. 5 and celebrated twenty years of freedom in South Africa. The videos showcased on this website are exemplary of the born-digital content that Carnegie Hall creates.

Right now the largest task I'm faced with completing before the end of the year is the inventory of born-digital assets. This will be a complex process because not only will I have to account for the assets stored on extant media that I track down throughout the office, but I will also need to create a comprehensive inventory of the assets stored within every departmental file directory. So far, DROID (Digital Record Object Identification), the UK National Archives' file format profiling tool, has been helpful for this task.

During my downtime between interviews I work on any number of smaller projects that are also part of my NDSR project deliverables. These include creating a document that provides Carnegie Hall with recommendations on how to improve file naming, writing a disaster preparedness plan and revising the Archives’ mission statement so that it includes a remit specifically for digital preservation.

So far one of my biggest takeaways from the project has been the importance of engaging both media creators and Carnegie Hall staff at large in the preservation process. Having their input has been essential because not only do they have a much more intimate knowledge of the different ways that material within the organization is created, used and stored, but they also collectively possess a substantial institutional knowledge that has helped guide my project throughout. Another benefit of this collaboration has been that it has bolstered buy-in for the DAMS, and has helped create a greater level of awareness about preservation among staff.

The first several months of my project feel like they’ve flown by. There are days when I reflect on what I’ve accomplished in just under three months’ time and feel proud of my progress, and then there are other days when I’m humbled by how much there is still left to do. But overall, the project has been one of the greatest learning experiences I could have hoped for–and there’s still six months left to go.

ACRL TechConnect: This is How I Work (Lauren Magnuson)

planet code4lib - Mon, 2014-12-08 14:00

Editor’s Note: This post is part of ACRL TechConnect’s series by our regular and guest authors about The Setup of our work.

Lauren Magnuson, @lpmagnuson

Location: Los Angeles, CA

Current Gig:

Systems & Emerging Technologies Librarian, California State University Northridge (full-time)

Development Coordinator, Private Academic Library Network of Indiana (PALNI) Consortium (part-time, ~10/hrs week)

Current Mobile Device: iPhone 4.  I recently had a chance to upgrade from an old, slightly broken iPhone 4, so I got… another iPhone 4.  I pretty much only use my phone for email and texting (and rarely, phone calls), so even an old iPhone is kind of overkill for me.

Current Computer:

  • Work:  work-supplied HP Z200 Desktop, Windows 7, dual monitors
  • Home: (for my part-time gig): Macbook Air 11”

Current Tablet: iPad 2, work-issued, never used

One word that best describes how you work: relentlessly

What apps/software/tools can’t you live without?

  • Klok – This is time-tracking software that allows you to ‘clock-in’ when working on a project.  I use it primarily to track time spent working my part-time gig.  My part-time gig is hourly, so I need to track all the time I spend working that job.  Because I love the work I do for that job, I also need to make sure I work enough hours at my full-time job.  Klok allows me to track hours for both and generate Excel timesheets for billing.  I use the free version, but the pro version looks pretty cool as well.
  • Trello – I use this for the same reasons everyone else does – it’s wonderfully simple but does exactly what I need to do.  People often drop by my office to describe a problem to me, and unless I make a Trello card for it, the details of what needs to be done can get lost.  I also publish my CSUN Trello board publicly and link it from my email signature.
  • Google Calendar - I stopped using Outlook for my primary job and throw everything into Google Calendar now.  I also dig Google Calendar’s new feature that integrates with Gmail so that hotel reservations and flights are automatically added to your Google Calendar.
  • MAMP/XAMPP – I used to only do development work on my Macbook Air with MAMP and Terminal, which meant I carted it around everywhere – resulting in a lot of wear and tear.  I’ve stopped doing that and invested some time in in setting up a development environment with XAMPP and code libraries on my Windows desktop.  Obviously I then push everything to remote git repositories so that I can pull code from either machine to work on it whether I’m at home or at work.
  • Git (especially Git Shell, which comes with Git for Windows) – I was initially intimidated by learning git – it definitely takes some trial and error to get used to the commands and how fetching/pulling/forking/merging all work together.  But I’m really glad I took the time to get comfortable with it.  I use both GitHub (for code that actually works and is shared publicly) and BitBucket (for hacky stuff that doesn’t work yet and needs to be in a private repo).
  • Oxygen XML Editor – I don’t always work with XML/XSLT, but when I have to, Oxygen makes it (almost) enjoyable.
  • YouMail – This is a mobile app that, in the free version, sends you an email every time you have a voicemail or missed call on your phone.  At work, my phone is usually buried in the nether-regions of my bag, and I usually keep it on silent, so I probably won’t be answering my mobile at work.  YouMail allows me to not worry about where my phone is or whether I’m missing any calls.  (There is a Pro version that transcribes your voicemail that I do not pay for, but it seems like it might be cool if you need that kind of thing.)
  • Infinite Storm – It rarely rains in southern California.  Sometimes you just need some weather to get through the day.  This mobile app makes rain and thunder sounds.

Physical:

  • Post It notes (though I’m trying to break this habit)
  • Basic Logitech headset for webinars / Google hangouts.  I definitely welcome suggestions for a headset that is more comfortable – the one I have weirdly crushes my ears.
  • A white board I use to track information literacy sessions that I teach

What’s your workspace like?

I’m on the fourth floor of the Oviatt Library at CSUN, which is a pretty awesome building.  Fun fact: the library building was the shooting location for Starfleet Academy scenes in JJ Abrams’ 2009 Star Trek movie (but I guess it got destroyed by Romulans, because they have a different Academy in Into Darkness).

My office has one of the very few windows available in the building, which I’m ambivalent about.  I truly prefer working in a cave-like environment with only the warm glow of my computer screen illuminating the space, but I also do enjoy the sunshine.

I have nothing on my walls and keep few personal effects in my office – I try to keep things as minimal as possible.  One thing I do have though is my TARDIS fridge, which I keep well-stocked with caffeinated beverages (yes, it does make the whoosh-whoosh sound, and I think it is actually bigger on the inside).

I am a fan of productivity desktop wallpapers – I’m using these right now, which help me peripherally see how much time has elapsed when I’m really in the zone.

When I work from home, I mostly work from my living room couch.

What’s your best time-saving trick?  When I find something I don’t know how to do (like when I recently had to wrangle my head around Fedora Commons content models, or learn Ruby on Rails for Hydra), I assign myself some ‘homework’ to read about it later rather than trying to learn the new thing during working hours.  This helps me avoid getting lost in a black hole of Stack Overflow for several hours a day.

What’s your favorite to-do list manager? Trello

Besides your phone and computer, what gadget can’t you live without?

Mr. Coffee programmable coffee maker

What everyday thing are you better at than everyone else? Troubleshooting

What are you currently reading?  I listen to audiobooks I download from LAPL (Thanks, LAPL!), and I particularly like British mystery series.  To be honest, I kind of tune them out when I listen to them at work, but they keep the part of my brain that likes to be distracted occupied.

In print, I’m currently reading:

What do you listen to while at work?  Mostly EDM now, which is pretty motivating and helps me zone in on whatever I’m working on.  My favorite Spotify station is mostly Deadmau5.

Are you more of an introvert or an extrovert? Introvert

What’s your sleep routine like?  I love sleep.  It is my hobby.  Usually I sleep from around 11 PM to 7 AM; but my ideal would be sleeping between like 9 PM and 9 AM.  Obviously that would be impractical.

Fill in the blank: I’d love to see _________ answer these same questions.  David Walker @ the CSU Chancellor’s Office

What’s the best advice you’ve ever received? 

Applies equally to using the Force and programming.

Eric Hellman: Stop Making Web Surveillance Bugs by Mistake!

planet code4lib - Mon, 2014-12-08 03:59
Since I've been writing about library websites that leak privacy, I figured it would be a good idea to do an audit of Unglue.it to make sure it wasn't leaking privacy in ways I wasn't aware of. I knew that some pages leak some privacy via referer headers to Google, to Twitter, and to Facebook, but we force HTTPS and make sure that user accounts can be pseudonyms. We try not to use any services that push ids for advertising networks. (Facebook "Like" button, I'm looking at you!)
I've worried about using static assets loaded from third party sites. For example, we load jQuery from https://ajax.googleapis.com (it's likely to be cached, and should load faster) and Font Awesome from https://netdna.bootstrapcdn.com (ditto). I've verified that these services don't set any cookies and allow caching, which makes it unlikely that they could be used for surveillance of unglue.it users.

It turned out that my worst privacy leakage was to Creative Commons! I'd been using the button images for the various licenses served from https://i.creativecommons.org/. I was surprised to see that id cookies were being sent in the request for these images.
In theory, the folks at Creative Commons could track the usage for any CC-licensed resource that loaded button images from Creative Commons! And it could have been worse. If I had used the HTTP version of the images, anyone in the network between me and Creative Commons would be able to track what I was reading!

Now, to be clear, Creative Commons is NOT tracking anyone. The reason my browser is sending id cookies along with button image requests is that the Creative Commons website uses Google Analytics, and Google Analytics sets a domain-wide id cookie. Google Analytics doesn't see any of this traffic – it doesn't have access to server logs. But without anyone intending it, the combination of Creative Commons, Google Analytics, and websites like mine that want to promote use of Creative Commons has conspired to build a network of web surveillance bugs BY MISTAKE.

When I asked Creative Commons about this, I found out they were way ahead of the issue. They've put in redirects to the HTTPS versions of their button images. This doesn't plug any privacy leakage, but it discourages people from using the privacy-spewing HTTP versions. In addition, they'd already started the process of moving static assets like button images to a special-purpose domain. The use of this domain, licensebuttons.net, will ensure that id cookies aren't sent and that nobody could use them for surveillance.

If you care about user privacy and you have a website, here's what you should do:
  1. Avoid loading images and other assets from third-party sites; consider self-hosting these.
  2. When you do use third-party hosted assets, use HTTPS references only!
  3. Avoid loading static assets from domains that use Google Analytics and set domain-wide id cookies.
For Creative Commons license buttons, use the buttons from licensebuttons.net. If you use the Creative Commons license chooser, replace "i.creativecommons.org" in the code it makes for you with "licensebuttons.net". This will help the web respect user privacy. The buttons will also load faster, because the "i.creativecommons.org" requests will get redirected there anyway.
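As a starting point for this kind of audit, here is a minimal sketch (using the Python requests library) that fetches a third-party asset and reports whether the response, or any redirect along the way, tries to set a cookie, and whether the reference is HTTPS. The asset URLs are only examples, not a recommendation.

    # Minimal audit sketch: flag third-party assets that set cookies or use plain HTTP.
    import requests

    ASSETS = [
        "https://licensebuttons.net/l/by/4.0/88x31.png",                       # example CC button
        "https://ajax.googleapis.com/ajax/libs/jquery/1.11.1/jquery.min.js",   # example CDN asset
    ]

    for url in ASSETS:
        r = requests.get(url)
        sets_cookie = any("Set-Cookie" in resp.headers for resp in r.history + [r])
        print(url,
              "sets cookies!" if sets_cookie else "no cookies set",
              "" if url.startswith("https://") else "(not HTTPS!)")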

Library of Congress: The Signal: Personal Digital Archiving 2015 in NYC — “Call for Papers” Deadline Approaching

planet code4lib - Fri, 2014-12-05 14:17

New-york-city by irot2 on deviantart.com.

The Personal Digital Archiving Conference 2015 will take place in New York City for the first time. The conference will be hosted by our NDIIPP and NDSA partners at New York University’s Moving Image Archiving and Preservation program April 24-26, 2015. Presentation submissions for Personal Digital Archiving are due Monday, December 8th, 2014 by 11:59 pm EST.

This year’s conference will differ slightly from the Personal Digital Archiving Conferences of previous years (see listings below). There will be two full days of presentations focused on a set of themes; a third day will be set aside for workshops covering useful digital tools.

The conference program committee seeks proposals for:
– ten- to twenty-minute presentations
– five-minute lightning talks
– posters (including demos)
– workshops, particularly those emphasizing software tools (to take place on the third day).

The program committee will try to cluster shorter presentations into panels and encourage discussion among the panelists. For the day of workshops, they are seeking hands-on learning focused on useful digital tools. They anticipate four half-day workshops, with two in the morning and two in the afternoon.

Personal Digital Archiving 2015 invites proposals on the full range of topics relevant to personal digital archiving, with particular interest in papers and presentations around community groups, activist groups and their use of digital media, as well as personal/familial collections and home-brewed digital solutions.

Presentations might address challenges, such as:
– Ubiquitous recording devices – such as cell phones – for videos and photos
– Action cameras (such as GoPro)
– Cloud storage
– Social media (Vine, Instagram, Twitter, Facebook, blogs etc.)
– Email
– Open-source, low-cost digital tools
– Tracking and sharing personal health data
– Community outreach and economic models from an organizational perspective
– Security and issues of access, encryption, reliability and safety
– Archival and library issues associated with collection, appraisal, ingest and description
– Migration of content from obsolete or outdated storage media.

Submissions should include:
– The title of the presentation;
– For 10- to 20-minute presentations, a 300-word abstract;
– For lightning talks and posters, a 150- to 300-word abstract;
– For workshop proposals, a 150- to 300-word curriculum overview, including the approximate number of hours needed, what tools will be taught, and computing infrastructure requirements;
– For panel proposals, a 150- to 300-word overview of the topic and suggestions for additional presenters;
– A brief biographical sketch or CV (no more than 2 pages).

Submit your conference proposals to http://bit.ly/1t8x3Td.

For more information on previous PDA conferences, please visit:

Registration, program, housing and other information will be posted in early 2015. For further information, email personaldigitalarchiving [at] gmail.com.

Personal Digital Archiving 2015 is co-sponsored by NYU’s Moving Image Archiving & Preservation program, the NYU Libraries and the Coalition for Networked Information.

LITA: Building a Small Web Portfolio

planet code4lib - Fri, 2014-12-05 14:00

Since my undergraduate commencement in May, I have been itching to create my own personal portfolio website. I wanted to curate my own space devoted to my curriculum vitae, projects, and thoughts moving through my education and career. This was for my own organization, but also for colleagues to view my work in an environment I envisioned.

I began looking at sites belonging to mentors, students, and other professionals, noticing that each site fit the person and their accomplishments. Then, I began to wonder, which design fits me? Which platform fits me? If I’m terrible at any sort of creative design, how will I design my own website?

I found clarity when a fellow library student shared some advice: it is never right the first time. Get it functional, get that first iteration out of the way, and improve from there.

Choosing a Platform

With web design becoming increasingly accessible, many services have popped up offering to help users create a website. By no means is this list exhaustive, but here are a few that I discovered and considered, ranging from least coding required to most:

  • Wix Wix is a free platform that offers customizable website templates built on HTML5. Once you choose a template, you can click the boxes to add text and click/drag text boxes and images around to change the layout. This platform was useful as a first step in seeing my content laid out on a webpage without having to write code.
  • WordPress and Squarespace These two platforms triple as portfolios, blogs, and content management systems. Both allow customizable templates, hosting space, and a unique domain name. Since they are content management systems, you must learn to use their interface and coding may be required if you want to customize beyond the readily customizable features.
  • Jekyll (using Git) and Bootstrap Jekyll and Bootstrap are frameworks for developing your own HTML- and CSS-based websites. Instead of placing your content into a text box, you actually dive into the HTML files to write your content. These platforms come with templates to get you started, but do require outside coding and system knowledge, such as Git and Ruby for Jekyll (a minimal quickstart sketch follows this list). For two great examples visit the pages of Ryan Randall, an ILS graduate assistant and all-around culture scholar at IU, and Cecily Walker, a self-titled rabble-rousing librarian residing in Vancouver, BC.
  • Adobe Dreamweaver and Adobe Muse using HTML5 These final two require the most HTML coding. You can find some starter templates, but it is up to you to write and design the majority of the content and website. These platforms offer the highest range of customization, but also the highest learning curve. For a dynamic example of a website built with Adobe Dreamweaver, check out Samantha Sickels‘ media and design portfolio page.
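If Jekyll sounds appealing but intimidating, getting a skeleton site running locally takes only a few commands (this assumes Ruby and RubyGems are already installed; "portfolio" is just a placeholder name):

    gem install jekyll        # install Jekyll
    jekyll new portfolio      # generate a starter site in ./portfolio
    cd portfolio
    jekyll serve              # build the site and preview it at http://localhost:4000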

Since I have a programming background, I decided that using Wix felt like neglecting my tech skills. Since I have limited experience with HTML5 and CSS, I wasn’t ready to take on an entire portfolio website from scratch. Therefore, I went with WordPress because I could choose a designed template, but customize as needed.

Using WordPress

I used my Information Architecture skills and decided on the exact content I wanted to feature. Since then, I have spent countless hours arguing with WordPress, consistently asking my computer screen, “What do you want from me?”

Image courtesy of pixgood.com.

WordPress turned out to be less intuitive than I imagined (I shrugged off tutorials thinking it couldn’t be that difficult). It took a few tries to understand the interface with pages and menus. I also didn’t realize that different themes come with different customizable features and that they don’t include different page layouts you can choose from a simple drop-down menu. I typically found a perfect combination of clean, minimal, and functional, but with one unsatisfactory aspect. So close.

Image courtesy of imgarcade.com.

Finally, I chose a theme that worked! After some minor setbacks with text fonts, I discovered how to use plug-ins. My social media buttons and accordion-style Projects page were the result of Google, a willingness to explore, and my conceptual coding knowledge. Brianna Marshall helped me figure out how to set a menu item as a link. And I breathed a sigh of relief.

Tips

If you are creating a personal portfolio or even a quick WordPress site for a library project or service, I have three tips for you:

  1. Choose a random theme. Insert your content. Then, decide on a more permanent theme from there. Sometimes, seeing your own name in Comic Sans will put that theme on the “absolutely not” list.
  2. Read this article shared with me by the same student who gave me the advice above. It is geared toward IU students, with widely applicable ideas.
  3. Persist. You have a vision of your future site’s look and function; keep learning, Googling and exploring until you find out how to bring it to life. I can’t wait to play with more of the coding-heavy platforms in the future!

Find my portfolio website here, and please comment if you have any questions about web hosting and domain names, important aspects of website creation I didn’t touch on.

Then, respond! I would love to hear your thoughts about using WordPress or other platforms mentioned for different functions! What were your struggles and triumphs? Do you prefer a platform I didn’t mention?

OCLC Dev Network: Code and Camaraderie

planet code4lib - Fri, 2014-12-05 02:00

Show-and-tell time for the 11 developers who attended Developer House this week.  Look for a video of the presentations.  These are worth watching!  The teams accomplished a lot of great work.

OCLC Dev Network: Developer House Gets Inside View of Hot Projects at OCLC

planet code4lib - Fri, 2014-12-05 02:00

We spent most of today working on our projects, so thought we'd share more about the presentations we had yesterday, including OCLC's Linked Data strategy and a series of "lightning" talks by OCLC staff responsible for some of our current internal projects.

Dan Scott: Putting the "Web" back into Semantic Web in Libraries 2014

planet code4lib - Thu, 2014-12-04 21:15

I was honoured to lead a workshop and speak at this year's edition of Semantic Web in Bibliotheken (SWIB) in Bonn, Germany. It was an amazing experience; there were so many rich projects being described, with obvious dividends for the users of libraries. Once again the European library community fills me with hope for the future success of the semantic web.

The subject of my talk "Cataloguing for the open web with RDFa and schema.org" (slides and video recording - gulp) pivoted while I was preparing materials for the workshop. I was searching library catalogues around Bonn looking for a catalogue with persistent URIs that I could use for an example. To my surprise, catalogue after catalogue used session-based URLs; it took me quite some time before I was able to find ULB, who had hosted a VuFind front end for their catalogue. Even then, the robots.txt restricted crawling by any user agent. This reminded me rather depressingly of my findings from current "discovery layers", which entirely restrict crawling and therefore put libraries into a black hole on the web.

These findings in the wild are so antithetical to the basic principles of enabling discovery of web resources that, in a conference about the semantic web, I opted to spend over half of my talk making the argument that libraries need to pay attention to the old-fashioned web of documents first and foremost. The basic building blocks that I advocated were, in priority order (a minimal robots.txt and sitemap sketch follows the list):

  • Persistent URIs, on which everything else is built
  • Sitemaps, to facilitate discovery of your resources
  • A robots.txt file to filter portions of your website that should not be crawled (for example, search results pages)
  • RDFa, microdata, or JSON-LD only after you've sorted out the first three
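To make the second and third building blocks concrete, here is a deliberately minimal sketch of a sitemap plus a robots.txt that steers crawlers away from search results pages while leaving record pages crawlable. The hostname and paths are invented.

    # robots.txt (hypothetical paths)
    User-agent: *
    Disallow: /search         # keep crawlers out of search results pages
    Sitemap: https://catalogue.example.org/sitemap.xml

    <!-- sitemap.xml: one <url> entry per persistent record URI -->
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url>
        <loc>https://catalogue.example.org/record/12345</loc>
        <lastmod>2014-11-30</lastmod>
      </url>
    </urlset>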

Only after setting that foundation did I feel comfortable launching into my rationale for RDFa and schema.org as a tool for enabling discovery on the web: a mapping of the access points that cataloguers create to the world of HTML and aggregators. The key point for SWIB was that RDFa and schema.org can enable full RDF expressions in HTML; that is, we can, should, and must go beyond surfacing structured data to surfacing linked data through @resource attributes and schema:sameAs properties.

The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation. Tim Berners-Lee, Scientific American, 2001

I also argued that using RDFa to enrich the document web was, in fact, truer to Berners-Lee's 2001 definition of the semantic web, and that we should focus on enriching the document web so that both humans and machines can benefit before investing in building an entirely separate and disconnected semantic web.

I was worried that my talk would not be well received; that it would be considered obvious, or scolding, or just plain off-topic. But to my relief I received a great deal of positive feedback. And on the next day, both Eric Miller and Richard Wallis gave talks on a similar, but more refined, theme: that libraries need to do a much, much better job of enabling their resources to be found on the web--not by people who already use our catalogues, but by people who are not library users today.

There were also some requests for clarification, which I'll try to address generally here (for the benefit of anyone who wasn't able to talk with me, or who might watch the livestream in the future).

"When you said anything could be described in schema.org, did you mean we should throw out MARC and BIBFRAME and EAD?"

tldr: I intended and, not instead of!

The first question I was asked was whether there was anything that I had not been able to describe in schema.org, to which I answered "No"--especially given the work that the W3C SchemaBibEx group had done to ensure that some of the core bibliographic requirements were added to the vocabulary. It was not as coherent or full a response as I would have liked to have made; I blame the livestream camera.

But combined with a part of the presentation where I countered a myth about schema.org being a very coarse vocabulary by pointing out that it actually contained 600 classes and over 800 properties, a number of the attendees interpreted one of the takeaways of my talk as suggesting that libraries should adopt schema.org as the descriptive vocabulary, and that MARC, BIBFRAME, EAD, RAD, RDA, and other approaches for describing library resources were no longer necessary.

This is not at all what I'm advocating! To expand on my response, you can describe anything in schema.org, but you might lose significant amounts of richness in your description. For example, short stories and poems would best be described in schema.org as a CreativeWork. You would have to look at the associated description or keyword properties to be able to figure out the form of the work.

What I was advocating was that you should map your rich bibliographic description into corresponding schema.org classes and properties in RDFa at the time you generate the HTML representation of that resource and its associated entities. So your poem might be represented as a CreativeWork, with a name, author, description, keywords, and about values and relationships. Ideally, the author will include at least one link (either via sameAs, url, or @resource) to an entity on the web; and you could do the same with about if you are using a controlled vocabulary.
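As a rough sketch of that mapping (not a complete record, and with placeholder URIs: the zeroed-out VIAF, id.loc.gov and FAST identifiers don't point at real entities), the HTML for a poem might carry RDFa like this:

    <div vocab="http://schema.org/" typeof="CreativeWork"
         resource="http://example.org/record/42">
      <h1 property="name">An Example Poem</h1>
      <p>by
        <span property="author" typeof="Person"
              resource="http://viaf.org/viaf/0000000">
          <span property="name">Jane Poet</span>
          <link property="sameAs" href="http://id.loc.gov/authorities/names/n00000000"/>
        </span>
      </p>
      <p property="description">A short poem used here only as an illustration.</p>
      <meta property="keywords" content="poetry"/>
      <link property="about" href="http://id.worldcat.org/fast/0000000"/>
    </div>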

If you take that approach, then you can serve up schema.org descriptions of works in HTML that most web-oriented clients will understand (such as search engines) and provide basic access points such as name / author / keywords, while retaining and maintaining the full richness of the underlying bibliographic description--and potentially providing access to that, too, as part of the embedded RDFa, via content negotiation, or <link rel="">, for clients that can interpret richer formats.
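For the richer-formats option, the page head can simply advertise the fuller description with a link element; the href and media type here are illustrative only:

    <link rel="alternate" type="application/rdf+xml"
          href="http://example.org/record/42.rdf"
          title="Full RDF description of this record"/>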

"What makes you think Google will want to surface library holdings in search results?"

There is a perception that Google and other search engines just want to sell ads, or their own products (such as Google Books). While Google certainly does want to sell ads and products, they also want to be the most useful tool for satisfying users' information needs--possibly so they can learn more about those users and put more effective ads in front of them--but nonetheless, the motivation is there.

Imagine marking up your resources with the Product / Offer portion of schema.org: you would be able to provide search engines with availability information in the same way that Best Buy, AbeBooks, and other online retailers do (as Evergreen, Koha, and VuFind already do). That makes it much easier for the search engines to use everything they may know about their users, such as their current location, their institutional affiliations, their typical commuting patterns, their reading and research preferences... to provide a link to a library's electronic or print copy of a given resource in a knowledge graph box as one of the possible ways of satisfying that person's information needs.
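A hedged sketch of that Product / Offer pattern, in JSON-LD for brevity (the title and library are invented; Evergreen, Koha, and VuFind express the equivalent markup in their record detail pages):

    {
      "@context": "http://schema.org",
      "@type": "Book",
      "name": "An Example Title",
      "offers": {
        "@type": "Offer",
        "availability": "http://schema.org/InStock",
        "seller": {
          "@type": "Library",
          "name": "Example Public Library"
        }
      }
    }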

We don't see it happening with libraries running Evergreen, Koha, and VuFind yet, realistically because the open source library systems don't have enough penetration to make it worth a search engine's effort to add that to their set of possible sources. However, if we as an industry make a concerted effort to implement this as a standard part of crawlable catalogue or discovery record detail pages, then it wouldn't surprise me in the least to see such suggestions start to appear. The best proof that we have that Google, at least, is interested in supporting discovery of library resources is the continued investment in Google Scholar.

And as I argued during my talk, even if the search engines never add direct links to library resources from search results or knowledge graph sidebars, having a reasonably simple standard like the GoodRelations product / offer pattern for resource availability enables new web-based approaches for building applications. One example could be a fulfillment system that uses sitemaps to intelligently crawl all of its participating libraries, normalizes the item request to a work URI, and checks availability by parsing the offers at the corresponding URIs.

District Dispatch: ALA seeks nominations for 2015 James Madison awards

planet code4lib - Thu, 2014-12-04 19:08

The American Library Association’s (ALA) Washington Office is calling for nominations for two awards to honor individuals or groups who have championed, protected and promoted public access to government information and the public’s right to know.

The James Madison Award, named in honor of President James Madison, was established in 1986 to celebrate an individual or group who has brought awareness to these issues at the national level. Madison is widely regarded as the Father of the Constitution and as the foremost advocate for openness in government.

The Eileen Cooke Award honors an extraordinary leader who has built local grassroots awareness of the importance of access to information. Cooke, former director of the ALA Washington Office, was a tireless advocate for the public’s right to know and a mentor to many librarians and trustees.

Both awards are presented during Freedom of Information (FOI) Day, an annual event on or near March 16, Madison’s birthday.

Nominations should be submitted to the ALA Washington Office no later than January 16, 2015. Submissions should include a statement (maximum one page) about the nominee’s contribution to public access to government information, why it merits the award and one seconding letter. Please include a brief biography and contact information for the nominee.

Send e-mail nominations to Jessica McGilvray, Assistant Director for the ALA Office of Government Relations, at jmcgilvray@alawash.org. Submissions can also be mailed to:

James Madison Award / Eileen Cooke Award
American Library Association
Washington Office
1615 New Hampshire Avenue, NW
Washington, D.C. 20009-2520

The post ALA seeks nominations for 2015 James Madison awards appeared first on District Dispatch.

LITA: Don’t Miss the OpenStreetMaps Webinar

planet code4lib - Thu, 2014-12-04 17:32

Before Hackforge’s Mita Williams Masters session on new spaces at the ALA 2015 Midwinter Meeting, you can attend her next LITA webinar, part of the “Re-drawing the Map” webinar series:

OpenStreetMaps: Trust the map that anyone can change

Tuesday December 9, 2014
1:00 pm – 2:00 pm Central Time
Instructor: Mita Williams
Register for this webinar

Ever had a map send you the wrong way and wished you could change it? Learn how to add your local knowledge to the “Wikipedia of Maps.”

It’s been said that “the map is not the territory”. But when most of the world’s websites and mobile apps rely on maps from private corporations who selectively show you places based on who you are (and who pays for the privilege), perhaps we should cede that territory for higher ground. It’s counter-intuitive to trust a map that anyone can edit, but OpenStreetMap is already the geospatial foundation of some of the world’s most popular sites including Pinterest, Evernote, and GitHub. This session will introduce you to OpenStreetMap and show you how you can both contribute to and make use of the “Wikipedia of Maps”.

Full details

Can’t make the date but still want to join in? Registered participants will have access to the recorded webinar.

Cost:

  • LITA Member: $39
  • Non-Member: $99
  • Group: $190

Registration Information:

Register Online page arranged by session date (login required)

OR

Mail or fax form to ALA Registration
OR call 1-800-545-2433 and press 5
OR email registration@ala.org

Questions or Comments?

For all other questions or comments related to the course, contact LITA at (312) 280-4269 or Mark Beatty, mbeatty@ala.org.

Library of Congress: The Signal: New FADGI Report: Creating and Archiving Born Digital Video

planet code4lib - Thu, 2014-12-04 16:20

As part of a larger effort to explore file formats, the Born Digital Video subgroup of the Federal Agencies Digitization Guidelines Initiative Audio-Visual Working Group is pleased to announce the release of a new four-part report, “Creating and Archiving Born Digital Video.”

This report has already undergone review by FADGI members and invited colleagues including the IASA Technical Committee. With this release, we seek comments and feedback from all interested parties.

The report is the result of over 18 months of collaborative effort from a range of federal agencies including the Smithsonian Institution Archives as well as the Smithsonian Institution Office of the CIO, National Archives and Records Administration, National Oceanic and Atmospheric Administration, Voice of America, and several Library of Congress units including the American Folklife Center, the Web Archiving team and the Packard Campus for Audio-Visual Conservation.

The four documents that comprise the “Creating and Archiving Born Digital Video” report provide practical technical information for both file creators and file archivists to help them make informed decisions and understand the long-term consequences of those decisions when creating or archiving born digital video. The information is intended to serve memory institutions, especially in the U.S. federal sector. But of course we also hope that this report will serve the broader cultural heritage community who may produce and/or receive and ingest materials that range from high-end professional productions to more modest (but culturally important) grass-roots footage.

Clap On. Clap Off. photo by Chuck Olsen courtesy of Flickr.

The scope of the report is outlined in the introduction (Part I) (PDF) including background information and rationale on some of the choices made during the project. The eight case histories (Part II) (PDF) document aspects of the current state of practice in six U. S. federal agencies working with born digital video. These case histories not only describe deliverables and file specifications but also tell the story of each project, and provide background information about the institution and the collection, as well as lessons learned.

As the case histories developed, a set of high-level recommended practices (Part III) (PDF) emerged from the collective project experiences. Drafting a uniform guideline or a cookbook felt premature at this point, so these recommended practices are intended to support informed decision-making and guide file creators and archivists as they seek out workflows, file characteristics and other practices that will yield files with the greatest preservation potential.

Finally, the annotated resource guide (Part IV) (PDF) provides links to useful documentation, including reports, guidelines, software applications and other technical tools. Many of these resources are referenced in the “Case Histories” and “Recommended Practices” documents.

The report covers both the perspective of the archive that is receiving born digital video and seeks to preserve it for the long term (a group we call “file archivists”) and the perspective of the organization that oversees production (termed “file creators”). In many cases the “file creator” organization is itself an archive. Thus one of the goals of this report is to encourage dialog between stakeholders involved in creating born digital video files and those responsible for protecting the legacy of those files. Dialog between producers and archivists is essential to sustainability and interoperability of born digital video; this report aims to broach that topic in earnest by looking at thoughtful approaches and helpful practices.

IASA Technical Committee meeting. Photo courtesy of Carl Fleischhauer.

The goal of the three “Creating Born Digital Video” case histories, which we summarized as “start with nothing; end up with ingest-ready video,” is to encourage a thoughtful approach that takes the “long tail” perspective of preservation, use and reuse into account from the very beginning of the video production project, before the video is even shot. These case histories illustrate the advantages of high-quality data capture from the very start, because choices made during the file creation process will have impacts on the long-term archiving and distribution processes.

The five “Archiving Born Digital Video” case histories tell the story of bringing the born digital video files into managed data repositories for long-term retention and access. Our shorthand for this group is “identify what you have and understand what you need to do to it.”

These case histories explore the issues that emerge when born digital video objects arrive at the archive. They cover topics including the challenges of dealing with diverse formats, understanding and documenting relationships among the video files and related objects, and metadata. A major topic for this case history set is the technical characteristics of file formats: how to identify and document the formats coming into the archive, when changes to the file attributes are needed, and what impact changes to the format and encoding will have.

It bears mentioning that as this report was being compiled, the Library of Congress received the “Preserving Write-Once DVDs: Producing Disk Images, Extracting Content, and Addressing Flaws and Errors” (PDF) report from George Blood Audio/Video. The report was one product of a contract with GBAV in which the company converted a set of write-once DVDs for the Library. The report describes the issues encountered and provides some detail about GBAV’s methods for carrying out the work, thus providing a complement to the DVD section of the “Creating and Archiving Born Digital Video,” drafted by the Smithsonian Institution Archives.

Burned DVD by Roman Soto courtesy of Flickr.

The case histories (PDF) report includes summary tables of the file characteristics of the case history projects, one for “Creating Born Digital Video” projects and a separate one for the “Archiving Born Digital Video” projects. These two tables are interesting because they hint at the trends for the “right now solutions.” This is how some institutions are working today – using what they have to do what they can. It will be very interesting to see how this changes over time as practices advance and mature.

The recommended practices (PDF) are organized into three categories:

  • Advice for File Creators, also known as “advice for shooters,” focuses on providing video content producers, including videographers and, by extension, the project managers within cultural heritage institutions who are responsible for the creation of new born digital video, with a set of practices that emphasize the benefits of aiming for high quality and planning for archival repository ingest from the point of file creation.
  • Advice for File Archivists seeks to provide guidance about video-specific issues which come into play when ingesting the files into a managed storage repository.
  • Advice for File Creators and File Archivists are grouped together because they transcend specific lifecycle points. This guidance focuses on selecting sustainable encodings and wrappers whether at initial file creation or during normalization.

As mentioned in a previous blog post, the use, or more accurately the lack of use, of uncompressed video encodings is one marked example of how the case history projects deviate from the Recommended Practices. Quite simply, we didn’t follow our own advice. All five case history projects which specified encodings used compression. And of the five case history projects that implement compression, only one (the Library of Congress’s Packard Campus) implements mathematically lossless compression. The remaining four use various forms of lossy compression, including visually lossless, and all for good reasons. The specific goals of the case history projects necessitated different decisions in order to meet business needs – in this case, the need for smaller files and/or systems-specific compressed formats outweighed the benefits of uncompressed video.
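
For readers who want to experiment with the mathematically lossless end of that spectrum, the sketch below shows one widely used approach: FFV1 video with per-slice checksums in a Matroska wrapper, produced by calling ffmpeg. This example is not drawn from the report itself; it assumes a local ffmpeg build with FFV1 support, and the file names are placeholders.

```python
import subprocess

# Transcode a source file to mathematically lossless FFV1 video in a Matroska
# (.mkv) wrapper, copying the audio stream unchanged. "-level 3" selects FFV1
# version 3 and "-slicecrc 1" adds per-slice CRCs useful for fixity checking.
# "input.mov" and "output.mkv" are placeholder file names.
subprocess.run(
    [
        "ffmpeg",
        "-i", "input.mov",
        "-c:v", "ffv1",
        "-level", "3",
        "-slicecrc", "1",
        "-c:a", "copy",
        "output.mkv",
    ],
    check=True,
)
```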

Let’s start the dialog now! We welcome comments and feedback from the interested public on the “Creating and Archiving Born Digital Video” report through the end of January 2015, either through the FADGI page or by direct email to this writer; after that, we will review them and publish a “final” version early in the new year. Of course, comments received after our closing date are equally welcome, although they may have to wait for a planned revision to be addressed. We look forward to hearing from you.

David Rosenthal: A Note of Thanks

planet code4lib - Thu, 2014-12-04 16:00
I have a top-of-the-line MacBook Air, which is truly a work of art, but I discovered fairly quickly that subjecting a machine that cost almost $2000 to the vicissitudes of today's travel is worrying. So for years now the machine I've travelled with is a netbook, an Asus Seashell 1005PE. It is small, light, has almost all-day battery life and runs Ubuntu just fine. It cost me about $250, and with both full-disk encryption and an encrypted home directory, I just don't care if it gets lost, broken or seized.

But at last the signs of the hard life of a travelling laptop are showing. I looked around for a replacement and settled on the Acer C720 Chromebook. This cost me $387 including tax and same-day delivery from Amazon. Actually, same-day isn't accurate. It took less than 9 hours from order to arrival! If I'd waited until Black Friday to order it would have been more than $40 cheaper.

For that price, the specification is amazing:
  • 1.7GHz 4-core Intel Core i3
  • 4GB RAM
  • 32GB SSD
  • 11.6" 1366x768 screen
Thanks to these basic instructions from Jack Wallen and the fine work of HugeGreenBug in assembling a version of Ubuntu for the C720, 24 hours after ordering I had a light, thin, powerful laptop with a great display running a full 64-bit installation of Ubuntu 14.04. I'm really grateful to everyone who contributed to getting Linux running on Chromebooks in general and on the C720 in particular. Open source is wonderful.

Of course, there are some negatives. The bigger screen is great, but it makes the machine about an inch bigger in width and depth. Like the Seashell and unlike full-size laptops, it will be usable in economy seats on the plane even if the passenger in front reclines their seat. But it'll be harder than it was with the Seashell to claim that the computer and the drink can co-exist on the economy seat-back table.

Below the fold, some details for anyone who wants to follow in my footsteps.

Jack Wallen's instructions for creating a recovery disk didn't work. After I had updated the Chrome OS, I discovered there's an app to create a recovery disk that requires the updated OS, which worked perfectly.

My attempt to install ChrUbuntu from Jack's instructions failed with a long string of errors as the install script tried to patch some files. I then installed Bodhi Linux from his instructions, which worked fine except for the part about enabling legacy boot via crosh. I had to follow his ChrUbuntu instructions to log in as chronos before enabling legacy boot. You can see whether what you did worked by executing crossystem; with no arguments it dumps all the settable parameters.

I'm used to Ubuntu, so some Googling I should have done before installing Bodhi Linux led me to HugeGreenBug's instructions for installing it, which worked like a charm.

The 32GB of SSD is not a lot of space. I added a 64GB SD card, but there is a problem: when inserted in the SD slot the card is only half inside the machine, so it is vulnerable and has to be removed when the machine is being carried.

The 24 hours didn't include transferring all my customizations from the Seashell, but I don't expect any trouble doing that when I get to it shortly before my next trip.


DPLA: VT Community Rep introduces DPLA to Veterans at Disability Awareness event

planet code4lib - Thu, 2014-12-04 14:30

As a Community Rep for Vermont, I introduced Digital Public Library of America (DPLA) to employees of the Veterans Affairs Medical Center (VAMC) in White River Junction, Vermont, during its Disability Employment Awareness event on October 16, 2014. Drawing upon my law degree (Vermont Law School) and Masters in Library and Information Science (Simmons College), I have been volunteering at VAMC this year in order to contribute toward programs that benefit current and retired members of the U.S. Armed Forces and their families. Bringing DPLA to veterans and civilians with disabilities was my first effort as the Community Rep to bridge digital divides among under-represented populations. The event’s motto “Expect. Employ. Empower.” was about creating a society of inclusion, so it seemed a perfect fit for DPLA. Ten other participants were local entities that provide adaptive technologies to people with disabilities. The community programs cover healthy eating; vision, hearing, and mobility assistance; and outdoor and sports activities.

The DPLA Info Table was equipped with a laptop for a hands-on presentation, which attracted thirty or so VA veterans and civilian employees, including Deborah Amdur, VAMC Director. Most attendees had never heard of DPLA before, while others were quite informed: “We are nurses, and we learn a lot at our workshops.” Still another group, albeit the smallest, was aware of DPLA not only as a portal but also as a platform; those were IT employees.

Community Rep Natalia May demonstrates DPLA at the Veterans Affairs Medical Center in White River Junction, Vermont, during its October 2014 Disability Employment Awareness event.

Despite the various degrees of DPLA awareness, the attendees’ responses may be grouped into the following common threads:

  • Attendees recognize DPLA as a “library” (asking whether the Info Table was managed by the VAMC in-house library)
  • Attendees with visual and/or audio impairments need visual/audio features to enhance DPLA content for them (asking if I could make the font large, or if a video had subtitles, etc.)
  • Attendees with limited mobility particularly enjoy DPLA (commenting that discovering DPLA is akin to “a travel-free, and thus, trouble-free, visit to a neighborhood library or a museum”)
  • Attendees take DPLA swag to share (not only with their family and friends, but also with volunteer organizations they belong to: libraries, peer-to-peer veterans help groups, etc.)
  • Attendees are particularly fond of DPLA’s Timeline feature (which directly answers their need for authentic sources on history: “This may be another way to enhance our exhibit at a historical society I belong to,” genealogy: “I bet I can find out something for my family’s genealogical tree, as I am an ancestry-buff among my siblings,” and warfare: “to match time of deployment with the country’s events”).

Introducing DPLA to disabled veterans made me realize two things. First, there is a real desire among VA populations to learn more about DPLA and other online resources. The Info Table format allowed only a short introductory presentation, while attendees were curious to spend more time searching beyond the major DPLA tabs, such as Bookshelf (“to search public records”) and Partners. The veterans would likely benefit from more formal sit-down instruction with individual computer access. Second, veterans are extremely fond of volunteers, as most of them are volunteers themselves; without exception, they were full of gratitude that I had brought DPLA into their lives.

FOSS4Lib Recent Releases: Fedora Repository - 3.8.0

planet code4lib - Thu, 2014-12-04 14:21
Package: Fedora Repository
Release Date: Thursday, December 4, 2014

Last updated December 4, 2014. Created by Peter Murray on December 4, 2014.

The Fedora 3.8 release features improved REST API interaction, with correct headers returned for better caching, along with performance improvements and bug fixes.
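
One simple way to observe the caching improvement is to issue a conditional request against the REST API and see whether the server validates the cached copy. The sketch below is illustrative only: it assumes a local Fedora 3.8 instance at http://localhost:8080/fedora with a test object whose PID is demo:1 (both placeholders) and that the Python requests library is installed; whether validators appear depends on your configuration.

```python
import requests

# Placeholder base URL and PID for a local test instance.
BASE = "http://localhost:8080/fedora"
PID = "demo:1"

url = f"{BASE}/objects/{PID}"
first = requests.get(url, params={"format": "xml"})
first.raise_for_status()

# Reuse whatever validators the server returned with the first response.
headers = {}
if "ETag" in first.headers:
    headers["If-None-Match"] = first.headers["ETag"]
if "Last-Modified" in first.headers:
    headers["If-Modified-Since"] = first.headers["Last-Modified"]

second = requests.get(url, params={"format": "xml"}, headers=headers)
# A 304 status would show that the cached representation is still valid.
print(second.status_code)
```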

FOSS4Lib Recent Releases: Fedora Repository - 4.0

planet code4lib - Thu, 2014-12-04 14:19
Package: Fedora Repository
Release Date: Thursday, December 4, 2014

Last updated December 4, 2014. Created by Peter Murray on December 4, 2014.

The international Fedora repository community and DuraSpace are very pleased to announce the production release of Fedora 4. This significant release signals the effectiveness of an international and complex community source project in delivering a modern repository platform with features that meet or exceed current use cases in the management of institutional digital assets. Fedora 4 features include vast improvements in scalability, linked data capabilities, research data support, modularity, ease of use and more.
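
As a small taste of the linked data capabilities, the sketch below creates a container with a couple of RDF properties over Fedora 4's LDP-based REST API and reads it back as Turtle. It assumes a local instance whose REST base URL is http://localhost:8080/rest (the exact path varies by deployment), that the Python requests library is installed, and that the resource path and properties are placeholders.

```python
import requests

# Placeholder REST base URL; a WAR deployment often lives at /fcrepo/rest.
BASE = "http://localhost:8080/rest"

# Create (or update) a container with some descriptive properties in Turtle.
turtle = """
@prefix dc: <http://purl.org/dc/elements/1.1/> .
<> dc:title "Sample collection" ;
   dc:description "Created via the LDP-based REST API" .
"""

resp = requests.put(
    f"{BASE}/sample-collection",
    data=turtle.encode("utf-8"),
    headers={"Content-Type": "text/turtle"},
)
resp.raise_for_status()

# Read the resource back as Turtle to confirm the properties round-tripped.
print(requests.get(f"{BASE}/sample-collection",
                   headers={"Accept": "text/turtle"}).text)
```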
