Feed aggregator

Library of Congress: The Signal: The TriCollege Libraries Consortium and Digital Content

planet code4lib - Wed, 2017-01-04 14:48

Swarthmore College. Photo by Stefanie Ramsay.

This is a guest post from Stefanie Ramsay, a Digital Collections Librarian at Swarthmore College, which is part of the TriCollege Libraries consortium.

Consortium arrangements among libraries and archives are an increasingly popular strategy for managing the large amount of digital content they produce and for providing increased access to these important materials. Luckily for us at Swarthmore College, we have long been part of the TriCollege Libraries Consortium, or TriCo, with nearby Bryn Mawr and Haverford Colleges, all located in suburban Philadelphia.

Conversations around forming a consortium began in 1935, but a formal agreement was not reached until 1988, when the three colleges invested in a joint integrated library system, or ILS. The TriCo is not a formal legal entity, and the libraries are fully independent, but we share resources and personnel and we collaborate through a number of working groups and community initiatives. This structure presents opportunities for innovative projects, along with its fair share of challenges (first and foremost, just trying to schedule a meeting).

Our consortium efforts extend well beyond digital content: selecting the ILS, purchasing joint journal subscriptions, developing joint collection development strategies and creating new metadata schemas for implementation. Each campus also has its own library director, giving us three leaders with broad ranges and depths of experience to mine for the benefit of the TriCo. These topics all deserve their own posts, but as the digital collections librarian at Swarthmore, I’d like to address our consortium’s approach to handling digital content.

I oversee projects related to our digital collections. Our projects currently include planning an oral histories program on student activism, developing workflows for acquiring born-digital student publications and making our student newspaper archives accessible online. Though my work is focused at Swarthmore, I participate in two TriCo working groups, work closely with local and TriCo IT staff and frequently partner with librarians and archivists on each campus to think through issues around digital collections and digital preservation.

Swarthmore College. Photo by Stefanie Ramsay.

I serve as part of a TriCo group focused on Digital Asset Management and Preservation, known affectionately as DAMP! (and in its second phase, DAMPer!), which reviewed the current TriCo platforms for digital content and recommended strategic directions for streamlining our approach to managing digital assets.

The group includes metadata librarians, archivists and IT staff from across the three campuses. DAMPer! recently produced a report (PDF) that offered a holistic picture of our current ecosystem, reviewed the benefits and shortcomings of our existing platforms and recommended exploring Hydra/Fedora as we look to migrate away from our six current repositories. Using this report as a guide, the group will be moving forward with the pilot testing and making plans for an eventual migration and clean-up of current systems in the next year.

DAMPer! is one example of how we work together across the TriCo, yet it’s important even within these consortial groups to consider the differing needs of our individual libraries. A few of the obstacles we face include managing the various formats housed in each college’s collections as well as working within the infrastructure of each library.

For example, Bryn Mawr’s collections are largely centered around art and artifacts and the history of women’s education; organizationally, their Library and IT departments operate as one unit. Haverford’s digital collections focus on Quakerism, college archives and scholarly communications. At Swarthmore, our Special Libraries, the Friends Historical Library and the Peace Collection, do not report up through the libraries but directly to the Provost, and our collections are diverse in scope and format. Beyond collecting and providing access to these assets, there’s also the challenge of how best to store shared materials within the TriCo, considering the institutional infrastructures in place, the increase in cloud-based alternatives and preservation standards.

It is an ongoing challenge to recognize these varying approaches and resources while working towards a unified solution. As priorities, personnel and professional developments change, the group’s responsibilities have to continuously adapt. Our focus on maintaining documentation, implementing policies and guidelines and providing increased access to digital assets moves us forward in determining strategic directions.

There are the administrative challenges inherent in the TriCo arrangement to consider as well. Financial planning, strategic directions and organizational changes are some of these factors, not to mention managing the numerous working groups currently in place. It requires strong organizational and communication skills to keep track of who is involved in each group, what they do, what they should be doing and so on. We use tools such as wikis, Trello and Google Apps to share and collaborate on materials relating to our work as a consortium. These tools allow for increased transparency and connections across the campuses.

Though there are struggles, the advantages are substantial. Staff, faculty and students all benefit from the expertise of library staff across all three campuses, from access to the many physical and digital materials in each of our collections, and from the thoughtful use of financial resources that allows us to provide more services. An added perk for students is being able to take classes at any of these colleges, as well as the University of Pennsylvania.

As DAMPer! tests Hydra/Fedora and reviews storage options, and as the other TriCo projects progress, there will continue to be constructive conversations around how best to operate within the consortium while still meeting the needs of our respective communities. We are not bound as a consortium by law but by choice, and we reaffirm that choice each time we collaborate on these valuable projects.

It’s important to me and to our campuses that we not only make the most of our financial, technical and organizational resources but that we work towards creating welcoming and open spaces, physically and digitally, with shared values that contribute to our communities. This is the most significant impact of collaborating with our friends nearby.

That is, if we can ever schedule a meeting.

LibUX: Help the Digital Library Federation AIG UX Working Group identify user needs and expectations

planet code4lib - Wed, 2017-01-04 13:28

Hello Everyone!

This message is an invitation to participate in the Digital Library Federation Assessment Interest Group’s (DLF AIG) User Experience Working Group in 2017. There is more information about the User Experience (UX) Working Group below, but first, a few words about participating in DLF AIG.

The only requirement for participation in a DLF AIG working group is a willingness to dig in and devote a small part of your time contributing to the tools, methods, and body of knowledge on digital library assessment. Attendance at the DLF forum is not a requirement for participation in any of the working groups. Additionally, if your institution is not a DLF member, you can still participate.

UX Working Group – What is it?

The UX Working Group is a subset of the User Studies AIG and was previously named the Usability Studies Guidelines and Best Practices Working Group.

Our group takes a multi-faceted approach to determining the best way to develop guidelines and best practices in user and usability studies related to digital libraries.

There are many aspects to user and usability studies, and we welcome others who would like to pursue them. Suggested areas of investigation within the scope of this group include identifying users and user behavior, learnability, accessibility, and usability. As this is a tremendous task, we encourage you to join the effort; clarity in this area will save us all a great deal of work, resources and confusion.

UX Working Group Going Forward in 2017

During the 2016 DLF Forum many ideas, projects, and goals were discussed about where to take the working group in 2017. Many folks voiced a desire for resources, tools and instruments to identify user needs and expectations, conduct usability tests, and carry out user outreach. At our December 2016 meeting, the working group decided to focus its 2017 efforts on identifying users and their needs and expectations.

It was decided that our 2017 work would focus on gathering digital library user personas across cultural heritage institutions and building best practices for persona creation.

To take part or help, please add yourself to our Google Group and/or attend our next meeting on Friday, January 27th at 2 PM Eastern (meeting details are located in our Agenda and Meeting Minutes Google Doc). We meet once a month on the 2nd Friday unless otherwise noted in the meeting minutes and our DLF AIG wiki page.

How can you participate in the UX Working Group?

We want your help in deciding how to approach, conduct, and implement our project. Please join our Google Group and/or attend our next meeting on Friday, January 27th at 2 PM Eastern if you are interested. We welcome everyone, with or without UX experience.

For questions about the DLF AIG UX Working Group, contact Kate Thornhill at thornhik@ohsu.edu. For questions about the parent DLF Assessment Interest Group, contact the DLF AIG co-leaders, Sara Rubinow sararubinow@nypl.org and Santi Thompson sathompson3@uh.edu.

Thank you for your interest in the DLF AIG!

Open Knowledge Foundation: Updates from our Network: Austria chapter takes strategy forward and explores new collaborations.

planet code4lib - Wed, 2017-01-04 10:40

This blog post is part of our ongoing Community series featuring updates from chapters across the Open Knowledge Network and was written by the team of Open Knowledge Austria.

September was largely a transition period between the old and new board to facilitate the takeover of responsibilities. We also involved our team and the community in the strategic processes we started in August, through multiple feedback rounds and a team call.

The output is a strategic paper, outlining the goals of Open Knowledge Austria for our board period and acting as common ground. We also had a round of meetings with current and prospective partners to introduce ourselves and discuss future collaborations.

Image credit: Spektral Coding Workshop by murdelta

Internally, we continued working on guidelines, e.g. for community management, setting up a new project, or organizing an event. The aim of these guidelines is to give members a head start in their activities. Additionally, we started to better coordinate social media activities between our working groups, and now regularly run campaigns highlighting current events and activities. As board members, we also continue to learn and attend webinars on topics related to NGO activities.

Image credit: Open Knowledge Foundation Deutschland

Now a quick review of events and projects:

  • In October and November, activities got a lot more outward-looking, in the ramp-up to “Jugend Hackt” in Linz and with a strong start for “Offene Wahlen” (Open Elections).
  • Jugend hackt turned out to be a great experience for all involved, with 25 young participants hacking on a wide range of projects (Flickr Album).
  • After the kick-off event at Netzpolitischer Abend in Metalab Vienna, “Offene Wahlen” – our first webcast – attracted community members who contributed to the website, participated in formulating demands, and helped run a “Beginner’s Coding Workshop” in Graz (Flickr Album).
  • The Open Science Working Group had its best-attended meet-up so far, held in cooperation with OpenKnowledgeMaps.
  • Unfortunately, although the Austrian City Open Data Census garnered initial interest, we did not get it launched as planned. We hope to continue working on it with the community in the next year.

Oh, and not to forget: We also printed a lot of stickers with Open Science, Open Data, and Open Knowledge!

District Dispatch: ALA Past President Sari Feldman joins SHLB board

planet code4lib - Tue, 2017-01-03 23:08

Starting January 1, Sari Feldman, ALA Immediate Past President and Executive Director of the Cuyahoga County Public Library, will join the board of the Schools, Health and Libraries Broadband (SHLB) Coalition. The coalition promotes government policies and programs that enable schools, libraries, health care providers, other anchor institutions and their communities to obtain open, affordable, high-speed broadband connections to the Internet.

ALA Past President Sari Feldman’s advocacy for broadband pairs perfectly with SHLB’s goal of bringing it to anchor institutions around the country. Photo credit: tibco.com

As someone who has had the pleasure of working with Sari as an ALA staffer, as well as serving on the SHLB board, I know this connection is like chocolate and peanut butter. Most recently, Sari was a featured voice in the vision paper that set the stage for a series of policy papers, compiled together and released this year as the SHLB Broadband Action Plan.

“Libraries are transforming into community hubs for digital content creation and collaboration,” she said. “Having a high-speed and resilient Internet connection at each and every library is essential to ensuring a full range of services related to education, employment, entrepreneurship, empowerment and community engagement for all.” As the person who initiated the ALA Libraries Transform public awareness campaign, Sari knows well that high-capacity broadband powers much of our modern library services. She also was recently elected to serve as board chair-elect for DigitalC, a Cleveland-based civic tech collaboration, starting in March 2017.

SHLB’s work is vital to fulfilling a national vision of connected libraries and communities. ALA has been involved from the beginning as a founding member of the coalition, which was created in 2009 to advocate for community anchor institutions as part of the Broadband Technology Opportunities Program. SHLB formally incorporated and named its first board of directors in 2012.

This last year has been a busy one, and 2017 promises more action. A free webinar on Partnerships and Rural Broadband Needs will take place January 10, and proposals for the 2017 conference are being accepted through January 16.

Congratulations to both Sari and to SHLB for making a great connection that will serve to advance broadband for libraries and other community anchor institutions!

The post ALA Past President Sari Feldman joins SHLB board appeared first on District Dispatch.

Meredith Farkas: Holding to our values during difficult times

planet code4lib - Tue, 2017-01-03 16:52

I write a lot in my American Libraries column about library values, particularly those around access and privacy. My latest column (Jan/Feb 17), which just came out online this morning, is a love letter to critical librarianship in which I share my conviction (shared by many) that libraries are not neutral. I believe strongly that real neutrality would mean supporting the oppression of many people in this country. Libraries should not replicate the fundamental inequality of our society; instead, we should be all about providing access (to libraries and information) and protecting the privacy of all members of our communities. In an unequal society, that is not a neutral act. I do not see how that lack of neutrality is anything but consistent with our professional values.

We are going to have our values tested time and again with the incoming administration and the polarized climate we’re living and working in. Our professional values aren’t only there for when they are convenient or are consonant with our personal beliefs. As Andy Woodworth wrote in his terrific Medium post, upholding the freedom to read sometimes means providing access to things we find repugnant (like Milo Yiannopoulos’ upcoming book). Sometimes holding to our values may mean pissing off the very people who hold our purse strings, as Sarah Houghton pointed out in her passionate rebuke to ALA’s misguided press releases. I get it. I quit my first library job because I disagreed with a new policy that decreased information access for the most vulnerable members of our community (something I just wrote about for the March/April issue of American Libraries). Others have suffered far worse for standing up for these values.

I was thrilled to discover Andy and Sarah’s Operation451, which is designed to affirm and remind us of values so core to our profession, specifically the promotion of freedom of expression and access. 451 stands both for Fahrenheit 451 (the book-burning classic) and for the 4th and 5th articles of the Library Bill of Rights, along with the First Amendment of the U.S. Constitution.

From the website:

By participating in 451, you pledge to:

  • Work towards increasing information access, especially for vulnerable populations;
  • Establish your library as a place for everyone in the community, no matter who they are;
  • Ensure and expand the right of free speech, particularly for minorities’ voices and opinions.

Sounds like what we do (or should be doing) as librarians every day. If you agree, check out their website and join their Facebook event. Personally, I wish they’d included privacy in there, because it will likely see unprecedented challenges under the Trump administration, but I can see how it didn’t fit in with the 451 theme. I would also add to that list that we should support and collaborate with organizations that share our support for individual civil liberties, freedom of the press, and equal access to information. We’re stronger together.

Supporting “library values” is a family affair for me. Many folks who read this blog may not know that my husband runs the largest online professional community for optometrists and has done so for more than 14 years. He is an expert at keeping an online community vibrant and sticky, and dealing with problematic members. He has had to balance the needs of the community with the interests of advertisers and has always been a staunch defender of free speech, even when it’s bitten him in the ass and run counter to his financial interests. While he’s not a librarian, he embodies all of our values around intellectual freedom and privacy in his work and I love him for it.

While the community mainly exists to discuss clinical work, practice management, and the like, there is also a non-professional forum where clinicians can discuss political topics, current events, etc. The other day, a community member asked to have her account deleted because she was disgusted by some of the more right-wing commenters and didn’t want to be associated with a site that allowed people to air those kinds of opinions. She also said that she would discourage other optometrists from using the site. My husband allowed me to share his response, which I’ve posted below:

I respect your opinion. But as one of the site’s administrators (and dyed-in-the-wool liberal — I helped Bernie Sanders get elected as my Senator in Vermont years ago), I think censoring people is the absolute wrong approach in this environment, and only worsens the problem.

I don’t know if you’ve heard of a concept called the “filter bubble”. This is where people online tend to retreat to spaces that make them feel comfortable, thus blocking out any alternate point of view (no matter how repellent or painful to them personally). This is why people who listen to Fox News or Breitbart, or only listen to their Facebook ‘friends’, live in a totally alternate reality from everyone else, where fiction somehow becomes fact.

Studies have found that one of the only spaces where any reasonable political discourse occurs online is actually in *sports forums* — amazing but true! And one of the reasons is that people come to those sites over a common issue that is apolitical (ie, their favorite team), and can speak to each other from a common interest, even if politically they are on opposite sides of the universe.

The same goes for our site — when we see racism, or information that is factually just wrong — we give it a full-throated rebuke. And explain why people are dead wrong. If things get out of control, we stop it. We’ve even banned people who repeatedly violated our code of conduct.

Removing these avenues of discussion through censorship only makes people harder to reach, and will make them retreat further into their own (possibly delusional) cliques.

Or I guess in medical terms, “sunlight is the best disinfectant”. If everyone took the approach of “I’m not going to this place because I am offended”, the segregation online only gets worse.

It is a tricky problem for sure, one with which we are forced to grapple every day.

Values aren’t always convenient to have, but without them, how do we define ourselves? How do we respect ourselves?

David Rosenthal: Travels with a Chromebook

planet code4lib - Tue, 2017-01-03 16:00
Two years ago I wrote A Note Of Thanks as I switched my disposable travel laptop from an Asus Seashell to an Acer C720 Chromebook running Linux. Two years later I'm still traveling with a C720. Below the fold, an update on my experiences.

As I wrote two years ago, I started by replacing ChromeOS with HugeGreenBug's port of Ubuntu 14.04. This gave me almost exactly the UI I was then running on my desktop, so I was happy with one exception:
I'm still not used to the much bigger touch pad, and keep accidentally moving the cursor by touching the touch pad with the balls of my thumbs.

This turned out to be a permanent irritation (but see below for what I should have done to fix it). Although the combination of the C720 and Ubuntu 14.04 gave sterling service for a long time, I started noticing an occasional crash. One of them was a very spectacular snow crash, which caused me to suspect a hardware problem. It was a day and a half before a trip, so I resorted again to Amazon's same-day delivery for a second C720. By the time it arrived all seemed well again with the first one, so I left with it and had no problem on the trip.

Now I had two C720s, so I could experiment with alternate Linux ports. I installed GalliumOS on the second one, and worked with it for a while. It was totally crash-free, it was significantly more up-to-date (being based on Xubuntu 16.04), and it was certainly lean-and-mean. Derrick Diener's GalliumOS: The Linux Distro Specially Designed for Chromebook is a good review of GalliumOS.

But I just couldn't come to terms with the XFCE desktop and the way the touchpad had been configured to mimic the ChromeOS touchpad. I'd traded one irritation for two. Then I found Mark Solters' Ubuntu 16.04 on the Google Pixel 2, which reported:
I hit upon trying Linux Mint 18 “Sarah”, currently beta. This version of Mint is 16.04 based, and the installer boots on the Pixel 2!

So I downloaded Linux Mint 18.1 "Serena" - Cinnamon (64-bit), wrote the .iso to a USB drive, popped it into the first C720 and powered up. The Mint live system came right up! I played with it for a while. Everything seemed to work, and the Cinnamon UI was a lot more to my taste than XFCE. Worth a try. After backing up my files from 14.04, I double-clicked on "Install Linux Mint".

After you answer the questions about keyboard and time zone and so on, the install starts with a worrisome long pause with the spinning cursor, but then things proceed normally with a progress bar. After it finished and I rebooted, I had a default Linux Mint system with 19GB of free space.

My next step was to connect to my WiFi network and
sudo apt-get update
sudo apt-get upgrade
which updated 12 packages. I have no travel for the next month and a working if slightly irritating environment on C720 #2, so I can take the opportunity to clean things up and customize a nice fresh environment on C720 #1. For the programming environment I needed:
  • git - the LOCKSS Program is converging on gitlab.com and gitlab CE.
  • emacs - which I still use for programming although I also use vi.
  • The JDK, ant, libxml2-utils and junit - needed to build the LOCKSS software.
  • Docker
For more general needs I added:
  • TeX Live - still my favorite environment for writing papers.
  • KeePassX - everyone should use a password safe.
  • gnucash
  • zoom.us - a program committee I'm on uses this for distributed meetings; it works really well.
So far, everything else I need is in the default install. At this point I have 17GB free. Not a lot, but as in the 14.04 days I will travel with an external SSD that holds my development environment.

Then I closed the lid, had a cup of tea, and opened the lid again. Things seemed to be working but, as I quickly discovered, there was a problem. It turns out that the Mint kernel shows symptoms that appeared on Chromebook Linux wikis back in 2014:
  • Suspend/resume works the first time, but the next time the screen locks while the processor keeps running indefinitely. Shutdown behaves similarly.
  • And after a resume /var/log/kern.log is continually spammed with messages from ehci-pci saying resume error -19.
Not good. I tried various fixes found by Google-ing but none worked. Then I had an inspiration. I had two machines:
  • C720 #2 with a working kernel and a userland I didn't like.
  • C720 #1 with a broken kernel and a userland I liked.
All I needed to do was to get C720 #1 to boot the GalliumOS kernel rather than the Mint one. I re-installed vanilla Mint 18.1 on C720 #1; here is what I did to it.

Stage 1 was to get the GalliumOS fixes relevant to the suspend/resume issue into the Mint userland:
  • Check out the GalliumOS config files for Haswell by: git clone https://github.com/GalliumOS/galliumos-haswell
  • Copy each of the files into its appropriate place.
Stage 2 was to get the GalliumOS kernel installed on C720 #1:
  • Create a file /etc/apt/sources.list.d/additional-repositories.list containing the line: deb http://apt.galliumos.org/ xenon main
  • Copy the file /etc/apt/trusted.gpg.d/galliumos.gpg from C720 #2 to C720 #1.
  • Run synaptic, reload, search for linux-image-galliumos, mark it for installation, apply.
  • Set the grub bootloader to display its menu by default so that if something goes wrong you can boot the Mint kernel using the "advanced options" menu item. Edit /etc/default/grub to read: #GRUB_HIDDEN_TIMEOUT=0
    GRUB_HIDDEN_TIMEOUT_QUIET=false
    GRUB_TIMEOUT=15
  • Reboot, and the system comes up: $ uname -r
    4.7.2-galliumos
    $
Now suspend/resume works just the way it does on GalliumOS!

Three years ago redditor SuccessInCircuit posted List of fixes for Xubuntu 13.10 on the Acer C720. Some of these fixes still seem to be relevant:
  • GalliumOS follows SuccessInCircuit's advice to optimize for the C720's SSD. So I edited the root file system entry in /etc/fstab to be: /dev/mapper/mint--vg-root / ext4 discard,relatime,errors=remount-ro 0 1
  • I find it necessary to disable the touchpad while typing. SuccessInCircuit says to do this with syndaemon, but there is a better way for Ubuntu 16.04, installing a touchpad control applet in the tray: $ sudo add-apt-repository ppa:atareao/atareao
    $ sudo apt-get update
    $ sudo apt-get install touchpad-indicator
  • Brave souls have upgraded the M.2 SSD, reinstalled ChromeOS, then installed Linux, but I'm not yet up for that despite the lack of space.
I'm now working on building the nice fresh environment on C720 #1 before my next trip. If I encounter anything else interesting I'll comment on this post.

Ariadne Magazine: Folksonomies in archives: controlled collaboration for specific documents

planet code4lib - Tue, 2017-01-03 13:24

Ana Margarida Dias da Silva looks at how archives in France have made use of modern web technologies to bring user input and controlled social collaboration into metadata creation for their large numbers of records.

Abstract: Folksonomies arise in the context of the Information Society, spurred by the advent and widespread use of collaborative tools of Web 2.0, where the passiv

Issue number: 77. Date published: Tue, 01/03/2017. http://www.ariadne.ac.uk/issue77/margaridadiasdasilva

John Mark Ockerbloom: Public Domain Day 2017: Keeping memory alive

planet code4lib - Sun, 2017-01-01 17:12

It’s a bit hard to believe, but there are now adults in the US who cannot remember a substantial Public Domain Day.  In their own lifetimes, copyrights for published works have never expired here.  But I remember when they expired regularly every New Year’s Day, up until 1998.  And I have hope that in two more years, if our government does not acquiesce to the entertainment industry and extend copyrights yet again, we’ll start seeing published works regularly enter the public domain here again.

There are other things that I don’t remember, or haven’t seen much of myself.  I wasn’t around for the demagogues who whipped up mobs to usher in fascism and persecution in 1930s Europe, nor for the world war that followed that, nor the beginnings of European integration that helped keep the peace afterwards.  I didn’t personally experience the struggles for civil rights and equality of the 1960s and early 1970s, not being born yet, or being too young to take part in them.  I’ve never been to Russia, Turkey, Hungary, or the Philippines, where recent hopes of democratic and liberal reforms have given way to more authoritarian regimes.  I’m not an expert in the science that shows the world dangerously warming in recent years, how humans have caused and will be affected by it, and what we need to do to stop it.

Yes, all of these things are mentioned in history books and various media, but they’re often discussed in superficial ways that don’t reflect the detailed memories and lessons of the people who experienced them firsthand, or who are experts in understanding the issue.  I think that’s part of why I see our societies now making many of the same sorts of mistakes that people who know of these things firsthand could help us avoid.   I think it’s also part of the shock for many of the people I know watching the news of the past year.  They didn’t have the personal memories of how prosperous societies can regress just as easily as they can progress, or hadn’t previously noticed how marginalized people experience those societies very differently than our dominant narratives suggest.

Keeping memory alive, and making it known far and wide, is essential if we are going to solve the problems we have today, and avoid the kinds of mistakes and disasters we’ve had in the past.   The more easily we can duplicate and spread that memory, the more likely we are to keep it alive.  That’s why the Internet Archive is making a backup copy in Canada, just in case anything happens to its primary US copy.  That’s why people at Penn and elsewhere are trying to duplicate all the data the government has on climate and other issues before the administration changes, lest it become unavailable or harder to access in the future.  That’s why projects like Wikipedia go out of their way to allow their content to be copied and readily downloadable in bulk, so it can be read and shared in places where the Internet isn’t as reliable or as uncensored as it is in other places.

And that’s one reason why the public domain is so important, and why it’s so important that copyrighted works enter the public domain regularly, automatically, and in a timely fashion.  Copyrights are important to support the people who create works of art and knowledge, and to help ensure that they can introduce them to the world in the form and manner they intend.  But it’s also important that after “limited times” (to quote the US Constitution) the works enter the public domain, so they can be copied, disseminated, reinterpreted and reworked, and remembered, without restriction.  The easier it is to copy and disseminate, the easier it is to remember.

In 2017, we have works by authors who died in 1946 finally entering the public domain in Europe and many other countries.  That now allows us to freely copy the works of those who perished in World War II and shortly thereafter, but not the works of all those who survived and helped rebuild society afterwards.  (Though it at least finally frees works like 1895’s The Time Machine, written by a young H.G. Wells who would live until 1946.)  We’re also seeing works by authors who died in 1966 entering the public domain in Canada and some other countries, freeing the works of people like the anti-fascist poet Andre Breton, as well as C. S. Forester’s Captain Horatio Hornblower.  (You can read about more authors now in the public domain in “life plus 70 years” and “life plus 50 years” countries in the Public Domain Review’s annual roundup.)

In the US, once again, no published works are entering the public domain today of their own accord.  (Works published in 1960 would have, under the copyright law in effect at the time they were published.   Duke has an overview of some of those, as they have for previous years.)  But we may soon be seeing works regularly enter the public domain once again, and there’s still a lot of the present-day public domain that’s not as well-known as it should be.  So there’s a lot we can do now to support the public domain in 2017.  Here’s what I’m planning to do:

  • Keep government accountable for the public domain and sound copyright policy:  The Congress and administration we just elected in the US are the same people who will be in office when 1923’s copyrights are scheduled to finally expire, on January 1, 2019.  I intend to make sure those expire on schedule, by watching for any attempts to extend copyrights further and telling my elected officials to oppose them.  (This has had good effect in other countries recently. Pushback against the Trans-Pacific Partnership, for instance, has so far kept Canada from extending its copyrights another 20 years, as that treaty would have required them to do.)  I also intend to press the government to keep fair use strong, and to keep the Copyright Office administered by the Library of Congress, which will help it be responsive to both creators and consumers, and not just to the entertainment industry.
  • Bring to light the “hidden” public domain of the 20th century:  Many public domain works are now freely readable online, but most of them are from before 1923, when it’s easy to determine public domain status in the US.  HathiTrust has also made available many library books published between 1923 and 1963, whose copyrights have expired because they were never renewed as required.  But there are also a lot of unrenewed newspapers, magazines, scholarly journals, photographs, art and music from this time period that are in the public domain, as well as pamphlets, posters, and underground and amateur publications from as late as 1989.   Unpublished works are also still entering the public domain, including in the US, from authors who died in 1946 or earlier.  All of this material is a crucial part of the memory of the 20th century, especially for more marginalized groups of people, and it’s at more risk of being lost and forgotten than most library books are.  In 2017, I’ll be working on projects that should make it considerably easier for people to clear copyrights for periodical literature in particular, and thus keep the memories in them alive.
  • Highlight public domain and other open content especially important to remember:  On my Online Books Page, I’ll keep an eye out for books and serials in the public domain that cover topics that seem especially important for people to remember in 2017.  I’ll add them to my catalog with appropriate descriptions and make them more easily findable from places like Google and Wikipedia.  I’ll also be looking for more recent open-licensed content on these topics.  You can help me out with this, by suggesting titles for me to add. Over the past few years, I’ve added thousands of works to my new listings that were suggested by readers, and I’m eager to hear what you think we should be remembering.
  • Add my own work to the public domain:  Like many creators, I’d rather have my work remembered than keep it locked up for 70 years after my death.  So I open-license much of what I put online.  (This post for instance, is licensed CC-BY, so you can copy it as you like as long as you clearly credit that I wrote it and originally published it here on Everybody’s Libraries, and note the CC-BY license.)  And after a suitable period of time, I go further and put my work into the public domain outright.   Today, for instance, I dedicate everything I wrote and published prior to 2003, and whose copyrights I control, to the public domain, via a CC0 dedication.  I’ve made similar declarations in past years as well, inspired by the initial 14-year term that was prescribed by the US’s original copyright law.

It’s especially important in times of uncertainty and danger that we keep our collective memory alive, to help us move forward wisely and joyfully.  The public domain preserves and promotes that memory, and I hope you’ll join me in supporting it in the coming year.


Jonathan Rochkind: Concurrency in Rails 5.0

planet code4lib - Sun, 2017-01-01 17:10

My previous posts on concurrency in ActiveRecord have been some of the most popular on this blog (which I’d like to think means concurrency is getting more popular in Rails-land), so I’m going to share what I know about some new concurrency architecture in Rails5 — which is no longer limited to ActiveRecord.

(update: hours before I started writing this, unbeknownst to me, matthewd submitted a Rails PR for a Rails Guide, with some really good stuff; I have only skimmed it so far, but you might wanna go there either before, after, or in lieu of this).

I don’t fully understand the new stuff, but since it’s relatively undocumented at present, and has some definite gotchas, as well as definite potentially powerful improvements — sharing what I got seems helpful. This will be one of my usual “lots of words” posts, get ready!

The new architecture primarily involves ActiveSupport::Reloader (a global one of which is in Rails.application.reloader) and ActiveSupport::Executor (a global one of which is in Rails.application.executor). Also ActiveSupport::Dependencies::Interlock (a global one of which is at ActiveSupport::Dependencies.interlock).

Why you need to know this

If you create any threads in a Rails app yourself — beyond the per-request threads a multi-threaded app server like Puma will do for you. Rails takes care of multi-threaded request dispatch for you (with the right app server), but if you're doing any kind of what I'll call "manual concurrency" yourself — Thread.new, any invocations of anything in concurrent-ruby (recommended), or probably any celluloid (not sure), etc. — you've got to pay attention to using the new architecture to do what Rails wants, and to avoid deadlocks if dev-mode-style class-reloading is happening.

If you’re getting apparent deadlocks in a Rails5 app that does multi-threaded concurrency, it’s probably about this.

If you are willing to turn off dev-mode class-reloading and auto-loading altogether, you can probably ignore this.

What I mean by “dev-mode class-reloading”

Rails5 by default generates your environments/development.rb with config.cache_classes==false, config.eager_load==false. Classes are auto-loaded only on demand (eager_load == false), and are also sometimes unloaded to be reloaded on next access (cache_classes == false). (The details of when/how/which/if they are unloaded are outside the scope of this blog post, but have also changed in Rails5).

You can turn off all auto-loading with config.cache_classes==true, config.eager_load==true — the Rails5 production default.  All classes are loaded/require'd en masse on boot, and are never unloaded.  This is what I mean by 'turn off dev-mode class-reloading and auto-loading altogether'.
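
That is, roughly, in config/environments/development.rb:

# turn dev-mode class-reloading and auto-loading off entirely
# (these are the Rails 5 production defaults)
config.cache_classes = true
config.eager_load = true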

The default Rails5 generated environments/test.rb has config.cache_classes==true, config.eager_load==false.  Only load classes on demand with auto-loading (eager_load == false), but never unload them.

I am not sure if there’s any rational purpose for having config.cache_classes = false, config.eager_load = true, probably not.

I think there was a poorly documented config.autoload in previous Rails versions, with confusing interactions with the above two configs; I don't think it exists (or at least does anything) in Rails 5.

Good News

Previously to Rails 5, Rails dev-mode class-reloading and auto-loading were entirely un-thread-safe. If you were using any kind of manual concurrency, then you pretty much had to turn off dev-mode class-reloading and auto-loading. Which was too bad, cause they’re convenient and make dev more efficient. If you didn’t, it might sometimes work, but in development (or possibly test) you’d often see those pesky exceptions involving something like “constant is missing”, “class has been redefined”, or “is not missing constant” — I’m afraid I can’t find the exact errors, but perhaps some of these seem familiar.

Rails 5, for the first time, has an architecture which theoretically lets you do manual concurrency in the presence of class reloading/autoloading, thread-safely. Hooray! This is something I had previously thought was pretty much infeasible, but it's been (theoretically) pulled off, hooray. This for instance theoretically makes it possible for Sidekiq to do dev-mode-style class-reloading — although I'm not sure if the latest Sidekiq release actually still has this feature, or whether they had to back it out.

The architecture is based on some clever concurrency patterns, so it theoretically doesn't impact performance or concurrency measurably in production — or even, for the most part, significantly in development.

While the new architecture most immediately affects class-reloading, the new API is, for the most part, not written in terms of reloading, but is a higher-level API for signaling what you are doing about concurrency: "I'm doing some concurrency here" in various ways.  This is great, and should be good for the future of Just Works concurrency in Rails in other ways than class reloading too.  If you are using the new architecture correctly, it theoretically makes ActiveRecord Just Work too, with less risk of leaked connections and without having to pay lots of attention to it. Great!

I think matthewd is behind much of the new architecture, so thanks matthewd for trying to help move Rails toward a more concurrency-friendly future.

Less Good News

While the failure mode for concurrency used improperly with class-reloading in Rails 4 (which was pretty much any concurrency with class-reloading, in Rails 4) was occasional hard-to-reproduce mysterious exceptions, the failure mode for concurrency used improperly with class-reloading in Rails5 can be a reproduces-every-time deadlock. Where your app just hangs, and it's pretty tricky to debug why, especially if you aren't even considering "class-reloading and new Rails 5 concurrency architecture", which, why would you?

And all the new stuff is, at this point, completely undocumented.  (Update: some docs in rails/rails #27494, which I hadn't seen before I wrote this.)  So it's hard to know how to use it right. (I would quite like to encourage an engineering culture where significant changes without docs are considered just as problematic to merge/release as significant changes without tests… but we're not there yet). (The Autoloading and Reloading Constants Guide, to which this is very relevant, has not been updated for this ActiveSupport::Reloader stuff, and I think is probably no longer entirely accurate. That would be a good place for some overview docs…).

The new code is a bit tricky and abstract, a bit hard to follow. Some anonymous modules at some points made it hard for me to use my usual, already grimace-inducing methods of code-archeology reverse-engineering, where I normally count on inspecting class names of objects to figure out what they are and where they're implemented.

The new architecture may still be buggy.  Which would not be surprising for the kind of code it is: pretty sophisticated, concurrency-related, every rails request will touch it somehow, trying to make auto-loading/class-reloading thread-safe when even ordinary ruby require is not (I think this is still true?).  See for instance all the mentions of the “Rails Reloader” in the Sidekiq changelog, going back and forth trying to make it work right — not sure if they ended up giving up for now.

The problem with maybe buggy combined with lack of any docs whatsoever — when you run into a problem, it’s very difficult to tell if it’s because of a bug in the Rails code, or because you are not using the new architecture the way it’s intended (a bug in your code). Because knowing the way it’s intended to work and be used is a bit of a guessing game, or code archeology project.

We really need docs explaining exactly what it's meant to do and how, on an overall architectural level and a method-by-method level. And I know matthewd knows docs are needed. But there are few people qualified to write those docs (maybe only matthewd), because in order to write docs you've got to know the stuff that's hard to figure out without any docs. And meanwhile, if you're using Rails5 and concurrency, you've got to deal with this stuff now.

So: The New Architecture

I’m sorry this is so scattered and unconfident, I don’t entirely understand it, but sharing what I got to try to save you time getting to where I am, and help us all collaboratively build some understanding (and eventually docs?!) here. Beware, there may be mistakes.

The basic idea is that if you are running any code in a manually created thread, that might use Rails stuff (or do any autoloading of constants), you need to wrap your “unit of work” in either Rails.application.reloader.wrap { work } or Rails.application.executor.wrap { work }.  This signals “I am doing Rails-y things, including maybe auto-loading”, and lets the framework enforce thread-safety for those Rails-y things when you are manually creating some concurrency — mainly making auto-loading thread-safe again.

When do you pick reloader vs executor? Not entirely sure, but if you are completely outside the Rails request-response cycle (not in a Rails action method, but instead something like a background job), manually creating your own threaded concurrency, you should probably use Rails.application.reloader.  That will allow code in the block to properly pick up new source under dev-mode class-reloading. It’s what Sidekiq did to add proper dev-mode reloading for sidekiq (not sure what current master Sidekiq is doing, if anything).
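
For example, something like this for a thread you spin up yourself outside the request cycle (just a sketch; MyWorker is a stand-in for whatever your own code does):

# a manually created thread doing Rails-y work; the reloader wrapper lets
# it safely autoload constants and pick up dev-mode class reloading
Thread.new do
  Rails.application.reloader.wrap do
    MyWorker.new.perform
  end
end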

On the other hand, if you are in a Rails action method (which is already probably wrapped in a Rails.application.reloader.wrap), I believe you can't use a (now nested) Rails.application.reloader.wrap without deadlocking things up. So there you use Rails.application.executor.wrap.

What about in a rake task, or rails runner executed script?  Not sure. Rails.application.executor.wrap is probably the safer one — it just won’t get dev-mode class-reloading happening reliably within it (won’t necessarily immediately, or even ever, pick up changes), which is probably fine.

But to be clear, even if you don’t care about picking up dev-mode class-reloading immediately — unless you turn off dev-mode class-reloading and auto-loading for your entire app — you still need to wrap with a reloader/executor to avoid deadlock — if anything inside the block possibly might trigger an auto-load, and how could you be sure it won’t?

Let’s move to some example code, which demonstrates not just the executor.wrap, but some necessary use of ActiveSupport::Dependencies.interlock.permit_concurrent_loads too.

An actual use case I have — I have to make a handful of network requests in a Rails action method; I can't really push them off to a bg job, or at any rate I need the results before I return a response. But since I'm making several of them, I really want to do them in parallel. Here's how I might do it in Rails4:
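
(Roughly, a sketch of the kind of thing I mean, using concurrent-ruby; fetch_title stands in for the real fetch-and-parse work, and urls for the list of URLs I need to hit:)

def show
  # fire off each request on its own thread via concurrent-ruby futures
  futures = urls.map do |url|
    Concurrent::Future.execute { fetch_title(url) }
  end
  # .value blocks until each future has finished
  @titles = futures.map(&:value)
end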

In Rails4, that would work… mostly. With dev-mode class-reloading/autoloading on, you’d get occasional weird exceptions. Or of course you can turn dev-mode class-reloading off.

In Rails5, you can still turn dev-mode class-reloading/autoloading off and it will still work. But if you have autoload/class-reload on, instead of an occasional weird exception, you'll get a nearly(?) universal deadlock. Here's what you need to do instead:
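
(Again just a sketch, with the same stand-ins as above: each unit of work gets wrapped in the executor, and the waiting thread tells the autoload interlock it's safe for other threads to load code while it blocks on the results:)

def show
  futures = urls.map do |url|
    Concurrent::Future.execute do
      # signal "I'm doing Rails-y things in this thread"
      Rails.application.executor.wrap { fetch_title(url) }
    end
  end

  # allow other threads to autoload/reload while we block on the futures,
  # instead of deadlocking against the interlock
  @titles = ActiveSupport::Dependencies.interlock.permit_concurrent_loads do
    futures.map(&:value)
  end
end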

And it should actually work reliably, without intermittent mysterious “class unloaded” type errors like in Rails4.

ActiveRecord?

Previously, one big challenge with using ActiveRecord under concurrency was avoiding leaked connections.

I think that if your concurrent work is wrapped in Rails.application.reloader.wrap do or Rails.application.executor.wrap do, this is no longer a problem — they'll take care of returning any pending checked-out AR db connections to the pool at the end of the block.

So you theoretically don't need to be so careful about wrapping every single concurrent use of AR in an ActiveRecord::Base.connection_pool.with_connection to avoid leaked connections.

But I think you still can, and it won't hurt — and it should sometimes lead to shorter, finer-grained checkouts of db connections from the pool, which matters if you potentially have more threads than your AR connection pool size. I am still wrapping in ActiveRecord::Base.connection_pool.with_connection, out of superstition if nothing else.
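
Something like this, say (again a sketch; Widget is a hypothetical AR model):

Concurrent::Future.execute do
  Rails.application.executor.wrap do
    # check out a db connection only for as long as the query needs it
    ActiveRecord::Base.connection_pool.with_connection do
      Widget.where(status: "pending").count
    end
  end
end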

Under Test with Capybara?

One of the things that makes Capybara feature tests so challenging is that they inherently involve concurrency — there’s a Rails app running in a different thread than your tests themselves.

I think this new architecture could theoretically pave the way to making this all a lot more intentional and reliable, but I’m not entirely sure, not sure if it helps at all already just by existing, or would instead require Capybara to make use of the relevant API hooks (which nobody’s prob gonna write until there are more people who understand what’s going on).

Note though that Rails 4 generated a comment in config/environments/test.rb that says “If you are using a tool that preloads Rails for running tests [which I think means Capybara feature testing], you may have to set [config.eager_load] to true.”  I’m not really sure how true this was even in past versions of Rails (whether it was necessary or sufficient). This comment is no longer generated in Rails 5, and eager_load is still generated to be true … so maybe something improved?

Frankly, that’s a lot of inferences, and I have been still leaving eager_load = true under test in my Capybara-feature-test-using apps, because the last thing I need is more fighting with a Capybara suite that is the closest to reliable I’ve gotten it.

Debugging?

The biggest headache is that a bug in the use of the reloader/executor architecture manifests as a deadlock — and I'm not talking about the kind that gives you a ruby 'deadlock' exception, but the kind where your app just hangs forever doing nothing. This is painful to debug.

These deadlocks in my experience are sometimes not entirely reproducible, you might get one in one run and not another, but they tend to manifest fairly frequently when a problem exists, and are sometimes entirely reproducible.

The first step is experimentally turning off dev-mode class-reloading and auto-loading altogether (config.eager_load = true, config.cache_classes = true) and seeing if your deadlock goes away. If it does, it's probably got something to do with not properly using the new Reloader architecture. In desperation, you could just give up on dev-mode class-reloading, but that'd be sad.

Rails 5.0.1 introduces a DebugLocks feature intended to help you debug these deadlocks:

Added new ActionDispatch::DebugLocks middleware that can be used to diagnose deadlocks in the autoload interlock. To use it, insert it near the top of the middleware stack, using config/application.rb:

config.middleware.insert_before Rack::Sendfile, ActionDispatch::DebugLocks

After adding, visiting /rails/locks will show a summary of all threads currently known to the interlock.

PR, or at least initial PR, at rails/rails #25344.

I haven’t tried this yet, I’m not sure how useful it will be, I’m frankly not too enthused by this as an approach.

References
  • Rails.application.executor and Rails.application.reloader are initialized here, I think.
  • Not sure the design intent of: Executor being an empty subclass of ExecutionWrapper; Rails.application.executor being an anonymous sub-class of Executor (which doesn't seem to add any behavior either? Rails.application.reloader does the same thing fwiw); or if further configuration of the Executor is done in other parts of the code.
  • Sidekiq PR #2457 Enable code reloading in development mode with Rails 5 using the Rails.application.reloader, I believe code may have been written by matthewd. This is a good intro example of a model of using the architecture as intended (since matthewd wrote/signed off on it), but beware churn in Sidekiq code around this stuff dealing with issues and problems after this commit as well — not sure if Sidekiq later backed out of this whole feature?  But the Sidekiq source is probably a good one to track.
  • A dialog in Rails Github Issue #686 between me and matthewd, where he kindly leads me through some of the figuring out how to do things right with the new arch. See also several other issues linked from there, and links into Rails source code from matthewd.
Conclusion

If I got anything wrong, or you have any more information you think useful, please feel free to comment here — and/or write a blog post of your own. Collaboratively, maybe we can identify if not fix any outstanding bugs, write docs, maybe even improve the API a bit.

While the new architecture holds the promise to make concurrent programming in Rails a lot more reliable — making dev-mode class-reloading at least theoretically possible to do thread-safely, when it wasn’t at all possible before — in the short term, I’m afraid it’s making concurrent programming in Rails a bit harder for me.  But I bet docs will go a long way there.


Filed under: General

Meredith Farkas: 2016 wasn’t all bad

planet code4lib - Sat, 2016-12-31 17:45

As I alluded to in my last post, this year was a difficult one for me personally that ended up turning out for the better. I know that many of us have felt dispirited and beaten down since the election and feel like 2016 was a flaming dumpster fire of a year, so I’ve decided to look back at the things I’m grateful for in 2016.

Here’s my list, in no particular order:

  • Starting my third year as a librarian at Portland Community College. I have never felt more at home in a job and more challenged in fun and exciting ways. I love the work I get to do, my colleagues in the library, and the disciplinary faculty who are so devoted to their students. I’m right where I want to be.
  • Taking up archery. I’m not a very sporty person, so I was surprised by how much I enjoy shooting arrows at a target. It’s an elegant and challenging sport that I have fallen in love with.
  • Meeting Bruce Springsteen! Twice!!
  • Living in Oregon. Portland is such an amazing city with great food and culture, and Oregon is so full of natural wonders that there is always a new hiking trail, mountain, meadow, or waterfall to discover. It’s not perfect (what place is?), but I feel incredibly privileged to live here.
  • Attending the ACRL Oregon/Washington joint conference, also known as librarian camp. I always come away from that conference feeling so much love for our profession and so much hope in our future. The Pacific NW is full of amazing and collaborative librarians and I feel proud to be a part of this community.
  • Attending my dear high school friend Melodie’s wedding in Florida. There are few people in this world I will ever love the way I do Melodie and I was brought to tears being a part of this happy occasion in her life.
  • Finally finding an exercise routine I could stick with. I’ve been doing it faithfully for a year now and I’ve lost weight, gained muscle, and feel terrific!
  • The fact that my son just keeps getting more and more awesome. I don’t know how I deserved such a brilliant, kind, curious, compassionate, and cuddly kid. I feel especially grateful that I get to spend the entire summer break with him going on adventures.
  • Teaching LIS courses for San Jose State is a lot of work, but is always rewarding. I love teaching and I am constantly amazed by the caliber of students coming out of library school (or at least San Jose State).
  • Splurging for MLB at Bat so we could watch NY Mets games anytime and anywhere we wanted, even on my phone while stuck in traffic on the way home from Seattle.
  • Attending the Nutcracker with my son. We’ve done the abbreviated “Nutcracker Tea” for the past three years, but this December, we got front row center seats to the Oregon Ballet Theater’s stunning rendition of George Balanchine’s The Nutcracker with a live orchestra. It’s a wonderful tradition to have with my buddy.
  • Visiting Sunriver and Bend (in Central Oregon) in August. I love bike riding and I spend as much time as I can in Sunriver riding my bike through forests and along the Deschutes River. We rented a house and did a lot of good eating, hiking, swimming, and miniature golfing on this trip too.
  • My husband’s unconditional love. I don’t deserve everything he’s done for me and the faith he has had in me, but I am so grateful for it. He’s the best person I know.
  • Attending more Hillsboro Hops games this past summer. Our local single A minor league team’s games are SO fun and cheap! I just bought a 4-game season ticket package for the three of us in 2017 for less than a single major league game ticket would cost.
  • Going to Universal Studios Hollywood with my husband and son. What a fantastic day that was!!
  • I am terrible at keeping happy secrets, so I was incredibly proud of myself for actually surprising my husband with a romantic anniversary trip to the wine country in Walla Walla, Washington. He didn’t even know where we were going as we drove there!!! It was a perfect mini-vacation with bike riding, wine drinking, and hot tubbing.
  • All of the concerts I attended! This year I tried to get out to more concerts: rock/pop, jazz, and classical. Highlights included Pink Martini, Bruce Springsteen, Belly, the Oregon Symphony’s program with Holst’s The Planets and Natasha Peremsky, the Danish String Quartet, Chico Freeman, and Tony Starlight vs. Jazz.
  • Seeing so many people mobilizing to do good after Trump was elected. It feels like we are going into a very dark time now, but I am heartened by seeing so many people coming together to form a resistance and help people who are or might be negatively impacted by Trump. Most people of privilege have been complacent for a long time, and I think the election shook many of us out of our comfort zones. If Trump’s election mobilizes people in the long-term to fight for things like racial and gender equality, transgender rights, and the environment, then that’s a good thing. I have to remain hopeful; I have a kid I love very much who is going to inherit this world from us and I want to help make it better (or at least not worse).
  • I feel so fortunate to be a part of our amazing profession. When I see librarians passionately standing up for the values of our profession — especially those around access and privacy — I feel very proud to be a librarian. I remain optimistic that we will remain faithful to these values during what I’m sure will be a challenging next four years.

And I’m grateful to you, my friends who read this blog, and especially those of you who have engaged with me here or via social media. There have been moments in my life where I’ve felt very alone and isolated, and this blog has sometimes served as a lifeline for me over the past 12 years. Thank you for reading it and for connecting with me. I wish you all good things in the coming year.

What was the best thing that happened to you in 2016? What are you most grateful for?

Conal Tuohy: A tool for Web API harvesting

planet code4lib - Sat, 2016-12-31 05:31

A medieval man harvesting metadata from a medieval Web API

As 2016 stumbles to an end, I’ve put in a few days’ work on my new project Oceania, which is to be a Linked Data service for cultural heritage in this part of the world. Part of this project involves harvesting data from cultural institutions which make their collections available via so-called “Web APIs”. There are some very standard ways to publish data, such as OAI-PMH, OpenSearch, SRU, RSS, etc, but many cultural heritage institutions instead offer custom-built APIs that work in their own peculiar way, which means that you need to put in a certain amount of effort in learning each API and dealing with its specific requirements. So I’ve turned to the problem of how to deal with these APIs in the most generic way possible, and written a program that can handle a lot of what is common in most Web APIs, and can be easily configured to understand the specifics of particular APIs.

This program, which I’ve called API Harvester, can be configured by giving it a few simple instructions: where to download the data from, how to split up the downloaded data into individual records, where to save the record files, how to name those files, and where to get the next batch of data from (i.e. how to resume the harvest). The API Harvester does have one hard requirement: it is only able to harvest data in XML format, but most of the APIs I’ve seen offered by cultural heritage institutions do provide XML, so I think it’s not a big limitation.

The API Harvester software is open source, and free to use; I hope that other people find it useful, and I’m happy to accept feedback or improvements, or examples of how to use it with specific APIs. I’ve created a wiki page to record example commands for harvesting from a variety of APIs, including OAI-PMH, the Trove API, and an RSS feed from this blog. This wiki page is currently open for editing, so if you use the API Harvester, I encourage you to record the command you use, so other people can benefit from your work. If you have trouble with it, or need a hand, feel free to raise an issue on the GitHub repository, leave a comment here, or contact me on Twitter.

Finally, a brief word on how to use the software: to tell the harvester how to pull a response apart into individual records, and where to download the next page of records from (and the next, and the next…), you give it instructions in the form of “XPath expressions”. XPath is a micro-language for querying XML documents; it allows you to refer to elements, attributes, and pieces of text within an XML document, and to perform basic arithmetic and manipulate strings of text. XPath is simple yet enormously powerful; if you are planning on doing anything with XML it’s an essential thing to learn, even if only to a very basic level. I’m not going to give a tutorial on XPath here (there are plenty on the web), but I’ll give an example of querying the Trove API, and briefly explain the XPath expressions used in that example:

Here’s the command I would use to harvest metadata about maps, relevant to the word “oceania”, from the Trove API, and save the results in a new folder called “oceania-maps” in my Downloads folder:

java -jar apiharvester.jar
directory="/home/ctuohy/Downloads/oceania-maps"
retries=5
url="http://api.trove.nla.gov.au/result?q=oceania&zone=map&reclevel=full"
url-suffix="&key=XXXXXXX"
records-xpath="/response/zone/records/*"
id-xpath="@url"
resumption-xpath="/response/zone/records/@next"

For legibility, I’ve split the command onto multiple lines, but this is a single command and should be entered on a single line.

Going through the parts of the command in order:

  • The command java launches a Java Virtual Machine to run the harvester application (which is written in the Java language).
  • The next item, -jar, tells Java to run a program that’s been packaged as a “Java Archive” (jar) file.
  • The next item, apiharvester.jar, is the harvester program itself, packaged as a jar file.

The remainder of the command consists of parameters that are passed to the API harvester and control its behaviour.

  • The first parameter, directory="/home/ctuohy/Downloads/oceania-maps", tells the harvester where to save the XML files; it will create this folder if it doesn’t already exist.
  • With the second parameter, retries=5, I’m telling the harvester to retry a download up to 5 times if it fails; Trove’s server can sometimes be a bit flaky at busy times; retrying a few times can save the day.
  • The third parameter, url="http://api.trove.nla.gov.au/result?q=oceania&zone=map&reclevel=full", tells the harvester where to download the first batch of data from. To generate a URL like this, I recommend using Tim Sherratt’s excellent online tool, the Trove API Console.
  • The next parameter url-suffix="&key=XXXXXXX" specifies a suffix that the harvester will append to the end of all the URLs which it requests. Here, I’ve used url-suffix to specify Trove’s “API Key”; a password which each registered Trove API user is given. To get one of these, see the Trove Help Centre. NB XXXXXXX is not my actual API Key.

The remaining parameters are all XPath expressions. To understand them, it will be helpful to look at the XML content which the Trove API returns in response to that query, and which these XPath expressions apply to.

  • The first XPath parameter, records-xpath="/response/zone/records/*", identifies the elements in the XML which constitute the individual records. The XPath /response/zone/records/* describes a path down the hierarchical structure of the XML: the initial / refers to the start of the document, response refers to an element with that name at the “root” of the document, /zone refers to any element called zone within that response element, /records refers to any records element within any of those zone elements, and the final /* refers to any element (with any name) within any of those records elements. In practice, this XPath expression identifies all the work elements in the API’s response, and means that each of these work elements (and its contents) ends up saved in its own file.
  • The next parameter, id-xpath="@url" tells the harvester where to find a unique identifier for the record, to generate a unique file name. This XPath is evaluated relative to the elements identified by the records-xpath; i.e. it gets evaluated once for each record, starting from the record’s work element. The expression @url means “the value of the attribute named url”; the result is that the harvested records are saved in files whose names are derived from these URLs. If you look at the XML, you’ll see I could equally have used the expression @id instead of @url.
  • The final parameter, resumption-xpath="/response/zone/records/@next", tells the harvester where to find a URL (or URLs) from which it can resume the harvest, after saving the records from the first response. You’ll see in the Trove API response that the records element has an attribute called next which contains a URL for this purpose. When the harvester evaluates this XPath expression, it gathers up the next URLs and repeats the whole download process again for each one. Eventually, the API will respond with a records element which doesn’t have a next attribute (meaning that there are no more records). At that point, the XPath expression will evaluate to nothing, and the harvester will run out of URLs to harvest, and grind to a halt.
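
If you want to sanity-check XPath expressions like these before running a full harvest, one low-tech option is to evaluate them from the command line. This is only a sketch: it assumes you have libxml2’s xmllint installed and have saved a sample API response to a local file, hypothetically named response.xml here.

xmllint --xpath "/response/zone/records/@next" response.xml
xmllint --xpath "count(/response/zone/records/*)" response.xml

The first command prints the next attribute (the resumption URL) if it is present, and the second counts how many elements the records-xpath would select; that is usually enough to confirm an expression does what you expect before handing it to the harvester.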

Happy New Year to all my readers! I hope this tool is of use to some of you, and I wish you a productive year of metadata harvesting in 2017!

Information Technology and Libraries: President's Message: Focus on Information Ethics

planet code4lib - Fri, 2016-12-30 22:48
President's Message: Focus on Information Ethics

Information Technology and Libraries: Editorial Board Thoughts: Metadata Training in Canadian Library Technician Programs

planet code4lib - Fri, 2016-12-30 22:48
Editorial Board Thoughts: Metadata Training in Canadian Library Technician Programs

Information Technology and Libraries: Technology Skills in the Workplace: Information Professionals’ Current Use and Future Aspirations

planet code4lib - Fri, 2016-12-30 22:48
Information technology serves as an essential tool for today's information professional, with a need for ongoing research attention to assess the technological directions of the field over time. This paper presents the results of a survey of the technologies used by library and information science (LIS) practitioners, with attention to the combinations of technologies employed and the technology skills that practitioners wish to learn. The most common technologies employed were: email, office productivity tools, web browsers, library catalog and database searching tools, and printers, with programming topping the list of most-desired technology skills to learn. Generally similar technology usage patterns were observed for early- and later-career practitioners. Findings also suggested the relative rarity of emerging technologies, such as the makerspace, in current practice.

Information Technology and Libraries: Accessibility of Vendor-Created Database Tutorials for People with Disabilities

planet code4lib - Fri, 2016-12-30 22:48
Many video, screencast, webinar, or interactive tutorials are created and provided by vendors for use by libraries to instruct users in database searching. This study investigates whether these vendor-created database tutorials are accessible for people with disabilities, to see whether librarians can use these tutorials instead of creating them in-house.  Findings on accessibility were mixed.  Positive accessibility features and common accessibility problems are described, with recommendations on how to maximize accessibility.

Information Technology and Libraries: Analyzing Digital Collections Entrances: What Gets Used and Why It Matters

planet code4lib - Fri, 2016-12-30 22:48
This paper analyzes usage data from Hunter Library's digital collections using Google Analytics for a period of twenty-seven months from October 2013 through December 2015. The authors consider this data analysis to be important for identifying collections that receive the largest number of visits. We argue this data evaluation is important in terms of better informing decisions for building digital collections that will serve user needs. The authors also study the benefits of harvesting to sites such as the DPLA and consider that this paper will contribute to the overall literature on Google Analytics and its use by libraries.

LITA: Jobs in Information Technology: December 28, 2016

planet code4lib - Wed, 2016-12-28 22:24

New vacancy listings are posted weekly on Wednesday at approximately 12 noon Central Time. They appear under New This Week and under the appropriate regional listing. Postings remain on the LITA Job Site for a minimum of four weeks.

New This Week

University at Albany, State University of New York, Director of Technical Services and Library Systems, Albany, NY

Visit the LITA Job Site for more available jobs and for information on submitting a job posting.

Meredith Farkas: My year in reading 2016

planet code4lib - Wed, 2016-12-28 21:58

2016 has been one hell of a year. It started out for me with optimistic giddiness, then crashed into the land of extreme stress and fear and stayed there rather longer than I would have liked. But what I’d thought was the end of so many good things in my life actually marked the beginning of a fantastic new chapter. While I wish I could have skipped the painful lessons and jumped right to the end, I’m grateful for all I learned this year. I’m happier and healthier for having gone through it. That doesn’t mean I’m not freaked out as hell by the incoming presidential administration or really saddened by the deaths of so many actors, writers, and musicians I loved, but I also feel tremendously lucky for what I have in my life. So much love.

Compiling the list of books I read this year takes me back to some of the sadder times, because I can remember where I was while I was reading each one. Those books listed in bold were among my Top 10 for the year. Those with an asterisk are ones I either didn’t finish or didn’t really like. This list does not include the books I read to my son this year because it would be a VERY long list otherwise. Maggie Nelson’s visceral, honest, and poetic essays in Bluets and The Argonauts were, without question, two of the three best things I read this year. If you haven’t read Maggie Nelson, what are you waiting for?!?!?

Novels:

  • Rich and Pretty: A Novel by Rumaan Alam
  • Fifteen Dogs by Andre Alexis
  • Did You Ever Have a Family by Bill Clegg
  • The Sunlit Night by Rebecca Dinerstein
  • The Green Road by Anne Enright
  • Days of Awe: A novel by Lauren Fox
  • The Nightingale by Kristin Hannah
  • The First Bad Man by Miranda July (after slogging through the whole thing, I’m still not sure what to think of this book and whether or not I should give it an asterisk)
  • Modern Lovers by Emma Straub
  • Crossing to Safety by Wallace Stegner
  • The Engagements by J. Courtney Sullivan
  • Gold Fame Citrus: A Novel by Claire Vaye Watkins
  • A Little Life: A Novel by Hanya Yanagihara

Short-Story Collections:

  • A Manual for Cleaning Women by Lucia Berlin
  • What We Talk About When We Talk About Love by Raymond Carver (a re-read, many times over)
  • American Housewife: Stories* by Helen Ellis
  • You Should Pity Us Instead by Amy Gustine
  • Fortune Smiles: Stories by Adam Johnson
  • The Dream Life of Astronauts: Stories by Patrick Ryan

Memoirs/Essays/Non-Fiction:

  • Desert Solitaire: A Season in the Wilderness by Edward Abbey
  • Not That Kind of Girl: A Young Woman Tells You What She’s “Learned”* by Lena Dunham
  • Where Nobody Knows Your Name: Life in the Minor Leagues of Baseball by John Feinstein
  • Wishful Drinking by Carrie Fisher
  • The Sisters Are Alright: Changing the Broken Narrative of Black Women in America by Tamara Winfrey Harris
  • When Breath Becomes Air by Paul Kalanithi
  • Is Everyone Hanging Out Without Me? (And Other Concerns) by Mindy Kaling
  • Bluets by Maggie Nelson
  • The Argonauts by Maggie Nelson
  • Dear Mr. You* by Mary-Louise Parker
  • The Faraway Nearby by Rebecca Solnit
  • Born to Run by Bruce Springsteen

Poetry:

  • All of Us: The Collected Poems by Raymond Carver
  • E. E. Cummings Complete Poems by e. e. cummings
  • Collected Poems by Edna St. Vincent Millay
  • Felicity by Mary Oliver
  • The Selected Poetry of Rainer Maria Rilke by Rainer Maria Rilke
  • Poems New and Collected by Wislawa Szymborska

Here are some books I hope to read in 2017. If you have any feedback on them (must-reads or don’t-reads) let me know!

  • Either Secondhand Time or Voices from Chernobyl (or both) by Svetlana Alexievich
  • Willful Disregard: A Novel About Love by Lena Andersson
  • The Elegance of the Hedgehog by Muriel Barbery
  • Evicted: Poverty and Profit in the American City by Matthew Desmond
  • Eleven Hours by Pamela Erens
  • Abandon Me by Melissa Febos
  • My Brilliant Friend by Elena Ferrante
  • Homegoing by Yaa Gyasi
  • Before the Fall by Noah Hawley
  • Lab Girl by Hope Jahren
  • Furiously Happy by Jenny Lawson
  • It Can’t Happen Here by Sinclair Lewis
  • The Association of Small Bombs by Karan Mahajan
  • Nutshell by Ian McEwan
  • Norwegian by Night by Derek Miller
  • The Bluest Eye by Toni Morrison
  • The Sympathizer by Viet Thanh Nguyen
  • Commonwealth by Ann Patchett
  • Eleanor and Park by Rainbow Rowell (the last Rowell book I haven’t read!)
  • Men Explain Things to Me by Rebecca Solnit
  • The Paying Guests by Sarah Waters
  • The Underground Railroad by Colson Whitehead

May 2017 be kinder to all of us. And if it isn’t, I hope you find some good books that transport you somewhere else (at least temporarily).

Jason Ronallo: Testing DASH and HLS Streams on Linux

planet code4lib - Wed, 2016-12-28 20:08

I’m developing a script to process some digitized and born digital video into adaptive bitrate formats. As I’ve gone along trying different approaches for creating these streams, I’ve wanted reliable ways to test them on a Linux desktop. I keep forgetting how I can effectively test DASH and HLS adaptive bitrate streams I’ve created, so I’m jotting down some notes here as a reminder. I’ll list both the local and online players that you can try.

While I’m writing about testing both DASH and HLS adaptive bitrate formats, really we need to consider three formats, as HLS can be delivered either as MPEG-2 TS segments or as fragmented MP4 (fMP4). Since mid-2016 and iOS 10+, HLS segments can be delivered as fMP4. This now allows you to use the same fragmented MP4 files for both DASH and HLS streaming. Until uptake of iOS 10 is greater you likely still need to deliver video with HLS-TS as well (or go with an HLS-TS-everywhere approach). While DASH can use any codec, I’ll only be testing fragmented MP4s (though maybe not fully conformant to the DASH-IF AVC/264 interoperability points). So I’ll break down testing by DASH, HLS-TS, and HLS-fMP4 when applicable.

The important thing to remember is that you’re not playing back a video file directly. Instead, these formats use a manifest file which lists the various adaptations (different resolutions and bitrates) that a client can choose to play based on bandwidth and other factors. So what we want to accomplish is the ability to play back video by referring to the manifest file instead of any particular video file or files. In some cases the video files will be self-contained, muxed video and audio, and byte range requests will be used to serve up segments; in other cases the video is segmented, with the audio either in a single separate file or segmented in the same way as the video. In fact, depending on how the video files are created, they may even lack the data necessary to play back independently of another file. For instance, it is possible to create a separate initialization MP4 file that includes the metadata a client needs to play back each of the segment files, which lack this information themselves. Also, all of these files are intended to be served up over HTTP. They can also include links to text tracks like captions and subtitles. Support for captions in these formats is lacking in many HTML5 players.

Also note that all this testing is being done on Ubuntu 16.04.1 LTS (the Xubuntu variant), and it is possible I’ve compiled some of these tools myself (like ffmpeg) rather than using the versions in the Ubuntu repositories.

Playing Manifests Directly

I had hoped that it would be fairly easy to test these formats directly without putting them behind a web server. Here’s what I discovered about playing the files without a web server.

GUI Players

VLC and other desktop players have limited support for these formats, so even when a stream doesn’t work in these players, that doesn’t mean it won’t play in a browser or on a mobile device. I’ve had very little luck using these directly from the file system. Assume for this post that I’m already in a directory with the video manifest files: cd /directory/with/video

So this doesn’t work for a DASH manifest (Media Presentation Description): vlc stream.mpd

Neither does this for an HLS-TS manifest: vlc master.m3u8

In the case of HLS it looks like VLC is not respecting relative paths the way it needs to. Some players appear to be trying to play HLS, but I haven’t found a Linux GUI player that can play the stream directly from the file system like this. Suggestions?

Command Line Players

DASH

Local testing of DASH can be done with the GPAC MP4Client: MP4Client stream.mpd

This works and can tell you whether the stream is basically working and a separate audio file is synced, but it only appears to show the first adaptation. There are also times when it will not play a DASH stream that plays just fine elsewhere. It will not show you whether the sidecar captions are working, and I’ve not been able to use MP4Client to figure out whether the adaptations are set up correctly. Will the video sources actually switch with restricted bandwidth? There’s a command line option for this, but I can’t see that it works.

HLS

For HLS-TS it is possible to use the ffplay media player that uses the ffmpeg libraries. It has some of the same limitations as MP4Client as far as testing adaptations and captions. The ffplay player won’t work though for HLS-fMP4 or MPEG-DASH.
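
For a quick check from the directory containing the playlist, ffplay takes the manifest as its input just like any other file (master.m3u8 being the HLS manifest name used throughout these examples): ffplay master.m3u8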

Other Command Line Players

The mpv media player is based on MPlayer and mplayer2 and can play back both HLS-TS and HLS-fMP4 streams, but not DASH. It also has some nice overlay controls for navigating through a video, including awareness of the various audio tracks. Just use it with mpv master.m3u8. The mplayer player also works, but seems to choose only one adaptation (the lowest bitrate, or the first in the list?) and does not have overlay controls. It doesn’t seem to recognize the sidecar captions included in the HLS-TS manifest.

Behind a Web Server

One simple solution to be able to use other players is to put the files behind a web server. While local players may work, these formats are really intended to be streamed over HTTP. I usually do this by installing Apache and allowing symlinks. I then symlink from the web root to the temporary directory where I’m generating various ABR files. If you don’t want to set up Apache you can also try web-server-chrome which works well in the cases I’ve tested (h/t @Bigggggg_Al).
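
As a rough sketch of one way to do that (the paths here are only examples): with Apache’s FollowSymLinks option enabled for the web root, a single symlink is enough to expose a working directory, or you can skip Apache entirely for a quick test and serve the directory with Python’s built-in HTTP server:

sudo ln -s /tmp/abr-output /var/www/html/pets

cd /tmp/abr-output
python3 -m http.server 8000

Note that the Python one-liner won’t send CORS headers, so it’s fine for the local players below but not for the online test players discussed later.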

GUI Players & HTTP

I’ve found that the GStreamer based Parole media player included with XFCE can play DASH and HLS-TS streams just fine. It does appear to adapt to higher bitrate versions as it plays along, but Parole cannot play HLS-fMP4 streams yet.

To play a DASH stream: parole http://localhost/pets/fmp4/stream.mpd

To play an HLS-TS stream: parole http://localhost/pets/hls/master.m3u8

Are there other Linux GUIs that are known to work?

Command Line Players & HTTP

ffplay and MP4Client also work with localhost URLs. ffplay can play HLS-TS streams. MP4Client can play DASH and HLS-TS streams, but for HLS-TS it seems to not play the audio.

Online Players

And once you have a stream already served up from a local web server, there are online test players that you can use. No need to open up a port on your machine, since all the requests are made by the browser to the local server, which it already has access to. This is more cumbersome, with copy/paste work, but is probably the best way to determine if the stream will play in Firefox and Chromium. The main thing you’ll need to do is set CORS headers appropriately. If you have any problems with this, check your browser console to see what errors you’re getting. Besides the standard Access-Control-Allow-Origin “*”, for some players you may need to set headers to accept preflight Access-Control-Allow-Headers like “Range” for byte range requests.
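
Before blaming a player, it can help to check what your server is actually sending. One quick way (the Origin value here is just a placeholder; the manifest URL is the same localhost one used elsewhere in this post) is to request the manifest with curl and dump the response headers:

curl -s -D - -o /dev/null -H "Origin: http://example.com" http://localhost/pets/fmp4/stream.mpd

Look for an Access-Control-Allow-Origin header in the output; if a player also sends a preflight OPTIONS request for byte ranges, check that the response to it includes “Range” in Access-Control-Allow-Headers.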

The Bitmovin MPEG-DASH & HLS Test Player requires that you select whether the source is DASH or HLS-TS (or progressive download). Even though Linux desktop browsers do not natively support playing HLS-TS this player can repackage the TS segments so that they can be played back as MP4. This player does not work with HLS-fMP4 streams, though. Captions that are included in the DASH or HLS manifests can be displayed by clicking on the gear icon, though there’s some kind of double-render issue with the DASH manifests I’ve tested.

Really when you’re delivering DASH you’re probably using dash.js underneath in most cases so testing that player is useful. The DASH-264 JavaScript Reference Client Player has a lot of nice features like allowing the user to select the adaptation to play and display of various metrics about the video and audio buffers and the bitrate that is being downloaded. Once you have some files in production this can be helpful for seeing how well your server is performing. Captions that are included in the DASH manifest can be displayed.

The hls.js player has a great demo site for each version that has a lot of options to test quality control and show other metrics. The other nice part about this demo page is that you can just add a src parameter to the URL with the localhost URL you want to test. I could not get hls.js to work with HLS-fMP4 streams, though there is an issue to add fMP4 support. Captions do not seem to be enabled.

There is also the JW Player Stream Tester. But since I don’t have a cert for my local server, I need to use the JW Player HTTP stream tester instead of the HTTPS one. I was able to successfully test DASH and HLS-TS streams with this tool. Captions only displayed for the HLS stream.

The commercial Radiant media player has a DASH and HLS tester that can be controlled with URL parameters. I’m not sure why the streaming type needs to be selected first, but otherwise it works well. It knows how to handle DASH captions but not HLS ones, and it does not work with HLS-fMP4.

The commercial THEOplayer HLS and DASH testing tool only worked for my HLS-TS stream and not the DASH or HLS-fMP4 streams I’ve tested. Maybe it was the test examples given, but even their own examples did not adapt well and had buffering issues.

Wowza has a page for video test players but it seems to require a local Wowza server be set up.

What other demo players are there online that can be used to test ABR streams?

I’ve also created a little DASH tester using Plyr and dash.js. You can either enter a URL to an MPD into the input or append a src parameter with the URL to the MPD to the test page URL. To make it even easier to use, I created a short script that allows me to launch it from a terminal just by giving it the MPD URL. This approach could be used for a couple of the other demos above as well.
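
A minimal sketch of such a launcher, assuming the tester is served at a hypothetical localhost path, could be as simple as:

#!/bin/sh
# open the local test page with the given MPD URL as its src parameter
xdg-open "http://localhost/dash-tester/?src=$1"

Save it somewhere on your PATH, make it executable, and run it with the MPD URL as the only argument; note that a URL containing special characters would need to be percent-encoded before being passed as the src parameter.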

One gap in my testing so far is the Shaka player. They have a demo site, but it doesn’t allow enabling an arbitrary stream.

Other Tools for ABR Testing

In order to test automatic bitrate switching, you need a way to vary the available bandwidth and confirm that the player actually switches adaptations. Latest Chromium and Firefox Nightly both have tools built into their developer tools to simulate different bandwidth conditions. In Chromium this is under the Network tab, and in Firefox Nightly it is only accessible when turning on the mobile/responsive view. If you set the bandwidth to 2G you ought to see network requests for a low bitrate adaptation, and if you change it to wifi it ought to adapt to a high bitrate adaptation.

Summary

There are decent tools to test HLS and MPEG-DASH while working on a Linux desktop. I prefer using command line tools like MP4Client (DASH) and mpv (HLS-TS, HLS-fMP4) for quick tests that the video and audio are packaged correctly and that the files are organized and named correctly. These two tools cover both formats and can be launched quickly from a terminal.

I plan on taking a DASH-first approach, and for desktop testing I prefer to test in video.js if caption tracks are added as track elements. With contributed plugins it is possible to test DASH and HLS-TS in browsers. I like testing with Plyr (with my modifications) if the caption file is included in the DASH manifest, since Plyr was easy to hack to make this work. For HLS-fMP4 (and even HLS-TS) there’s really no substitute for testing on an iOS device (and for HLS-fMP4, an iOS 10+ device), as the native player may be used in full screen mode.

Harvard Library Innovation Lab: Physical Pitch Decks

planet code4lib - Thu, 2016-12-22 16:00

I’ve been playing with physical pitch decks lately. Slides as printed cards.

PowerPoint, Deck.js, and the like are fantastic when sharing with large groups of people — from a classroom full of folks to a web full of folks. But what if easy and broad sharing isn’t a criterion for your pitch deck?

You might end up with physical cards like I did when I recently pitched Private Talking Spaces. The cards are surprisingly good!! Just like non-physical slides, they can provide outlines for talks and discussions, but they’re so simple (just paper and ink), they won’t get in the way when sharing ideas with small groups.

The operation of the cards is as plain as can be – just take the card off the top, flip it upside down, and put it to the side. 

n cards = n screens in the world of physical pitch decks. I wish we had multiple projectors in rooms! In the photo above, I pin my agenda card up top.

I drew the slides in Adobe Illustrator. They’re six inches square and printed on sturdy paper. If you’d like to make your own, here’s my .ai file and here’s a .pdf version.

It feels like there’s something here. Some depth. If you’ve had success with physical pitch decks, please send me pointers. Thanks!!
