planet code4lib

Planet Code4Lib - http://planet.code4lib.org

Terry Reese: MarcEdit MacOS Updates

Mon, 2017-02-06 06:15

This past weekend, I spent a good deal of time getting the MacOS version of MarcEdit synchronized with the Windows and Linux builds.  In addition to the updates, there is a significant change to the program that needs to be noted as well. 

First, let’s start with the changelog.  The following changes were made in this version:

*************************************************
** 2.2.30
*************************************************
* Bug Fix: Delimited Text Translator — when receiving Unix formatted files on Windows, the program may struggle with determining new line data.  This has been corrected.
* Bug Fix: RDA Helper — when processing copyright information, there are occasions where the output can create double brackets ($c[[) — this should be corrected.
* Behavior Change: Delimited Text Translator — I’ve changed the default value from on to off as it applies to ignoring header rows. 
* Enhancement: System Info (main window) — I’ve added information related to referenced libraries to help with debugging questions.
* Bug fix/Behavior Change: Export Tab Delimited Records: Second delimiter insertion should be standardized with all regressions removed.
* New Feature: Linked Data Tools: Service Status options have been included so users can check the status of the currently profiled linked data services.
* New Feature: Preferences/Networked Tasks: MarcEdit uses a short timeout (0.03 seconds) when determining if a network is available.  I’ve had reports of folks using MarcEdit having their network connection dropped by MarcEdit, likely because their network has more latency.  In the preferences, you can now modify this value.  I would never set it above 500 milliseconds (0.5 seconds) because it will cause MarcEdit to freeze when off network, but this gives users more control over their network interactions.
* Bug Fix: Swap Field Function: The new enhancement in the swap field function added with the last update didn’t work in all cases.  This should close that gap.
* Enhancement: Export Tab Delimited Records: Added Configurable third delimiter.
* Enhancement: MarcEditor: Improvements in the page counting to better support invalidly formatted data.
* Enhancement: Extract/Delete MARC Records: Added a file open button to make it easier to select a file for batch search.
* Bug Fix: Log file could become locked and inaccessible until closed in very specific instances.
* Enhancement: Compiling changes…For the first time, I’ve been able to compile as 64-bit, which has reduced download size.
* Bug Fix: Deduplicate Records: The program would throw an error if the dedup save file was left blank.

Application Architecture Changes

The first thing that I wanted to highlight is that the program is now being built as a 64-bit application.  This is a significant change to the program.  Since the program was ported to MacOS, it has been compiled as a 32-bit application, which was necessary due to some of the requirements found in the Mono stack.  However, over the past year, Microsoft has become very involved in this space (primarily to make it easier to develop iOS applications on Windows via an emulator), and that has led to the ability to compile MarcEdit as a 64-bit application. 

So why do this if the 32-bit version worked?  Well, what spurred this on was a conversation that I had with the Homebrew maintainers.  It appears that they are removing the universal compilation options, which will break Z39.50 support in MarcEdit.  They suggested making my own tap (which I will likely pursue), but it also got me looking at which dependencies were keeping me from compiling directly to 64-bit.  It took some doing, but I believe that I’ve gotten all code that necessitated building as 32-bit out of the application, and the build is passing and working. 

I’m pointing this out because I could have missed something.  My tools for automated testing for the MacOS build are pretty non-existent.  So, if you run into a problem, please let me know.  Also, as a consequence of compiling only to 64-bit, I’ve been able to reduce the size of the download significantly because I was able to reduce the number of dependencies that I need to link to.  This download should be roughly 38 MB smaller than previous versions.

Downloading the Update

You can download the update using the automated download prompt in MarcEdit or by going to the downloads page at: http://marcedit.reeset.net/downloads/

–tr

Terry Reese: MarcEdit Windows/Linux Updates

Mon, 2017-02-06 06:02

This weekend, I worked on a couple of updates related to MarcEdit.  The updates applicable to the Windows and Linux builds are the following:

6.2.455
* Enhancement: Export Tab Delimited Records: Added Configurable third delimiter.
* Enhancement: MarcEditor: Improvements in the page counting to better support invalidly formatted data.
* Enhancement: Extract/Delete MARC Records: Added a file open button to make it easier to select a file for batch search.
* Update: Field Count: The record count in the field count report could be off if formatting was wrong.  I’ve made this better.
* Update: Extract Selected Records: Added an option to sort checked items to the top.
* Bug Fix: Log file could become locked and inaccessible until closed in very specific instances.

The downloads can be picked up via the automatic downloader or via the downloads page at: http://marcedit.reeset.net/downloads/

–tr

Jason Ronallo: Choosing a Path Forward for IIIF Audio and Video

Sun, 2017-02-05 02:47

IIIF is working to bring AV resources into IIIF. I have been thinking about how to bring to AV resources the same benefits we have enjoyed for the IIIF Image and Presentation APIs. The initial intention of IIIF, especially with the IIIF Image API, was to meet a few different goals to fill gaps in what the web already provided for images. I want to consider how video works on the web and what gaps still need to be filled for audio and video.

This is a draft and as I consider the issues more I will make changes to better reflect my current thinking.

See updates at the end of this post.

Images

When images were specified for the web, the image formats were not chosen, created, or modified with the intention of displaying and exploring huge multi-gigabyte images. Yet we have high resolution images that users would find useful to have in all their detail. So the first goal was to improve the performance of delivering high resolution images. The optimization that would work for viewing large high resolution images was already available; it was just done in multiple different ways. Tiling large images is the workaround that has been developed to improve the performance of accessing large high resolution images. If image formats and/or the web had already provided a solution for this challenge, tiling would not have been necessary. When IIIF was being developed there were already tiling image servers available. The need remained to create standardized access to the tiles to aid in interoperability. IIIF accomplished standardizing the performance optimization of tiling image servers. The same functionality that enables tiling can also be used to get regions of an image and manipulate them for other purposes. To improve performance further, smaller derivatives can be delivered for use as thumbnails on a search results page.

The other goal for the IIIF Image API was to improve the sharing of image resources across institutions. The situation before was both too disjointed for consumers of images and too complex for those implementing image servers. IIIF smoothed the path for both. Before IIIF there was not just one way of creating and delivering tiles, and so trying to retrieve image tiles from multiple different institutions could require making requests to multiple different kinds of APIs. IIIF solves this issue by providing access to technical information about an image through an info.json document. That information can then be used in a standardized way to extract regions from an image and manipulate them. The information document delivers the technical properties necessary for a client to create the URLs needed to request the given sizes of whole images and tiles from parts of an image. Having this standard accepted by many image servers has meant that institutions can have their choice of image servers based on local needs and infrastructure while continuing to interoperate for various image viewers.
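To make that pattern concrete, here is a minimal sketch of how a client might use an info.json document to build a tile request. It assumes an Image API 2.x style service; the service URL, tile size, and helper name are illustrative, not part of any particular implementation.

interface IiifImageInfo {
  "@id": string;   // base URI of the image service (Image API 2.x uses "@id")
  width: number;   // full width of the source image in pixels
  height: number;  // full height of the source image in pixels
}

// Build a tile URL following the Image API pattern:
// {id}/{region}/{size}/{rotation}/{quality}.{format}
async function tileUrl(infoUrl: string, x: number, y: number, tileSize = 512): Promise<string> {
  const info: IiifImageInfo = await (await fetch(infoUrl)).json();
  // Clamp the requested region so it stays inside the image bounds.
  const w = Math.min(tileSize, info.width - x);
  const h = Math.min(tileSize, info.height - y);
  return `${info["@id"]}/${x},${y},${w},${h}/${w},/0/default.jpg`;
}

// e.g. tileUrl("https://example.org/iiif/book1-page1/info.json", 1024, 0)
//   -> "https://example.org/iiif/book1-page1/1024,0,512,512/512,/0/default.jpg"

The same region/size syntax also serves thumbnails: requesting the full region at a small size returns a derivative suitable for a results page.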

So it seems as if the main challenges the IIIF Image API were trying to solve were about performance and sharing. The web platform had not already provided solutions so they needed to be developed. IIIF standardized the pre-existing performance optimization pattern of image tiling. Through publishing information about available images in a standardized way it also improved the ability to share images across institutions.

What other general challenges were trying to be solved with the IIIF Image API?

Video and Audio

The challenges of performance and sharing are the ones I will take up below with regards to AV resources. How do audio and video currently work on the web? What are the gaps that still need to be filled? Are there performance problems that need to be solved? Are there challenges to sharing audio and video that could be addressed?

AV Performance

The web did not gain native support for audio and video until later in its history. For a long time the primary ways to deliver audio and video on the web used Flash. By the time video and audio did become native to the web many of the performance considerations of media formats already had standard solutions. Video formats have such advanced lossy compression that they can sometimes even be smaller than an image of the same content. (Here is an example of a screenshot as a lossless PNG being much larger than a video of the same page including additional content.) Tweaks to the frequency of full frames in the stream and the bitrate for the video and audio can further help improve performance. A lot of thought has been put into creating AV formats with an eye towards improving file size while maintaining quality. Video publishers also have multiple options for how they encode AV in order to strike the right balance for their content between compression and quality.

Progressive Download

In addition video and audio formats are designed to allow for progressive download. The whole media file does not need to be downloaded before part of the media can begin playing. Only the beginning of the media file needs to be downloaded before a client can get the necessary metadata to begin playing the video in small chunks. The client can also quickly seek into the media to play from any arbitrary point in time without downloading the portions of the video that have come before or after. Segments of the media can be buffered to allow for smooth playback. Requests for these chunks of media can be done with a regular HTTP web server like Apache or Nginx using byte range requests. The web server just needs minimal configuration to allow for byte range requests that can deliver just the partial chunk of bytes within the requested range. Progressive download means that a media file does not have to be pre-segmented–it can remain a single whole file–and yet it can behave as if it has been segmented in advance. Progressive download effectively solves many of the issues with the performance of the delivery of very long media files that might be quite large in size. Media files are already structured in such a way that this functionality of progressive download is available for the web. Progressive download is a performance optimization similar to image tiling. Since these media formats and HTTP already effectively solve the issue of quick playback of media without downloading the whole media file, there is no need for IIIF to look for further optimizations for these media types. Additionally there is no need for special media servers to get the benefits of the improved performance.
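As a rough illustration of the byte-range mechanism described above (not any particular server’s API; the URL is a placeholder), a client can request an arbitrary slice of a media file from any range-capable web server:

// Request a single byte range from a plain HTTP server (Apache, Nginx, etc.).
// Browsers issue equivalent requests themselves when seeking within a <video>.
async function fetchByteRange(url: string, start: number, end: number): Promise<ArrayBuffer> {
  const response = await fetch(url, {
    headers: { Range: `bytes=${start}-${end}` },
  });
  // A range-capable server answers 206 Partial Content with only the requested bytes.
  if (response.status !== 206) {
    throw new Error(`Range request not honored (status ${response.status})`);
  }
  return response.arrayBuffer();
}

// e.g. fetchByteRange("https://example.org/media/interview.mp4", 0, 65535)
// fetches only the first 64 KiB, often enough for a player to read the file's
// metadata when the file has been prepared for fast start.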

Quality of Service

While progressive download solves many of the issues with delivery of AV on the web based on how the media files are constructed, it is a partial solution. The internet does not provide assurances on quality of service. A mobile device at the edge of the range of a tower will have more latency in requesting each chunk of content than a wired connection at a large research university. Even over the same stable network the time it takes for a segment of media to be returned can fluctuate based on network conditions. This variability can lead to media playback stuttering or stalling while retrieving the next segment or taking too much time to buffer enough content to achieve smooth playback. There are a couple different solutions to this that have been developed.

With only progressive download at your disposal one solution is to allow the user to manually select a rendition to play back. The same media content is delivered as several separate files at different resolutions and/or bitrates. Lower resolutions and bitrates mean that the segments will be smaller in size and faster to deliver. The media player is given a list of these different renditions with labels and then provides a control for the user to choose the version they prefer. The user can then select whether they want to watch a repeatedly stalling, but high quality, video or would rather watch a lower resolution video playing back smoothly. Many sites implement this pattern as a relatively simple way to take into account that different users will have different network qualities. The problem I have found with this solution for progressive download video is that I am often not the best judge of network conditions. I have to fiddle with the setting until I get it right if I ever do. I can set it higher than it can play back smoothly or select a much lower quality than what my current network could actually handle. I have also found sites that set my initial quality level much lower than my network connection can handle which results in a lesser experience until I make the change to a higher resolution version. That it takes me doing the switching is annoying and distracting from the content.

Adaptive Bitrate Formats

To improve the quality of the experience while providing the highest quality rendition of the media content that the network can handle, other delivery mechanisms were developed. I will cover in general terms a couple I am familiar with, that have the largest market share, and that were designed for delivery over HTTP. For these formats the client measures network conditions and delivers the highest quality version that will lead to smooth playback. The client monitors how long it takes to download each segment as well as the duration of the current buffer. (Sometimes the client also measures the size of the video player in order to select an appropriate resolution rendition.) The client can then adapt on the fly to network conditions to play the video back smoothly without user intervention. This is why it is called “smooth streaming” in some products.

For adaptive bitrate formats like HLS and MPEG-DASH what gets initially delivered is a manifest of the available renditions/adaptations of the media. These manifests contain pointers for where (which URL) to find the media. These could be whole media files for byte range requests, media file segments as separate files, or even in the case of HLS a further manifest/playlist file for each rendition/stream. While the media is often referred to in a manifest with relative URLs, it is possible to serve the manifest from one server and the media files (or further manifests) from a different server like a CDN.
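In practice a browser client usually consumes these manifests through a library built on Media Source Extensions. A minimal sketch, assuming the hls.js library and a placeholder manifest URL, looks something like this:

import Hls from "hls.js";

function attachAdaptiveSource(video: HTMLVideoElement, manifestUrl: string): void {
  if (video.canPlayType("application/vnd.apple.mpegurl")) {
    // Safari and iOS play HLS natively straight from the manifest URL.
    video.src = manifestUrl;
  } else if (Hls.isSupported()) {
    // Elsewhere, hls.js fetches the manifest, measures segment download times,
    // and switches renditions on the fly through Media Source Extensions.
    const hls = new Hls();
    hls.loadSource(manifestUrl);
    hls.attachMedia(video);
  }
}

// attachAdaptiveSource(document.querySelector("video")!, "https://example.org/media/master.m3u8");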

How the media files are encoded is important for the success of this approach. For these formats the different representations can be pre-segmented into the same duration lengths for each segment across all representations. In a similar way they can also be carefully generated single files that have full frames relatively close together within a file and have these full frames synchronized between all the renditions of the media. For instance, all segments could be six seconds long with an I-frame every two seconds. This careful alignment of segments allows for switching between representations without glitchy moments where the video stalls, without the video replaying or skipping ahead a moment, and with the audio staying synchronized with the video.

It is also possible in the case of video to have one or more audio streams separate from the video streams. Separate audio streams aligned with the video representations will have small download sizes for each segment which can allow a client to decide to continue to play the audio smoothly even if the video is temporarily stalled or reduced in quality. One use case for this audio stream performance optimization is the delivery of alternative language tracks as separate audio streams. The video and audio bitrates can be controlled by the client independently.

In order for adaptive formats like this to work, all of the representations need to have the next required segment ready on the server in case the client decides to switch up or down bitrates. While the cultural heritage use cases that IIIF considers do not include live streaming broadcasts, the number of representations that all need to be encoded and available at the same time affects the “live edge”–how close to real-time the stream can get. If segments are available in only one high bitrate rendition then the client may not be able to keep up with a live broadcast. If all the segments are not available for immediate delivery then it can lead to playback issues.

The manifests for adaptive bitrate formats also include other helpful technical information about the media. (For HLS the manifest is called a master playlist and for MPEG-DASH a Media Presentation Description.) Included in these manifests can be the duration of the media, the maximum/minimum height and width of the representations, the mimetype and codecs (including MP4 level) of the video and audio, the framerate or sampling rate, and lots more. Most importantly for quality of experience switching, each representation includes a number for its bandwidth. There are cases where content providers will deliver two video representations with the same height and width and different bitrates to switch between. In these cases it is a better experience for the user to maintain the resolution and switch down a bandwidth than to switch both resolution and bandwidth. The number of representations–the ladder of different bandwidth encodes–can be quite extensive for advanced cases like Netflix over-the-top (OTT aka internet) content delivery. These adaptive bitrate solutions are meant to scale for high demand use cases. The manifests can even include information about sidecar or segmented subtitles and closed captions. (One issue with adaptive formats is that they may not play back across all devices, so many implementations will still provide progressive download versions as a fallback.) Manifests for adaptive formats include the kind of technical information that is useful for clients.

Because there are existing standards for the adaptive bitrate pattern that have broad industry and client support, there is no need to attempt to recreate these formats.

AV Performance Solved

All except the most advanced video on demand challenges have current solutions through ubiquitous video formats and adaptive bitrate streaming. As new formats like VP9 increase in adoption the situation for performance will improve even further. These formats have bitrate savings through more advanced encoding that greatly reduces file sizes while maintaining quality. This will mean that adaptive bitrate formats are likely to require fewer renditions than are typically published currently. Note though that in some cases smaller file sizes and faster decoding comes at the expense of much slower encoding when trying to keep a good quality level.

There is no need for the cultural heritage community to try to solve performance challenges when the expert AV community and industry has developed advanced solutions.

Parameterized URLs and Performance

One of the proposals for providing a IIIF AV API alongside the Image API involves mirroring the existing Image API by providing parameters for segmenting and transforming of media. I will call this the “parameterized approach.” One way of representing this approach is this URL:

http://server/prefix/identifier/timeRegion/spaceRegion/timeSize/spaceSize/rotation/quality.format

You can see more about this type of proposal here and here. The parameters after the identifier and before the quality would all be used to transform the media.

For the Image API the parameterized approach for retrieving tiles and other derivatives of an image works as an effective performance optimization for delivery. In the case of AV having these parameters does not improve performance. It is already possible to seek into progressive download and adaptive bitrate formats. There is not the same need to tile or zoom into a video as there is for a high definition image. A good consumer monitor will show you as full a resolution as you can get out of most video.

And these parameters do not actually solve the most pressing media delivery performance problems. The parameterized approach is probably not optimizing for bitrate, which is one of the most important settings for improving performance. Having a bitrate parameter within a URL would be difficult to implement well. A poorly chosen bitrate could significantly increase the size of the media or introduce visible artifacts in the video or audio beyond usability. Would the audio and video bitrates be controlled separately in the parameterized approach? Bitrate is a crucially important parameter for performance and not one I think you would put into the hands of consumers. It would be especially difficult as bitrate optimization for video on demand is slow and getting more complicated. In order to optimize variable bitrate encoding, 2-pass encoding is used, and slower encoding settings can further improve quality. With new formats that deliver better performance, bitrate is reduced for the same quality while encoding is much slower. Advanced encoding pipelines have been developed that perform metrics on perceptual difference so that each video, or even each section of a video, can be encoded at the lowest bitrate that still maintains the desired quality level. Bitrate is where performance gains can be made.

The only functionality proposed for IIIF AV that I have seen that might be helped by the parameterized approach is download of a time segment of the video. This is specific to download of just that time segment. Is this use case big enough to be seriously considered given the amount of complexity it adds? Why is download of a time segment crucial? Why would most cases not be met by just skipping to that section to play? Or could the need be met by downloading the whole video in those cases where download is really necessary? If needed, any kind of time segment download could live as a separate, non-IIIF service. Then it would not carry any expectation of being real-time. I doubt most would really see the need to implement a download service like this if the need can be met some other way. In those cases where real-time performance for a user does not matter, those video manipulations could be done outside of IIIF. For any workflow that needs only a portion of a video, the manipulation could be a pre-processing step. In any case, if there really is a desire for a video transformation service, it does not have to be the IIIF AV API but could be a separate service for those who need it.

Most of the performance challenges with AV have already been solved via progressive download formats and adaptive bitrate streaming. Remaining challenges not fully solved with progressive download and adaptive bitrate formats include live video, server-side control of quality of service adaptations, and greater compression in new codecs. None of these are the types of performance issues the cultural heritage sector ought to try to take on, and the parameterized approach does not contribute solutions to these remaining issues. Beyond these rather advanced issues, performance is a solved problem that has had a lot of eyes on it.

If the parameterized approach is not meant to help with optimizing performance, what problem is it trying to solve? The community would be better off steering clear of this trap of trying to optimize for performance and instead focus on problems that still need to be solved. The parameterized approach sticks with a performance optimization pattern that does not add anything for AV. It has a detrimental fixation on the bitstream that does not work for AV, especially where adaptive bitrate segmented formats are concerned. It appears motivated by some kind of purity of approach rather than by taking into account the unique attributes of AV and solving those particular challenges well.

AV Sharing

The other challenge a standard can help with is sharing of AV across institutions. If the parameterized approach does not solve a performance problem, then what about sharing? If we want to optimize for sharing and have the greatest number of institutions sharing their AV resources, then there is still no clear benefit for the parameterized approach. What about this parameterized approach aids in sharing? It seems to optimize for performance, which as we have seen above is not needed, at the expense of the real need to improve and simplify sharing. There are many unique challenges for sharing video across institutions on the web that ought to be considered before settling on a solution.

One of the big barriers to sharing is the complexity of AV. Compared to delivery of still images video is much more complicated. I have talked to a few institutions that have digitized video and have none of it online yet because of the hurdles. Some of the complication is technical, and because of this institutions are quicker to use easily available systems just to get something done. As a result many fewer institutions will have as much control over AV as they have over images. It will be much more difficult to gain that kind of control. For instance with some media servers they may not have a lot of control over how the video is served or the URL for a media file.

Video is expensive. Even large libraries often make choices about technology and hosting for video based on campus providing the storage for it. Organizations should be able to make the choices that work for their budget while still being able to share in as much as they desire and is possible.

One argument made is that many institutions had images they were delivering in a variety of formats before the IIIF Image API, so asking for similar changes to how AV is delivered should not be a barrier to pursuing a particular technical direction. The difficulty institutions have in dealing with AV cannot be minimized in this way, as any kind of change will be much greater and will ask much more of them. The complexity and costs of AV, and the choices those force, should be taken into consideration.

An important question to ask is who you want to help by standardizing an API for sharing? Is it only for the well-resourced institutions who self-host video and have the technical expertise? If it is required that resources live in a particular location and only certain formats be used it will lead to fewer institutions gaining the sharing benefits of the API because of the significant barriers to entry. If the desire is to enable wide sharing of AV resources across as many institutions as possible, then that ought to lead to a different consideration of the issues of complexity and cost.

One issue that has plagued HTML5 video from the beginning is the inability of the browser vendors to agree on formats and codecs. Early on, open formats like WebM with VP8 were not adopted by some browsers in favor of MP4 with H.264. It became common practice out of necessity to encode each video in a variety of formats in order to reach a broad audience. Each source would be listed on the page (as a source element within a video element) and the browser picks which it can play. HTML5 media was standardized to use a pattern that accommodates the situation where it was not possible to deliver a single format that could be played across all browsers. It is only recently that MP4 with H.264 has become playable across all current browsers, and only after Cisco open sourced its licensed version of H.264 was this possible. Note that while the licensing situation for playback has improved, there are still patent/licensing issues, which means that some institutions still will not create or deliver any MP4 with H.264.

But now even as H.264 can be played across all current browsers, there are still changes coming that mean a variety of formats will be present in the wild. New codecs like VP9 that provide much better compression are taking off and have been adopted by most, but not all, modern browsers. The advantages of VP9 are that it reduces file size such that storage and bandwidth costs can be reduced significantly. Encoding time is increased while performance is improved. And still other new, open formats like AV1 using the latest technologies are being developed. Even audio is seeing some change as Firefox and Chrome are implementing FLAC which will make it an option to use a lossless codec for audio delivery.

As the landscape for codecs continues to change, the decision on which formats to provide should be left to each institution. Some will want to continue to use a familiar H.264 encoding pipeline. Others will want to take advantage of the cost savings of new formats and migrate. There ought to be allowance for each institution to pick which formats best meet their needs. Since sources in HTML5 media can be listed in order of preference, a standard ought, as much as possible, to support the ability of a client to respect the preferences of the institution for these reasons. So if WebM VP9 is the first source and the browser can play that format, it should play it, even if an MP4 H.264 that the browser can also play is available. The institution may make decisions around the quality to provide for each format to optimize for their particular content and intended uses.
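As a sketch of that preference-order behavior (file names and codec strings are illustrative), an institution that prefers WebM VP9 but keeps an H.264 fallback would list its sources like this, and the browser plays the first type it can:

function buildVideo(): HTMLVideoElement {
  const video = document.createElement("video");
  video.controls = true;

  // The institution's preferred encoding comes first...
  const webm = document.createElement("source");
  webm.src = "https://example.org/media/pets-720x480.webm";
  webm.type = 'video/webm; codecs="vp9,opus"';
  video.appendChild(webm);

  // ...and the H.264 MP4 is listed second as the fallback.
  const mp4 = document.createElement("source");
  mp4.src = "https://example.org/media/pets-720x480.mp4";
  mp4.type = 'video/mp4; codecs="avc1.42E01E,mp4a.40.2"';
  video.appendChild(mp4);

  // The browser walks the <source> children in document order and selects the
  // first one whose type it can play, so institutional preference is respected.
  return video;
}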

Then there is the choice to implement adaptive bitrate streaming. Again, institutions could decide to implement these formats for a variety of reasons. Delivering the appropriate adaptation for the situation has benefits beyond just enabling smooth playback. By delivering only the segment size a client can use based on network conditions and sometimes player size, the segments can be much smaller, lowering bandwidth costs. The institution can decide, depending on their implementation and use patterns, whether their costs lie more with storage or bandwidth and use the formats that work best for them. It can also be a courtesy to mobile users to deliver smaller segment sizes. Then there are delivery platforms where an adaptive bitrate format is required: Apple requires iOS applications to deliver HLS for any video over ten minutes long. Any of these considerations might nudge an AV provider to use ABR formats. They add complexity but also come with attractive performance benefits.

Any solution for an API for AV media should not try to pick winners among codecs or formats. The choice should be left to the institution while still allowing them to share the media in these formats with other institutions. It should allow for sharing AV in whatever formats an institution chooses. An approach that restricts which codecs and formats can be shared does harm and closes off important considerations for publishers. Asking institutions to deliver too many duplicate versions would also force certain costs on them. Will this variety of codecs allow for complete interoperability from every institution to every other institution and user? Probably not, but the tendency will be for institutions to do what is needed to support a broad range of browsers while optimizing for their particular needs. Guidelines and evolving best practices can also be part of any community built around the API. A standard for AV sharing should leave options open while allowing a community of practice to develop.

Simple API

If an institution is able to deliver any of their video on the web, then that is an accomplishment. What could be provided to allow them to most easily share their video with other institutions? One simple approach would be for them to create a URL where they can publish information about the video. Some JSON with just enough technical information could map to the properties an HTML5 video player uses. Since it is still the case that many institutions are publishing multiple versions of each video in order to cover the variety of new and old browsers and mobile devices, it could include a list of these different video sources in a preferred order. Preference could be given to an adaptive bitrate format or newer, more efficient codec like VP9 with an MP4 fallback further down the list. Since each video source listed includes a URL to the media, the media file(s) could live anywhere. Hybrid delivery mechanisms are even possible where different servers are used for different formats or the media are hosted on different domains or use CDNs.

This ability to just list a URL to the media would mean that as institutions move to cloud hosting or migrate to a new video server, they only need to change a little bit of information in a JSON file. This greatly simplifies the kind of technical infrastructure that is needed to support the basics of video sharing. The JSON information file could be a static file. No need even for redirects for the video files since they can live wherever and change location over time.

Here is an example of what part of a typical response might look like where a WebM and an MP4 are published:

{ "sources": [ { "id": "https://iiif-staging02.lib.ncsu.edu/iiifv/pets/pets-720x480.webm" "format": "webm", "height": 480, "width": 720, "size": "3360808", "duration": "35.627000", "type": "video/webm; codecs=\"vp8,vorbis\"", }, { "id": "https://iiif-staging02.lib.ncsu.edu/iiifv/pets/pets-720x480.mp4" "format": "mp4", "frames": "1067", "height": 480, "width": 720, "size": "2924836", "duration": "35.627000", "type": "video/mp4; codecs=\"avc1.42E01E,mp4a.40.2\"", } ] }

You can see an example of this “sources” approach here.

An approach that simply lists the sources an institution makes available for delivery ought to be easier for more institutions than the other options for sharing AV. It would allow them to effectively share the whole range of audio and video they already have, no matter what technologies they are currently using. In the simplest cases there would be no need even for redirects. If you are optimizing for the widest possible sharing from the most institutions, then an approach along these lines ought to be considered.
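A hedged sketch of the consuming side, assuming the field names from the example response above and a hypothetical info URL: a client fetches the JSON and adds each source to a video element in the publisher’s preferred order.

interface AvSource {
  id: string;    // URL of the media file, which can live on any server or CDN
  type: string;  // MIME type plus codecs, ready for the <source> element
}

async function playerFromSources(infoUrl: string): Promise<HTMLVideoElement> {
  const doc: { sources: AvSource[] } = await (await fetch(infoUrl)).json();
  const video = document.createElement("video");
  video.controls = true;
  for (const source of doc.sources) {
    const el = document.createElement("source");
    el.src = source.id;
    el.type = source.type;
    video.appendChild(el);  // order preserved, so the institution's preference holds
  }
  return video;
}

If the institution later migrates its media to a CDN or new server, only the URLs in the JSON change; the consuming code stays the same.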

Straight to AV in the Presentation API?

One interesting option has been proposed for IIIF to move forward with supporting AV resources. This approach is presented in What are Audio and Video Content APIs?. The mechanism is to list out media sources similar to the above approach but on a canvas within a Presentation API manifest. The pattern appears clear for how to provide a list of resources in a manifest in this way. It would not require a specific AV API that tries to optimize for the wrong concerns. The approach still has some issues that may impede sharing.

Requiring an institution to go straight to implementing the Presentation API means that nothing is provided to share AV resources outside of a manifest or a canvas that can be referenced separately from a Presentation manifest. Not every case of sharing and reuse requires the complexity of a Presentation manifest just to play back a video. There are many use cases that do not need a sequence with a canvas with media with an annotation with a body with a list of items (a whole highly nested structure) just to get to the AV sources needed to play back some media. This breaks the pattern from the Image API, where it is easy and common to view an image without implementing Presentation at all. Only providing access to AV through a Presentation manifest lacks the simplicity that would allow an institution to level up over time. What is the path for an institution to level up over time and incrementally adopt IIIF standards? Even if a canvas could be used as the AV API as a simplification over a manifest, requiring a dereferenceable canvas would further complicate what it takes to implement IIIF. Even some institutions that have implemented IIIF and see the value of a dereferenceable canvas have not gotten that far yet in their implementations.

One of the benefits I have found with the Image API is the ability to view images without needing to have the resource described and published to the public. This allows me to check on the health of images, do cache warming to optimize delivery, and use the resources in other pre-publication workflows. I have only implemented manifests and canvases within my public interface once a resource has been published, so I would effectively be forced to publish the resource prematurely or otherwise change the workflow. I am guessing that others have also implemented manifests in a way that is tied to their public interfaces.

Coupling media access with a manifest has some other smaller implications. Requiring a manifest or canvas leads to unnecessary boilerplate when an institution does not yet have the descriptive information but still needs access to the media to prepare the resource for publication. For instance, a manifest and a canvas MUST have a label. Should they use “Unlabeled” in cases where this information is not available yet?

In my own case sharing with the world is often the happy result rather than the initial intention of implementing something. For instance there is value in an API that supports different kinds of internal sharing. Easy internal sharing enables us to do new things with our resources more easily regardless of whether the API is shared publicly. That internal sharing ought to be recognized as an important motivator for adopting IIIF and other standards. IIIF thus far has enabled us to more quickly develop new applications and functionality that reuse special collections image resources. Not every internal use will need or want the features found in a manifest, but just need to get the audio or video sources to play them.

If there is no IIIF AV API that optimizes for the sharing of a range of different AV formats and instead relies on manifests or canvases, then there is still a gap that could be filled. For at least local use I would want some kind of AV API in order to get the technical information I would need to embed in a manifest or canvas. This seems like it could be a common desire to decouple technical information about video resources from the fuller information needed for a manifest including attributes like labels needed for presentation with context to the public. Coupling AV access too tightly to Presentation does not help to solve the desire to decouple these technical aspects. It is a reasonable choice to consider this technical information a separate concern. And if I am already going through the work to create such an internal AV API, I would like to be able to make this API available to share my AV resources outside of a manifest or canvas.

Then there is also the issue of AV players. In the case of images, many pan-zoom image viewers were modified to work with the Image API. One of the attractions of delivering images via IIIF or adopting a IIIF image server is that there is choice in viewers. Is the expectation that any AV player would need to read in a Presentation manifest or canvas in order to support IIIF and play media? The complexity of the manifest and canvas documents may hinder adoption of IIIF in media players. These are rather complicated documents that take some time to understand. A simpler API than Presentation may have a better chance of being widely adopted by players and of being easier to maintain. We only have the choice of a couple of featureful client-side applications for presenting manifests (UniversalViewer and Mirador), but we already have many basic viewers for the Image API. Even though not all of those basic viewers are used within the likes of UniversalViewer and Mirador, the simpler viewers have still been of value for other use cases. For instance, a simple image viewer can be used in a metadata management interface where UniversalViewer features like the metadata panel and download buttons are unnecessary or distracting. Would the burden of maintaining plugins and shims for various AV players to understand a manifest or canvas rest with the relatively small IIIF community rather than with the larger group of maintainers of AV players? Certainly having choice is part of the benefit of having the Image API supported in many different image viewers. Would IIIF still have the goal of being supported by a wide range of video players? This ability to have broad support within some of the foundational pieces like media players allows for better experimentation on top of them.

My own implementation of the Image API has shown how having a choice of viewers can be of great benefit. When I was implementing the IIIF APIs I wanted to improve the viewing experience for users by using a more powerful viewer. I chose UniversalViewer even though it did not have a very good mobile experience at the time. We did not want to give up the decent mobile experience we had previously developed. Moving to only using UV would have meant giving up on mobile use. So that we could still have a good mobile interface while UV was in the middle of improving its mobile view, we also implemented a Leaflet-based viewer alongside UV. We toggled each viewer on/off with CSS media queries. This level of interoperability at this lower level in the viewer allowed us to take advantage of multiple viewers while providing a better experience for our users. You can read more about this in Simple Interoperability Wins with IIIF. As AV players are uneven in their support of different features this kind of ability to swap out one player for another, say based on video source type, browser version, or other features, may be particularly useful. We have also seen new tools for tasks like cropping grow up around the Image API and it would be good to have a similar situation for AV players.

So while listing out sources within a manifest or canvas would allow for institutions with heterogeneous formats to share their distributed AV content, the lack of an API that covers these formats results in some complication, open questions, and less utility.

Conclusion

IIIF ought to focus on solving the right challenges for audio and video. There is no sense in trying to solve the performance challenges of AV delivery. That work has been well done already by the larger AV community and industry. The parameterized approach to an AV API does not bring significant delivery performance gains though that is the only conceivable benefit to the approach. The parameterized approach does not sufficiently help make it easier for smaller institutions to share their video. It does not provide any help at all to institutions that are trying to use current best practices like adaptive bitrate formats.

Instead IIIF should focus on achieving ubiquitous sharing of media across many types of institutions. Focusing on the challenges of sharing media and on the complexity and costs of delivering AV resources leads to meeting institutions where they are. A simple approach to an AV API that lists out the sources would more readily solve the challenges institutions will face with sharing.

Optimizing for sharing leads to different conclusions than optimizing for performance.

Updates

Since writing this post I’ve reconsidered some questions and modified my conclusions.

Update 2017-02-04: Canvas Revisited

Since I wrote this post I got some feedback on it, and I was convinced to try the canvas approach. I experimented with creating a canvas, and it looks more complex and nested than I would like, but it isn’t terrible to understand and create. I have a few questions I’m not sure how I’d resolve, and there are some places where there could be less ambiguity.

You can see one example in this gist.

I’d eventually like to have an image service that can return frames from the video, but for now I’ve just included a single static poster image as a thumbnail. I’m not sure how I’d provide a service like that yet, though I had prototyped something in my image server. One way to start would be to create an image service that just provides full images at the various sizes that are provided with the various adaptations. Or could a list of poster image choices with width and height just be provided somehow? I’m not sure what an info.json would look like for non-tiled images. Are there any Image API examples out in the wild that only provide a few static images?

I’ve included a width and height for the adaptive bitrate formats, but what I really mean is the maximum height and width that are provided for those formats. It might be useful to have those values available.

I haven’t included duration for each format, though there would be slight variations. I don’t know how the duration of the canvas would be reconciled with the duration of each individual item. Might just be close enough to not matter.

How would I also include an audio file alongside a video? Are all the items expected to be a video and the same content? Would it be alright to also add an audio file or two to the items? My use case is that I have a lot of video oral histories. Since they’re mostly talking heads some may prefer to just listen to the audio than to play the video. How would I say that this is the audio content for the video?

I’m uncertain how with the seeAlso WebVTT captions I could say that they are captions rather than subtitles, descriptions, or chapters. Would it be possible to add a “kind” field that maps directly to an HTML5 track element attribute? Otherwise it could be ambiguous what the proper use for any particular WebVTT (or other captions format) file is.
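For reference, the HTML5 distinction the question is after lives in the track element’s kind attribute; a hypothetical caption track (placeholder URL) would be wired up like this:

function addCaptionTrack(video: HTMLVideoElement): void {
  const track = document.createElement("track");
  track.src = "https://example.org/media/oralhistory.captions.vtt";
  track.kind = "captions";   // other valid values: "subtitles", "descriptions",
                             // "chapters", "metadata"
  track.srclang = "en";
  track.label = "English captions";
  track.default = true;
  video.appendChild(track);
}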

Several video players allow for preview thumbnails over the time rail via a metadata WebVTT file that references thumbnail sprites with media fragments. Is there any way to expose this kind of metadata file on a canvas to where it is clear what the intended use of the metadata file is? Is this a service?

LibUX: The library interface cracks a little. Elsevier acquihires Plum Analytics

Sat, 2017-02-04 05:22

This (!), my friends, is the — hic — sound of confirmation bias. I’m drunk with it. Elsevier announced yesterday they acquihired Plum Analytics. Let me pull together a rundown, then I’ll weigh in.

Nearly three years to the day after Plum Analytics was acquired by EBSCO, Elsevier announced  this morning that it has acquired the altmetric data aggregator’s brand, tools, data, and the team working on the project. …

While serving the library market broadly has been EBSCO’s goal, serving the academic market deeply has been Elsevier’s. Within this framework, one can see that Plum Analytics is simply a better fit strategically for Elsevier than for EBSCO. …

With this acquisition, we might now be able to infer that altmetrics are even further detached from libraries and more suitably tied to other elements of the academic community, such as the research office or individually with the faculty.

Todd A. Carpenter
Plum Goes Orange: Elsevier Acquires Plum Analytics

Today’s news about Plum helps to clarify the directions that both EBSCO and Elsevier are heading. EBSCO and ProQuest alike are building up an impressive array of content support businesses, which are intended to be sold principally to libraries but are outside of the materials budget. Elsevier and Springer (through its related Digital Science businesses) are building portfolios in research management and analytics businesses, which are intended to be sold to universities (beyond the library budget altogether) and to scholars themselves. Now that Plum is in the hands of Elsevier, the delineation is even clearer than it was before. …

Looking across the three companies, EBSCO, Elsevier, and ProQuest, we can see common patterns. These firms and their predecessors held very different positions 15 years ago, but today each has pivoted substantially to reduce its overall exposure to content licensing to academic libraries. …

But increasingly, these companies are offering an array of products and services that will cut across traditional organizational boundaries, both within the library and more widely within the university.

These new product configurations also position the library no longer as the only organization on campus to serve as counterparty to these vendors. While not in every case, in some cases, vendors appear to be using this new position to take a “divide and conquer” approach, pitting the library at least implicitly against other parts of the university. How will libraries position themselves to respond?

Roger C. Schonfeld
The Strategic Investments of Content Providers

A year or so ago I suggested that we strategically embrace the library’s role as the interface between users and vendors, wherein the most successful experience requires that the interface be designed as thinly as possible. There at the interface is where you have the greatest impact – both on the end-user experience and at the negotiating table. The fastest growing companies in the world all occupy this space.

Elsevier knows it. They’re wriggling free.

Mashcat: Mashcat Twitter chats in 2017

Sat, 2017-02-04 02:07

Including the inaugural Mashcat Twitter chat in April of 2015, we have had 19 chats since the second round of Mashcat began. Now, it’s time to start planning for even more.

What is a #mashcat Twitter chat? It is a way for us to share information (and sometimes war stories) about a topic of interest to programmers, catalogers, metadata folk, and techies active in libraries, archives, museums, and related cultural heritage institutions. By meeting every month or so, we hope to tear down the walls that sometimes divide those who wish to put metadata in service to the benefit of all.

As the name implies, the chat takes place on Twitter at a time announced in advance. Somebody acts as moderator, asking a set of questions, spread out over the hour; participants can answer them—then let the conversation ramify. The #mashcat hashtag is used to keep everything together. After each chat, a summary is published allowing those who couldn’t attend the chat to read along.

Past topics have included:

  • How catalogers and library technologists can build better relationships
  • Use Cases: What exactly are the problems catalogers and metadata librarians want to solve with code?
  • Critical approaches to library data and systems (AKA #critlib meets #mashcat)
  • Systems migrations
  • Linked open data

Is there a library metadata itch you would like scratched? Do you want to learn how folks “on the other side” do something? Would you like to moderate a chat? Then visit this Google document and add your suggestions!

District Dispatch: The Email Privacy Act’s time is now!

Fri, 2017-02-03 19:56

As reported in District Dispatch less than a month ago, ALA President Julie Todaro called on both Chambers of Congress to immediately pass H.R. 387, the Email Privacy Act. This critical and long overdue legislation had just been reintroduced after unanimously passing the House last year before stalling in the Senate. If approved in the current Congress, the bill finally will extend full 4th Amendment privacy protection to all Americans’ emails, texts, tweets, cloud-stored photos and files, and other electronic communications. Now is the time to start making that a reality.

On Monday, February 6, the entire House of Representatives will vote on H.R. 387 using a special procedure that will protect it from amendments and expedite the process. That procedure (known as a suspension of the rules) also requires that the bill receive support from two-thirds of the Representatives voting, not just a simple majority. The bill should have no trouble clearing that hurdle. Given how many Members of Congress and their staffs are brand new—and how important this vote is—we cannot afford to sit back and watch.

No matter where you live, now is the time to bring the 4th Amendment fully into the 21st century by calling, emailing, and/or texting your Member of Congress through the ALA Action Center.

Help us send the Email Privacy Act to the Senate with another unanimous vote in the House. With the crucial vote on H.R. 387 set for Monday, we have no time to lose. Contact your Representative now.

Coalition letter to the Chairman and Ranking Member of the House Judiciary Committee, dated January 30, 2017, urging support of H.R. 387, the Email Privacy Act, to reform the Electronic Communications Privacy Act

The post The Email Privacy Act’s time is now! appeared first on District Dispatch.

LITA: Top Five Pop Up Tech Toys for Teens

Fri, 2017-02-03 15:10

After-school can be a challenging time for a teen librarian. The teens stream in, bubbling with energy after a long day of sitting at a desk. They’re enthusiastic to be around their peers in a new setting. If left to fester, this energy can yield behavioral issues—especially in the winter months, when cabin fever combined with an inability to blow off some steam outside leaves teens feeling restless and bored.

One of my favorite methods to direct teens’ energy toward productive, library-appropriate behaviors is to come prepared with an activity. I find it ideal to bring something into the space, rather than utilize something that’s already there, because the novelty of the activity generates more interest. While board games, coloring, and small crafts remain go-tos, it’s especially fun to bring in some tech toys.

Here are some of my favorites, ordered roughly in old tech to new tech.

1. Record Player. 

(https://www.pexels.com/photo/vintage-music-sound-retro-96857/)

What’s cooler than going old school? …Okay, don’t actually answer that. But despite growing up with thousands of songs in a pocket-sized gadget, teens are consistently eager to backtrack to ye olde method of selecting a single record, setting it up, and enjoying the improved acoustics. Plus, it creates a cozy cafe feeling in the space.

If your library has a record collection or archive, this is a great way to promote that resource. Don’t have the top 40 music the teens typically listen to? No problem. The record player naturally invites a more diverse music selection, be it oldies or indie or beyond.

2. Cameras and accessories.

(https://www.pexels.com/photo/black-and-gray-polaroid-supercolor-printer-121797/)

Whether it’s selfies or artsy shots—or artsy selfies!—picture taking is a very common interest among teens. Most teens like to do this on their own phones, but the library could tap into that interest by getting some fun cameras and add-ons.

Think Polaroids or disposable cameras, which create photos you can use to decorate your teen space or give the teens to take home. Think phone camera accessories, like fish eye or macro lenses that can be passed around or used on a library phone or tablet. These are great for a teen department’s social media account or any kind of library marketing.

Think, too, of the ridiculous, fun, and ridiculously fun filters on Snapchat. If your library has an account, consider playing around with the filters to do things like face swaps with book covers or teen-led book talks with silly voice edits. The possibilities are endless, but I’m going to stop there for now and save the larger Snapchat conversation for another post.

3. Old, broken devices.

(https://www.pexels.com/photo/vintage-music-antique-radio-9295/)

Two of my favorite things are recycling and learning new skills, so this one’s a home run in my book. Invite patrons to donate their old devices, like radios, cameras, and phones, even the broken ones—especially the broken ones! Bring the devices to your teen space with some basic tools and invite the teens to see if they can get them up and running again, take them apart and put them back together to learn how they work, or MacGyver something new out of the parts.

This is a great way to work on STEM skills while having fun. If this goes over well with your community, you can think about expanding it into a larger, on-going program, complete with adult mentors.

Credit to YouMedia for inspiring me on this one.

4. Virtual Reality Goggles.

(https://www.pexels.com/photo/sea-landscape-nature-sky-123318/)

I wasn’t sure how I felt about these when I first heard about them, but man, did the teens have a blast when we brought out a pair! We used a haunted asylum game and a roller coaster game. Though only one person can use it at a time, the others had a great time watching their friend get scared by something we couldn’t see, walk into a chair, or throw their arms up in the air as if they were on a real ride.

With an app full of games to choose from, it’s easy to use this multiple times without it getting old. I recommend limiting each person’s use to a couple of minutes at a time, because the effects can really throw your brain off after sustained use. However, that makes it all the easier to advocate for sharing and taking turns.

5. STEM toys.

(https://www.pexels.com/photo/alone-anime-art-artistic-262272/)

Between makerspaces and STEM programs, it’s likely your library has a coding tool or a programmable robot. It may be tucked away in your makerspace, or it may belong to your children’s department, but why not borrow it for the afternoon and bring it to your teen space? Your teens may not know the library has these, in which case, this is an easy way to promote things you already have. For your teens who have more experience with tech toys, it’s fun to revisit an old favorite, teach a friend how it works, or challenge them to try to do something new with it.

Bringing these toys into the space breathes new life into your tech, and it can help the teens connect to new areas of the library.

 

What are your favorite tech toys for teens? What’s your experience with using tech in a pop-up setting?

Galen Charlton: Continuing the lesson

Fri, 2017-02-03 13:39

The other day, school librarian and author Jennifer Iacopelli tweeted about her experience helping a student whose English paper had been vandalized by some boys. After the student left the Google Doc open in the library computer lab when she went home, they inserted some “inappropriate” stuff. When the student and her mom went to work on the paper later that evening, mom saw the insertions, was appalled, and grounded the student. Iacopelli, using security camera footage from the library’s computer lab, was able to demonstrate that the boys were responsible, with the result that the grounding was lifted and the boys suspended.

This story has gotten retweeted 1,300 times as of this writing and earned Iacopelli a mention as a “badass librarian” in HuffPo.

Before I continue, I want to acknowledge that there isn’t much to complain about regarding the outcome: justice was served, and mayhap the boys in question will think thrice before attacking the reputation of another or vandalizing their work.

Nonetheless, I do not count this as an unqualified feel-good story.

I have questions.

Was there no session management software running on the lab computers that would have closed off access to the document when she left at the end of the class period? If not, the school should consider installing some. On the other hand, I don’t want to hang too much on this pin; it’s possible that some was running but that a timeout hadn’t been reached before the boys got to the computer.

How long is security camera footage from the library computer lab retained? Based on the story, it sounds like it is kept at least 24 hours. Who, besides Iacopelli, can access it? Are there procedures in place to control access to it?

More fundamentally: is there a limit to how far student use of computers in that lab is monitored? Again, I do not fault the outcome in this case—but neither am I comfortable with Iacopelli’s embrace of surveillance.

Let’s consider some of the lessons learned. The victim learned that adults in a position of authority can go to bat for her and seek and acquire justice; maybe she will be inspired to help others in a similar position in the future. She may have learned a bit about version control.

She also learned that surveillance can protect her.

And well, yes. It can.

But I hope that the teaching continues—and not the hard way. Because there are other lessons to learn.

Surveillance can harm her. It can cause injustice, against her and others. Security camera footage sometimes doesn’t catch the truth. Logs can be falsified. Innocent actions can be misconstrued.

Her thoughts are her own.

And truly badass librarians will protect that.

LibUX: How do you reasonably gut-check the timeframe of a project?

Fri, 2017-02-03 11:06

I saw yesterday in a listserv an RFP (that’s “request for proposal”) for a redesign with a short — and I mean short — delivery time. Not including the incredibly involved proposal process — and maybe that’s a topic for another day, but I am adamantly against responding to RFPs — these folks were looking for a website in four months.

Unless you're looking for a small palette swap, or you're paying a shit ton of money, you're totally misguided. https://t.co/hIbijwNkW2

— Michael Schofield (@schoeyfield) February 2, 2017

So, in slack, a friend noted that

I have no idea how people make up these timeframes. I’m curious what timeframe is considered reasonable for a project like this? For smaller sites (like our Special Collections) we’ve done in house, we took about 6 months and created something pretty good, though not great. But for our main site that we’re gutting and rebuilding, rearchitecturing, rewriting, etc. – we’re now in year 3. Estimating time on these often unpredictable projects can sometimes feel like throwing darts blindfolded.

This is a super interesting and nuanced topic with a lot of right answers. I have thoughts. A quick aside: the angle I’m approaching this from is that of someone who’s part of an in-house web team for the work week, often in addition to being an external contractor for part of one.

The reality is that this is an every-industry kind of problem, because it boils down to the fact that there’s just not a lot of really good advice about how to shop for web work. Part of the role of anyone on an in-house team, or any web worker, is to help teach the client.

You need end-goals

Timeframes — and budgets — require definitive end-goals with metrics for success. Redesigns that are essentially just palette-swaps are totally legit services, but everyone needs to understand these are suited only for projects where end-goals include nothing but a change in scenery.

Without defined end-goals and success metrics, we put ourselves in a position for scope creep, where projects spiral out of hand.

You need to define what your minimum viable product is, and release it

It benefits projects to iterate as quickly as possible toward a minimum viable product, a 1.0, understanding that the MVP is by and large not the end product you envision.

Given the nature of the web, the original end-goals for a three year long redesign have probably shifted, or been forgotten, and it’s well into scope creep. Rather, such a long project is indicative of a team looking for perfection.

It’s more important to release something good quickly than to never release something perfect.

The reason is two-part.

  1. You immediately start getting user feedback, which will impact design decisions. I don’t agree that user feedback of a prototype that’s not in production is as valuable as feedback of something that is. The use-cases are different. The responses are honest.
  2. This commitment to release an MVP shapes the philosophy for how the team goes about building this thing. Accept that the product or service experience you’re aiming for is 2.0, not 1.0. Knowing that user feedback will impact your design decisions should force your hand into creating modular components and modular content. This makes it easier to adapt and iterate, which also helps future-proof the project for the long term.

How you might gut-check this stuff

So, with that said, here is how I gut check timeframes for new projects. I stress “new projects,” here, because often we might turn around things with existing design systems much more quickly, e.g., a new microsite for a permanent art installation. This isn’t making anything new, but puzzling together things that already exist.

I lump design frameworks like Bootstrap in here. You can cut loads of time — and cost — using tools like this to rapidly develop. When time’s short or budget’s tight, these aren’t bad options.

Major caveat: the process I want to share is only for gut-checking. If you kind of know you need a simple custom web design, and you know the technical requirements that go into that, you can reasonably estimate your timeframe with the assumption that your timeframe will certainly be fine-tuned as you go. Don’t commit to this timeframe until then.

Keep in mind:

  • Just say no to anything with a turnaround time of 3 months or less. Or charge much, much more. You might be able to turn around work in this time but — to me — the stress of the timeline just isn’t worth it. Pro tip: whatever your definition of “rush” here, I as a contractor will quadruple my price for rushed projects. You quickly discover that the stakeholders’ need to meet such-and-such deadline isn’t real.
  • That said, the stakeholders’ deadline probably isn’t real. Ask: “what would be so bad about adding three months to this deadline?” Probably nothing major.

Okay. So, where was I?

  1. If you measure the duration of a project by percentage, so that 100% completion is version 1.0, then dedicate 50% to discovery. Front-loading the discovery process pays dividends in the quality and satisfaction of the product that follows. This means: gather your research, do more research, sketch, sprint, test, workshop, sticky-note things, brainstorm. In all reality, you probably need a fraction of that time – but we’re gut-checking timeframes here. No one gets angry if you beat your deadline, right? Be liberal.
  2. Whatever custom development or technical implementation this project requires, allow 30% of your timeframe for it. When you bust this down to hours, allow for things to go wrong. This means if you think a new WordPress theme will reasonably take you 20 hours, allow for 40. If you aren’t used to a development workflow, or you haven’t quite developed that feel for how long something will take, a quick and rough trick for determining hours required for implementation is to break the needs of the project into individual features that require 5 hours each.
  3. Your final 20% is committed to beta testing, having people who need to use your product or service actually use it in the wild, or populate it with content. This is where you discover your deal-breaking bugs.

The second bullet is the crux. Let’s say the actual implementation time of a redesign includes but is not limited to coding and componentizing the design system, writing modular and performant scripts, and writing markup and piecing these all together into templates and pages. If this takes 120 hours, then your project totals 400 hours.

Let’s be realistic in higher-ed and say that with all the other demands, one can only dedicate 20 hours per week to this project. Then, protracted over weeks, you’re looking at a deadline 20 weeks out for an individual. This timeframe contracts with each person added to the implementation of the project. That part’s key: a team with 10 stakeholders and 2 implementors is governed by the workload of the implementors.
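
To make the arithmetic concrete, here is a toy Ruby version of that gut check (my own sketch, not from the original post). It assumes the 50/30/20 split described above, doubles the honest implementation estimate as padding, and derives a rough calendar estimate from hours per week and the number of implementors; every number in it is illustrative.

    # Gut-check sketch: discovery 50%, implementation 30%, beta/content 20%.
    def gut_check(implementation_estimate_hours, hours_per_week: 20, people: 1)
      implementation = implementation_estimate_hours * 2   # pad the honest estimate
      total          = implementation * 100 / 30           # implementation is 30% of the whole
      breakdown = {
        discovery:      total * 50 / 100,
        implementation: implementation,
        beta_testing:   total * 20 / 100
      }
      weeks = (total.to_f / (hours_per_week * people)).ceil
      [breakdown, weeks]
    end

    breakdown, weeks = gut_check(60)   # you honestly estimate ~60 hours of build time
    p breakdown                        # discovery: 200, implementation: 120, beta: 80 hours
    puts "roughly #{weeks} weeks for one person at 20 hours/week"   # roughly 20 weeks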

I think 5 months for a simple custom web design is reasonable for a single developer, but it assumes some ideal conditions. The catch is that you don’t know the technical needs of your project until you’re well into your discovery phase.

Anyway, plan accordingly.

Eric Hellman: How to enable/disable privacy protection in Google Analytics (it's easy to get wrong!)

Fri, 2017-02-03 03:39
In my survey last year of ARL library web services, I found that 72% of them used Google Analytics. So it's not surprising that a common response to my article about leaking catalog searches to Amazon was to wonder whether the same thing is happening with Google Analytics.

The short answer is "It Depends". It might be OK to use Google Analytics on a library search facility, if the following things are true:
  1. The library trusts Google on user privacy. (Many do.)
  2. Google is acting in good faith to protect user privacy and is not acting under legal compulsion to act otherwise. (We don't really know.)
  3. Google Analytics is correctly doing what their documentation says they are doing and not being circumvented by the rest of Google. (They're not.)
  4. The library has implemented Google Analytics correctly to enable user privacy.
There's an entire blog post to write about each of the first three conditions, but I have only so many hours in a day. Given that many libraries have decided that the benefits of using Google Analytics outweigh the privacy risks, the rest of this post concerns only this last condition. Of the 72% of ARL libraries that use Google Analytics, I find that only 19% of them have implemented Google Analytics with privacy-protection features enabled.

So, if you care about library privacy but can't do without Google Analytics, read on!

Google Analytics has a lot of configuration options, which is why webmasters love it. For the purposes of user privacy, however, there are just two configuration options to pay attention to, the "IP Anonymization" option and the "Display Features" option.

IP Anonymization says to Google Analytics "please don't remember the exact IP address of my users". According to Google, enabling this mode masks the least significant bits of the user's IP address before the IP address is used or saved. Since many users can be identified by their IP address, this prevents anyone from discovering the search history for a given IP address. But remember, Google is still sent the IP address, and we have to trust that Google will obscure the IP address as advertised, and not save it in some log somewhere. Even with the masked IP address, it may still be possible to identify a user, particularly if a library serves a small number of geographically dispersed users.

"Display Features" tells Google that you don't care about user privacy, and that it's OK to track your users all to hell so that you can get access to "demographic" information. To understand what's happening, it's important to understand the difference between "first-party" and "third-party" cookies, and how they implicate privacy differently.

Out of the box, Google Analytics uses "first party" cookies to track users. So if you deploy Google Analytics on your "library.example.edu" server, the tracking cookie will be attached to the library.example.edu hostname. Google Analytics will have considerable difficulty connecting user number 1234 on the library.example.edu domain with user number 5678 on the "sci-hub.info" domain, because the user ids are chosen randomly for each hostname. But if you turn on Display Features, Google will connect the two user ids via a third party tracking cookie from its Doubleclick advertising service. This enables both you and Google to know more about your users. Anyone with access to Google's data will be able to connect the catalog searches saved for user number 1234 to that user's searches on any website that uses Google advertising or any site that has Display Features turned on.

IP Anonymization and Display Features can be configured in three ways, depending on how Google Analytics is deployed. The instructions here apply to the "Universal Analytics" script. You can tell a site uses Universal Analytics because the pages execute a javascript named "analytics.js". An older "classic" version of Google Analytics uses a script named "ga.js"; its configuration is similar to that of Universal. More complex websites may use Google Tag Manager to deploy and configure Google Analytics.

Google Analytics is usually deployed on a web page by inserting a script element that looks like this:

<script>
    (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
    (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
    m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
    })(window,document,'script','https://www.google-analytics.com/analytics.js','ga');
    ga('create', 'UA-XXXXX-Y', 'auto');
    ga('send', 'pageview');
</script>

IP Anonymization and Display Features are turned on with extra lines in the script:

    ga('create', 'UA-XXXXX-Y', 'auto');
    ga('require', 'displayfeatures');  // starts tracking users across sites
    ga('set', 'anonymizeIp', true); // makes it harder to identify the user from logs
    ga('send', 'pageview');

The Google Analytics Admin allows you to turn on cross-site user tracking, though the privacy impact of what you're doing is not made clear. In the "Data Collection" item of the Tracking Info pane, look at the toggle switches for "Remarketing" and "Advertising Reporting Features": if these are switched to "ON", then you've enabled cross-site tracking and your users can expect no privacy.

Turning on IP anonymization is not quite as easy as turning on cross-site tracking. You have to add it explicitly in your script or turn it on in Google Tag Manager (where you won't find it unless you know what to look for!).

To check if cross-site tracking has been turned on in your institution's Google Analytics, use the procedures I described in my article on How to check if your library is leaking catalog searches to Amazon. First, clear the cookies for your website, then load your site and look at the "Sources" tab in Chrome developer tools. If there's a resource from "stats.g.doubleclick.net", then your website is asking Google to track your users across sites. If your institution is a library, you should not be telling Google to track your users across sites.
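
If you prefer a scriptable spot check, here is a rough Ruby sketch (my own addition, not from the article) that fetches a page and scans its inline source for the telltale strings discussed above. Treat it only as a hint: it cannot see settings applied through Google Tag Manager or in externally loaded scripts, and the URL below is hypothetical.

    require 'net/http'
    require 'uri'

    # Source-level hints: does the page load analytics.js, does the inline
    # snippet set anonymizeIp, and does it mention displayfeatures/doubleclick?
    def ga_privacy_hints(url)
      html = Net::HTTP.get(URI(url))
      {
        uses_analytics:      html.include?('google-analytics.com/analytics.js'),
        ip_anonymization:    html.include?('anonymizeIp'),
        cross_site_tracking: html.include?('displayfeatures') || html.include?('doubleclick.net')
      }
    end

    p ga_privacy_hints('https://library.example.edu/')   # prints three true/false hints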

Bottom line: if you use Google Analytics, always remember that Google is fundamentally an advertising company and it will seldom guide you towards protecting your users' privacy.

LITA: New Checklists to Support Library Patron Privacy

Thu, 2017-02-02 19:33

LITA’s Patron Privacy Interest Group has partnered with the ALA IFC’s Privacy Subcommittee to create new checklists to support library patron privacy policies.

The checklists cover:

  • data exchange between networked devices and services
  • e-book lending and digital content vendors
  • library management systems/integrated library systems
  • library websites, OPACs, and discovery services
  • public access computers and networks
  • students in K-12 schools.

Read the complete announcement at: http://www.ala.org/news/member-news/2017/02/lita-offers-patron-privacy-checklists-support-library-bill-rights

Find the Checklists at: http://www.ala.org/lita/advocacy

Thank you to Sarah Houghton and Mike Robinson for leading this effort.

Andromeda Yelton: my statement at the ALA Midwinter Town Hall

Thu, 2017-02-02 19:02

(American Libraries has helpfully provided an unedited transcript of the ALA Council town hall meeting this past Midwinter, which lets me turn my remarks there into a blog post here. You can also watch the video; I start around 24:45. I encourage you to read or watch the whole thing, though; it’s interesting throughout with a variety of viewpoints represented. I am also extremely gratified by this press release, issued after the Town Hall, which speaks to these issues.)

As I was looking at the statements that came out at ALA after the election, I found that they had a lot to say about funding, and that’s important because that’s how we pay our people and collect materials and keep the lights on.

But my concern was that they seemed to talk only about funding, and I found myself wondering — if they come for copyright, will we say that’s okay as long as we’ve been bought off? If they come for net neutrality, will we say that’s okay, as long as we’ve been bought off? When they come for the NEH and the NEA, the artists who make the content that we collect and preserve, are we going to say that’s okay, as long as we get bought off? When they come for free speech — and five bills were introduced in five states just, I think, on Friday, to criminalize protest — will we say that’s okay, as long as we’ve been bought off?

I look at how people I know react and the past actions of the current administration. The fact that every trans person I know was in a panic to get their documents in order before last Friday because they don’t think they will be able to in the next four years. The fact that we have a President who will mock disabled people just because they are disabled and disagreeing with him. The fact that we have a literal white supremacist in the White House who co-wrote the inauguration speech. The fact that one of the architects of Gamergate, which has been harassing women in technology for years, is now a White House staffer. The fact that we have many high-level people in the administration who support conversion therapy, which drives gay and lesbian teenagers to suicide at unbelievable rates. Trans people and people of color and disabled people and women and gays and lesbians are us, they are our staff, they are our patrons.

Funding matters, but so do our values, and so do our people. Funding is important, but so is our soul. And when I look at our messaging, I wonder, do we have a soul? Can it be bought? Or are there lines we do not cross?

Thank you.


Andrew Pace: Seeking Certainty

Thu, 2017-02-02 17:57

“Uncertain times” is a phrase you hear a lot these days. It was actually in the title of the ALA Town Hall that took place in Atlanta last month (ALA Town Hall: Library Advocacy and Core Values in Uncertain Times). Political turmoil, uncertainty, divisiveness, and vitriol have so many of us feeling a bit unhinged. When I feel rudderless, adrift, even completely lost at sea, I tend to seek a safer port. I’ve exercised this method personally, geographically, and professionally and it has always served me well. For example, the stability and solid foundation provided by my family gives me solace when times seem dark. Professionally, I seek refuge in the like-mindedness of librarians and the mission of libraries.

Have you ever encountered a profession more earnest than librarianship? When I feel despair, I recall how lucky I am to be a member of the best profession around. We’ve all made jokes about the value of the degree, we’ve all suffered fools and the bewildered expressions of strangers who ask our line of work, but never once have I questioned my decision to become a librarian.

Now some of you might be saying, “Really? Even now? Even in the wake of controversial press releases, reduced numbers attending conferences, dismaying Executive Orders, protests, and conflict?” I say “Especially now.” Once again, my profession has not let me down. You got it wrong, Publishers Weekly. Where you see despair, I see thousands of professionals coming together to solve common problems. I see shared understanding, shared values, and a professional organization that strives to support its membership with a solid pragmatic, strategic, and financial platform.

Debating educational requirements for the next Executive Director, I see democracy at work. Thank you, ALA Council and  Steven Bell.

I see passion, civility, and earnest devotion to core values in a Town Hall. Thank you, American Libraries and the ALA Membership.

I see introspection and activism come alive among my IT brethren. Thank you, Ruth Kitchin Tillman.

I see false dichotomies challenged and the professional and the political trying to find a symbiotic relationship. Thank you, John Overholt.

I see unwavering support for the ALA Code of Ethics. Thank you, Andromeda Yelton, Sarah Houghton, and Andy Woodworth.

I see our professional Bill of Rights defended with practical advice and actions surrounding patron privacy. Thank you, LITA.

I see the ALA stepping up, reminding us about our core values, and even preparing for a fight. Thank you, ALA and Julie Todaro. (Full disclosure: I am a member of the ALA Executive Board that helped release this statement.)

And then I need only thank the thousands and thousands of librarians and library workers who have never diverted from their mission, their core values, and their day-in day-out devotion to serving library users. When I need certainty, librarianship is my rudder, librarians are my life preserver, library workers my oarsmen. And libraries are my port in the storm.

Jonathan Rochkind: never bash again, just ruby

Thu, 2017-02-02 17:52

Sometimes I have a little automation task so simple that I think, oh, I’ll just use a bash script for this.

I almost always regret the decision, as the task tends to grow more complicated, and I start fighting with bash and realize that I don’t know bash very well (and why do I want to spend time getting to know bash well anyway?). Some things are painful to do in bash even if you do know it well; I should have just used a ruby script from the start.

I always forget this again, and repeat. Doh.

One thing that drives me to bash for simple cases is that when your task consists of a series of shell commands, getting a reasonably behaved script (especially with regard to output and error handling) can be a pain with just backticks or system in a ruby script.

tty-command gem to the rescue!  I haven’t used it yet, but its API looks like exactly what I need to never accidentally start with bash again, with no added pain from starting with ruby.  I will definitely try to remember this next time I think “It’s so simple, just use a bash script”; maybe I can use a ruby script with tty-command instead.

tty-command is one gem in @piotrmurach’s  TTY toolkit. 
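
For reference, here is a minimal sketch of what that might look like with tty-command, based on the gem’s documented API; the commands themselves are just placeholders.

    require 'tty-command'

    cmd = TTY::Command.new(printer: :pretty)   # logs each command, its output, and timing

    # run raises TTY::Command::ExitError on a non-zero exit, so failures stop the script
    cmd.run('git pull')
    result = cmd.run('bundle exec rake build')
    puts result.out

    # run! does not raise, so you can inspect the result yourself
    optional = cmd.run!('which some-optional-tool')
    warn 'optional tool missing, skipping that step' if optional.failure?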


Filed under: General

LibUX: The challenge facing libraries in an era of fake news

Thu, 2017-02-02 15:23

@prarieskygal: Related to the assignment on fake news: http://theconversation.com/the-challenge-facing-libraries-in-an-era-of-fake-news-70828

In recognition of a dynamic and often unpredictable information landscape and a rapidly changing higher education environment in which students are often creators of new knowledge rather than just consumers of information, the Association of College and Research Libraries (ACRL) launched its Framework for Information Literacy for Higher Education, the first revision to the ACRL’s standards for information literacy in over 15 years.

The framework recognizes that information literacy is too nuanced to be conceived of as a treasure hunt in which information resources neatly divide into binary categories of “good” and “bad.”

Notably, the first of the framework’s six subsections is titled “Authority Is Constructed and Contextual” and calls for librarians to approach the notions of authority and credibility as dependent on the context in which the information is used rather than as absolutes.

This new approach asks students to put in the time and effort required to determine the credibility and appropriateness of each information source for the use to which they intend to put it.

For students this is far more challenging than either a) simply accepting authority without question or b) rejecting all authority as an anachronism in a post-truth world. Formally adopted in June 2016, the framework represents a way forward for information literacy.

State Library of Denmark: Automated improvement of search in low quality OCR using Word2Vec

Thu, 2017-02-02 12:50

This abstract has been accepted for Digital Humanities in the Nordic Countries 2nd Conference, http://dhn2017.eu/

In the Danish Newspaper Archive[1] you can search and view 26 million newspaper pages. The search engine[2] uses OCR (optical character recognition) text from scanned pages, but the software converting the scanned images to text often makes reading errors. As a result the search engine will miss matching words due to OCR errors. Since many of our newspapers are old and the scans/microfilms are also of low quality, the resulting OCR constitutes a substantial problem. In addition, the OCR converter performs poorly with old font types such as fraktur.

One way to find OCR errors is by using the unsupervised Word2Vec[3] learning algorithm. This algorithm identifies words that appear in similar contexts. For a corpus with perfect spelling the algorithm will detect similar words: synonyms, conjugations, declensions, etc. In the case of a corpus with OCR errors the Word2Vec algorithm will find the misspellings of a given word, whether they come from bad OCR or, in some cases, from journalists. A given word appears in similar contexts despite its misspellings and is identified by its context. For this to work the Word2Vec algorithm requires a huge corpus; for the newspapers we had 140GB of raw text.

Given the words returned by Word2Vec, we use a Danish dictionary to remove the same word in different grammatical forms. The remaining words are filtered by a similarity measure using an extended version of the Levenshtein distance that takes the length of the word into account and applies an idempotent normalization of frequent one- and two-character OCR errors.

Example: Let’s say you use Word2Vec to find words for banana and it returns: hanana, bananas, apple, orange. Remove bananas using the (English) dictionary, since it is not an OCR error. Of the three remaining words only hanana is close to banana, and it is thus the only misspelling of banana found in this example. The Word2Vec algorithm does not know how a word is spelled or misspelled; it only uses the semantic and syntactic context.
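
As an illustration of the filtering step only (not the authors’ code), the Ruby sketch below takes Word2Vec candidates for a query word, drops dictionary words, and keeps candidates whose plain Levenshtein distance is small relative to the word’s length. The real system uses an extended distance with OCR-specific normalization, and the threshold here is made up.

    require 'set'

    # Plain Levenshtein edit distance via dynamic programming.
    def levenshtein(a, b)
      prev = (0..b.length).to_a
      a.each_char.with_index(1) do |ca, i|
        curr = [i]
        b.each_char.with_index(1) do |cb, j|
          cost = ca == cb ? 0 : 1
          curr << [curr[j - 1] + 1, prev[j] + 1, prev[j - 1] + cost].min
        end
        prev = curr
      end
      prev.last
    end

    # Keep candidates that look like OCR misspellings of the query word:
    # drop real dictionary words, then require a small length-relative distance.
    def ocr_variants(query, candidates, dictionary, max_ratio: 0.34)
      candidates
        .reject { |w| dictionary.include?(w) }
        .select { |w| levenshtein(query, w).to_f / query.length <= max_ratio }
    end

    dictionary = Set.new(%w[banana bananas apple orange])
    p ocr_variants('banana', %w[hanana bananas apple orange], dictionary)   # => ["hanana"]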

This method is not an automatic OCR error corrector and cannot output the corrected OCR. But when searching it will appear as if you are searching in an OCR-corrected text corpus. Single-word searches on the full corpus give an increase from 3% to 20% in the number of results returned. Preliminary tests on the full corpus show only relatively few false positives among the additional results returned, thus increasing recall substantially without a decline in precision.

The advantage of this approach is a quick win with minimal impact on a search engine[2] based on low-quality OCR. The algorithm generates a text file with synonyms that can be used by the search engine. Not only single-word search but also phrase search with highlighting works out of the box. An OCR correction demo[4] using Word2Vec on the Danish newspaper corpus is available on the Labs[5] pages of The State And University Library, Denmark.

[1] Mediestream, The Danish digitized newspaper archive.
http://www2.statsbiblioteket.dk/mediestream/avis

[2] SOLR or Elasticsearch etc.

[3] Mikolov et al., Efficient Estimation of Word Representations in Vector Space
https://arxiv.org/abs/1301.3781

[4] OCR error detection demo (change word parameter in URL)
http://labs.statsbiblioteket.dk/dsc/ocr_fixer.jsp?word=statsminister

[5] Labs for State And University Library, Denmark
http://www.statsbiblioteket.dk/sblabs/

 


Open Knowledge Foundation: A look back at the work of Open Knowledge Foundation Deutschland in 2016

Thu, 2017-02-02 10:30

This blog post is part of our on-going Network series featuring updates from chapters across the Open Knowledge Network and was written by the Open Knowledge Foundation Deutschland team.

We are the Open Knowledge Foundation Deutschland (OKF DE), the German chapter of OKI. We advocate for open knowledge, open data, transparency, and civic participation and consider ourselves an active part of German and European civil society.

Our goals

* we provide technical tools that inform citizens about the potential and chances of open data and empower citizens to become active

* we organize educational events and projects and author publications in the domain of science, research, and public relations

* we offer trainings on open data and related technical tools

* we organize groups that discuss sustainable strategies and applications for the usage and advancement of open knowledge

* we build our community and connect relevant individuals with one another

Currently, we have 25 employees (16.5 FTE, 14 female/11 male) and 8 board members (6 male/2 female) on our team. We are pursuing the concept of “Open Salaries”: we have a simple formula to calculate salaries and we share it with the whole team. Our salaries are based on the German public-service pay scale (TVÖD 12/S1 – Project Assistant, TVÖD 13/S2 – Project Manager, TVÖD 13/S3 – Project Lead and CEO).

Our anticipated annual budget in 2016 of 1.2 million Euros remains relatively consistent compared to 2015 and is a result of our collective efforts to consolidate our programs and focus on fewer priorities. We are aiming for a mixed funding portfolio to avoid dependency on a few big funders. We are currently working on 19 grant-based projects to advance unlimited access to knowledge across different branches of society (politics, culture, economics, science).

Here’s a brief look back over our work and major projects in 2016:

Ask The Jobcentre! (original: Frag Das Jobcenter!)

Project: FragdenStaat.de

Project lead: Arne.Semsrott@okfn.de

The project FragdenStaat (“Ask Your Government” in English) runs a campaign to demand wider transparency in public jobcentres in Germany. Jobcentres are powerful authorities: not only are they allowed to track unemployed persons who draw unemployment benefits, they also control the personal data of anyone sharing a household with those beneficiaries.

Internal directives and target agreements manage how jobcentres operate, for instance when and why they cover costs for health insurance, and when they penalise beneficiaries. To understand how jobcentres operate, FragdenStaat wants to request all internal directives and target agreements. Help us to request these documents! More information is available here.

Annual Youth Hackathon “Youth hacked” (original: “Jugend hackt”)

Project: Jugendhackt.de

Project lead: Maria.Reimer@okfn.de

“Youth Hacked” is a hackathon that brings together young, tech-savvy people to write code, tinker with hardware, and develop ideas that can change society. In mid-October participants between 12 and 18 years old travelled from all around Germany in order to attend the event. Those who couldn’t join physically were able to attend through livestream. It was a busy weekend: 24 projects were developed by 120 youngsters, supported by 42 mentors and volunteers and followed by about 700 visitors. More about the event can be read in this blogpost, and in this news article (both in German).

The “Youth Hacked” event celebrated a premiere in Austria and Switzerland. In November, “Youth Hacked Austria” brought young people in Linz together, shortly followed by the first Youth Hacked event in Zurich, Switzerland. We are also happy about a collaboration with the Goethe-Institut Ostasien, with which we organised a workshop in Seoul titled “Vernetzte Welten” (engl. “Connected Worlds”).

Prototype fund: first round closes with 500+ submissions

Project: PrototypeFund.de

Project lead: Julia.kloiber@okfn.de; Cosmin.Cabuela@okfn.de

The Prototype Fund is a brand new project of Open Knowledge Foundation Germany. It is also the first public funding programme around civic tech, data literacy, and data security that targets non-profit software projects. We support software developers, hackers, and creatives in developing their ideas – from concept to first pilot. Every project receives 30,000 Euros, along with a mentorship programme and knowledge sharing within an interesting network.

The first round of the call for submissions has now closed; we received more than 500 submissions. This overwhelming interest is a strong message confirming the need for this project, which in total will invest 1.2 million Euros in open source projects.

Within three years, 40 open source prototypes will be funded. The latest news is available on the website of the Prototype Fund. The project is supported by the BMBF, Germany’s Federal Ministry of Education and Research.

OGP Summit in Paris: We represented German civil society

For years Open Knowledge Foundation Germany has demanded that Germany join the Open Government Partnership (OGP) and promote the values of open government.

32 European and Central Asian countries had joined the partnership, but Germany was not among them. This changed in December 2016. Mindful of recent political developments, we used the opportunity to represent German civil society during the OGP Summit in Paris, which was held between December 7 and 9. Our participation included actions and debates such as:

Save the date: OKF DE Data Summit in 2017

Date & location: April 28-29, 2017 | Berlin

Conference with keynotes, workshops and barcamp/unconference

Topics: open data | digital volunteering | civic tech | mobility concepts | open administration | participation | transparency | freedom of information | connectivity | data for social good | data literacy

This year we are planning a data summit connecting the networks that developed through our project ‘Datenschule’ (engl. School of Data). Within two successful years of Code for Germany we have developed many different projects and networks around Germany. Our educational programme ‘Datenschule’ connects charitable and non-profit organisations with our community. The goal is to enable NGOs to use data as an information source for their socio-political work.

The data summit is intended to connect the members of our School of Data network even more. Over two days, open data and civic tech enthusiasts, representatives of policy, public administration, entrepreneurs, journalists and non-profit organisations can exchange experiences with one another. The data summit shall be a platform to develop new projects, to deepen data literacy through workshops, and to learn how digital tools can be employed in a modern data-driven society. Our goal: To provide a forum where participants can expand their networks, share experiences, get to know each other and exchange knowledge.

Note by the author

OKF DE is an independent not-for-profit organisation registered in Berlin, Germany in 2011 (under VR 30468 B, to be fully transparent). OKF DE is a pioneering and award-winning civil society organisation engaging in different aspects of the digital age. Its work is independent, non-partisan, interdisciplinary and non-commercial.

LibUX: The opportunity and danger around library vendors selling design services

Thu, 2017-02-02 04:00

An hour ago, Ingrid Lunden wrote in TechCrunch that “Salesforce acquires Sequence to build out its UX design services“, saying

Salesforce has made another acquisition that underscores how the CRM and cloud software giant is looking to sell more services to its customers that complement the software they are already buying. It has acquired Sequence, a user experience design agency based out of San Francisco and New York that works with brands like Best Buy, Peets, Apple, Google and many more.

It makes sense that there are similar opportunities for vendors in the higher-ed and library space.

Although design and the user experience are now part of the vocabulary, inspiring job descriptions, departments, interest groups, and the like, the fact is that this kind of expertise in libraries is relatively shallow. I criticized in “How to talk about user experience” that across the board UX librarians couldn’t even agree on a practical definition of what the user experience is, and this creates a vacuum that consultants — like me — or vendors can fill.

Businesses provide products — let’s be loose with the term: a neat tool, solid resources, some kind of interface — that customers need to do their job. What’s missing is the insight and expertise to use that product in a way designed to purpose, custom to the customer’s needs, environment, and goals.

Libraries buy things they don’t really know how to use. And even if user experience design is on their radar, or there are even service designers on staff, it’s likely that expertise doesn’t scale easily to the volume of resources libraries maintain. So, that’s the vendor opportunity.

The danger of that opportunity is to the libraries themselves. Ours is an industry pocked by ill-will because the lack of business acumen in most academic or public institutions has allowed for exploitation. Let’s be honest, this isn’t just a few bad eggs, it’s the trend.

There’s little to suggest that design services provided by these same companies won’t in the same way take advantage of the lack of expertise and serve contractual loopholes or antipatterns designed to better profit the vendor at the customer’s expense.

LibUX: LibUX is on Patreon so we can pay writers, speakers, dream-up new content, and make free tools. Our rewards rock.

Thu, 2017-02-02 02:52

Hey there. I’m Michael – and my website about the design and user experience of libraries is changing. I write — I hope — uniquely insightful and strategic articles on the regular, co-host Metric (a podcast) with Amanda L. Goodman, curate the Web for Libraries newsletter, workshop and teach full-blown courses, and commit a ton of time to making useful, thoughtful, free content.

Over the last couple of years, LibUX has played a small role pushing the conversation forward. Now, I would really love your help to inch it a little further.

Why? Well, I think public and academic libraries are the bedrock of education and democratic participation, aspiring to bridge demographic gaps, defend privacy, enrich community, and preserve knowledge — the whole shebang. But libraries and other non-profits have strategic problems, which compound existing budget, time, and talent constraints.

I’m pretty sure that the open web and an organizational commitment to the user experience is key.

What’s more, it’s important that this expertise is accessible for free. Design-thinking is interdisciplinary, but the world I come from — libraries and higher-ed — is constrained by budget and bureaucracy. User centricity shakes this up, but often those on the ground floor who would champion it won’t have access to expensive conferences or courses – especially if they don’t align with the job description.

I want to do so much more

This last year, I started changing things up first by inviting expert guest writers to really positive response, and then — starting this month! —  by organizing free webinars. These writers and speakers are volunteers. LibUX doesn’t make any money, but I benefit from being associated with it. It made me realize that

  • there is so much expertise I lack, which folks like you have
  • the conversation is better with you in it
  • but librarianship’s culture of guest-writing or presenting for exposure just doesn’t sit right.

I want to help produce more excellent content than I am able to make on my own, but I want LibUX to inspire a higher standard by adhering to that same standard.

I need your help to pay writers, presenters, and pay for services like GoToWebinar that all go in to improving the quality, experience, and ethic of our content. I am at the limits of what I can afford out of pocket, but LibUX is aching to grow.

So, if you or your organization has found something I’ve made useful — like the core content matrix or Blueprint for Trello — consider becoming a patron! I put a lot of thought into the rewards.

They cover everything from, you know, little things like twenty times more content than I usually write ($1), giveaways, exclusive access to pilot projects ($5), the bomb webinar archive with high-quality transcripts we’re starting ($10), sponsorship if you’ve got something to promote, or even me — your pal Michael — on retainer.

Consider subscribing. Your support goes a long way.
