Feed aggregator

Equinox Software: Statistical Popularity Badges

planet code4lib - Tue, 2016-08-30 01:55

Statistical Popularity Badges allow libraries to set popularity parameters that define popularity badges, which bibliographic records can earn if they meet the set criteria.  Popularity badges can be based on factors such as circulation and hold activity, bibliographic record age, or material type.  The popularity badges that a record earns are used to adjust catalog search results to display more popular titles (as defined by the badges) first.  Within the OPAC there is a new sort option called “Sort by Popularity” which will allow users to sort records based on the popularity assigned by the popularity badges.

Popularity Rating and Calculation

Popularity badge parameters define the criteria a bibliographic record must meet to earn the badge, as well as which bibliographic records are eligible to earn the badge.  For example, the popularity parameter “Circulations Over Time” can be configured to create a badge that is applied to bibliographic records for DVDs.  The badge can be configured to look at circulations within the last 2 years, but assign more weight or popularity to circulations from the last 6 months.

Multiple popularity badges may be applied to a bibliographic record.  For each applicable popularity badge, the record will be rated on a scale of 1-5, where a 5 indicates the most popularity.  Evergreen will then assign an overall popularity rating to each bibliographic record by averaging all of the popularity badge points earned by the record.  The popularity rating is stored with the record and will be used to rank the record within search results when the popularity badge is within the scope of the search.  The popularity badges are recalculated on a regular and configurable basis by a cron job.  Popularity badges can also be recalculated by an administrator directly on the server.
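
As a rough illustration of the averaging described above, the overall rating is simply the mean of the 1-5 scores earned from each applicable badge.  The sketch below is an illustration only, not Evergreen's actual implementation, and the function name is invented for the example.

    # Sketch of the overall popularity rating described above: the mean of
    # the 1-5 scores earned from each applicable badge.  This is an
    # illustration, not Evergreen's actual implementation.

    def overall_popularity(badge_ratings):
        """badge_ratings: list of 1-5 scores earned by a bibliographic record."""
        if not badge_ratings:
            return 0.0
        return sum(badge_ratings) / len(badge_ratings)

    # A record earning badges rated 5, 4, and 3 gets an overall rating of 4.0:
    print(overall_popularity([5, 4, 3]))  # 4.0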

Creating Popularity Badges

There are two main types of popularity badges:  point-in-time popularity (PIT), which looks at the popularity of a record at a specific point in time—such as the number of current circulations or the number of open hold requests; and temporal popularity (TP), which looks at the popularity of a record over a period of time—such as the number of circulations in the past year or the number of hold requests placed in the last six months.

The following popularity badge parameters are available for configuration:

  • Holds Filled Over Time (TP)
  • Holds Requested Over Time (TP)
  • Current Hold Count (PIT)
  • Circulations Over Time (TP)
  • Current Circulation Count (PIT)
  • Out/Total Ratio (PIT)
  • Holds/Total Ratio (PIT)
  • Holds/Holdable Ratio (PIT)
  • Percent of Time Circulating (takes into account all circulations, not a specific period of time)
  • Bibliographic Record Age (days, newer is better) (TP)
  • Publication Age (days, newer is better) (TP)
  • On-line Bib has attributes (PIT)
  • Bib has attributes and copies (PIT)
  • Bib has attributes and copies or URIs (PIT)
  • Bib has attributes (PIT)

To create a new Statistical Popularity Badge:

  1. Go to Administration>Local Administration>Statistical Popularity Badges.
  2. Click on Actions>Add badge.
  3. Fill out the following fields as needed to create the badge:

(Note: only Name, Scope, Weight, Recalculation Interval, Importance Interval, and Discard Value Count are required)

  • Name: Library assigned name for badge.  Each name must be unique.  The name will show up in the OPAC record display.  For example: Most Requested Holds for Books-Last 6 Months.  Required field.
  • Description: Further information to provide context to staff about the badge.
  • Scope: Defines the owning organization unit of the badge.  Badges will be applied to search result sorting when the Scope is equal to, or an ancestor of, the search location.  For example, a branch-specific search will include badges where the Scope is the branch, the system, and the consortium.  A consortium-level search will include only badges where the Scope is set to the consortium.  Item-specific badges will apply only to records that have items owned at or below the Scope.  Required field.
  • Weight:  Can be used to indicate that a particular badge is more important than the other badges that the record might earn.  The weight value serves as a multiplier of the badge rating.  Required field with a default value of 1.
  • Age Horizon:  Indicates the time frame during which events should be included for calculating the badge.  For example, a popularity badge for Most Circulated Items in the Past Two Years would have an Age Horizon of ‘2 years’.   The Age Horizon should be entered as a number followed by ‘day(s)’, ‘month(s)’, ‘year(s)’, such as ‘6 months’ or ‘2 years’.  Use with temporal popularity (TP) badges only.
  • Importance Horizon: Used in conjunction with Age Horizon, this allows more recent events to be considered more important than older events.  A value of zero means that all events included by the Age Horizon will be considered of equal importance.  With an Age Horizon of 2 years, an Importance Horizon of ‘6 months’ means that events, such as checkouts, that occurred within the past 6 months will be considered more important than the circulations that occurred earlier within the Age Horizon.
  • Importance Interval:  Can be used to further divide up the timeframe defined by the Importance Horizon.  For example, if the Importance Interval is ‘1 month’, Evergreen will combine all of the events within that month for adjustment by the Importance Scale (see below).  The Importance Interval should be entered as a number followed by ‘day(s)’, ‘week(s)’, ‘month(s)’, ‘year(s)’, such as ‘6 months’ or ‘2 years’.  Required field.
  • Importance Scale: The Importance Scale can be used to assign additional importance to events that occurred within the most recent Importance Interval.  For example, if the Importance Horizon is ‘6 months’ and the Importance Interval is ‘1 month’, the Importance Scale can be set to ‘6’ to indicate that events that happened within the last month will count 6 times, events that happened 2 months ago will count 5 times, etc. (see the sketch after these steps).  The Importance Scale should be entered as a number.
  • Percentile:  Can be used to assign a badge to only the records that score above a certain percentile.  For example, setting the field to ‘95’ indicates that only records in the top 5% of results should earn the badge.  To optimize the popularity badges, the percentile should be set between 95 and 99 to assign a badge to the top 5%-1% of records.
  • Attribute Filter:  Can be used to assign a badge to records that contain a specific Record Attribute.  Currently this field can be configured by running a report (see note below) to obtain the JSON data that identifies the Record Attribute.  The JSON data from the report output can be copied and pasted into this field.   A new interface for creating Composite Record Attributes will be implemented with future development of the web client.
    • To run a report to obtain JSON data for the Attribute Filter, use SVF Record Attribute Coded Value Map as the template Source.  For Displayed Fields, add Code, ID, and/or Description from the Source; also display the Definition field from the Composite Definition linked table.  This field will display the JSON data in the report output.  Filter on the Definition from the Composite Definition linked table and set the Operator to ‘Is not NULL’.
  • Circ Mod Filter: Apply the badge only to items with a specific circulation modifier.  Applies only to item related badges as opposed to “bib record age” badges, for example.
  • Bib Source Filter:  Apply the badge only to bibliographic records with a specific source.
  • Location Group Filter:  Apply the badge only to items that are part of the specified Copy Location Group.  Applies only to item related badges.
  • Recalculation Interval: Indicates how often the popularity value of the badge should be recalculated for bibliographic records that have earned the badge.  Recalculation is controlled by a cron job.  Required field with a default value of 1 month.
  • Fixed Rating: Can be used to set a fixed popularity value for all records that earn the badge.  For example, the Fixed Rating can be set to 5 to indicate that records earning the badge should always be considered extremely popular.
  • Discard Value Count:  Can be used to prevent certain records from earning the badge to make Percentile more accurate by discarding titles that are below the value indicated.   For example, if the badge looks at the circulation count over the past 6 months, Discard Value Count can be used to eliminate records that had too few circulations to be considered “popular”.  If you want to discard records that only had 1-3 circulations over the past 6 months, the Discard Value Count can be set to ‘3’.  Required field with a default value of 0.
  • Last Refresh Time: Displays the last time the badge was recalculated based on the Recalculation Interval.
  • Popularity Parameter: Types of TP and PIT factors described above that can be used to create badges to assign popularity to bibliographic records.
  4. Click OK to save the badge.
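
As referenced in the Importance Scale field above, the following sketch illustrates how the Importance Horizon, Importance Interval, and Importance Scale could combine to weight events.  It is an interpretation of the description in this post, not Evergreen's actual algorithm; the function name and sample counts are invented for illustration.

    # Illustrative sketch of the event weighting described in the field list
    # above (an interpretation of this post, not Evergreen's actual algorithm).
    # Events are bucketed by how many Importance Intervals ago they occurred;
    # the most recent bucket is weighted by the Importance Scale, each older
    # bucket by one less, and nothing drops below a weight of 1.

    def weighted_event_count(events_by_interval, importance_scale):
        """events_by_interval[0] holds events from the most recent interval,
        events_by_interval[1] the interval before that, and so on."""
        total = 0
        for i, count in enumerate(events_by_interval):
            weight = max(importance_scale - i, 1)
            total += count * weight
        return total

    # Six months of monthly checkout counts, most recent month first, with an
    # Importance Scale of 6: the newest month counts 6 times, the next 5, etc.
    print(weighted_event_count([10, 8, 12, 9, 7, 11], importance_scale=6))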

New Global Flags

OPAC Default Sort:  can be used to set a default sort option for the catalog.  Users can always override the default by manually selecting a different sort option while searching.

Maximum Popularity Importance Multiplier:  used with the Popularity Adjusted Relevance sort option in the OPAC.  Provides a scaled adjustment to relevance score based on the popularity rating earned by bibliographic records.  See below for more information on how this flag is used.

Sorting by Popularity in the OPAC

Within the stock OPAC template there is a new option for sorting search results called “Most Popular”.  Selecting “Most Popular” will first sort the search results based on the popularity rating determined by the popularity badges and will then apply the default “Sort by Relevance”.  This option will maximize the popularity badges and ensure that the most popular titles appear higher up in the search results.

There is a second new sort option called “Popularity Adjusted Relevance” that can be turned on by editing the ctx.popularity_sort setting in the OPAC template configuration.  The “Popularity Adjusted Relevance” sort option can be used to find a balance between popularity and relevance in search results.  For example, it can help ensure that records that are popular, but not necessarily relevant to the search, do not supersede records that are both popular and relevant in the search results.  It does this by sorting search results using an adjusted version of Relevance sorting.

When sorting by relevance, each bibliographic record is assigned a baseline relevance score between 0 and 1, with 0 being not relevant to the search query and 1 being a perfect match.  With “Popularity Adjusted Relevance” the baseline relevance is adjusted by a scaled version of the popularity rating assigned to the bibliographic record.  The scaled adjustment is controlled by a Global Flag called “Maximum Popularity Importance Multiplier” (MPIM).  The MPIM takes the average popularity rating of a bibliographic record (1-5) and creates a scaled adjustment that is applied to the baseline relevance for the record.  The adjustment can be between 1.0 and the value set for the MPIM.

For example, if the MPIM is set to 1.2, a record with an average popularity badge score of 5 (maximum popularity) would have its relevance multiplied by 1.2—in effect giving it the maximum increase of 20% in relevance.  If a record has an average popularity badge score of 2.5, the baseline relevance of the record would be multiplied by 1.1 (due to the popularity score scaling the adjustment to half way between 1.0 and the MPIM of 1.2) and the record would receive a 10% increase in relevance.  A record with a popularity badge score of 0 would be multiplied by 1.0 (due to the popularity score being 0) and would not receive a boost in relevance.
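
A minimal sketch of that scaling, reconstructed from the examples above (an illustration only, not Evergreen's actual code): the multiplier moves linearly from 1.0 at a popularity rating of 0 up to the MPIM at a rating of 5.

    # Sketch of the Popularity Adjusted Relevance scaling, reconstructed from
    # the examples above (not Evergreen's actual code).  The multiplier scales
    # linearly from 1.0 (popularity 0) to the MPIM (popularity 5).

    def adjusted_relevance(baseline_relevance, popularity, mpim=1.2):
        multiplier = 1.0 + (popularity / 5.0) * (mpim - 1.0)
        return baseline_relevance * multiplier

    print(round(adjusted_relevance(0.8, 5), 2))    # 0.96 -> the full 20% boost
    print(round(adjusted_relevance(0.8, 2.5), 2))  # 0.88 -> a 10% boost
    print(round(adjusted_relevance(0.8, 0), 2))    # 0.8  -> no boost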

Popularity Badge Example

A popularity badge called “Long Term Holds Requested” has been created which has the following parameters:

Popularity Parameter:  Holds Requested Over Time

Scope: CONS

Weight: 1 (default)

Age Horizon: 5 years

Percentile: 99

Recalculation Interval: 1 month (default)

Discard Value Count: 0 (default)

This popularity badge will rate bibliographic records based on the number of holds that have been placed on them over the past 5 years and will only apply the badge to the top 1% of records (99th percentile).

If a keyword search for harry potter is conducted and the sort option “Most Popular” is selected, Evergreen will apply the popularity rankings earned from badges to the search results.

Title search: harry potter. Sort by: Most Popular.

The popularity badge also appears in the bibliographic record display in the catalog. The name of the badge earned by the record and the popularity rating are displayed in the Record Details.

A popularity badge of 5.0/5.0 has been applied to the most popular bibliographic records where the search term “harry potter” is found in the title. In the image above, the popularity badge has identified records from the Harry Potter series by J.K. Rowling as the most popular titles matching the search and has listed them first in the search results.

Equinox Software: Copy Alerts

planet code4lib - Tue, 2016-08-30 01:38

The Copy Alerts feature allows library staff to add customized alert messages to copies. The copy alerts will appear when a specific event takes place, such as when the copy is checked in, checked out, or renewed. Alerts can be temporary or persistent: temporary alerts will be disabled after the initial alert and acknowledgement from staff, while persistent alerts will display each time the alert event takes place. Copy Alerts can be configured to display at the circulating or owning library only or, alternatively, when the library at which the alert event takes place is not the circulating or owning library. Copy Alerts at check in can also be configured to provide options for the next copy status that should be applied to an item. Library administrators will have the ability to create and customize Copy Alert Types and to suppress copy alerts at specific org units.

Adding a Copy Alert

Copy Alerts can be added to new copies or existing copies using the Volume/Copy Editor. They can also be added directly to items through the Check In, Check Out, Renew, and Item Status screens.

To add a Copy Alert in the Volume/Copy Editor:

1. Within the Volume/Copy Editor, scroll to the bottom of the screen and click on Copy Alerts.

2. A New Copy Alert window will pop up.

3. Select an alert Type and enter an additional alert message if needed. Check the box next to Temporary if this alert should not appear after the initial alert is acknowledged. Leaving the Temporary box unchecked will create a persistent alert that will appear each time the action to trigger the alert occurs, such as check in or check out.

4. Click OK to save the new Copy Alert. After a Copy Alert has been added, clicking on the Copy Alerts button in the Volume/Copy Editor will allow you to add another Copy Alert and to view and edit Existing Copy Alerts.

5. Make any additional changes to the item record and click Store Selected to store these changes and the new copy alert(s) to the Completed Copies tab. If you are done modifying the copy, click Save & Exit to finalize the changes.

To add a Copy Alert from the Check In, Check Out, or Renewal screens:

1. Navigate to the appropriate screen, for example to Circulation>Check In.
2. Scan in the item barcode.
3. Select the item row and go to Actions>Add Copy Alerts or right click on the item row and select Add Copy Alerts.

4. The Add Copy Alert window will pop up. Select the alert Type, add an additional alert message if needed, and click OK to save. This alert will be added to the copy.

To add a Copy Alert from the Item Status screen:

1. Go to the Detail View of the Item Status screen.
2. In the bottom left-hand corner of the item record there is a Copy Alerts option. Click Add to create a new copy alert.

3. The Add Copy Alert window will pop up. Select the alert Type, add an additional alert message if needed, and click OK to save. This alert will be added to the copy.

Triggering a Copy Alert

The Copy Alert will appear when the action required to trigger the alert occurs. For example, the Normal Checkin Alert will appear when the item is checked in:

If Next Status options have been configured for the Checkin Alert, staff will see a drop down menu that allows them to select the next Status for the copy:

Managing Copy Alerts

Copy Alerts can be managed from the Item Status screen. Within the Quick Summary tab of the Detailed View of an item, click on Manage to view and Remove copy alerts.

Administration of Copy Alerts

Copy Alert Types

Copy Alert Types are created and managed in Administration>Local Administration>Copy Alert Types. Copy Alert Types define the action and behavior of an alert message type. The Alert Types included in a stock installation of Evergreen are:

• Normal checkout
• Normal checkin
• Checkin of missing copy
• Checkin of lost-and-paid copy
• Checkin of damaged copy
• Checkin of claims-returned copy
• Checkin of long overdue copy
• Checkin of claims-never-checked-out copy
• Checkin of lost copy

To create a new Copy Alert Type:

1. Go to Administration>Local Administration>Copy Alert Types.
2. Click on Create and fill out the following fields as needed:
Name: name of the Copy Alert Type.
Active: indicates if the alert type is currently in use (Yes) or not (No).
State: indicates the Copy Status of the item at the time of the event.
Event: the action that takes place in the ILS to trigger the alert.
Scope Org Unit: indicates which org unit(s) the alert type will apply to.
Next Status: can be used with temporary Checkin Alerts only. If a next status is configured, staff will be presented with a list of statuses to choose from when the item is checked in. Next statuses should be configured by using the Copy Status ID # surrounded by curly brackets. For example {7, 11}.
Renewing?: indicates if the alert should appear during a renewal.
Invert location?: if set to yes, this setting will invert the following two settings. For example, if an alert is set to appear at the Circulating Library only, inverting the location will cause the alert to appear at all libraries except the Circulating Library.
At Circulation Library?: indicates if the alert should appear at the circulation library only.
At Owning Library?: indicates if the alert should appear at the owning library only.
3. Click Save.

To edit an existing Copy Alert Type:

1. Go to Administration>Local Administration>Copy Alert Types.
2. Click on the type and go to Actions>Edit or right-click and select Edit.
3. Make changes to the existing configuration and click Save.

Copy Alert Suppression

The Copy Alert Suppression interface can be used to suppress alert types at a specific org unit. Suppression of alerts will adhere to the organization unit hierarchy. For example, if an alert is suppressed at the System level, it will be suppressed for all descendant branches.

To suppress an alert type:

1. Go to Administration>Local Administration>Copy Alert Suppression.
2. Click Create and select the Alert Type that you want to suppress from the drop down menu.
3. Next, select the Org Unit at which the alert should be suppressed.
4. Click Save.

DuraSpace News: NEW RELEASE: Message-based Integrations for Fedora

planet code4lib - Tue, 2016-08-30 00:00

From Aaron Coburn, Programmer and Systems Administrator, Amherst College

Amherst, MA  I would like to announce the immediate availability of version 4.6.0 of the Fedora Messaging Toolbox.

The messaging toolbox is designed to support a variety of asynchronous integrations with external tools and services, such as a Solr search engine or an external Triplestore. Version 4.6.0 of the messaging toolbox is compatible with both the forthcoming 4.6.0 release of the Fedora Commons server and previous releases of Fedora.

DuraSpace News: Learn More About Scholars@Duke

planet code4lib - Tue, 2016-08-30 00:00

From Julia Trimmer, Manager, Faculty Data Systems & Analysis, Office of the Provost, Duke University

Durham, NC  Will you be attending the Symplectic User Conference at Duke University on September 13 and 14?  If you would like to get together around that event to learn more about VIVO at Duke University, members of the Scholars@Duke team are available to meet before or after the event.

DuraSpace News: NEW Fedora Repository Web Site

planet code4lib - Tue, 2016-08-30 00:00

Austin, TX  DuraSpace is pleased to announce that the Fedora team recently completed a redesign of the Fedora website. The site was designed in consultation with members of the Fedora Leadership Group and reflects a modern, mobile-friendly approach that makes it easy to find key items first.

Eric Lease Morgan: Blueprint for a system surrounding Catholic social thought & human rights

planet code4lib - Mon, 2016-08-29 20:32

This posting elaborates upon one possible blueprint for comparing & contrasting various positions in the realm of Catholic social thought and human rights.

We here in the Center For Digital Scholarship have been presented with a corpus of documents which can be broadly described as position papers on Catholic social thought and human rights. Some of these documents come from the Vatican, and some of these documents come from various governmental agencies. There is a desire by researchers & scholars to compare & contrast these documents on the paragraph level. The blueprint presented below illustrates one way — a system/flowchart — this desire may be addressed:

The following list enumerates the flow of the system:

  1. Corpus creation – The system begins on the right with sets of documents from the Vatican as well as the various governmental agencies.  The system also begins with a hierarchical “controlled vocabulary” outlined by researchers & scholars in the field and designed to denote the “aboutness” of individual paragraphs in the corpus.
  2. Manual classification – Reading from left to right, the blueprint next illustrates how subsets of document paragraphs will be manually assigned to one or more controlled vocabulary terms.  This work will be done by people familiar with the subject area as well as the documents themselves.  Success in this regard is directly proportional to the volume & accuracy of the classified documents.  At the very least, a few hundred paragraphs need to be consistently classified for each of the controlled vocabulary terms in order for the next step to be successful.
  3. Computer “training” – Because the number of paragraphs from the corpus is too large for manual classification, a process known as “machine learning” will be employed to “train” a computer program to do the work automatically.  If it is assumed the paragraphs from Step #2 have been classified consistently, then it can also be assumed that each set of similarly classified documents will have identifiable characteristics.  For example, documents classified with the term “business” may often include the word “money”.  Documents classified as “government” may often include “law”, and documents classified as “family” may often include the words “mother”, “father”, or “children”.  By counting & tabulating the existence & frequency of individual words (or phrases) in each of the sets of manually classified documents, it is possible to create computer “models” representing each set.  The models will statistically describe the probabilities of the existence & frequency of words in a given classification.  Thus, the output of this step will be two representations, one for the Vatican documents and another for the governmental documents.  (A loose sketch of this step and the next appears after this list.)
  4. Automated classification – Using the full text of the given corpus as well as the output of Step #3, a computer program will then be used to assign one or more controlled vocabulary terms to each paragraph in the corpus.  In other words, the corpus will be divided into individual paragraphs, each paragraph will be compared to a model and assigned one or more classification terms, and the paragraph/term combinations will be passed on to a database for storage and ultimately an indexer to support search.
  5. Indexing – A database will store each paragraph from the corpus alongside metadata describing the paragraph.  This metadata will include titles, authors, dates, publishers, as well as the controlled vocabulary terms.  An indexer (a sort of database specifically designed for the purposes of search) will make the content of the database searchable, but the index will also be supplemented with a thesaurus.  Because human language is ambiguous, words often have many and subtle differences in meaning.  For example, when talking about “dogs”, a person may also be alluding to “hounds”, “canines”, or even “beagles”.  Given the set of controlled vocabulary terms, a thesaurus will be created so that when researchers & scholars search for “children” the indexer may also return documents containing the phrase “sons & daughters of parents”; as another example, when a search is done for “war”, documents (paragraphs) containing the words “battle” or “insurgent” may also be found.
  6. Searching & browsing – Finally, a Web-based interface will be created enabling readers to find items of interest, compare & contrast these items, identify patterns & anomalies between these items, and ultimately make judgments of understanding. For example, the reader will be presented with a graphical representation of controlled vocabulary. By selecting terms from the vocabulary, the index will be queried, and the reader will be presented with sortable and groupable lists of paragraphs classified with the given term. (This process is called “browsing”.) Alternatively, researchers & scholars will be able to enter simple (or complex) queries into an online form, the queries will be applied to the indexer, and again, paragraphs matching the queries will be returned. (This process is called “searching”.) Either way, the researchers & scholars will be empowered to explore the corpus in many and varied ways, and none of these ways will be limited to any individuals’ specific topic of interest.
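
As a loose sketch of Steps 3 and 4 above, one possible toolkit (an assumption for illustration, not the project's chosen implementation) is scikit-learn: a bag-of-words model such as multinomial naive Bayes can be trained on the manually classified paragraphs and then used to label the rest of the corpus.  The paragraphs, labels, and vocabulary terms below are invented purely for illustration.

    # Loose sketch of Steps 3 and 4 using scikit-learn (one possible toolkit,
    # not the project's chosen implementation).  Paragraphs and labels are
    # invented for illustration.

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    # A few manually classified paragraphs (Step 2).
    training_paragraphs = [
        "Money and markets shape the obligations of business owners.",
        "The law obliges governments to protect basic rights.",
        "Mothers, fathers, and children form the heart of the family.",
    ]
    training_labels = ["business", "government", "family"]

    # "Train" a model that captures word frequencies for each term (Step 3).
    model = make_pipeline(CountVectorizer(), MultinomialNB())
    model.fit(training_paragraphs, training_labels)

    # Automatically assign controlled vocabulary terms to new paragraphs (Step 4).
    new_paragraphs = ["Children depend on the care of their parents."]
    print(model.predict(new_paragraphs))  # e.g. ['family']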

The text above only outlines one possible “blueprint” for comparing & contrasting a corpus of Catholic social thought and human rights. Moreover, there are at least two other ways of addressing the issue. For example, it is entirely possible to “simply” read each & every document. After all, that is the way things have been done for millennia. Another possible solution is to apply natural language processing techniques to the corpus as a whole. For example, one could automatically count & tabulate the most frequently used words & phrases to identify themes. One could compare the rise & fall of these themes over time, geographic location, author, or publisher. The same thing can be done in a more refined way using parts-of-speech analysis. Along these same lines there are well-understood relevancy ranking algorithms (such as term frequency / inverse document frequency) allowing a computer to output the more statistically significant themes. Finally, documents could be compared & contrasted automatically through a sort of geometric analysis in an abstract and multi-dimensional “space”. These additional techniques are considerations for a phase two of the project, if it ever comes to pass.

Equinox Software: Evergreen 2013: Linus’s Law

planet code4lib - Mon, 2016-08-29 17:21

By 2013 Evergreen was, to coin a phrase, “nominally complete.”  It had gained the features needed to check off most of the right RFP boxes, and so be considered alongside other ILS’s with a significantly older code base.  Acquisitions and serials, along with circulation, cataloging, authority control, and the (underrated, in my opinion) booking functionality were all in place.  By this point it had a modern, pluggable OPAC infrastructure, integration with many 3rd party products to expand its functionality, and was attracting attention via non-traditional use cases such as publishing house backend systems.  So, we developers were done, right?

Not at all.

In years past, the development team working on Evergreen had been small, and grew slowly.  In important ways, though, that began to change around 2013.  Previously, having more than twelve distinct contributors in a month submitting code for inclusion in the master repository was quite rare, and usually happened right around the time when a new release was being polished.  But from late 2012 through all of 2013, 15-25 contributors became the rule and less than that was the exception.  That is a solid 20-30% increase, and is significant for any project.

At the software level this was a period of filing down rough edges and broadening the talent pool.  There were few truly massive technological advances but there were many, and varied, minor improvements made by a growing group of individuals taking time to dive deeper into a large and complex codebase.  Importantly, this included ongoing contributions from a Koha developer on a now-shared bit of infrastructure, the code we both use to parse searches against our respective catalogs.

In short, 2013 is the year that we began to truly realize one of the promises of Open Source, something that is attributed to Linus Torvalds of Linux fame.  Specifically that given enough eyeballs, all bugs are shallow.  What this means is that as your project adds users, testers, and developers, it becomes increasingly likely that bugs will be discovered early, classified quickly, and that the solution will be obvious to someone.

In some ways this can be a critical test for an Open Source project.  Many projects do not survive contact with an influx of new development talent.  For some projects, that is political.  For others, it is a consequence of early design decisions.  Fortunately, Evergreen passed that test, and that is in large part a credit to its community.  After seven years and significant scrutiny, Evergreen continued to improve and its community continued to grow.

— Mike Rylander, President

This is the eighth in our series of posts leading up to Evergreen’s Tenth birthday.

LITA: Transmission #8 – Return to Regularly Scheduled Programming

planet code4lib - Mon, 2016-08-29 15:00

Thank you to everyone who participated in my feedback survey! I have parsed the results (a little less than 100 responses) and I’m currently thinking through format changes.

I’ll give a full update on the changes to come and more after we conclude our initial ten interviews in October. Stay tuned, faithful viewers.

In today’s webisode, I am joined by one of my personal all-time favorite librarians and colleagues, Michael Rodriguez. Michael is Electronic Resources Librarian at the University of Connecticut. Enjoy his perspectives on one of my favorite topics, librarianship in the intersection of collections, technology, and discovery.

Begin Transmission will return September 12th.

Jonathan Rochkind: bittorrent for sharing enormous research datasets

planet code4lib - Mon, 2016-08-29 14:24

The project’s website says:

We’ve designed a distributed system for sharing enormous datasets – for researchers, by researchers. The result is a scalable, secure, and fault-tolerant repository for data, with blazing fast download speeds.

There are data sets listed from researchers at several respected universities, including the University of Michigan and Stanford.


Jonathan Rochkind: technical debt/technical weight

planet code4lib - Mon, 2016-08-29 14:22

Bart Wronski writes a blog post about “technical weight”, a concept related to but distinct from “technical debt.”  I can relate some of what he’s talking about to some library-centered open source projects I’ve worked on.

Technical debt… or technical weight?

…What most post don’t cover is that recently huge amount of technical debt in many codebases comes from shifting to naïve implementations of agile methodologies like Scrum, working sprint to sprint. It’s very hard to do any proper architectural work in such environment and short time and POs usually don’t care about it (it’s not a feature visible to customer / upper management)…


…I think of it as a property of every single technical decision you make – from huge architectural decisions through models of medium-sized systems to finally way you write every single line of code. Technical weight is a property that makes your code, systems, decisions in general more “complex”, difficult to debug, difficult to understand, difficult to change, difficult to change active developer.…


…To put it all together – if we invested lots of thought, work and effort into something and want to believe it’s good, we will ignore all problems, pretend they don’t exist and decline to admit (often blaming others and random circumstances) and will tend to see benefits. The more investment you have and heavier is the solution – the more you will try to stay with it, making other decisions or changes very difficult even if it would be the best option for your project.…

Islandora: iCampMO Instructors Announced

planet code4lib - Mon, 2016-08-29 13:34

Islandora Camp is heading down to Kansas City, courtesy of our hosts at the University of Missouri Kansas City. Camp will consist of three days: one day of sessions taking a big-picture view of the project and where it's headed (including big updates about Islandora CLAW), one day of hands-on workshops for developers and front-end administrators, and one day of community presentations and deeper dives into Islandora tools and sites. The instructors for that second day have been selected and we are pleased to introduce them:


Rosie Le Faive started with Islandora in 2012 while creating a trilingual digital library for the Commission for Environmental Cooperation. With experience and - dare she say - wisdom gained from creating highly customized sites, she's now interested in improving the core Islandora code so that everyone can use it. Her interests are in mapping relationships between objects, and intuitive UI design. She is the Digital Infrastructure and Discovery librarian at UPEI, and develops for Agile Humanities.  This is her second Islandora Camp as an instructor.

Jared Whiklo began working with Islandora in 2012. After stumbling and learning for a year, he began to give back to the community in late 2013. He has since assisted in both Islandora and Fedora releases and (to his own disbelief) has become an Islandora 7.x-1.x, Islandora CLAW, and Fedora committer. His day job is Developer with Digital Initiatives at the University of Manitoba Libraries. His night job is at the Kwik-E-Mart.


Melissa Anez has been working with Islandora since 2012 and has been the Community and Project Manager of the Islandora Foundation since it was founded in 2013. She has been a frequent instructor in the Admin Track and developed much of the curriculum, refining it with each new Camp.

Sandy Rodriguez is the Digital Special Collections Coordinator at the University of Missouri—Kansas City.  She has been working with Islandora for almost three years and currently serves as a member of the Islandora Metadata Interest Group and the Metadata Tools Subgroup.

LibUX: How to Write a User Experience Audit

planet code4lib - Mon, 2016-08-29 11:00

A User Experience Audit, or UX Audit for short, is something that should be conducted in the very beginning steps of a website, web application, dedicated app, or similar redesign project. Sometimes referred to as a deck or part of a design brief, UX Audits are typically done before user interface (UI) design occurs, and primarily consist of data intake, data compiling, research, and data visualization through presentation.

A UX Audit is simultaneously in-depth design research and a cursory presentation of data. The UX Audit doesn’t jump to conclusions or propose finite UI and UX mechanics, but rather evaluates the project’s current state in order to:

  • compile qualitative data
  • conduct peer evaluation
  • discover interactive pain points
  • evaluate quantitative data
  • identify accessibility errors
  • survey information architecture
  • point out any branding violations
  • propose additional UX testing

Ultimately the UX Audit should serve as a compilation of the previously mentioned research, identify what data is missing or would need to be captured going forward, and function as a point-of-departure for the next steps – which commonly would be sketching, wireframing, interactive wireframing, or prototyping depending on your development process.

The UX Audit is not a wireframe, it isn’t a design or user interface proposal, and it typically doesn’t highlight project requirements from stakeholders (although this is not uncommon). Additionally, a UX Audit’s summary does propose and provide recommendations based on the data compiled, but doesn’t do so in a way that graphically exceeds anything more than facilitating understanding (so no high fidelity solutions or graphics). As such, a UX Audit acts as a timestamp or bookmark as to a project’s history, and serves as documentation for improvement. UX Auditing is also the preferred mode for project development, which is the opposite of simply giving a project a ‘face lift’ without concern or regard to a project’s history or incremental improvement.

Once completed, the UX Audit and its recommendations are then given to a UI designer, front-end dev, web designer, or similar position who would begin designing, iterating, or wireframing (preferably in a medium that is as close as possible to the final deliverable). The UI professional would then be in a better position going forward to create wireframes (for example), and would be aware of the target audience, previous errors, and what data is present and what data is missing.

Possible Parts of a UX Audit

Depending on the project – be it a web application, website, or native app redesign – and on what data is available, each UX Audit is going to be different. Whenever possible, intake as much data as possible, because this data is a UX professional’s bread and butter, and they should spend a decent amount of time collecting, collating, filtering, interpreting, and visualizing it for stakeholders, developers, and UI professionals.

Although most UX Audits are essentially made from the same data and parts, they can follow any format or order. The following sections are some suggestions as to what can be included in a UX Audit:


Introduction

This is sometimes called an Executive Summary, a Project Overview, or even an Introduction to the Problem. Despite what it’s called, the Introduction serves the function of briefly and succinctly introducing the intent of the redesign, which people and departments are involved in the project, and the scope of the UX Audit. Also accompanying, or contained in, the Introduction are Research Objectives and a Table of Contents.

Research Objectives

The Research Objectives section highlights and presents the hard deliverables of the UX Audit, as well as sets up the expectations of the reader as to what research will be presented.

Competitor Analysis

The UX Audit’s competitor analysis section is usually derived from data from a parent organization or competitors. A parent organization could be a policy commission, an accrediting body, location peers, etc. As for competitors, these can be determined by competitive sales, customers, consumers, goods, or services – all of which are examined for usage and for how well they increase conversions.

The Competitor Analysis section can be comprised of a list of these peers, and hyperlinks to the similar projects’ website, web application, or native app. It also contains a features, functionality, and interactive elements comparison, in the form of percentages, and usually presents the comparisons through data visualization. This enables readers to see what percentage of peers have a responsive website, a homepage newsletter sign-up, sticky navigations, or other such features (which establishes baseline/industry UI standards).

Quantitative Data

Quantitative data refers to ‘the numbers’, or project traffic, usage, device/browser statistics, referrals, add-on blockers, social media sharing, and pathway mapping. All of this is quantitative data, is part of design research, and hopefully has already been set up on the project you are redesigning. Adobe Analytics and Google Analytics offer a lot of different solutions, but require a lot of customization, learning, or a significant financial investment. The following is a list of common industry software for this type of quantitative data:

Qualitative Data

Qualitative Data usually refers to the customer experience (CX) side of data, and can contain customer behavior, demographics, feedback, and search terms. This is usually derived from surveying mechanisms like Qualtrics or SurveyMonkey, embedded feedback tools like Adobe Experience Manager, search engine optimization (SEO) information from titles, advertising spend, and metadata tracking over time, and Google Trends and Google Insights.


Accessibility

The Accessibility portion of a UX Audit should contain both WCAG 2.0 AA and AAA errors, color contrast checking for fonts and UI mechanisms against their backgrounds, and even JavaScript errors that appear in the console log. Software used to help with this portion of the UX Audit includes WebAIM: Color Contrast Checker, WAVE: Web Accessibility Evaluation Tool, and any web browser’s web development toolbar’s JavaScript console window.
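
For the color contrast checks mentioned above, the underlying WCAG 2.0 contrast-ratio formula (which tools like the WebAIM checker apply for you) can be sketched as follows; colors are assumed here to be plain 0-255 RGB triples, and the function names are invented for the example.

    # Rough sketch of the WCAG 2.0 contrast-ratio calculation that tools like
    # the WebAIM Color Contrast Checker perform.  Colors are (R, G, B) tuples
    # in the 0-255 range.

    def relative_luminance(rgb):
        def channel(c):
            c = c / 255.0
            return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
        r, g, b = (channel(c) for c in rgb)
        return 0.2126 * r + 0.7152 * g + 0.0722 * b

    def contrast_ratio(fg, bg):
        l1, l2 = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
        return (l1 + 0.05) / (l2 + 0.05)

    ratio = contrast_ratio((0, 0, 0), (255, 255, 255))
    print(round(ratio, 2))           # 21.0 for black text on a white background
    print(ratio >= 4.5, ratio >= 7)  # passes the AA (4.5:1) and AAA (7:1) thresholds for normal text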

Interactive Pain Points

Interactive Pain Points refers to egregious UI and UX errors, non-typical animations or interactions, and unexpected functionality. This can range from forms and buttons being too small to click on a touch-based or mobile device and dysfunctional carousel buttons, all the way to hover navigations being jerky and counter-intuitive forms wherein the labels cover up the input fields. This is usually best presented in the UX Audit through screenshots or videos, with annotations about what is occurring in contrast to users’ expectations.

Brad Frost has an excellent Interface Inventory Checklist available on Google Docs; this is a great place to start to know what to look for, and what interactions to examine for improvement. A checklist like the one he shared is very helpful, but the most important thing is to demonstrate things like inconsistent button sizing, or if interaction elements are confusing/not functioning.

Information Architecture

Information Architecture (IA) is the structural design of information, interaction, and UX with the goals of making things both findable and discoverable. This part of the UX Audit focuses on the findability and discoverability of navigation items, general content strategy deficiencies, reading levels, and label auditing.

For example, analyzing the IA of a project could potentially identify that label auditing for primary and secondary navigation items, quick links, buttons, and calls-to-action is necessary. Analyzing IA could also demonstrate that the project’s Flesch readability score – a score which uses the sentence length and the number of syllables per word in an equation to calculate the reading ease – shows the content isn’t written at an 8th grade level (or that your content with specific instructions requires a 6th grade reading level). For more information, the Nielsen Norman Group has a great article about legibility, readability, and comprehension, and anyone can use Microsoft Word to analyze content’s Flesch readability scores.
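
Since the Flesch score comes up above, here is a small sketch of the standard Flesch Reading Ease formula; the syllable counter is a crude vowel-group approximation, so treat the output as a rough estimate rather than what a tool like Microsoft Word would report.

    # Sketch of the Flesch Reading Ease formula mentioned above.  The syllable
    # counter is a crude vowel-group approximation; real tools use better
    # heuristics, so the result is only a rough estimate.

    import re

    def count_syllables(word):
        return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

    def flesch_reading_ease(text):
        sentences = max(1, len(re.findall(r"[.!?]+", text)))
        words = re.findall(r"[A-Za-z']+", text)
        syllables = sum(count_syllables(w) for w in words)
        return 206.835 - 1.015 * (len(words) / sentences) - 84.6 * (syllables / len(words))

    sample = "The cat sat on the mat. It was happy."
    print(round(flesch_reading_ease(sample), 1))  # higher scores read more easily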

Branding Violations

This mainly depends on an organization or company’s established style guides and pattern library. If the project being redesigned is particularly old and in need of a UX Audit, there may be a lot of colors, font families, interaction elements, and UX patterns that are out of sync. If a company or organization doesn’t have a set style guide or pattern library, maybe that’s the best place to start before a UX Audit. The following are some really great style guides and pattern libraries from companies, entities, and organizations you might know already:


Performance

If the project that’s being redesigned is a website, web application, or dynamically pulls information through JavaScript data binding, performance should be factored into a UX Audit. Michael Schofield has a great article on LibUX about users having different connection speeds – millennials all the way to broadband users – and main figures in the field of UX speak about the importance of performance all of the time.

“Over the past few months, conversations about Responsive Web design have shifted from issues of layout to performance. That is, how can responsive sites load quickly -even on constrained mobile networks.” Luke Wroblewski
Product Director at Google

When conducting a UX Audit, the Chrome web browser’s DevTools has a ‘Network’ and ‘Timeline’ view that analyzes and displays the loading of assets – images, scripts, code libraries, external resources, etc. – in real time. This information can and should be included in a UX Audit to document project load times, emulate different network conditions, verify any load time issues, and ultimately point out potential pain points for users.


Google Insights or even PageFair is desirable. This is the place in the UX Audit where a UX professional really gets to shine, because they have already demonstrated their data collection and presentation skills, and now they get to advise the stakeholders, UI, and development teams on what steps and UI decisions should be taken going forward.

How can UX Audits be used?

UX Audits can and should be incorporated as an unbiased and essential part of any redesign project. Many times a UX professional also has to be a librarian, or a UI designer, or even a front-end developer – so it’s easy to skip this important step of the redesign process, with limited staff and short deadlines.

However, performing a UX Audit will enable you to slow down, focus primarily on UX for a change, and perform an audit that will provide a lot of valuable information for stakeholders, designers, and developers. This will ultimately make everyone’s job easier, and what’s wrong with working smarter rather than harder?

LibUX: Crafting Websites with Design Triggers

planet code4lib - Mon, 2016-08-29 07:00

A design trigger is a pattern meant to appeal to behaviors and cognitive biases observed in users. Big data and the user experience boom have provided a lot of information about how people actually use the web, which designs work, and – although creepy – how it is possible to cobble together an effective site designed to social engineer users.

This episode is an introduction from a longer talk in which I introduce design triggers as a concept and their reason for being.

Help us out and say something nice. Your sharing and positive reviews are the best marketing we could ask for.

If you like, you can download the MP3 or subscribe to LibUX on Stitcher, iTunes, YouTube, Soundcloud, Google Play Music, or just plug our feed straight into your podcatcher of choice.



Terry Reese: MarcEdit Mac Update–Inclusion of Console Mode

planet code4lib - Mon, 2016-08-29 03:50

One of the gaps in the Mac version of MarcEdit has been the lack of a console mode.  This update should correct that.  However, a couple of things about how this works…

1) Mac applications are bundles, so in order to run the console program you need to run against the application bundle.  What does this look like?   From the terminal, one would run
>>/Applications/ --console

The --console flag initializes the terminal application and prompts for file names.  You can pass the filenames (these must be fully qualified paths at this point) via command-line arguments rather than running in an interactive mode.  For example:
>>/Applications/ --s /users/me/Desktop/source.mrc --d /users/me/Desktop/output.mrk --break

The above would break a MARC file into the mnemonic format.  For a full list of console commands, enter:
>>/Applications/ --help

In the future, the MarcEdit install program will set an environment variable ($MARCEDIT_PATH) on installation.  At this point, I recommend opening your .bash_profile and adding the following line:
export MARCEDIT_PATH=/Applications/

You can get this download from: 


Library Tech Talk (U of Michigan): HTTPS (Almost) Everywhere

planet code4lib - Mon, 2016-08-29 00:00

The University of Michigan Library pledges to update its major websites to use secure (HTTPS) connections between the servers and web browsers by December 2016.

FOSS4Lib Recent Releases: Metaproxy - 1.11.5

planet code4lib - Fri, 2016-08-26 20:55

Last updated August 26, 2016. Created by Peter Murray on August 26, 2016.

Package: Metaproxy
Release Date: Friday, August 26, 2016

District Dispatch: Google Policy Fellow: my OITP summer

planet code4lib - Fri, 2016-08-26 20:42

guest post by Nick Gross, OITP’s 2016 Google Policy Fellow

This summer I worked as a Google Policy Fellow at the American Library Association’s Office for Information Technology Policy (OITP) in Washington, D.C. The Google Policy fellowship gives undergraduate, graduate, and law students the opportunity to spend the summer working at public interest groups engaged in Internet and technology policy issues.


As a fellow, my primary role at OITP was to prepare tech policy memos to submit to the incoming presidential administration. The goal is to inform policymakers about ALA’s public policy concerns, including why, and to what extent, ALA has an interest in specific tech issues and what the next policies should look like. With balanced, future-looking information and tech policies, libraries can continue to enable Education, Employment, Entrepreneurship, Empowerment, and Engagement for their patrons— The E’s of Libraries. To that end, I drafted a brief on telecommunications issues and one on copyright issues.

The telecommunications brief addresses the importance of broadband Internet to libraries. In particular, a robust broadband infrastructure ensures that libraries can continue to provide their communities with equitable access to information and telecommunications services, as well as serve residents with digital services and content via “virtual branches.” Through the Federal Communications Commission’s Universal Service Fund (USF), which includes the E-Rate program, the Lifeline program, and the Connect America Fund, libraries and underserved or unserved communities are better able to enjoy access to affordable high-capacity broadband. And greater broadband competition and local choice increase broadband deployment, affordability, and adoption for libraries and their communities, while opening up more unlicensed spectrum for Wi-Fi expands broadband capacity so libraries can better serve their communities. Moreover, libraries sometimes provide the only Internet access points for some communities and they play an important role in digital inclusion efforts. Finally, because libraries use the Internet to research, educate, and create and disseminate content, as well as provide no-fee public access to it, they highly value the FCC’s 2015 Open Internet Order which helps guarantee intellectual freedom and free expression, thereby promoting innovation and the creation and exchange of ideas and content.

As copyright lies at the core of library operations, OITP advocates for law that fulfills the constitutional purpose of copyright—namely, a utilitarian system that grants “limited” copyright protection in order to “promote the progress of science and useful arts.” The copyright brief calls for a balanced copyright system in the digital age that realizes democratic values and serves the public interest. The first sale doctrine enables libraries to lend books and other materials. The fair use doctrine is critical to libraries’ missions, as it enables the “free flow of information,” fostering freedom of inquiry and expression; for instance, it enables libraries to use so-called “orphan works” without fear of infringement liability. Moreover, libraries are at the forefront of archiving and preservation, using copyright law’s exceptions to make reproductions and replacements of works that have little to no commercial market or that represent culturally valuable content in the public domain. Libraries also enjoy protections against liability under the Section 512 Safe Harbors in the Digital Millennium Copyright Act (DMCA).

My brief on copyright issues also highlights specific challenges that threaten libraries’ mission to provide the public with access to knowledge and upset the careful balance between copyright holders and users. For instance, e-licensing and digital rights management (DRM) under section 1201 of the DMCA, as well as the section 1201 rulemaking process, limit libraries’ ability to take full advantage of copyright exceptions, from fair use to first sale to preservation and archiving. ALA also advocates for the ratification and implementation of the World Intellectual Property Organization’s “Marrakesh Treaty” to facilitate access to published works for persons who are blind, visually impaired, or otherwise print disabled.

In addition to my policy work, Google’s bi-weekly meetings at its D.C. headquarters shed light on the public policy process. At each event, Google assembled a panel of experts composed of its own policy-oriented employees and other experts from public interest groups in D.C. Topics ranged from copyright law to broadband deployment and adoption to Net Neutrality. During the meetings, I also enjoyed the opportunity to meet the other Google fellows and learn about their work.

My experience as a Google Policy Fellow at OITP taught me a great deal about how public interest groups operate and advocate effectively. For instance, I learned how public interest groups collaborate together and form partnerships to effect policy change. Indeed, ALA works, or has worked, with groups like the Center for Democracy & Technology to advocate for Net Neutrality, while advancing public access to information as a member of the Re:Create Coalition and the Library Copyright Alliance. As a founding member of the Schools, Health & Libraries Broadband Coalition and WifiForward, ALA promotes Internet policies, such as the modernization of the USF. Not only did I gain a deeper insight into telecommunications law and copyright law, I also developed an appreciation as to how such laws can profoundly impact the public interest. I’d highly recommend the Google Policy Fellowship to any student interested in learning more about D.C.’s policymaking in the tech ecosystem.


Jez Cope: Software Carpentry: SC Config; write once, compile anywhere

planet code4lib - Fri, 2016-08-26 18:47

Nine years ago, when I first released Python to the world, I distributed it with a Makefile for BSD Unix. The most frequent questions and suggestions I received in response to these early distributions were about building it on different Unix platforms. Someone pointed me to autoconf, which allowed me to create a configure script that figured out platform idiosyncracies. Unfortunately, autoconf is painful to use – its grouping, quoting and commenting conventions don’t match those of the target language, which makes scripts hard to write and even harder to debug. I hope that this competition comes up with a better solution — it would make porting Python to new platforms a lot easier!
Guido van Rossum, Technical Director, Python Consortium (quote taken from SC Config page)

On to the next Software Carpentry competition category, then. One of the challenges of writing open source software is that you have to make it run on a wide range of systems over which you have no control. You don’t know what operating system any given user might be using or what libraries they have installed, or even what versions of those libraries.

This means that whatever build system you use, you can’t just send the Makefile (or whatever) to someone else and expect everything to go off without a hitch. For a very long time, it’s been common practice for source packages to include a configure script that, when executed, runs a bunch of tests to see what it has to work with and sets up the Makefile accordingly. Writing these scripts by hand is a nightmare, so tools like autoconf and automake evolved to make things a little easier.

They did, and if the tests you want to use are already implemented they work very well indeed. Unfortunately they’re built on an unholy combination of shell scripting and the archaic GNU M4 macro language. That means if you want to write new tests you need to understand both of these as well as the architecture of the tools themselves — not an easy task for the average self-taught research programmer.

SC Config, then, called for a re-engineering of the autoconf concept, to make it easier for researchers to make their code available in a portable, platform-independent format. The winner of this category in the second round was SapCat, “a tool to help make software portable”. Unfortunately, this one seems not to have gone anywhere, and I could only find the original proposal on the Internet Archive.

There were a lot of good ideas in this category about making catalogues and databases of system quirks, to avoid having to rerun the same expensive tests the way a standard ./configure script does. I think one reason none of these ideas survived is that they were overly ambitious, imagining a grand architecture in which their tool would provide some overarching source of truth. This is in stark contrast to the way most Unix-like systems work, where each tool does one very specific job well and tools are easy to combine in various ways.

In the end though, I think Moore’s Law won out here, making it easier to do the brute-force checks each time than to try anything clever to save time — a good example of avoiding unnecessary optimisation. Add to that the evolution of the generic pkg-config tool from earlier package-specific tools like gtk-config, and it’s now much easier to check for particular versions and features of common packages.
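To make that concrete, here is a minimal sketch of how a small build helper might lean on pkg-config rather than on hand-rolled configure tests. It is written in Python purely for illustration; the package name and version are placeholders, and the pkg-config options used (--exists, --cflags, --libs) are the standard ones.

```python
import subprocess
import sys

def pkg_config(package, min_version=None):
    """Return (cflags, libs) for a package found via pkg-config, or None if missing."""
    # pkg-config exits non-zero when the module (or requested version) is absent.
    query = package if min_version is None else f"{package} >= {min_version}"
    if subprocess.run(["pkg-config", "--exists", query]).returncode != 0:
        return None
    cflags = subprocess.run(["pkg-config", "--cflags", package],
                            capture_output=True, text=True).stdout.split()
    libs = subprocess.run(["pkg-config", "--libs", package],
                          capture_output=True, text=True).stdout.split()
    return cflags, libs

if __name__ == "__main__":
    # "libpng" and "1.6" are placeholders for whatever your code depends on.
    found = pkg_config("libpng", min_version="1.6")
    if found is None:
        sys.exit("libpng >= 1.6 not found; please install it")
    cflags, libs = found
    print("compile with:", " ".join(cflags + libs))
```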

On top of that, much of the day-to-day coding of a modern researcher happens in interpreted languages like Python and R, which give you a fully-functioning pre-configured environment with a lot less compiling to do.

As a side note, Tom Tromey, another of the shortlisted entrants in this category, is still a major contributor to the open source world. He still seems to be involved in the automake project, contributes a lot of code to the emacs community too and blogs sporadically at The Cliffs of Inanity.

LibUX: Meaningfully Judging Performance in Terms of User Experience

planet code4lib - Fri, 2016-08-26 16:00

Much about user experience design is concerned with subjective improvements to language and structure, style, tone. The bulk of our quantitative data is used toward these purposes — and, of course, being user-centric is precisely what that data is for. The role of the user experience designer is closely associated with these sorts of improvements at the surface of our websites, at the obvious touchpoints between patron and library. Unfortunately, this focus can neglect deep systemic or technical pain points to which “design” is wrongfully oblivious but which are fundamental to good user experience.

Speed is a major example. Website performance is crucial enough that, when it is poor, the potential for even the best designs to convert is diminished. The most “usable” website can have no effect if it fails to load when and in the way users expect it to.

One thing we can be thankful for when improving the performance of a website is that while “more speed” definitely has a strong impact on the user experience, it is also easy to measure. Look and feel, the “oomph” of meaningful, quality content, navigability, and usability each have their own quantitative metrics, like conversion or bounce rate, time watched, and so on. But at best these aspects of web design are objective-ish: the numbers hint at a possible truth, but the measurements only weather scrutiny when derived from real, very human, users.

A fast site won’t make up for other serious usability concerns, but since simple performance optimization doesn’t necessarily require any actual users, it lends itself to projects constrained by time or budget, or those otherwise lacking the human resources needed to observe usage, gather feedback, and iterate. The ideal cycle of “tweak, test, rinse, and repeat” is in some cases not possible. Few user experience projects return as much bang for the buck as site optimization, and it can be baked into the design and development process early and with known—not guessed-at, nor situational—results.

The signals

When it comes to site optimization, there is no shortage of signals to watch. There is a glut of data right in the browser about the number of bytes in, script or style file size, network status codes, drop-shadow rendering, frames per second, and so on. Tim Kadlec, author of Implementing Responsive Design, broke a lot of these down into meaningful measurements in a series of articles over the last couple of years oriented around the “performance budget.”

A performance budget is just what it sounds like: you set a “budget” on your page and do not allow the page to exceed that. This may be a specific load time, but it is usually an easier conversation to have when you break the budget down into the number of requests or size of the page.
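In practice, a budget can be as simple as a handful of thresholds that a build or reporting step compares against measured values. A rough sketch follows; the numbers are entirely made up for illustration, not recommendations.

```python
from dataclasses import dataclass

@dataclass
class PerformanceBudget:
    # Illustrative thresholds only; every project will set its own.
    max_requests: int = 60
    max_page_weight_kb: int = 1024
    max_speed_index: int = 3000  # lower is better (see the Speed Index section below)

    def violations(self, requests, page_weight_kb, speed_index):
        """Return human-readable descriptions of any blown budget lines."""
        problems = []
        if requests > self.max_requests:
            problems.append(f"{requests} requests (budget {self.max_requests})")
        if page_weight_kb > self.max_page_weight_kb:
            problems.append(f"{page_weight_kb} KB (budget {self.max_page_weight_kb} KB)")
        if speed_index > self.max_speed_index:
            problems.append(f"Speed Index {speed_index} (budget {self.max_speed_index})")
        return problems

# Example: flag a hypothetical page whose measurements exceed the budget.
issues = PerformanceBudget().violations(requests=85, page_weight_kb=2300, speed_index=4200)
if issues:
    print("Over budget:", "; ".join(issues))
```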

Such a strategy really took root in the #perfmatters movement, spurred by folks repulsed by just how fast the web was getting slower. Their observation was that because the responsive web was becoming increasingly capable and high pixel density screens were the new norm, developers making cool stuff sent larger and larger file sizes through the pipes. While by definition responsive websites can scale for any screen, they were becoming cumbersome herky-jerky mothras for which data was beginning to show negative impacts.

In his talk in 2013, “Breaking the 1000ms Time to Glass Mobile Barrier” — and, later, his book High Performance Browser Networking — Ilya Grigorik demonstrated users’ reactions to even milliseconds-long delays:

Delay            User Reaction
0 – 100 ms       Instant
100 – 300 ms     Feels sluggish
300 – 1000 ms    Machine is working…
1 s +            Mental context switch
10 s +           I’ll come back later…

Since then, the average page weight has grown 134 percent (186 percent since 2010). Poor performance is such a drag on what might otherwise be a positive user experience—encapsulated by a July 2015 article in The Verge, “The Mobile Web Sucks”—that the biggest players in the web game (Facebook and Google) have reacted dramatically, either enforcing design restrictions on the SEO-sensitive developer or removing the dev’s influence entirely.

Comparison of average bytes per content type in November 2010 (left) and November 2015 (right).

Self-imposed performance budgets are increasingly considered best practice, and—as mentioned—there are different ways to measure its success. In his write-up on the subject, Tim Kadlec identifies four major categories:

  • Milestone timings
  • Rule based metrics
  • Quantity based metrics
  • Speed index

Milestone Timings

A milestone in this context is a number like the time in seconds until the browser reaches the load event for the main document, or, for instance, the time until the page is visually complete. Milestones are easy to track, but there are arguments against their usefulness. Pat Meenan writes in the WebPagetest documentation that a milestone “isn’t a very good indicator of the actual end-user experience.”

As pages grow and load a lot of content that is not visible to the user or off the screen (below the fold) the time to reach the load event is extended even if the user-visible content has long-since rendered… [Milestones] are all fundamentally flawed in that they measure a single point and do not convey the actual user experience.

Rule Based and Quantity Based Metrics

Rule based metrics check a page or site against an existing checklist, using a tool like YSlow or Google PageSpeed to grade your site. Quantity based metrics, on the other hand, include a lot of the data reported by outlets like the HTTP Archive. These include the total number of requests, overall page weight, and even the size of the CSS file. Not all of these metrics indicate poor performance, but they are useful for conceptualizing the makeup of a page and where efforts at optimization can be targeted. If the bulk of the page weight is chalked up to heavy image use, then perhaps there are image-specific techniques you can use for stepping up the pace.

Example of a library web page graded by YSlow.
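Quantity based metrics are also easy to compute yourself from an HTTP Archive (HAR) export, which most browsers and testing tools can produce. A rough sketch that tallies request count and page weight by content type, using the standard HAR fields (log.entries, response.bodySize, response.content.mimeType); the output could then be checked against a budget like the earlier sketch.

```python
import json
from collections import Counter

def quantity_metrics(har_path):
    """Summarise request count and transferred bytes from a HAR export."""
    with open(har_path) as f:
        entries = json.load(f)["log"]["entries"]

    bytes_by_type = Counter()
    for entry in entries:
        response = entry["response"]
        size = max(response.get("bodySize", 0), 0)  # -1 means "unknown" in HAR
        mime = response["content"].get("mimeType", "unknown").split(";")[0]
        bytes_by_type[mime] += size

    return {
        "requests": len(entries),
        "page_weight_kb": sum(bytes_by_type.values()) // 1024,
        "kb_by_type": {m: b // 1024 for m, b in bytes_by_type.most_common()},
    }

# e.g. print(quantity_metrics("homepage.har"))
```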

Speed Index

Speed Index is set apart by its attempt to measure the experience to which Pat Meenan referred: an algorithm determines how much above-the-fold content is visually complete over time, then assigns a score. This is not a timing metric, but Meenan explains:

the ‘area above the curve’ calculated in ms and using 0.0–1.0 for the range of visually complete. The calculation looks at each 0.1s interval and calculates IntervalScore = Interval * ( 1.0 – (Completeness/100)) where Completeness is the percent visually complete for that frame and Interval is the elapsed time for that video frame in ms… The overall score is just a sum of the individual intervals.

Basically, the faster the website loads above the fold, the faster the user can start to interact with the content. A lower score is better, and the score is read as milliseconds: a score of “1000” roughly means that a user can start to use the website after just one second. So if other metrics measure Time To Load (TTL), then Speed Index measures Time To Interact (TTI), which may be a more meaningful signal.
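Meenan’s interval calculation is compact enough to restate in code. A minimal sketch, assuming you already have per-frame visual completeness percentages (for example, from a filmstrip capture) sampled at 100 ms intervals:

```python
def speed_index(completeness_by_frame, interval_ms=100):
    """
    Sum of interval * (1.0 - completeness / 100) over each captured frame,
    per the WebPagetest definition quoted above. Lower is better.
    """
    return sum(interval_ms * (1.0 - pct / 100.0) for pct in completeness_by_frame)

# A hypothetical page whose above-the-fold content finishes rendering at about 1 second:
frames = [0, 0, 10, 25, 45, 45, 60, 80, 90, 95, 100, 100, 100, 100, 100]
print(speed_index(frames))  # 550.0: most visible content arrived well under a second
```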

TTI encapsulates an important observation, acknowledged even by quantitative-data nerds, that web performance is tied just as much to the psychology of time and the perception of speed as it is to the speed of the network. If we look at page speed as a period of waiting, then how the user waits plays a role in how that wait is experienced. As Denys Mishunov writes in an article about “Why Performance Matters,” the wait is either active or passive:

The period in which the user has no choice or control over the waiting time, such as standing in line or waiting for a loved one who is late for the date, is called a passive phase, or passive wait. People tend to estimate passive waiting as a longer period of time than active, even if the time intervals are objectively equal.

For example, during my recent involvement with an academic library homepage redesign, our intention was that it would serve as thin a buffer as possible between the students or faculty and their research. This not only involved bringing search tools and content from deeper in the website to the forefront, but also reducing any barrier or “ugh” factor when engaging with them—such as time. Speed Index has a user-centric bias in that its measurement approximates the time the user can interact with—thus experience—the site. And it is for this reason we adopted it as a focal metric for our redesign project.

A report from Google Pagespeed.

Quick tangent tutorial: measuring Speed Index with WebPagetest

Google develops and supports WebPagetest, an online open-source web performance diagnostic tool, which uses virtual machines to simulate websites loading on various devices and in various browsers, throttling the network to demonstrate load times over slower or faster connections, and much more. Its convenience and ease of use make it an attractive tool. Generating a report requires neither browser extensions nor prior experience with in-browser developer tools. WebPagetest, like its alternatives, incorporates rule-based grading and quantity metrics, but it was also the first to introduce Speed Index, which can be measured by telling it to “Capture Video.”

WebPagetest returns a straightforward report card summarizing the performance results of its tests, including a table of milestones alongside speed indices. The tool provides results for “First View” and “Repeat View,” which demonstrates the role of the browser cache. These tests are remarkably thorough in other ways as well, including screen captures, videos, waterfall charts, content breakdowns, and optimization checklists.
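For those who would rather script the process than use the web form, WebPagetest also exposes an HTTP API (the public instance requires an API key). The sketch below kicks off a test and pulls the Speed Index from the results; the endpoints and field names reflect my understanding of the public API, so verify them against the current WebPagetest documentation before relying on them.

```python
import time
import requests  # third-party: pip install requests

WPT = "https://www.webpagetest.org"
API_KEY = "YOUR_API_KEY"  # placeholder

def measure_speed_index(url):
    # video=1 asks WebPagetest to capture the filmstrip that Speed Index is derived from.
    start = requests.get(f"{WPT}/runtest.php", params={
        "url": url, "k": API_KEY, "f": "json", "video": 1,
    }).json()
    test_id = start["data"]["testId"]

    # Poll until the test completes; statusCode 200 indicates results are ready.
    while True:
        result = requests.get(f"{WPT}/jsonResult.php", params={"test": test_id}).json()
        if result.get("statusCode") == 200:
            break
        time.sleep(10)

    first_view = result["data"]["runs"]["1"]["firstView"]
    return first_view["SpeedIndex"], first_view["loadTime"]

# speed_index_ms, load_time_ms = measure_speed_index("https://library.example.edu")
```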

It’s worth noting that these kinds of diagnostics can be run by other tools at either end of development. A Google PageSpeed Insights report can be generated in the same way: type a URL and run the report. But folks can also install PageSpeed’s Apache and Nginx modules to optimize pages automatically, or otherwise integrate PageSpeed or YSlow into the build process with grunt tasks. The bottom line is that these kinds of performance diagnostics can be run wherever it is most convenient, at different depths, whether you prefer to approach them as a developer or not. They can be integrated into the workflow or used ex post facto, as needed.

The order in which the elements load matters

Of course, the user’s experience of load times is not only about how long it takes any interactive elements of the page to load but how long it takes certain elements to load. Radware’s recent report “Speed vs. Fluency in Website Loading: What Drives User Engagement” shows that “simply loading a page faster doesn’t necessarily improve users’ emotional response to the page.” They outfitted participants with neuroimaging systems and eye-trackers (mounted on monitors) in an attempt to objectively measure things like cognitive load and motivation. In the study, the same web page was loaded using three different techniques:

  1. the original, unaltered loading sequence,
  2. the fastest option, where the techniques used provided the most demonstrably fast load times regardless of rendering sequence,
  3. a version where the parts of the page most important to what the user wanted to accomplish were loaded first.

Results of Radware’s study on how users process web pages during rendering

In six out of ten pages, sequencing the loading of elements based on their importance to a primary user task affected overall user engagement, as measured by total fixation time.

While not overwhelming, the results suggest that, depending on the type of website, rendering sequence can play an important role in the “emotional and cognitive response and at which order [users] will look at different items.” Radware makes no suggestions about which rendering sequences work for which websites.

Still, the idea that cherry-picking the order in which things load on the page might decrease cognitive load (especially on an academic library homepage where the primary user task is search) is intriguing.

Earmark a Performance Budget

Anyway, this is getting a little long. All this is to say that there are all sorts of improvements that can be made to library websites that add value to the user experience. Prioritizing among these involves any number of considerations. But while it may take a little extra care to optimize performance, it’s worth the time for one simple reason: your users expect your site to load the moment they want it.

This sets the tone for the entire experience.

Copyrights. So, this article originally appeared in Weave: Journal of Library User Experience, in an issue alongside people I really respect writing about anticipatory design and performance. It’s licensed under a Creative Commons Attribution 3.0 license. I made some changes up there and embedded some links, but for the most part the article is in its original form.

District Dispatch: New season, new entrepreneurship opportunities

planet code4lib - Fri, 2016-08-26 13:55

A young girl works in the “Fab Lab” at Orange County Library System’s Dorothy Lumley Melrose Center. Photo credit: Orange County Library System.

This is a strange time of year. The days are still long and hot – at least here in D.C. – but the Labor Day promos and pre-season football games signal the start of a new season. It’s around this time that I usually reflect on the waning summer. Having just gotten back from a long vacation at the beach, I’ve had plenty of time for reflection on the past year. Professionally, I’ve focused heavily on a single topic these past few months: entrepreneurship.

In late June, months of research, outreach, and writing culminated in OITP’s release of a white paper on the library community’s impact on the entrepreneurship ecosystem. The paper brought together data and cases from across the country to outline the bevy of services academic and public libraries offer entrepreneurs. We called the paper “The People’s Incubator.” You don’t have to read the text to recognize the accuracy of this metaphor for describing the role the library community plays in helping people bring innovative ideas to life. Libraries are, and have always been, creative spaces for everyone. Since the analog era, library programs and services have encouraged all people to convert notions into innovations.

But, the more time that passes since the paper’s release, the more I feel the “People’s Incubator” moniker isn’t quite adequate to describe what the modern library community does in today’s entrepreneurship space. It does justice to the creative power of library resources, but it doesn’t convey the steadiness of the support the library community offers entrepreneurs at every turn. At each stage of launching and running a business – planning, fundraising, market analysis and more – libraries are equipped to offer assistance. Business plan competitions, courses on raising capital, research databases, census records, prototyping and digital production equipment, business counseling and intellectual property information all combine to round out the picture of the entrepreneurship services available at the modern library.

A facility offering these services is not just an incubator – it’s a constant companion; a hand to hold while navigating a competitive and often unforgiving ecosystem. And the more I read about library entrepreneurship activities, the more convinced I become that influencers across all sectors should leverage the robust resources libraries provide entrepreneurs to encourage innovation across the country. In just the few months since we published the paper, I have found one example after another of libraries’ commitment to developing a more democratic and robust entrepreneurship ecosystem. In addition to the examples described in the paper, recent library partnerships illustrate the entrepreneurship synergies the library community can help create.

The New York Public Library (NYPL) recently partnered with the 3D printing service bureau Shapeways to develop curricula for teaching the entrepreneurial applications of 3D printing. The curricula will be piloted in a series of NYPL courses in the fall of 2016, and then publicly released under an open license. Continued partnerships between libraries and tech companies like this one will advance the capacity of libraries to build key skills for the innovation economy.

For over a year, the Memphis Public Library has been a key partner in a citywide effort to boost start-up activity. Working with colleges, universities and foundations, the library’s resources and programming have helped the Memphis entrepreneurship ecosystem create hundreds of jobs. Libraries can and should continue to be a major part of these sorts of collaborations.

With support from the Kendrick B. Melrose Family Foundation, the Orange County Library System in Orlando opened the Dorothy Lumley Melrose Center in 2014. The Center offers video and audio production equipment, 3D printers, Arduino boards and other electronics, and a host of tech classes – all of which individuals can use to launch new innovations and build key skills for the modern economy.

Through a partnership between the Montgomery County Public Library and the Food and Drug Administration (FDA), 80 teens had the opportunity to work in teams this summer to design their own mobile medical apps. The teens recently “pitched” their apps to a panel of judges at the FDA’s main campus in Silver Spring, Maryland. They’ve also gotten the chance to visit the White House.

Beyond partnerships between libraries, private firms, government agencies, academic institutions and foundations, library collaborations with Small Business Development Centers – federally supported entrepreneurship assistance facilities – continue to be publicly highlighted.

So, if I’ve learned anything from my summer of entrepreneurship, it’s this: libraries, as constant companions for entrepreneurs, are natural partners for the many public, private, non-profit and academic actors that work to advance the innovation economy. We will trumpet this important message in the coming weeks and months, as we work to alert policymakers to the important work of libraries ahead of the November elections. To do that, we need good examples of library efforts to advance start-up activities. Share yours in the comments section!

The post New season, new entrepreneurship opportunities appeared first on District Dispatch.


Subscribe to code4lib aggregator