planet code4lib

Hydra Project: Announcing Avalon 3.2

Mon, 2015-01-12 13:33

Indiana University and Northwestern University are delighted to announce Avalon Media System 3.2, completed and released on December 19, 2014. As part of a series of minor 3.x releases, Avalon 3.2 provides support for important content management efficiencies and other improvements.

Release 3.2 adds the following capabilities:

  • Bulk item management actions, including publish, un-publish, change collection, delete, and assign access
  • Avalon dropbox subdirectories are accessible to collection managers using the web interface
  • Upgrade to Hydra 7 framework
  • Numerous interface improvements and bug fixes

For more details on each of these new features, visit the What’s New in Avalon 3.2 wiki page:

LibUX: “Social” the Right Way is a Timesuck

Mon, 2015-01-12 12:16

Pew Research Center’s 2014 Social Media Update, published Friday, validates pretty much any argument libraries have to make for actively reaching out through social media. Your audience is there.

58% of ALL [U.S.] adults are on Facebook. 31% of ALL seniors are on Facebook. #libweb #libux

— Library UX Data (@libuxdata) January 9, 2015

This is a numbers game. Whether libraries should be there is sort of head-scratchingly moot, but brand decisions about which social bandwagon to jump on [1] should be made only when libraries are prepared to commit real resources to their upkeep. When I say “resources,” I mostly mean time – but marketing dollars are not misspent on Facebook ads.

Crafting good content is not an insubstantial timesuck. Knowing your audience, time spent analyzing metrics, helps mitigate people’s capacity to detect bullshit. And this is important. Poor content not only reflects poorly on your library, but for channels like Facebook that highlight popular or relevant content, posts that bomb negatively impact the overall visibility of your brand.

A basic level of engagement requires just the right amount of content, too. Part of this just has to do with currency, right? Old content tends to roll off. It’s too many thumb swipes down. Wisemetrics finds that, on average, the half-life of a Facebook post is about 90 minutes. Hell, a tweet is lost to the void in just 18 minutes. The point is that you have to post regularly to stay on people’s radar – and, for Facebook especially, if you’re off-radar long enough the algorithm [allegedly] guarantees subsequent posts will reach fewer people.

I think, here, it is also important to mention that users expect brands to actively monitor their channels. By having an account, you wade into the pool. It ain’t a billboard over the highway. You and your audience are on the same level. You’re sharing fluids. You’re friends. If they contact you and you don’t answer, that’s not just the passive neglect of strangers on the sidewalk: it’s a dis; it hits home. On Twitter, specifically, 42% expect a response within the hour. Outright ignoring someone is like a punch in the gut.

How Much to Post

We have to be a little honest about the realities of library social media. We’re on board, sure, but we most likely haven’t the benefit of a marketing team. The social accounts are managed on the side and probably aren’t part of anyone’s actual job description. Roger. So, where do we get the most bang for our buck?

Post to Twitter at least 5 times a day. If you can swing up to 20 posts, you might be even better off. Post to Facebook five to 10 times per week. (Fast Company)

Several studies show that posting too little risks losing “connection with your audience,” and that brands should shoot for ten (10) posts per week. Posting more often is annoying.

For Twitter, it depends how you measure success. If you’re looking at retweets, replies, and clicks per tweet, “if you want to wring the most value out of every tweet you send, tweet about five times each day.” If you measure overall response per day, well, 30 tweets a day ought to do it.

This Fast Company article shares optimal post frequency for other platforms, if you’re interested.

2 Hours Per Channel

The timesuck to do social right is determined by the time required to

  • inspect your account’s metrics to understand who your followers are, when they’re on, and what they like
  • analyze your history of content so you know what works, what doesn’t
  • craft enough good content specific to your audience and the medium [2]
  • schedule that content for optimum reach
  • monitor and respond

Really, this is unique to you. For me, according to my Harvest account, last week I spent two hours scheduling just six tweets (for the entire week! I suck) and ten Facebook posts. This is a little short of previous weeks where I posted more and spent about 4 hours. I include time spent looking at analytics, corresponding about what needs to be posted, and optimizing content I’m sharing in our WordPress network (we use this plugin, which lets you tweak OpenGraph metadata and Twitter Cards).

So, my gut-checked suggestion is that it’s reasonable to expect to spend at least two hours per channel – minimum. Real content marketers for brands-with-budgets certainly devote a lot more, but I think it’s important to recognize the timesuck for what it is and reconcile decisions to go whole hog on a new channel with the human resources required to maintain it.

  1. If your library is on any other social platform except Facebook – wtf are you doing?
  2. People can tell what is auto-tweeted and cross-posted.

The post “Social” the Right Way is a Timesuck appeared first on LibUX.

District Dispatch: Speakers to explore library funding options at 2015 ALA Midwinter Meeting

Mon, 2015-01-12 06:14

Thinking about new funding sources for your library? Join leaders from the Institute of Museum and Library Services (IMLS) when they discuss federal library funding resources at the 2015 American Library Association (ALA) Midwinter Meeting in Chicago. The session, titled “All Eyes on IMLS: Funding Priorities and Reauthorization,” takes place from 10:30 to 11:30 a.m. on Saturday, January 31, 2015, in the McCormick Convention Center, room W183A.

MLK Digital Commons in Washington, D.C. Photo by Phil Freelon

During the session, speakers will shed new light on the Library Services and Technology Act (LSTA), the primary source of annual funding for libraries in the federal budget. Library staff are encouraged to attend the conference session to learn more about the Institute of Museum and Library Services’ priorities for the next two years, which will shape the agency’s discretionary and Library Services and Technology Act Grants to States programs. Additionally, participants will learn more about how they can support the Museum and Library Services Act while the law undergoes a reauthorization period in 2016.

Speakers include Maura Marx, acting director of the Institute of Museum and Library Services, and Robin Dale, associate deputy director for state programs for the Institute of Museum and Library Services.

View other ALA Washington Office Midwinter Meeting conference sessions

The post Speakers to explore library funding options at 2015 ALA Midwinter Meeting appeared first on District Dispatch.

Alf Eaton, Alf: Searching for mergeable tables

Mon, 2015-01-12 01:10

Among CartoDB’s many useful features is the ability to merge tables together, via an interface which lets you choose which column from each to use as the shared key, and which columns to import to the final merged table.

Google's Fusion Tables similarly encourages merging of tabular data. Fusion Tables creates a virtual merged table, allowing updates to the source tables to be replicated to the final merged table as they occur.

CartoDB can also merge tables using location columns, counting items from one table (with latitude and longitude, or addresses) that are positioned within the areas defined in another table (with polygons).

I've found that UK parliamentary constituencies are useful for visualising data, as they have a similar population number in each constituency and they have at least two identifiers in published ontologies which can be used to merge data from other sources*. The UK parliamentary constituency shapefiles published by the Ordnance Survey as part of the Boundary-Line dataset contain polygons, names and two identifiers for each area: one is the Ordnance Survey’s own “unit id” and one is the Office for National Statistics’ “GSS code”.

Once the parliamentary constituency shapefile has been imported to a base table, any CSV table that contains either of those identifiers can easily be merged with the base table to create a new, merged table and associated visualisation.
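The key-based merge described above can be sketched outside CartoDB as a simple left join. This is only an illustration of the idea, not how CartoDB implements it: the function name, column names, and sample rows are hypothetical, and only the GSS code value is taken from this post.

```python
import csv
import io


def merge_on_key(base_rows, other_rows, base_key, other_key, columns):
    """Left join: copy `columns` from other_rows into each base row that shares a key."""
    lookup = {row[other_key]: row for row in other_rows}
    merged = []
    for row in base_rows:
        match = lookup.get(row[base_key], {})
        # Rows without a match keep empty strings for the imported columns.
        merged.append({**row, **{col: match.get(col, "") for col in columns}})
    return merged


# Hypothetical sample data; names and turnout figures are invented.
base_csv = "gss_code,name\nE14000929,Constituency A\nE14000001,Constituency B\n"
other_csv = "GSS,turnout\nE14000929,65.2\n"

base = list(csv.DictReader(io.StringIO(base_csv)))
other = list(csv.DictReader(io.StringIO(other_csv)))
merged = merge_on_key(base, other, "gss_code", "GSS", ["turnout"])
```

Each table names its own key column, which mirrors the choice the merge interface asks for: which column from each table is the shared key, and which columns to import.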

So, the task is to find other data sets that contain either the OS “unit id” or the ONS “GSS code”.

The URLs for the data types of these codes are defined in the Ordnance Survey’s “administrative geography and civil voting area” ontology:

The values themselves can also be expressed as URLs:

GSS: E14000929 (GSS URL)
Unit ID: 24896 (Unit ID URL)

However, unlike the Linked Data/SPARQL interfaces, most CSV or Excel files that are currently published (such as those produced by the Office for National Statistics as a result of census analysis) don’t define the data type of each column using URLs. Although there’s usually a property name in the first row, there’s rarely a datapackage.json file defining a basic data type (number, string, date, etc), and practically never a JSON-LD context file to map those names to URLs.

Given an index of CSV files, like those in CKAN-based stores such as, how can we identify those which contain either unit IDs or GSS codes?

As Thomas Levine's commasearch project demonstrated at csvconf last year, if you have a list of all (or even just some) of the known members of a collection of typed entities (e.g. a list of all the countries in the world), it’s easy enough to find other datasets that contain them: as long as at least a certain proportion of the distinct values of a column match those in the known collection, the data type can be guessed, and can be assigned a URL.
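A minimal sketch of that matching rule follows. It is not commasearch's actual implementation: the function name, the 0.8 default threshold, and the example.org type URL are assumptions made for illustration.

```python
def guess_column_type(values, known_types, threshold=0.8):
    """Guess a data type URL for a column.

    `known_types` maps a type URL to the set of known members of that type.
    The column is assigned the best-matching type only if at least
    `threshold` of its distinct values appear in the known collection.
    """
    distinct = {v.strip() for v in values if v and v.strip()}
    if not distinct:
        return None
    best_url, best_share = None, 0.0
    for url, members in known_types.items():
        share = len(distinct & members) / len(distinct)
        if share > best_share:
            best_url, best_share = url, share
    return best_url if best_share >= threshold else None


# Hypothetical known collection and column values.
known = {"http://example.org/type/country": {"France", "Germany", "Spain", "Italy"}}
column = ["France", "Germany", "France", "Spain", "Atlantis"]

loose_guess = guess_column_type(column, known, threshold=0.7)
strict_guess = guess_column_type(column, known, threshold=0.8)
```

Here three of the four distinct values are known countries, so the column passes a 0.7 threshold but not a 0.8 one. Where to set the threshold depends on how complete the known list is and how dirty the data is.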

TODO: compile lists of values for known data types, particularly from Wikidata. For example: country names (a list of names that changes slowly), members of parliament (a list of names that changes regularly), years (a range of numbers that grows gradually), gene identifiers (a list of strings that grows over time), postcodes (a list of known values, or values matching a regular expression).

Footnotes

* On the downside, parliamentary constituencies can be changed (currently every 5 years), as population density shifts around the country and the current government decides to increase or decrease the number of constituencies. This makes it difficult to use the constituencies for long-term comparisons.

Mark E. Phillips: Digital Preservation System Interfaces: UNT Libraries Coda Repository

Mon, 2015-01-12 01:04

I mentioned to a colleague that I would be happy to do a short writeup of some of the interfaces that we have for our digital preservation system.  This post is trying to move forward that conversation a bit.

System 1, System 2

At UNT we manage our digital objects in a consistent and unified way. What this means in practice is that there is one way to do everything: items are digitized, collected, or created, then staged for ingest into the repository, and everything moves into the system in the same way. We have two software stacks that we use for managing our digital items: Aubrey and Coda.

Aubrey is our front-end interface, which provides end user access to resources: search, browsing, and display. For managers it provides a framework for defining collections and partners, and most importantly it has a framework for creating and managing metadata for the digital objects. Most (99.9%) of the daily interaction with the UNT Libraries Digital Collections is through Aubrey, via one of its front-end user interfaces: The Portal to Texas History, the UNT Digital Library, or The Gateway to Oklahoma History.

Aubrey manages the presentation versions of a digital object; locally we refer to this package of files as an Access Content Package, or ACP. The other system in this pair is a system we call Coda. Coda is responsible for managing the Archival Information Packages (AIP) in our infrastructure. Coda was designed to manage a collection of BagIt Bags, help with the replication of these Bags, and allow curators and managers to access the master digital objects if needed.

What does it look like though?

The conversation I had with a colleague was around user interfaces to the preservation archive, how much or how little we are providing and our general thinking about that system’s user interfaces.  Typically these interfaces are “back-end” and usually are never seen by a larger audience because of layers of authentication and restriction.  I wanted to take a few screenshots and talk about some of the interactions that users have with these systems.

 Main Views

The primary views for the system include a dashboard view which gives you an overview of the happenings within the Coda Repository.

UNT Libraries’ Coda Dashboard

From this page you can navigate to lists for the various sub-areas within the repository.  If you want to view a list of all of the Bags in the system you are able to get there by clicking on the Bags tile.

Bag List View – UNT Libraries’ Coda Repository

The storage nodes that are currently registered with the system are available via the Nodes button.  This view is especially helpful in gauging the available storage resources and deciding which storage node to write new objects to.  Typically we use one storage node until it is completely filled and then move onto another storage node.

Nodes List View – UNT Libraries’ Coda Repository

For events in the Coda system, including ingest, replication, migration, and fixity check, we create and store a PREMIS Event. These are aggregated using the PREMIS Event Service.

PREMIS Event List View – UNT Libraries’ Coda Repository

The primary Coda instance is considered the Coda instance of record and additional Coda instances will poll the primary for new items to replicate.  They do this using ResourceSync to broadcast available resources and their constituent files.  Because the primary Coda system does not have queued items this list is empty.

Replication Queue List View – UNT Libraries’ Coda Repository

To manage information about what piece of software is responsible for an event on an object we have a simple interface to list PREMIS Agents that are known to the system.

PREMIS Agents List View – UNT Libraries’ Coda Repository

Secondary Views

With the primary views out of the way the next level that we have screens for are the detail views.  There are detail views for most of the previous screens once you’ve clicked on a link.

Below is the detail view of a Bag in the Coda system. You will see the parsed bag-info.txt fields as well as PREMIS Events that are associated with this resource. The buttons at the top will get you to the Atom Feed for the object and to a list of URLs that, when downloaded, will re-constitute a given Bag of content.

Bag Detail View – UNT Libraries’ Coda Repository

Here is a URLs list. If you download all of these files and keep the hierarchy of the folders, you can validate the Bag and have a validated version of the item plus additional metadata. This is effectively the Dissemination Information Package for the system.
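As a rough illustration of what validating a downloaded Bag involves, the sketch below checks payload files against a BagIt payload manifest. It is not Coda's code: the function name is hypothetical, sha256 is an assumed default (use whichever algorithm the Bag's manifest names), and it covers only the checksum comparison, not the other checks a full BagIt validator performs, such as tag manifests or unlisted payload files.

```python
import hashlib
from pathlib import Path


def verify_manifest(bag_dir, algorithm="sha256"):
    """Return the paths listed in manifest-<algorithm>.txt that are missing
    or whose checksum does not match the manifest entry."""
    bag = Path(bag_dir)
    manifest = bag / f"manifest-{algorithm}.txt"
    failures = []
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        # Each manifest line is "<checksum> <path relative to the bag root>".
        expected, rel_path = line.split(maxsplit=1)
        target = bag / rel_path
        if not target.is_file():
            failures.append(rel_path)
            continue
        # Reads each file fully into memory; fine for a sketch, not for large masters.
        actual = hashlib.new(algorithm, target.read_bytes()).hexdigest()
        if actual != expected.lower():
            failures.append(rel_path)
    return failures
```

An empty list means every file named in the manifest was found with the expected checksum, which is why keeping the folder hierarchy intact while downloading matters.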

Coda URLs List – UNT Libraries’ Coda Repository

An Atom Feed is created for each document as well which can be used by the AtomPub interface for the system.  Or just to look at and bask in the glory of angle brackets.

Atom Feed for Bag – UNT Libraries’ Coda Repository

Below is the detail view of a PREMIS Event in the repository.  You can view the Atom Feed for this document or navigate to the Bag in the system that is associated with this event.

PREMIS Event Detail View – UNT Libraries’ Coda Repository

Below is the detail view of a storage node in the system. These nodes are updated to reflect the current storage statistics for the storage nodes in the system.

Node Detail View – UNT Libraries’ Coda Repository

The detail view of a PREMIS Agent is not too exciting but is included for completeness.

Agent Detail View – UNT Libraries’ Coda Repository

Interacting with Coda

When there is a request for the master/archival/preservation files for a given resource, we find the local identifier for the resource, put that into the Coda repository, and do a quick search.

Dashboard with Search – UNT Libraries’ Coda Repository

You will end up with search results for one or more Bags in the repository.  If there is more than one for that identifier select the one you want (based on the date, size, or number of files) and go grab the files.

Search Result – UNT Libraries’ Coda Repository


The following screens show some of the statistics views for the system.  They include the Bags added per month and over time,  number of files added per month and over time, and finally the number of bytes added per month and over time.

Stats: Monthly Bags Added – UNT Libraries’ Coda Repository

Stats: Running Bags Added Total – UNT Libraries’ Coda Repository

Stats: Monthly Files Added – UNT Libraries’ Coda Repository

Stats: Running Total of Files Added – UNT Libraries’ Coda Repository

Stats: Monthly Size Added – UNT Libraries’ Coda Repository

Stats: Running Total Sizes – UNT Libraries’ Coda Repository

What’s missing.

There are a few things missing from this system that one might notice.  First of all is the process of authentication to the system.  At this time the system is restricted to a small list of IPs in the library that have access to the system.  We are toying around with how we want to handle this access as we begin to have more and more users of the system and direct IP based authentication becomes a bit unwieldy.

Secondly, there is a full set of AtomPub interfaces for each of the Bag, Node, PREMIS Event, PREMIS Agent, and Queue sections. This is how new items are added to the system. But that is a little bit out of scope for this post.

If you have any specific questions for me let me know on twitter.

DuraSpace News: CALL for Proposals for Open Apereo 2015

Mon, 2015-01-12 00:00
From Ian Dolphin, Executive Director, Apereo Foundation, Laura McCord, Open Apereo 2015 Planning Committee Chair, Reba-Anna Lee, Open Apereo 2015 Program Committee Co-chair  

Access Conference: Details on AccessYYZ

Sun, 2015-01-11 20:00

Access is headed to Toronto on September 8th-11th, 2015, so mark those calendars! We know that it’s a bit earlier than usual this year, but we hope that giving advance notice will allow attendees to plan accordingly.

Hackfest will be happening on September 8th at Ryerson University’s Heaslip House, while the remainder of the conference (September 9th-11th) will unfold at the beautiful Bram & Bluma Appel Salon on the second floor of the Reference Library (789 Yonge St.) in downtown Toronto.

Keep your eyes on the website in the coming weeks–we’ll announce more details as we have them!

Karen Coyle: This is what sexism looks like #2

Sun, 2015-01-11 13:38
Libraries, it seems, are in crisis, and many people are searching for answers. Someone I know posted a blog post pointing to community systems like Stack Overflow and Reddit as examples of how libraries could create "community." He especially pointed out the value of "gamification" - the ranking of responses by the community - as something libraries should consider. His approach was that it is "human nature" to want to gain points. "We are made this way: give us a contest and we all want to win." (The rest of the post and the comments went beyond this to the questions of what libraries should be today, etc.)

There were many (about 4 dozen, almost all men) comments on his blog (which I am not linking to, because I don't want this to be a "call out"). He emailed me asking for my opinion.

I responded only to his point about gamification, which was all I had time for, saying that in that area his post ignored an important gender issue. The competitive aspect was part of what makes those sites unfriendly to women.

I told him that there have been many studies of how children play, and they reveal some distinct differences between genders. Boys begin play by determining a set of rules that they will follow, and during play they may stop to enforce or discuss the rules. Girls begin to play with an unstructured understanding of the game, and, if problems arise during play, they work on a consensus. Boys' games usually have points and winners. Girls' games are often without winners and are "taking turns" games. Turning libraries into a "winning" game could result in something like Reddit, where few women go, or if they do they are reluctant to participate.

And I said: "As a woman, I avoid the win/lose situations because, based on general social status (and definitely in the online society) I am already designated a loser. My position is pre-determined by my sex, so the game is not appealing."

I didn't post this to the site, just emailed it to the owner. It's good that I did not. The response from the blog owner was:

"This is very interesting. But I need to see some proof."

Some proof. This is truly amazing. Search on Google Scholar for "games children gender differences" and you are overwhelmed with studies.

But it's even more amazing because none of the men who posted their ideas to the site were asked for proof. Their ideas are taken at face value. Of course, they didn't bring up issues of gender, class, or race in their responses, as if these are outside of the discussion of what libraries should be. And to bring them up is an "inconvenience" in the conversation, because the others do not want to hear it.

He also pointed me to a site that is "friendly to women." To that I replied that women decide what is "friendly to women."

I was invited to comment on the blog post, but it is now clear that my comments will not be welcome. In fact, I'd probably only get inundated with posts like "prove it." This does seem to be the response whenever a woman or minority points out an inconvenient truth.

Welcome to my world.

District Dispatch: Big shoes to fill

Sun, 2015-01-11 05:32

Those who worked with Linda know hers are big shoes to fill

E-rate Orders aside, the library community is starting the New Year with one less champion. Linda Lord, now former Maine State Librarian, is officially retired and has turned the keys over to her successor, Jaimie Ritter.

No one who knows Linda is at all reticent in talking about her dedication to her home state libraries—nor are those of us who work with her as a national spokesperson for libraries. Her work for ALA’s Office for Information Technology Policy (OITP) could be an encyclopedic list covering at least a decade of advocacy. In her most recent role as Chair of the E-rate Task Force, Linda has been invaluable to advancing library interests at the Federal Communications Commission (FCC), in Congress, and with her colleagues. At the height of the recent E-rate activity at the FCC, we joked with Linda that she should have special frequent flier miles for all the flights from Bangor (ME) to Washington D.C. That, and the fact that Linda’s email was first to pop up under the “Ls” and her phone number was always under “recents” on my phone list are testament to our reliance on her experience, her dogged support, and her willingness to work well beyond her role as a member-leader (a volunteer).

Of course Linda’s work is well respected in her home state as is evidenced by a number of articles and even a television interview as her retirement approached. These stories make it clear Linda builds strong, collaborative relationships with her colleagues, whether staff at the state library, librarians across Maine, and as far away as the Senate in Washington, D.C.

“Linda has done an amazing job making information accessible through libraries and schools across Maine,” said Senator Angus King. “She has the essential leadership qualities of vision, perseverance, willingness to work on the details, and a personality that enables her to collaborate and bring out the best in people. Her leadership at the national level on the E-rate program and other issues has been a huge benefit to Maine. She will always have my profound respect and appreciation for all that she’s accomplished for Maine and for the country.”

I can testify first hand on the difference Linda’s work has made for Maine libraries from my (wonderful) summer trips to Maine. In recent years we have noticed a marked improvement in library WiFi. While my kids love to hike when we travel in rural Maine, they now also are dedicated texters and need to know the next time we will be near a library so they can update friends in between dry periods of no connectivity. While passing through a town I point out the universal library sign and one child will ask, “Is that one of Linda’s libraries? Can we stop?” (knowing that there will be plenty of WiFi to go around).

We are proud to be able to share our own remembrances of Linda’s long tenure working with ALA. While I have long considered Linda “my ALA member,” many others have similar sentiments when asked to share anecdotes about working with Linda. I have included a few here.

Emily Sheketoff, executive director of ALA’s Washington Office reminds us all of Linda’s strong leadership qualities that have won her a respected place on the national stage:

“Linda has always been a strong voice for libraries, so OITP recognized and took great advantage of that. Coming from Maine, she had a soft spot for rural libraries and she became our “go-to” person when we needed an example of the difference a well-connected library can make for small towns or rural communities. When ALA staff use a Maine library as an exemplar the response is something along the lines of “Oh we know Linda Lord” and the point is immediately legitimized. She will be missed as a voice for libraries on the national stage.

As Chair of the ALA E-rate Task Force, Linda has spent countless hours on the phone, on email, in person making sure issues get covered—often asking the hard questions of how a policy course could impact the daily life of the librarian who has to implement or live with a policy. This ability has been invaluable as a gentle (and sometimes like a hurricane) reminder that what we do in D.C. has a very real impact locally. She is quite a leader.”

Linda Schatz, an E-rate consultant who worked with Linda and ALA for many years, describes Linda’s dedication to garnering support for the E-rate program:

“As I think about the many ways in which Linda has impacted the E-rate program, perhaps the most long-lasting has been her diligence in working with Members [of Congress] and their staff. Not only did she take the time to meet with and inform Senators Snowe and Collins about the impact of the E-rate program on Maine libraries, she continued to point out the benefits to all libraries and helped with last minute negotiations through the night to prevent legislation that would have had a negative impact on the program. She didn’t stop her communications when Senator Snowe left the Senate but took the time to meet with Senator King and his staff as well to ensure that they, too, understood the importance of the program to libraries. These communications about the E-rate program as well as the general needs of libraries will long be felt by the library community.”

Linda has the respect she does across ALA staff and members who have had the privilege of seeing her in action in large part because of her warm and sincere manner. “Not many people can bring the same passion for network technology as for early childhood learning, but Linda did. Not only was she an incredibly effective advocate, but I have admired and enjoyed her generous and collaborative spirit for years,” said Larra Clark, deputy director for ALA OITP. Linda easily wins over her audience.

Kathi Peiffer, current Chair of the E-rate Task Force, and Pat Ball, member of the joint ALA Committee on Legislation and OITP Telecommunications Subcommittee, both highlight these qualities in their recollections of Linda. “She is always gracious and has a wonderful sense of humor. She is the Queen of E-rate!” (Kathi) “She is always smiling and always gracious and I am glad that I had the opportunity to meet and work with her. I salute a great librarian and lady.” (Pat)

Alan S. Inouye, director of OITP, puts it well when he says, “Saying “thank you” to Linda Lord is just so inadequate. Her contributions to national policy on E-rate are extensive and range from testifying at the U.S. Senate Commerce Committee and participating on FCC expert panels to chairing innumerable E-rate Task Force meetings (at their notorious Sunday 8:00 am times!). As Maine State Librarian, she has greatly advanced library services and visibility in her state in many ways. I hope that the library community, ALA, and OITP can find a way to continue to avail ourselves of Linda’s expertise and experience—retirement notwithstanding!”

So Alan leaves me with a little hope that I can continue to dream up ways we can call on Linda. As we often tell members who get involved with OITP, it’s very difficult to cut the ties once you join us.

And Linda was worried she might lose touch with library issues. I doubt it.

The post Big shoes to fill appeared first on District Dispatch.

Harvard Library Innovation Lab: Link roundup January 10, 2015

Sat, 2015-01-10 23:35

Stay warm inside with these links.

Beachbot – YouTube

Beachbot draws art in the sand

Twilights: New Ink Paintings on Vintage Books by Ekaterina Panikanova | Colossal

Ink paintings on grids of vintage books

Library Commons Fly Thru

Drones. Inside! Love seeing the video through the stacks. Canyons = aisles.

How to built an e-book library you can touch

Device frames. I love device frames.

Roller Coaster House – Unique Homes for Sale – Popular Mechanics

This household roller coaster makes me want an in-library version

John Miedema: Lila Slipstream II: Extend reading capabilities by processing content into slips

Sat, 2015-01-10 16:42

Lila is a cognitive computing system that extends writing capabilities. It also extends reading capabilities.

  1. In a previous post I outlined how an author uses existing writing software to generate “slips” of content. A slip is the unit of text for the Lila cognitive system. The slip has just a few required properties: a subject line, a bit of content, and suggestions for tags and categories. The author generates many slips, hence a “slipstream.” In this post, I show part two of the slipstream for other kinds of content.
  2. In the writing process, an author collects and curates related content generated by dialog with other people, e.g., email and blog comments, or written by other people, e.g., articles and books. This content is usually filtered and managed by the author, but the volume piles up well beyond the author’s ability to read. (Notice the icon in the lower right of item two looks like both a book and a scanner. It is assumed that all content will be digital text.)
  3. Existing technologies such as Google Alerts allow authors to monitor the web for undiscovered but related content generated by anyone in the world. This content abides on the open web, growing daily. The volume easily exceeds an author’s ability to curate let alone read. A Lila curation process will be described later.
  4. The second part of the Lila slipstream is a process that will automatically convert the curated and undiscovered content into slips. The common slip unit format will enable Lila to generate visualizations of the content, enabling the author to read and analyze a high volume of content. The visualization tool will be described later.

Patrick Hochstenbach: Je Suis Charlie cartoon for UGent

Sat, 2015-01-10 14:33
Filed under: Doodles Tagged: jesuisahmed, jesuischarlie, jesuisjuif

Ted Lawless: OrgRef data as RDF

Sat, 2015-01-10 05:00

This past fall, Data Salon, a UK-based data services company, released an open dataset about academic and research organizations called OrgRef. The data is available as a CSV and contains basic information about over 30,000 organizations.

OrgRef was created with publishers in mind, and so its main focus is on institutions involved with academic content: universities, colleges, schools, hospitals, government agencies and companies involved in research.

This announcement caught our attention at my place of work because we are compiling information about educational organizations in multiple systems, including a VIVO instance, and are looking for manageable ways to consume Linked Data that will enrich or augment our local systems. Since the OrgRef data has been curated and focuses on a useful subset of data that we are interested in, it seemed to be a good candidate for investigation, even if it isn't published as RDF. Due to its size, it is also easier to work with than attempting to consume and process something like VIAF or DBPedia itself.


We downloaded the OrgRef CSV dataset and used the ever-helpful csvkit tool to get a handle on what data elements exist.

$ csvstat --unique orgref.csv
  1. Name: 31149
  2. Country: 210
  3. State: 51
  4. Level: 3
  5. Wikipedia: 31149
  6. VIAF: 10764
  7. ISNI: 5765
  8. Website: 25910
  9. ID: 31149
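For readers without csvkit installed, a similar per-column count of distinct values can be sketched with Python's standard library. This is an approximation, not a reimplementation of csvstat: the `unique_counts` helper is hypothetical, it ignores blank cells (csvstat may count them differently), and the sample rows are made up.

```python
import csv
import io


def unique_counts(csv_file):
    """Return {column name: number of distinct non-empty values}."""
    reader = csv.DictReader(csv_file)
    seen = {name: set() for name in reader.fieldnames}
    for row in reader:
        for name in reader.fieldnames:
            value = row[name]
            if value:  # skip blank or missing cells
                seen[name].add(value)
    return {name: len(values) for name, values in seen.items()}


# Made-up sample; for the real file use open("orgref.csv", newline="").
sample = io.StringIO(
    "Name,Country,VIAF\n"
    "Example University,United States,123\n"
    "Sample College,United States,\n"
    "Demo Institute,Canada,456\n"
)
counts = unique_counts(sample)
for name, n in counts.items():
    print(f"{name}: {n}")
```

A count equal to the number of rows, as with Name, Wikipedia, and ID in the output above, suggests the column can serve as a key.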

The attributes are well documented by OrgRef. To highlight, though: identifiers to other systems are included - Wikipedia page ID (pageid), ISNI, and VIAF. These identifiers will be important for matching data from other systems or finding more LOD resources later. There is also a link to each organization's official home page. We've found that, in other sources, home page links are surprisingly often missing or do not point to an official page, so this is something from OrgRef that we would be interested in using right away.

OrgRef to DBPedia

Since we are working on a project that uses RDF as the data model, we wanted to convert this OrgRef data from CSV to RDF. All the organizations in the dataset (as of the December 2014 download) have Wikipedia page IDs (pageid). DBPedia also includes the pageid, so we can look up the DBPedia URI for each organization and use that in our RDF representation of the data. Rather than sending 30,000 SPARQL queries to DBPedia, we downloaded the DBPedia to pageid ntriples file from DBPedia and wrote a script to output another CSV with OrgRef ID and DBPedia URI pairs, like below.

orgref-id,uri
1859,
2025,
2236,
3712,
3768,

OrgRef as RDF

With a mapping of OrgRef IDs to DBPedia URIs we were able to create an RDF representation of each organization. For an initial pass, we decided to only use name, pageid, ISNI, VIAF, and websites from OrgRef. A script merged the original OrgRef CSV with our DBPedia URI to OrgRefID CSV and produced triples like the following for a single organization.

<> a foaf:Organization ;
    rdfs:label "Michigan Technological University" ;
    dbpedia-owl:isniId "0000000106635937" ;
    dbpedia-owl:viafId "150627130" ;
    dbpedia-owl:wikiPageID "45893" ;
    schema:url "" .

The VIAF information is stored as both a string literal (to aid querying by the identifier later) and as an owl:sameAs relationship, since VIAF is published as Linked Data. For ISNI, we are only storing the literal because, as of January 2015, it isn't available as Linked Data.
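The two scripts described in this section are not shown, so what follows is only a sketch of how the pageid lookup and the CSV-to-triples step might fit together. It assumes the ntriples file has one triple per line with the page ID as a quoted number, and that the OrgRef columns are named as in the csvstat output. The `example.org` URIs and the `ID` value are placeholders, the helper names are hypothetical, and the prefix declarations and the owl:sameAs link to VIAF are left out.

```python
import csv
import io
import re

# Assumed line shape: <subject> <predicate> "12345"^^<datatype> .
PAGEID_LINE = re.compile(r'^<([^>]+)>\s+<[^>]+>\s+"(\d+)"')


def load_pageid_index(ntriples_lines):
    """Map Wikipedia page ID (as a string) to the subject URI of its triple."""
    index = {}
    for line in ntriples_lines:
        match = PAGEID_LINE.match(line)
        if match:
            uri, pageid = match.groups()
            index[pageid] = uri
    return index


def escape(text):
    """Escape backslashes and double quotes for a Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def to_turtle(row, uri):
    """Render one OrgRef row as Turtle, skipping empty optional identifiers."""
    parts = ["a foaf:Organization", f'rdfs:label "{escape(row["Name"])}"']
    if row.get("ISNI"):
        parts.append(f'dbpedia-owl:isniId "{row["ISNI"]}"')
    if row.get("VIAF"):
        parts.append(f'dbpedia-owl:viafId "{row["VIAF"]}"')
    parts.append(f'dbpedia-owl:wikiPageID "{row["Wikipedia"]}"')
    if row.get("Website"):
        parts.append(f'schema:url "{escape(row["Website"])}"')
    return f"<{uri}> " + " ;\n    ".join(parts) + " ."


# Placeholder inputs: the URIs, website, and ID below are not the real values.
ntriples = [
    '<http://example.org/resource/Michigan_Technological_University> '
    '<http://example.org/ontology/wikiPageID> '
    '"45893"^^<http://www.w3.org/2001/XMLSchema#integer> .',
]
orgref_csv = io.StringIO(
    "Name,Wikipedia,VIAF,ISNI,Website,ID\n"
    "Michigan Technological University,45893,150627130,0000000106635937,"
    "http://example.org/,1\n"
)

index = load_pageid_index(ntriples)
for row in csv.DictReader(orgref_csv):
    uri = index.get(row["Wikipedia"])
    if uri:  # only emit organizations that matched a page ID
        turtle = to_turtle(row, uri)
        print(turtle)
```

Building the index once from the downloaded file keeps the whole join local, which is the reason given above for avoiding 30,000 separate SPARQL queries.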

Publishing for querying with Linked Data Fragments.

With the OrgRef data modeled as RDF, we decided to use a Linked Data Fragments server to publish and query it. LDF is a specification and software for publishing Linked Data datasets in a way that minimizes server-side requirements. LDF data can be queried via SPARQL using a client developed by the team or via HTTP requests. Ruben Verborgh, one of the researchers behind LDF, has posted a one-minute video with a clear summary of the motivations behind the effort.

Following the documentation for the LDF server, we set up an instance on Heroku and loaded it with the OrgRef RDF file. You can query this data with an LDF client or browse it via the web interface. Due to the design of the LDF server, we are able to publish and query this using a free Heroku instance. See this paper for related, lightweight approaches.
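To give a feel for what querying "via HTTP requests" can look like, here is a small sketch that builds a triple pattern request URL. The base URL is a placeholder, not the address of the instance described above, and the `subject`, `predicate`, and `object` parameter names are an assumption based on common LDF server setups; a well-behaved client would discover them from the hypermedia controls in each response rather than hard-coding them.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Placeholder address; substitute the URL of a real LDF dataset.
BASE = "http://example.org/orgref"


def fragment_url(base_url, subject=None, predicate=None, obj=None):
    """Build a URL asking for all triples that match the given pattern.

    Any position left as None acts as a wildcard.
    """
    params = {}
    if subject:
        params["subject"] = subject
    if predicate:
        params["predicate"] = predicate
    if obj:
        params["object"] = obj
    return base_url + ("?" + urlencode(params) if params else "")


# Pattern: ?s rdfs:label "Michigan Technological University"
url = fragment_url(
    BASE,
    predicate="http://www.w3.org/2000/01/rdf-schema#label",
    obj='"Michigan Technological University"',
)
print(url)

# Round-trip the query string to confirm the pattern survives encoding.
decoded = parse_qs(urlsplit(url).query)
```

Since each request is a single triple pattern, the server only does simple lookups and the client assembles larger queries, which is consistent with the small hosting footprint mentioned above.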


To wrap up, we found a quality, curated, and targeted dataset available as CSV that we would like to integrate into our local projects that use RDF. Using the identifiers in the CSV file, we were able to match it to Linked Data URIs from DBPedia and create an RDF representation of it. We also published the RDF via Linked Data Fragments for others to browse and query.

Our interest in the OrgRef data doesn't stop here though. We want to make use of it on our local applications, particularly a VIVO instance. I'll write more about that later.

Ranti Junus: The sound version of a Google (old) reCAPTCHA

Fri, 2015-01-09 23:39

Last month, Google announced the new no-captcha reCAPTCHA that is supposedly more accurate and better at preventing spam. We’ll see how this goes.

In the meantime, plenty of websites that employ Google’s reCAPTCHA still use the old version like this:

The problem with this reCAPTCHA is that it fundamentally doesn’t work with screen readers (among other things, like forcing you to cross your eyes trying to figure out each character in the string). Some people pointed out that reCAPTCHA offers a sound version (see that little red speaker?) that should mitigate the problem.

Here’s the link to the sound version of a Google reCAPTCHA:

This example was taken from the PubMed website and happened to be set as a string of numbers.


p.s. what is this about PubMed using inaccessible reCAPTCHA? There are other ways to employ non-captcha security techniques without using that kind of solution. :-/

p.p.s. In case you’re curious, I could not decipher two out of the eleven (if I counted it correctly) numbers said in that recording.

CrossRef: CrossRef Staff at the FORCE2015 Conference

Fri, 2015-01-09 21:31

Ed Pentz, Karl Ward, Geoffrey Bilder and Joe Wass will be attending the FORCE2015 Conference in Oxford, UK. They'll be available to answer any CrossRef-related questions. The conference runs 12-13 January. Learn more.

Jonathan Rochkind: Control of information is power

Fri, 2015-01-09 21:04

And the map is not the territory.

From the Guardian, Cracks in the digital map: what the ‘geoweb’ gets wrong about real streets

“There’s no such thing as a true map,” says Mark Graham, a senior research fellow at Oxford Internet Institute. “Every single map is a misrepresentation of the world, every single map is partial, every single map is selective. And every single map tells a particular story from a particular perspective.”

Because online maps are in constant flux, though, it’s hard to plumb the bias in the cartography. Graham has found that the language of a Google search shapes the results, producing different interpretations of Bangkok and Tel Aviv for different residents. “The biggest problem is that we don’t know,” he says. “Everything we’re getting is filtered through Google’s black box, and it’s having a huge impact not just on what we know, but where we go, and how we move through a city.”

As an example of the mapmaker’s authority, Matt Zook, a collaborator of Graham’s who teaches at the University of Kentucky, demonstrated what happens when you perform a Google search for abortion: you’re led not just to abortion clinics and services but to organisations that campaign against it. “There’s a huge power within Google Maps to just make some things visible and some things less visible,” he notes.

From Gizmodo, Why People Keep Trying To Erase The Hollywood Sign From Google Maps

But the sign is both tempting and elusive. That’s why you’ll find so many tourists taking photos on dead-end streets at the base of the Hollywood Hills. For many years, the urban design of the neighbourhood actually served as the sign’s best protection: Due to the confusingly named, corkscrewing streets, it’s actually not that easy to tell someone how to get to the Hollywood Sign.

That all changed about five years ago, thanks to our suddenly sentient devices. Phones and GPS were now able to aid the tourists immensely in their quests to access the sign, sending them confidently through the neighbourhoods, all the way up to the access gate, where they’d park and wander along the narrow residential streets. This, the neighbours complained, created gridlock, but even worse, it represented a fire hazard in the dry hills — fire trucks would not be able to squeeze by the parked cars in case of an emergency.

Even though Google Maps clearly marks the actual location of the sign, something funny happens when you request driving directions from any place in the city. The directions lead you to Griffith Observatory, a beautiful 1920s building located one mountain east from the sign, then — in something I’ve never seen before, anywhere on Google Maps — a dashed grey line arcs from Griffith Observatory, over Mt. Lee, to the sign’s site. Walking directions show the same thing.

Even though you can very clearly walk to the sign via the extensive trail network in Griffith Park, the map won’t allow you to try.

When I tried to get walking directions to the sign from the small park I suggest parking at in my article, Google Maps does an even crazier thing. It tells you to walk an hour and a half out of the way, all the way to Griffith Observatory, and look at the sign from there.

No matter how you try to get directions — Google Maps, Apple Maps, Bing — they all tell you the same thing. Go to Griffith Observatory. Gaze in the direction of the dashed grey line. Do not proceed to the sign.

Don’t get me wrong, the view of the sign from Griffith Observatory is quite nice. And that sure does make it easier to explain to tourists. But how could the private interests of a handful of Angelenos have persuaded mapping services to make it the primary route?

(h/t Nate Larson)


Open Library: Open Library heads to the stars

Fri, 2015-01-09 20:29

We are excited to announce that the Open Library metadata, pointing to the growing collection of content housed by Internet Archive, has been selected for inclusion in the core archive of Outernet. If you are not familiar with Outernet, they’re calling themselves Humanity’s Public Library and they want to increase access to information for people around the world. Read more here (they’ve got a funding thing happening as well). In their own words

Currently, 2/3 of humanity lacks Internet access. Outernet wants to broadcast humanity’s best work to the entire world from space. For free. They believe that no one should be denied a basic level of information due to wealth, geography, political environment, or infrastructure. Furthermore, every person should be able to participate in the global marketplace of ideas. They are currently live on four continents with more to come. Users can build their own receiver or purchase one.

Inclusion of Open Library metadata will help Outernet users understand the breadth of content that is available. We’re happy to help get more information to more people.

OCLC Dev Network: OCLC LC Name Authority File (LCNAF) Temporarily Unavailable

Fri, 2015-01-09 20:00

The LC Name Authority File (LCNAF) is temporarily unavailable due to a problem at the data source level. While users will find that this experimental service is up and running, no data is currently available. The good news for users particularly interested in this data is that you can now access it from the Library of Congress directly at

Jenny Rose Halperin: Reading Highlights 2014

Fri, 2015-01-09 18:06

I did this last year too, but here are some of the best books that I read this year. I tend to read a bit haphazardly and mostly fiction, but here’s the list of books that surprised or excited me most in 2014. I can honestly say that this year I only read a few duds and that most of my reading life was very rich!

Fiction: I read a lot of Angela Carter this year, including Burning Your Boats (her collection of short stories), Wise Children, which is so wildly inventive, and Nights at the Circus, which many consider to be her best. She remains my favorite author and I am glad she has such a large catalog. Each book is like a really delicious fruit.

Perhaps the most surprising book I read this year was The Name of the Rose by Umberto Eco. I picked it up in a used bookstore in London and found it thrilling. I would love to read more monastery murder mysteries.

In the British romances category, standouts include The Enchanted April, Persuasion, Emma, Sense and Sensibility, and Far From the Madding Crowd.
British romances are my comfort food, and I always turn to them when I don’t know what to read next. I find most through browsing Project Gutenberg and seeing what I haven’t read yet. I love Project Gutenberg and think that the work they’re doing is incredibly important.

I devoured Mavis Gallant’s Paris Stories collection from the NYRB back in January and was very sad when she passed.

In German, I read only one book, which was Schachnovelle by Stefan Zweig. I read it because of the Grand Budapest Hotel connection and it was as good as promised.

I finished off my year with Snow by Orhan Pamuk, which I highly recommend! It is particularly prescient now and asks important questions about Western hegemony, art, and religion.

Memoir: I had somehow missed Heartburn by Nora Ephron and have recommended it to everyone, though it’s halfway between memoir and fiction. It is so smart, so funny, and so bitchy, like the best romcom.

Because a bunch of people have asked me: I had very mixed feelings about Not that Kind of Girl by Lena Dunham. The stories in the collection weren’t novel or exciting; the narratives had appeared in her work repeatedly and seemed like a rehashing of the most boring parts of Girls or Tiny Furniture. By the time she got to the section about her food diary, I honestly wondered if anyone had even thought to edit this work. In all, I found it smug and poorly written.

My Berlin Kitchen: A Love Story by Luisa Weiss was a lovely book about remembrance, identity, and food.

Non-fiction: My team read Cultivating Communities of Practice by Etienne Wenger, and it made a massive impression on me and my work. It is a brilliant book!

I am cheating a bit here because I just finished it this week, but Don’t Make Me Think by Steve Krug was also fantastic and asked all the important questions about usability, testing, and the Web.

Reinventing Organizations by Frederic Laloux made some interesting claims and I am still not quite sure what to make of it, but it definitely gave me food for thought.

If you don’t yet use it, Safari Books Online is the best tool for discovering literature in your field, both in terms of platform and content.

Historical fiction: I didn’t read so much in this category this year, but what I did was amazing. In the Garden of Beasts by Erik Larson was so well-researched and engrossing. I am officially a Larson convert!  The Orientalist by Tom Reiss was incredibly exciting as well.

Honorable Mentions: In the field of community management, Jono Bacon’s The Art of Community is a classic. I liked it very much, but found its emphasis on “meritocracy” deeply problematic.

I picked up Good Poems, an anthology by Garrison Keillor, at a library sale last month and it is a delight! I leave it on my kitchen table to read while hanging around.

Nicholson Baker is such a good writer, so The Way the World Works was enjoyable, though not my favorite of his.

Feel free to share your favorites as well! Here’s to a 2015 full of even more books!