It feels a little self-indulgent to write about one’s technology purchases so before I describe my set up, let me explain why I’m sharing this information.
Some time back, in preparation for a session I was giving on Zotero for my university's annual technology conference, I realized that before going into how to use Zotero, I had to address why. I recognized that I was asking students and faculty, who were likely already time-strapped and overburdened, to abandon long-standing practices that were already working for them in order to switch to Zotero for their research.
Before my presentation, I asked on Twitter when and why faculty would change their research practices. Most of the answers were on the cynical side but there were some that gave me some room to maneuver, namely this one: “when I start a new project.” And there’s a certain logic to this approach. If you were starting graduate school and know that you have to prepare for comps and generate a thesis at the end of the process, wouldn’t you want to conscientiously design your workflow at the start to capture what you learn in such a way that it’s searchable and reusable?
My own sabbatical is over and, oddly enough, it is now, at its end, that I feel most like I'm starting all over again in my professional work. So I'm using that New Project feeling to fuel some self-reflection on my research process, bring some mindfulness to my online habits, and bring deliberate design to My Setup.
There’s another reason why I’m thinking about the deliberate design of research practice. As libraries start venturing into the space of research service consultation, I believe that librarians need to follow best practices for ourselves if we hope to develop expertise in this area.
As well, I think we need to be more conscious of how and when our practices are not in line with our values. It's simply not possible to live completely without hypocrisy in this complicated world, but that doesn't mean we can't strive for praxis. It's difficult for me to take seriously accusations that hackerspaces are neoliberal when they're made by a person cradling a MacBook or iPhone. That being said, I rely heavily on products from Microsoft, Amazon, and Google, so I'm in no position to cast stones.
I just want to care about the infrastructures we’re building….
And with that, here’s my setup!
There are three computers that I spend my time on: the family computer in the kitchen (a Dell desktop running Windows 7), my work computer (another Dell desktop running Windows 7), and my ThinkPad X1 Carbon laptop, which I got earlier this year. GRUB turned the laptop into a dual-boot machine, so I can switch between Ubuntu and Windows 7. I feel I need a Windows environment so I can run ESRI products and all those other Mac/Windows-only products if need be.
I have a Nexus 4 Android phone made by LG and a Kindle DX as my ebook reader. I don’t own a tablet or an mp3 player.
World Backup Day is March 31st. I need to get myself an external drive for backups (Todo1).
After getting my laptop, the first thing I did was investigate password managers to find which one would work best for me. I ended up choosing LastPass and felt the benefits immediately. Using a password manager has saved me so much pain and aggravation, and my passwords are now (almost) all unique. Next, I need to set up two-factor authentication for the services I haven't gotten around to yet (Todo2).
With work being done on three computers, it's not surprising that I tend to work online. My browser of choice is Mozilla Firefox, but I will flip to Chrome from time to time. I use the sync functionality on both so my bookmarks are automatically updated and the same across devices. I use Sublime Text as my text editor for code, GIMP as my graphics editor, and QGIS for my geospatial needs.
This draft, along with much of my other writing and presentations, is on Google Drive. I spend much of my time in Gmail and Google Calendar. While years ago I downloaded all my email using Mozilla Thunderbird, I have not set up a regular backup strategy for these documents (Todo3). I've toyed with using Dropbox to back up Drive but think I'm better off with an external drive. I have a Dropbox account because people occasionally share documents with me through it, but at the moment I only use it to back up my kids' Minecraft games.
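Since several of these todos amount to "copy a folder to an external drive on a schedule," here is a bare-bones sketch of what such a job could look like. Everything in it is hypothetical: the paths, the drive letter, and the assumption that a scheduler (cron or Windows Task Scheduler) runs it regularly.

```python
import shutil
from pathlib import Path

SOURCE = Path.home() / "Documents" / "mail-archive"   # hypothetical local archive
DEST = Path("E:/backups/mail-archive")                # hypothetical external drive

# Re-copies the whole tree each run: crude, but fine for a simple mirror.
# (dirs_exist_ok requires Python 3.8+.)
shutil.copytree(SOURCE, DEST, dirs_exist_ok=True)
print("Backed up", SOURCE, "to", DEST)
```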
From 2007 to 2013, I used delicious to capture and share the things I read online. Then delicious tried to be the new Pinterest and made itself unusable (although it has since reverted to something close to its original form), so I switched to Evernote (somewhat reluctantly, because I missed the public aspect of sharing bookmarks). I've grown quite dependent on Evernote as my outboard brain. I use IFTTT to post the links from my Twitter faves to delicious, which are then imported automatically into Evernote. I also use IFTTT to automatically back up my Tumblr posts to Evernote, save my Foursquare check-ins to Evernote (and Google Calendar), and send my Feedly saved posts to Evernote. Have I established a system to back up my Evernote notes on a regular basis? No, no I have not (Todo4).
The overarching idea I have come up with is that the things I write are backed up to my Google Drive account, and the library of things I have read or saved for future reading (ha!) is saved in Evernote. To this end, I use IFTTT to save my tweets to a Google Spreadsheet, and my Blogger and WordPress posts are automatically saved to Google Drive (still a work in progress; Todo 5). My web host is DreamHost, but I am tempted to jump ship to DigitalOcean.
My goal is to have at least one backup of everything I've created. So I use IFTTT to save my Instagram posts to Flickr. My Flickr posts are just a small subset of all the photos that are automatically captured and saved to Google Photos. No, I have not backed up these photos (Todo 6), but since 2005 I have printed the best of my photos annually into beautiful softcover books, first through QOOP and later through Blurb. My Facebook photos and status updates from 2006 to 2013 have been printed in a lovely hardcover book using MySocialBook. One day I would like to print a book of the best of my blogged writing using Blurb, if only as a personal artifact.
Speaking of books: because I'm one of the proud and the few to own a Kindle DX, I use it to read PDFs and most of my non-fiction. When I stumble upon a longread on the web, I use Readability's Send to Kindle function so I can read it later without eyestrain. I'm inclined to buy the books I use in my writing and research as Kindle ebooks because I can easily attach highlighted passages from them to my Zotero account. My ebooks are backed up in my calibre library. I also use Goodreads to keep track of my reading because I love knowing what my friends are into.
I subscribe to Rdio, and for those times that I actually spend money on owning music, I try to use Bandcamp. I'm an avid podcast listener and use BeyondPod for that purpose. Our Sonos system lets us play music from all these services, as well as TuneIn, in the living room. The music I used to listen to on CD is now sitting on an unused computer running Windows XP, and I know that if I don't get my act together and transfer those files to an external drive soon, they will be gone for good... if they haven't already become inaccessible (*gulp*) (Todo 8).
For my “Todo list” I use Google Keep, which also captures my stray thoughts when I'm away from paper or my computer. Google Keep has an awesome feature that triggers reminders based on your location.
So that’s My Setup. Let me know if you have any suggestions or can see some weaknesses in my workflow. Also, I’d love to learn from your Setup.
And please please please call me out if I don’t have a sequel to this post called The Backup by the time of next year's World Backup Day.
John Miedema: How do you organize files and emails into folders?
I’m a bit of a classification nut. It comes from my Dutch heritage — those Dutchies are always trying to be efficient with their tiny bits of land. It’s why I’m drawn to library science too. I think a lot about the way I organize computer files and emails into folders. It provides insight into the way all classification works, and of course ties into my Lila project. I’d really like to hear about your own practices. Here’s mine:
- Start with a root folder. When an activity starts, I put a bunch of files into a root folder (e.g., a Windows directory or a Gmail label).
- Sort files by subject or date. As the files start to pile up in a folder, I find stuff by sorting files by subject or date using application sorting functions (e.g., Windows Explorer).
- Group files into folders by subject. When there are a lot of files in a folder, I group them into different folders. The subject classification is low-level, e.g., Activity 1, Activity 2. Activities that have expired are usually grouped together into an ‘archive’ folder.
- Develop a model. Over time the folder and file structure can get complex, making it hard to find stuff. I often resort to search tools. What helps is developing a model that reflects my work, e.g., Client 1, Client 2. Different levels correspond to my workflow, e.g., 1. Discovery, 2. Scoping, 3. Estimation, etc. The model is really a taxonomy, an information architecture. I can use the same pattern for each new activity (see the sketch after this list).
- Classification always requires tinkering. I’ve been slowly improving the way I organize files into folders for as long as I’ve been working. Some patterns get reused over time, others get improved. Tinkering never ends.
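To make the model concrete, here is a tiny illustrative sketch (mine, not John's actual tooling) that materializes the client/workflow taxonomy as a reusable folder template:

```python
from pathlib import Path

PHASES = ["1. Discovery", "2. Scoping", "3. Estimation"]  # workflow levels

def make_client_folders(root, clients):
    # Create the same phase subfolders under each client folder.
    for client in clients:
        for phase in PHASES:
            Path(root, client, phase).mkdir(parents=True, exist_ok=True)

make_client_folders("Work", ["Client 1", "Client 2"])
```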
(I will discuss the use of tagging later. Frankly, I find manual tagging hopeless.)
In the previous post I started to explore the metadata edit events dataset generated from 94,222 edit events from 2014 for the UNT Libraries’ Digital Collections. I focused on some of the information about when these edits were performed.
This post focuses on the “who” of the dataset.
Altogether, we had 193 unique users edit metadata for one of the systems that comprise the UNT Libraries' Digital Collections. This includes The Portal to Texas History, the UNT Digital Library, and the Gateway to Oklahoma History.
The top ten most frequent editors of metadata in the system are responsible for 57% of the overall edits.

Username      Edit Events
htarver            15,451
aseitsinger        10,105
twarner             4,655
mjohnston           4,143
atraxinger          3,905
cwilliams           3,490
sfisher             3,466
thuang              3,327
mphillips           2,669
sdillard            2,518
The overall distribution of edits per user looks like this.
As you can see, it shows the primary users of the system and then very quickly tapers down to the “long tail” of users who have a lower number of edit events.
A quick look at the total number of users active for given days of the week across the entire dataset:

Sun   Mon   Tue   Wed   Thu   Fri   Sat
 40    95   122   122   123    97    39
There is a swell for Tue, Wed, and Thu in the table above. The pattern is pretty consistent: a given day of the week has either 39-40, 95-97, or 122-123 unique users.
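For readers who want to reproduce these counts, here is a minimal sketch (my own, not the code used at UNT) of how the per-user and per-weekday numbers above could be derived with pandas, assuming the edit events sit in a CSV with "username" and "timestamp" columns:

```python
import pandas as pd

events = pd.read_csv("edit_events_2014.csv", parse_dates=["timestamp"])

# Edits per user, descending; the first ten rows are the "top ten" table.
print(events["username"].value_counts().head(10))

# Unique active users for each day of the week.
weekday = events["timestamp"].dt.day_name()
print(events.groupby(weekday)["username"].nunique())
```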
In looking at how unique users were spread across the year, grouped into months, we got the following table and then graph.

Month       Unique Users
January           54
February          73
March             64
April             61
May               44
June              40
July              48
August            50
September         50
October           84
November          49
December          36
There were some spikes throughout the year, most likely related to a metadata class in the UNT College of Information that uses the Edit system as part of its teaching; this accounts for the October and February spikes in the number of unique users. Other than that, we are consistently above 40 unique users per month, with a small dip for the December holiday season when school is not in session.
In the previous post we had a heatmap with the number of edit events distributed over the hours of the day and the days of the week. I’ve included that graph below.
I was curious to see how the number of unique editors mapped onto this same type of graph, so that is included below.

User Status
Of the 193 unique metadata editors in the dataset, 135 (70%) of the users were classified as Non-UNT-Employee and 58 (30%) were classified as UNT-Employee. For the edit events themselves, 75,968 (81%) were completed by users classified with a status of UNT-Employee and 18,254 (19%) by users classified with the status of Non-UNT-Employee.

User Rank

Rank        Edit Events   % of Total Edits (n=94,222)   Unique Users   % of Total Users (n=193)
Librarian        22,466   24%                                     16   8%
Staff            12,837   14%                                     13   7%
Student          41,800   44%                                     92   48%
Unknown          17,119   18%                                     72   37%
You can see that 44% of all of the edits in the dataset were completed by users who were students. Librarians and Staff members accounted for 38% of the edits.
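These percentages fall straight out of the event log. A hedged continuation of the pandas sketch above (the users.csv lookup with "username" and "rank" columns is hypothetical; the real rank data may live elsewhere):

```python
import pandas as pd

events = pd.read_csv("edit_events_2014.csv")   # one row per edit event
users = pd.read_csv("users.csv")               # hypothetical username -> rank lookup

# Share of edit events per rank (Student ~44%, Librarian ~24%, ...).
merged = events.merge(users, on="username", how="left")
print(merged["rank"].value_counts(normalize=True).mul(100).round(0))

# Share of unique users per rank (Student ~48%, Unknown ~37%, ...).
print(users["rank"].value_counts(normalize=True).mul(100).round(0))
```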
This is the second in a series of posts related to metadata edit events in the UNT Libraries’ Digital Collections. Check back for the next installment.
As always feel free to contact me via Twitter if you have questions or comments.
Love of certainty is a demand for guarantees in advance of action. Ignoring the fact that truth can be bought only by the adventure of experiment, dogmatism turns truth into an insurance company. Fixed ends upon one side and fixed “principles” — that is authoritative rules — on the other, are props for a feeling of safety, the refuge of the timid, and the means by which the bold prey upon the timid.
In my social media feeds yesterday, I saw some friends and acquaintances say that they were reconsidering their attendance at DPLAfest, scheduled to be held in Indianapolis, IN, April 17-18, in light of the recent signing of SEA 101, or the “Religious Freedom Restoration Act,” into law by Governor Pence of Indiana. I must admit that as an openly gay employee at DPLA, I had an immediate and strong negative reaction. I was unhappy about my organization spending money in a place that would allow businesses not to serve me simply because I am gay.
However, after more thought and a night of sleep, I have come to a different conclusion. The passing of this law should make us all want to attend DPLAfest even more than we might have before. We should want to support our hosts and the businesses in Indianapolis who are standing up against this law, and we should make it clear that our money will only be spent in places that welcome all.
At DPLA, we have already begun to diligently ensure that all the venues we are supporting welcome all of the DPLA staff and community. Messages like these have already helped put our minds at ease about a number of our scheduled activities:
Stickers like the one below are going to help us know which businesses to support while we are in Indianapolis:
At DPLAfest, we will also have visible ways to show that we are against this kind of discrimination, including enshrining our values in our Code of Conduct. We encourage you to use this as an opportunity to let your voice and your dollars speak. Let’s use this as a time to support those businesses and venues that support true freedom, all while enjoying each other’s company and a great conference lineup!
DPLA Director for Content
Our second round of competition is complete, and only eight conferences remain standing! And yes, our tournament Cinderella, Big South, is still with us! Details below, but here are the Round of 16 results:

Competition in this round was on book length – which conference has the thickest books?* Big South, continuing its magical tournament run, ended up with the thickest books of all the conferences, averaging about 292 pages and ousting the powerful Big Ten from the tournament! West Coast also continues on to the next round, with a convincing victory over the Ivy Leaguers! Summit League, Ohio Valley, Atlantic 10, Missouri Valley, and Big Sky will also move on to the Round of 8. Conference USA and American Athletic had the tightest battle, with Conference USA coming out on top by less than 10 pages!
While Big South had the thickest books of all the conferences competing in this round (averaging about 292 pages), the Ivy League had the thinnest books, averaging about 225 pages. Does this surprise you? It turns out that the larger the collective collection, the thinner the books. Take a look at this:

Big South had the smallest collective collection among the conferences competing in this round; the Ivy League had the largest. As the chart shows, there is a pretty strong correlation between collection size and the percentage of the collection accounted for by books with fewer than 100 pages. Got any ideas why? Put them in the comments!
By the way, in case you were wondering, the average length of a print book in WorldCat is about 255 pages.
Bracket competition participants: Remember, if the conference you chose has been ousted from the tournament, do not despair! If no one picked the tournament Champion, all entrants will be part of a random drawing for the big prize!
The Round of 8 is next, where the tournament field will be reduced to just four conferences! Results will be posted March 31.
*Average number of pages per print book in conference collective collection. Data is current as of January 2015.

More information: Brian Lavoie is a Research Scientist in OCLC Research. His research interests include collective collections, the system-wide organization of library resources, and digital preservation.
I've been using Vagrant and VirtualBox for development on my OS X machines for my solo projects. But in an effort to get an intern started on developing a front-end to a project I began a while ago, I ran into a really strange problem getting Vagrant working on Windows.
So here it is, as a tale of caution for whatever robot wants to pick up this bleg.
The setup: a Boot Camp partition on a Mid-2010 MacBook Pro, running a dormant OS X and a full Windows 7. Windows 7 is the main environment:
Use the Git Bash shell, since it has SSH, to stand up the boxes with vagrant init and vagrant up.
And then it got stuck (similar to Vagrant stuck connection timeout retrying):

==> default: Clearing any previously set network interfaces...
==> default: Preparing network interfaces based on configuration...
    default: Adapter 1: nat
    default: Adapter 2: hostonly
==> default: Forwarding ports...
    default: 22 => 2222 (adapter 1)
==> default: Booting VM...
==> default: Waiting for machine to boot. This may take a few minutes...
    default: SSH address: 127.0.0.1:2222
    default: SSH username: vagrant
    default: SSH auth method: private key
    default: Error: Connection timeout. Retrying...
    default: Error: Connection timeout. Retrying...
    default: Error: Connection timeout. Retrying...
    (the timeout line repeats until Vagrant gives up)
Well, we booted into the VM with a head (a visible console) and it looked like the boot got interrupted by some sort of kernel panic:

Spurious ACK on isa0060/serio0. Some program might be trying to access hardware directly.
OK, makes sense… the machine isn't booting up, and there has to be a reason why.
Long story short: the Windows 7 partition didn't have hardware virtualization (VT-x) enabled, and under Boot Camp there is no BIOS setting or switch to turn it on. So what do you do?
Like waving a dead fish in front of your computer.
- Boot into OSX.
- System Preferences > select the Startup Disk preference pane
- Select the Boot Camp partition with Windows
- Restart into the Boot Camp partition
Welcome to Part 2 of my two-part series introducing LibGuides CMS for use as a website. Read Part 1 (with comments from Springshare!). This companion piece was released February 27.
We can design surprisingly good websites with LibGuides 2.0 CMS. WordPress and Drupal are free and open source, but Springshare, the maker of LibGuides, delivers reliable hosting and support for two grand a year. Moreover, even folks clueless about coding can quickly learn to maintain a LibGuides-based website because (1) the interface is drag-and-drop, fill-in-the-box intuitive, and (2) many academic librarians create research guides as part of their liaison duties and are already familiar with the system. Most importantly, libraries can customize LibGuides-based websites as extensively or minimally as available talent and time permit, without sacrificing visual appeal or usability, or control of the library's own site.
The Albuquerque and Bernalillo County (ABC) Library is that rare public library that uses LibGuides. The homepage is beautifully laid out, with tons of neat customizations and a carousel that actually enhances UX, despite the load time. One of my favorite LibGuides sites!
The World Maritime University Library, run by the United Nations, has a beautifully minimalist blue-and-white look – classic Scandinavian. Like Google, the logo and search box are front and center; everything else is placed discreetly in tabs at the top and bottom of the homepage.
The American College of Greece’s John S. Bailey Library is text-heavy, but its navigation is as clear as the Aegean Sea. Note the absence of a federated search box, which, unless the algorithms are of search-engine caliber, tends to produce results that undergraduates find bewildering.
Even if you have other priorities or skills, you can still create a quality LibGuides-based website without major customizations to the stylesheets. Hillsborough Community College Library and Harrison College both do nice jobs, albeit with LibGuides 1.0. Walters State Community College did hardly any deep customizing of LibGuides 2.0, but its site is perfectly functional.
My Library’s Website
Moving the Hodges University Library to LibGuides has followed a three-stage agile process.
1. September 2014. We upgraded the existing LibGuides CMS to LibGuides 2.0 and reorganized and enhanced existing content. Review my February 27 post for more on this first stage.
2. January 2015. We rolled out the new library homepage and associated pages, which unified the library’s entire web presence under LibGuides. Previously our homepage was designed and run by the university’s IT department using Microsoft SharePoint (ugh), so students could only access the homepage by signing into the university intranet–dreadful for accessibility. We also shuffled DNS records and redirects so that the homepage has a much cleaner URL (library.hodges.edu) than previously (https://myhugo.hodges.edu/organizations/org-libr/Pages/Home.aspx). The new site can be accessed by anyone from anywhere without logging into anything. #librarianwin
3. June 2015. We will roll out the next major iteration of our website, integrating OCLC’s new and improved WorldCat discovery layer, our new LibAnswers virtual reference service, and our revamped website to build better UX. The page header and federated search box will be optimized for mobile devices, as the rest of the site already is. Our motto? Continual improvement!
Have you used LibGuides as a website? What is your experience?
Today I found the following resources and bookmarked them on Delicious.
- Booktype: Lets you produce beautiful, engaging books in minutes. Booktype is free and open source software that helps you write and publish print and digital books.
From the announcement:
A Northeast Fedora User Group meeting will be held at Yale University on May 11-12. Monday, May 11 will be an unconference-style format with a lightning round in the afternoon. Tuesday, May 12 will focus on Fedora 4 training led by Andrew Woods.
Please register for this event by April 3 here: https://docs.google.com/forms/d/1b4ntNkhRuJvtNEfi0vXSjk7C9uuR3bwa2A8e3U-6w08/viewform
Unless you haven’t been out of your house for the past month, you know that it’s Girl Scout cookie season. The girls out tugging boxes of cookies around the neighborhood are learning all sorts of skills they’ll use later in life as political leaders, entertainers, astronauts, and athletes. Literally. For proof, check out this list of 25 of the most famous Girl Scouts while enjoying the last of your Thin Mints and Caramel Delights…until next year.
Banner image from Digital Commonwealth, Boston Public Library.
Siegfried is a PRONOM-based file format identification tool.
Key features are:
- complete implementation of PRONOM (byte and container signatures)
- reliable results (siegfried is tested against Ross Spencer’s skeleton suite and QA tested against DROID and FIDO output using http://github.com/richardlehane/comparator)
- fast matching without limiting the number of bytes scanned
- detailed information about the basis for format matches
- simple command line interface with a choice of outputs (YAML, JSON, CSV); a usage sketch follows this list
- a built-in server for integrating with workflows and language inter-op
- power options including debug mode, signature modification, and multiple identifiers.
- Siegfried - 1.0 22-Mar-2015
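Given the CLI and JSON output mentioned above, here is a hedged sketch of driving sf from Python. The -json flag and the "files"/"matches" field names reflect my reading of siegfried's output and should be checked against the docs:

```python
import json
import subprocess

# Run siegfried on one file and ask for a JSON report (sf must be on PATH).
result = subprocess.run(["sf", "-json", "example.pdf"],
                        capture_output=True, text=True, check=True)

report = json.loads(result.stdout)
for entry in report.get("files", []):
    for match in entry.get("matches", []):
        print(entry.get("filename"), match.get("id"), match.get("format"))
```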
CrossRef: CrossRef Extends Management Team, Appoints Ginny Hendricks To Focus on Member and Community Outreach
26 March 2015, Lynnfield, MA - CrossRef, the global not-for-profit digital hub for scholarly communications, is pleased to announce the addition of Ginny Hendricks to its management team in the newly created role of Director of Member and Community Outreach, where she will be responsible for marketing, business development, member services, and product support. The appointment reflects CrossRef's mission to innovate for the future of scholarly content and to foster collaboration among an increasingly diverse community of publishers, researchers, authors, libraries, funders, and beyond.
Executive Director Ed Pentz says: "I'm very pleased Ginny is joining the CrossRef team; her international experience, background in scholarly publishing, and digital marketing expertise make her the perfect person to spearhead the CrossRef brand, lead outreach around the world, and contribute to our ongoing success."
Ginny Hendricks says: "CrossRef is indispensable to the reliable running and progression of scholarly communications, and its scope is broadening to accommodate changing publisher needs and serve the wider communities. I'm excited to work with some great people and to be able to contribute to such a central part of scholarly publishing."
Ginny has run Ardent Marketing for nine years, consulting with publishers to develop multichannel marketing plans, brand and launch online products, and build engaged communities. Prior to consulting she managed the launch of Scopus at Elsevier, where she established advisory boards and outreach programs with library and scientific communities. In 1998 Ginny started an early e-resources help desk for Blackwell's Information Services and later led training and communication programs for Swets' digital portfolio in Asia Pacific, the Middle East, and Africa. She's lived and worked in many parts of the world and has managed globally dispersed creative, technical, and commercial teams. She co-hosts the Scholarly Social networking events in London and is considering finishing her Master of Science in Digital Marketing Communications. Ginny will start on Monday 30 March and can be reached via Twitter @GinnyLDN or email firstname.lastname@example.org.
CrossRef (www.crossref.org) serves as a digital hub for the scholarly communications community. A global not-for-profit membership organization of scholarly publishers, CrossRef's innovations shape the future of scholarly communications by fostering collaboration among multiple stakeholders. CrossRef provides a wide spectrum of services for identifying, locating, linking to, and assessing the reliability and provenance of scholarly content.
Contact: Ed Pentz at email@example.com.
View this news release on the CrossRef website.
Several times a year I attend meetings of the National Geospatial Advisory Committee, a federal advisory committee that reports to the chair of the Federal Geographic Data Committee. The NGAC pulls together participants from across academia, the private sector and all levels of government to advise the Federal government on geospatial policy and ways to advance the vision of a National Spatial Data Infrastructure. They held two days of meetings in DC on March 17 and 18, 2015 and I was happy to have the opportunity to attend.
We originally got involved with the group when two members of the GeoMAPP project team (Zsolt Nagy and Dennis Goreham) were named founding NGAC members (PDF) and we’ve kept up with it because of the wealth of information that comes out of the meetings about national geospatial policy initiatives.
The group’s membership changes over time, but in the past has included Jack Dangermond, the founder of Esri, and currently includes both Michael Jones of Google (one of the inventors of Google Earth) and Steve Coast, the founder of OpenStreetMap.
Julie Sweetkind-Singer, the Assistant Director of Geospatial, Cartographic and Scientific Data & Services at Stanford University libraries and a former principal investigator on the NDIIPP National Geospatial Digital Archive project, is now the Vice Chair of the group.
As usual, the committee covered a number of topic areas that have ramifications for the library, archive and museum digital stewardship communities.
FGDC Report/GAO Report
A chief area of discussion in the FGDC’s report to the attendees was the March 16 release of the Government Accountability Office report “Geospatial Data: Progress Needed on Identifying Expenditures, Building and Utilizing a Data Infrastructure, and Reducing Duplicative Efforts.” This is the second GAO report in the past 3 years on geospatial information, with the first, “Geospatial Information: OMB and Agencies Need to Make Coordination a Priority to Reduce Duplication,” having been released on November 26, 2012.
GAO’s objectives with the report were to
(1) describe the geospatial data that selected federal agencies and states use and how much is spent on geospatial data; (2) assess progress in establishing the National Spatial Data Infrastructure; and (3) determine whether selected federal agencies and states invest in duplicative geospatial data.
The report urged Congressional input towards a national addressing database, while also recommending that the Office of Management and Budget and associated federal agencies fully implement national spatial data infrastructure activities.
Crowd-Sourced Geospatial Data
Next came an interesting presentation on the concepts of crowd-sourced data, citizen science and volunteered geographic information, as well as crowd-sourced data initiatives happening inside the Federal government. It featured Sophia Liu, a Mendenhall Postdoc Fellow at the U.S. Geological Survey; Denice Ross, a Presidential Innovation Fellow at the Department of Energy; and Sean Gorman from Timbr.io.
Key questions that crossed each of the presentations included the challenges with integrating crowd-sourced data with agency-originated data while validating its integrity, as well as potential legal consequences when agencies rely on crowd-sourced data for action. One suggested way to address the validity question is to incorporate a “human-in-the-loop” to vet, edit or “massage” crowd-sourced data to ensure its accuracy and usability. See http://radar.oreilly.com/2015/02/human-in-the-loop-machine-learning.html for further info.
There was also a bit of discussion on the difference between “ambient” crowd-sourced data (think traffic data compiled from the location reports of cell phones) and volunteered geographic information such as that found in citizen-mapping initiatives such as OpenStreetMap.
Geospatial Privacy Subcommittee Report
The Geospatial Privacy Subcommittee of the NGAC is largely exploring the privacy challenges presented by Unmanned Aircraft Systems and as such is somewhat out of our purview. An important recent document on this front is “Presidential Memorandum: Promoting Economic Competitiveness While Safeguarding Privacy, Civil Rights, and Civil Liberties in Domestic Use of Unmanned Aircraft Systems” released on Feb. 15, 2015.
COGO Report card
COGO is the Coalition of Geospatial Organizations, a grouping of private sector geospatial organizations such as the American Society of Civil Engineers (ASCE), American Society for Photogrammetry and Remote Sensing (ASPRS), Association of American Geographers (AAG), National States Geographic Information Council (NSGIC) and a number of others.
On February 16, 2015 they published their first “Report Card on the U.S. National Spatial Data Infrastructure” (PDF). The report was written by an expert panel led by former Wyoming governor James E. Geringer (who presented the findings at the meeting). The focus of the initial report card is on the status of the seven FGDC “framework” data layers and how they are being maintained and accentuated to meet the needs of a national spatial data infrastructure. As the report says, “by evaluating the Federal government’s efforts to lead and coordinate the creation and maintenance of these data, this report reflects on how well the NSDI is meeting its goals.” According to COGO the student is not doing too well.
There was ample discussion on whether COGO was measuring the right thing (is it a measure of what’s actually getting done in a somewhat hostile budgetary environment, or are agencies being measured against an abstract standard of what should be done based on the original goals of the NSDI?) and whether this report could do more harm than good for acquiring future resources across the federal geospatial community.
During the discussion on the report it was noted that the 2016 President’s budget includes an increase of nearly $150 million for the USGS, including “an increase of $11 million for the USGS to support the community resilience toolkit, which is a web-based clearinghouse of data, tools, shared applications, and best practices for resource managers, decision-makers, and the public,” so at least there’s recognition that work does need to get done.
Geospatial Data Act of 2015
Finally, not on the meeting agenda but hanging over all the discussions was the “Geospatial Data Act of 2015,” introduced by Senators Hatch and Warner on March 16, 2015, the day prior to the start of the meeting. The text of the legislation is at https://www.congress.gov/bill/114th-congress/senate-bill/740/text, and my initial reading (note: I am not a lawyer!) is that it codifies in law things that are attempting to be implemented in current practice. Several important items in the proposed bill:
- Each covered agency shall include geospatial data as a capital asset for purposes of preparing the budget submission of the President.
- Each covered agency shall disclose each contract, cooperative agreement, grant or other transaction that deals with geospatial data on USAspending.gov.
- Greater OMB oversight, and a limitation on receiving future funds for data that does not conform to FGDC standards.
The next NGAC meeting is June 9-10, 2015. As always, they are open to the public.
VIAF has long interchanged data with Wikipedia, and the resulting links between library authorities and Wikipedia are widely used. Unfortunately we only harvested data from the English Wikipedia (en.wikipedia.org), so we missed names, identifiers and other information in non-English Wikipedia pages.
Fortunately, the problem VIAF had with Wikipedia was similar to the problem Wikipedia itself had in sharing data across language versions. Wikidata is Wikimedia's solution to that problem, and over the last year or two it has grown from promising to useful. In fact, from VIAF's point of view Wikidata now looks substantially better than working with the English pages alone. In addition to picking up many more titles for names, we are finding a million names that do not occur in the English pages, and the number of names that match those in other VIAF sources has nearly doubled, from 440 thousand to 800 thousand.
Since we (i.e., Jenny Toves) were reexamining the process, we took the opportunity to harvest corporate/organization names as well, something we have wanted for some time; some 300K of the increase comes from those.
We expect to have the new data in VIAF in mid-to-late April 2015, and it is visible now in our test system at http://test.viaf.org.
The advantages we see:
- Much less bias towards English
- More entities (people and organizations)
- More coded information about the entities
- More non-Latin forms of names
- More links into Wikipedia
This will cause some changes in the data that are visible in the VIAF interface. One of these is that VIAF will link to the Wikidata pages rather than the English Wikipedia pages, and we are changing the WKP icon to reflect that. This means that Jane Austen's WKP identifier (VIAF's abbreviation for Wikipedia) will change from WKP|Jane_Austen to WKP|Q36322, and links to the WKP source page will change from the English Wikipedia page (https://en.wikipedia.org/wiki/Jane_Austen) to the Wikidata page (https://www.wikidata.org/wiki/Q36322).
Although it is possible to jump from the Wikidata pages to Wikipedia pages in specific languages, we feel these links are important enough that we will be importing all the language-specific Wikipedia page links we find in Wikidata. These will show up as ‘external links’ in the interface in the ‘About’ section of the display.
A commonly used bulk file from VIAF is the 'links' file that shows all the links made between VIAF identifiers and source file identifiers (pointers to the bulk files can be found here). The links file includes external links, so the individual Wikipedia pages will show up in the file along with the Wikidata WKP IDs. Here are some of the current links in the file for Lorcan Dempsey:
. . .
The new file will change to:
. . .
Lorcan only has one Wikipedia page, the English language one. Jane Austen has more than a hundred, and all those links will be there.
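For anyone who wants to pull these links out programmatically, here is a hedged sketch (not an official VIAF tool) for extracting every source and Wikipedia/Wikidata link for one VIAF identifier from the bulk links file. It assumes the dump's layout of one tab-separated pair per line, "http://viaf.org/viaf/<id> TAB <source>|<id-or-URL>", and the example identifier is hypothetical:

```python
viaf_id = "http://viaf.org/viaf/39377930"  # hypothetical example identifier

with open("viaf-links.txt", encoding="utf-8") as links:
    for line in links:
        subject, _, target = line.rstrip("\n").partition("\t")
        if subject == viaf_id:
            print(target)  # e.g. WKP|Q36322 or a language-specific page URL
```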
Of course, this also means some changes to the RDF view of the data. We're still working on that and will post more information when we get it closer to its final form.