Feed aggregator

LITA: 2017 LITA Forum – Full Schedule Now Live

planet code4lib - Fri, 2017-10-27 17:20

The full schedule for the 2017 LITA Forum is now on the website. You can browse sessions by day and time, subject track, or meeting room to find the ones of particular interest to you. You're sure to find many sessions you'll want to attend.

2017 LITA Forum
Denver, CO
November 9-12, 2017
#litaforum

Join your LITA and library technology colleagues for the excellent networking opportunities at the 2017 LITA Forum.

Register Now!

  • Online registration closes Sunday November 5th, 2017

Conference highlights, in addition to all the excellent sessions:

Keynote Speakers:

The Preconference Workshops:

Details

Join us in Denver, Colorado, at the Embassy Suites by Hilton Denver Downtown Convention Center, for the 2017 LITA Forum, a three-day education and networking event featuring 2 preconferences, 2 keynote sessions, more than 50 concurrent sessions and 15 poster presentations. It’s the 20th annual gathering of the highly regarded LITA Forum for technology-minded information professionals. Meet with your colleagues involved in new and leading edge technologies in the library and information technology field. Attendees take advantage of the informal Friday evening reception, networking dinners, game night, and other social opportunities to get to know colleagues and speakers.

Get the latest information, register and book a hotel room at the 2017 Forum Web site.

We thank our LITA Forum Sponsors:

Ex Libris, Google, Aten, and BiblioCommons

Questions or Comments?

Contact LITA at (312) 280-4268 or Mark Beatty, mbeatty@ala.org

See you in Denver.

OCLC Dev Network: Upcoming Changes to VIAF XML Structure

planet code4lib - Fri, 2017-10-27 15:30

The VIAF XML structure is changing soon to no longer include the "covers" element.

District Dispatch: Quick history of the Museum and Library Services Act

planet code4lib - Fri, 2017-10-27 14:36

Director of the Washington Office Germaine Krettek (far left) and Executive Director of the ALA David Clift (second from left, behind Ms. Krettek) with President Lyndon Johnson (far right) in 1964.

When we talk about saving the Institute of Museum and Library Services (IMLS) and Library Services and Technology Act (LSTA) funding, we are also talking about the Museum and Library Services Act (MLSA), a piece of legislation with a 53-year long history.

Here’s a quick background:

1964: The 36th U.S. President Lyndon B. Johnson signed the Library Services and Construction Act (LSCA), which enabled the federal government to provide direct aid for public libraries. This act and its successors represent the largest federal investment in public libraries.

1996: In its nearly forty-year history, the LSCA underwent numerous reauthorizations until 1996 when it was replaced by the LSTA, enacted within the Museum and Library Services Act (MLSA), the current legislation that authorizes funding for libraries across the nation. The MLSA of 1996 was sponsored by Republican Representative Bill Young from Florida’s 10th congressional district. It established IMLS, which combined the Institute of Museum Services (which had been in existence since 1976) and the Library Programs Office (which had been part of the Department of Education since 1956) and, in 1996, authorized federal funding for IMLS and LSTA through FY 2002.

2003: The 43rd U.S. President George W. Bush signed MLSA into law again. It was sponsored by Republican Representative Peter Hoekstra from Michigan’s 2nd congressional district and authorized federal funding for LSTA through FY 2009.

2010: The 44th U.S. President Barack Obama again renewed MLSA. This time, the legislation was sponsored by Democrat Senator Jack Reed of Rhode Island and authorized federal funding for LSTA through FY 2016.

There are seven instances in previous Congresses when MLSA was introduced but did not advance:

  • 2015-2016 (114th Congress): S. 3391, Museum and Library Services Act of 2016, Sen. Jack Reed (D-RI)
  • 2009-2010 (111th Congress): S. 3984, Museum and Library Services Act of 2010, Sen. Jack Reed (D-RI)
  • 2003-2004 (108th Congress): S. 888, Museum and Library Services Act of 2003, Sen. Judd Gregg (R-NH)
  • 2003-2004 (108th Congress): S. 238, Museum and Library Services Act of 2003, Sen. Jack Reed (D-RI)
  • 2003-2004 (108th Congress): H.R. 13, Museum and Library Services Act of 2003, Rep. Peter Hoekstra (R-MI-2)
  • 2001-2002 (107th Congress): S. 2611, Museum and Library Services Act of 2002, Sen. Jack Reed (D-RI)
  • 2001-2002 (107th Congress): H.R. 3784, Museum and Library Services Act of 2002, Rep. Peter Hoekstra (R-MI-2)

The post Quick history of the Museum and Library Services Act appeared first on District Dispatch.

Galen Charlton: Amelia, 2000-2017

planet code4lib - Thu, 2017-10-26 20:52

Mellie on a blue blanket

Last year, I wrote about the blossoming of the Mellie-cat, and closed with this line: “Sixteen years is not long enough to get to know a cat.”

It turns out that neither is seventeen and a half years.

Mellie passed away today after a brief illness. She is the last of my first set of cats, daughter of Erasmus and LaZorra, sister of Sophia. In the last year of her life, she trained Freddie how to cat; while she perhaps did not have the most apt of pupils, I know that he will miss her too.

She was the bravest cat I have ever known. She was not inclined to pounce on the world and take it in full; she was reserved and cautious… and yet she always showed up to observe, no matter how unfamiliar the strangers or unusual the circumstances.

Amelia is a grand name for a cat, but perhaps too grand for daily use. She was Mellie most days, but like many cats had accumulated a number of names and sobriquets throughout her life. The Clown Princess. Senior Member of the Treat Committee. Inspector of the Feets. Her mother’s special daughter. The softest and fluffiest.

And so another cat joins the realm of story.

It never gets any easier to mark that transition.

District Dispatch: Congratulations to the 2017 Libraries Ready to Code Cohort

planet code4lib - Thu, 2017-10-26 15:15

This morning ALA and Google announced the new cohort of libraries that will participate in Phase III of the Libraries Ready to Code initiative. These libraries will work together to design and implement coding programs that promote computer science (CS) and computational thinking among youth.

Libraries in the new Ready to Code cohort will work together to develop a toolkit of coding resources that will be released in April 2018.

This is the first time ALA has dedicated funding for CS programs in libraries. As ALA President Jim Neal put it, this really is a landmark for libraries:

“The Libraries Ready to Code grants are a landmark investment in America’s young people and in our future,” said ALA President Jim Neal. “As centers of innovation in every corner of the country, libraries are the place for youth – especially those underrepresented in tech jobs – to get the CS skills they need to succeed in the information age. These new resources will help cultivate problem-solving skills, in addition to coding, that are at the heart of libraries’ mission to foster critical thinking.”

As noted throughout the Ready to Code project, libraries are filling a crucial opportunity gap for millions of kids, especially those from backgrounds that are underrepresented in CS careers – girls, rural residents, those from low-income communities, young people of color or with disabilities. Fewer than half of U.S. K-12 schools offer computer science classes. Yet even students who are fortunate enough to have such programs at their schools need places outside the classroom to practice coding skills. Libraries are ideal places to provide equitable access to technology and training.

Just as important as the seed money to build CS programs is the guidance the cohort members will receive from each other as a community of practice, along with support from Google and ALA. The community will work together to create a national CS educational toolkit made up of resources and activities that they find most useful for youth CS programming. It will also include an implementation guide to help libraries learn how to use and customize the resources for their unique library/community. Developed by U.S. libraries, for libraries, the toolkit will be released in conjunction with National Library Week in April 2018.

As Google program manager Nicky Rigg put it, this program is “not meant to transform librarians into expert programmers but to support them with the knowledge and skills to do what they do best: empower youth to create, problem solve and develop the confidence and skills to succeed in their future careers.”

The cohort will meet during ALA’s 2018 Midwinter Meeting and Exhibits for a hands-on workshop, where they will share best practices and refine the toolkit.

The Libraries Ready to Code grants are just one part of an ongoing collaboration between OITP and Google. While this cohort of libraries is building their coding programs, a cohort of Master of Library and Information Science (MLIS) faculty is building a curriculum to prepare LIS students to facilitate coding programs for young people in their future careers.

More and more libraries are getting Ready to Code, and the future looks promising!

The post Congratulations to the 2017 Libraries Ready to Code Cohort appeared first on District Dispatch.

Evergreen ILS: It Takes a Village

planet code4lib - Thu, 2017-10-26 14:26

Lyrasis, with the support of an IMLS grant, hosted a forum for open source software in libraries called “It Takes a Village.” Lyrasis brought together a variety of leaders in open source projects to discuss issues related to the creation, management, and sustainability of open source systems. The open source projects included, but were not limited to, document and image repositories, integrated library systems, course management software, archival description applications, and electronic resource management systems used in cultural institutions.

I attended as chair of the Oversight Board to represent the Evergreen community.  The goal of the forum was to identify tools and approaches which would assist organizations in starting up open source projects and managing them long term.

The discussion began with representatives evaluating the stage of their projects. Lyrasis provided the following stages:

  • New/Starting Up
  • Growing
  • Self Sustaining
  • Maintenance
  • End of Life

The discussion focused on what the above categories meant. Are they related to the community, the organization, or the technology? Does the cycle apply to individual versions of the software or to the life cycle of the project? Is there a difference between self-sustaining and maintenance? Some attendees wanted to focus on the development cycle rather than the life cycle of the project; others focused on the life cycle of the project.

The forum also discussed and identified the top-level issues for open source projects. The participants identified:

  • Governance
  • Resources
  • Technology
  • Community

Each project identified where it stood on a scale of 1-10 for each category, and groups that rated themselves similarly then discussed their issues in each category. We then discussed what was needed to advance a project in these areas, especially where an area stood out as an issue for a particular project, and what tools might assist a project in advancing in these areas.

The forum was strongly represented by large academic research institutions, even though Lyrasis was looking for open source projects across all cultural institutions. Most of the projects coming out of academia had a top-down approach to management, which is contrary to the bottom-up approach Evergreen, Wikipedia, and Koha have taken. There are a variety of reasons for this. Some projects were developed and promoted by one or two individuals or institutions, and the end users of these applications looked to those individuals for leadership, so the projects naturally evolved this way. Others were deliberately structured this way from the beginning. With top-down control, gatekeepers strictly control feature development and manage the project. All the projects were at various stages.

Lyrasis will be writing a report that will come out next year with the goal of identifying a process for developing and possibly managing open source projects.

Projects Attending

  • Archivematica
  • ArchivesSpace
  • Avalon Media System
  • BitCurator
  • Blacklight
  • Collective Access
  • ConservationSpace
  • Coral
  • DSpace
  • Duraspace
  • Fedora
  • Islandora
  • Knowledge Project
  • Koha
  • Library Simplified
  • LOCKSS
  • MetaArchive
  • OLE/Folio
  • Omeka
  • Public Knowledge Project
  • Sakai
  • Samvera
  • Specify Software Project
  • Vega
  • VuFind
  • Wikimedia (Wikipedia)

Open Knowledge Foundation: How Wikimedia helped authors make over 3000 articles green open access via Dissemin

planet code4lib - Thu, 2017-10-26 08:30

In light of this year’s Open Access week, Michele Marchetto of Wikimedia Italia shares the story of how they helped authors to make their open access articles more widely available. This post has been cross-posted from Wikimedia Italia.

Wikipedia is probably the most effective initiative in the world to increase the readership of academic literature: for instance, wikipedia.org is a top 10 source of clicks for doi.org. Wikipedia contributors are among the biggest consumers of scientific publications in the world, because Wikipedia articles are not allowed to be primary sources: the five pillars allow anyone to edit but require copyleft and a neutral point of view based on reliable sources. Readers are advised to trust what they read only insofar it’s confirmed by provided sources. So, does free culture need all sources to be accessible, affordable and freely licensed?

Open access

Scholarly sources, while generally high quality, are problematic for Wikipedia users in that they often sit behind paywalls demanding hefty payments from readers. Open access wants research output to be accessible online without restrictions, ideally under a free license, given that it is produced by authors, reviewers and editors “for free” (as part of their duties). This includes papers published in journals and conference proceedings, but also book chapters, books, and experiment data.

Cost-effective open science infrastructure is possible but requires political will, and proprietary private platforms grow to fill unmet needs. Authors, however, can make their works green open access autonomously and for free, thanks to open archives and publisher or employer policies. The problem is, how much effort does it take? We tried to find out.

The easy way out

In the past year we saw many developments in the open access landscape. On the reading side, DOAI and then oaDOI plus Unpaywall have made it possible to access some 40% of the literature in just one click, collecting data from thousands of sources which were formerly rather hard to use. It was also proven that cancelling subscriptions produces little pain.
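
To give a concrete flavour of this one-click access, here is a minimal Python sketch (our illustration, not code from the services themselves) that queries the public Unpaywall REST API, the successor interface to oaDOI, for the best open-access location of a paper. The DOI and email address are placeholders.

    import requests

    # Ask Unpaywall (successor to oaDOI) for open-access locations of a DOI;
    # the API asks callers to identify themselves with an email address.
    doi = "10.7554/eLife.04995"  # placeholder DOI
    response = requests.get(
        "https://api.unpaywall.org/v2/" + doi,
        params={"email": "you@example.org"},  # placeholder email
    )
    record = response.json()
    best = record.get("best_oa_location")
    print(best["url"] if best else "No open-access copy found")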

On the authoring side, the SSRN fiasco paved the way to various thematic open archives and general-purpose repositories like Zenodo (offered by OpenAIRE and CERN), that make sure that an open access platform is available for all authors in the world, whatever their outputs. Publishers begin to understand the importance of metadata, although much work needs to be done, and the Open Access button staff helps connect with authors.

Finally, the web platform Dissemin put ORCID and all the above initiatives together to identify 36 million works which could benefit from green open access. Authors can deposit them from Dissemin to an open archive in a couple of clicks, without the need to enter metadata manually. With the possibility of a “legal Sci-Hub” within our reach, what does it take to get the authors to help?

Frontpage of the Dissem.in platform

Wikimedia Italia takes initiative

Wikimedia projects contributor Federico Leva, frustrated at the number of pay-walled articles linked from the Italian and English Wikipedia, decided to contact their authors directly. Using the available data, almost half a million depositable articles by a million authors were found. An email was sent to each of them where possible: the message thanked them for contributing sources to Wikipedia, presented them with the dilemma of a simple volunteer editor who wants to link an open access copy for all Wikipedia users to see, and asked to check the publication on Dissemin to read more about its legal status and to deposit it.

The response has been overwhelmingly positive: over 15% of the recipients clicked the links to find out more, thousands wrote encouraging replies, and over 3000 papers were deposited via Dissemin in two months. Wikimedia Italia, active since 2008 in open access, covered the costs (a few hundred euros on phplist.com) and provided its OTRS instance to handle replies. With AISA’s counsel, hundreds of support requests have been handled (mostly about the usual pains of green OA, such as locating an appropriate manuscript).

Tell me a story

Our reasoning has been driven by examples such as the story of Jack Andraka, which showed how open access can change the world. Jack, as a high school student, proposed a cheap method for the early diagnosis of pancreatic cancer.

Jack’s research, like every invention, is based on previous scientific results. Jack was not affiliated with any research entity and was not able to access paywalled research, but he was able to consult the extensive body of open access research provided by NIH’s PubMed Central, which is often in the public domain or under a free Creative Commons license. Jack’s story was a potent message in mass media on how open access can save lives.

Some reactions and what we learnt

The authors’ responses taught us what makes a difference:

  • make deposit easy and authors will love open archives;
  • focus on their own work and its readership;
  • show the concrete difference they can make, rather than talk abstractly about open access;
  • lead by example: list other colleagues who archived papers from the same journal;
  • some will adopt a free Creative Commons license to facilitate further reuse, if told about it.

More warmth came from Peter Suber’s support, John Dove’s proposal for OA journals to accelerate the depositing of papers they reference, and a lively discussion.

Surprisingly, many authors simply don’t know about green open access possibilities: they just need to hear about it in a way that rings true to their ears. If you work with a repository, an OA journal or similar, you have a goldmine of authors to ask for deposits and stories relevant to them: why not start doing it systematically? If you are a researcher, you can just search your name on Dissemin and see what is left to make open access; when you are done, you can ask your colleagues to do the same.

It’s simple and, as with Jack Andraka, you can really change the world around us.

Ted Lawless: SPARQL to Pandas Dataframes

planet code4lib - Thu, 2017-10-26 04:00
Using Pandas to explore SPARQL data

Pandas is a Python-based power tool for munging and analyzing data. While working with data from SPARQL endpoints, you may prefer to explore and analyze it with pandas given its full feature set, strong documentation and large community of users.

The code below …
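
While the full code is in the original post, a minimal sketch of the general pattern (assuming the SPARQLWrapper library, with an illustrative public endpoint and query) looks like this:

    import pandas as pd
    from SPARQLWrapper import SPARQLWrapper, JSON

    # Run a query against a SPARQL endpoint and request JSON results.
    sparql = SPARQLWrapper("https://dbpedia.org/sparql")  # illustrative endpoint
    sparql.setQuery("""
        SELECT ?person ?name
        WHERE { ?person foaf:name ?name }
        LIMIT 10
    """)
    sparql.setReturnFormat(JSON)
    results = sparql.query().convert()

    # Flatten the JSON bindings into one dict per result row,
    # then hand the rows to pandas for exploration.
    rows = [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in results["results"]["bindings"]
    ]
    df = pd.DataFrame(rows)
    print(df.head())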

DuraSpace News: Next Steps: A Fedora API Test Suite

planet code4lib - Thu, 2017-10-26 00:00

As described in the Fedora API specification (specification charter), the specification is designed to:

  • Define the characteristics and expectations of how clients interact with Fedora implementations

  • Define such interactions such that an implementation's conformance is testable

LITA: Jobs in Information Technology: October 25, 2017

planet code4lib - Wed, 2017-10-25 19:03

New vacancy listings are posted weekly on Wednesday at approximately 12 noon Central Time. They appear under New This Week and under the appropriate regional listing. Postings remain on the LITA Job Site for a minimum of four weeks.

New This Week

Davidson College, Library Director, Davidson, NC

Northern Arizona University Cline Library, Digital Access Librarian (position 603453), Flagstaff, AZ

California State University, Sacramento. University Library, Head of User Services, Sacramento, CA

Colorado State University Libraries, Head, Digital Repository Unit, Fort Collins, CO

Visit the LITA Job Site for more available jobs and for information on submitting a job posting.

Dan Cohen: Humility and Perspective-Taking: A Review of Alan Jacobs’s How to Think

planet code4lib - Wed, 2017-10-25 17:19

In Alan Jacobs’s important new book How to Think: A Survival Guide for a World at Odds, he locates thought within our social context and all of the complexities that situation involves: our desire to fit into our current group or an aspirational in-group, our repulsion from other groups, our use of a communal (but often invisibly problematic) shorthand language, our necessarily limited interactions and sensory inputs. With reference to recent works in psychology, he also lays bare our strong inclination to bias and confusion.

However, Jacobs is not by trade a social scientist, and having obsessed about many of the same works as him (Daniel Kahneman’s Thinking, Fast and Slow looms large for both of us), it’s a relief to see a humanist address the infirmity of the mind, with many more examples from literature, philosophy, and religion, and with a plainspoken synthesis of academic research, popular culture, and politics.

How to Think is much more fun than a book with that title has the right to be. Having written myself about the Victorian crisis of faith, I am deeply envious of Jacobs’s ability to follow a story about John Stuart Mill’s depression with one about Wilt Chamberlain’s manic sex life. You will enjoy the read.

But the approachability of this book masks only slightly the serious burden it places on its readers. This is a book that seeks to put us into uncomfortable positions. In fact, it asks us to assume a position from which we might change our positions. Because individual thinking is inextricably related to social groups, this can lead to exceedingly unpleasant outcomes, including the loss of friends or being ostracized from a community. Taking on such risk is very difficult for human beings, the most social of animals. In our age of Twitter, the risk is compounded by our greater number of human interactions, interactions that are exposed online for others to gaze upon and judge.

So what Jacobs asks of us is not at all easy. (Some of the best passages in How to Think are of Jacobs struggling with his own predisposition to fire off hot takes.) It can also seem like an absurd and unwise approach when the other side shows no willingness to put themselves in your shoes. Our current levels of polarization push against much in this book, and the structure and incentives of social media are clearly not helping.

Like any challenge that is hard and risky, overcoming it requires a concerted effort over time. Simple mental tricks will not do. Jacobs thus advocates for, in two alliterative phrases that came to mind while reading his book, habits of humility and practices of perspective-taking. To be part of a healthy social fabric—and to add threads to that fabric rather than rend it—one must constantly remind oneself of the predisposition to error, and one must repeatedly try to pause and consider, if only briefly, the source of other views you are repulsed by. (An alternative title for this book could have been How to Listen.)

Jacobs anticipates some obvious objections. He understands that facile calls for “civility,” which some may incorrectly interpret as Jacobs’s project, are often just repression in disguise. Jacobs also notes that you can still hold strong views, or agree with your group much of the time, in his framing. It’s just that you need to have a modicum of flexibility and an ability to see past oneself and one’s group. Disagreements can then be worked out procedurally rather than through demonization.

Indeed, those who accept Jacobs’s call may not actually change their minds that often. What they will have achieved instead, in Jacobs’s most memorable phrase, is “a like-hearted, rather than like-minded,” state that allows them to be more neighborly with those around them and beyond their group. Enlarging the all-too-small circle of such like-hearted people is ultimately what How to Think seeks.

HangingTogether: Take the OCLC-euroCRIS survey of research information management practices

planet code4lib - Wed, 2017-10-25 13:54

Research information management (RIM), also often called Current Research Information Systems (CRISs), is the aggregation, curation, and utilization of metadata about research activities. Institutional RIM adoption, in tandem with activities by publishers, funders, and libraries, can help to reliably connect a complex scholarly communications landscape of researchers, affiliations, publications, datasets, grants, projects, and their persistent identifiers.

OCLC Research recognizes that libraries today are playing a larger role in research information management at institutions worldwide, and is conducting research on behalf of the library community in order to better understand library roles and institutional needs in this rapidly changing ecosystem.

This year we have partnered with euroCRIS as well as with librarians from OCLC Research Library Partnership institutions to develop a survey to help us understand and report on the state of RIM activities worldwide.

We invite universities, research institutes, and other research organizations to participate in the survey, and to help us learn more about:

  • Why have institutions adopted, or why are they considering adopting, RIM infrastructures?
  • How are institutions using RIM functionality? What are the principal uses?
  • Who are institutional stakeholders, and what, in particular, is the role of libraries?
  • What processes and systems are in use? How do they interoperate with internal and external systems? What is the scope?
  • What are the principal drivers?
  • What are regional and international differences in drivers, uses, and processes?

Survey findings and data will be published CC-BY in 2018.

Take the survey.

More information, including a PDF copy of the survey, is available at oc.lc/rim. A Spanish language version of the survey will be available in November, thanks to partners at CONCYTEC, the Peruvian National Council for Science, Technology and Technological Innovation, which will be using this instrument to assess RIM practices in Peruvian universities and institutes, as it seeks to establish a national RIM infrastructure.

I also invite you to read the recent OCLC Research position paper titled Research Information Management: Defining RIM and the Library’s Role. This publication, prepared by OCLC Research and a working group of librarians representing OCLC Research Library Partnership institutions, is intended to help libraries and other institutional stakeholders understand developing research information management practices—and particularly the value add that libraries can offer in a complex ecosystem.

Open Knowledge Foundation: eLife: Facilitating data validation & reuse with goodtables

planet code4lib - Wed, 2017-10-25 08:00

Open Knowledge International is working on the Frictionless Data project to remove the friction in working with data. We are doing this by developing a set of tools, standards, and best practices for publishing data. Through a series of pilots, we are working directly with organizations to solve real problems managing data. In this blog, Jo Barrat, Adria Mercader and Naomi Penfold share learnings from a pilot of Frictionless Data’s goodtables on data shared with eLife.

eLife is a non-profit organisation with a mission  to help scientists accelerate discovery by operating a platform for research communication that encourages and recognises the most responsible behaviours in science. eLife publishes important research in all areas of the life and biomedical sciences. The research is selected and evaluated by working scientists and is made freely available to all readers.

“Ensuring data availability alone is insufficient for the true potential of open data to be realised. The push from journals and funders at the moment is to encourage sharing, which is the first step towards reuse. The next step is to consider how we ensure actually reusable data. Any efforts to make it easier for researchers to prepare high quality reusable datasets, and to do so with minimal effort, are welcome. Further, tools that reduce the burden of reviewing datasets are of interest to data publishers.”
– Naomi Penfold, eLife

Use Case

Data sharing is an important cornerstone in the movement towards more reproducible science: it provides a means to validate assertions made, which is why many journals and funders require that research data is shared publicly and appropriately within a reasonable timeframe following a research project. At eLife, authors are encouraged to deposit their data in an appropriate external repository and to cite the datasets in their article or, where this is not possible or suitable, publish the source data as supplements to the article itself. The data files are then stored in the eLife data store and made available through download links available within the article.

Source data shared with eLife is listed under the Figures and data tab. Source: Screenshot from eLife 2017;6:e29820.

Open research data is an important asset in the record of the original research, and its reuse in different contexts helps make the research enterprise more efficient. Sharing and reuse of research data is fairly common, and researchers may reuse others’ data more readily than they might share their own.

The exact nature of data reuse, however, is less clear: forty percent of Wellcome Trust-funded researchers make their data available as open access, and three-quarters report reusing existing data for validation, contextualisation, methodological development, and novel analyses, for example (Van den Eynden et al, 2016). Interestingly, a third of researchers who never publish their own data report reusing other researchers’ open data (Treadway et al, 2016), and dataset citation by researchers other than the original authors appears to be growing at least in line with greater availability of data (for gene expression microarray analysis; Piwowar & Vision, 2013). However, only a minority of citations (6% of 138) pertained to actual data reuse when citation context was manually classified in this study.

Indeed, the quality of the data and its documentation were listed as important factors when Wellcome Trust-funded researchers were deciding whether to reuse a dataset or not (Van den Eynden et al, 2016).

Very few formal studies that look into the problems faced by researchers when attempting to reuse open data have been published. Anecdotal evidence from conversations with life scientists indicates that:

  1. The process of preparing open data for reuse — including cleaning, restructuring, and comparing multiple datasets prior to combining — is onerous and time-consuming.
    The time and effort it takes for researchers to prepare their own data for repository deposition is considered a barrier to sharing. Further, the quality of the data and its documentation are important factors when deciding whether to reuse a dataset or not. (Van den Eynden et al, 2016)
    This is why projects that improve the reusability of research data in a way that requires minimal effort on the researcher’s part are of interest within the eLife Innovation Initiative.
  2. There is also a sparsity of formal structures for secondary data users to openly collaborate with original data providers, to share the work of improving quality of open research data. Such infrastructure could provide the social and academic feedback cycle in a rapid enough timescale to fuel a rich and useful Open Data ecosystem. While the utility of goodtables does not extend to this use case, it is the first step along this pathway.

These problems are relevant not only to open data in academic research but also to government data. Just as we should move beyond incentivising the sharing of data to encouraging the sharing of reusable data for research, we shouldn’t only incentivise governments for raw publication and access. We need to incentivise data quality, towards actual insight and change. Without a simple, solid foundation of structural integrity, schematic consistency, and timely release, we will not meet quality standards higher up in the chain. We need to have essential quality assurances in plain text publication of data first, for data that is published via manual and automated means.

We shouldn’t only incentivise governments for raw publication and access. We need to incentivise data quality, towards actual insight and change.

For our Frictionless Data pilot work, we analyzed 3910 articles, 1085 of which had data files. The most common format was Microsoft Excel Open XML Format Spreadsheet (xlsx), with 89% of all 4318 files being published in this format. Older versions of Excel and CSV files made up the rest.

A summary of the eLife research articles analysed as part of the Frictionless Data pilot work

In terms of validation, more than three-quarters of the articles analyzed contained at least one invalid file. Following analysis of a sample of the results, the vast majority of the errors appear to be due to the data being presented in aesthetically pleasing tables, using formatting to make particular elements more visually clear, as opposed to a machine-readable format.
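
For readers curious what such validation looks like in practice, here is a minimal sketch using the goodtables Python library (the file name is a placeholder, and the report fields shown are those of the goodtables-py API as we understand it):

    from goodtables import validate

    # Validate a single tabular file; the report flags structural problems
    # such as blank or duplicate headers and extra cells.
    report = validate("source_data.xlsx")  # placeholder file name

    print("valid:", report["valid"])
    for table in report["tables"]:
        for error in table["errors"]:
            print(error["code"], "-", error["message"])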

Data from Maddox et al. was shared in a machine-readable format (top), and adapted here to demonstrate how such data are often shared in a format that looks nice to the human reader (bottom). Source: the data file is presented as is and adapted from Maddox et al. eLife 2015;4:e04995 under the Creative Commons Attribution License (CC BY 4.0).

This is not limited to the academic field, of course; the tendency to present data in visually appealing spreadsheets is perhaps more prevalent in other areas, whether because consumers of the data are even less likely to have it processed by machines or because the data is collated by people with no experience of having to use it in their work.

Work to improve the reusability of research data pushes towards an ideal situation where most data is both machine-readable and human-comprehensible. In general, the eLife datasets were of better quality than, for instance, those created by government organisations, where structural issues such as missing headers and extra cells are much more common. So although the results here have been good, the community may derive greater benefit from researchers going that extra mile to make files more machine-friendly and embracing more robust data description techniques like Data Packages, as sketched below.
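
As a small sketch of that last suggestion (assuming the datapackage-py library; the file pattern is a placeholder), describing a folder of CSV files as a Data Package can be as simple as:

    from datapackage import Package

    # Infer column names and types for every matching CSV file,
    # then save the resulting descriptor alongside the data.
    package = Package()
    package.infer("source_data/*.csv")  # placeholder pattern
    package.save("datapackage.json")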

Overall, the findings from this pilot demonstrate that there are different ways of producing data for sharing: datasets are predominantly presented in an Excel file with human aesthetics in mind, rather than structured for use by a statistical program. We found few issues with the data itself beyond presentation preferences. This is encouraging and is a great starting point for venturing forward with helping researchers to make greater use of open data.

You can read more about this work in the Frictionless Data Pilot writeup.

Parts of this piece are cross-posted on eLife Labs.

Evergreen ILS: Evergreen 3.0 development update #17: aloft

planet code4lib - Wed, 2017-10-25 01:17

Mallard (Male), Reifel Migratory Bird Sanctuary, Ladner, British Columbia. CC-BY-SA image by Alan D. Wilson from Wikimedia Commons.

Since the previous update in this series noting steps along the path to 3.0, a lot has happened. Evergreen 3.0.0 was released on 3 October and 3.0.1 was released on 19 October. Many libraries are now laying plans for their upgrades to 3.0.

On the one hand, a major release is an important event; on the other hand, life (and Evergreen development) carries on, step by step, patch by patch, bug report by bug report, thought by thought. Tomorrow the Evergreen community will select the release manager for 3.1, the spring 2018 release, which will be the best Evergreen release yet.

But for this final update post for the 3.0 series, I would like to express a personal note: gratitude. Gratitude for those who directed their time, their money, their consideration, their passion towards improving Evergreen; for those who use Evergreen every day; for those who work to maintain our community.

Thank you for helping to launch 3.0 aloft.

Duck trivia

The fastest duck ever recorded was a merganser observed flying in excess of 100 mph.

DuraSpace News: AVAILABLE: VIVO 1.9.E

planet code4lib - Wed, 2017-10-25 00:00
From Graham Triggs, Technical Lead for VIVO

On behalf of the VIVO team, I am pleased to announce that VIVO 1.9.E is now available. Downloads are available on the GitHub release page:

District Dispatch: ALA files E-rate comments with the FCC about C2

planet code4lib - Tue, 2017-10-24 20:52

Yesterday, ALA, along with over 140 libraries and library organizations, submitted comments to the Federal Communications Commission on its Public Notice seeking information about E-rate program category two (C2) budgets.

ALA’s comments highlight the fact that libraries are assured access to critical funding for internal connections, in many cases for the first time in 15 years. This is true whether you are a small rural library in Montana or a large urban library in Pennsylvania. ALA strongly supports the current C2 budget process.

We note, however, that it is premature for the FCC to make comprehensive assumptions about why libraries are not applying for C2, as the reasons vary. We heard from the ALA E-rate Task Force that planning for an equipment upgrade can require a long lead time, one that does not correspond to the E-rate application window. Libraries may also be preparing for new construction and holding off on applying until those plans are finalized.

Our comments also raise several issues that, should the FCC address them, would likely allow more libraries to make use of their C2 budget allotment. In brief, these are:

  • The FCC should direct USAC to accept the IMLS square footage data to determine the library budget without further PIA review;
  • Allow applicants to spend their C2 funding over two years; and
  • Review and fund C1 and C2 applications at the same time.

While we do not have any indication that the current $2.30 per square foot is insufficient, we do ask the FCC to add another IMLS locale code to the pool of libraries eligible to receive $5 per square foot.

We look forward to working with the FCC as they move forward with determining how to address issues raised in the ALA comments as well as those from more than 140 libraries that also submitted comments. Having that number of libraries on record voicing their support for E-rate and C2 is phenomenal. Thank you all for stepping up!

Of the libraries that filed yesterday, many submitted personal stories that paint a picture of library users across the country downloading job applications, applying for financial aid, seeking health information, and communicating with family – all dependent on a strong WiFi signal.

For example, the Vigo County Public Library in Terre Haute, IN filed a comment that goes into detail about the library’s mission and the way high-speed internet has impacted the library’s customers:

“Whether looking for a new job, taking classes, connecting with family and friends, or simply downloading a book to read, the service we are able to provide is appreciated by our community. We would not be able to offer the range of programming and other services without the funding provided by Category 2 to upgrade our equipment.”

North Carolina’s Durham County Library’s comment tells the story of a patron who had complained about the Wi-Fi but wrote in again to compliment the library on its upgrades. The letter also has a great quote:

“A library without reliable internet access is almost like a library without books.”

Robertson County Public Library, located in the smallest Kentucky county by both population and geographic area, explained the importance of Wi-Fi to its community.

“The nearest college is 25 miles away and many of our patrons use our WiFi for their classwork. There is no industry in our county so a large percentage of people travel out of the county to work. They can be seen using our WiFi in the library parking lot after hours, many nights per week.”

Patrons are also voicing their support. A woman in New York who relied on Wi-Fi at the Bennington Free Library in Vermont while unemployed during the recession wrote in.

The Texas Library Association states, “College students, jobseekers, K-12 students, small business owners, entrepreneurs, researchers, seniors, and community members across Texas rely on libraries as a vital link to online information and resources… Libraries must continue to invest significant resources in costly internet equipment and services, and expand their WiFi and internal connectivity to offer enhanced access to their communities.”

Want to file comments? You still can! We have until November 7 to file reply comments. Commenting at the FCC takes a few steps, but it’s not challenging. Here’s what to do:

  • Format your response as a PDF document (Use letterhead!).
  • Go to https://www.fcc.gov/ecfs/filings
  • For the Proceeding Number, enter the following proceeding number: 13-184
  • Complete the rest of the information on the form.
  • Select “REPLY TO COMMENTS” in the drop-down menu under “Type of Filing”
  • Upload your comments at the bottom of the form.

Not sure what to write? Use this template to tell the FCC how your patrons depend on the library to connect to the internet. We encourage you to edit the template to add specifics (like the examples above!) that are important to your library and your community. These stories and examples are critical for the FCC to know about!

The post ALA files E-rate comments with the FCC about C2 appeared first on District Dispatch.

John Miedema: I Built my First Robot Today

planet code4lib - Tue, 2017-10-24 18:56

I built my first “robot” today, using snatchbot. If you visit my site and click on the avatar on the bottom right, you will be able to chat with it. That, or click here. It is very basic at present, but it is learning. Visit soon to see improvements.
