Feed aggregator

OCLC Dev Network: Save the Date

planet code4lib - Wed, 2017-01-18 16:00

Library developers, be sure to mark your calendars for the first annual OCLC DevCONnect conference, to be held at OCLC headquarters in Dublin, Ohio, on May 8 and 9, 2017.

OCLC Dev Network: WorldCat knowledge base API Maintenance

planet code4lib - Wed, 2017-01-18 16:00

To allow for system maintenance, WorldShare Collection Manager, the WorldCat knowledge base API, and related services will be unavailable from 9:00 pm to approximately 10:30 pm US EST today, 18 January 2017.

Tim Ribaric: Junk records in Discovery Platforms

planet code4lib - Wed, 2017-01-18 15:55

<rant>

I got a problem with a couple of junk records in our discovery platform. Slow down that bandwagon, I need to jump on it.


DPLA: Catch up with DPLA at ALA Midwinter in Atlanta

planet code4lib - Wed, 2017-01-18 15:00

The American Library Association’s 2017 Midwinter Meeting & Exhibits takes place this weekend in Atlanta, Georgia, and DPLA staffers are excited to connect with a fantastic community of librarians and share updates on the projects underway and upcoming at DPLA in 2017. Here’s your guide to when and where to catch up with DPLA’s staff and community members at ALA Midwinter. If you’ll be following the conference from afar, connect with us on Twitter and follow the conference at #alamw17.

[S] = DPLA Staff Participating, [A] = DPLA Affiliate, [H] = DPLA Hub represented

FRIDAY, January 20, 2017

2:30pm – 4:00pm: Consortial Ebooks Interest Group Meeting (ASCLA) [S]
Location: Georgia World Congress Center, B216

A meeting to discuss issues and challenges for shared ebook collections by statewide, network, and consortial groups.

DPLA Staff participating: Michelle Bickert, Ebook Program Manager

SATURDAY, January 21, 2017

1:00pm – 2:30pm: New Initiatives and Milestones at the Digital Public Library of America [S]
Location: Georgia World Congress Center, B407

DPLA Executive Director Dan Cohen and Director for Content Emily Gore will highlight some of the recent new initiatives and milestones at the Digital Public Library of America, including the launch and progress of RightsStatements.org, an international standard for rights statements for libraries; the growth of DPLA’s network in states across the country; DPLA’s work in ebooks and education, including exhibitions and primary source sets; and the growth of DPLA’s staff. There will also be ample time for questions and interactions with DPLA staff on hand.

DPLA Staff presenting: Dan Cohen, Executive Director and Emily Gore, Director for Content

SUNDAY, January 22, 2017

10:30am – 11:30am: Bridging the Digital Divide with Open Ebooks [S]
Location: Georgia World Congress Center, A403

Open eBooks is an app containing thousands of popular and award-winning titles that are free for kids from in-need communities, without holds or late fees. Since First Lady Michelle Obama announced its launch in February 2016, Open eBooks has received overwhelming feedback from teachers, librarians and students praising the initiative. Attendees will learn more about how the program is being utilized in school classrooms and libraries, how public libraries can help, and best practices for implementation. Attendees will also have a chance to demo the app.

DPLA Staff presenting: Michelle Bickert, Ebook Program Manager

10:30am – 11:30am: Collaborating on Libraries Digital Futures: A Conversation with New York Public Library’s Anthony W. Marx [H]
Location: Georgia World Congress Center, A411/A412b

Access to knowledge has never been faster or more ubiquitous, making the role of public and research libraries ever more essential. Dr. Anthony Marx will discuss the collaborative vision for all libraries in this digital age and a path forward to ensure that the reading public — students, life-long learners, researchers and scholars — truly benefit from online access to information.

1:30pm – 2:30pm: Collude! Resist! Collaborate! ebook strategies for the modern revolutionary [A]
Location: Georgia World Congress Center, A411/A412b

Do you want to be a revolutionary? Join our digital posse in our prime directive to experiment and innovate with how libraries acquire, access and deliver econtent to the masses.

We work with forward-thinking publishers, geeky and entrepreneurial ebook vendors, fabulous grant funders and out-of-this-world authors to co-create models that deliver ebooks and other content in ways that position libraries as the go-to source for econtent by leveraging library expertise about the community to connect the public to their next great read. Our goal is to rock the reading ecosystem. Our outcomes so far: an ereader app for seamless access across multiple ebook platforms, statewide ebook programs that include simultaneous use to eliminate friction and attract new audiences to libraries, developing a national voice for libraries in the econtent marketplace, and co-conspiring to build a better ebook experience. We need you. Join us.

4:30pm – 5:30pm: Digital Literacy Training Resources Update [A]
Location: Georgia World Congress Center, B405

Community members who lack basic computer or Internet skills can’t find and apply for jobs online, learn more about health conditions, connect with their grandchildren via Facebook, or look up new recipes. Your public library helps them develop technology skills – whether it’s standing over a computer user’s shoulder to answer a question, or teaching a scheduled class. At this session, representatives from the Public Library Association, Anneal, and leading public libraries will discuss some of the main technology training resources available to support your library’s efforts, including GCFLearnFree.org, Microsoft Digital Literacy, JobScout, and DigitalLearn.org. Interested in basic computer literacy classes in Spanish? Need resources for entry-level learners? Interested in getting help to develop your own courses? Join this session to learn from experts and share your ideas and experiences with other public library staff. You will also learn how PLA’s DigitalLearn.org can be personalized for your library to help reduce one-on-one time with learners; provide access to a range of quality training and the ability for libraries to create their own; report out data on who and what you are teaching; and allow community members to benefit from your library’s digital literacy training whether they are in the branch or at home.

DPLA Board Member presenting: Jamie Hollier, Co-CEO, Anneal/Commerce Kitchen

In the Library, With the Lead Pipe: Social Media at The College of New Jersey Library

planet code4lib - Wed, 2017-01-18 14:00

In Brief: This article discusses how and why The College of New Jersey (TCNJ) Library decided not to develop library-specific social media channels and why other academic institutions should consider a similar approach. For many years, most literature on social media was how-to based (Jacobson, 2011); as a result, when academic librarians discuss engaging students, they start with a presumptive answer of “social media” and then work backwards to the how and why. In contrast, TCNJ’s Library addressed the question: “Should the Library actively develop its social media presence?” We first looked at the social media channels already supported by the college, and considered whether our audience was already being reached by other well-maintained sources. Second, we looked at what students were already posting on social media about TCNJ’s Library, both publicly and anonymously. Finally, we surveyed the students to better understand their social media usage, concerns, and desired method of interaction with the Library. Considering all of these factors, the Library decided not to develop or maintain its own social media channels.

By Amanda Cowell

Introduction

This paper discusses how and why The College of New Jersey (TCNJ) Library decided not to develop library-specific social media channels. For many years, most literature on social media was “how-to based” (Jacobson, 2011); as a result, when academic librarians discuss engaging students, they start with a presumptive answer of “social media” and then work backwards to the how and why. The understanding is that since students are on social media, librarians should be too.

In 2014, TCNJ Library had an abandoned Twitter account and was absent from all other social media outlets. At that time, the Library Steering Committee (LSC) issued a charge to the Library Web Committee that stated: “TCNJ Library does not have an official policy for participating in social media. The LSC has received a suggestion to consider the nature and extent of the Library’s online presence in social media venues.” The Library governance process requires three steps for any charge, to ensure that there are opportunities for formal testimony and open comment from affected individuals. The steps are:

  1. Identifying and reporting the problem. If necessary, the Library Steering Committee (LSC) will prepare a charge identifying the issue for the appropriate committee.
  2. Preparing a preliminary recommendation.
  3. Making a final recommendation.

Committees are expected to be proactive in inviting stakeholder groups to provide testimony at both steps #2 and #3 of the process. This type of feedback is usually gathered through open forums in the Library, but attendance by undergraduates at these forums tends to be very low. Instead, a survey was created to better understand undergraduate preferences and potential levels of engagement.

Literature Review

For the last decade, libraries have investigated how best to make use of social media for everything from marketing and outreach to information literacy and instruction. In a 2006 article published in Library Journal, Beth Evans of Brooklyn College declared that the best way to reach students at her institution was MySpace (Evans, 2006). The same year, Brian Mathews of Georgia Tech declared, “Among college students, Facebook is king” (Mathews, 2006).

In 2010, Andy Burkhardt published guidelines for a library’s social media presence. In the article, he points out that when libraries are planning to use social media, or any technology, they should first ask, “Why are we doing this, and what do we hope to gain from it?” (Burkhardt, 2010). While he offers no specific examples of what a library’s social media goal might be, he discusses its use as a tool for marketing new products or initiatives. He further explains that libraries should devise concrete goals for social media. One potential goal he cites is “after one year we will have 100 fans” (Burkhardt, 2010).

In her 2013 book, The Librarian’s Nitty-Gritty Guide to Social Media, Laura Solomon defined specific goals for social media usage. She recommends setting tangible goals such as “more people at events and programs” and “new knowledge about your patrons and how they view the library” (Solomon, 2013). Solomon also reminds users that to accomplish the goal of getting more people at events through social media, it is not enough to just announce the event as a library would in a newsletter. Effective social media requires a lot of time, effort and planning in order to build connections and community (Solomon, 2013).

Following Burkhardt’s model, Montana State University (MSU) Library created guidelines for building community and developing engagement online with social media. Between April 2012 and August 2013, they were able to nearly double their Twitter followers, growing from 514 to 937. As of November 26th, 2016, their Twitter account has nearly 7,000 followers. While this is a massive increase, the student population at MSU in 2015 was 15,688 and the MSU Library indicates that only about 28% of its followers are students (Young & Rossmann, 2015).

Social Media at The College of New Jersey Library

In order to better understand the Library’s social media needs, the first step was to examine the social media policy of the College as a whole. The College publishes guidelines for creating a social media presence. TCNJ’s Office of Communications, Marketing, and Brand Management asks that the following questions be answered before starting any new social media channel on campus.

  1. What do you hope to achieve?
  2. Who is your audience?
  3. What channels would reach them most effectively?
  4. Do you have the resources and commitment to run these channels well?
  5. Are other related departments already doing something similar?
  6. Do you need multiple channels? Would fewer, stronger channels be better?

The Library Web Committee first considered the question: “Who is our audience?”  To find the answer, the committee looked at TCNJ Library’s mission statement: “The College of New Jersey Library, in support of the College’s mission, provides high-quality information resources, expertise and a learning environment that enhances the search for knowledge and understanding. The Library serves as an intellectual, cultural and social center for the College, empowering TCNJ community members to become self-directed, lifelong learners and responsible citizens” (http://library.tcnj.edu). Based on the mission statement, the Library’s audience is the entire TCNJ community.

The committee next looked at the question: “Are other related departments already doing something similar?” The committee found that the College was, and is, aggressively pursuing an active social media presence. For example, TCNJ created a page for use by faculty, staff, students and alumni called TCNJ Today. TCNJ Today is designed to share campus news, and pulls data from all of TCNJ’s social media accounts. TCNJ also has many official social media channels designed to reach the entire campus, including a Facebook page, Twitter account, YouTube channel, Instagram account, etc. All of these channels are already designed to reach the Library’s audience and the Office of Communications encourages individual departments to share their information on these already established channels. Many TCNJ schools and departments also have their own social media channels, targeting specific groups.

The next question the Library Web Committee evaluated was whether the Library had the resources to create and maintain social media channels of its own. Maintaining social media channels requires a large, ongoing investment of time and effort. Social media managers are frequently reminded of the Pareto principle, otherwise known as the 80/20 rule (Matei & Bruno, 2015). This rule states that only 20% of social media content should be about the brand – in this case the Library – and the other 80% should be other content that is interesting to and shareable by users. The type of posting required to keep a social media site relevant and interesting requires a significant amount of personnel hours and expertise, particularly in finding non-promotional material. As the Library had no additional funding to hire new personnel, social media content would need to be generated by existing librarians and staff. This would require shifting responsibilities and would take librarians and library staff away from other ongoing and proposed projects.

Around the same time, other departments on campus began encouraging the Library to develop an institutional repository and digital archive. While these projects were not related to social media, limited resources and manpower made it impossible to pursue both efforts simultaneously, since the two projects would likely require the same librarians and staff to be successful. There was a great need to preserve and archive student and faculty research, as material was already being lost. Additionally, the demand for digital materials related to college history was growing as the need for fundraising on campus increased.

Finally, the Library looked at the most important question: “What do we hope to achieve?” This came last because it was the least clear. TCNJ’s Library currently does not do any programming of its own throughout the year. While there are some events held inside the Library, the Library does not organize or sponsor them financially. Any promotion the Library did was to advertise services and resources and to develop relationships with the TCNJ community.

The Committee also recognized that the Library markets itself and supports its community through building and maintaining strong relationships. The Library prioritizes face-to-face interactions to build relationships with students and faculty. An information literacy course is taught every semester by a librarian and is required of all incoming students. Subject liaisons teach dozens of library sessions throughout the semester. The liaisons also work individually with faculty and staff members to market our services by building relationships. In 2015 alone, librarians answered 5,600 reference and informational questions. Librarians also serve on college-wide committees and planning councils, ensuring that no matter what is being discussed, the Library is always part of the conversation.

Research Methods

Faculty and staff opinions were gathered during open forums in the Library as part of the governance process. Open forums are held during open campus meeting times and allow interested faculty, staff and students to weigh in on proposed policy changes. Anonymous questions are gathered through a web form before the forum to frame the discussion. Participants are encouraged to give feedback at the forums or to provide it anonymously through the web form afterwards. This data is compiled and used during the decision-making process; however, due to Institutional Review Board (IRB) constraints, those comments cannot be shared publicly.

Student attendance at the open forums was predictably low, so the Library decided to conduct a survey to assess specific undergraduate social media behavior. Eighty-six undergraduate students were surveyed at TCNJ in the spring 2016 semester. Researchers obtained permission from professors to distribute surveys in randomly selected courses during regularly scheduled meeting times. Students were then given the option to spend the first five minutes of class either completing the survey or working on other course work. All survey participation was optional, and the college’s IRB reviewed and approved the survey. The survey used the following definition: “Social media refers to any commercial product through which users create online communities to share information, ideas, personal messages, and other content. Examples include, but are not limited to, Facebook, Instagram, Yik Yak, Snapchat and Twitter.”

The survey addressed the following research questions:

  1. For what purposes do students use social media applications?
  2. What are the students’ privacy concerns when using social media both personally and professionally?
  3. What are the students’ expectations for social media interactions?
  4. Is social media the best way to reach our audience?

Findings

The survey was designed to determine why TCNJ students use social media. Results showed that, while 97% of students utilized social media for personal use, the numbers were lower for professional, academic and college-related activities. 78% of students surveyed indicated that they used social media for academic course work, 70% used social media for professional purposes and 60% indicated that they used social media to interact with non-academic areas of the college such as housing or dining services.

In addition to assessing students’ social media habits, the survey also asked about their privacy concerns and how often they post publicly or anonymously on various topics. A Likert scale ranging from “very comfortable” to “not comfortable at all” was used to assess how willing students were to make public posts on various topics.

[Survey result images currently unavailable]

In order to understand what type of communication the students expected to receive, we used a Likert scale asking students to rate how likely they were to expect a response to different types of feedback. 46.5% of students surveyed agreed or strongly agreed with the statement: “When I post a comment or complaint about a customer service experience to their social media channels, I hope to get a response.” Fewer, only approximately 37%, agreed or strongly agreed with statements asking if they expected to receive a comment when posting to their own page, to a third party, or in an anonymous forum.

While approximately 78% of students said that they use social media for academic course work, only about 50% were comfortable or very comfortable having professional or academic discussions on social media. Furthermore, students expressed less comfort using social media for activities related to course work, grades and teacher reviews. Therefore, while it was apparent that students were using these channels regularly, in view of the expressed discomfort with social media for academic and professional purposes, the Library determined that student needs could best be met through face-to-face interactions and other channels.

What Patrons Were Already Doing

If the Library was not going to create its own social media channels, the question then became: how could existing channels best be used to reach the Library’s users? To answer this, it was important to identify where TCNJ community members were online and, more importantly, where they wanted to find the Library online.

In order to find where TCNJ faculty, staff and students were and had been online, the committee did targeted searches of various social media accounts, looking for posts mentioning both the words TCNJ and library. All of the posts shown in this paper were found in 2016, anywhere between one and four years after they were posted. While too much time had passed to effectively respond to these individual posts, they were evaluated to better understand how the Library could respond to these types of postings in the future.

The posts reviewed were gathered from a variety of sources, including student-run and personal Twitter and Facebook accounts. While some of the posts addressed real or serious problems in the Library, others did not. It also was not always clear if the user expected a response. These posts could generally be categorized as related to facilities issues, noise and other student behavior complaints, and directional questions. Some examples of the posts are shown below.

[Example social media posts currently unavailable]

Currently, the Library provides several ways for students to report problems or request information. There is a Send-A-Suggestion form located on the Library website that allows users to send in comments or questions about the library. There is also a chat box, email form and SMS number on the Library website as part of our Ask-A-Librarian service. All of these methods allow the user to either comment anonymously or include their name. Issues raised on social media are already handled through these other channels. Students are also encouraged to come to the reference, circulation or IT help desk to ask questions and to report problems.

Conclusions

After evaluating the student survey, faculty feedback and existing social media interactions, the Library Web Committee concluded that there was not a sufficient need to justify creating Library-specific social media channels. Like most libraries, TCNJ Library has limited time and resources to accomplish its goals while meeting the needs of all of its patrons.

While the Library strives to keep up with ever-changing technology, decisions need to be made that best meet the needs of the majority of the TCNJ community. For now, developing library-specific social media channels has been put aside so the Library’s efforts can stay focused on moving forward in other directions. Projects that are in greater demand on campus, such as the Library’s institutional repository and digital archive, need to take priority.

Despite the decision not to create library-specific social media channels, social media is a platform that cannot be ignored. The Library Web Committee determined that the Library can reach a wider audience through already established channels, without spending time and effort to develop and maintain Library-specific channels. The Committee continues to develop better relationships with TCNJ campus social media coordinators to ensure that news items get shared on the well-maintained channels that already exist. When trying to reach the campus community, librarians and library staff are encouraged to send newsworthy information to TCNJ’s Office of Communications, Marketing, and Brand Management. This information will then be shared through social media channels already created and maintained by the College. In order to do this, the Library created a social media policy to encourage and to set guidelines for these types of communications. The policy is currently under review by Library administration.

Acknowledgements

Thank you to my reviewers Bethany Messersmith and Leah White and publishing editor Sofia Leung for their time and assistance with this project. Also, thank you to The College of New Jersey Library Web Committee for believing in me when I said it would be okay to not do something, even if it seemed as though everyone else was doing it.

References
  1. Burkhardt, A. (2010). “Social media: A guide for college and university libraries.” College & Research Libraries News, 71(1), 10-24.
  2. Constine, Josh (2016). “Yik Yak’s CTO drops out as the hyped anonymous app stagnates.” TechCrunch. Retrieved from https://techcrunch.com/2016/04/06/yik-yuck/
  3. Digital Trends Staff (2016). “The history of social networking.” Digital Trends. Retrieved from http://www.digitaltrends.com/features/the-history-of-social-networking/
  4. Evans, B. (2006). “Your space or MySpace?” Library Journal, 37, 8-13.
  5. Jacobson, T. B. (2011). “Facebook as a library tool: Perceived vs. actual use.” College & Research Libraries, 72(1), 79-90.
  6. Matei, S. A., & Bruno, R. J. (2015). “Pareto’s 80/20 law and social differentiation: A social entropy perspective.” Public Relations Review, 41(2), 178-186.
  7. Mathews, B. S. (2006). “Do you Facebook? Networking with students online.” College & Research Libraries News, 67(5), 306-307.
  8. Solomon, Laura. (2013). The Librarian’s Nitty-Gritty Guide to Social Media. Chicago, IL: ALA Editions.
  9. Young, S. W., & Rossmann, D. (2015). “Building library community through social media.” Information Technology and Libraries, 34(1), 20-37.

Open Knowledge Foundation: 10 Critical Insights for Democratising the Data Revolution

planet code4lib - Wed, 2017-01-18 13:00

This week marks the launch of the first-ever UN World Data Forum, aimed at bringing together data experts and sustainable development leaders. Danny Lämmerhirt shares findings from a new research series on citizen-generated data, how it can be used to monitor and drive change for sustainable development, and why this matters for civil society.

Image credit: Aotora (CC BY)

With the advent of the 2030 Agenda for Sustainable Development and evaluation of progress around the Sustainable Development Goals (SDGs), actions around sustainability have increasingly centred around data collection, monitoring, and key indicators. The United Nations called for a data revolution – tapping into the vast trove of existing and emerging data sources – in order to prevent the marginalised and most vulnerable from being hidden behind national average numbers. 

This is a major step forward in promoting concerted efforts around sustainability on an international stage. It acknowledges the role of information in changing the way we live. But it leaves open the question of how nation-wide monitoring can be translated into local action. How can the data revolution drive progress around sustainability? Will it foreground the issues that matter to the most vulnerable and marginalised?

The role of data for sustainable development

Data is not merely a camera looking onto the world. By determining what is measured and how, data writes a certain story – and leaves out many others that could be written. Citizens and civil society increasingly recognise the value that data holds in tackling the issues affecting our lives – whether they are collecting evidence on oil spills in the Gulf of Mexico, running surveys to understand the satisfaction of local communities with health facilities, or challenging existing statistics about politicised topics.

“Citizens and civil society increasingly recognise the value that data holds to tackle the issues affecting our lives…”

These projects prove the need for what Jonathan Gray calls a democratised data revolution – enabling citizens to ‘read’ and understand governance issues, providing them with evidence to engage with politics, or sparking their imagination to design and implement a solution to a problem.

The research series on Citizen-Generated Data can be found on the DataShift website.

This blogpost seeks to broaden our imagination of the role of data for sustainable development and provoke thinking on how to democratise the data revolution. Open Knowledge International teamed up with the DataShift to understand how citizens and civil society can create their own data to foreground the problems that matter most to them, and to directly monitor, demand or drive change on issues affecting them.

The series discusses three topics: the first research piece sheds light on the incentives to produce citizen-generated data; the second dives into the question of how citizens generate data to inform decision-making and drive sustainability; and the third examines if and how citizen data can be linked to the Sustainable Development Goals.

What follows is a list of ten provocations for a sustainability agenda that reflects the needs of civil society inspired by our research.

10 Critical Insights for Democratising the Data Revolution

1. Data needs to resonate with human problems, perceptions, and knowledge to drive sustainability

In order to progress sustainability, support decision-making, and trigger action, the problems facing different stakeholders need to be well understood. Stakeholders have different priorities, values, or responsibilities, and are affected differently by an issue. Some actors may lack the literacy, knowledge, time, or interest to engage with complicated data.

Civic initiatives are most successful if they understand these nuances, and translate their data into digestible, easily understandable, and relevant messages. We observed that citizen-generated data transports the issue into other people’s minds by using a common framing, a narrative, or a story that resonates with other people’s priorities. Some case studies showed that the SDGs can be a useful common framing for collaboration between citizens, civil society, government, and the private sector –  enabling buy-in from decision-makers, funding, or other support for the cause of a civic project.

2. We must be more sensitive to figuring out which types of information are most useful for different types of decision-making

Of paramount importance are questions around what type of information is most useful and for whom. National government bodies may be responsible for allocating money to regions for water-point construction. Responsibility for their maintenance may reside with local districts. While the national government needs comparative data across regions to allocate infrastructure investments, local districts need hyperlocal water-point information.

The main purpose of the SDGs is to advance progress on sustainable development, which first and foremost requires action. However, the main focus lies on how to monitor actions on a national scale. A democratised data revolution would be more sensitive towards the data needed to enhance action at different geographic scales – but particularly on a local scale, in the realm of the everyday, where sustainable actions eventually have to be carried out. It would start with the question of which collaborations and governance arrangements are required to tackle which kinds of problems, and what data is needed to do so.

3. A democratised data revolution understands the vast array of actions needed to drive sustainability.

Citizen-generated data can inform diverse types of human decision-making that go beyond monitoring. Besides agenda setting and the flagging of problems, citizen-generated data can inspire citizens to design their own solutions. It can also give citizens the literacy to ‘read’ and understand governance issues and thereby provide confidence to engage with politics. Sometimes data can be used to directly implement a solution to a problem.

Citizen-generated data can directly steer behaviour and enable better actions by giving stakeholders relevant information to enable actions. It can also help taking decisions, or rewarding certain actions as performance indicators do. The value of citizen-generated data is fairly broad and depends largely on the issue it is used for and the individuals, groups, organisations, and networks using it.

4. National Statistics Offices are important for national monitoring – but actual action towards sustainable development is borne on the shoulders of strong collaborations between governments, civil society, and others.

Given the holistic nature of sustainable development, achieving the SDGs requires concerted efforts. Projects working with citizen-generated data are exemplary for cross-sectoral collaborations. They often bring together actors from government, the private sector, and civil society, all of which have very different interests in the same data. Different actors can value different aspects of the data; understanding how actors perceive this value is key to building multi-stakeholder partnerships.

The right degree of participation is essential to manage collaborations: Should citizens or policy-makers be engaged in the definition of data? How does this affect the credibility of data and buy-in? Who should be engaged in the dissemination of findings? Does the project benefit from collaborating with a ‘knowledge broker’ like an experienced advocacy group, a university, or a newspaper?

5. A democratised data revolution has a user-centric vision of data quality.

The SDGs argue that data needs to be accurate, reliable, disaggregated, and timely to be usable for SDG monitoring. Citizen-generated data is often dismissed as lacking representativeness and accuracy, or as not meeting other criteria of ‘good quality data’. This is only partly true: in practice, data is of ‘good quality’ if it is fit for purpose. If data is to drive action on the ground, it needs to match the action at hand. Long-term monitoring needs reliable, accurate, and standardised data.

Setting the agenda for a formerly unknown issue may require a citizen-generated data project to build trust and to ensure credibility. Some projects might need to produce highly disaggregated data, while other tasks only require rough indications of trends. Successful citizen-generated data projects embrace these nuances instead of dismissing such data outright. This does not mean that methodological rigour is irrelevant for citizen-generated data. The opposite is the case. Data should be thoughtfully designed in order to address specific tasks and to respond to more ‘human criteria’ of data quality, like issues of trust. What matters is that citizens collect data in a systematic way that demonstrates how the data was collected and processed in the first place.

6. A democratised data revolution embraces the value of ‘soft data’.

Different types of data have different uses. The term ‘data’ itself seems to suggest a very narrow notion of numbers, figures, and statistics. Actors involved in policy-making seem to prefer ‘hard’ evidence (e.g. quantitative data from researchers and government agencies) over ‘soft’ evidence (e.g. narrative texts, personal perceptions, or autobiographical material). The soft evidence is often neglected in favour of numbers, which become the main argumentative device. Debates around the data revolution or sustainable development data should not gloss over the fact that narrative texts, individual perceptions, interviews, images or video footage all count as ‘data’ – which might be best understood broadly as a building block of human knowledge, decision-making, and action.

In observed case studies, we found that soft data residing in written reports sparked investigations, guided civil society to spot the facts in official government documents and flag issues. In other cases, personal perceptions gave contextual information on why high-level policies succeeded or failed. A fixation on numbers is likely to hamper the quality of policy-making. Soft evidence, such as personal qualitative stories (including from marginalised groups), should, therefore, be more readily considered in policy decisions.

7. Passive monitoring, analysing, and visualising will not help to tackle sustainable development – targeted engagement strategies are needed.

Targeted engagement strategies do not end with publishing reports or visualising data online. Instead, the engagement methods need to be suitable for individual stakeholders and often involve public hearings, educational meetings with local decision-makers, on-site visits with decision makers, hackathons, or others. The engagement strategies should fit with the desired change, be it to change policies, perceptions, or individual behaviour.

8. Citizen-generated data provides contextual information around an SDG indicator and can prevent silo thinking

Given that a fair number of citizen-generated data projects are grounded in sub-national contexts, they can provide a baseline to understand (the absence of) progress around the SDGs. For instance, citizen-generated data projects working on disaster risk reduction may conduct hazard risk mapping, indicating local vulnerabilities to environmental disaster. The maps can be a baseline used to understand the outcomes of natural disasters. In other cases, citizens can collect data that is relevant across SDGs. In this way, citizen-generated data can contribute to preventing silo-thinking. For instance, data on land acquisition may be usable to understand gender-disaggregated land ownership, as well as the amount of arable land.

9. A democratised data revolution needs trust and credibility if it is to leverage the voices of the marginalised

Emerging data sources and practices call into question the monopoly of established data producers and routines. Big data, small data, citizen science, and social media are all examples of a reconfiguration of how data becomes trustworthy information. Citizen-generated data can be leveraged to build trust with different communities, but a lack of official recognition or credibility can hamper uptake. What is needed is a culture of openness among governments, high-level decision-makers, and others towards emerging data sources that are not administered by established data producers.

10. The politics of data are crucial – a democratised data revolution acknowledges that some data does not represent sterile facts, but matters of concern.

The very process of creating data is born out of priorities over what to measure and how. The same applies to citizen-generated data, which is intended to be a direct reflection of citizens’ issues. Sometimes citizens might want to highlight the magnitude of a problem and scale their data production across local regions. In order to scale citizen-generated data projects, collective data standards can be developed to render citizen-generated data comparable – sometimes at the expense of evening out local differences in the data. A democratised data revolution would be more attuned to the political processes behind standardisation and would embrace the fact that sustainable development will not solely be built on one-size-fits-all solutions.

You can find all three reports on Citizen-Generated Data on the DataShift website.

HangingTogether: Can we measure and demonstrate the value of sharing collections?

planet code4lib - Wed, 2017-01-18 08:30

“Value of Resource Sharing” discussion word cloud

In the spring of 2016, Brandeis University’s interim university librarian Matthew Sheehy and I engaged in some informal back-and-forth about the kinds of questions that can be answered about collection-sharing using data. We share an overlapping interest in finding ways in which data can be used to demonstrate or even measure the impact that library resource sharing operations have on the ability of a parent organization to fulfill its mission.

Our correspondence progressed to a point where we wanted to include more voices in the discussion. The idea was to get a number of stakeholders with various perspectives in a room to discuss the value of resource sharing. The 2016 American Library Association Annual Conference in Orlando, Florida, provided an ideal opportunity. Matthew and I took turns inviting colleagues whom we a) knew would be in Orlando, and b) thought would bring something essential to the conversation. We then settled on this starter set of four questions to kick off the meeting:

1) What is the value of resource sharing?
2) To whom?
3) Can it be measured?
4) How do we expect it will change?

The group gathered for 90 minutes at the Rosen Hotel in Orlando on Friday, June 24, 2016, which gave us just enough time to eat lunch, introduce ourselves, and say a bit about our top-of-mind issues. In spite of our limited amount of time together, we managed to produce what I think is a fairly detailed snapshot of the environment in which we work and laid the groundwork for a second discussion that will take place this coming Saturday at ALA Midwinter in Atlanta.

I thought our first pass at describing that environment might be worth sharing more broadly.

Our comments and questions can be sorted into seven main categories, or themes (which are listed below, along with what might arguably be considered the group’s top takeaway for each category).

We also noted some essential facts about metrics, came up with an inventory of ways that collection sharing is valued by three different sets of constituents, and made a few predictions about where the current roil of institutional, technological and economic forces might take us in the near future.

First, our seven categories of comments and questions, and our top takeaway for each:

1. Seeking to understand the current collection-sharing environment.

Top takeaway: We noted that ease of access tends to determine whether research materials get used.

2. Seeking to improve the current collection-sharing environment.

Top takeaway: Several participants noted that we don’t always know exactly what it is that our patrons value, and that it may actually vary in different situations; sometimes it’s speed, while at other times a longer use period or the ability to renew may weigh more heavily on a patron’s wish list. A better understanding of patron desires, expectations, and overall information-gathering behaviors would definitely be a big step toward improving our current sharing environment.

3. Calling some previously-shared values and assumptions into question.

Top takeaway: Once, and for the longest time, lending more than you borrow was a source of pride, but now some net lenders are beginning to think about how finite their resources have become and to wonder if their accustomed role in the sharing community is still sustainable (or desirable) in today’s environment.

4. Seeking to understand the effect of the changing environment on collection sharing.

Top takeaway: These days there are so many consortia to join and opportunities for collaborating that it is becoming essential for collection-sharing managers to carefully assess the value of each current and potential partnership and to limit themselves to participating only in those that are essential.

5. Seeking to understand the effect of the changing environment on the library overall (and, in some cases, the parent organization).

Top takeaway: We realize that, as many libraries manage down redundant print collections, fewer manifestations will have to serve as both preservation and service copies; we’ve seen that this concept works reasonably well with journals but wonder, as we move into shared management of print monographs, if it is a sustainable model for other formats.

6. Seeking to understand the effect of the changing environment on patron behaviors.

Top takeaway: We wondered what effects years of an “access over ownership” library collection-building strategy have had on research outputs at various levels and in the various disciplines.

7. Seeking to understand the effect of the changing environment on content and service providers.

Top takeaway: The increased popularity of libraries purchasing articles on demand for their patrons will lead to changes in the business models of publishers.

Next, we identified six different essential facts about resource sharing value metrics, including specific measurements that can and probably should be made, plus key ideas about why these metrics matter, and to whom.

Top takeaway regarding metrics: We know that we won’t be effective in improving the institution if we are using the wrong metrics to measure success.

Since “value” was the name of the game, we identified ten specific ways that collection sharing is valued, from three different points of view:

1. The Patron
2. ILL Staff
3. The Institution

Top takeaway regarding value: Value to the patron ultimately determines the value to the institution.

Finally, without specifically setting out to make any grand predictions, during the course of our meal four different predictions emerged – all of them having to do with resource sharing’s growing impact on collection management activities, and how that impact will affect ILL units, acquisitions, and content providers.

Top takeaway regarding our predictions: We expect resource sharing to change the course of library operations by becoming the steering mechanism for collection development and acquisitions, through purchase-on-demand for articles and patron-driven-acquisitions for loans, and also by creating effective delivery mechanisms to empower collective collection development across consortial relationships.

A fifth and final prediction came out of this initial session: that we would definitely seek to meet again in the not-so-distant future. When we gather once more this weekend in Atlanta, we’ll focus on trying to nail down ways in which we can actually measure and demonstrate the value of collection sharing to its various constituents. None of what I’ve reported above can be considered big news, but we hope that it will serve as solid ground upon which to stand for the next phase of our conversation.

Again, a detailed account of who said what that day can be found here.

About Dennis Massie

Dennis is a program officer for OCLC Research, concentrating on studies and activities involving the sharing of collections.


District Dispatch: 7 sessions + 1 party = Washington Office at Midwinter 2017

planet code4lib - Wed, 2017-01-18 00:15

What do Google, Inc., the Harvard School of Education, Georgia Department of Public Health and the Lakeview (Okl.) Elementary School have in common?

If you guessed “the Washington Office lineup at Midwinter 2017,” you were right! Our sessions cover topics ranging from computer science education to professional development to family engagement in libraries. For your convenience, we’ve compiled a list of events below. Note: if you plan to participate in one of the CS sessions on Friday, please pre-register to ensure there will be enough laptops – and Google goodies! – for everyone.

Join us for a final conference farewell to Washington Office Executive Director Emily Sheketoff, who retires in May

Finally, please join the ALA Washington Office in celebrating Emily Sheketoff’s 17 years of lobbying chutzpah with and for libraries in a reception on Sunday, January 22, from 11 a.m. to 12:30 p.m. at the Omni Hotel at CNN Center (International Ballroom A-C). We’ll provide light fare and beverages, and invite you to share well wishes at Emily’s last ALA conference as she retires as executive director of the Washington Office. At 11:45 a.m., people will have the opportunity to share their recollections about Emily and libraries. Please RSVP to Ros Bradley (rbradley@alawash.org) as soon as possible.

 

WASHINGTON OFFICE SESSIONS AT ALA MIDWINTER MEETING 2017

Libraries Ready To Code: Google’s CS First program & other free resources
Friday, 1:00-2:20 and 2:40-4:00 p.m. (offered twice); pre-registration recommended

  • Tina Ornduff, Program Manager, Engineering Education, Google
  • Hai Hong, Program Manager, Google K-12 Education Outreach, Google Education, Google Inc.
  • Marijke Visser, ALA OITP

Improving Federal and State Policy to Support Family Engagement in Libraries (joint program with PLA)
Saturday, 10:30-11:30 a.m.

  • Scott Allen, Deputy Director, American Library Association, Public Library Association
  • Larra Clark, Deputy Director, Public Library Association/ALA Office for Information Technology Policy
  • Margaret Caspe Klein, PhD, Senior Research Analyst, Harvard Family Research Project, Harvard Graduate School of Education
  • Johanna Pringle, Title V Senior Manager, Georgia Department of Public Health

Are libraries ready to code?
Saturday, 3:00-4:00 p.m.

  • Linda Braun, Consultant
  • Kelsey Gourd, Teacher Librarian, Lakeview Elementary School
  • Marijke Visser, Associate Director Office for Information Technology Policy

‘I wish someone had taught me to ___’: An interactive session for children and youth librarians
Sunday and Monday, 10:30 a.m. to noon (offered twice)

  • Dr. Mega Subramaniam, Associate Professor, College of Information Studies, University of Maryland
  • Dr. Tammy Clegg, Assistant Professor, College of Education, University of Maryland.
  • Amanda Waugh, Doctoral Candidate, College of Information Studies, University of Maryland

Google Fiber libraries: Infrastructure and digital inclusion
Sunday 1:00-2:30 p.m.

  • Fabiola Charles Stokes (moderator): Google Fiber Community Impact Manager
  • Frank Blair: Director of Technology and Operations, Charlotte Mecklenburg Library
  • Emma Hernandez: Digital Inclusion Fellow, San Antonio Public Library
  • Shauna Edson: Digital Inclusion Fellow, Salt Lake City Public Library
  • Daynise Joseph: Community Impact Manager, Nashville Public Library

The post 7 sessions + 1 party = Washington Office at Midwinter 2017 appeared first on District Dispatch.

OCLC Dev Network: Change to Requesting Sandbox WSKeys

planet code4lib - Tue, 2017-01-17 22:30

How Sandbox WSKeys are requested has changed. Please see the FAQ.

Brown University Library Digital Technologies Projects: Storing Embargo Data in Fedora

planet code4lib - Tue, 2017-01-17 20:53

We have been storing dissertations in the BDR for a while. Students have the option to embargo their dissertations, and in that case we set the access rights so that the dissertation documents are only accessible to the Brown community (although the metadata is still accessible to everyone). The problem is that embargoes can be extended upon request, so we really needed to store the embargo extension information.

We wanted to use a common, widely-used vocabulary for describing the embargoes, instead of using our own terms. We investigated some options, including talking with Hydra developers on Slack and emailing the PCDM community. Eventually, we opened a PCDM issue to address the question of embargoes in PCDM. As part of the discussion and work from that issue, we created a shared document that lists many vocabularies that describe rights, access rights, embargoes, … In the end, the consensus in the PCDM community was to recommend the PSO and FaBiO ontologies (part of the SPAR Ontologies suite), and a wiki page was created with this information.

At Brown, we’re using the “Slightly more complex” option on that wiki page. It looks like this:

<pcdm:Object> pso:withStatus pso:embargoed .

<pcdm:Object> fabio:hasEmbargoDate "2018-11-27T00:00:01Z"^^xsd:dateTime .

In our repository, we’re not on Fedora 4 or PCDM, so we just put statements like these in the RELS-EXT datastream of our Fedora 3 instance. It looks like this:

<rdf:RDF xmlns:fabio="http://purl.org/spar/fabio/#" xmlns:pso="http://purl.org/spar/pso/#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<rdf:Description rdf:about="info:fedora/test:230789">
<pso:withStatus rdf:resource="http://purl.org/spar/pso/#embargoed"></pso:withStatus>
<fabio:hasEmbargoDate>2018-11-27T00:00:01Z</fabio:hasEmbargoDate>
<fabio:hasEmbargoDate>2020-11-27T00:00:01Z</fabio:hasEmbargoDate>
</rdf:Description>
</rdf:RDF>
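
As an illustration (ours, not from the original post), here is a minimal sketch of how such a RELS-EXT fragment could be generated with the Python rdflib library; the test PID and the namespace URIs simply mirror the example above, and the choice of rdflib is an assumption, not Brown's documented tooling:

from rdflib import Graph, Namespace, URIRef, Literal
from rdflib.namespace import XSD

# Namespace URIs mirror the RELS-EXT example above (including the trailing '#').
PSO = Namespace("http://purl.org/spar/pso/#")
FABIO = Namespace("http://purl.org/spar/fabio/#")

g = Graph()
g.bind("pso", PSO)
g.bind("fabio", FABIO)

obj = URIRef("info:fedora/test:230789")  # hypothetical test PID from the example above
g.add((obj, PSO.withStatus, PSO.embargoed))
g.add((obj, FABIO.hasEmbargoDate, Literal("2018-11-27T00:00:01Z", datatype=XSD.dateTime)))
g.add((obj, FABIO.hasEmbargoDate, Literal("2020-11-27T00:00:01Z", datatype=XSD.dateTime)))

# RDF/XML output suitable for a Fedora 3 RELS-EXT datastream (rdflib 6+ returns a str).
print(g.serialize(format="xml"))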

In the future, we may want to track various statuses for an item (e.g., a dataset) over its lifetime. In that case, we may move toward more complex PSO metadata that describes the various states the item has been in.

District Dispatch: Developing librarian resources to enhance patrons’ digital literacy

planet code4lib - Tue, 2017-01-17 19:43

Guest post by Jessica Vitak

Applying for jobs, social services, or food stamps. Obtaining health care. Filing taxes. Each of these tasks requires digital skills to transmit sensitive and private information about one’s finances, health, and location in a safe and secure manner through the Web. However, many low-SES individuals face compounding problems: they must use the Web or other communication technologies to get access to important resources, but they often lack both direct access to the technologies and the requisite knowledge and skills to successfully navigate them.

Digital literacy is an increasingly critical skill in modern society to ensure that sensitive personal information submitted through online channels is not compromised. That said, many Americans—and especially those in economically disadvantaged groups—lack the proper knowledge or training to safely and securely navigate the Internet.

This is where librarians enter. Public librarians serve a critical role in local communities. When it comes to engaging with technology, they act as information intermediaries, assisting patrons in exchanging and disseminating personal information, translating technical information, and making information easier to use. Librarians wear many hats in their jobs; they must manage a wide range of information sources and develop a wide range of privacy and security skills to most effectively serve patrons. This makes it essential to identify the most important challenges they face when providing services to their communities and to develop resources that counter those challenges.

Drs. Jessica Vitak and Mega Subramaniam, faculty in the College of Information Studies at the University of Maryland, recently received an IMLS National Leadership Grant to address this research problem. During the next three years, they will be working with key stakeholders, including public librarians and economically disadvantaged and immigrant families—as well as their children—to identify the areas of greatest need around digital literacy and privacy/security, to develop sustainable resources for each group, and to implement these resources in library settings. The project strives to build skills and abilities in the library workforce and provide ongoing and evolving resources that adapt to stakeholders’ needs over time.

Interested in contributing to this project?
During the ALA Midwinter Conference in Atlanta, Drs. Vitak and Subramaniam are conducting focus groups with public librarians to discuss the most salient challenges they face around digital literacy when serving their constituents. If you’ll be at the conference, they would love to speak with you about your experiences. They have two times available on Sunday, January 22: 8:30-10:00am and 1:00-2:30pm, both at the Georgia World Congress Center. Snacks will be provided.

To participate, sign-up is required. If you are interested, please submit your information via this Google Form.

More information can be found on the project’s website, Safe Data | Safe Families. Questions can be directed to Dr. Vitak at jvitak@umd.edu.

Jessica Vitak (@jvitak) is an Assistant Professor at the University of Maryland’s College of Information Studies, studying different populations’ knowledge of, attitudes toward, and enactment of privacy strategies in online spaces. She is the Director of the Center for the Advanced Study of Communities and Information (CASCI).

Mega Subramaniam (@mmsubram) is an Associate Professor at the University of Maryland's College of Information Studies, studying young adults' use of libraries for their development of digital literacies and information practices.

The post Developing librarian resources to enhance patrons’ digital literacy appeared first on District Dispatch.

ZBW German National Library of Economics: Economists in Wikidata: Opportunities of Authority Linking

planet code4lib - Tue, 2017-01-17 15:53

Wikidata is a large database that connects all of the roughly 300 Wikipedia projects. Besides interlinking the Wikipedia pages in different languages about a specific item – e.g., a person – it also connects to more than 1,000 different sources of authority information.

The linking is achieved by an „authority control“ class of Wikidata properties. The values of these properties are identifiers which unambiguously identify the Wikidata item in external, web-accessible databases. Each property definition includes a URI pattern (called „formatter URL“). When the identifier value is inserted into the URI pattern, the resulting URI can be used to look up the authority entry. The resulting URI may point to a Linked Data resource – as is the case with the GND ID property. This, on the one hand, provides a light-weight and robust mechanism for creating links in the web of data. On the other hand, these links can be exploited by every application driven by one of the authorities to provide additional data: links to Wikipedia pages in multiple languages, images, life dates, nationality and affiliations of the persons in question, and much more.

Wikidata item for the Indian Economist Bina Agarwal, visualized via the SQID browser

In 2014, a group of students under the guidance of Jakob Voß published a handbook on "Normdaten in Wikidata" (in German), describing the structures and the practical editing capabilities of the standard Wikidata user interface. The experiment described here focuses on persons from the subject domain of economics. It uses the authority identifiers of the roughly 450,000 economists referenced by their GND ID as creators, contributors or subjects of books, articles and working papers in ZBW's economics search portal EconBiz. These GND IDs were obtained from a prototype of the upcoming EconBiz Research Dataset (EBDS). For 40,000 of these persons, or 8.7 %, a Wikidata item is connected via the GND ID. If we consider the frequent (more than 30 publications) and the very frequent (more than 150 publications) authors in EconBiz, the coverage increases significantly:

Economics-related persons in EconBiz
(Datasets: EBDS as of 2016-11-18; Wikidata as of 2016-11-07 (query, result))

Number of publications   total     in Wikidata   percentage
> 0                      457,244   39,778         8.7 %
> 30                      18,008    3,232        17.9 %
> 150                      1,225      547        44.7 %

These are the numbers "out of the box" – ready-made opportunities to link out from existing metadata in EconBiz and to enrich user interfaces with biographical data from Wikidata/Wikipedia, without any additional effort to improve the coverage on either the EconBiz or the Wikidata side. However, we can safely assume that many of the EconBiz authors – particularly the high-frequency authors – and even more of the persons who are the subjects of publications are "notable" according to the Wikidata notability guidelines. Their items probably already exist and are just missing the GND ID property.

To check this assumption, we take a closer look at the Wikidata persons which have the occupation "economist" (most Wikidata properties accept other Wikidata items – instead of arbitrary strings – as values, which allows for exact queries and is indispensable in a multilingual environment). Of these approximately 20,000 persons, less than 30 % have a GND ID property! Even if we restrict that to the 4,800 "internationally recognized economists" (which we define here as those having Wikipedia pages in three or more different languages), almost half of them lack a GND ID property. When we compare that with the coverage by VIAF IDs, more than 50 % of all Wikidata economists and 80 % of the internationally recognized ones are linked to VIAF (SPARQL Lab live query). Therefore, for a great many of the persons we have looked at here, we can take it for granted that the person exists in Wikidata as well as in the GND, and that the only reason for the lack of a GND ID is that nobody has added it to Wikidata yet.
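For readers who want to reproduce this kind of count themselves, a rough sketch against the public Wikidata SPARQL endpoint might look like the following. This is not one of the SPARQL Lab queries used for this analysis; the identifiers assumed here are P106 (occupation), Q188094 (economist) and P227 (GND ID), and they should be double-checked before reuse (see also the remarks on the public endpoint's limits at the end of this post).

# Rough sketch (not one of the article's SPARQL Lab queries): how many Wikidata
# persons have the occupation "economist", and how many of them carry a GND ID.
# Assumed Wikidata identifiers: P106 = occupation, Q188094 = economist, P227 = GND ID.
from SPARQLWrapper import SPARQLWrapper, JSON

QUERY = """
SELECT (COUNT(DISTINCT ?person)  AS ?economists)
       (COUNT(DISTINCT ?withGnd) AS ?withGndId)
WHERE {
  ?person wdt:P106 wd:Q188094 .             # occupation: economist
  OPTIONAL { ?person wdt:P227 ?gndId .
             BIND(?person AS ?withGnd) }
}
"""

sparql = SPARQLWrapper("https://query.wikidata.org/sparql",
                       agent="authority-linking-example/0.1")
sparql.setQuery(QUERY)
sparql.setReturnFormat(JSON)
row = sparql.query().convert()["results"]["bindings"][0]
print({key: value["value"] for key, value in row.items()})

Substituting P214 (VIAF ID) for P227 yields the VIAF comparison mentioned above.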

As an aside: the information about the occupation of persons is to be taken as a very rough approximation. Some Wikidata persons were economists by education or at some point in their careers, but are famous now for other reasons (examples include Vladimir Putin or the president of Liberia, Ellen Johnson Sirleaf). On the other hand, EconBiz authors known to Wikidata are often classified not as economists, but as university teachers, politicians, historians or sociologists. Nevertheless, their work was deemed relevant for the broad field of economics, and the conclusions drawn about the "economists" in Wikidata and GND will hold for them, too: there are lots of opportunities for linking already well-defined items.

What can we gain?

The screenshot above demonstrates that not only data about the person herself but also her affiliations, awards received, and many other details can be obtained. The "Identifiers" box on the bottom right shows authority entries. Besides the GND ID, which served as an entry point for us, there are links to VIAF and other national libraries' authority files, but also to non-library identifier systems like ISNI and ORCID. In total, Wikidata comprises more than 14 million authority links, more than 5 million of these for persons.

When we take a closer look at the 40,000 EconBiz persons which we can look up by their GND ID in Wikidata, an astonishing variety of authorities is addressed from there: 343 different authorities are linked from the subset, ranging from "almost complete" (VIAF, Library of Congress Name Authority File) to – in the given context – quite exotic authorities of, e.g., Members of the Belgian Senate, chess players or Swedish Olympic Committee athletes. Some of these entries link to carefully crafted biographies, sometimes behind a paywall (Notable Names Database, Oxford Dictionary of National Biography, Munzinger Archiv, Sächsische Biographie, Dizionario Biografico degli Italiani), or to free text resources (Project Gutenberg authors). Links to the world of museums and archives are also provided, from the Getty Union List of Artist Names to specific links into the British Museum or the Musée d'Orsay collections.

Particular use can be made of properties which express the prominence of the persons involved: persons with a Nobel Prize ID, for example, should definitely be linked to a GND ID as well (and indeed, they are). But TED speakers or persons with an entry in the Munzinger Archive (a famous and long-established German biographical service) can also be assumed to have GND IDs. That opens a road to a very focused improvement of data quality: a list of persons with those properties, restricted to the subject field (e.g., occupation "economist"), can easily be generated from Wikidata's SPARQL Query Service. In Wikidata, it is very easy to add the missing ID entries discovered during such cross-checks interactively. And if it turns out that a "very important" person from the field is missing from the GND altogether, that is an all-the-more valuable opportunity to improve the data quality at the source.

How can we start improving?

As a proof of concept, and as a practical starting point, we have developed a micro-application for adding missing authority property values. It consists of two SPARQL Lab scripts: missing_property creates a list of Wikidata persons which have a certain authority property (by default: TED speaker ID) but lack another one (by default: GND ID). For each entry in the list, a link to a second application is created, which looks up the name in the corresponding authority file (by default: search_person, a broad yet ranked full-text search of person names in the GND). If we can identify the person in the GND result list, we can copy the GND ID, return to the first list, click the link to the person's Wikidata item and add the property value manually through Wikidata's standard edit interface. (Wikidata is open and welcomes such contributions!) The change takes effect within a few seconds – when we reload the missing_property list, the improved item should no longer show up.
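The micro-application itself consists of the two SPARQL Lab scripts linked above; purely to illustrate the underlying idea, a candidate list of the missing_property kind could also be pulled from the public endpoint along the following lines. Note that this is not the actual missing_property script: it uses the VIAF ID (P214) as the "provider" property instead of the TED speaker ID, and the DNB portal search URL is only a placeholder for whatever GND lookup interface you prefer.

# Illustration of the missing_property idea only (not the actual SPARQL Lab script).
# Lists economists that have a VIAF ID (P214) but no GND ID (P227), together with a
# placeholder GND person-search link for manual follow-up.
from urllib.parse import quote
from SPARQLWrapper import SPARQLWrapper, JSON

QUERY = """
SELECT ?person ?personLabel ?viaf WHERE {
  ?person wdt:P106 wd:Q188094 ;                 # occupation: economist
          wdt:P214 ?viaf .                      # provider property: VIAF ID
  FILTER NOT EXISTS { ?person wdt:P227 ?gnd }   # missing property: GND ID
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en,de". }
}
LIMIT 25
"""

sparql = SPARQLWrapper("https://query.wikidata.org/sparql",
                       agent="authority-linking-example/0.1")
sparql.setQuery(QUERY)
sparql.setReturnFormat(JSON)

for row in sparql.query().convert()["results"]["bindings"]:
    name = row["personLabel"]["value"]
    # Placeholder lookup URL: point this at the GND search interface you actually use.
    lookup = "https://portal.dnb.de/opac.htm?query=" + quote(name)
    print(row["person"]["value"], name, row["viaf"]["value"], lookup, sep="\t")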

The approach also works the other way around, starting not from the most prominent economics-related persons in Wikidata but from those in EconBiz: while most of the GND-identified persons are related to only one or two works, as the statistics show, a few are related to a disproportionate number of publications. Of the 1,200 persons related to more than 150 publications, fewer than 700 are missing links to Wikidata via their GND ID. By adding this property (for the vast majority of these persons, a Wikidata item should already exist), we could enrich, at a rough estimate, more than 100,000 person links in EconBiz publications. Another micro-application demonstrates how the work could be organized: the list of EconBiz persons by descending publication count provides "SEARCH in Wikidata" links (functional on a custom endpoint). Each link triggers a query which looks up all name variants in the GND and searches for these names in a full-text-indexed Wikidata set, bringing up a ranked list of suggestions (example with the GND ID of John H. Dunning). Again, the GND ID can be added – manually but straightforwardly – to an identified Wikidata item.

While we cannot expect to significantly reduce the quantitative gap between the 450,000 persons in EconBiz and the 40,000 of them linked to Wikidata by such manual efforts, we surely can improve step by step for the most prominent persons. This empowers applications to show biographical background links to Wikipedia where our users most probably expect them. Other tools for creating authority links and more automated approaches will be covered in further blog posts. And the great thing about Wikidata is that all efforts add up – while we are making modest improvements in our field of interest, many others are doing the same, so Wikidata already features an impressive overall amount of authority links.

PS. All queries used in this analysis are published on GitHub. The public Wikidata endpoint cannot be used for research involving large datasets due to its limitations (in particular the 30-second timeout, the preclusion of the "service" clause for federated queries, and the lack of full-text search). Therefore, we’ve loaded the Wikidata dataset (along with others) into custom Apache Fuseki endpoints on a performant machine. Even there, a „power query“ like the one on the number of all authority links in Wikidata takes about 7 minutes. We therefore publish the result files in the GitHub repository alongside the queries.


DPLA: Arielle Perry Joins DPLA as Program Assistant

planet code4lib - Tue, 2017-01-17 15:45

The Digital Public Library of America is pleased to welcome Arielle Perry as our new Program Assistant.

In this role, Perry will work closely with Executive Director Dan Cohen and DPLA staff to manage the efficient administration of various DPLA projects and initiatives, including DPLAfest. Perry will also serve as liaison to DPLA’s Board of Directors; facilitate communication between DPLA’s staff, partners, community, and the public; and provide administrative assistance to support the executive director, as well as other programs including the Community Reps and team meetings, trainings, and events held throughout the year.

“We are very excited to welcome Arielle to DPLA,” said Executive Director Dan Cohen. “Arielle brings the skills and experience to play an essential role in the efficient administration of our organization as we embark on exciting projects in 2017.”

Perry has a diverse background in libraries, administration, and special events. Prior to DPLA, Arielle worked with the Friends of the University of Wisconsin—Madison Libraries as the Administrative Program Specialist to create special events and support collaborative educational programs. Before working with the Friends, Arielle held a number of positions with local and university libraries and worked on special event planning and implementation for community groups and educational organizations.  Arielle holds a master’s degree in Library and Information Studies from the University of Wisconsin-Madison, and a BA in History from Virginia Wesleyan College.

Welcome, Arielle!

LibUX: When Native Apps are Unethical

planet code4lib - Tue, 2017-01-17 13:10

This episode of Metric is very much a companion to last week’s post on consciously infusing the ethics of the organization into design decisions.

There is an opportunity for institutions that are positioned — either actively or by reputation — as intellectual and moral community cores (libraries) to exert greater if not just more obvious influence on the filters through which patrons access content.
(from Critical Librarianship in the Design of Libraries)

It’s been an uphill battle for accessibility to wrest a seat at the design table even though it’s core to the user experience. It’s now time to pull up a chair for ethical considerations.

It’s been almost three years since I amended the UX Honeycomb with “ethical”.

Notes

The ethics of good design: a principle for the connected age by Aaron Weyenberg

Good design is ethical. The product places the user’s interest at the center of its purpose. Any effort to influence the user’s agency or behavior is in the spirit of their own positive wellbeing, and the wellbeing of those around them.
(Aaron Weyenberg)

Listen and subscribe

If you like, you can download the MP3 or subscribe to LibUX on Stitcher, iTunes, YouTube, Soundcloud, Google Music, or just plug our feed straight into your podcatcher of choice.

Open Knowledge Foundation: Structuring a Global Online Survey – A Question Engine for Open Data Surveys!

planet code4lib - Tue, 2017-01-17 10:00

The Global Open Data Index (GODI) is one of our core projects at Open Knowledge International. The index measures and benchmarks the openness of government data around the world. Brook Elgie shares a behind-the-scenes look at the technical design of how we gather the data for the Index through our extensive Open Data Survey and how other organisations can use this survey codebase for their own purposes.

The Global Open Data Index Survey is an annual survey of the state of government open data around the world. The survey asks a series of questions about the availability and quality of a set of key datasets. As well as providing a valuable snapshot of the state of open data around the world, it also promotes discussion and engagement between government and civil society organisations.

This year Open Knowledge International made changes to the methodology and structure of the survey, and it was an ideal opportunity to revisit the way questions are handled technically within the survey codebase. As well as the survey for the Global Open Data Index, the same codebase hosts surveys for ‘local’ sites, for example, an individual country, or city administration.

Previously, the questions presented for each dataset were a hard-coded feature of the survey codebase. These questions were inflexible and couldn’t be tailored to the specific needs of an individual site. So, while each local site could customise the datasets they were interested in surveying, they had to use our pre-defined question set and scoring mechanisms.

We also wanted to go beyond simple ‘yes/no’ question types. Our new methodology required a more nuanced approach and a greater variety of question types: multiple-choice, free text entry, Likert scales, etc.

Also important is the entry form itself. The survey can be complex but we wanted the process of completing it to be clear and as simple as possible. We wanted to improve the design and experience to guide people through the form and provide in-context help for each question.

Question Sets

The previous survey hard-coded the layout order of questions and their behaviour as part of the entry form. We wanted to abstract out these details from the codebase into the CMS, to make the entry form more flexible. So we needed a data structure to describe not just the questions, but their order within the entry form and their relationships with other questions, such as dependencies. So we came up with a schema, written in JSON. Take this simple set of yes/no questions:

  1. Do you like apples?
  2. Do you like RED apples? (initially disabled, enable if 1 is ‘Yes’)
  3. Have you eaten a red apple today? (initially disabled, enable if 2 is ‘Yes’)

We want to initially display questions 1, 2, and 3, but questions 2 and 3 should be disabled by default. They are enabled once certain conditions are met. Here is what the form looks like:

And this is the Question Set Schema that describes the relationships between the questions, and their position in the form:
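The schema itself is shown as an embedded snippet in the original post and does not survive feed aggregation; as a stand-in, here is a minimal sketch of its general shape, reconstructed from the description that follows. Apart from the like_apples and apple_colour identifiers mentioned below, the field names are illustrative rather than copied from the real survey schema.

[
  {
    "id": "like_apples",
    "question": "Do you like apples?",
    "type": "yes_no",
    "position": 1,
    "defaultProperties": { "visible": true, "enabled": true }
  },
  {
    "id": "apple_colour",
    "question": "Do you like RED apples?",
    "type": "yes_no",
    "position": 2,
    "defaultProperties": { "visible": true, "enabled": false },
    "ifProvider": {
      "providerId": "like_apples",
      "value": "Yes",
      "properties": { "enabled": true }
    }
  },
  {
    "id": "red_apple_today",
    "question": "Have you eaten a red apple today?",
    "type": "yes_no",
    "position": 3,
    "defaultProperties": { "visible": true, "enabled": false },
    "ifProvider": {
      "providerId": "apple_colour",
      "value": "Yes",
      "properties": { "enabled": true }
    }
  }
]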

Each question has a set of default properties, and optionally an ifProvider structure that defines conditional dependent features. Each time a change is made in the form, each question’s ifProvider should be checked to see if its properties need to be updated.

For example, question 2, apple_colour, is initially visible, but disabled. It has a dependency on the like_apples question (the ‘provider’). If the value of like_apples is Yes, apple_colour’s properties will be updated to make it enabled.
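To make the conditional behaviour concrete, the resolution step can be sketched as follows. This is a toy illustration in Python (the survey itself implements the equivalent logic in JavaScript/React), and the field names follow the illustrative schema sketch above rather than the real codebase.

# Toy sketch of ifProvider resolution as described above; the real survey implements
# this in JavaScript/React, and field names follow the illustrative schema sketch.
def resolve_properties(question, answers):
    """Return the effective display properties for one question, given the
    current form answers keyed by question id."""
    props = dict(question.get("defaultProperties", {}))
    dependency = question.get("ifProvider")
    if dependency and answers.get(dependency["providerId"]) == dependency["value"]:
        props.update(dependency["properties"])  # condition met: apply the overrides
    return props

# Once question 1 is answered "Yes", question 2 becomes enabled.
apple_colour = {
    "id": "apple_colour",
    "defaultProperties": {"visible": True, "enabled": False},
    "ifProvider": {"providerId": "like_apples", "value": "Yes",
                   "properties": {"enabled": True}},
}
print(resolve_properties(apple_colour, {"like_apples": "Yes"}))
# prints: {'visible': True, 'enabled': True}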

React to the rescue

The form is becoming a fairly complex little web application, and we needed a front-end framework to help manage the interactions on the page. Quite early on we decided to use React, a ‘Javascript library for building user interfaces’ from Facebook.

React allows us to design simple components and compose them into a more complex UI. React encourages a one-way data flow; from a single source of truth, passed down into child components via properties. Following this principle helped identify the appropriate location in the component hierarchy for maintaining state; in the top level QuestionForm component.

The component hierarchy for the entry form:

  1. QuestionForm
  2. QuestionField
  3. Sub-components: QuestionInstructions, QuestionHeader, and QuestionComments

Changing values in the QuestionFields will update the state maintained in the QuestionForm, triggering a re-render of child components where necessary (all managed by React). This made it easy for one QuestionField to change its visible properties (visibility, enabled, etc) when the user changes the value of another field (as determined by our Question Set Schema).

You can see the code for the entry form React UI on Github.

Some other benefits of using React:

  • it was fairly easy to write automated tests for the entry form, using Enzyme
  • we can render the initial state of the form on the server and send it to the page template using our web application framework (Express)
Developing in the Open

As with all of Open Knowledge International’s projects, the Open Data Survey is developed in the Open and available as Open Source software: Open Data Survey on Github.

Terry Reese: MarcEdit’s new Logging Features

planet code4lib - Tue, 2017-01-17 06:29

Over the years, I’ve periodically gotten requests for a much more robust logger in MarcEdit.  Currently, when the tool performs a global change, it reports the number of changes made to the user.  However, a handful of folks have been wanting much more.  Ideally, they’d like to have a log of every change the application makes, which is hard because the program isn’t built that way.  I provided the following explanation to the MarcEdit list last week.

**************************************

The question that has come up a number of times since posting notes about the logger concerns granularity. There has been a desire to have the tool provide additional information (about the records) and more context around each change, and some have wondered whether this will lead to a preview mode. Other folks wondered why this process has taken so long to develop. Well, it stems from decisions I make around the development. MarcEdit’s application structure can be summed up by the picture below:

In developing MarcEdit, I have made a number of very deliberate decisions, and one of those is that no one component knows what the other one does.  As you can see in this picture, the application parts of MarcEdit don’t actually talk directly to the system components.  They are referenced through a messenger, which handles all interactions between the application and the system objects.  However, the same is true of communication between the system objects themselves.  The editing library, for example, knows nothing about MARC, validation, etc. – it only knows how to parse MarcEdit’s internal file format.  Likewise, the MARC library doesn’t know anything about validation, MARC21, or linked data.  Those parts live elsewhere.  The benefit of this approach is that I can develop each component independent of the other, and avoid breaking changes because all communication runs through the messenger.  This gives me a lot of flexibility and helps to enforce MarcEdit’s agnostic view of library data.  It’s also how I’ve been able to start including support for linked data components – as far as the tool is concerned, it’s just another format to be messaged.

Of course, the challenge with an approach like this then, is that most of MarcEdit’s functions don’t have a concept of a record.  Most functions, for reasons of performance, process data much like an XML sax processor.  Fields for edit raise events to denote areas of processing, as do errors, which then push the application into a rescue mode.  While this approach allows the tool to process data very quickly, and essentially remove size restrictions for data processing – it introduces issues if, for example, I want to expose a log of the underlying changes.  Logs exist – I use them in my debugging, but they exist on a component level, and they are not attached to any particular process.  I use messaging identifiers to determine what data I want to evaluate – but these logs are not meant to record a processing history, but rather, record component actions.  They can be muddled, but they give me exactly what I need when problems arise.  The challenge with developing logging for actual users, is that they would likely want actions associated with records.  So, to do that, I’ve added an event handler in the messaging layer.  This handles all interaction with the logging subsystem and essentially tracks the internal messaging identifier and assembles data.  This means that the logger still doesn’t have a good concept of what a record is, but the messenger does, and can act as a translator.

Anyway – this is how I’ll be providing logging. It will also let me slowly expand the logging beyond the core editing functions if there is interest. It is also how I’ll be able to build services around the log file – to provide parsing and log enhancement for users that want to add record-specific information to a log file, beyond the simple record number identifier that will be used to track changes. This would make log files more permanent (if, for example, the log was enhanced with a local identifier), but due to the way MarcEdit is designed, and the general lack of a standard control number across all MARC formats (in merging, for example, merging on the 001 trips checks of 9 other fields that all could store associated control data), it is my belief that providing ways to enhance the log file after a run, while an extra step, will allow me the most flexibility to potentially make greater use of the processing log in the future. It also enables me to continue to keep MARCisms out of the processing library – and focus only on handling data edits.

**********************************************
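To make the messenger-and-events pattern a bit more tangible, here is a toy sketch of a messenger that routes edit events while a logging handler assembles record-level entries. This is purely illustrative Python, not MarcEdit's actual code (MarcEdit is a .NET application), and all of the names are invented.

# Toy illustration of the messenger/event pattern described above; this is not
# MarcEdit's actual code (MarcEdit is a .NET application) and all names are invented.
from collections import defaultdict

class Messenger:
    """Routes messages between components and exposes hooks for observers."""
    def __init__(self):
        self.handlers = defaultdict(list)

    def subscribe(self, event, handler):
        self.handlers[event].append(handler)

    def send(self, event, **payload):
        for handler in self.handlers[event]:
            handler(**payload)

class ChangeLogger:
    """Observes messages and assembles per-record log entries."""
    def __init__(self, messenger):
        self.entries = []
        messenger.subscribe("field_edited", self.on_field_edited)

    def on_field_edited(self, record_id, field, before, after):
        self.entries.append(f"record {record_id}: {field} '{before}' -> '{after}'")

messenger = Messenger()
logger = ChangeLogger(messenger)
# An editing component only reports what it did; it knows nothing about logging.
messenger.send("field_edited", record_id=42, field="245$a",
               before="Old title", after="New title")
print("\n".join(logger.entries))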

So that’s pretty much the work in a nutshell. So what do you get? Well, once you turn it on, you get lots of stuff and a few new tools. Let’s walk through them.

Turning on Logging:

Since Logging only captures changes made within the MarcEditor, you find the logging settings in the MarcEditor Preferences Tab:

Once enabled, the tool will generate a new session log in the Log folder each time the Editor starts a new session. With the logs comes log management. From within the MarcEditor or the Main window, you will find the following:

Main Window:

From the MarcEditor, you’ll find in Reports:

Both areas provide the same functionality, but the MarcEditor Reports entry is scoped to the current session logfile and the record file currently loaded into the Editor (if one is loaded). To manage old sessions, use the entry on the Main Window.

Advanced Log Management

Two of the use cases that were laid out for me were the need to enhance logs and the ability to extract only the modified records from a large file. So, I’ve included an Advanced Management tool for just these kinds of queries:

This is an example run from within the MarcEditor. 

 

Anyway – this is a quick write-up.  I’ll be recording a couple sessions tomorrow.  I’ll also be working to make a new plugin available.

Best,

–tr

Terry Reese: MarcEdit update

planet code4lib - Tue, 2017-01-17 06:16

I’ve posted a new update for all versions of MarcEdit, and it’s a large one.  It might not look like it from the outside, but it represents close to 3 1/2 months of work.  The big change is related to the inclusion of a more detailed change log.  Users can turn on logging and see, at a low level, the actual changes made to specific data elements.  I’ve also added some additional logging enhancement features to allow users to extract just changed records, or enhance the log files with additional data.  For more information, see my next post on the new logging process.

The full change log:

6.2.447
* Enhancement: Z39.50: Syncing changes made to support Z39.50 servers that are sending records missing proper encoding guidelines.  I’m seeing a lot of these from Voyager…I fixed this in one context in the last update.  This should correct it everywhere.
* Enhancement: MARCEngine: 008 processing was included when processing MARCXML records in MARC21 to update the 008, particularly the element to note when a record has been truncated.  This is causing problems when working with holdings records in MARCXML – so I’ve added code to further distinguish when this byte change is needed.
* Enhancement: MarcEdit About Page: Added copy markers to simplify capturing of the Build and Version numbers.
* Enhancement: Build New Field: The tool will only create one field per record (regardless of existing field numbers) unless the replace existing value option is selected.
* Enhancement: Swap Field Function: new option to limit swap operations if not all defined subfields are present.
* Bug Fix: MARCValidator: Potential duplicates were being flagged when records had blank fields (or empty fields) in the elements being checked. 
* Update: MarcEditor: UI responsiveness updates
* New Feature: Logging.  Logging has been added to the MarcEditor and supports all global functions currently available via the task manager. 
* New Feature:  MarcEditor – Log Manager: View and delete log files.
* New Feature:  MarcEditor – Log Manager: Advanced Toolset. Ability to enhance logs (add additional marc data) or use the logs to extract just changed records.

You can download the update directly from the website at: http://marcedit.reeset.net/downloads or you can use the automated downloader in the program.

One last note: on the downloads page, I’ve added a directory listing that provides access to the most recent previous 6.2 builds. I’m doing this partly because some of these changes are so significant that there may be behavior changes that crop up. If something comes up that is preventing your work, uninstall the application, pull the previous version from the archive, and then let me know what isn’t working.

Best,

–tr

Eric Lease Morgan: tei2json: Summarizing the structure of Early English poetry and prose

planet code4lib - Mon, 2017-01-16 21:39

This posting describes a hack of mine, tei2json.pl – a Perl program to summarize the structure of Early English poetry and prose. [0]

In collaboration with Northwestern University and Washington University, the University of Notre Dame is working on a project whose primary purpose is to correct (“annotate”) the Early English corpus created by the Text Creation Partnership (TCP). My role in the project is to do interesting things with the corpus once it has been corrected. One of those things is the creation of metadata files denoting the structure of each item in the corpus.

Some of my work is really an effort to reverse engineer good work done by the late Sebastian Rahtz. For example, Mr. Rahtz cached a version of the TCP corpus, transformed each item into a number of different formats, and put the whole thing on GitHub. [1] As a part of this project, he created metadata files enumerating what TEI elements were in each file and what attributes were associated with each element. The result was an HTML display allowing the reader to quickly see how many bibliographies an item may have, what languages may be present, how long the document was measured in page breaks, etc. One of my goals is/was to do something very similar.

The workings of the script are really very simple: 1) configure and denote what elements to count & tabulate, 2) loop through each configuration, 3) keep a running total of the result, 4) convert the result to JSON (a specific data format), and 5) save the result to a file. Here are (temporary) links to a few examples:

JSON files are not really very useful in & of themselves; they are designed to be transport mechanisms allowing other applications to read and process them. This is exactly what I did. In fact, I created two different applications: 1) json2table.pl and 2) json2tsv.pl. [2, 3] The former script takes a JSON file and creates an HTML file whose appearance is very similar to Rahtz’s. Using the JSON files (above), the following HTML files have been created through the use of json2table.pl:

The second script (json2tsv.pl) allows the reader to compare & contrast structural elements between items. Json2tsv.pl reads many JSON files and outputs a matrix of values. This matrix is a delimited file suitable for analysis in spreadsheets, database applications, statistical analysis tools (such as R or SPSS), or programming language libraries (such as Python’s numpy or Perl’s PDL). In its present configuration, json2tsv.pl outputs a matrix looking like this:

id      bibl  figure  l     lg   note  p    q
A00002  3     4       4118  490  8     18   3
A00011  3     0       2     0    47    68   6
A00089  0     0       0     0    0     65   0
A00214  0     0       0     0    151   131  0
A00289  0     0       0     0    41    286  0
A00293  0     1       189   38   0     2    0
A00395  2     0       0     0    0     160  2
A00749  0     4       120   18   0     0    2
A00926  0     0       124   12   0     31   7
A00959  0     0       2633  9    0     4    0
A00966  0     0       2656  0    0     17   0
A00967  0     0       2450  0    0     3    0

Given such a file, the reader could then ask & answer questions such as:

  • Which item has the greatest number of figures?
  • What is average number of lines per line group?
  • Is there a statistical correlation between paragraphs and quotes?
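As a quick, hypothetical illustration of how such questions might be answered, assume the matrix has been saved as a tab-delimited file named counts.tsv with the header row shown above; a few lines of Python with pandas then suffice:

# Hypothetical sketch: answer the three questions above from the json2tsv.pl output,
# assumed to be saved as a tab-delimited file named counts.tsv.
import pandas as pd

df = pd.read_csv("counts.tsv", sep="\t", index_col="id")

# 1. Which item has the greatest number of figures?
print("most figures:", df["figure"].idxmax(), "with", df["figure"].max())

# 2. Average number of lines (l) per line group (lg), over items that have line groups.
has_lg = df[df["lg"] > 0]
print("mean lines per line group:", round((has_lg["l"] / has_lg["lg"]).mean(), 1))

# 3. Is there a statistical correlation between paragraphs (p) and quotes (q)?
print("Pearson r between p and q:", round(df["p"].corr(df["q"]), 3))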

Additional examples of input & output files are temporarily available online. [4]

My next steps include at least a couple of things. One, I need/want to evaluate whether or not to save my counts & tabulations in a database before (or after) creating the JSON files. The data may prove to be useful there. Two, as a librarian, I want to go beyond qualitative description of narrative texts, and the counting & tabulating of structural elements moves in that direction, but it does not really address the “aboutness”, “meaning”, nor “allusions” found in a corpus. Sure, librarians have applied controlled vocabularies and bits of genre to metadata descriptions, but such things are not quantitative and consequently elude statistical analysis. For example, using sentiment analysis one could measure and calculate the “lovingness”, “war mongering”, “artisticness”, or “philosophic nature” of the texts. One could count & tabulate the number of times family-related terms are used, assign the result a score, and record the score. One could then amass all documents and sort them by how much they discussed family, love, philosophy, etc. Such is on my mind, and more than half-way baked. Wish me luck.

Links

Eric Lease Morgan: Synonymizer: Using Wordnet to create a synonym file for Solr

planet code4lib - Mon, 2017-01-16 18:03

This posting describes a little hack of mine, Synonymizer — a Python-based CGI script to create synonym files suitable for use with Solr and other applications. [0]

Human language is ambiguous, and computers are rather stupid. Consequently computers often need to be explicitly told what to do (and how to do it). Solr is a good example. I might tell Solr to find all documents about dogs, and it will dutifully go off and look for things containing d-o-g-s. Solr might think it is smart by looking for d-o-g as well, but such is a heuristic, not necessarily a real understanding of the problem at hand. I might say, “Find all documents about dogs”, but I might really mean, “What is a dog, and can you give me some examples?” In which case, it might be better for Solr to search for documents containing d-o-g, h-o-u-n-d, w-o-l-f, c-a-n-i-n-e, etc.

This is where Solr synonym files come in handy. There are one or two flavors of Solr synonym files, and the one created by my Synonymizer is a simple line-delimited list of concepts, where each line is a comma-separated list of words or phrases. For example, the following is a simple Solr synonym file denoting four concepts (beauty, honor, love, and truth):

beauty, appearance, attractiveness, beaut
honor, abide by, accept, celebrate, celebrity
love, adoration, adore, agape, agape love, amorousness
truth, accuracy, actuality, exactitude

Creating a Solr synonym file is not really difficult, but it can be tedious, and the human brain is not always very good at multiplying ideas. This is where computers come in. Computers do tedium very well. And with the help of a thesaurus (like WordNet), multiplying ideas is easier.

Here is how Synonymizer works. First it reads a configured database of previously generated synonyms.† In the beginning, this file is empty but must be readable and writable by the HTTP server. Second, Synonymizer reads the database and offers the reader three options: 1) create a new set of synonyms, 2) edit an existing synonym, or 3) generate a synonym file. If Option #1 is chosen, then input is garnered and looked up in WordNet. The script will then enable the reader to disambiguate the input through the selection of apropos definitions. Upon selection, both WordNet hyponyms and hypernyms will be returned. The reader then has the opportunity to select desired words/phrases as well as enter any of their own design. The result is saved to the database. The process is similar if the reader chooses Option #2. If Option #3 is chosen, then the database is read, reformatted, and output to the screen as a stream of text to be used in Solr or anything else that may require similar functionality. Because Option #3 is generated with a single URL, it is possible to programmatically incorporate the synonyms into your Solr indexing pipeline.
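For readers who want to see the basic mechanics, below is a small, stand-alone sketch using NLTK's WordNet interface. It is not Synonymizer's actual code, and for brevity it simply takes the first sense instead of asking the reader to disambiguate, but it shows how a term can be expanded into one line of a Solr synonym file.

# Stand-alone sketch (not Synonymizer's actual code): expand one term into a
# comma-separated Solr synonym line via WordNet, using NLTK.
# Requires: pip install nltk, then nltk.download('wordnet') once.
from nltk.corpus import wordnet as wn

def synonym_line(term, sense=0):
    """Build a Solr synonym line from one WordNet sense of `term`."""
    synsets = wn.synsets(term)
    if not synsets:
        return term
    chosen = synsets[sense]  # Synonymizer lets the reader pick the sense interactively
    words = set(
        lemma.replace("_", " ")
        for related in [chosen] + chosen.hypernyms() + chosen.hyponyms()
        for lemma in related.lemma_names()
    )
    words.discard(term)
    return ", ".join([term] + sorted(words))

print(synonym_line("love"))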

The Synonymizer is not perfect.‡ First, it only creates one of the two different types of Solr synonym files. Second, while Solr can use the generated synonym file, search results implement phrase searches poorly, and this is a well-known issue. [1] Third, editing existing synonyms does not really take advantage of previously selected items; data-entry is tedious but not as tedious as writing the synonym file by hand. Fourth, the script is not fast, and I blame this on Python and WordNet.

Below are a couple of screenshots from the application. Use and enjoy.

[0] synonymizer.py – http://dh.crc.nd.edu/sandbox/synonymizer/synonymizer.py

[1] “Why is Multi-term synonym mapping so hard in Solr?” – http://bit.ly/2iyYZw6

† The “database” is really simple delimited text file. No database management system required.

‡ Software is never done. If it were, then it would be called “hardware”.

HangingTogether: An increasing role for libraries in research information management

planet code4lib - Mon, 2017-01-16 17:10

It’s no secret that the research ecosystem has been experiencing rapid change in recent years, driven by complex political, technological, and network influences. One component of this complicated environment is the adoption of research information management (RIM) practices by research institutions, and particularly the increasing involvement of libraries in this development.

Research information management is the aggregation, curation, and utilization of information about research. Research universities, research funders, as well as individual researchers are increasingly looking for aggregated, interconnected research information to better understand the relationships, outputs, and impact of research efforts as well as to increase research visibility.

Efforts to collect and manage research information are not new but have traditionally emphasized the oversight and administration of federal grants. Professional research administrative oversight within universities emerged in the 20th century, rapidly accelerating in the United States following Sputnik and exemplified through the establishment of professional organizations concerned primarily with grants administration & compliance, such as the Society for Research Administrators (SRA) in 1957 and the National Council of University Research Administrators (NCURA) in 1959.

Today research information management efforts seek to aggregate and connect a growing diversity of research outputs that encompasses more than grants administration and, significantly for libraries, includes the collection of publications information. In addition, both universities and funding agencies have an interest in reliably connecting grants with resulting publications–as well as to researchers and their institutional affiliations.

Not long ago the process for collecting the scholarly publications produced by a campus’s researchers would have been a manual one, possible only through the collection of each scholar’s curriculum vitae. The resources required to collect this information at institutional scale would have been prohibitively expensive, and few institutions made such an effort. Institutions have instead relied upon proxies of research productivity–such as numbers of PhDs awarded or total dollars received in federal research grants–to demonstrate their research strengths. However, recent advances in scholarly communications technology and networked information offer new opportunities for institutions to collect the scholarly outputs of their researchers. Indexes of journal publications like Scopus, PubMed, and Web of Science provide new sources for the discovery and collection of research outputs, particularly for scientific disciplines, and a variety of open source, commercial, and locally-developed platforms now support institutional aggregation of publications metadata. The adoption of globally accepted persistent identifiers (PIDs) like DOIs for digital publications and datasets and ORCID and ISNI identifiers for researchers provides essential resources for reliably disambiguating unique objects and people, and the incorporation of these identifiers into scholarly communications workflows provides growing opportunities for improved metadata quality and interoperability.

Institutions may now aggregate research information from numerous internal and external sources, including information such as:

• Individual researchers and their institutional affiliations
• Publications metadata
• Grants
• Patents
• Awards & honors received by a researcher
• Citation counts and other measures of research impact

Depending upon institutional needs, the RIM system may also capture additional internal information about faculty, such as:
• Courses taught
• Students advised
• Committee service

National programs to collect and measure the impact of sponsored research have accelerated the adoption of research information management in some parts of the world, for example through the Research Excellence Framework (REF) in the UK and Excellence in Research for Australia (ERA). The effort to collect, quantify, and report on a broad diversity of research outputs has been underway for some time in Europe, where RIM systems are more commonly known as Current Research Information Systems (CRIS), and where efforts like CERIF (the Common European Research Information Format) provide a standard data model for describing and exchanging research entities across institutions.

Here in the US, research information management is emerging as a part of scholarly communications practice in many university libraries, in close collaboration with other campus stakeholders. In the absence of national assessment exercises like the REF or ERA, RIM practices are following a different evolution, one with greater emphasis on reputation management for the institution, frequently through the establishment of public research expertise profile portals such as those in place at Texas A&M University and the University of Illinois. Libraries such as Duke University’s are using RIM systems to support open access efforts, and others are implementing systems that convert a decentralized and antiquated paper-based system of faculty activity reporting and annual review into a centralized process with a single cloud-based platform, as we are seeing at the University of Arizona and Virginia Tech.

I believe that support for research information management will continue to grow as a new service category for libraries, as Lorcan Dempsey articulated in 2014. Through the OCLC Research Library Partnership and in collaboration with partners from EuroCRIS, I am working with a team of enthusiastic librarians and practitioners from three continents to explore, research, and report on the rapidly evolving RIM landscape, building on previous RLP outputs exploring the library’s contribution to university ranking and researcher reputation. 

One working group is dedicated to conducting a survey of research institutions to gauge RIM activity:

• Pablo de Castro, EuroCRIS
• Anna Clements, University of St. Andrews
• Constance Malpas, OCLC Research
• Michele Mennielli, EuroCRIS
• Rachael Samberg, University of California-Berkeley
• Julie Speer, Virginia Tech University

And a second working group is engaged with qualitative inquiry into institutional requirements and activities for RIM adoption:
• Anna Clements, University of St. Andrews
• Carol Feltes, Rockefeller University
• David Groenewegen, Monash University
• Simon Huggard, La Trobe University
• Holly Mercer, University of Tennessee-Knoxville
• Roxanne Missingham, Australian National University
• Malaica Oxnam, University of Arizona
• Annie Rauh, Syracuse University
• John Wright, University of Calgary

Our research efforts are just beginning, and I look forward to sharing more about our findings in the future.

About Rebecca Bryant

Rebecca Bryant is Senior Program Officer at OCLC where she leads research and initiatives related to research information management in research universities.

