Feed aggregator

Open Knowledge Foundation: “Strinkwishism”: How to build organizational capacities around Open Data through embedded fellowships

planet code4lib - Wed, 2017-02-08 10:00

Recognizing that capacity building is central to economic growth, poverty reduction and equal opportunities, Open Knowledge International, with the support of the Open Data for Development (OD4D) Network, is expanding its work with civil society organisations (CSOs) through the Embedded Fellowship Programme.

For the last three months, I have worked as an embedded fellow with Women Environmental Programme (WEP), sharing data skills and working on a data project with the team in their Nigeria office. The timing seems right. Not only is the information revolution upon us, but trends towards democratization, government decentralization and economic liberalization have profoundly reshaped how universities, NGOs and other public-interest organizations do their work, presenting them with new challenges and opportunities.

Getting to Know the Team

But what has it been like working with Women Environmental Programme over the last few months? WEP works mainly on women’s empowerment across the world, and as their name implies, most of their work is focused on advocacy for women. I spent 2 hours highlighting what the needs were for a data project, and for sustainability reasons, I developed a questionnaire to ascertain the level of knowledge and skills of the team I would be working with in the days to come.

In human capacity building, we refer to this as a bottom-up approach. Ninety percent of the staff mentioned that they wanted to learn more about PDF extraction. Simple, right? That’s what you get from questionnaires, but in the real world, while I was conducting the training, they found out that they needed more, and you know what more means in the short time-frame of 6 weeks? Information overload! Here’s a brief look at the topics we covered and the feedback we received.

Exposing the Wealth of Open Tools

While highlighting data projects we could work on, such as creating advocacy products from citizen surveys, I was focused on getting some of the team members to be able to use some of the tools I introduced them to – Timeline.js for creating project or campaign timelines; Infogr.am for creating visualizations; Google Fusion Tables for publishing data; Creative Commons for licensing data; Google Forms and the Kobo Toolbox for surveys; Wordle for analyzing and visualizing qualitative data; and a bunch of other tools that can save them time on their various tasks.

What did I get from working with them on these tools? “Strinkwish”, as some of the staff would say as we engaged in hands-on training on each tool. During one of the sessions I had to ask, and they told me it’s a word coined within the organization, meaning extraordinaire! I hope your organization has its own dictionary too.

With Gai Cliff, the Senior Programmes Officer at WEP

Evaluating the effects of capacity development such as this is not straightforward, and both the short- and longer-term perspectives need to be considered. In the short term though, staff mentioned how this has been helpful in their work:

“I am so excited that I could quickly use Tabula to extract PDF files, and also create a visualization for qualitative analysis,” said Evelyn Ugbe, a Programmes Officer at WEP, as she was working on her new report on women’s advocacy.

With hands-on training like this, one cannot really measure the level of impact until participants are asked what they have learnt, and I was amazed by the response of the staff after having 3 sessions with them. Another comment, from staff member Emmanuel, sheds light on the organisational-level improvements:

“It’s such a  right time to have you. I am head of human resources and using Google Drive, coupled with the Google Fusion Tables had made my task easier, especially that I have been able to create a collaborative way, of getting feedback from staff.”

So in the short term, increases in knowledge and skills can be measured; in the longer term, one can measure whether people are doing things differently as a result of the capacity building, by applying the skills they learned. I will be looking out for this in the coming months.

The three levels of capacity development (Adapted from UNDP, 1997)

Building Infrastructural Capacity is Paramount

From my experience of training and coaching staff and individuals, I have come to realize that the skills and knowledge of trainees might not be enough, especially within an organization: the system development capacity and organizational development capacity (described above) are more important and often unrecognised. As such, I was not surprised when some hands-on sessions became frustrating for the team because of unreliable and slow internet. At one point I needed to bring my own internet router so I could get my rhythm on. And when one of the staff members, Elizabeth, told me her computer just keeps going off because her laptop battery isn’t working and the power supply isn’t stable, I was reminded of how infrastructural efficiency lies at the foundation of successful open data capacity building.

I was also able to identify that, for its kind of work, WEP needs a simple and slick website together with an email hosting service that allows staff to easily exchange information. Gai Cliff, a Senior Monitoring and Evaluation Officer at WEP, asked in one of the sessions about one of the tools I had introduced to them: “So how do we get the paid version, we are limited to some added functionality. Do you always get this when you do hands-on training using open-source tools?”. I got this question several times! I like the fact that we had some talks about these challenges, and how they can be sorted out going forward so we can easily publish some of the advocacy work I will be producing from the data projects.

What’s Next?

“Olu, we will love to continue this in the coming year so you can produce our advocacy materials. These sessions have been helpful for my team, and also that you can play an advisory role on the organizational capacity development,” mentioned Priscilla Achakpa, the Chief Executive Officer at WEP. “Absolutely,” I replied, while thinking of my calendar for the first quarter of the coming year and visualizing the support each of the staff had requested, which I think will help accomplish the long-term goals of this fellowship and ensure that the 1,110 minutes committed to this first phase become sustainable!

LibUX: Could Springshare cultivate a developer community?

planet code4lib - Wed, 2017-02-08 05:11

If we dumb it down a little, we can conceptualize the history of Springshare like this:

  1. The iPhone launches, sparking the wildfire of a point-of-need ethic and making libraries painfully aware that their web presence — whether totally lacking or bottlenecked by city, county, or institutional IT — would fail to respond to increasing demand for services and content. This had been kindling for a few years. “Web 2.0” was a thing; the 23 Things, encouraging organizations to experiment with blogging, videos, and social media, launched just a year or two prior. However, it was the iPhone that first put the internet in our users’ pockets.
  2. Springshare takes advantage of this zeitgeist with their new content management system (LibGuides 1) in several key areas:
    1. it requires little to no technical knowledge or systems support. These live in the cloud, and they just work. This is an important point because at this stage there hadn’t quite been that web services boom in our discipline, which influences things later. More on that below.
    2. LibGuides relieves libraries of overbearing IT departments. Librarians are empowered to make content unencumbered by bureaucracy or code. Here, the return on investment is huge, whether it’s measured by outreach, patrons helped, time saved, expertise inspired – all this compounded by the organizational agency libraries were now capable of exercising on the web.
    3. LibGuides are specifically branded for libraries. Without developers and the like in the decision-making rooms, more powerful alternatives like WordPress or Drupal — the names of which, at a glance, are sort of meaningless — were barely on the radar. I suspect this mattered more than the fact that they both require some technical expertise, especially back then. Libraries jump through hoops to make shitty vendor products work, so I suspect Springshare’s library-specific branding deserves the credit.
  3. Five or six years ago, libraries increasingly reallocated their staffing budgets to create new web services positions and make space for in-house designers and developers. Library schools rebrand themselves, merging with communications or computer science colleges and churning out not the first, but now singularly focused, web librarians. LibGuides hadn’t much changed while the WordPresses of the world were much improved. The job LibGuides is designed to do — painless hosting and content management, largely — is being done better by other services, and now there is growing expertise at decision-making tables to identify this.
  4. So, Springshare identifies a new job and pivots.
  5. LibGuides 2 is released as part of a larger suite of LibApps, which extend the benefits above by providing solutions to library-specific problems like event management, staff scheduling, virtual reference, assessment – all of which are individually available as WordPress plugins or Drupal modules, or even other SaaS, but are still exponentially more attractive as part of an emerging ecosystem of Springshare services. It’s easier to use, content is already there, and it doesn’t require much technical knowledge or systems support to administer. This time, however, it’s not just convenient hosting but tools that are core to modern librarianship.
We are crossing into the Library DevOps Era

The big libraries are now hiring entire developer teams, no longer with the “librarian” pretense tagged-on to job titles, but full-on “software developers” – or, in enterprise speak, “business analysts.” Smaller institutions will follow.

What this means is that it is cheaper and easier than ever to spin up homegrown solutions. With a reasonably sound WordPress developer and a few hundred dollars to spend on premium plugins, libraries can host their own website, with event management, chat, robust statistics, internal staffing, editorial roles, not to mention a baked-in and crazy robust API offering out-of-the-box virtually unlimited extensibility to your homegrown system. Host it with Amazon Web Services for dollars a year. Spare thousands from the budget.

So, what to do then with Springshare?

Developers will seek alternatives when LibApps no longer appears serviceable to their increasingly complex needs. If LibApps were extensible, malleable to change and integration in the same way WordPress can be extended through plugins and APIs, then in these edge cases LibApps would no longer be an obstacle to developers but already part of the solution.

WordPress is the model here. It’s not only that the platform is easily shaped to purpose, but that there is a robust community of developers. Beginning with the WordPress Plugin repository, a curated app store — hello, fellow kids! — allowed for an economy of supply and demand where needed solutions were built publicly under a GPL license. As the repository grew, demand for further functionality grew, demand for developers grew, and those developers contributed to the growth of the repository.

The value of such a community can’t be overstated. It’s what sustains the virtuous cycle, the trust that even if WordPress isn’t the best tool, it could be the right tool. WordPress is always an option.

If Springshare can be that for libraries, they’ll outlast even the integrated library systems.

Ed Summers: Discourse and Pragmatics

planet code4lib - Wed, 2017-02-08 05:00

In keeping with past semesters I’m going to try posting my written notes for class here. This is mostly peer pressure on myself to think about how I’m writing a bit more publicly. Although the reality is it’s mostly going to be lost on the Web.

I’m taking two classes this semester Discourse Analysis (edci788) and Documentation, Collection and Appraisal (lbsc785). The latter I’m technically co-teaching with my advisor Ricky Punzalan, but the reality is I’m learning lots about appraisal practices from him as well as the practitioners who are in the class. The Discourse Analysis class requires a written summary each week, and this is the first of those.

It’s always a bit weird and maybe risky learning in public, but you only live once right? Ahem. I’d love to hear from you in comments/annotations here or on Twitter or email if any of this gives you any ideas or prompts any questions.

The readings this week focused on discourse and pragmatics. Paltridge (2012) defines pragmatics as the study of how the meaning of spoken and written discourse is related to the context in which that speech and writing occurs. Context here is taken to be the particular social situation that the discourse takes place in, the other text or speech it is situated with, and any background knowledge that it relies upon.

One of the foundational concepts in pragmatics is speech act theory, which is the idea that words do things in the world. Words have a literal meaning that can be analyzed for its truth or falsehood. But words can also be used to effect change in the world, to perform actions. Searle distinguished these two types of acts as locutionary and illocutionary acts. And the actual action that is caused by the words is the perlocutionary act.

One practical example of this is the act of saying “I do” in a marriage ceremony. The words have a literal meaning, and perform the action of becoming legally married. They are also tied to the social situation in which they occur: the marriage ceremony, the partner’s speech and the speech of the marriage official. This example also highlights how various conditions can influence whether a specific speech act works or not. Austin called these felicity conditions, which Searle interpreted somewhat rigidly as rules.

Pragmatics is also specifically concerned with the theoretical perspective of the cooperative principle: that discourse is a function of participants having a shared interest or purpose, which provides a unifying shape to the discourse and prevents it from just being a series of random and disconnected topics. This idea was introduced by Grice (1975), who provides four categories or maxims that help identify the operation of the cooperative principle in discourse:

  • quantity: make contribution informative, but not more informative than needed
  • quality: try to make a contribution that is true (not false, or lacking in evidence)
  • relation: moves in topic need to fit certain parameters
  • manner: how something is said (not what)

Grice uses these maxims in order to show how speech and language do not simply fit into either a formal (scientific) or informal (humanistic) analysis. To do this he introduces the idea of the implicature which is a meaning that is not explicitly provided in the literal analysis of the words in discourse, but can be ascertained by looking at how speech interacts with the four maxims in various ways:

  • when a maxim is quietly violated
  • when a participant explicitly opts-out from a maxim
  • when the fulfillment of one maxim is in contradiction, tension with another maxim: a clash
  • when a maxim is openly disregarded or flouted

Grice uses very short snippets of conversation, mostly just paired statements: A says this, B says this in response. He uses these snippets to illustrate the fulfillment of the four maxims, and how this can give rise to implicatures, or meanings that are not explicitly provided in the literal text.

In contrast, Kasper (2006) also looks at pragmatics but uses much longer sequences of conversation. This makes sense because Kasper uses the lens of Conversation Analysis to examine pragmatics, or meaning making. CA requires looking at more than just pairs of utterances – at conversations. Kasper critiques the rationalist foundations of Speech Act theory, by questioning the idea that the meaning of an utterance is related to the internal state of the speaker, and that in turn, the listener receives and internalizes that meaning. This telementation model, where meaning is being transmitted from speaker to listener, does not, in Kasper’s eyes, sufficiently describe the way that meaning is arrived at or generated. For Kasper meaning is co-constructed by participants, and rather than being transmitted it is emergent and highly contextual. Conversation Analysis’ attention to the specific details of full conversations allows meaning and context to be understood in their specificity as collaborative ventures, where the whole can be larger than the sum of its parts.

Taguchi (2015) provides an example of using cross-cultural speech act theory to look at competencies of language learners. Culture is an important dimension to understanding speech acts because the mechanics of speech and the significance of particular word choices are not necessarily portable across cultures. Taguchi is specifically interested in how spending time abroad can change learners’ cultural awareness and their ability to generate speech acts, or their language competency. The specific research question was to see if cultural adjustment is correlated with language skill.

To achieve this, Taguchi measures intercultural competence and pragmatic competence in a group of 20 Japanese language learners before and after their semester abroad. Intercultural competence is measured using a tool called the Cross-Cultural Adaptability Inventory, which is essentially a survey of 50 questions that measures several factors using a Likert scale. Pragmatic competence is measured using an oral discourse completion test (DCT). This test collects what language learners think they would say in a particular situation, and the responses were then evaluated by Japanese speakers with respect to speech style and speech act using a six-point scale. The results were then analyzed statistically using a t-test to see if there was any correlation between changes in cultural adaptability and language use. The study found that intercultural competence was correlated with appropriate speech acts, but not with speech style, and conjectured that this could be a failing of the DCT, or perhaps of the relatively small sample size.

The readings this week provided lots of different views on the idea of speech acts and discourse pragmatics. It was clear to me on reading them that this is a very deep area of research, where there is a great deal of theoretical work to draw on. I haven’t completely decided yet what I am going to be studying as part of my research project. I’m specifically interested in looking at how archivists decide what is valuable when collecting material from the Web, and I have three different data sources in mind:

  • a set of interviews I conducted with web archivists about their appraisal process
  • online conversations in Internet Relay Chat between volunteer archivists in the ArchiveTeam community
  • written collection development policies from different institutions

I think that discourse pragmatics could be used in all three, but probably would work best in the first two because of their conversational aspect. The idea of value in appraisal work is a slippery concept, and I think Grice’s idea of implicatures could be very useful in reading between the lines of how archivists ascribe value to material. Also, looking at the discussion through a cooperative lens could be useful since archivists do tend to look at what they are doing as a cooperative enterprise: a community of practice that is centered on preserving material for use by records creators and researchers. I also think Kasper’s use of conversation analysis could surface emergent meanings in the interviews or transcripts, helping to uncover new understandings about this community of practice, its cooperative ideas and activities. I’m not particularly keen on making statistical claims like Taguchi, mostly because I don’t think questions of value lend themselves to statistical analyses so much as they do qualitative measures. But I’d like to be proven wrong if there are good tools for achieving that.

References

Grice, H. P. (1975). Logic and conversation. In P. Cole & J. L. Morgan (Eds.), Syntax and semantics: Vol. 3. Speech acts. New York: Academic Press.

Kasper, G. (2006). Pragmatics & language learning. In K. Bardovi-Harlig, J. C. Félix-Brasdefer, & A. S. Omar (Eds.), (pp. 281–314). Natl Foreign Lg Resource Ctr.

Paltridge, B. (2012). Discourse analysis: An introduction. Bloomsbury Publishing.

Taguchi, N. (2015). Cross-cultural adaptability and development of speech act production in study abroad. International Journal of Applied Linguistics, 25(3), 343–365.

DuraSpace News: COAR Seeks Comments on Vision for Next Generation Repositories

planet code4lib - Wed, 2017-02-08 00:00

From Kathleen Shearer, Executive Director, Confederation of Open Access Repositories (COAR)

Göttingen, Germany – COAR is pleased to announce the publication of the initial outcomes of the COAR Next Generation Repositories Working Group for public comment.

Jonathan Rochkind: ruby VCR, easy trick for easy re-record

planet code4lib - Tue, 2017-02-07 22:15

I do a lot of work with external HTTP APIs, and I love the vcr gem for writing tests/specs involving them. It records the interaction, so most of the time the tests run against a recorded interaction rather than actually going out to the remote HTTP server.

This makes the tests run faster, it makes more sense on a CI server like Travis, it lets tests run automatically without having to hard-code credentials for authenticated services (make sure to use VCR’s filter_sensitive_data feature; figuring out a convenient way to do that with real-world use cases is a different discussion), and it even lets people run the tests without having credentials themselves at all, to make minor PRs and such.

But in actual local dev, I sometimes want to run my tests against live data, often because the exact HTTP requests change as I edit my code. Sometimes I need to do this over and over again in a cycle. Previously, I was doing things like manually deleting the relevant VCR cassette files to ensure I was running with live data, or to avoid VCR “hey, this is a new request, buddy” errors.

Why did I never think of using the tools VCR already gives us to make it a lot easier on myself?
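
VCR’s record modes can be switched from an environment variable. Something like the following in the spec helper is all it takes – a minimal sketch of the idea, assuming a typical VCR/WebMock RSpec setup rather than any particular project’s exact configuration:

    require "vcr"

    VCR.configure do |config|
      config.cassette_library_dir = "spec/vcr_cassettes"
      config.hook_into :webmock
      config.configure_rspec_metadata!

      # Let an env var override the record mode for this run:
      #   VCR=all          -> re-record every cassette against the live services
      #   VCR=new_episodes -> record only requests that have no cassette yet
      # Defaults to :once, VCR's normal "replay if recorded, record if missing" mode.
      record_mode = (ENV["VCR"] || "once").to_sym
      config.default_cassette_options = { record: record_mode }
    end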

Everything works as always by default, but now I can just run VCR=all ./bin/rspec for a run with freshly recorded cassettes, or VCR=all ./bin/rspec some_specific_spec.rb to re-record only that spec, or only the specs I’m working on, etc.

Geez, I should have figured that out years ago. So I’m sharing with you.

Just don’t ask me if it makes more sense to put VCR configuration in spec_helper.rb or rails_helper.rb. I still haven’t figured out what that split is supposed to be about honestly. I mean, I do sometimes VCR specs of service objects that do not have Rails dependencies…. but I usually just drop it (and all my other config) in rails_helper.rb and ignore the fact that rspec these days is trying to force us to make a choice I don’t really understand the implications or utility of and don’t want to think about.



District Dispatch: Progress! Email Privacy Act clears House

planet code4lib - Tue, 2017-02-07 17:02

Congratulations, library advocates! For the second time in just over 9 months, the US House of Representatives last night passed the Email Privacy Act (H.R. 387 in this Congress) by voice vote. Propelled by more than 1,100 library supporters, the bill now moves to the Senate, where the timing of its consideration – and ultimate fate – are not yet clear.

Source: rstreet.org, http://www.rstreet.org/2015/08/04/white-house-offers-tepid-response-on-email-privacy/

As previously discussed in DD, the bill’s primary purpose and benefit is to finally update the anachronistic Electronic Communications Privacy Act (ECPA). This would require law enforcement authorities to obtain a judicial search warrant based on probable cause in order to obtain the actual content of an individual’s email, texts, tweets, cloud-stored files and photos or other electronic information. Under ECPA as still written, no such warrant typically is required for electronic communications older than six months. (This ACLU infographic lays out the problem well.)

Next month will mark the 6th anniversary of ALA’s charter membership in the Digital Due Process coalition, formed to harness the grassroots and Washington muscle of many organizations and companies in the service of ECPA reform.  With just one Senate vote between us and that goal, we’re not about to let up now.  Please stay tuned for yet another action alert, this time focused on the Senate, once we and our partners know more about when that will have the best chance of putting the Email Privacy Act on the President’s desk.


David Rosenthal: Coronal Mass Ejections (again)

planet code4lib - Tue, 2017-02-07 16:00
Back in 2014 I blogged about one of digital preservation's less well-known risks, coronal mass ejections (CME).  Additional information accumulated in the comments. Last October:
“President Barack Obama .. issued an Executive Order that defines what the nation’s response should be to a catastrophic space weather event that takes out large portions of the electrical power grid, resulting in cascading failures that would affect key services such as water supply, healthcare, and transportation.”

Two recent studies brought the risk back into focus and convinced me that my 2014 post was too optimistic. Below the fold, more gloom and doom.

Mark Gilbert's How Space Could Trigger a Future Economic Crisis reports on a new paper in Space Weather:
In four scenarios envisaging the economic impact of a solar storm, the mildest triggers a daily loss to the U.S. economy of $6.2 billion, or 15 percent of daily output; the worst case sees a cost of $41.5 billion, wiping out every dollar the world’s largest economy generates each day.

and:
A study published last month by the Cambridge Centre for Risk Studies estimates that a solar storm would have the potential to wipe between $140 billion to $613 billion off the global economy in a five-year time span, depending on the severity of the impact.

According to a NASA blog post, the probability is 12% per decade:
In February 2014, physicist Pete Riley of Predictive Science Inc. published a paper in Space Weather entitled "On the probability of occurrence of extreme space weather events." In it, he analyzed records of solar storms going back 50+ years. By extrapolating the frequency of ordinary storms to the extreme, he calculated the odds that a Carrington-class storm would hit Earth in the next ten years.

The answer: 12%.

Macroeconomic impact

So there is about a 1-in-8 chance that in the next decade we will face one of the Cambridge scenarios. They divide the economic impact of the severe scenario’s CME impacting the US into four areas:
  • Direct Impacts. It takes 5 months to restore power to 95% of the US population.
  • Indirect Supply Chain Impacts. The impact of power outages on international supply chains is bigger than their direct impact.
  • Macroeconomic Impacts. There is a large initial hit to US domestic product, but a fairly rapid recovery as government spends on recovery.
  • Insurance Impacts. For various reasons, insurance companies bear only about 14% of the economic loss, but this still amounts to about 4 times the total catastrophe losses they bear in a normal year.
I've always said that the chief threat to digital preservation is economic; digital information being very vulnerable to interruptions in the money supply. In the context of economic losses of the magnitude envisaged by the Cambridge report, digital preservation systems would be very low on the priority list for recovery funds.

The risk of CMEs is one reason Facebook has advanced for their investment in optical storage for cold data. A CME could destroy the electronics in the racks, but it would not destroy the data on the DVDs. Actually, a CME is equally unlikely to destroy the data on hard disk platters, but destroying the drive electronics makes that data very expensive to recover.

DPLA: Color Our Collections 2017

planet code4lib - Tue, 2017-02-07 15:00

That’s right, folks — #ColorOurCollections is back for kids and grown-ups alike! On your next lunch break, free evening, or Saturday afternoon, try your hand at coloring cultural heritage collections from institutions across the country.

This year’s selection from DPLA includes an array of art, posters, inventions, landscapes and animals. For even more choices, last year’s images are still fair game too!

Color your favorites and share them with us all week at @DPLA or on Facebook using #ColorOurCollections.

Download all DPLA #ColorOurCollections coloring pages

 

To learn more about the campaign and find other participating institutions, visit colorourcollections.org.

LITA: Call for LITA Guides

planet code4lib - Tue, 2017-02-07 14:56

LITA is looking to expand its popular LITA Guide series. Topics for consideration include:

  • Tools for big data
  • Developing in-house technology expertise
  • Budgeting for technology
  • Writing a technology plan
  • K-12 technology
  • Applications of agile development for libraries
  • Grant writing for library technology
  • Security for library systems

Do you have expertise in any of these areas? Reach out to Marta Deyrup, Acquisitions Editor.

State Library of Denmark: juxta – image collage with metadata

planet code4lib - Tue, 2017-02-07 14:38

Creating large collages of images to give a bird’s eye view of a collection seems to be gaining traction. Two recent initiatives:

Combining those two ideas seemed like a logical next step, and juxta was born: a fairly small bash script for creating million-scale collages of images, with no special server-side requirements. There’s a small (just 1000 images) demo at SBLabs.

Presentation principle

The goal is to provide a seamless transition from the full collection to individual items, making it possible to compare nearby items with each other and locate interesting ones. Contextual metadata should be provided for general information and provenance.

Concretely, the user is presented with all images at once and can zoom in to individual images in full size. Beyond a given threshold, metadata are shown for the image currently under the cursor, or finger if a mobile device is used. An image description is displayed just below the focused image, to avoid disturbing the view. A link to the source of the image is provided on top.

Overview of historical maps

Meta-data for a specific map

Technical notes, mostly on scaling

On the display side, OpenSeadragon takes care of the nice zooming. When the user moves the focus, a tiny bit of JavaScript spatial math resolves image identity and visual boundaries.

OpenSeadragon uses pyramid tiles for display and supports the Deep Zoom protocol, which can be implemented using only static files. The image to display is made up of tiles of (typically) 256×256 pixels. When the view is fully zoomed in, only the tiles within the viewport are requested. When the user zooms out, the tiles from the level above are used. The level above is half the width and half the height and is thus represented by ¼ the number of tiles. And so forth.

Generating tiles is heavy

A direct way of creating the tiles is

  1. Create one large image of the full collage (ImageMagick’s montage is good for this)
  2. Generate tiles for the image
  3. Scale the image down to 50%×50%
  4. If the image is larger than 1×1 pixel then goto 2

Unfortunately this does not scale particularly well. Depending on size and tools, it can take up terabytes of temporary disk space to create the full collage image.

By introducing a size constraint, juxta avoids creating the full collage image: all individual source images are scaled & padded to have the exact same size, with width and height that are exact multiples of 256. Then the tiles can be created by

  1. For each individual source image, scale, pad and split the image directly into tiles
  2. Create the tiles at the level above individually by joining the corresponding 4 tiles below and scale to 50%×50% size
  3. If there are more than 1 tile or that tile is larger than 1×1 pixel then goto 2

As the tiles are generated directly from either source images or other tiles, there is no temporary storage overhead. As each source image and each tile are processed individually, it is simple to do parallel processing.
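
For illustration, here is a rough sketch of step 2 with ImageMagick – not the actual juxta script, and assuming a hypothetical <zoom-level>/<column>_<row>.png layout for the tile files:

    # Build the tile at (x, y) on zoom level $level by joining its four children
    # from the more detailed level below it (level+1 in Deep Zoom numbering) and
    # scaling the joined image to 50%x50%, i.e. back to a single 256x256 tile.
    level=5; x=3; y=7
    child=$((level + 1))
    mkdir -p "$level"
    montage "$child/$((2*x))_$((2*y)).png"   "$child/$((2*x+1))_$((2*y)).png" \
            "$child/$((2*x))_$((2*y+1)).png" "$child/$((2*x+1))_$((2*y+1)).png" \
            -tile 2x2 -geometry +0+0 png:- \
      | convert png:- -resize 50% "$level/${x}_${y}.png"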

Metadata takes up space too

Displaying image-specific metadata is simple when there are just a few thousand images: Use an in-memory array of Strings to hold the metadata and fetch it directly from there. But when the number of images goes into the millions, this quickly becomes unwieldy.

juxta groups the images spatially in buckets of 50×50 images. The metadata for all the images in a bucket are stored in the same file. When the user moves the focus to a new image, the relevant bucket is fetched from the server and the metadata are extracted. A bucket cache is used to minimize repeat calls.

Most file systems don’t like to hold a lot of files in the same folder

While the limits differ, common file systems such as ext, hfs & ntfs all experience performance degradation with high numbers of files in the same folder.

The Deep Zoom protocol in conjunction with file-based tiles means that the number of files at the deepest zoom level is linear in the number of source images. If there are 1 million source images with a full-zoom size of 512×512 pixels (2×2 tiles), the number of files in a single folder will be 2*2*1M = 4 million. Far beyond the comfort zone of the mentioned file systems (see the juxta readme for tests of performance degradation).

juxta mitigates this by bucketing tiles in sub-folders. This ensures linear scaling of build time at least up to 5-10 million images. 100 million+ images would likely deteriorate build performance markedly, but at that point we are also entering “are there enough free inodes on the file system?” territory.

Unfortunately the bucketing of the tile files is not in the Deep Zoom standard. With OpenSeadragon, it is very easy to change the mapping, but it might be more difficult for other Deep Zoom-expecting tools.

Some numbers

Using a fairly modern i5 desktop and 3 threads, generating a collage of 280 5MPixel images, scaled down to 1024×768 pixels (4×3 tiles) took 88 seconds or about 3 images/second. Repeating the experiment with a down-scale to 256×256 pixels (smallest possible size) raised the speed to about 7½ image/second.

juxta comes with a scale-testing script that generates sample images that are close (but not equal) to the wanted size and repeats them for the collage. With this near-ideal match, processing speed was 5½ images/second for 4×3 tiles and 33 images/second for 1×1 tiles.

The scale-test script has been used up to 5 million images, with processing time practically linear to the number of images. At 33 images/second that is 42 hours.


Open Knowledge Foundation: Open Data by default: Lorca City Council is using OpenSpending to increase transparency and promote urban mobility.

planet code4lib - Tue, 2017-02-07 10:00

Castillo de Lorca. Torre Alfonsina (Public Domain)

Lorca, a city located in the south of Spain with currently 92,000 inhabitants, launched its open data initiative on January 9th, 2014. Initially it offered 23 datasets containing transport, mobility, statistical and economic information. From the very beginning, OpenSpending was the tool selected by Lorca City Council because of its capabilities and powerful visualizations.

The first upload of datasets was done in 2013, on the previous version of OpenSpending. With the OpenSpending relaunch last year, Lorca City Council continued to make use of the OpenSpending datastore, while the TreeMap view of the expenditure budget was embedded on the council’s open data website.

In December 2016, the council’s open data website was redesigned, including budget datasets built with the new version at next.openspending.org. The accounting management software of Lorca allows the automatic conversion of data files to CSV format, so these datasets are compatible with the formats required by OpenSpending.

Towards more transparency and becoming a smart city

In 2015, when the City of Lorca transparency website was launched, the council decided to continue with the same strategy focused on visualization tools to engage citizens with an intuitive approach to the budget data.

Lorca is a pioneer city in the Region of Murcia in terms of open data and transparency. So far, 125 datasets have been released, and much information is available along with the raw data.

Worth highlighting is a pilot project to bring open data to schools, which was carried out during the past year. In 2017, we will resume teaching the culture of open data to school children, with the main goal of demonstrating how to work with data by using open data.

In the near future the council plans to open more data directly from the sources, i.e. achieve a policy of open data by default.

And of course Lorca intends to continue exploring other possibilities that OpenSpending offers to provide all this data to the citizenry. In addition, Lorca is working to become a smart city (article in Spanish only) – open data is a key element of this goal. Therefore, Lorca’s open data initiative will be a part of the Smart Social City strategy from the very beginning.

DuraSpace News: Telling Fedora 4 Stories at the University of Alberta with Geoff Harder, Peter Binkley, and Leah Vanderjagt

planet code4lib - Tue, 2017-02-07 00:00

“Telling Fedora 4 Stories” is an initiative aimed at introducing project leaders and their ideas to one another while providing details about Fedora 4 implementations for the community and beyond.

District Dispatch: Archived webinar on Sci-Hub and resource sharing now available

planet code4lib - Mon, 2017-02-06 21:56

Plan ahead! One hour CopyTalk webinars occur on the first Thursday of every month at 11 a.m. Pacific / 2 p.m. Eastern.

An archived copy of the CopyTalk webinar “Open Access ‘Pirates:’ Sci-Hub and #icanhazpdf as Resource Sharing” is now available. Originally webcast on February 2, 2017, by the Office for Information Technology Policy’s Copyright Education subcommittee, this webinar was one of our most popular CopyTalks of all time.

Presenters were Carolyn Caffrey Gardner from California State University Dominguez Hills and Gabriel J. Gardner from California State University Long Beach. They showed their latest research on who uses Sci-Hub and other guerrilla fulfillment sites and why. In addition, they described the various ways people use and build guerrilla sites, both centralized (active and planned deployment) and decentralized (crowd-sourced). Is this just a supply and demand issue, or is something else afoot?

You can watch the full CopyTalk and view the slides on the Office for Information Technology Policy’s website here.

Plan ahead! One hour CopyTalk webinars occur on the first Thursday of every month, 11 a.m. Pacific / 2 p.m. Eastern. Live stream: ala.adobeconnect.com/copytalk


LITA: What’s so super about supercomputing? A joint LITA and ACRL webinar

planet code4lib - Mon, 2017-02-06 17:44

What’s so super about supercomputing? A very basic introduction to high performance computing

Presenters: Jamene Brooks-Kieffer and Mark J. Laufersweiler
Tuesday February 28, 2017
2:00 pm – 3:30 pm Central Time

Register Online, page arranged by session date (login required)

This 90-minute webinar provides a bare-bones introduction to high-performance computing, also known as HPC or supercomputing, among many other monikers. This program is a unique attempt to connect the academic library to introductory information about HPC. Librarians who are learning about researchers’ data-intensive work should consider familiarizing themselves with the computing environment often used to conduct that work.

Academic librarians, particularly, face a landscape in which many of their users conduct part or all of their research using computation. Bibliometric analysis, quantitative statistical analysis, and geographic data visualizations are just a few examples of computationally-intensive work underway in humanities, social science, and science fields.

Covered topics will include:

  • Why librarians should care about HPC
  • HPC terminology and working environment
  • Examples of problems appropriate for HPC
  • HPC resources at institutions and nation-wide
  • Low-cost entry-level programs for learning distributed computing

The webinar slide set and a handout that includes a glossary of basic HPC terminology as well as a list of HPC resources will be made available.

Details here and Registration here

Webinar takeaways will include:

  • Attendees will learn the basic terminology of high performance computing.
  • Attendees will be introduced to the working environment commonly used for high performance computing.
  • Attendees will gain information on institutional and national high performance computing resources available to researchers.

Jamene Brooks-Kieffer brings a background in electronic resources to her work as Data Services Librarian at the University of Kansas. She regularly teaches data management practices to audiences of faculty, graduate students, and undergraduates. She has engaged library professionals in many in-person and virtual programs at venues including Electronic Resources & Libraries, Coalition for Networked Information, and a Great Plains Network / Greater Western Library Association webinar series.

Dr. Mark Laufersweiler has, since the Fall of 2013, served as the Research Data Specialist for the University of Oklahoma Libraries. He is currently assisting the educational mission of the Libraries by developing and offering workshops, seminars and short courses, helping to inform the university community on best practices for data management and data management planning. He is the university’s representative as a member of the Software Carpentry Foundation and is an active instructor as well. He is a strong advocate of open source software and open access to data.

Look here for current and past LITA continuing education offerings

Questions or Comments?

contact LITA at (312) 280-4268 or Mark Beatty, mbeatty@ala.org
contact ACRL at (312) 280-2522 or Margot Conahan, mconahan@ala.org

District Dispatch: Alarming new FCC moves

planet code4lib - Mon, 2017-02-06 16:45

ALA is concerned about announcements made in last week’s “Friday media dump”

Last Friday, Federal Communications Commission Chairman Ajit Pai rescinded close to a dozen policies of the FCC, including rulemakings on expanding the program providing Internet service to low-income households, rulings on several TV stations’ violations of political file rules, and further restricting TV shared services and joint sales agreements. Chairman Pai also announced the end of the Commission’s probe into the controversial wireless “zero rating” data plans.

The American Library Association has been a proud partner in initiatives to support broadband opportunity and access to information, including the expansion of the Lifeline program. We also have supported many of the policies improving equity and access to information that the Chairman unilaterally rescinded on Friday. We believe these moves will make the digital divide wider and are troubled by the direction this Chairman appears to be heading with “Friday news dumps” that give little to no time for discussion or dissent. Please see below for a statement from ALA President Julie Todaro on Friday’s alarming moves by the FCC:

On February 3, 2017, the Federal Communications Commission (FCC) revoked all of the designations of Lifeline Broadband Providers and ordered the retraction of multiple reports, including the “E-rate Modernization Progress Report” and “Improving the Nation’s Digital Infrastructure.”

The American Library Association (ALA) is dismayed by these actions to reduce digital opportunity and revise the public record. ALA President Julie Todaro released the following statement.

“The American Library Association (ALA) strenuously objects to recent actions by the Federal Communications Commission (FCC). First, the ALA is alarmed by the sudden revocation of the nine Lifeline Broadband Provider designations. Reducing options for Lifeline broadband services is a step back in efforts to close the homework gap and digital divide, and is at odds with Chairman Pai’s stated desire to advance digital empowerment. The 2016 Lifeline modernization order represented a critical milestone in our national commitment to connect low-income Americans to the broadband that powers educational and economic opportunity. ALA and our nation’s 120,000 libraries are committed to advancing digital opportunity for all, and we urge the FCC to increase the number of broadband options available for Lifeline customers.

“The ALA also calls for the FCC to maintain an accurate and complete historical record. While new FCC leadership may have new policy directions, the public record should not be permanently altered. Governmental agencies must be accountable in this regard. We urge the reversal of the retraction decisions and an agreement that the FCC will not order the removal of any other documents from the public record. Such actions undermine the credibility of the FCC and Chairman Pai’s recent move to increase transparency of the Commission’s rulemaking.

“Full and public debate with the accompanying historical record preserved on these foundational internet issues that affect every person in this country should be the standard we expect and demand.”


Islandora: Islandora Foundation: New Members

planet code4lib - Mon, 2017-02-06 16:12

The Islandora Foundation is funded entirely by support from our member organizations, so we are very grateful to announce that we are welcoming two new members: the University of Texas at Austin and Digital Echidna.

UT Austin has long been a major implementer of Islandora and engaged with the community. They join the Islandora Foundation as a Collaborator.

Digital Echidna is newer to the scene, but has already made a mark with the contribution of several modules to the Islandora community and sponsorship of Islandoracon. They join as Members.

These members, plus renewed commitments from our existing members, bring our Lobstometre up another few notches:

If your institution would also like to support Islandora and become more engaged with the community, please consider membership. You can also support Islandora as an Individual Member with a donation of your choosing.

OCLC Dev Network: Calling CABs: Obtaining 3,000 required and recommended readings each semester

planet code4lib - Mon, 2017-02-06 14:00

As part of the process of optimizing the alignment of our book collection with the teaching and learning needs of the colleges, the Claremont Colleges Library launched a service designed to provide students with improved access to approximately 3,000 required and recommended readings each semester, known as Course Adopted Books (CABs).

Library of Congress: The Signal: FADGI’s 10th Anniversary: Adapting to Meet the Community’s Needs

planet code4lib - Mon, 2017-02-06 13:53

This is a guest post by Kate Murray, IT Specialist in the Library of Congress’s Digital Collections and Management Services.

Note the dot over the “i” in Guidelines connects with the “g” in Agencies, which reflects FADGI’s collaborative ethos of working together and that guidelines should always intersect with agency needs.

Started in 2007 as a collaborative effort by federal agencies, FADGI has many accomplishments under its belt, including the widely implemented Technical Guidelines for Digitizing Cultural Heritage Materials (newly updated in 2016); open source software, including OpenDICE and AutoSFR and BWF MetaEdit; file format comparison projects; standards work, including the MXF AS-07 Application Specification and Sample Files; projects related to scanning motion picture film; embedded metadata in Broadcast Wave, DPX and TIFF (PDF) files and many more. Check out the handy summary chart (PDF) of our accomplishments, impacts and benefits to date.

Our 10th anniversary is 2017, so it’s a good time to think about a bit of an update as we head into our second decade.

First let’s talk about our name. “FADGI” (fah – jee), we readily admit, does not exactly roll off the tongue. But we’re a well-established brand name now so “FADGI” we stay but with an update. Up until now, the FADGI acronym stood for the Federal Agencies Digitization Guidelines Initiative because we’ve mainly been focused on developing technical guidelines, methods and practices for the digitization of historical content in a sustainable manner. In recent years however, the FADGI Still Image and Audio-Visual working groups have expanded their projects to include selected aspects of born-digital content alongside content reformatted through digitization.

FADGI 2.0 is now reborn as the Federal Agencies Digital Guidelines Initiative. Same acronym that we’ve grown to love, same great people (now up to 20 federal agencies) now with a new logo, updated website and expanded scope. FADGI will still focus on determining performance measures for digitization and develop methods for validation, recommending methods for digitization and exploring sustainable digital formats for still image and audiovisual material. But we’ll add some new ingredients to the mix, including recommending methods for creating and maintaining sustainable born-digital material. One example of this revised scope is the Creating and Archiving Born Digital Video project, which includes high-level recommended practices (PDF) for file creators.

More good news on the FADGI front is that our published guidelines will now carry the CC0 1.0 Universal license to declare unambiguously that the work is available for worldwide use and reuse. Because FADGI work is the product of US federal government personnel in the scope of their employment and therefore is not subject to copyright in the United States (17 U.S.C. §105), FADGI’s work products have always been in the public domain. The inclusion of the CC0 1.0 Universal license clarifies these statements for both US and international users of the FADGI guidelines.

All United States federal agencies and institutions involved in the creation or collection of digitized or born-digital content of a cultural, historical or archival nature are welcome to participate in FADGI. Please join us as we look forward to our next chapter and our next 10 years!
