
David Rosenthal: The Miner's Dilemma

planet code4lib - Tue, 2015-01-13 16:00
I've pointed out how economies of scale lead to concentration of mining power in block-chain based peer-to-peer (P2P) systems such as Bitcoin, which impairs the decentralized nature of such systems, their major selling point. As Ittay Eyal points out in an important post entitled The Miner's Dilemma:
The dismantling of overly large pools is one of the most important and difficult tasks facing the Bitcoin community.

Pools are needed to generate consistent income but:
[Miners] can get steady income from pools well below 10%, and they have only little incentive to use very large pools; it's mostly convenience and a feeling of trust in large entities.

Source: blockchain.info

As I write, the three largest pools (F2Pool, AntPool and GHash.IO) have controlled 50% of the mining power over the past 24 hours, so Eyal is right to say:
Gavin Andresen, chief scientist of the Bitcoin Foundation, has repeatedly urged miners to use smaller pools, and researchers, including ourselves, have suggested technical fixes to reduce pool size (here, and here). But alas, community pressure has only had limited success, and technical solutions are still under development and far from production.

Eyal's post, and the detailed analysis on arXiv.org, are important because they show how the block withholding attack on mining pools, which has been known since 2011 and has been used at least once in practice, can create a countervailing pressure that would limit the size of mining pools. Below the fold I discuss the details and the implications for my analysis.

The block withholding attack does not appear to be prevalent:
Long term block withholding attacks are difficult to hide, since miners using an attacked pool would notice the reduced revenue density. Nevertheless, such attacks are rarely reported, and we can therefore conclude that they are indeed rare.

However, a pool that mounts such an attack can increase its revenue:
This attack affects the revenues of the pools in several ways. The victim pool’s effective mining rate is unchanged, but its total revenue is divided among more miners. The attacker’s mining power is reduced, since some of its miners are used for block withholding, but it earns additional revenue through its infiltration of the other pool. And finally, the total effective mining power in the system is reduced, causing the Bitcoin protocol to reduce the difficulty.
Taking all these factors into account, we observe that a pool might be able to increase its revenue by attacking other pools.

Eyal shows the decision process for each pool:
Since pools can decide to start or stop attacking at any point, this can be modeled as the miner’s dilemma — an instance of the iterative prisoner’s dilemma. Attacking is the dominant strategy in each iteration, but if the pools can agree not to attack, both benefit in the long run.

Apparently the pools have such an agreement:
The fact that such attacks do not persist may indicate that the active pools have reached an implicit or explicit agreement not to attack one another.

But Eyal shows that this is an unstable equilibrium:
Our results imply that block withholding by pools leads to an unfavorable equilibrium. Nevertheless, due to the anonymity of miners, a single pool might be tempted to attack, leading the other pools to attack as well. The implications might be devastating for open pools: If their revenues are reduced, miners will prefer to form closed pools that cannot be attacked in this manner. Though this may be conceived as bad news for public mining pools, on the whole it may be good news to the Bitcoin system, which prefers small pools.

Open public pools are those whose miners are anonymous, and therefore untrusted. Pools whose miners are trusted are closed; they behave as a single miner. The assumption here is that only open public pools can grow large enough to threaten the network, which appears to be the case at present.
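To make the dilemma concrete, here is a minimal sketch of the payoff structure Eyal describes. The revenue numbers are hypothetical, not values from Eyal's model; the point is only the shape of the game, in which attacking is the best response to either choice yet mutual attack leaves both pools worse off than mutual restraint.

# Hypothetical payoffs for two symmetric pools; not Eyal's actual model.
# payoffs[(a_attacks, b_attacks)] = (revenue_A, revenue_B)
payoffs = {
    (False, False): (1.00, 1.00),  # both honest: baseline revenue
    (True,  False): (1.10, 0.85),  # A infiltrates B: A gains, B's revenue is diluted
    (False, True):  (0.85, 1.10),  # the symmetric case
    (True,  True):  (0.95, 0.95),  # both attack: both fall below the honest baseline
}

def best_response(opponent_attacks):
    # Does attacking pay more for pool A, given what the opponent does?
    return payoffs[(True, opponent_attacks)][0] > payoffs[(False, opponent_attacks)][0]

for opponent_attacks in (False, True):
    print("opponent attacks:", opponent_attacks, "-> A should attack:", best_response(opponent_attacks))

# Attacking wins in both cases (a dominant strategy), which is exactly the
# prisoner's dilemma structure: each round both pools attack, and both earn less.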

The block withholding attack provides an incentive for miners to join closed pools, which are assumed to be smaller. But it isn't a very powerful incentive. It depends on the breakdown of an agreement not to use an attack which has to be sustained over a significant period, which is "difficult to hide", and which does not seem likely to have a large impact on miners' income compared, for example, to market forces. Eyal points out that:
[Miners] have only little incentive to use very large pools; it's mostly convenience and a feeling of trust in large entities.

Nevertheless, these weak incentives have concentrated 50% of the mining power in only three pools. So there must be some doubt that the not-very-powerful incentive provided by the possibility of the breakdown of the agreement not to use the block withholding attack would overcome "convenience and a feeling of trust in large entities".

Note that of the three pools controlling 50% of the mining power, two (F2Pool and GHash.IO) are secretive. The third is AntPool, which as I understand it is based on p2pool, a P2P pool protocol. A goal of p2pool is to ensure that an attack by a pool using it can only be performed by modifying the code running at the pool's miners, not by code running at a pool manager. Thus an attack by a p2pool-based pool would be visible to the pool's miners, as a conventional pool's attack would not be. This mitigates the bad effects of a large public pool. But does it leave a p2pool unable to respond to a block withholding attack by attacking back? I need to study p2pool more, so apologies if I misrepresent it.

Neither the withholding attack nor p2pool mitigates the bad effects of a large closed pool, or a large single miner. 

Although I'm skeptical of the practical impact of Eyal's analysis, it led me to think that my economies of scale argument needs to be refined. Eyal shows that the incentives for open and closed pools (or single miners) are different, and I think that is true for the effects of economies of scale too.

The costs of mining, and thus the benefits of economies of scale, apply to the individual miner (and thus to closed pools). There are thus powerful incentives causing the majority of mining power to be generated by large miners. But a pool bears none of the costs of actual mining, only the costs of running the pool. A miner increases scale by investing in hardware, which costs a lot. A public pool grows by attracting miners, which costs very little. Even though Eyal points out that miners have "little incentive to use very large pools" it is clear that large miners prefer very large pools. I need to think more about the forces that are operating to drive the domination by very large pools, but clearly the advantages to very large miners must be a major factor.  Are they more sensitive to small changes in reward variance?

Nick Ruest: Preliminary stats of #JeSuisCharlie, #JeSuisAhmed, #JeSuisJuif, #CharlieHebdo

planet code4lib - Tue, 2015-01-13 12:00
#JeSuisAhmed

$ wc -l *json
  148479 %23JeSuisAhmed-20150109103430.json
   94874 %23JeSuisAhmed-20150109141746.json
    5885 %23JeSuisAhmed-20150112092647.json
  249238 total
$ du -h
2.7G    .

#JeSuisCharlie

$ wc -l *json
 3894191 %23JeSuisCharlie-20150109094220.json
 1758849 %23JeSuisCharlie-20150109141730.json
  226784 %23JeSuisCharlie-20150112092710.json
      15 %23JeSuisCharlie-20150112092734.json
 5879839 total
$ du -h
32G     .

#JeSuisJuif

$ wc -l *json
  23694 %23JeSuisJuif-20150109172957.json
  50603 %23JeSuisJuif-20150109173104.json
   5941 %23JeSuisJuif-20150110003450.json
  42237 %23JeSuisJuif-20150112094500.json
   5064 %23JeSuisJuif-20150112094648.json
 127539 total
$ du -h
671M    .

#CharlieHebdo

$ wc -l *json
 4444585 %23CharlieHebdo-20150109172713.json
     108 %23CharlieHebdo-20150109172825.json
 1164717 %23CharlieHebdo-20150109172844.json
 1068074 %23CharlieHebdo-20150112094427.json
   69446 %23CharlieHebdo-20150112094446.json
  185263 %23CharlieHebdo-20150112155558.json
 6932193 total
$ du -h
39G     .

Total

Preliminary and non-deduped, we're looking at roughly 74.4G of data, and 13,188,809 tweets after 5.5 days of capturing the 4 hashtags.
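Since the counts above are per capture file, the same tweet can appear more than once in the totals. Here is a minimal sketch of one way to dedupe the line-oriented JSON by tweet id; this is not a twarc utility, and the script name and filenames in the usage note are hypothetical.

import json
import sys

# Read tweets (one JSON object per line) from stdin, emit each tweet id only once.
seen = set()
for line in sys.stdin:
    line = line.strip()
    if not line:
        continue
    tweet = json.loads(line)
    tweet_id = tweet.get('id_str') or str(tweet.get('id'))
    if tweet_id in seen:
        continue
    seen.add(tweet_id)
    print(line)

# usage (hypothetical):
# cat %23JeSuisAhmed-*.json | python dedupe.py > %23JeSuisAhmed-deduped.json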

tags: twarc, #JeSuisCharlie, #JeSuisAhmed, #JeSuisJuif, #CharlieHebdo

Hydra Project: Duoc UC, Chile, becomes a Hydra Partner

planet code4lib - Tue, 2015-01-13 09:30


We are delighted to announce that Duoc UC (http://www.duoc.cl), in Santiago, Chile, has become the latest formal Hydra partner, and our first partner institution in Latin America. Duoc has been working with Hydra to build the “Heritage Digital Library” (http://loncofilu.cl), a digital repository of architectural drawings, photographs, restoration plans and historical documents related to the most precious historic buildings in Chile, and representing work produced by the students of Duoc’s Faculty of Construction. In 2015 they are planning to develop two additional repositories based on Hydra that will focus on the collection of student thesis projects and audio and visual productions from their Faculty of Communication.

In their letter of intent, Duoc says they are committed not only to building more projects with Hydra, but also to building a Hydra community in Latin America through the translation of documentation into Spanish and offering workshops to other Latin American institutions interested in building Hydra repositories.

Welcome, Duoc UC!

Ed Summers: Bowie

planet code4lib - Tue, 2015-01-13 02:18

Bowie by Simon Critchley
My rating: 5 of 5 stars

If you are a Bowie fan, you will definitely enjoy this. If you are curious why other people are so into Bowie you will enjoy this. If you’ve never read any Critchley and are interested in something quick and accessible by him you will enjoy this. I fell into the first and third categories so I guess I’m guessing about the second. But I suspect it’s true.

I finished the book feeling like I understand the why and how of my own fascination with Bowie’s work much better. I also want to revisit some of his albums like Diamond Dogs, Heathen and Outside which I didn’t quite connect with at first. I would’ve enjoyed a continued discussion of Bowie’s use of the cutup technique, but I guess that fell out of the scope of the book.

I also want to read some more Critchley too — so if you have any recommendations please let me know. The sketches at the beginning of each chapter are wonderful. OR Books continues to impress.

William Denton: Clapping Music on Sonic Pi

planet code4lib - Tue, 2015-01-13 02:10

A while ago I bought a Raspberry Pi, a very small and cheap computer, and I never did much with it. Then a few days ago I installed Sonic Pi on it and I’ve been having a lot of fun. (You don’t need to run it on a Pi, you can run it on Linux, Mac OS X or Windows, but I’m running it on my Pi and displaying it on my Ubuntu laptop.)

My Raspberry Pi.

Sonic Pi is a friendly and easy-to-use GUI front end that puts Ruby on top of SuperCollider, “a programming language for real time audio synthesis and algorithmic composition.” SuperCollider is a bit daunting, but Sonic Pi makes it pretty easy to write programs that make music.

I’ve written before about “Clapping Music” by Steve Reich, who I count as one of my favourite composers: I enjoy his music enormously and listen to it every week. “Clapping Music” is written for two performers who begin by clapping out the same 12-beat rhythm eight times, then go out of phase: the first performer keeps clapping the same rhythm, but the second one claps a variation where the first beat is moved to the end of the 12 beats, so the second becomes first. That phasing keeps on until it wraps around on the 13th repetition and they are back in phase.

Here’s one animated version showing how the patterns shift:

And here’s another:

Here’s the code to have your Pi perform a rather mechanical version of the piece. The clapping array defines when a clap should be made. There are 13 cycles that run through the clapping array 4 times each. The first time through, cycle is 0, and the two tom sounds are the same. The second time through, cycle is 1, so the second tom is playing one beat ahead. The third time through, cycle is 2, so the second tom is two beats ahead. It’s modulo 12 so it can wrap around: if the second tom is on the fifth cycle and ten beats in, there’s no 15th beat, so it needs to play the third beat.

use_bpm 300

load_sample :drum_tom_lo_soft
load_sample :drum_tom_mid_soft

clapping = [1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0]

13.times do |cycle|
  puts "Cycle: #{cycle}"
  4.times do |reps|
    12.times do |beat|
      sample :drum_tom_lo_soft, pan: -0.5 if clapping[beat] == 1
      sample :drum_tom_mid_soft, attack: 0.05, pan: 0.5 if clapping[(cycle + beat) % 12] == 1
      sleep 1
    end
  end
end

If you’re running Sonic Pi, just paste that in and it will work. It sounds like this (Ogg format):

It only does four repetitions of each cycle because my Pi is old and not very powerful and for some reason eight made it go wonky. It’s not perfect even now, but the mistakes are minimal. I think a more recent and more powerful Pi would be all right, as would running Sonic Pi on a laptop or desktop.

It’s lacking all the excitement of a performance by real humans (some of which could be faked with a bit of randomization and effects), but it’s very cool to be able to do this. Algorithmic music turned into code!

DuraSpace News: Get the Scoop on How Institutions Research and Select Hosted Repository Solutions

planet code4lib - Tue, 2015-01-13 00:00

Winchester, MA – Find out first-hand how institutions research and select hosted repository solutions at the January 22 Hot Topics webinar “Doing It: How Non-ARL Institutions are Managing Digital Collections”.

DuraSpace News: CALL DSpace Interest Group Proposals for OR2015

planet code4lib - Tue, 2015-01-13 00:00

From Maureen Walsh, Institutional Repository Services Librarian, The Ohio State University Libraries

Conference Theme: LOOKING BACK, MOVING FORWARD: OPEN REPOSITORIES AT THE CROSSROADS

Conference Dates: June 8-11, 2015

Conference Location: Indianapolis, Indiana

Conference Website: http://www.or2015.net/

Important dates

DuraSpace News: CALL Fedora Interest Group Proposals for OR2015

planet code4lib - Tue, 2015-01-13 00:00

From David Wilcox, Fedora Product Manager, DuraSpace; Co-chair, OR2015 Fedora Interest Group

Evergreen ILS: 2015 conference registration open

planet code4lib - Mon, 2015-01-12 20:50

Registration now is open for the 2015 Evergreen International Conference, to be held on May 13-16 in Hood River, Oregon, USA.
https://www.eventbrite.com/e/evergreen-2015-international-conference-tickets-15029293020

The conference venue is the Best Western Plus Hood River Inn. Booking details are available on the venue page of the conference website. The website will be updated as information becomes available.

Stay tuned for information about submitting proposals, sponsoring the conference, and exhibiting.

Questions? Contact conference chair Buzzy Nielsen, buzzy@hoodriverlibrary.org, 541-387-7062.

Nick Ruest: Preliminary look at 3,893,553 #JeSuisCharlie tweets

planet code4lib - Mon, 2015-01-12 19:45

Background

Last Friday (January 9, 2015) I started capturing #JeSuisAhmed, #JeSuisCharlie, #JeSuisJuif, and #CharlieHebdo with Ed Summers' twarc. I have about 12 million tweets at the time of writing this, and plan on writing up something a little bit more in-depth in the coming weeks. But for now, some preliminary analysis of #JeSuisCharlie, and if you haven't seen these two posts ("A Ferguson Twitter Archive", "On Forgetting and hydration") by Ed Summers, please do check them out.

How fast were the tweets coming in? Just to try and get a sense of this, I did a quick recording of tailing the twarc log for the #JeSuisCharlie capture.

Hydration

If you checked out both of Ed's posts, you'll have noticed that the Twitter ToS forbid the distribution of tweets, but we can distribute the tweet ids, and based on those we can "rehydrate" the data set locally. The tweet ids for each hashtag will be/are available here. I'll update and release the tweet ids files as I can.

We're looking at just around 12 million tweets (un-deduped) at the time of writing, so the hydration process will take some time. I'd highly suggest using GNU Screen or tmux.

Hydrate

  • #JeSuisCharlie: % twarc.py --hydrate %23JeSuisCharlie-ids-20150112.txt > %23JeSuisCharlie-tweets-20150112.json
  • #JeSuisAhmed: % twarc.py --hydrate %23JeSuisAhmed-ids-20150112.txt > %23JeSuisAhmed-tweets-20150112.json
  • #JeSuisJuif: % twarc.py --hydrate %23JeSuisJuif-ids-20150112.txt > %23JeSuisJuif-tweets-20150112.json
  • #CharlieHebdo: % twarc.py --hydrate %23CharlieHebdo-ids-20150112.txt > %23CharlieHebdo-tweets-20150112.json
Map

#JeSuisCharlie tweets with geo coordinates.

In this data set, we have 51,942 tweets with geo coordinates available. This represents about 1.33% of the entire data set (3,893,553 tweets).
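For anyone curious how a count like that can be produced from twarc's line-oriented JSON, here is a minimal sketch. It is not one of the twarc utilities; it simply reads the deduped file named elsewhere in this post and checks each tweet's coordinates field.

import json

total = 0
with_geo = 0
with open('%23JeSuisCharlie-cat-20150115-tweets-deduped.json') as f:
    for line in f:
        tweet = json.loads(line)
        total += 1
        # the coordinates field is null unless the tweet carries a geo point
        if tweet.get('coordinates'):
            with_geo += 1

print(with_geo, total, round(100.0 * with_geo / total, 2))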


How do you make this?

  • Create the geojson % ~/git/twarc/utils/geojson.py %23JeSuisCharlie-cat-20150115-tweets-deduped.json > %23JeSuisCharlie-cat-20150115-tweets-deduped.geojson

  • Give the geojson a variable name.

  • Use Leaflet.js to put all the tweets with geo coordinates on a map like this.

Top URLs

Top 10 URLs tweeted from #JeSuisCharlie.

  1. (11220) http://www.newyorker.com/culture/culture-desk/cover-story-2015-01-19?mbid=social_twitter
  2. (2278) http://www.europe1.fr/direct-video
  3. (1615) https://www.youtube.com/watch?v=4KBdnOrTdMI&feature=youtu.be
  4. (1347) https://www.youtube.com/watch?v=-bjbUg9d64g&feature=youtu.be
  5. (1333) http://www.amazon.com/Charlie-Hebdo/dp/B00007LMFU/
  6. (977) http://www.clubic.com/internet/actualite-748637-opcharliehebdo-anonymous-vengeance.html
  7. (934) http://www.maryam-rajavi.com/en/index.php?option=com_content&view=article&id=1735&catid=159&Itemid=506
  8. (810) http://www.lequipe.fr/eStore/Offres/Achat/271918
  9. (771) http://srogers.cartodb.com/viz/123be814-96bb-11e4-aec1-0e9d821ea90d/embed_map
  10. (605) https://www.youtube.com/watch?v=et4fYWKjP_o

Full list of urls can be found here.

How do you get the list?

  • % cat %23JeSuisCharlie-cat-20150115-tweets-deduped.json | ~/git/twarc/utils/unshorten.py > %23JeSuisCharlie-cat-20150115-tweets-deduped-unshortened.json
  • % cat %23JeSuisCharlie-cat-20150115-tweets-deduped-unshortened.json | ~/git/twarc/utils/urls.py| sort | uniq -c | sort -n > %23JeSuisCharlie-cat-20150115-urls.txt
Twitter Clients

Top 10 Twitter clients used from #JeSuisCharlie.

  1. (1283521) Twitter for iPhone
  2. (951925) Twitter Web Client
  3. (847308) Twitter for Android
  4. (231713) Twitter for iPad
  5. (86209) TweetDeck
  6. (82616) Twitter for Windows Phone
  7. (70286) Twitter for Android Tablets
  8. (44189) Twitter for Websites
  9. (39174) Instagram
  10. (21424) Mobile Web (M5)

Full list of clients can be found here.

How do you get this?

  • % ~/git/twarc/utils/source.py %23JeSuisCharlie-cat-20150115-tweets-deduped.json > %23JeSuisCharlie-cat-20150115-tweets-deduped-source.html
Word cloud

Word cloud from #JeSuisCharlie tweets.

I couldn't get the word cloud to embed nicely, so you'll have to check it out here.

How do you create the word cloud?

  • % ~/git/twarc/utils/wordcloud.py %23JeSuisCharlie-cat-20150115-tweets.json > %23JeSuisCharlie-wordcloud.html
tags: twarc, #JeSuisCharlie, #JeSuisAhmed, #JeSuisJuif, #CharlieHebdo

FOSS4Lib Recent Releases: Avalon Media System - 3.2

planet code4lib - Mon, 2015-01-12 16:40
Package: Avalon Media System
Release Date: Friday, December 19, 2014

Last updated January 12, 2015. Created by Peter Murray on January 12, 2015.

Indiana University and Northwestern University are delighted to announce Avalon Media System 3.2, completed and released on December 19, 2014. As part of a series of minor 3.x releases, Avalon 3.2 provides support for important content management efficiencies and other improvements.

Release 3.2 adds the following capabilities:

District Dispatch: Afterschool funding available through states

planet code4lib - Mon, 2015-01-12 16:09

Photo by the San Jose Library

As discussed in previous District Dispatch entries, Congress passed in late December its massive $1.01 trillion CROmnibus bill providing FY15 funding for much of the Federal government. With the return of the new Congress on January 6, the discussion on the FY16 budget begins anew and ALA will be fighting for library funding.

For FY15 programs of interest to the library community, the CROmnibus package provided level funding for most programs while a small number of programs received slight increases or decreases. It is safe to say that the appropriations package presents no major new library or educational initiatives.

One example of a library program receiving a slight increase is the 21st Century Community Learning Centers, which received an increase of $2.3 million (0.2% of its budget). As with many Federal education programs, funding for 21STCCLC is awarded directly to state educational agencies that control how the grants are apportioned. Libraries have opportunities to apply for many of the grants.

The way this program works is that funds are sent to states who then make competitive grants to “local educational agencies (LEAs), community-based organizations, faith-based organizations, or other public or private entities that can demonstrate experience, or the promise of success, in providing educational and related activities. In making awards, States give priority to applications that: (1) propose to target services to students who attend schools identified as in need of improvement under Title I; and (2) are submitted jointly by at least one LEA that receives funds under Part A of Title I and at least one community-based organization or other public or private entity. States must make awards of at least $50,000 per year for a period of 3 to 5 years.”

Background on 21STCCLC can be viewed here. The Department of Education Guidance answers everything libraries need to know about the program with a helpful table of contents.

A good resource for libraries to be aware of for participating in these grant programs is the Afterschool Alliance, which provides good information and knowledge of all things related to 21STCCLC. The Afterschool Alliance is the main national organization advocating for after school programs.

State Education Agency offices are also a good resource for these grants, since the grants are awarded at the state level. A list of contacts for 21STCCLC in each state is available here and State Educational Agencies here.

The post Afterschool funding available through states appeared first on District Dispatch.

Open Knowledge Foundation: Open Data Handbook 2015 comeback – and you want to be a part of it!

planet code4lib - Mon, 2015-01-12 14:12

There is a famous saying that outside of a dog, a book is a man’s best friend. We at Open Knowledge tend to agree. This is why we decided to take one of Open Knowledge’s key resources, the Open Data Handbook, and give it a bit of a face lift in this upcoming year.

The Open Data Handbook has been an important resource for the open knowledge community for years. The handbook introduces and discusses legal, social and technical aspects of open data. It has been used by a wide range of stakeholders, from open data beginners to data wizards, from government officials to journalists and civil society activists. It examines questions that are relevant to all of them: what does “open” mean, why open up data, and how do you open data?

Since it was first written, the handbook has been read by thousands of users each month and has been translated into 18 languages (making it the most widely translated open data resource out there). However, open data is both a fast-moving and a relatively young field. As open data initiatives have been launched and open data policies approved, we, as a community, have learned a lot about the opportunities and the pitfalls of open data. The last version of the book is from 2011; at the time, government open data portals were few and far between and the Open Government Partnership had only just launched. The book represents what we knew and thought then, but as the open data movement has expanded both in numbers and in geographical spread, we have decided that it is high time we incorporate what we have learned into a new version. This version of the Open Data Handbook will focus mainly on one type of open data: open government data, though a number of the sections can be applied to other types of open data. This project is supported by the Partnership for Open Data – a collaboration between Open Knowledge, the Open Data Institute and the World Bank.

So much of this knowledge, these stories and the brilliant ideas about what works and what doesn’t work, lives in this community. Therefore, we believe that the process of creating the updated version of the handbook should be, as it’s always been, a community project. This process can not only strengthen the community through a joint project, but also help us learn from peers, listen to members who usually do not participate in daily channels, and create a handbook rich in content, experience and a wide spectrum of knowledge.

There are a number of ways you can get involved! You can submit your stories or comment on the “alpha” version we are planning to launch in February. The handbook will be part of a larger community-owned resource platform.

How can you help?

  • Contribute a short open data story – We are looking for stories about open government data in various fields. It can be a success story or even a failure that you think we should all learn from. If you want to contribute a story, please fill in this form and we will get back in touch with you.

  • Revise the first draft of the book – The current chapters in the Open Data Handbook are being reviewed by Open Knowledge staff – we are updating existing chapters and producing new ones. Our goal is to release an ‘alpha’ version of the book the week before Open Data Day, so it can be revised, commented on and added to by the community.

  • Propose a resource – We are putting together a list of open data resources – if you know of other resources about open data, in any language, please give us a shout. At the end of each section, we will have a “further reading” section and we’d love to share as many resources as possible.

  • Send us a short video about open data – In the internet world, a handbook doesn’t have to be text only. Send us a video of you / your organization and answer the following questions:

    Tell us about an example of open data having a social and/or economic impact in your city/country/region.
    What is your main obstacle in dealing with open data?
    How do you / your community engage with open data?
    What do you think is the next big thing for Open Data in 2015?

The videos will be embedded in the handbook and on our YouTube channel!

Who can write for the book? Everyone! While we are editing the book, we want your input. Unfortunately, we can’t promise that every story / idea will ultimately be part of the book. If you think that we are missing something, please let us know! We will try to include as much as possible!

If you have any comments or suggestions, please email us at handbook [at] okfn [dot] org

2015 is going to be great for open data, let’s write about it together.

Hydra Project: Announcing Avalon 3.2

planet code4lib - Mon, 2015-01-12 13:33

Indiana University and Northwestern University are delighted to announce Avalon Media System 3.2, completed and released on December 19, 2014. As part of a series of minor 3.x releases, Avalon 3.2 provides support for important content management efficiencies and other improvements.

Release 3.2 adds the following capabilities:

  • Bulk item management actions, including publish, un-publish, change collection, delete, and assign access
  • Avalon dropbox subdirectories are accessible to collection managers using the web interface
  • Upgrade to Hydra 7 framework
  • Numerous interface improvements and bug fixes

For more details on each of these new features, visit the What’s New in Avalon 3.2 wiki page: https://wiki.dlib.indiana.edu/display/VarVideo/What%27s+New+in+Avalon+3.2

LibUX: “Social” the Right Way is a Timesuck

planet code4lib - Mon, 2015-01-12 12:16

Pew Research Center’s 2014 Social Media Update, published Friday, validates pretty much any argument libraries have to make for actively reaching out through social media. Your audience is there.

58% of ALL [U.S.] adults are on Facebook. 31% of ALL seniors are on Facebook. #libweb #libux http://t.co/laYJyW1ffg

— Library UX Data (@libuxdata) January 9, 2015

This is a numbers game. Whether libraries should be there is sort of head-scratchingly moot, but brand decisions about which social bandwagon to jump on [1] should be made only when libraries are prepared to commit real resources to their upkeep. When I say “resources,” I mostly mean time – but marketing dollars are not misspent on Facebook ads.

Crafting good content is not an insubstantial timesuck. Knowing your audience, time spent analyzing metrics, helps mitigate people’s capacity to detect bullshit. And this is important. Poor content not only reflects poorly on your library, but for channels like Facebook that highlight popular or relevant content, posts that bomb negatively impact the overall visibility of your brand.

A basic level of engagement requires just the right amount of content, too. Part of this just has to do with currency, right? Old content tends to roll off. It’s too many thumb swipes down. Wisemetrics finds that, on average, the half-life of a Facebook post is about 90 minutes. Hell, a tweet is lost to the void in just 18 minutes. The point is that you have to post regularly to stay on people’s radar – and, for Facebook especially, if you’re off-radar long enough the algorithm [allegedly] guarantees subsequent posts will reach fewer people.

I think, here, it is also important to mention that users expect brands to actively monitor their channels. By having an account, you wade into the pool. It ain’t a billboard over the highway. You and your audience are on the same level. You’re sharing fluids. You’re friends. If they contact you and you don’t answer, that’s not just the passive neglect of strangers on the sidewalk: it’s a dis; it hits home. On Twitter, specifically, 42% expect a response within the hour. Outright ignoring someone is like a punch in the gut.

How Much to Post

We have to be a little honest about the realities of library social media. We’re on board, sure, but we most likely haven’t the benefit of a marketing team. The social accounts are managed on the side and probably aren’t part of anyone’s actual job description. Roger. So, where do we get the most bang for our buck?

Post to Twitter at least 5 times a day. If you can swing up to 20 posts, you might be even better off. Post to Facebook five to 10 times per week.
– Fast Company

Several studies show that posting too little risks losing “connection with your audience,” that brands should shoot for ten (10) posts per week. Posting more often is annoying.

For Twitter, it depends how you measure success. If you’re looking at retweets, replies, and clicks per tweet, “if you want to wring the most value out of every tweet you send, tweet about five times each day.” If you measure overall response per day, well, 30 tweets a day ought to do it.

This Fast Company article shares optimal post frequency for other platforms, if you’re interested.

2 Hours Per Channel

The timesuck to do social right is determined by the time required to

  • inspect your account’s metrics to understand who your followers are, when they’re on, and what they like
  • analyze your history of content so you know what works, what doesn’t
  • craft enough good content specific to your audience and the medium [2]
  • schedule that content for optimum reach
  • monitor and respond

Really, this is unique to you. For me, according to my Harvest account, last week I spent two hours scheduling just six tweets (for the entire week! I suck) and ten Facebook posts. This is a little short of previous weeks where I posted more and spent about 4 hours. I include time spent looking at analytics, corresponding about what needs to be posted, and optimizing content I’m sharing in our WordPress network (we use this plugin, which lets you tweak OpenGraph metadata and Twitter Cards).

So, my gut-checked suggestion is that it’s reasonable to expect to spend at least two hours per channel – minimum. Real content marketers for brands-with-budgets certainly devote a lot more, but I think it’s important to recognize the timesuck for what it is and reconcile decisions to go whole hog on a new channel with the human resources required to maintain it.

  1. If your library is on any other social platform except Facebook – wtf are you doing?
  2. People can tell what is auto-tweeted and cross-posted.

The post “Social” the Right Way is a Timesuck appeared first on LibUX.

District Dispatch: Speakers to explore library funding options at 2015 ALA Midwinter Meeting

planet code4lib - Mon, 2015-01-12 06:14

Thinking about new funding sources for your library? Join leaders from the Institute of Museum and Library Services (IMLS) when they discuss federal library funding resources at the 2015 American Library Association (ALA) Midwinter Meeting in Chicago. The session, titled “All Eyes on IMLS: Funding Priorities and Reauthorization,” takes place from 10:30 to 11:30 a.m. on Saturday, January 31, 2015, in the McCormick Convention Center, room W183A.

MLK Digital Commons in Washington, D.C. Photo by Phil Freelon

During the session, speakers will shed new light on the Library Services and Technology Act (LSTA), the primary source of annual funding for libraries in the federal budget. Library staff are encouraged to attend the conference session to learn more about the Institute of Museum and Library Services’ priorities for the next two years, which will shape the agency’s discretionary and Library Services and Technology Act Grants to States programs. Additionally, participants will learn more about how they can support the Museum and Library Services Act while the law undergoes a reauthorization period in 2016.

Speakers include Maura Marx, acting director of the Institute of Museum and Library Services, and Robin Dale, associate deputy director for state programs for the Institute of Museum and Library Services.

View other ALA Washington Office Midwinter Meeting conference sessions

The post Speakers to explore library funding options at 2015 ALA Midwinter Meeting appeared first on District Dispatch.

Alf Eaton, Alf: Searching for mergeable tables

planet code4lib - Mon, 2015-01-12 01:10

Among CartoDB’s many useful features is the ability to merge tables together, via an interface which lets you choose which column from each to use as the shared key, and which columns to import to the final merged table.

Google's Fusion Tables similarly encourages merging of tabular data. Fusion Tables creates a virtual merged table, allowing updates to the source tables to be replicated to the final merged table as they occur.

CartoDB can also merge tables using location columns, counting items from one table (with latitude and longitude, or addresses) that are positioned within the areas defined in another table (with polygons).

I've found that UK parliamentary constituencies are useful for visualising data, as they have a similar population number in each constituency and they have at least two identifiers in published ontologies which can be used to merge data from other sources*. The UK parliamentary constituency shapefiles published by the Ordnance Survey as part of the Boundary-Line dataset contain polygons, names and two identifiers for each area: one is the Ordnance Survey’s own “unit id” and one is the Office for National Statistics’ “GSS code”.

Once the parliamentary constituency shapefile has been imported to a base table, any CSV table that contains either of those identifiers can easily be merged with the base table to create a new, merged table and associated visualisation.
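Outside CartoDB, the same kind of join can be sketched in a couple of lines of pandas; the file and column names here are hypothetical.

import pandas as pd

# Base table derived from the Boundary-Line shapefile (one row per constituency,
# with its GSS code), plus a CSV of statistics keyed by the same code.
constituencies = pd.read_csv('constituencies.csv')   # hypothetical: has a 'gss_code' column
stats = pd.read_csv('ons-statistics.csv')            # hypothetical: has a 'GeographyCode' column

merged = constituencies.merge(
    stats, left_on='gss_code', right_on='GeographyCode', how='left'
)
merged.to_csv('constituencies-merged.csv', index=False)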

So, the task is to find other data sets that contain either the OS “unit id” or the ONS “GSS code”.

The URLs for the data types of these codes are defined in the Ordnance Survey’s “administrative geography and civil voting area” ontology:

The values themselves can also be expressed as URLs:

GSS: E14000929
GSS URL: http://statistics.data.gov.uk/doc/statistical-geography/E14000929
Unit ID: 24896
Unit ID URL: http://data.ordnancesurvey.co.uk/id/7000000000024896

However, unlike the Linked Data/SPARQL interfaces, most CSV or Excel files that are currently published (such as those produced by the Office for National Statistics as a result of census analysis) don’t define the data type of each column using URLs. Although there’s usually a property name in the first row, there’s rarely a datapackage.json file defining a basic data type (number, string, date, etc), and practically never a JSON-LD context file to map those names to URLs.

Given an index of CSV files, like those in CKAN-based stores such as data.gov.uk, how can we identify those which contain either unit IDs or GSS codes?

As Thomas Levine's commasearch project demonstrated at csvconf last year, if you have a list of all (or even just some) of the known members of a collection of typed entities (e.g. a list of all the countries in the world), it’s easy enough to find other datasets that contain them: as long as at least a certain proportion of the distinct values of a column match those in the known collection, the data type can be guessed, and can be assigned a URL.
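As a rough sketch of that matching step (this is not Levine's commasearch; the threshold, file names and column handling are made up), a column can be flagged as containing GSS codes when most of its distinct values appear in a known list:

import csv

def guess_gss_columns(csv_path, known_codes, threshold=0.8):
    # Return the header names of columns whose distinct values mostly match known_codes.
    columns = {}
    with open(csv_path, newline='') as f:
        reader = csv.DictReader(f)
        for row in reader:
            for name, value in row.items():
                columns.setdefault(name, set()).add((value or '').strip())
    matches = []
    for name, values in columns.items():
        values.discard('')
        if not values:
            continue
        overlap = len(values & known_codes) / len(values)
        if overlap >= threshold:
            matches.append(name)
    return matches

# usage (hypothetical files):
# known = set(line.strip() for line in open('gss-codes.txt'))
# print(guess_gss_columns('some-ons-table.csv', known))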

TODO: compile lists of values for known data types, particularly from Wikidata. For example: country names (a list of names that changes slowly), members of parliament (a list of names that changes regularly), years (a range of numbers that grows gradually), gene identifiers (a list of strings that grows over time), postcodes (a list of known values, or values matching a regular expression).

Related tools

Footnotes

* On the downside, parliamentary constituencies can be changed (currently every 5 years), as population density shifts around the country and the current government decides to increase or decrease the number of constituencies. This makes it difficult to use the constituencies for long-term comparisons.

Mark E. Phillips: Digital Preservation System Interfaces: UNT Libraries Coda Repository

planet code4lib - Mon, 2015-01-12 01:04

I mentioned to a colleague that I would be happy to do a short writeup of some of the interfaces that we have for our digital preservation system. This post tries to move that conversation forward a bit.

System 1, System 2

At UNT we manage our digital objects in a consistent and unified way. What this means in practice is that there is one way to do everything: items are digitized, collected, or created, staged for ingest into the repository, and everything moves into the system in the same way. We have two software stacks that we use for managing our digital items, Aubrey and Coda.

Aubrey is our front-end interface which provides end user access to resources: search, browsing, and display. For managers it provides a framework for defining collections and partners, and most importantly it has a framework for creating and managing metadata for the digital objects. Most (99.9%) of the daily interaction with the UNT Libraries Digital Collections happens through Aubrey via one of its front-end user interfaces: The Portal to Texas History, the UNT Digital Library, or The Gateway to Oklahoma History.

Aubrey manages the presentation versions of a digital object; locally we refer to this package of files as an Access Content Package, or ACP. The other system in this pair is a system we call Coda. Coda is responsible for managing the Archival Information Packages (AIPs) in our infrastructure. Coda was designed to manage a collection of BagIt Bags, help with the replication of these bags, and allow curators and managers to access the master digital objects if needed.

What does it look like though?

The conversation I had with a colleague was around user interfaces to the preservation archive, how much or how little we are providing and our general thinking about that system’s user interfaces.  Typically these interfaces are “back-end” and usually are never seen by a larger audience because of layers of authentication and restriction.  I wanted to take a few screenshots and talk about some of the interactions that users have with these systems.

 Main Views

The primary views for the system include a dashboard view which gives you an overview of the happenings within the Coda Repository.

UNT Libraries’ Coda Dashboard

From this page you can navigate to lists for the various sub-areas within the repository.  If you want to view a list of all of the Bags in the system you are able to get there by clicking on the Bags tile.

Bag List View – UNT Libraries’ Coda Repository

The storage nodes that are currently registered with the system are available via the Nodes button.  This view is especially helpful in gauging the available storage resources and deciding which storage node to write new objects to.  Typically we use one storage node until it is completely filled and then move onto another storage node.

Nodes List View – UNT Libraries’ Coda Repository

For events in the Coda system, including ingest, replication, migration, and fixity checks, we create and store a PREMIS Event. These are aggregated using the PREMIS Event Service.

PREMIS Event List View – UNT Libraries’ Coda Repository

The primary Coda instance is considered the Coda instance of record and additional Coda instances will poll the primary for new items to replicate.  They do this using ResourceSync to broadcast available resources and their constituent files.  Because the primary Coda system does not have queued items this list is empty.

Replication Queue List View – UNT Libraries’ Coda Repository

To manage information about what piece of software is responsible for an event on an object we have a simple interface to list PREMIS Agents that are known to the system.

PREMIS Agents List View – UNT Libraries’ Coda Repository

Secondary Views

With the primary views out of the way the next level that we have screens for are the detail views.  There are detail views for most of the previous screens once you’ve clicked on a link.

Below is the detail view of a Bag in the Coda system. You will see the parsed bag-info.txt fields as well as the PREMIS Events that are associated with this resource. The buttons at the top take you to a list of URLs that, when downloaded, will re-constitute a given Bag of content, and to the Atom feed for the object.

Bag Detail View – UNT Libraries’ Coda Repository

Here is a URLs list; if you download all of these files and keep the hierarchy of the folders, you can validate the Bag and have a validated version of the item plus additional metadata. This is effectively the Dissemination Information Package for the system.

Coda URLs List – UNT Libraries’ Coda Repository
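A minimal sketch of that download-and-validate workflow, assuming the list is one URL per line and each URL's path mirrors the Bag's folder layout; the file paths are hypothetical and this is not UNT's actual tooling. Validation uses the bagit-python library.

import os
import urllib.request
from urllib.parse import urlparse

import bagit  # the bagit-python library

def fetch_url_list(url_list_path, dest_root):
    # Download every URL in the list, preserving the folder hierarchy
    # implied by each URL's path so the Bag structure stays intact.
    with open(url_list_path) as f:
        for url in (line.strip() for line in f if line.strip()):
            rel_path = urlparse(url).path.lstrip('/')
            local_path = os.path.join(dest_root, rel_path)
            os.makedirs(os.path.dirname(local_path), exist_ok=True)
            urllib.request.urlretrieve(url, local_path)

# usage (hypothetical paths):
# fetch_url_list('coda-bag-urls.txt', 'restored')
# bag = bagit.Bag('restored/path/to/bag')  # the directory containing bagit.txt
# print(bag.validate())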

An Atom Feed is created for each document as well which can be used by the AtomPub interface for the system.  Or just to look at and bask in the glory of angle brackets.

Atom Feed for Bag – UNT Libraries’ Coda Repository

Below is the detail view of a PREMIS Event in the repository.  You can view the Atom Feed for this document or navigate to the Bag in the system that is associated with this event.

PREMIS Event Detail View – UNT Libraries’ Coda Repository

The detail of a storage node in the system.  These nodes are updated to reflect the current storage statistics for the storage nodes in the system.

Node Detail View – UNT Libraries’ Coda Repository

The detail view of a PREMIS Agent is not too exciting but is included for completeness.

Agent Detail View – UNT Libraries’ Coda Repository

Interacting with Coda

When there is a request for the master/archival/preservation files for a given resource, we find the local identifier for the resource, put that into the Coda repository, and do a quick search.

Dashboard with Search – UNT Libraries’ Coda Repository

You will end up with search results for one or more Bags in the repository.  If there is more than one for that identifier select the one you want (based on the date, size, or number of files) and go grab the files.

Search Result – UNT Libraries’ Coda Repository

Statistics.

The following screens show some of the statistics views for the system.  They include the Bags added per month and over time,  number of files added per month and over time, and finally the number of bytes added per month and over time.

Stats: Monthly Bags Added – UNT Libraries’ Coda Repository

Stats: Running Bags Added Total – UNT Libraries’ Coda Repository

Stats: Monthly Files Added – UNT Libraries’ Coda Repository

Stats: Running Total of Files Added – UNT Libraries’ Coda Repository

Stats: Monthly Size Added – UNT Libraries’ Coda Repository

Stats: Running Total Sizes – UNT Libraries’ Coda Repository

What’s missing.

There are a few things missing from this system that one might notice.  First of all is the process of authentication to the system.  At this time the system is restricted to a small list of IPs in the library that have access to the system.  We are toying around with how we want to handle this access as we begin to have more and more users of the system and direct IP based authentication becomes a bit unwieldy.

Secondly, there is a full set of AtomPub interfaces for each of the Bag, Node, PREMIS Event, PREMIS Agent, and Queue sections. This is how new items are added to the system. But that is a little bit out of scope for this post.

If you have any specific questions for me let me know on twitter.

DuraSpace News: CALL for Proposals for Open Apereo 2015

planet code4lib - Mon, 2015-01-12 00:00
From Ian Dolphin, Executive Director, Apereo Foundation; Laura McCord, Open Apereo 2015 Planning Committee Chair; and Reba-Anna Lee, Open Apereo 2015 Program Committee Co-chair

Access Conference: Details on AccessYYZ

planet code4lib - Sun, 2015-01-11 20:00

Access is headed to Toronto on September 8th-11th, 2015, so mark those calendars! We know that it’s a bit earlier than usual this year, but we hope that giving advance notice will allow attendees to plan accordingly.

Hackfest will be happening on September 8th at Ryerson University’s Heaslip House, while the remainder of the conference (September 9th-11th) will unfold at the beautiful Bram & Bluma Appel Salon on the second floor of the Reference Library (789 Yonge St.) in downtown Toronto.

Keep your eyes on the website in the coming weeks – we’ll announce more details as we have them!
