Last year I blogged about Andrew Odlyzko's perceptive analysis of the business of scholarly publishing. Now he's back with an invaluable, must-read analysis of the economics of the communication industry entitled "Will smart pricing finally take off?" Below the fold, a taste of the paper and the Google Scholar team's validation of one of his earlier predictions.

Among his observations are:
  • "by some measures the US spends almost 50% more in telecom services than it does for electricity."
  • Content is not king; "net of what they pay to content providers, US cable networks appear to be getting more revenue out of Internet access and voice services than out of carrying subscription video, and all on a far smaller slice of their transport capacity".
  • True streaming video, with its tight timing constraints, is not a significant part of the traffic. Video is a large part, "but it is almost exclusively transmitted as faster-than-real-time progressive downloads". Doing so allows for buffering to lift the timing constraints.
  • "The main function of data networks is to cater to human impatience." Thus "Overprovisioning is not a bug but a feature, as it is indispensable to provide low transmission latency". "Once you have overengineered your network, it becomes clearer that pricing by volume is not particularly appropriate, as it is the size and availability of the connection that creates most of the value."
  • "it seems safe to estimate worldwide telecom revenues for 2011 as being close to $2 trillion. About half the revenue ... comes from wireless."
  • "with practically all [wireline] costs coming from ... installing the wire to the end user, the marginal costs of carrying extra traffic are negligible. Hence charging according to the volume of traffic cannot easily be justified on the basis of costs."
  • "a modern telecom infrastructure for the US, with fiber to almost every premise, would not cost more than $450 billion, well under one year's annual revenue. But there is no sign of willingness to spend that kind of money ... Hence we can indeed conclude that modern telecom is less about high capital investment and far more a game of territorial control, strategic alliances, services and marketing, than of building a fixed infrastructure."
  • "Yet another puzzle is the claim that building out fiber networks to the home is impossibly expensive. Yet at the cost of $1,500 per household (in excess of the $1,200 estimate ... for the Google project in Kansas City, were it to reach every household), and at a cost of capital of 8% ..., this would cost only $10 per house per month. The problem is that managers and their shareholders expect much higher rates of return than 8% per year. One of the paradoxes is that the same observers who claim that pension funds cannot hope to earn 8% annually are also predicting continuation of much higher corporate profit rates."
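The $10 per month figure in the last observation can be checked with a rough calculation. The sketch below assumes the simplest reading of the quote: the build-out cost is carried indefinitely at the stated cost of capital, with no repayment of principal and no operating costs. The function name and the 15% comparison rate are hypothetical, chosen only to illustrate how a higher expected return changes the result.

```python
def monthly_capital_cost(build_cost_per_household: float,
                         annual_cost_of_capital: float) -> float:
    """Interest-only monthly carrying cost of a one-time build-out.

    Assumption: no principal repayment and no operating costs are included.
    """
    return build_cost_per_household * annual_cost_of_capital / 12


# Figures quoted from the paper: $1,500 per household at 8% per year.
quoted = monthly_capital_cost(1500, 0.08)
print(f"At 8%: ${quoted:.2f} per household per month")  # At 8%: $10.00 ...

# Hypothetical higher expected rate of return, for comparison only.
higher = monthly_capital_cost(1500, 0.15)
print(f"At 15%: ${higher:.2f} per household per month")
```

Under these assumptions the monthly charge scales linearly with the expected rate of return, which is the point of the quote: the obstacle is the return that managers and shareholders expect, not the cost of the fiber itself.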
Back in 2002, Odlyzko analyzed the usage of online content through time after its publication. Initially, the decay was rapid but after a while usage settled to a low constant level or increased. On this basis he predicted that there would be much wider citation of older articles.
"Of the articles that were most frequently downloaded [from First Monday] in 1999, 6 of the top 10 were published in previous years! This supports the thesis that easy online access leads to much wider usage of older materials." [Section 9]

"After an initial period, frequency of access does not vary with age of article, and stays pretty constant with time (after discounting for general growth in usage)." [Section 10]

Now the Google Scholar team have followed their Rise of the Rest paper, which I blogged about here, with a validation of Odlyzko's prediction. Their new paper, On the Shoulders of Giants: The Growing Impact of Older Articles, takes another look at the effect that the dramatic changes as scholarly communications migrated to the Web have had on the behavior of authors. The two major changes have been:
  • The greater accessibility of the literature, caused by digitization of back content, born-digital journals and pre-print archives, and relevance ranking by search engines.
  • The great increase in the volume of publication, caused by the greatly reduced cost of on-line publication and the reduction of competition for space.
The paper shows that in most fields, the proportion of citations to articles more than 10 years old has increased significantly (28% to 36% overall) from 1990 to 2013. The same holds true for 15- and 20-year-old articles. The rate of increase is accelerating. There are some outliers: Chemical and Materials Science, and Engineering (excluding Computer Science), both show little change. Computer Science, on the other hand, shows a significant increase, but this is bi-modal: 5/18 of the CS subject categories show less than 30% increase, whereas 11/18 show 50% or more.

Islandora: Meet Your Developer: Daniel Lamb

planet code4lib - Mon, 2014-11-17 14:30

It's been a while since we last Met a Developer, but we're getting back into it with recent Islandora Camp CO instructor and discoverygarden, Inc Team Lead Daniel Lamb. Most of Danny's contributions to Islandora's code have come to us by way of dgi's commitment to open source, but he did recently take on the Herculean task of coming up with the perfect one-line documentation to sum up the behavior of a tetchy delete button. Here's Danny in his own words:

Please tell us a little about yourself. What do you do when you’re not at work?

When I'm not at work, I'm spending time with my wonderful family. I have a beautiful wife and an amazing two-year-old son, and they're what keeps me going when times are tough. I love cooking, and am very passionate about what I eat and how I prepare it. I also regularly exercise, and really enjoy lifting weights. I've got a great life going and I want to keep it for as long as possible!

Academically, my background is in Mathematics and Physics, not Computer Science. But close enough, right? I've held jobs processing data for astronomers, crunching numbers as an actuary, and even making crappy Facebook games before landing at discoverygarden.

How long have you been working with Islandora? How did you get started?

I've been working with Islandora for about two years. I started because of my job with discoverygarden, which was kind enough to take me in after being abused by the video game industry. The first thing I developed for Islandora was the testing code, which is how I got to learn the stack.

Sum up your area of expertise in three words:

Asynchronous distributed processing

What are you working on right now?

I've got my finger in a lot of pies right now. I'm managing my first project for discoverygarden, as well as finishing up the code for one of the longest running projects in the company's history. It's for an enterprise client, and I've had to make a lot of innovations that I hope can eventually find their way back into the core software. I'm also working on a statistical model to help management with scoping and allocation. On top of all that, I'm researching frameworks and technologies for integrating with Fedora 4, which I hope to play a role in when the time finally comes.

What contribution to Islandora are you most proud of?

Most of the awesome stuff I've done has been for our enterprise client, so I can't talk about it. Well, I could, but then I'd have to kill you :P I guess as far as impact on the software in general, I'm most proud of the lowly IslandoraWebTestCase, which is working in every module out there to help keep our development head as stable as possible.

What new feature or improvement would you most like to see?

Asynchronous distributed processing :D When we make the move to Fedora 4 and Drupal 8, this concept should be at the core of the software. It’s what will allow us to split the stack apart on multiple machines to keep things running smoothly when we have to scale up and out.

What’s the one tool/software/resource you cannot live without?

ZOMG I could never live without Vim! It's the greatest text editor ever! Put me in Eclipse or Netbeans and I'll litter :w's all over the place and hit escape a bunch of times unnecessarily. Vim commands have been burned into my lizard brain.

If you could leave the community with one message from reading this interview, what would it be?

You CAN contribute. I know the learning curve is steep, but you don't need a background in Computer Science to contribute. Pick up something small, and work with it until you feel comfortable. And if you're afraid to try your hand as a developer, there's always something to do *cough documentation cough*.

FOSS4Lib Recent Releases: VuFind - 2.3.1

VuFind - 2.3.1
Release Date: Monday, November 17, 2014

Bug fix release.

D-Lib: New Opportunities, Methods and Tools for Mining Scientific Publications

planet code4lib - Mon, 2014-11-17 12:43
Guest Editorial by Petr Knoth, Drahomira Herrmannova, Lucas Anastasiou and Zdenek Zdrahal, Knowledge Media Institute, The Open University, UK; Kris Jack, Mendeley, Ltd., UK; Nuno Freire, The European Library, The Netherlands and Stelios Piperidis, Athena Research Center, Greece

D-Lib: Progress

planet code4lib - Mon, 2014-11-17 12:43
Editorial by Laurence Lannom, CNRI

D-Lib: A Keyquery-Based Classification System for CORE

planet code4lib - Mon, 2014-11-17 12:43
Article by Michael Voelske, Tim Gollub, Matthias Hagen and Benno Stein, Bauhaus-Universitat Weimar, Weimar, Germany

D-Lib: AMI-diagram: Mining Facts from Images

planet code4lib - Mon, 2014-11-17 12:43
Article by Peter Murray-Rust, University of Cambridge, UK, Richard Smith-Unna, University of Cambridge, UK, and Ross Mounce, University of Bath, UK

D-Lib: Extracting Textual Descriptions of Mathematical Expressions in Scientific Papers

planet code4lib - Mon, 2014-11-17 12:43
Article by Giovanni Yoko Kristianto, The University of Tokyo, Tokyo, Japan; Goran Topic, National Institute of Informatics, Tokyo, Japan; and Akiko Aizawa, The University of Tokyo and National Institute of Informatics, Tokyo, Japan

D-Lib: Towards Semantometrics: A New Semantic Similarity Based Measure for Assessing a Research Publication's Contribution

planet code4lib - Mon, 2014-11-17 12:43
Article by Petr Knoth and Drahomira Herrmannova, KMi, The Open University

D-Lib: Experiments on Rating Conferences with CORE and DBLP

planet code4lib - Mon, 2014-11-17 12:43
Article by Irvan Jahja, Suhendry Effendy and Roland H. C. Yap, National University of Singapore

D-Lib: The Social, Political and Legal Aspects of Text and Data Mining (TDM)

planet code4lib - Mon, 2014-11-17 12:43
Article by Michelle Brook, Content Mine; Peter Murray-Rust, University of Cambridge; Charles Oppenheim, City, Northampton and Robert Gordon Universities

D-Lib: GROTOAP2 - The Methodology of Creating a Large Ground Truth Dataset of Scientific Articles

planet code4lib - Mon, 2014-11-17 12:43
Article by Dominika Tkaczyk, Pawel Szostek and Lukasz Bolikowski, Centre for Open Science, Interdisciplinary Centre for Mathematical and Computational Modelling, University of Warsaw, Poland

D-Lib: The ContentMine Scraping Stack: Literature-scale Content Mining with Community-maintained Collections of Declarative Scrapers

planet code4lib - Mon, 2014-11-17 12:43
Article by Richard Smith-Unna and Peter Murray-Rust, University of Cambridge, UK

D-Lib: Report on the Research Data Alliance (RDA) 4th Plenary Meeting

planet code4lib - Mon, 2014-11-17 12:43
Conference Report by Yolanda Meleco, Research Data Alliance/U.S.

D-Lib: Towards a Marketplace for the Scientific Community: Accessing Knowledge from the Computer Science Domain

planet code4lib - Mon, 2014-11-17 12:43
Article by Mark Kroell, Stefan Klampfl and Roman Kern, Know-Center GmbH, Graz, Austria

D-Lib: A Comparison of Two Unsupervised Table Recognition Methods from Digital Scientific Articles

planet code4lib - Mon, 2014-11-17 12:43
Article by Stefan Klampfl, Know-Center GmbH, Graz, Austria; Kris Jack, Mendeley Ltd., London, UK and Roman Kern, Knowledge Technologies Institute, Graz University of Technology, Graz, Austria

D-Lib: Annota: Towards Enriching Scientific Publications with Semantics and User Annotations

planet code4lib - Mon, 2014-11-17 12:43
Article by Michal Holub, Robert Moro, Jakub Sevcech, Martin Liptak and Maria Bielikova, Slovak University of Technology in Bratislava, Slovakia

D-Lib: Discovering and Visualizing Interdisciplinary Content Classes in Scientific Publications

planet code4lib - Mon, 2014-11-17 12:43
Article by Theodoros Giannakopoulos, Ioannis Foufoulas, Eleftherios Stamatogiannakis, Harry Dimitropoulos, Natalia Manola and Yannis Ioannidis, University of Athens, Greece

D-Lib: Efficient Blocking Method for a Large Scale Citation Matching

planet code4lib - Mon, 2014-11-17 12:43
Article by Mateusz Fedoryszak and Lukasz Bolikowski, University of Warsaw, Poland


