Feed aggregator

Jonathan Brinley: Data Persistence with Docker Volumes

planet code4lib - Sun, 2016-09-11 17:57

Docker containers are designed to be ephemeral. You can destroy one and spin up an exact replica in seconds. Everything that defines the container can be found in the Dockerfile that declares how to build it.

This model does not, however, explain what to do with persistent data, such as databases or uploaded media. In a production environment, I would recommend delegating these tasks to an external service, like Amazon’s RDS or S3.

For local development, though, you can use volumes for storing persistent data. Volumes come in two main flavors: data volumes and host directory mounts.

The latter is perhaps the most straightforward. You connect a directory in your container to a directory on your host machine, so they are essentially sharing the file system. Indeed, when you’re actively working on code, this is the simplest way to share your local code with your running containers. Mount the root directory of your project as a volume in your container, and anytime you update code, your container will also have the updates.

# docker run --rm -it -v="/your/local/dir:/srv/www/public" nginx:stable-alpine /bin/sh

This runs a container that has its /srv/www/public directory shared with your host system’s /your/local/dir directory. Updates you make to files either on your local system or in the container are automatically shared with the other.

Data volumes do not map directly to your host filesystem. Docker stores the data somewhere, and you generally don’t need to know where that is. When using Docker for Mac, one of the key differences is that a host mount shares files using osxfs (which currently has some performance issues) while a data volume stores its data inside the Docker virtual machine (which is consequently much more performant for I/O). While I use host mounts for things like uploaded media, I prefer to use a data volume for storing databases.

# docker volume create --name=mysqldata

Once the volume is created, you can mount it into one or more containers.

# docker run --rm -v="mysqldata:/var/lib/mysql" mysql:5.5

The contents of our volume “mysqldata” will be available to MySQL in the /var/lib/mysql directory. The data volume itself doesn’t have a directory name (in contrast to the prior best practice of using a directory within a data-only container). I think of a volume as a single directory that can be mounted wherever I want in a container.
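As a rough illustration of the two flavors described above, here is a minimal Python sketch that only builds the argument lists for the two docker run commands; it does not run Docker. The helper names mount_kind and docker_run are hypothetical, and the rule that a source starting with "/" is a host directory assumes a Unix-style host.

```python
import shlex


def mount_kind(source: str) -> str:
    """Classify the left-hand side of a -v argument.

    Assumption: Unix-style host, where a source beginning with "/" is a
    host directory and anything else is treated as a volume name.
    """
    return "host directory" if source.startswith("/") else "data volume"


def docker_run(image: str, source: str, target: str, *command: str) -> list:
    """Build (but do not execute) a docker run argument list with one mount."""
    return ["docker", "run", "--rm", "-v", f"{source}:{target}", image, *command]


# The host directory mount and the data volume from the examples above.
host = docker_run("nginx:stable-alpine", "/your/local/dir", "/srv/www/public", "/bin/sh")
named = docker_run("mysql:5.5", "mysqldata", "/var/lib/mysql")

for args in (host, named):
    source = args[4].split(":", 1)[0]
    print(mount_kind(source), "->", shlex.join(args))
```

The only difference between the two commands is the source half of the -v argument, which is why a volume can be thought of as a directory that gets mounted wherever it is needed.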

Terry Reese: MarcEdit PreUpdate Notes

planet code4lib - Sun, 2016-09-11 04:33

Three significant changes will be coming as part of MarcEdit’s Sunday update.  These impact the MARCEngine, Regular Expression Processing, and the Linked Data Platform.

MARCEngine Changes:

In May 2016, I had to make some changes to the MARCEngine to remove some of the convenience functions that allowed mnemonic processing to occur on UTF8 data or HTML entities on MARC8 data.  Neither of these should happen, and there were issues that came up related to titles that actually included HTML entities.  So, this kind of processing had to be removed.

Over the past few months, I’ve been re-evaluating how these functions used to work, and have been bringing them back.  Sunday will mark the reintroduction of many of these functions (though not the HTML entity translation when it’s not specifically appropriate).  The upside is that, coupled with the new encoding work, the tool will be able to support more mixed encoding use-cases.

Regular Expression Processing

The most significant change to the Regular Expression engine processing is how the multi-line processing works in the Replace Function.  To protect users, I’d set up the match-any-character “.” to match all characters but a new line character.  This was done to keep users from accidentally deleting data.  However, this meant that when using the multi-line processing option, it really only worked when fields were side by side.  By removing this limitation, users working with the multi-line option will now have full access to the entire record with the Regular Expression processing.  And with that, a word of warning…be careful.  The multi-line processing is the easiest way to accidentally delete bibliographic data through greedy matches.

Additionally, I’ve added an option to the Replace Function dialog that makes this feature easier to discover.  Right now, users have to know to add /m to the end of an expression to initiate multi-line mode.  You can still do that, but for users who don’t know this, a new option has been added to turn on the Multiline option.

The MultiLine Evaluation will be enabled and selectable when the Use regular expressions option is checked.
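To make the greedy-match warning above concrete, here is a small sketch using Python’s re module as a stand-in. It is not MarcEdit’s engine, and Python’s re.DOTALL flag is used only to imitate a “.” that is allowed to match new line characters. The record is made-up sample data in a mnemonic-style layout.

```python
import re

# Hypothetical sample record (not taken from MarcEdit or the post).
RECORD = "\n".join([
    r"=245  10$aPractice theory /$cDavide Nicolini.",
    r"=260  \\$aOxford :$bOxford University Press.",
    r"=500  \\$aIncludes index.",
])

GREEDY = r"\$c.*\."   # from $c to the LAST period the engine can reach
LAZY = r"\$c.*?\."    # from $c to the FIRST period


def strip_subfield_c(pattern: str, flags: int = 0) -> str:
    """Delete whatever the pattern matches from the sample record."""
    return re.sub(pattern, "", RECORD, flags=flags)


# "." stops at line ends: only the $c subfield of the 245 is removed.
line_bound = strip_subfield_c(GREEDY)

# "." may cross line ends: the greedy match runs to the final period
# of the record, so the 260 and 500 fields are deleted as well.
whole_record = strip_subfield_c(GREEDY, re.DOTALL)

# A lazy quantifier keeps the match short even when "." crosses lines.
lazy = strip_subfield_c(LAZY, re.DOTALL)

print(line_bound, whole_record, lazy, sep="\n---\n")
```

The same expression that was safe when “.” stopped at the end of a field removes two extra fields once it can span the whole record, so a lazy quantifier or a more specific character class is the safer choice in multi-line mode.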

Linked Data Platform Changes

The Linked Data Platform will be seeing two significant changes.  The first change is occurring in the code to support linking fields like the 880.  This has meant adding a new special processing instruction “linking”, which can now be used to perform reconciliation against data in these fields.  This is particularly important for Asian languages, Arabic languages, Hebrew, etc.

The second change is in the rules file itself.  I’ve profiled the 880 field, as well as a wide range of other collections.

Finally, I’ve added a note to the Main Window that helps users find their rules file for editing, and that points to the knowledge-base articles and videos explaining the process.


Other Changes:

Other changes that will be made in this update:

  1. ISSN Report – tweaked the process due to a bug.
  2. MARCCompare – added a new output type; in addition to the HTML output, there will be a diff file output
  3. Preferences Window/Language – Added a help icon that points to the knowledge-base articles related to fonts and font recommendations.

These changes will be part of the Sunday update, and at this point, look like they will be applicable to all versions of MarcEdit (though, I do have a few UI tweaks that I need to complete on the mac side).

If you have questions, let me know.  Otherwise, these changes will be made available on 9/11/2016 9/12/2016 [evening].


Ed Summers: Nicolini (2)

planet code4lib - Sun, 2016-09-11 04:00

Chapter 2 of Nicolini (2012) provides a quick tour through a few thousand years of philosophy to highlight the deep roots of practice theory.

When describing the types of activity of the human mind Aristotle added to Plato’s episteme (scientific knowledge) two more categories: phronesis (practical wisdom) and techne (art or skill). Phronesis in particular was a non-inferential, non-deductive and highly improvisational form of knowledge. Nicolini draws on the work of Nussbaum (1986) in his reading of Aristotle. He reminds us that Greek society at the time was highly segmented by slavery, and that it was a luxury of the ruling class to be able to dedicate one’s life to learning (episteme). Those that mastered practice, the artisans, were second class citizens at best. So there was a hierarchy to episteme, phronesis and techne.

In the centuries following Aristotle this aspect of his work was all but lost until Marx and Nietzsche rediscovered it, and turned the hierarchy on its head, with practice becoming the fundamental principle. Marx’s focus on human activity can be found in his discussion of praxis, which eventually becomes production in his later writing. Production is a word that he used to cover all human material practices. Marx’s philosophy hinged on the importance of putting ideas into practice in the world, as Nicolini says:

[Marx] makes clear the aim of science is not that of producing theoretical knowledge but more of obtaining practical mastery of the world in order to satisfy the practical needs of mankind.

I can’t help but be reminded of American Pragmatism here too (Peirce, James and Dewey) and was a bit surprised that Nicolini doesn’t mention them at all. Shrug. At any rate, it’s clear that Nicolini sees Marx as opening up a new space for thought, a space that Nietzsche and Heidegger would later fill. Quite a bit of the chapter is also devoted to Heidegger’s idea of Dasein or being in the world, which quite a few later practice theorists draw on. Heidegger positions everyday practices prior to representation – echoing Marx’s inversion of knowledge and practice. Heidegger introduces the idea of breakdown, which makes everyday practices visible. Breakdown is an idea that gets used a great deal in infrastructure studies. To understand it Nicolini borrows Heidegger’s thought experiment of hammering a nail:

The hammer belongs to the environment and can be unthinkingly used by the carpenter. The carpenter does not need to ‘think a hammer’ in order to drive in a nail. His or her capacity to act depends upon the familiarity with the act of hammering. His/her use of the practical item ‘hammer’ is its significance to him/her in the setting ‘hammering’ and ‘carpentry’ … The hammer as such acquires a separate ‘existence’ only when it breaks or is lost: that is, when its unreflective use becomes problematic. (Nicolini, p. 34)

I first encountered Heidegger’s idea of breakdown a few years ago when I read Winograd & Flores (1986), which applied the idea to the context of computing and design. Since then it’s popped up in the context of infrastructure studies as well as work centered on repair. It might be useful to return to learn more about the origins in Heidegger’s work, perhaps through Dreyfus (1991) who Nicolini references quite a bit.

The chapter ends with a discussion of Wittgenstein and practice theory. I first encountered Wittgenstein’s work when I was trying to understand the semantic web agenda while working at the Library of Congress a few years ago. His earlier and later career make for such a fascinating embodiment of the problems of philosophy, particularly where sense making and mathematics intersect (Halpin, 2011). So it was fun to find another parallel between practice theory and my own interests.

Nicolini references Shotter (1996) when drawing attention to three ways in which Wittgenstein’s work informs practice theory:

  • meaning is found outside in social practices, not in internal contemplation
  • the function of following rules (or not following rules) as practices that can only be understood through hints, tips and examples.
  • practices provide criteria of truth, and understanding is demonstrated by being able to extrapolate rules/practices further–how to go on

Wittgenstein’s idea of forms of life is also influential (Johannessen, 1988), because it brings attention to specific ways of acting, and day-to-day performances. It is through the study of these that rules or practices can be observed.


Dreyfus, H. L. (1991). Being-in-the-world: A commentary on Heidegger’s Being and Time, Division I. MIT Press.

Halpin, H. (2011). Sense and reference on the web. Minds and Machines, 21(2), 153–178.

Johannessen, K. S. (1988). The concept of practice in Wittgenstein’s later philosophy. Inquiry, 31(3), 357–369.

Nicolini, D. (2012). Practice theory, work, and organization: An introduction. Oxford University Press.

Nussbaum, M. (1986). The fragility of goodness: Luck and ethics in Greek tragedy and philosophy. Cambridge University Press.

Shotter, J. (1996). Problems of theoretical psychology. In C. W. Tolman, F. Cherry, R. van Hezewijk, & I. Lubek (Eds.), Problems of theoretical psychology (pp. 3–12). Captus Press.

Winograd, T., & Flores, F. (1986). Understanding computers and cognition: A new foundation for design. Intellect Books.

Ed Summers: Nicolini (1)

planet code4lib - Sat, 2016-09-10 04:00

As I previously mentioned I’m taking a look at Practice Theory as part of my coursework this semester. First up is reading Davide Nicolini’s Practice Theory, Work, and Organization. This post is about chapter one, but I may cover multiple chapters in subsequent posts.

Even though Nicolini is writing a book about Practice Theory he resists the urge to try to establish a single, monolithic, unified practice theory, and has structured his book around six separate strands:

He cites Schatzki (2001) when mentioning this emphasis on practice theories rather than a unified practice theory. This same work is cited pretty heavily elsewhere in the first chapter, and seeing that he dedicates a whole chapter to him Schatzki is clearly important to Nicolini. If anyone is looking for a Wikipedia page to write, there appears to be no article for Schatzki, despite the fact that he is mentioned prominently in the article for Practice Theory. Maybe that can be something I can create a stub for when it comes to the chapter on Schatzki…

Nicolini also distinguishes between weak and strong practice theory. In weak practice theory the techniques of paying attention to the mundane details of activity are used in an effort to catalog and describe various practices in a particular domain or context. Strong practice theory does this as well but goes an extra step in trying to explain how the practices are generated in various contexts over time. Strong practice theory takes identified practices as the unit of analysis and builds ontological analyses upon them.

It strikes me that in my own first tentative steps in using practice theory I’ve definitely been more in the weak camp. I’ve applied an attention to mundane details to identify practices, but haven’t done much analysis of how those practices are sustained over time–the ontological work. I’m hopeful that this book will give me the tools to help me shift towards trying more of that. While there is clearly an ordering to weak/strong I wonder if there may be hidden humanistic benefits to a weak approach–where letting the reader infer connections, rather than explicitly giving them could be useful. I can’t help but be reminded of Hard and Soft sci-fi.


Schatzki, T. R. (2001). The practice turn in contemporary theory. In T. R. Schatzki, K. K. Cetina, & E. von Savigny (Eds.). Routledge.

District Dispatch: ALA appoints Nick Gross OITP Research Associate

planet code4lib - Fri, 2016-09-09 19:33

This week, the American Library Association (ALA) announced that Nick Gross will serve as a Research Associate in its Office for Information Technology Policy (OITP). In that role, Nick will provide policy research assistance on copyright, telecommunications, and the other issues within the OITP portfolio. In particular, he will work on various initiatives related to ALA’s policy advocacy efforts to influence the presidential transition teams and incoming officials in the next Presidential Administration.

Nick just completed a term as ALA’s Google Policy Fellow, during which he explored diverse areas of information policy, such as copyright law, e-book licenses and access, information access for underserved populations, telecommunications policy, digital literacy, online privacy, the future of libraries, and others. Google, Inc., pays the summer stipends for the fellows and the respective host organizations determine the fellows’ work agendas.

Nick Gross is a Ph.D. candidate at the School of Media & Journalism at the University of North Carolina, Chapel Hill, specializing in media law and policy. He completed a J.D. at the University of Miami School of Law, served as a staff attorney for the U.S. Court of Appeals for the Eleventh Circuit, and is a member of the California Bar.

Access Conference: Access 2016 and PKP Code Sprint

planet code4lib - Fri, 2016-09-09 13:45

If the Hackfest and Access 2016 program aren’t enough for you, UNB Libraries will also be welcoming PKP to Fredericton for their Fall 2016 Code Sprint on Sunday and Monday, October 2-3, at the Wu Conference Centre, just before Access kicks off!

Developers and non-developers are welcome to attend to work on plugins, new documentation, and more!

For more information on registration (free!), projects, program, and other details, check out PKP’s website! Access 2016 attendees interested in finding out more about the latest from PKP (including OJS 3.0 and OMP 1.2), please stick around for PKP’s post-conference workshop on Friday, October 7!

FOSS4Lib Recent Releases: pycounter - 0.15.1

planet code4lib - Fri, 2016-09-09 12:12

Last updated September 9, 2016. Created by wooble on September 9, 2016.

Package: pycounter
Release Date: Wednesday, August 31, 2016

Open Knowledge Foundation: OpenSpending collaborates with Mexico’s Ministry of Finance to standardise and visualise government budget data

planet code4lib - Fri, 2016-09-09 09:30


On September 8, 2016, Mexico became the first country to formally adopt the Open Fiscal Data Package, an international open data standard promoted by the Global Initiative for Fiscal Transparency (GIFT), in collaboration with Open Knowledge International and the World Bank, with the support of Omidyar Network. This collaboration is a pioneering step for publishing fiscal information in open formats. Mexico has set an example for members of the OpenSpending community who intend to make use of the Open Fiscal Data Package and the new tools.

The announcement was made during an event hosted by the Ministry of Finance of Mexico to present the Executive’s Budget Proposal for 2017. The Ministry also revealed that it published the 2008-2016 Federal Budget on its website. The data was prepared using the OpenSpending Viewer, a tool which allows the users to upload and analyze data, and create visualizations.

One of Open Knowledge International’s core projects is OpenSpending, a free and open platform looking to track and analyse public fiscal information globally. The OpenSpending community is made up of citizens, organisations and government partners interested in using and sharing public fiscal data like government budget and spending information. The OpenSpending project is also involved in the creation of tools and standards to ensure this public information is more comparable and useful for a wide-range of users.

For the past few months, OpenSpending, in collaboration with the Global Initiative for Fiscal Transparency and WB-BOOST initiative team, has been working with the Ministry of Finance of Mexico to pilot the OpenSpending tools and the Open Fiscal Data Package (OFDP). The OFDP powers the new version of the OpenSpending tools used to publish Mexico’s Federal Budget data. The OFDP helps make data releases more comparable and useful.

The data package, embedded on Ministry of Finance’s web page, enables users to analyse the 2008-2016 budget, to create visualizations on all or selected spending sectors and share their personalized visualizations. All data is available for download in open format, while the API allows users to create their own apps based on this data.

Explore the visualization here.

In the next few months, the OpenSpending team will pilot the OFDP specification in a number of other countries. The specification and the OpenSpending tools are free and available for any interested stakeholder to use. To find out more, get in touch with us on the discussion forum.

Upload financial data, browse datasets and learn more about public finances from around the world by visiting OpenSpending – let’s work together to build the world’s largest fiscal data repository.

Ed Summers: Practice Theory

planet code4lib - Fri, 2016-09-09 04:00

This semester I’m going to be doing an independent study with Professor Andrea Wiggins to research and apply Practice Theory in my own work. I’ve included part of my research proposal (which is still a bit in flux at the moment) below. The plan is to use this web space to write up notes about my reading and work as I go. I thought I’d share it in case it shows up in your feed reader–hey I’m told people still use them. Writing here also puts a little bit of pressure on myself to stick to the plan as best I can. I’ll tag all the posts with practice.


The purpose of this independent study is for me to perform a detailed analysis of interview transcripts obtained during a previous study of appraisal practices among web archivists. The goal is to use practice theory as a critical lens for coding the transcripts. Weekly readings will introduce the field of practice theory. Each week I will write up brief summaries of the readings on my blog. The data analysis will be packaged up and results will be written up as a paper that could serve as an initial draft submission for a conference or a journal.


For the past year I have been researching how Web archivists go about doing their work to better understand existing and potential designs for Web archiving technology. Specifically, I’ve been interested in how archivists decide what to collect from the Web, and how they perform these actions using automated agents (software). The goal in this work is to help inform the design of archival systems for collecting and preserving Web content.

While the archivist’s work is sometimes guided by institutional collection development policies, not all organizations have them; and even when they do, considerable interpretive work often needs to be done when putting these policies into practice. The situation is further complicated by the fact that the tools that are available to the archivist in their Web archiving work, and the material of the Web itself, are changing rapidly. This churn makes it very difficult to add specific detail to collection development policies without it becoming quickly out of date. By necessity they must remain at a fairly high level, which leaves the archivist with quite a bit of room for experimentation and practice. The practice of Web archiving is relatively young, compared with the longer established archival science. As a result there remain large questions about how the materiality of the Web, and tools for working with it, impact archival science at a theoretical and practical level.

Existing survey work done by the International Internet Preservation Consortium (Marill, Boyko, & Ashenfelder, 2004) and the National Digital Stewardship Alliance (Bailey et al., 2013; NDSA, 2012) described high level characteristics of Web archives work, particularly at the level of national libraries and universities. However, these surveys intentionally did not provide a very rich picture of the day to day work of Web archivists. In order to better understand how these appraisal decisions are being enacted in Web archives I decided to conduct a series of unstructured interviews with active Web archivists to see what common themes and interaction patterns emerged from descriptions of their work. I employed a grounded theory research methodology which allowed me to explore theoretical perspectives that emerged during iterative data collection and analysis. Through coding of my field notes I was able to observe a set of high level themes, which I reported on in a paper I will be presenting at CSCW 2017.

One overarching theme that emerged in this work was the ways in which the archivists and their software agents both worked together to produce the Web archive. I became increasingly interested in ways of viewing this interaction, which led me to reflect on the use of sociotechnical theory as a possible lens for further analysis of the interviews. After some consultation with Professor Wiggins I decided to spend some time exploring the sociotechnical theory literature in order to build a list of readings and a work plan for taking another look at my interview data using a sociotechnical theoretical lens, and more detailed coding of the actual interview transcripts.

I found an excellent overview from Sawyer & Jarrahi (2014) about the application of sociotechnical theory in Information Systems. This led to the realization that sociotechnical theory, while seemingly narrow, was in fact a large intellectual space that had many different branches and connections into IS and ICT. In fact it felt like such a broad area that I wouldn’t have time to thoroughly review the literature while also doing data analysis and writing.

In order to further refine my focus I decided to read Geiger (2015) and Ford (2015) which are two recent dissertations that have looked at Wikipedia as a sociotechnical system. I was drawn to their work because of the parallels between studying a collaboratively built encyclopedia and the study of archives of Web content. Both Geiger and Ford examine a medium or artifact that predates the Internet, the encyclopedia, but which has subsequently been transformed by the emergence of the Web as a sociotechnical artifact. Their ethnographic approach led them to the use of participant observation as a method, which aligned nicely with the first phase of my study. While there were certainly theoretical angles (the study of algorithms) that I can draw on, increasingly I found that it was their focus on participation that I found compelling for my own work.

Last spring Cliff Lampe visited UMD to give a talk about citizen interaction design. While describing his work Lampe stressed what he saw as a turn toward practice in the HCI community. He recommended a series of resources for further exploration of the subject including Kuutti & Bannon (2014). Since I had been having some difficulty in focusing my exploration of sociotechnical theory, and Ford and Geiger seemed to also point towards the importance of practice in their ethnographic work, I decided to focus my independent study on three texts that came up many times in the literature I reviewed. I wanted to read books instead of articles because it seemed like a broad and deep area that would benefit from a few deep dives rather than a survey approach to the literature.


Nicolini, D. (2012). Practice theory, work, and organization: An introduction. Oxford University Press.

This text was recommended by Kuutti & Bannon (2014) for providing an overview of the field of practice theory, and its theoretical and philosophical foundations in phenomenology, ethnomethodology and activity theory. I’m hopeful that this text will provide a useful and current picture of the field, which can be useful in diving off into other readings later in the semester.

Dourish, P. (2004). Where the Action Is: The Foundations of Embodied Interaction. MIT Press.

Dourish is a heavily cited figure in HCI and sociotechnical literature. Where the Action Is in particular helped establish the theoretical foundations for incorporating social practices into system design. I’m particularly interested in how Dourish grounds HCI in the philosophical work of Heidegger and Wittgenstein. It is arguable whether Dourish belongs in the practice theory camp; I guess I’ll know more after reading this book. I really wanted to make sure I connected the dots between practice theory and information technology.

Suchman, L. (1986). Plans and situated actions. Cambridge University Press.

This book by Suchman is constantly referenced in HCI literature as helping to establish a theoretical focus on the social and material properties of computer systems. As an anthropologist her use of ethnographic analysis is of particular interest to me. I wanted to read it first hand instead of just citing it as a touchstone.


I’ve left some breathing room in the reading schedule near the end of the semester to allow for additional reading encountered during the reading of the main texts, and also for additional ideas from Professor Wiggins. I also wanted to leave time for coding, analysis and writing since the goal of this independent study is a paper.

Week 1

Nicolini, chapters 1-3

Week 2

Nicolini, chapters 4-5

Week 3

Nicolini, chapters 6-7

Week 4

Nicolini, chapters 8-9

Week 5

Dourish, chapters 1-2


Week 6

Dourish, chapters 3-4


Week 7

Dourish, chapters 5-7


Week 8

Suchman, chapters 1-4


Week 9

Suchman, chapters 5-8

Paper Outline

Week 10

Data Analysis

Week 11

Data Analysis


Week 12


Week 13


Week 14


Week 15

Final paper due.


Bailey, J., Grotke, A., Hanna, K., Hartman, C., McCain, E., Moffatt, C., & Taylor, N. (2013). Web archiving in the United States: A 2013 survey. National Digital Stewardship Alliance.

Ford, H. (2015). Fact factories: Wikipedia and the power to represent (PhD thesis). University of Oxford.

Geiger, R. S. (2015). Robots.txt: An ethnographic investigation of automated software agents in user-generated content platforms (PhD thesis). University of California at Berkeley.

Kuutti, K., & Bannon, L. J. (2014). The turn to practice in HCI: Towards a research agenda. In Proceedings of the 32nd annual ACM Conference on Human Factors in Computing Systems (pp. 3543–3552). Association for Computing Machinery.

Marill, J., Boyko, A., & Ashenfelder, M. (2004). Web harvesting survey. International Internet Preservation Consortium.

NDSA. (2012). Web archiving survey report. National Digital Stewardship Alliance.

Sawyer, S., & Jarrahi, M. H. (2014). CRC handbook of computing. In A. Tucker & H. Topi (Eds.). Chapman & Hall.

Galen Charlton: A small thought on library and tech unions in light of a lockout

planet code4lib - Fri, 2016-09-09 01:12

I’ve never been a member of a union. Computer programmers — and IT workers in general — in the U.S. are mostly unorganized. Not only that, they tend to resist unions, even though banding together would be a good idea.

It’s not necessarily a matter of pay, at least not at the moment: many IT workers have decent to excellent salaries. Of course not all do, and there are an increasing number of IT job categories that are becoming commoditized. Working conditions at a lot of IT shops are another matter: the very long hours that many programmers and sysadmins work are not healthy, but it can be very hard to be the first person in the office to leave at a reasonable quitting time.

There are other reasons to be part of a union as an IT worker. Consider one of the points in the ACM code of ethics: “Respect the privacy of others.” Do you have a qualm about writing a web tracker? It can be hard to push back all by yourself against a management imperative to do so. A union can provide power and cover: what you can’t resist singly, a union might help forestall.

The various library software firms I’ve worked for have not been exceptions: no unions. At the moment, I’m also distinctly on the management side of the table.

Assuming good health, I can reasonably expect to spend another few decades working, and may well switch from management to labor and back again — IT work is squishy like that. Either way, I’ll benefit from the work — and blood, and lives — of union workers and organizers past and future. (Hello, upcoming weekend! You are literally the least of the good things that unions have given me!)

I may well find myself (or more likely, people representing me) bargaining hard with or against a union. And that’s fine.

However, if I find myself sitting, figuratively or literally, on the management side of a negotiation table, I hope that I never lose sight of this: the union has a right to exist.

Unfortunately, the U.S. has a long history of management and owners rejecting that premise, and doing their level best to break unions or prevent them from forming.

The Long Island University Faculty Federation, which represents the full time and adjunct faculty at the Brooklyn campus of LIU, holds a distinction: it was the first union to negotiate a collective bargaining agreement for faculty at a private university in the U.S.

Forty-four years later, the administration of LIU Brooklyn seems determined to break LIUFF, and has locked out the faculty. Worse, LIU has elected not to continue the health insurance of the LIUFF members. I have only one word for that tactic: it is an obscenity.

As an aside, this came to my attention last week largely because I follow LIU librarian and LIUFF secretary Emily Drabinski on Twitter. If you want to know what’s going on with the lockout, follow her blog and Twitter account as well as the #LIUlockout hashtag.

I don’t pretend that I have a full command of all of the issues under discussion between the university and the union, but I’ve read enough to be rather dubious that the university is presently acting in good faith. There’s plenty of precedent for university faculty unions to work without contracts while negotiations continue; LIU could do the same.

Remember, the union has a right to exist. That applies to LIUFF, to libraries, and hopefully in time, to more IT shops.

If you agree with me that lockouts are wrong, please consider joining me in donating to the solidarity fund for the benefit of LIUFF members run by the American Federation of Teachers.

DuraSpace News: TRY IT OUT: DSpace 6.0 Release Candidate #3 Available

planet code4lib - Fri, 2016-09-09 00:00

From Tim Donohue, DSpace tech lead, on behalf of the DSpace committers team

Austin, TX  The third release candidate of 6.0 is now available for download and testing. 6.0-RC3 (Release Candidate #3) is a pre-release of 6.0, and we hope that the 6.0 final release will follow closely in its footsteps.

LITA: Social Media For My Institution – a new LITA web course

planet code4lib - Thu, 2016-09-08 20:29

Social Media For My Institution: from “mine” to “ours”

Instructor: Dr. Plamen Miltenoff
Wednesdays, 9/21/2016 – 10/12/2016
Blended format web course

Register Online, page arranged by session date (login required)

This course is for librarians who want to explore the institutional application of social media. It is based on the established academic course at St. Cloud State University, “Social Media in Global Context” (more information at ). A theoretical introduction will help participants distinguish the private use of social media from the structured approach that an educational institution needs. Legal and ethical issues will be discussed, including future trends and management issues. The course will include hands-on exercises on creating and disseminating textual and multimedia content and on patron engagement. Participants will also brainstorm strategies suitable for their institution regarding human and technological resources, workload sharing, storytelling, and branding.

This is a blended format web course:

The course will be delivered as 4 separate live webinar lectures, one per week on Wednesdays, September 21, 28, October 5, and 12 at 2pm Central. You do not have to attend the live lectures in order to participate. The webinars will be recorded and distributed through the web course platform, Moodle, for asynchronous participation. The web course space will also contain the exercises and discussions for the course.

Details here and Registration here


By the end of this class, participants will be able to:

  • Move from the state of personal use of social media (SM) and contemplate the institutional approach
  • Have a hands-on experience with finding and selecting multimedia resources and their application for branding of the institution
  • Acquire the foundational structure of the elements that constitute meaningful institutional social media

Dr. Plamen Miltenoff is an information specialist and Professor at St. Cloud State University. His education includes several graduate degrees in history and Library and Information Science and in education. His professional interests encompass social Web development and design, gaming and gamification environments. For more information see

And don’t miss other upcoming LITA fall continuing education offerings:

Online Productivity Tools: Smart Shortcuts and Clever Tricks
Presenter: Jaclyn McKewan
Tuesday September 20, 2016
11:00 am – 12:30 pm Central Time
Register Online, page arranged by session date (login required)

Questions or Comments?

For questions or comments, contact LITA at (312) 280-4268 or Mark Beatty,

FOSS4Lib Recent Releases: veraPDF - 0.22

planet code4lib - Thu, 2016-09-08 19:16

Last updated September 8, 2016. Created by Peter Murray on September 8, 2016.

Package: veraPDF
Release Date: Wednesday, September 7, 2016

SearchHub: Third Annual Solr Developer Survey

planet code4lib - Wed, 2016-09-07 17:03

It’s that time of the year again – time for our third annual survey of the Solr marketplace and ecosystem. Every day, we hear from organizations looking to hire Solr talent. Recruiters want to know how to find and hire the right developers and engineers, and how to compensate them accordingly.

Lucidworks is conducting our annual global survey of Solr professionals to better understand how engineers and developers at all levels of experience can take advantage of the growth of the Solr ecosystem – and how they are using Solr to build amazing search applications.

This survey will take about 2 minutes to complete. Responses are anonymized and confidential. Once our survey and research are completed, we’ll share the results with you and the Solr community.

As a thank you for your participation, you’ll be entered in a drawing to win one of our blue SOLR t-shirts plus copies of the popular books Taming Text and Solr in Action. Be sure to include your t-shirt size in the questionnaire.

Take the survey today

Past survey results: 2015, 2014


ACRL TechConnect: A High-Level Look at an ILS Migration

planet code4lib - Wed, 2016-09-07 16:00

My library recently performed that most miraculous of feats—a full transition from one integrated library system to another, specifically Innovative’s Millennium to the open source Koha (supported by ByWater Solutions). We were prompted to migrate by Millennium’s approaching end-of-life and a desire to move to a more open system where we feel in greater control of our data. I’m sure many librarians have been through ILS migrations, and plenty has been written about them, but as this was my first I wanted to reflect upon the process. If you’re considering changing your ILS, or if you work in another area of librarianship & wonder how a migration looks from the systems end, I hope this post holds some value for you.


No migration is without its problems. For starters, certain pieces of data in our old ILS weren’t accessible in any meaningful format. While Millennium has a robust “Create Lists” feature for querying & exporting different types of records (patron, bibliographic, vendor, etc.), it does not expose certain types of information. We couldn’t find a way to export detailed fines information, only a lump sum for each patron. To help with this post-migration, we saved an email listing of all itemized fines that we can refer to later. The email is saved as a shared Google Doc which allows circulation staff to comment on it as fines are resolved.

We also discovered that patron checkout history couldn’t be exported in bulk. While each patron can opt in to a reading history & view it in the catalog, there’s no way for an administrator to download everyone’s history at once. As a solution, we kept our self-hosted Millennium instance running & can log in to patrons’ accounts to retrieve their reading history upon request. Luckily, this feature wasn’t heavily used, so access to it hasn’t come up many times. We plan to keep our old, self-hosted ILS running for a year and then re-evaluate whether it’s prudent to shut it down, losing the data.

While some types of data simply couldn’t be exported, many more couldn’t emigrate in their exact same form. An ILS is a complicated piece of software, with many interdependent parts, and no two are going to represent concepts in the exact same way. To provide a concrete example: Millennium’s loan rules are based upon patron type & the item’s location, so a rule definition might resemble

  • a FACULTY patron can keep items from the MAIN SHELVES for four weeks & renew them once
  • a STUDENT patron can keep items from the MAIN SHELVES for two weeks & renew them two times

Koha, however, uses patron category & item type to determine loan rules, eschewing location as the pivotal attribute of an item. Neither implementation is wrong in any way; they both make sense, but are suited to slightly different situations. This difference necessitated completely reevaluating our item types, which didn’t previously affect loan rules. We had many, many item types because they were meant to represent the different media in our collection, not act as a hook for particular ILS functionality. Under the new system, our Associate Director of Libraries put copious work into reconfiguring & simplifying our types such that they would be compatible with our loan rules. This was a time-consuming process & it’s just one example of how a straightforward migration from one system to the next was impossible.
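To make the difference concrete, here is a minimal Python sketch of the two lookup models. It is an illustration only, not how either ILS stores its rules: the dictionaries, the `LoanRule` type, the function names, and the `BOOK` item type are all hypothetical, and only the FACULTY and STUDENT loan periods come from the example rules above.

```python
from typing import NamedTuple, Optional


class LoanRule(NamedTuple):
    loan_days: int   # how long the item may be kept
    renewals: int    # how many times the loan may be renewed


# Millennium-style model: rules keyed on (patron type, item location).
# The two entries mirror the example rules listed above.
millennium_rules = {
    ("FACULTY", "MAIN SHELVES"): LoanRule(loan_days=28, renewals=1),
    ("STUDENT", "MAIN SHELVES"): LoanRule(loan_days=14, renewals=2),
}

# Koha-style model: rules keyed on (patron category, item type).
# "BOOK" is a made-up item type used purely for illustration.
koha_rules = {
    ("FACULTY", "BOOK"): LoanRule(loan_days=28, renewals=1),
    ("STUDENT", "BOOK"): LoanRule(loan_days=14, renewals=2),
}


def millennium_lookup(patron_type: str, location: str) -> Optional[LoanRule]:
    """Find a rule by patron type and shelving location."""
    return millennium_rules.get((patron_type, location))


def koha_lookup(patron_category: str, item_type: str) -> Optional[LoanRule]:
    """Find a rule by patron category and item type; location plays no part."""
    return koha_rules.get((patron_category, item_type))
```

In this sketch a location such as MAIN SHELVES simply is not a valid key in the second model, which is why item types that previously only described media had to be reworked before they could carry loan rules.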

While some data couldn’t be exported, and others needed extensive rethinking in the new ILS, there was also information that could only be migrated after much massaging. Our patron records were a good example: under Millennium, users logged in on an insecure HTTP page with their barcode & last name. Yikes. I know, I felt terrible about it, but integration with our campus authentication & upgrading to HTTPS were both additional costs that we couldn’t afford. Now, under Koha, we can use the campus CAS (a central authentication system) & HTTPS (yay!), but wait…we don’t have the usernames for any of our patrons. So I spent a while writing Python scripts to parse our patron data, attempting to extract usernames from institutional email addresses. A system administrator also helped use unique identifying information (like phone number) to find potential patron matches in another campus database.
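A rough sketch of the username extraction step might look like the following. This is not the actual migration script: the domain `example.edu`, the record fields `barcode` and `email`, and both function names are assumptions made for illustration.

```python
from typing import Optional


def username_from_email(email: Optional[str], domain: str = "example.edu") -> Optional[str]:
    """Return the local part of an institutional address, or None.

    The domain is a placeholder; a real script would use the campus domain.
    """
    if not email:
        return None
    local, sep, host = email.strip().lower().rpartition("@")
    # Reject strings with no "@", an empty local part, or a non-campus host.
    if not sep or not local or host != domain:
        return None
    return local


def split_patrons(patrons, domain="example.edu"):
    """Separate patrons whose username could be derived from those needing manual follow-up.

    Each patron is assumed to be a dict with hypothetical "barcode" and "email" keys.
    """
    matched, unmatched = {}, []
    for patron in patrons:
        username = username_from_email(patron.get("email"), domain)
        if username is None:
            unmatched.append(patron["barcode"])
        else:
            matched[patron["barcode"]] = username
    return matched, unmatched
```

Patrons with a personal or missing email address end up in the unmatched list, which corresponds to the records that needed matching against another campus database.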

A more amusing example of weird Millennium data was active holds, which are stored in a single field on item records & look like this:


Can you tell what’s going on here? With a little poking around in the system, it became apparent that letters like “NNB” stood for “date not needed by” & that other fields were identifiers connecting to patron & item records. So, once again, I wrote scripts to extract meaningful details from this silly format.

I won’t lie, the data munging was some of the most enjoyable work of the migration. Maybe I’m weird, but it was both challenging & interesting as we were suddenly forced to dive deeper into our old system and understand more of its hideous internal organs, just as we were leaving it behind. The problem-solving & sleuthing were fun & distracted me from some of the more frustrating challenges detailed above.

Finally, while we had a migration server where we tested our data & staff played around for almost a month’s time, when it came to the final leap things didn’t quite work as expected. The CAS integration, which I had so anticipated, didn’t work immediately. We started bumping into errors we hadn’t seen on the migration server. Much of this is inevitable; it’s simply unrealistic to create a perfect replica of our live catalog. We cannot, for instance, host the migration server on the exact same domain, and while that seems like a trivial difference it does affect a few things. Luckily, we had few summer classes so there was time to suffer a few setbacks & now that our fall semester is about to begin, we’re in great shape.

Difference & Repetition

Koha is primarily used by public libraries, and as such we’ve run into a few areas where common academic library functions aren’t implemented in a familiar way or are unavailable. Often, it’s that our perspective is so heavily rooted in Millennium that we need to think differently to achieve the same effect in Koha. But sometimes it’s clear that what’s a concern to us isn’t to other libraries.

For instance, bib records for serials with large numbers of issues are an ongoing struggle for us. We have many print periodicals where we have extensive holdings, including bound editions of past issues. The holdings display in the catalog is more oriented towards recent periodicals & displaying whether the latest few issues have arrived yet. That’s fine for materials like newspapers or popular magazines with few back issues, and I’ve seen a few public libraries using Koha that have minimalistic periodical records intended only to point the patron to a certain shelf. However, we have complex holdings like “issues 1 through 10 are bound together, issue 11 is missing, issues 12 through 18 are held in a separate location…” Parsing the catalog record to determine if we have a certain issue, and where it might be, is quite challenging.

Another example of the public versus academic functions: there’s no “recall” feature per se in Koha, wherein a faculty member could retrieve an item they want to place on course reserve from a student. Instead, we have tried to simulate this feature with a mixture of adjustments to our loan rules & internal reports which show the status of contested items. Recall isn’t a huge feature & isn’t used all the time; it’s not something we thought to research when selecting our new ILS, but it’s a great example of a minute difference that ended up creating a headache as we adapted to a new piece of software.

Moving from Millennium to Koha also meant we were shifting from a closed source system where we had to pay additional fees for limited API access to an open source system which boasts full read access to the database via its reporting feature. Koha’s open source nature has been perhaps the biggest boon for me during our migration. It’s very simple to look at the actual server-side code generating particular pages, or pull up specific rows in database tables, to see exactly what’s happening. In a black box ILS, everything we do is based on a vague adumbration of how we think the system operates. We can provide an input & record the output, but we’re never sure about edge cases or whether strange behavior is a bug or somehow intentional.

Koha has its share of bugs, I’ve discovered, but thankfully I’m able to jump right into the source code itself to determine what’s occurring. I’ve been able to diagnose problems by looking at open bug reports on Koha’s Bugzilla tracker, pondering over Perl code, and applying snippets of code from the Koha wiki or git repository. I’ve already submitted two bug patches, one of which has been pulled into the project. It’s empowering to be able to trace exactly what’s happening when troubleshooting & submit one’s own solution, or just a detailed bug report, for it. Whether or not a patch is the best way to fix an issue, being able to see precisely how the system works is deeply satisfying. It also makes it much easier for me to design JavaScript hacks that smooth over issues on the client side, be it in the staff-facing administrative functions or the public catalog.

What I Would Do Differently

Set clearer expectations.

We had Millennium for more than a decade. We invested substantial resources, both monetary & temporal, in customizing it to suit our tastes & unique collections. As we began testing the new ILS, the most common feedback from staff fell along the lines of “this isn’t like it was in Millennium”. I think that would have been a less common observation, or perhaps phrased more productively, if I’d made it clear that a) it’ll take time to customize our new ILS to the degree of the old one, and b) not everything will be or needs to be the same.

Most of the customization decisions were made years ago & were never revisited. We need to return to the reason why things were set up a certain way, then determine if that reason is still legitimate, and finally find a way to achieve the best possible result in the new system. Instead, it’s felt like the process was framed more as “how do we simulate our old ILS in the new one” which sets us up for disappointment & failure from the start. I think there’s a feeling that a new system should automatically be better, and it’s true that we’re gaining several new & useful features, but we’re also losing substantial Millennium-specific customization. It’s important to realize that just because everything is not optimal out of the box doesn’t mean we cannot discover even better solutions if we approach our problems in a new light.

Encourage experimentation, deny expertise.

Because I’m the Systems Librarian, staff naturally turn to me with their systems questions. Here’s a secret: I know very little about the ILS. Like them, I’m still learning, and what’s more I’m often unfamiliar with the particular quarters of the system where they spend large amounts of time. I don’t know what it’s like to check in books & process holds all day, but our circulation staff do. It’s been tough at times when staff seek my guidance & I’m far from able to help them. Instead, we all need to approach the ongoing migration as an exploration. If we’re not sure how something works, the best way is to research & test, then test again. While Koha’s manual is long & quite detailed, it cannot (& arguably should not, lest it grow to unreasonable lengths) specify every edge case that can possibly occur. The only way to know is to test & document, which we should have emphasized & encouraged more towards the start of the process.

To be fair, many staff had reasonable expectations & performed a lot of experiments. Still, I did not do a great job of facilitating either of those as a leader. That’s truly my job as Systems Librarian during this process; I’m not here merely to mold our data so it fits perfectly in the new system, I’m here to oversee the entire transition as a process that involves data, workflows, staff, and technology.

Take more time.

Initially, the ILS migration was such an enormous amount of work that it was not clear where to start. It felt as if, for a few months before our on-site training, we did little but sit around & await a whirlwind of busyness. I wish we had a better sense of the work we could have front-loaded such that we could focus efforts on other tasks later on. For example, we ended up deleting thousands of patron, item, and bibliographic records in an effort to “clean house” & not spend effort migrating data that was unneeded in the first place. We should have attacked that much earlier, and it might have obviated the need for some work. For instance, if in the course of cleaning up Millennium we delete invalid MARC records or eliminate obscure item types, those represent fewer problems encountered later in the migration process.


As we start our fall semester, I feel accomplished. We raced through this migration, beginning the initial stages only in April for a go-live date that would occur in June. I learned a lot & appreciated the challenge but also had one horrible epiphany: I’m still relatively young, and I hope to be in librarianship for a long time, so this is likely not the last ILS migration I’ll participate in. While that very thought gives me chills, I hope the lessons I’ve taken from this one will serve me well in the future.

LITA: LITA Personas Task Force Survey

planet code4lib - Wed, 2016-09-07 15:11

The LITA Personas Task Force seeks your help in developing personas in order to identify who is a natural fit for LITA. We invite everyone who works in the overlapping space between libraries and technology, whether or not you belong to LITA, to participate. This survey is designed to assess your needs and identify how you interact with LITA.

We anticipate this survey will take approximately 10 – 15 minutes to complete. Data will be gathered anonymously and kept confidential. You may be offered the opportunity to participate in a virtual interview at a later date. This is optional and will require you to provide your contact information if you are interested. Names and emails will not be associated with your survey responses. The Survey closes on Friday, Sept. 30th, 2016, so don’t delay!

If you have any questions regarding LITA personas, please contact either

Hong Ma at
Yoo Young Lee at

We thank you in advance for your time and support.

LITA Personas Task Force Members:

Callan Bignoli
Lynne Edgar
Eric Frierson
Isabel Gonzalez-Smith
Amanda L. Goodman
TJ Lamanna
Yoo Young Lee
Hong Ma
Frank Skornia
Nadaleen Tempelman-Kluit

LITA: The President’s Post – #1

planet code4lib - Wed, 2016-09-07 14:16

Hello fellow LITAns! For those of you who don’t know me, my name is Aimee Fifarek and I will be serving as your fearless leader for the coming year. I have been a LITA member since I joined ALA in 1997, when I started my first professional job as the Louisiana State University Libraries System Administrator. It’s hard to believe nearly 20 years have passed since I was a baby librarian running NOTIS in a mainframe environment. So many people in LITA-land have helped me over the course of my career, and I am happy to be able to repay those favors, in part, by serving as your President.

My plan is to do monthly posts during my tenure to share information about what is happening at the LITA Board level and about new and upcoming initiatives. Communication is always an issue with an organization of our size, and the wonkier bits of association business don’t always get communicated widely even though they are often news you can use. Feel free to contact me – online or off – about anything LITA-related and I will do my best to respond in a timely fashion.

First, some old business, at least for me: committee appointments.  It’s what I spent my tenure as LITA VP doing and I’m happy to say appointments have been fully transitioned to our new VP Andromeda Yelton.  She has gotten off to an excellent start by coding an interface for the appointments database that she and her new Appointments Committee can use to manage all of those volunteer forms you submit.  Between the new committee and Andromeda’s app we are well on our way to defeating the traditional “black hole” nature of the appointments process.

Although it is tempting to think of Committee Appointments as an annual process, it really happens year round as people need to drop off committees for one reason or another or as new committees and task forces are formed.  If you are looking to get more involved with LITA, add some professional experience to your resume, or just want to give back, please do consider volunteering for a committee.  You get to meet new people, go in depth on issues and processes, and have the chance to make the Association that much better.  Check out the options on the LITA Committee Page and don’t be shy about letting us know about your prior experience and special skills.  The more info you put into the volunteer form the better we will be at matching you up with an excellent opportunity.

Speaking of new committees, did you know that as of last year LITA has a Diversity and Inclusion Committee? With the volume of issues being discussed within the realm of technology in general and librarianship in particular, it was well past time for LITA to make a formal commitment to Diversity as a fundamental principle. Thanks to Carli Spina who has agreed to be the committee’s first chair and to Evvivia Weinraub for being the first Board Liaison. Their work will be fundamental to the committee’s ongoing success.

Before I leave the topic of committees I’d just like to send a big thank you to Michelle Frisque and Margaret Heller, our newest Interest Group and Committee Chair Coordinators.  If you are not familiar with this role, these are the folks who make sure the IG and Committee Chairs get the info they need to have successful meetings throughout the year.  We are happy to have them on board.  I would be remiss if I didn’t thank the outgoing inhabitants of those roles, David Lee King and Lauren Pressley, who did an admirable job.

Now, onto some new business.  The first LITA Board Meeting is TODAY September 7th at 11am Pacific.  I encourage everyone to tune in at, and not just the fans of parliamentary procedure snafus (you know who you are!).  We will be discussing, and hopefully adopting, the new LITA Strategic Plan.  Once adopted, this document will stay as is over the next two years and help guide LITA’s activities, specifically helping us to decide how to spend our most valuable commodity:  our time.  The document has four major focus areas:  Member Engagement, Organizational Sustainability, Education and Professional Development, and Advocacy and Information Policy. You can check out the final draft of the Strategic Plan, along with a very preliminary draft of the tactical plan, at the ALA Connect Node 256917.

Advocacy and Information Policy is definitely a growth area for us, and we will be starting out in this plan with some baby steps.  Although LITA will always be the home for library technologists within ALA, we have to think critically about what LITA’s purpose is in a world where everyone does technology.  This strategic plan item formalizes the idea that, as the group that has been thinking about and working with technology for the longest time, we are in an excellent position to guide the development of policies surrounding technology for our libraries and our world.  In the coming year we will be working on building a closer relationship with the units within ALA that are currently working in this area, like the Office for Information Technology Policy. Our goal is not to duplicate efforts already being made, but rather to lend our expertise to the policy decisions that affect all of us.

So that’s my update for September.  But before I go I want to extend hearty thanks to Brianna Marshall who is stepping down as LITA’s first Blog Editor.  She did an amazing job assembling a team and creating policies to bring you the quality content you get regularly through the LITA Blog.  Being first at something is always a challenge and Brianna met that challenge head on.  She is leaving the Blog in the capable hands of Lindsay Cronk, who has big ideas of her own and has been most helpful to me in my first post.  Brianna and Lindsay are just two more examples of the dedication and expertise that has made LITA a great place to be for the last 50 years.  More on that in my next post.

— Aimee

DuraSpace News: VIVO Updates for Sept 4–Woods Hole, Tech Docs, Wiki Improvements, Modeling Fellowships

planet code4lib - Wed, 2016-09-07 00:00

Woods Hole VIVO launched  The Marine Biology Laboratory Woods Hole Oceanographic Institution in Woods Hole, Massachusetts, has a new VIVO, and it’s beautiful!  See  Congratulations to the Library at MBLWHOI for creating this wonderful new site!

DuraSpace News: NEW DEMO from the Hydra-in-a-Box Tech Team

planet code4lib - Wed, 2016-09-07 00:00

From Mike Giarlo, software architect, Stanford University Libraries, on behalf of the Hydra-in-a-Box tech team

Palo Alto, CA  Development on the Hydra-in-a-Box repository application continues, and here's our latest demo. Thanks to the Chemical Heritage Foundation and Indiana University for contributing to these sprints!

