Planet Code4Lib

Evergreen ILS: Evergreen 3.0 development update #3: moving right to left

Fri, 2017-04-28 16:13

Common Teal or Eurasian Teal (Anas crecca) near Hodal, Faridabad, Haryana, India. CC-BY-SA image by J.M. Garg

As of this writing, 21 patches have been committed to the master branch since the previous development update.

One set of patches I’d like to highlight are the ones by Dan Scott for bug 1681009, which merge the main RTL (right-to-left) CSS stylesheet for the public catalog into the LTR (left-to-right) one. The RTL stylesheet was introduced in 2.12.0 so that the public catalog, when translated to languages such as Arabic that use right-to-left scripts, would align display elements correctly. However, having a separate stylesheet violates the DRY (don’t repeat yourself) principle. Sometimes the problem isn’t repeating yourself, but forgetting to repeat yourself: new CSS styles that affect right or left alignment had to be added to both the LTR and RTL stylesheets, and it’s all too easy to forget to do that, particularly if you’re a contributor who doesn’t regularly write in an RTL language.

Consequently, the main OPAC stylesheet, or rather, the Template Toolkit template that generates the stylesheet (Open-ILS/src/templates/opac/css/style.css.tt2) now looks like this:

    [%- PROCESS "opac/parts/css/colors.tt2";
        PROCESS "opac/parts/css/fonts.tt2";
        rtl = ctx.get_i18n_l(ctx.eg_locale).rtl;
    %]
    ...
    #dash_wrapper {
        [% IF rtl == 't' -%]
        margin-right: 0.5em;
        [%- ELSE %]
        margin-left: 0.5em;
        [%- END %]
    }

The rtl template variable is set based on the user’s active locale and is used to generate left- or right-aligned styles as needed.
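In other words, each direction-sensitive declaration is generated from a single source of truth. A rough Python analogue of what the template does for one rule (a hypothetical helper for illustration, not Evergreen code):

```python
# Rough analogue of the template logic for a single rule: one function emits
# either the LTR or the RTL declaration based on the locale's rtl flag, so
# nobody has to remember to update a second stylesheet.
# (Hypothetical helper, not part of Evergreen.)

def dash_wrapper_css(rtl: bool) -> str:
    """Return the #dash_wrapper rule for the active text direction."""
    side = "right" if rtl else "left"
    return "#dash_wrapper { margin-%s: 0.5em; }" % side

print(dash_wrapper_css(False))  # #dash_wrapper { margin-left: 0.5em; }
print(dash_wrapper_css(True))   # #dash_wrapper { margin-right: 0.5em; }
```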

Moving forward, some guidelines that folks contributing to the public catalog CSS should be aware of include:

  • Avoid the use of inline styles in favor of defining CSS classes. This is a good idea in general, but it’s particularly important in the case of margin-right/margin-left and so forth. If anybody wants a small bug to work on, bug 1687031 identifies some places where cleanup is needed.
  • When you add new styles to style.css.tt2, remember to add a conditional for styles that touch left/right display.

If you cannot read Arabic, a little trick you can use on occasion is to mark your home locale as an RTL one, e.g., by running

    update config.i18n_locale set rtl = true where code ~ '^en';

and reloading Apache.

In this case, the “Messages” heading is still aligned to the left when, were English an RTL language, it more commonly would be expected to be on the right.

Duck trivia

The Eurasian Teal duck (Anas crecca) had its Linnaean name assigned by Linnaeus himself. As Avibase reveals, in Jordan that duck can be referred to as .


Updates on the progress to Evergreen 3.0 will be published every Friday until general release of 3.0.0. If you have material to contribute to the updates, please get them to Galen Charlton by Thursday morning.

DPLA: DPLAfest 2017: Post-Fest Wrap Up

Fri, 2017-04-28 16:00

From all of us at DPLA, we would like to offer our sincere thanks to all who joined us in Chicago last week for DPLAfest 2017! Whether you attended the fest or joined the conversation from afar, we deeply appreciate the contributions of each participant, presenter and volunteer. For those who couldn’t make it to Chicago (or who couldn’t be in multiple great sessions at once), we have compiled the following one-stop guide to the news, notes, media, and other materials documenting DPLAfest 2017.

Announcements & Milestones

DPLAfest officially kicked off Thursday morning at Chicago Public Library’s Harold Washington Library Center with a warm welcome to the city of Chicago and a report on DPLA’s recent milestones and new initiatives from DPLA Executive Director Dan Cohen. Highlights included:

  • Our Values – We proudly unveiled DPLA’s new Values Statement, which outlines the core commitments of our organization and our staff. We intend for this document to be a dynamic guide for our practice going forward and a reference against which we can track our progress as we continually strive to embody these values throughout the institution.
  • Expansion of the DPLA partner network – In addition to several new Service Hubs accepted this year, DPLA was proud to officially welcome the Library of Congress as a contributing Content Hub. With these new collections and others from established partners, DPLA now makes over 16 million items from 2,350 libraries, archives, and museums freely discoverable for all.
  • Implementing Rights has been well received by cultural heritage professionals within the DPLA network and around the world. Digital libraries in Brazil, Australia, New Zealand, and India will be joining the project, with interest from additional libraries on every continent.
  • Next steps for Ebooks – Children across the country downloaded over 1.5 million books through Open eBooks and DPLA is exploring new ways of facilitating discovery of free, open content with generous support from the Sloan Foundation.
  • Expanding Education Work – With funding from the Teagle Foundation, DPLA will expand its Education Advisory Committee, continue to develop primary source sets, and build and pilot a curriculum for professional development for instructors in K-12 and higher education during 2017 and 2018.

DPLAfest 2017, photo by Ron Gould


We were thrilled to have staff from American Libraries magazine join us throughout the fest. Check out their articles on the themes, conversations, and selected topics that emerged during the event.

I missed it! Where are the slides?

Presenters, we would love for you to share your slides and presentation materials with the broad DPLA community. If you have not already, you can upload your slides to Sched by logging in and selecting your session.

To find presentation slides and notes from DPLAfest 2017 sessions, visit the online agenda and select a session. Some presenters also shared slides during the fest via Twitter. Check Storify below for links.

Join our Team

DPLA staffers at DPLAfest 2017, photo by Ron Gould

Before the close of the fest, we posted a job opening for a full time Developer for Frontend and Web Applications. Apply to join our team and help us spread the word about this opportunity!

Special Thanks

DPLA would like to acknowledge and thank the gracious DPLAfest 2017 host organizations, Chicago Public Library, Black Metropolis Research Consortium, Chicago Collections, and Reaching Across the Illinois Library System (RAILS) as well as the generous sponsors of DPLAfest 2017, Datalogics, OCLC, Lyrasis, Sony, and an anonymous donor.


View selected photos from DPLAfest 2017 in our Flickr album.


Special thanks to the many DPLAfest-goers who helped capture the sessions and conversations taking place on social media. View a selection of highlights in Storify below or explore the full conversation using #DPLAfest.



Open Knowledge Foundation: Steady but Slow – Open Data’s Progress in the Caribbean

Thu, 2017-04-27 02:08

Over the last two years, the SlashRoots Foundation has supported the Caribbean’s participation in Open Knowledge International’s Global Open Data Index, an annual survey which measures the state of “open” government across the world. We recently completed the 2016 survey submissions and were asked to share our initial reactions before the full GODI study is released in May.

In the Global Open Data Index, each country is assessed based on the availability of “open data” as defined in the Open Knowledge Foundation’s Open Data Definition across key thematic areas that Governments are expected to publish information on. These include: National Maps, National Laws, Government Budget, Government Spending, National Statistics, Administrative Boundaries, Procurement, Pollutant Emissions, Election Results, Weather Forecast, Water Quality, Locations, Draft Legislation, Company Register, and Land Ownership.

For the 2016 survey, the Caribbean was represented by ten countries—Antigua & Barbuda, Barbados, Bahamas, Dominican Republic, Jamaica, Guyana, Trinidad and Tobago, St. Lucia, St. Kitts & Nevis, and St. Vincent & the Grenadines. As the Caribbean’s Regional Coordinator, we manage and source survey submissions from citizens, open data enthusiasts, and government representatives. These submissions then undergo a quality review process led by global experts. This exercise resulted in 150 surveys for the region and provided both an excellent snapshot of how open data in the Caribbean is progressing and how the region ranks in a global context.

Unfortunately, progress in the Caribbean has been mixed, if not slow. While Caribbean governments were early adopters of Freedom of Information legislation, with seven countries (Antigua and Barbuda, Belize, Dominican Republic, Guyana, Jamaica, St. Vincent and the Grenadines, Trinidad and Tobago) having passed FOI laws, the digital channels through which many citizens increasingly access government information remain underdeveloped. Furthermore, the publication of raw and baseline data, beyond references in press releases, remains a challenge across the region.

For example, St. Kitts, which passed FOI legislation in 2006, had only two “open” datasets, Government Budget and Legislation, readily published online. Comparatively, the governments of Puerto Rico, the Dominican Republic, and Jamaica have invested in open data infrastructure and websites to improve the channels through which citizens access information. Impressively, the Dominican Republic’s data portal consisted of 373 datasets from 71 participating Ministries, Departments and Agencies. However, keeping data portals and government websites up to date remains a challenge: Jamaica’s open data portal, launched in 2016, has received only a handful of updates since its first publication, while St. Lucia and Trinidad & Tobago have published no updates since the first month of their portals’ publication.

Despite these shortcomings, Caribbean governments and civil society organisations continue to make important contributions to the global open data discourse that demonstrate tangible benefits of open data adoption in the lives of Caribbean citizens. These range from research demonstrating the economic impact of open data to community-led initiatives helping to bridge the data gaps that constrain local government planning. In December 2016, Jamaica became the fourth country in the region, after Guyana, the Dominican Republic and Trinidad & Tobago, to indicate its interest in joining the Open Government Partnership, a multilateral initiative consisting of 73 member countries that aims to secure concrete commitments from governments to promote transparency, empower citizens, fight corruption, and harness new technologies to strengthen governance.

Find out how the Caribbean ranks in the full GODI report, to be published on May 2nd.

LITA: Jobs in Information Technology: April 26, 2017

Wed, 2017-04-26 18:59

New vacancy listings are posted weekly on Wednesday at approximately 12 noon Central Time. They appear under New This Week and under the appropriate regional listing. Postings remain on the LITA Job Site for a minimum of four weeks.

New This Week

Yewno, Product Manager, Content, Redwood City, CA

Sonoma County Library, Curator, Wine Library, Healdsburg Regional Library Full-time, Santa Rosa, CA

University of Chicago, Social Sciences Data and Sociology Librarian, Chicago, IL

University of Chicago, Data Research Services and Biomedical Librarian, Chicago, IL

Visit the LITA Job Site for more available jobs and for information on submitting a job posting.

Open Knowledge Foundation: OKI Agile: How to create and manage user stories

Wed, 2017-04-26 16:15

This is the first in a series of blogs on how we are using the Agile methodology at Open Knowledge International. Originating from software development, the Agile manifesto describes a set of principles that prioritise agility in work processes: for example through continuous development, self-organised teams with frequent interactions, and quick responses to change. In this blog series we go into the different ways Agile can be used to work better in teams and to deliver projects more efficiently. The first blog is dedicated to user stories, a popular agile technique.

User stories are a pretty nifty way of gathering requirements in an agile environment, where one of the key values is responding to change over following a plan. They are a good anchor for conversation that can then take place at the right time.

What is a user story?

A user story is a short sentence that encapsulates three things:

  1. Who?
  2. What?
  3. Why?

Notice that this does not include “How?” The “How?” is left to the team delivering the requirement. After all, the team consists of professionals; they know how to deliver the best solution.

The most common way to encapsulate a user story is to use the template:

  • As a [WHO] I want [WHAT] so that [WHY]

Be careful not to sneak any Hows into that template. That usually happens in the What, so stay focussed! Words like by, using or with should be avoided like the plague because they usually result in a How. Basically, avoid anything that has to do with the actual implementation.
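The template also lends itself to a tiny data structure. Here is a toy sketch (class and function names invented for illustration, not from any real tool) that renders the Who/What/Why template and naively flags words that tend to smuggle a How into the What:

```python
# Toy sketch of the As-a/I-want/so-that template, plus a naive check for
# words that usually drag implementation detail into the What.
# (All names here are invented for illustration.)
from dataclasses import dataclass

HOW_WORDS = {"by", "using", "with"}

@dataclass
class UserStory:
    who: str
    what: str
    why: str

    def render(self) -> str:
        return f"As a {self.who} I want {self.what} so that {self.why}"

    def suspect_hows(self) -> list:
        # Flag "by", "using" or "with" in the What for review.
        return sorted(HOW_WORDS & set(self.what.lower().split()))

story = UserStory("government official", "means of transportation",
                  "I can get from A to B quickly")
print(story.render())
# As a government official I want means of transportation so that I can get from A to B quickly

bad = UserStory("government official", "a Range Rover with leather seats",
                "I can get from A to B quickly")
print(bad.suspect_hows())  # ['with']
```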

Bad user stories

  • As a government official I want a Range Rover so that I can get from A to B quickly
      • Problem: A Range Rover is an actual implementation; it might not be what is needed even though it is what’s believed to be desired.
  • As a visitor to a website I want to be able to read a landing page using my spiffy MacBook Air and have the content presented in the Lato typeface, size 14, and with good white space between paragraphs so that I can know what the website is all about
      • Problem: A whole lot! What about GNU/Linux and Windows users? What if there is a better typeface out there? What about the language of the content? The Why isn’t really a why. The list goes on. Don’t go into detail. It’s bad practice and creates more problems than it solves.

Good user stories

  • As a government official I want means of transportation so that I can get from A to B quickly
  • As a website visitor I want to know what the website is about so that I can see how it can help me

Why shouldn’t we go into details?

It’s really quite simple: we expect the requirements to change, and we’d just waste a lot of time going into the details of something that might change or get thrown out. We’re trying to be efficient while still giving the team an understanding of the broader picture. An extreme example: between project start and the time the team tackles a user story, the world might have moved to virtual governments that don’t need transportation any more (technology moves fast).

    The team also consists of experts so they know what works best (if not, why are they tasked to deliver?). The customers are the domain experts so they know best what is needed. In the website visitor example above, the team would know the best way of showing what a website is about (could be a landing page) but the customer knows what the customer is going to offer through the website and how they help people.

We also value individuals and interactions over processes and tools. In an ever-changing requirements environment we want non-detailed requirements which can, when the time comes, be the basis for a conversation about the actual implementation. The team familiarises itself with the requirement at the appropriate time. So when starting work on the transportation user story, the team might discuss with the customer and ask questions like:

  • “How fast is quickly?”
  • “Are A and B in the same city, country, on Earth?”
  • “Are there any policies we need to be aware of?” etc.

Acceptance of user stories

Surely the customer would still want a say in how things get implemented. That’s where acceptance criteria come in. When the time comes, the customer creates a checklist for each user story in a joint meeting, based on discussion. That’s the key thing: it comes out of a discussion.

These criteria tell the team in a bit more detail what they need to fulfill to deliver the requirement (user story). For the government official in need of transport this might be things like:

    • Main area of interest/focus is London area
    • Applicable to/usable in other cities as well
    • Allows preparations for a meeting while in transit
    • Very predictable so travel time can be planned in detail
    • Doesn’t create a distance between me and the people I serve

    Then the implementation team might even pick public transportation to solve this requirement. A Range Rover wasn’t really needed in the end (albeit this would probably go against the “satisfy the customer” principle but hey! I’m teaching you about user stories here! Stay focussed!).
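The acceptance step above amounts to a simple checklist model. A minimal sketch, reusing the criteria from the transport example (the function and variable names are invented for illustration):

```python
# Minimal sketch: acceptance criteria as a checklist attached to a story.
# The customer representative accepts the story only when every criterion
# is met. (Names invented for illustration.)

criteria = {
    "Main area of interest/focus is London area": True,
    "Applicable to/usable in other cities as well": True,
    "Allows preparations for a meeting while in transit": True,
    "Very predictable so travel time can be planned in detail": True,
    "Doesn't create a distance between me and the people I serve": True,
}

def accepted(checklist):
    """A story is accepted only if every criterion checks out."""
    return all(checklist.values())

print(accepted(criteria))  # True: public transport can satisfy every item
```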

    How is this managed?

    One key thing we want to get out of user stories is to not scope the requirement in detail until it becomes clear that it’s definitely going to be implemented. How then do you know what you’ll be doing in the future?

User stories can be of different sizes, from very coarse to very detailed. The very coarse ones don’t even need to be written as user stories; they’re often referred to as epics.

Many break requirements into three stages: the releases, or the projects, or whatever the team works on; each of these can be broken up into features, and each feature can be broken up into tasks. It’s up to the team to decide when it’s best to formulate these as user stories, and it really depends on the team and the project.

    Some might have epics as the big long term project, break that up into user stories, and then break each user story up into tasks. Others might have a single product, with the releases (what you want to achieve in each release: “The geospatial release”) at the top and then have features as sentences (epics) underneath the release and then transform the sentences into user stories you work on.

Whichever way you do it, this is the general guideline for granularity:

  • Coarsest: Long-term plans of what you’ll be doing
  • Mid-range: Delivery in a given time period (e.g. before deadlines)
  • Finest: What the team will deliver in a day or two

    The reason the finest level is in a day or two is to give the team a sense of progress and avoid getting stuck at: “I’m still doing the guildhall thing” which is very demoralizing and inefficient (and not really helpful for others who might be able to help).

There is a notion of the requirements iceberg or pyramid which tries to visualise the three stages. The bottom stage holds the larger, coarser items; the mid-range is what you’re delivering in a time period; and the finest is the smallest blocks of work. That’s what’s going to be “above” the surface for the core team, and that’s still just a fraction of the big picture.

    When should who be involved?

    So the core team has to decide at what stage of the iceberg they want to write the user stories, and that kind of depends on the project, the customer, and the customer’s involvement. So we need to better understand “the team”.

    The core team should always be present and work together. Who is in the core team then? If that’s not clear, there’s a story/joke, about the pig and the chicken, that can guide us:

A pig and a chicken decided to open up a restaurant. They were discussing what name to give the restaurant when the chicken proposed the name Ham & Eggs. The pig turned up its nose and said: “That’s unfair, I’d be committed but you’d only be involved!”

    That’s the critical distinction between the core team and others. The core team is the pigs. Everyone else who is only involved to make the project happen is a chicken. The pigs run the show. The chickens are there to make sure the pigs can deliver.

    Chickens come in various sizes and shapes. It can be team managers (planning persons), unit heads, project managers, biz-dev people, and even customers.

    The term customer is pretty vague. You usually don’t have all your customers involved. Usually you only have a single representative. For bespoke/custom development (work done at the request of someone else), that person is usually the contact person for the client you’re working for.

    At other times the single customer representative is an internal person. That internal individual is sometimes referred to as the product owner (comes from Scrum) and is a dedicated role put in place when there is no single customer, e.g. the product is being developed in-house. That person then represents all customers and has in-depth knowledge about all customers or has access to a focus group or something.

    This individual representative is the contact point for the team. The one who’s there for the team to help them deliver the right thing. More specifically this individual:

    • Creates initial user stories (and drives creation of other user stories)
    • Helps the team prioritise requirements (user stories)
    • Accepts (or rejects) stories when the team delivers
    • Is available to answer any questions the team might have

    So the representative’s role is to provide the implementers with enough domain knowledge to proceed and deliver the right thing. This individual should not have any say in how the core team will implement it. That’s why the team was hired/tasked with delivering it, because they know how to do it. That’s also why user stories do not focus on the how.

    The core team, the pigs, need to decide at what intersections in the iceberg they want to have this representative present (where discussions between the core team and the representative will happen). When they go from coarsest to mid-range or from mid-range to finest. So in a weird sense, the core team decides when the customer representative decides what will be done.

As a rule of thumb: the user stories feed into the stage above the intersection where the representative is present.

    So if the representative helps the team go from coarse to mid-range, the user stories are created for the mid-range stage. If the representative is there for mid-range to finest, the user stories are going to be very fine-grained.

    As a side note, because the chickens are there to make sure the pigs can deliver, they will always have to be available to answer questions. Many have picked up the standup activity from the Scrum process to discuss blockers, and in those cases it’s important that everyone involved, both pigs and chickens, is there so the chickens can act quickly to unblock the pigs.

    Now go and have fun with user stories. They shouldn’t be a burden. They should make your life easier… or at least help you talk to chickens.

    District Dispatch: The fight for library funding is on in the U.S. Senate

    Wed, 2017-04-26 14:04

The Fight for Libraries! campaign has moved to the United States Senate. Today, two “Dear Appropriator” letters began circulating in the Senate, one seeking $186.6 million for the Library Services and Technology Act (LSTA) and the other $27 million for the Innovative Approaches to Literacy (IAL) program for FY 2018. Senators Jack Reed (D-RI) and Susan Collins (R-ME) are again championing funds for LSTA, while Sens. Reed, Grassley (R-IA) and Stabenow (D-MI) are leading the fight for IAL. For more information about each program and the appropriations process, visit our previous posts on this topic or watch our most recent webinar.

    Senators have until May 19 to let our champions know that they will sign the separate LSTA and IAL “Dear Appropriator” letters, so there’s no time to lose. Use ALA’s Legislative Action Center today to contact both of your Senators and ask them to support federal funding for libraries by signing on to both the Reed/Collins LSTA and Reed/Grassley/Stabenow IAL Dear Appropriator letters.

    Many Senators will only sign if their constituents ask them to. Let them know why libraries are important to your community and ask them directly to show their support.

    Last month, library advocates succeeded in convincing a record one-third of all Members of the House to sign the House versions of these LSTA and IAL letters. We need you to keep that momentum going by collectively convincing at least half of all Senators to do the same!

    Given the President’s proposal to eliminate the Institute of Museum and Library Services (IMLS) and virtually all other library funding sources, the support of both your Senators is more important than ever before. Five minutes of your time could help preserve over $210 million in library funding that’s at serious risk.

    To take action, visit the Action Center for additional talking points and easy-to-send email templates.

    Have a few more minutes to invest in the fight for library funding? Here are some fast and enormously helpful things you can do as well:

    1. Share your library’s federal funding story and support for LSTA and IAL on Twitter using the #SaveIMLS hashtag. Tell us how IMLS funding supports your local community through LSTA or other means. (If you aren’t sure which IMLS grants your library has received, you can check the searchable database available on the IMLS website.)
    2. Whether you tweet it or not, tell us your story so we can make sure that your Members of Congress know how federal library funding is working for them and their constituents at home.
    3. Sign up to receive our action alerts so we can let you know when and how to take action, and send you talking points and background information to make that easy, all through the year.
    4. Participate in Virtual Library Legislative Day starting on May 1 and sign up for our Thunderclap.

    Thank you for your indispensable support. Together, we can win the Fight for Libraries!

    The post The fight for library funding is on in the U.S. Senate appeared first on District Dispatch.

    William Denton: York U job: head of science library / physical sciences librarian

    Wed, 2017-04-26 01:06

    At York University Libraries, where I work, there is a search on right now for Physical Sciences Librarian and Head of Steacie Science and Engineering Library.

    The deadline for applications is 2 June 2017. If you know a librarian with a background in the physical sciences who might be looking for a job, please send them the link.

    I’m on the search committee, so I can’t give any tips, but I’ll point out a few things:

    • York University pays well. For historical pay-equity reasons there’s a sort of grid that determines salaries based on the year one got one’s MLIS, so no salary bargaining happens. Someone who got their MLIS in 2007, ten years ago, could expect to make about $120,000.
    • Librarians are in the York University Faculty Association (a union that takes social and progressive issues very seriously) and have academic status.
    • The benefits are good.
    • Americans are welcome to apply. (In Canada health care is publicly funded, etc.)
    • York University is an exciting place to work!
    • The strategic plan mentioned in the ad is a little hard to find on our site, so have a look.
    • There’s an affirmative action plan in place, and in this search we added this to the standard paragraph: “People with disabilities and Aboriginal people are priorities in the York University Libraries Affirmative Action plan and are especially encouraged to apply. Consideration will also be given to those who have followed non-traditional career paths or had career interruptions.” We mean it.

    If you want to find out more about York and what the job would be like, email me at and I can put you in touch with someone not on the search committee.

    Library Tech Talk (U of Michigan): Michigan Daily - From Newspaper Prints To Digital Archives

    Wed, 2017-04-26 00:00

The Michigan Daily Digital Archives is a collaboration between the University of Michigan Library IT division, the Michigan Daily, and the Bentley Historical Library. It provides searchable access to over 300 volumes and 23,000 issues of the digitized student newspaper, from 1891 through 2014. New volumes will be added in the future as they become available. The Library IT team developed a robust discovery interface for the archives, choosing to build its own discovery system rather than use an out-of-the-box application or vendor solution. The development team followed a Scrum-like Agile approach for website development.

    DuraSpace News: A Report from the 2017 DuraSpace Member Summit

    Wed, 2017-04-26 00:00

The annual DuraSpace Summit report to members was held in Albuquerque, New Mexico, on April 4-5, following the CNI Spring Member Meeting. DuraSpace members met to focus on strategy and tactics aimed at broadening and extending the organization’s reach in support of global community ecosystem efforts toward the preservation and accessibility of cultural heritage and academic resources.

    LITA: Beginning Git and GitHub

    Tue, 2017-04-25 17:01

In a new LITA web course, learn how to use the powerful and popular version control and collaboration tools Git and GitHub. Perfect for anyone who works with code or on projects such as web sites, apps, classes, scripts, and presentations.

    Beginning Git and GitHub
    Instructors: Kate Bronstad, Web Developer, Tufts University Libraries; and Heather Klish, Systems Librarian, Tufts University Libraries.
    May 4 – June 1, 2017
    Register here, courses are listed by date and you need to log in

Work smarter, collaborate faster, and share code or other files with the library community using the popular version control system Git. Through a mix of Git fundamentals and hands-on exercises, participants learn the basics of Git, how to use key commands, and how to use GitHub to their advantage, including sharing their own work and building upon the projects of others.

    View details and Register here.

    This is a blended format web course

    The course will be delivered as separate live webinar lectures, one per week. You do not have to attend the live lectures in order to participate. The webinars will be recorded for later viewing.

    Check the LITA Online Learning web page for additional upcoming LITA continuing education offerings.

    Questions or Comments?

For all other questions or comments related to the course, contact LITA at (312) 280-4268 or Mark Beatty.

    LibUX: Listen: On design in user stories and user experience departments (20:48)

    Tue, 2017-04-25 16:04

One new episode of Metric (a user experience podcast) over coffee before a string of interviews to round out the month of April. In this episode:

    • What role does photoshop play in UX?
    • Should “design” be part of a user story?
    • What are the necessary technical abilities for doing UX?
    • What are your thoughts on UX departments?


You can also download the MP3 or subscribe to Metric: A UX Podcast on Overcast, Stitcher, iTunes, YouTube, SoundCloud, Google Music, or just plug our feed straight into your podcatcher of choice.

    Andromeda Yelton: adventures with parsing Django uploaded csv files in python3

    Tue, 2017-04-25 15:46

    Let’s say you’re having problems parsing a csv file, represented as an InMemoryUploadedFile, that you’ve just uploaded through a Django form. There are a bunch of answers on stackoverflow! They all totally work with Python 2! …and lead to hours of frustration if, say, hypothetically, like me, you’re using Python 3.

    If you are getting errors like _csv.Error: iterator should return strings, not bytes (did you open the file in text mode?) — and then getting different errors about DictReader not getting an expected iterator after you use .decode('utf-8') to coerce your file to str — this is the post for you.

    It turns out all you need to do (e.g. in your form_valid, with my_file as the uploaded file object) is:

    my_file.seek(0)
    csv.DictReader(io.StringIO(my_file.read().decode('utf-8')))

    What’s going on here?

    The seek statement ensures the pointer is at the beginning of the file. This may or may not be required in your case. In my case, I’d already read the file in order to validate it, so my file pointer was at the end. You’ll be able to tell that you need to seek() if your csv.DictReader() doesn’t throw any errors, but when you try to loop over the lines of the file you don’t even enter the for loop (e.g. print() statements you put in it never print) — there’s nothing left to loop over if you’re at the end of the file.

    read() gives you the file contents as a bytes object, on which you can call decode().

    decode('utf-8') turns your bytes into a string, with known encoding. (Make sure that you know how your CSV is encoded to start with, though! That’s why I was doing validation on it myself. Unicode, Dammit is going to be my friend here. Even if I didn’t want an excuse to use it because of its title alone. Which I do.)

    io.StringIO() gives you the iterator that DictReader needs, while ensuring that your content remains stringy.

    tl;dr I wrote two lines of code (but eight lines of comments) for a problem that took me hours to solve. Hopefully now you can copy these lines, and spend only a few minutes solving this problem!
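The fix can be sketched end to end with the standard library alone; here io.BytesIO stands in for Django's InMemoryUploadedFile, and the variable names are illustrative:

```python
import csv
import io

# io.BytesIO stands in for the uploaded file: a binary file-like object.
uploaded = io.BytesIO(b"name,count\nalpha,1\nbeta,2\n")
uploaded.read()  # an earlier validation pass leaves the pointer at EOF

uploaded.seek(0)                            # rewind to the start of the file
text = uploaded.read().decode('utf-8')      # bytes -> str, with a known encoding
reader = csv.DictReader(io.StringIO(text))  # str iterator, as DictReader expects

rows = list(reader)
print(rows)  # each row is a dict keyed by the header line
```

Leave out the seek(0) and rows comes back empty; leave out the decode and io.StringIO raises a TypeError for the bytes argument.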

    LITA: LITA Conference Buddy Program

    Tue, 2017-04-25 14:41

    Going to ALA Annual this summer?

    Sign up for the LITA Conference Buddy Program

    The LITA Conference Buddy Program, which was inspired by the GLBTRT Buddy Program, is designed to make conference attendance more approachable, foster inclusion, and build connections between new LITA members and members who have attended past conferences. We hope that this program will leave both participants with a new friend and a new perspective on LITA participation.

    To participate in the program as either a new or experienced conference attendee, get details and complete the sign-up form by June 4, 2017.

    If you have any questions about the program, please contact the Diversity & Inclusion Committee at


    The LITA Diversity & Inclusion Committee

    DuraSpace News: DSpace 7 REST API Demo Available Online

    Tue, 2017-04-25 00:00

    From Michele Mennielli, International Business Developer, 4Science

    HangingTogether: Survey on innovation trends and priorities in European research libraries

    Mon, 2017-04-24 12:38

    OCLC Research is currently conducting a survey on innovation trends and priorities in European research libraries. The survey was sent to library directors at 238 institutions in the UK, the Netherlands, Germany, Austria, Switzerland, Denmark, Spain, France, and Italy.*

    Our aim is to gain intelligence regarding trends, capacities and priorities within the European research library community. This will inform future joint activities between OCLC and the research library community in Europe, and scope the opportunity space for OCLC Research, and the OCLC Research Library Partnership.

    Findings from this survey will reveal trends, capacities and priorities within the research library community. We will produce a report, which can be shared broadly. Additionally, OCLC Research will share the data gathered in this survey, so that others can make their own interpretations.

    Survey marker, by Bidgee (Own work) [CC BY 3.0], via Wikimedia Commons

    If you or a colleague has received the survey, we would be grateful if you would take some time to fill it out — it should take no more than 25 minutes and the results will be valuable both to us, and to the broader community.

    We are grateful to representatives from OCLC EMEA Regional Council who have helped to guide the development of this survey: Hubert Krekels (Wageningen University), Annette le Roux (University of South Africa Library), and Rupert Schaab (State and University Library Göttingen).

    For questions, please contact Merrilee Proffitt.

    *Institutions were chosen from the Times Higher Education World University Rankings 2016-2017


    Open Knowledge Foundation: Making European Subsidy Data Open

    Mon, 2017-04-24 10:13

    One month after releasing a joint project of Open Knowledge Germany and Open Knowledge International, we have some great news to share. Due to the extensive outreach of our platform and the data quality report we published, new datasets have been directly sent to us by several administrations. We have recently added new data for Austria, the Netherlands, France and the United Kingdom. Furthermore, first Romanian data recently arrived and should be available in the near future.

    Now that the platform is up and running, we want to explain how we actually worked on collecting and opening all the beneficiary data. The platform is a tool that enables the user to visualize, analyze and compare subsidy data across the European Union, thereby enhancing transparency and accountability in Europe. To make this happen we first had to collect the datasets from each EU member state and scrape, clean, map and then upload the data. Collecting the data was an incredibly frustrating process, since EU member states publish the beneficiary data in their own country-specific (and regional) portals, which had to be located and often translated.

    A scraper’s nightmare: different websites and formats for every country

    The variety in how data is published throughout the European Union is mind-boggling. Few countries publish information on all three concerned ESIF Funds (ERDF, ESF, CF) in one online portal, while most have separate websites distinguished by funds. Germany provides the most severe case of scatteredness: not only is the data published separately by its regions (Germany’s 16 federal states), but different websites exist for distinct funds (ERDF vs. ESF), leading to a total of 27 German websites. This arguably made the German data collection just as tedious as collecting the data for the entire rest of the EU.

    Once the distinct websites were located through online searches, they often needed to be translated to English to retrieve the data. As mentioned, the data was rarely available in open formats (counting csv, json or xls(x) as open formats) and we had to deal with a large number of PDFs (51) and webapps (15) out of a total of 122 files. The majority of PDF files were extracted using Tabula, which worked fine sometimes and required substantial work with OpenRefine – cleaning misaligned data – for other files. About a quarter of the PDFs could not be scraped using tools, but required hand-tailored scripts by our developer.

    However, PDFs were not our worst nightmare: that was reserved for webapps such as this French app illustrating their 2007-2013 ESIF projects. While the idea of depicting the beneficiary data on a map may seem smart, it often makes the data useless. These apps do not allow for any cross-project analysis and make it very difficult to retrieve the underlying information. For this particular case, our developer had to decompile the Flash app to locate the multiple datasets and scrape the data.

    Open data: political reluctance or technical ignorance?

    These websites often made us wonder what the public servants who planned them were thinking. They already put in substantial effort (and money) when creating such maps, so why didn’t they include a “download data” button? Was it an intentional decision to publish the data but make it difficult to access? Or is the difference between closed and open data formats simply not understood well enough by public servants? Similarly, PDFs always have to be created from an original file, while simply uploading that original CSV or XLSX file could save everyone time and money.

    In our data quality report we recognise that the EU has made progress in this regard with its 2013 regulation mandating that beneficiary data be published in an open format. While publication in open data formats has increased since then, PDFs and webapps remain a tiring obstacle. The EU should ensure the member states’ compliance, because open spending data, and a thorough analysis thereof, can lead to substantial efficiency gains in distributing taxpayer money.

    This blog has been reposted from

    DuraSpace News: Webinar: Powering Linked Data and Hosted Solutions with Fedora

    Mon, 2017-04-24 00:00

    Fedora is a flexible, extensible, open source repository platform that forms a strong base for digital preservation, and supports Linked Data. Fedora is used in a wide variety of institutions around the world, including libraries, museums, archives, and government organizations. Join us for a webinar on Tuesday, May 16 at 9:30am AEST (convert to your timezone) to learn more about Fedora.

    Ed Summers: robots.txt

    Sun, 2017-04-23 04:00

    The Internet Archive does some amazing work in the Sisyphean task of archiving the web. Of course the web is just too big and changes too often for them to archive it all. But Internet Archive’s crawling of the web and serving it up out of their Wayback Machine, plus their collaboration with librarians and archivists around the world make it a truly public service if there ever was one.

    Recently they announced that they are making (or thinking of making) a significant change to the way they archive the web:

    A few months ago we stopped referring to robots.txt files on U.S. government and military web sites for both crawling and displaying web pages (though we respond to removal requests sent to As we have moved towards broader access it has not caused problems, which we take as a good sign. We are now looking to do this more broadly.

    The robots.txt was developed to establish a conversation between web publishers and the crawlers, a.k.a. bots, that come to gather and index content. It allows web publishers to provide guidance to automated agents from companies like Google about what parts of the site to index, and to point to a sitemap that lets them do their job more efficiently. It also allows the web publisher to ask a crawler to slow down with the Crawl-delay directive, if their infrastructure doesn’t support rapid crawling.
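A robots.txt is just a short plain-text file at the site root. A file using the directives described above might look like this (the paths and values are illustrative):

```
# Illustrative robots.txt at https://example.org/robots.txt
User-agent: *
Disallow: /search/
Crawl-delay: 10
Sitemap: https://example.org/sitemap.xml
```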

    Up until now the Internet Archive have used the robots.txt in two ways:

    • their ia_archiver web crawler consults a publisher’s robots.txt to determine what parts of a website to archive and how often
    • the Wayback Machine (the view of the archive) consults the robots.txt to determine what to allow people to view from the archived content it has collected.
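The crawling side of this contract can be sketched with Python's standard library (the rules below are illustrative, not the Internet Archive's actual policy):

```python
from urllib.robotparser import RobotFileParser

# An illustrative robots.txt, parsed in memory rather than fetched from a site.
rules = """\
User-agent: ia_archiver
Disallow: /private/

User-agent: *
Disallow:
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

# The named crawler is asked to stay out of /private/; other agents are not.
print(rp.can_fetch("ia_archiver", "https://example.org/private/page.html"))   # False
print(rp.can_fetch("SomeOtherBot", "https://example.org/private/page.html"))  # True
```

A well-behaved crawler makes exactly this check before fetching a page; ignoring robots.txt simply means skipping it.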

    If the Internet Archive’s blog post is read at face value it seems like they are going to stop doing these things altogether, not just for government websites, but for the entire web. While conversation on Twitter makes it seem like this is a great idea whose time has come, I think this would be a step backwards for the web and for its most preeminent archive, and I hope they will reconsider or take this as an opportunity for a wider discussion.

    I think it’s crucial to look at the robots.txt as an imperfect, but much needed part of a conversation between web publishers and archives of the web. The idea that there is a perfect archive that contains all the things is a noble goal, but it has always been a fantasy. Like all archives the Internet Archive represents only a sliver of a sliver of the thing we call the web. They make all kinds of decisions about what to archive and when, which are black boxed and difficult to communicate. While some people view the robots.txt as nothing better than a suicide note that poorly optimized websites rely on, robots.txt is really just a small toehold in providing transparency about the decisions about what to archive from the web.

    If a website really wants to block the Internet Archive it can still do so by limiting access by IP addresses or by ignoring any clients named ia_archiver. If the Internet Archive starts to ignore robots.txt it pushes the decisions about who and what to archive down into the unseen parts of web infrastructures. It introduces more uncertainty, and reduces transparency. It starts an arms race between the archive and the sites that do not want their content to be archived. It treats the web as one big public information space, and ignores the more complicated reality that there is a continuum between public and private. The idea that Internet Archive is simply a public good obscures the fact that ia_archiver is run by a subsidiary of Amazon, who sell the data, and also make it available to the Internet Archive through a special arrangement. This is a complicated situation and not about a simple technical fix.

    The work and craft of archives is one that respects the rights of content creators and involves them in the process of preservation. Saving things for the long term is an important task that shapes what we know of the past and by extension our culture, history and future. While this process has historically privileged the powerful in society, the web has lowered the barrier to publishing information, and offers us a real opportunity to transform whose voices are present in the archive. While it makes sense to hold our government to a particular standard, the great thing about the web is that not all web publishers are so powerful. It is important that Internet Archive not abandon the idea of a contract between web publishers and the archive.

    Most importantly we don’t know what the fate of the Internet Archive will be. Perhaps some day it will decide to sell its trove of content to a private company and close its doors. That’s why it’s important that we not throw the rights of content creators under the bus, and hold the Internet Archive accountable as well. We need web archives that are partners with web publishers. We need more nuance, understanding and craft in the way we talk about and enact archiving the web. I think archivists and Archive-It subscribers need to step up and talk more about this. Props to the Internet Archive for starting the conversation.

    DPLA: Job Opportunity: Developer (Frontend and Web Applications)

    Fri, 2017-04-21 19:12

    The Digital Public Library of America seeks a full-time Developer to support ongoing work on the DPLA public-facing Web applications.

    We are seeking a curious and enthusiastic individual who recognizes both their technical strengths and areas for growth, who can help us work effectively to further DPLA’s mission to bring together the riches of America’s libraries, archives, and museums, and make them freely available to all. A belief in this mission, and the drive to accomplish it over time in a collaborative spirit within and beyond the organization, is essential.


    Reporting to the Director for Technology, the Developer:

    • Will participate in a high-impact, upcoming site-wide redesign effort
    • Builds out functionality on the DPLA website and internal tools, including tools related to search, content management, and community engagement
    • Customizes and deploys open source software to suit organizational needs
    • Performs other related duties and participates in special projects as assigned.

    As a member of the DPLA Technology Team, the Developer:

    • Contributes to design, development, testing, integration, support, and documentation of user-facing applications and back-end systems.
    • Participates in software engineering team group activities including sprint rituals, code reviews, and knowledge sharing activities.
    • Supports content management policies, process, and workflows, and contributes to the development of new ones.
    • Collaborates with internal and external stakeholders in planning and implementation of applications supporting DPLA’s mission, strategic plan, and special initiatives.
    • Maintains knowledge of emerging technologies to support the DPLA’s evolving services.
    • Embodies and promotes the philosophy of open source, shared, and community-built software and technologies.
    • Brings creative vision around possibilities for work with data that we haven’t yet imagined.


    • 5+ years professional experience in software development or a related discipline.
    • A proven ability to build websites and web applications that target the general public and operate at public scale
    • Experience with server and client webapp languages such as Ruby, Python, and Javascript and associated frameworks.
    • Ability to build user-centric, accessible websites that conform to responsive design principles and work across a variety of devices
    • A passion for writing clean, performant, and testable code.
    • Understanding the importance of continuous integration, automated testing and deployments, and static analysis of code quality
    • Demonstrated experience working effectively in a team environment and the ability to interact well with stakeholders.
    • Desire and enthusiasm about learning new toolsets, programming languages, or methods to support software development.

    Preferred Qualifications

    • A history of collaboration with open source projects
    • Knowledge of client-side JS frameworks like Angular/React
    • A history of work with data centric applications and search-oriented architecture
    • A successful history of working effectively in a geographically-distributed organization.
    • Experience working on an agile team using methodologies such as Scrum and Kanban

    Nice to Have

    • Mobile development experience, particularly using tools that promote reuse of web assets
    • Experience building and using REST APIs and distributed architectures
    • Advanced knowledge of modern Javascript/Typescript
    • Experience working with or in a library, museum, archive or other cultural heritage organization
    • Experience working with multiple formats such as audio, video, ebooks, and newspapers in the browser environment

    This position is full-time. DPLA is a geographically-distributed organization, with headquarters in Boston, Massachusetts. Ideally, this position would be situated in the Northeast Corridor between Washington, D.C. and Boston, but remote work based in other locations will also be considered.

    Like its collection, DPLA is strongly committed to diversity in all of its forms. We provide a full set of benefits, including health care, life and disability insurance, and a retirement plan. Starting salary is commensurate with experience.

    About DPLA

    DPLA connects people to the riches held within America’s libraries, archives, museums, and other cultural heritage institutions. Since launching in April 2013, it has aggregated more than 16 million items from 2,350 institutions. DPLA is a registered 501(c)(3) non-profit.

    To apply, send a letter of interest detailing your qualifications, a resume, and a list of 3 references in a single PDF to First preference will be given to applications received by May 12, 2017, and review will continue until the position is filled.

    David Rosenthal: A decade of blogging

    Fri, 2017-04-21 17:09
    A decade ago today I posted Mass-market scholarly communication to start this blog. Now, 459 posts later, I would like to thank everyone who has read it, and especially those who have commented on it.

    Blogging is useful to me for several reasons:
    • It forces me to think through issues.
    • It prevents me forgetting what I thought when I thought through an issue.
    • It's a much more effective way to communicate with others in the same field than publishing papers.
    • Since I'm not climbing the academic ladder there's not much incentive for me to publish papers anyway, although I have published quite a few since I started LOCKSS.
    • I've given quite a few talks too. Since I started posting the text of a talk with links to the sources it has become clear that it is much more useful to readers than posting the slides.
    • I use the comments as a handy way to record relevant links, and why I thought they were relevant.
    There weren't a lot of posts until 2011, when I started to target one post a week. I thought it would be hard to come up with enough topics, but pretty soon afterwards half-completed or note-form drafts started accumulating. My posting rate has accelerated smoothly since, and most weeks now get two posts. Despite this, I have more drafts lying around than ever.