Feed aggregator

Open Knowledge Foundation: Educators ask for a better copyright

planet code4lib - Wed, 2018-01-17 11:28

This blog has been reposted from the Open Education Working Group page.


Today we, the Open Education Working Group, publish a joint letter initiated by Communia Association for the Public Domain that urgently calls for improvements to the education exception in the proposal for a Directive on Copyright in the Digital Single Market (DSM Directive). The letter is supported by 35 organisations representing schools, libraries and non-formal education, as well as by individual educators and information specialists.


In September 2016 the European Commission published its proposal for a DSM Directive, which included an education exception aimed at improving the legal landscape. The digital age has created new possibilities for educational practices. We need copyright law that enables teachers to provide the best education they are capable of and that fits the needs of teachers in the 21st century. The Directive has the potential to improve copyright.

However, the proposal does not live up to the needs of education. In the letter we explain the changes needed to facilitate the use of copyrighted works in support of education. Education communities need an exception that covers all relevant providers, and which permits a diversity of educational uses of copyrighted content. We listed four main problems with the Commission’s proposal:

#1: A limited exception instead of a mandatory one

The European Commission proposed a mandatory exception, which can be overridden by licenses. As a consequence, the educational exception will still be different in each Member State. Moreover, educators will need help from a lawyer to understand what they are allowed to do.

#2: Remuneration should not be mandatory

Currently most Member States have exceptions for educational purposes that are completely or largely unremunerated. Mandatory payments will change the situation of those educators (or their institutions), which will have to start paying for materials they are now using for free.

#3: Excluding experts

The European Commission’s proposal does not include all important providers of education, as only formal educational establishments are covered by the exception. We note that the European lifelong-learning model underlines the value of informal and non-formal education conducted in the workplace. All of these are excluded from the education exception.

#4: Closed-door policy

The European Commission’s proposal limits digital uses to secure institutional networks and to the premises of an educational establishment. As a consequence, educators will not be able to develop and conduct educational activities in other facilities such as libraries and museums, and they will not be able to use modern means of communication, such as email and the cloud.

To endorse the letter, send an email to […]. If you want to receive updates on developments around copyright and education, sign up for Communia’s newsletter, Copyright Untangled.

You can read the full letter in this blog on the Open Education website or download the PDF.

DuraSpace News: Registration Open for Fedora Camp at NASA

planet code4lib - Wed, 2018-01-17 00:00
Fedora is the robust, modular, open source repository platform for the management and dissemination of digital content. Fedora 4, the latest production version of Fedora, features vast improvements in scalability, linked data capabilities, research data support, modularity, ease of use and more. Fedora Camp offers everyone a chance to dive in and learn all about Fedora. The Fedora team will offer a Camp from Wednesday, May 16 to Friday, May 18, 2018 at the NASA Goddard Space Flight Center in Greenbelt, Maryland, outside of Washington, D.C.

Library of Congress: The Signal: From Code to Colors: Working with the JSON API

planet code4lib - Tue, 2018-01-16 21:26

The following is a guest post by Laura Wrubel, software development librarian with George Washington University Libraries, who has joined the Library of Congress Labs team during her research leave.

The Library of Congress website has an API (“application programming interface”) which delivers the content for each web page. What’s kind of exciting is that in addition to providing HTML for the website, all of that data, including the digitized collections, is available publicly in JSON format, a structured format that you can parse with code or transform into other formats. With an API, you can do things like:

  • build a dataset for analysis, visualization, or mapping
  • dynamically include content from a website in your own website
  • query for data to feed a Twitter bot

This opens up the possibility for a person to write code that sends queries to the API in the form of URLs or “requests,” just like your browser makes. The API returns a “response” in the form of structured data, which a person can parse with code. Of course, if there were already a dataset available to download that would be ideal. David Brunton explains how bulk data is particularly useful in his talk “Using Data from Historical Newspapers.” Check out LC for Robots for a growing list of bulk data currently available for download.
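The request and response cycle described above can be sketched in a few lines of Python. This is a minimal illustration, not the Library's documented client: the `fo=json` and `c` query parameters, the `results` key, and the `title` and `url` fields are assumptions about the shape of the JSON API, and the helper names are hypothetical. The sample response is made up so the parsing step can be tried without a network connection.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_request_url(base_url, **params):
    """Build a request URL that asks for JSON instead of HTML.

    Assumption: the API switches to JSON when `fo=json` is in the query string.
    """
    query = {"fo": "json"}
    query.update(params)
    return base_url + "?" + urlencode(query)


def fetch(url):
    """Send the request, as a browser would, and return the response body as text."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


def parse_results(response_text):
    """Parse the structured response into (title, url) pairs.

    Assumption: items live under a top-level "results" list.
    """
    data = json.loads(response_text)
    return [(item.get("title"), item.get("url")) for item in data.get("results", [])]


if __name__ == "__main__":
    # Hypothetical collection URL and page-size parameter, for illustration only.
    url = build_request_url("https://www.loc.gov/collections/baseball-cards/", c=25)
    print(url)

    # Made-up sample response; replace with fetch(url) to query the live API.
    sample = '{"results": [{"title": "Card A", "url": "https://example.org/a"}]}'
    for title, item_url in parse_results(sample):
        print(title, item_url)
```

Keeping URL construction, fetching, and parsing in separate functions makes it easy to test the parsing against saved responses, which matters for an API that is a work in progress and may change.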

I’ve spent some of my time while on research leave creating documentation for the JSON API. It’s worth keeping in mind that the JSON API is a work in progress and subject to change. But even though it’s unofficial, it can be a useful access point for researchers. I had a few aims in this documentation project: make more people aware of the API and the data available from it, remove some of the barriers to using it by providing examples of queries and code, and demonstrate some ways to use it for analysis. I approached this task keeping in mind a talk I heard at PyCon 2017, Daniele Procida’s “How documentation works, and how to make it work for your project” (also available as a blog post), which classifies documentation into four categories: reference, tutorials, how-to, and explanation. This framing can be useful in making sure your documentation is best achieving its purpose. The JSON API documentation is reference documentation, and points to Jupyter notebooks for Python tutorials and how-to code. If you have ideas about which additional “how-to” guides and tutorials would be useful, I’d be interested to hear them!

At the same time that I was digging into the API, I was working on some Jupyter notebooks with Python code for creating image datasets, for both internal and public use. I became intrigued by the possibilities of programmatic access to thumbnail images from the Library’s digitized collections. I’ve had color on my mind as an entry point to collections since I saw Chad Nelson’s DPLA Color Browse project at DPLAfest in 2015.

So as an experiment, I created Library of Congress Colors.

View of colors derived from the Library of Congress Baseball Cards digital collection

The app displays six color swatches, based on cluster analysis, from each of the images in selected collections. Most of the collections have thousands of images, so it’s striking to see the patterns that emerge as you scroll through the color swatches (see Baseball Cards, for example). It also reveals how characteristics of the images can affect programmatic analysis. For example, many of the digitized images in the Cartoons and Drawings collection include a color target, which was a standard practice when creating color transparencies. Those transparencies were later scanned for display online. While useful for assessing color accuracy, the presence of the target interferes with color analysis of the cartoon, so you’ll see colors from that target pop up in the color swatches for images in that collection. Similarly, mattes, frames, and other borders in the image can skew the analysis. As an example, click through the color bar below to see the colors in the original cartoon by F. Fallon in the Prints and Photographs Division.

A color swatch impacted by the presence of the color bar photographed near the cartoon in the Prints and Photographs collection
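The post does not say which clustering method the app uses, so the following is only a sketch of one common approach: k-means over an image's RGB pixels, with the six cluster centers serving as swatches. The function names, the default of 20 iterations, and the fixed seed are all assumptions made for illustration. Pixels are plain `(r, g, b)` tuples, so the sketch runs without an imaging library.

```python
import random


def kmeans_colors(pixels, k=6, iterations=20, seed=0):
    """Cluster (r, g, b) tuples with k-means and return the cluster centers.

    Assumption: plain k-means stands in for the app's unspecified cluster analysis.
    """
    rng = random.Random(seed)
    distinct = sorted(set(pixels))
    k = min(k, len(distinct))  # cannot have more clusters than distinct colors
    centroids = [tuple(float(c) for c in p) for p in rng.sample(distinct, k)]

    for _ in range(iterations):
        # Assignment step: each pixel joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in pixels:
            nearest = min(
                range(k),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
            )
            clusters[nearest].append(p)

        # Update step: move each centroid to the mean of its members.
        updated = []
        for i, members in enumerate(clusters):
            if members:
                n = len(members)
                updated.append(tuple(sum(channel) / n for channel in zip(*members)))
            else:
                updated.append(centroids[i])  # keep an empty cluster where it was

        if updated == centroids:
            break  # converged
        centroids = updated

    return [tuple(round(c) for c in centroid) for centroid in centroids]


def to_hex(rgb):
    """Format an (r, g, b) tuple as a CSS-style hex color."""
    return "#{:02x}{:02x}{:02x}".format(*rgb)


if __name__ == "__main__":
    # Tiny made-up "image": mostly reds and blues.
    pixels = [(250, 0, 0), (254, 0, 0), (0, 0, 250), (0, 0, 254)]
    print([to_hex(c) for c in kmeans_colors(pixels, k=2)])
```

A sketch like this also shows why a color target or a matte skews the result: every pixel counts equally, so a border or calibration strip that covers enough of the scan will claim one or more of the six clusters.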

This project was a fun way to visualize the collection while testing the API, and I’ve benefited from working with the National Digital Initiatives team as I developed the project. They and their colleagues have been a source of ideas for how to improve the visualization, connected me with people who understand the image formats, and provided LC Labs Amazon Web Services storage for making the underlying data sets downloadable by others. We’ve speculated about the patterns that emerge in the colors and have dozens more questions about the collections from exploring the results.

View of colors derived from the Library of Congress Works Progress Administration (WPA) poster digital collection

There’s something about color that is delightful and inspiring. Since I’ve put the app out there, I’ve heard ideas from people about using the colors to inspire embroidery, select paint colors, or think about color in design languages. I’ve also heard from people excited to see Python used to explore library collections and view an example of using a public API. I, myself, am curious to see what people may find as they explore Library of Congress collections as data and use the JSON API or one of the many other APIs to create their own data sets. What could LC Labs do to help with this? What would you like to see?

District Dispatch: UPDATE: 50 Senators support CRA to restore Net Neutrality

planet code4lib - Tue, 2018-01-16 17:59

Senate legislation to restore 2015’s strong, enforceable net neutrality rules now has bipartisan support from 50 of 100 senators and would be assured of passage if just one more Republican backs the effort. The bill is a Congressional Review Act (CRA) resolution from Sen. Ed Markey (D-MA), which would block the Federal Communications Commission’s (FCC) December repeal of net neutrality rules.

The measure is backed by all 49 members of the Senate Democratic caucus, including 47 Democrats and two independents who caucus with Democrats. Sen. Susan Collins (R-ME) is the only Republican to support the bill so far, and supporters are trying to secure one more Republican vote. A successful CRA vote, in this case, would invalidate the FCC’s net neutrality repeal and prevent the FCC from issuing a similar repeal in the future. But the Senate action needs a counterpart in the House, and this Congressional action would be subject to Presidential approval.

ALA is working with allies to encourage Congress to overturn the FCC’s egregious action. Email your members of Congress today and ask them to use a Joint Resolution of Disapproval under the CRA to repeal the December 2017 FCC action and restore the 2015 Open Internet Order protections.

We will continue to update you on the activities above and other developments as we continue to work to preserve a neutral internet.

The post UPDATE: 50 Senators support CRA to restore Net Neutrality appeared first on District Dispatch.

pinboard: Availability Calendar - Kalorama Guest House

planet code4lib - Tue, 2018-01-16 17:55

David Rosenthal: Not Really Decentralized After All

planet code4lib - Tue, 2018-01-16 16:00
Here are two more examples of the phenomenon that I've been writing about ever since Economies of Scale in Peer-to-Peer Networks more than three years ago: centralized systems built on decentralized infrastructure in ways that nullify the advantages of decentralization.

Open Knowledge Foundation: A lookback on 2017 with OK Brazil

planet code4lib - Tue, 2018-01-16 09:30

This blog has been written by Natalia Mazotte and Ariel Kogan, co-directors of Open Knowledge Brazil (OKBR). It has been translated from the original version at by Juliana Watanabe, volunteer of OKBR.

For us at Open Knowledge Brazil (OKBR), the year 2017 was filled with partnerships, support for and participation in events, and projects and mobilisation campaigns. In this blog we have selected some of the highlights. There is also news about the team: the journalist Natália Mazotte, who was already leading Escola de Dados (School of Data) in Brazil, became co-director alongside Ariel Kogan (executive director since July 2016).

Photo: Engin_Akyurt / Creative Commons CC0


At the beginning of the year, OKBR and several other organizations introduced the Manifest for Digital Identification in Brazil. The purpose of the Manifest is to be a tool for society to take a stand on the privacy and safety of citizens’ personal data and to make digital identification safe, fair and transparent.

We monitored one of the main challenges in the city of São Paulo and contributed to the mobilisation around it. Along with other civil society organisations, we urged the City Hall of São Paulo to be transparent about mobility. The reason: on 25 January 2017, the first day of the new increase to the speed limits on Marginais Pinheiros and Tietê, we noticed that several news items about the decrease in traffic accidents linked to the policy of reducing speed in certain parts of the city were unavailable on the site of the Traffic Engineering Company (CET).

For a few months, we conducted a series of webinars called OKBR Webinar Series, about open knowledge around the world. We had the participation of the following experts: Bart Van Leeuwen, entrepreneur; Paola Villareal, Fellow from the Berkman Klein Center, designer/data scientist; Fernanda Campagnucci, journalist and analyst of public policies; and Rufus Pollock, founder of Open Knowledge International.

We took part in a major victory for society! Along with the Movimento pela Transparência Partidária (Movement for Partisan Transparency), we mobilised against the proposal on hidden campaign contributions made by the rapporteur of the political reform, congressman Vicente Cândido (PT-SP), and the result was very positive. Besides us, a variety of organisations and movements took part in this initiative against hidden donations: we published and handed out a public statement. The impact was huge: as a consequence, the rapporteur announced the withdrawal of secret donations.

We also participated in #NãoValeTudo, a collective effort to discuss the correct use of technology for electoral purposes, along with AppCívico, Instituto Update and Instituto Tecnologia e Equidade.


We ran two cycles of OpenSpending. The first cycle began in January and involved 150 municipalities. In July, we published the report of cycle 1. In August, we started the second cycle of the game with something new: Guaxi, a robot that served as a digital assistant to competitors. It is an expert bot developed with chatbot technology that simulates human interaction with users. This made the journey through the OpenSpending page on Facebook easier. The report of the second cycle is available here.

Together with the Board of Assessment of Public Policies from FGV/DAPP, we released the Brazilian edition of the Open Data Index (ODI). In total, we built three surveys: Open Data Index (ODI) Brazil at the national level, and ODI São Paulo and ODI Rio de Janeiro at the municipal level. Months later, we closed the survey “Do you want to build the index of Open Data of your city?” and the result was very positive: 216 people expressed interest in conducting the survey voluntarily in their town!

In this first cycle of decentralization and expansion of the ODI to Brazilian municipalities, we conducted an experiment with the first group: Arapiraca/AL, Belo Horizonte/MG, Bonfim/RR, Brasília/DF, Natal/RN, Porto Alegre/RS, Salvador/BA, Teresina/PI, Uberlândia/MG, Vitória/ES. We offered training for the local leaders, provided by the staff of the Open Data Index (FGV/DAPP – OKBR), so that they can carry out the survey required to develop the index. In 2018, we’ll show the results and introduce the reports with concrete opportunities for the towns to move forward on the agenda of transparency and open data.

We launched LIBRE, a microfinance project for journalism, in a partnership between Open Knowledge Brazil and Flux Studio, with involvement from AppCivico too. It is a microfinance tool for content, aimed at members of the public who want to value and sustain journalism and quality content. Currently, the first portals are testing the platform in a pilot phase.


We supported the events of Open Data Day in many Brazilian cities, as well as the Hackathon da Saúde (Health Hackathon), an action of the São Paulo City Hall in partnership with SENAI and AppCívico, and participated in the Hack In Sampa event at the City Council of São Paulo.

Natália Mazotte, co-director of OKBR, participated in AbreLatam and ConDatos, annual events which have become the main meeting point regarding open data in Latin America and the Caribbean. It is a time to talk about the status and the impact in the entire region. We also participated in the 7th edition of the Web forum in Brazil with the workshop “Open patterns and access to information: prospects and challenges of the government open data”. Along with other organizations, we organized the Brazilian Open Government meeting.

The School of Data, in partnership with Google News Lab, organised the second edition of the Brazilian Conference of Journalism of Data and Digital Methods (Coda.Br). We were one of the partner organisations for the first Course of Open Government for leadership in Climate, Forest and Farming, initiated by Imaflora and supported by the Climate and Land Use Alliance (CLUA).

We were the focal point of the research “Foundations of the open code as social innovators in emerging economies: a case study in Brazil”, by Clément Bert-Erboul, a specialist in economic sociology, and Professor Nicholas Vonortas.

And more to come in 2018

We would like to thank you for following and taking part in OKBR’s work in 2017. We’re counting on you in 2018. Beyond our plan for the next year, we have the challenge and the responsibility to contribute during the election period so that Brazil makes progress on the agendas of transparency, opening public information, democratic participation, integrity and the fight against corruption.

If you want to stay updated on the news and the progress of our projects, you can follow us on our Blog, Twitter and Facebook.

A wonderful 2018 for all of us!

The Open Knowledge Brazil team.

Ed Summers: Programmed Visions

planet code4lib - Tue, 2018-01-16 05:00

I’ve been meaning to read Wendy Hui Kyong Chun for some time now. Updating to Remain the Same is on my to-read list, but I recently ran across a reference to Programmed Visions: Software and Memory in Rogers (2017), which I wrote about previously, and thought I would give it a quick read beforehand.

Programmed Visions is a unique mix of computing history, media studies and philosophy that analyzes the ways in which software has been reified or made into a thing. I’ve begun thinking about using software studies as a framework for researching the construction and operation of web archives, and Chun lays a theoretical foundation that could be useful for critiquing the very idea of software, and investigating its performative nature.

Programmed Visions contains a set of historical case studies that it draws on as sites for understanding computing. Chun looks at early modes of computing involving human computers (ENIAC), which served as a prototype for what she calls “bureaucracies of computing” and for the psychology of command and control that is built into the performance of computing. Other case studies involving the Memex, the Mother of All Demos, and John von Neumann’s use of biological models of memory as metaphors for computer memory in the EDVAC are described in great detail, and connected together in quite a compelling way. The book is grounded in history but often has a poetic quality that is difficult to summarize. On the meta level, Chun’s use of historical texts is quite thorough, and it’s a nice example of how research can be conducted in this area.

There are two primary things I will take away from Programmed Visions. The first is how software, the very idea of source code, is itself achieved through metaphor, where computing is a metaphor for metaphor itself. Using higher-level programming languages gives software the appearance of commanding the computer. However, the source code is deeply entangled with the hardware itself: it is interpreted and compiled by yet more software, and ultimately reduced to fluctuations of voltage in circuitry. The source code and software cannot be extracted from this performance of computing. The separation of software from hardware is an illusion that was achieved in the early days of computing. Any analysis of software must include the computing infrastructures that make the metaphor possible. Chun chooses an interesting passage from Dijkstra (1970) to highlight the role that source code plays:

In the remaining part of this section I shall restrict myself to programs written for a sequential machine and I shall explore some of the consequences of our duty to use our understanding of a program to make assertions about the ensuing computations. It is my (unproven) claim that the ease and reliability with which we can do this depends critically upon the simplicity of the relation between the two, in particular upon the nature of sequencing control. In vague terms we may state the desirability that the structure of the program text reflects the structure of the computation. Or, in other terms, “What can we do to shorten the conceptual gap between the static program text (spread out in “text space”) and the corresponding computations (evolving in time)?” (p. 21)

Here Dijkstra is talking about the relationship between text (source code) and a performance in time by the computing machinery. It is interesting to think not only about how the gap can be reduced, but also how the text and the performance can fall out of alignment. Of course bugs are the obvious way that things can get misaligned: I instructed the computer to do X but it did Y. But as readers of source code we have expectations about what code is doing, and then there is the resulting complex computational performance. The two are one, and it’s only our mental models of computing that allow us to see a thing called software. Programmed Visions explores the genealogy of those models.

The other striking thing about Programmed Visions is what Chun says about memory. Von Neumann popularized the idea of computer memory using work by McCulloch that relates the nervous system to voltages through the analogy of neural nets. On a practical level, this metaphor allowed instructions that were previously on cards, or in the movements of computer programmers wiring circuits, to be moved into the machine itself. The key point Chun makes here is that Von Neumann’s use of biological metaphors for computing allows him to conflate memory and storage. It is important that this biological metaphor, the memory organ, was science fiction: there was no known memory organ at the time.

The discussion is interesting because it connects with ideas about memory going back to Hume and forward to Bowker (2005). Memories can be used to make predictions, but cannot be used to fully reconstruct the past. Memory is a process of deletion, but always creates the need for more:

If our machines’ memories are more permanent, if they enable a permanence that we seem to lack, it is because they are constantly refreshed–rewritten–so that their ephemerality endures, so that they may “store” the programs that seem to drive them … This is to say that if memory is to approximate something so long lasting as storage, it can do so only through constant repetition, a repetition that, as Jacques Derrida notes, is indissociable from destruction (or in Bush’s terminology, forgetting). (p. 170)

In the elided section above Chun references Kirschenbaum (2008) to stress that she does not mean to imply that software is immaterial. Instead Chun describes computer memory as undead, neither alive nor dead but somewhere in between. The circuits need to be continually electrically performed for the memory to be sustained and alive. The requirement to keep the bits moving reminds me of Kevin Kelly’s idea of movage, and anticipates (I think?) Chun (2016). This (somewhat humorous) description of computer memory as undead reminded me of the state that archived web content is in. For example, when viewing content in the Wayback Machine it’s not uncommon to run across failing links, missing resources, and a lack of interactivity (search) that was once there. Also, it’s possible to slip around in time as pages are traversed that have been stored at different times. How is this the same as, and different from, traditional archives of paper, where context is lost as well?

So I was surprised in the concluding chapter when Chun actually talks about the Internet Archive’s Wayback Machine (IWM) on pp 170-171. I guess I shouldn’t have been surprised, but the leap from Von Neumann’s first articulation of modern computer architecture forwards to a world with a massively distributed Internet and World Wide Web was a surprise:

The IWM is necessary because the Internet, which is in so many ways about memory, has, as Ernst (2013) argues, no memory–at least not without the intervention of something like the IWM. Other media do not have a memory, but they do age and their degeneration is not linked to their regeneration. As well, this crisis is brought about because of this blinding belief in digital media as cultural memory. This belief, paradoxically, threatens to spread this lack of memory everywhere and plunge us negatively into a way-wayback machine: the so-called “digital dark age.” The IWM thus fixes the Internet by offering us a “machine” that lets us control our movement between past and future by regenerating the Internet at a grand scale. The Internet Wayback Machine is appropriate in more ways than one: because web pages link to, rather than embed, images, which can be located anywhere, and because link locations always change, the IWM preserves only a skeleton of a page, filled with broken–rendered–links and images. The IWM, that is, only backs up certain data types. These “saved” are not quite dead, but not quite alive either, for their proper commemoration requires greater effort. These gaps not only visualize the fact that our constant regenerations affect what is regenerated, but also the fact that these gaps–the irreversibility of this causal programmable logic– are what open the World Wide Web as archive to a future that is not simply stored upgrades of the past. (p. 171-172)

I think some things have improved somewhat since Chun wrote those words, but her essential observation remains true: the technology that furnishes the Wayback Machine is oriented around a document based web, where representations of web resources are stored at particular points in time and played back at other points in time. The software infrastructures that generated those web representations are not part of the archive, and so the archive is essentially in an undead state–seemingly alive, but undynamic and inert. It’s interesting to think about how traditional archives have similar characteristics though: the paper documents that lack adequate provenance, or media artifacts that can be digitized but no longer played. We live with the undead in other forms of media as well.

One of my committee members recently asked for my opinion on why people often take the position that since content is digital we can now keep it all. The presumption is that we keep all data online or in near or offline storage and then rely on some kind of search to find it. I think Chun hits on part of the reason this might be when she highlights how memory has been conflated with storage. For some, the idea that some data is stored is equivalent to its having been remembered as well. But it’s actually in the exercise of the data, its use, or being accessed that memory is activated. This position that everything can be remembered because it is digital has its economic problems, but it is an interesting little philosophical conundrum that will be important to keep in the back of my mind as I continue to read about memory and archives.


Bowker, G. C. (2005). Memory practices in the sciences (Vol. 205). Cambridge, MA: MIT Press.

Chun, W. H. K. (2016). Updating to remain the same: Habitual new media. MIT Press.

Dijkstra, E. W. (1970). Notes on structured programming. Technological University, Department of Mathematics.

Ernst, W. (2013). Digital memory and the archive. In J. Parikka (Ed.) (pp. 113–140). University of Minnesota Press.

Kirschenbaum, M. G. (2008). Mechanisms: New media and the forensic imagination. MIT Press.

Rogers, R. (2017). Doing web history with the internet archive: Screencast documentaries. Internet Histories, 1–13.

David Rosenthal: The Internet Society Takes On Digital Preservation

planet code4lib - Mon, 2018-01-15 16:01
Another worthwhile initiative comes from The Internet Society, through its New York chapter. They are starting an effort to draw attention to the issues around digital preservation. Shuli Hallack has an introductory blog post entitled Preserving Our Future, One Bit at a Time. They kicked off with a meeting at Google's DC office labeled as being about "The Policy Perspective". It was keynoted by Vint Cerf with respondents Kate Zwaard and Michelle Wu. I watched the livestream. Overall, I thought that the speakers did a good job despite wandering a long way from policies, mostly in response to audience questions.

Vint will also keynote the next event, at Google's NYC office February 5th, 2018, 5:30PM – 7:30PM. It is labeled as being about "Business Models and Financial Motives" and, if that's what it ends up being about, it should be very interesting and potentially useful. I hope to catch the livestream.

District Dispatch: Tax season is here: How libraries can help communities prepare

planet code4lib - Fri, 2018-01-12 14:52
This blog post, written by Lori Baux of the Computer & Communications Industry Association, is one in a series of occasional posts contributed by leaders from coalition partners and other public interest groups that ALA’s Washington Office works closely with. Whatever the policy – copyright, education, technology, to name just a few – we depend on relationships with other organizations to influence legislation, policy and regulatory issues of importance to the library field and the public.

It’s hard to believe, but as the holiday season comes to an end, tax season is about to begin.

Over the decades, public libraries have become unparalleled resources in their communities, far beyond their traditional, literary role. Libraries assist those who need it most by providing free Internet access, offering financial literacy classes, job training, employment assistance and more. And for decades, libraries have served as a critical resource during tax season.

Each year, more and more Americans feel as though they lack the necessary resources to confidently and correctly file their taxes on time. This is particularly true for moderate and lower-income individuals and families who are forced to work multiple jobs just to make ends meet. The question is “where is help available?”

Libraries across the country are stepping up their efforts to assist local taxpayers in filing their taxes for free. Many libraries offer in-person help, often serving as a Volunteer Income Tax Assistance (VITA) location or AARP Tax-Aide site. But appointments often fill up quickly, and many communities are without much, if any, free in-person tax assistance.

There is an option for free tax prep that libraries can provide—and with little required from already busy library staff. The next time that a local individual or family comes looking for a helping hand with tax preparation, libraries can guide them to a free online tax preparation resource—IRS Free File:

  • Through the Free File Program, those who earned $66,000 or less last year—over 70 percent of all American taxpayers—are eligible to use at least one of 12 brand-name tax preparation software products to file their Federal (and in many cases, state) taxes completely free of charge. Free File starts on January 12, 2018.
  • Free File complements local VITA programs, where people can get in-person help from IRS-certified volunteers. There are over 12,000 VITA programs across the country to help people in your community maximize their refund and claim all the credits that they deserve, including the Earned Income Tax Credit (EITC). Any individual making under $54,000 annually may qualify.

With help from libraries and volunteers across the nation, we can work together to ensure that as many taxpayers as possible have access to the resources and assistance that they need to file their returns.

The Computer & Communications Industry Association (CCIA) hosts a website that provides resources to inform and assist eligible taxpayers with filing their taxes, including fact sheets, flyers and traditional and social media outreach tools. CCIA also encourages folks to download the IRS2Go app on their mobile phones.

Thanks to help from libraries just like yours, we can help eligible taxpayers prepare and file their tax returns on time and free of charge.

Lori Baux is Senior Manager for Grassroots Programs, directing public education and outreach projects on behalf of the Computer & Communications Industry Association (CCIA), an international not-for-profit membership organization dedicated to innovation and enhancing society’s access to information and communications.

The post Tax season is here: How libraries can help communities prepare appeared first on District Dispatch.

Open Knowledge Foundation: New edition of Data Journalism Handbook to explore journalistic interventions in the data society

planet code4lib - Fri, 2018-01-12 09:48

This blog has been reposted from

The first edition of The Data Journalism Handbook has been widely used and widely cited by students, practitioners and researchers alike, serving as both textbook and sourcebook for an emerging field. It has been translated into over 12 languages – including Arabic, Chinese, Czech, French, Georgian, Greek, Italian, Macedonian, Portuguese, Russian, Spanish and Ukrainian – and is used for teaching at many leading universities, as well as teaching and training centres around the world.

A huge amount has happened in the field since the first edition in 2012. The Panama Papers project undertook an unprecedented international collaboration around a major database of leaked information about tax havens and offshore financial activity. Projects such as The Migrants’ Files, The Guardian’s The Counted and ProPublica’s Electionland have shown how journalists are not just using and presenting data, but also creating and assembling it themselves in order to improve data journalistic coverage of issues they are reporting on.

The Migrants’ Files saw journalists in 15 countries work together to create a database of people who died in their attempt to reach or stay in Europe.

Changes in digital technologies have enabled the development of formats for storytelling, interactivity and engagement with the assistance of drones, crowdsourcing tools, satellite data, social media data and bespoke software tools for data collection, analysis, visualisation and exploration.

Data journalists are not simply using data as a source; they are also increasingly investigating, interrogating and intervening around the practices, platforms, algorithms and devices through which it is created, circulated and put to work in the world. They are creatively developing techniques and approaches which are adapted to very different kinds of social, cultural, economic, technological and political settings and challenges.

Five years after its publication, we are developing a revised second edition, which will be published as an open access book with an innovative academic press. The new edition will be significantly overhauled to reflect these developments. It will complement the first edition with an examination of the current state of data journalism which is at once practical and reflective, profiling emerging practices and projects as well as their broader consequences.

“The Infinite Campaign” by Sam Lavigne (New Inquiry) repurposes ad creation data in order to explore “the bizarre rubrics Twitter uses to render its users legible”.

Contributors to the first edition include representatives from some of the world’s best-known newsrooms and data journalism organisations, including the Australian Broadcasting Corporation, the BBC, the Chicago Tribune, Deutsche Welle, The Guardian, the Financial Times, Helsingin Sanomat, La Nacion, the New York Times, ProPublica, the Washington Post, the Texas Tribune, Verdens Gang, Wales Online, Zeit Online and many others. The new edition will include contributions from both leading practitioners and leading researchers of data journalism, exploring a diverse constellation of projects, methods and techniques in this field from voices and initiatives around the world. We are working hard to ensure a good balance of gender, geography and themes.

Our approach in the new edition draws on the notion of “critical technical practice” from Philip Agre, which he formulates as an attempt to have “one foot planted in the craft work of design and the other foot planted in the reflexive work of critique” (1997). Similarly, we wish to provide an introduction to a major new area of journalism practice which is at once critically reflective and practical. The book will offer reflection from leading practitioners on their experiments and experiences, as well as fresh perspectives on the practical considerations of research on the field from leading scholars.

The structure of the book reflects different ways of seeing and understanding contemporary data journalism practices and projects. The introduction highlights the renewed relevance of a book on data journalism in the current so-called “post-truth” moment, examining the resurgence of interest in data journalism, fact-checking and strengthening the capacities of “facty” publics in response to fears about “alternative facts” and the speculation about a breakdown of trust in experts and institutions of science, policy, law, media and democracy. As well as reviewing a variety of critical responses to data journalism and associated forms of datafication, it looks at how this field may nevertheless constitute an interesting site of progressive social experimentation, participation and intervention.

The first section on “data journalism in context” will review histories, geographies, economics and politics of data journalism – drawing on leading studies in these areas. The second section on “data journalism practices” will look at a variety of practices for assembling data, working with data, making sense with data and organising data journalism from around the world. This includes a wide variety of case studies – including the use of social media data, investigations into algorithms and fake news, the use of networks, open source coding practices and emerging forms of storytelling through news apps and data animations. Other chapters look at infrastructures for collaboration, as well as creative responses to disappearing data and limited connectivity. The third and final section on “what does data journalism do?” examines the social life of data journalism projects, including everyday encounters with visualisations, organising collaborations across fields, the impacts of data projects in various settings, and how data journalism can constitute a form of “data activism”.

As well as providing a rich account of the state of the field, the book is also intended to inspire and inform “experiments in participation” between journalists, researchers, civil society groups and their various publics. This aspiration is partly informed by approaches to participatory design and research from science and technology studies as well as more recent digital methods research. Through the book we thus aim to explore not only what data journalism initiatives do, but how they might be done differently in order to facilitate vital public debates about both the future of the data society and the significant global challenges that we currently face.

LITA: This is Jeopardy! Or, How Do People Actually Get On That Show?

planet code4lib - Thu, 2018-01-11 20:55

This past November, American Libraries published a delightful article on librarians who have appeared on the iconic game show Jeopardy! It turns out one of our active LITA members also recently appeared on the show. Here’s her story…

On Wednesday, October 18th, one of my lifelong dreams will come true: I’ll be a contestant on Jeopardy!

It takes several steps to get onto the show: first, you must pass an online exam, but you don’t really learn the results unless you make it to the next stage: the invitation to audition. This step is completed in person, comprising a timed, written test, playing a mock game with other aspiring players in front of a few dozen other auditionees, and chatting amiably in a brief interview, all while being filmed. If you make it through this gauntlet, you go into “the pool”, where you remain eligible for a call to be on the show for up to 18 months. Over the course of one year of testing and eligibility, around 30,000 people take the first test, around 1,500 to 1,600 people audition in person, and around 400 make it onto the show each season.

For me, the timeline was relatively quick. I tested online in October 2016, auditioned in January 2017, and thanks to my SoCal address, I ended up as a local alternate in February. Through luck of the draw, I was the leftover contestant that day. I didn’t tape then, but was asked back directly to the show for the August 3rd recording session, which airs from October 16th to October 20th.

The call is early – 7:30am – and the day’s twelve potential contestants take turns with makeup artists while the production team covers paperwork, runs through those interview stories one-on-one, and pumps up the contestants to have a good time. Once you’re in, you’re sequestered. There’s no visiting with family or friends who accompanied you to the taping and no cellphones or internet access allowed. You do have time to chat with your fellow contestants, who are all whip smart, funny, and generally just as excited as you are to get to be on this show. There’s also no time to be nervous or worried: you roll through the briefing onto the stage for a quick run-down on how the podiums work (watch your elbows for the automated dividers that come up for Final Jeopardy!), how to buzz in properly (there’s a light around the big game board that you don’t see at home that tells you when you can ring in safely), and under no circumstances are you to write on the screen with ANYTHING but that stylus!

Next, it’s time for your Hometown Howdy, the commercial blurb that airs on the local TV station for your home media market. Since I’d done it before when I almost-but-not-quite made it on the air in February, I knew they were looking for maximum cheese. My friends and family tell me that I definitely delivered.

Immediately before they let in the live studio audience for seating, contestants run through two quick dress rehearsal games to get out any final nerves, test the equipment for the stage crew, and practice standing on the risers behind the podiums without falling off.

Then it’s back to the dressing room, where the first group is drawn. They get a touch-up on makeup, the rest of the contestant group sits down in a special section of the audience, and it’s off to the races! There are three games filmed before the lunch break, then the final two are filmed. The contestants have the option to stay and watch the rest of the day if they’re defeated, but most choose to leave if it’s later on in the filming cycle. The adrenaline crash is pretty huge, and some people may need the space to let out their mixed feelings. If you win, you are whisked back to the dressing room for a quick change, a touch-up again, and back out to the champion’s podium to play again.

You may be asking, when do contestants meet Alex? Well, it happens exactly twice, and both times, the interactions are entirely on film and broadcast in (nearly) their entirety within the show. To put all of those collusion rumors around the recent streak of Austin Rogers to rest, the interview halfway through the first round and the hand-shaking at the end of the game are the only times that Alex and the contestants meet or speak with one another; there is no “backstage” where the answer-giver and the question-providers could possibly mingle. Nor do the contestants ever get to do more than wave “hello” to the writers for the show. Jeopardy! is very careful to keep its two halves very separated. The energy and enthusiasm of the contestant team – Glenn, Maggie, Corina, Lori, and Ryan – is genuine, and when your appearance is complete, you feel as though you have joined a very special family of Jeopardy! alumni.

Once you’ve been a contestant on Jeopardy!, you can never be on the show again. The only exception is if you do well enough to be asked back to the Tournament of Champions. While gag rules prohibit me from saying more about how I did, I can say that the entire experience lived up to the hype I had built around it since I was a child, playing along in my living room and dreaming of the chance to respond in the form of a question.

Islandora: iCamp EU - Call for Proposals

planet code4lib - Thu, 2018-01-11 18:42

Doing something great with Islandora that you want to share with the community? Have a recent project that the world just needs to know about? Send us your proposals to present at iCampEU in Limerick! Presentations should be roughly 20-25 minutes in length (with time after for questions) and deal with Islandora in some way. Want more time or to do a different format? Let us know in your proposal and we'll see what we can do.

You can see examples of previous Islandora camp sessions on our YouTube channel.

The Call for Proposals for iCampEU in Limerick will be open until March 1st.


Islandora: Islandora Camp EU 2018 - Registration

planet code4lib - Thu, 2018-01-11 18:41

Islandora Camp is heading to Ireland June 20 - 22, 2018, hosted by the University of Limerick. Early Bird rates are available until March 1st, 2018, after which the rate will increase to €399,00.

360,00 €

Admin: For repository and collection managers, librarians, archivists, and anyone else who deals primarily with the front-end experience of Islandora and would like to learn how to get the most out of it, or developers who would like to learn more about the front-end experience.

Developer: For developers, systems people, and anyone dealing with Islandora at the code-level, or any front-end Islandora users who are interested in learning more about the developer side.

District Dispatch: ALA to Congress in 2018: Continue to #FundLibraries

planet code4lib - Thu, 2018-01-11 15:10

2017 was an extraordinary year for America’s libraries. When faced with serious threats to federal library funding, ALA members and library advocates rallied in unprecedented numbers to voice their support for libraries at strategic points throughout the year*. Tens of thousands of phone calls and emails to Congress were registered through ALA’s legislative action center. ALA members visited Congress in Washington and back home to demonstrate the importance of federal funding.

The challenge to #FundLibraries in 2018 is great: not only is Congress late in passing an FY 2018 budget, it’s time to start working on the FY 2019 budget.

ALA members have a lot to be proud of. Thanks to library advocates, Congress did not follow the administration’s lead in March 2017, when the president made a bold move to eliminate the Institute of Museum and Library Services (IMLS) and virtually all federal library funding. In every single state and congressional district, ALA members spoke up in support of federal library funding. We reminded our senators and representatives how indispensable libraries are for the communities they represent. And our elected leaders listened. By the time FY 2018 officially began in October 2017, the Appropriations Committees from both chambers of Congress had passed bills that maintained (and in the Senate, increased by $4 million) funding for libraries.

Despite our strong advocacy, we have not saved library funding for FY 2018. We’re more than three months into the fiscal year, and the U.S. government still does not have an FY 2018 budget. Because the House and Senate have not reconciled their FY 2018 spending bills, the government is operating under a “continuing resolution” (CR) of the FY 2017 budget. What happens when that CR expires on January 19, 2018 is a matter of intense speculation; options include a bi-partisan budget deal, another CR or a possible government shutdown.

While government may seem to be paralyzed, this is no time for library advocates to take a break. The challenge in 2018 is even greater than in 2017: not only is Congress late in passing an FY 2018 budget, it’s time to start working on the FY 2019 budget. The president is expected to release his FY 2019 budget proposal in February, and we have no reason to believe that libraries have moved up on the list of priorities for the administration.

2018 is a time for all of us to take our advocacy up a notch. Over the coming weeks, ALA’s Washington Office will roll out resources to help you tell your library story and urge your members of Congress to #FundLibraries. In the meantime, here’s what you can do:

Stay informed. The U.S. budget and appropriations process is more dynamic than ever this year. There is a strong chance that we will be advocating for library funding for FY 2018 and FY 2019 at the same time. Regularly visit the Washington Office blog, where we will post the latest information on ALA’s #FundLibraries campaign, and sign up for ALA’s Legislative Action Center.

Stay involved. What you show your decision-makers at home is an important part of our year-round advocacy program because it helps supplement the messages that your ALA Washington team is sharing with legislators and their staff on the Hill. Keep showing them how your library – and IMLS funding – is transforming your community. Plan to attend National Library Legislative Day 2018 in Washington (May 7-8) or participate virtually from home.

Stay proud of your influence. Every day you prove that libraries are places of innovation, opportunity and learning – that libraries are a smart, high-return investment for our nation. When librarians speak, decision-makers listen!

*2017: Federal appropriations and library advocacy timeline

March: The president announced in his first budget proposal that he wanted to eliminate IMLS and virtually all federal funding for libraries.

April: ALA members asked their representatives to sign two Dear Appropriator letters sent from library champions in the House to the Chair and Ranking Members of the House Appropriations Subcommittee that deals with library funding (Labor, Health & Human Services, Education and Related Agencies, or “Labor-HHS”). One letter was in support of the Library Services and Technology Act (LSTA), and one letter was for the Innovative Approaches to Literacy program (IAL).

House Results: One-third of the entire House of Representatives, from both parties, signed each Dear Appropriator letter, and nearly 170 Members signed at least one.

May: More than 500 ALA members came to Washington, D.C. to meet their members of Congress for ALA’s 2017 National Library Legislative Day. Nearly identical Dear Appropriator letters were sent to Senate Labor-HHS Approps Subcommittee leaders.

Senate Results: 45 Senators signed the LSTA letter, and 37 signed the IAL letter.

July: The House Labor-HHS Subcommittee and then the full Committee passed their appropriations bill, which included funding for IMLS, LSTA and IAL at 2017 levels.

September: The House passed an omnibus spending package, which included 12 appropriations bills. The Senate Labor-HHS Subcommittee and then the full Committee passed their appropriations bill, which included a $4 million increase for LSTA above the 2017 level. Unable to pass FY 2018 funding measures, Congress passed a continuing resolution, averting a government shutdown.

December: Congress passed two additional CRs, which run through January 19, 2018.

The post ALA to Congress in 2018: Continue to #FundLibraries appeared first on District Dispatch.

Open Knowledge Foundation: 2017: A Year to Remember for OK Nepal

planet code4lib - Thu, 2018-01-11 09:24

This blog has been cross-posted from the OK Nepal blog as part of our blog series of Open Knowledge Network updates.

Best wishes for 2018 from OK Nepal to all of the Open Knowledge family and friends!!

The year 2017 was one of the best years for Open Knowledge Nepal. We started our journey by registering Open Knowledge Nepal as a non-profit organization under the Nepal Government and as we start to reflect on 2017, it has been “A Year to Remember”. We were able to achieve many things and we promise to continue our hard work to improve the State of Open Data in South Asia in 2018 as well.

Some of the key highlights of 2017 are:

  1. Organizing Open Data Day 2017

For the 5th time in a row, the Open Knowledge Nepal team led the effort of organizing International Open Data Day at Pokhara, Nepal. This year it was a collaborative effort of Kathmandu Living Labs and Open Knowledge Nepal. It was also the first official event of Open Knowledge Nepal that was held outside the Kathmandu Valley.

  2. Launching Election Nepal Portal

On 13th April 2017 (31st Chaitra 2073), a day before Nepalese New Year 2074, we officially released the Election Nepal Portal in collaboration with Code for Nepal and made it open for contribution. Election Nepal is a crowdsourced citizen engagement portal that includes the Local Elections data. The portal will have three major focus areas: visualizations, datasets, and Twitter feeds.

  3. Contributing to Global Open Data Index

On May 2nd, 2017 Open Knowledge International launched the 4th edition of the Global Open Data Index (GODI), a global assessment of open government data publication. Nepal has been part of this global assessment continuously for four years with lots of ups and downs. We have been leading it since the very beginning. With an openness score of 20%, Nepal was ranked 69th in the 2016 Global Open Data Index. Also, this year we helped Open Knowledge International by coordinating the South Asia region and, for the first time, we were able to get contributions from Bhutan and Afghanistan.

  4. Launching Local Boundaries

To help journalists and researchers visualize the geographical data of Nepal on a map, we built Local Boundaries, where we share the shapefile of Nepal’s federal structure and others. Local Boundaries brings the detailed geodata of administrative units or maps of all administrative boundaries defined by the Nepal Government in an open and reusable format, free of cost. The local boundaries are available in two formats (TopoJSON and GeoJSON) and can be easily reused to map local authority data to OpenStreetMap, Google Maps, Leaflet or Mapbox interactively.
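As a rough illustration of how boundary data in GeoJSON form can be reused, here is a minimal Python sketch that reads a FeatureCollection and computes a bounding box for each unit. The sample data, the "name" property and the coordinates are invented for this example and are not taken from the actual Local Boundaries files, whose schema may differ.

```python
import json

# Hypothetical sample: a tiny GeoJSON FeatureCollection standing in for a
# downloaded boundaries file. Property names and coordinates are invented.
SAMPLE = """
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": {"name": "Unit A"},
      "geometry": {
        "type": "Polygon",
        "coordinates": [[[85.0, 27.0], [85.5, 27.0], [85.5, 27.5],
                         [85.0, 27.5], [85.0, 27.0]]]
      }
    },
    {
      "type": "Feature",
      "properties": {"name": "Unit B"},
      "geometry": {
        "type": "Polygon",
        "coordinates": [[[83.9, 28.1], [84.2, 28.1], [84.2, 28.4],
                         [83.9, 28.4], [83.9, 28.1]]]
      }
    }
  ]
}
"""


def bounding_box(polygon_coords):
    """Return (min_lon, min_lat, max_lon, max_lat) for a GeoJSON Polygon."""
    outer_ring = polygon_coords[0]  # the first ring is the exterior boundary
    lons = [point[0] for point in outer_ring]
    lats = [point[1] for point in outer_ring]
    return (min(lons), min(lats), max(lons), max(lats))


def summarize(geojson_text):
    """Map each feature's (assumed) "name" property to its bounding box."""
    collection = json.loads(geojson_text)
    return {
        feature["properties"]["name"]: bounding_box(
            feature["geometry"]["coordinates"]
        )
        for feature in collection["features"]
    }


boxes = summarize(SAMPLE)
for name, box in boxes.items():
    print(name, box)
```

The sketch only handles Polygon geometries; real administrative boundaries often use MultiPolygon as well, which would need an extra loop over the polygons.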

  5. Launching Open Data Handbook Nepali Version

After a year of work followed by a series of discussions and consultations, on 7 August 2017 Open Knowledge Nepal launched the first version of the Nepali Open Data Handbook, an introductory guidebook used by governments and civil society organizations around the world as an introduction and blueprint for open data projects. The handbook was translated through the collaborative effort of volunteers and contributors. The Nepali Handbook is now available online.

  6. Developing Open Data Curriculum and Open Data Manual

To organize the open data awareness program in a structured format and to generate resources which can be further used by civil society and institutions, Open Knowledge Nepal prepared an Open Data Curriculum and Open Data Manual. It contains basic aspects of open data like an introduction, importance, principles, application areas as well as the technical aspects of open data like extraction, cleaning, analysis, and visualization of data. It works as a reference and a recommended guide for university students, the private sector, and civil society.

  7. Running Open Data Awareness Program

The Open Data Awareness Program, the first of its kind conducted in Nepal, was held in 11 colleges and 2 youth organizations, reaching more than 335 youths. Representatives of Open Knowledge Nepal visited 7 districts of Nepal with the Open Data Curriculum and the Open Data Manual to train youths about the importance and use of open data.

  8. Organizing Open Data Hackathon

The Open Data Hackathon was organized with the theme “Use data to solve local problems faced by Nepali citizens” at Yalamaya Kendra (Dhokaima Cafe), Patan Dhoka on November 25th, 2017. In this hackathon, we brought students and youths from different backgrounds under the same roof to work collaboratively on different aspects of open data.

  9. Co-organizing Wiki Data-a-thon

On 30th November 2017, we co-organized a Wiki Data-a-thon with Wikimedians of Nepal at Nepal Connection, Thamel on the occasion of Global Legislative Openness Week (GLOW). During the event, we scraped data from the last CA election and pushed it to Wikidata.

  10. Supporting Asian Regional Meeting

On 2nd and 3rd December 2017, we supported Open Access Nepal in organizing the Asian Regional Meeting on Open Access, Open Education and Open Data with the theme “Open in Action: Bridging the Information Divide”. Delegates came from different countries, including the USA, China, South Africa, India, Bangladesh and Nepal. We managed the Nepali delegates and participants.

2018 Planning

We are looking forward to a prosperous 2018, in which we plan to reach out to all South Asian countries to improve the state of open data in the region through focused open data training, research, and projects. For this, we will be collaborating with all possible CSOs working in Asia and will serve as an intermediary for different international organizations who want to promote or increase their activities in Asian countries. This will help the Open Knowledge Network in the long run, and we will also get opportunities to learn from each other’s successes and failures, promote each other’s activities, brainstorm collaborative projects and make the relationship between countries stronger.

Besides this, we will also continue our data literacy work, such as the Open Data Awareness Program, to make Nepalese citizens more data-demanding and data-savvy, and launch a couple of new projects to help people understand the available data.

To stay updated on our activities, please follow us on our various media channels.


Terry Reese: MarcEdit Updates (All versions)

planet code4lib - Thu, 2018-01-11 05:35

I’ve posted updates for all versions of MarcEdit, including MarcEdit MacOS 3.

MarcEdit 7 (Windows/Linux) changelog:
  • Bug Fix: Export Settings: Export was capturing both MarcEdit 6.x and MarcEdit 7.x data.
  • Enhancement: Task Management: added some continued refinements to improve speed and processing
  • Bug Fix: OCLC Integration: Corrected an issue occurring when trying to post bib records using previous profiles.
  • Enhancement: Linked Data XML Rules File Editor completed
  • Enhancement: Linked Data Framework: Formal support for local linked data triple stores for resolution

One of the largest enhancements is the updated editor to the Linked Data Rules File and the Linked Data Framework. You can hear more about these updates here:

MarcEdit MacOS 3:

Today also marks the availability of MarcEdit MacOS 3. You can read about the update here: MarcEdit MacOS 3 has Arrived!

If you have questions, please let me know.


Terry Reese: MarcEdit MacOS 3 has Arrived!

planet code4lib - Thu, 2018-01-11 05:01

MarcEdit MacOS 3 is the latest branch of the MarcEdit 7 family. MarcEdit MacOS 3 represents the next generational update for MarcEdit on the Mac and is functionally equivalent to MarcEdit 7. MarcEdit MacOS 3 introduces the following features:

  1. Startup Wizard
  2. Clustering Tools
  3. New Linked Data Framework
  4. New Task Management and Task Processing
  5. Task Broker
  6. OCLC Integration with OCLC Profiles
  7. OCLC Integration and search in the MarcEditor
  8. New Global Editing Tools
  9. Updated UI
  10. More


There are also a couple things that are currently missing that I’ll be filling in over the next couple of weeks. Presently, the following elements are missing in the MacOS version:

  1. OCLC Downloader
  2. OCLC Bib Uploader (local and non-local)
  3. OCLC Holdings update (update for profiles)
  4. Task Processing Updates
  5. Need to update Editor Functions
    1. Dedup tool – Add/Delete Function
    2. Move tool — Copy Field Function
    3. RDA Helper — 040 $b language
    4. Edit Shortcuts — generate paired ISBN-13
    5. Replace Function — Exact word match
    6. Extract/Delete Selected Records — Exact word match
  6. Connect the search dropdown
    1. Add to the MARC Tools Window
    2. Add to the MarcEditor Window
    3. Connect to the Main Window
  7. Update Configuration information
  8. XML Profiler
  9. Linked Data File Editor
  10. Startup Wizard

Rather than hold the update till these elements are completed, I’m making the MarcEdit MacOS version available now so that users can be testing and interacting with the tooling, and I’ll finish adding these remaining elements to the application. Once completed, all versions of MarcEdit will share the same functionality, save for elements that rely on technology or practices tied to a specific operating system.

Updated UI

MarcEdit MacOS 3 introduces a new UI. While the UI is still reflective of MacOS best practices, it also shares many of the design elements developed as part of MarcEdit 7. This includes new elements like the StartUp wizard with Fluffy Install agent:


The Setup Wizard provides users the ability to customize various application settings, as well as import previous settings from earlier versions of MarcEdit.


Updates to the UI

New Clustering tools

MarcEdit MacOS 3 provides MacOS users more tools, more help, more speed…it gives you more, so you can do more.

Download the latest version of MarcEdit MacOS 3 from the downloads page at:


Library of Congress: The Signal: Digital Scholarship Resource Guide: Making Digital Resources, Part 2 of 7

planet code4lib - Wed, 2018-01-10 22:25

This is part two in a seven-part resource guide for digital scholarship by Samantha Herron, our 2017 Junior Fellow. Part one is available here, and the full guide is available as a PDF download.

Creating Digital Documents

Internet Archive staff members such as Fran Akers, above, scan books from the Library’s General Collections that were printed before 1923. The high-resolution digital books are made available online within 72 hours of scanning.

The first step in creating an electronic copy of an analog (non-digital) document is usually scanning it to create a digitized image (for example, a .pdf or a .jpg). Scanning a document is like taking an electronic photograph of it–now it’s in a file format that can be saved to a computer, uploaded to the Internet, or shared in an e-mail. In some cases, such as when you are digitizing a film photograph, a high-quality digital image is all you need. But in the case of textual documents, a digital image is often insufficient, or at least inconvenient. In this stage, we only have an image of the text; the text isn’t yet in a format that can be searched or manipulated by the computer (think: trying to copy & paste text from a picture you took on your camera–it’s not possible).

Optical Character Recognition (OCR) is an automated process that extracts text from a digital image of a document to make it readable by a computer. The computer scans through an image of text, attempts to identify the characters (letters, numbers, symbols), and stores them as a separate “layer” of text on the image.

Example: Here is a digitized copy of Alice in Wonderland in the Internet Archive. Notice that though this ebook is made up of scanned images of a physical copy, you can search the full text contents in the search bar. The OCRed text is “under” this image, and can be accessed if you select “FULL TEXT” from the Download Options menu. Notice that you can also download a .pdf, .epub, or many other formats of the digitized book.

Though the success of OCR depends on the quality of the software and the quality of the photograph–even sophisticated OCR has trouble navigating images with stray ink blots or faded type–these programs are what allow digital archives users to not only search through catalog metadata, but through the full contents of scanned newspapers (as in Chronicling America) and books (as in most digitized books available from libraries and archives).

ABBYY FineReader, an OCR software.

As noted, the automated OCR text often needs to be “cleaned” by a human reader. Especially with older, typeset texts that have faded or mildewed or are otherwise irregular, the software may mistake characters or character combinations for others (e.g. the computer might take “rn” to be “m” or “cat” to be “cot” and so on). Though often left “dirty,” OCR that has not been checked through prevents comprehensive searches: if one were searching a set of OCRed texts for every instance of the word “happy,” the computer would not return any of the instances where “happy” had been read as “hoppy” or “hoopy” (and conversely, would inaccurately find where the computer had read “hoppy” to be “happy”). Humans can clean OCR by hand to “train” the computer to interpret characters more accurately (see: machine learning).
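The search problem described above can be sketched in a few lines of Python. This is only an illustration: the sample sentence is invented, and the tolerant search uses `difflib.SequenceMatcher` from the standard library with an arbitrary similarity threshold of 0.75, which is an assumption of this sketch and not something OCR software or library catalogs are known to use.

```python
import re
from difflib import SequenceMatcher


def tokens(text):
    """Split text into alphabetic word tokens."""
    return re.findall(r"[A-Za-z]+", text)


def exact_hits(text, word):
    """Return tokens equal to `word`, ignoring case."""
    return [t for t in tokens(text) if t.lower() == word.lower()]


def fuzzy_hits(text, word, threshold=0.75):
    """Return tokens whose similarity to `word` meets the (assumed) threshold."""
    return [
        t for t in tokens(text)
        if SequenceMatcher(None, t.lower(), word.lower()).ratio() >= threshold
    ]


# Invented sample of uncorrected OCR: the first "happy" was misread as "hoppy".
ocr_text = "She was hoppy to be home, and happy to rest."

print(exact_hits(ocr_text, "happy"))  # misses the misread word
print(fuzzy_hits(ocr_text, "happy"))  # also catches the misread word
```

A tolerant search like this trades one problem for another: it recovers misread words, but it also matches words that really were printed as “hoppy,” so it is no substitute for cleaned OCR.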

In this image of some OCR, we can see some of the errors–the “E”s in the title were interpreted as “Q”s, and in the third line, a “t” was interpreted by the computer as an “f”.

Example of raw OCR text.

Even with imperfect OCR, digital text is helpful for both close reading and distant reading. In addition to more complex computational tasks, digital text allows users to, for instance, find the page number of a quote they remember, or find out if a text ever mentions Christopher Columbus. Text search, enabled by digital text, has changed the way that researchers use databases and read documents.

Metadata + Text Encoding

Bibliographic search–locating items in a collection–is one of the foundational tasks of libraries. Computer-searchable library catalogs have revolutionized this task for patrons and staff, enabling users to find more relevant materials more quickly.

Metadata is “data about data”. Bibliographic metadata is what makes up catalog records, from the time of card catalogs to our present-day electronic databases. Every item in a library’s holdings has a bibliographic record made up of this metadata–key descriptors of an item that help users find an item when they need it. For example, metadata about a book might include its title, author, publishing date, ISBN, shelf location, and so on. In an electronic catalog search, this metadata is what allows users to increasingly narrow their results to materials targeted to their needs: rich, accurate metadata, produced by human catalogers, allows users to find in a library’s holdings, for example, 1. any text material, 2. written in Spanish, 3. about Jorge Luis Borges, 4. published between 1990 and 2000.
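A minimal sketch of that kind of narrowing, in Python, might look like the following. The `CatalogRecord` class, its field names, and the four placeholder records are all hypothetical and invented for illustration; real catalog records carry far more fields and are not stored this way.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogRecord:
    """A simplified, hypothetical bibliographic record."""
    title: str
    material_type: str
    language: str
    subjects: tuple
    year: int


def narrow(records, material_type, language, subject, year_from, year_to):
    """Keep only the records that satisfy every metadata criterion."""
    return [
        r for r in records
        if r.material_type == material_type
        and r.language == language
        and subject in r.subjects
        and year_from <= r.year <= year_to
    ]


# Placeholder records, invented purely for illustration.
borges = ("Borges, Jorge Luis",)
records = [
    CatalogRecord("Placeholder A", "text", "spa", borges, 1994),
    CatalogRecord("Placeholder B", "text", "eng", borges, 1996),
    CatalogRecord("Placeholder C", "text", "spa", borges, 2005),
    CatalogRecord("Placeholder D", "sound recording", "spa", borges, 1992),
]

hits = narrow(records, "text", "spa", "Borges, Jorge Luis", 1990, 2000)
print([r.title for r in hits])
```

Each criterion only works if a cataloger recorded that piece of metadata consistently, which is why the quality of the records matters as much as the search software.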

Washington, D.C. Jewal Mazique [i.e. Jewel] cataloging in the Library of Congress. Photo by John Collier, Winter 1942.

Metadata needs to be in a particular format to be read by the computer. A markup language is a system for annotating text to give the computer instructions about what each piece of information is. XML (eXtensible Markup Language) is one of the most common ways of structuring catalog metadata, because it is legible to both humans and machines.

XML uses tags to label data items. Tags can be embedded inside each other as well. In the example below, <recipe> is the first tag. All of the tags between <recipe> and its end tag </recipe> (<title>, <ingredient list>, and <preparation>) are components of <recipe>. Further, <ingredient> is a component of <ingredient list>.
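To make the nesting concrete, here is a small Python sketch that parses a recipe document with the standard library’s `xml.etree.ElementTree`. The recipe content is a placeholder, and the tag is written `ingredient_list` because XML element names cannot contain spaces; both are assumptions of this sketch.

```python
import xml.etree.ElementTree as ET

# Placeholder recipe; the element names follow the description above,
# except that "ingredient list" is written without a space.
RECIPE_XML = """
<recipe>
  <title>Placeholder recipe</title>
  <ingredient_list>
    <ingredient>flour</ingredient>
    <ingredient>water</ingredient>
  </ingredient_list>
  <preparation>Mix and bake.</preparation>
</recipe>
"""

root = ET.fromstring(RECIPE_XML)

# The direct children of <recipe> are its components.
components = [child.tag for child in root]

# Each <ingredient> is in turn a component of <ingredient_list>.
ingredients = [item.text for item in root.find("ingredient_list")]

print(root.tag, components, ingredients)
```

Because every piece of data sits inside a labeled tag, the program can ask for “the ingredients” by name instead of guessing where they are in the text.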

MARC (MAchine-Readable Cataloging), developed in the 1960s by Henriette Avram at the Library of Congress, is the international standard data format for the description of items held by libraries. Here are the MARC tags for one of the hits from our Jorge Luis Borges search above:

The three-digit numbers in the left column are “datafields” and the letters are “subfields”. Each field-subfield combination refers to a piece of metadata. For example, 245$a is the title, 245$b is the subtitle, 260$a is the place of publication, and so on. The rest of the fields can be found here.
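The field-subfield notation can be sketched as a simple lookup. In the Python below, the nested dictionary, the helper names `parse_reference` and `lookup`, and the placeholder values are all hypothetical; real MARC records allow repeated fields and subfields, which this simplification ignores.

```python
def parse_reference(reference):
    """Split a reference such as '245$a' into (datafield tag, subfield code)."""
    tag, _, code = reference.partition("$")
    if len(tag) != 3 or not tag.isdigit() or len(code) != 1:
        raise ValueError(f"not a field-subfield reference: {reference!r}")
    return tag, code


def lookup(record, reference):
    """Return the value stored at a field-subfield reference, or None."""
    tag, code = parse_reference(reference)
    return record.get(tag, {}).get(code)


# Placeholder record, invented for illustration (one value per subfield).
record = {
    "245": {"a": "Placeholder title :", "b": "placeholder subtitle"},
    "260": {"a": "Placeholder City :"},
}

print(lookup(record, "245$a"))  # title
print(lookup(record, "245$b"))  # subtitle
print(lookup(record, "260$a"))  # place of publication
```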

Here is some example XML.

MARCXML is one way of reading and parsing MARC information, popular because it’s an XML schema (and therefore readable by both humans and computers). For example, here is the MARCXML file for the same book from above:

The datafields and subfields are now XML tags, acting as ‘signposts’ for the computer about what each piece of information means. MARCXML files can be read by humans (provided they know what each datafield means) as well as computers.
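As a rough sketch of how a program follows those signposts, the Python below reads a small MARCXML record with the standard library’s `xml.etree.ElementTree`. The element names, attributes, and namespace follow the MARCXML schema; the record contents are placeholders and the helper `get_subfields` is a hypothetical name chosen for this example.

```python
import xml.etree.ElementTree as ET

# Namespace used by the MARCXML schema.
MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Placeholder record, invented for illustration.
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">Placeholder title :</subfield>
    <subfield code="b">placeholder subtitle</subfield>
  </datafield>
  <datafield tag="260" ind1=" " ind2=" ">
    <subfield code="a">Placeholder City :</subfield>
  </datafield>
</record>"""


def get_subfields(record, tag, code):
    """Return the text of every subfield `code` inside datafield `tag`."""
    path = f"marc:datafield[@tag='{tag}']/marc:subfield[@code='{code}']"
    return [element.text for element in record.findall(path, MARC_NS)]


record = ET.fromstring(SAMPLE)
print(get_subfields(record, "245", "a"))  # title
print(get_subfields(record, "260", "a"))  # place of publication
```

The function returns a list because a MARC record may repeat a datafield or subfield.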

The Library of Congress has made available their 2014 Retrospective MARC files for public use:

Examples: The Library of Congress’s MARC data could be used for cool visualizations like Ben Schmidt’s visual history of MARC cataloging at the Library of Congress. Matt Miller used the Library’s MARC data to make a dizzying list of every cataloged book in the Library of Congress.

An example of the uses of MARC metadata for non-text materials is Yale University’s Photogrammar, which uses the location information from the Library of Congress’ archive of US Farm Security Administration photos to create an interactive map.

TEI (Text Encoding Initiative) is another important example of XML-style markup. In addition to capturing metadata, TEI guidelines standardize the markup of a text’s contents. Text encoding tells the computer, for example, who is speaking, when a stanza begins and ends, and which parts of the text are stage directions in a play.

Example: Here is a TEI file of Shakespeare’s Macbeth from the Folger Shakespeare Library. Different tags and attributes (the further specifiers within the tags) describe the speaker, what word they are saying, in what scene, what part of speech the word is, etc. An encoded text like this can easily be manipulated to tell you which character says the most words in the play, which adjective is used most often across all of Shakespeare’s works, and so on. If you were interested in the use of the word ‘lady’ in Macbeth, an un-encoded plaintext version would not allow you to distinguish between references to “Lady” Macbeth vs. when a character says the word “lady”. TEI versions allow you to do powerful explorations of texts–though good TEI copies take a lot of time to create.
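A question like “which character says the most words?” can be sketched as follows in Python. The snippet is a heavily simplified, invented fragment that uses the TEI elements `sp` (speech) and `w` (word) and the `who` attribute; it is not taken from the Folger file, whose actual markup is richer, and the speaker ids and words are placeholders.

```python
import xml.etree.ElementTree as ET
from collections import Counter

TEI = "http://www.tei-c.org/ns/1.0"

# Invented, simplified fragment; speaker ids and words are placeholders.
SAMPLE = f"""<div xmlns="{TEI}" type="scene">
  <sp who="#speaker_a"><w>good</w> <w>lady</w></sp>
  <sp who="#speaker_b"><w>my</w> <w>lord</w> <w>speaks</w> <w>well</w></sp>
  <sp who="#speaker_a"><w>aye</w></sp>
</div>"""


def words_per_speaker(root):
    """Count the <w> elements inside each speaker's <sp> elements."""
    counts = Counter()
    for speech in root.iter(f"{{{TEI}}}sp"):
        speaker = speech.get("who", "").lstrip("#")
        counts[speaker] += sum(1 for _ in speech.iter(f"{{{TEI}}}w"))
    return counts


counts = words_per_speaker(ET.fromstring(SAMPLE))
print(counts.most_common(1))
```

The count is only possible because each speech is explicitly tied to a speaker; in plain text the program would have to guess where one speech ends and the next begins.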

Understanding the various formats in which data is entered and stored allows us to imagine what kinds of digital scholarship are possible with library data.

Example: The Women Writers Project encodes texts by early modern women writers in TEI and includes some text analysis tools.

Next week’s installment in the Digital Scholarship Resource Guide will show you what you can do with digital data now that you’ve created it. Stay tuned!

LITA: Jobs in Information Technology: January 10, 2018

planet code4lib - Wed, 2018-01-10 20:09

New vacancy listings are posted weekly on Wednesday at approximately 12 noon Central Time. They appear under New This Week and under the appropriate regional listing. Postings remain on the LITA Job Site for a minimum of four weeks.

New This Week

University of Arkansas, Assistant Head of Special Collections, Fayetteville, AR

West Chester University, Electronic Resources Librarian, West Chester, PA

Miami University Libraries, Web Services Librarian, Oxford, OH

Visit the LITA Job Site for more available jobs and for information on submitting a job posting.

