Feed aggregator

Jonathan Rochkind: Command-line utility to visit github page of a named gem

planet code4lib - Mon, 2017-03-13 20:45

I’m working in a Rails stack that involves a lot of convolutedly inter-related dependencies, which I’m not yet all that familiar with.

I often want to go visit the github page of one of the dependencies, to check out the README, issues/PRs, generally root around in the source, etc.  Sometimes I want to clone and look at the source locally, but often in addition to that I’m wanting to look at the github repo.

So I wrote a little utility to let me do gem-visit name_of_gem and it’ll open up the github page in a browser, using the MacOS open utility to open your default browser window.

The gem doesn’t need to be installed: it uses the rubygems.org API (hooray!) to look up the gem by name, and looks at its registered “Homepage” and “Source Code” links. If either one is a github link, it’ll prefer that. Otherwise, it’ll just send you to the Homepage, or failing that, to the gem’s rubygems.org page.

It’s working out well for me.

I wrote it in ruby (naturally), but with no dependencies, so you can just drop it in your $PATH, and it just works. I put ~/bin on my $PATH, so I put it there.
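For anyone who wants to roll their own, here is a minimal sketch of how such a script might work. It is not the gist itself. It assumes the public rubygems.org endpoint https://rubygems.org/api/v1/gems/NAME.json with its homepage_uri and source_code_uri fields, and the macOS open command. The helper names fetch_gem_info and pick_url are hypothetical, and checking the Source Code link before the Homepage is an assumption. Like the original, it uses only Ruby's standard library.

```ruby
#!/usr/bin/env ruby
# gem-visit (sketch): open the GitHub page of a named gem.
# Assumptions: the rubygems.org v1 gem-info JSON endpoint and macOS `open`.
# Uses only Ruby's standard library, so the file can live in ~/bin.
require "json"
require "net/http"
require "uri"

GITHUB_URL = %r{\Ahttps?://(www\.)?github\.com/}i

# Fetch the gem's registered metadata (a Hash parsed from JSON).
def fetch_gem_info(name)
  uri = URI("https://rubygems.org/api/v1/gems/#{URI.encode_www_form_component(name)}.json")
  response = Net::HTTP.get_response(uri)
  abort("gem-visit: lookup failed for #{name} (HTTP #{response.code})") unless response.is_a?(Net::HTTPSuccess)
  JSON.parse(response.body)
end

# Prefer a GitHub link from "Source Code" or "Homepage"; otherwise the
# Homepage; otherwise the gem's page on rubygems.org.
def pick_url(info, name)
  source   = info["source_code_uri"].to_s
  homepage = info["homepage_uri"].to_s
  github = [source, homepage].find { |u| u.match?(GITHUB_URL) }
  return github if github
  return homepage unless homepage.empty?
  "https://rubygems.org/gems/#{name}"
end

# Only run when invoked as a script with a gem name.
if __FILE__ == $PROGRAM_NAME && !ARGV.empty?
  name = ARGV.first
  url = pick_url(fetch_gem_info(name), name)
  puts url
  system("open", url) # macOS: opens the URL in the default browser
end
```

Saved as ~/bin/gem-visit and made executable with chmod +x, running gem-visit rails would print the chosen URL and open it. Keeping the URL choice in a separate function means it can be checked without any network access.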

I’ll give you a gist of it (yes, I’m not maintaining it at present), but I’ve been thinking about the best way to distribute this if I wanted to maintain it. Distributing it as a ruby gem doesn’t actually seem great: with the way people use ruby version switchers, you’d have to make sure to get it installed for every version of ruby you might be using to get the command line to work in them all.

Distributing as a homebrew package seems to actually make a lot of sense for a very simple command-line utility like this. But homebrew doesn’t really like niche/self-submitted stuff, and this doesn’t really meet the requirements for what they’ll tolerate. But homebrew theoretically supports a third-party ‘tap’, and I think a github repo itself can be such a ‘tap’, and hypothetically I could set things up so you could brew tap my_github_url, and install (and later upgrade or uninstall!) with brew… but I haven’t been able to figure out how to do that from the docs/examples, just verified that it oughta be possible! Anyone want to give me any hints or model examples? Of a homebrew formula that does nothing but install a simple one-file bash script, and is a third-party tap hosted on github?

Filed under: General

HangingTogether: Linked data panel at ALIA Information Online 2017 Conference

planet code4lib - Mon, 2017-03-13 17:18

I had the honor of being invited to participate in a linked data panel at the ALIA (Australian Library and Information Association) Information Online conference held in Sydney last month. We panelists were pleasantly surprised by the turnout—about 200 people attended our session, more than anticipated (standing room only). The conference organizers were amazingly prompt at uploading all presentations for the entire conference on the ALIA website.

Here’s my summary of the session’s highlights, reinforced by what was tweeted (#online17).

Monika Szunejko (National Library of Australia): “The links effect: The laws of attraction, linked data, and the national union catalogues of France and Britain” (Rather than speaker notes on the PowerPoint, see the accompanying paper.) Monika highlighted key findings from her case studies of the British Library’s and Bibliothèque nationale de France’s releases of their respective national bibliographies as linked data to inform the National Library of Australia’s own plans.

  • The BL wanted to move away from library-specific formats and adopt standards that would reach audiences outside the library world; data must be open and re-used. It also wanted to enrich professional practice, leveraging staff skills in scripting languages and data conversion. Its success was affirmed by unlocking the value of its national bibliography to broader audiences, influencing Google rankings for its unique material and developing in-house expertise in linked data principles and data modeling. It also built an infrastructure that could be shared with other libraries.
  • The BnF’s main impetus was to connect different databases, breaking down their silos—“a wedding planner between data sets”—enhancing discovery across collections and making them visible on the web to reach non-library audiences. It already had a deep engagement with identifiers as one of the founders of both VIAF and ISNI, and saw linked data as a means to expand its professional practices.

Both organizations see disseminating information as part of their mission, purpose and value and linked data helps them achieve that goal, while reaching new audiences and maximizing their data assets.  Monika concluded that it was important for each library to understand its own context—your own reasons for moving to linked data—and your own capacity and capabilities. She encouraged others to join communities of practice to inform their own implementations.

Jenny Langenstrass & Katrina McAlpine (U. Sydney): “Good data bad data: getting ready for linked data” (the PowerPoint includes speaker notes). This presentation focused on the steps libraries could take now to prepare their legacy MARC data for future linked data implementations, such as data remediation and using identifiers. The University of Sydney has focused on authority work, and particularly disambiguating personal names (over 40 entries for “Robert Scott”). Research information management systems already use identifiers like DOIs and ORCIDs and those for grants so that researchers can be linked to their papers, institutions, grants, datasets and collaborators. Katrina pointed to the Research Data Switchboard as a successful example of linking researchers, organizations, datasets and papers across the data collections in Research Data Australia, Dryad, CERN and other international repositories.

Mine on “Linked data – bringing the world closer together” (PowerPoint includes speaker notes). My presentation focused on the opportunities of embedding library data into the “web of data”, not just the “web of documents” we’ve become familiar with. I focused on where a large portion of our communities live and learn—Google and Wikipedia—as platforms where we could leverage the work libraries have done to expose relationships among works, translations and their creators. I also showed some linked data examples from the 2015 International Linked Data Survey for Implementers to illustrate how library data could bridge domains and languages.

There were many good and inspiring sessions during the ALIA Information Online 2017 conference and I encourage you to look at the full program.


About Karen Smith-Yoshimura

Karen Smith-Yoshimura, senior program officer, works on topics related to creating and managing metadata with a focus on large research libraries and multilingual requirements.


Open Knowledge Foundation: Open Data Day 2017 in Kisumu, Kenya – Can we tell stories about our local environment with no open data?

planet code4lib - Mon, 2017-03-13 15:51

This blog is part of the event report series on International Open Data Day 2017. On Saturday 4 March, groups from around the world organised over 300 events to celebrate, promote and spread the use of open data. 44 events received additional support through the Open Knowledge International mini-grants scheme, funded by SPARC, the Open Contracting Program of Hivos, Article 19, Hewlett Foundation and the UK Foreign & Commonwealth Office. This event was supported through the mini-grants scheme under the Environment theme.

This post was originally published on the LakeHub website.

On Saturday, the 4th of March, we gathered at the hub for Open Data Day 2017. It’s the third time we have hosted this important workshop, the previous times being in 2014 and 2015. Open Data Day events are a great opportunity to show the benefits of open data and encourage the adoption of open data policies in government, business and civil society.

The initial plan for Open Data Day Kisumu was to host a datathon on the climatic implications for jobs and the economy around the city. We wanted to look at the falling water levels of Lake Victoria, visualise how the lake has been receding, and examine its fish species; for instance, we wanted to know whether fish species are dying out or going extinct. Visualising the destruction of the Mau Forest would have been another great eye-opener, since Global Forest Watch provides several data sets on forest coverage, use, loss, and indigenous peoples. However, due to time constraints much of this data was not accessible. This brought home to us how badly open data is needed, and how much leadership is required on initiatives to popularise it.

This year Open Data Day Kisumu was a bit different from the past. Unlike previous events, where data experts presented informative visuals and participants contributed little, this time our participants, consisting primarily of teenage developers, benefited from a walk-through of how open data could be useful in their work.

This young group of people, who were mainly girls from a local school, built mobile apps to solve some challenges in the city. In one such project, a group of primary school girls mapped accident-prone ‘black spots’ on the city’s roads. At the workshop, the girls learned how to access publicly available data from the government website on the number of traffic accidents that occur each month. The data, however, did not turn out to be very useful in the form in which it was presented, so they also learned strategies for cleaning it.

Primary school girls present an app they are building. Young learners should be made aware of the benefits of Open Data.

Other interesting solutions the girls are working on, which could greatly benefit from open data, include a game that educates the public on safe sex, the dangers of HIV/AIDS and non-adherence to medication. Another solution, dubbed “Kiraia”, is an app that encourages children to take part in governance. The group of girls working on this solution want to make it possible for children to participate in day-to-day activities regarding civic action and government. The team was mentored on which data they could find useful for their work, and how they could leverage existing data to create open data for the masses.

Data Scientist Dominic Mutai from PATH, Kisumu leading the session

Even though we had to divert from our original plan, we remain confident that one of our primary goals going forward, as a community interested in open learning, will be to use data to highlight the effects of climate change and deforestation on Lake Victoria, the second-largest freshwater lake in the world by surface area. Water levels in the lake have been dropping drastically over the years. As a community, we want to build visualisations that tell a story about the lake.

District Dispatch: Applications open for Oakley Memorial Scholarship

planet code4lib - Mon, 2017-03-13 15:41

In recognition of the life accomplishments and contributions of Robert L. Oakley, the American Library Association (ALA) and the Library Copyright Alliance (LCA) have established the Robert L. Oakley Memorial Scholarship to support research and advanced study for librarians in their early-to-mid careers who are interested or active in the fields in which Oakley was an expert: intellectual property, public policy, copyright and their impacts on libraries and the ways libraries serve their communities.

Professor and law librarian Robert Oakley was an expert on copyright law and wrote and lectured on the subject. He served on the LCA representing the American Association of Law Libraries (AALL) and played a leading role in advocating for U.S. libraries and the public they serve at many international forums, including those of the World Intellectual Property Organization (WIPO) and the United Nations Educational, Scientific and Cultural Organization (UNESCO). He served as the United States delegate to the International Federation of Library Associations (IFLA) Standing Committee on Copyright and Related Rights from 1997 to 2003. Mr. Oakley testified before Congress on copyright, open access, library appropriations and free access to government documents, and was a member of the Library of Congress’ Section 108 Study Group. A valued colleague and mentor for numerous librarians, Oakley was a recognized leader in law librarianship and library management who also maintained a profound commitment to public policy and the rights of library users.

A $1,000 scholarship will be awarded annually to an individual or a team of individuals who meet the eligibility criteria. It is intended to encourage and expand interest in and knowledge of these aspects of librarianship, and to bring the next generation of advocates, lobbyists and scholars to the forefront with opportunities they might not otherwise have.

Highest consideration will be given to applicants who are librarians in their early-to-mid careers. They must demonstrate interest in, or be professionally active in, copyright and public policy and their impacts on libraries and the ways libraries serve their communities.

Applicants should provide a statement of intent for use of the scholarship funds. Such a statement should include the applicant’s interest and background in intellectual property, public policy and/or copyright and their impacts on libraries and the ways libraries serve their communities. The statement should include information about how the applicant and the library community will benefit from the applicant’s receipt of the scholarship. Statements should be no longer than 3 pages (approximately 1,000 words).

The applicant’s resume or curriculum vitae should be included in their application.

Applications must be submitted via e-mail to Carrie Russell by May 15, 2017.

Awardees may receive the Robert L. Oakley Memorial Scholarship up to two times in a lifetime. Funds may be used for equipment, expendable supplies, travel necessary to conduct research or attend conferences, release from library duties or other reasonable and appropriate research expenses.

A review committee will be made up of members selected by the LCA, with one representative from each of the collaborating associations: ALA, the Association of College and Research Libraries and the Association of Research Libraries. ALA’s Washington Office will oversee jury selection.

The post Applications open for Oakley Memorial Scholarship appeared first on District Dispatch.

David Rosenthal: The Amnesiac Civilization: Part 3

planet code4lib - Mon, 2017-03-13 15:00
In Part 2 of this series I criticized Kalev Leetaru's Are Web Archives Failing The Modern Web: Video, Social Media, Dynamic Pages and The Mobile Web for failing to take into account the cost of doing a better job. Below the fold I ask whether, even with unlimited funds, it would actually be possible to satisfy Leetaru's reasonable-sounding requirements, and whether those requirements would actually solve the problems of Web archiving.

Leetaru is as concerned as I am that society retain an accurate record of citizens' information environment. He accurately identifies what, in an ideal world, should be archived:
Today the web is all about streaming video and audio. ... Multimedia is difficult to archive not only because of its size (its quite easy to accumulate a few petabytes of HD video without much difficulty), but also because most streaming video sites don’t make it easy to download the original source files. ... In our device-centric world in which we watch videos from large-format televisions, ultra resolution desktops, low resolution phones, etc it is also important to recognize that streaming sites typically offer multiple versions of a video in different resolutions and compression levels that can result in dramatically different viewing experiences. ... Some platforms also go to extended lengths to try and prevent unauthorized downloading of their content via special encodings, encryption and other protections.
So he wants multiple versions of petabytes of video. While from a technical perspective this might be "quite easy", from a funding perspective it isn't. The Internet Archive currently stores around 30PB and adds about 15TB/day, of which I believe the Web archive is about half. Using Amazon S3 pricing, adding 5PB of video would add about 10% to the Archive's budget in storage charges the first year alone, so it would be a big-ish deal. Not to mention the legal problems of dealing with "special encodings, encryption and other protections".

Leetaru also quite reasonably wants comprehensive collections of even the private parts of social media sites:
nearly all major social platforms are moving towards extensive privacy settings and default settings that encourage posts to be shared only with friends. ... This means that even if companies like Facebook decided to make available a commercial data stream of all public content across the entire platform, the stream would capture only a minuscule fraction of the daily life of the platform’s 2 billion users.
He admits that this is hopeless:
From a web archival standpoint, the major social media platforms are largely inaccessible for archiving. ... Facebook ... continually adapts its technical countermeasures and has utilized legal threats in the past to discourage bulk downloading and distribution of user data. Shifting social norms around privacy mean that regardless of technological or legal countermeasures, users are increasingly walling off their data and making it unavailable for the public access needed to archive it. In short, as social media platforms wall off the Internet, their new private parallel Internets cannot be preserved, even as society is increasingly relying on those new walled gardens to carry out daily life.
He and I agree that the future is looking dim for the desktop PC, so he wants to archive all the many mobile versions of every page:
Over the last few years Internet users have increasingly turned to mobile devices from cellphones to tablets to access the Internet. From early mobile-optimized sites to today’s mobile-first world, the Internet of today is gradually leaving its desktop roots behind. Google has been a powerful force behind this transition, penalizing sites that do not offer mobile versions.
Adding mobile web support to web archives is fairly trivial, but it is remarkable how few archives have implemented complete robust mobile support. Even those that offer basic mobile crawling support rarely crawl all versions of a page to test for how differences in device and screen capabilities affect the returned content and the level of dynamic customization in use.
I think Leetaru is wrong to claim that mobile support is "fairly trivial", but even "fairly trivial" enhancements incur development, testing and maintenance costs. Not to mention the costs of finding, crawling and storing the many different mobile versions of a site.

Leetaru is expecting Web archives to do many times more crawling and storing than they currently do, with no additional resources. So not going to happen.

But even if it did, this doesn't even begin to address the real problem facing Web archives. Leetaru writes:
An increasing number of servers scan the user agent field and deny access to the mobile edition of a page unless the client is an actual mobile device, meaning an ordinary crawler requesting a mobile page, but using its standard desktop user agent tag will simply be redirected to the desktop version of the page. Some sites go even further, returning versions of the site tailored for tablets versus smartphones and even targeting specific devices for truly customized user experiences, requiring multiple device emulation to fully preserve a page in all its forms.
But he doesn't address the major sources of variability among the versions of web page content, which are personalization and geolocation. It used to be the case that society's basic information environment was mass media, and it was safe to assume that all consumers of each of those mediums saw the same content. This hasn't been the case for years; every visitor to a site with a significant audience sees different content. This started with the advertisements. Every visit to every page gets a different selection of ads, based on a real-time auction. Web archives responded by no longer collecting the ads.

A much more advanced form of targeting content has recently become controversial in politics:
In an article for Campaign magazine last February, he described how [Cambridge Analytica] had “helped supercharge Leave.EU’s social media campaign by ensuring the right messages are getting to the right voters online.”
There are doubts about Cambridge Analytica's claims, but it is clear that even outside social media sites, the capability to individually tailor the content, not just the ads, at a URI is increasingly likely to be used.

If Leetaru wants to archive every version of a site he needs a Web archive not merely to emulate every possible browser and device combination, but every possible user and location combination. After all, I definitely see a different version of many sites from my laptop when I'm at home from when I'm behind the Great Firewall of Cameron.

There are about 3.4*10^9 Internet users from about 200 countries, so there are about 6.8*10^11 possible versions of every Web page for each browser and device combination. Say there are 100 of these combinations, and the average Web page is about 2.3*10^6 bytes. So storing a single Web page could take up to about 1.6*10^20 bytes, or 160 exabytes.

But storage isn't actually the problem, since deduplication and compression would greatly reduce the storage needed. The problem is that in order to be sure the archive has found all the versions, it has to download them all before it can do the deduplication and compression.

I believe the Internet Archive's outbound bandwidth is around 2*10^9 byte/s. Assuming the same inbound bandwidth to ingest all those versions of the page, it would take about 8*10^10 seconds, or about 2.5*10^3 years, to ingest a single page. And that assumes that the Web site being archived would be willing to devote 2GB/s of outbound bandwidth for two-and-a-half millennia to serving the archive rather than actual users.
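The back-of-envelope estimate above is easy to reproduce. This short Ruby sketch uses only the post's own inputs (users, countries, browser/device combinations, page size, bandwidth); the 365.25-day year is the only constant added here.

```ruby
# Reproduces the post's back-of-envelope estimate; inputs are the post's figures.
USERS            = 3.4e9   # Internet users
COUNTRIES        = 200     # countries they connect from
COMBINATIONS     = 100     # browser/device combinations
PAGE_BYTES       = 2.3e6   # average Web page size in bytes
BANDWIDTH        = 2e9     # assumed ingest bandwidth, bytes per second
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60

versions    = USERS * COUNTRIES                     # versions per combination
total_bytes = versions * COMBINATIONS * PAGE_BYTES  # bytes for every version of one page
seconds     = total_bytes / BANDWIDTH               # time to ingest them all
years       = seconds / SECONDS_PER_YEAR

puts format("versions per combination: %.1e", versions)
puts format("bytes for one page:       %.1e (%.0f EB)", total_bytes, total_bytes / 1e18)
puts format("ingest time:              %.1e s (about %.0f years)", seconds, years)
```

Unrounded, this gives roughly 1.56*10^20 bytes (about 156 exabytes) and close to 2,500 years, consistent with the rounded figures in the text.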

The point here is to make it clear that, no matter how much resource is available, knowing that an archive has collected all, or even a representative sample, of the versions of a Web page is completely impractical. This isn't to say that trying to do a better job of collecting some versions of a page is pointless, but it is never going to provide future researchers with the certainty they crave.

District Dispatch: 2017 Google Policy Fellowship announced

planet code4lib - Mon, 2017-03-13 14:09

The American Library Association (ALA) today announces the opening of the application process for the prestigious Google Policy Fellowship program. The ALA Office for Information Technology Policy (OITP) was a founding participant of the program back in 2008.

Applications for the 2017 Google Policy Fellowship are due by March 24.

For the summer of 2017, the selected fellow will spend 10 weeks in residence at the ALA policy office in Washington, D.C., to learn about national policy and complete a major project. Google provides the $7,500 stipend for the summer, but the work agenda is determined by the ALA and the fellow. Throughout the summer, Google’s Washington office will provide an educational program for all the fellows, including lunchtime talks and interactions with Google Washington staff.

The fellows work in diverse areas of information policy that may include digital copyright, e-book licenses and access, future of reading, international copyright policy, broadband deployment, online privacy, telecommunications policy (including e-rate and network neutrality), digital divide, open access to information, free expression, digital literacy, the future of libraries generally and many other topics. Refer to the National Policy Agenda for Libraries for an overview of current priorities.

Nick Gross was our 2016 Google Policy Fellow. He completed research on telecommunications and copyright policy to help us prepare for the incoming President and Congress in 2017.

Further information about the program, host organizations and the application process is available at the Google Public Policy Fellowship website. ALA encourages all interested graduate students to apply and, of course, especially those in library and information science-related academic programs. Applications are due by Friday, March 24, 2017.

The post 2017 Google Policy Fellowship announced appeared first on District Dispatch.

Islandora: Islandora Licensed Software Procedure - Major Update!

planet code4lib - Mon, 2017-03-13 13:48

Over the past several weeks, the Islandora Roadmap Committee has been working to revise our Licensed Software Acceptance Procedure for clarity and to better reflect how we operate. After a period of community review, they drafted new language and the Islandora Board of Directors has made it official:

Islandora Foundation Licensed Software Acceptance Procedure (LSAP)

The goal of these changes was to make the process clearer and easier to follow, both for those wanting to contribute software to Islandora and for the Roadmap Committee tasked with reviewing those contributions. The new LSAP presents the steps in an order that better matches how the process has tended to work, and provides clear minimal standards for code coming into Islandora, so that contributors can evaluate what work they may need to do in order to have their contribution accepted. The new guidelines also draw a clearer line between the (lower) requirements for sending a module to Islandora Labs and the way we measure whether a module should go into the core Islandora GitHub and become part of the regular release schedule. You can also compare it with the old version, which has been kept for reference.

FOSS4Lib Recent Releases: VuFind - 3.1.3

planet code4lib - Mon, 2017-03-13 13:32
Package: VuFind
Release Date: Monday, March 13, 2017

Last updated March 13, 2017. Created by Demian Katz on March 13, 2017.

Minor bug fix / translation expansion release.

Library of Congress: The Signal: Developing a Digital Preservation Infrastructure at Georgetown University Library

planet code4lib - Mon, 2017-03-13 12:48

This is a guest post by Joe Carrano, a resident in the National Digital Stewardship Residency program.

Preliminary sketch of Lauinger Library by John Carl Warnecke. Photo courtesy of Georgetown University Library.

The Joseph Mark Lauinger Memorial Library is at home among the many Brutalist-style buildings in and around Washington, D.C. This granite-chip aggregate structure, the main library at Georgetown University, houses a moderate-sized staff that meets critical information needs and provides assistance to the University community. Within that staff is the library IT department, half of which is focused on programming, web development and support for computers and systems; the other half is focused on digital services such as digital publishing, digitization and digital preservation. These functions overlap and support each other, which creates a need to work together.

Mingling librarians and IT professionals within the same department is a little different from the way many libraries and archives divide services. In some other organizations, the two services are in separate departments and the relationship of librarians and archivists with IT can be dysfunctional. At Georgetown, both types of professionals work closely together, fostering better communication and making it easier to get things done. Often it is invaluable to have people with a depth of knowledge from many different areas working together in the same department. For instance, it’s nice to have people around who really understand computer hardware when you’re trying to transfer data off of obsolete media. They may even have an old collection of floppy disks to donate for testing.

While digital preservation and IT are centered in one department, the preservation files for digitized and born-digital material are spread throughout the library, in different systems and on different storage media. My NDSR project focuses in part on bringing together these materials and documenting the workflows for putting them into Academic Preservation Trust (APTrust), a digital preservation repository.

Georgetown’s decision to use APTrust stemmed from a digital-preservation working group at the library. Group members identified strategies the library could adopt to improve digital preservation management, and methods for implementing them. Four of the six “next steps” identified in this plan helped form my NDSR project:

1. Implement preservation infrastructure, including a digital-preservation repository
2. Develop and document digital-preservation workflows and procedures
3. Develop a training program and documentation to help build specialized skills for librarians and staff
4. Explore and expand collaborations with internal (university-wide) and external partners to enhance and extend a sustainable infrastructure and to further the library’s involvement at the regional and national levels in digital-preservation strategies.

These goals build upon each other to create a sustainable digital-preservation framework. Membership in — and use of — APTrust fulfilled the first and fourth of these goals. In addition, the difficulty and cost associated with the creation of our own trusted digital repository led the staff to choose this option, which could meet our needs just as well.

APTrust is a distributed, dark, digital-preservation repository, which stores member institutions’ digital materials in Amazon Web Services’ cloud storage (Amazon S3 and Glacier), in two geographic regions – Virginia and Oregon – and in three different “availability zones” within each region. Digital content is ingested and duplicated six times (once in each zone), so the base 10TB of storage per member institution actually amounts to a total of 60TB. Along with the other preservation actions performed by APTrust, this distribution should help ensure long-term preservation of Georgetown’s digital materials.

As well as being a digital preservation repository, APTrust is also a consortium of higher-education institutions that are a part of the governance and development of the repository. For instance, I and other Georgetown staff members are in the Bagging Best Practices working group, which determines member institutions’ needs relating to the BagIt specification and how BagIt is used for packaging and transfer of material into APTrust.

So, while Georgetown gets a hosted digital-preservation repository, it also gets to guide their efforts and participate regionally and nationally in the digital-preservation community. (If you’re interested in the guiding principles of APTrust, check out this Signal interview from 2015).

Joe Carrano. Photo by Mike Matason, Georgetown University Library.

Through implementing APTrust, we are also able to fulfill, in part, steps 2 and 3 mentioned above. Georgetown’s migration of materials into APTrust depends on the creation of tools to manage and upload content. This is where the close working relationship with the developer in our department, Terry Brady, has been essential, allowing us to draw on each other’s expertise to create custom automated solutions to fit Georgetown’s needs. The code for Terry’s BagIt tool and upload verification tool is available on GitHub.
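For readers new to BagIt: a bag is a directory holding a bagit.txt declaration, the content under data/, and a manifest of checksums that lets anyone re-verify the files after transfer. The Ruby below is a minimal sketch of that layout as described in the BagIt specification (RFC 8493). It is not Terry Brady's tool; the helper names make_bag and fixity_ok? are hypothetical, and it omits any extra tag files a repository such as APTrust may require.

```ruby
# Minimal BagIt (RFC 8493) sketch; not Georgetown's tool.
require "digest"
require "fileutils"

# Build a bag: copy payload files into data/, write bagit.txt and a
# SHA-256 payload manifest. (Flat copy: files sharing a basename collide.)
def make_bag(bag_dir, payload_files)
  FileUtils.mkdir_p(File.join(bag_dir, "data"))
  File.write(File.join(bag_dir, "bagit.txt"),
             "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n")
  lines = payload_files.map do |src|
    rel = File.join("data", File.basename(src))
    FileUtils.cp(src, File.join(bag_dir, rel))
    "#{Digest::SHA256.file(File.join(bag_dir, rel)).hexdigest}  #{rel}"
  end
  File.write(File.join(bag_dir, "manifest-sha256.txt"), lines.map { |l| "#{l}\n" }.join)
end

# Fixity check: recompute each listed file's SHA-256 and compare it with the
# manifest. (Does not look for unlisted extra files under data/.)
def fixity_ok?(bag_dir)
  manifest = File.readlines(File.join(bag_dir, "manifest-sha256.txt"), chomp: true)
  manifest.reject(&:empty?).all? do |line|
    checksum, rel = line.split(/\s+/, 2)
    path = File.join(bag_dir, rel)
    File.file?(path) && Digest::SHA256.file(path).hexdigest == checksum
  end
end
```

The verification half captures the basic idea of checking an upload: recompute checksums and compare them with the manifest. A complete validator would also confirm that every file under data/ appears in the manifest.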

So far, we’ve completed workflows and ingest for all content that had preservation copies in our DSpace digital repository, DigitalGeorgetown. I am also developing documentation and workflows that can be used so that any staff member can sit down and be able to upload materials into APTrust without much training.

I’ve begun training librarians and archivists in other departments to ensure the sustainability of the project’s outcome. Digital curation and preservation tasks are becoming more and more commonplace, and we believe that these skills need to be dispersed throughout our institution rather than performed by only a few people. Other staff here have been smoothly integrated into this process, thanks to our thorough documentation. Their new skills help speed up our ingest rate. This documentation will be open to the public when complete; we hope that it will be useful to the wider library and archival community.

Currently, we’re working on ingesting materials into APTrust that have their preservation copies on network or external storage and their metadata in DigitalGeorgetown. This is less automated because the preservation copies have to be retrieved from external storage rather than from our DSpace servers. After that, we will move on to items with metadata and/or files in other systems, such as ArchivesSpace, embARK (art collections), and the library catalog.

By the end of this process we hope to have all our preservation copies transferred and the infrastructure in place to keep digital preservation sustainable at Georgetown.

Open Knowledge Foundation: OK Sweden’s first ever EU Project (CLARITY), Open Knowledge Awards – and other quarter 4 updates

planet code4lib - Mon, 2017-03-13 11:00

This blog post is part of our on-going Network series featuring updates from chapters across the Open Knowledge Network and was written by the Open Knowledge Sweden team.


2016 was a productive year for Open Knowledge Sweden! We are a team that is passionate about open knowledge, transparency, democracy and value creation. This post documents some of the projects and events we were engaged in for the last quarter of 2016.


OK Sweden is part of the pan-EU CLARITY Project, which is in its first year of a two-year term. The project will support the European Member States in their pursuit of greater trust, transparency, and efficiency within their open eGovernment initiatives and highlight best practices within this field.

We have worked hard as a team in this project and occupied leadership roles on many CLARITY related tasks. For example, Serdar Temiz served as Project Manager for the CLARITY packages OK Sweden was responsible for, while Toni Mickiewicz was adjudged the most active person on the project at OK Sweden.

Focus group discussion in Stockholm

We have produced several deliverables and conducted many activities as listed below for the CLARITY Project:

  • We have conducted a stakeholder analysis within the open eGovernment ecosystem
  • We have analysed the key drivers in the uptake of open eGovernment services in Europe
  • We have conducted a needs assessment of societal, public sector and industry and market needs when it comes to the uptake of open eGovernment services
  • We held a Development Sprint Event in Amsterdam in December 2016, where four teams developed blueprints for four new open eGovernment services.
  • We have held two foresight focus groups with stakeholders in Sweden and Spain and will host the third in The Netherlands in March. Results will appear in a forthcoming briefing paper, “Considerations for the take-up of open eGovernment services in Europe”, due in March 2017.

Please read them and don’t hesitate to give us feedback! You can also follow us on our website, as well as on social media: Twitter and our LinkedIn group. If you are interested in joining our contact list, drop us a line.

Open Knowledge Awards in Sweden!

OK Sweden held its first-ever Open Knowledge Awards to acknowledge people and organisations that are working towards open knowledge in Sweden! We collaborated with the Industrial Marketing and Entrepreneurship Division of KTH Royal Institute of Technology, and the event was sponsored by Wikimedia. The award ceremony was held in the Indek building of the KTH ITM School and was managed by Asmen Gul, the OKAwards and Open Data Index Project Manager of OK Sweden. We plan to hold the awards every year!

The awards ceremony received a lot of positive feedback and wide press coverage. We can easily say that it was our most attention-grabbing event of the year.

Open Knowledge Awards team and some winners

You can find more information about the award, jury, categories, and winners here.

(AskTheState) – our first Freedom of Information (FoI) Website

Our first FOI website (AskTheState), managed by Mattias Axell, was launched in May. The project is very important to us because we launched it in the 250th anniversary year of the FOI Act, which dates back to 1766 and was the first freedom of information legislation in the modern sense; the principle it established is called the Principle of Public Access (offentlighetsprincipen).

We presented the platform at the Umeå Hacker Space and took part in the sestercentennial (250th!) anniversary celebration of the adoption of legal guarantees for freedom of information and a free press in Sweden, which was run by the Swedish National Library.

In 1766, the Swedish parliament abolished censorship of books and newspapers and required authorities to provide public access to all official records with the passing of “His Majesty’s Gracious Ordinance Relating to Freedom of Writing”. An English translation of Anders Chydenius’s FOI text is available here, along with articles on the same topic by other contributors.

Sweden’s Biggest Internet event

OK Sweden, in partnership with Engaging Privacy, ISOC-SE, OKSE, DFRI, and IIS, prepared a two-day program (21st and 22nd of November) of privacy and integrity tracks at Sweden’s biggest Internet event, which was organised by the Internet Foundation in Sweden. The keynote speaker was Edward Snowden. Although he could not be in Sweden in person, he was able to deliver his speech thanks to digital technologies!

In one of the parallel tracks, the Chairman of OK Sweden, Serdar Temiz, gave a presentation on Data Privacy, Corporates, and States.

Also, OK Sweden member and FragaStaten’s Project Manager Mattias Axell facilitated workshops on better privacy practices in businesses and public organisations on both days of the event. You can read more about the event schedule here.

Other Events and Activities 

We have two new project members: Malin Isaksson and Julia Navrén. Both will work on A Digital Quality Index for Press Freedom and Freedom of Information Act. Funded by the Internet Fund (internetfonden), the project will seek to build relationships, networks, and infrastructure for media organisations and the public in Sweden, to make it easier to assess the quality of the principle of public access. You can follow the progress of this project here.

With the help of Wikimedia Sweden, we also created CC-licensed training materials on open data; Asmen Gul and Valentin Brutaru ran the project. We are glad to have these materials available for our future educational activities.

Secret Announcement!

As part of our mission to re-work OK Sweden to become a new and improved organisation, we now have our first ever Executive Director, Fredrik Sjöberg, who brings with him many years of project and team management skills. We strongly believe OK Sweden will become more effective and productive with Fredrik’s management and contributions to the team. Our press release regarding this change will follow soon. Until then, it is a secret between you and us. OK?

We are working on great projects for 2017 so stay tuned!

FOSS4Lib Upcoming Events: Islandora Camp - Delft, Netherlands

planet code4lib - Mon, 2017-03-13 02:41
Date: Tuesday, June 13, 2017 - 08:00 to Thursday, June 15, 2017 - 17:00
Supports: Islandora

Last updated March 12, 2017. Created by Peter Murray on March 12, 2017.

The Islandora Foundation invites you to join us in the Netherlands for our next Islandora Camp, taking place June 13 - 15 at Delft University of Technology.

Registration is now open:

DuraSpace News: Fedora Repository Contributor Danny Bernstein–A Musical Approach to Fedora Flexibility

planet code4lib - Mon, 2017-03-13 00:00

The Fedora repository project relies on many individuals and institutions to ensure the project's success. We are grateful for their commitment and will showcase their contributions in a series of community profiles aimed at recognizing our contributors’ achievements, and introducing them to the rest of the community. This is the first article in the Fedora Repository Contributor Profile series.

DuraSpace News: VIVO Updates for March 13–Ontology, Implementation, Camp, Conference

planet code4lib - Mon, 2017-03-13 00:00

From Mike Conlon, VIVO Project Director

What's hot?  Ontology is hot, that's what's hot.  Interested in the ontologies that drive VIVO?  So are we.  Here's some newness around the VIVO ontologies:

Jonathan Rochkind: “This week in rails” is a great idea

planet code4lib - Sun, 2017-03-12 18:49

At some point in the past year or two (maybe even earlier? But I know it’s not many years old), the Rails project started releasing roughly weekly ‘news’ posts that mostly cover interesting or significant changes or bugfixes made to master (meaning they aren’t in a release yet, but will be in the next one; I think the copy could make this clearer, actually!).

This week in Rails

(Note: it’s aggregated in rubyland too!)

I assume this was a reaction to a recognized problem: non-core developers and developer-users had trouble keeping up with “what the heck is going on with Rails”. Watching the GitHub repo(s) wasn’t a good solution; it was too overwhelming, pure information overload. And I think it worked to improve the situation! It’s a great idea! It makes it a lot easier to keep up with Rails, and a lot easier for a non-contributing or rarely contributing developer to maintain an understanding of the Rails codebase. Heck, even for newbie developers, it can serve as pointers to areas of the Rails codebase they might want to look into and develop some familiarity with, or at least know exist!

I think it’s been quite successful at helping in those areas. Good job and thanks to Rails team.

I wonder if this is something the Hydra project might consider? Hydra definitely has similar problems with developers keeping up with changes and understanding the codebase.

The trick, of course, is getting the developer resources to do it. I am not sure who writes the “This Week in Rails” posts; it might be a different core team dev every week? Nor do I know what methodology they use to compile it: whether they write about areas of the codebase they might not be familiar with either, just looking at the commit log and trying to summarize it for a wider audience, or something else. I think you’d need at least some familiarity with the Rails codebase to write these well; if you don’t understand what’s going on yourself, you’re going to have trouble writing a useful summary. It would be interesting to interview someone on the Rails core team about how this works, how they do it, how well they think it’s working, etc.


Filed under: General

Brown University Library Digital Technologies Projects: In Progress: The Mark Baumer Digital Collection

planet code4lib - Fri, 2017-03-10 20:55

This is a guest post by Brown University Library’s Web Archiving Intern, Christina Cahoon. Christina is currently finishing her Masters of Library and Information Science degree at the University of Rhode Island. 

After the recent passing of Brown University alumnus and Library staff member Mark Baumer MFA ’11, the Brown University Library tasked itself with preserving his prolific web presence. I’m working towards that goal with Digital Preservation Librarian Kevin Powell. Baumer was a poet and environmental activist who worked within the Digital Technologies Department as Web Content Specialist. This past October, Baumer began his Barefoot Across America campaign, with plans to walk barefoot from Rhode Island to California in an effort to raise money for environmental preservation and to support the FANG Collective. Unfortunately, this journey was tragically cut short on January 21, 2017, when Baumer was struck by a vehicle and killed while walking along a highway in Florida.

Baumer was an avid social media user who posted on several platforms multiple times a day.  As such, the task of recording and archiving Baumer’s web presence is quite large and not free from complications.  Currently, we are using Archive-It to crawl Baumer’s social media accounts and news sites containing coverage of Baumer’s campaign, including notices of his passing. While Archive-It does a fairly decent job recording news sites, it encounters various issues when attempting to capture social media content, including content embedded in news articles.  As you can imagine, this is causing difficulties capturing the bulk of Baumer’s presence on the web.

Archive-It’s help center has multiple suggestions for capturing social media sites, which have proven useful when capturing Baumer’s Twitter feed; however, suggestions have either not been helpful or are nonexistent when it comes to other social media sites like YouTube, Instagram, and Medium. The issues we faced when crawling these websites range from capturing far too much information, as in the case of YouTube, where our tests captured every referred video file from every video in the playlist, to capturing only the first few pages of dynamically loading content, as is the case with Instagram and Medium. We are re-configuring our approach to YouTube after viewing Archive-It’s recently held Archiving Video webinar, but unfortunately the software does not have solutions for Instagram and Medium at this time.

These issues have caused us to re-evaluate our options for the best methods to capture Baumer’s work. We have tested how Webrecorder works in capturing sites like Flickr and Instagram, and we are still encountering problems where images and videos are not being captured. It seems there will not be one solution to our problem, and we will have to use multiple services to sufficiently capture all of Baumer’s social media accounts.

The problems encountered in this instance are not rare in the field of digital preservation. Ultimately, we must continue testing different preservation methods in order to find what works best in this situation. For now, the task remains of discovering the best methods to properly capture Baumer’s work.

David Rosenthal: Dr. Pangloss and Data in DNA

planet code4lib - Fri, 2017-03-10 16:00
Last night I gave a 10-minute talk at the Storage Valley Supper Club, an event much beloved of the good Dr. Pangloss. The title was DNA as a Storage Medium; it was a slightly edited section of The Medium-Term Prospects for Long-Term Storage Systems. Below the fold, an edited text with links to the sources.

I'm David Rosenthal, your resident skeptic. You don't need to take notes, the whole text of this talk with links to the sources will be up on my blog tomorrow morning.

Seagate 2008 roadmap

My mantra about storage technologies is "it takes longer than it takes". Disk, a 60-year-old technology, shows how hard it is to predict lead times. Here is a Seagate roadmap slide from 2008 predicting that the then (and still) current technology, perpendicular magnetic recording (PMR), would be replaced in 2009 by heat-assisted magnetic recording (HAMR), which would in turn be replaced in 2013 by bit-patterned media (BPM).

In 2016, the trade press reported that:
Seagate plans to begin shipping HAMR HDDs next year.

ASTC 2016 roadmap

Here is last year's roadmap from ASTC showing HAMR starting in 2017 and BPM in 2021. So in 8 years HAMR went from next year to next year, and BPM went from 5 years out to 5 years out. The reason for this real-time schedule slip is that as technologies get closer and closer to the physical limits, the difficulty and above all cost of getting from lab demonstration to shipping in volume increases exponentially.

Today I'm aiming my skepticism at the idea of storing data in DNA. The basic idea is obvious: since there are four bases (A, C, G and T), in theory you can store 2 bits per base along the DNA molecule. On the one hand, in practice you can't get 2 bits; on the other hand, it has recently been shown that you can add artificial bases X and Y. There are strong arguments that DNA would be a great off-line archival medium:
  • It is very dense.
  • It is very stable in shirt-sleeve environments over the long term.
  • It is very easy to make lots of copies, which keep stuff safe.
  • The technologies needed to use it have other major applications.
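The "2 bits per base" idea can be made concrete with a naive Python sketch. The particular assignment of bit pairs to bases here is an arbitrary assumption for illustration, not the mapping used by any of the teams discussed below.

```python
# Hypothetical mapping: each 2-bit pair selects one of the four bases.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Encode bytes as a DNA string at the theoretical 2 bits per base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Invert encode(): map each base back to 2 bits, then regroup into bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


strand = encode(b"Hi")   # 4 bases per byte
print(strand)            # → CAGACGGC
assert decode(strand) == b"Hi"
```

The naive mapping also hints at one reason practice falls short of theory: `encode(b"\x00")` yields `"AAAA"`, a run of identical bases that a real encoding would have to avoid at some cost in density.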
The first demonstration of storing data in DNA was in 1988, but the first experiments to store more than a kilobyte were reported in a 1-page paper in Science from Harvard and a longer paper in Nature from the European Molecular Biology Lab. The EMBL paper was submitted first, on 15 May 2012, but the Harvard paper was published first, on 16 August 2012; because it had much less content, it could be reviewed more quickly. The world thinks Harvard did DNA storage first, but in reality it was close and EMBL won by a nose.

The Harvard team wrote and read about 640KB of data divided into 96-bit blocks with a 19-bit address. The EMBL team used a more sophisticated encoding scheme designed to avoid sequences prone to error in synthesis, and including parity-check error detection in each block. They wrote about 740KB, converted the DNA to a form suitable for long-term storage, shipped it with no special precautions, read a sample of the DNA and stored the rest.
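An encoding "designed to avoid sequences prone to error in synthesis" can be illustrated with a rotating ternary code, similar in spirit to the EMBL team's approach: data become base-3 digits (trits), and each trit selects one of the three bases that differ from the previous base, so no base is ever repeated. This Python sketch is a simplification under stated assumptions: it converts each byte to a fixed six trits rather than using the EMBL team's actual compression step, the starting base "A" is arbitrary, and the single sum-mod-3 check trit is a hypothetical stand-in for parity checking, not the EMBL scheme.

```python
BASES = "ACGT"


def bytes_to_trits(data):
    """Write each byte as six base-3 digits (3**6 = 729 >= 256), most significant first."""
    trits = []
    for byte in data:
        value = byte
        digits = []
        for _ in range(6):
            value, remainder = divmod(value, 3)
            digits.append(remainder)
        trits.extend(reversed(digits))
    return trits


def trits_to_bytes(trits):
    """Inverse of bytes_to_trits: read six trits at a time back into one byte."""
    out = []
    for i in range(0, len(trits), 6):
        value = 0
        for trit in trits[i:i + 6]:
            value = value * 3 + trit
        out.append(value)
    return bytes(out)


def rotate_encode(trits, previous="A"):
    """Map each trit to one of the three bases that differ from the previous base."""
    strand = []
    for trit in trits:
        choices = [base for base in BASES if base != previous]
        previous = choices[trit]
        strand.append(previous)
    return "".join(strand)


def rotate_decode(strand, previous="A"):
    """Recover trits from each base's position among the allowed choices.

    A repeated base cannot occur in a valid strand; index() raises ValueError if it does.
    """
    trits = []
    for base in strand:
        choices = [b for b in BASES if b != previous]
        trits.append(choices.index(base))
        previous = base
    return trits


def check_trit(trits):
    """Hypothetical single check digit: the sum of the trits modulo 3."""
    return sum(trits) % 3


payload = bytes_to_trits(b"DNA")
block = payload + [check_trit(payload)]
strand = rotate_encode(block)
assert all(a != b for a, b in zip(strand, strand[1:]))  # no repeated bases
decoded = rotate_decode(strand)
assert check_trit(decoded[:-1]) == decoded[-1]
assert trits_to_bytes(decoded[:-1]) == b"DNA"
```

The price of never repeating a base is density: each base can carry at most log2(3), about 1.58 bits, instead of 2, and this sketch's fixed six trits per byte achieves only 8/6, about 1.33 bits per base.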

So far, so good. Both teams demonstrated that it is possible to write data to, and read it back from, DNA. But neither team could resist hyping their work. Here is Harvard (my emphasis):
costs and times for writing and reading are currently impractical for all but century-scale archives. However, the costs of DNA synthesis and sequencing have been dropping at exponential rates of 5- and 12-fold per year, respectively — much faster than electronic media at 1.6-fold per year.
But the EMBL team easily won the hype competition:
our DNA-based storage scheme could be scaled far beyond current global information volumes and offers a realistic technology for large-scale, long-term and infrequently accessed digital archiving. In fact, current trends in technological advances are reducing DNA synthesis costs at a pace that should make our scheme cost-effective for sub-50-year archiving within a decade.
And the lay press was off and running, with headlines like:
The 'genetic hard drive' that could store the complete works of Shakespeare (and revolutionise the way computers work)
and:
DNA 'perfect for digital storage'
But the serious hype came in EMBL's cost estimates:
"In 10 years, it's probably going to be about 100 times cheaper," said Dr. Goldman. "At that time, it probably becomes economically viable."
As I wrote at the time:
The EMBL team computes the cost of long-term storage in DNA using market prices for synthesis. They compare this with a model they constructed of the cost of long-term storage on tape. But they didn't need their own model of tape cost; long-term storage is available in the market from Amazon's Glacier, which is rumored to be three independent copies on LTO6 tape ... Fewer than three copies would not be reliable enough for long-term storage.

Glacier storage currently costs 1c/GB/mo. Generously, I assume writing costs of $7.5K/MB for DNA ... If, instead of spending $7.5M to write 1GB of data to DNA, the money is invested at 0.1%/yr real interest, it would generate an income of $7.5K/yr for ever. Even assuming that Amazon never drops Glacier's price to match technological improvement, this would be enough to store three copies of 62.5TB for ever. The same money is storing (three copies of) 62,500 times as much data.
"About 100 times cheaper" doesn't even come close.
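The arithmetic in that quoted argument can be checked with a short Python calculation. It uses only the figures quoted above, with exact fractions to avoid floating-point rounding, and treats 1GB as 1,000MB as the quoted text does; the three copies are assumed to be included in Glacier's price, per the rumor cited.

```python
from fractions import Fraction

# Figures taken from the quoted argument.
DNA_WRITE_COST_PER_MB = 7_500              # dollars per MB written to DNA
DATA_MB = 1_000                            # 1GB, counted as 1,000MB
REAL_INTEREST = Fraction(1, 1000)          # 0.1% per year real interest
GLACIER_PER_GB_MONTH = Fraction(1, 100)    # 1 cent per GB per month

dna_write_cost = DNA_WRITE_COST_PER_MB * DATA_MB          # dollars to write 1GB to DNA
annual_income = dna_write_cost * REAL_INTEREST             # dollars per year, for ever
glacier_per_gb_year = GLACIER_PER_GB_MONTH * 12            # dollars per GB per year
glacier_gb_forever = annual_income / glacier_per_gb_year   # GB that income keeps in Glacier

print(f"${dna_write_cost:,} buys 1GB of DNA or {float(glacier_gb_forever):,.0f}GB "
      f"({float(glacier_gb_forever) / 1000} TB) of Glacier storage for ever")
# → $7,500,000 buys 1GB of DNA or 62,500GB (62.5 TB) of Glacier storage for ever
```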

The hype died down for a while, but it has started up again. Nature recently featured a news article by Andy Extance entitled How DNA could store all the world's data, which claimed:
If information could be packaged as densely as it is in the genes of the bacterium Escherichia coli, the world's storage needs could be met by about a kilogram of DNA.
The article is based on research at Microsoft that involved storing 151KB in DNA. But this paper concludes (my emphasis):
DNA-based storage has the potential to be the ultimate archival storage solution: it is extremely dense and durable. While this is not practical yet due to the current state of DNA synthesis and sequencing, both technologies are improving at an exponential rate with advances in the biotechnology industry[4].
The Microsoft team don't claim that the solution is at hand any time soon. Reference 4 is a two-year-old post to Rob Carlson's blog. A more recent post to the same blog puts the claim that:
both technologies are improving at an exponential rate
in a somewhat less optimistic light. It may be true that DNA sequencing is getting cheaper very rapidly. But already the cost of sequencing (read) was insignificant in the total cost of DNA storage. What matters is the synthesis (write) cost. Extance writes:
A closely related factor is the cost of synthesizing DNA. It accounted for 98% of the expense of the $12,660 EBI experiment. Sequencing accounted for only 2%, thanks to a two-millionfold cost reduction since the completion of the Human Genome Project in 2003.
The rapid decrease in the read cost is irrelevant to the economics of DNA storage; if it were free, it would make no difference. Carlson's graph shows that the write cost, the short DNA synthesis cost (red line), is falling more slowly than the gene synthesis cost (yellow line). He notes:
But the price of genes is now falling by 15% every 3-4 years (or only about 5% annually).
A little reference checking reveals that the Microsoft paper's claim that:
both technologies are improving at an exponential rate
while strictly true, is deeply misleading. The relevant technology is currently getting cheaper more slowly than hard disk or flash memory! And since this has been true for around two decades, making the necessary 3-4 fold improvement just to keep up with the competition is going to be hard.
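Converting Carlson's "15% every 3-4 years" into an annual rate takes one compounding step: if a price falls by a fraction d over n years, the equivalent constant yearly drop is 1 - (1 - d)^(1/n). A minimal Python check using only the quoted figures, with the 5-fold-per-year synthesis figure from the 2012 Harvard quote shown for contrast:

```python
def annual_decline(total_drop, years):
    """Constant yearly fractional price drop that compounds to total_drop over years."""
    return 1 - (1 - total_drop) ** (1 / years)


for years in (3, 4):
    print(f"15% over {years} years ~ {annual_decline(0.15, years):.1%} per year")
# → 15% over 3 years ~ 5.3% per year
# → 15% over 4 years ~ 4.0% per year

# "5-fold cheaper per year" means the price keeps only 1/5 of its value each year.
print(f"5-fold per year = {annual_decline(1 - 1 / 5, 1):.0%} drop per year")  # → 80%
```

So the gene-synthesis price quoted by Carlson is falling by roughly 4% to 5% a year, far slower than the 80%-a-year decline implied by the 5-fold annual figure in the Harvard paper.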

Last week Science published DNA Fountain enables a robust and efficient storage architecture by Yaniv Erlich and Dina Zielinski from Columbia. They describe an improved method for encoding data in DNA that, at 1.83 bits/nucleotide, gets much closer to the Shannon limit of 2 than previous attempts. Their experiment stored 2.2MB at about $3500/MB write cost.

Decades from now, DNA will probably be an important archival medium. But the level of hype around the cost of DNA storage is excessive. Extance's article admits that cost is a big problem, yet it finishes by quoting Goldman, lead author of the 2013 paper in Nature whose cost projections were massively over-optimistic. Goldman's quote is possibly true but again deeply misleading:
"Our estimate is that we need 100,000-fold improvements to make the technology sing, and we think that's very credible," he says. "While past performance is no guarantee, there are new reading technologies coming onstream every year or two. Six orders of magnitude is no big deal in genomics. You just wait a bit."Yet again the DNA enthusiasts are waving the irrelevant absolute cost decrease in reading to divert attention from the relevant lack of relative cost decrease in writing. They need an improvement in relative write cost of at least 6 orders of magnitude. To do that in a decade means halving the relative cost every year, not increasing the relative cost by 10-15% every year.

Despite my skepticism about time-scales, I believe that in the long term DNA has great potential as an archival storage medium. Just as I believe that what is interesting about Facebook's work on optical storage is the system aspects not the medium, I believe that what is interesting about the Microsoft team's work is the system aspects. For example, they discuss how data might be encoded in DNA to permit random access.

Although this is useful research, the fact remains that DNA data storage requires a reduction in relative synthesis cost of at least 6 orders of magnitude over the next decade to be competitive with conventional media, and that currently the relative write cost is increasing, not decreasing.

Harvard Library Innovation Lab: Privacy Concerns vs. Traditions – When the World Changes Around You

planet code4lib - Fri, 2017-03-10 14:50

(This is a guest post from the amazing Jessamyn West, who we’re lucky to have with us this year as a Research Fellow.)

I live in a town of 4500 people. Like most towns in Vermont, we have an annual Town Meeting. We vote by Australian Ballot on things like budgets, but there’s time at the end of the meeting for Other Business. This year we discussed whether Randolph should become a sanctuary town. Another topic was the annual publication of the names of people who hadn’t paid their taxes at the time of the town report’s publication. I can remember being a kid and seeing these names in my own hometown town report, often of our town’s poorest residents. I always found the “name and shame” aspect of it troubling, though I know that others feel this is a necessary sanction to ensure that taxes get paid promptly.

At this year’s Town Meeting we discussed whether the town should continue to publish the names of people with delinquent taxes in the town report. Delinquent taxes make up about 3% of the town’s tax revenue. You can see the list yourself; it’s on page 35 of this 37 MB document. People had varying opinions on the positive or negative aspects of this practice. A few people said “We’ve always done it that way.” I rarely speak at Town Meeting (I feel my opinions are often already well-represented), but this time I asked to speak and said “We may have always done it this way, but the world has changed. The town now puts the PDF of the town report online, which it has been doing since 2010. This means it’s potentially indexed by Google, which has been indexing PDFs for the past few years. People who are late on taxes are now perpetually Googleable as scofflaws.”

(Photo by Ramsey Papp. Used with permission.)

I should note at this point that I am aware that there are technical solutions for avoiding Google indexing that are not realistic within the scope of how our town manages their web content.

I went on to say that the people who show up on these lists are often people having trouble; two of the listings from this year are a man and his estate, clearly a person who has died. Most of the people in my area, especially my older, poorer and more rural neighbors, have almost no online footprint. This would give them one, a bad one. I concluded “We should not do this to them.”

The vote was close; the voice vote wasn’t conclusive, so we had to do a standing vote. In the end, we recommended that the selectboard look into discontinuing this practice. We might even wind up with some sort of compromise solution, like posting the names in the Town Hall but not on the internet. The fight for privacy, online and offline, is taking place everywhere. Make sure you speak up when you see a way that you could help.

Open Knowledge Foundation: Open Food Data and Open Transport Data: Two milestones in Switzerland’s Open Data History

planet code4lib - Fri, 2017-03-10 11:00

This blog post is part of our on-going Network series featuring updates from chapters across the Open Knowledge Network and was written by the Open Knowledge Switzerland team.

Open Knowledge Switzerland pursues the objective to make data freely available for everyone to use and republish in order to increase transparency, innovation and efficiency. 

The image depicts the board of at its meeting on December 8 [sketch by Oleg Lavrovsky].

We are excited to share the curated highlights from Switzerland for the fourth quarter of 2016 with you all.

Launch of the innovation program

In collaboration with Engagement Migros, the development fund of the Migros group, launched the three-year program “Business Innovation”. The program supports creative, entrepreneurial and scientific projects in the area of food and nutrition data on their way from the first idea to market readiness by providing new open data, intense coaching and mentorship, and financial support.

The first event to kick off the program was the Open Food Data Hackday, which took place in February 2017, with parallel events in Zurich and Lausanne. If you want to follow the progress of this program, check the website for regular updates.

Launch of

The open data platform of Swiss public transport was launched in early December 2016. is the platform for customer information data on public transport in Switzerland, where you can obtain data on public transport free of charge and get access to specific public transport services. The data published on this platform has been prepared with developers in mind. With the Swiss chapter, we have been fighting for open public transport data since 2011, and now it’s finally here! Let’s all go use that data to make travel smarter!

Open Letter to Members of Parliament regarding Open Weather Data

In December 2016, André Golliez, the president of and Edith Graf-Litscher, the co-president of the parliamentary group digital sustainability, co-wrote an open letter to the members of the Council of States. In this open letter, Golliez and Graf-Litscher ask the members of this chamber to support a motion in favour of opening up weather data in Switzerland. You can find the letter here. We will keep you posted on any further development with regards to this matter.

First edition of “Youth hacks Zurich”

The Board and Community Members of the Swiss chapter supported the first edition of “Jugend hackt Zurich” which took place in November and taught teenagers how to code in a bid to make the world a better place.

Some participants at the Youth hacks Zurich

In the end, ten projects were realised, among them apps (smart lock), games (the life of Ethan Drake), open data (Hotspot spotter), visualisations (IMNS - Is my network safe?) and hardware projects (with Raspberry Pi and Arduino, as well as diverse robots built with EV3). More information is also available on the official website of Swiss Television and Radio. If you’re interested in finding out more about the Youth Hacks Zurich event, check out the blog that documented the event and its results.

55 Open Data Apps! 

Since 2014, the Research Center for Digital Sustainability at the University of Bern has been hosting an annual Open Data lecture. As part of the curriculum, students learn to visualise data and develop interactive web applications. This year marks the third edition, and several open data applications have already been developed. These apps can now be found in the Open Data Show Room.

Currently, there are over 55 Open Data Apps in the Show Room, focusing on categories such as Education, Finance, Society, Health, Environment, Transport, etc.

In Q4 2016 alone, 23 new apps were created. Even better, the Open Data lecture will be back in spring 2017. To visit the Open Data Show Room, please follow this link.

Keynotes, workshops & roundtables

Last but not least, our community and board members hosted several keynotes, workshops and round-tables in October – December 2016. To highlight one of many examples, we are glad to share Andreas Amsler’s keynote about Open Data – for fun and profit. 

