Planet Code4Lib - http://planet.code4lib.org
Updated: 1 week 1 day ago

Ed Summers: Offline Sites with React

Wed, 2018-01-10 05:00

This post contains some brief notes about building offline, static web sites using React, in order to further the objectives of minimal computing. But before I go there, first let me give you a little background…

The Lakeland Community Heritage Project is an effort to collect, preserve, and interpret the heritage and history of African Americans who have lived in the Lakeland community of Prince George’s County, Maryland since the late 19th century. This effort has been led by members of the Lakeland community, with help from students from the University of Maryland working with Professor Mary Sies. As part of the work they’ve collected photographs, maps, deeds, and oral histories and published them in an Omeka instance at lakeland.umd.edu. As Mary is wrapping up the UMD side of the project she has become increasingly interested in making these resources available and useful to the community of Lakeland, rather than leaving them embedded in a software application that is running on servers owned by UMD.

Sneakernet

Recently MITH has been in conversation with LCHP to help explore ways that this data stored in Omeka could be meaningfully transferred to the Lakeland community. This has involved first getting the Omeka site back online, since it partially fell offline as the result of some infrastructure migrations at UMD. We have also been collecting and inventorying the disk drives and transfer devices that students have used to collect content over the years.

One relatively small experiment I tried recently was to extract all the images and their metadata from Omeka to create a very simple visual display of the images that could run in a browser without an Internet connection. The point was to provide a generous interface from which community members attending a meeting could browse content quickly and potentially take it away with them. Since this meeting was in an environment without stable network access, it was important for the content to be browsable without an Internet connection. We also wanted to be able to put the application on a thumb drive and move it around as a zip file, which would ultimately let us make it available to community members without the files needing to stay online at a particular location on the Internet. Basically we wanted the site to be on the Sneakernet instead of the Internet.

Static Data

The first step was getting all the data out of Omeka. This was a simple matter with Omeka’s very clean, straightforward and well documented REST API. Unfortunately, LCHP was running an older version of Omeka (v1.3.1) that needed to be upgraded to 2.x before the API was available. The upgrade process itself leapfrogged a bunch of versions so I wasn’t surprised to run into a small snag, which I was fortunately able to fix myself (go team open source).

I wrote a small utility named nyaraka that talks to Omeka and downloads all the items (metadata and files) as well as the collections they are a part of, and places them on the filesystem. This was a fairly straightforward process because Omeka’s database enforces the one-to-many relationships between a site and its collections, items, and files, which means they can be written to the filesystem in a structured way:

lakeland.umd.edu
lakeland.umd.edu/site.json
lakeland.umd.edu/collections
lakeland.umd.edu/collections/1
lakeland.umd.edu/collections/1/collection.json
lakeland.umd.edu/collections/1/items
lakeland.umd.edu/collections/1/items/1
lakeland.umd.edu/collections/1/items/1/item.json
lakeland.umd.edu/collections/1/items/1/files
lakeland.umd.edu/collections/1/items/1/files/1
lakeland.umd.edu/collections/1/items/1/files/1/fullsize.jpg
lakeland.umd.edu/collections/1/items/1/files/1/original.jpg
lakeland.umd.edu/collections/1/items/1/files/1/file.json
lakeland.umd.edu/collections/1/items/1/files/1/thumbnail.jpg
lakeland.umd.edu/collections/1/items/1/files/1/square_thumbnail.jpg
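nyaraka itself does the real work, but just to give a flavor of the kind of crawl involved, here is a rough sketch in Node (not nyaraka's actual code). It assumes the Omeka 2.x REST API and node-fetch, ignores pagination and file downloads, and the query parameters may need adjusting for your Omeka version:

const fetch = require('node-fetch')
const fs = require('fs')
const path = require('path')

const site = 'http://lakeland.umd.edu'

async function getJSON(url) {
  const resp = await fetch(url)
  return resp.json()
}

async function harvest() {
  // walk each collection and its items, mirroring them to the filesystem
  const collections = await getJSON(site + '/api/collections')
  for (const collection of collections) {
    const dir = path.join('lakeland.umd.edu', 'collections', String(collection.id))
    fs.mkdirSync(dir, { recursive: true })
    fs.writeFileSync(path.join(dir, 'collection.json'), JSON.stringify(collection, null, 2))

    const items = await getJSON(site + '/api/items?collection=' + collection.id)
    for (const item of items) {
      const itemDir = path.join(dir, 'items', String(item.id))
      fs.mkdirSync(itemDir, { recursive: true })
      fs.writeFileSync(path.join(itemDir, 'item.json'), JSON.stringify(item, null, 2))
    }
  }
}

harvest().catch(err => console.error(err))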

This post was really meant to be about building a static site with React, and not about extracting data from Omeka. But this filesystem data is kinda like a static site, right? It was really just laying the foundation for the next step of building the static site application, since I didn’t really want to keep downloading content from the API as I was developing the application. Having all the content local made it easier to introspect with command line tools like grep, find and jq as I was building the static site.

React

Before I get into a few of the details here’s a short video that shows what the finished static site looked like:

Lakeland Static Site Demo from Ed Summers on Vimeo.

You can see that content is loaded dynamically as the user scrolls down the page. Lots of content is presented at once in random orderings each time to encourage serendipitous connections between items. Items can also be filtered based on type (buildings, people and documents). If you want to check it out for yourself download and unzip this zip file and open up the index.html in the directory that is created. Go ahead and turn off your wi-fi connection so you can see it working without an Internet connection.

When building static sites in the past I’ve often reached for Jekyll but this time I was interested in putting together a small client side application that could be run offline. This shouldn’t be seen as an either/or situation: it would be quite natural to create a static site using Jekyll that embeds a React application within it. But for the sake of experimentation I wanted to see how far I could go just using React.

Ever since I first saw Twitter’s personal archive download (aka Grailbird) I’ve been thinking about the potential of offline web applications to function as little time capsules for web content that can live independently of the Internet. Grailbird lets you view your Twitter content offline in a dynamic web application where you can view your tweets over time. Over the past few years the minimal computing movement has been gaining traction in the digital humanities community, as a way to ethically and sustainably deliver web content without needing to make promises about keeping it online forever, or 25 years (whichever comes first).

React seemed like a natural fit because I’ve been using it for the past year on another project. React offers a rich ecosystem of tools, plugins and libraries like Redux for building complex client side apps. The downside of using React is that it is not as easy to set up out of the box, or to change over time, if you aren’t an experienced software developer. With Jekyll it’s not simple, but at least it’s relatively easy to dive in and edit HTML and CSS. But on the plus side for React, if you really want to deliver an unchanging, finished (static) artifact, then maybe these things don’t really matter so much?

At any rate it seemed like a worthwhile experiment. So here are a few tidbits I learned when bending React to the purposes of minimal computing:

Static Database

The first is to build a static representation of your data. Many React applications rely on an external REST API being available. This type of dependency is an obvious no-no for minimal computing applications, because an Internet connection is needed, and someone needs to keep the REST API service up and running constantly, which is infrastructure and costs money.

One way of getting around this is to take all the structured data your application needs and bundle it up as a single file. You can see the one I created for my application here: it contains metadata for all the photographs expressed as JSON. But the JSON itself is part of a global JavaScript variable declaration, which allows it to be loaded by the browser without relying on an asynchronous HTTP call. Browsers need to limit the ability of JavaScript to fetch files from the filesystem for security reasons. This JavaScript file is loaded immediately by your web browser when it loads the index.html, and the app can access it globally as window.DATA. Think of it like a static, read-only, in-memory database for your application. The wrapping HTML will look as simple as something like this:

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1">
    <title>Lakeland Community Heritage Project</title>
    <script src="static/data.js"></script>
  </head>
  <body>
    <div id="app"></div>
    <script type="text/javascript" src="bundle.js"></script>
  </body>
</html>
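For illustration, the data.js file boils down to something like this (a made-up fragment; the field names and records are hypothetical, and the real file carries the full Omeka metadata for every photograph):

window.DATA = {
  // hypothetical records, for illustration only
  items: [
    { id: 123, type: 'buildings', title: 'Lakeland High School, ca. 1928' },
    { id: 124, type: 'people', title: 'Portrait of a Lakeland family' }
  ]
}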

Update: Another, more scalable approach to this, suggested by Alex Gil after this post went live, is to try using an in-browser database like PouchDB. When combined with Lunr for search, this could make for quite a rich and extensible data layer for minimal computing browser apps.
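To sketch what that might look like (assuming the pouchdb and lunr packages are bundled with the app, and the same hypothetical item shape as above):

const PouchDB = require('pouchdb')
const lunr = require('lunr')

const db = new PouchDB('lakeland')

// load the items from window.DATA into the in-browser database once
function loadItems(items) {
  return db.bulkDocs(items.map(item => Object.assign({ _id: String(item.id) }, item)))
}

// build a lunr index over the titles for client side full text search
function buildIndex(items) {
  return lunr(function () {
    this.ref('id')
    this.field('title')
    items.forEach(item => this.add(item))
  })
}

loadItems(window.DATA.items)
const index = buildIndex(window.DATA.items)
// index.search('school') would then return matching item ids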

Static Images

Similarly, the image files need to be available locally. I took all the images and saved them under a directory I named static, using each item’s unique Omeka id in the path so that the metadata and the image files stay conceptually linked:

lakeland-images/static/{omeka-id}/fullsize.jpg

My React application has an Image component that simply renders the image along with a caption using the <figure>, <img>, and <figcaption> elements:

class Image extends Component {
  render() {
    return (
      <Link to={'/item/' + this.props.item.id + '/'}>
        <figure className={style.Image}>
          <img src={'static/' + this.props.item.id + '/fullsize.jpg'} />
          <figcaption>
            {this.props.item.title}
          </figcaption>
        </figure>
      </Link>
    )
  }
}

It’s pretty common to use webpack to build React applications, and the copy-webpack-plugin will handle copying the files from the static directory into the distribution directory during the build.
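For example, a webpack configuration along these lines would do it (sketched against the older array-style options of copy-webpack-plugin; more recent releases take a patterns option instead):

const path = require('path')
const CopyWebpackPlugin = require('copy-webpack-plugin')

module.exports = {
  entry: './src/index.js',
  output: {
    path: path.resolve(__dirname, 'dist'),
    filename: 'bundle.js'
  },
  plugins: [
    // copy static/ (data.js and the images) alongside bundle.js in dist/
    new CopyWebpackPlugin([{ from: 'static', to: 'static' }])
  ]
}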

URLs

You may have noticed that in both cases the data.js and images are being loaded using a relative URL (without a leading slash, or a protocol/hostname). This is a small but important detail that allows the application to be moved around from zip file, to thumb drive to disk drive, without needing paths to be rewritten. The images and data are loaded relative to where the index.html was initially loaded from.

In addition, many React applications these days use the new History API in modern browsers. This lets your application have what appear to be normal URLs structured with slashes, which you can manage with react-router. However, slash URLs are problematic in an offline static site for a couple of reasons. The first is that there is no server you can tweak to respond to every request with the HTML file I included above, which bootstraps your application. This means that if you reload a page you will get a 404 not found.

The other problem is that while the History API works fine for an offline application, the relative links to bundle.js, data.js and the images will break because they will be relative to the new URL.

Fortunately there is a simple solution to this: manage the URLs the way we did before the History API, using hash fragments. So instead of:

file:///lakeland-images/index.html/items/123

you’ll have:

file:///lakeland-images/index.html#/items/123

This way the browser will look to load static/data.js from file:///lakeland-images/ instead of file:///lakeland-images/index.html/items/. Luckily react-router lets you simply import and use createHashHistory in your application initialization and it will write these URLs for you.
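Here is a minimal sketch of what that setup can look like, written against the react-router 3 style API (the exact imports vary between react-router versions, and the App and Item components are stand-ins):

import React from 'react'
import { render } from 'react-dom'
import { Router, Route, hashHistory } from 'react-router'
import App from './App'   // hypothetical top-level components
import Item from './Item'

// hashHistory keeps everything after the # out of the file path, so
// relative asset URLs still resolve against the directory of index.html
render(
  <Router history={hashHistory}>
    <Route path="/" component={App} />
    <Route path="/item/:id" component={Item} />
  </Router>,
  document.getElementById('app')
)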

Minimal?

It’s important to reiterate that this was an experiment. We don’t know if the LCHP is interested in us developing this approach further. But regardless I thought it was worth just jotting down these notes for others considering similar approaches with React and minimal computing applications.

I’ll just close by saying in some ways it seems counter-intuitive to refer to a React application as an example of minimal computing. As Alex Gil says:

In general we can say that minimal computing is the application of minimalist principles to computing. In reality, though, minimal computing is in the eye of the beholder.

After working with React off and on for a couple of years it still seems quite complicated, especially when you throw Redux into the mix. Assembling the boilerplate needed to get started is still tedious, unless you use create-react-app, which is a smart way to start. By comparison it’s much easier to get Jekyll out of the box and start using it. But, if the goal is truly to deliver something static and unchanging, then perhaps this up-front investment in time is not so significant.

Static sites, thus conceived, ultimately rely on web browsers, which are insanely complicated pieces of code. With a few exceptions (e.g. Flash) browsers have been pretty good at maintaining backwards compatibility as they’ve evolved along with the web. JavaScript is so central to a functioning web that it’s difficult to imagine it going away. So really this approach is a bet on the browser and the web remaining viable. Whatever happens to the web and the Internet we can probably rely on some form of browser continuing to exist as functioning software, either natively or in some sort of emulator, for a good while to come…or at least longer than the typical website is kept online.

Many thanks to Raff Viglianti, Trevor Muñoz and Stephanie Sapienza who helped frame and explore many of the ideas expressed in this post.

Lucidworks: How to Handle Meltdown and Spectre for Solr

Tue, 2018-01-09 21:23

Recent news reports have revealed that most Intel processors are vulnerable to a security flaw that allows processes to read the memory of other processes running on the same Intel CPU. At this time it appears that some of the flaws affect AMD CPUs as well, but the more serious, performance-impacting ones do not. Because cloud providers use Intel CPUs and virtualization to support multiple clients on the same physical hardware, this can be especially troubling for multi-tenant hosting environments such as Amazon Web Services. However, Google has stated that it believes it has successfully mitigated the flaw in its Google Cloud Platform, although some user patches are required.

It is important to understand the risk of this bug, but not to overestimate it. To operate, the exploit needs to be already running inside of software in your computer. It does not allow anyone on the internet to take control of your server over http, for instance. If there is an existing vulnerability, it does make it worse as the vulnerable process might be used to read memory from other processes.

There are already operating system patches out for this bug. Unfortunately, the patch requires creating a software isolation layer, which will have a significant impact on performance. Estimates are that the impact can be between 5 and 30 percent. Every piece of software running in application space may be affected. The impact will vary, and each application will need to be performance and load tested.

Some customers running on their own internal hardware may decide, given the vector of the exploit and the performance cost of the fix, to delay applying it. Other customers running in more vulnerable environments or with more specific security concerns may need to apply it and deal with the performance implications.

Fortunately for Lucidworks customers, Fusion and its open source Solr core are especially adept at scale. For high capacity systems, the most cost-effective solution may be to add a number of additional nodes to allow for the increased weight of the operating system. Additionally, by tuning the Fusion pipeline it may be possible to reduce the number of calls necessary to perform queries or parallelize some calls thus compensating for the loss of performance through optimization in other areas.

In either case Lucidworks is here for our customers. If you’re considering applying the fix, please reach out to your account manager to understand ways that we can help mitigate any issues you may have. If you do not currently have or know your account manager, please file a support request or use the Lucidworks contact us page.

The post How to Handle Meltdown and Spectre for Solr appeared first on Lucidworks.

District Dispatch: Improving Digital Equity: The civil rights priority libraries and school technology leaders share

Mon, 2018-01-08 18:46

This blog post, written by Consortium for School Networking (CoSN) CEO Keith Krueger, is first in a series of occasional posts contributed by leaders from coalition partners and other public interest groups that ALA’s Washington Office works closely with. Whatever the policy – copyright, education, technology, to name just a few – we depend on relationships with other organizations to influence legislation, policy and regulatory issues of importance to the library field and the public.

Learning has gone digital. Students access information, complete their homework, take online courses and communicate with technology and the internet.

The Consortium for School Networking is a longtime ally of the American Library Association on issues related to education and telecommunications, especially in advocating for a robust federal E-rate program.

Digital equity is one of today’s most pressing civil rights issues. Robust broadband and Wi-Fi, both at school and at home, are essential learning tools. Addressing digital equity – sometimes called the “homework gap” – is core to CoSN’s vision, and a shared value with our colleagues at ALA.

That is why the E-rate program has been so important for the past 20 years, connecting classrooms and libraries to the internet. Two years ago the Federal Communications Commission (FCC) modernized E-rate by increasing funding by 60 percent and focusing on broadband and Wi-Fi. This action made a difference. CoSN’s 2017 Infrastructure Survey found that the majority of U.S. school districts (85 percent) are fully meeting the FCC’s short-term goal for broadband connectivity of 100 Mbps per 1,000 students.

While this is tremendous progress, we have not completed the job. Recurring costs remain the most significant barrier for schools in their efforts to increase connectivity. More than half of school districts reported that none of their schools met the FCC’s long-term broadband connectivity goal of 1 Gbps per 1,000 students. The situation is more critical in rural areas where nearly 60 percent of all districts receive one or no bids for broadband services. This lack of competition remains a significant burden for rural schools.

And learning doesn’t stop at the school door. CoSN has demonstrated how school systems can work with mayors, libraries, the business community and other local partners to address digital equity. In CoSN’s Digital Equity Action Toolkit, we show how communities are putting Wi-Fi on school buses, mapping out free Wi-Fi homework access from area businesses, loaning Wi-Fi hotspots to low-income families and working to ensure that broadband offerings are not redlining low-income neighborhoods. A great example is the innovative partnership that Charlotte Mecklenburg Schools has established with the Mecklenburg Library System in North Carolina. CoSN also partners with ALA to fight the FCC’s misguided plans to roll back the Lifeline broadband offerings.

Of course, the most serious digital gap is ensuring that all students, regardless of their family or zip code, have the skills to use these new tools effectively. We know that digital literacy and citizenship are essential skills for a civil society and safer world. Librarians have always been on the vanguard of that work, and our education technology leaders are their natural allies. Learn about these efforts and what more we can do by attending CoSN/UNESCO’s Global Symposium on Educating for Digital Citizenship in Washington, DC on March 12, 2018.

As we start 2018, I am often asked to predict the future. What technologies or trends are most important in schools? CoSN annually co-produces the Horizon K-12 report, and I strongly encourage you to read the 2017 Horizon K-12 Report to see how emerging technologies are impacting learning in the near horizons.

However, my top recommendation is that education and library leaders focus on “inventing” the future. Working together, let’s focus on enabling learning where each student can personalize their education – and where digital technologies close gaps rather than make them larger.

Keith R. Krueger, CAE, has been CEO of the Consortium for School Networking for the past twenty-three years. He has had a strong background in working with libraries, including being the first Executive Director the Friends of the National Library of Medicine at NIH.

The post Improving Digital Equity: The civil rights priority libraries and school technology leaders share appeared first on District Dispatch.

Lucidworks: Looking Back at Search in 2017

Mon, 2018-01-08 18:22

2017 was a big year in search technology. As we chronicled last month in our rundown of trends for 2018, search technology has moved far beyond just keywords, faceting, and scale. But let’s take a look back  at the trends that have continued through the past year.

Continued Industry Consolidation

We’ve continued to see consolidation with the exit of Google Search Appliance from the market. Now organizations are re-evaluating technologies like Endeca that have been acquired by other vendors, and products like FAST that have been embedded in other products. Ecommerce companies that have traditionally thought of search as a primary part of what they do have already migrated to newer systems. In 2017, companies stuck maintaining technology not intended for today’s scale moved away from legacy technologies in earnest.

Meanwhile other vendors have been downsizing staff but continuing to support the locked-in long tail installation base. You can figure out which ones by looking at current vs past employees on LinkedIn. In 2017, customers started to get wise. No one wants to be the last one on a sinking ship.

In this same time period, I’m proud to say Lucidworks continued to grow in terms of code written, revenue, employees, and even acquisitions.

Technology and Data Consolidation

Not long ago, larger companies tended to have more than one IT department, and each of those departments had its own search solution. So there would be a search application for sales deployed by the sales IT group, another search app for the HR department deployed by its IT group, and probably yet another search solution for the product teams built by their IT group. With IT consolidation, an ever-increasing mountain of data, and new integrated business practices, there is a greater need than ever to consolidate search technology. There are still single-source solutions (especially in sales), but last year IT departments continued to push to centralize on one search technology.

Meanwhile there are more data sources than ever. There are still traditional sources like Oracle RDBMS, Sharepoint, and file shares. However, there are newer data sources to contend with including NoSQL databases, Slack, and SaaS solutions. With the push towards digital business, and turning information into answers, it is critical to build a common search core to pull data from multiple sources. In 2017, we saw continued movement in this direction.

Scale Out

Virtualization replaced bare metal for most companies years ago. The trend was the joining of the private and public cloud. This move continued against a business backdrop of continued globalization and a technology backdrop of continued mobilization. In 2017, modern companies often conducted business all over the world from palm-sized devices, tablets and laptops.

Meanwhile there are new forms of data emerging. Customers now generate telemetry from their mobile devices. Buildings can now generate everything from presence data to environmental and security information. Factories and brick-and-mortar storefronts now generate data forming the so-called Internet of Things. With machine learning and search technology, companies are now starting to make better use of this sort of data. These trends were nascent in 2017, but still observable.

In a virtualized, cloud-based global world where data is generated from everything everywhere all of the time, companies need search technology that can handle the load whenever, wherever, and however it comes. Old client-server technology was no longer enough to handle these demands. In 2017, horizontal scale was no longer a luxury, but a necessity.

Personalization and Targeting

2017 saw simple search start to abate. While AI and machine learning technologies are relatively new to the search market, some of the more mature tools saw widespread deployment. Many organizations deployed search technology that could capture clicks, queries, and purchases. Modern search technology uses this information to provide better, more personalized results.

Collaborative filtering (boosting the top clicked item for a given query) is the most common optimization, followed by similarity (MoreLikeThis), but we also saw companies start to deploy machine learning powered recommendations, especially in digital commerce. These recommendations use information about what a user or similar users have done to suggest choices.

Mainly Custom Apps, but The Rise of Twigkit

In 2017 most companies were still writing their own custom search apps. Unlike previous years, these apps are very AJAX/JavaScript-y/dynamic. Frameworks like Angular ruled the search application market. At the same time, savvy organizations realized that writing yet another search box with typeahead was a waste of time and started using pre-built components. One of the best toolboxes of pre-tested, pre-built components was Twigkit.

Twigkit had been around since 2009 and was a widely respected force in the search industry, with relationships with all of the major vendors and customers all over the world. Lucidworks had been recommending it to our customers and even using it in some deployments, so we decided to acquire the company and accelerate the technology. The future of Twigkit was announced at our annual conference last September, with the technology becoming part of Lucidworks App Studio.

Happy New Year

Goodbye to 2017 but hello to 2018. It was a great year for search, but not as good as what is coming. If you want to see what’s on the way in 2018, here’s my take on what to watch for in the coming year.

If you find yourself behind the curve, Lucidworks Fusion and Lucidworks App Studio are a great way to acquire the technologies you need to catch up. You might also sign up for Fusion Cloud.

The post Looking Back at Search in 2017 appeared first on Lucidworks.

David Rosenthal: The $2B Joke

Mon, 2018-01-08 18:00
Everything you need to know about cryptocurrency is in Timothy B. Lee's Remember Dogecoin? The joke currency soared to $2 billion this weekend:

"Nobody was supposed to take Dogecoin seriously. Back in 2013, a couple of guys created a new cryptocurrency inspired by the "doge" meme, which features a Shiba Inu dog making excited but ungrammatical declarations. ... At the start of 2017, the value of all Dogecoins in circulation was around $20 million. ... Then on Saturday the value hit $2 billion. ... "It says a lot about the state of the cryptocurrency space in general that a currency with a dog on it which hasn't released a software update in over 2 years has a $1B+ market cap," [cofounder] Palmer told Coindesk last week.

So blockchain, such bubble. Up 100x in a year. Are you HODL-ing or getting your money out?

District Dispatch: Full Text of FCC’s order rolling back Net Neutrality released

Mon, 2018-01-08 17:14

At the end of last week, the FCC released the final order to roll back 2015’s Net Neutrality rules. The 539-page order has few changes from the draft first circulated in November and voted on along party lines by the Republican-controlled commission on December 14. ALA is working with allies to encourage Congress to overturn the FCC’s egregious action.

Procedurally, we are still waiting for the order to appear in the Federal Register and to also be delivered to Congress. These actions will kick off timing for members of Congress to have their shot at stopping the FCC. Right after the vote, members of Congress announced their intent to attempt to nullify the FCC’s actions. The Congressional Review Act (CRA) gives Congress the ability and authority to do this; the CRA allows Congress to review a new agency regulation (in this case, Pai’s “Restoring Internet Freedom” order) and pass a Joint Resolution of Disapproval to overrule it. This would repeal last week’s FCC order, restoring the 2015 Open Internet Order and keeping net neutrality protections in place and the internet working the way it does now. This Congressional action would be subject to Presidential approval.

Senator Ed Markey (D-MA) is leading the charge and has announced his intention to introduce a resolution to overturn the FCC’s decision using the authority granted by the CRA. Democratic leadership in both Houses have urged their colleagues to support it, and Sen. Claire McCaskill (D-MO) has just tweeted that she will be the 30th Senator to sign on to the effort.

We will continue to update you on the activities and other developments as we continue to work to preserve a neutral internet. For now, you can email your members of Congress today and ask them to support the CRA to repeal the recent FCC action and restore the 2015 Open Internet Order protections.

The post Full Text of FCC’s order rolling back Net Neutrality released appeared first on District Dispatch.

David Rosenthal: Digital Preservation Declaration of Shared Values

Mon, 2018-01-08 16:00
I'd like to draw your attention to the effort underway by a number of organizations active in digital preservation to agree on a Digital Preservation Declaration of Shared Values:
The digital preservation landscape is one of a multitude of choices that vary widely in terms of purpose, scale, cost, and complexity. Over the past year a group of collaborating organizations united in the commitment to digital preservation have come together to explore how we can better communicate with each other and assist members of the wider community as they negotiate this complicated landscape.

As an initial effort, the group drafted a Digital Preservation Declaration of Shared Values that is now being released for community comment. The document is available here: https://docs.google.com/document/d/1cL-g_X42J4p7d8H7O9YiuDD4-KCnRUllTC2s...

The comment period will be open until March 1st, 2018. In addition, we welcome suggestions from the community for next steps that would be beneficial as we work together.

The list of shared values (Collaboration, Affordability, Availability, Inclusiveness, Diversity, Portability/Interoperability, Transparency/information sharing, Accountability, Stewardship Continuity, Advocacy, Empowerment) includes several to which adherence in the past hasn't been great.

There are already good comments on the draft. Having more input, and input from a broader range of institutions, would help this potentially important initiative.

HangingTogether: The OCLC Research Library Partnership: the challenges of developing scholarly services in a decentralized landscape

Sun, 2018-01-07 14:00

On November 1st, North American members of the OCLC Research Library Partnership came together in Baltimore to engage in day-long discussion on three concurrent topics:

My colleagues Karen Smith-Yoshimura and Merrilee Proffitt have previously written on the first two discussions.

Attendees at the North American meeting of OCLC Research Library Partners, 1 Nov 2017

Scholarly communications in a complex institutional environment

While recognizing the landscape of evolving scholarly services and workflows is extremely broad, my colleague Roy Tennant and I chose to frame our conversation around three distinct areas of research university and library engagement:

  • research data management (RDM)
  • research information management (RIM)
  • institutional repositories (IR)

We discussed each focus area for libraries singly but also explored how these services are increasingly intersecting, driven by needs for improved workflows for researchers—and institutions. These service areas may also intersect with non-library services such as annual academic review workflows and institutional reporting, as well as with the growing numbers of resources researchers are using to independently manage their citations, lab notebooks, and more.

We engaged in an in-depth discussion utilizing two recent OCLC Research reports:

and also looked to the recent report of a CNI Executive Roundtable, Rethinking Institutional Repository Strategies, as well as a short 2008 blog by Lorcan Dempsey on “Stitching Costs”, or the costs of integrating services and workflows.

While the topic of our conversation was scholarly communications workflows, services, and interoperability, the theme was the challenge of libraries responding to enterprise-wide needs. Participating institutions represented a variety of states of exploration and implementation, particularly in relation to emerging areas like RDM and RIM, and librarians described their own experiences and challenges, as they seek to build relationships with other institutional stakeholders that have different goals, priorities, and practices. Every institution is decentralized, and each institution is unique in its organization. Relationship building and communications across silos is time consuming and often political, and efforts to develop meaningful collaborations can be stymied by individual personalities or limited knowledge of another unit’s priorities.

The theme of our conversation was the challenge of libraries responding to enterprise-wide needs

This is particularly true for RIM and RDM services, as there are a great many institutional stakeholders, including academic colleges and departments, the research office (usually led by a VP of research), institutional research, graduate school, registrar, human resources,  tech transfer, and public affairs.

Libraries must work collaboratively with other stakeholders across the institution

Roger Schonfeld describes this situation well in the recent Ithaka S+R issue brief, Big Deal: Should Universities Outsource More Core Research Infrastructure?, which explores the rapid development of research workflow tools being adopted by researchers,

“Today almost no university is positioned to address its core interests here in any truly coherent way. The reason is essentially structural. There is no individual or organization within any university . . . that is responsible for the full suite of research workflow services. . . . No campus office or organization has responsibility for anything other than a subset of the system.”

What can you do to learn more?

Libraries are important stakeholders in these conversations but will be ineffectual if they try to act alone. It is increasingly important for librarians to understand the goals and activities of other university stakeholders as well as to succinctly and persuasively communicate their own value proposition. I want to encourage readers to explore a couple of OCLC Research publications that address these challenges:

Join us

Our interactions with OCLC Research Library Partners help inform our future research plans, as we learn more about the challenges, pain points, and ambitions of our partner libraries. We will be continuing this conversation with our UK and European Research Library Partnership members on February 19 at the University of Edinburgh. We also want to encourage ALL research institutions to share their practices and collaborations by participating in our Survey of Research Information Management Practices, conducted in collaboration with euroCRIS, which remains open through 31 January 2018.

Mark Matienzo: Notes on ITLP Workshop 1 readings

Sun, 2018-01-07 05:43

I completed my reading and viewing assignments for my cohort’s IT Leadership Program Workshop 1 (January 9-11 at UC Berkeley). This is a brief set of notes, for my own use, about how all of them tie together.

  • Leaders are made not born and leadership skills don’t always transfer across contexts.
  • Leadership should be reflected in the culture of the organization; developing leaders, even in medium- and lower-level employees is a key part of that. Encourage them to take this on, and protect them when they step up. Leading up (i.e., leading your boss) should be expected, too.
  • Be aware of where you are looking to anticipate change.
  • Don’t be head down; you need to retain focus both of the work around you and broader context. You are responsible for framing problems, not solving them exclusively.
  • Great leaders have diverse networks and the ability to develop relationships with people different from them.
  • Self-mastery is the key to leadership. Great leaders model behavior (poise; emotional capacity) and define direction. Retaining empathy, humanity, dignity, passion, connection to other people in environment of transactional interaction are all hard.
  • Conflict and feeling pressure is necessary. Don’t smooth over either too much; instead, regulate it.
  • Be willing to look at taking large leaps, but take the time to understand them. At the same time, don’t wed yourself to long-term strategic planning processes that might be blocks.
  • Inspire people to move beyond their own perceived limitations and encourage others to break with convention when necessary.

And the readings and videos:

David Rosenthal: Meltdown & Spectre

Fri, 2018-01-05 18:00
This hasn't been a good few months for Intel. I wrote in November about the vulnerabilities in their Management Engine. Now they, and other CPU manufacturers, are facing Meltdown and Spectre, three major vulnerabilities caused by side-effects of speculative execution. The release of these vulnerabilities was rushed, and the initial reaction was less than adequate.

The three vulnerabilities are very serious, but mitigations are in place and appear to be less costly than reports focused on the worst case would lead you to believe. Below the fold, I look at the reaction, explain what speculative execution means, and point to the best explanation I've found of where the vulnerabilities come from and what the mitigations do.

Although CPUs from AMD and ARM are also affected, Intel's initial response was pathetic, as Peter Bright reports at Ars Technica:
The company's initial statement, produced on Wednesday, was a masterpiece of obfuscation. It contains many statements that are technically true—for example, "these exploits do not have the potential to corrupt, modify, or delete data"—but utterly beside the point. Nobody claimed otherwise! The statement doesn't distinguish between Meltdown—a flaw that Intel's biggest competitor, AMD, appears to have dodged—and Spectre and, hence, fails to demonstrate the unequal impact on the different company's products.In addition, Intel's CEO is suspected of insider trading on information about these vulnerabilities:
Brian Krzanich, chief executive officer of Intel, sold millions of dollars' worth of Intel stock—all he could part with under corporate bylaws—after Intel learned of Meltdown and Spectre, two related families of security flaws in Intel processors.Not a good look for Intel. Nor for AMD:
AMD's response has a lot less detail. AMD's chips aren't believed susceptible to the Meltdown flaw at all. The company also says (vaguely) that it should be less susceptible to the branch prediction attack.

The array bounds problem has, however, been demonstrated on AMD systems, and for that, AMD is suggesting a very different solution from that of Intel: specifically, operating system patches. It's not clear what these might be—while Intel released awful PR, it also produced a good whitepaper, whereas AMD so far has only offered PR—and the fact that it contradicts both Intel (and, as we'll see later, ARM's) response is very peculiar.

The public release of details about Meltdown and Spectre was rushed, as developers not read-in to the problem started figuring out what was going on. This may have been due to an AMD engineer's comment:

Just after Christmas, an AMD developer contributed a Linux patch that excluded AMD chips from the Meltdown mitigation. In the note with that patch, the developer wrote, "The AMD microarchitecture does not allow memory references, including speculative references, that access higher privileged data when running in a lesser privileged mode when that access would result in a page fault."

What is speculative execution? Some things a CPU does, such as fetching a cache miss from main memory, take hundreds of clock cycles. It is a waste to stop the CPU while it waits for these operations to complete. So the CPU continues to execute "speculatively". For example, it can guess which way it is likely to go at a branch, and head off down that path ("branch prediction"). If it is right, it has saved a lot of time. If it is wrong the processor state accumulated during the speculative execution has to be hidden from the real program.

Modern processors have lots of hardware supporting speculative execution. Meltdown and Spectre are both due to cases where the side-effects of speculative execution on this hardware are not completely hidden. They can be revealed, for example, by careful timing of operations of the real CPU which the speculative state can cause to take longer or shorter than normal.
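To make that concrete, the much-discussed bounds check bypass pattern (Spectre's first variant) has roughly this shape. I've written it as JavaScript purely for illustration; it is not a working exploit and on a patched browser it leaks nothing, it just shows the kind of code that speculation can run past:

// Schematic only: if the CPU mispredicts the bounds check, array1[index]
// can be read speculatively even when index is out of bounds, and that
// value briefly steers which part of array2 gets pulled into the cache.
// Careful timing of later accesses to array2 can then recover the value.
const array1 = new Uint8Array(16)
const array2 = new Uint8Array(256 * 4096)

function victim(index) {
  if (index < array1.length) {           // branch that may be mispredicted
    const secret = array1[index]         // speculative out-of-bounds read
    return array2[secret * 4096]         // cache footprint keyed by secret
  }
  return 0
}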

The clearest explanation of the three vulnerabilities I've seen is from Matt Linton and Pat Parseghian on Google's Security blog:
Project Zero discussed three variants of speculative execution attack. There is no single fix for all three attack variants; each requires protection independently.


  • Variant 1 (CVE-2017-5753), “bounds check bypass.” This vulnerability affects specific sequences within compiled applications, which must be addressed on a per-binary basis.
  • Variant 2 (CVE-2017-5715), “branch target injection”. This variant may either be fixed by a CPU microcode update from the CPU vendor, or by applying a software mitigation technique called “Retpoline” to binaries where concern about information leakage is present. This mitigation may be applied to the operating system kernel, system programs and libraries, and individual software programs, as needed.
  • Variant 3 (CVE-2017-5754), “rogue data cache load.” This may require patching the system’s operating system. For Linux there is a patchset called KPTI (Kernel Page Table Isolation) that helps mitigate Variant 3. Other operating systems may implement similar protections - check with your vendor for specifics.
The detailed description in the table below the section I quoted is clear and comprehensive.

These vulnerabilities can be exploited only by running code on the local system. Alas, these days JavaScript means anyone can do that, so ensuring that your browser is up-to-date is very important. As I write, I believe that up-to-date Linux and Windows systems that have been rebooted since patching should be protected against both Meltdown and the known exploits for Spectre, and up-to-date Apple systems should have partial protection.

There will be some cases where these fixes will degrade performance significantly, but Google and others report that they aren't common in practice.

It is somewhat worrisome that some of the mitigations depend on ensuring that user binaries do not contain specific code sequences, since there are likely ways for malware to introduce such sequences.

District Dispatch: ALA and NCTET celebrate the 20th anniversary of E-rate

Fri, 2018-01-05 17:19
Sen. Ed Markey (D-MA) delivering opening remarks in May 2017 at the E-Rate briefing in the Russell Senate Office building, organized by the Education and Libraries Network Coalition and National Coalition for Technology in Education & Training.

This month, ALA is teaming up with the National Coalition for Technology in Education and Training (NCTET) to celebrate the 20th Anniversary of E-rate! Join us on Wednesday, January 24, along with advocates and beneficiaries, to discuss E-rate successes and potential at our E-rate Summit, scheduled from 3 to 5:00 p.m. in the Capitol Visitor Center, room 202/3. Anyone with an interest in learning more about E-rate is welcome to join.

The Summit will begin with a welcome from NCTET president Amanda Karhuse of the National Association of Secondary School Principals, followed by remarks from Senator Ed Markey (D-MA). Evan Marwell, CEO of EducationSuperHighway, will open a panel session titled “E-Rate Past, Present & Future.”

The panel will be moderated by Caitlin Emma, education reporter for Politico. Ms. Emma will engage panelists from the library and K-12 school communities, covering the gamut of services these beneficiaries are able to provide because of the E-rate program. In addition to hearing from direct beneficiaries, attendees will hear from a spokesperson for Maryland Governor Larry Hogan on the overall impact on states. The afternoon will conclude with remarks from FCC Commissioner Jessica Rosenworcel.

The E-rate program makes it possible for libraries to offer critical and innovative community support across the nation—from urban and suburban centers to remote rural and tribal communities. ALA staff and panelists also look forward to discussing—alongside the significant strides made with E-rate modernization—areas where we can grow and improve when it comes to streamlining the program’s administration and meeting the needs and challenges of library applicants of diverse sizes and capacities.

ALA has been advocating for E-rate since 1996, most recently filing comments with the FCC this past October. This fall, over 140 librarians and libraries around the country shared moving stories about the profound impact of E-rate with the FCC.

You can read more about ALA’s work with E-rate or join us for an afternoon of telecommunications fun on Wednesday, January 24 from 3 to 5:00 p.m. in the Capitol Visitor Center.

The post ALA and NCTET celebrate the 20th anniversary of E-rate appeared first on District Dispatch.

Alf Eaton, Alf: Indexing Semantic Scholar's Open Research Corpus in Elasticsearch

Thu, 2018-01-04 14:18

Semantic Scholar publishes an Open Research Corpus dataset, which currently contains metadata for around 20 million research papers published since 1991.

  1. Create a DigitalOcean droplet using a "one-click apps" image for Docker on Ubuntu (3GB RAM, $15/month) and attach a 200GB data volume ($20/month).
  2. SSH into the instance and start an Elasticsearch cluster running in Docker.
  3. Install esbulk: VERSION=0.4.8; curl -L https://github.com/miku/esbulk/releases/download/v${VERSION}/esbulk_${VERSION}_amd64.deb -o esbulk.deb && dpkg -i esbulk.deb && rm esbulk.deb
  4. Fetch, unzip and import the Open Research Corpus dataset (inside the zip archive is a license.txt file and a gzipped, newline-delimited JSON file): VERSION=2017-10-30; curl -L https://s3-us-west-2.amazonaws.com/ai2-s2-research-public/${VERSION}/papers-${VERSION}.zip -o papers.zip && unzip papers.zip && rm papers.zip && esbulk -index scholar -type paper -id id -purge -verbose -z < papers-${VERSION}.json.gz && rm papers-${VERSION}.json.gz
  5. While importing, index statistics can be viewed at http://${IP_ADDRESS}:9200/scholar/_stats?pretty
  6. After indexing, optimise the Elasticsearch index by merging into a single segment: curl -XPOST 'http://localhost:9200/scholar/_forcemerge?max_num_segments=1'
  7. (recommended) Use ufw to prevent external access to the Elasticsearch service and put a web service (e.g. an Express app) in front of it, mapping routes to Elasticsearch queries (a sample query is sketched just after this list).
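
As a rough illustration of the kind of query such a front-end service might pass through to Elasticsearch, here is a minimal full-text search against the scholar index created above. The title and year field names are assumptions based on the corpus's record schema, so adjust them to whatever fields your import actually produced:

  # Sketch only: match papers whose titles mention "open access" and return the
  # five best hits, keeping just the title and year of each record.
  curl -s -H 'Content-Type: application/json' \
    'http://localhost:9200/scholar/_search?pretty' \
    -d '{"query": {"match": {"title": "open access"}}, "size": 5, "_source": ["title", "year"]}'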

LITA: Jobs in Information Technology: January 3, 2018

Wed, 2018-01-03 20:12

New vacancy listings are posted weekly on Wednesday at approximately 12 noon Central Time. They appear under New This Week and under the appropriate regional listing. Postings remain on the LITA Job Site for a minimum of four weeks.

New This Week

New York University, KARMS Metadata Production & Management Supervisor, New York, NY

University of Rochester, River Campus Libraries, Desktop Support Specialist, Rochester, NY

Town and Country Public Library District, Library Director, Elburn, IL

Visit the LITA Job Site for more available jobs and for information on submitting a job posting.

District Dispatch: Campus Vote shares ideas, resources for libraries

Wed, 2018-01-03 17:42

Guest post by Kristen Muthig, Communications and Policy Manager at Fair Elections Legal Network and Campus Vote Project.

Libraries are recognized as vital assets, providing a wide range of services and facilitating learning for all the communities they serve. Another factor contributing to the health and quality of a community is a civic-minded population. The easiest way for residents to participate is by casting a ballot, not just for president or statewide offices, but for local officials and issues. Positions like county commissioner, city council, and school board, and ballot issues that revise zoning codes, build new library buildings, or fund senior and transportation services affect a person's everyday life. Unfortunately, many voters are unaware of these elections or uninterested, so they don't participate and their voices go unheard. As part of their mission to educate and inform, libraries can contribute even more to their communities by providing patrons with voting resources and thereby encouraging them to participate.

Low voter turnout rates can be attributed to a lack of confidence in the electoral system, a lack of knowledge about the issues and candidates, or a feeling that one vote won't make a difference. In the November 2014 general election, 64.6% of Americans reported they were registered to vote, according to the U.S. Census. The national voter turnout rate was just over 36%, the lowest since World War II. Voter turnout in local, off-year elections like 2017 is normally even lower.

One barrier to voting that can be easily addressed is a lack of knowledge of the registration and voting process, including deadlines, voter ID requirements, and options for casting a ballot. As trusted resources and sources of learning, libraries and librarians are ideal providers of this information.

Adding voter assistance to an ever-expanding list of items librarians need to be versed in does not mean they have to be experts in election law. Resources already exist through local elections offices, secretaries of state, and state and national partners. For example, Campus Vote Project has state-specific voter guides (for students and non-students) that include deadlines, links to local forms and resources, voter ID requirements, and answers to common questions. Libraries can help eliminate this barrier, which often discourages people from the ballot box, by including voting information in their facilities and in regular communications in the months and days leading up to an election.

Action can be as simple as including a link to a registration form in a newsletter, or as involved as serving as a polling location on Election Day. Below are some examples of how libraries can help.

Online resources

  • 37 states and DC offer online voter registration. Posting a link on the library’s homepage, including it in newsletters, and sharing it on social media give patrons multiple opportunities to see the forms and fill them out before the registration deadline.
  • Facilities could also temporarily dedicate a computer to online voter registration during the week leading up to the voter registration deadline.
  • Patrons should also know that online voter registration often allows voters who are already registered to check and update their address or other information.
  • Absentee ballot applications or applications for a mail-in ballot are also available online through secretaries of state, local elections offices, or this resource at U.S. Vote Foundation.

Newsletter and social media reminders

Work with local election officials to provide Election Day resources

  • Facilities like schools, town halls, fire stations and libraries are often used as Election Day polling locations because of their accessibility.
  • Each state and county may have different ways of setting polling place locations, but library systems can work with local election officials to see whether a library facility could serve as one.
  • Another way libraries could help with Election Day is by offering meeting rooms for local election officials to host poll worker training sessions. Well-trained poll workers help elections run smoothly.

Librarians and staff are valuable resources just like the books, classes, and online materials libraries provide. As such, they would be ideal messengers for information that drives more patrons to register and vote, and it would be yet another way libraries contribute to the vitality of communities and patrons they serve.

If you have questions or are looking for registration and voting resources, please contact Campus Vote Project: info@campusvoteproject.org.

The post Campus Vote shares ideas, resources for libraries appeared first on District Dispatch.

Library of Congress: The Signal: New Year, New You: A Digital Scholarship Guide (in seven parts!)

Wed, 2018-01-03 17:34

To get 2018 going in a positive digital direction, we are releasing a guide for working with digital resources. Every Wednesday for the next seven weeks a new part of the guide will be released on The Signal. The guide covers what digital archives and digital humanities are trying to achieve, how to create digital documents, metadata and text-encoding, digital content and citation management, data cleaning methods, an introduction to working in the command line, text and visual analysis tools and techniques, and a list of people, blogs, and digital scholarship labs to follow to learn more about the topic. If you need all of this information immediately, feel free to binge on the full guide, available now in PDF. (No spoilers!)

This project is part of a larger exploration the Labs team is facilitating to create a reference service for working with collections as data at the Library of Congress. It is also part of the long-running Junior Fellows Summer Internship Program. Last summer, one of the interns we hosted, Samantha Herron from Swarthmore College, created this guide. We think Sam did a great job pulling together an introduction to what digital scholarship is and what you need to know to start planning this type of project, and we are thrilled to feature her work on The Signal. These blog posts also serve as an example of the kind of projects our Jr. Fellows work on and hopefully will inspire some of you recent grads to apply to our 2018 openings. Applications for this summer are due January 26, 2018. A modest stipend is provided.

Now, on to the guide:

Photo by Carol Highsmith. William De Leftwich Dodge’s mural Ambition. Library of Congress Thomas Jefferson Building, Washington, D.C.

Samantha Herron’s Digital Scholarship Resource Guide [part 1 of 7]

Why Digital Materials Matter

Increasingly, digital archives are emerging and expanding. The Library of Congress' Digital Collections (and their metadata) are always growing, always adding exciting new materials like photographs, newspapers, web archives, audio tracks, maps, and so on. Text, images, and physical objects formerly available only in person as tangible, hold-able items can now be accessed online as plaintext, digital facsimiles, MARC files, .jpgs, .pdfs, hypertext, audio, etc. In addition to making these materials more accessible from all over the world, different digital formats enable exciting, computer-assisted scholarship, projects, and art.

For example:

This is Jane Austen’s Pride and Prejudice.

This is Jane Austen’s Pride and Prejudice.

This is Jane Austen’s Pride and Prejudice.

This is Jane Austen’s Pride and Prejudice.

So is this.

Though all of the above links (a modern-day paperback, a digital facsimile, a plaintext copy, an audio recording, and the catalog record for a bound copy of the second edition) refer to the same text, Jane Austen's Pride and Prejudice, the kinds of scholarship, arguments, and manipulations we can do with each version depend on its format.

A contemporary paperback copy of Pride and Prejudice is likely no help in understanding early 19th century bookbinding practices in London, but the 1813 version of the same may give us some insight. Or, a physical, print copy of the book tells us nothing about word frequency (unless we wanted to count each word up by hand), but a computer could easily return vocabulary density information about a digital text copy. Digital copies do not replace physical texts, but instead open up the text to new kinds of computer-assisted analyses. Digital texts and digital data are the basis for what is broadly termed ‘digital scholarship’, the use of software, code, the Internet, GIS, and so on towards new understandings and visualizations of information.
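
To make that concrete, a classic Unix pipeline can pull a word-frequency table out of a plaintext copy of the novel in a second or two. This is only a sketch, and the filename is a placeholder for whichever plaintext edition you download:

  # A minimal sketch, assuming the plaintext edition has been saved as
  # pride_and_prejudice.txt (placeholder name). Split on non-letters, lowercase
  # everything, count, and show the ten most frequent words.
  tr -sc 'A-Za-z' '\n' < pride_and_prejudice.txt | tr 'A-Z' 'a-z' | sort | uniq -c | sort -rn | head

  # A rough vocabulary-density figure is the first count (distinct words)
  # divided by the second (total words).
  tr -sc 'A-Za-z' '\n' < pride_and_prejudice.txt | tr 'A-Z' 'a-z' | sort -u | wc -l
  tr -sc 'A-Za-z' '\n' < pride_and_prejudice.txt | tr 'A-Z' 'a-z' | wc -l

None of this replaces close reading; it simply shows how quickly a digital text yields the kinds of counts discussed above.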

Example: In July 2017, the New York Times covered projects that used data to understand the continued popularity of Jane Austen’s novels, and put forth that the key may have been in the author’s word choice. The authors used a method called “principal components analysis” to graphically represent the presence of naturalism in Austen’s texts.  Another study covered by the article found that the author used a higher rate of intensifiers (very, much, so) than her contemporaries and that, in context, this spoke to Austen’s characteristic use of irony.

Computers can be used to see trends and patterns that go unnoticed by the human eye. This is especially helpful for projects like the Jane Austen case study above, where the corpus of interest (the set of texts or other media used for analysis), in that case 127 works of early British fiction, would be too labor-intensive, unwieldy, or inappropriate to read one by one for the purposes of the research. Computers can "read" a lot of text very quickly and tell us things about a corpus that would be impossible to pick up from a close reading of a few books.

Next in this series: Creating Digital Documents and Metadata + Text Encoding

LITA: December 2017 ITAL Issue Published

Wed, 2018-01-03 16:10

The December issue (volume 36, number 4) of Information Technology and Libraries (ITAL) is now available at:

https://ejournals.bc.edu/ojs/index.php/ital/index.

The December 2017 issue: Reviewed Articles and Communications

“Mobile Website Use and Advanced Researchers: Understanding Library Users at a University Marine Sciences Branch Campus”
Mary J. Markland, Hannah Gascho Rempel, and Laurie Bridges

https://doi.org/10.6017/ital.v36i4.9953

This exploratory study examined the use of the Oregon State University Library’s  website via mobile devices by advanced researchers at an off-campus branch location. Branch campus affiliated faculty, staff, and graduate students were invited to participate in a survey to determine what their research behaviors are via mobile devices including frequency of mobile library website use and the tasks they were attempting to complete. Findings showed that while these advanced researchers do periodically use the library’s website via mobile devices, mobile devices are not the primary mode of searching for articles and books or for reading scholarly sources. Mobile devices are most frequently used for viewing the library website when these advanced researchers are at home or in transit. Results of this survey will be used to address knowledge gaps around library resources and research tools and to generate more ways to study advanced researchers’ use of library services via mobile devices.

“Metadata Provenance and Vulnerability”
Timothy Robert Hart and Denise de Vries

https://doi.org/10.6017/ital.v36i4.10146

The preservation of digital objects has become an urgent task in recent years as it has been realised that digital media have a short life span. The pace of technological change makes accessing these media more and more difficult. Digital preservation is accomplished by two main methods, migration and emulation. Migration has been proven to be a lossy method for many types of digital objects. Emulation is much more complex; however, it allows preserved digital objects to be rendered in their original format, which is especially important for complex types such as those made up of multiple dynamic files. Both methods rely on good metadata in order to maintain change history or construct an accurate representation of the required system environment. In this paper, we present our findings that show the vulnerability of metadata and how easily they can be lost and corrupted by everyday use. Furthermore, this paper aspires to raise awareness and to emphasise the necessity of caution and expertise when handling digital data by highlighting the importance of provenance metadata.

“Everyone’s Invited: A Website Usability Study Involving Multiple Library Stakeholders”
Elena Azadbakht, John Blair, and Lisa Jones

https://doi.org/10.6017/ital.v36i4.9959

This article describes a usability study of the University of Southern Mississippi Libraries’ website conducted in early 2016. The study involved six participants from each of four key user groups – undergraduate students, graduate students, faculty, and library employees – and consisted of six typical library search tasks such as finding a book and an article on a topic, locating a journal by title, and looking up hours of operation. Library employees and graduate students completed the study’s tasks most successfully, whereas undergraduate students performed fairly simple searches and relied on the Libraries’ discovery tool, Primo. The study’s results identified several problematic features that impacted each user group, including library employees. This increased internal buy-in for usability-related changes in a later website redesign.

Editorial Content

Submit Your Ideas

Contact Ken Varnum, editor, at varnum@umich.edu with your proposal for contributions to ITAL. Current formats are generally:

  • Articles – original research or comprehensive and in-depth analyses, in the 3000-5000 word range.
  • Communications – brief research reports, technical findings, and case studies, in the 1000-3000 word range.

Questions or Comments?

For all other questions or comments related to LITA publications, contact LITA at (312) 280-4268 or Mark Beatty, mbeatty@ala.org

Terry Reese: ER&L 2018: 101 Preconference session: Getting Started with MarcEdit 7: New Tools, New Models, New Automation Opportunities

Wed, 2018-01-03 00:20

Anticipating that I would finish up working on MarcEdit 7, I submitted, and had accepted, a preconference proposal for ER&L 2018. This will be one of the 4-hour pre-conference sessions and will focus primarily on automating workflows in MarcEdit, with a special emphasis on some of the new tools in MarcEdit — specifically, the enhanced task management, the Clustering Tooling, the Linked Data work, and the new XML Profiler. Though, in sessions like this, if you come with a question, you'll leave with an answer.

If you are around in Austin, I hope to see you (either here, or around).

–tr

Preconference Description: https://erl18.sched.com/event/CkNY/w03-getting-started-with-marcedit-7-new-tools-new-models-new-automation-opportunities

DuraSpace News: Webinars: VIVO and the Role of Librarians

Wed, 2018-01-03 00:00

From Violeta Ilik, Head of Cataloging and Metadata Services, Stony Brook University Libraries

Evergreen ILS: 2017 Evergreen Community Survey

Tue, 2018-01-02 19:02

Happy New Year! Now that 2017 is behind us, the Evergreen Outreach Committee is beginning to gather highlights of the year's activities for the 2017 annual report.

We are asking Evergreen libraries and consortia to respond to our annual community survey. The survey is essential for capturing a snapshot of the libraries that make up the Evergreen community.

The survey takes about five minutes to complete and requires the following information that you may need to obtain from a report or another source:

  • Total service population
  • Total number of registered users
  • Total number of circulations in 2017

We are hoping to make the data-gathering process for these surveys a little easier in future years. The information you submit for this year’s survey will be available for you to review/update next year so that you will no longer need to re-submit the same information from year to year.

Whether you are a single, standalone library or a large consortium, we want to hear from you! We’ve received great responses over the past two years, but we also know we missed hearing from many Evergreen sites. Please make sure your library is represented.

The survey is available at https://goo.gl/forms/KgkcgLLiInGm6Pj93.
