planet code4lib

Planet Code4Lib - http://planet.code4lib.org

FOSS4Lib Recent Releases: AtoM - Access to Memory - 2.2.1

Thu, 2016-02-11 03:33
Package: AtoM - Access to Memory
Release Date: Wednesday, February 10, 2016

Last updated February 10, 2016. Created by David Nind on February 10, 2016.

AtoM 2.2.1 is a bug fix release for AtoM 2.2.

Over a dozen user-reported bug tickets and a handful of feature tickets have been closed, addressing several recurring problems in the previous release.

You can view more details from the release announcement page at:
https://wiki.accesstomemory.org/Releases/Release_announcements/Release_2...

OCLC Dev Network: Change to FAST Service

Wed, 2016-02-10 21:45

The original search interface for FAST will be retired on 31 March 2016.

LITA: Jobs in Information Technology: February 10, 2016

Wed, 2016-02-10 21:07

New vacancy listings are posted weekly on Wednesday at approximately 12 noon Central Time. They appear under New This Week and under the appropriate regional listing. Postings remain on the LITA Job Site for a minimum of four weeks.

New This Week:

Penn State University Libraries, Reference and Instruction Librarian, Knowledge Commons, University Park, PA

Penn State University Libraries, Diversity Residency Librarian Program, University Park, PA

Brown University, Senior Library Applications Developer, Providence, RI

Reaching Across Illinois Library System, Systems Supervisor, Burr Ridge, IL

Visit the LITA Job Site for more available jobs and for information on submitting a job posting.

District Dispatch: President Submits Budget Request to Congress

Wed, 2016-02-10 20:49

ALA is disappointed that President Obama’s budget would cut federal funds to libraries, which are on the front lines delivering services to all Americans.

President Obama yesterday sent Congress his $4.23 trillion FY17 budget request, which was greeted with skepticism by many Republicans. This will be the President’s final budget submission before he leaves office.

Proponents of library funding are disappointed by the requested cut of over $500,000 to the Library Services and Technology Act (LSTA). A more severe cut of nearly $950,000 is recommended for the Grants to States program, which would result in a reduced grant allocation for every state. Also recommended for cuts are the Native American Library Services program (cut by over $200,000) and the Laura Bush 21st Century Librarian program (cut by $500,000). The National Leadership Libraries program, however, is recommended to receive an increase of $628,000, while funds to IMLS for Research, Evaluation, and Data Collection would increase by over $200,000.

Given the tight budget atmosphere in Washington and a search for programs to cut, the President’s recommended cuts for LSTA could not come at a more inopportune time.

In a statement to the media, ALA President Sari Feldman noted “we are truly disappointed that the President’s budget does not recognize the value libraries bring to our country. Every day America’s libraries create individual opportunity and community progress…by cutting federal funds to libraries, the President’s budget is making it more difficult for libraries to do their job.”

For Innovative Approaches to Literacy (IAL), the news was more encouraging. This program, which provides funds for school libraries, was recommended to receive level funding at $27 million. This was the first time the President recommended funding for IAL; last year’s budget did not include any.

Overall, education funding would receive approximately a three percent increase in the President’s request.

 

The post President Submits Budget Request to Congress appeared first on District Dispatch.

Roy Tennant: LC Reviews its File Formats for Preservation Recommendations

Wed, 2016-02-10 20:44

In an ongoing commitment to keep up with the changing world of preservation, the Library of Congress is conducting its annual review of its “Recommended Formats Statement”. The stated purposes of the document are:

One purpose of the Statement is to provide internal guidance within the Library to help inform acquisitions of collections materials (other than materials received through the Copyright Office). A second purpose is to inform the creative and library communities on best practices for ensuring the preservation of, and long-term access to, the creative output of the nation and the world.

In other words, those of us interested in keeping both physical and digital works around for as long as possible should pay attention to what LC says on the subject. For most material types they provide both “Preferred” and “Acceptable” formats. In some cases, there are no “Acceptable” recommendations.

Given the care taken to pick the most long-lived formats, the recommendations are unlikely to change much from year to year, but it’s worth paying attention anyway. This is our chance to provide feedback to the Library of Congress on anything we want to see changed in future iterations of this useful document. If preservation is an orchard you labor in, now is the time to take another look at this. See the blog post linked above for your options for providing feedback.

Library of Congress: The Signal: Keeping Our Tools Sharp: Approaching the Annual Review of the Library of Congress Recommended Formats Statement

Wed, 2016-02-10 18:32

The following post is by Ted Westervelt, head of acquisitions and cataloging for U.S. Serials in the Arts, Humanities & Sciences section at the Library of Congress.

Since first launching its Recommended Formats Statement (then called the Recommended Format Specifications) in 2014, the Library of Congress has committed to treating it as an important part of its digital preservation strategy. As such, it follows two of the key tenets of digital preservation: collaboration and iteration.

The preservation of, and provision of long-term access to, the full range of digital objects, especially in these relatively early years, are not tasks that can be carried out comprehensively or successfully by a single group or institution. This is an effort that must be carried out collaboratively and cooperatively, with an appreciation of the work of others and an imperative to share one’s own work with them as well. Likewise, the great possibility inherent in the digital world for growth, change and development in the creation and dissemination of digital objects requires us to be responsive to those changes. As the objects we wish to preserve and make accessible change and adapt, the plans, practices and rules we create must change and adapt along with them. In short, everything we do in digital preservation must be, if not in a constant state of flux, then at least flexible enough to accommodate changes and ideas from across the field.

The Library of Congress’ Recommended Formats Statement has always had this dual charge in mind. The Statement was developed and implemented to help provide Library of Congress staff with guidance in building the collection. It identifies the technical characteristics of both physical and digital works that best encourage preservation and long-term access. It is not aimed at the migration or transformation of content after acquisition by an archiving institution; rather, it informs decisions about the creation or acquisition of that content, where such guidance can be of great value.

By Udo Grimberg / “Hund der in den Computer schaut” (dog looking into the computer) [CC BY 2.0], via Wikimedia Commons.

It is crucial that anyone involved with digital works is aware of the full scope of the digital life cycle. Acquiring or creating digital objects that one cannot see through their whole life cycle is an expenditure of resources with diminished returns (and potentially none, if the object is lost as a result). It is worth remembering that ‘a puppy is not just for Christmas’ and neither is a digital object just for the first moment you hold it. The information that the Statement provides Library staff enables them to make more informed decisions, especially in the acquisition and creation of digital works, which will enhance the likelihood that the content can be managed throughout its life cycle.

The basic information that the Statement provides also has value for other institutions or organizations. Since the Statement is the result of the collective effort at the Library of Congress of experts in all aspects of the creation and distribution of digital objects, the information and hierarchies provided can be useful for one’s own digital preservation planning no matter in what part of the business cycle one is – as a creator, publisher, vendor, archivist, etc.

In order to meet these needs – to share our knowledge and build on it, and to ensure that it is in sync with the current state of digital creation and preservation – the Library has actively engaged with its colleagues who are also stakeholders in the digital world. Communication of this work to others who might be interested has been a consistent effort since the Statement first went public almost two years ago. Through conferences, articles, blog posts and listserv messages, the Library has worked to ensure that the information in the Statement gets to anyone who might find it useful in their own work with preservation and long-term access. Nor has this effort fallen on fallow ground. We are pleased to see steady usage of the Recommended Formats Statement website and downloads of the Statement itself every day and every month. Moreover, the dissemination of this work is now being undertaken by others as well as by the Library itself.

This past autumn, the Digital Preservation Coalition included the Statement in its Digital Preservation Handbook. Around the same time, the Association for Library Collections & Technical Services recommended the Statement as a resource in one of its e-forums. Beyond the undeniable pleasure of this sort of validation of our efforts from such esteemed colleagues, the sharing of our work by others helps increase greatly the exposure of the Statement and the chances that the information in it will get to people who could use that information or who might have valuable input on how to improve it. Both outcomes are crucial to our digital preservation efforts.

As a complement to the general dissemination of the Statement for the use of others, the Library determined that an annual review would ensure that it remains current and useful, both to the Library itself and to other stakeholders. Beyond giving its in-house experts the chance to review their work in light of any new knowledge or experience, the Library actively solicits feedback from across the digital preservation sphere, in order to make the best possible revised and updated version. As malleable as the universe of digital creation can be, we do not expect wholesale change across the board; but we do know that some things will change, even if just our understanding of them, and so reviewing our work is very much worth the effort.

The Library has already completed one round of this review and revision, to very good effect. The feedback from across the spectrum enabled us to create a far more user-friendly document and one with important revisions, most notably to our still image file formats and to our audio works metadata. This revision did not create an entirely new document; but it did create a better one.

By ANKDADA007 / Human_pyramid_by_little_kids [CC BY 1.0], via Wikimedia Commons.

Now we are looking at our second annual revision of the Recommended Formats Statement. Between March and June, our teams of experts here at the Library will be reviewing their sections of the Statement and making whatever changes they feel will improve it in the new edition due out at the end of June. And in this we very much want and need the input of our external stakeholders, from some of whom we have heard already. Beyond our general belief that the Statement has value for anyone involved in digital preservation, the documented use and dissemination of the Statement tells us that there are those out there who agree with us. So please share your thoughts, comments, feedback and input with us, either through this post, the contacts page or by e-mailing me (thwe at loc dot gov). The more guidance we get from you in developing and improving the Recommended Formats Statement, the more value our work will have in the great collaborative effort of digital preservation.

Richard Wallis: Evolving Schema.org in Practice Pt1: The Bits and Pieces

Wed, 2016-02-10 10:07

I am often asked by people with ideas for extending or enhancing Schema.org how to go about it.  These requests inevitably fall into two categories – either ‘How do I decide upon and organise my new types & properties and relate them to other vocabularies and ontologies?’ or ‘Now I have my proposals, how do I test, share, and submit them to the Schema.org community?’

I touch on both of these areas in a free webinar I recorded for DCMI/ASIS&T a couple of months ago.  It is the second in a two-part series, Schema.org in Two Parts: From Use to Extension.  The first part covers the history of Schema.org and the development of extensions.  That part is based upon my experiences applying and encouraging the use of Schema.org with bibliographic resources, including the set-up and work of the Schema Bib Extend W3C Community Group – bibliographically focused, but of interest to anyone looking to extend Schema.org.

To add to those webinars, the focus of this post is on answering the ‘Now I have my proposals, how do I test, share, and submit them to the Schema.org community?’ question.  In later posts I will move on to how the vocabulary, its examples, and its extensions are defined, and how to decide where and how to extend.

What skills do you need

Not many.  If you want to add to the vocabulary and/or examples you will naturally need some basic understanding of the vocabulary and of the way you navigate around the Schema.org site, viewing examples, etc.  Beyond that you need to be able to run a few command-line instructions on your computer and interact with GitHub.  If you are creating examples, you will need to understand how Microdata, RDFa, and JSON-LD markup are added to HTML.

I am presuming that you want to do more than tweak a typo, which could be done directly in the GitHub interface, so in this post I step through the practice of working locally, sharing with others, and proposing your efforts via a GitHub Pull Request.

How do I start

Environment
You need to set up the environment on your PC: a local installation of Git, so that you can interact with the Schema.org source, and a local copy of the Google App Engine SDK, to run your own copy of the Schema.org site.  The following couple of links should help you get these going.
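
As a quick optional check once both are installed (not part of the original instructions), you can confirm they are available from a terminal; the exact output will depend on the versions you installed:

git --version
dev_appserver.py --help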

Getting the Source

This is a two-step process.  Firstly you need your own parallel fork of the Schema.org repository.  If you have not already, create a user account at GitHub.com.  Accounts are free, unless you want to keep your work private.

Logged into GitHub, go to the Schema.org repository page – https://github.com/schemaorg/schemaorg – and select Fork; this will create a copy of the schemaorg repository under your account.

Create yourself a working area on your PC and, from a command-line/terminal window in that directory, run the following git command, with MyAccount replaced by your GitHub account name:

git clone https://github.com/MyAccount/schemaorg.git

This will download and unwrap a copy of the code into a schemaorg subdirectory of your working directory.
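
One optional extra, not covered in the original steps: you can also register the main Schema.org repository as a second remote so that you can later pull in changes made there. The remote name upstream is just a convention, and master is assumed to be the default branch:

git remote add upstream https://github.com/schemaorg/schemaorg.git
git pull upstream master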

Running a Local Version 
In the directory where you downloaded the code, run the following command:

dev_appserver.py schemaorg

This should result in command-line output that looks something like this:
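
(The screenshot from the original post is not reproduced here; the lines below are an illustrative approximation of typical dev_appserver.py output, so timestamps, file names and port numbers will differ on your machine.)

INFO     ... api_server.py: Starting API server at: http://localhost:45000
INFO     ... dispatcher.py: Starting module "default" running at: http://localhost:8080
INFO     ... admin_server.py: Starting admin server at: http://localhost:8000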

The important line is the one telling you that module “default” is running at http://localhost:8080.  If you drop that web address into your favourite browser you should end up looking at a familiar screen.

Success! You should now be looking at a version that operates exactly like the live version, but is totally contained on your local PC.  Note the message on the home page reminding you which version you are viewing.

Running a Shared Public Version
It is common practice to want to share proposed changes with others before applying them to the Schema.org repository in GitHub.  Fortunately there is an easy, free way of running a Google App Engine instance in the cloud.  To do this you will need a Google account, which most of us have.  When logged in to your Google account, visit this page: https://console.cloud.google.com

From the ‘Select a project’ menu, choose Create a project.  Give your project a name – choose one that is globally unique.  There is a convention of using names that start with ‘sdo-’ to indicate a project running a Schema.org instance.

To ready your local code for upload to the public instance you need to make a minor change to a file named app.yaml in the schemaorg directory.  Use your favourite text editor to change the line near the top of the file that begins with application so that its value matches the project name you have just created.  Note that lines beginning with a ‘#’ character are commented out and have no effect on operation.  For this post I have created an App Engine project named sdo-blogpost.
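
As a rough sketch of that edit (only the application value changes; leave the rest of the file as it is in the repository):

# app.yaml - the application value must match the project name you created
application: sdo-blogpost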

To upload the code run the following command:

appcfg.py update schemaorg/

You should get output indicating that the upload has completed successfully. Depending on your login state, a browser window may appear asking you to log in to Google. Make sure at this point that you log in as the user that created the project.

To view your new shared instance, go to http://sdo-blogpost.appspot.com – modified to take account of your own project name: http://<project name>.appspot.com.

Working on the Files 
I will go into the internal syntax of the controlling files in a later post.  However, if you would like a preview, take a look in the data directory, where you will find a large file named schema.rdfa.  This contains the specification for the core of the Schema.org vocabulary – for simple tweaks and changes you may find things self-explanatory.  Also in that directory you will find several files that end in ‘-examples.txt’.  As you might guess, these contain the examples that appear in the Schema.org pages.
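
To give a flavour of what you will find in schema.rdfa (this fragment is illustrative only; check existing entries in the file for the exact conventions), a type definition looks roughly like this:

<div typeof="rdfs:Class" resource="http://schema.org/Library">
  <span class="h" property="rdfs:label">Library</span>
  <span property="rdfs:comment">A library.</span>
  <span>Subclass of: <a property="rdfs:subClassOf" href="http://schema.org/LocalBusiness">LocalBusiness</a></span>
</div>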

Evolving and Sharing
How much you use your personal GitHub schemaorg repository fork to collaborate with like-minded colleagues, or just use it as a scratch working area for yourself, is up to you.  However you choose to organise yourself, you will find the following git commands, which should be run from within the schemaorg subdirectory, useful:

  • git status – shows how far your local copy is in step with your repository
  • git add <filename> – adds a file to those being tracked in your local copy
  • git commit <filename> – commits a changed or added file to your local copy (git push then uploads your commits to your GitHub repository)
  • git commit -a – commits all tracked files that have changed

It is recommended to commit as you go.
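
Putting those commands together, a typical round of local work might look like the sketch below; the file name and commit message are only placeholders, and git push origin master assumes you are working on your fork's default master branch:

git status
git add data/schema.rdfa
git commit -m "Describe your change here"
git push origin master

The git push step is what actually copies your local commits up to your GitHub fork, which is where collaborators (and, later, your Pull Request) will see them.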

Requesting Changes 
The mechanism for requesting a change of any type to Schema.org is to raise a GitHub Pull Request.  Each new release of Schema.org is assembled by the organising team reviewing and, hopefully, accepting each Pull Request. You can see the current list of requests awaiting acceptance in GitHub.  To stop the comments associated with individual requests getting out of hand, and to make it easier to track progress, the preferred way of working is to raise a Pull Request as the final step in completing work on an Issue.

Raising an Issue first enables discussion to take place around proposals as they take shape.  It is not uncommon for a final request to differ greatly from an original idea after interaction with others in the comment stream.

So I suggest that you raise an Issue in the Schema.org repository for what you are attempting to solve.  Try to give it a good explanatory Title, and explain what you intend in the comment.   This is where the code in your repository and the appspot.com working version can be very helpful in explaining and exploring the issue.

When ready to request, take yourself to your repository’s home page to create a New Pull Request.  Provided you do not create a new branch in the code, any new commits you make to your repository will become part of that Pull Request.  A very handy feature in the real world, where inevitably you want to make minor changes just after you say that you are done!

Look out for the next post in this series in which I’ll cover working in the different file types that make up Schema.org and its extensions.

SearchHub: Welcome Trey Grainger!

Wed, 2016-02-10 00:31

We’re happy to announce another new addition to the Lucidworks team! Trey Grainger has joined as Lucidworks SVP of Engineering, where he’ll be heading up our engineering efforts across open source Apache Lucene/Solr, our Lucidworks Fusion platform, and our other product offerings.

Trey most recently served as the Director of Engineering on the Search & Recommendations team at CareerBuilder, where he built out a team of several dozen software engineers and data scientists to deliver a robust semantic search, data analytics, and recommendation engine platform. This platform contained well over a billion documents and powered over 100 million searches per day across a large combination of consumer-facing websites and B2B Software as a Service products.

Trey is also the co-author of Solr in Action, the comprehensive example-driven guide to Apache Solr (his co-author was Tim Potter, another Lucidworks engineer).

Trey received his MBA in Management of Technology from Georgia Tech, studied Computer Science, Business, and Philosophy at Furman University, and has also completed Masters-level work in Information Retrieval and Web Search from Stanford University.

We sat down with Trey to learn more about his passion for search:

When did you first get started working with Apache Lucene?

In 2008, I was the lead engineer for CareerBuilder’s newly-formed search team and was tasked with looking for potential options to replace the company’s existing usage of Microsoft’s FAST search engine. Apache Lucene was a mature option at that point, and Apache Solr was rapidly maturing to the point where it could support nearly all of the necessary functionality that we needed. After some proof of concept work, we decided to migrate to Solr, which enabled us to leverage and extend the best Lucene had to offer, while providing a highly reliable out-of-the-box search server which supported distributed search (scale out with shards, scale up with replicas) and an extensively pluggable architecture and set of configuration options. We started migrating to Solr in 2009 and completed the migration in 2010, by which time the Lucene and Solr projects had actually merged their code bases into one project. Ever since then, I’ve had the tremendous opportunity to help develop, speak about, write about, and run teams pushing forward the tremendous capabilities available in the Lucene/Solr ecosystem.

How has search evolved over the past couple years? Where do you think it’ll be in the next 10?

Over the last decade, the keyword search box has really evolved to become the de facto user interface for exploring data and for navigating most websites and applications. Companies used to pay millions of dollars to license search technology that did little more than basic text search, highlighting, and faceting. As Lucene/Solr came on the scene and commoditized those capabilities, search engineers were able to fully embrace the big data era and focus on building out scalable infrastructure to run their open-source-based search systems. With the rise of cloud computing and virtual machines, Solr likewise developed to scale elastically with automatic sharding, replication, routing, and failover in such a way that most of the hard infrastructure work has now also become commoditized. Lucene/Solr have also become near-real-time systems, enabling an impressive suite of real-time analytics and matching capabilities.

With all of these changes, I’ve seen the value proposition for search shift significantly from “providing a keyword box”, to “scalable navigation through big data”, and another massive shift is now underway. Today, more companies than ever are viewing search not just as infrastructure to enable access to data, but instead as the killer application needed to provide insights and highly-relevant answers to help their customers and move their businesses forward.

I thus anticipate seeing an ever growing focus on domain-driven relevance over the coming years. We’re already seeing industry-leading companies develop sophisticated semantic search capabilities that drive tremendous customer value, and I see the next decade being one where such intelligent capabilities are brought to the masses.

What do you find most exciting in the current search technology landscape?

The current frontier of search relevancy (per my answer to the last question) is what most excites me right now in the search technology landscape. Now that core text search, scaling, and cluster management have become much more commoditized, we’re beginning to see increased focus on relevancy as a key competitive differentiator across many search applications. Doing relevancy well includes adding capabilities like query intent inference, entity extraction, disambiguation, semantic and conceptual search, automatic classification and extraction of knowledge from documents, machine-learned ranking, using clickstream feedback for boosting and collaborative filtering, per-user personalization and recommendations, and evolving search to be able to provide answers instead of just lists of documents as a response to natural language questions. Many of these capabilities require external systems to support sophisticated workflows and feedback loops (such as those already built into Lucidworks Fusion through the combination of pipelines with Solr + Spark), and Lucidworks is at the forefront of pushing this next generation of intelligent search applications.

Where are the biggest challenges in the search space?

Some of the most fun challenges I’ve tackled in my career have been building systems for inferring query intent, recommendation systems, personalized search, and machine-learned relevancy models. There’s one key thing I learned about search along the way: nothing is easy at scale or in the tail. It took me years of building out scalable search infrastructure (with mostly manual relevancy tuning) before I had sufficient time to really tackle the long tail of relevancy problems using machine learning to solve them in an optimal way.

What’s particularly unique about the search space is that it requires deep expertise across numerous domains to do really well. For example, the skillsets needed to build and maintain scalable infrastructure include topics like distributed systems, data structures, performance and concurrency optimization, hardware utilization, and network communication. The skills needed to tackle relevancy include topics like domain expertise, feature engineering, machine learning, ontologies, user testing, and natural language processing. It’s rare to find people with all of these skillsets, but to really solve hard search problems well at scale and in the tail, all of these topics are important to consider.

What attracted you to Lucidworks?

Interesting problems and a shared vision for what’s possible. What attracted me to Lucidworks is the opportunity to work with visionaries in the search space, building search technology that will help the masses derive intelligence from their data both at scale and in the tail. Search is a really hard problem, and I’m excited to be in great company trying to solve that problem well.

What will you be working on at Lucidworks?

As SVP of Engineering, I’ll be heading up our engineering efforts around open source Lucene/Solr as well as Lucidworks Fusion and our other exciting product offerings. With Lucidworks employing a large percentage of Lucene/Solr committers, we take good stewardship of the open source project very seriously, and I’m excited to be able to work more on the strategic direction of our open source contributions. Additionally, I’ll be working to drive Fusion as the next-generation platform for building search-driven, intelligent applications. I’m incredibly excited to be working with such a top-notch team at Lucidworks, and am looking forward to building out what will be the most scalable, dependable, easy-to-use, and highly relevant search product on the market.

Welcome, Trey!

The post Welcome Trey Grainger! appeared first on Lucidworks.com.

DuraSpace News: Fedora 4 at The Art Institute of Chicago, An Interview With Stefano Cossu

Wed, 2016-02-10 00:00

Carol Minton Morris from DuraSpace interviewed Stefano Cossu, Director of Application Services, Collections at The Art Institute of Chicago (AIC), to learn about how Fedora 4 is implemented at AIC.

• What’s your role with Fedora at your organization or institution?

DuraSpace News: DSPACE USERS: Feedback Requested on Next User Interface for DSpace

Wed, 2016-02-10 00:00

From Tim Donohue, DSpace Tech Lead

You may recall that, in late 2015, we held a DSpace User Interface (UI) Prototype Challenge. From our amazing developer community, we received a total of nine UI prototype submissions, using a variety of technologies (e.g. Java web frameworks, client-side JavaScript, Ruby on Rails).

https://wiki.duraspace.org/display/DSPACE/DSpace+UI+Prototype+Challenge

Video demos of UI prototypes/technologies

District Dispatch: ALA disappointed at White House budget cut to state grants to libraries

Tue, 2016-02-09 23:06

Cutting funds to libraries hinders services being delivered directly to people in their communities

Libraries directly deliver a wide range of services to Americans in their local communities throughout the nation. (Pictured here: Cherry Hill Public Library)

After reviewing President Obama’s Fiscal 2017 Budget released today, Sari Feldman, president of the American Library Association (ALA), issued the following statement:

We are truly disappointed that the President’s budget does not recognize the value libraries bring to our country. Every day America’s libraries create individual opportunity and community progress. It is ironic that the President has cut federal funds considering libraries are on the front lines directly serving all Americans without exception and that our work with individuals and communities advances our country in education, employment, and entrepreneurship. By cutting federal funds to libraries, the President’s budget is making it more difficult for libraries to do their job.

Feldman said cutting nearly a million dollars to grants to state library programs means: fewer children will benefit from reading and learning programs; fewer people will get the skills training they need to seek and sustain employment; fewer small businesses can research markets in order to grow; fewer Americans can search for health care resources and maintain health records; and fewer parents can investigate how to send their children to college and apply for financial aid.

 

The post ALA disappointed at White House budget cut to state grants to libraries appeared first on District Dispatch.

District Dispatch: Webinar explores collaboration to serve military & their families through libraries

Tue, 2016-02-09 22:23

ALA’s Larra Clark (left) and Ann Estes (right foreground) of the National Foundation for Credit Counseling led a webinar on how public libraries can get involved in a new initiative to serve military members and their families.

The ALA and the National Foundation for Credit Counseling (NFCC) introduced their new partnership to the first round of potential pilot libraries last week. The effort provides an opportunity for libraries to add to their capacity to serve military members and their families with customized financial education and resources in concert with certified credit counselors.

Larra Clark, deputy director of the ALA Office for Information Technology Policy, and Ann Estes, NFCC vice president of business development, shared information about program goals and benefits, research related to financial needs and program outcomes, and information about the NFCC’s member and communications reach.

As noted earlier, members of the U.S. armed forces, Coast Guard, veterans, and their families face financial challenges often not adequately addressed by resources designed for the general public. ALA and NFCC will leverage local member agencies and libraries to help improve the financial lives of service members, veterans and their families.

If you are interested, you can download the slides and view the webinar here: https://nfcc.adobeconnect.com/p7d6dz0ru36/.

We had a little bit of a hiccup with the sound in the first five minutes, so please jump ahead to the 5:10 mark to catch the full audio.

If you would like to learn more about this initiative, please contact Larra Clark at lclark@alawash.org or by phone at: 202-403-8213. NFCC and ALA will announce the local communities and libraries where the program will first be launched in the coming weeks.

The post Webinar explores collaboration to serve military & their families through libraries appeared first on District Dispatch.

FOSS4Lib Recent Releases: Open Journal Systems - 2.4.8

Tue, 2016-02-09 21:53
Package: Open Journal Systems
Release Date: Tuesday, February 9, 2016

Last updated February 9, 2016. Created by David Nind on February 9, 2016.

The Public Knowledge Project is pleased to announce the release of OJS 2.4.8.

This release builds on the 2.4.7 and 2.4.7-1 releases, collecting the many minor fixes and tweaks that have since been completed. It also adds numerous translation updates and several new features, and includes a substantial improvement to the PKP LOCKSS PLN plugin.

Our thanks go out to our partners and the many community translators and developers whose contributions make our work possible.

David Rosenthal: The Malware Museum

Tue, 2016-02-09 16:00
Mikko Hypponen and Jason Scott at the Internet Archive have put up the Malware Museum:
a collection of malware programs, usually viruses, that were distributed in the 1980s and 1990s on home computers. Once they infected a system, they would sometimes show animation or messages that you had been infected. Through the use of emulations, and additionally removing any destructive routines within the viruses, this collection allows you to experience virus infection of decades ago with safety.

The museum is an excellent use of emulation and well worth a visit.

I discussed the issues around malware in my report on emulation. The malware in the Malware Museum is too old to be networked, and thus avoids the really difficult issues caused by running old, and thus highly vulnerable, software with access to the network.

Even if emulation can ensure that only the virtual machine and not its host is infected, and users can be warned not to input any personal information to it, this may not be enough. The goal of the infection is likely to be to co-opt the virtual machine into a botnet, or to act as a Trojan on your network. If you run this vulnerable software you are doing something that a reasonable person would understand puts other people's real machines at risk. The liability issues of doing so bear thinking about.

FOSS4Lib Upcoming Events: Managing Assets as Linked Data with Fedora 4

Tue, 2016-02-09 15:15
Date: Wednesday, April 6, 2016 - 13:30 to 16:30
Supports: Fedora Repository

Last updated February 9, 2016. Created by Peter Murray on February 9, 2016.

From the announcement:

Manage Metadata (Diane Hillmann and Jon Phipps): It’s not just me that’s getting old

Tue, 2016-02-09 14:19

Having just celebrated (?) another birthday at the tail end of 2015, the topics of age and change have been even more on my mind than usual. And then two events converged. First I had a chat with Ted Fons in a hallway at Midwinter, and he asked about using an older article I’d published with Karen Coyle way back in early 2007 (“Resource Description and Access (RDA): Cataloging Rules for the 20th Century”). The second thing was a message from Research Gate that reported that the article in question was easily the most popular thing I’d ever published. My big worry in terms of having Ted use that article was that RDA had experienced several sea changes in the nine (!) years since the article was published (Jan./Feb. 2007), so I cautioned Ted about using it.

Then I decided I needed to reread the article and see whether I had spoken too soon.

The historic rationale holds up very well, but it’s important to note that at the time that article was written, the JSC (now the RSC) was foundering, reluctant to make the needed changes to cut ties to AACR2. The quotes from the CC:DA illustrate how deep the frustration was at that time. There was a real turning point looming for RDA, and I’d like to believe that the article pushed a lot of people to be less conservative and more emboldened to look beyond the cataloger tradition.

In April of 2007, a mere few months from when this article came out, ALA Publishing arranged for the famous “London Meeting” that changed the course of RDA. Gordon Dunsire and I were at that meeting–in fact it was the first time we met. I didn’t even know much about him aside from his article in the same DLIB issue. As it turns out, the RDA article was elevated to the top spot, thus stealing some of his thunder, so he wasn’t very happy with me. The decision made in London to allow DCMI to participate by building the vocabularies was a game changer, and Gordon and I were named co-chairs of a Task Group to manage that process.

So as I re-read the article, I realized that the most important bits at the time are probably mostly of historical interest at this point. I think the most important takeaway is that RDA has come a very long way since 2007, and in some significant ways is now leading the pack in terms of its model and vocabulary management policies (more about that to come).

And I still like the title! …even though it’s no longer a true description of the 21st Century RDA.

DuraSpace News: VIVO Conference Call for Papers, Workshops and Posters Open through March 14

Tue, 2016-02-09 00:00

From the VIVO 2016 Conference organizers

Austin, TX – The Seventh Annual VIVO Conference will be held August 17-19, 2016 at the Denver Marriott City Center in Denver, Colorado. The organizers are pleased to issue this call for contributions to the program.

The VIVO Conference creates a unique opportunity for people from around the world to come together to explore ways to use semantic technologies and linked open data to promote scholarly collaboration and research discovery.

DuraSpace News: Status Update on DSpace 6.0

Tue, 2016-02-09 00:00

From Tim Donohue, DSpace Tech Lead
