The Solr suggester search component was previously discussed on this blog in the post Solr Suggester by Solr committer Erick Erickson. This post shows how to add a Solr suggester component to a Fusion query pipeline in order to provide the kind of auto-complete functionality expected from a modern search app.
By auto-complete we mean the familiar set of drop-downs under a search box which suggest likely words or phrases as you type. This is easy to do using Solr’s FST-based suggesters. FST stands for “Finite-State Transducer”. The underlying mechanics of an FST allow for near-matches on the input, which means that auto-suggest will work even when the inputs contain typos or misspellings. Solr’s suggesters return the entire field for a match, making it possible to suggest whole titles or phrases based on just the first few letters.
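To make "near-match" concrete, here is a minimal sketch of fuzzy prefix matching. It uses a plain edit-distance table, not an FST, so it only illustrates the behavior and is not how Lucene implements it. The function names and the `max_edits` default are my own assumptions for illustration.

```python
def prefix_edit_distance(query: str, text: str) -> int:
    """Smallest edit distance between `query` and any prefix of `text`."""
    q, t = query.lower(), text.lower()
    # prev[j] holds the edit distance between q[:i] and t[:j]
    prev = list(range(len(t) + 1))
    for i in range(1, len(q) + 1):
        cur = [i] + [0] * len(t)
        for j in range(1, len(t) + 1):
            cost = 0 if q[i - 1] == t[j - 1] else 1
            cur[j] = min(
                prev[j] + 1,         # drop a character from the query
                cur[j - 1] + 1,      # add a character to the query
                prev[j - 1] + cost,  # match or substitute
            )
        prev = cur
    # The query may stop anywhere in the text, so take the best prefix.
    return min(prev)


def fuzzy_suggest(query, titles, max_edits=1):
    """Titles whose beginning is within `max_edits` edits of the query."""
    return [t for t in titles if prefix_edit_distance(query, t) <= max_edits]
```

With this sketch, the misspelled input "Star Wrs" still matches a title beginning "Star Wars", because one inserted letter repairs it. This version counts a swapped pair of letters as two edits.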
The data in this example is derived from data collected by the Movie Tweetings project between 2013 and 2016. A subset of that data has been processed into a CSV file consisting of a row per film, with columns for a unique id, the title, release year, number of tweets found, and average rating across tweets:
id,title,year,ct,rating
...
0076759,Star Wars: Episode IV - A New Hope,1977,252,8.61111111111111
0080684,Star Wars: Episode V - The Empire Strikes Back,1980,197,8.82233502538071
0086190,Star Wars: Episode VI - Return of the Jedi,1983,178,8.404494382022472
1185834,Star Wars: The Clone Wars,2008,11,6.090909090909091
2488496,Star Wars: The Force Awakens,2015,1281,8.555815768930524
...
After loading this data into Fusion, I have a collection named “movies”. The following screenshot shows the result of a search on the term “Star Wars”.
The search results panel shows the results for the search query “Star Wars”, sorted by relevancy (i.e. best-match). Although all of the movie titles contain the words “Star Wars”, they don’t all begin with it. If you’re trying to add auto-complete to a search box, the results should complete the initial query. In the above example, the second best-match isn’t a match at all in an auto-complete scenario. Instead of using the default Solr “select” handler to do the search, we can plug in an FST suggester, which will give us not just auto-complete, but fuzzy autocomplete, through the magic of FSTs.
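The difference between "contains the query" and "completes the query" can be sketched in a few lines. This is only an illustration in plain Python, not what Solr does internally. The first five titles come from the sample data; the last title is a hypothetical entry I added to show a title that contains the phrase without starting with it.

```python
TITLES = [
    "Star Wars: Episode IV - A New Hope",
    "Star Wars: Episode V - The Empire Strikes Back",
    "Star Wars: Episode VI - Return of the Jedi",
    "Star Wars: The Clone Wars",
    "Star Wars: The Force Awakens",
    "Robot Chicken: Star Wars",  # hypothetical entry, not from the sample data
]


def contains_match(query, titles):
    """Relevancy-style matching: the query appears anywhere in the title."""
    needle = query.lower()
    return [t for t in titles if needle in t.lower()]


def prefix_match(query, titles):
    """Auto-complete-style matching: the title must begin with the query."""
    needle = query.lower()
    return [t for t in titles if t.lower().startswith(needle)]
```

For the input "star wars", `contains_match` returns all six titles, while `prefix_match` leaves out the one that merely mentions the phrase.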
Fusion collections are Solr collections which are managed by Fusion. To add a Lucene/Solr suggester to the “movies” collection requires editing the Solr config files according to the procedure outlined in the “Solr Suggester” blogpost:
- define a field with the correct analyzer in file schema.xml
- define a request handler for auto-complete in file solrconfig.xml
Fusion sends search requests to Solr via the Fusion query pipeline Solr query stage, therefore it’s also necessary to configure a Solr query stage to access the newly configured suggest request handler.
The Fusion UI provides tools for editing Solr configuration files. These are available from the “Configuration” section on the collection “Home” panel, seen on the left-hand side column in the above screenshot. Clicking on the “Solr Config” option shows the set of available configuration files for collection “movies”:
Clicking on file schema.xml opens an edit window. I need to define a field type and specify how the contents of this field will be analyzed when creating the FSTs used by the suggester component. To do this, I copy in the field definition from the very end of the “Solr Suggester” blogpost:
<!-- text field for suggestions, taken from: https://lucidworks.com/blog/2015/03/04/solr-suggester/ -->
<fieldType name="suggestTypeLc" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[^a-zA-Z0-9]" replacement=" " />
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
After clicking the “Save” button, the Fusion UI displays the notification message: “File contents saved and collection reloaded.”
Next I edit the solrconfig.xml file to add definitions for the suggester search component and the corresponding request handler.
This configuration follows Solr’s “techproducts” example and the Suggester configuration docs in the Solr Reference Guide. The suggest search component is configured with parameters for the suggester’s name and implementation type, the field to be analyzed, and the analyzer to use. We also specify the optional parameter weightField, which names a document field whose value is used to weight, and therefore sort, the suggestions.
For this example, the field parameter is movie_title_txt. The suggestAnalyzerFieldType specifies that the movie title text will be analyzed using the analyzer defined for field type suggestTypeLc (added to the schema.xml file for the “movies” collection in the previous step). Each movie has two kinds of ratings information: average rating and count (total number of ratings from tweets). Here, the average rating value is specified:
<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">mySuggester</str>
    <str name="lookupImpl">FuzzyLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="storeDir">suggester_fuzzy_dir</str>
    <str name="field">movie_title_txt</str>
    <str name="weightField">rating_tf</str>
    <str name="suggestAnalyzerFieldType">suggestTypeLc</str>
  </lst>
</searchComponent>
For details, see Solr wiki Suggester searchComponent section.
The request handler configuration specifies the request path and the search component:
<requestHandler name="/suggest" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="suggest">true</str>
    <str name="suggest.count">10</str>
    <str name="suggest.dictionary">mySuggester</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>
For details, see Solr wiki Suggester requestHandler section.
After each file edit, the collection configs are saved and the collection is reloaded so that changes take effect immediately.
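Once the collection has reloaded, the new handler can also be queried directly in Solr, outside of Fusion. The sketch below builds such a request URL and pulls the suggested terms out of a JSON response. The base URL `http://localhost:8983/solr` is an assumption about a default local install, the helper names are mine, and the response layout follows the shape described for the suggester in the Solr Reference Guide, so check it against your own Solr version.

```python
from urllib.parse import urlencode


def build_suggest_url(collection, query,
                      base_url="http://localhost:8983/solr",  # assumed default
                      dictionary="mySuggester", count=10):
    """Build a URL for the /suggest request handler configured above."""
    params = {
        "suggest.q": query,                # the text typed so far
        "suggest.dictionary": dictionary,  # name given in the searchComponent
        "suggest.count": count,
        "wt": "json",
    }
    return f"{base_url}/{collection}/suggest?{urlencode(params)}"


def extract_terms(response, query, dictionary="mySuggester"):
    """Return the suggested terms from a parsed JSON suggest response.

    Assumes the layout suggest -> dictionary -> query -> suggestions.
    """
    entry = response.get("suggest", {}).get(dictionary, {}).get(query, {})
    return [s["term"] for s in entry.get("suggestions", [])]
```

For example, `build_suggest_url("movies", "Star W")` yields a URL ending in `/movies/suggest?suggest.q=Star+W&suggest.dictionary=mySuggester&suggest.count=10&wt=json`.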
Finally, I configure a pipeline with a Solr query stage which permits access to the suggest request handler:
Lacking a UI with the proper JS magic to show autocomplete in action, we’ll just send a request to the endpoint, to see how the suggest request handler differs from the default select request handler. Since I’m already logged into the Fusion UI, from the browser location bar, I request the URL:
The power of the FST suggester lies in its robustness. Misspelled and/or incomplete queries still produce good results. This search also returns the same results as the above search:
Under the hood, Lucidworks Fusion is Solr-powered, and under the Solr hood, Solr is Lucene-powered. That’s a lot of power. The autocompletion for “Solr-fu” is “Solr-Fusion”!
The post Fusion plus Solr Suggesters for More Search, Less Typing appeared first on Lucidworks.com.
Our friendly fundraising lobster, the Lobstometre (r before e because we are Canadian like that) has gotten another bump this week, thanks to new Collaborator Florida Virtual Campus, a renewed partnership with LYRASIS, and support from Individual Members totalling more than $1500. We are more than halfway to our minimum fundraising goal and would like to say a very big "THANK YOU!" to the supporters who have gotten us here.
Wow, it turns out if you have a ton of clients materialize over the fall, you have no time to tell the internet about them!
So here’s what I’m up to:
- Running for LITA president! Yup. If you’re a member in good standing of LITA, you’ll get your ballot in March, and I’d really appreciate your vote. Stay tuned for my campaign page and official LITA candidate profile.
- Coding for Measure the Future! This consists largely in arguing with Griffey about privacy. And also being, as far as I can tell, the first person on the internet to have gotten a Django app running on an Intel Edison, a tiny adorable computer that fits in the palm of my hand.
- Coding for Wikimedia! So…that happened. I’m doing an internal project for The Wikipedia Library, improving the usability of their journal access application system (and creating the kernel of a system that, over time, might be able to open up lots more possibilities for them).
- Coding for CustomFit! We’ve debuted straight-shaped sweaters along with our original hourglass (a coding process which was not unlike rebuilding an airplane in flight), so now you can make sweaters for people who may not want normatively-feminine garments. Yay! Also I implemented a complete site redesign last fall (if you’re wondering, “can Andromeda take a 12-page PDF exported from Photoshop, translate it into CSS, and rewrite several hundred templates accordingly”, the answer turns out to be yes). Anyway, if you’d been thinking of taking the CustomFit plunge but not gotten around to it yet, please go check that out – there’s a ton of great new stuff, and more on the way.
- Keynoting LibTechConf! My talk will be called “The Architecture of Values”, and it’ll be about how our code does (or, spoiler alert, doesn’t) implement our library values. Also the other keynoter is Safiya Noble and I am fangirling pretty hard about that.
Last updated February 4, 2016. Created by wooble on February 4, 2016.
Now includes a bare-bones SUSHI client executable, better support for DB1, BR1, and BR2 reports, and the ability to output COUNTER 4 TSV reports (from programmatically built reports, reports parsed from other formats, or reports fetched with SUSHI).
What library technology topic are you passionate about?
Have something to teach?
The Library Information Technology Association (LITA) Education Committee invites you to share your expertise with a national audience! For years, LITA has offered online learning programs on technology-related topics of interest to LITA Members and the wider American Library Association audience.
All topics related to the intersection of technology and libraries are welcomed. Possible topics include:
- Research Data Management
- Supporting Digital Scholarship
- Technology and Kids or Teens
- Managing Technical Projects
- Creating/Supporting Library Makerspaces, or other Creative/Production Spaces
- Data-Informed Librarianship
- Diversity and Technology
- Accessibility Issues and Library Technology
- Technology in Special Libraries
- Ethics of Library Technology (e.g., Privacy Concerns, Social Justice Implications)
- Library/Learning Management System Integrations
- Technocentric Library Spaces
- Social Media Engagement
- Intro to… GitHub, Productivity Tools, Visualization/Data Analysis, etc.
Instructors receive a $500 honorarium for an online course or $100-150 for webinars, split among instructors. For more information, access the online submission form. Check out our list of current and past course offerings to see what topics have been covered recently. We’re looking forward to a slate of compelling and useful online education programs this year!
LITA Education Committee.
Questions or Comments?
For questions or comments related to teaching for LITA, contact LITA at (312) 280-4268 or Mark Beatty, email@example.com
New vacancy listings are posted weekly on Wednesday at approximately 12 noon Central Time. They appear under New This Week and under the appropriate regional listing. Postings remain on the LITA Job Site for a minimum of four weeks.
New This Week:
Visit the LITA Job Site for more available jobs and for information on submitting a job posting.
Ask Your Members of Congress to Help Bring the Bucks Home while They’re at Home
It’s “appropriations” season again in Washington: that time every year when the President submits a budget to Congress and, in theory at least, Congress drafts and votes on bills to federally fund everything from llama farming to, well, libraries. Never mind where llamas get their cash, but libraries in every state in the nation benefit from funds allocated by Congress for the Library Services and Technology Act (LSTA), the only federally funded program specifically dedicated to supporting libraries. Last year, libraries received just under $183 million in LSTA funding, about $156 million of which flowed to states as matching grants.
Neither llama farmers nor libraries, however, benefit from federal funding without considerable convincing. That’s where you and your Members of Congress come in.
Starting in mid-February, individual Members of Congress will start signing letters addressed to their influential colleagues who sit on the powerful Appropriations Committees in both chambers of Congress. Those letters will ask the Committee to provide specific dollar amounts for specific programs, LSTA included. The math is easy: the more Members of Congress who sign the “Dear Appropriator” letter asking for significant LSTA funding, the better the odds of that money actually being awarded by the Appropriations Committee and eventually flowing to your state. Similarly, the more librarians and library supporters who ask their Members of Congress to sign that LSTA Dear Appropriator letter, the better the odds that LSTA will be funded and funded well.
So, how can you help? That’s easy, too.
We are asking library supporters to reach out and request a meeting with their Representatives and Senators while Members of Congress are home for the Presidents’ Day recess from February 15 – 20. The message to deliver at these meetings couldn’t be more simple or straightforward: “Please add your name to the LSTA Dear Appropriator letter.”
Members of Congress may be considering signing letters in support of other programs, but they will most likely sign the LSTA letter if they hear from constituents back home … or better yet, if they can visit your library and see the positive impact LSTA-funded programs are having on their constituents.
Please take a moment this week to reach out to your Member of Congress’ and Senators’ offices and request a meeting with the Member or his or her “District Director” anytime during the week of February 15 to discuss LSTA and the Dear Appropriator letters. Once you’ve met, please let the Washington Office know how it went and we will follow up on your great work.
Your Representative and Senators work for you and will love hearing about all of the great things that LSTA money does for their constituents. They’ll be happy to hear from you! Please, set that Presidents’ Week meeting today.
The post Federal Dollars on the Line for State Library Programs appeared first on District Dispatch.
Next Tuesday STAPLR + a live feed of anonymous desk activity data + Twitter streams will be the basis for a performance by the students in Doug Van Nort’s class DATT 3200, Performing Telepresence, which will take place simultaneously in the DisPerSion Lab and all the branches of York University Libraries. You can watch, listen, participate and help perform from anywhere in the world. If you’re in or near Toronto, you can experience it in person.
Tuesday 9 February 2016, 3:30 – 5:30 pm
William Denton (York University Libraries)
Doug Van Nort (School of the Arts, Media, Performance & Design)
Students of DATT 3200 Performing Telepresence
Reimagine the realtime streams emanating from, to and about York University Libraries in its physical and virtual homes. Featuring:
William Denton’s sonification of YUL reference desks (listen remotely at staplr.org)
Sound, Light and Text Instruments
Performed between all branches of York University Libraries (Bronfman, Frost, Maps, Scott, SMIL, Steacie) and the DisPerSion Lab by DATT students, using Twitter as their interface.
Experience the immersive version at the DisPerSion Lab (334 Centre for Fine Arts)
Watch/Listen to the virtual feed (video, audio, Twitter) at dispersionlab.org
Participate and help perform the piece by tweeting @dispersion_lab
January marked the tenth anniversary of Yonik Seeley’s fateful post on the Apache incubator listserv back in January of 2006:
Hello Incubator PMC folks, I would like to propose a new Apache project named Solr.
The project is being proposed as a sub-project of Lucene, and the Lucene PMC has agreed to be the sponsor.
Seeley also included the full proposal, which named cultivating an active open source community as a top priority, Doug Cutting as the sponsor, and three initial committers: Seeley himself, Bill Au, and Chris “Hoss” Hostetter. And here we are, 10 years later, and Apache Solr is the most deployed open source search technology on the planet, with thousands of production instances.
We’ve updated our ‘history of Solr’ infographic with the results of our developer survey from the fall. More survey results on the way.
Learn more about Lucidworks Fusion, our Solr-powered application development platform for building intelligent search-driven apps.
Because the cost penalties for peak access to storage and for small requests are so large ..., if Glacier is not to be significantly more expensive than local storage in the long term preservation systems that use it will need to be carefully designed to rate-limit accesses and to request data in large chunks.
Now, 40 months later, Simon Sharwood at The Register reports that people who didn't pay attention are shocked that using Glacier can cost more in a month than enough disk to store the data 60 times over:
Last week, a chap named Mario Karpinnen took to Medium with a tale of how downloading 60GB of data from Amazon Web Services' archive-grade Glacier service cost him a whopping US$158.
Karpinnen went into the fine print of Glacier pricing and found that the service takes your peak download rate, multiplies the number of gigabytes downloaded in your busiest hour for the month and applies it to every hour of the whole month. His peak data retrieval rate of 15.2GB an hour was therefore multiplied by the $0.011 per gigabyte charged for downloads from Glacier. And then multiplied by the 744 hours in January. Once tax and bandwidth charges were added, in came the bill for $158.
Karpinnen's post is a cautionary tale for Glacier believers, but the real problem is he didn't look the gift horse in the mouth:
But doing the math (and factoring in VAT and the higher prices at AWS’s Irish region), I had the choice of either paying almost $10 a month for the simplicity of S3 or just 87¢/mo for what was essentially the same thing,
He should have asked himself how Amazon could afford to sell "essentially the same thing" for one-tenth the price. Why wouldn't all their customers switch? I asked myself this in my post on the Glacier announcement:
In order to have a competitive product in the long-term storage market Amazon had to develop a new one, with a different pricing model. S3 wasn't competitive. As Sharwood says:
Karpinnen's post and Oracle's carping about what it says about AWS both suggest a simple moral to this story: cloud looks simple, but isn't, and buyer beware applies every bit as much as it does for any other product or service.
The fine print was written by the vendor's lawyers. They are not your friends.
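The peak-rate arithmetic in the passage quoted from The Register can be checked with a few lines. This is only a sketch of the calculation as that passage describes it, using the figures it gives; it is not a model of Amazon's full pricing rules, and the function name is mine.

```python
def glacier_peak_retrieval_fee(peak_gb_per_hour, price_per_gb, hours_in_month):
    """Peak hourly retrieval rate, priced per GB, applied to every hour of the month."""
    return peak_gb_per_hour * price_per_gb * hours_in_month


# Figures as quoted: 15.2 GB/hour peak, $0.011 per GB, 744 hours in January.
fee = glacier_peak_retrieval_fee(15.2, 0.011, 31 * 24)
print(f"Retrieval fee before tax and bandwidth: ${fee:.2f}")
```

That comes to about $124.40, so the remainder of the $158 bill would be the tax and bandwidth charges the article mentions.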
“Self-publishing represents the future of literature. Its willingness to experiment, its greater speed to market, its quicker communication with the audience, its greater rewards and creative control for creators, its increasing popularity all augur for the continued expansion of self-publishing and its place as the likely wellspring for our best new works” (LaRue, 2014, para. 13).
The self-publishing movement is alive and well in public libraries across the nation, especially within the fiction genre. In a recent American Libraries magazine article, “Solving the Self-Published Puzzle,” Landgraf lists several public libraries acquiring self-published books to develop their collections with local authors and with works of regional interest.
I think of how this movement will grow among other types of library communities, and most importantly, how self-publishing technology has made it possible for all of us to publish and access high-quality digital and print resources. Will academic librarians assist teaching faculty to publish their own digital textbooks? Will creative writing classes add an eBook publishing component into their curriculum? Will special library collections, archives, or museums use these online platforms to create wonderful monographs or documents of archived material that will reach a greater audience? The possibilities are endless.
What was most interesting to me while reading the American Libraries piece is that libraries are including independent publishing advice and guidance workshops in their makerspace areas. The freedom of becoming a self-published author comes with a to-do list: cover illustrations, ebook format conversion (EPUB, MOBI, etc.), online editing, metadata, price and royalties, contracts, and creation of website and social media outlets for marketing purposes. These are a few of the many things to think about. Much needs to be learned and librarians can become proficient in these areas in order to create their own creative projects or assist patrons in self-publishing. It is refreshing to see that an author can bypass the gatekeepers of publishing to get their project published and that our profession can make this phenomenon more accessible to our communities.
We can convert writers into authorpreneurs, a term I recently discovered (McCartney, 2015). The speed of publishing is awesome – no waiting. A project can appeal to a particular audience not accessible through traditional routes of publishing. If the author is interested, indie writers have platforms to get picked up by renowned publishing houses and agents. Traditional authors may also make a plunge into self-publishing. The attraction for librarians is that the published books can be distributed through platforms like Overdrive currently being used by libraries. In addition, eBook publishing sites make it possible for users to view their item on several mobile devices through apps or eReaders. Many of the organizations listed below handle the file type conversions needed to make a book readable on all devices.
I have recently become fascinated by the self-publishing movement and plan to write more about the ongoing developments. I have yet to read my first self-published book and plan to do so soon. For now, I leave you with some resources that may help you begin thinking about how to use self-publishing to serve your communities and create innovative ways to expand your library services.
The Self Publishers Association
52 Novels: https://www.52novels.com/
Tools and services that help you complete your book and make it available to millions of potential readers
Kindle Direct Publishing (KDP)
KDP EDU: https://kdp.amazon.com/edu
KDP Kids: https://kdp.amazon.com/kids
and many more genres…
The Book Designer: http://www.thebookdesigner.com/
Inscribe Digital: http://www.inscribedigital.com
Kobo Writing Life: https://www.kobo.com/writinglife
Ingram Spark: https://www.ingramspark.com/
Project Gutenberg Self-Publishing Press: http://self.gutenberg.org/
Indie Title Reviews
Libraries struggle with indie market collection development. Indie titles are not readily covered in the usual book review sources heavily used for mainstream titles, so the librarian is left to search within blogs and other social media outlets to learn of new worthy titles for purchase. Please find a list of self-publishing collection development resources for libraries/readers below.
SelfPublishing Review: http://www.selfpublishingreview.com/
Friedman, J. (2015). Helping indie authors succeed: What indie authors need to know about the library market. Publishers Weekly, 262(39), 52.
Gross, A. (2015). Digital winners in the bay area. Publishers Weekly, 262(24), 18-20.
Landgraf, G. (October 30, 2015). Solving the self-published puzzle. American Libraries Magazine. Retrieved from http://americanlibrariesmagazine.org/2015/10/30/solving-the-self-published-puzzle/
LaRue, J. (2015). From maker to mission. Library Journal, 140(16), 41.
LaRue, J. (2014). The next wave of tech change. Library Journal, 139(16), 47.
McCartney, J. (2015). A look ahead to self-publishing in 2015. Publishers Weekly, 262(3), 36-38.
Peltier-Davis, C. A. (2015). The cybrarian’s web 2: An a-z guide to free social media tools, apps, and other resources. Medford, NJ: Information Today.
Palmer, A. (2014). What every Indie author needs to know about e-books. Publishers Weekly, 261(7), 52-54.
Quint, B. (2015). So you want to be published. Information Today, 32(2), 17.
Scardilli, B. (2015). Public libraries embrace self-publishing services. Information Today, 32(5), 1-26.
Staley, L. (2015). Leading self-publishing efforts in communities. American Libraries, 46(1/2), 18-19.
Saturday night I was passing by Soundscapes, the best music store I know in Toronto, so I went in to see what they had. I usually only buy CDs at gigs now, because I use a streaming music service, but I saw something that isn’t available streaming and I would like to have as a real object: the full eight-hour performance of Sleep by Max Richter.
It’s eight CDs plus a Blu-ray that has everything on one disc. I don’t actually have a CD player any more—it broke a long time ago, and then my DVD player broke a couple of years ago—so I needed to rip it (I use FLAC) to listen to it. I put in the first disc and was very surprised: the disc wasn’t part of Sleep!
Rhythmbox recognized it as Toggo Music 41, which is some kind of compilation CD by many different artists. The disc is printed as CD 1 of Sleep and has the Deutsche Grammophon label on it, however. Very strange! What’s going on in the DG factory?
I phoned Soundscapes, and they said I should bring it back for credit or exchange. They only have one copy of the box in at a time, though. I asked how long it would take to get a replacement in and the fella said he didn’t know, I’d have to bring it in.
I wanted to buy a physical copy so a local store could get some of my money, but now because of this bizarre printing error I’m going to have to make three visits there just to get the right version. I think it’s important to go to some extra effort to support local businesses, but one doesn’t expect Nature to introduce a glitch like this into everyday life. However, one must accept it.
From Mike Conlon, VIVO Project Director
Jon Corson-Rikert retires from Cornell. Jon Corson-Rikert, the creator of VIVO, has retired from Cornell. It is hard for me to imagine a better colleague – thoughtful, considerate, creative, insightful, respectful, productive, and genuinely kind. I hope you had a chance to meet Jon, to see him present, to work with him, and to share your thoughts with him. Too often we rush through our days. You may want to stop for a moment and recall moments you may have had with Jon and what those moments mean to you.
Austin, TX The Fedora team is proud to announce that Fedora 4.5.0 was released on January 21, 2016 and is now available.
Hackfest will kick things off on October 4th, followed by two and a half days of library technology madness.
Watch this space for emerging details and our call for proposals.
It’s #ColorOurCollections week and, grown-ups, we’re looking at you! Join the adult coloring craze and put your colorful spin on these illustrations from our collection. For lovers of landscapes, puppies, flowers, creepy-crawlies, celestial maps, and transportation innovation – we’ve got you covered!
Images selected from HathiTrust, Biodiversity Heritage Library, David Rumsey Map Collection, California Historical Society via University of Southern California Libraries, University of Utah Libraries via Mountain West Digital Library, Perkins School for the Blind via Digital Commonwealth and New York Public Library.
Ever thought about joining an Islandora Release Team? If you have worked with Islandora at all, there's a role to suit your skills and we'd like your help. Join Release Manager Dan Aitken and a team of your fellow volunteers to help get Islandora 7.x-1.7 released this April. We are looking for volunteers for the following roles:
Documentation:
Documentation will need to be updated for the next release. Any new components will also need to be documented. If you are interested in working on the documentation for a given component, please add your name to any component here.
Testers:
All components with JIRA issues set to 'Ready for Test' will need to be tested and verified. Additionally, testers test the overall functionality of a given component. If you are interested in being a tester for a given component, please add your name to any component here. Testers will be provided with a release candidate virtual machine to do testing on.
Auditors:
Each release we audit our README and LICENSE files. Auditors will be responsible for auditing a given component. If you are interested in being an auditor for a given component, please add your name to any component listed here.
Component Managers:
Component managers are responsible for the code base of their components. If you are interested in being a component manager, please add your name to any component listed here.
More information about contributor roles can be found at http://islandora.ca/resources/contributors. If you'd like to assist with the release but don't know what to do, feel free to drop us a line and we can point you in the right direction so you can help out when the time comes.
The tentative schedule for the release is:
- Code Freeze: February 18, 2016
- First Release Candidate: March 3, 2016
- Release: Mid to late April