Feed aggregator

Max Planck Digital Library: HTTPS enabled for MPG/SFX

planet code4lib - Mon, 2016-06-27 16:54

The MPG/SFX link resolver is now also accessible via HTTPS. The secure base URL of the production MPG/SFX instance is: https://sfx.mpg.de/sfx_local.

HTTPS support enables secure third-party sites to load or to embed content from MPG/SFX without causing mixed content errors. Please feel free to update your applications or your links to the MPG/SFX server.

SearchHub: Search Hub 2.0 Public Beta

planet code4lib - Mon, 2016-06-27 13:52
Introduction

For quite some time now, Lucidworks has been hosting a community site named Search Hub (aka LucidFind) that consists of a searchable archive of a number of Apache Software Foundation mailing lists, source code repositories and wiki pages, as well as related content that we’ve deemed beneficial. Previously, we’ve had three goals in building and maintaining such a site:

  1. Provide the community a focused resource for finding answers to questions on our favorite projects like Apache Solr and Lucene
  2. Dogfood our product
  3. Associate the Lucidworks brand with the projects we support

As we’ve grown and evolved, the site has done a good job on #1 and #3. However, we have fallen a bit behind on goal #2, since the site, 22 months after the launch of Lucidworks Fusion, was still running on our legacy product, Lucidworks Search. While it’s easy to fall back on the “if it ain’t broke, don’t fix it” mentality (the site has had almost no downtime all these years, even while running on very basic hardware and with a very basic setup and serving decent, albeit not huge, query volume), it has always bothered me that we haven’t put more effort into porting Search Hub to run on Fusion. This post intends to remedy that situation, while also significantly expanding our set of goals and the number of projects we cover. Those goals, in addition to the original ones above, are:

  1. Show others how it’s done by open sourcing the code base under an Apache license. (Note: you will need Lucidworks Fusion to run it.)
  2. Fully instrument the User Interface with the Snowplow Javascript Tracker to capture user interaction data.
  3. Leverage Fusion’s built in Apache Spark capabilities for offline, background enhancement of the index to improve relevance and our analytics.
  4. Deploy machine learning experiments.
  5. Build on Lucidworks View.

While we aren’t done yet, we are far enough along that I am happy to announce we are making Search Hub 2.0 available as a public beta. If you want to cut to the chase and try it out, follow the links I just provided; if you want all the gory details on how it all works, keep reading.

Rebooting Search Hub

When Jake Mannix joined Lucidworks back in January, we knew we wanted to significantly expand the machine learning and recommendation story here at Lucidworks, but we kept coming back to the fundamental problem that plagues all such approaches: where to get real data and real user feedback. Sure, we work with customers all the time on these types of problems, but that only goes so far in enabling our team to control its own destiny. After all, we can’t run experiments on the customer’s website (at least not in any reasonable time frame for our goals), nor can we always get the data that we want due to compliance and security reasons. As we looked around, we kept coming back to, and finally settled on, rebooting Search Hub to run on Fusion, but this time with the goals outlined above to strive for.

We have also been working with the academic IR research community on ways to share our user data, while hoping to avoid another AOL query log fiasco. It is too early to announce anything on that front just yet, but I am quite excited about what we have in store and hope we can do our part at Lucidworks to help close the “data gap” in academic research by providing the community with a large corpus with real user interaction data. If you are an academic researcher interested in helping out and are up on differential privacy and other data sharing techniques, please contact me via the Lucidworks Contact Us form and mention this blog post and my name. Otherwise, stay tuned.

In the remainder of this post, I’ll cover what’s in Search Hub, highlight how it leverages key Fusion features and finish up with where we are headed next.

Basics

The Search Hub beta currently consists of:

  • 26 ASF projects (e.g. Lucene, Solr, Hadoop, Mahout) and all public Lucidworks content, including our website, knowledge base and documentation, with more content added automatically via scheduled crawls.
  • 90+ datasources (soon to be 120+) spanning email, Github, Websites and Wikis, each with a corresponding schedule defining its update rate.
  • Nine index pipelines and two query pipelines for processing incoming content and requests.
  • Five signal capture mechanisms in the UI: page views, page pings (heartbeat), searches, document clicks, and typeahead search clicks. See below for the gory details on signals.

The application consists of:

If you wish to run Search Hub, see the README, as I am not going to cover that in this blog post.

Next Generation Relevance

While other search engines are touting their recent adoption of search ranking functions (BM25) that have been around for 20+ years, Fusion is focused on bringing next generation relevance to the forefront. Don’t get me wrong, BM25 is a good core ranking algorithm and it should be the default in Lucene, but if that’s your answer to better relevance in the age of Google, Amazon and Facebook, then good luck to you. (As an aside, I once sat next to Illya Segalovich from Yandex at a SIGIR workshop where he claimed that at Yandex, BM25 only got relevance about 52% of the way to the answer. Others in the room disputed this, saying their experience was more like 60-70%. In either case, it’s got a ways to go.)

If BM25 and other core similarity approaches only get you 70% (at best) of the way, where does the rest come from? We like to define Next Generation Relevance as being founded on three key ideas (which Internet search vendors have been deploying for many years now), which I like to call the “3 C’s”:

  1. Content — This is where BM25 comes in, as well as things like how you index your content, what fields you search, editorial rules, language analysis and update frequency. In other words, the stuff Lucene and Solr have been doing for a long time now. If you were building a house, this would be the basement and first floor.
  2. Collaboration — What signals can you capture about how users and other systems interact with your content? Clicks are the beginning, not the end of this story. Extending the house analogy, this is the second floor.
  3. Context — Who are you? Where are you? What are you doing right now? What have you done in the past? What roles do you have in the system? A user in Minnesota searching for “shovel” in December is almost always looking for something different than a user in California in July with the same query. Again, with the house analogy: this is the attic and roof.

In Search Hub, we’ve done a lot of work on the content already, but it’s the latter two we are most keen to showcase in the coming months, as they highlight how we can create a virtuous cycle between our users and our data by leveraging user feedback and machine learning to learn relevance. To achieve that goal, we’ve hooked a number of signal capture mechanisms into our UI, all of which you can see in the code. (See snowplow.js, SnowplowService.js and their usages in places like here.)
These captured signals include:

  1. Page visits.
  2. Time on page (approximated by the page ping heartbeat in Snowplow).
  3. Queries executed, including the capture of all documents and facets displayed.
  4. What documents were clicked on, including unique query id, doc id, position in the SERP, facets chosen, and score.
  5. Typeahead click information, including what characters were typed, the suggestions offered and which suggestion was chosen.

With each of these signals, Snowplow sends a myriad of information, including things like User IDs, Session IDs, browser details and timing data. All of these signals are captured in Fusion. Over the coming weeks and months, as we gather enough signal data, we will be rolling out a number of new features highlighting how to use this data for better relevance, as well as other capabilities like recommendations.

Getting Started with Spark on Search Hub

The core of Fusion consists of two key open source technologies: Apache Solr and Apache Spark. If you know Lucidworks, then you already likely know Solr. Spark, however, is something that we’ve added to our stack in Fusion 2.0, and it opens up a host of possibilities that previously had to be handled outside of Fusion, in what was almost always a significantly more complex application. At its core, Spark is a scalable, distributed compute engine. It ships with machine learning and graph analytics libraries out of the box. We’ve been using Spark for a number of releases now to do background, large-scale processing of things like logs and system metrics. As of Fusion 2.3, we have been exposing Spark (and the Spark shell) to our users. This means that Fusion users can now write and submit their own Spark jobs, as well as explore our Spark Solr integration on the command line, simply by typing $FUSION_HOME/bin/spark-shell. This includes the ability to take advantage of all Lucene analyzers in Spark, which Steve Rowe covered in this blog post.
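
To make that concrete, here is a minimal, hypothetical sketch of reading a collection into a DataFrame through the spark-solr data source from inside the Fusion spark-shell. The ZooKeeper address, collection name and field name below are placeholders rather than the actual Search Hub deployment, and option names can vary between spark-solr versions.

  // sc and sqlContext come pre-defined in the Fusion spark-shell.
  // "localhost:9983" and "lucidfind" are placeholder values, not a real deployment.
  val df = sqlContext.read.format("solr")
    .options(Map(
      "zkhost"     -> "localhost:9983",   // ZooKeeper ensemble backing Solr
      "collection" -> "lucidfind"         // collection to expose as a DataFrame
    ))
    .load()

  df.printSchema()                          // inspect the fields spark-solr discovered
  df.filter("project_s = 'lucene'").count() // "project_s" is a hypothetical field name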

To highlight these new capabilities, we’ve put together examples of tokenization, clustering with the venerable k-means algorithm, word2vec and Random Forest-based classification (or, as we like to call it, 20 newsgroups on steroids).

All of these demos are showcased in the SparkShellHelpers.scala file. As the name implies, this file contains commands that can be cut and pasted into the Fusion spark shell (bin/spark-shell). I’m going to save the details of running this to a future post, as there are some very interesting data engineering discussions that fall out of working with this data set in this manner.
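
As a rough illustration of the kind of commands involved, here is a hedged k-means sketch over a text field. It is not an excerpt from SparkShellHelpers.scala: the field name, vector size, k and iteration count are all assumptions made for the example.

  import org.apache.spark.mllib.clustering.KMeans
  import org.apache.spark.mllib.feature.HashingTF

  // df is the DataFrame loaded from Solr in the earlier sketch;
  // "subject_s" is an assumed field name, not the real Search Hub schema.
  val subjects = df.select("subject_s").rdd
    .map(row => row.getString(0))
    .filter(s => s != null && s.nonEmpty)

  // Naive tokenization for the sketch; the real helpers can use
  // Lucene analyzers in Spark instead, as mentioned above.
  val tokens = subjects.map(_.toLowerCase.split("\\W+").toSeq)

  // Hash each token sequence into a fixed-size term-frequency vector,
  // then cluster the vectors into 20 groups (k and iterations are arbitrary).
  val tf = new HashingTF(1 << 14)
  val vectors = tf.transform(tokens).cache()
  val model = KMeans.train(vectors, 20, 10)
  println(model.clusterCenters.length)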

Contributing

Our long term intent as we move out of beta is to support all Apache projects. Currently, the project specifications are located in the project_config folder. If you would like your project supported, please issue a Pull Request and we will take a look and try to schedule it. If you would like to see some other feature supported, we are open to suggestions. Please open an issue or a pull request and we will consider it.

If your project is already supported and you would like to add support for it similar to what is on Lucene’s home page, add a search box that submits to http://searchhub.lucidworks.com/?p:PROJECT_NAME, passing in your project name (not label) for PROJECT_NAME, as specified in the project_config. For example, for Hadoop, it would be http://searchhub.lucidworks.com/?p:hadoop.

Next Steps

In the coming months, we will be rolling out:

  1. Word2Vec for query and index time synonym expansion. See the Github Issue for the details.
  2. Classification of content to indicate what mailing list we think the message belongs to, as opposed to what mailing list it was actually sent to. Think of it as a “Did you mean to send this to this list?” classifier.
  3. User registration and personalized recommendations, with alerting. For a preview, check out our webinar on June 30th.
  4. Content and collaborative filtering recommendation.
  5. Community analytics, powered by Spark. Find out who in the community you should be listening to for answers!
  6. User Interface improvements.

If you would like to participate in any of these, we welcome Pull Requests on the Github project, otherwise please reach out to us.

Resources

The post Search Hub 2.0 Public Beta appeared first on Lucidworks.com.

Islandora: Islandora CLAW is moving to Drupal 8

planet code4lib - Mon, 2016-06-27 13:35

The initial phases of Islandora CLAW development worked with Drupal 7 as a front-end, but Islandora CLAW has been architected with a pivot to Drupal 8 in mind from its very inception. Drupal 8 has been officially released and development has begun on Drupal 9. Drupal policy will see Drupal 7 become unsupported when Drupal 9 is released, putting it in the same end-of-life territory as Fedora 3. As of this month, Islandora CLAW development has pivoted fully to Drupal 8, ensuring that when the Islandora Community is ready to make the move, there will be a version of Islandora that functions with the latest and best-supported versions of both our front-end and repository layers by pairing Drupal 8 with Fedora 4. This pivot was approved by the Islandora Roadmap Committee, based on a Drupal 8 Prospectus put forth by the CLAW development team.

 

DuraSpace News: Comparing DuraSpace Repository and Cloud Services Options

planet code4lib - Mon, 2016-06-27 00:00

Austin, TX  Are you in the process of researching different types of repository platforms? Or have you lately been trying to understand hosted cloud service options? With two new comparison tables, DuraSpace has made it easier to compare apples to apples and oranges to oranges when sorting out which DuraSpace-supported repository or cloud service is right for you. These tables are designed to help you match your use case to a repository or hosted cloud service that meets your needs.

Open Library: Towards better EPUBs at Open Library and the Internet Archive

planet code4lib - Thu, 2016-06-23 21:28

You may have read about our recent downtime. We thought it might be a good opportunity to let you know about some of the other behind-the-scenes things going on here. We continue to answer email, keep the FAQ updated and improve our metadata. Many of you have written about the quality of some of our EPUBs. As you may know, all of our OCR (optical character recognition) is done automatically without manual corrections and while it’s pretty good, it could be better. Specifically, we had a pernicious bug where some books’ formatting caused the first page of chapters to be left out of the OCRed EPUB. I had this happen to me with a series of books I was reading on Open Library, and I know it’s beyond frustrating.

To address this and other scanning quality issues, we’re changing the way EPUBs work. We’ve improved our OCR algorithm and we’re shifting from stored EPUB files to on-the-fly generation. This means that further developments and improvements in our OCR capabilities will be available immediately. This is good news and has the side benefit of radically decreasing our EPUB storage needs. It also means that we have to

  • remove all of our old EPUBs (approximately eight million items for EPUBs generated by the Archive)
  • put the new on-the-fly EPUB generation in place (now active)
  • do some testing to make sure it’s working as expected (in process)

We hope that this addresses some of the EPUB errors people have been finding. Please continue to give us feedback on how this is working for you. Coming soon: improvements to Open Library’s search features!

Jonathan Rochkind: How to see if current version of a gem is greater than X

planet code4lib - Thu, 2016-06-23 19:56

I sometimes need to do this, and always forget how. I want to see the currently loaded version of a gem, and see if it’s greater than a certain version X.

Mainly because I’ve monkey-patched that gem, and want to either automatically stop monkey patching it if a future version is installed, or more likely output a warning message “Hey, you probably don’t need to monkey patch this anymore.”

I usually forget the right rubygems API, so I’m leaving this partially as a note to myself.

Here’s how you do it.

# If some_gem_name is at 2.0 or higher, warn that this patch may
# not be needed. Here's a URL to the PR we're back-porting: <URL>
if Gem.loaded_specs["some_gem_name"].version >= Gem::Version.new('2.0')
  msg = "Please check and make sure this patch is still needed " \
        "at #{__FILE__}:#{__LINE__}\n\n"
  $stderr.puts msg
  Rails.logger.warn msg
end

Whenever I do this, I always include the URL to the github PR that implements the fix we’re monkey-patch back-porting, in a comment right by here.

The `$stderr.puts` is there to make sure the warning shows up in the console when running tests.

Unfortunately:

Gem::Version.new("1.4.0.rc1") >= Gem::Version.new("1.4") # => false

I really want the warning to trigger if I’m using a pre-release too. Hmm.

Aha! Perusing the docs, this seems like it’ll work:

if Gem.loaded_specs["some_gem_name"].version.release >= Gem::Version.new('2.0')

`Gem::Version#release` trims off the prerelease tags.


Filed under: General

DPLA: Historypin wins Knight News Challenge award for “Our Story” project in partnership with DPLA

planet code4lib - Thu, 2016-06-23 13:00
Historypin wins Knight News Challenge award to gather, preserve, and measure the impact of public library-led history, storytelling, and local cultural heritage in rural US communities in partnership with Digital Public Library of America

BOSTON & SAN FRANCISCO —Historypin announced today that they have been awarded $222,000 from the John S. and James L. Knight Foundation as part of its Knight News Challenge on Libraries, an open call for ideas to help libraries serve 21st century information needs. Selected from more than 615 submissions, Historypin’s “Our Story” project, a partnership with the Digital Public Library of America (DPLA), will collaborate with more than a dozen rural libraries in New Mexico, North Carolina and Louisiana to host lively events to gather and preserve community memory, and to measure the impact of these events on local communities.

“Local historical collections are some of the most viewed content in DPLA, and express the deep interest in our shared community history,” according to Emily Gore, Director for Content at DPLA. “Making cultural heritage collections from rural communities accessible to the world is extremely important to us, and this project will help us further share this rich history and the diverse stories to be found.”

“This award gives us the ability to work with small libraries to provide a toolkit–a physical box with posters, materials and guidance–to make it easy for librarians and volunteers to engage their community in memory sharing events,” said Jon Voss, Strategic Partnerships Director for Historypin. “We know through research that getting people across generations and cultures to sit together and share experiences strengthens communities, and this project will help local libraries to better measure their social impact.”

Led by national partners Historypin and DPLA, together with state and local library networks, Our Story aims to expand the national network and projects of thousands of cultural heritage collaborations that both DPLA and Historypin have established and increase the capabilities of small, rural libraries. Participating libraries in Our Story will be supplied with kits and training to guide them through a number of steps, including recruiting staff and volunteers for the project, planning for digitization and preservation, running community events and collecting stories, and measuring engagement and impact, among other important steps. The library kits and training will be based on four key areas — training, programming, preservation, and evaluation — and will pull in methodology and curriculum developed by both DPLA and Historypin in their work with cultural heritage partners throughout the US and around the world.

“The project will help promote civic engagement, while providing libraries with meaningful data, so they can better understand their impact on communities and meet new information needs,” said John Bracken, Knight Foundation vice president for media innovation.

The Knight News Challenge, an open call for ideas launched in late February 2014, asked applicants to answer the question, “How might libraries serve 21st century information needs?” Our Story aims to advance the library field in three key areas: measuring the social impact of public libraries, strengthening a national network of digital preservation and content discovery, and demonstrating the potential of open library data. The outputs of the project will be published and openly licensed for reuse in other rural libraries worldwide.

To learn more about DPLA and Historypin’s Our Story project, visit our News Challenge application page: https://www.newschallenge.org/challenge/how-might-libraries-serve-21st-century-information-needs/refinement/our-story-content-collections-and-impact-in-rural-america


About DPLA

The Digital Public Library of America (https://dp.la) strives to contain the full breadth of human expression, from the written word, to works of art and culture, to records of America’s heritage, to the efforts and data of science. Since launching in April 2013, it has aggregated over 13 million items from 2,000 institutions. The DPLA is a registered 501(c)(3) non-profit.

About Historypin

Historypin.org is a global non-profit project that builds community through local history. Over 3,000 cultural heritage organizations and 75,000 individuals have used the site to discover, share and enrich community memory since 2010.


About Knight Foundation

Knight Foundation supports transformational ideas that promote quality journalism, advance media innovation, engage communities and foster the arts. The foundation believes that democracy thrives when people and communities are informed and engaged. For more, visit KnightFoundation.org.

DuraSpace News: Are You Interested in Using Fedora, Hydra or Islandora?

planet code4lib - Thu, 2016-06-23 00:00

Oxford, England  Neil Jeffries, Oxford University, and Tom Cramer, Stanford University, from the Fedora team will hold an informal gathering joined by other Fedora community members, at The King's Arms Pub (just next to Wadham College and the Weston Library in Oxford, England) on July 5 from 5-7 PM, prior to the Jisc and CNI Conference welcome reception.

DuraSpace News: AVAILABLE: New Edition of the Digital Preservation Handbook

planet code4lib - Thu, 2016-06-23 00:00

From Neil Beagrie, Charles Beagrie Limited, Digital Preservation Coalition

Glasgow, Scotland  A new edition of the Digital Preservation Handbook was officially launched at the Guildhall in York yesterday, comprehensively updating the original version first published in 2001: http://handbook.dpconline.org/

OCLC Dev Network: Upcoming Changes to WMS Acquisitions API

planet code4lib - Wed, 2016-06-22 21:00

The WMS Acquisitions API is undergoing backwards incompatible changes in the upcoming July install tentatively scheduled for 7/24/2016.

Brown University Library Digital Technologies Projects: ORCID: Unique IDs for Brown Researchers

planet code4lib - Wed, 2016-06-22 17:42

The Library is coordinating an effort to introduce ORCID identifiers to the campus. ORCID (orcid.org) is an open, non-profit initiative founded by academic institutions, professional bodies, funding agencies, and publishers to resolve authorship confusion in scholarly work. The ORCID repository of unique scholar identification numbers aims to reliably identify and link scholars in all disciplines with their work, analogous to the way ISBN and DOI identify books and articles.

Brown is an institutional member of ORCID, which allows the University to create ORCID records on behalf of faculty and to integrate ORCID identifiers into the Brown Identity Management System, Researchers@Brown profiles, grant application processes, and other systems that facilitate identification of faculty and their works.

Please go to https://library.brown.edu/orcid to obtain an ORCID identifier OR, if you already have an ORCID, to link it to your Brown identity.

Please contact researchers@brown.edu if you have questions or feedback.

Equinox Software: See You In Orlando!

planet code4lib - Wed, 2016-06-22 16:30

We’re packing up and preparing to head to Orlando for ALA Annual this week!  Equinox will be in Booth 1175. Throughout the conference, you’ll find Mike, Grace, Mary, Galen, Shae, and Dale in the booth ready to answer your questions.  We’d love for you to come visit and do a little crafting with us.  Crafting?  Yes–CRAFTING.  We’ll have some supplies ready for you to make a little DIY swag.  Quantities are limited, so make sure to see us early.

As usual, the Equinox team will be available in the booth to discuss Evergreen, Koha, and FulfILLment.  We’ll also be attending a few programs and, of course, the Evergreen Meet-Up.  Directly following the Evergreen Meet-Up, Equinox is hosting a Happy Hour for the Evergreen aficionados in attendance.  Come chat with us at the Equinox booth to get more information!

The Equinox team is so proud of the proactive approach ALA has taken toward the senseless tragedy in Orlando recently.  We will be participating in some of the relief events.  We will be attending the Pulse Victims’ Memorial on Saturday to pay our respects and you’ll also find some of the team donating blood throughout the weekend.

We’re looking forward to the conference but most of all, we’re looking forward to seeing YOU.  Stop by and say hello at Booth 1175!

Library of Congress: The Signal: Library of Congress Advisory Team Kicks off New Digitization Effort at Eckerd College

planet code4lib - Wed, 2016-06-22 15:56

Participants in the Eckerd Digitization Advisory meeting include (l-r) Nancy Schuler, Lisa Johnston, Alexis Ramsey-Tobienne, Alyssa Koclanes, Mary Molinaro (Digital Preservation Network), George Coulbourne (Library of Congress), David Gliem, Arthur Skinner, Justine Sanford, Emily Ayers-Rideout, Nicole Finzer (Northwestern University), Kristen Regina (Philadelphia Museum of Art), Anna Ruth, and Brittney Sherley.

This is a guest post by Eckerd College faculty David Gliem, associate professor of Art History, and Nancy Schuler, librarian and assistant professor of Electronic Resources, Collection Development and Instructional Services.

On June 3rd, a meeting at Eckerd College in St. Petersburg, Florida, brought key experts and College departments together to begin plans for the digitization of the College’s art collection. George Coulbourne of the Library of Congress assembled a team of advisers that included DPOE trainers and NDSR program providers from the Library of Congress, Northwestern University, the Digital Preservation Network, the Philadelphia Museum of Art and Yale University.

Advisers provided guidance on project elements including institutional repositories, collection design, metadata and cataloging standards, funding and partnership opportunities and digitization strategies. Suggestions will be used to design a digitization and preservation strategy that could be used as a model for small academic institutions.

Eckerd College is an innovative undergraduate liberal arts institution known for its small classes and values-oriented curriculum that stresses personal and social responsibility, cross-cultural understanding and respect for diversity in a global society. A charter member of Loren Pope’s 40 Colleges That Change Lives, Eckerd has a unique approach to mentoring that reflects its commitment to students. As a tuition-dependent institution of 1,770 students, Eckerd is seeking ways to design the project to be cost-effective, while also ensuring long-term sustainability.

The initial goal of the project is to digitize the College’s large collection of more than 3000 prints, paintings, drawings and sculptures made by the founding faculty in the visual arts: Robert O. Hodgell (1922-2000), Jim Crane (1927-2015) and Margaret (Pegg) Rigg (1928-2011). Along with Crane (cartoonist, painter and collage artist) and Rigg (art editor of motive (always spelled with a lowercase “m”) magazine, as well as graphic designer, assemblage artist and calligrapher), Hodgell (printmaker, painter, sculptor, and illustrator) contributed regularly to motive, a progressive monthly magazine published by the Methodist Student Movement.

In print from 1941 to 1972, motive was hailed for its vanguard editorial and artistic vision and for its aggressive stance on civil rights, Vietnam, and gender issues. In 1965 the publication was runner-up to Life for Magazine of the Year and in 1966, Time magazine quipped that among church publications it stood out “like a miniskirt at a church social.” An entire generation of activists was shaped by its vision with Hodgell, Crane and Rigg playing an important role in forming and communicating that vision.

Eckerd’s unique position as a liberal arts college influenced by the tenets of the Presbyterian Church made it possible for these artists to converge and produce art that reflected society and promoted the emergence of activism that shaped the identity of the Methodist church at the time. Preserving these materials and making them available for broader scholarship will provide significant insight into the factors surrounding the development of the Methodist Church as it is today. Implementing the infrastructure to preserve, digitize and house the collection provides additional opportunities to add other College collections to the repository in the future.

The gathering also brought together relevant departments within Eckerd College, including representatives from the Library, Visual Arts and Rhetoric faculty, Information Technology Services, Marketing & Communications, Advancement and the Dean of Faculty. Having these key players in the room provided an opportunity to involve the broader campus community so efforts can begin to ensure the long-term sustainability of the project, while also highlighting key challenges unique to the College as seen by the external board of advisors.

Eckerd will now move forward with grant applications for the project, with hopes to integrate DPOE’s Train-the-trainer and an NDSR program to jump start and sustain the project through implementation. Potential partnerships and training opportunities with area institutions and local groups will be explored, as well as teaching opportunities to educate students about the importance of digital stewardship.

Roy Tennant: The Rise of Bad Infographics

planet code4lib - Wed, 2016-06-22 14:38

Given the ubiquity of infographics on the web today (according to one account they have increased 1200% in three years), you can be forgiven for thinking that they are a new phenomenon. They aren’t. Infographics have actually been around for quite some time, as Edward Tufte pointed out with his popularization of one of the best infographics of all time (see pic and link): Charles Joseph Minard’s portrayal of the losses suffered by Napoleon’s army in the Russian campaign of 1812.

Go ahead and take a look. Study it. I’ll be here when you get back. 

Do you see what he did? He took raw data and made it communicate visually. Let me reiterate this, as this lesson is too often lost in present-day “infographics”: you receive information immediately, without reading it. The minute you understand that the width of the line equals the relative number of troops, you are stunned. The depth of the tragedy has been communicated — if not fully, at least by impression.

The Minard infographic also combines several different planes of information, from troop strength, to temperature, to distance. It is, frankly, brilliant. I’m not suggesting that every library infographic needs to be brilliant, but nearly all of them can be smarter than they are. Either that, or give up the attempt. Seriously.

It’s sad, but many contemporary infographics are hardly anything more than numbers and clip art — often with only a tenuous connection between them. We really must do better.

Minard’s early infographic ably demonstrates the best qualities of an infographic presentation:

  • Information is conveyed at a glance. If you must read a lot of text to get the drift of the message, then you are failing.
  • The whole is greater than the sum of its parts. Minard deftly uses all of the dimensions of a piece of paper to convey distance, temperature, and troop strength all in one graphic. The combination puts across a message that any single element could not.
  • There are layers of information that are well integrated in the whole. An initial impression can be conveyed, but your graphic should also reveal more information under scrutiny.

Unfortunately, library infographics rarely, if ever, even loosely achieve these aims. Humor me, and do a Google Images search on “library infographics” and see what you get. Mostly they are simply numbers that are “illustrated” by some icon or image. They really aren’t infographics of the variety that Tufte champions. They are, unfortunately, mostly pale shadows of what is possible.

So let’s review some of the signs of a bad infographic:

  • Numbers are the most prominent thing you see. If you look at an infographic and it’s only numbers that leap out at you, stop wasting your time. Move on.
  • The numbers are not related at all. Many library infographics combine numbers that have no relation to each other. Who wants to puzzle out the significance of the number “30” next to the number “300,000”? Not me, nor anyone else.
  • The images are only loosely connected to the numbers. Stop putting an icon of a book next to the number of book checkouts. Just stop.

In the end, it’s clear that libraries really need professional help. Don’t think that you can simply take numbers, add an icon, and create a meaningful infographic. You can’t. It’s stupid. Just stop. If we can’t do this right, then we shouldn’t be doing it at all.
