TALK HAS BEEN CANCELLED

  • Daeil Kim, The New York Times, daeil.kim@nytimes.com

Refinery is an open source web platform for the analysis of large unstructured document collections. It extracts meaningful semantic themes, also known as “topics”, from documents; a topic can be thought of as a word cloud composed of terms that frequently co-occur with one another. Once this semantic index is formed, users can retrieve the documents related to these topics and further refine their contents through a summarization process that lets them search the corpus for phrases relevant to them. The goal of Refinery is to make this whole process easier and to provide some of the latest scalable versions of these learning algorithms in an intuitive web-based interface. Refinery is also meant to be run locally, which removes the need to secure document collections sent over the internet. The talk will go through some of the technologies involved and include a demo of the app.
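
To make the idea of co-occurring terms concrete, here is a toy sketch in Python. It is not Refinery's code and does not use its learning algorithms, which this abstract does not specify. It only counts how often pairs of terms appear in the same document, then ranks documents by how many terms of a hand-picked topic they contain. The function names `cooccurrence_counts` and `rank_documents` and the four sample documents are hypothetical, made up for illustration.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(docs):
    """Count, for each unordered pair of terms, the number of documents containing both."""
    counts = Counter()
    for doc in docs:
        # Use a sorted set so each pair is counted once per document,
        # always in alphabetical order.
        terms = sorted(set(doc.lower().split()))
        counts.update(combinations(terms, 2))
    return counts


def rank_documents(docs, topic_terms):
    """Return indices of documents sharing at least one topic term, best match first."""
    topic = set(topic_terms)
    scored = []
    for index, doc in enumerate(docs):
        overlap = len(topic & set(doc.lower().split()))
        if overlap > 0:
            scored.append((-overlap, index))
    # Sort by descending overlap, then by original position.
    return [index for _, index in sorted(scored)]


# Hypothetical miniature corpus: two "politics" and two "sports" documents.
docs = [
    "senate vote budget tax",
    "budget tax senate debate",
    "pitcher inning baseball score",
    "baseball score pitcher trade",
]

pairs = cooccurrence_counts(docs)
politics_docs = rank_documents(docs, ["budget", "senate", "tax"])
```

In this corpus, pairs such as ("budget", "senate") co-occur in two documents while ("baseball", "senate") never co-occur, which is the kind of signal a topic model generalizes statistically over a large collection.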

For more info check out http://www.docrefinery.org.