Aggregate Documents: Making Sense of a Patchwork of Topical Documents
Michael Shilman, Software Architect, Wize.com, USA
Abstract: With the dramatic increase in quantity and diversity of online content, particularly in the form of user-generated content, we now have access to unprecedented amounts of information. Whether you are researching the purchase of a new cell phone, planning a vacation, or trying to assess a political candidate, there are now countless resources at your fingertips. However, finding and making sense of all this information is laborious, and it is difficult to assess high-level trends. Web sites like Wikipedia, Digg, and Del.icio.us democratize the process of organizing the information from countless documents into a single source where it is somewhat easier to understand what is important and interesting. In this talk, I describe a complementary set of automated alternatives to these approaches, back them up with some working examples, and derive some basic principles for aggregating a diverse set of documents into a coherent and useful summary.
Bio: Michael Shilman is Software Architect at Wize.com, where he is responsible for the technology that aggregates and interprets millions of online product reviews to help customers find the best products to match their personal needs. Prior to Wize, Michael was a Research Scientist at Microsoft Research, where he developed and applied both Machine Learning and User Interface techniques to a variety of document understanding problems. His work was the basis for page-level Ink, diagram, and annotation understanding in the Tablet PC, and was also incorporated in Office and Windows Live. He received his BS, MS, and PhD in Electrical Engineering and Computer Science from the University of California at Berkeley.