Data Intensive Social Science: Michael Cafarella
October 12, 2015 @ 12:00 pm - 1:00 pm
Abstract: The social sciences have historically been early adopters of statistical methods, but they have long relied on expensive, survey-driven data collection that yields very small datasets. By applying novel data management and data mining methods to online content, we can gather orders of magnitude more data than standard social science approaches allow, enabling new applications and insights. This talk will cover findings from both social science and computer science and will describe two projects. The first uses Twitter-derived signals to accurately quantify a range of time-varying social phenomena, such as unemployment, moviegoing, and gun purchases. Our Raccoon system implements a form of assisted, user-driven feature selection to rapidly identify a handful of useful signals from more than 150M candidates. Its weekly data updates are regularly downloaded by several Wall Street banks and by government institutions. The second system uses Web-extracted information to identify possible human trafficking victims. We use the DeepDive extraction system to obtain high-quality relational data from more than 26M raw text posts online. Our human trafficking data has been deployed to several law enforcement organizations, including the NYDA, and has been used in actual arrests.

Bio: Michael Cafarella's research interests include databases, information extraction, data integration, and data mining. He is particularly interested in applying data mining techniques to Web data and scientific applications.