Assistant Professor, Electrical Engineering and Computer Science
Institute for Social Research
My research focuses on data management problems that arise from extreme diversity in large data collections. Big data is big not just in bytes but also in type (e.g., a single hard disk likely contains relations, text, images, and spreadsheets) and in structure (e.g., a large corpus of relational databases may have millions of unique schemas). As a result, certain long-held assumptions, such as the assumption that the database schema is always known before a query is written, are no longer useful guides for building data management systems. My work therefore focuses heavily on information extraction and data mining methods that can either improve the quality of existing information or work in spite of lower-quality information.