My research focuses on data management problems that arise from extreme diversity in large data collections. Big data is big not just in bytes but also in type (e.g., a single hard disk likely contains relations, text, images, and spreadsheets) and in structure (e.g., a large corpus of relational databases may have millions of unique schemas). As a result, certain long-held assumptions, such as the assumption that the database schema is always known before a query is written, are no longer useful guides for building data management systems. My work therefore focuses heavily on information extraction and data mining methods that either improve the quality of existing information or work in spite of lower-quality information.
Dr. Abney has pursued research in natural language understanding and natural language learning, including information extraction, biomedical text processing, integrating text analysis into web search, robust and rapid partial parsing, stochastic grammars, spoken-language information systems, extraction of linguistic information from scanned page images, dependency-grammar induction for low-resource languages, and semisupervised learning.