Dr. Abney has pursued research in natural language understanding and natural language learning, including information extraction, biomedical text processing, integrating text analysis into web search, robust and rapid partial parsing, stochastic grammars, spoken-language information systems, extraction of linguistic information from scanned page images, dependency-grammar induction for low-resource languages, and semisupervised learning.
The GEMS (Graph Exploration and Mining at Scale) Lab develops new, fast, and principled methods for mining and making sense of large-scale data. Within data mining, we focus particularly on interconnected or graph data, which are ubiquitous. Some examples include social networks, brain graphs or connectomes, traffic networks, computer networks, phone-call and email communication networks, and more. We leverage ideas from a diverse set of fields, including matrix algebra, graph theory, information theory, machine learning, optimization, statistics, databases, and social science.
At a high level, we enable single-source and multi-source data analysis by providing scalable methods for fusing data sources, relating and comparing them, and summarizing patterns in them. Our work has applications to exploration of scientific data (e.g., connectomics or brain graph analysis), anomaly detection, re-identification, and more. Some of our current research directions include:
*Scalable Network Discovery from non-Network Data*: Although graphs are ubiquitous, they are not always directly observed. Discovering and analyzing networks from non-network data is a task with applications in fields as diverse as neuroscience, genomics, energy, economics, and more. However, traditional network discovery approaches are computationally expensive. We are currently investigating network discovery methods (especially from time series) that are both fast and accurate.
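To make the task concrete, a minimal baseline for network discovery from time series (not necessarily a method the lab uses) is to connect two series whenever their Pearson correlation exceeds a threshold; the function name and threshold below are illustrative assumptions:

```python
import numpy as np

def correlation_network(series, threshold=0.5):
    """Build an undirected graph from multivariate time series by
    thresholding the absolute pairwise Pearson correlations.

    series: (n_nodes, n_timesteps) array; returns a boolean adjacency matrix.
    """
    corr = np.corrcoef(series)           # pairwise Pearson correlations
    adj = np.abs(corr) >= threshold      # keep strongly correlated pairs
    np.fill_diagonal(adj, False)         # no self-loops
    return adj

# Example: two signals that track each other, plus one unrelated one.
rng = np.random.default_rng(0)
t = np.linspace(0, 10, 200)
x = np.sin(t) + 0.1 * rng.standard_normal(200)
y = np.sin(t) + 0.1 * rng.standard_normal(200)   # closely tracks x
z = rng.standard_normal(200)                     # independent noise
adj = correlation_network(np.stack([x, y, z]), threshold=0.8)
```

This all-pairs approach is exactly the kind of computation that becomes expensive at scale (quadratic in the number of series), which motivates the faster discovery methods described above.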
*Graph Similarity and Alignment with Representation Learning*: Graph similarity and alignment (or fusion) are core building blocks for many data mining tasks, such as anomaly detection, classification, clustering, transfer learning, sense-making, de-identification, and more. We are exploring representation learning methods that generalize across networks and can be used in such multi-source network settings.
*Scalable Graph Summarization and Interactive Analytics*: Recent advances in computing resources have made processing enormous amounts of data possible, but the human ability to quickly identify patterns in such data has not scaled accordingly. Thus, computational methods for condensing and simplifying data are becoming an important part of the data-driven decision making process. We are investigating ways of summarizing data in a domain-specific way, as well as leveraging such methods to support interactive visual analytics.
*Distributed Graph Methods*: Many mining tasks for large-scale graphs involve solving iterative equations efficiently. For example, classifying entities in a network with limited supervision, finding similar nodes, and evaluating the importance of a node in a graph can all be expressed as linear systems that are solved iteratively. As the volume of generated data grows, these applications, and many more, demand ever faster methods. Our focus is on speeding up such methods for large-scale graphs in both sequential and distributed environments.
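As a small illustration of "node importance as an iteratively solved linear system", here is a sketch of PageRank by power iteration; the specific code and parameters are illustrative, not the lab's implementation:

```python
import numpy as np

def pagerank(adj, damping=0.85, tol=1e-10, max_iter=200):
    """Node importance via power iteration on the PageRank system
    x = damping * P^T x + (1 - damping) / n,
    where P is the row-stochastic transition matrix of the graph."""
    n = adj.shape[0]
    out_deg = adj.sum(axis=1, keepdims=True)
    # Dangling nodes (no out-links) transition uniformly to all nodes.
    P = np.where(out_deg > 0, adj / np.maximum(out_deg, 1), 1.0 / n)
    x = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        x_new = damping * (P.T @ x) + (1 - damping) / n
        if np.abs(x_new - x).sum() < tol:   # converged
            return x_new
        x = x_new
    return x

# A 3-node graph where nodes 0 and 2 both point to node 1,
# so node 1 should accumulate the most importance.
adj = np.array([[0, 1, 0],
                [0, 0, 0],
                [0, 1, 0]], dtype=float)
scores = pagerank(adj)
```

Each iteration is one sparse matrix-vector product, which is the step that sequential and distributed speedups target.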
*User Modeling*: The large amounts of online user information (e.g., in social networks, online marketplaces, streaming music and video services) have made it possible to analyze user behavior over time at a very large scale. Analyzing user behavior can lead to a better understanding of user needs, better recommendations by service providers (improving customer retention and satisfaction), and detection of outlying behaviors and events (e.g., malicious actions or significant life events). Our current focus is on understanding career changes and predicting job transitions.
Dr. Lee’s research interests lie in machine learning and its applications to artificial intelligence. In particular, he focuses on deep learning and representation learning, which aim to learn abstract representations of data through a hierarchical and compositional structure. His research also spans related topics, such as graphical models, optimization, and large-scale learning. Specific application areas include computer vision, audio recognition, robotics, text modeling, and healthcare.
I primarily work on developing scalable parallel algorithms to solve large scientific problems, in collaboration with teams from several different disciplines and application areas. I am most interested in algorithms that emphasize in-memory approaches. In another line of research, I have developed serial algorithms for nonparametric regression, a flexible form of regression that assumes only a general shape, such as upward, rather than a parametric form such as linear. It can be applied to a range of learning and classification problems, such as taxonomy trees. I also work on adaptive learning, designing efficient sampling procedures.
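As a concrete example of shape-constrained nonparametric regression, here is a short sketch of isotonic (non-decreasing) least-squares fitting via the classic pool-adjacent-violators algorithm; the function name is illustrative and this is a textbook version, not the serial algorithms described above:

```python
def isotonic_fit(y):
    """Least-squares fit of a non-decreasing sequence to y using the
    pool-adjacent-violators algorithm (PAVA)."""
    # Each block stores [sum of values, count]; adjacent blocks whose
    # means violate the ordering are merged (pooled) into one block.
    blocks = []
    for v in y:
        blocks.append([v, 1])
        # Means compared by cross-multiplication to avoid division.
        while len(blocks) > 1 and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]:
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fit = []
    for total, count in blocks:
        fit.extend([total / count] * count)
    return fit

# An "upward" trend with one dip: the violating pair is pooled to its mean.
print(isotonic_fit([1, 3, 2, 4]))  # [1.0, 2.5, 2.5, 4.0]
```

The only assumption the fit makes is the general upward shape, which is exactly the contrast with parametric forms such as a linear model.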