Lu’s research is focused on natural language processing, computational social science, and machine learning. More specifically, Lu works on algorithms for text summarization, language generation, argument mining, information extraction, and discourse analysis, as well as novel applications that apply such techniques to understand media bias and polarization and other interdisciplinary subjects.
My research tackles how human values can be incorporated into machine learning and other computational systems. This includes work on the translation process from human values to computational definitions and work on how to understand and encourage fairness while preventing discrimination in machine learning and data science. My research combines tools from the theory of machine learning with insights from economics, science and technology studies, and philosophy, among others, to improve our theories of the translation process and the algorithms we create. In settings like classification, social networks, and data markets, I explore the ways in which human values play a primary role in the quality of machine learning and data science.
My research focuses on methods, applications, and ethics of Computational Modeling in Human-Computer Interaction (HCI). Understanding and modeling human behavior supports innovative information technology that will change how we study and design interactive user experiences. I envision accurate, cross-domain modeling of human behavior as a theoretical foundation for work in HCI, in which computational models serve as a basis to study, describe, and understand complex human behaviors and support optimization and evaluation of user interfaces. I create technology that automatically reasons about and acts in response to people’s behavior to help them be productive, healthy, and safe.
For human-machine systems, I first collect data from human users, whether individuals, teams, or even a society. Different kinds of methods can be used, including self-reports, interviews, focus groups, physiological and behavioral data, and user-generated data from the Internet.
Based on the data collected, I attempt to understand human contexts, including different aspects of the human users, such as emotion, cognition, needs, preferences, locations and activities. Such understanding can then be applied to different human-machine systems, including healthcare systems, automated driving systems, and product-service systems.
Drawing on different design theories and methodologies, I work along two dimensions. From the machine dimension, I apply knowledge of computing and communication, as well as practical and theoretical knowledge of the social and behavioral sciences, to design various systems for human users. From the human dimension, I seek to understand human needs and decision-making processes, and then build mathematical models and design tools that facilitate the integration of subjective experiences, social contexts, and engineering principles into the design process of human-machine systems.
Dr. Abney has pursued research in natural language understanding and natural language learning, including information extraction, biomedical text processing, integrating text analysis into web search, robust and rapid partial parsing, stochastic grammars, spoken-language information systems, extraction of linguistic information from scanned page images, dependency-grammar induction for low-resource languages, and semisupervised learning.
The GEMS (Graph Exploration and Mining at Scale) Lab develops new, fast, and principled methods for mining and making sense of large-scale data. Within data mining, we focus particularly on interconnected or graph data, which are ubiquitous. Some examples include social networks, brain graphs or connectomes, traffic networks, computer networks, phone call and email communication networks, and more. We leverage ideas from a diverse set of fields, including matrix algebra, graph theory, information theory, machine learning, optimization, statistics, databases, and social science.
At a high level, we enable single-source and multi-source data analysis by providing scalable methods for fusing data sources, relating and comparing them, and summarizing patterns in them. Our work has applications to exploration of scientific data (e.g., connectomics or brain graph analysis), anomaly detection, re-identification, and more. Some of our current research directions include:
*Scalable Network Discovery from non-Network Data*: Although graphs are ubiquitous, they are not always directly observed. Discovering and analyzing networks from non-network data is a task with applications in fields as diverse as neuroscience, genomics, energy, economics, and more. However, traditional network discovery approaches are computationally expensive. We are currently investigating network discovery methods (especially from time series) that are both fast and accurate.
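To make the task concrete, here is a minimal sketch of one common baseline for network discovery from time series: connect two variables when the absolute Pearson correlation of their series exceeds a threshold. This is an illustration under stated assumptions, not the lab's method. The function names, the toy series, and the threshold of 0.8 are all hypothetical. Note that this baseline compares every pair of series, which is exactly the kind of quadratic cost that makes traditional approaches expensive on large inputs.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def discover_network(series, threshold=0.8):
    """Return undirected edges (u, v) whose |correlation| meets the threshold.

    `series` maps a node name to its time series. The threshold is an
    illustrative assumption; in practice it would need to be tuned.
    """
    edges = set()
    for u, v in combinations(sorted(series), 2):  # all pairs: O(n^2) comparisons
        if abs(pearson(series[u], series[v])) >= threshold:
            edges.add((u, v))
    return edges


# Toy data (hypothetical): "a" and "b" move together, "c" does not.
series = {
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [2.0, 4.1, 5.9, 8.2, 10.0],
    "c": [5.0, 1.0, 4.0, 2.0, 3.0],
}
edges = discover_network(series)
print(edges)
```

On the toy data, only the pair ("a", "b") is linked. Faster approaches would typically avoid the exhaustive pairwise loop, for example by hashing or sketching the series first.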
*Graph Similarity and Alignment with Representation Learning*: Graph similarity and alignment (or fusion) are core components of various data mining tasks, such as anomaly detection, classification, clustering, transfer learning, sense-making, de-identification, and more. We are exploring representation learning methods that can generalize across networks and can be used in such multi-source network settings.
*Scalable Graph Summarization and Interactive Analytics*: Recent advances in computing resources have made processing enormous amounts of data possible, but the human ability to quickly identify patterns in such data has not scaled accordingly. Thus, computational methods for condensing and simplifying data are becoming an important part of the data-driven decision making process. We are investigating ways of summarizing data in a domain-specific way, as well as leveraging such methods to support interactive visual analytics.
*Distributed Graph Methods*: Many mining tasks for large-scale graphs involve solving iterative equations efficiently. For example, classifying entities in a network setting with limited supervision, finding similar nodes, and evaluating the importance of a node in a graph can all be expressed as linear systems that are solved iteratively. The growth in generated data has created a need for faster methods across all these applications, and many more. Our focus is on speeding up such methods for large-scale graphs in both sequential and distributed environments.
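As one familiar instance of node importance expressed as an iteratively solved linear system, the sketch below runs PageRank-style power iteration on a small directed graph. It is a sequential toy illustration, not the lab's algorithm. The function name, the example graph, and the damping factor of 0.85 are assumptions chosen for the example.

```python
def pagerank(adj, damping=0.85, tol=1e-10, max_iter=1000):
    """Iteratively solve r = damping * P^T r + (1 - damping) / n.

    `adj` maps each node to a list of its out-neighbors; every node must
    appear as a key. Nodes with no out-links spread their score uniformly.
    """
    nodes = sorted(adj)
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(max_iter):
        # Start each node with the uniform "teleport" share.
        new_rank = {u: (1.0 - damping) / n for u in nodes}
        for u in nodes:
            targets = adj[u] if adj[u] else nodes  # dangling node: spread evenly
            share = damping * rank[u] / len(targets)
            for v in targets:
                new_rank[v] += share
        delta = sum(abs(new_rank[u] - rank[u]) for u in nodes)
        rank = new_rank
        if delta < tol:  # stop once the update no longer changes the scores
            break
    return rank


# Hypothetical four-node graph: "c" collects links from "a", "b", and "d".
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = pagerank(graph)
print(max(scores, key=scores.get))
```

Each sweep touches every edge once, so the cost per iteration is linear in the graph size. The number of sweeps, and how the per-node updates are partitioned across machines, are where sequential and distributed speedups would come in.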
*User Modeling*: The large amounts of online user information (e.g., in social networks, online marketplaces, and streaming music and video services) have made it possible to analyze user behavior over time at a very large scale. Analyzing user behavior can lead to a better understanding of user needs, better recommendations by service providers that improve customer retention and user satisfaction, and the detection of outlying behaviors and events (e.g., malicious actions or significant life events). Our current focus is on understanding career changes and predicting job transitions.
Dr. Lee’s research interests lie in machine learning and its applications to artificial intelligence. In particular, he focuses on deep learning and representation learning, which aim to learn an abstract representation of the data through a hierarchical and compositional structure. His research also spans related topics, such as graphical models, optimization, and large-scale learning. Specific application areas include computer vision, audio recognition, robotics, text modeling, and healthcare.
I primarily work on developing scalable parallel algorithms to solve large scientific problems. This work has been done with teams from several different disciplines and application areas. I’m most concerned with algorithms emphasizing in-memory approaches. In another area of research, I have developed serial algorithms for nonparametric regression. This is a flexible form of regression that only assumes a general shape, such as an upward trend, rather than a parametric form such as linear. It can be applied to a range of learning and classification problems, such as taxonomy trees. I also do some work in adaptive learning, designing efficient sampling procedures.
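A small example may help illustrate shape-constrained regression with an "upward" assumption. The sketch below uses the standard pool-adjacent-violators approach to fit a nondecreasing sequence that minimizes squared error. It is a textbook illustration, not necessarily the algorithm developed in this research, and the function name and sample data are hypothetical.

```python
def isotonic_regression(y):
    """Least-squares nondecreasing fit via pool-adjacent-violators.

    Assumes equally weighted observations already ordered by their x values.
    """
    blocks = []  # each block is [sum of values, number of values]
    for value in y:
        blocks.append([value, 1])
        # Merge backwards while the previous block's mean exceeds the last one's.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)  # every point gets its block mean
    return fitted


# Hypothetical noisy data with a generally upward trend.
fit = isotonic_regression([1.0, 3.0, 2.0, 4.0, 3.5, 5.0])
print(fit)  # [1.0, 2.5, 2.5, 3.75, 3.75, 5.0]
```

The fit makes no assumption about slope or curvature. It only replaces each run of out-of-order values with their average, which is what distinguishes it from a parametric model such as a straight line.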