The goal of my research is to leverage network analysis techniques to uncover how the brain mediates sex hormone influences on gendered behavior across the lifespan. Specifically, my data science research concerns the creation and application of person-specific connectivity analyses, such as unified structural equation models, to time series data; these are intensive longitudinal data, including functional neuroimages, daily diaries, and observations. I then use these data science methods to investigate the links between androgens (e.g., testosterone) and estradiol at key developmental periods, such as puberty, and behaviors that typically show sex differences, including aspects of cognition and psychopathology.

My research spans security, privacy, and optimization of data collection, particularly as applied to the Smart Grid, an augmented and enhanced paradigm for the conventional power grid. I am particularly interested in optimization approaches that explicitly build a notion of security and/or privacy into the model. At the intersection of Intelligent Transportation Systems, the Smart Grid, and Smart Cities, I am interested in data privacy and energy usage in smart parking lots. Protecting data and availability, especially under Denial-of-Service attacks, is another dimension of my research interests. I am working on developing data privacy-aware bidding applications for Smart Grid Demand Response systems that do not rely on trusted third parties. Finally, I am interested in educational and pedagogical research about teaching computer science, the Smart Grid, cyber security, and data privacy.

My research focuses on developing and applying computational and data-enabled methodology in the broader area of sustainability. Main thrusts are as follows:

- Human mobility dynamics. I am interested in mining large-scale real-world travel trajectory data to understand human mobility dynamics. This involves processing and analyzing travel trajectory data, characterizing individual mobility patterns, and evaluating environmental impacts of transportation systems/technologies (e.g., electric vehicles, ride-sharing) based on individual mobility dynamics.
- Global supply chains. Increasingly intensified international trade has created a connected global supply chain network. I am interested in understanding the structure of the global supply chain network and economic/environmental performance of nations.
- Networked infrastructure systems. Many infrastructure systems (e.g., power grid, water supply infrastructure) are networked systems. I am interested in understanding the basic structural features of these systems and how they relate to the system-level properties (e.g., stability, resilience, sustainability).

A network visualization (force-directed graph) of the 2012 US economy using the industry-by-industry Input-Output Table (15 sectors) provided by BEA. Each node represents a sector. The size of the node represents the economic output of the sector. The size and darkness of links represent the value of exchanges of goods/services between sectors. An interactive version and other data visualizations are available at http://mingxugroup.org/

The GEMS (Graph Exploration and Mining at Scale) Lab develops new, fast and principled methods for mining and making sense of large-scale data. Within data mining, we focus particularly on interconnected or graph data, which are ubiquitous. Examples include social networks, brain graphs or connectomes, traffic networks, computer networks, and phone call and email communication networks. We leverage ideas from a diverse set of fields, including matrix algebra, graph theory, information theory, machine learning, optimization, statistics, databases, and social science.

At a high level, we enable single-source and multi-source data analysis by providing scalable methods for fusing data sources, relating and comparing them, and summarizing patterns in them. Our work has applications to exploration of scientific data (e.g., connectomics or brain graph analysis), anomaly detection, re-identification, and more. Some of our current research directions include:

*Scalable Network Discovery from non-Network Data*: Although graphs are ubiquitous, they are not always directly observed. Discovering and analyzing networks from non-network data is a task with applications in fields as diverse as neuroscience, genomics, energy, economics, and more. However, traditional network discovery approaches are computationally expensive. We are currently investigating network discovery methods (especially from time series) that are both fast and accurate.
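
As a rough illustration of what network discovery from time series can look like, the sketch below builds a graph by thresholding pairwise Pearson correlations. This is a generic baseline chosen for illustration, not the lab's method; the function names, the input format (a dict of equal-length series), and the 0.8 threshold are all assumptions. Because it compares every pair of series, its cost grows quadratically with the number of series, which hints at why scalable alternatives are of interest.

```python
from itertools import combinations
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant series carries no correlation signal
    return cov / sqrt(var_a * var_b)


def discover_network(series, threshold=0.8):
    """Return edges (u, v, r) for every pair with |r| >= threshold.

    `series` is a hypothetical input format: a dict mapping a node name
    to its time series. All pairs are compared, so the work is quadratic
    in the number of series.
    """
    edges = []
    for (u, a), (v, b) in combinations(series.items(), 2):
        r = pearson(a, b)
        if abs(r) >= threshold:
            edges.append((u, v, r))
    return edges


# Toy data (made up for illustration): x and y move together, z does not.
toy = {
    "x": [1, 2, 3, 4, 5],
    "y": [2, 4, 6, 8, 10.5],
    "z": [3, 1, 4, 1, 5],
}
edges = discover_network(toy)
```

On this toy input, only the pair `x` and `y` clears the threshold, so the discovered network has a single edge.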

*Graph Similarity and Alignment with Representation Learning*: Graph similarity and alignment (or fusion) are core components of various data mining tasks, such as anomaly detection, classification, clustering, transfer learning, sense-making, de-identification, and more. We are exploring representation learning methods that can generalize across networks and can be used in such multi-source network settings.

*Scalable Graph Summarization and Interactive Analytics*: Recent advances in computing resources have made processing enormous amounts of data possible, but the human ability to quickly identify patterns in such data has not scaled accordingly. Thus, computational methods for condensing and simplifying data are becoming an important part of the data-driven decision making process. We are investigating ways of summarizing data in a domain-specific way, as well as leveraging such methods to support interactive visual analytics.

*Distributed Graph Methods*: Many mining tasks for large-scale graphs involve solving iterative equations efficiently. For example, classifying entities in a network setting with limited supervision, finding similar nodes, and evaluating the importance of a node in a graph, can all be expressed as linear systems that are solved iteratively. The need for faster methods due to the increase in the data that is generated has permeated all these applications, and many more. Our focus is on speeding up such methods for large-scale graphs both in sequential and distributed environments.
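
To make the "linear systems that are solved iteratively" framing concrete, here is a minimal sketch using personalized PageRank as a stand-in for node importance. It is not the lab's algorithm; the adjacency-list input format, the damping factor `alpha`, and the choice to send a dangling node's mass back to the seed are assumptions made for the example. The scores `x` satisfy `(I - alpha * P^T) x = (1 - alpha) * e_seed`, where `P` is the row-normalized adjacency matrix, and the loop below applies the fixed-point update `x <- (1 - alpha) * e_seed + alpha * P^T x` until it stops changing.

```python
def personalized_pagerank(adj, seed, alpha=0.85, tol=1e-10, max_iter=1000):
    """Iteratively solve (I - alpha * P^T) x = (1 - alpha) * e_seed.

    `adj` is a hypothetical input format: a dict mapping each node to a
    list of its out-neighbors, where every neighbor is also a key.
    """
    nodes = list(adj)
    x = {v: 0.0 for v in nodes}
    for _ in range(max_iter):
        new_x = {v: 0.0 for v in nodes}
        new_x[seed] = 1.0 - alpha  # right-hand side of the linear system
        for u in nodes:
            out = adj[u]
            if out:
                share = alpha * x[u] / len(out)
                for v in out:
                    new_x[v] += share
            else:
                # Assumption: a node with no out-links returns its mass to the seed.
                new_x[seed] += alpha * x[u]
        change = sum(abs(new_x[v] - x[v]) for v in nodes)
        x = new_x
        if change < tol:
            break
    return x


# Toy directed graph (made up for illustration).
graph = {"a": ["b"], "b": ["a", "c"], "c": ["a"]}
scores = personalized_pagerank(graph, seed="a")
```

Each pass touches every edge once, so the per-iteration cost is linear in the size of the graph; the number of passes needed depends on `alpha` and the tolerance.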

*User Modeling*: The large amounts of online user information (e.g., in social networks, online marketplaces, streaming music and video services) have made it possible to analyze user behavior over time at a very large scale. Analyzing user behavior can lead to a better understanding of user needs, better recommendations by service providers that improve customer retention and user satisfaction, and detection of outlying behaviors and events (e.g., malicious actions or significant life events). Our current focus is on understanding career changes and predicting job transitions.

Elizaveta (Liza) Levina and her group work on various questions arising in the statistical analysis of large and complex data, especially networks and graphs. Our current focus is on developing rigorous and computationally efficient statistical inference on realistic models for networks. Current directions include community detection problems in networks (overlapping communities, networks with additional information about the nodes and edges, estimating the number of communities), link prediction (networks with missing or noisy links, networks evolving over time), prediction with data connected by a network (e.g., the role of friendship networks in the spread of risky behaviors among teenagers), and statistical analysis of samples of networks with applications to brain imaging, especially fMRI data from studies of mental health.

Pascal Van Hentenryck’s research is focused on artificial intelligence, data science, and optimization, with applications in mobility and transportation, energy systems, and computational social choice. He is currently leading the RITMO project, partly funded by MIDAS, which focuses on designing novel models of mobility, mathematical and algorithmic approaches to operate them optimally, and software architectures and data-privacy mechanisms to deploy them. The RITMO project is also in the process of deploying its technology in a number of significant case studies, with a particular focus on social equity.

Our lab’s research interests are in the areas of oncology bioinformatics, multimodality image analysis, and treatment outcome modeling. We operate at the interface of physics, biology, and engineering. Our primary motivation is to design and develop novel approaches to unravel cancer patients’ response to chemoradiotherapy treatment. We do so by integrating physical, biological, and imaging information into advanced mathematical models, using combined top-bottom and bottom-top approaches that apply techniques of machine learning and complex systems analysis to first principles, and by evaluating the models’ performance on clinical and preclinical data. These models could then be used to personalize cancer patients’ chemoradiotherapy treatment based on predicted benefit/risk and help understand the underlying biological response to disease. These research interests are divided into the following themes:

- Bioinformatics: design and develop large-scale datamining methods and software tools to identify robust biomarkers (-omics) of chemoradiotherapy treatment outcomes from clinical and preclinical data.
- Multimodality image-guided targeting and adaptive radiotherapy: design and develop hardware tools and software algorithms for multimodality image analysis and understanding, feature extraction for outcome prediction (radiomics), real-time treatment optimization and targeting.
- Radiobiology: design and develop predictive models of tumor and normal tissue response to radiotherapy. Investigate the application of these methods to develop therapeutic interventions that protect against normal tissue toxicities.

The research of Shen’s group covers the following areas that are closely related to data science and large-scale optimization problems.

– We develop new decomposition paradigms for stochastic integer programming models. We focus on two-stage stochastic integer programs, and advance decomposition paradigms based on special structures of specific risk-averse programs, and also based on special integer-programming structures. The new decomposition paradigms can be widely applied to large-scale complex system design and operations management, including optimizing critical interdependent infrastructures such as power grids, transportation systems, and cyber-clouds.

– We optimize carsharing system design/operations and real-time ridesharing problems including supply-demand matching for ride-pooling as well as service pricing.

– We apply risk-averse models and approaches to optimize integrated system design and service operations with multiple resources, multiple stages of service, and multiple stakeholders with diverse decision preferences. We in particular focus on related problems in healthcare operations management.

– We develop data-driven optimization methods that are suited to dispatching power systems with both fluctuating renewable energy sources and flexible loads contributing to balancing reserves via load control. We also study multi-stage stochastic programs over various risk and robustness measures for transmission planning with complex spatio-temporal data correlations.

– We study network interdiction models and design algorithms for specially structured networks (e.g., trees, small-world networks) in defense-related problems.

My research group is engaged in fundamental research in the following areas:

- Statistical learning theory. We are developing theory and algorithms for prediction problems (e.g., learning to rank and multilabel learning) with complex label spaces and where the available human supervision is often weak.
- Sequential prediction in a game theoretic framework. We are trying to understand the power and limitations of sequential prediction algorithms when no probabilistic assumptions are placed on the data generating mechanism.
- High dimensional and network data analysis. We are developing scalable algorithms with provable performance guarantees for learning from high dimensional and network data.
- Optimization algorithms. We are creating incremental, distributed and parallel algorithms for machine learning problems arising in today’s data rich world.
- Reinforcement learning. We are synthesizing concepts and techniques from artificial intelligence, control theory and operations research to push the frontier in sequential decision making, with a focus on delivering personalized health interventions via mobile devices.

My research group is pursuing and continues to actively search for challenging machine learning problems that arise across disciplines including behavioral sciences, computational biology, computational chemistry, learning sciences, and network science.

I am broadly interested in statistical inference, which is informally defined as the process of turning data into prediction and understanding. I like to work with richly structured data, such as those extracted from texts, images and other spatiotemporal signals. In recent years I have gravitated toward a field in statistics known as Bayesian nonparametrics, which provides a fertile and powerful mathematical framework for the development of many computational and statistical modeling ideas. My motivation for all this came originally from an early interest in machine learning, which continues to be a major source of research interest. A primary focus of my group’s research in machine learning is to develop more effective inference algorithms using stochastic, variational and geometric viewpoints.