Ho-Joon Lee

Dr. Lee’s research in data science concerns biological questions in systems biology and network medicine by developing algorithms and models through a combination of statistical/machine learning, information theory, and network theory applied to multi-dimensional large-scale data. His projects have covered genomics, transcriptomics, proteomics, and metabolomics from yeast to mouse to human for integrative analysis of regulatory networks on multiple molecular levels, which also incorporates large-scale public databases such as GO for functional annotation, PDB for molecular structures, and PubChem and LINCS for drugs or small compounds. He previously carried out proteomics and metabolomics along with a computational derivation of dynamic protein complexes for IL-3 activation and cell cycle in murine pro-B cells (Lee et al., Cell Reports 2017), for which he developed integrative analytical tools using diverse approaches from machine learning and network theory. His ongoing interests in methodology include machine/deep learning and topological Kolmogorov-Sinai entropy-based network theory, which are applied to (1) multi-level dynamic regulatory networks in immune response, cell cycle, and cancer metabolism and (2) mass spectrometry-based omics data analysis.

Figure 1. Proteomics and metabolomics analysis of IL-3 activation and cell cycle (Lee et al., Cell Reports 2017). (A) Multi-omics abundance profiles of proteins, modules/complexes, intracellular metabolites, and extracellular metabolites over one cell cycle (from left to right columns) in response to IL-3 activation. Red indicates up-regulation of proteins/modules/intracellular metabolites or release of extracellular metabolites; green indicates down-regulation of proteins/modules/intracellular metabolites or uptake of extracellular metabolites. (B) Functional module network identified from integrative analysis. Red nodes are proteins and white nodes are functional modules. Expression profile plots are shown for literature-validated functional modules. (C) Overall pathway map of IL-3 activation and cell cycle phenotypes. (D) IL-3 activation and cell cycle as a cancer model along with candidate protein and metabolite biomarkers. (E) Protein co-expression scale-free network. (F) Power-law degree distribution of the network in (E). (G) Protein entropy distribution by topological Kolmogorov-Sinai entropy calculated for the network in (E).


Adriene Beltz

The goal of my research is to leverage network analysis techniques to uncover how the brain mediates sex hormone influences on gendered behavior across the lifespan. Specifically, my data science research concerns the creation and application of person-specific connectivity analyses, such as unified structural equation models, to time series data; these are intensive longitudinal data, including functional neuroimages, daily diaries, and observations. I then use these data science methods to investigate the links between androgens (e.g., testosterone) and estradiol at key developmental periods, such as puberty, and behaviors that typically show sex differences, including aspects of cognition and psychopathology.

A network map showing the directed connections among 25 brain regions of interest in the resting state frontoparietal network for an individual; data were acquired via functional magnetic resonance imaging. Black lines depict connections common across individuals in the sample, gray lines depict connections specific to this individual, solid lines depict contemporaneous connections (occurring in the same volume), and dashed lines depict lagged connections (occurring between volumes).

Suleyman Uludag

My research spans security, privacy, and optimization of data collection, particularly as applied to the Smart Grid, an augmented and enhanced paradigm for the conventional power grid. I am particularly interested in optimization approaches that explicitly incorporate a notion of security and/or privacy into the modeling. At the intersection of Intelligent Transportation Systems, the Smart Grid, and Smart Cities, I am interested in data privacy and energy usage in smart parking lots. Protecting data and availability, especially under Denial-of-Service attacks, is another dimension of my research interests. I am working on developing data privacy-aware bidding applications for Smart Grid Demand Response systems that do not rely on trusted third parties. Finally, I am interested in educational and pedagogical research about teaching computer science, Smart Grid, cyber security, and data privacy.

This figure shows the data collection model I used in developing a practical and secure Machine-to-Machine data collection protocol for the Smart Grid.

Ming Xu

My research focuses on developing and applying computational and data-enabled methodology in the broader area of sustainability. Main thrusts are as follows:

  1. Human mobility dynamics. I am interested in mining large-scale real-world travel trajectory data to understand human mobility dynamics. This involves processing and analyzing travel trajectory data, characterizing individual mobility patterns, and evaluating environmental impacts of transportation systems/technologies (e.g., electric vehicles, ride-sharing) based on individual mobility dynamics.
  2. Global supply chains. Increasingly intensified international trade has created a connected global supply chain network. I am interested in understanding the structure of the global supply chain network and economic/environmental performance of nations.
  3. Networked infrastructure systems. Many infrastructure systems (e.g., power grid, water supply infrastructure) are networked systems. I am interested in understanding the basic structural features of these systems and how they relate to the system-level properties (e.g., stability, resilience, sustainability).

A network visualization (force-directed graph) of the 2012 US economy using the industry-by-industry Input-Output Table (15 sectors) provided by BEA. Each node represents a sector. The size of the node represents the economic output of the sector. The size and darkness of links represent the value of exchanges of goods/services between sectors. An interactive version and other data visualizations are available at http://mingxugroup.org/

Danai Koutra

The GEMS (Graph Exploration and Mining at Scale) Lab develops new, fast, and principled methods for mining and making sense of large-scale data. Within data mining, we focus particularly on interconnected or graph data, which are ubiquitous. Some examples include social networks, brain graphs or connectomes, traffic networks, computer networks, phone call and email communication networks, and more. We leverage ideas from a diverse set of fields, including matrix algebra, graph theory, information theory, machine learning, optimization, statistics, databases, and social science.

At a high level, we enable single-source and multi-source data analysis by providing scalable methods for fusing data sources, relating and comparing them, and summarizing patterns in them. Our work has applications to exploration of scientific data (e.g., connectomics or brain graph analysis), anomaly detection, re-identification, and more. Some of our current research directions include:

*Scalable Network Discovery from non-Network Data*: Although graphs are ubiquitous, they are not always directly observed. Discovering and analyzing networks from non-network data is a task with applications in fields as diverse as neuroscience, genomics, energy, economics, and more. However, traditional network discovery approaches are computationally expensive. We are currently investigating network discovery methods (especially from time series) that are both fast and accurate.

*Graph Similarity and Alignment with Representation Learning*: Graph similarity and alignment (or fusion) are core components of various data mining tasks, such as anomaly detection, classification, clustering, transfer learning, sense-making, de-identification, and more. We are exploring representation learning methods that can generalize across networks and can be used in such multi-source network settings.

*Scalable Graph Summarization and Interactive Analytics*: Recent advances in computing resources have made processing enormous amounts of data possible, but the human ability to quickly identify patterns in such data has not scaled accordingly. Thus, computational methods for condensing and simplifying data are becoming an important part of the data-driven decision making process. We are investigating ways of summarizing data in a domain-specific way, as well as leveraging such methods to support interactive visual analytics.

*Distributed Graph Methods*: Many mining tasks for large-scale graphs involve solving iterative equations efficiently. For example, classifying entities in a network setting with limited supervision, finding similar nodes, and evaluating the importance of a node in a graph, can all be expressed as linear systems that are solved iteratively. The need for faster methods due to the increase in the data that is generated has permeated all these applications, and many more. Our focus is on speeding up such methods for large-scale graphs both in sequential and distributed environments.
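As a concrete illustration of node importance expressed as a linear system that is solved iteratively, the following is a minimal sketch of PageRank computed by power iteration. PageRank is chosen here only as a familiar example and is not necessarily the method the lab uses; the function name `pagerank`, the damping value, and the toy graph are all assumptions made for illustration.

```python
def pagerank(adjacency, damping=0.85, tol=1e-10, max_iter=200):
    """Approximate PageRank scores by power iteration.

    adjacency: dict mapping each node to a list of its out-neighbors.
    Returns a dict mapping each node to its score (scores sum to 1).
    """
    nodes = sorted(set(adjacency) | {v for outs in adjacency.values() for v in outs})
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(max_iter):
        # Score held by nodes without out-links is spread uniformly.
        dangling = sum(rank[u] for u in nodes if not adjacency.get(u))
        new_rank = {u: (1.0 - damping) / n + damping * dangling / n for u in nodes}
        # Each node passes an equal share of its score to every out-neighbor.
        for u in nodes:
            outs = adjacency.get(u, [])
            for v in outs:
                new_rank[v] += damping * rank[u] / len(outs)
        delta = sum(abs(new_rank[u] - rank[u]) for u in nodes)
        rank = new_rank
        if delta < tol:  # stop once the update no longer changes the scores
            break
    return rank


# Hypothetical toy graph: node "d" has no incoming links.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = pagerank(graph)
```

Each pass touches every edge once, so the cost per iteration grows linearly with the number of edges; this per-iteration cost is what faster sequential or distributed methods aim to reduce.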

*User Modeling*: The large amounts of online user information (e.g., in social networks, online market places, streaming music and video services) have made possible the analysis of user behavior over time at a very large scale. Analyzing the user behavior can lead to better understanding of the user needs, better recommendations by service providers that lead to customer retention and user satisfaction, as well as detection of outlying behaviors and events (e.g., malicious actions or significant life events). Our current focus is on understanding career changes and predicting job transitions.

Elizaveta Levina

Elizaveta (Liza) Levina and her group work on various questions arising in the statistical analysis of large and complex data, especially networks and graphs. Our current focus is on developing rigorous and computationally efficient statistical inference on realistic models for networks. Current directions include community detection problems in networks (overlapping communities, networks with additional information about the nodes and edges, estimating the number of communities), link prediction (networks with missing or noisy links, networks evolving over time), prediction with data connected by a network (e.g., the role of friendship networks in the spread of risky behaviors among teenagers), and statistical analysis of samples of networks with applications to brain imaging, especially fMRI data from studies of mental health.

Issam El Naqa

Our lab’s research interests are in the areas of oncology bioinformatics, multimodality image analysis, and treatment outcome modeling. We operate at the interface of physics, biology, and engineering. Our primary motivation is to design and develop novel approaches to unravel cancer patients’ response to chemoradiotherapy treatment by integrating physical, biological, and imaging information into advanced mathematical models. These models use combined top-bottom and bottom-top approaches that apply techniques of machine learning and complex systems analysis to first principles, and we evaluate their performance on clinical and preclinical data. The models could then be used to personalize cancer patients’ chemoradiotherapy treatment based on predicted benefit/risk and help understand the underlying biological response to disease. These research interests are divided into the following themes:

  • Bioinformatics: design and develop large-scale data mining methods and software tools to identify robust biomarkers (-omics) of chemoradiotherapy treatment outcomes from clinical and preclinical data.
  • Multimodality image-guided targeting and adaptive radiotherapy: design and develop hardware tools and software algorithms for multimodality image analysis and understanding, feature extraction for outcome prediction (radiomics), and real-time treatment optimization and targeting.
  • Radiobiology: design and develop predictive models of tumor and normal tissue response to radiotherapy. Investigate the application of these methods to develop therapeutic interventions that protect normal tissue from toxicity.

Siqian Shen

Siqian Shen is an Associate Professor of Industrial and Operations Engineering at the University of Michigan and also serves as an Associate Director in the Michigan Institute for Computational Discovery & Engineering (MICDE). Her theoretical research interests are in integer programming, stochastic/robust optimization, and network optimization. Applications include optimization and risk analysis of energy, healthcare, cloud-computing, and transportation systems. Her work has been supported by the National Science Foundation, Army Research Office, Department of Energy, and industrial funds. Her work has appeared in journals such as Management Science, Operations Research, Mathematical Programming, Manufacturing and Service Operations Management, INFORMS Journal on Computing, Transportation Research Part B, IEEE Transactions on Power Systems, and others. She is the recipient of the INFORMS Computing Society Best Student Paper award (runner-up), IIE Pritsker Doctoral Dissertation Award (1st Place), IBM Smarter Planet Innovation Faculty Award, and Department of Energy (DoE) Early Career Award.

Ambuj Tewari

My research group is engaged in fundamental research in the following areas:

  • Statistical learning theory: We are developing theory and algorithms for prediction problems (e.g., learning to rank and multilabel learning) with complex label spaces and where the available human supervision is often weak.
  • Sequential prediction in a game theoretic framework: We are trying to understand the power and limitations of sequential prediction algorithms when no probabilistic assumptions are placed on the data generating mechanism.
  • High dimensional and network data analysis: We are developing scalable algorithms with provable performance guarantees for learning from high dimensional and network data.
  • Optimization algorithms: We are creating incremental, distributed, and parallel algorithms for machine learning problems arising in today’s data rich world.
  • Reinforcement learning: We are synthesizing concepts and techniques from artificial intelligence, control theory, and operations research for pushing the frontier in sequential decision making, with a focus on delivering personalized health interventions via mobile devices.

My research group pursues, and continues to actively search for, challenging machine learning problems that arise across disciplines including behavioral sciences, computational biology, computational chemistry, learning sciences, and network science.

Research to deliver personalized interventions in real time via people's mobile devices

Jon Lee

Jon’s research focus is on nonlinear discrete optimization (NDO). Many practical engineering problems have physical aspects that are naturally modeled through smooth nonlinear functions, as well as design aspects that are often modeled with discrete variables. Research in NDO seeks to marry diverse techniques from classical areas of optimization, for example methods for smooth nonlinear optimization and methods for integer linear programming, with the idea of successfully attacking natural NDO models for practical engineering problems. One particular area of applied interest is environmental monitoring and the framework of maximum-entropy sampling.
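To make the maximum-entropy sampling problem concrete, here is a minimal sketch under a Gaussian assumption: given a covariance matrix over candidate monitoring sites, choose the subset of a fixed size whose principal submatrix has the largest log-determinant, which is equivalent to maximizing the joint entropy of the selected sites. The brute-force enumeration below is only practical for tiny instances and stands in for the exact optimization methods studied in NDO; the function names and the covariance values are hypothetical.

```python
from itertools import combinations
from math import log


def log_det(matrix):
    """Log-determinant of a symmetric positive definite matrix via Gaussian elimination."""
    a = [row[:] for row in matrix]  # work on a copy
    n = len(a)
    total = 0.0
    for k in range(n):
        pivot = a[k][k]
        if pivot <= 0.0:
            raise ValueError("matrix is not positive definite")
        total += log(pivot)  # the determinant is the product of the pivots
        for i in range(k + 1, n):
            factor = a[i][k] / pivot
            for j in range(k, n):
                a[i][j] -= factor * a[k][j]
    return total


def max_entropy_subset(cov, s):
    """Enumerate all size-s subsets and return the one with the largest log-determinant."""
    best_value, best_subset = float("-inf"), None
    for subset in combinations(range(len(cov)), s):
        sub = [[cov[i][j] for j in subset] for i in subset]
        value = log_det(sub)
        if value > best_value:
            best_value, best_subset = value, subset
    return best_subset, best_value


# Hypothetical covariance among four candidate sites; sites 0 and 1 are highly correlated.
cov = [
    [2.0, 0.9, 0.1, 0.0],
    [0.9, 1.0, 0.1, 0.0],
    [0.1, 0.1, 1.0, 0.2],
    [0.0, 0.0, 0.2, 1.0],
]
best_subset, best_value = max_entropy_subset(cov, 2)
```

On this toy instance the selected pair avoids the two highly correlated sites, since observing both would add little information. The number of subsets grows combinatorially with the number of sites, which is why the problem calls for dedicated discrete optimization techniques rather than enumeration.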