Explore ARC

Ding Zhao

Ding Zhao, PhD, is Assistant Research Scientist in the Department of Mechanical Engineering, College of Engineering, with a secondary appointment in the Robotics Institute at the University of Michigan, Ann Arbor.

Dr. Zhao’s research interests include autonomous vehicles, intelligent/connected transportation, traffic safety, human-machine interaction, rare-event analysis, dynamics and control, machine learning, and big data analysis.

V. G. Vinod Vydiswaran

V. G. Vinod Vydiswaran, PhD, is Assistant Professor in the Department of Learning Health Sciences with a secondary appointment in the School of Information at the University of Michigan, Ann Arbor.

Dr. Vydiswaran’s research focuses on developing and applying text mining, natural language processing, and machine learning methodologies for extracting relevant information from health-related text corpora. This includes extracting medically relevant information from clinical notes and biomedical literature, and studying the information quality and credibility of online health communication (via health forums and tweets). His previous work includes developing novel information retrieval models to assist clinical decision making, modeling information trustworthiness, and addressing the vocabulary gap between health professionals and laypersons.

Sriram Chandrasekaran

Sriram Chandrasekaran, PhD, is Assistant Professor of Biomedical Engineering in the College of Engineering at the University of Michigan, Ann Arbor.

Dr. Chandrasekaran’s Systems Biology lab develops computer models of biological processes to understand them holistically. Sriram is interested in deciphering how thousands of proteins work together at the microscopic level to orchestrate complex processes like embryonic development or cognition, and how this complex network breaks down in diseases like cancer. Systems biology software and algorithms developed by his lab are highlighted below and are available at http://www.sriramlab.org/software/.

– INDIGO (INferring Drug Interactions using chemoGenomics and Orthology) is an algorithm that predicts how antibiotics prescribed in combination will inhibit bacterial growth. INDIGO leverages genomics and drug-interaction data from the model organism E. coli to facilitate the discovery of effective combination therapies in less-studied pathogens such as M. tuberculosis. (Ref: Chandrasekaran et al., Molecular Systems Biology, 2016)

– GEMINI (Gene Expression and Metabolism Integrated for Network Inference) is a network curation tool. It allows rapid assessment of regulatory interactions predicted by high-throughput approaches by integrating them with a metabolic network. (Ref: Chandrasekaran and Price, PLoS Computational Biology, 2013)

– ASTRIX (Analyzing Subsets of Transcriptional Regulators Influencing eXpression) uses gene expression data to identify regulatory interactions between transcription factors and their target genes. (Ref: Chandrasekaran et al., PNAS, 2011)

– PROM (Probabilistic Regulation of Metabolism) enables the quantitative integration of regulatory and metabolic networks to build genome-scale integrated metabolic–regulatory models. (Ref: Chandrasekaran and Price, PNAS, 2010)

Research Overview: We develop computational algorithms that integrate omics measurements to create detailed genome-scale models of cellular networks. Clinical applications of our algorithms include finding metabolic vulnerabilities in pathogens such as M. tuberculosis using PROM, and designing combination therapies to reduce antibiotic resistance using INDIGO.

Gilbert S. Omenn

Gilbert Omenn, MD, PhD, is Professor of Computational Medicine & Bioinformatics, with appointments in Human Genetics and in Molecular Medicine & Genetics in the Medical School; Professor of Public Health in the School of Public Health; and the Harold T. Shapiro Distinguished University Professor at the University of Michigan, Ann Arbor.

Dr. Omenn’s current research interests focus on cancer proteomics; splice isoforms as potential biomarkers and therapeutic targets; and isoform-level and single-cell functional networks of transcripts and proteins. He chairs the global Human Proteome Project of the Human Proteome Organization.

Qiang Zhu

Dr. Zhu’s group conducts research on a range of topics in data science, from foundational methodologies to challenging applications. In particular, the group has been investigating fundamental issues and techniques for supporting various types of queries (including range queries, box queries, k-NN queries, and hybrid queries) on large datasets in a non-ordered discrete data space (NDDS), and has developed a number of novel indexing and searching techniques that exploit the unique characteristics of an NDDS. The group has also been studying techniques for storing and searching large-scale k-mer datasets for genome sequence analysis applications in bioinformatics, proposing a virtual approximate store approach to supporting repetitive big data in genome sequence analyses along with several new sequence analysis techniques. In addition, the group has been researching methods for processing and optimizing a new type of so-called progressive queries, which are formulated on the fly by a user in multiple steps; such queries are widely used in application domains including e-commerce, social media, business intelligence, and decision support. Other research topics studied by the group include streaming data processing, self-managing databases, spatio-temporal data indexing, data privacy, Web information management, and vehicle drive-through wireless services.
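To make the NDDS setting concrete, here is a toy illustration (not the group’s index structures): vectors hold categorical symbols, such as genomic sequences over A/C/G/T, so there is no natural ordering and the usual distance is Hamming distance. The brute-force scan below is exactly what specialized NDDS indexing aims to avoid on large datasets.

```python
# Illustrative only: brute-force k-NN query in a non-ordered discrete space
# (NDDS). Vectors are strings of categorical symbols; distance is Hamming
# distance (number of differing positions). Specialized NDDS indexes exist
# precisely to prune this linear scan.
from heapq import nsmallest

def hamming(u, v):
    """Number of positions at which two equal-length discrete vectors differ."""
    return sum(a != b for a, b in zip(u, v))

def knn_query(dataset, query, k):
    """Return the k vectors in `dataset` closest to `query` under Hamming distance."""
    return nsmallest(k, dataset, key=lambda v: hamming(v, query))

db = ["ACGT", "ACGA", "TTTT", "ACCA", "GGGT"]
print(knn_query(db, "ACGT", 2))  # ['ACGT', 'ACGA']
```

Because an NDDS has no meaningful order between symbols, range- and distance-based pruning must exploit discreteness and low cardinality of the alphabet rather than coordinate ordering, which is what distinguishes NDDS indexing from classic spatial trees.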

Danai Koutra

The GEMS (Graph Exploration and Mining at Scale) Lab develops new, fast, and principled methods for mining and making sense of large-scale data. Within data mining, we focus particularly on interconnected or graph data, which are ubiquitous. Some examples include social networks, brain graphs or connectomes, traffic networks, computer networks, phone-call and email communication networks, and more. We leverage ideas from a diverse set of fields, including matrix algebra, graph theory, information theory, machine learning, optimization, statistics, databases, and social science.

At a high level, we enable single-source and multi-source data analysis by providing scalable methods for fusing data sources, relating and comparing them, and summarizing patterns in them. Our work has applications to exploration of scientific data (e.g., connectomics or brain graph analysis), anomaly detection, re-identification, and more. Some of our current research directions include:

*Scalable Network Discovery from non-Network Data*: Although graphs are ubiquitous, they are not always directly observed. Discovering and analyzing networks from non-network data is a task with applications in fields as diverse as neuroscience, genomics, energy, economics, and more. However, traditional network discovery approaches are computationally expensive. We are currently investigating network discovery methods (especially from time series) that are both fast and accurate.
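The simplest baseline for this task (not the lab’s method, which targets much better speed and accuracy) is to link two variables whenever their time series are strongly correlated:

```python
# Naive network discovery from time series: connect variables i and j when
# the absolute Pearson correlation of their series exceeds a threshold.
# Shown only to illustrate the task of turning non-network data into a graph.
import numpy as np

def correlation_network(series, threshold=0.8):
    """series: (n_vars, n_timepoints) array -> set of undirected edges (i, j)."""
    corr = np.corrcoef(series)
    n = corr.shape[0]
    return {(i, j) for i in range(n) for j in range(i + 1, n)
            if abs(corr[i, j]) >= threshold}

t = np.linspace(0, 2 * np.pi, 100)
x = np.vstack([np.sin(t), np.sin(t) + 0.1, np.cos(t)])
print(correlation_network(x))  # {(0, 1)}: only the two sine series are linked
```

Even this baseline costs quadratic time in the number of variables, which hints at why scalable discovery methods are a research question in their own right.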

*Graph Similarity and Alignment with Representation Learning*: Graph similarity and alignment (or fusion) are core building blocks of various data mining tasks, such as anomaly detection, classification, clustering, transfer learning, sense-making, de-identification, and more. We are exploring representation learning methods that can generalize across networks and can be used in such multi-source network settings.

*Scalable Graph Summarization and Interactive Analytics*: Recent advances in computing resources have made processing enormous amounts of data possible, but the human ability to quickly identify patterns in such data has not scaled accordingly. Thus, computational methods for condensing and simplifying data are becoming an important part of the data-driven decision making process. We are investigating ways of summarizing data in a domain-specific way, as well as leveraging such methods to support interactive visual analytics.

*Distributed Graph Methods*: Many mining tasks for large-scale graphs involve solving iterative equations efficiently. For example, classifying entities in a network with limited supervision, finding similar nodes, and evaluating the importance of a node in a graph can all be expressed as linear systems that are solved iteratively. As the amount of generated data grows, the need for faster methods has permeated all these applications, and many more. Our focus is on speeding up such methods for large-scale graphs in both sequential and distributed environments.
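One classic instance of such an iterative solve is node importance via PageRank, sketched below by plain sequential power iteration (illustrative code, not the lab’s implementation); accelerating this class of computations is the research direction described above.

```python
# Illustrative iterative linear-system solve on a graph: PageRank by power
# iteration. Deliberately sequential and unoptimized; distributed graph
# methods aim to speed up exactly this kind of repeated matrix-vector loop.
import numpy as np

def pagerank(adj, damping=0.85, tol=1e-10):
    """adj[i][j] = 1 if there is an edge i -> j; returns importance scores."""
    A = np.asarray(adj, dtype=float)
    out = A.sum(axis=1, keepdims=True)
    out[out == 0] = 1.0              # guard against divide-by-zero for sink nodes
    P = A / out                      # row-stochastic transition matrix
    n = P.shape[0]
    r = np.full(n, 1.0 / n)          # start from the uniform distribution
    while True:
        r_new = (1 - damping) / n + damping * (P.T @ r)
        if np.abs(r_new - r).sum() < tol:
            return r_new
        r = r_new

# Node 2 is linked to by both other nodes, so it ranks highest.
ranks = pagerank([[0, 1, 1], [0, 0, 1], [1, 0, 0]])
print(ranks.argmax())  # 2
```

Each iteration is one sparse matrix-vector product, so both the sequential cost per step and the communication pattern in a distributed setting are determined by the graph’s edge structure.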

*User Modeling*: The large amounts of online user information (e.g., in social networks, online market places, streaming music and video services) have made possible the analysis of user behavior over time at a very large scale. Analyzing the user behavior can lead to better understanding of the user needs, better recommendations by service providers that lead to customer retention and user satisfaction, as well as detection of outlying behaviors and events (e.g., malicious actions or significant life events). Our current focus is on understanding career changes and predicting job transitions.

Elizaveta Levina

Elizaveta (Liza) Levina and her group work on various questions arising in the statistical analysis of large and complex data, especially networks and graphs. The group’s current focus is on developing rigorous and computationally efficient statistical inference for realistic models of networks. Current directions include community detection problems in networks (overlapping communities, networks with additional information about the nodes and edges, estimating the number of communities), link prediction (networks with missing or noisy links, networks evolving over time), prediction with data connected by a network (e.g., the role of friendship networks in the spread of risky behaviors among teenagers), and statistical analysis of samples of networks, with applications to brain imaging, especially fMRI data from studies of mental health.

Jeremy M G Taylor

Jeremy Taylor, PhD, is the Pharmacia Research Professor of Biostatistics in the School of Public Health and Professor in the Department of Radiation Oncology in the School of Medicine at the University of Michigan, Ann Arbor. He is the director of the University of Michigan Cancer Center Biostatistics Unit and director of the Cancer/Biostatistics training program. He received his B.A. in Mathematics from Cambridge University and his Ph.D. in Statistics from UC Berkeley. He was on the faculty at UCLA from 1983 to 1998, when he moved to the University of Michigan. He has held visiting positions at the Medical Research Council, Cambridge, England; the University of Adelaide; INSERM, Bordeaux; and CSIRO, Sydney, Australia. He is a previous winner of the Mortimer Spiegelman Award from the American Public Health Association and the Michael Fry Award from the Radiation Research Society. He has worked in various areas of Statistics and Biostatistics, including Box-Cox transformations, longitudinal and survival analysis, cure models, missing data, smoothing methods, clinical trial design, and surrogate and auxiliary variables. He has been heavily involved in collaborations in the areas of radiation oncology, cancer research, and bioinformatics.

I have broad interests and expertise in developing statistical methodology and applying it in biomedical research, particularly in cancer research. I have undertaken research in power transformations, longitudinal modeling, survival analysis (particularly cure models), missing data methods, causal inference, and the modeling of radiation oncology related data. Recent interests, specifically related to cancer, are in statistical methods for genomic data, statistical methods for evaluating cancer biomarkers, surrogate endpoints, phase I trial design, statistical methods for personalized medicine, and prognostic and predictive model validation. I strive to develop principled methods that will lead to valid interpretations of the complex data that are collected in biomedical research.

Johann Gagnon-Bartsch

Johann Gagnon-Bartsch, PhD, is Assistant Professor of Statistics in the College of Literature, Science, and the Arts at the University of Michigan, Ann Arbor.

Prof. Gagnon-Bartsch’s research currently focuses on the analysis of high-throughput biological data as well as other types of high-dimensional data. More specifically, he is working with collaborators on developing methods that can be used when the data are corrupted by systematic measurement errors of unknown origin, or when the data suffer from the effects of unobserved confounders. For example, gene expression data suffer from both systematic measurement errors of unknown origin (due to uncontrolled variations in laboratory conditions) and the effects of unobserved confounders (such as whether a patient had just eaten before a tissue sample was taken). He and his collaborators are developing methodology that corrects for these systematic errors using “negative controls.” Negative controls are variables that (1) are known to have no true association with the biological signal of interest, and (2) are corrupted by the systematic errors, just like the variables that are of interest. The negative controls make it possible to learn about the structure of the errors, so that the errors can then be removed from the other variables.
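A toy sketch of the negative-control idea (a deliberate simplification under a low-rank error assumption, not the published methodology): estimate the per-sample error factors from the control variables, then regress those factors out of every variable.

```python
# Toy sketch of adjustment with negative controls (NOT the published method):
# assume the systematic errors are low-rank, estimate per-sample error factors
# from control variables known to carry no biological signal, then regress
# those factors out of all variables.
import numpy as np

def adjust_with_controls(Y, control_idx, n_factors=1):
    """Y: (samples, variables). Returns Y with estimated error factors removed."""
    Yc = Y[:, control_idx] - Y[:, control_idx].mean(axis=0)
    # Left singular vectors of the controls estimate the unwanted factors.
    U, _, _ = np.linalg.svd(Yc, full_matrices=False)
    W = U[:, :n_factors]          # estimated per-sample error factors
    alpha = W.T @ Y               # each variable's loading on those factors
    return Y - W @ alpha

# Synthetic check: a batch effect w corrupts both columns; only column 1 also
# carries biological signal s. Column 0 is the negative control.
w = np.array([1., 1, 1, -1, -1, -1])      # systematic (batch) error pattern
s = np.array([1., -1, 0, 1, -1, 0])       # biological signal, orthogonal to w
Y = np.column_stack([2 * w, s + 3 * w])
adj = adjust_with_controls(Y, [0], n_factors=1)
print(np.round(adj[:, 1], 6))  # recovers s: [ 1. -1.  0.  1. -1.  0.]
```

In this idealized example the error is exactly rank one and the signal is orthogonal to it, so the adjustment recovers the signal perfectly; real data require the more careful estimation the paragraph above describes.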

Microarray data from tissue samples taken from three different regions of the brain (anterior cingulate cortex, dorsolateral prefrontal cortex, and cerebellum) of ten individuals. The 30 tissue samples were separately analyzed in three different laboratories (UC Davis, UC Irvine, U of Michigan). The left plot shows the first two principal components of the data. The data cluster by laboratory, indicating that most of the variation in the data is systematic error that arises due to uncontrolled variation in laboratory conditions. The second plot shows the data after adjustment. The data now cluster by brain region (cortex vs. cerebellum). The data is from GEO (GSE2164).

Naisyin Wang

Naisyin Wang, PhD, is Professor of Statistics, College of Literature, Science, and the Arts, at the University of Michigan, Ann Arbor.

Prof. Wang’s main research interests involve developing models and methodologies for complex biomedical data. She has developed approaches to extracting information from imperfect data affected by measurement errors and incompleteness. Her other methodological developments include model-based mixture modeling and non- and semiparametric modeling of longitudinal, dynamic, and high-dimensional data. She developed approaches that first gauge the effects of measurement errors on non-linear mixed effects models and then provide statistical methods to analyze such data. Most of the methods she has developed are semiparametric; one strength of such approaches is that one does not need to make certain structural assumptions about parts of the model. This modeling strategy enables the integration of measurements collected from sources that might not be completely homogeneous. Her recently developed statistical methods focus on regularized approaches and on model building, selection, and evaluation for high-dimensional, dynamic, or functional data.

Regularized time-varying ODE coefficients of the SEI dynamic equation for the Canadian measles incidence data (Li, Zhu, Wang, 2015). Left panel: time-varying ODE coefficient curve that reflects both yearly and seasonal effects, with the regularized yearly effect (red curve) embedded; right panel: regularized (red curve), non-regularized (blue), and two-year local constant (circles) estimates of yearly effects. The new regularized method shows that the yearly effect is relatively large in the early years and decreases gradually to a constant after 1958.