
Sriram Chandrasekaran


Sriram Chandrasekaran, PhD, is Assistant Professor of Biomedical Engineering in the College of Engineering at the University of Michigan, Ann Arbor.

Dr. Chandrasekaran’s Systems Biology lab develops computer models of biological processes to understand them holistically. Sriram is interested in deciphering how thousands of proteins work together at the microscopic level to orchestrate complex processes like embryonic development or cognition, and how this complex network breaks down in diseases like cancer. Systems biology software and algorithms developed by his lab are highlighted below and are available at http://www.sriramlab.org/software/.

– INDIGO (INferring Drug Interactions using chemoGenomics and Orthology) predicts how antibiotics prescribed in combination will inhibit bacterial growth. INDIGO leverages genomics and drug-interaction data from the model organism E. coli to facilitate the discovery of effective combination therapies in less-studied pathogens such as M. tuberculosis. (Ref: Chandrasekaran et al., Molecular Systems Biology, 2016)

– GEMINI (Gene Expression and Metabolism Integrated for Network Inference) is a network curation tool. It allows rapid assessment of regulatory interactions predicted by high-throughput approaches by integrating them with a metabolic network. (Ref: Chandrasekaran and Price, PLoS Computational Biology, 2013)

– ASTRIX (Analyzing Subsets of Transcriptional Regulators Influencing eXpression) uses gene expression data to identify regulatory interactions between transcription factors and their target genes. (Ref: Chandrasekaran et al., PNAS, 2011)

– PROM (Probabilistic Regulation of Metabolism) enables the quantitative integration of regulatory and metabolic networks to build genome-scale integrated metabolic–regulatory models. (Ref: Chandrasekaran and Price, PNAS, 2010)
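The core idea behind PROM can be sketched in a few lines: treat the probability that a reaction's gene is expressed, given the state of its transcription-factor regulator, as a soft constraint on that reaction's maximum flux in a flux balance analysis (FBA) problem. The snippet below is a minimal, hypothetical illustration of that idea (toy network, made-up probabilities), not the published implementation.

```python
# Minimal sketch of the PROM idea: scale flux bounds in a flux-balance-analysis
# (FBA) problem by the probability that a reaction's gene is expressed given the
# state of its transcription-factor regulator. The network, bounds, and
# probabilities below are illustrative placeholders, not data from the paper.
import numpy as np
from scipy.optimize import linprog

# Toy stoichiometric matrix S (metabolites x reactions) and default flux bounds.
S = np.array([[ 1, -1,  0],
              [ 0,  1, -1]])          # two internal metabolites, three reactions
v_max = np.array([10.0, 10.0, 10.0])  # default upper bounds
v_min = np.zeros(3)

# P(gene of reaction j is ON | regulator state), e.g. estimated from expression
# data; hypothetical values for illustration.
p_on = np.array([1.0, 0.3, 1.0])

# PROM-style constraint: shrink each upper bound toward p_on * v_max.
ub = p_on * v_max

# Maximize flux through the last ("biomass") reaction subject to S v = 0.
c = np.zeros(3); c[-1] = -1.0         # linprog minimizes, so negate
res = linprog(c, A_eq=S, b_eq=np.zeros(S.shape[0]),
              bounds=list(zip(v_min, ub)), method="highs")
print("predicted growth flux:", -res.fun)
```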

 

Research Overview: We develop computational algorithms that integrate omics measurements to create detailed genome-scale models of cellular networks. Some clinical applications of our algorithms include finding metabolic vulnerabilities in pathogens (e.g., M. tuberculosis) using PROM and designing multi-drug combination therapies to reduce antibiotic resistance using INDIGO.


Gilbert S. Omenn


Gilbert Omenn, MD, PhD, is Professor of Computational Medicine & Bioinformatics with appointments in Human Genetics, Molecular Medicine & Genetics in the Medical School and Professor of Public Health in the School of Public Health and the Harold T. Shapiro Distinguished University Professor at the University of Michigan, Ann Arbor.

Dr. Omenn’s current research interests are focused on cancer proteomics, splice isoforms as potential biomarkers and therapeutic targets, and isoform-level and single-cell functional networks of transcripts and proteins. He chairs the global Human Proteome Project of the Human Proteome Organization.

Qiang Zhu


Dr. Zhu’s group conducts research on various topics in data science, ranging from foundational methodologies to challenging applications. In particular, the group has been investigating fundamental issues and techniques for supporting various types of queries (including range queries, box queries, k-NN queries, and hybrid queries) on large datasets in a non-ordered discrete data space (NDDS), and has developed a number of novel indexing and searching techniques that exploit the unique characteristics of an NDDS. The group has also been studying issues and techniques for storing and searching large-scale k-mer datasets for genome sequence analysis applications in bioinformatics, proposing a virtual approximate store approach to supporting repetitive big data in genome sequence analyses as well as several new sequence analysis techniques. In addition, the group has been researching challenges and methods for processing and optimizing a new type of query, the so-called progressive query, which is formulated on the fly by a user in multiple steps; such queries are widely used in many application domains, including e-commerce, social media, business intelligence, and decision support. Other research topics studied by the group include streaming data processing, self-managing databases, spatio-temporal data indexing, data privacy, Web information management, and vehicle drive-through wireless services.
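As a concrete, purely illustrative example of what a query in a non-ordered discrete data space looks like, the sketch below runs a brute-force k-NN search over k-mers using Hamming distance, where the alphabet {A, C, G, T} has no natural ordering. The dataset, function names, and brute-force strategy are placeholders rather than the group's actual indexing techniques.

```python
# Illustrative sketch (not the group's actual index): k-NN search in a
# non-ordered discrete data space (NDDS), where vectors are k-mers over the
# alphabet {A,C,G,T} and distance is Hamming distance (no ordering of the
# symbols is assumed).
import heapq

def hamming(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))

def knn(query: str, kmers: list[str], k: int = 3) -> list[tuple[int, str]]:
    # Brute-force scan; a real NDDS index (e.g., a tree that partitions on
    # discrete element sets) would prune most of these comparisons.
    return heapq.nsmallest(k, ((hamming(query, s), s) for s in kmers))

dataset = ["ACGTA", "ACGTT", "TTGCA", "ACCTA", "GGGTA"]
print(knn("ACGTA", dataset, k=2))   # exact match first, then a 1-mismatch neighbor
```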

Danai Koutra


The GEMS (Graph Exploration and Mining at Scale) Lab develops new, fast and principled methods for mining and making sense of large-scale data. Within data mining, we focus particularly on interconnected or graph data, which are ubiquitous. Some examples include social networks, brain graphs or connectomes, traffic networks, computer networks, phone-call and email communication networks, and more. We leverage ideas from a diverse set of fields, including matrix algebra, graph theory, information theory, machine learning, optimization, statistics, databases, and social science.

At a high level, we enable single-source and multi-source data analysis by providing scalable methods for fusing data sources, relating and comparing them, and summarizing patterns in them. Our work has applications to exploration of scientific data (e.g., connectomics or brain graph analysis), anomaly detection, re-identification, and more. Some of our current research directions include:

*Scalable Network Discovery from non-Network Data*: Although graphs are ubiquitous, they are not always directly observed. Discovering and analyzing networks from non-network data is a task with applications in fields as diverse as neuroscience, genomics, energy, economics, and more. However, traditional network discovery approaches are computationally expensive. We are currently investigating network discovery methods (especially from time series) that are both fast and accurate.
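One simple baseline for network discovery from time series, shown purely for illustration, is to place an edge between two series whenever their correlation exceeds a threshold; the lab's methods are more sophisticated, and the simulated data and threshold below are invented for the example.

```python
# Illustrative sketch of one simple network-discovery strategy: infer edges
# between time series from pairwise correlations above a threshold.
import numpy as np

rng = np.random.default_rng(0)
n_series, n_steps = 5, 200
X = rng.standard_normal((n_series, n_steps))
X[1] = 0.8 * X[0] + 0.2 * rng.standard_normal(n_steps)   # make series 1 track series 0

corr = np.corrcoef(X)                       # pairwise Pearson correlations
threshold = 0.5
adjacency = (np.abs(corr) > threshold) & ~np.eye(n_series, dtype=bool)
edges = [(i, j) for i in range(n_series) for j in range(i + 1, n_series) if adjacency[i, j]]
print(edges)                                # expect [(0, 1)]
```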

*Graph similarity and Alignment with Representation Learning*: Graph similarity and alignment (or fusion) are core tasks for various data mining tasks, such as anomaly detection, classification, clustering, transfer learning, sense-making, de-identification, and more. We are exploring representation learning methods that can generalize across networks and can be used in such multi-source network settings.

*Scalable Graph Summarization and Interactive Analytics*: Recent advances in computing resources have made processing enormous amounts of data possible, but the human ability to quickly identify patterns in such data has not scaled accordingly. Thus, computational methods for condensing and simplifying data are becoming an important part of the data-driven decision making process. We are investigating ways of summarizing data in a domain-specific way, as well as leveraging such methods to support interactive visual analytics.

*Distributed Graph Methods*: Many mining tasks for large-scale graphs involve solving iterative equations efficiently. For example, classifying entities in a network setting with limited supervision, finding similar nodes, and evaluating the importance of a node in a graph, can all be expressed as linear systems that are solved iteratively. The need for faster methods due to the increase in the data that is generated has permeated all these applications, and many more. Our focus is on speeding up such methods for large-scale graphs both in sequential and distributed environments.
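To make "an iterative linear system on a graph" concrete, the sketch below runs a PageRank-style power iteration on a toy directed graph; it is a generic textbook example rather than the lab's distributed implementations.

```python
# Worked example of an iterative linear system on a graph: PageRank-style power
# iteration, a stand-in for the node-importance and label-propagation
# computations described above (toy graph, not lab code).
import numpy as np

A = np.array([[0, 1, 1, 0],
              [0, 0, 1, 0],
              [1, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)   # adjacency matrix of a small digraph
P = A / A.sum(axis=1, keepdims=True)        # row-stochastic transition matrix
d, n = 0.85, A.shape[0]
x = np.full(n, 1.0 / n)
for _ in range(100):                        # iterate x <- d * P^T x + (1 - d) / n
    x = d * P.T @ x + (1 - d) / n
print(np.round(x / x.sum(), 3))             # stationary importance scores
```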

*User Modeling*: The large amounts of online user information (e.g., in social networks, online market places, streaming music and video services) have made possible the analysis of user behavior over time at a very large scale. Analyzing the user behavior can lead to better understanding of the user needs, better recommendations by service providers that lead to customer retention and user satisfaction, as well as detection of outlying behaviors and events (e.g., malicious actions or significant life events). Our current focus is on understanding career changes and predicting job transitions.

Elizaveta Levina


Elizaveta (Liza) Levina and her group work on various questions arising in the statistical analysis of large and complex data, especially networks and graphs. Our current focus is on developing rigorous and computationally efficient statistical inference on realistic models for networks. Current directions include community detection problems in networks (overlapping communities, networks with additional information about the nodes and edges, estimating the number of communities), link prediction (networks with missing or noisy links, networks evolving over time), prediction with data connected by a network (e.g., the role of friendship networks in the spread of risky behaviors among teenagers), and statistical analysis of samples of networks, with applications to brain imaging, especially fMRI data from studies of mental health.
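As a toy illustration of the community detection problem mentioned above, the sketch below generates a two-block stochastic block model and recovers the blocks with basic spectral clustering; this is a standard baseline used for exposition, not the group's estimators, and every parameter value is invented.

```python
# Minimal illustration of community detection: spectral clustering on a toy
# two-block stochastic block model.
import numpy as np
from numpy.linalg import eigh

rng = np.random.default_rng(1)
n = 60
labels_true = np.repeat([0, 1], n // 2)
p_in, p_out = 0.30, 0.05                    # dense within blocks, sparse between
prob = np.where(labels_true[:, None] == labels_true[None, :], p_in, p_out)
A = (rng.random((n, n)) < prob).astype(float)
A = np.triu(A, 1); A = A + A.T              # symmetric adjacency, no self-loops

# Cluster using the sign of the second-leading eigenvector of A.
vals, vecs = eigh(A)
labels_hat = (vecs[:, -2] > 0).astype(int)
agreement = max(np.mean(labels_hat == labels_true),
                np.mean(labels_hat != labels_true))   # labels defined up to swap
print(f"fraction of nodes recovered: {agreement:.2f}")
```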

Jeremy M G Taylor


Jeremy Taylor, PhD, is the Pharmacia Research Professor of Biostatistics in the School of Public Health and Professor in the Department of Radiation Oncology in the School of Medicine at the University of Michigan, Ann Arbor. He is the director of the University of Michigan Cancer Center Biostatistics Unit and director of the Cancer/Biostatistics training program. He received his B.A. in Mathematics from Cambridge University and his Ph.D. in Statistics from UC Berkeley. He was on the faculty at UCLA from 1983 to 1998, when he moved to the University of Michigan. He has held visiting positions at the Medical Research Council, Cambridge, England; the University of Adelaide; INSERM, Bordeaux; and CSIRO, Sydney, Australia. He is a previous winner of the Mortimer Spiegelman Award from the American Public Health Association and the Michael Fry Award from the Radiation Research Society. He has worked in various areas of statistics and biostatistics, including Box-Cox transformations, longitudinal and survival analysis, cure models, missing data, smoothing methods, clinical trial design, and surrogate and auxiliary variables. He has been heavily involved in collaborations in the areas of radiation oncology, cancer research and bioinformatics.

I have broad interests and expertise in developing statistical methodology and applying it in biomedical research, particularly in cancer research. I have undertaken research in power transformations, longitudinal modeling, survival analysis (particularly cure models), missing data methods, causal inference, and modeling of radiation oncology data. Recent interests, specifically related to cancer, are in statistical methods for genomic data, statistical methods for evaluating cancer biomarkers, surrogate endpoints, phase I trial design, statistical methods for personalized medicine, and prognostic and predictive model validation. I strive to develop principled methods that will lead to valid interpretations of the complex data that are collected in biomedical research.

Johann Gagnon-Bartsch


Johann Gagnon-Bartsch, PhD, is Assistant Professor of Statistics in the College of Literature, Science, and the Arts at the University of Michigan, Ann Arbor.

Prof. Gagnon-Bartsch’s research currently focuses on the analysis of high-throughput biological data as well as other types of high-dimensional data. More specifically, he is working with collaborators on developing methods that can be used when the data are corrupted by systematic measurement errors of unknown origin, or when the data suffer from the effects of unobserved confounders. For example, gene expression data suffer from both systematic measurement errors of unknown origin (due to uncontrolled variations in laboratory conditions) and the effects of unobserved confounders (such as whether a patient had just eaten before a tissue sample was taken). They are developing methodology that is able to correct for these systematic errors using “negative controls.” Negative controls are variables that (1) are known to have no true association with the biological signal of interest, and (2) are corrupted by the systematic errors, just like the variables that are of interest. The negative controls allow us to learn about the structure of the errors, so that we may then remove the errors from the other variables.
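A minimal sketch of the negative-control idea, assuming a simulated dataset: estimate the unwanted factor from the negative-control variables (here via an SVD), then regress that factor out of every variable. The simulation, dimensions, and single-factor assumption below are illustrative choices, not the published methodology.

```python
# Hedged sketch of adjustment with negative controls: learn the unwanted factor
# from controls that carry no true signal, then remove it from all variables.
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_genes, n_controls = 30, 500, 50
batch = rng.standard_normal((n_samples, 1))     # unobserved systematic error
signal = rng.standard_normal((n_samples, 1))    # biological signal of interest

alpha = rng.standard_normal((1, n_genes))       # how the error hits each gene
beta = rng.standard_normal((1, n_genes))
beta[:, :n_controls] = 0.0                      # negative controls: no true signal
Y = signal @ beta + 3.0 * (batch @ alpha) + 0.1 * rng.standard_normal((n_samples, n_genes))

def top_pc(M):
    # scores of the first principal component (columns centered)
    Mc = M - M.mean(axis=0)
    U, s, _ = np.linalg.svd(Mc, full_matrices=False)
    return U[:, 0] * s[0]

# 1. Estimate the unwanted factor W from the negative-control columns only.
U, s, _ = np.linalg.svd(Y[:, :n_controls], full_matrices=False)
W = U[:, :1] * s[:1]

# 2. Regress W out of every gene.
coef, *_ = np.linalg.lstsq(W, Y, rcond=None)
Y_adjusted = Y - W @ coef

print("corr(error, top PC) before:", round(abs(np.corrcoef(batch[:, 0], top_pc(Y))[0, 1]), 2))
print("corr(error, top PC) after: ", round(abs(np.corrcoef(batch[:, 0], top_pc(Y_adjusted))[0, 1]), 2))
```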

Microarray data from tissue samples taken from three different regions of the brain (anterior cingulate cortex, dorsolateral prefrontal cortex, and cerebellum) of ten individuals. The 30 tissue samples were separately analyzed in three different laboratories (UC Davis, UC Irvine, U of Michigan). The left plot shows the first two principal components of the data. The data cluster by laboratory, indicating that most of the variation in the data is systematic error that arises due to uncontrolled variation in laboratory conditions. The second plot shows the data after adjustment. The data now cluster by brain region (cortex vs. cerebellum). The data are from GEO (GSE2164).


Naisyin Wang


Naisyin Wang, PhD, is Professor of Statistics, College of Literature, Science, and the Arts, at the University of Michigan, Ann Arbor.

Prof. Wang’s main research interests involve developing models and methodologies for complex biomedical data. She has developed approaches for extracting information from imperfect data affected by measurement error and incompleteness. Her other methodological developments include model-based mixture modeling and non- and semiparametric modeling of longitudinal, dynamic and high-dimensional data. She developed approaches that first gauge the effects of measurement errors on non-linear mixed effects models and provided statistical methods to analyze such data. Most of the methods she has developed are semiparametric; one strength of such approaches is that one does not need to make certain structural assumptions about part of the model. This modeling strategy enables integration of data collected from sources that might not be completely homogeneous. Her recently developed statistical methods focus on regularized approaches to model building, selection and evaluation for high-dimensional, dynamic or functional data.

Regularized time-varying ODE coefficients of the SEI dynamic equation for the Canadian measles incidence data (Li, Zhu, Wang, 2015). Left panel: time-varying ODE coefficient curve that reflects both yearly and seasonal effects, with the regularized yearly effect (red curve) embedded; right panel: regularized (red curve), non-regularized (blue) and two-year local constant (circles) estimates of yearly effects. The new regularized method shows that the yearly effect is relatively large in the early years and decreases gradually to a constant after 1958.
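For readers unfamiliar with the setup, the sketch below writes down a generic SEI-type system in which the transmission coefficient beta(t) varies over time with a yearly trend and a seasonal oscillation; the functional form and every parameter value are invented for illustration and are not the estimates from the Canadian measles analysis.

```python
# Illustrative SEI-type system with a time-varying transmission coefficient
# beta(t); purely hypothetical parameters, shown only to make the notion of a
# "time-varying ODE coefficient" concrete.
import numpy as np
from scipy.integrate import solve_ivp

def beta(t):
    # hypothetical yearly trend plus a seasonal oscillation
    return 0.5 * np.exp(-0.05 * t) + 0.1 * np.sin(2 * np.pi * t)

def sei(t, y, sigma=10.0, mu=0.02):
    S, E, I = y
    dS = mu - beta(t) * S * I - mu * S
    dE = beta(t) * S * I - (sigma + mu) * E
    dI = sigma * E - mu * I
    return [dS, dE, dI]

sol = solve_ivp(sei, t_span=(0, 40), y0=[0.9, 0.05, 0.05])
print(sol.y[:, -1])   # S, E, I after 40 "years"
```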


Issam El Naqa


Our lab’s research interests are in the areas of oncology bioinformatics, multimodality image analysis, and treatment outcome modeling. We operate at the interface of physics, biology, and engineering. Our primary motivation is to design and develop novel approaches for unraveling cancer patients’ response to chemoradiotherapy by integrating physical, biological, and imaging information into advanced mathematical models. These models combine top-down and bottom-up approaches, apply techniques of machine learning and complex systems analysis to first principles, and are evaluated on clinical and preclinical data. They could then be used to personalize cancer patients’ chemoradiotherapy based on predicted benefit and risk and to help understand the underlying biological response to disease. These research interests are divided into the following themes:

  • Bioinformatics: design and develop large-scale data mining methods and software tools to identify robust biomarkers (-omics) of chemoradiotherapy treatment outcomes from clinical and preclinical data.
  • Multimodality image-guided targeting and adaptive radiotherapy: design and develop hardware tools and software algorithms for multimodality image analysis and understanding, feature extraction for outcome prediction (radiomics), real-time treatment optimization and targeting.
  • Radiobiology: design and develop predictive models of tumor and normal tissue response to radiotherapy. Investigate the application of these methods to develop therapeutic interventions for protection of normal tissue toxicities.
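As one concrete example of a normal-tissue response model, the sketch below implements the classic Lyman-Kutcher-Burman (LKB) NTCP formula for a toy dose-volume histogram; the parameter values are placeholders rather than clinically validated numbers, and this is a textbook model, not the lab's own outcome models.

```python
# Hedged sketch of the Lyman-Kutcher-Burman (LKB) normal tissue complication
# probability (NTCP) model: NTCP = Phi((gEUD - TD50) / (m * TD50)).
import numpy as np
from scipy.stats import norm

def gEUD(doses, volumes, n=0.5):
    # generalized equivalent uniform dose from a differential dose-volume histogram
    volumes = np.asarray(volumes) / np.sum(volumes)
    return float(np.sum(volumes * np.asarray(doses) ** (1.0 / n)) ** n)

def lkb_ntcp(doses, volumes, TD50=30.0, m=0.3, n=0.5):
    t = (gEUD(doses, volumes, n) - TD50) / (m * TD50)
    return float(norm.cdf(t))

# Toy dose-volume histogram: fractions of an organ receiving each dose (Gy).
doses = [10.0, 20.0, 30.0, 40.0]
volumes = [0.4, 0.3, 0.2, 0.1]
print(f"NTCP = {lkb_ntcp(doses, volumes):.2f}")
```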

Carol Flannagan



As a faculty member within the University of Michigan Transportation Research Institute, Dr. Flannagan currently serves as Director of the Center for Management of Information for Safe and Sustainable Transportation (CMISST) and Head of the Statistics and Methods Group for the CDC-funded UM Injury Center. Dr. Flannagan has over 20 years of experience conducting data analysis and research on injury risk related to motor vehicle crashes and was responsible for the development of a model of injury outcome that allows side-by-side comparison of public health, vehicle, roadway and post-crash interventions (utmost.umtri.umich.edu). She has also applied statistical methods to understanding and evaluating the benefits of crash-avoidance technologies, including evaluating safety in automated vehicles, and works to develop novel applications of statistics to the analysis of driving data. Her current work with CMISST involves the fusion and analysis of large state-level crash databases, which are useful in analyzing the effect of a variety of countermeasures on crash involvement and injury risk. In addition, her group is working to make data available to researchers to expand the community of experts in transportation data analysis.

Q & A with Dr. Carol Flannagan

Q:  When in your career did you realize that data (broadly speaking) could open a lot of doors, in terms of research?

I have loved data analysis since my undergraduate Experimental Design course sophomore year, when I learned about analysis of 2×2 tables. Beyond that, I didn’t necessarily see it as a career move or a research program. It was much later at UMTRI, when I started thinking about how much data we had scattered around the building and how great it would be to round up those data and share them, that I thought explicitly about data per se as opening research doors.

In 2010, I was asked to head up the Transportation Data Center at UMTRI. In spite of its name, the group was doing very limited things with crash data at the time. After a few years, the group got a new name and some support from UMOR to grow. That, along with a number of research projects over the years, has led to our current incarnation, with substantially upgraded data hosting for the State of Michigan’s crash data and strong capabilities in the analysis of all kinds of newer transportation data. For example, we have collaborated with the Engineering Systems group, GM and OnStar to conduct a series of studies funded by NHTSA to analyze driver response to crash avoidance systems using OnStar data on a large scale.

With the MIDAS transportation project, we are moving forward on several fronts. Most importantly, we are developing a high-performance data access and processing system to handle the large driving datasets that UMTRI has collected (and continues to collect). This system can integrate automated video processing with analysis and/or querying of time series data (e.g., speed, acceleration, etc.). For my group, this new capability opens new doors in both data sharing and data analysis.

Q: How has the field, and specifically the use of “Big Data,” changed during your career?

Wow… My first analyses as an undergrad were computed by hand as a matter of course. In grad school, since we needed to access terminals to use MTS (the Michigan Terminal System), it was often faster to just do it by hand (using a calculator and paper and pencil). What that meant is that computation was a true limitation on how much data could be analyzed. Even after personal computers were a regular fixture in research (and at home), computation of any size (e.g., more subjects or more variables) was still slow. My dissertation included a simulation component and I used to set simulations running every night before I went to bed. The simulation would finish around 2-3 a.m., and the computer would suddenly turn on, sending bright light into the room and waking my husband and me up. Those simulations can all be done in minutes or seconds now. Imagine how much more I could have done with current systems!

In the last 5 years, I’ve gotten much more involved in analysis of driving data to understand benefits of crash avoidance systems. We are often searching these data for extracts that contain certain kinds of situations that are relevant to such systems (e.g., hard braking events related to forward collision warnings). This is one use of Big Data—to observe enough that you can be sure of finding examples of what you’re interested in. However, in the last year or two, I have been more involved in full-dataset analyses, large-scale triggered data collection, and large-scale kinematic simulations. These are all enabled by faster computing and because of that, we can find richer answers to more questions.

Q: What role does your association with MIDAS play in this continuing development?

One of the advantages of being associated with MIDAS is that it gives me access to faculty who are interested in data as opposed to transportation. It’s not that I need other topic areas, but I often find that when I listen to data and methods talks in totally different content areas, I can see how those methods could apply to problems I care about. For example, the challenge grant in the social sciences entitled “Computational Approaches for the Construction of Novel Macroeconomic Data,” is tackling a problem for economists that transportation researchers share. That is, how do you help researchers who may not have deep technical skill in data querying get at extracts from very large, complex datasets easily? Advances that they make in that project could be transferred to transportation datasets, which share many of the characteristics of social-media datasets.

Another advantage is the access to students who are in data-oriented programs (e.g., statistics, data science, computer science, information, etc.) who want real-world experiences using their skills. I have had a number of data science students reach out to me and worked closely with two of them on independent studies where we worked together on Big Data analysis problems in transportation. One was related to measuring and tracking vehicle safety in automated vehicles and the other was in text mining of a complaints dataset to try to find failure patterns that might indicate the presence of a vehicle defect.

Q: What are the next research challenges for transportation generally and your work specifically?

Transportation is in the midst of a once-in-a-lifetime transformation. In my lifetime, I expect to be driven to the senior center and my gerontologist appointments by what amounts to a robot (on wheels). Right now, I’m working on research problems that will help bring that to fruition. Data science is absolutely at the core of that transformation and the problems are wide open. I’m particularly concerned that our research datasets need to be managed in a way that opens them to a broad audience where we support both less technical interactions (e.g., data extracts with a very simple querying system) and much more technical interactions (e.g., full-dataset automatic video processing to identify drivers using cell phones in face video). I’m also interested in new and efficient large-scale data collection methods, particularly those based on the idea of “smart” or triggered sampling rather than just acquisition of huge datasets “because the information I want is somewhere in there…if only I can find it…”

My own work has mostly been about safety, and although we envision a future without traffic fatalities (or ideally without crashes), the fatality count has gone up over the past couple of years. Thus, I spend time analyzing crash data for new insights into changes in safety over time and even how the economy influences fatality counts. Much of my work is for the State of Michigan, predicting how many fatalities there will be in the next few years and identifying the benefits and potential benefits of various interventions. My web tool, UTMOST (the Unified Theory Mapping Opportunities for Safety Technologies), allows visualization of the benefits, or potential benefits, of many different kinds of interventions, including public health policy, vehicle technology, and infrastructure improvements. I think this kind of integrated approach will be necessary to achieve the goal of zero fatalities over the next 20 years. Finally, a significant part of my research program will continue to be the development and use of statistical methods to measure, predict, and understand safety in transportation. How can we tell if AVs are safer than humans? What should future traffic safety data systems look like? How can we integrate data systems (e.g., crash and injury outcome) to better understand safety and figure out how to prioritize countermeasure development? How can machine learning and other big data statistical tools help us make the most of driving data?