Research Overview

Single-cell genomics, rooted in single-cell sequencing, has great potential for providing insight into fundamental questions in biomedical science and driving new health science discoveries, such as: How many cell types and functional states are there in a given tissue? What is the range of natural variation within a cell type, and how is such variability affected by genetic and environmental factors? What happens at the single-cell level during cell fate determination in the developmental process? How does cellular heterogeneity within a tumor affect response to therapy, and how can we address this with precision medicine? The list is endless. However, the explosive growth of single-cell sequencing technologies also brings new computational challenges. One major challenge is the “sparse read counts data”: because of the minuscule amount of genetic material in a single cell, fragments of the genome are often missing from the sequencing read-out, and existing tools are ineffective in addressing this missing-data problem and piecing together reliable genomic information.

The research team will establish a Michigan Center for Single-Cell Genomic Data Analytics and connect mathematicians and data scientists with biological researchers to develop, evaluate, and implement a variety of cutting-edge methodologies in sparse data analysis. These methodologies will address issues in data normalization, batch effect detection and correction, marker selection, classification, rare class identification, differential expression, and network and phylogenetic inference. They will also provide tools for cyclic or time-series data and enable information integration across data types. The team will apply these methodologies to four biological questions to test their utility: 1) intra-tumor heterogeneity, cancer stem cells in metastasis and treatment resistance, and cancer genome evolution; 2) spermatogenesis as a model for cell fate determination during development; 3) transcriptional complexity and gene regulation at the single-cell level; 4) molecular changes at the single-cell level as a result of environmental exposures and windows of susceptibility.

The outcome of this research project will have a much broader impact on biomedical research, beyond the four research areas that will be used as test cases. Sparse data analytics also has wide application beyond the health sciences. For example, Electronic Health Records are inherently sparse, as are consumer data (purchasing, rating, or video viewing habits), location and usage data of mobile devices, connectivity in social networks, and medical imaging or land imaging by satellites. In short, this line of research is conceptually connected with many areas of active research in data science and will produce general-purpose tools for many research areas.

Research Team

  • Jun Li, Associate Professor, Departments of Human Genetics and Computational Medicine and Bioinformatics
  • Anna Gilbert, Professor, Department of Mathematics
  • Laura Balzano, Assistant Professor, Department of Electrical Engineering and Computer Science
  • Justin Colacino, Assistant Professor, Departments of Environmental Health Sciences and Nutritional Sciences
  • Johann Gagnon-Bartsch, Assistant Professor, Department of Statistics
  • Yuanfang Guan, Assistant Professor, Department of Computational Medicine and Bioinformatics
  • Sue Hammoud, Assistant Professor, Departments of Human Genetics, Obstetrics and Gynecology, and Urology
  • Gil Omenn, Professor, Departments of Computational Medicine and Bioinformatics, Human Genetics and School of Public Health
  • Clay Scott, Associate Professor, Department of Electrical Engineering and Computer Science
  • Max Wicha, Professor, Department of Internal Medicine
  • Xiang Zhou, Assistant Professor, Department of Biostatistics


  • The center is hosting a symposium on Aug. 10, 2017 in Palmer Commons. Please see the symposium webpage for more details, an agenda, and a link to registration.
  • Members of this research team come from 10 research labs. Several students are now involved in the study. They have made progress in defining the challenges for each lab.
  • In one study, the team has analyzed single-cell RNA-seq data for ~13,000 mouse germ cells. Using principal component analysis and known developmental markers to group cells, they were able to interpret the major cell types involved in spermatogenesis. This global survey of cellular heterogeneity led to a finer-grained classification of both common and rare cell subtypes, and the known and newly discovered markers provide a tool for future experiments.
  • The team has submitted three grant proposals to federal funding agencies and a private foundation.
  • The team presented its work related to the MIDAS Challenge Award at the Computational Biology Workshop in Statistical Challenges in Single-Cell Biology and the Keystone Symposium in Single Cell Omics.

(July 2017)

3D density map of 13,000 germ cells, as distributed in their gene expression PC1-PC2 space. Regions of higher cell density are shown as taller peaks. By Sue Hammoud, Chris Green, Qianyi Ma, Jun Li.
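
As a rough illustration of the kind of analysis described above (principal component analysis of single-cell counts, followed by a cell-density map in PC1-PC2 space), the following Python sketch runs on synthetic data. It is not the team's pipeline: the normalization scheme, the function name pca_density, the bin count, and the simulated counts are all assumptions made for illustration.

```python
import numpy as np


def pca_density(counts, n_bins=20):
    """Project cells onto PC1-PC2 and bin them into a 2D density grid.

    counts: (n_cells, n_genes) array of non-negative read counts.
    Returns (scores, density, x_edges, y_edges).
    """
    counts = np.asarray(counts, dtype=float)

    # Library-size normalization: scale each cell to 10,000 total counts,
    # then log-transform (an illustrative convention, not from the study).
    totals = counts.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0  # avoid division by zero for empty cells
    logged = np.log1p(counts / totals * 1e4)

    # Center each gene, then take the top two principal components via SVD.
    centered = logged - logged.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :2] * s[:2]

    # Count cells per bin over the PC1-PC2 plane; bins with many cells
    # correspond to the taller peaks in a 3D density plot.
    density, x_edges, y_edges = np.histogram2d(
        scores[:, 0], scores[:, 1], bins=n_bins
    )
    return scores, density, x_edges, y_edges


# Synthetic example: two hypothetical cell populations with different
# per-gene expression rates, sampled as Poisson counts.
rng = np.random.default_rng(0)
n_genes = 50
rates_a = rng.gamma(2.0, 1.0, n_genes)
rates_b = rng.gamma(2.0, 1.0, n_genes)
counts = np.vstack([
    rng.poisson(rates_a, (300, n_genes)),
    rng.poisson(rates_b, (200, n_genes)),
])

scores, density, x_edges, y_edges = pca_density(counts)
```

The density grid can be passed to any surface-plotting routine to draw the peaks. Real single-cell data would usually also need gene filtering and batch correction, which this sketch omits.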