
MIDAS Seminar Series Presents: Jun Li, PhD – University of Michigan

September 23 @ 3:30 pm - 4:30 pm

Weiser Hall, 10th Floor

Jun Li, PhD

Professor of Human Genetics

Professor & Associate Chair for Research of Computational Medicine and Bioinformatics

Faculty, Center for Statistical Genetics; Comprehensive Cancer Center

Member, Depression Center; Michigan Diabetes Research Center; Michigan Metabolomics & Obesity Center

Co-Director, Michigan Center for Single-Cell Genomic Data Analytics

 

Benchmarking at scale: comparing analysis workflows for single-cell genomic data

The rapid adoption of single-cell RNA sequencing (scRNA-seq) has created a new pressure point in computational analyses. As of July 2019, >450 tools have appeared to address tasks such as normalization, clustering, and imputation. However, the community still struggles to identify the best tool(s) for any given task. When publishing a method, authors typically show how it outperforms others in author-defined settings, using real data with presumed “truth”, sometimes supplemented with synthetic data simulated under specific models (e.g., clusters or continuous trajectories). Comparative re-evaluation of available tools tends to be limited to default workflows, using simulations that are not community-agreed or not easily extendable. To address the difficulty of standardized benchmarking at a large scale, we created >1000 archival-quality simulated scRNA-seq datasets with complete knowledge of their underlying clusters, and used them to test 15 clustering algorithms over 225 workflows. The datasets are transcript count matrices, linked in a hyper-grid of parameters to cover a range of models and known degrees of difficulty. The differential performance of the 225 workflows across the >1000 datasets allowed both global statistical control of the model space and fine-grained assessment of the algorithmic decisions affecting performance. I will also discuss our vision of developing guidelines for learning statistically relevant features from real datasets and adjusting the simulations accordingly, so that the open-source in silico data are sufficiently realistic: matching the empirical data and platform to arbitrary closeness, and reusable at any scale. The ultimate goal of this research is to build a general-purpose support system, including evolving knowledge of available algorithms and checklists for making claims, for mass customization of new pipelines based on the statistical properties of the data rather than the biological topic.
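
As a rough illustration of the benchmarking idea in the abstract (simulated datasets with known clusters, laid out on a grid of parameters and scored across several workflows), here is a minimal Python sketch. It is not the speaker's pipeline: the one-dimensional Gaussian "datasets", the parameter names (n_clusters, separation, seed), and the two toy workflows (equal_width_bins, quantile_bins) are all assumptions made for illustration. Agreement with the known clusters is scored with the adjusted Rand index, one common choice that the abstract does not itself specify.

```python
import itertools
import random
from collections import Counter
from math import comb


def adjusted_rand_index(truth, pred):
    """Adjusted Rand index between two labelings of the same items."""
    total = comb(len(truth), 2)
    if total == 0:
        return 1.0
    sum_cells = sum(comb(c, 2) for c in Counter(zip(truth, pred)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(truth).values())
    sum_cols = sum(comb(c, 2) for c in Counter(pred).values())
    expected = sum_rows * sum_cols / total
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:  # e.g. both labelings are a single cluster
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


def simulate(n_clusters, separation, seed, cells_per_cluster=50):
    """Toy stand-in for a simulated dataset: 1-D values with known clusters."""
    rng = random.Random(seed)
    values, truth = [], []
    for k in range(n_clusters):
        for _ in range(cells_per_cluster):
            values.append(rng.gauss(k * separation, 1.0))
            truth.append(k)
    return values, truth


def equal_width_bins(values, k):
    """Hypothetical workflow 1: cut the value range into k equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k or 1.0
    return [min(int((v - lo) / width), k - 1) for v in values]


def quantile_bins(values, k):
    """Hypothetical workflow 2: cut the sorted values into k equal-size groups."""
    order = sorted(range(len(values)), key=values.__getitem__)
    labels = [0] * len(values)
    for rank, idx in enumerate(order):
        labels[idx] = rank * k // len(values)
    return labels


WORKFLOWS = {"equal_width": equal_width_bins, "quantile": quantile_bins}

# Hyper-grid of simulation parameters: every combination becomes one dataset.
GRID = list(itertools.product((2, 3), (0.5, 2.0, 8.0), (0, 1)))

results = {}
for n_clusters, separation, seed in GRID:
    values, truth = simulate(n_clusters, separation, seed)
    for name, workflow in WORKFLOWS.items():
        score = adjusted_rand_index(truth, workflow(values, n_clusters))
        results[(name, n_clusters, separation, seed)] = score

if __name__ == "__main__":
    for key in sorted(results):
        print(key, round(results[key], 3))
```

Because every dataset carries its generating parameters, the scores in results can be grouped by workflow or by parameter (for example, by separation as a proxy for difficulty), which is the kind of comparison across the model space that the abstract describes.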

Bio: The Li lab studies the genetic and functional basis of complex human diseases using genomic approaches. Their current NIH-supported projects include analyses of spontaneous mutation patterns in the human genome (NIGMS R01), multi-omic studies of a genetic rat model of addiction behavior (NIDA U01), and a rat model of metabolic health (NIDDK R01). They are part of the MoTrPAC Consortium (U24, NIH Common Fund), which seeks to discover the molecular transducers of the health benefits of physical exercise. Dr. Li co-directs the Michigan Center for Single-Cell Genomic Data Analytics, which aims to build a strong computational infrastructure to support the rigorous use of single-cell genomic data. An overarching theme in the Li lab is the responsible use of complex data in transparent, reproducible, and community-extendable research.

 

For more information on MIDAS or the Seminar Series, please contact midas-contact@umich.edu. MIDAS gratefully acknowledges Wacker Chemie AG for its generous support of the MIDAS Seminar Series.

Venue

Weiser Hall, 10th Floor
500 Church Street
Ann Arbor, MI 48109 United States