Workshop co-chaired by MIDAS co-director Prof. Hero releases proceedings on inference in big data


The National Academies Committee on Applied and Theoretical Statistics has released the proceedings of its June 2016 workshop, “Refining the Concept of Scientific Inference When Working with Big Data.” The workshop was co-chaired by Alfred Hero, MIDAS co-director and the John H. Holland Distinguished University Professor of Electrical Engineering and Computer Science.

The report can be downloaded from the National Academies website.

The workshop explored four key issues in scientific inference:

  • Inference about causal discoveries driven by large observational data
  • Inference about discoveries from data on large networks
  • Inference about discoveries based on integration of diverse datasets
  • Inference when regularization is used to simplify fitting of high-dimensional models

The workshop brought together statisticians, data scientists and domain researchers from different biomedical disciplines to identify promising new methodological developments and to highlight potential areas for future research. It was partially funded by the National Institutes of Health Big Data to Knowledge Program and the National Science Foundation Division of Mathematical Sciences.

Biostatistics Seminar: Jonathan Terhorst, PhD Candidate, University of California, Berkeley


Jonathan Terhorst, PhD Candidate

Statistics, University of California at Berkeley

“Robust and Scalable Inference of Population History and Selection from Hundreds of Whole Genomes”

Abstract: Demographic inference refers to the problem of inferring past population events (migrations, admixture, expansions, etc.) from patterns of mutations in sampled DNA. Apart from the intrinsic appeal of understanding the origins of our species, this type of analysis is useful for forming a null model of human evolution, departures from which signal the presence of natural selection, population structure, and other interesting phenomena.

In this talk I will discuss recent statistical and computational innovations which enable us to infer demography using modern data sets consisting of hundreds of whole-genome sequences obtained from populations all over the world. These include momi, a new software package for stable and rapid computation of the expected sample frequency spectrum (SFS) under complex demographic scenarios involving numerous diverged populations, as well as SMC++, a new probabilistic framework which couples the genealogical process for a given individual with allele frequency information from a large related panel. Using these tools, I will demonstrate how we can learn about human expansion in the last 12,000 years, understand the mysterious origins of ancient DNA samples, and estimate when Europeans acquired lighter skin and the ability to digest lactose. Finally, I will discuss some of the statistical aspects of these estimators, in particular an information-theoretic lower bound on the error rate of any SFS-based demographic inference procedure.

All relevant theory will be introduced during the talk; no prior knowledge of population genetics is assumed. Portions of this work are joint with Jack Kamm, Pier Palamara, and Yun Song.

Bio: I am a PhD student in the statistics department at UC Berkeley. I’m interested in statistical and population genetics, machine learning, and more generally in developing mathematical models and software to help fellow scientists understand their data.

Light refreshments will be served at 3:10 p.m. in room 1690.

Workshop: Refining the Concept of Scientific Inference When Working with Big Data — June 8-9 (webcast)


The National Academies of Sciences, Engineering, and Medicine Committee on Applied and Theoretical Statistics is holding a workshop titled “Refining the Concept of Scientific Inference When Working with Big Data” in Washington, D.C., June 8-9.

The workshop will bring together statisticians, data scientists and domain researchers from different biomedical disciplines to explore four key issues of scientific inference:

  • Inference about causal discoveries driven by large observational data
  • Inference about discoveries from data on large networks
  • Inference about discoveries based on integration of diverse datasets
  • Inference when regularization is used to simplify fitting of high-dimensional models

Prof. Alfred Hero, co-director of the Michigan Institute for Data Science (MIDAS) and Professor of Electrical Engineering and Computer Science, is a co-chair of the event.
