MIDAS Seminar: Chris Miller, PhD, University of Michigan
January 25 @ 4:00 pm - 5:00 pm
Room 340 West Hall
Chris Miller, PhD
Associate Professor of Astronomy
Associate Professor of Physics
College of Literature, Science, and the Arts
University of Michigan
Quantify Systematics from Mislabeled Truth Tables in Supervised Learning
Abstract: Many real-world classification problems use ground truth labels created by human annotators. However, observed data is never perfect, and even labels assigned by perfect annotators can be systematically biased due to the poor quality of the data they are labeling. This bias does not come from measurement error on the part of the annotators; it is intrinsic to the observational data. We present a method for de-biasing labels that simultaneously learns a classification model, estimates the intrinsic biases in the ground truth, and provides new de-biased labels. We test our algorithm on simulated and real data and show that it is superior to standard denoising algorithms, such as instance-weighted logistic regression. We apply our technique to galaxy images and find that morphologies based on supervised machine learning trained on features such as color, shape, and concentration show significantly less bias than morphologies based on expert or citizen-science classifiers. This result holds even when there is underlying bias in the training sets used in the supervised machine learning process.
Bio: Chris Miller is a leader in astroinformatics – mixing computer science, advanced statistics, and data mining to answer key cosmological questions. His specialty is using galaxy clusters to trace the distribution of matter in the universe. After years exploiting the Sloan Digital Sky Survey, he is now heavily involved in the Dark Energy Survey and Dark Energy Spectroscopic Survey, two of the largest current astronomical survey efforts. Professor Miller used his galaxy-cluster research to support the Big Bang theory by aligning findings from opposing cosmological epochs. He was the first to see the signatures of sound waves from the very early universe that were “frozen into” the matter-density distribution that we observe today. His analysis of the current universe synced neatly with the acoustic oscillations of the early universe detected in the cosmic microwave background, and demonstrated the power of combining big-survey data with focused observational follow-up data. He has published in a variety of venues outside his own fields of physics and astronomy, including NIPS, ICPR, The Annals of Applied Statistics, and Statistical Science.
Background: BS, Penn State; PhD, University of Maine. Postdoc (2000-2005), Carnegie Mellon; Faculty (2005-2009), National Optical Astronomy Observatory/Chile. Hired in 2010 at U-M under a presidential initiative for advancing data mining research.