The 2nd Annual Data for Public Good Symposium

Do you have experience in working alongside community partners in data analysis or program evaluation? Do you want to connect with others who are using their skills for public good? National efforts from organizations such as DataKind, Data Science for Social Good, and Statistics without Borders have been expanding in recent years as more individuals recognize their potential to impact social change.  Great things can happen when individuals are empowered to dedicate time, resources, and knowledge to the pursuit of public good. Whether we work in the foreground or the background, we can all contribute to improving the lives of those around us.

Statistics in the Community (STATCOM), in collaboration with the Center for Education Design, Evaluation, and Research (CEDER) and the Community Technical Assistance Collaborative (CTAC), invites you to attend the 2nd Annual Data for Public Good Symposium hosted by the Michigan Institute for Data Science (MIDAS). The symposium will take place on Tuesday, February 19, 2019, and will showcase the many research efforts and community-based partnerships at U-M that focus on improving humanity by using data for public good. If you are interested in attending, please register here.

10:00 – 10:30: Registration and Networking
10:30 – 11:30: Presentations

  • Partners for Preschool: The Added Value of Learning Activities at Home During the Preschool Year, Amanda Ketner, School of Education
  • University-Community Partnership to Support Ambitious STEM Teaching: Leveraging University of Michigan expertise in education, research, and evaluation to support innovative, interactive teaching across the S.E. Michigan region and beyond, C. S. Hearn, Center for Education Design, Evaluation, and Research (CEDER)
  • Open Data Flint, Stage II, Kaneesha Wallace, MICHR
  • Research-Practice Partnerships at the Youth Policy Lab, A. Foster, ISR Youth Policy Lab and School of Education
  • The LOOP Estimator: Adjusting for Covariates in Randomized Experiments, Edward Wu, Statistics

11:30 – 01:00: Lunch/Poster Session
01:00 – 02:00: Presentations

  • Barrier Busters: Unconditional Cash Transfers as a Strategy to Promote Economic Self-Sufficiency, Elise Gahan, School of Public Health
  • Implementing Trauma-Informed Care at University Libraries, Monte-Angel Richardson, School of Social Work
  • Why did the global crude oil price start to rise again after 2016?, Shin Heuk Kang, Economics
  • Poverty and economic hardship in Michigan communities: Data from the Michigan Public Policy Survey (MPPS), Natalie Fitzpatrick, Center for Local, State, and Urban Policy
  • Understanding Networks of Influence on U.S. Congressional Members’ Public Personae on Twitter, Angela Schopke, Chris Bredernitz, Caroline Hodge, School of Information

02:00 – 02:30: UM Student Organization Presentations
02:30 – 04:30: Workshop Debrief & Closing

About the Organizers: STATCOM is a community outreach organization offering the expertise of statistics graduate students – free of charge – to nonprofit, governmental, and community organizations. CTAC is a community-university partnership convened to serve a universal need identified by community partners around data and evaluation. CEDER is a School of Education center devoted exclusively to offering high-quality designs, evaluations, and research on teaching, learning, leadership, and policy at multiple levels of education. This symposium is part of our effort to bring together university organizations that promote similar ideals and individuals whose research provides a service for the greater good.

Questions: Please contact salernos@umich.edu.






Statistical Analysis with R

This is a two-day workshop (February 4 and 5) in R, which is a free and open source environment for data analysis and statistical computing. While R contains many built-in statistical procedures, a powerful feature of R is the facility for users to extend these procedures to suit their own needs. Excellent graphing capability is another reason R is gaining wide popularity.

  • How to Obtain R
  • Help Tools
  • Importing / Exporting Data
  • Data Management
  • Descriptive and Exploratory Statistics
  • Common Statistical Analyses (t-test, Regression Modeling, ANOVA, etc.)
  • Graphics
  • Creating Functions


ASA Symposium on Data Science & Statistics

Beyond Big Data: Leading the Way

The ASA’s newest conference, the Symposium on Data Science & Statistics, will take place in Reston, Virginia, May 16-19, 2018. The symposium is designed for data scientists, computer scientists, and statisticians analyzing and visualizing complex data.

The annual SDSS will combine data science and statistical machine learning with the historical strengths of the Interface Foundation of North America (IFNA) in computational statistics, computing science, and data visualization. It will continue the IFNA’s tradition of excellence by providing an opportunity for researchers and practitioners to share knowledge and establish new collaborations.

Offering sessions centered on the following six topic areas:

  • Data Science
  • Machine Learning
  • Computational Statistics
  • Data Visualization
  • Computing Science
  • Applications

Key Dates:
December 5, 2017 – Contributed and E-Poster Online Abstract Submission Opens
January 18, 2018 – Contributed and E-Poster Online Abstract Submission Closes
February 1, 2018 – Conference Registration Opens

Interdisciplinary Seminar in Quantitative Methods (ISQM): Arthur Spirling, PhD, New York University

Arthur Spirling, Ph.D.

Associate Professor, Politics, Data Science

New York University


‘Text Preprocessing for Unsupervised Learning: Why It Matters, When It Misleads, and What to Do about It’

ABSTRACT:  Despite the popularity of unsupervised techniques for political science text-as-data research, the importance and implications of preprocessing decisions in this domain have received scant systematic attention. Yet, as we show, such decisions have profound effects on the results of real models for real data. We argue that substantive theory is typically too vague to be of use for feature selection, and that the supervised literature is not necessarily a helpful source of advice. To aid researchers working in unsupervised settings, we introduce a statistical procedure and software that examines the sensitivity of findings under alternate preprocessing regimes. This approach complements a researcher’s substantive understanding of a problem by providing a characterization of the variability changes in preprocessing choices may induce when analyzing a particular dataset. In making scholars aware of the degree to which their results are likely to be sensitive to their preprocessing decisions, it aids replication efforts.
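To make the idea of sensitivity to preprocessing concrete, here is a minimal Python sketch. It is not the speaker's procedure or software. It is a toy illustration under stated assumptions: three on/off preprocessing steps, a hypothetical stopword list, bag-of-words sets compared by Jaccard distance, and a hypothetical summary (`sensitivity`) that reports how often a document's nearest neighbor changes relative to the unprocessed baseline.

```python
from itertools import product
import re

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {"the", "a", "of", "and", "to", "in", "is"}

def preprocess(text, lowercase, strip_punct, drop_stopwords):
    """Apply the selected preprocessing steps and return the set of tokens."""
    if lowercase:
        text = text.lower()
    if strip_punct:
        text = re.sub(r"[^\w\s]", "", text)
    tokens = text.split()
    if drop_stopwords:
        tokens = [t for t in tokens if t.lower() not in STOPWORDS]
    return set(tokens)

def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|; two empty sets are treated as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)

def nearest_neighbor(docs, flags):
    """For each document, the index of its closest other document under one regime."""
    bags = [preprocess(d, *flags) for d in docs]
    result = []
    for i, bag in enumerate(bags):
        others = [(jaccard_distance(bag, bags[j]), j)
                  for j in range(len(bags)) if j != i]
        result.append(min(others)[1])  # ties go to the lower index
    return result

def sensitivity(docs):
    """Share of preprocessing regimes whose nearest-neighbor structure
    differs from the baseline regime (no preprocessing at all)."""
    regimes = list(product([False, True], repeat=3))
    baseline = nearest_neighbor(docs, regimes[0])
    changed = sum(nearest_neighbor(docs, r) != baseline for r in regimes[1:])
    return changed / (len(regimes) - 1)

if __name__ == "__main__":
    docs = ["Vote now", "vote later", "Vote later!"]
    print(sensitivity(docs))
```

A value near 0 means the toy similarity structure is stable across these regimes, while a value near 1 means almost every preprocessing choice changes which documents look alike.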

BIO: Arthur Spirling is an Associate Professor of Politics and Data Science at New York University. He is the Deputy Director and the Director of Graduate Studies at the Center for Data Science, and Chair of the Education and Training Working Group of the Moore-Sloan Data Science Environment. He specializes in political methodology and legislative behavior, with an interest in the application of texts-as-data, Bayesian statistics, item response theory and generalized linear models in political science. His substantive field is comparative politics, and he focuses primarily on the United Kingdom. He received his PhD from the University of Rochester, Department of Political Science, in 2008. From 2008 to 2015, he was an Assistant Professor and then the John L. Loeb Associate Professor of the Social Sciences in the Department of Government at Harvard University.

LOGISTICS: Wednesday, October 25, 4 p.m. LOCATION: 3222 Angell Hall. Angell Hall is connected to Haven Hall. If you go to the third floor of Haven Hall, you can follow a walking path to get to the third floor of Angell Hall without ever leaving the building. See walking map here: https://hr.umich.edu/sites/default/files/m-h-t-a-halls.pdf.

ASA Conference: Women in Statistics and Data Science, La Jolla, California

The American Statistical Association invites you to join us at the 2017 Women in Statistics and Data Science Conference in La Jolla, California—the only conference for the field tailored specifically for women!

Join us to “share WISDOM (Women in Statistics, Data science, and -OMics).”

WSDS will gather professionals and students from academia, industry, and the government working in statistics and data science. Find unique opportunities to grow your influence, your community, and your knowledge.

Whether you are a student, early-career professional, or an experienced statistician or data scientist, this conference will deliver new knowledge and connections in an intimate and comfortable setting.

Interdisciplinary Seminar in Quantitative Methods (ISQM): Christian Hansen, PhD, University of Chicago

Christian B. Hansen, Ph.D.

Wallace W. Booth Professor of Econometrics and Statistics

The University of Chicago Booth School of Business

‘Targeted Undersmoothing’

ABSTRACT: This paper proposes a post-model selection inference procedure, called targeted undersmoothing, designed to construct uniformly valid confidence sets for functionals of sparse high-dimensional models, including dense functionals that may depend on many or all elements of the high-dimensional parameter vector. The confidence sets are based on an initially selected model and two additional models which enlarge the initial model. We apply the procedure in two empirical examples: estimating heterogeneous treatment effects in a job training program and estimating profitability from an estimated mailing strategy in a marketing campaign. We also illustrate the procedure’s performance through simulation experiments.

BIO: Christian B. Hansen studies applied and theoretical econometrics, the uses of high-dimensional statistical methods in economic applications, estimation of panel data models, quantile regression, and weak instruments. In 2008, Hansen was named a Neubauer Family Faculty Fellow, and he was named the Wallace W. Booth Professor in 2014. Hansen’s recent research has focused on the uses of high-dimensional data and methods in economics applications. The papers “Sparse Models and Methods for Optimal Instruments with an Application to Eminent Domain” with A. Belloni, D. Chen, and V. Chernozhukov (Econometrica, 2012) and “Inference on Treatment Effects after Selection amongst High-Dimensional Controls” with A. Belloni and V. Chernozhukov (Review of Economic Studies, 2014) present approaches to estimating structural or treatment effects from economic data in canonical instrumental variables and treatment effects models. These papers are extended in “Valid Post-Selection and Post-Regularization Inference: An Elementary, General Approach” with V. Chernozhukov and M. Spindler (Annual Review of Economics, 2015) and the forthcoming papers “Inference in High Dimensional Panel Models with an Application to Gun Control” with A. Belloni, V. Chernozhukov, and D. Kozbur (Journal of Business and Economic Statistics) and “Program Evaluation with High-Dimensional Data” with A. Belloni, V. Chernozhukov, and I. Fernández-Val (Econometrica).

Hansen has published articles regarding identification and estimation in panel data models, inference with data that may be spatially and temporally dependent, quantile regression, and instrumental variables models with weak or many instruments. His published work has appeared in several journals including Econometrica, the Journal of Business and Economic Statistics, the Journal of Econometrics, and the Review of Economics and Statistics.  He graduated from Brigham Young University with a bachelor’s degree in economics in 2000. In 2004, he received a PhD in economics from the Massachusetts Institute of Technology, where he was a graduate research fellow of the National Science Foundation. He joined the Chicago Booth faculty in 2004.
List of upcoming speakers on the ISQM website [https://www.isr.umich.edu/cps/events/isqm/]

UM Biostatistics Seminar: Veronika Rockova, PhD, University of Chicago

Veronika Rockova, Ph.D.

Assistant Professor in Econometrics and Statistics

The University of Chicago Booth School of Business


‘Fast Bayesian Factor Analysis via Automatic Rotations to Sparsity’

Abstract: Rotational post hoc transformations have traditionally played a key role in enhancing the interpretability of factor analysis. Regularization methods also serve to achieve this goal by prioritizing sparse loading matrices. In this work, we bridge these two paradigms with a unifying Bayesian framework. Our approach deploys intermediate factor rotations throughout the learning process, greatly enhancing the effectiveness of sparsity inducing priors. These automatic rotations to sparsity are embedded within a PXL-EM algorithm, a Bayesian variant of parameter-expanded EM for posterior mode detection. By iterating between soft-thresholding of small factor loadings and transformations of the factor basis, we obtain (a) dramatic accelerations, (b) robustness against poor initializations, and (c) better oriented sparse solutions. To avoid the prespecification of the factor cardinality, we extend the loading matrix to have infinitely many columns with the Indian buffet process (IBP) prior. The factor dimensionality is learned from the posterior, which is shown to concentrate on sparse matrices. Our deployment of PXL-EM performs a dynamic posterior exploration, outputting a solution path indexed by a sequence of spike-and-slab priors. For accurate recovery of the factor loadings, we deploy the spike-and-slab LASSO prior, a two-component refinement of the Laplace prior. A companion criterion, motivated as an integral lower bound, is provided to effectively select the best recovery. The potential of the proposed procedure is demonstrated on both simulated and real high-dimensional data, which would render posterior simulation impractical. Supplementary materials for this article are available online.
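As a small illustration of the soft-thresholding step mentioned in the abstract, the sketch below shows the standard soft-thresholding operator applied entrywise to a loading matrix. It is not the PXL-EM algorithm: the factor rotations, the spike-and-slab LASSO prior, and the IBP prior are all omitted, and the single fixed threshold `lam` and the helper name `sparsify_loadings` are assumptions made for this example only.

```python
def soft_threshold(x, lam):
    """Shrink x toward zero by lam; values with |x| <= lam become exactly 0."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0

def sparsify_loadings(loadings, lam):
    """Apply soft-thresholding to every entry of a loading matrix
    given as a list of rows (hypothetical helper for this sketch)."""
    return [[soft_threshold(v, lam) for v in row] for row in loadings]

if __name__ == "__main__":
    loadings = [[2.0, 0.3], [-0.25, -2.0]]
    for row in sparsify_loadings(loadings, 0.5):
        print(row)
```

Small loadings are set exactly to zero while large ones are only shrunk, which is how thresholding of this kind produces a sparse loading matrix.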

Bio: Veronika Rockova is Assistant Professor in Econometrics and Statistics at the University of Chicago Booth School of Business. Her work brings together statistical methodology, theory and computation to develop high-performance tools for analyzing large datasets. Her research interests reside at the intersection of Bayesian and frequentist statistics, and focus on: data mining, variable selection, optimization, non-parametric methods, factor models, high-dimensional decision theory and inference. She has authored a variety of published works in top statistics journals. In her applied work, she has contributed to the development of risk stratification and prediction models for public reporting in healthcare analytics.

Prior to joining Booth, Rockova held a Postdoctoral Research Associate position at the Department of Statistics of the Wharton School at the University of Pennsylvania. Rockova holds a PhD in biostatistics from Erasmus University (The Netherlands), an MSc in biostatistics from Universiteit Hasselt (Belgium) and both an MSc in mathematical statistics and a BSc in general mathematics from Charles University (Czech Republic).

Besides enjoying statistics, she is a keen piano player.


Light refreshments for seminar guests will be served at 3:00 p.m. in 3755.

Liza Levina, PhD, Chosen IMS Medallion Lecturer in 2019

Professor Liza Levina has been selected to present an Institute of Mathematical Statistics (IMS) Medallion Lecture at the 2019 Joint Statistical Meetings (JSM).

Each year eight Medallion Lecturers are chosen from across all areas of statistics and probability by the IMS Committee on Special Lectures. The Medallion nomination is an honor and an acknowledgment of a significant research contribution to one or more areas of research. Each Medallion Lecturer will receive a Medallion in a brief ceremony preceding the lecture.

Jacob Abernethy and Eric Schwartz: Statistical and Algorithmic Tools to Aid Recovery in Flint

ABSTRACT: Recovery from the Flint Water Crisis has been hindered by uncertainty in both the water testing process and the causes of contamination. On the other hand, city, state, and federal officials have been collecting and organizing a significant amount of data, including many thousands of water samples, information on pipe materials, and city records. Combining all of this information, and utilizing state-of-the-art algorithmic and statistical tools, we have been able to develop a clearer picture as to the source of the problems, to accurately estimate the greatest risks, and to more efficiently direct resources towards recovery.

Bio: Jacob Abernethy is an Assistant Professor in the EECS Department at the University of Michigan, Ann Arbor. He finished his PhD in Computer Science at UC Berkeley, and was a Simons postdoctoral fellow at the University of Pennsylvania. Jake’s primary interest is in Machine Learning, and he likes discovering connections between Optimization, Statistics, and Economics.

Bio: Eric Schwartz is an Assistant Professor of Marketing at the University of Michigan’s Ross School of Business in Ann Arbor. He received his PhD in Marketing from the Wharton School at the University of Pennsylvania in 2013. His research focuses on predicting customer behavior, understanding its drivers, and examining how firms actively manage their customer relationships through interactive marketing. The quantitative methods he uses are primarily Bayesian statistics, machine learning, dynamic programming, and field experiments.