Interdisciplinary Seminar in Quantitative Methods (ISQM): Arthur Spirling, PhD, New York University


Arthur Spirling, Ph.D.

Associate Professor, Politics, Data Science

New York University


‘Text Preprocessing for Unsupervised Learning: Why It Matters, When It Misleads, and What to Do about It’

ABSTRACT: Despite the popularity of unsupervised techniques for political science text-as-data research, the importance and implications of preprocessing decisions in this domain have received scant systematic attention. Yet, as we show, such decisions have profound effects on the results of real models for real data. We argue that substantive theory is typically too vague to be of use for feature selection, and that the supervised literature is not necessarily a helpful source of advice. To aid researchers working in unsupervised settings, we introduce a statistical procedure and software that examines the sensitivity of findings under alternate preprocessing regimes. This approach complements a researcher’s substantive understanding of a problem by providing a characterization of the variability that changes in preprocessing choices may induce when analyzing a particular dataset. In making scholars aware of the degree to which their results are likely to be sensitive to their preprocessing decisions, it aids replication efforts.

BIO: Arthur Spirling is an Associate Professor of Politics and Data Science at New York University. He is the Deputy Director and the Director of Graduate Studies at the Center for Data Science, and Chair of the Education and Training Working Group of the Moore-Sloan Data Science Environment. He specializes in political methodology and legislative behavior, with an interest in the application of texts-as-data, Bayesian statistics, item response theory and generalized linear models in political science. His substantive field is comparative politics, and he focuses primarily on the United Kingdom. He received his PhD from the University of Rochester, Department of Political Science, in 2008. From 2008 to 2015, he was an Assistant Professor and then the John L. Loeb Associate Professor of the Social Sciences in the Department of Government at Harvard University.

LOGISTICS: Wed October 25, 4pm. LOCATION: 3222 Angell Hall. Angell Hall is connected to Haven Hall. If you go to the third floor of Haven Hall, you can follow a walking path to get to the third floor of Angell Hall without ever leaving the building. See walking map here:

Interdisciplinary Seminar in Quantitative Methods (ISQM): Christian Hansen, PhD, University of Chicago



Christian B. Hansen, Ph.D.

Wallace W. Booth Professor of Econometrics and Statistics

The University of Chicago Booth School of Business

‘Targeted Undersmoothing’

ABSTRACT: This paper proposes a post-model selection inference procedure, called targeted undersmoothing, designed to construct uniformly valid confidence sets for functionals of sparse high-dimensional models, including dense functionals that may depend on many or all elements of the high-dimensional parameter vector. The confidence sets are based on an initially selected model and two additional models which enlarge the initial model. We apply the procedure in two empirical examples: estimating heterogeneous treatment effects in a job training program and estimating profitability from an estimated mailing strategy in a marketing campaign. We also illustrate the procedure’s performance through simulation experiments.

BIO: Christian B. Hansen studies applied and theoretical econometrics, the uses of high-dimensional statistical methods in economic applications, estimation of panel data models, quantile regression, and weak instruments. In 2008, Hansen was named a Neubauer Family Faculty Fellow, and he was appointed to the Wallace W. Booth professorship in 2014. Hansen’s recent research has focused on the uses of high-dimensional data and methods in economics applications. The papers “Sparse Models and Methods for Optimal Instruments with an Application to Eminent Domain” with A. Belloni, D. Chen, and V. Chernozhukov (Econometrica, 2012) and “Inference on Treatment Effects after Selection amongst High-Dimensional Controls” with A. Belloni and V. Chernozhukov (Review of Economic Studies, 2014) present approaches to estimating structural or treatment effects from economic data in canonical instrumental variables and treatment effects models. These papers are extended in “Valid Post-Selection and Post-Regularization Inference: An Elementary, General Approach” with V. Chernozhukov and M. Spindler (Annual Review of Economics, 2015) and the forthcoming papers “Inference in High Dimensional Panel Models with an Application to Gun Control” with A. Belloni, V. Chernozhukov, and D. Kozbur (Journal of Business and Economic Statistics) and “Program Evaluation with High-Dimensional Data” with A. Belloni, V. Chernozhukov, and I. Fernández-Val (Econometrica).

Hansen has published articles regarding identification and estimation in panel data models, inference with data that may be spatially and temporally dependent, quantile regression, and instrumental variables models with weak or many instruments. His published work has appeared in several journals including Econometrica, the Journal of Business and Economic Statistics, the Journal of Econometrics, and the Review of Economics and Statistics.  He graduated from Brigham Young University with a bachelor’s degree in economics in 2000. In 2004, he received a PhD in economics from the Massachusetts Institute of Technology, where he was a graduate research fellow of the National Science Foundation. He joined the Chicago Booth faculty in 2004.
List of upcoming speakers on the ISQM website