The National Academies Webinar Series: Data Science Undergraduate Education





Webinar Series: Data Science Undergraduate Education

Join the National Academies of Sciences, Engineering, and Medicine for a webinar series on undergraduate data science education. Webinars will take place on Tuesdays from 3-4pm ET starting on September 12 and ending on November 14. See below for the list of dates and themes for each webinar.

This webinar series is part of an input-gathering initiative for a National Academies study on Envisioning the Data Science Discipline: The Undergraduate Perspective. Learn more about the study, read the interim report, and share your thoughts with the committee on the study webpage.

Webinar speakers will be posted as they are confirmed on the webinar series website.

Webinar Dates and Topics

•    9/12/17 – Building Data Acumen
•    9/19/17 – Incorporating Real-World Applications
•    9/26/17 – Faculty Training and Curriculum Development
•    10/3/17 – Communication Skills and Teamwork
•    10/10/17 – Inter-Departmental Collaboration and Institutional Organization
•    10/17/17 – Ethics
•    10/24/17 – Assessment and Evaluation for Data Science Programs
•    11/7/17 – Diversity, Inclusion, and Increasing Participation
•    11/14/17 – Two-Year Colleges and Institutional Partnerships

All webinars take place from 3-4pm ET. You will have the option to register for the entire webinar series or for individual webinars.

MBDH Big Data Seminar Series: Iowa State University, Federal Statistical Research Data Centers: How to Get Involved


Midwest Big Data Hub

The Iowa State University Big Data Seminar Series

‘Federal Statistical Research Data Centers: Opportunities and How to Get Involved’


Lily Wang, Associate Professor of Statistics

Florence Honore, Assistant Professor of Management

Zhengyuan Zhu, Associate Professor of Statistics


Federal Statistical Research Data Centers (FSRDCs) are special research facilities where qualified researchers conduct approved statistical analysis on non-public data collected by the U.S. Census Bureau and other agencies in the federal statistical system. This presentation will give an introduction to FSRDCs and the opportunities they can bring to ISU researchers – faculty, research staff, and graduate students.

Register for the event here.

Bing Liu, University of Illinois at Chicago – MIDAS Seminar Series



Bing Liu, PhD

University of Illinois, Chicago

Recorded Seminar


“Lifelong Machine Learning”

Abstract: Lifelong Machine Learning (or Lifelong Learning) is an advanced machine learning paradigm that learns continuously, accumulates the knowledge learned in the past, and uses it to help future learning. In the process, the learner becomes more and more knowledgeable and effective at learning. This learning ability is one of the hallmarks of human intelligence. However, the current dominant machine learning paradigm learns in isolation: given a training dataset, it runs a machine learning algorithm on the dataset to produce a model. It makes no attempt to retain the learned knowledge and use it in future learning. Although this isolated learning paradigm has been very successful, it requires a large number of training examples, and is only suitable for well-defined and narrow tasks. In comparison, we humans can learn effectively with a few examples because we have accumulated so much knowledge in the past, which enables us to learn with little data or effort. Lifelong learning aims to achieve this capability. As statistical machine learning matures, it is time to break the isolated learning tradition to study lifelong learning. Applications such as intelligent assistants, chatbots, and physical robots that interact with humans and systems in real-life environments are also calling for such lifelong learning capabilities. Without the ability to accumulate the learned knowledge and use it to learn more knowledge incrementally, a system will probably never be truly intelligent. In this talk, I will introduce lifelong learning, discuss related learning paradigms, and present some of our recent work on the topic.


Bio: Bing Liu is a professor of Computer Science at the University of Illinois at Chicago. He received his Ph.D. in Artificial Intelligence from the University of Edinburgh. His research interests include lifelong machine learning, sentiment analysis, data mining, machine learning, and natural language processing. He has published extensively in top conferences and journals in these areas. Two of his papers have received 10-year Test-of-Time awards from KDD, the premier conference of data mining and data science. He also authored four books: one on lifelong machine learning (coming later this month), one on Web data mining, and two on sentiment analysis. Some of his work has also been widely reported in the press, including a front-page article in the New York Times. In professional service, he serves as the current Chair of ACM SIGKDD. He has served as program chair of many leading data mining conferences, including KDD, ICDM, CIKM, WSDM, SDM, and PAKDD, as associate editor of leading journals such as TKDE, TWEB, and DMKD, and as area chair or senior PC member of numerous natural language processing, AI, Web research, and data mining conferences. He is a Fellow of ACM, AAAI and IEEE.

For more information on MIDAS or the Seminar Series, please contact

MIDAS gratefully acknowledges Northrop Grumman Corporation for its generous support of the MIDAS Seminar Series.

Tamara Kolda, PhD, Sandia National Labs – MIDAS Seminar Series



Tamara Kolda, PhD

Sandia National Labs


An Overview of Tensor Decompositions for Data Analysis, with Emphasis on Computation and Scalability


Abstract: Tensors are multiway arrays, and tensor decompositions are powerful tools for data analysis and compression. In this talk, we demonstrate the wide-ranging utility of both the canonical polyadic (CP) and Tucker tensor decompositions with examples in neuroscience, social networks, and combustion science. We explain the model-fitting challenges for CP, including nonconvexity and NP-hardness, as well as the benefits, including uniqueness of the decomposition and the interpretability of the results. We discuss the different types of tensor decompositions. For instance, a different choice of the fit metric in CP leads to Poisson Tensor Factorization for count data. Tucker has several advantages compared to CP, such as the ability to easily compute the rank and even the rank required for a specific level of approximation. We present new results in scalability for both methods. For CP, we present a novel randomization method that not only improves the speed of the computation but also its robustness. For Tucker, we present results on compressing massive data sets by orders of magnitude by discovery of latent low-dimensional manifolds.

Bio: Tamara G. Kolda is a Distinguished Member of the Technical Staff at Sandia National Laboratories in Livermore, CA. She holds a Ph.D. in applied mathematics from the University of Maryland at College Park and is a past Householder Postdoctoral Fellow in Scientific Computing at Oak Ridge National Laboratory. She has received several awards for her work including a 2003 Presidential Early Career Award for Scientists and Engineers (PECASE), an R&D 100 Award, and three best paper prizes. She is a Distinguished Scientist of the Association for Computing Machinery (ACM) and a Fellow of the Society for Industrial and Applied Mathematics (SIAM). She is currently a member of the SIAM Board of Trustees, Section Editor for the Software and High Performance Computing Section for the SIAM Journal on Scientific Computing, and Associate Editor for the SIAM Journal on Matrix Analysis and Applications.

For more information on MIDAS or the Seminar Series, please contact

MIDAS gratefully acknowledges Northrop Grumman Corporation for its generous support of the MIDAS Seminar Series.

Yuejie Chi, PhD, Ohio State University – Shannon Centennial Lecture Series



Yuejie Chi, PhD

Assistant Professor
Department of Electrical and Computer Engineering
Department of Biomedical Informatics

The Ohio State University


“Solving Corrupted Systems of Quadratic Equations, Provably”

Abstract: In this talk, we consider the problem of estimating a low-dimensional subspace by observing the magnitudes of the backprojections of a set of random vectors, which are quadratic in the unknown subspace. This problem is motivated by applications in covariance sketching of high-dimensional data streams, phase retrieval, and quantum state tomography, to name a few. We will describe provable algorithms for solving this problem using both convex and non-convex approaches, even when the measurements are corrupted by arbitrary outliers. The class of convex approaches is based on lifting, and we will highlight a method for resisting outliers without regularization. The class of non-convex approaches is based on gradient descent and its stochastic variant (namely, the Kaczmarz method), and we will highlight a method for resisting outliers based on median-guided truncation.

Bio: Yuejie Chi has been an assistant professor in the Department of Electrical and Computer Engineering and the Department of Biomedical Informatics at The Ohio State University since 2012, after receiving her Ph.D. from Princeton University. She is the recipient of the IEEE Signal Processing Society Young Author Best Paper Award in 2013 and the Best Paper Award at ICASSP 2012. She received Young Investigator Program Awards from AFOSR and ONR in 2015, the Ralph E. Powe Junior Faculty Enhancement Award from Oak Ridge Associated Universities in 2014, and a Google Faculty Research Award in 2013. Her research interests include statistical signal processing, machine learning, information theory and their applications in high-dimensional data analysis, network inference, radar and bioinformatics.

For more information on MIDAS or the Seminar Series, please contact

MIDAS gratefully acknowledges Northrop Grumman Corporation for its generous support of the MIDAS Seminar Series.

Jacob Abernethy, PhD, University of Michigan – MIDAS Seminar Series



Jacob Abernethy, PhD

Electrical Engineering and Computer Science

‘Statistical and Algorithmic Tools to Aid Recovery in Flint’


Abstract: Recovery from the Flint Water Crisis has been hindered by uncertainty in both the water testing process and the causes of contamination. On the other hand, city, state, and federal officials have been collecting and organizing a significant amount of data, including many thousands of water samples, information on pipe materials, and city records. Combining all of this information, and utilizing state-of-the-art algorithmic and statistical tools, we have been able to develop a clearer picture as to the source of the problems, to accurately estimate the greatest risks, and to more efficiently direct resources towards recovery.

Bio: Jacob Abernethy is an Assistant Professor in the EECS Department at the University of Michigan, Ann Arbor. He finished his PhD in Computer Science at UC Berkeley, and was a Simons postdoctoral fellow at the University of Pennsylvania. Jake’s primary interest is in Machine Learning, and he likes discovering connections between Optimization, Statistics, and Economics.

For more information on MIDAS or the Seminar Series, please contact

MIDAS gratefully acknowledges Northrop Grumman Corporation for its generous support of the MIDAS Seminar Series.

Rebecca Willett, PhD, University of Wisconsin – Shannon Centennial Lecture Series



Rebecca Willett, PhD

‘Estimating High-Dimensional Autoregressive Point Processes’


Abstract: Vector autoregressive models characterize a variety of time series in which linear combinations of current and past observations can be used to accurately predict future observations. For instance, each element of an observation vector could correspond to a different node in a network, and the parameters of an autoregressive model would correspond to the impact of the network structure on the time series of observations at each network node. Of particular interest are autoregressive point processes, in which observations consist of the times at which each node participates in some event or activity. Such data is common in spike train observations of biological neural networks, interactions within a social network, and pricing changes within financial networks. However, very little is known about how many events must be recorded before we may accurately infer the underlying autoregressive models. I will describe sparsity-regularized methods and associated performance bounds which provide new insight into the sample complexity of these problems in high dimensions. While sparsity regularization is well studied in the statistics and machine learning communities, common assumptions from that literature (such as the restricted eigenvalue condition) are difficult to verify in this setting because of the correlations and heteroscedasticity of the observations. A novel analysis method leveraging a combination of martingale concentration inequalities and high-dimensional linear regression characterizes how much data must be collected to ensure reliable inference depending on the size and sparsity of the autoregressive parameters, and these bounds are supported by several empirical studies.

Bio: Rebecca Willett is an Associate Professor of Electrical and Computer Engineering, Harvey D. Spangler Faculty Scholar, and Fellow of the Wisconsin Institutes for Discovery at the University of Wisconsin-Madison. She completed her PhD in Electrical and Computer Engineering at Rice University in 2005 and was an Assistant then tenured Associate Professor of Electrical and Computer Engineering at Duke University from 2005 to 2013. Willett received the National Science Foundation CAREER Award in 2007, is a member of the DARPA Computer Science Study Group, and received an Air Force Office of Scientific Research Young Investigator Program award in 2010. Willett has also held visiting researcher or faculty positions at the University of Nice in 2015, the Institute for Pure and Applied Mathematics at UCLA in 2004, the University of Wisconsin-Madison 2003-2005, the French National Institute for Research in Computer Science and Control (INRIA) in 2003, and the Applied Science Research and Development Laboratory at GE Healthcare in 2002.

For more information on MIDAS or the Seminar Series, please contact

MIDAS gratefully acknowledges Northrop Grumman Corporation for its generous support of the MIDAS Seminar Series.

Geoff Ginsburg, MD, PhD, Duke University – MIDAS Seminar Series



Geoff S. Ginsburg, MD, PhD

Professor of Medicine, Pathology, and Biomedical Engineering

Director, Duke Center for Applied Genomics & Precision Medicine

Director, Duke MEDx

Duke University

“Novel Genomic Paradigms for Early Detection and Diagnosis of Acute Infectious Disease”


Abstract: Early detection of infection could have profound implications for patient management and prognosis by allowing prompt initiation of appropriate therapy. This strategy is not possible with current culture-based diagnostic platforms owing to their low sensitivities and the time required to obtain results. Compelling evidence now exists that the host response to pathogens, in the form of pathogen-specific host gene and protein expression signatures, can serve as a potential early and rapid diagnostic strategy. Therefore, we have identified host gene expression profiles as a strategy for the diagnosis of infection. Using unbiased sparse latent factor regression analysis, we generated gene signatures (or factors) from peripheral blood RNA analysis that distinguish individuals with symptomatic acute viral respiratory infection from individuals with bacterial infection or from uninfected individuals with ~90% accuracy. Signatures specific for bacterial infection and for non-infectious illness based on the host response have also been derived, allowing for the distinction of viral and bacterial infection. Moreover, through a series of partnerships we have developed prototype novel sensing technologies for mRNA that can elevate this approach from a research-based endeavor to a clinically useful point-of-care diagnostic tool. Host-based diagnostics using novel nucleic acid and protein detection technologies will result in a paradigm shift in the early detection and diagnosis of infectious disease.

Bio: Dr. Ginsburg is the founding director for the Center for Applied Genomics & Precision Medicine at the Duke University Medical Center and for MEDx, a partnership between the Schools of Medicine and Engineering to spark and translate innovation. His research addresses the challenges of translating genomic information into medical practice and the integration of precision medicine into healthcare. He serves as a member of the Board of External Experts for the NHLBI, of the advisory council for the National Center for Accelerating Translational Science, and of the World Economic Forum’s Global Agenda Council on the Future of the Health Sector. He is co-Chair of the Institute of Medicine’s Roundtable on Genomics and Precision Health, of the Cures Acceleration Network, and of the IOM/NIH Global Genomic Medicine Collaborative.

For more information on MIDAS or the Seminar Series, please contact

MIDAS gratefully acknowledges Northrop Grumman Corporation for its generous support of the MIDAS Seminar Series.

This event will be live streamed.