# November 10 at 10:10am

## The Testing Paradox for COVID-19 (10:10am-10:30am)

##### Modeling COVID-19 testing strategy

**Bhramar Mukherjee – Professor and Chair, Biostatistics & Lauren Beesley – Post Doctoral Student, Biostatistics**

Reported case counts for coronavirus are riddled with data errors, namely misclassification of test results and selection bias associated with who got tested. The number of covert or unascertained infections is large across the world. How can one determine optimal testing strategies with such imperfect data? In this talk, we propose an optimization algorithm for allocating diagnostic/surveillance tests when the objective is estimating the true population prevalence or detecting an outbreak. Infectious disease models and survey sampling techniques are used jointly to derive these strategies.

## Students’ mobility patterns on campus and the implications for the recovery of campus activities post-pandemic (10:30am-10:50am)

##### Modeling campus public health behavior

**Quan Nguyen – Research Fellow, School of Information**

This research project uses location data gathered from WiFi access points on campus to model the mobility patterns of students in order to inform the planning of educational activities that can minimize the transmission risk.

The first aim is to understand the general mobility patterns of students on campus and to identify physical spaces associated with a high risk of transmission. For example, we can extract insights from WiFi data about which locations are the busiest at which times of day, how much time is typically spent at each location, and how these mobility patterns change over time. The second aim is to understand how students share the same physical spaces on campus (e.g., attending a lecture, meeting in the same room, sharing the same dorm). Students are presumably in close proximity when they are connected to the same WiFi access point. We model a student-to-student network from their co-location activities and use its network centrality measures as proxies for transmission risk (i.e., students in the center of a network would have a higher chance of being exposed to COVID-19 than those in the periphery). We then correlate network centrality measures with academic information (e.g., class schedule, course enrollment, study major, year of study, gender, ethnicity) to determine whether certain features of the academic record are related to transmission risk. For example, we can identify which groups of students are more vulnerable to potential infections by virtue of their high network centrality. Insights from this research project will inform the University of Michigan’s strategies for the recovery of educational activities post-pandemic with empirical evidence of students’ mobility patterns on campus, as well as factors associated with a high risk of transmission.
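The abstract does not specify an implementation, but the co-location network it describes can be sketched in a few lines: students connected to the same access point in the same time window become linked, and a centrality score summarizes exposure. The session records, access-point names, time granularity, and the choice of degree centrality below are all illustrative assumptions:

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical WiFi session records: (student, access_point, hour)
sessions = [
    ("s1", "AP_library", 10), ("s2", "AP_library", 10),
    ("s3", "AP_library", 10), ("s1", "AP_dorm", 20),
    ("s4", "AP_dorm", 20),
]

# Students seen on the same access point in the same hour are
# treated as co-located (a proxy, as in the abstract)
groups = defaultdict(set)
for student, ap, hour in sessions:
    groups[(ap, hour)].add(student)

colocated = defaultdict(set)
for members in groups.values():
    for a, b in combinations(sorted(members), 2):
        colocated[a].add(b)
        colocated[b].add(a)

# Degree centrality: fraction of the other students each student
# was ever co-located with
n = len({s for s, _, _ in sessions})
centrality = {s: len(nbrs) / (n - 1) for s, nbrs in colocated.items()}
```

In practice one would likely use a graph library and richer measures (eigenvector or betweenness centrality, edge weights by contact duration), but the construction is the same.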

## Modeling the Perceived Truthfulness of Public Statements on COVID-19: A New Model for Pairwise Comparisons of Objects with Multidimensional Latent Attributes (10:50am-11:10am)

##### Modeling the perception of truthfulness

**Qiushi Yu – Ph.D. Student, Political Science & Kevin Quinn – Professor, Political Science**

What is more important for how individuals perceive the truthfulness of statements about COVID-19: a) the objective truthfulness of the statements, or b) the partisanship of the individual and the partisanship of the people making the statements? To answer this question, we develop a novel model for pairwise comparisons data that allows for a richer structure of both the latent attributes of the objects being compared and rater-specific perceptual differences than standard models. We use the model to analyze survey data that we collected in the summer of 2020. This survey asked respondents to compare the truthfulness of pairs of statements about COVID-19. These statements were taken from the fact-checked statements on https://www.politifact.com. We thus have an independent measure of the truthfulness of each statement. We find that the actual truthfulness of a statement explains very little of the variability in individuals’ perceptions of truthfulness. Instead, we find that the partisanship of the speaker and the partisanship of the rater account for the majority of the variation in perceived truthfulness, with statements made by co-partisans being viewed as more truthful.
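The authors' model is richer than standard pairwise-comparison models, allowing multidimensional latent attributes and rater-specific perception. The baseline it generalizes, a Bradley-Terry model with a single latent "truthfulness" score per statement, can be sketched as follows; the comparison data, learning rate, and ridge penalty are illustrative assumptions, not the authors' specification:

```python
import math

def p_prefer(theta_i, theta_j):
    """Bradley-Terry: probability that statement i is judged
    more truthful than statement j."""
    return 1.0 / (1.0 + math.exp(-(theta_i - theta_j)))

# Hypothetical judgments: (index judged more truthful, index judged less)
comparisons = [(0, 1), (0, 2), (1, 2), (0, 1)]

# Fit latent scores by penalized maximum likelihood (gradient ascent);
# the small ridge term keeps scores finite when one statement always wins
theta = [0.0, 0.0, 0.0]
lr, ridge = 0.5, 0.1
for _ in range(500):
    grad = [-ridge * t for t in theta]
    for w, l in comparisons:
        p = p_prefer(theta[w], theta[l])
        grad[w] += 1.0 - p
        grad[l] -= 1.0 - p
    theta = [t + lr * g for t, g in zip(theta, grad)]
```

The paper's finding can be phrased in these terms: when raters share partisanship with a speaker, the fitted "truthfulness" scores track partisanship far more than the fact-checked truth values.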

## Computational Neuroscience, Time Complexity, and Spacekime Analytics (11:10am-11:30am)

##### Modeling high-dimensional, longitudinal data

**Ivo Dinov – Professor, HBBS/SoN, DCMB/SoM, MIDAS**

The proliferation of digital information in all human experiences presents difficult challenges and offers unique opportunities for managing, modeling, analyzing, interpreting, and visualizing heterogeneous data. There is a substantial need to develop, validate, productize, and support novel mathematical techniques, advanced statistical computing algorithms, transdisciplinary tools, and effective artificial intelligence apps.

Spacekime analytics is a new technique for modeling high-dimensional longitudinal data, such as functional magnetic resonance imaging (fMRI). This approach relies on extending the notions of time, events, particles, and wavefunctions to complex-time (kime), complex-events (kevents), data and inference-functions, respectively. This talk will illustrate how the kime-magnitude (longitudinal time order) and kime-direction (phase) affect the subsequent predictive analytics and the induced scientific inference. The mathematical foundation of spacekime calculus reveals various statistical implications including inferential uncertainty and a Bayesian formulation of spacekime analytics. Complexifying time allows the lifting of all commonly observed processes from the classical 4D Minkowski spacetime to a 5D spacetime manifold, where a number of interesting mathematical problems arise.

Spacekime analytics transforms time-varying data, such as time-series observations, into higher-dimensional manifolds representing complex-valued and kime-indexed surfaces (kime-surfaces). This process uncovers some of the intricate structure in high-dimensional data that may be intractable in the classical space-time representation of the data. In addition, the spacekime representation facilitates the development of innovative data science analytical methods for model-based and model-free scientific inference, derived computed phenotyping, and statistical forecasting. Direct neuroscience applications of spacekime analytics will be demonstrated using simulated data and clinical observations (e.g., UK Biobank).

## Challenges in dynamic mode decomposition (11:30am-11:50am)

##### Modeling time series data

**Ziyou Wu – PhD student, Electrical and computer engineering, Bio-inspired robotics dynamical system lab**

Dynamic Mode Decomposition (DMD) is a powerful tool in extracting spatio-temporal patterns from multi-dimensional time series. DMD takes in time series data and computes eigenvalues and eigenvectors of a finite-dimensional linear model that approximates the infinite-dimensional Koopman operator which encodes the dynamics. DMD is used successfully in many fields: fluid mechanics, robotics, neuroscience, and more. Two of the main challenges remaining in DMD research are noise sensitivity and issues related to Krylov space closure when modeling nonlinear systems. In our work, we encountered great difficulty in reconstructing time series from multilegged robot data. These are oscillatory systems with slow transients, which decay only slightly faster than a period.

Here we present an investigation of possible sources of difficulty by studying a class of systems with linear latent dynamics which are observed via multinomial observables. We explore the influences of dataset metrics, the spectrum of the latent dynamics, the normality of the system matrix, and the geometry of the dynamics. Our numerical models include system and measurement noise. Our results show that even under these very mildly nonlinear conditions, DMD methods often fail to recover the spectrum and can have poor predictive ability. We show that for a system with a well-conditioned system matrix, a dataset with more initial conditions and shorter trajectories can significantly improve the prediction. With a slightly ill-conditioned system matrix, a moderate trajectory length improves the spectrum recovery. Our work provides a self-contained framework for analyzing noise and nonlinearity, and gives generalizable insights into dataset properties for DMD analysis.
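The DMD pipeline the abstract builds on can be illustrated with the standard exact-DMD algorithm: fit a linear operator between time-shifted snapshot matrices via the SVD pseudoinverse and eigendecompose it. The toy 2x2 system below is an illustrative assumption, not the robot data from the talk; on noise-free linear data DMD recovers the spectrum exactly, and the failure modes discussed above appear once noise and nonlinear observables are added:

```python
import numpy as np

def exact_dmd(X, Xprime, r=None):
    """Exact DMD: fit Xprime ≈ A X and return the eigenvalues and
    modes of the reduced linear operator."""
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    if r is not None:  # optional rank truncation
        U, s, Vh = U[:, :r], s[:r], Vh[:r, :]
    pinv = Vh.conj().T @ np.diag(1.0 / s)      # pseudoinverse factors
    Atilde = U.conj().T @ Xprime @ pinv        # reduced operator U* A U
    eigvals, W = np.linalg.eig(Atilde)
    modes = Xprime @ pinv @ W                  # exact DMD modes
    return eigvals, modes

# Sanity check on a noise-free linear system x_{k+1} = A x_k
A = np.array([[0.9, -0.2], [0.1, 0.8]])
x = np.array([1.0, 0.5])
snaps = [x]
for _ in range(10):
    x = A @ x
    snaps.append(x)
S = np.array(snaps).T                          # columns are snapshots
eigvals, modes = exact_dmd(S[:, :-1], S[:, 1:])
```

With a single long trajectory, as here, the data matrix is built from consecutive snapshots; the abstract's finding is that many short trajectories from varied initial conditions can condition this fit much better.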

Work was funded by ARO MURI W911NF-17-1-0306 and the Kahn Foundation.

# November 11 at 9am

## Novel Tools to Increase the Reliability and Reproducibility of Population Genetics Research (9:00am-9:20am)

##### Addressing selection bias in population data

**Yajuan Si – Research Assistant Professor, Survey Research Center, Institute for Social Research**

Advances in population genetic research have the potential to drive numerous important developments in the science of population dynamics. The interplay of micro-level biology and macro-level social sciences documents gene–environment–phenotype interactions and allows us to examine how genetics relates to child health and wellbeing. However, traditional genetics research is based on nonrepresentative samples that deviate from the target population, such as convenience and volunteer samples. This lack of representativeness may distort association studies. Recent findings have provoked concern about misinterpretation, irreproducibility and lack of generalizability, exemplifying the need to leverage survey research with genetics for population-based research. This project is motivated by the research team’s collaborative work on the Fragile Families and Child Wellbeing Study and the Adolescent Brain Cognitive Development Study, both of which present these common problems in population genetics studies, and aims to advance the integration of genetic science into population dynamics research. The project will evaluate sample selection effects, identify population heterogeneity in polygenic score analysis, and develop strategies to adjust for selection bias in association studies of educational attainment, cognition status and substance use for child health and wellbeing. This interdisciplinary project will strengthen the validity and generalizability of population genetics research, deepen new understandings of human behavior and facilitate advances in population science.

## An end-to-end deep learning system for rapid analysis of the breath metabolome with applications in critical care illness and beyond (9:20am-9:40am)

##### Deep learning for biomedical research

**Christopher Gillies – Assistant Research Scientist, Emergency Medicine**

The metabolome is the set of low-molecular-weight metabolites, and its quantification represents a summary of the physiological state of an organism. Metabolite concentration levels in biospecimens are important for many critical care illnesses like sepsis and acute respiratory distress syndrome (ARDS). Sepsis accounts for 35% of patients who die in the hospital, and ARDS has a mortality rate of 40%. Missing data is a common challenge in metabolomics datasets. Many metabolomics investigators impute fixed values for missing metabolite concentrations, and this imputation approach leads to lower statistical power, biased parameter estimates, and reduced prediction accuracy. Certain applications of metabolomics data, like breath analysis by gas chromatography for the prediction or detection of ARDS, can be done without the quantification of individual metabolites. This would circumvent the quantification step, eliminating the missing data problem. Our team has developed a rapid gas chromatography breath analyzer, which has been challenged by missing data, a time-consuming process of breath signature alignment, and the subsequent quantification of metabolites across patients. Analyzing the breath signal directly could eliminate these challenges. End-to-end deep learning systems are neural networks that operate directly on a raw data source and make a prediction directly for the target application. These systems have been successful in diverse fields from speech recognition to medicine. We envision an end-to-end deep learning system that leverages transfer learning from the collection of many healthy samples and could rapidly multiply the applications of our breath analyzer.

The end-to-end deep learning system will enhance our breath analyzer so that it could be used more efficiently in settings from the intensive care unit to the battlefield to identify patients or soldiers with critical illnesses like sepsis and ARDS, and to monitor longitudinal changes in breath metabolites.

## Machine learning-guided equations for the on-demand prediction of natural gas storage capacities of materials for vehicular applications (9:40am-10:00am)

##### Machine learning for energy research

**Alauddin Ahmed – Assistant Research Scientist, Mechanical Engineering**

Transportation is responsible for nearly one-third of the world’s carbon dioxide (CO2) emissions because of the burning of fossil fuels. While we dream of zero-carbon vehicles, future projections suggest little decline in fossil fuel consumption by the transportation sector until 2050. Therefore, ‘bending the curve’ of CO2 emissions prompts the adoption of low-cost, reduced-emission alternative fuels. Natural gas (NG), the most abundant fossil fuel on earth, is such an alternative, with a nearly 25% lower carbon footprint and a lower price than its gasoline counterpart. However, the widespread adoption of natural gas as a vehicular fuel is hindered by the scarcity of high-capacity, lightweight, low-cost, and safe storage systems. Recently, materials-based natural gas storage for vehicular applications has become one of the most viable options. In particular, nanoporous materials (NPMs) are in the spotlight of the U.S. Department of Energy (DOE) because of their exceptional energy storage capacities. However, the number of such NPMs is nearly infinite, and it is unknown, a priori, which materials would have the expected natural gas storage capacity. Therefore, searching for a high-performing material is like ‘finding a needle in a haystack’, which slows down the pace of materials discovery against growing technological demand. Here we present a novel approach for developing machine learning-guided equations for the on-demand prediction of energy storage capacities of NPMs using a few physically meaningful structural properties. These equations give users the ability to calculate the energy storage capacity of an arbitrary NPM rapidly using only paper and pencil. We show the utility of these equations by predicting the NG storage of over 500,000 covalent-organic frameworks (COFs), a class of NPMs. We discovered a COF with a record-setting NG storage capacity, surpassing the previously unmet target set by DOE.

In principle, the data-driven approach presented here might be relevant to other disciplines, including science, engineering, and health care.

## Fusing Computer Vision And Space Weather Modeling (10:00am-10:20am)

##### Deep learning and computer vision for space science

**David Fouhey – Assistant Professor, UM EECS**

Space weather has impacts on Earth ranging from rare, immensely disruptive events (e.g., electrical blackouts caused by solar flares and coronal mass ejections) to more frequent impacts (e.g., satellite GPS interference from fluctuations in the Earth’s ionosphere caused by rapid variations in the solar extreme UV emission). Earth-impacting events are driven by changes in the Sun’s magnetic field; we now have myriad instruments capturing petabytes’ worth of images of the Sun at a variety of wavelengths, resolutions, and vantage points. These data present opportunities for learning-based computer vision since the massive, well-calibrated image archive is often accompanied by physical models. This talk will describe some of the work that we have been doing to start integrating computer vision and space physics by learning mappings from one image or representation of the Sun to another. I will center the talk on a new system we have developed that emulates parts of the data processing pipeline of the Solar Dynamics Observatory’s Helioseismic and Magnetic Imager (SDO/HMI). This pipeline produces data products that help study, and serve as boundary conditions for, solar models of the energetic events alluded to above. Our deep-learning-based system emulates a key component hundreds of times faster than the current method, potentially opening doors to new applications in near-real-time space weather modeling. In keeping with the goals of the symposium, however, I will focus on some of the benefits close collaboration has enabled in terms of understanding how to frame the problem, measure the success of the model, and even set up the deep network.

## Decoding the Environment of Most Energetic Sources in the Universe (10:20am-10:40am)

##### Machine learning for astronomy

**Oleg Gnedin – Professor, Department of Astronomy, LSA**

Astrophysics has always been at the forefront of data analysis, leading to advancements in image processing and numerical simulations. The coming decade is bringing qualitatively new and larger datasets than ever before. The next generation of observational facilities will produce an explosion in the quantity and quality of data for the most distant sources, such as the first galaxies and first quasars. Quasars are the most energetic objects in the universe, reaching luminosities up to 10^14 times that of the Sun. Their emission is powered by giant black holes that convert matter into energy according to Einstein’s famous equation E = mc^2. The largest progress will occur in quasar spectroscopy. Detailed measurements of the spectrum of quasar light, as it is emitted near the central black hole and partially absorbed by clouds of gas on the way to the observer on Earth, allow for a particularly powerful probe of the quasar environment. Because the spectra of different chemical elements are unique, spectroscopy allows us to study not only the overall properties of matter, such as density and temperature, but also the detailed chemical composition of the intervening matter. However, the interpretation of these spectra is made very challenging by the many sources contributing to the absorption of light. In order to take full advantage of this new window into the nature of supermassive black holes, we need a detailed theoretical understanding of the origin of quasar spectral features. In a MIDAS PODS project, we are applying machine learning to model and extract such features. We are training the models using data from state-of-the-art numerical simulations of the early universe. This approach is fundamentally different from traditional astronomical data analysis. We have only started learning what information can be extracted and are still looking for a new framework to interpret these data.