U-M Data Science Annual Symposium 2020

November 10, 12:00 AM - November 11, 2020, 12:00 AM

Virtual #2020MIDAS

U-M Data Science Annual Symposium discusses data feminism, COVID-19 – The Michigan Daily

Schedule

9:00 – Opening Remarks

H.V. Jagadish
Director, MIDAS | Professor of Electrical Engineering and Computer Science

9:05 – Keynote: Data Feminism

Co-Sponsored by: Institute for Research on Women and Gender

View the Keynote Speakers
View Recording

10:05 – Break

10:10 – Research Talks Session 1

View the Research Talks

11:50 – Networking Session with Speakers

12:15 – Break

12:45 – Poster Session

The 2020 Symposium’s poster sessions will be hosted via the College of Engineering’s CareerFair+ tool.

View the Posters

14:45 – Mini-Workshops

Lead Presenter: Holly Hartman, PhD candidate, Biostatistics, University of Michigan

In this workshop, participants will gain a better understanding of systemic bias and how algorithms may continue to promote inequity. Participants will learn about agent-based methods, a tool that can be used to examine algorithmic fairness. There will be opportunities to brainstorm ideas for new research projects within the participants’ fields.

View Recording

Lead Presenter: VG Vinod Vydiswaran, Assistant Professor, Learning Health Sciences and School of Information, University of Michigan

Natural language processing (NLP) and data science methods, including recently popular deep learning-based approaches, can unlock information from narrative text and have received great attention in the medical domain. Many NLP methods have been developed and have shown promising results in various information extraction tasks, especially for rare classes of named entities. These methods have also been successfully applied to facilitate clinical research. In this workshop, we will highlight some methods and technologies for identifying rare concepts and entities in text in the medical domain as well as in other “open” domains.

View Recording

Lead Presenter: Fred Feng, Assistant Professor, Industrial and Manufacturing Systems Engineering, University of Michigan-Dearborn

This hands-on workshop is tailored to participants who have no prior programming experience. The first half of the workshop covers Python programming basics, and the second half covers data analysis and visualization in Python with real-world data. Attendees are encouraged to follow along with the examples on their own computers. We will use an online browser-based environment (Google Colab), so no software installation is required. Attendees will need a Google account and must be signed in to it in their browser in order to use this cloud-based tool during the workshop.

View Recording

Lead Presenter: Jonathan Reader, Programmer/Data Analyst, Neurology, University of Michigan
Co-Presenters:

Nicolas May, Data Systems Manager, Neurology, University of Michigan
Kelly Bakulski, Research Assistant Professor, School of Public Health, University of Michigan

Before analysis, data must be retrieved, scrubbed of identifiable information, cleaned (e.g., missing data addressed, tables reshaped appropriately), and delivered. Using biomedical and transportation datasets as examples of how this generalizable process works, this workshop will walk attendees through a real-world pipeline used to process and deliver datasets. Documentation and code will be made available through GitLab so that attendees can code along with the demonstration. As a result of this workshop, attendees will leave with a practical template for implementing their own data science pipeline.

View Recording
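As a rough illustration of the retrieve, de-identify, clean, reshape, and deliver stages described above, the sketch below runs a toy wide-format table through each step using only the Python standard library. The column names (`name`, `mrn`, `visit1`, `visit2`), the identifier list, and the choice to drop rather than impute missing visits are all hypothetical assumptions, not the workshop's actual GitLab pipeline.

```python
import csv
import io

# Hypothetical direct identifiers to scrub before delivery (assumption).
IDENTIFIERS = {"name", "mrn"}


def retrieve(raw_csv):
    """Stand-in for retrieval: parse CSV text into a list of dict records."""
    return list(csv.DictReader(io.StringIO(raw_csv)))


def deidentify(rows):
    """Drop direct-identifier columns from every record."""
    return [{k: v for k, v in row.items() if k not in IDENTIFIERS} for row in rows]


def clean(rows, numeric_cols):
    """Convert numeric fields to float; blank strings become None (explicitly missing)."""
    out = []
    for row in rows:
        new = dict(row)
        for col in numeric_cols:
            value = row[col].strip()
            new[col] = float(value) if value else None
        out.append(new)
    return out


def to_long(rows, id_col, value_cols):
    """Reshape wide records (one column per visit) into long form (one row per visit).
    Missing visits are dropped here rather than imputed (a policy assumption)."""
    return [
        {id_col: row[id_col], "visit": col, "value": row[col]}
        for row in rows
        for col in value_cols
        if row[col] is not None
    ]


def deliver(rows):
    """Serialize the processed records back to CSV text for hand-off."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


RAW = """id,name,mrn,visit1,visit2
1,Ann,111,27.5,28.0
2,Bob,222,30.1,
"""

visits = ["visit1", "visit2"]
long_rows = to_long(clean(deidentify(retrieve(RAW)), visits), "id", visits)
output = deliver(long_rows)
```

Keeping each stage as a small records-in, records-out function makes it easy to swap in a different de-identification rule or missing-data policy without touching the rest of the pipeline.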

Presentations:
Mike Mueller-Smith, Assistant Professor, Department of Economics, University of Michigan: “The Criminal Justice Administrative Records System: Assessing the Footprint of the U.S. Criminal Justice System”
David Johnson, Director and Research Professor, Panel Study of Income Dynamics and Survey Research Center, University of Michigan: “Building America’s Family Tree: The Panel Study of Income Dynamics”
Trent Alexander, Associate Director and Research Professor, ICPSR, University of Michigan: “Creating a New Census-based Longitudinal Infrastructure”
Joelle Abramowitz, Assistant Research Scientist, Survey Research Center, University of Michigan: “The Census-Enhanced Health and Retirement Study: Optimal Probabilistic Record Linkage for Linking Employers in Survey and Administrative Data”

Today’s pressing questions of social science and public policy demand an unprecedented degree of data scope and integration as we recognize the cross-cutting dynamics of economics, political science, sociology, demography, and psychology. This panel features four U-M researchers who are pushing the frontier of data construction and linkage in coordination with partners at the U.S. Census Bureau.

View Slides
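Several of the panel's projects rely on probabilistic record linkage. The classic Fellegi–Sunter framework, shown below as a general illustration and not as the method of any particular talk, scores a candidate record pair by summing per-field log-likelihood ratios of agreement. The field names, the m- and u-probabilities, and the decision thresholds are all hypothetical values chosen for the example.

```python
import math

# Hypothetical m- and u-probabilities for each compared field (illustrative values only):
#   m = P(field agrees | records refer to the same employer)
#   u = P(field agrees | records refer to different employers)
FIELD_PARAMS = {
    "employer_name": (0.95, 0.01),
    "zip_code": (0.90, 0.05),
    "industry": (0.85, 0.20),
}


def match_weight(agreements):
    """Fellegi-Sunter composite weight: sum of per-field log2 likelihood ratios."""
    total = 0.0
    for field, agrees in agreements.items():
        m, u = FIELD_PARAMS[field]
        total += math.log2(m / u) if agrees else math.log2((1 - m) / (1 - u))
    return total


def classify(weight, upper=6.0, lower=0.0):
    """Link above `upper`, non-link below `lower`, clerical review in between (thresholds assumed)."""
    if weight >= upper:
        return "link"
    if weight <= lower:
        return "non-link"
    return "review"
```

A pair that agrees only on employer name gets a small positive weight and lands in the review band, which is the region where more careful (for example, optimized) linkage decisions matter most.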

Lead Presenter: Jason Corso, Professor, Electrical Engineering and Computer Science, University of Michigan
Co-Presenters:

Maggie Levenstein, Director and Research Professor, ICPSR and School of Information, University of Michigan
Susan Jekielek, Assistant Research Scientist, ICPSR, University of Michigan
Donald Likosky, Professor, Department of Cardiac Surgery, University of Michigan

Video is being acquired at an alarming rate across domains, including social research, healthcare, entertainment, sports, and more. The ability to code this video (that is, to attribute certain properties, labels, and other annotations to it) in support of domain-relevant analytical questions is critical; otherwise, human coding is required. Human coding, however, is laborious, expensive, not repeatable, and, worse, often error-prone. Video coding, an area within artificial intelligence and computer vision, seeks automated and semi-automated methods to support more effective and robust video coding. This workshop will review the state of the art in video coding from a capabilities, limitations, and tooling perspective and present real-world use cases.

View Recording

17:15 – End of Day

9:00 – Research Talks Session 2

View the Research Talks

10:40 – Poster Awards

Trisha Fountain
Education Program Manager, MIDAS

Poster Award Winners

Best Overall Poster, Most Effective Use of Data: April Kriebel – Department of Computational Medicine and Bioinformatics, Medical School “Integrating single-cell datasets with partially overlapping features using nonnegative matrix factorization”
Outstanding Project Design: Bruno Castelo Blanco – Marketing Department, Ross “Gaming Addiction: An Empirical Analysis”
Most Likely to Make an Impact in the Field: Alex Ritchie – EECS, College of Engineering “Consistent Estimation of Identifiable Nonparametric Mixture Models from Grouped Observations”
Outstanding Undergraduate Poster: Nabeel Rehemtulla – Department of Astronomy, LSA “Non-Parametric Spherical Jeans Mass Estimation with B-Splines”

11:00 – Fireside Chat: Advancing Science and Social Change through Data Science and AI Research and Careers

View this Speaker
View Recording

12:00 – Closing Remarks, Networking Rooms open for additional discussion

H.V. Jagadish
Director, MIDAS | Professor of Electrical Engineering and Computer Science

Program Committee

Libby Hemphill – School of Information
Justin Johnson – Computer Science and Engineering
Danai Koutra – Computer Science and Engineering
Jing Liu – MIDAS
Christopher Miller – Astronomy
Sam Mukherji – Music Theory
Arvind Rao – Computational Medicine and Bioinformatics, and Radiation Oncology
Zhenke Wu – Biostatistics

Keynote Speakers

November 10 at 9:05am

Keynote: Data Feminism

Co-Sponsored by:

Institute for Research on Women & Gender

As data are increasingly mobilized in the service of governments and corporations, their unequal conditions of production, their asymmetrical methods of application, and their unequal effects on both individuals and groups have become increasingly difficult for data scientists, and others who rely on data in their work, to ignore. But it is precisely this power that makes it worth asking: “Data science by whom? Data science for whom? Data science with whose interests in mind?” These are some of the questions that emerge from what we call data feminism, a way of thinking about data science and its communication that is informed by the past several decades of intersectional feminist activism and critical thought. Illustrating data feminism in action, this talk will show how challenges to the male/female binary can help to challenge other hierarchical (and empirically wrong) classification systems; it will explain how an understanding of emotion can expand our ideas about effective data visualization, how the concept of invisible labor can expose the significant human efforts required by our automated systems, and why the data never, ever “speak for themselves.” The goal of this talk, as with the project of data feminism, is to model how scholarship can be transformed into action: how feminist thinking can be operationalized in order to imagine more ethical and equitable data practices.

Catherine D’Ignazio, Assistant Professor, Urban Science & Planning; Director, Data + Feminism Lab; Department of Urban Studies & Planning, MIT

Catherine D’Ignazio is a scholar, artist/designer and hacker mama who focuses on feminist technology, data literacy and civic engagement. She has run reproductive justice hackathons, designed global news recommendation systems, created talking and tweeting water quality sculptures, and led walking data visualizations to envision the future of sea level rise. With Rahul Bhargava, she built the platform Databasic.io, a suite of tools and activities to introduce newcomers to data science. Her 2020 book from MIT Press, Data Feminism, co-authored with Lauren Klein, charts a course for more ethical and empowering data science practices. Her research at the intersection of technology, design & social justice has been published in the Journal of Peer Production, the Journal of Community Informatics, and the proceedings of Human Factors in Computing Systems (ACM SIGCHI). Her art and design projects have won awards from the Tanne Foundation, Turbulence.org and the Knight Foundation and have been exhibited at the Venice Biennial and the ICA Boston. D’Ignazio is an Assistant Professor of Urban Science and Planning in the Department of Urban Studies and Planning at MIT. She is also Director of the Data + Feminism Lab, which uses data and computational methods to work towards gender and racial equity, particularly in relation to space and place.

Lauren Klein, Associate Professor, English, Quantitative Theory and Methods, Emory University

Lauren Klein is an associate professor in the departments of English and Quantitative Theory & Methods at Emory University, where she also directs the Digital Humanities Lab. Before moving to Emory, she taught in the School of Literature, Media, and Communication at Georgia Tech. Klein works at the intersection of digital humanities, data science, and early American literature, with a research focus on issues of gender and race. She has designed platforms for exploring the contents of historical newspapers, recreated forgotten visualization schemes with fabric and addressable LEDs, and, with her students, cooked meals from early American recipes and then visualized the results. In 2017, she was named one of the “rising stars in digital humanities” by Inside Higher Ed. She is the author of An Archive of Taste: Race and Eating in the Early United States (University of Minnesota Press, 2020) and, with Catherine D’Ignazio, Data Feminism (MIT Press, 2020). With Matthew K. Gold, she edits Debates in the Digital Humanities, a hybrid print-digital publication stream that explores debates in the field as they emerge. Her current project, Data by Design: An Interactive History of Data Visualization, 1786-1900, was recently funded by an NEH-Mellon Fellowship for Digital Publication.

November 11 at 11:00am

Fireside Chat: Data Science as both a Science and a Force for Social Change

Eric Horvitz, Technical Fellow and Chief Scientific Officer, Microsoft

Eric Horvitz is a technical fellow at Microsoft, where he serves as the company’s first Chief Scientific Officer. As chief scientist of the company, Dr. Horvitz provides leadership and perspectives on advances and trends on scientific matters, and on issues and opportunities arising at the intersection of technology, people, and society. He has pursued principles and applications of AI with contributions in machine learning, perception, natural language understanding, and decision making. His research centers on challenges with uses of AI amidst the complexities of the open world, including uses of probabilistic and decision-theoretic representations for reasoning and action, models of bounded rationality, and human-AI complementarity and coordination.

His efforts and collaborations have led to fielded systems in healthcare, transportation, ecommerce, operating systems, and aerospace. He received the Feigenbaum Prize and the Allen Newell Prize for contributions to AI. He received the CHI Academy honor for his work at the intersection of AI and human-computer interaction. He has been elected a fellow of the National Academy of Engineering (NAE), the Association for Computing Machinery (ACM), the Association for the Advancement of AI (AAAI), the American Association for the Advancement of Science (AAAS), the American Academy of Arts and Sciences, and the American Philosophical Society. He has served as president of the AAAI, and on advisory committees for the National Science Foundation, the National Institutes of Health, the President’s Council of Advisors on Science and Technology, DARPA, and the Allen Institute for AI.

Beyond technical work, he has pursued efforts and studies on the influences of AI on people and society, including issues around ethics, law, and safety. He chairs Microsoft’s Aether committee on AI, effects, and ethics in engineering and research. He established the One Hundred Year Study on AI at Stanford University and co-founded the Partnership on AI. Dr. Horvitz currently serves as a commissioner for the National Security Commission on AI and chairs the line of effort on ethical and responsible AI.

Eric received PhD and MD degrees at Stanford University. Previously, he served as director of Microsoft Research Labs, including research centers in Redmond, Washington; Cambridge, Massachusetts; New York, New York; Montreal, Canada; Cambridge, UK; and Bangalore, India. He also ran the Microsoft Research Lab in Redmond, Washington.

H.V. “Jag” Jagadish, Director, MIDAS (Moderator)

Research Talks

November 10 at 10:10am

The Testing Paradox for COVID-19 (10:10am-10:30am)

Modeling COVID-19 testing strategy

Bhramar Mukherjee – Professor and Chair, Biostatistics & Lauren Beesley – Postdoctoral Student, Biostatistics
Reported case counts for coronavirus are riddled with data errors, namely misclassification by imperfect tests and selection bias in who gets tested. The number of covert, or unascertained, infections is large across the world. How can one determine optimal testing strategies with such imperfect data? In this talk, we propose an optimization algorithm for allocating diagnostic/surveillance tests when the objective is estimating the true population prevalence or detecting an outbreak. Infectious disease models and survey sampling techniques are used jointly to derive these strategies.

View Recording
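To see why test misclassification matters, the classic Rogan–Gladen correction (a standard textbook adjustment, not the optimization algorithm proposed in this talk) inverts the relation between the apparent test-positive rate and the true prevalence. It assumes the test's sensitivity and specificity are known and that the tested group is a random sample, so it does nothing about selection bias in who gets tested, the other error source named in the abstract.

```python
def rogan_gladen(apparent_prevalence, sensitivity, specificity):
    """Correct an apparent (test-positive) rate for imperfect test accuracy.

    Solves apparent = Se * p + (1 - Sp) * (1 - p) for the true prevalence p,
    then clips to [0, 1] because sampling noise can push the estimate outside it.
    """
    denom = sensitivity + specificity - 1
    if denom <= 0:
        raise ValueError("test must be better than chance (Se + Sp > 1)")
    p = (apparent_prevalence + specificity - 1) / denom
    return min(max(p, 0.0), 1.0)
```

For example, with a true prevalence of 10%, a test with 80% sensitivity and 98% specificity yields an apparent positive rate of 0.8 × 0.1 + 0.02 × 0.9 = 0.098, and `rogan_gladen(0.098, 0.8, 0.98)` recovers 0.1.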

Students’ mobility patterns on campus and the implications for the recovery of campus activities post-pandemic (10:30am-10:50am)

Modeling campus public health behavior

Quan Nguyen – Research Fellow, School of Information
This research project uses location data gathered from WiFi access points on campus to model the mobility patterns of students in order to inform the planning of educational activities that can minimize the transmission risk.
The first aim is to understand the general mobility patterns of students on campus in order to identify physical spaces associated with a high risk of transmission. For example, we can extract insights from WiFi data about which locations are busiest at which times of day, how much time is typically spent at each location, and how these mobility patterns change over time. The second aim is to understand how students share the same physical spaces on campus (e.g., attending a lecture, meeting in the same room, sharing the same dorm). Students are presumed to be in close proximity when they are connected to the same WiFi access point. We model a student-to-student network from their co-location activities and use its network centrality measures as proxies for transmission risk (i.e., students at the center of the network would have a higher chance of exposure to COVID-19 than those on the periphery). We then correlate network centrality measures with academic information (e.g., class schedule, course enrollment, major, year of study, gender, ethnicity) to determine whether certain features of the academic record are related to transmission risk. For example, we can identify which groups of students are more vulnerable to potential infection because of their high network centrality. Insights from this research project will inform the University of Michigan’s strategies for the recovery of educational activities post-pandemic with empirical evidence of students’ mobility patterns on campus, as well as the factors associated with a high risk of transmission.

View Recording
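The co-location network described above can be sketched in a few lines. This version assumes connection logs have already been bucketed into `(student_id, access_point, time_window)` tuples and uses normalized degree centrality as the risk proxy; the abstract does not say which centrality measure or time resolution the project uses, so both choices are assumptions.

```python
from collections import defaultdict
from itertools import combinations


def colocation_edges(connections):
    """Build weighted co-location edges from WiFi records.

    connections: iterable of (student_id, access_point, time_window) tuples.
    Two students are linked when they appear on the same access point in the same
    time window; the edge weight counts how many such (AP, window) slots they share.
    """
    slots = defaultdict(set)
    for student, ap, window in connections:
        slots[(ap, window)].add(student)
    edges = defaultdict(int)
    for students in slots.values():
        for a, b in combinations(sorted(students), 2):
            edges[(a, b)] += 1
    return dict(edges)


def degree_centrality(edges):
    """Normalized degree centrality: distinct contacts / (n - 1).

    Only students with at least one contact appear in the network here.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    n = len(neighbors)
    if n < 2:
        return {}
    return {s: len(nb) / (n - 1) for s, nb in neighbors.items()}
```

A student who bridges two otherwise separate groups (for example, one lecture and one dorm) ends up with the highest degree here; betweenness or eigenvector centrality would be natural alternatives if the goal is to capture bridging more directly.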

Modeling the Perceived Truthfulness of Public Statements on COVID-19: A New Model for Pairwise Comparisons of Objects with Multidimensional Latent Attributes (10:50am-11:10am)

Modeling the perception of truthfulness

Qiushi Yu – Ph.D. student, Political Science & Kevin Quinn – Professor, Political Science
What is more important for how individuals perceive the truthfulness of statements about COVID-19: a) the objective truthfulness of the statements, or b) the partisanship of the individual and the partisanship of the people making the statements? To answer this question, we develop a novel model for pairwise comparisons data that allows for a richer structure of both the latent attributes of the objects being compared and rater-specific perceptual differences than standard models. We use the model to analyze survey data that we collected in the summer of 2020. This survey asked respondents to compare the truthfulness of pairs of statements about COVID-19. These statements were taken from the fact-checked statements on https://www.politifact.com. We thus have an independent measure of the truthfulness of each statement. We find that the actual truthfulness of a statement explains very little of the variability in individuals’ perceptions of truthfulness. Instead, we find that the partisanship of the speaker and the partisanship of the rater account for the majority of the variation in perceived truthfulness, with statements made by co-partisans being viewed as more truthful.

View Recording
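For context on what the speakers' model extends, the standard single-dimension baseline for pairwise-comparison data is the Bradley–Terry model, in which each object has one latent strength and P(i preferred over j) = s_i / (s_i + s_j). The sketch below fits it with Hunter's minorization-maximization updates; it is only the baseline, not the novel multidimensional, rater-specific model presented in the talk.

```python
def bradley_terry(comparisons, n_items, iters=200):
    """Fit Bradley-Terry strengths with minorization-maximization (MM) updates.

    comparisons: list of (winner, loser) index pairs, e.g. (statement judged
    more truthful, statement judged less truthful).
    Returns strengths normalized to sum to n_items.
    Assumes every item wins at least once and loses at least once; otherwise
    the maximum-likelihood estimate does not exist.
    """
    wins = [0] * n_items
    pair_counts = {}
    for w, l in comparisons:
        wins[w] += 1
        key = (min(w, l), max(w, l))
        pair_counts[key] = pair_counts.get(key, 0) + 1

    s = [1.0] * n_items
    for _ in range(iters):
        denom = [0.0] * n_items
        for (i, j), n_ij in pair_counts.items():
            share = n_ij / (s[i] + s[j])
            denom[i] += share
            denom[j] += share
        # MM update: s_i <- W_i / sum_j n_ij / (s_i + s_j)
        s = [wins[i] / denom[i] if denom[i] > 0 else s[i] for i in range(n_items)]
        total = sum(s)
        s = [x * n_items / total for x in s]
    return s
```

Because every object has a single strength shared by all raters, this baseline cannot express the finding that partisanship of speaker and rater drives perceived truthfulness; that limitation is what motivates richer latent structure.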

Computational Neuroscience, Time Complexity, and Spacekime Analytics (11:10am-11:30am)

Modeling high-dimensional, longitudinal data

Ivo Dinov – Professor, HBBS/SoN, DCMB/SoM, MIDAS
The proliferation of digital information in all human experiences presents difficult challenges and offers unique opportunities for managing, modeling, analyzing, interpreting, and visualizing heterogeneous data. There is a substantial need to develop, validate, productize, and support novel mathematical techniques, advanced statistical computing algorithms, transdisciplinary tools, and effective artificial intelligence apps.

Spacekime analytics is a new technique for modeling high-dimensional longitudinal data, such as functional magnetic resonance imaging (fMRI). This approach relies on extending the notions of time, events, particles, and wavefunctions to complex-time (kime), complex-events (kevents), data and inference-functions, respectively. This talk will illustrate how the kime-magnitude (longitudinal time order) and kime-direction (phase) affect the subsequent predictive analytics and the induced scientific inference. The mathematical foundation of spacekime calculus reveals various statistical implications including inferential uncertainty and a Bayesian formulation of spacekime analytics. Complexifying time allows the lifting of all commonly observed processes from the classical 4D Minkowski spacetime to a 5D spacetime manifold, where a number of interesting mathematical problems arise.

Spacekime analytics transforms time-varying data, such as time-series observations, into higher-dimensional manifolds representing complex-valued and kime-indexed surfaces (kime-surfaces). This process uncovers some of the intricate structure in high-dimensional data that may be intractable in the classical space-time representation of the data. In addition, the spacekime representation facilitates the development of innovative data science analytical methods for model-based and model-free scientific inference, derived computed phenotyping, and statistical forecasting. Direct neuroscience applications of spacekime analytics will be demonstrated using simulated data and clinical observations (e.g., UK Biobank).

View Recording
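The complexification of time described in this abstract can be written compactly. In the sketch below (our notation, following the abstract's terms), the kime-magnitude plays the role of ordinary time and the kime-direction is a phase; the two kime coordinates together with three spatial ones give the 5D manifold the abstract mentions.

```latex
\kappa = t\,e^{i\varphi}, \qquad t = |\kappa| \ (\text{kime-magnitude}), \qquad \varphi = \arg\kappa \in [-\pi, \pi) \ (\text{kime-direction})

(x, y, z, t) \longmapsto (x, y, z, \kappa_1, \kappa_2), \qquad \kappa_1 = t\cos\varphi, \quad \kappa_2 = t\sin\varphi
```

As we understand the framing, an ordinary longitudinal recording only observes t, so the phase φ has to be modeled or estimated, which is why the abstract stresses how kime-direction affects the downstream predictive analytics and inference.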

Challenges in dynamic mode decomposition (11:30am-11:50am)

Modeling time series data

Ziyou Wu – PhD student, Electrical and computer engineering, Bio-inspired robotics dynamical system lab

Dynamic Mode Decomposition (DMD) is a powerful tool for extracting spatio-temporal patterns from multi-dimensional time series. DMD takes in time series data and computes the eigenvalues and eigenvectors of a finite-dimensional linear model that approximates the infinite-dimensional Koopman operator, which encodes the dynamics. DMD has been used successfully in many fields, including fluid mechanics, robotics, and neuroscience. Two of the main remaining challenges in DMD research are noise sensitivity and issues related to Krylov subspace closure when modeling nonlinear systems. In our work, we encountered great difficulty in reconstructing time series from multilegged robot data. These are oscillatory systems with slow transients, which decay only slightly faster than one oscillation period.
Here we present an investigation of possible sources of difficulty by studying a class of systems with linear latent dynamics that are observed via multinomial observables. We explore the influence of dataset metrics, the spectrum of the latent dynamics, the normality of the system matrix, and the geometry of the dynamics. Our numerical models include system and measurement noise. Our results show that even under these very mildly nonlinear conditions, DMD methods often fail to recover the spectrum and can have poor predictive ability. We show that for a system with a well-posed system matrix, having a dataset with more initial conditions and shorter trajectories can significantly improve prediction. With a slightly ill-conditioned system matrix, a moderate trajectory length improves spectrum recovery. Our work provides a self-contained framework for analyzing noise and nonlinearity, and gives generalizable insights into dataset properties for DMD analysis.
Work was funded by ARO MURI W911NF-17-1-0306 and the Kahn Foundation.

View Recording
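A minimal sketch of the setting in this abstract: exact DMD (the SVD-based variant) applied to a linear latent system observed through all monomials up to degree 2, which we take to be an example of what the abstract calls multinomial observables (an assumption). Snapshot pairs are pooled from several short trajectories, mirroring the "more initial conditions, shorter trajectories" design. The damped rotation and its parameters are hypothetical toy choices, not the authors' robot data or code.

```python
import numpy as np


def stack_pairs(trajectories):
    """Pool snapshot pairs (x_t, x_{t+1}) from several trajectories.

    Each trajectory is an (n_observables x T_k) array. Pairs never straddle two
    trajectories, so many short runs from different initial conditions can be combined.
    """
    X1 = np.hstack([traj[:, :-1] for traj in trajectories])
    X2 = np.hstack([traj[:, 1:] for traj in trajectories])
    return X1, X2


def exact_dmd(X1, X2, rank=None):
    """Exact DMD: eigenvalues and modes of the least-squares linear map X2 ~ A @ X1."""
    U, s, Vh = np.linalg.svd(X1, full_matrices=False)
    if rank is not None:  # optional truncation, a common guard against noise
        U, s, Vh = U[:, :rank], s[:rank], Vh[:rank]
    V_over_s = Vh.conj().T / s             # V @ diag(1/s)
    A_tilde = U.conj().T @ X2 @ V_over_s   # operator projected onto the POD basis
    eigvals, W = np.linalg.eig(A_tilde)
    modes = X2 @ V_over_s @ W              # exact DMD modes in observable space
    return eigvals, modes


# Hypothetical toy system: a damped 2D rotation as the linear latent dynamics.
theta, r = 0.3, 0.95
A = r * np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta), np.cos(theta)]])


def observe(x):
    """All monomials of degree <= 2 in the latent state."""
    return np.array([x[0], x[1], x[0] ** 2, x[0] * x[1], x[1] ** 2])


def simulate(x0, n_snapshots):
    xs = [np.asarray(x0, dtype=float)]
    for _ in range(n_snapshots - 1):
        xs.append(A @ xs[-1])
    return np.column_stack([observe(x) for x in xs])


# Several short trajectories from different initial conditions.
trajectories = [simulate(x0, 4) for x0 in ([1, 0], [0, 1], [1, 1])]
X1, X2 = stack_pairs(trajectories)
eigvals, modes = exact_dmd(X1, X2)

lam = r * np.exp(1j * theta)  # eigenvalue of the latent matrix A
expected = [lam, np.conj(lam), lam ** 2, abs(lam) ** 2, np.conj(lam) ** 2]
```

Because degree-2 monomials of a linear system evolve linearly, these five observables span an invariant subspace, and noise-free DMD recovers λ, its conjugate, and their pairwise products exactly. Adding measurement noise to the trajectories, or varying how many trajectories are pooled and how long each is, is one way to probe the sensitivities the abstract reports.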

November 11 at 9am

Novel Tools to Increase the Reliability and Reproducibility of Population Genetics Research (9:00am-9:20am)

Addressing selection bias in population data

Yajuan Si – Research Assistant Professor, Survey Research Center, Institute for Social Research
Advances in population genetics research have the potential to substantially advance the science of population dynamics. The interplay of micro-level biology and macro-level social sciences documents gene–environment–phenotype interactions and allows us to examine how genetics relates to child health and wellbeing. However, traditional genetics research is based on nonrepresentative samples that deviate from the target population, such as convenience and volunteer samples. This lack of representativeness may distort association studies. Recent findings have provoked concern about misinterpretation, irreproducibility, and lack of generalizability, underscoring the need to integrate survey research with genetics for population-based research. This project is motivated by the research team’s collaborative work on the Fragile Families and Child Wellbeing Study and the Adolescent Brain Cognitive Development Study, which present these common problems of population genetics studies, and it aims to advance the integration of genetic science into population dynamics research. The project will evaluate sample selection effects, identify population heterogeneity in polygenic score analysis, and develop strategies to adjust for selection bias in association studies of educational attainment, cognition status, and substance use for child health and wellbeing. This interdisciplinary project will strengthen the validity and generalizability of population genetics research, deepen new understandings of human behavior, and facilitate advances in population science.

View Slides
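One standard way to adjust a nonrepresentative sample is post-stratification: reweight each stratum so that the weighted sample matches known population shares. The sketch below is a generic illustration, not necessarily one of the strategies this project develops; it assumes the population shares are known (for example, from a census) and that selection depends only on the stratifying variable. The strata and outcome values are hypothetical.

```python
def poststratification_weights(sample_counts, population_shares):
    """Weight for stratum h = population_share_h / sample_share_h."""
    n = sum(sample_counts.values())
    return {h: population_shares[h] / (sample_counts[h] / n) for h in sample_counts}


def weighted_mean(values_by_stratum, weights):
    """Outcome mean after giving every respondent their stratum's weight."""
    num = sum(weights[h] * sum(vals) for h, vals in values_by_stratum.items())
    den = sum(weights[h] * len(vals) for h, vals in values_by_stratum.items())
    return num / den


# Hypothetical volunteer sample that over-represents college graduates.
values = {"college": [1, 1, 1, 0], "no_college": [0, 1]}
counts = {h: len(v) for h, v in values.items()}
shares = {"college": 0.3, "no_college": 0.7}
weights = poststratification_weights(counts, shares)
adjusted = weighted_mean(values, weights)
```

Here the unweighted mean is 4/6 ≈ 0.667, while the adjusted mean is 0.3 × 0.75 + 0.7 × 0.5 = 0.575, the value implied by the population composition. If selection also depends on unmeasured factors, such as the genotype or outcome itself, this adjustment is not sufficient.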

An end-to-end deep learning system for rapid analysis of the breath metabolome with applications in critical care illness and beyond (9:20am-9:40am)

Deep learning for biomedical research

Christopher Gillies – Assistant Research Scientist, Emergency Medicine
The metabolome is the set of low-molecular-weight metabolites, and its quantification represents a summary of the physiological state of an organism. Metabolite concentration levels in biospecimens are important for many critical illnesses, such as sepsis and acute respiratory distress syndrome (ARDS). Sepsis is responsible for 35% of patients who die in the hospital, and ARDS has a mortality rate of 40%. Missing data is a common challenge in metabolomics datasets. Many metabolomics investigators impute fixed values for missing metabolite concentrations, and this imputation approach leads to lower statistical power, biased parameter estimates, and reduced prediction accuracy. Certain applications of metabolomics data, such as breath analysis by gas chromatography for the prediction or detection of ARDS, can be done without quantifying individual metabolites. Skipping that quantification step would eliminate the missing data problem. Our team has developed a rapid gas chromatography breath analyzer, which has been challenged by missing data, a time-consuming process of breath signature alignment, and the subsequent quantification of metabolites across patients. Analyzing the breath signal directly could eliminate these challenges. End-to-end deep learning systems are neural networks that operate directly on a raw data source and make a prediction directly for the target application. These systems have been successful in diverse fields, from speech recognition to medicine. We envision an end-to-end deep learning system that leverages transfer learning from a large collection of healthy samples and that could rapidly multiply the applications of our breath analyzer.
The end-to-end deep learning system will enhance our breath analyzer so that it can be used more efficiently in settings ranging from the intensive care unit to the battlefield, to identify patients or soldiers with critical illnesses like sepsis and ARDS and to monitor longitudinal changes in breath metabolites.

View Slides
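The signature-alignment step that an end-to-end model is meant to bypass can be illustrated with a common baseline: estimate each sample's retention-time shift against a reference chromatogram by cross-correlation, then shift it into place. This is a generic sketch on a synthetic two-peak trace, not the team's analyzer software; the peak positions, heights, and widths are hypothetical.

```python
import numpy as np


def estimate_lag(reference, signal):
    """Return the shift (in samples) at which `signal` best matches `reference`.

    Positive values mean `signal` is delayed relative to `reference`.
    """
    ref = reference - reference.mean()
    sig = signal - signal.mean()
    xcorr = np.correlate(sig, ref, mode="full")
    # In "full" mode, index len(ref) - 1 corresponds to zero lag.
    return int(np.argmax(xcorr)) - (len(ref) - 1)


def align(reference, signal):
    """Shift `signal` to line up with `reference` (circular shift, for simplicity)."""
    return np.roll(signal, -estimate_lag(reference, signal))


def gaussian_peaks(t, centers, heights, width=4.0):
    """Synthetic chromatogram: a sum of Gaussian peaks (hypothetical shape)."""
    return sum(h * np.exp(-0.5 * ((t - c) / width) ** 2) for c, h in zip(centers, heights))


t = np.arange(300)
reference = gaussian_peaks(t, [100, 180], [1.0, 0.6])
signal = gaussian_peaks(t, [107, 187], [1.0, 0.6])  # same profile, delayed 7 samples
```

A single global shift like this cannot handle peaks that drift by different amounts, which is part of why alignment becomes time-consuming in practice and why learning directly from the raw signal is attractive.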

Machine learning-guided equations for the on-demand prediction of natural gas storage capacities of materials for vehicular applications (9:40am-10:00am)

Machine learning for energy research

Alauddin Ahmed – Assistant Research Scientist, Mechanical Engineering
Transportation is responsible for nearly one-third of the world’s carbon dioxide (CO2) emissions because of the burning of fossil fuels. While we dream of zero-carbon vehicles, future projections suggest little decline in fossil fuel consumption by the transportation sector until 2050. Therefore, ‘bending the curve’ of CO2 emissions calls for the adoption of low-cost, reduced-emission alternative fuels. Natural gas (NG), the most abundant fossil fuel on earth, is such an alternative, with a nearly 25% lower carbon footprint and a lower price than gasoline. However, the widespread adoption of natural gas as a vehicular fuel is hindered by the scarcity of high-capacity, lightweight, low-cost, and safe storage systems. Recently, materials-based natural gas storage has become one of the most viable options for vehicular applications. In particular, nanoporous materials (NPMs) are in the spotlight of the U.S. Department of Energy (DOE) because of their exceptional energy storage capacities. However, the number of possible NPMs is nearly infinite, and it is unknown, a priori, which materials would have the expected natural gas storage capacity. Searching for a high-performing material is therefore like ‘finding a needle in a haystack’, which slows materials discovery relative to growing technological demand. Here we present a novel approach for developing machine learning-guided equations for the on-demand prediction of the energy storage capacities of NPMs using a few physically meaningful structural properties. These equations allow users to calculate the energy storage capacity of an arbitrary NPM rapidly using only paper and pencil. We show the utility of these equations by predicting the NG storage of over 500,000 covalent-organic frameworks (COFs), a class of NPMs. We discovered a COF with record-setting NG storage capacity, surpassing the previously unmet target set by the DOE.
In principle, the data-driven approach presented here might be relevant to other disciplines including science, engineering, and health care.

View Slides
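The idea of a paper-and-pencil equation built from a few structural descriptors can be illustrated with the simplest possible case: an ordinary least-squares fit of capacity against two descriptors, printed as a closed-form formula. The abstract does not specify how its equations are derived, so this is only a sketch of the general idea; the descriptor names (`void_fraction`, `density`), the coefficients, and the data are all synthetic and hypothetical, in arbitrary units.

```python
import numpy as np


def fit_linear_equation(descriptors, target, names):
    """Least-squares fit of target ~ c0 + sum_k c_k * descriptor_k.

    Returns the coefficients and a human-readable formula that can be
    evaluated by hand for a new material.
    """
    X = np.column_stack([np.ones(len(target)), descriptors])
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    terms = " ".join(f"{c:+.3g}*{n}" for c, n in zip(coef[1:], names))
    return coef, f"capacity = {coef[0]:.3g} {terms}"


# Synthetic, noise-free training set with hypothetical descriptors and coefficients.
rng = np.random.default_rng(0)
descriptors = rng.uniform(0.1, 0.9, size=(40, 2))  # columns: void_fraction, density
capacity = 20 + 150 * descriptors[:, 0] - 40 * descriptors[:, 1]
coef, formula = fit_linear_equation(descriptors, capacity, ["void_fraction", "density"])
```

On this synthetic data the fit recovers the generating coefficients, giving `capacity = 20 +150*void_fraction -40*density`. Real structure–property relationships are typically nonlinear, so a practical approach would likely need nonlinear terms or a more flexible search over equation forms.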

Fusing Computer Vision And Space Weather Modeling (10:00am-10:20am)

Deep learning and computer vision for space science

David Fouhey – Assistant Professor, UM EECS
Space weather has impacts on Earth ranging from rare, immensely disruptive events (e.g., electrical blackouts caused by solar flares and coronal mass ejections) to more frequent impacts (e.g., satellite GPS interference from fluctuations in the Earth’s ionosphere caused by rapid variations in the solar extreme UV emission). Earth-impacting events are driven by changes in the Sun’s magnetic field; we now have myriad instruments capturing petabytes’ worth of images of the Sun at a variety of wavelengths, resolutions, and vantage points. These data present opportunities for learning-based computer vision, since the massive, well-calibrated image archive is often accompanied by physical models. This talk will describe some of the work that we have been doing to start integrating computer vision and space physics by learning mappings from one image or representation of the Sun to another. I will center the talk on a new system we have developed that emulates parts of the data processing pipeline of the Solar Dynamics Observatory’s Helioseismic and Magnetic Imager (SDO/HMI). This pipeline produces data products that are used to study the energetic events alluded to above and that serve as boundary conditions for solar models of them. Our deep-learning-based system emulates a key component hundreds of times faster than the current method, potentially opening doors to new applications in near-real-time space weather modeling. In keeping with the goals of the symposium, however, I will focus on some of the benefits that close collaboration has enabled in terms of understanding how to frame the problem, measure the success of the model, and even set up the deep network.

View Slides

Decoding the Environment of Most Energetic Sources in the Universe (10:20am-10:40am)

Machine learning for astronomy

Oleg Gnedin – Professor, Department of Astronomy, LSA
Astrophysics has always been at the forefront of data analysis, driving advances in image processing and numerical simulation. The coming decade will bring datasets that are qualitatively new and larger than ever before. The next generation of observational facilities will produce an explosion in the quantity and quality of data for the most distant sources, such as the first galaxies and the first quasars. Quasars are the most energetic objects in the universe, reaching luminosities up to 10^14 times that of the Sun. Their emission is powered by giant black holes that convert matter into energy according to Einstein's famous equation, E = mc^2. The greatest progress is expected in quasar spectroscopy. Detailed measurements of the spectrum of quasar light, which is emitted near the central black hole and partially absorbed by clouds of gas on its way to observers on Earth, provide a particularly powerful probe of the quasar environment. Because each chemical element has a unique spectrum, spectroscopy allows us to study not only the overall properties of matter, such as density and temperature, but also the detailed chemical composition of the intervening matter. However, interpreting these spectra is very challenging because many sources contribute to the absorption of light. To take full advantage of this new window into the nature of supermassive black holes, we need a detailed theoretical understanding of the origin of quasar spectral features. In a MIDAS PODS project, we are applying machine learning to model and extract such features, training the models on data from state-of-the-art numerical simulations of the early universe. This approach is fundamentally different from traditional astronomical data analysis. We have only begun to learn what information can be extracted, and we are still looking for a new framework to interpret these data.

Poster Session

The 2020 Symposium’s poster sessions will be hosted via the College of Engineering’s CareerFair+ tool. A direct link to this platform will be made available in the coming weeks.

Category	Poster Title	Presenting Author
Biomedical science	Comparing the old and new rat reference genomes using 10X linked-read data for 20 HR-LR rats	Pan, Yanchao
Biomedical science	Deconvolving spatial transcriptomic data using heterogeneous single-cell datasets	Sodicoff, Joshua
Biomedical science	Deep Learning for 2D and 3D Segmentation of Fluorescent Reporter Cells in Complex Tissue	Day, John
Biomedical science	Improving phenotype prediction in large biobank data sets using deep learning	Manzo, Brian
Biomedical science; Machine learning	Using genome-scale metabolic models and machine learning to design combination therapies	Chung, Carolina
Ecology	Crowdsourcing, curating and integrating data in the CHANGES project: Collections, Heterogeneous data and Next Generation Ecological Studies	Alofs, Karen
Health sciences	Application of a Tensor-Based Classification Method with Electrocardiogram Data	Alge, Olivia
Health sciences	HostSim: a whole host modeling framework for tuberculosis	Joslyn, Louis
Health sciences	Validation and comparison of PICTURE analytic and Epic Deterioration Index for COVID-19	Cummings, Brandon
Health sciences	Data-Driven Ranges of Near-Optimal Choices for Personalized Hypertension Treatment Plans	Marrero, Wesley J.
Health sciences	Predicting the second wave of COVID-19 in Washtenaw County, MI	Renardy, Marissa
Health sciences	GEODE: A novel method to optimize multidrug therapies for tuberculosis	Budak, Maral
Health sciences	Toward Intervention Prediction for Patients with Bipolar Disorder	Northrup, Haley
Health sciences	Estimating Propensity Scores from Electronic Health Records using Deep Learning	Ouyang, Jing
Machine learning; Space science	Machine Learning for Solar Flare Forecasting	Sun, Zeyu
Methodology; Algorithmic fairness	A Heuristic for Learn-and-Optimize New Mobility Services with Equity and Efficiency Metrics	Yu, Fangzhou
Methodology; Biomedical	Development and applications of interoperable ontologies for COVID-19 research	He, Yongqun Oliver
Methodology; Biomedical	Integrating single-cell datasets with partially overlapping features using nonnegative matrix factorization	Kriebel, April
Methodology; Computational efficiency	Revarie: A Python Library for Variogram Calculation, Fitting and Random Field Generation	Price, Dean
Methodology; Computational efficiency; Biomedical science	Iterative Refinement of Cellular Identity Using Online Learning	Gao, Chao
Methodology; Data representation	MARMOT: A Framework for Constructing Multimodal Representations for Vision-and-Language Tasks	Wu, Patrick Y.
Methodology; Data wrangling	Datasheet Scrubber	Fayazi, Morteza
Methodology; Databases	Secure Query Processing in the Cloud using Fully Homomorphic Encryption	Singaraj, Naveenkumar
Methodology; Machine learning; Physical science	Opportunities for transfer learning in chemical reaction discovery	Shim, Eunjae
Methodology; Math modeling; Physical science	Non-Parametric Spherical Jeans Mass Estimation with B-Splines	Rehemtulla, Nabeel
Methodology; Math modeling; Social science	When To Buy, When To Attend – Modeling Event Ticket Purchase Dynamics	Ahn, Gwen
Methodology; Statistical modeling	Consistent Estimation of Identifiable Nonparametric Mixture Models from Grouped Observations	Ritchie, Alex
Methodology; Statistics; Machine learning; Computer vision; Cosmology	Bayesian Light Source Separator (BLISS): Fully Probabilistic detection, deblending, and measurement	Mendoza, Ismael
Methodology; Text analysis	Developing OCR Pipelines for Historical Social Science	Thompson-Brusstar, Mike
Physical science	Application of Spatial Statistical Tool Revarie to Nuclear Reactor Pin Heat Distribution Scenario	Price, Dean
Physical science	Data-Driven Computational Chemistry Techniques to Understand Lignin Degradation	Punzalan, Exequiel
Social science	School Connectedness as a Buffer Against Childhood Exposure to Violence and Social Deprivation	Goetschius, Leigh
Social science	Ten Social Dimensions of Conversations and Relationships	Choi, Minje
Social science	Gaming Addiction: An Empirical Analysis	Castelo Branco, Bruno
Social science; Education research	Differential Assessment, Differential Benefit: Four-year Problem Roulette Analysis of STEM Practice Study	Weaverdyck, Noah
Statistics	Validating Surrogate Endpoints with Longitudinal Outcomes	Roberts, Emily
Statistics	Normalizing flows succeed where GANs fail: Lessons from low-dimensional data	Liu, Tianci
Statistics	Conditional covariance estimation for multivariate longitudinal data	Gupta, Sanjana
Statistics; Health sciences	A Multilevel Bayesian Approach to Improve Effect Size Estimation in Regression Modeling of Metabolomics Data Utilizing Imputation with Uncertainty	Jennaro, Theodore
Statistics; Physical science	ArgoSSM: A Bayesian state-space framework for predicting missing sensor locations	Hansen, Derek
The science of science	A Role of Preprints in the Scholarly Communication Landscape during the Pandemic	Sevryugina, Yulia
The science of science	Analyzing Preprints: The Challenges of Working with Publishers’ Metadata	Dicks, Andrew
The science of science	Supporting Interdisciplinary Research Reviews with Multi-Level Topic Maps	Lafia, Sara

Symposium Sponsors

External Partners Supporting the Symposium

United States Environmental Protection Agency