Automated Research Workflows

January 26, 12:30 PM - January 27, 2023, 12:00 PM

Weiser Hall, 10th floor, 500 Church Street, Ann Arbor


Speakers İlkay Altıntaş and Tapio Schneider join the MIDAS postdoctoral fellows for lunch to discuss shared interests in Automated Research Workflows.

Joining remotely from Johns Hopkins University, Professor Alex Szalay takes questions from the audience after his talk “Scalable Science with Petabytes.”

Chief Data Science Officer of the San Diego Supercomputer Center, İlkay Altıntaş presents her talk “Dynamic Capability Composition at the Digital Continuum from Edge to HPC.”

Assistant Professor of Medicinal Chemistry Tim Cernak presents his talk “Chemical Synthesis at the Interface of Data Science.”

Professor of Environmental Science and Engineering at the California Institute of Technology, Tapio Schneider presents his talk “Accelerating and Improving Climate Models with Hybrid AI Approaches.”

İlkay Altıntaş addresses a group of 25 U-M faculty, researchers, and fellows, discussing the prompt “How to build infrastructure for ARWs at a national scale?”

About the Automated Research Workflow Colloquium

Significant advancements in scientific computing, Artificial Intelligence (AI), and the hardware and software research environment are enabling researchers to develop automated research workflows (ARWs): building AI and machine learning (ML) as components in the research workflow for data processing and analytics, and using these methods to design and monitor experiments. As stated in a recent report from the National Academies of Sciences, Engineering and Medicine, “the tools and techniques being developed under the large umbrella of ARWs promise to transform the centuries-old serial method of research investigation into processes in which thousands or even millions of simulations or experiments are iterated rapidly in closed loops, with the analysis of data and even the design of experiments or controlled observations being assisted by ML or optimization techniques. Simultaneously, ARWs provide a way to satisfy pressing demands across fields to increase interoperability, reproducibility, replicability, and trustworthiness by better tracking results, recording data, establishing provenance, and creating more consistent metadata than even the most dedicated researchers can provide themselves.”

At this colloquium, a group of national experts who are leading this trend presented their vision and work developing and employing ARWs in astronomy, chemical biology and environmental science. In addition, U-M faculty members joined the speakers at a half-day roundtable for intensive discussions on 1) immediate use cases of ARW to accelerate research; 2) ultimate goals of developing ARW; and 3) the infrastructure for ARW. The group will continue regular discussions to develop projects and collaboration. We welcome more faculty members to join this group. If you are interested, please email Jing Liu (ljing@umich.edu), MIDAS Managing Director.

Schedule

Jan. 27, 9:00am – 12:00pm: Faculty Research Roundtable Session
Weiser Hall, 10th floor, 500 Church Street, Ann Arbor

Before the session, attendees were encouraged to read the NASEM report on automated research workflows. Dr. Atkins chaired the report committee, and all speakers were major contributors.

About the Speakers

Session Chair and Moderator

Daniel Atkins, Emeritus W.K. Kellogg Professor of Information and Professor of Electrical Engineering and Computer Science, University of Michigan.

Dr. Atkins’ research focus included computer architecture and cyber-enabled distributed knowledge communities. He has served as Dean of Engineering, Founding Dean of the School of Information, and Associate VP for Research at U-M, as well as the inaugural director of the Office of Cyberinfrastructure at the National Science Foundation (NSF). He chaired the NSF Blue Ribbon Panel on Research Cyberinfrastructure, whose report became an international roadmap for initiatives on cyber-enabled research in the digital age. Dr. Atkins is a member of the National Academy of Engineering.

Speakers

İlkay Altıntaş, Chief Data Science Officer, San Diego Supercomputer Center; Founding Fellow, Halıcıoğlu Data Science Institute, University of California San Diego.

Dynamic Capability Composition at the Digital Continuum from Edge to HPC
Influenced by advances in data and computing, scientific practice increasingly involves machine learning and artificial intelligence (AI) driven methods, which require specialized capabilities at the system, science, and service levels in addition to conventional large-capacity supercomputing approaches. The latest distributed architectures, built around the composability of data-centric applications, have led to the emergence of an ecosystem for container coordination and integration. New approaches for dynamic composability of heterogeneous systems are needed to further advance data-driven and AI-integrated scientific practice by multidisciplinary teams of researchers. This talk presents a novel approach for using composable systems at the intersection of scientific computing, AI, and remote sensing at the edge, including the first working example of a composable infrastructure that federates Expanse, an NSF-funded supercomputer, with Nautilus, a Kubernetes-based geo-distributed GPU cluster, and Sage, a reconfigurable edge AI infrastructure. It will also give an overview of scientific workflow case studies that compose insights from edge sensing, scientific instrumentation, AI, computing capabilities, and physics-driven simulations in the wildland fire domain.

Dr. Altıntaş is the Founding Director of the Workflows for Data Science (WorDS) Center of Excellence, which develops methods, cyberinfrastructure, and workflows for computational data science and its translation to practical applications. She is also the Founding Director of the WIFIRE Lab, which uses AI methods to build an all-hazards knowledge cyberinfrastructure and has achieved significant success in helping to manage wildfires.

Timothy Cernak, Assistant Professor of Medicinal Chemistry and Chemistry, University of Michigan.

Chemical Synthesis at the Interface of Data Science
The chemical synthesis of molecules gives society many of the products we use daily, such as medicines, agrochemicals, and plastics. While nearly any molecule can in principle be synthesized, devising practical synthetic routes remains a significant challenge. In fact, the design of chemical products is heavily influenced by how easily the products can be synthesized. This presentation will discuss new strategies and tactics for identifying high-impact reactions for invention.

Tim Cernak was born in Montreal, Canada in 1980. He obtained a B.Sc. in Chemistry from the University of British Columbia Okanagan, where he studied the aroma profile of Chardonnay wines. Following PhD training in total synthesis with Prof. Jim Gleason at McGill University, Tim was an FQRNT Postdoctoral Fellow with Tristan Lambert at Columbia University. From 2009 to 2018, Tim worked with the Medicinal Chemistry team at Merck Sharp & Dohme in Rahway and Boston. In 2018, Dr. Cernak joined the Department of Medicinal Chemistry at the University of Michigan in Ann Arbor as an Assistant Professor. The Cernak Lab is exploring an interface of chemical synthesis and data science. Tim is a co-founder of Entos, Inc.

Tapio Schneider, Theodore Y. Wu Professor of Environmental Science and Engineering, California Institute of Technology; Senior Research Scientist, Jet Propulsion Laboratory.

Accelerating and Improving Climate Models with Hybrid AI Approaches
While climate change is certain, precisely how climate will change is less clear. But breakthroughs in the accuracy of climate projections and in the quantification of their uncertainties are now within reach, thanks to advances in the computational and data sciences and in the availability of Earth observations from space and from the ground. I will survey the design of a new Earth system model (ESM), developed by the Climate Modeling Alliance (CliMA). The talk will cover key new concepts, including how AI techniques can be combined with process-informed models and how they can be used to dramatically accelerate algorithms for learning from data and for quantifying uncertainties.

Dr. Schneider’s work has elucidated how rainfall extremes change with climate, how changes in cloud cover can destabilize the climate system, and how winds and weather on planetary bodies such as Jupiter and Titan come about. He is currently leading the Climate Modeling Alliance (clima.caltech.edu), whose mission is to build the first Earth system model that automatically learns from diverse data sources to produce accurate climate predictions. He was named one of the “20 Best Brains Under 40” by Discover Magazine, a David and Lucile Packard Fellow, an Alfred P. Sloan Research Fellow, and a fellow of the American Geophysical Union; he is the recipient of the James R. Holton Award of the American Geophysical Union and of the Rosenstiel Award of the University of Miami.

Alex Szalay, Bloomberg Distinguished Professor, Department of Computer Science; Director, Institute for Data Intensive Science, Johns Hopkins University.

Scalable Science with Petabytes
The talk will discuss how scientific data collection evolved from manual work in small projects, through the industrial revolution in big National Laboratories, to robotic workflows in today's mid-scale projects. These mid-scale projects represent the “sweet spot” of science today because of their technological agility. They achieve scalability through extreme application of modern software and hardware technologies, and they are in the process of developing an entirely new, opportunistic approach to their computational needs. The new model for scientific software is Analysis Ready, Cloud Optimized, and can be run anywhere from campus clusters to national supercomputers and commercial clouds. We illustrate how mid-scale projects can now produce high-value, petabyte-scale data sets through heavy automation, representing the bleeding edge both in science and in the underlying technology.

Dr. Szalay has made significant contributions to theoretical astrophysics, especially to our understanding of structure formation and of the nature of dark matter in the universe. Equally important, he is one of the leading figures in data science, having revolutionized the role of data and computing in scientific research. He is the Architect for the Science Archive and Chair of the Science Council of the Sloan Digital Sky Survey, the most used astronomy facility in the world today. He is also a leader in the grass-roots standardization effort to bring the next generation of petascale databases in astronomy to a common basis, so that they will be interoperable.

Questions? Contact Us.

Message the MIDAS team: midas-contact@umich.edu