A MIDAS mini-symposium
Co-sponsored by Electrical and Computer Engineering (ECE)
Generative AI, including large language models (LLMs) and generative diffusion models, has emerged as a powerful family of foundation models with unprecedented data generation ability. These models have shown exceptional performance in a variety of applications, including image and language generation, audio synthesis, and solving general inverse problems. Despite their success, they face significant challenges and limitations that hinder their practical use in many scientific disciplines. This MIDAS symposium will bring together experts and researchers from both theoretical and applied fields to discuss the latest advances in generative AI, from theoretical study to practical deployment. It aims to explore the application of these models in scientific domains, providing a valuable platform for exchanging ideas and fostering research collaborations in this emerging area.
Speakers
9:00 – 10:00 AM: “Controlled Generation for Large Foundation Models”
Dr. Mengdi Wang, Associate Professor of Electrical and Computer Engineering and the Center for Statistics and Machine Learning; Associate Director of Graduate Studies; Princeton University
Recent advances in large foundation models, such as large language models (LLMs) and diffusion models, have demonstrated impressive capabilities. However, to truly align these models with user feedback or maximize real-world objectives, it is crucial to exert control over the decoding process in order to steer the distribution of the generated output. In this talk, we will explore methods and theory for controlled generation in LLMs and diffusion models. We will discuss various modalities for achieving this control, focusing on applications such as LLM alignment, accelerated inference, transfer learning, and diffusion-based optimizers.
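As a concrete illustration of steering a generated distribution, the sketch below shows one simple form of controlled decoding: re-weighting a base model's next-token probabilities by an exponential tilt of a value function. This is a minimal sketch of the general idea, not the specific algorithms covered in the talk; the toy value function and the tilting strength beta are illustrative assumptions.

```python
# Minimal sketch of reward-tilted decoding: sample from p'(x) ∝ p(x) * exp(beta * v(x)),
# where v scores how well each candidate token serves the control objective.
import numpy as np

rng = np.random.default_rng(0)

def controlled_next_token(base_logits, value, beta=2.0):
    """Sample a token from the tilted distribution p'(x) ∝ p(x) * exp(beta * v(x))."""
    tilted = base_logits + beta * value          # log p(x) + beta * v(x)
    tilted -= tilted.max()                       # numerical stability
    probs = np.exp(tilted)
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs)

# Toy example: 5-token vocabulary; the (hypothetical) value function prefers token 3.
base_logits = np.array([1.0, 0.5, 0.2, 0.1, -0.3])
value = np.array([0.0, 0.0, 0.0, 1.5, 0.0])
token = controlled_next_token(base_logits, value)
```

With beta = 0 this recovers ordinary sampling from the base model; increasing beta shifts the output distribution toward high-value tokens without retraining the model.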
Mengdi Wang is an associate professor in the Department of Electrical and Computer Engineering and the Center for Statistics and Machine Learning at Princeton University. She is also affiliated with the Department of Computer Science and Princeton’s ML Theory Group, and she has been a visiting research scientist at DeepMind, the IAS, and the Simons Institute for the Theory of Computing. Her research focuses on machine learning, reinforcement learning, generative AI, AI for science, and intelligent systems applications. Mengdi received her PhD in Electrical Engineering and Computer Science from the Massachusetts Institute of Technology in 2013, where she was affiliated with the Laboratory for Information and Decision Systems and advised by Dimitri P. Bertsekas. Before that, she received her bachelor’s degree from the Department of Automation, Tsinghua University. Mengdi received the Young Researcher Prize in Continuous Optimization of the Mathematical Optimization Society in 2016 (awarded once every three years), the Princeton SEAS Innovation Award in 2016, the NSF CAREER Award in 2017, the Google Faculty Award in 2017, the MIT Tech Review 35-Under-35 Innovation Award (China region) in 2018, the WAIC YunFan Award in 2022, and AACC’s Donald P. Eckman Award in 2024. She served as a program chair for ICLR 2023, as a senior area chair for NeurIPS, ICML, and COLT, and as an associate editor for the Harvard Data Science Review and Operations Research. Her research is supported by NSF, AFOSR, NIH, ONR, Google, Microsoft, C3.ai, FinUP, and RVAC Medicines.
Mengdi’s research group studies machine learning theory, reinforcement learning, generative artificial intelligence, AI for science, and intelligent systems applications.
10:00 – 10:45 AM: “The Emergence of Generalizability and Semantic Low-Dim Subspaces in Diffusion Models”
Dr. Qing Qu, Assistant Professor of Electrical Engineering and Computer Science, College of Engineering; University of Michigan
Recent empirical studies have shown that diffusion models possess a unique reproducibility property, transitioning from memorization to generalization as the number of training samples increases. This demonstrates that diffusion models can effectively learn image distributions and generate new samples. Remarkably, these models achieve this even with a small number of training samples, despite the challenge of large image dimensions, effectively circumventing the curse of dimensionality. In this work, we provide theoretical insights into this phenomenon by leveraging two key empirical observations: (i) the low intrinsic dimensionality of image datasets and (ii) the low-rank property of the denoising autoencoder in trained diffusion models. Under these assumptions, we rigorously demonstrate that optimizing the training loss of diffusion models is equivalent to solving the canonical subspace clustering problem across the training samples. This insight has practical implications for training and controlling diffusion models. Specifically, it enables us to precisely characterize the minimal number of samples necessary for accurately learning the low-rank data support, shedding light on the phase transition from memorization to generalization. Additionally, we empirically establish a correspondence between the subspaces and the semantic representations of image data, which enables one-step, transferable, efficient image editing. Moreover, our results have profound practical implications for training efficiency and model safety, and they also open up numerous intriguing theoretical questions for future research.
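To make observation (ii) concrete, the following toy sketch (our illustration, not the speaker's code) shows why denoisers for data on a low-dimensional subspace are naturally low-rank: for zero-mean data with covariance C lying on a d-dimensional subspace of R^n, the optimal linear (MMSE/Wiener) denoiser W = C (C + sigma^2 I)^{-1} has effective rank equal to the intrinsic dimension d.

```python
# Toy illustration of the low-rank denoiser phenomenon for subspace data.
import numpy as np

n, d, sigma = 50, 5, 0.1
rng = np.random.default_rng(0)
U, _ = np.linalg.qr(rng.standard_normal((n, d)))   # orthonormal basis of the subspace
X = U @ rng.standard_normal((d, 10_000))           # clean samples on the subspace
C = X @ X.T / X.shape[1]                           # data covariance (rank d)
W = C @ np.linalg.inv(C + sigma**2 * np.eye(n))    # optimal linear denoiser
svals = np.linalg.svd(W, compute_uv=False)
print("singular values > 0.5:", (svals > 0.5).sum())  # prints 5, i.e. d
```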
Qing Qu is an assistant professor in the EECS department at the University of Michigan. Prior to that, he was a Moore-Sloan Data Science Fellow at the Center for Data Science, New York University, from 2018 to 2020. He received his Ph.D. in Electrical Engineering from Columbia University in October 2018. He received his B.Eng. from Tsinghua University in July 2011 and an M.Sc. from Johns Hopkins University in December 2012, both in Electrical and Computer Engineering. His research interest lies at the intersection of the foundations of data science, machine learning, numerical optimization, and signal/image processing, with a current focus on deep representation learning and diffusion models. He is the recipient of the Best Student Paper Award at SPARS’15, the Microsoft PhD Fellowship in machine learning in 2016, and a best paper award at the NeurIPS Diffusion Model Workshop in 2023. He received the NSF CAREER Award in 2022 and an Amazon Research Award (AWS AI) in 2023. He is a program chair of the new Conference on Parsimony and Learning and an area chair for NeurIPS and ICLR.
VIEW DR. QING QU’S PRESENTATION
11:00 – 11:45 AM: “Understanding and Improving Language Model Architectures”
Dr. Samet Oymak, Assistant Professor of Electrical Engineering and Computer Science, College of Engineering; University of Michigan
Recent advances, such as ChatGPT, have revolutionized language modeling. These models are based on the transformer architecture, which uses the self-attention mechanism as its central component. In this talk, I discuss our recent results on an optimization- and approximation-theoretic understanding of self-attention, as well as how theory can guide the design of better mechanisms. I will first discuss the optimization dynamics to demystify how attention “finds the needle in the haystack”: we show that, under gradient-based training, the attention weights converge to an analytically predictable solution that acts as a separator of relevant and irrelevant context within the input. Second, we identify the shortcomings of the standard transformer architecture when adapting to variations in contextual sparsity. This leads us to introduce a simple but effective and theoretically grounded method called “Gated Softmax Attention” (GSA). We show that GSA has negligible computational overhead but uniformly improves language modeling capabilities, including in the latest models such as Llama 3. I will end the talk by discussing the current state of research and future directions.
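The abstract does not spell out the GSA mechanism, so the following PyTorch sketch is a hypothetical reading only: standard single-head softmax attention whose output is rescaled by a learned, input-dependent sigmoid gate, giving the layer a cheap way to modulate how much attended context passes through. All module and parameter names here are our own placeholders.

```python
# Hypothetical sketch of a gated softmax attention layer (not the talk's exact method).
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedSoftmaxAttention(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.qkv = nn.Linear(dim, 3 * dim)
        self.gate = nn.Linear(dim, dim)   # gate computed from the input token itself
        self.out = nn.Linear(dim, dim)

    def forward(self, x):                 # x: (batch, seq, dim)
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        attn = F.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)
        y = attn @ v                       # standard softmax attention output
        y = torch.sigmoid(self.gate(x)) * y  # elementwise gating of the output
        return self.out(y)

x = torch.randn(2, 16, 64)
print(GatedSoftmaxAttention(64)(x).shape)  # torch.Size([2, 16, 64])
```

The extra gate adds only one linear layer per block, consistent with the abstract's claim of negligible computational overhead.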
Samet Oymak is an assistant professor of Electrical Engineering and Computer Science at the University of Michigan. His research spans optimization, statistical learning, and decision making, with applications to trustworthy and efficient AI. Prior to U-M, he was with the ECE department at the University of California, Riverside. He has also spent time in industry as a researcher and did a postdoc at UC Berkeley as a Simons Fellow. He obtained his PhD from Caltech, for which he received the Charles Wilts Prize for the best departmental thesis. He is the recipient of an NSF CAREER award and multiple industry faculty research awards.
VIEW DR. SAMET OYMAK’S PRESENTATION
11:45 AM – 1:00 PM: Poster Session and Lunch Break
1:00 – 2:00 PM: “Enhancing Faithfulness and Transparency of Foundational Models via Parsimonious Concept Engineering and Information Pursuit”
Dr. Rene Vidal, Rachleff University Professor, Department of Radiology, Perelman School of Medicine, and Department of Electrical and Systems Engineering, Penn Engineering; University of Pennsylvania
Large Language Models (LLMs) and Vision Language Models (VLMs) have led to significant advances on many tasks. However, they also suffer from a lack of faithfulness and transparency. To prevent LLMs from producing potentially harmful information, racist or sexist language, and hallucinations, we propose to decompose LLM activations as a linear combination of a dictionary of benign and undesirable activations, and then remove the undesirable component to produce faithful responses. Experiments on response detoxification, faithfulness enhancement, and sentiment revision tasks show that our method achieves state-of-the-art alignment performance while maintaining linguistic capabilities. To improve transparency, we propose an interpretable-by-design framework that makes predictions by sequentially selecting a short chain of user-interpretable queries about the input that are most informative for making predictions. To generate and answer the queries, we use a combination of LLMs and VLMs. Experiments on bird classification, text classification, image classification, and medical diagnosis show that our approach produces more interpretable and shorter explanations.
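The decompose-and-remove step described above can be illustrated in a few lines of linear algebra. In this minimal numpy sketch, the dictionary D and the benign/undesirable split are random placeholders standing in for the learned concept dictionary; only the projection logic reflects the idea in the abstract.

```python
# Decompose an activation over a concept dictionary, then remove the
# undesirable components before reconstructing the edited activation.
import numpy as np

rng = np.random.default_rng(0)
d, k = 128, 12                      # activation dim, dictionary size
D = rng.standard_normal((d, k))     # columns = concept directions (placeholder)
undesirable = np.zeros(k, dtype=bool)
undesirable[:3] = True              # first 3 atoms deemed undesirable (placeholder)

a = rng.standard_normal(d)          # an LLM activation vector
coef, *_ = np.linalg.lstsq(D, a, rcond=None)   # a ≈ D @ coef + residual
residual = a - D @ coef
coef[undesirable] = 0.0             # drop the undesirable components
a_clean = D @ coef + residual       # edited activation, benign part kept
```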
René Vidal, a global pioneer of data science, is the Rachleff University Professor, with joint appointments in the Department of Radiology in the Perelman School of Medicine and the Department of Electrical and Systems Engineering in the School of Engineering and Applied Science. Dr. Vidal has been named a Penn Integrates Knowledge University Professor at the University of Pennsylvania.
René Vidal received his B.S. degree in Electrical Engineering (highest honors) from the Pontificia Universidad Catolica de Chile in 1997 and his M.S. and Ph.D. degrees in Electrical Engineering and Computer Sciences from the University of California at Berkeley in 2000 and 2003, respectively. He was a research fellow at National ICT Australia in 2003 and joined The Johns Hopkins University in 2004 as a faculty member in the Department of Biomedical Engineering and the Center for Imaging Science.
VIEW DR. RENE VIDAL’S PRESENTATION
2:00 – 2:45 PM
Dr. Liyue Shen, Assistant Professor of Electrical Engineering and Computer Science, College of Engineering; University of Michigan
She received her B.E. in Electronic Engineering from Tsinghua University in 2016 and her Ph.D. from the Department of Electrical Engineering, Stanford University, in 2022, co-advised by Prof. John Pauly and Prof. Lei Xing. She was a postdoctoral research fellow in the Department of Biomedical Informatics, Harvard Medical School, from 2022 to 2023. She is the recipient of the Stanford Bio-X Bowes Graduate Student Fellowship (2019–2022) and was selected as a Rising Star in EECS by MIT and a Rising Star in Data Science by the University of Chicago in 2021.
Her research interest is in biomedical AI, which lies at the intersection of machine learning, computer vision, signal and image processing, medical image analysis, biomedical imaging, and data science. She is particularly interested in developing efficient and reliable AI/ML-driven computational methods for biomedical imaging and informatics to tackle real-world biomedicine and healthcare problems, including, but not limited to, personalized cancer treatment and precision medicine. Her recent focus is on generative diffusion models, implicit neural representation learning, and multimodal foundation models.
She co-organized the Women in Machine Learning (WiML) workshop at ICML 2021 and the Machine Learning for Healthcare (ML4H) workshop at NeurIPS 2021. At MICCAI 2021, she co-taught the tutorial on Deep 2D-3D Modeling and Learning in Medical Image Computing.
VIEW DR. LIYUE SHEN’S PRESENTATION
3:00 – 3:45 PM: “DiffusionPDE: Generative PDE-Solving Under Partial Observation”
Dr. Jeong Joon Park, Assistant Professor of Electrical Engineering and Computer Science, College of Engineering; University of Michigan
We introduce DiffusionPDE, a general framework for solving partial differential equations (PDEs) using generative diffusion models. In particular, we focus on scenarios where we do not have the full knowledge of the scene necessary to apply classical solvers. Most existing forward or inverse PDE approaches perform poorly when the observations of the data or the underlying coefficients are incomplete, which is common for real-world measurements. In this work, we propose DiffusionPDE, which can simultaneously fill in the missing information and solve a PDE by modeling the joint distribution of the solution and coefficient spaces. We show that the learned generative priors lead to a versatile framework for accurately solving a wide range of PDEs under partial observation, significantly outperforming the state-of-the-art methods in both forward and inverse directions.
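Schematically, this "fill in and solve" behavior can be obtained by guiding each reverse-diffusion step toward agreement with the sparse observations. The numpy sketch below is our simplified illustration, not the paper's algorithm: `denoise` is a placeholder for the trained joint diffusion model over coefficient and solution fields, and the re-noising schedule is deliberately crude.

```python
# Schematic observation-guided diffusion sampling under partial observation.
import numpy as np

rng = np.random.default_rng(0)

def denoise(x, t):
    # Placeholder for the learned denoiser's estimate of the clean field at time t.
    return x * (1 - t)

def guided_sample(obs, mask, steps=100, guidance=1.0):
    x = rng.standard_normal(obs.shape)            # start from pure noise
    for i in range(steps, 0, -1):
        t = i / steps
        x0_hat = denoise(x, t)                    # predicted clean field
        # Guidance: pull the prediction toward the observed entries only.
        x0_hat = x0_hat - guidance * mask * (x0_hat - obs)
        noise = rng.standard_normal(obs.shape)
        x = x0_hat + (t - 1 / steps) * noise      # crude re-noising to time t - dt
    return x

obs = np.zeros((64, 64))
mask = np.zeros((64, 64))
mask[::8, ::8] = 1                                # observe only a sparse grid of points
field = guided_sample(obs, mask)                  # full field consistent with observations
```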
My research focuses on 3D reconstruction and generative models. I use neural and physical 3D representations to generate realistic 3D objects and scenes, with a current focus on large-scale, dynamic, and interactable 3D scene generation. These generative models will be greatly useful for content creation, such as games and movies, and for training autonomous agents in virtual environments. In my research, I frequently use and adapt generative modeling techniques such as auto-decoders, GANs, and diffusion models.
In my project “DeepSDF,” I suggested a new representation for 3D generative models that made a breakthrough in the field. The question I answered is: what should a 3D model generate? Points, meshes, or voxels? In the DeepSDF paper, I proposed that we should instead generate a “function” that takes a 3D coordinate as input and outputs a field value at that coordinate, where the function is represented as a neural network. This neural coordinate-based representation is memory-efficient, differentiable, and expressive, and it is at the core of the huge progress our community has made in 3D generative modeling and reconstruction.
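A compact PyTorch sketch of this coordinate-based representation, under simplifying assumptions (a plain MLP, a small latent code, no positional encoding): the network maps a latent shape code plus a 3D coordinate to a signed distance value.

```python
# Simplified rendition of the DeepSDF idea: a neural network as the generated "function".
import torch
import torch.nn as nn

class CoordSDF(nn.Module):
    def __init__(self, latent_dim=64, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + 3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),                # signed distance at the query point
        )

    def forward(self, z, xyz):                   # z: (B, latent_dim), xyz: (B, 3)
        return self.net(torch.cat([z, xyz], dim=-1))

# In DeepSDF's auto-decoder setup, the latent code z is optimized per shape
# jointly with the network weights, rather than produced by an encoder.
model = CoordSDF()
z = torch.zeros(8, 64, requires_grad=True)       # per-shape latent codes
sdf = model(z, torch.rand(8, 3))                 # predicted distances, shape (8, 1)
```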
There are two contributions I would like to make. First, I would like to enable AI generation of large-scale, dynamic, and interactable 3D worlds, which will benefit entertainment, autonomous agent training (robotics and self-driving), and various other scientific fields such as 3D medical imaging. Second, I would like to devise new and more efficient neural network architectures that better mimic our brains. Current AI models are highly inefficient in how they learn from data (requiring huge numbers of labels) and are difficult to train continuously or through verbal/visual instructions. I would like to develop new architectures and learning methods that address these limitations.
VIEW DR. JEONG JOON PARK’S PRESENTATION
3:45 – 4:30 PM: “Scalable Visual Intelligence in the Era of Generative AI”
Dr. Saining Xie, Assistant Professor of Computer Science, Courant Institute of Mathematical Sciences, New York University
This talk offers a comprehensive overview of our recent work in vision-centric generative AI, particularly in the understanding and generation of visual content (e.g., images and videos). We will discuss the latest developments, such as multimodal large language models for visual understanding, along with diffusion transformers (DiTs) and beyond for visual generation. The talk will highlight the intricate interdependencies between these two areas and explore the opportunities and challenges in developing robust and scalable visual intelligence. Additionally, we will discuss the importance of these developments from both a practical standpoint (including scientific applications) and as foundational steps toward achieving general intelligence capable of interacting with and understanding the sensory-rich world in a more realistic and meaningful way.
Saining Xie is an assistant professor of Computer Science at the Courant Institute of Mathematical Sciences at New York University and is affiliated with the NYU Center for Data Science. He is also a visiting faculty researcher at Google DeepMind. Before joining NYU in 2023, he was a research scientist at FAIR, Meta. In 2018, he received his Ph.D. in computer science from the University of California San Diego. He works in computer vision and machine learning, with a particular interest in scalable visual representation learning. His work has been recognized with a Marr Prize honorable mention, CVPR best paper finalist selections, and an Amazon Research Award.
VIEW DR. SAINING XIE’S PRESENTATION
4:30 – 5:30 PM: Panel Discussion
With Dr. Alfred Hero, Dr. Erica Briscoe, Dr. Rene Vidal, Dr. Mengdi Wang, Dr. Karthik Duraisamy, and Dr. Saining Xie
VIEW PANEL DISCUSSION
Organizers
Liyue Shen
Assistant Professor of Electrical Engineering and Computer Science, College of Engineering, University of Michigan
Qing Qu
Assistant Professor of Electrical Engineering and Computer Science, College of Engineering, University of Michigan
Co-Sponsoring Unit
Electrical and Computer Engineering (ECE), University of Michigan
Questions? Contact Us.
Message the MIDAS team: midas-contact@umich.edu