Registration is now closed. Please check out our other events.
Academy Overview
The AI for Scientists and Engineers Summer Academy is designed for academic researchers, including university faculty, in a wide range of domains, including biological sciences, engineering, environmental and earth science, physical sciences, and social sciences. Participants will learn the mathematical foundations of machine learning (ML), critically assess the data used in AI models, evaluate and validate ML model outputs, and understand strategic considerations for incorporating AI into research workflows. The prerequisites are college-level math and statistics; prior coding experience is not required. Specific topics include supervised and unsupervised learning, neural networks, causal inference, and science-informed machine learning models.
The Summer Academy consists of three weeks of instruction, each with a different focus. Participants may attend any or all weeks; however, Weeks 2 and 3 require some prior knowledge of AI/ML.
- Week 1 (Monday, July 7 – Friday, July 11, 2025): The conceptual understanding of AI and its applications in domain research.
- Week 2 (Monday, July 14 – Friday, July 18, 2025): The implementation of ML models in a Python environment.
- Week 3 (Monday, July 21 – Friday, July 25, 2025): Advanced topics of AI and its applications in domain research.
Participants are expected to bring a laptop for programming components of the academy.
Light breakfast options will be available daily. A dedicated lunch reception is planned for Wednesday each week.
Week 1: Concepts and Applications
Monday, July 7 – Friday, July 11
8:30 AM – 4:30 PM
*Subject to change
8:30 – 9:00 AM
Welcome and Program Overview
Presented by Kerby Shedden
- Goals of the course: What participants will be able to do after the course
- The role of AI in scientific and engineering research
- Overview of the five-day schedule and learning approach
9:00 – 10:30 AM
Conceptual Introduction to AI and ML
Presented by Madeline Peters and Kamal Abdulraheem
- What is AI? What is machine learning (ML)? Differences and overlaps
- Key types of AI: supervised, unsupervised, and reinforcement learning; neural networks; large language models (LLMs); and symbolic AI
- Feature engineering
- What features are, the major types of features (temporal, categorical, ordinal, etc.), and the process of feature engineering
- A high-level walkthrough of example features (e.g., a bird song) and how they are turned into data for ML (e.g., pitch or volume values)
- Discussion with participants about the kinds of features they work with
10:30 – 10:45 AM
Break
10:45 AM – 12:30 PM
Review of Core Mathematical Foundations for AI
Presented by Kerby Shedden and Kamal Abdulraheem
- Probability and statistics for AI: Probability distributions, conditioning, independence, conditional mean and variance, measures of association, limiting distributions and concentration
- Function approximation – basis expansions and convolution
- Differentiability, gradients, and optimization
12:30 – 1:30 PM
Lunch
1:30 – 3:15 PM
Statistical Inference and Data-Centric Concepts
Presented by Kerby Shedden
- Basics of statistical estimation and inference, uncertainty assessment, basic large sample theory
- Understanding bias, variance, and the bias-variance tradeoff
- Brief overview of key data challenges in data-driven science & engineering: Selection bias, causality, confounding, measurement error, nonstationarity, generalization, calibration, fairness.
3:15 – 3:30 PM
Break
3:30 – 4:30 PM
Research Rigor, Reproducibility and Ethics
Presented by Jing Liu
- Data privacy, responsible AI, model documentation (e.g., model cards) and transparent reporting
- Reproducibility principles for scientific AI
8:30 – 10:30 AM
Principles of Supervised Learning
Presented by Cindy (Xinyu) Liu
- What is supervised learning? Key goals and use cases in science and engineering
- Key models: regression models, tree-based models, and ensemble models
- Training (loss functions, optimization)
10:30 – 10:45 AM
Break
10:45 AM – 12:30 PM
Model Validation and Assessment
Presented by Madeline Peters
- Underfitting and overfitting
- Training, testing, and validation: Why and how to use held-out data
- Model evaluation metrics for prediction and classification
- Conceptual demo: a simple example of calculating regression evaluation metrics (see the sketch below)
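For orientation, a minimal sketch of what such a metrics calculation might look like with scikit-learn (synthetic data; illustrative only, not course material):

```python
# A minimal sketch of computing common regression evaluation metrics on
# held-out data with scikit-learn, using a synthetic dataset.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                                  # toy features
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.3, size=200)

# Hold out a test set so metrics reflect generalization, not memorization.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

print("MSE:", mean_squared_error(y_test, y_pred))
print("MAE:", mean_absolute_error(y_test, y_pred))
print("R^2:", r2_score(y_test, y_pred))
```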
12:30 – 1:30 PM
Lunch
1:30 – 3:30 PM
How Scientists Use Supervised Learning
Presented by Xin Wei
- One-hour presentation on AI in natural hazards research
- One-hour group discussion: additional examples of supervised learning applications in other domains; how overfitting and underfitting arise in research workflows; prediction uncertainties
8:30 – 10:00 AM
Linear Algebra for AI
Presented by Kamal Abdulraheem
- Vectors, matrices
- Eigenvalues/eigenvectors
- Matrix factorization – spectral decomposition, singular value decomposition (SVD)
10:00 – 10:15 AM
Break
10:15 – 11:45 AM
Introduction to Unsupervised Learning
Presented by Cindy (Xinyu) Liu
- Differences from supervised learning and common use cases
- Clustering: k-means, hierarchical clustering, and when to use them
- Validating clusters and deciding how many to use
- Using unsupervised learning for feature engineering (what it is, why and when to use it).
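A minimal sketch of k-means clustering and one common heuristic (the silhouette score) for choosing the number of clusters, on synthetic data; illustrative only:

```python
# A minimal sketch of k-means clustering with a silhouette-score comparison
# across candidate cluster counts, using synthetic 2-D data.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
# Three synthetic clusters in two dimensions.
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(100, 2))
               for c in ([0, 0], [5, 5], [0, 5])])

# The silhouette score is one simple heuristic for picking the number of clusters.
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    print(k, round(silhouette_score(X, labels), 3))
```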
11:45 AM – 12:30 PM
Dimensionality Reduction
Presented by Kamal Abdulraheem and Cindy (Xinyu) Liu
- Curse of dimensionality – nonparametric estimation in high dimensions
- Covariance matrices
- Approximate low dimensionality
- Linear methods (PCA)
- Nonlinear methods (t-SNE, UMAP)
- Connections to deep learning
- Word embeddings
- Supervised dimension reduction
- How do you decide whether a dimensionality reduction is reasonable and useful?
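A minimal sketch, on synthetic data, of a linear reduction with PCA and one quick check of whether the reduction is reasonable (how much variance the retained components explain); illustrative only:

```python
# A minimal sketch of PCA on synthetic, approximately low-dimensional data.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
latent = rng.normal(size=(300, 2))                              # underlying 2-D structure
X = latent @ rng.normal(size=(2, 10)) + 0.05 * rng.normal(size=(300, 10))

pca = PCA(n_components=2).fit(X)
X_reduced = pca.transform(X)

# A quick sanity check on whether the reduction is "reasonable":
# how much of the total variance do the retained components capture?
print("explained variance ratio:", pca.explained_variance_ratio_)
```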
12:30 – 1:30 PM
Lunch provided
1:30 – 2:30 PM
Dimensionality Reduction (continued)
2:30 – 2:45 PM
Break
2:45 – 4:30 PM
Applications of Unsupervised Learning in Domain Research
Presented by Amirhossein Moosavi
- Small group discussion
- Deep learning in health care
- Applications in biology, materials science, climate studies, and other domains. Computational social science examples with text embeddings.
8:30 – 10:00 AM
Introduction to Deep Learning
Presented by Ken Reid
- Key concepts: Layers, neurons, activation functions, loss functions
- Overview of modern architectures (MLP, CNN, RNN, Transformers)
- Optimization involved in fitting neural networks
- How neural networks differ from classical ML models (computational efficiency, explainability)
10:00 – 10:15 AM
Break
10:15 AM – 12:00 PM
Model Interpretation and Causal Inference in Research
Presented by Kerby Shedden and Kamal Abdulraheem
- Benchmarks: what do they mean, and which are useful?
- Model uncertainty and uncertainty quantification
- Comparing models: is one model substantially better than another, and how sure are we?
- Causality vs. correlation; conditions needed for valid causal inference
- Propensity scores, matching, and methods to estimate causal effects
- Strategies for assessing and validating AI model outputs
12:00 – 1:00 PM
Lunch
1:00 – 2:30 PM
Use Cases in Science and Engineering
Presented by Nathan Fox
- Applications in image recognition, natural language processing, etc.
2:30 – 2:45 PM
Break
2:45 – 4:30 PM
Science-Informed ML Models and Specific Examples
Presented by Haotian Chen
8:30 – 11:00 AM
Foundation Models
Presented by James Boyko
- What is a large language model?
- What is special about them compared to other deep learning models?
- ChatGPT vs. the general class of transformer-based models
- Multimodal and other applications (e.g., time series prediction)
- Scientific foundation models
- Use cases in domain research
11:00 – 11:15 AM
Break
11:15 AM – 12:00 PM
Addressing Practical Challenges in AI for Research
Presented by Kamal Abdulraheem and Kerby Shedden
- Recognizing imperfect data and basic techniques to address it: missing data, selection bias, and non-representative data
- Data leakage and its scientific consequences
- How to handle outliers and anomalies
- Hyperparameter tuning: methods and concerns when auto-tuning parameters
12:00 – 1:00 PM
Lunch
1:00 – 2:00 PM
Addressing Practical Challenges in AI for Research (continued)
2:00 – 2:15 PM
Break
2:15 – 3:30 PM
Closing Discussion and Recommendations for Further Learning
Presented by Jing Liu
Week 2: Implementations of AI for Research with Python
Monday, July 14 – Friday, July 18
8:30 AM – 4:30 PM
*Subject to change
8:30 – 9:00 AM
Welcome and Overview
- Overview of the five-day schedule and learning approach
The Computing Environment: On-Demand Notebooks
Presented by Elle O’Brien, Eunjae Shim, Zheng Guo, and Kamal Abdulraheem
9:00 – 10:00 AM
Introduction to Python Programming
Presented by Ken Reid
- Fundamentals of the Python language
- Python interpreter, elements of a Python package
- Overview of Python libraries: NumPy, pandas, Matplotlib, scikit-learn
- Managing data files, paths, and APIs in Python
- Debugging and repairing Python code
10:00 – 10:15 AM
Break
10:15 – 11:30 AM
Introduction to Python Programming (continued)
11:30 AM – 12:30 PM
Lunch
12:30 – 2:15 PM
Interactive Notebooks and IDEs
Presented by Eunjae Shim and Zheng Guo
- What a Jupyter notebook is, at a high level
- Creating notebooks/scripts in VS Code and understanding how they are similar to or different from Jupyter notebooks
- Strengthening Python programming through notebooks (worked practice)
2:15 – 2:30 PM
Break
2:30 – 4:30 PM
AI-Assisted Coding and Development
Presented by Eunjae Shim, Zheng Guo, and Kamal Abdulraheem
- Overview of GitHub Copilot
- Coding, debugging, documentation, and code packaging with AI
- Asking questions, refactoring code, and verifying methods through test-driven development (without introducing any particular testing library)
8:30 – 10:00 AM
Working with Tabular Data using Pandas
Presented by Eunjae Shim and Zheng Guo
- Iterating over files in directories, conversion of Excel files to Python objects
- Exploratory data analysis session reviewing the Matplotlib material introduced on Day 1
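A minimal sketch of iterating over spreadsheet files and loading them into pandas; the folder and file names are hypothetical placeholders:

```python
# A minimal sketch of reading a directory of Excel files into pandas DataFrames.
from pathlib import Path
import pandas as pd

data_dir = Path("data")                         # hypothetical folder of Excel files
frames = []
for path in sorted(data_dir.glob("*.xlsx")):
    df = pd.read_excel(path)                    # requires an Excel engine such as openpyxl
    df["source_file"] = path.name               # track where each row came from
    frames.append(df)

combined = pd.concat(frames, ignore_index=True)
print(combined.head())
```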
10:00 – 10:15 AM
Break
10:15 AM – 12:00 PM
Data Cleaning Techniques
Presented by Eunjae Shim and Zheng Guo
- Handling missing values, imputations, sampling strategies, and transformations
- Grouping and aggregating for further analysis
- Dropping irrelevant columns
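A minimal sketch of the kinds of cleaning steps listed above; the file and column names are hypothetical placeholders:

```python
# A minimal sketch of common pandas cleaning steps: dropping irrelevant columns,
# simple imputation, and group-wise aggregation.
import pandas as pd

df = pd.read_csv("measurements.csv")            # hypothetical input file

df = df.drop(columns=["notes"])                 # drop an irrelevant column
df["temperature"] = df["temperature"].fillna(df["temperature"].median())  # simple imputation
df = df.dropna(subset=["site"])                 # discard rows missing a key field

# Group and aggregate for further analysis.
summary = df.groupby("site")["temperature"].agg(["mean", "std", "count"])
print(summary)
```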
12:00 – 1:00 PM
Lunch
1:00 – 1:45 PM
Feature Engineering
Presented by Eunjae Shim and Zheng Guo
- One-hot encoding, standardization, min-max scaling
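A minimal sketch of the three transformations named above using scikit-learn preprocessing utilities, on a toy table; illustrative only:

```python
# A minimal sketch of one-hot encoding, standardization, and min-max scaling.
import pandas as pd
from sklearn.preprocessing import OneHotEncoder, StandardScaler, MinMaxScaler

df = pd.DataFrame({"habitat": ["forest", "wetland", "forest"],
                   "mass_g": [12.0, 30.5, 18.2]})

onehot = OneHotEncoder().fit_transform(df[["habitat"]]).toarray()   # categorical -> indicator columns
standardized = StandardScaler().fit_transform(df[["mass_g"]])       # zero mean, unit variance
minmax = MinMaxScaler().fit_transform(df[["mass_g"]])               # rescaled to [0, 1]

print(onehot)
print(standardized.ravel())
print(minmax.ravel())
```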
1:45 – 2:00 PM
Break
2:00 – 2:15 PM
Introduction to scikit-learn
Presented by Eunjae Shim and Zheng Guo
- Train-test split and preprocessing
- Training a linear regressor
- Cautionary example of data leakage
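A minimal sketch of this workflow on synthetic data: split first, then fit the scaler and model on the training portion only so that no information leaks from the test set; illustrative only:

```python
# A minimal sketch of train-test splitting, preprocessing, and fitting a linear
# regressor while avoiding data leakage.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(150, 4))
y = X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.2, size=150)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

scaler = StandardScaler().fit(X_train)          # fit on training data only (avoids leakage)
model = LinearRegression().fit(scaler.transform(X_train), y_train)
print("test R^2:", model.score(scaler.transform(X_test), y_test))
```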
2:15 – 2:30 PM
Break
2:30 – 4:00 PM
Introduction to scikit-learn (continued)
4:00 – 4:30 PM
Importance of Problem Formulation
Presented by Eunjae Shim
- Presentation on how the same dataset can be treated differently
8:30 – 10:45 AM
Model Evaluation Metrics and Cross-validation Techniques
Presented by Madeline Peters
- Brief review of regression and classification model evaluation metrics
- Implementation of linear regression using scikit-learn
- Allocation of training, validation and testing data
- Interpretation of results
- Calculation of model evaluation metrics
- Model selection using various metrics
- Implementation of multinomial logistic regression using scikit-learn
- Calculation of model evaluation metrics
- Implementation of a simple neural network to demonstrate cross-validation using scikit-learn
- Introduction to scikit-learn pipeline, saving and loading objects (e.g., trained model)
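A minimal sketch of a scikit-learn pipeline with cross-validation, plus saving and reloading the fitted object (synthetic data; the file name is a hypothetical placeholder):

```python
# A minimal sketch of a scikit-learn Pipeline, cross-validation, and object persistence.
import numpy as np
import joblib
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

pipe = make_pipeline(StandardScaler(), LogisticRegression())
print("CV accuracy:", cross_val_score(pipe, X, y, cv=5).mean())

pipe.fit(X, y)
joblib.dump(pipe, "model.joblib")               # save the fitted pipeline
reloaded = joblib.load("model.joblib")          # load it back for later use
```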
10:45 – 11:00 AM
Break
11:00 AM – 12:30 PM
Model Evaluation Metrics and Cross-validation Techniques (continued)
12:30 – 1:30 PM
Lunch provided
1:30 – 3:00 PM
Review of Building ML Models on an Imbalanced Dataset
Presented by Eunjae Shim, Zheng Guo, and Kamal Abdulraheem
- Weighting samples
- Threshold tuning with cross-validation
- SMOTE: the Synthetic Minority Oversampling Technique for preparing data for classical ML models
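A minimal sketch of two of the strategies above, class weighting and SMOTE oversampling, on synthetic data; the SMOTE example assumes the third-party imbalanced-learn package:

```python
# A minimal sketch of handling class imbalance: class weighting in scikit-learn
# and SMOTE oversampling via imbalanced-learn.
import numpy as np
from sklearn.linear_model import LogisticRegression
from imblearn.over_sampling import SMOTE        # third-party package: imbalanced-learn

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))
y = (rng.random(1000) < 0.05).astype(int)       # roughly 5% positive class

# Option 1: weight samples inversely to class frequency.
clf = LogisticRegression(class_weight="balanced").fit(X, y)

# Option 2: oversample the minority class with SMOTE before fitting.
X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)
print("before:", np.bincount(y), "after:", np.bincount(y_res))
```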
3:00 – 3:15 PM
Break
3:15 – 4:30 PM
Review of Building ML Models on an Imbalanced Dataset (continued)
8:30 – 10:45 AM
PyTorch, Keras, and Other Libraries
Presented by Zheng Guo and Eunjae Shim
- Environment setup
- Datasets, DataLoaders, and tensors
- CPU vs. GPU
- Building a simple model (MNIST)
- Pretraining and fine-tuning
- Loss functions, activation functions, dropout, batch normalization, etc.
- Pitfalls: overfitting, underfitting, exploding and vanishing gradients, etc.
- Hyperparameter tuning
- Reproducibility by setting random seeds
- TensorBoard (logging)
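A minimal sketch of the PyTorch pieces listed above, with random tensors standing in for MNIST; illustrative only:

```python
# A minimal sketch of tensors, a DataLoader, a simple model, and a training loop in PyTorch.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)                                            # reproducibility via random seeds
device = "cuda" if torch.cuda.is_available() else "cpu"         # CPU vs. GPU

X = torch.randn(512, 784)                                       # fake 28x28 images, flattened
y = torch.randint(0, 10, (512,))
loader = DataLoader(TensorDataset(X, y), batch_size=64, shuffle=True)

model = nn.Sequential(nn.Linear(784, 128), nn.ReLU(),
                      nn.Dropout(0.2), nn.Linear(128, 10)).to(device)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(2):
    for xb, yb in loader:
        xb, yb = xb.to(device), yb.to(device)
        optimizer.zero_grad()
        loss = loss_fn(model(xb), yb)
        loss.backward()
        optimizer.step()
    print("epoch", epoch, "loss", loss.item())
```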
10:45 – 11:00 AM
Break
11:00 AM – 12:30 PM
PyTorch, Keras, and Other Libraries (continued)
12:30 – 1:30 PM
Lunch
1:30 – 3:00 PM
Architectures (CNN, RNN)
Presented by Zheng Guo and Eunjae Shim
- CNN – resource: https://huggingface.co/learn/computer-vision-course/unit0/welcome/welcome
- RNN (LSTM)
- Transformer – resource: https://huggingface.co/learn/nlp-course/en/chapter1/1
3:00 – 3:15 PM
Break
3:15 – 4:30 PM
Architectures (CNN, RNN) (continued)
8:30 – 9:30 AM
Introduction to HuggingFace and its Ecosystem for AI Models
Presented by Elle O’Brien
9:30 – 10:30 AM
Downloading and Accessing Pre-Trained Models for Various Modalities and Research Domains
Presented by Elle O’Brien
- Running inference with pre-trained models and evaluating outputs
- Foundation models
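A minimal sketch of running inference with a pre-trained model via the Hugging Face transformers pipeline API; the checkpoint named here is one common public model and is only a placeholder for whatever the session uses:

```python
# A minimal sketch of inference with a pre-trained model from the Hugging Face Hub.
from transformers import pipeline

classifier = pipeline("sentiment-analysis",
                      model="distilbert-base-uncased-finetuned-sst-2-english")
print(classifier("The calibration of this instrument drifted over time."))
# Output is a list of label/score dicts, e.g. [{'label': 'NEGATIVE', 'score': ...}]
```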
10:30 – 10:45 AM
Break
10:45 AM – 12:30 PM
Journal Club Example with Code
- Reading and Interpreting Model Cards to Assess Capabilities and Limitations
- Fine-tuning Models for Specific Tasks and Domains
12:30 – 1:30 PM
Lunch
1:30 – 2:30 PM
Embeddings
Presented by Elle O’Brien
- Vector similarity measures
- Thinking of foundation models as pretrained embeddings
- Fine-tuning
- Application to clustering & search
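A minimal sketch of one vector similarity measure (cosine similarity); the embedding vectors here are random placeholders rather than real model outputs:

```python
# A minimal sketch of cosine similarity between two embedding vectors.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors: 1 = same direction, 0 = orthogonal."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
doc_a, doc_b = rng.normal(size=384), rng.normal(size=384)   # placeholder embeddings
print(cosine_similarity(doc_a, doc_b))
```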
2:30 – 2:45 PM
Break
2:45 – 4:00 PM
Topic Modeling with BERTopic Activity
Presented by Elle O’Brien
4:00 – 4:30 PM
Other Resources for Leveraging LLMs
Presented by Elle O’Brien
- AWS Bedrock and leveraging APIs
- LangChain
Week 3: “Passion Week” – Advanced Topics of AI Methods with Applications in Domain Research
Monday, July 21 – Friday, July 25
8:30 AM – 4:30 PM
*Subject to change
8:30 – 9:45 AM
Welcome and Overview
Presented by Alexander Rodríguez
- Overview of AI methods for science and engineering
- Overview of the five-day schedule and learning approach
- Review of concepts in scientific modeling
9:45 – 10:00 AM
Break
10:00 – 10:30 AM
Applications in Scientific and Engineering Research
Presented by Zheng Guo
10:30 AM – 12:00 PM
Introduction to Generative Models (GANs, VAEs, Diffusion Models)
Presented by Zheng Guo
12:00 – 1:00 PM
Lunch
1:00 – 2:15 PM
Introduction to Generative Models (GANs, VAEs, Diffusion Models) (continued)
2:15 – 2:30 PM
Break
2:30 – 3:30 PM
Introduction to Generative Models (GANs, VAEs, Diffusion Models) (continued)
3:30 – 4:30 PM
Use Cases in Scientific Research (invited talks)
Presented by Madeline Peters
8:30 – 10:15 AM
Introduction to Causal Inference Concepts (DAGs, counterfactuals) and Causal Reasoning in Research
Presented by Coco Krumme
10:15 – 10:30 AM
Break
10:30 AM – 12:00 PM
Introduction to Causal Inference Concepts (DAGs, counterfactuals) and Causal Reasoning in Research (continued)
12:00 – 1:00 PM
Lunch
1:00 – 2:45 PM
Use Cases
Presented by Coco Krumme
2:45 – 3:00 PM
Break
3:00 – 4:30 PM
Use Cases (continued)
8:30 – 10:30 AM
Neural Operators
Presented by Zheng Guo
- Graph neural operator
- Fourier neural operator
- DeepONet
- Physics-informed neural operators
10:30 – 10:45 AM
Break
10:45 AM – 12:00 PM
Hands-on Exercise
Presented by Zheng Guo
12:00 – 1:00 PM
Lunch
1:00 – 2:45 PM
Symbolic Regression
Presented by Madeline Peters
- SINDy (Sparse Identification of Nonlinear Dynamics)
- Genetic programming approaches
- Deep learning approaches
2:45 – 3:00 PM
Break
3:00 – 4:30 PM
Hands-on Exercise: the PySINDy Python Library, with an Example of Genetic Programming
Presented by Madeline Peters
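A minimal sketch of what a PySINDy exercise might look like, fitting SINDy to simulated data from a simple harmonic oscillator; this assumes the third-party pysindy package and is illustrative only:

```python
# A minimal sketch of sparse identification of dynamics with pysindy on simulated data.
import numpy as np
import pysindy as ps
from scipy.integrate import solve_ivp

def oscillator(t, z):
    x, y = z
    return [y, -x]                              # simple harmonic oscillator

t = np.linspace(0, 10, 500)
sol = solve_ivp(oscillator, (0, 10), [1.0, 0.0], t_eval=t)
X = sol.y.T                                     # state trajectory, shape (n_times, 2)

model = ps.SINDy()
model.fit(X, t=t)
model.print()                                   # prints the identified sparse dynamics
```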
8:30 – 10:00 AM
Introduction to Physics-Informed Neural Networks (PINNs)
Presented by Yiluan Song
- Fundamental ideas
- Their role in scientific modeling
- Common implementation challenges and practical tips
10:00 – 10:15 AM
Break
10:15 AM – 12:00 PM
Introduction to Physics-Informed Neural Networks (PINNs) (continued)
12:00 – 1:00 PM
Lunch
1:00 – 2:15 PM
Hands-on Exercises
Presented by Yiluan Song
2:15 – 2:30 PM
Break
2:30 – 4:30 PM
Use Cases in Scientific Research (invited talks)
Presented by Liyue Shen, Soumi Tribedi, and Unique Subedi
8:30 – 10:30 AM
Introduction to Uncertainty Quantification
Presented by Madeline Peters
- What is UQ and why do we need UQ?
- Aleatoric (data) uncertainty
- Epistemic (model) uncertainty
- Sensor uncertainty, label uncertainty
- Common approaches
10:30 – 10:45 AM
Break
10:45 AM – 12:30 PM
Strategies for Quantifying and Interpreting Uncertainty in AI Models
Presented by Cindy (Xinyu) Liu
- Evaluation Metrics
12:30 – 1:30 PM
Lunch
1:30 – 2:45 PM
Applications of Uncertainty Quantification in AI/ML Methodologies
Presented by Cindy (Xinyu) Liu and Madeline Peters
- Large Language Models
- Reinforcement Learning
- Active Learning
- Digital Twin
2:45 – 3:00 PM
Break
3:00 – 4:30 PM
Applications of Uncertainty Quantification in Scientific and Engineering Applications
Presented by Cindy (Xinyu) Liu and Madeline Peters
- Biology, materials science, climate studies, and traffic engineering.
- Guest speakers: Dr. Nanta Sophonrat and Liyue Shen (to be confirmed)
Additional Information
By the conclusion of the Academy, participants will be better prepared to integrate AI approaches into their research, collaborate more effectively with AI experts, and be ready to take the next steps in their AI journey.
- Internal Participants (U-M Personnel and Students)
- Weekly Rate: $180
- Discounted Rate (All 3 weeks): $500 (thanks to University support, we are able to offer a deep discount for U-M employees and students)
- Other Academic Institutions and U-M Alumni
- Weekly Rate: $1,000
- Discounted Rate (must be registered for all three weeks): $2,500
- External Participants
- Weekly Rate: $3,000
- Discounted Rate (All 3 weeks): $8,000
This academy is open to researchers in academia, industry and public-sector organizations. We especially welcome university faculty to attend.
Summer academies are designed with faculty, staff, and postdocs in mind. Students are also welcome to apply, though priority will be given to faculty, staff, and postdocs.
- More than 14 days before the first day: full refund minus a $50 processing fee
- Between 7 and 14 days before the first day: 50% refund
- Less than 7 days before the first day: no refund
College-level math and statistics
Note: prior coding experience is not required
Central Campus Classroom Building (CCCB), Room 2460
1225 Geddes Ave.
Ann Arbor, MI 48109
Parking available nearby includes a parking structure for U-M Blue/Gold permit holders, located at 525 Church St., and metered street parking along Church St. There is also a public garage at 650 S. Forest Ave. Information on available public parking in Ann Arbor and real-time occupancy counts for public parking structures is available online.
Summer Academy Faculty

Kamal Abdulraheem
Schmidt AI in Science Fellow, Michigan Institute for Data and AI in Society

James Boyko
Schmidt AI in Science Alum, Michigan Institute for Data and AI in Society

Mohna Chakraborty
Data Science Fellow, Michigan Institute for Data and AI in Society

Haotian Chen
Schmidt AI in Science Fellow, Michigan Institute for Data and AI in Society

Nathan Fox
AI Scientist, Schmidt AI in Science Fellow Alum, Michigan Institute for Data and AI in Society

Zheng Guo
Schmidt AI in Science Fellow, Michigan Institute for Data and AI in Society

Coco Krumme
Lecturer II in Information, School of Information

Xinyu Liu
Schmidt AI in Science Fellow, Michigan Institute for Data and AI in Society

Amirhossein Moosavi
Data Science Fellow, Michigan Institute for Data and AI in Society

Elle O’Brien
Lecturer IV; Research Investigator, School of Information

Madeline Peters
Schmidt AI in Science Fellow, Michigan Institute for Data and AI in Society

Ken Reid
Data Scientist, Michigan Institute for Data and AI in Society

Alexander Rodríguez
Assistant Professor, Computer Science and Engineering

Kerby Shedden
Professor of Statistics (LSA) and Biostatistics (School of Public Health); Director of Consulting for Statistics, Computing, and Analytics Research (CSCAR)

Eunjae Shim
Schmidt AI in Science Fellow, Michigan Institute for Data and AI in Society

Yiluan Song
Schmidt AI in Science Fellow, Michigan Institute for Data and AI in Society

Xin Wei
Schmidt AI in Science Fellow, Michigan Institute for Data and AI in Society
Questions? Contact Us.
Contact Faculty Training Program Manager Kelly Psilidis at [email protected].