Registration is now closed. Please check out our other events.
Academy Overview
The AI for Scientists and Engineers Summer Academy is designed for academic researchers, including university faculty, in a wide range of domains, including biological sciences, engineering, environmental and earth science, physical sciences, and social sciences. Participants will learn the mathematical foundations of machine learning (ML), critically assess the data used in AI models, evaluate and validate ML model outputs, and understand strategic considerations for incorporating AI into research workflows. The prerequisites are college-level math and statistics; prior coding experience is not required. Specific topics include supervised and unsupervised learning, neural networks, causal inference, and science-informed machine learning models.
The Summer Academy consists of three weeks of instruction, each with a different focus. Participants may attend any or all weeks; however, Weeks 2 and 3 require some prior knowledge of AI/ML.
- Week 1 (Monday, July 7 – Friday, July 11, 2025): The conceptual understanding of AI and its applications in domain research.
- Week 2 (Monday, July 14 – Friday, July 18, 2025): The implementation of ML models in a Python environment.
- Week 3 (Monday, July 21 – Friday, July 25, 2025): Advanced topics of AI and its applications in domain research.
Participants are expected to bring a laptop for programming components of the academy.
Light breakfast options will be available daily. A dedicated lunch reception is planned for Wednesday each week.
Week 1: Concepts and Applications
Monday, July 7 – Friday, July 11
8:30 AM – 4:30 PM
*Subject to change
8:30 – 9:00 AM
Welcome and Program Overview
Presented by Kerby Shedden
- Goals of the course: What participants will be able to do after the course
- The role of AI in scientific and engineering research
- Overview of the five-day schedule and learning approach
9:00 – 10:30 AM
Conceptual Introduction to AI and ML
Presented by Madeline Peters and Kamal Abdulraheem
- What is AI? What is machine learning (ML)? Differences and overlaps
- Key types of AI: supervised, unsupervised, and reinforcement learning; neural networks; large language models (LLMs); and symbolic AI
- Feature engineering
- What features are, the major types of features (temporal, categorical, ordinal, etc.), and the process of feature engineering
- A high-level walkthrough of example features (e.g., a bird song) and how they are turned into data for ML (e.g., pitch or volume values)
- Discussion with participants about the kinds of features they work with
10:30 – 10:45 AM
Break
10:45 AM – 12:30 PM
Review of Core Mathematical Foundations for AI
Presented by Kerby Shedden and Kamal Abdulraheem
- Probability and statistics for AI: Probability distributions, conditioning, independence, conditional mean and variance, measures of association, limiting distributions and concentration
- Function approximation – basis expansions and convolution
- Differentiability, gradients, and optimization
12:30 – 1:30 PM
Lunch
1:30 – 3:15 PM
Statistical Inference and Data-Centric Concepts
Presented by Kerby Shedden
- Basics of statistical estimation and inference, uncertainty assessment, basic large sample theory
- Understanding bias, variance, and the bias-variance tradeoff
- Brief overview of key data challenges in data-driven science & engineering: Selection bias, causality, confounding, measurement error, nonstationarity, generalization, calibration, fairness.
3:15 – 3:30 PM
Break
3:30 – 4:30 PM
Research Rigor, Reproducibility and Ethics
Presented by Jing Liu
- Data privacy, responsible AI, model documentation (e.g., model cards) and transparent reporting
- Reproducibility principles for scientific AI
8:30 – 10:30 AM
Principles of Supervised Learning
Presented by Cindy (Xinyu) Liu
- What is supervised learning? Key goals and use cases in science and engineering
- Key models: regression models, tree-based models, and ensemble models
- Training (loss functions, optimization)
10:30 – 10:45 AM
Break
10:45 AM – 12:30 PM
Model Validation and Assessment
Presented by Madeline Peters
- Underfitting and overfitting
- Training, testing, and validation: Why and how to use held-out data
- Model evaluation metrics for prediction and classification
- Conceptual demo: a simple example of calculating regression evaluation metrics (see the sketch below)
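For orientation, a minimal sketch of what such a metrics calculation might look like with scikit-learn (synthetic data; illustrative only, not course material):

```python
# A minimal sketch of computing common regression evaluation metrics on
# held-out data with scikit-learn, using a synthetic dataset.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                                  # toy features
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.3, size=200)

# Hold out a test set so metrics reflect generalization, not memorization.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

print("MSE:", mean_squared_error(y_test, y_pred))
print("MAE:", mean_absolute_error(y_test, y_pred))
print("R^2:", r2_score(y_test, y_pred))
```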
12:30 – 1:30 PM
Lunch
1:30 – 3:30 PM
How Scientists Use Supervised Learning
Presented by Xin Wei
- One-hour presentation on AI in natural hazards research
- One-hour group discussion: additional examples of supervised learning applications in other domains; how overfitting and underfitting arise in research workflows; prediction uncertainties
8:30 – 10:00 AM
Linear Algebra for AI
Presented by Kamal Abdulraheem
- Vectors, matrices
- Eigenvalues/eigenvectors
- Matrix factorization – spectral decomposition, singular value decomposition (SVD)
10:00 – 10:15 AM
Break
10:15 – 11:45 AM
Introduction to Unsupervised Learning
Presented by Cindy (Xinyu) Liu
- Differences from supervised learning and common use cases
- Clustering: k-means, hierarchical clustering, and when to use them
- Validating clusters and deciding how many to use
- Using unsupervised learning for feature engineering (what it is, why and when to use it).
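A minimal sketch of k-means clustering and one common heuristic (the silhouette score) for choosing the number of clusters, on synthetic data; illustrative only:

```python
# A minimal sketch of k-means clustering with a silhouette-score comparison
# across candidate cluster counts, using synthetic 2-D data.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
# Three synthetic clusters in two dimensions.
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(100, 2))
               for c in ([0, 0], [5, 5], [0, 5])])

# The silhouette score is one simple heuristic for picking the number of clusters.
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    print(k, round(silhouette_score(X, labels), 3))
```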
11:45 AM – 12:30 PM
Dimensionality Reduction
Presented by Kamal Abdulraheem and Cindy (Xinyu) Liu
- Curse of dimensionality – nonparametric estimation in high dimensions
- Covariance matrices
- Approximate low dimensionality
- Linear methods (PCA)
- Nonlinear methods (t-SNE, UMAP)
- Connections to deep learning
- Word embeddings
- Supervised dimension reduction
- How do you decide whether a dimensionality reduction is reasonable and useful?
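A minimal sketch, on synthetic data, of a linear reduction with PCA and one quick check of whether the reduction is reasonable (how much variance the retained components explain); illustrative only:

```python
# A minimal sketch of PCA on synthetic, approximately low-dimensional data.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
latent = rng.normal(size=(300, 2))                              # underlying 2-D structure
X = latent @ rng.normal(size=(2, 10)) + 0.05 * rng.normal(size=(300, 10))

pca = PCA(n_components=2).fit(X)
X_reduced = pca.transform(X)

# A quick sanity check on whether the reduction is "reasonable":
# how much of the total variance do the retained components capture?
print("explained variance ratio:", pca.explained_variance_ratio_)
```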
12:30 – 1:30 PM
Lunch provided
1:30 – 2:30 PM
Dimensionality Reduction (continued)
2:30 – 2:45 PM
Break
2:45 – 4:30 PM
Applications of Unsupervised Learning in Domain Research
Presented by Amirhossein Moosavi
- Small group discussion
- Deep learning in health care
- Applications in biology, materials science, climate studies, and other domains. Computational social science examples with text embeddings.
8:30 – 10:00 AM
Introduction to Deep Learning
Presented by Ken Reid
- Key concepts: Layers, neurons, activation functions, loss functions
- Overview of modern architectures (MLP, CNN, RNN, Transformers)
- Optimization involved in fitting neural networks
- How neural networks differ from classical ML models (computational efficiency, explainability)
10:00 – 10:15 AM
Break
10:15 AM – 12:00 PM
Model Interpretation and Causal Inference in Research
Presented by Kerby Shedden and Kamal Abdulraheem
- Benchmarks: what do they mean, and which are useful?
- Model uncertainty and uncertainty quantification
- Comparing models: is one model substantially better than another, and how sure are we?
- Causality vs. correlation; conditions needed for valid causal inference
- Propensity scores, matching, and methods to estimate causal effects
- Strategies for assessing and validating AI model outputs
12:00 – 1:00 PM
Lunch
1:00 – 2:30 PM
Use Cases in Science and Engineering
Presented by Nathan Fox
- Applications in image recognition, natural language processing, etc.
2:30 – 2:45 PM
Break
2:45 – 4:30 PM
Science-Informed ML Models and Specific Examples
Presented by Haotian Chen
8:30 – 11:00 AM
Foundation Models
Presented by James Boyko
- What is a large language model?
- What is special about them compared to other deep learning models?
- ChatGPT vs. the general class of transformer-based models
- Multimodal and other applications (e.g., time series prediction)
- Scientific foundation models
- Use cases in domain research
11:00 – 11:15 AM
Break
11:15 AM – 12:00 PM
Addressing Practical Challenges in AI for Research
Presented by Kamal Abdulraheem and Kerby Shedden
- Recognizing imperfect data and basic techniques to address it: missing data, selection bias, and non-representative data
- Data leakage and its scientific consequences
- How to handle outliers and anomalies
- Hyperparameter tuning: methods and concerns when auto-tuning parameters
12:00 – 1:00 PM
Lunch
1:00 – 2:00 PM
Addressing Practical Challenges in AI for Research (continued)
2:00 – 2:15 PM
Break
2:15 – 3:30 PM
Closing Discussion and Recommendations for Further Learning
Presented by Jing Liu
Week 2: Implementations of AI for Research with Python
Monday, July 14 – Friday, July 18
8:30 AM – 4:30 PM
*Subject to change
8:30 – 9:00 AM
Welcome and Overview
- Overview of the five-day schedule and learning approach
The Computing Environment: On-Demand Notebooks
Presented by Elle O’Brien, Eunjae Shim, Zheng Guo, and Kamal Abdulraheem
9:00 – 10:00 AM
Introduction to Python Programming
Presented by Ken Reid
- Fundamentals of the Python language
- Python interpreter, elements of a Python package
- Overview of Python libraries: NumPy, pandas, Matplotlib, scikit-learn
- Managing data files, paths, and APIs in Python
- Debugging and repairing Python code
10:00 – 10:15 AM
Break
10:15 – 11:30 AM
Introduction to Python Programming (continued)
11:30 AM – 12:30 PM
Lunch
12:30 – 2:15 PM
Interactive Notebooks and IDEs
Presented by Eunjae Shim and Zheng Guo
- What a Jupyter notebook is, at a high level
- Creating notebooks/scripts in VS Code and understanding how they are similar to or different from Jupyter notebooks
- Strengthening Python programming through notebooks (worked practice)
2:15 – 2:30 PM
Break
2:30 – 4:30 PM
AI-Assisted Coding and Development
Presented by Eunjae Shim, Zheng Guo, and Kamal Abdulraheem
- Overview of GitHub Copilot
- Coding, debugging, documentation, and code packaging with AI
- Asking questions, refactoring code, and verifying methods through test-driven development (without introducing any particular testing library)
8:30 – 10:00 AM
Working with Tabular Data using Pandas
Presented by Eunjae Shim and Zheng Guo
- Iterating over files in directories, conversion of Excel files to Python objects
- Exploratory data analysis session reviewing the Matplotlib material introduced on Day 1
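A minimal sketch of iterating over spreadsheet files and loading them into pandas; the folder and file names are hypothetical placeholders:

```python
# A minimal sketch of reading a directory of Excel files into pandas DataFrames.
from pathlib import Path
import pandas as pd

data_dir = Path("data")                         # hypothetical folder of Excel files
frames = []
for path in sorted(data_dir.glob("*.xlsx")):
    df = pd.read_excel(path)                    # requires an Excel engine such as openpyxl
    df["source_file"] = path.name               # track where each row came from
    frames.append(df)

combined = pd.concat(frames, ignore_index=True)
print(combined.head())
```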
10:00 – 10:15 AM
Break
10:15 AM – 12:00 PM
Data Cleaning Techniques
Presented by Eunjae Shim and Zheng Guo
- Handling missing values, imputations, sampling strategies, and transformations
- Grouping and aggregating for further analysis
- Dropping irrelevant columns
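A minimal sketch of the kinds of cleaning steps listed above; the file and column names are hypothetical placeholders:

```python
# A minimal sketch of common pandas cleaning steps: dropping irrelevant columns,
# simple imputation, and group-wise aggregation.
import pandas as pd

df = pd.read_csv("measurements.csv")            # hypothetical input file

df = df.drop(columns=["notes"])                 # drop an irrelevant column
df["temperature"] = df["temperature"].fillna(df["temperature"].median())  # simple imputation
df = df.dropna(subset=["site"])                 # discard rows missing a key field

# Group and aggregate for further analysis.
summary = df.groupby("site")["temperature"].agg(["mean", "std", "count"])
print(summary)
```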
12:00 – 1:00 PM
Lunch
1:00 – 1:45 PM
Feature Engineering
Presented by Eunjae Shim and Zheng Guo
- One-hot encoding, standardization, min-max scaling
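A minimal sketch of the three transformations named above using scikit-learn preprocessing utilities, on a toy table; illustrative only:

```python
# A minimal sketch of one-hot encoding, standardization, and min-max scaling.
import pandas as pd
from sklearn.preprocessing import OneHotEncoder, StandardScaler, MinMaxScaler

df = pd.DataFrame({"habitat": ["forest", "wetland", "forest"],
                   "mass_g": [12.0, 30.5, 18.2]})

onehot = OneHotEncoder().fit_transform(df[["habitat"]]).toarray()   # categorical -> indicator columns
standardized = StandardScaler().fit_transform(df[["mass_g"]])       # zero mean, unit variance
minmax = MinMaxScaler().fit_transform(df[["mass_g"]])               # rescaled to [0, 1]

print(onehot)
print(standardized.ravel())
print(minmax.ravel())
```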
1:45 – 2:00 PM
Break
2:00 – 2:15 PM
Introduction to scikit-learn
Presented by Eunjae Shim and Zheng Guo
- Train-test split and preprocessing
- Training a linear regressor
- Cautionary example of data leakage
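A minimal sketch of this workflow on synthetic data: split first, then fit the scaler and model on the training portion only so that no information leaks from the test set; illustrative only:

```python
# A minimal sketch of train-test splitting, preprocessing, and fitting a linear
# regressor while avoiding data leakage.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(150, 4))
y = X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.2, size=150)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

scaler = StandardScaler().fit(X_train)          # fit on training data only (avoids leakage)
model = LinearRegression().fit(scaler.transform(X_train), y_train)
print("test R^2:", model.score(scaler.transform(X_test), y_test))
```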
2:15 – 2:30 PM
Break
2:30 – 4:00 PM
Introduction to scikit-learn (continued)
4:00 – 4:30 PM
Importance of Problem Formulation
Presented by Eunjae Shim
- Presentation on how the same dataset can be treated differently
8:30 – 10:45 AM
Model Evaluation Metrics and Cross-validation Techniques
Presented by Madeline Peters
- Brief review of regression and classification model evaluation metrics
- Implementation of linear regression using scikit-learn
- Allocation of training, validation and testing data
- Interpretation of results
- Calculation of model evaluation metrics
- Model selection using various metrics
- Implementation of multinomial logistic regression using scikit-learn
- Calculation of model evaluation metrics
- Implementation of a simple neural network to demonstrate cross-validation using scikit-learn
- Introduction to scikit-learn pipeline, saving and loading objects (e.g., trained model)
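A minimal sketch of a scikit-learn pipeline with cross-validation, plus saving and reloading the fitted object (synthetic data; the file name is a hypothetical placeholder):

```python
# A minimal sketch of a scikit-learn Pipeline, cross-validation, and object persistence.
import numpy as np
import joblib
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

pipe = make_pipeline(StandardScaler(), LogisticRegression())
print("CV accuracy:", cross_val_score(pipe, X, y, cv=5).mean())

pipe.fit(X, y)
joblib.dump(pipe, "model.joblib")               # save the fitted pipeline
reloaded = joblib.load("model.joblib")          # load it back for later use
```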
10:45 – 11:00 AM
Break
11:00 AM – 12:30 PM
Model Evaluation Metrics and Cross-validation Techniques (continued)
12:30 – 1:30 PM
Lunch provided
1:30 – 3:00 PM
Review of Building ML Models on an Imbalanced Dataset
Presented by Eunjae Shim, Zheng Guo, and Kamal Abdulraheem
- Weighting samples
- Threshold tuning with cross-validation
- SMOTE: the Synthetic Minority Oversampling Technique for preparing data for classical ML models
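A minimal sketch of two of the strategies above, class weighting and SMOTE oversampling, on synthetic data; the SMOTE example assumes the third-party imbalanced-learn package:

```python
# A minimal sketch of handling class imbalance: class weighting in scikit-learn
# and SMOTE oversampling via imbalanced-learn.
import numpy as np
from sklearn.linear_model import LogisticRegression
from imblearn.over_sampling import SMOTE        # third-party package: imbalanced-learn

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))
y = (rng.random(1000) < 0.05).astype(int)       # roughly 5% positive class

# Option 1: weight samples inversely to class frequency.
clf = LogisticRegression(class_weight="balanced").fit(X, y)

# Option 2: oversample the minority class with SMOTE before fitting.
X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)
print("before:", np.bincount(y), "after:", np.bincount(y_res))
```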
3:00 – 3:15 PM
Break
3:15 – 4:30 PM
Review of Building ML Models on an Imbalanced Dataset (continued)
8:30 – 10:45 AM
PyTorch, Keras, and Other Libraries
Presented by Zheng Guo and Eunjae Shim
- Environment setup
- Datasets, DataLoaders, and tensors
- CPU vs. GPU
- Building a simple model (MNIST)
- Pretraining and fine-tuning
- Loss functions, activation functions, dropout, batch normalization, etc.
- Pitfalls: overfitting, underfitting, exploding and vanishing gradients, etc.
- Hyperparameter tuning
- Reproducibility by setting random seeds
- TensorBoard (logging)
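A minimal sketch of the PyTorch pieces listed above, with random tensors standing in for MNIST; illustrative only:

```python
# A minimal sketch of tensors, a DataLoader, a simple model, and a training loop in PyTorch.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)                                            # reproducibility via random seeds
device = "cuda" if torch.cuda.is_available() else "cpu"         # CPU vs. GPU

X = torch.randn(512, 784)                                       # fake 28x28 images, flattened
y = torch.randint(0, 10, (512,))
loader = DataLoader(TensorDataset(X, y), batch_size=64, shuffle=True)

model = nn.Sequential(nn.Linear(784, 128), nn.ReLU(),
                      nn.Dropout(0.2), nn.Linear(128, 10)).to(device)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(2):
    for xb, yb in loader:
        xb, yb = xb.to(device), yb.to(device)
        optimizer.zero_grad()
        loss = loss_fn(model(xb), yb)
        loss.backward()
        optimizer.step()
    print("epoch", epoch, "loss", loss.item())
```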
10:45 – 11:00 AM
Break
11:00 AM – 12:30 PM
PyTorch, Keras, and Other Libraries (continued)
12:30 – 1:30 PM
Lunch
1:30 – 3:00 PM
Architectures (CNN, RNN)
Presented by Zheng Guo and Eunjae Shim
- CNN – resource: https://huggingface.co/learn/computer-vision-course/unit0/welcome/welcome
- RNN (LSTM)
- Transformer – resource: https://huggingface.co/learn/nlp-course/en/chapter1/1
3:00 – 3:15 PM
Break
3:15 – 4:30 PM
Architectures (CNN, RNN) (continued)
8:30 – 9:30 AM
Introduction to HuggingFace and its Ecosystem for AI Models
Presented by Elle O’Brien
9:30 – 10:30 AM
Downloading and Accessing Pre-Trained Models for Various Modalities and Research Domains
Presented by Elle O’Brien
- Running inference with pre-trained models and evaluating outputs
- Foundation models
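A minimal sketch of running inference with a pre-trained model via the Hugging Face transformers pipeline API; the checkpoint named here is one common public model and is only a placeholder for whatever the session uses:

```python
# A minimal sketch of inference with a pre-trained model from the Hugging Face Hub.
from transformers import pipeline

classifier = pipeline("sentiment-analysis",
                      model="distilbert-base-uncased-finetuned-sst-2-english")
print(classifier("The calibration of this instrument drifted over time."))
# Output is a list of label/score dicts, e.g. [{'label': 'NEGATIVE', 'score': ...}]
```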
10:30 – 10:45 AM
Break
10:45 AM – 12:30 PM
Journal Club Example with Code
- Reading and Interpreting Model Cards to Assess Capabilities and Limitations
- Fine-tuning Models for Specific Tasks and Domains
12:30 – 1:30 PM
Lunch
1:30 – 2:30 PM
Embeddings
Presented by Elle O’Brien
- Vector similarity measures
- Thinking of foundation models as pretrained embeddings
- Fine-tuning
- Application to clustering & search
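A minimal sketch of one vector similarity measure (cosine similarity); the embedding vectors here are random placeholders rather than real model outputs:

```python
# A minimal sketch of cosine similarity between two embedding vectors.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors: 1 = same direction, 0 = orthogonal."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
doc_a, doc_b = rng.normal(size=384), rng.normal(size=384)   # placeholder embeddings
print(cosine_similarity(doc_a, doc_b))
```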
2:30 – 2:45 PM
Break
2:45 – 4:00 PM
Topic Modeling with BERTopic Activity
Presented by Elle O’Brien
4:00 – 4:30 PM
Other Resources for Leveraging LLMs
Presented by Elle O’Brien
- AWS Bedrock and leveraging APIs
- LangChain
Week 3: “Passion Week” – Advanced Topics of AI Methods with Applications in Domain Research
Monday, July 21 – Friday, July 25
8:30 AM – 4:30 PM
*Subject to change
8:30 – 9:45 AM
Welcome and Overview
Presented by Alexander Rodríguez
- Overview of AI methods for science and engineering
- Overview of the five-day schedule and learning approach
- Review of concepts in scientific modeling
9:45 – 10:00 AM
Break
10:00 – 10:30 AM
Applications in Scientific and Engineering Research
Presented by Zheng Guo
10:30 AM – 12:00 PM
Introduction to Generative Models (GANs, VAEs, Diffusion Models)
Presented by Zheng Guo
12:00 – 1:00 PM
Lunch
1:00 – 2:15 PM
Introduction to Generative Models (GANs, VAEs, Diffusion Models) (continued)
2:15 – 2:30 PM
Break
2:30 – 3:30 PM
Introduction to Generative Models (GANs, VAEs, Diffusion Models) (continued)
3:30 – 4:30 PM
Use Cases in Scientific Research (invited talks)
Presented by Madeline Peters
8:30 – 10:15 AM
Introduction to Causal Inference Concepts (DAGs, counterfactuals) and Causal Reasoning in Research
Presented by Coco Krumme
10:15 – 10:30 AM
Break
10:30 AM – 12:00 PM
Introduction to Causal Inference Concepts (DAGs, counterfactuals) and Causal Reasoning in Research (continued)
12:00 – 1:00 PM
Lunch
1:00 – 2:45 PM
Use Cases
Presented by Coco Krumme
2:45 – 3:00 PM
Break
3:00 – 4:30 PM
Use Cases (continued)
8:30 – 10:30 AM
Neural Operators
Presented by Zheng Guo
- Graph neural operator
- Fourier neural operator
- DeepONet
- Physics-informed neural operators
10:30 – 10:45 AM
Break
10:45 AM – 12:00 PM
Hands-on Exercise
Presented by Zheng Guo
12:00 – 1:00 PM
Lunch
1:00 – 2:45 PM
Symbolic Regression
Presented by Madeline Peters
- SINDy (Sparse Identification of Nonlinear Dynamics)
- Genetic programming approaches
- Deep learning approaches
2:45 – 3:00 PM
Break
3:00 – 4:30 PM
Hands-on Exercise: the PySINDy Python Library, with an Example of Genetic Programming
Presented by Madeline Peters
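A minimal sketch of what a PySINDy exercise might look like, fitting SINDy to simulated data from a simple harmonic oscillator; this assumes the third-party pysindy package and is illustrative only:

```python
# A minimal sketch of sparse identification of dynamics with pysindy on simulated data.
import numpy as np
import pysindy as ps
from scipy.integrate import solve_ivp

def oscillator(t, z):
    x, y = z
    return [y, -x]                              # simple harmonic oscillator

t = np.linspace(0, 10, 500)
sol = solve_ivp(oscillator, (0, 10), [1.0, 0.0], t_eval=t)
X = sol.y.T                                     # state trajectory, shape (n_times, 2)

model = ps.SINDy()
model.fit(X, t=t)
model.print()                                   # prints the identified sparse dynamics
```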
8:30 – 10:00 AM
Introduction to Physics-Informed Neural Networks (PINNs)
Presented by Yiluan Song
- Fundamental ideas
- Their role in scientific modeling
- Common implementation challenges and practical tips
10:00 – 10:15 AM
Break
10:15 AM – 12:00 PM
Introduction to Physics-Informed Neural Networks (PINNs) (continued)
12:00 – 1:00 PM
Lunch
1:00 – 2:15 PM
Hands-on Exercises
Presented by Yiluan Song
2:15 – 2:30 PM
Break
2:30 – 4:30 PM
Use Cases in Scientific Research (invited talks)
Presented by Liyue Shen, Soumi Tribedi, and Unique Subedi
8:30 – 10:30 AM
Introduction to Uncertainty Quantification
Presented by Madeline Peters
- What is UQ and why do we need UQ?
- Aleatoric (data) uncertainty
- Epistemic (model) uncertainty
- Sensor uncertainty, label uncertainty
- Common approaches
10:30 – 10:45 AM
Break
10:45 AM – 12:30 PM
Strategies for Quantifying and Interpreting Uncertainty in AI Models
Presented by Cindy (Xinyu) Liu
- Evaluation Metrics
12:30 – 1:30 PM
Lunch
1:30 – 2:45 PM
Applications of Uncertainty Quantification in AI/ML Methodologies
Presented by Cindy (Xinyu) Liu and Madeline Peters
- Large Language Models
- Reinforcement Learning
- Active Learning
- Digital Twin
2:45 – 3:00 PM
Break
3:00 – 4:30 PM
Applications of Uncertainty Quantification in Scientific and Engineering Applications
Presented by Cindy (Xinyu) Liu and Madeline Peters
- Biology, materials science, climate studies, and traffic engineering.
- Guest speakers: Dr. Nanta Sophonrat and Liyue Shen (to be confirmed)
Additional Information
By the conclusion of the Academy, participants will be better prepared to integrate AI approaches into their research, collaborate more effectively with AI experts, and be ready to take the next steps in their AI journey.
- Internal Participants (U-M Personnel and Students)
- Weekly Rate: $180
- Discounted Rate (All 3 weeks): $500 (thanks to University support, we are able to offer a deep discount for U-M employees and students)
- Other Academic Institutions and U-M Alumni
- Weekly Rate: $1,000
- Discounted Rate (must be registered for all three weeks): $2,500
- External Participants
- Weekly Rate: $3,000
- Discounted Rate (All 3 weeks): $8,000
This academy is open to researchers in academia, industry and public-sector organizations. We especially welcome university faculty to attend.
Summer academies are designed with faculty, staff, and postdocs in mind. Students are also welcome to apply, though priority will be given to faculty, staff, and postdocs.
- More than 14 days before the first day: full refund minus a $50 processing fee
- Between 7 and 14 days before the first day: 50% refund
- Less than 7 days before the first day: no refund
College-level math and statistics
Note: prior coding experience is not required
Central Campus Classroom Building (CCCB), Room 2460
1225 Geddes Ave.
Ann Arbor, MI 48109
Parking available nearby includes a parking structure for U-M Blue/Gold permit holders, located at 525 Church St., and metered street parking along Church St. There is also a public garage at 650 S. Forest Ave. Information on available public parking in Ann Arbor and real-time occupancy counts for public parking structures is available online.
Summer Academy Faculty

Kamal Abdulraheem
Schmidt AI in Science Fellow, Michigan Institute for Data and AI in Society

James Boyko
Schmidt AI in Science Alum, Michigan Institute for Data and AI in Society

Mohna Chakraborty
Data Science Fellow, Michigan Institute for Data and AI in Society

Haotian Chen
Schmidt AI in Science Fellow, Michigan Institute for Data and AI in Society

Nathan Fox
AI Scientist, Schmidt AI in Science Fellow Alum, Michigan Institute for Data and AI in Society

Zheng Guo
Schmidt AI in Science Fellow, Michigan Institute for Data and AI in Society

Coco Krumme
Lecturer II in Information, School of Information

Xinyu Liu
Schmidt AI in Science Fellow, Michigan Institute for Data and AI in Society

Amirhossein Moosavi
Data Science Fellow, Michigan Institute for Data and AI in Society

Elle O’Brien
Lecturer IV; Research Investigator, School of Information

Madeline Peters
Schmidt AI in Science Fellow, Michigan Institute for Data and AI in Society

Ken Reid
Data Scientist, Michigan Institute for Data and AI in Society

Alexander Rodríguez
Assistant Professor, Computer Science and Engineering

Kerby Shedden
Professor of Statistics (LSA) and Biostatistics (School of Public Health); Director of Consulting for Statistics, Computing, and Analytics Research (CSCAR)

Eunjae Shim
Schmidt AI in Science Fellow, Michigan Institute for Data and AI in Society

Yiluan Song
Schmidt AI in Science Fellow, Michigan Institute for Data and AI in Society

Xin Wei
Schmidt AI in Science Fellow, Michigan Institute for Data and AI in Society
Questions? Contact Us.
Contact Faculty Training Program Manager Kelly Psilidis at [email protected].