We are no longer accepting applicants.
Academy Overview
The three-day Data- and AI-intensive Research with Rigor and Reproducibility for U-M Biomedical Researchers Summer Academy 2025 is designed for University of Michigan biomedical researchers and faculty.
The rigor of scientific research and the reproducibility of research results are essential for the validity of research findings and the trustworthiness of science. However, research rigor and reproducibility remain significant challenges across scientific fields, especially for research with complex data types from heterogeneous sources and long data manipulation pipelines. This is particularly critical as data science and artificial intelligence (AI) methods emerge at lightning speed and researchers scramble to seize the opportunities that the new methods bring.
Participants will develop the intellectual framework and technical skills to ensure the rigor and reproducibility of biomedical and healthcare research with cutting-edge data science and AI methods. Specific topics include ethical issues in biomedical data science; data management, representation, and sharing; rigorous analytical design; the design and reporting of AI models; generative AI; and reproducible workflows.
Participants are expected to bring a laptop for programming components of the academy.
Light breakfast options will be available daily. A dedicated lunch reception is planned for Wednesday.
Curriculum and Schedule
8:30 AM – 4:30 PM each day
*Subject to change
Day 1
8:30 – 10:00 AM
Ethical Considerations for Data and AI
This lesson equips students to address ethical challenges in biomedical data science. Learners will identify strategies for ethical secondary data use, analyze stakeholder engagement approaches, and develop frameworks for ethical project review, emphasizing anticipatory governance and responsible data science practices.
10:00 – 10:15 AM
Break
10:15 – 11:45 AM
Responsible Conduct of Research in the Age of AI
This introductory unit includes an overview and discussion of the responsible conduct of research (RCR) in the context of data- and AI-intensive research in biomedical science, key factors for rigor and reproducibility in data-intensive research, the human scientist’s responsibilities in AI-assisted research, and team dynamics for research rigor and reproducibility.
12:00 – 1:00 PM
Lunch on your own
1:00 – 2:30 PM
Rigor as a Key Consideration in Statistical Design
This session will review the dominant framework for conducting empirical research, which is based on statistics and probability. Participants will learn to consider how the manner in which data are collected and formally analyzed impacts the conclusions that can be drawn in their research, and the limitations and uncertainty of those conclusions. This session aims to enable participants to develop innovative research aims and sophisticated data analysis plans in their domains of expertise. Specific topics include measurement, types of study designs, sources of bias and uncertainty, power analysis, and some basic ideas from causal inference.
2:30 – 2:45 PM
Break
2:45 – 4:30 PM
Rigor as a Key Consideration in Statistical Design (cont)
Day 2
8:30 – 10:00 AM
Building Robust ML Models and Critically Evaluating ML Models
In this session, we will review the basics of predictive modeling and approaches to build an accurate and reproducible model, introduce best practices in reporting that will allow others to appropriately interpret and reproduce the results, and discuss guiding principles on how to reproduce others’ results.
10:00 – 10:15 AM
Break
10:15 – 11:45 AM
Building Robust ML Models and Critically Evaluating ML Models (cont)
12:00 – 1:00 PM
Lunch on your own
1:00 – 2:30 PM
Data Representation and Metadata to Make Data “AI Ready”
This lesson examines how data can be represented in multiple ways, highlighting that each representation impacts task efficiency. Students will learn to select optimal data representations tailored to specific research tasks, balancing ease and complexity.
2:30 – 2:45 PM
Break
2:45 – 4:30 PM
Privacy with Data and AI
This session will introduce industry standards for how data should be described, represented, validated, and protected for efficient reproducibility. We will provide real-world examples of metadata, transparent codebooks, and data sharing plans. Additionally, you will leave with an appreciation of data security and privacy, including which tools can be used on various types of biomedical research data.
Day 3
8:30 – 10:00 AM
Packaging Research Projects for Reproducibility
Learn the key goals and challenges of creating reproducible, transparent, and user-friendly analyses that are easy to share and reuse. Software tools emphasized include Docker/Podman and code notebooks (Jupyter, Markdown).
10:00 – 10:15 AM
Break
10:15 – 11:45 AM
Packaging Research Projects for Reproducibility (cont)
12:00 – 1:00 PM
Lunch provided
1:00 – 2:30 PM
Meta-analysis as the Assessment of the Rigor and Reproducibility of Published Results
This session will provide an introduction to quantitative meta-research, where the goal is to integrate findings from multiple research studies, usually lacking access to the primary data of those studies. We will begin with the principles of why integrating multiple studies may reduce uncertainty and bias, but need not always do so. We will focus on the core notion of heterogeneity, discuss methods that can be used to assess the presence of heterogeneity, and present tools that can provide effective meta-analytic summaries when heterogeneity is present and when it is not present.
2:30 – 2:45 PM
Break
2:45 – 4:30 PM
Meta-analysis as the Assessment of the Rigor and Reproducibility of Published Results (cont)
Additional Information
By the conclusion of the Academy, participants will have developed the intellectual framework and technical skills to ensure the rigor and reproducibility of biomedical and healthcare research with cutting-edge data science and artificial intelligence (AI) methods.
- Internal Participants (University of Michigan researchers and faculty who carry out biomedical, clinical and healthcare research that is data-intensive)
- Cost: $120
This academy workshop is open only to University of Michigan researchers and faculty who carry out biomedical, clinical, and healthcare research that is data- and/or AI-intensive.
Summer academies are designed with faculty, staff, and postdocs in mind. Students are also welcome to apply, though priority will be given to faculty, staff, and postdocs.
- More than 14 days before the first day: full refund minus $50 processing fee
- Between 7 and 14 days before the first day: 50% refund
- Less than 7 days before the first day: no refund
- Meet institutional training requirements for human subject research (e.g., a CITI Training certificate).
- Some experience in study design, data processing and analysis, and reporting.
- Conceptual understanding of the data science project processes and methodologies covered in the units, sufficient to understand the critical decisions that support rigor and reproducibility.
Some trainees may need to acquire background knowledge before participating in the academy, for example, by learning the very basics of R or Python. The instructors will provide pre-reading materials and tutorials as needed.
Central Campus Classroom Building (CCCB), Room 2460
1225 Geddes Ave.
Ann Arbor, MI 48109
Parking available nearby includes a parking structure for U-M Blue/Gold permit holders, located at 525 Church St., and metered street parking along Church St. There is also a public garage at 650 S. Forest Ave. View available public parking in Ann Arbor here and real-time occupancy counts for public parking structures here.
Instructors

Johann Gagnon-Bartsch
Assistant Professor, Statistics, LSA

Greg Hunt
Assistant Professor of Mathematics, College of William and Mary

H.V. Jagadish
Edgar F. Codd Distinguished University Professor and Bernard A. Galler Collegiate Professor, EECS, College of Engineering; MIDAS Director

Erin Kaleba
Director, Data Office for Clinical and Translational Research, University of Michigan

Jing Liu
MIDAS Executive Director

Kerby Shedden
Professor, Statistics, LSA, Biostatistics, School of Public Health; Director of Consulting for Statistics, Computing, and Analytics Research (CSCAR)

Suraj Rampure
Lecturer III, Electrical Engineering and Computer Sciences
Questions? Contact Us.
Contact Faculty Training Program Manager Kelly Psilidis at [email protected]