Data and AI Intensive Research with Rigor and Reproducibility


(Required for all trainees) Unit 0: Introduction: Responsible conduct of research (RCR), rigor and reproducibility in the complex world of biomedical data science (4 hours)

0.1. RCR in the context of biomedical data science.

0.2. An overview of rigor and reproducibility considerations in biomedical research that employs data science and AI methods.

0.3. The complexity of biomedical data and the need for insight integration.

(Required for all trainees) Unit 1: Ethical issues in biomedical data science (4 hours)

1.1. What are ethics? 

1.2. Informed Consent.

1.3. Privacy.

1.4. Fairness.

(Elective) Unit 2: All about data: data management, representation, metadata, and data sharing with confidentiality considerations (16 hours)

2.1. Data management. 

2.2. Data representation. 

2.3. Metadata. 

2.4. Data sharing.

(Elective) Unit 3: Rigorous statistical design (8 hours)

3.1. An introduction to fundamental concepts.

3.2. Case studies representative of modern biomedical studies.

3.3. Summary of the fundamental concepts and their implementation in diverse settings.

(Elective) Unit 4: Design and reporting of predictive models (8 hours)

4.1. A review of predictive modeling.

4.2. Data preparation: data cleaning, distributional checks, and dimension reduction, with their underlying assumptions and consequences for downstream inference.

4.3. Modeling tools.

4.4. Assessment of bias and fairness within predictive models.

4.5. How to report research with predictive models.

4.6. How to read an ML paper.

(Elective) Unit 5: Reproducible workflows (8 hours)

5.1. Goals of Reproducible Analyses: reproducible, user-friendly, transparent, reusable, version-controlled, permanently archived.

5.2. Reproducibility via Code Notebooks.

5.3. Best practices for Reproducible Programming.

5.4. Version Control.

5.5. Containers.

5.6. Putting Everything Together.

(Elective) Unit 6: Meta-analysis (8 hours)

6.1. Key concepts in research synthesis.

6.2. Basic adjustment for heterogeneity and miscalibration.

6.3. Assessment of heterogeneity in study results.

6.4. Multiple testing and causality.

(Elective) Unit 7: Transformer-based AI in Biomedical Research (8 hours)

7.1. Theoretical Foundations.

7.2. Transformers in Biomedical Research.

7.3. Data Management for Transformers.

7.4. Setting Up the Environment.

7.5. Model Training and Fine-Tuning.

7.6. Data Sharing with Transformers.

7.7. Data Representation and Result Interpretation.
