Fundamental Requirements
There are three fundamental requirements for earning a Graduate Data Science Certificate.
- Nine graduate credit hours of coursework in approved courses: Approved courses are designated as core or elective, and each is assigned to one of three categories: “Algorithms and Applications” (AA), “Data Management” (DM), and “Analysis Methods” (AM). Students must take at least two core courses and one course from each category.
Only one course (up to 3 credits) may be double-counted. It is recommended, but not required, that students select courses outside their main graduate program of study to broaden their data-science experience (e.g., statistics students may take engineering courses, and social-science students may take outside statistics and application courses).
- A data-science-related experience (3-credit semester equivalent, over 160 hours of work): This can take the form of a non-credit activity such as an internship, practicum, or professional project equivalent to a three-credit-hour course, or additional coursework of at least three credits from the approved course list. (This course may be double-counted with another Rackham degree program.) To satisfy this “Plus Requirement” with a data-related experience, students need their supervisor or mentor to sign the verification form certifying that the student spent sufficient time working on a data-intensive project during that experience. Alternatively, if allowed and approved by the mentor, students may complete and submit to the DS Certificate Program Chair a report (2-6 pages) describing their experience and results, which will be evaluated to ensure the project demonstrates data science content, relevance, and applications.
- For the 2024 cohort and prior: Students should participate in at least 7-9 data-science-specific seminars (1 semester) to enrich their formal didactic training. The seminars may come from different schools, institutes, initiatives, centers, etc. Seminar attendance should be recorded at https://forms.gle/jURhCeaBzG6FoVyf9
Enrollment Prerequisites
To enroll in the MIDAS Data Science Certificate Program, students must meet the following prerequisites:
Prerequisites | Skills | Rationale |
Completed Undergraduate Degree | Quantitative training and coding skills as described below | The DS certificate is a graduate program requiring a minimum level of quantitative skill |
Quantitative Training | Undergraduate calculus, linear algebra and introduction to probability and statistics | These are the entry level skills required for most upper-level undergraduate and graduate courses in the program |
Coding Experience | Exposure to software development or programming on the job or in the classroom | Most DS practitioners need substantial experience with Java, C/C++, HTML5, Python, PHP, SQL/DB |
Motivation | Significant interest and motivation to pursue quantitative data analytic applications | Dedication for prolonged and sustained immersion into hands-on and methodological research |
Completion Competencies
To obtain the Data Science Certificate, students must demonstrate moderate competency in two of the three competencies in each of the three areas below:
Areas | Competency | Expectation | Notes |
Algorithms and Applications | Tools | Working knowledge of basic software tools (command-line, GUI-based, or web services) | Familiarity with statistical programming languages, e.g., R or SciKit/Python, and database querying languages, e.g., SQL or NoSQL |
Algorithms and Applications | Algorithms | Knowledge of core principles of scientific computing, applications programming, APIs, algorithm complexity, and data structures | Best practices for scientific and application programming, efficient implementation of matrix linear algebra and graphics, elementary notions of computational complexity, user-friendly interfaces, string matching |
Algorithms and Applications | Application Domain | Data analysis experience from at least one application area, through coursework, an internship, a research project, etc. | Applied domain examples include: computational social sciences, health sciences, business and marketing, learning sciences, transportation sciences, engineering and physical sciences |
Data Management | Data validation & visualization | Curation, Exploratory Data Analysis (EDA), and visualization | Data provenance, validation, visualization via histograms, Q-Q plots, scatterplots (ggplot, Dashboard, D3.js) |
Data Management | Data wrangling | Skills for data normalization, data cleaning, data aggregation, and data harmonization/registration | Data imperfections include missing values, inconsistent string formatting (‘2016-01-01’ vs. ‘01/01/2016’), PC/Mac/Linux time vs. timestamps, structured vs. unstructured data |
Data Management | Data infrastructure | Handling databases, web services, Hadoop, multi-source data | Data structures, SOAP protocols, ontologies, XML, JSON, streaming |
Analysis Methods | Statistical inference | Basic understanding of bias and variance, principles of (non)parametric statistical inference, and (linear) modeling | Biological variability vs. technological noise, parametric (likelihood) vs. non-parametric (rank order statistics) procedures, point vs. interval estimation, hypothesis testing, regression |
Analysis Methods | Study design and diagnostics | Design of experiments, power calculations and sample sizing, strength of evidence, p-values, False Discovery Rates | Multistage testing, variance normalizing transforms, histogram equalization, goodness-of-fit tests, model overfitting, model reduction |
Analysis Methods | Machine Learning | Dimensionality reduction, k-nearest neighbors, random forests, AdaBoost, kernelization, SVM, ensemble methods, CNN | Empirical risk minimization. Supervised, semi-supervised, and unsupervised learning. Transfer learning, active learning, reinforcement learning, multiview learning, instance learning |