Graduate Data Science Certificate Program

Academic Requirements

Note: New academic requirements are in effect for students accepted to the program after October 2022. Students who enrolled in the program by October 2022 will continue to follow the previous requirements.

There are three fundamental requirements for earning a Graduate Data Science Certificate.

1. Nine graduate credit hours of coursework in approved courses: These courses are designated as core and elective classes, each subdivided into three categories: “Algorithms and Applications” (AA), “Data Management” (DM), and “Analysis Methods” (AM). Students must choose at least two core courses and at least one course from each category.

Only one course may be double-counted (up to 3 credits). It is recommended, but not required, that courses outside the main graduate program of study be selected to broaden the student’s data-science experiences (e.g., statistics students may take engineering courses, social-science students may take statistics and application courses outside their program, etc.).

2. A Data Science-related experience (3-credit semester equivalent, or over 160 hours of work): This can take the form of a non-credit activity such as an internship, practicum, or professional project equivalent to a three-credit-hour course, or of additional coursework of at least three credits from the approved course list. (This course may be double-counted with another Rackham degree program.) To satisfy this “Plus Requirement” with a data-related experience, students must have their supervisor or mentor sign the verification form certifying that the student spent sufficient time working on a data-intensive project during the practicum. Alternatively, if allowed and approved by the mentor, students may complete and submit to the DS Certificate Program Chair a report (2-6 pages) describing their experience and results, which will be evaluated to ensure the project demonstrates Data Science content, relevance, and applications.

3. Regular attendance at the MIDAS Seminar Series, which brings nationally recognized data scientists to U-M, is required. One semester of 1-credit enrollment in EECS 409 (MIDAS Seminar) is required* (this credit may count toward the 9 didactic credits). This colloquium-style training exposes students to current DS developments beyond the boundaries of their own discipline. Students must attend 75% of all seminars to complete the requirement (attendance will be taken).

*Please note that EECS 409 is being offered in the Winter 2024 semester but is not expected to be offered after that. We will be suitably modifying this seminar requirement in 2024.

To enroll in the MIDAS Data Science Certificate Program, applicants must meet the following prerequisites:

Enrollment Prerequisites:

Prerequisite: Completed Undergraduate Degree
  Skills: Quantitative training and coding skills as described below
  Rationale: The DS certificate is a graduate program requiring a minimum level of quantitative skill

Prerequisite: Quantitative Training
  Skills: Undergraduate calculus, linear algebra, and introduction to probability and statistics
  Rationale: These are the entry-level skills required for most upper-level undergraduate and graduate courses in the program

Prerequisite: Coding Experience
  Skills: Exposure to software development or programming on the job or in the classroom
  Rationale: Most DS practitioners need substantial experience with languages such as Java, C/C++, HTML5, Python, PHP, and SQL/DB

Prerequisite: Motivation
  Skills: Significant interest and motivation to pursue quantitative data analytic applications
  Rationale: Dedication to prolonged and sustained immersion in hands-on and methodological research

To obtain the Data Science Certificate, moderate competency in at least two of the competencies within each of the three areas below is required:

Completion Competencies:

Area: Algorithms and Applications (AA)

Competency: Tools
  Expectation: Working knowledge of basic software tools (command-line, GUI-based, or web services)
  Notes: Familiarity with statistical programming languages, e.g., R or SciKit/Python, and database querying languages, e.g., SQL or NoSQL

Competency: Algorithms
  Expectation: Knowledge of core principles of scientific computing, applications programming, APIs, algorithm complexity, and data structures
  Notes: Best practices for scientific and application programming, efficient implementation of matrix linear algebra and graphics, elementary notions of computational complexity, user-friendly interfaces, string matching

Competency: Application Domain
  Expectation: Data analysis experience from at least one application area, whether through coursework, an internship, a research project, etc.
  Notes: Applied domain examples include computational social sciences, health sciences, business and marketing, learning sciences, transportation sciences, and engineering and physical sciences
Area: Data Management (DM)

Competency: Data validation & visualization
  Expectation: Curation, Exploratory Data Analysis (EDA), and visualization
  Notes: Data provenance, validation, visualization via histograms, Q-Q plots, scatterplots (ggplot, Dashboard, D3.js)

Competency: Data wrangling
  Expectation: Skills for data normalization, data cleaning, data aggregation, and data harmonization/registration
  Notes: Data imperfections include missing values, inconsistent string formatting (‘2016-01-01’ vs. ‘01/01/2016’), PC/Mac/Linux time vs. timestamps, and structured vs. unstructured data

Competency: Data infrastructure
  Expectation: Handling databases, web services, Hadoop, and multi-source data
  Notes: Data structures, SOAP protocols, ontologies, XML, JSON, streaming
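As an illustration of the data-wrangling imperfections described above (missing values and inconsistent date formatting), here is a minimal Python sketch using only the standard library; the list of known formats is a hypothetical choice for illustration, not part of the program requirements:

```python
from datetime import datetime

# Candidate date formats to try, in order (a hypothetical, extendable list).
KNOWN_FORMATS = ["%Y-%m-%d", "%m/%d/%Y", "%d-%b-%Y"]

def normalize_date(raw):
    """Return an ISO-8601 'YYYY-MM-DD' string, or None for missing/unparseable values."""
    if raw is None or str(raw).strip() == "":
        return None  # treat blanks as missing values
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(str(raw).strip(), fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue  # try the next known format
    return None  # unparseable entries are flagged as missing, not guessed

dates = ["2016-01-01", "01/01/2016", "", "not a date"]
print([normalize_date(d) for d in dates])
# ['2016-01-01', '2016-01-01', None, None]
```

Harmonizing all inputs to a single canonical format before aggregation is a typical first step in the cleaning pipelines this competency describes.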
Area: Analysis Methods (AM)

Competency: Statistical inference
  Expectation: Basic understanding of bias and variance, principles of (non)parametric statistical inference, and (linear) modeling
  Notes: Biological variability vs. technological noise, parametric (likelihood) vs. non-parametric (rank-order statistics) procedures, point vs. interval estimation, hypothesis testing, regression

Competency: Study design and diagnostics
  Expectation: Design of experiments, power calculations and sample sizing, strength of evidence, p-values, False Discovery Rates
  Notes: Multistage testing, variance-normalizing transforms, histogram equalization, goodness-of-fit tests, model overfitting, model reduction

Competency: Machine Learning
  Expectation: Dimensionality reduction, k-nearest neighbors, random forests, AdaBoost, kernelization, SVM, ensemble methods, CNN
  Notes: Empirical risk minimization; supervised, semi-supervised, and unsupervised learning; transfer learning, active learning, reinforcement learning, multiview learning, instance learning
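To make one of the machine-learning methods named above concrete, here is a minimal pure-Python sketch of k-nearest neighbors: classify a query point by majority vote among its k closest training points. The toy data are invented for illustration only:

```python
from collections import Counter
from math import dist  # Euclidean distance (Python 3.8+)

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.
    `train` is a list of (point, label) pairs."""
    neighbors = sorted(train, key=lambda pair: dist(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Toy 2-D data: two well-separated clusters (invented for illustration).
train = [((0, 0), "A"), ((0, 1), "A"), ((1, 0), "A"),
         ((5, 5), "B"), ((5, 6), "B"), ((6, 5), "B")]
print(knn_predict(train, (0.5, 0.5)))  # A
print(knn_predict(train, (5.5, 5.5)))  # B
```

In coursework, the same idea would normally be exercised through a library such as scikit-learn rather than hand-rolled; the sketch only shows the underlying voting logic.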