Academic Requirements
There are three fundamental requirements for earning the Graduate Data Science Certificate.
- Nine graduate credit hours of coursework in approved courses. These courses are designated as core and elective Methods, Technology, or Applications courses. Only one course (up to 3 credits) may be double-counted. It is recommended, but not required, that students select courses outside their main graduate program of study to broaden their data-science experience (e.g., statistics students may take engineering courses, social-science students may take outside statistics and application courses, etc.).
- A Data Science-related experience (3-credit semester equivalent, over 160 hours of work). This can take the form of a non-credit activity such as an internship, practicum, or professional project equivalent to a three credit-hour course, or additional coursework of at least three credits from the approved course list. (This course may be double-counted with another Rackham degree program.) To satisfy this “Plus Requirement” with a data-related experience, students will need to have their supervisor or mentor sign the verification form certifying that the student spent sufficient time working on a data-intensive project during that practicum. Alternatively, if allowed and approved by the mentor, students may complete and submit to the DS Certificate Program Chair a report (2-6 pages) describing their experience and results, which will be evaluated to ensure the project demonstrates Data Science content, relevance, and applications.
- Regular attendance of the MIDAS Seminar Series, which brings nationally recognized data scientists to U-M. One semester of (1-credit) enrollment in EECS 409 (MIDAS Seminar) is required; this credit may count towards the 9 didactic credits. This colloquium-based training will expose students to current DS developments beyond the boundaries of their own discipline. To complete the requirement, students must attend 75% of all seminars (attendance will be taken).
In order to enroll in the MIDAS Data Science Certificate Program, the following prerequisites are required:
Enrollment Prerequisites:
Prerequisites | Skills | Rationale |
Completed Undergraduate Degree | Quantitative training and coding skills as described below | The DS certificate is a graduate program requiring a minimum level of quantitative skill |
Quantitative Training | Undergraduate calculus, linear algebra and introduction to probability and statistics | These are the entry level skills required for most upper-level undergraduate and graduate courses in the program |
Coding Experience | Exposure to software development or programming on the job or in the classroom | Most DS practitioners need substantial experience with Java, C/C++, HTML5, Python, PHP, SQL/DB |
Motivation | Significant interest and motivation to pursue quantitative data analytic applications | Dedication for prolonged and sustained immersion into hands-on and methodological research |
In order to obtain the Data Science Certificate, moderate competency in 2 of each of the 3 competency areas below is required:
Completion Competencies:
Areas | Competency | Expectation | Notes |
Algorithms and Applications | Tools | Working knowledge of basic software tools (command-line, GUI based, or web-services) | Familiarity with statistical programming languages, e.g., R or SciKit/Python, and database querying languages, e.g., SQL or NoSQL |
Algorithms and Applications | Algorithms | Knowledge of core principles of scientific computing, applications programming, APIs, algorithm complexity, and data structures | Best practices for scientific and application programming, efficient implementation of matrix linear algebra and graphics, elementary notions of computational complexity, user-friendly interfaces, string matching |
Algorithms and Applications | Application Domain | Data analysis experience from at least one application area, either through coursework, internship, research project, etc. | Applied domain examples include: computational social sciences, health sciences, business and marketing, learning sciences, transportation sciences, engineering and physical sciences |
Data Management | Data validation & visualization | Curation, Exploratory Data Analysis (EDA) and visualization | Data provenance, validation, visualization via histograms, Q-Q plots, scatterplots (ggplot, Dashboard, D3.js) |
Data Management | Data wrangling | Skills for data normalization, data cleaning, data aggregation, and data harmonization/registration | Data imperfections include missing values, inconsistent string formatting (‘2016-01-01’ vs. ‘01/01/2016’), PC/Mac/Linux time vs. timestamps, structured vs. unstructured data |
Data Management | Data infrastructure | Handling databases, web-services, Hadoop, multi-source data | Data structures, SOAP protocols, ontologies, XML, JSON, streaming |
Analysis Methods | Statistical inference | Basic understanding of bias and variance, principles of (non)parametric statistical inference, and (linear) modeling | Biological variability vs. technological noise, parametric (likelihood) vs. non-parametric (rank order statistics) procedures, point vs. interval estimation, hypothesis testing, regression |
Analysis Methods | Study design and diagnostics | Design of experiments, power calculations and sample sizing, strength of evidence, p-values, False Discovery Rates | Multistage testing, variance normalizing transforms, histogram equalization, goodness-of-fit tests, model overfitting, model reduction |
Analysis Methods | Machine Learning | Dimensionality reduction, k-nearest neighbors, random forests, AdaBoost, kernelization, SVM, ensemble methods, CNN | Empirical risk minimization. Supervised, semi-supervised, and unsupervised learning. Transfer learning, active learning, reinforcement learning, multiview learning, instance learning |