There are three fundamental requirements for completing the Graduate Data Science Certificate Program.
- Nine graduate credit hours of coursework in approved courses. These courses are designated as core and elective Methods, Technology or Applications. Only one course may be double-counted (up to 3 credits). It is recommended, but not required, that courses outside the main graduate program of study be selected to broaden the student's data-science experience (e.g., statistics students may take engineering courses, social-science students may take outside statistics and application courses, etc.).
- A Data Science-related experience (3-credit semester equivalent, over 160 hours of work). This can take the form of a non-credit activity like an internship, practicum, or professional project equivalent to a three credit-hour course, or additional coursework of at least three credits from the approved course list. (This course may be double-counted with another Rackham degree program.) To satisfy this “Plus Requirement” with a data-related experience, students will need to have their supervisor or mentor sign the verification form certifying that the student spent sufficient time working on a data-intensive project during that practicum. Alternatively, if allowed and approved by the mentor, students may complete and submit to the DS Certificate Program Chair a report (2-6 pages) describing their experience and results, which will be evaluated to ensure the project demonstrates Data Science content, relevance and applications.
- Regular attendance of the MIDAS Seminar Series, which brings nationally recognized data scientists to U-M, is required. One semester (1-credit) of enrollment in EECS 409 (MIDAS Seminar) is required (this could count towards the 9 didactic credits). This colloquium-based training will expose students to current DS developments beyond the boundaries of their own discipline. Students will be required to attend at least 75% of all seminars (attendance will be taken) to complete the requirement.
In order to enroll in the MIDAS Data Science Certificate Program, the following prerequisites are required:
|Completed Undergraduate Degree||Quantitative training and coding skills as described below||The DS certificate is a graduate program requiring a minimum level of quantitative skill|
|Quantitative Training||Undergraduate calculus, linear algebra and introduction to probability and statistics||These are the entry level skills required for most upper-level undergraduate and graduate courses in the program|
|Coding Experience||Exposure to software development or programming on the job or in the classroom||Most DS practitioners need substantial experience with Java, C/C++, HTML5, Python, PHP, SQL/DB|
|Motivation||Significant interest and motivation to pursue quantitative data analytic applications||Dedication for prolonged and sustained immersion into hands-on and methodological research|
To obtain the Data Science Certificate, moderate competency in 2 of each of the 3 competency areas below is required:
|Algorithms and Applications||Tools||Working knowledge of basic software tools (command-line, GUI based, or web-services)||Familiarity with statistical programming languages, e.g., R or SciKit/Python, and database querying languages, e.g., SQL or NoSQL|
|Algorithms||Knowledge of core principles of scientific computing, applications programming, APIs, algorithm complexity, and data structures||Best practices for scientific and application programming, efficient implementation of matrix linear algebra and graphics, elementary notions of computational complexity, user-friendly interfaces, string matching|
|Application Domain||Data analysis experience from at least one application area, either through coursework, internship, research project, etc.||Applied domain examples include: computational social sciences, health sciences, business and marketing, learning sciences, transportation sciences, engineering and physical sciences|
|Data Management||Data validation & visualization||Curation, Exploratory Data Analysis (EDA) and visualization||Data provenance, validation, visualization via histograms, Q-Q plots, scatterplots (ggplot, Dashboard, D3.js)|
|Data wrangling||Skills for data normalization, data cleaning, data aggregation, and data harmonization/registration||Data imperfections include missing values, inconsistent string formatting (‘2016-01-01’ vs. ‘01/01/2016’), PC/Mac/Linux time vs. timestamps, structured vs. unstructured data|
|Data infrastructure||Handling databases, web-services, Hadoop, multi-source data||Data structures, SOAP protocols, ontologies, XML, JSON, streaming|
|Analysis Methods||Statistical inference||Basic understanding of bias and variance, principles of (non)parametric statistical inference, and (linear) modeling||Biological variability vs. technological noise, parametric (likelihood) vs non-parametric (rank order statistics) procedures, point vs. interval estimation, hypothesis testing, regression|
|Study design and diagnostics||Design of experiments, power calculations and sample sizing, strength of evidence, p-values, False Discovery Rates||Multistage testing, variance normalizing transforms, histogram equalization, goodness-of-fit tests, model overfitting, model reduction|
|Machine learning||Dimensionality reduction, k-nearest neighbors, random forests, AdaBoost, kernelization, SVM, ensemble methods, CNN||Empirical risk minimization. Supervised, semi-supervised, and unsupervised learning. Transfer learning, active learning, reinforcement learning, multiview learning, instance learning|
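
As an illustration of the data wrangling competency listed above, the following is a minimal Python sketch of how missing values and the inconsistent date formatting mentioned in the table (‘2016-01-01’ vs. ‘01/01/2016’) might be harmonized. The function name `normalize_date`, the list `CANDIDATE_FORMATS`, and the sample records are hypothetical and are not part of the certificate materials; the sketch also assumes that slash-separated dates are month-first.

```python
from datetime import datetime

# Hypothetical list of formats to try, in order.
# Assumption: slash-separated dates are month-first (MM/DD/YYYY).
CANDIDATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def normalize_date(raw):
    """Return the date as an ISO 8601 string, or None if missing or unparseable."""
    if raw is None or not raw.strip():
        return None  # treat empty or absent entries as missing values
    text = raw.strip()
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue  # this format did not match; try the next one
    return None  # no known format matched


# Illustrative records mixing both formats with missing and invalid entries.
records = ["2016-01-01", "01/01/2016", "", None, "not a date"]
cleaned = [normalize_date(r) for r in records]
print(cleaned)
```

Returning `None` for unparseable entries keeps them distinguishable from valid dates, so they can be counted or imputed in a later validation step rather than silently dropped.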