There are three fundamental requirements for earning the Graduate Data Science Certificate.
- Nine graduate credit hours of coursework in approved courses. These courses are designated as core and elective Methods, Technology or Applications. Only one course may be double-counted (up to 3 credits). It is recommended, but not required, that courses outside the main graduate program of study be selected to broaden the student's data-science experience (e.g., statistics students may take engineering courses, social-science students may take outside statistics and application courses, etc.).
- A Data Science-related experience (3-credit semester equivalent, over 160 hours of work). This can take the form of a non-credit activity like an internship, practicum, or professional project equivalent to a three-credit-hour course, or additional coursework of at least three credits from the approved course list. (This course may be double-counted with another Rackham degree program.) To satisfy this “Plus Requirement” with a data-related experience, students will need to have their supervisor or mentor sign the verification form certifying that the student spent sufficient time working on a data-intensive project during that practicum. Alternatively, if allowed and approved by the mentor, students may complete and submit to the DS Certificate Program Chair a report (2-6 pages) describing their experience and results, which will be evaluated to ensure the project demonstrates Data Science content, relevance and applications.
- Participation in the Annual Graduate Research Symposium, which gives graduate students an opportunity to present the results of their research in talks and poster sessions, is required and ensures that students interact with MIDAS faculty. Ph.D. students are encouraged to give an oral or poster presentation.
- Regular attendance at the MIDAS Seminar Series, which brings nationally recognized data scientists to U-M, is required. One semester of enrollment (1 credit) in EECS 409 (MIDAS Seminar) is required and may count toward the 9 didactic credits. This colloquium training exposes students to current DS developments beyond the boundaries of their own discipline. To complete the requirement, students must attend 75% of all seminars (attendance will be taken).
To enroll in the MIDAS Data Science Certificate Program, students must meet the following prerequisites:
|Completed Undergraduate Degree||Quantitative training and coding skills as described below||The DS certificate is a graduate program requiring a minimum level of quantitative skill|
|Quantitative Training||Undergraduate calculus, linear algebra and introduction to probability and statistics||These are the entry level skills required for most upper-level undergraduate and graduate courses in the program|
|Coding Experience||Exposure to software development or programming on the job or in the classroom||Most DS practitioners need substantial experience with Java, C/C++, HTML5, Python, PHP, SQL/DB|
|Motivation||Significant interest and motivation to pursue quantitative data analytic applications||Dedication for prolonged and sustained immersion into hands-on and methodological research|
To obtain the Data Science Certificate, moderate competency in 2 competencies from each of the 3 competency areas below is required:
|Algorithms and Applications||Tools||Working knowledge of basic software tools (command-line, GUI based, or web-services)||Familiarity with statistical programming languages, e.g., R or SciKit/Python, and database querying languages, e.g., SQL or NoSQL|
|Algorithms||Knowledge of core principles of scientific computing, applications programming, API’s, algorithm complexity, and data structures||Best practices for scientific and application programming, efficient implementation of matrix linear algebra and graphics, elementary notions of computational complexity, user-friendly interfaces, string matching|
|Application Domain||Data analysis experience from at least one application area, either through coursework, internship, research project, etc.||Applied domain examples include: computational social sciences, health sciences, business and marketing, learning sciences, transportation sciences, engineering and physical sciences|
|Data Management||Data validation & visualization||Curation, Exploratory Data Analysis (EDA) and visualization||Data provenance, validation, visualization via histograms, Q-Q plots, scatterplots (ggplot, Dashboard, D3.js)|
|Data wrangling||Skills for data normalization, data cleaning, data aggregation, and data harmonization/registration||Data imperfections include missing values, inconsistent string formatting (‘2016-01-01’ vs. ‘01/01/2016’), PC/Mac/Linux time vs. timestamps, structured vs. unstructured data|
|Data infrastructure||Handling databases, web-services, Hadoop, multi-source data||Data structures, SOAP protocols, ontologies, XML, JSON, streaming|
|Analysis Methods||Statistical inference||Basic understanding of bias and variance, principles of (non)parametric statistical inference, and (linear) modeling||Biological variability vs. technological noise, parametric (likelihood) vs. non-parametric (rank order statistics) procedures, point vs. interval estimation, hypothesis testing, regression|
|Study design and diagnostics||Design of experiments, power calculations and sample sizing, strength of evidence, p-values, False Discovery Rates||Multistage testing, variance normalizing transforms, histogram equalization, goodness-of-fit tests, model overfitting, model reduction|
|Machine Learning||Dimensionality reduction, k-nearest neighbors, random forests, AdaBoost, kernelization, SVM, ensemble methods, CNN||Empirical risk minimization. Supervised, semi-supervised, and unsupervised learning. Transfer learning, active learning, reinforcement learning, multiview learning, instance learning|
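
As an illustration of the data-wrangling competency, the mixed date formats named in the table (‘2016-01-01’ vs. ‘01/01/2016’) can be harmonized with a few lines of Python. This is a minimal sketch rather than a prescribed method: the function name `normalize_date`, the list of accepted formats, and the sample records are assumptions made for the example, and slash-separated dates are assumed to be month/day/year.

```python
from datetime import datetime

# Accepted input formats. This list is an assumption for the sketch;
# slash-separated dates are read as month/day/year.
KNOWN_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def normalize_date(raw):
    """Return the date as an ISO 8601 string (YYYY-MM-DD), or None.

    None is returned for missing values and for strings that match
    none of the known formats, so bad records can be flagged later.
    """
    if raw is None:
        return None
    text = raw.strip()
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue  # try the next candidate format
    return None


# Hypothetical records mixing both formats, stray whitespace,
# an unparseable entry, and a missing value.
records = ["2016-01-01", "01/01/2016", " 2016-02-29 ", "not a date", None]
cleaned = [normalize_date(r) for r in records]
print(cleaned)
```

Returning `None` instead of raising keeps missing and malformed values visible in the cleaned column, so they can be counted or imputed in a later step rather than silently dropped.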