Academic Requirements
There are three fundamental requirements for earning a Graduate Data Science Certificate.
In order to enroll in the MIDAS Data Science Certificate Program, the following prerequisites are required:
Enrollment Prerequisites:
Prerequisites | Skills | Rationale |
Completed Undergraduate Degree | Quantitative training and coding skills as described below | The DS certificate is a graduate program requiring a minimum level of quantitative skill |
Quantitative Training | Undergraduate calculus, linear algebra and introduction to probability and statistics | These are the entry level skills required for most upper-level undergraduate and graduate courses in the program |
Coding Experience | Exposure to software development or programming on the job or in the classroom | Most DS practitioners need substantial experience with Java, C/C++, HTML5, Python, PHP, SQL/DB |
Motivation | Significant interest and motivation to pursue quantitative data analytic applications | Dedication to prolonged and sustained immersion in hands-on and methodological research |
In order to obtain the Data Science Certificate, moderate competency in 2 competencies from each of the 3 competency areas below is required:
Completion Competencies:
Areas | Competency | Expectation | Notes |
Algorithms and Applications | Tools | Working knowledge of basic software tools (command-line, GUI-based, or web services) | Familiarity with statistical programming languages, e.g., R or SciKit/Python, and database querying languages, e.g., SQL or NoSQL |
Algorithms and Applications | Algorithms | Knowledge of core principles of scientific computing, applications programming, APIs, algorithm complexity, and data structures | Best practices for scientific and application programming, efficient implementation of matrix linear algebra and graphics, elementary notions of computational complexity, user-friendly interfaces, string matching |
Algorithms and Applications | Application Domain | Data analysis experience from at least one application area, either through coursework, internship, research project, etc. | Applied domain examples include: computational social sciences, health sciences, business and marketing, learning sciences, transportation sciences, engineering and physical sciences |
Data Management | Data validation & visualization | Curation, Exploratory Data Analysis (EDA) and visualization | Data provenance, validation, visualization via histograms, Q-Q plots, scatterplots (ggplot, Dashboard, D3.js) |
Data Management | Data wrangling | Skills for data normalization, data cleaning, data aggregation, and data harmonization/registration | Data imperfections include missing values, inconsistent string formatting (‘2016-01-01’ vs. ‘01/01/2016’), PC/Mac/Linux time vs. timestamps, and structured vs. unstructured data |
Data Management | Data infrastructure | Handling databases, web services, Hadoop, multi-source data | Data structures, SOAP protocols, ontologies, XML, JSON, streaming |
Analysis Methods | Statistical inference | Basic understanding of bias and variance, principles of (non)parametric statistical inference, and (linear) modeling | Biological variability vs. technological noise, parametric (likelihood) vs. non-parametric (rank order statistics) procedures, point vs. interval estimation, hypothesis testing, regression |
Analysis Methods | Study design and diagnostics | Design of experiments, power calculations and sample sizing, strength of evidence, p-values, False Discovery Rates | Multistage testing, variance normalizing transforms, histogram equalization, goodness-of-fit tests, model overfitting, model reduction |
Analysis Methods | Machine Learning | Dimensionality reduction, k-nearest neighbors, random forests, AdaBoost, kernelization, SVM, ensemble methods, CNN | Empirical risk minimization. Supervised, semi-supervised, and unsupervised learning. Transfer learning, active learning, reinforcement learning, multiview learning, instance learning |
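As an illustration of the data wrangling competency, the inconsistent date formatting mentioned in the table (‘2016-01-01’ vs. ‘01/01/2016’) can be harmonized with a few lines of Python. This is only a minimal sketch and not part of the program materials. The function name `normalize_date` and the list of candidate formats are hypothetical, and reading slash-separated dates as month/day/year is an assumption that would have to be checked against the actual data source.

```python
from datetime import datetime

# Candidate formats, tried in order. Reading slash-separated dates as
# month/day/year is an assumption made for this illustration only.
CANDIDATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def normalize_date(text):
    """Return the date as an ISO 8601 string (YYYY-MM-DD), or None if no format matches."""
    cleaned = text.strip()
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date().isoformat()
        except ValueError:
            continue  # this format did not match; try the next one
    return None  # unparseable entries are treated as missing values


# Hypothetical raw values mixing both formats, stray whitespace, and a missing entry
raw = ["2016-01-01", "01/01/2016", " 12/31/2015 ", "n/a"]
normalized = [normalize_date(value) for value in raw]
```

Returning `None` for unparseable entries keeps missing values explicit, so they can be counted or imputed in a later cleaning step rather than silently dropped.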