The Challenge Initiatives Program will bring data scientists and domain experts at the University of Michigan together to solve real-world problems in Transportation Research, Learning Analytics, Personalized Medicine and Health, and Social Science. By devoting resources to these existing strengths at U-M under a data-science focus, MIDAS aims to produce extraordinary projects with major social impact. Projects supported through the Challenge Initiatives Program will draw heavily on Data Science Services and Infrastructure at the university.

Challenge Domains

  • Transportation — Enable the development of connected and automated vehicle systems by rapidly processing vast amounts of data from thousands of individual vehicles across an entire region.
  • Health — Improve personalized patient care through data-driven biomedical and health care research.
  • Learning Analytics — Tailor instruction to the needs of individual students by examining detailed data on the outcomes of thousands of student activities and experiences.
  • Social Science — Address a variety of socioeconomic questions by tapping the big data generated by social media.

Cross-cutting Methodologies

  • Analytics and Visualization of Complex Data — We are entering an era in which the dimensionality and complexity of data will permit local caching of only a small portion of the datasets of interest. New analytical tools must be developed to facilitate networked single-user and collaborative visualization of massive multimodal datasets.
  • Machine Learning-enabled Analytics — Statistical machine learning approaches are needed that can automatically detect and extract deeply embedded patterns hidden in massive datasets and correlate them with outcomes of interest to the analyst. Machine learning methods such as dictionary learning, reinforcement learning, similarity learning, and transfer learning must scale to massive datasets.
  • Temporal, Multi-Scale and Statistical Models — Mathematical, computational and statistical models are needed to integrate multimodal data collected at many different time and length scales. When combined with empirical data, computational physics, computational biology, and computational social science models can facilitate such integration.
  • Integration of Heterogeneous Data — Many different types of complementary data will be collected within and across the four challenge thrusts, including numerical, symbolic, structured, and streaming data. Methods must be developed for reliable integration of these data types at various stages of the analysis pipeline.
  • Data Scrubbing, Wrangling and Provenance Tracking — It has been said that more than 90% of the analysis pipeline is currently devoted to time-consuming data preparation steps such as normalization, calibration, outlier treatment, and provenance tracking. The practice of data science will benefit from new tools that can automate or semi-automate this process at large scales.
  • Data Privacy and Cybersecurity — As data about individuals becomes increasingly pervasive, it has become essential to build different degrees of privacy and security into data storage, management, and analysis methodologies. The tradeoffs between data privacy/security and data utility must be understood in the context of the specific application, e.g., medicine, transportation, or business analytics.