The Michigan Institute for Data Science (MIDAS) has announced the awardees of its first round of Propelling Original Data Science (PODS) Grants. Fifteen interdisciplinary teams, chosen from 65 proposals, will receive funding for an array of projects with data science as the common thread. The projects range from detecting patterns in illicit wildlife trade networks, to reducing safety threats on social media, to understanding the energy sources in the Universe. Researchers on these projects come from 9 schools and colleges across the Ann Arbor and Dearborn campuses. For this set of projects, MIDAS provides a total of $860K in funding, and cost sharing from U-M departments and faculty contributes an additional $220K.
This round of funding strongly encourages pioneering work based on innovative concepts that promises high reward, major impact, promotion of public interest, and potential for major expansion; in other words, “disruptive” rather than incremental research. The diverse range of research that MIDAS is able to fund demonstrates the widespread enthusiasm for incorporating data science into almost every research domain. “We are funding not one, but two projects from the School for Environment and Sustainability. And for the first time we are funding researchers from Physics, Climate and Space Sciences and Engineering, Ecology and Evolutionary Biology and many other departments,” says Dr. H. V. Jagadish, MIDAS Director. “This is exactly what MIDAS would like to accomplish, to catalyze the transformative use of Data Science in a wide range of disciplines to achieve lasting societal impact.”
Previous MIDAS grants have made it possible for the research teams to form many new collaborations, formulate groundbreaking ideas, and bring to U-M more than $60 million of external funding.
The awarded PODS grants and their (co-)Principal Investigators are:
- CHANGES: Collections, Heterogeneous data, and Next Generation Ecological Studies, Karen Alofs (School for Environment and Sustainability), Andrea Thomer (School of Information), Hernan Lopez-Fernandez (Ecology and Evolutionary Biology and Museum of Zoology).
- Probabilistic Methods to Infer Structure and Dynamics of Illicit Wildlife Trade Networks, Neil Carter (School for Environment and Sustainability), Abigail Jacobs (School of Information and Complex Systems, College of Literature, Science, and the Arts).
- A Data-Driven Framework for Microstructure Optimization of Additively Manufactured Piezoelectric Composites, Lei Chen (Mechanical Engineering, U-M Dearborn), Zhen Hu (Industrial and Manufacturing Engineering, U-M Dearborn).
- Fusing Physics and Deep Learning for Solar Dynamics Forecasting, David Fouhey (Electrical Engineering and Computer Science), Ward Manchester (Climate and Space Sciences and Engineering).
- Probabilistic Modeling of Missing Data to Improve Predictions Using Metabolomics Data, Christopher E. Gillies (Emergency Medicine), Kevin Ward (Emergency Medicine and Biomedical Engineering), Kathleen Stringer (Pharmacy), Xudong Fan (Biomedical Engineering).
- Decoding the Environment of Most Energetic Sources in the Universe, Oleg Y. Gnedin (Astronomy), Xun Huan (Mechanical Engineering).
- Data Science for Quantum Simulation, Emanuel Gull (Physics and Chemistry), Dominika Zgid (Chemistry).
- Towards a Framework for the Characterization of Cellular & Spatial Relationships in Development and Disease, Sue Hammoud (Human Genetics, Urology), Arvind Rao (Computational Medicine and Bioinformatics, Radiation Oncology, Biomedical Engineering).
- Database Learning: A Query Engine That Becomes Smarter Over Time, Barzan Mozafari (Computer Science and Engineering), Reza Soroushmehr (Computational Medicine and Bioinformatics).
- Optimal Design of Data Assimilation for the Prediction of Hydrological Extremes, Ashley Payne (Climate and Space Sciences and Engineering), Yulin Pan (Naval Architecture and Marine Engineering).
- DevEEG: A Robust Repository for Developmental Electroencephalogram Data, Amy Pienta (Inter-university Consortium for Political and Social Research), William Gehring (Psychology).
- Achieving ML Robustness by Leveraging Physics-based Constraints, Atul Prakash (Computer Science and Engineering), Huei Peng (Mechanical Engineering).
- Large Scale Interventions for Reducing Threats to Safety and Trustworthiness on Social Media, Sarita Schoenebeck (School of Information), Eric Gilbert (School of Information), Jenny Radesky (Pediatrics).
- Regularized Regression and Poststratification: Blending Data Science and Survey Methodology to Increase the Reproducibility of Population Genetics Research, Yajuan Si (Institute for Social Research–Statistics), Colter Mitchell (Institute for Social Research–Sociology).
- Incorporation of Multilevel Ontologies of Adverse Events and Vaccines for Vaccine Safety Surveillance, Lili Zhao (Biostatistics), Gary Freed (Pediatrics).
Data Acquisition for Data Science (DADS) supports acquisition, preparation, management, and maintenance of specialized research data sets used in current and future data science-enabled research projects across U-M.
DADS is funded through the Data Science Initiative (DSI); total funding is capped at $200,000 per year for 5 years.
DADS will be managed jointly by the Library and Advanced Research Computing (ARC), with support from ARC’s Consulting for Statistics, Computing, and Analytics Research (CSCAR), MIDAS, and ARC-Technology Services (ARC-TS) units.
Requests for DADS funding will be submitted through a web form available on Library, MIDAS, and CSCAR websites, and accepted on a rolling basis. Selection criteria and processes are detailed below.
DADS SELECTION CRITERIA
- Relevance/importance (merit, extent of user community, etc.)
  - DADS requests will be reviewed on their scientific merit and potential for impacting data science-driven research across the U-M Ann Arbor campus.
  - Data sets acquired through DADS should have the potential to serve a wide segment of the U-M community. Highly specialized procurement requests that only serve individual researchers are discouraged.
- Costs (product/license and ingest/processing)
  - DADS funds can be used to pay licensing and acquisition fees to publishers and commercial data providers, potentially including one-time or subscription-based costs.
  - Data acquired through DADS can be processed into analyzable form by CSCAR or other U-M personnel; DADS funds can cover the costs for this data processing.
  - Requests can be made for DADS funds to be used to cover transfer, processing, and storage costs for data that are otherwise free to obtain (e.g., to mirror open data repositories or to aggregate data obtained through an open API).
- Usability (ease of use, analytical tools, documentation, etc.)
  - Data sets acquired through DADS should be made available to the U-M community through Turbo Research Storage or other campus storage options. Costs for use of these services can be covered through DADS.
  - Priority will be given to data made available with appropriate documentation and metadata. [Note: If the raw data are subject to processing, the raw data will be retained and all scripts needed to generate the processed data will be made available along with the data. Metadata pertaining to the raw data, and documentation describing any data processing that was performed, will be preserved and made available along with the processed and raw data.]
  - Since the data are intended to be used by multiple researchers, there is a strong preference to use open and well-documented data format standards. If the data are provided in a proprietary or unusual format (e.g., SAS or MS-Access data files), CSCAR can be contracted to convert the data to an open format.
- Restrictions (embargo, number of users, exclusive use by single requestor)
  - Priority will be given to data made openly available to U-M researchers, subject to the constraints of any applicable data set license and data use agreement.
  - DADS funds should not be used for data management pertaining to new data produced at U-M.
  - Restricted data (e.g., data for which each user needs individual permission from the data provider) are eligible for this program, provided that there is a clear process for additional users to obtain access.
TO REQUEST FUNDING
To request funding from DADS, fill out this form. Requesters will be asked to provide:
1. Description of data set: domain, size, format, metadata and documentation, licensing and usage restrictions, raw or processed, required analysis tools, etc.;
2. Data source: vendor, publisher, foundation, government, web, research, etc.;
3. Intended use and community: requests must indicate the community of users that can be supported by the data resources while maintaining licensing, security, and other data restrictions as outlined in any applicable license or data use agreements;
4. (If applicable) Data processing requirements;
5. (If applicable) Hosting preference (Turbo, etc.);
6. Estimated cost for acquisition and steps 4-5.
Requests will be accepted on a rolling basis. Questions can be directed to email@example.com. Unit-specific questions can also be sent to:
Library: Catherine Morse, firstname.lastname@example.org
ARC-TS: Brock Palen, email@example.com
Requests will be reviewed by a DADS committee composed of Library and ARC personnel.
Below is a selection of grants possibly relevant to data science researchers. MIDAS and the other units under Advanced Research Computing can provide technical and other assistance for U-M researchers interested in applying for grants. Please email firstname.lastname@example.org for more information.