The National Academies Committee on Applied and Theoretical Statistics has released proceedings from its June 2016 workshop titled “Refining the Concept of Scientific Inference When Working with Big Data,” co-chaired by Alfred Hero, MIDAS co-director and the John H. Holland Distinguished University Professor of Electrical Engineering and Computer Science.
The workshop explored four key issues in scientific inference:
Inference about causal discoveries driven by large observational data
Inference about discoveries from data on large networks
Inference about discoveries based on integration of diverse datasets
Inference when regularization is used to simplify fitting of high-dimensional models.
The workshop brought together statisticians, data scientists and domain researchers from different biomedical disciplines in order to identify new methodological developments that hold significant promise, and to highlight potential research areas for the future. It was partially funded by the National Institutes of Health Big Data to Knowledge Program, and the National Science Foundation Division of Mathematical Sciences.
The Michigan Data Science Team and the Michigan Student Symposium for Interdisciplinary Statistical Sciences (MSSISS) have partnered with the City of Detroit on a data challenge that seeks to answer the question: How can blight ticket compliance be increased?
An organizational meeting is scheduled for Thursday, Feb. 16 at 5:30 p.m. in EECS 1200.
The city is making datasets available containing building permits, trades permits, citizen complaints, and more.
From left, Al Hero, U-M; Patrick Wolfe, UCL; and Brian Athey, U-M signed an agreement for research and educational cooperation between the University of Michigan and University College London.
ANN ARBOR, MI and LONDON — The Michigan Institute for Data Science (MIDAS) at the University of Michigan and the Centre for Data Science and Big Data Institute at UCL (University College London) have signed a five-year agreement of scientific and academic cooperation.
The agreement sets the stage for collaborative research projects between faculty of both institutions; student exchange opportunities; and visiting scholar arrangements, among other potential partnerships.
“There is a lot of common ground in what we do,” said Patrick Wolfe, Executive Director of UCL’s Centre for Data Science and Big Data Institute. “Both MIDAS and UCL cover the full spectrum of data science domains, from smart cities to healthcare to transportation to financial services, and both promote cross-cutting collaboration between scientific disciplines.”
Alfred Hero, co-director of MIDAS and professor of Electrical Engineering and Computer Science at U-M, said that one of the original goals of the institute when it was founded in 2015 under U-M’s $100 million Data Science Initiative was to reach out to U.S. and international partners.
“It seemed very natural that this would be the next step,” Hero said, adding that it would complement MIDAS’s recent partnership with the Shenzhen Research Institute of Big Data in China. “UCL epitomizes the collaboration, multi-disciplinarity and multi-institutional involvement that we’re trying to establish in our international partnerships.”
Wolfe visited Ann Arbor in early January to sign a memorandum of understanding along with Hero and Brian Athey, professor of bioinformatics and the other MIDAS co-director.
The agreement lists several potential areas of cooperation, including:
FLINT—A partnership between Google and the University of Michigan’s Flint and Ann Arbor campuses aims to provide a smartphone app and other digital tools to Flint residents and officials to help them manage the ongoing water crisis.
The app and other tools will help predict where lead levels will be highest in the city’s water, and they’ll pull together information and resources to make the crisis easier to navigate for those affected. The project is made possible by a $150,000 grant from Google.
“This investment by Google is an outstanding commitment to our community. It creates an ideal combination of an industry powerhouse with faculty expertise. It will create new opportunities for students and continue building community partnerships—all so that we can provide quick and critically important information and analysis for our community as we move forward,” said Chancellor Susan E. Borrego of the University of Michigan-Flint.
The Android app is slated for roll-out this summer. It could help residents determine whether their homes are at high risk of having lead-contaminated water. It could also help them locate day-to-day resources for lead testing, water distribution, water bottle recycling, water filters, and volunteer opportunities. A website will offer similar resources and will be accessible on any computer, including those in public libraries.
Additional web-based tools for researchers and government officials could provide detailed insight on how to deploy repairs and resources. For example, they could help identify and prioritize the water service line replacements.
A student team at UM-Flint has already developed a prototype smartphone app for Flint residents. Google and U-M Ann Arbor will work with them through the spring and summer to add mapping features that use predictive analytics from U-M Ann Arbor’s Michigan Data Science Team. The team will also develop an improved user interface with assistance from Google.
Google has pledged a variety of resources to the project including a grant and remote and on-site assistance from its user experience and app development team. The company will also donate data resources to the Michigan Data Science Team including mapping, satellite imagery, and geo-location data.
Initial work by the data science team has already shown some success at predicting which homes and neighborhoods have a high risk of lead contamination. In the coming months, they’ll continue to apply predictive algorithms and machine learning techniques to data from a wide variety of sources including Google, the State of Michigan and the City of Flint. The data includes existing lead testing data; detailed information on the type and location of water infrastructure; and information on the size, age, type, and condition of every parcel of property in the city.
“There’s a lot of data on the water crisis, but it’s scattered over many different agencies and places,” said Jacob Abernethy, an assistant professor of computer science and engineering at U-M Ann Arbor and faculty advisor to the Michigan Data Science Team. “By organizing it in one place and analyzing it, we can predict which areas are likely to be at risk. We can help planners determine which infrastructure repairs will benefit the most residents, and how to allocate resources like bottled water most efficiently.”
Google and U-M also plan to create a separate set of web tools for city planners and other officials. They will include extensive mapping and predictive analytics, with details on waterline type and location and other infrastructure data.
Mark Allison is an assistant professor of computer science at UM-Flint and the faculty leader of the Flint student team. He says the project will be an opportunity for students to make a difference in the water crisis and pick up valuable real-world development experience along the way.
“Finding the best way to put resources close to where high lead levels are is a big part of managing this crisis, and it’s the kind of problem that analytics can solve. We also want to give residents more transparency by making it easier for anyone to get access to the most up-to-date information,” Allison said. “I think this project will be transformative. And for all of us here in Flint, it’s about much more than grades.”
Allison said the team is working to keep the tools they develop flexible, enabling them to be used by other cities that face similar crises. His team is developing the tools as part of UM-Flint Computer Science’s community-based learning program, which puts students to work on real-world challenges in and around Flint.
The Michigan Data Science Team is a competitive extra-curricular team at U-M Ann Arbor. Founded by Abernethy, the team builds and applies advanced computer algorithms that can analyze and “learn” from large sets of data. By finding connections and patterns within that data, they can make predictions about future events. The techniques are already widely used in areas like online retailing and advertising.
“Access to clean drinking water is a concern all over the world, but in the United States it’s often a foregone conclusion. That is not the case recently for the residents of Flint, Michigan,” said Mike Miller, head of Google Michigan. “I am proud that we can contribute to help with the recovery of and we hope we can help to support a resolution to this crisis and get the residents of Flint the resources and respect they so rightly deserve.”
The Flint water crisis began after April of 2014, when the city’s drinking water source was changed from Lake Huron via Detroit’s water system to the Flint River. The water supply was not properly monitored for corrosion control and it caused lead to leach from service lines into the city’s drinking water. While the city has since switched its water supply back to the Detroit system, residents are still being advised not to drink unfiltered tap water.
Five research projects — three in health and two in social science — have been awarded funding in the second round of the Michigan Institute for Data Science Challenge Initiative program.
The projects will receive funding from MIDAS as part of the Data Science Initiative announced in fall 2015.
The goal of the multiyear MIDAS Challenge Initiatives program is to foster data science projects that have the potential to prompt new partnerships between U-M, federal research agencies and industry. The challenges are focused on four areas: transportation, learning analytics, social science and health science. For more information, visit midas.umich.edu/challenges.
The projects, determined by a competitive submission process, are:
Title: Michigan Center for Single-Cell Genomic Data Analysis
Description: The center will establish methodologies to analyze sparse data collected from single-cell genome sequencing technologies. The center will bring together experts in mathematics, statistics and computer science with biomedical researchers.
Lead researchers: Jun Li, Department of Human Genetics; Anna Gilbert, Mathematics.
Research team: Laura Balzano, Electrical Engineering and Computer Science; Justin Colacino, Environmental Health Sciences; Johann Gagnon-Bartsch, Statistics; Yuanfang Guan, Computational Medicine and Bioinformatics; Sue Hammoud, Human Genetics; Gil Omenn, Computational Medicine and Bioinformatics; Clay Scott, Electrical Engineering and Computer Science; Roman Vershynin, Mathematics; Max Wicha, Oncology.
Title: From Big Data to Vital Insights: Michigan Center for Health Analytics and Medical Prediction (M-CHAMP)
Description: The center will house a multidisciplinary team that will confront a core methodological problem that currently limits health research: exploiting temporal patterns in longitudinal data for novel discovery and prediction.
Lead researchers: Brahmajee Nallamothu, Internal Medicine; Ji Zhu, Statistics; Jenna Wiens, Electrical Engineering and Computer Science; Marcelline Harris, Nursing.
Research team: T. Jack Iwashyna, Internal Medicine; Jeffrey McCullough, Health Management and Policy (SPH); Kayvan Najarian, Computational Medicine and Bioinformatics; Hallie Prescott, Internal Medicine; Andrew Ryan, Health Management and Policy (SPH); Michael Sjoding, Internal Medicine; Karandeep Singh, Learning Health Sciences (Medical School); Kerby Shedden, Statistics; Jeremy Sussman, Internal Medicine; Vinod Vydiswaran, Learning Health Sciences (Medical School); Akbar Waljee, Internal Medicine.
Title: Identifying Real-Time Data Predictors of Stress and Depression Using Mobile Technology
Description: Using an app platform that integrates signals from both mobile phones and wearable sensors, the project will collect data from over 1,000 medical interns to identify the dynamic relationships between mood, sleep and circadian rhythms. These relationships will be utilized to inform the type and timing of personalized data feedback for a mobile micro-randomized intervention trial for depression under stress.
Lead researchers: Srijan Sen, Psychiatry; Margit Burmeister, Molecular and Behavioral Neuroscience.
Research team: Lawrence An, Internal Medicine; Amy Cochran, Mathematics; Elena Frank, Molecular and Behavioral Neuroscience; Daniel Forger, Mathematics; Thomas Insel (Verily Life Sciences); Susan Murphy, Statistics; Maureen Walton, Psychiatry; Zhou Zhao, Molecular and Behavioral Neuroscience.
Title: Computational Approaches for the Construction of Novel Macroeconomic Data
Description: This project will develop an economic dataset construction system that takes as input economic expertise as well as social media data; will deploy a data construction service that hosts this construction tool; and will use this tool and service to build an “economic datapedia,” a compendium of user-curated economic datasets that are collectively published online.
Lead researcher: Matthew Shapiro, Department of Economics.
Research team: Michael Cafarella, Computer Science and Engineering; Jia Deng, Electrical Engineering and Computer Science; Margaret Levenstein, Inter-university Consortium for Political and Social Research.
Title: A Social Science Collaboration for Research on Communication and Learning based upon Big Data
Description: This project is a multidisciplinary collaboration meant to introduce social scientists, computer scientists and statisticians to the methods and theories of engaging observational data and the results of structured data collections in two pilot projects in the area of political communication and one investigating parenting issues. The projects involve the integration of geospatial, social media and longitudinal data.
Lead researchers: Michael Traugott, Center for Political Studies, ISR; Trivellore Raghunathan, Biostatistics.
Research team: Leticia Bode, Communications, Georgetown University; Ceren Budak, U-M School of Information; Pamela Davis-Kean, U-M Psychology, ISR; Jonathan Ladd, Public Policy, Georgetown; Zeina Mneimneh, U-M Survey Research Center; Josh Pasek, U-M Communications; Rebecca Ryan, Public Policy, Georgetown; Lisa Singh, Public Policy, Georgetown; Stuart Soroka, U-M Communications.
The Michigan Institute for Data Science (MIDAS) will hold a faculty meeting at noon on Thursday, January 19 (Suite 7625, School of Public Health I, 1415 Washington Heights) for the NSF 17-534 “Critical Techniques, Technologies and Methodologies for Advancing Foundations and Applications of Big Data Sciences and Engineering (BIGDATA)” solicitation.
The meeting will include an overview of the NSF solicitation, U-M Data Science Resources (MIDAS, CSCAR, ARC-TS) available to faculty responding to the NSF call, and an opportunity to network with other faculty.
MIDAS has also arranged for Sylvia Spengler, NSF CISE Program Director, to be available at 1:30 pm to answer questions regarding the BIGDATA solicitation.
We invite you to participate in the faculty meeting to share your ideas and interest in responding to this BIGDATA solicitation as well as interact with other faculty looking to respond to this funding mechanism.
For those unable to participate in person, you can join virtually using GoToMeeting:
Sharon Broude Geva, the Director of Advanced Research Computing at the University of Michigan, has been elected vice-chair of the Coalition for Academic Scientific Computation (CASC).
Founded in 1989, CASC advocates for the use of advanced computing technology to accelerate scientific discovery for national competitiveness, global security, and economic success. The organization’s members represent 83 institutions of higher education and national labs.
The vice-chair position is one of four elected CASC executive officers. The officers work closely as a team with the director of CASC. The vice-chair also leads CASC meeting program committees, is responsible for recruitment of new members, substitutes for the chair in his or her absences, and assists with moderating CASC meetings.
Geva served as CASC secretary in 2015 and 2016. Her term as vice-chair is effective for the 2017 calendar year.
The other executive officers for 2017 are Rajendra Bose, Chair, Columbia University; Neil Bright, Secretary, Georgia Institute of Technology; and Andrew Sherman, Treasurer, Yale University. Curt Hillegas of Princeton University is immediate past chair.
A mobile app and website built for the city of Flint is available now to help the community and government agencies manage the ongoing water crisis.
Mywater-Flint, for Android and online at Mywater-flint.com, was developed by computer science researchers at the University of Michigan’s Flint and Ann Arbor campuses and funded by Google.org. Through it, residents and city employees can:
Access a citywide map of where lead has been found in drinking water.
Discover where service line workers have replaced infrastructure that connects homes to the water main, and where they’re currently working.
Locate the nearest distribution centers for water and water filters.
Find step-by-step instructions for water testing.
Determine the likelihood that the water in a home or another location is contaminated, among other features.
Several University of Michigan researchers and research IT staff made presentations at the SC16 conference in Salt Lake City Nov. 13-17. Material from many of the talks is now available for viewing online:
Shawn McKee (Physics) and Ben Meekhof (ARC-TS) presented a demonstration of the Open Storage Research Infrastructure (OSiRIS) project at the U-M booth. The demonstration extended the OSiRIS network from its participating institutions in Michigan to the conference center in Utah. Meekhof also presented at a “Birds of a Feather” session on Ceph in HPC environments. More information, including slides, is available on the OSiRIS website.
Todd Raeker (ARC-TS) made a presentation on ConFlux, U-M’s new computational physics cluster, at the NVIDIA booth. Slides and video are available.
Nilmini Abeyratne, a Ph.D. student in computer science, presented her project “Low Design-Risk Checkpointing Storage Solution for Exascale Supercomputers” at the Doctoral Showcase. A summary, slides, and poster can be viewed on the SC16 website.
Jeremy Hallum (ARC-TS) presented information on the Yottabyte Research Cloud at the U-M booth. His slides are available here.