Ceren Budak, U-M SI assistant professor and MIDAS researcher, is on one of the first research teams to gain access to anonymized Facebook data. She will study social media’s impact on democracy in the United States, examining how changes Facebook makes to its platform affect sharing behavior on Facebook. More information can be found here: https://www.si.umich.edu/news/university-michigan-researcher-among-first-study-facebook-data.
ICPSR, led by MIDAS faculty member Maggie Levenstein, received the 2019 National Medal for Museum and Library Service. The award honors institutions that provide dynamic programs and services exceeding expected levels. More information can be found here: https://mailchi.mp/umich/icpsr-recognized-as-a-2019-recipient-of-nations-highest-museum-and-library-honor?e=4d027b48ec.
This research was supported by funding from the Michigan Institute for Data Science.
The Effect of Social Interaction on Facilitating Audience Participation in a Live Music Performance
ACM, June 23-26, 2019
Sang Won Lee, Aaron Willette, Danai Koutra, Walter S. Lasecki
Facilitating audience participation in a music performance brings the challenge of involving non-expert users in large-scale collaboration. A musical piece needs to be created live, over a short period of time, with limited communication channels. To address this challenge, we propose incorporating social interaction into the mobile music instruments given to the audience, and we examine how this feature sustains and affects audience involvement. We test this idea with an audience participation music system, Crowd in C. We realized a participation-based musical performance with the system and validated our approach by analyzing the audience’s interaction traces from a performance. The results indicate that audience members were actively engaged throughout the performance, with multiple layers of social interaction available in the system. We also present how social interactivity among audience members shaped their interaction in the music-making process.
The new textbook Data Science and Predictive Analytics: Biomedical and Health Applications using R provides a solid data science foundation and identifies challenges, opportunities, and strategies for designing, collecting, managing, processing, interrogating, analyzing, and interpreting complex health and biomedical datasets. It emphasizes active learning by integrating motivating driving challenges with mathematical foundations, computational statistics, and modern scientific inference. The material builds scientific intuition, computational skills, and data-wrangling abilities to tackle big biomedical and health data problems. The resources include well-documented R scripts and software recipes implementing atomic data filters as well as complex end-to-end predictive big data analytics solutions.
The textbook, authored by MIDAS faculty member Ivo D. Dinov and part of the Springer Computer Science Series, has grown increasingly popular, as its reader and download counts show. In 2018 it had more than 18,000 downloads, compared with 14,000 for the average computer science textbook, and in the first quarter of 2019 it had more than 4,400 downloads, compared with an average of 1,400. In addition, the book’s website (http://dspa.predictive.space) has hundreds of thousands of online readers, who have free access to all learning modules, assignments, videos, software tools, and case studies.
Cooper M. Stansbury, an M.S. candidate in Data Science, received UM Dearborn’s 2019 Scholars Award for M.S. Data Science. He is one of the graduate students on the MiChamp team, a MIDAS challenge award recipient. The awards have not yet been posted on the UM Dearborn website, but more information can be found here: https://umdearborn.edu/students/honor-scholars.
The MIDAS-supported project “Understanding How the Brain Processes Music through the Bach Trio Sonatas” was recently featured in Science Node and the Michigan Daily. The team also presented this project at Hill Auditorium on April 1 to a large and enthusiastic audience.
A paper co-authored by University of Michigan School of Information research assistant professor Christopher Brooks received the Best Full Research Paper Award at the International Conference on Learning Analytics & Knowledge (LAK) in Tempe, Arizona. The award was announced on the final day of the conference, March 7, 2019.
The paper, “Evaluating the Fairness of Predictive Student Models Through Slicing Analysis,” describes a tool designed to test for bias in algorithms used to predict student success.
The goal of the paper, Brooks says, was to evaluate whether the algorithms used to predict student success in massive open online courses (MOOCs) were skewed by the gender makeup of the classes.
“We were able to find that some have more bias than others do,” says Brooks. “First we were able to show that different MOOCs tend to have different bias in gender representation inside of the MOOCs.”
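The general idea of a slicing analysis can be illustrated with a short sketch: compute the same performance metric separately for each subgroup (here, each gender) and compare the results. The function names, the use of accuracy as the metric, and the toy data below are hypothetical; this is not the tool or the metric described in the paper.

```python
from collections import defaultdict


def accuracy_by_slice(y_true, y_pred, groups):
    """Return {group: accuracy}, computed separately for each subgroup (slice)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == pred)
    return {g: correct[g] / total[g] for g in total}


def max_gap(metric_by_slice):
    """Largest difference in the metric between any two slices (0 means parity)."""
    values = list(metric_by_slice.values())
    return max(values) - min(values)


if __name__ == "__main__":
    # Hypothetical toy data: 1 = completed the course, 0 = did not.
    y_true = [1, 0, 1, 1, 0, 1]
    y_pred = [1, 0, 1, 1, 1, 0]
    gender = ["F", "F", "F", "M", "M", "M"]
    by_slice = accuracy_by_slice(y_true, y_pred, gender)
    print(by_slice, max_gap(by_slice))
```

On this toy data every prediction for the F slice is correct (accuracy 1.0) while only one of three is correct for the M slice, so the gap is about 0.67, the kind of disparity a slicing analysis is meant to surface. A real analysis would use far larger samples and might prefer a different performance metric.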
HDDA: DataSifter: statistical obfuscation of electronic health records and other sensitive datasets
Journal of Statistical Computation and Simulation
11 Nov. 2018
There are no practical and effective mechanisms to share high-dimensional data that include sensitive information, in fields such as health, financial intelligence, or socioeconomics, without either compromising the utility of the data or exposing private personal or secure organizational information. Excessive scrambling or encoding of the information makes it less useful for modelling or analytical processing. Insufficient preprocessing may compromise sensitive information and introduce a substantial risk of re-identification of individuals by various stratification techniques. To address this problem, we developed a novel statistical obfuscation method (DataSifter) for on-the-fly de-identification of structured and unstructured sensitive high-dimensional data such as clinical data from electronic health records (EHR). DataSifter provides complete administrative control over the balance between the risk of data re-identification and the preservation of the data’s information. Simulation results suggest that DataSifter can provide privacy protection while maintaining data utility for different types of outcomes of interest. The application of DataSifter to a large autism dataset provides a realistic demonstration of its promise for practical applications.
Prof. Laura Balzano received an NSF CAREER award to support research that aims to improve the use of machine learning in big data problems involving elaborate physical, biological, and social phenomena. The project, called “Robust, Interpretable, and Efficient Unsupervised Learning with K-set Clustering,” is expected to have broad applicability in data science.
Modern machine learning techniques aim to design models and algorithms that allow computers to learn efficiently from vast amounts of previously unexplored data, says Balzano. Typically the data is broken down in one of two ways. Dimensionality reduction uses an algorithm to find the low-dimensional structure within high-dimensional data that is most relevant to the problem being solved. Clustering, on the other hand, attempts to group pieces of data into meaningful clusters of information.
However, explains Balzano, “as increasingly higher-dimensional data are collected about progressively more elaborate physical, biological, and social phenomena, algorithms that aim at both dimensionality reduction and clustering are often highly applicable, yet hard to find.”
Balzano plans to develop techniques that combine these two key machine learning approaches and that also apply to data considered “messy.” Messy data has missing elements, may be somewhat corrupted, or is filled with heterogeneous information; in other words, the description fits most data sets in today’s world.
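To make the two standard approaches concrete, the sketch below first reduces points to one dimension by projecting them onto their leading principal direction (a simple form of dimensionality reduction, PCA, computed here by power iteration) and then groups the projected values with k-means, a standard clustering algorithm. It is a minimal pure-Python illustration on hypothetical toy data that simply chains the two steps one after the other; it is not the K-set clustering method the project proposes, which aims to address both jointly and to cope with messy data.

```python
import math


def center(points):
    """Subtract the per-coordinate mean from every point."""
    n, d = len(points), len(points[0])
    means = [sum(p[j] for p in points) / n for j in range(d)]
    return [[p[j] - means[j] for j in range(d)] for p in points]


def top_direction(points, iters=100):
    """Approximate the leading principal direction by power iteration on X^T X.

    Assumes the points are already centered and not all zero.
    """
    d = len(points[0])
    v = [1.0] * d  # starting guess; power iteration refines it
    for _ in range(iters):
        xv = [sum(p[j] * v[j] for j in range(d)) for p in points]  # X v
        w = [sum(xv[i] * points[i][j] for i in range(len(points)))  # X^T (X v)
             for j in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def reduce_to_1d(points):
    """Dimensionality reduction: project centered points onto the top direction."""
    xs = center(points)
    v = top_direction(xs)
    return [sum(p[j] * v[j] for j in range(len(v))) for p in xs]


def kmeans_1d(values, k=2, iters=50):
    """Clustering: Lloyd's k-means on scalar values; returns one label per value."""
    s = sorted(values)
    # Deterministic start: k evenly spaced values from the sorted data.
    centers = [s[round(i * (len(s) - 1) / max(k - 1, 1))] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assign each value to its nearest center, then move centers to cluster means.
        labels = [min(range(k), key=lambda c: abs(x - centers[c])) for x in values]
        for c in range(k):
            members = [x for x, lab in zip(values, labels) if lab == c]
            if members:
                centers[c] = sum(members) / len(members)
    return labels


if __name__ == "__main__":
    # Hypothetical 3-D toy data: two well-separated groups.
    data = [[0.0, 0.0, 0.0], [0.1, 0.0, 0.2], [0.0, 0.2, 0.1],
            [5.0, 5.0, 5.0], [5.1, 4.9, 5.0], [4.9, 5.2, 5.1]]
    print(kmeans_1d(reduce_to_1d(data)))
```

Running the two steps in sequence is the simplest way to see what each one contributes: the projection discards directions that carry little variation, and clustering then separates the two groups in the reduced space.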
Balzano is an affiliated faculty member of both the Michigan Institute for Data Science (MIDAS) and the Michigan Institute for Computational Discovery and Engineering (MICDE). She is part of a MIDAS-supported research team working on single-cell genomic data analysis.
Who’s Tweeting About the President? What Big Survey Data Can Tell Us About Digital Traces
Social Science Computer Review, January 21, 2019
Josh Pasek, Colleen A. McClain, Frank Newport, Stephanie Marken
Researchers hoping to make inferences about social phenomena using social media data need to answer two critical questions: What is it that a given social media metric tells us? And who does it tell us about? Drawing from prior work on these questions, we examine whether Twitter sentiment about Barack Obama tells us about Americans’ attitudes toward the president, the attitudes of particular subsets of individuals, or something else entirely. Specifically, using large-scale survey data, this study assesses how patterns of approval among population subgroups compare to tweets about the president. The findings paint a complex picture of the utility of digital traces. Although attention to subgroups improves the extent to which survey and Twitter data can yield similar conclusions, the results also indicate that sentiment surrounding tweets about the president is no proxy for presidential approval. Instead, after adjusting for demographics, these two metrics tell similar macroscale, long-term stories about presidential approval but very different stories at a more granular level and over shorter time periods.