All Posts By

Jenny Lee

Data Science and Predictive Analytics Textbook

By | Research

A new textbook, Data Science and Predictive Analytics: Biomedical and Health Applications using R, provides a solid data science foundation and identifies challenges, opportunities, and strategies for designing, collecting, managing, processing, interrogating, analyzing, and interpreting complex health and biomedical datasets. It focuses on active learning by integrating driving motivational challenges with mathematical foundations, computational statistics, and modern scientific inference. The material builds scientific intuition, computational skills, and data-wrangling abilities to tackle big biomedical and health data problems. The resources include well-documented R scripts and software recipes implementing atomic data filters as well as complex end-to-end predictive big data analytics solutions.

The increasing popularity of this textbook, authored by MIDAS faculty member Ivo D. Dinov and part of the Springer Computer Science series, is evidenced by its reader and download counts. Compared to other computer science books, in 2018 this textbook had more than 18,000 downloads versus 14,000 for the average textbook, and in the first quarter of 2019 it had more than 4,400 downloads compared to an average of 1,400. In addition, the book’s website (http://dspa.predictive.space) has hundreds of thousands of online readers, who have free access to all learning modules, assignments, videos, software tools, and case studies.

Who’s Tweeting About the President? What Big Survey Data Can Tell Us About Digital Traces


Title
Who’s Tweeting About the President? What Big Survey Data Can Tell Us About Digital Traces

Published in
Social Science Computer Review, January 21, 2019

Authors
Josh Pasek, Colleen A. McClain, Frank Newport, Stephanie Marken

Abstract
Researchers hoping to make inferences about social phenomena using social media data need to answer two critical questions: What is it that a given social media metric tells us? And who does it tell us about? Drawing from prior work on these questions, we examine whether Twitter sentiment about Barack Obama tells us about Americans’ attitudes toward the president, the attitudes of particular subsets of individuals, or something else entirely. Specifically, using large-scale survey data, this study assesses how patterns of approval among population subgroups compare to tweets about the president. The findings paint a complex picture of the utility of digital traces. Although attention to subgroups improves the extent to which survey and Twitter data can yield similar conclusions, the results also indicate that sentiment surrounding tweets about the president is no proxy for presidential approval. Instead, after adjusting for demographics, these two metrics tell similar macroscale, long-term stories about presidential approval but very different stories at a more granular level and over shorter time periods.

3D Shape Modeling for Cell Nuclear Morphological Analysis and Classification


Title
3D Shape Modeling for Cell Nuclear Morphological Analysis and Classification

Published in
Scientific Reports 8, October 2018

DOI
10.1038/s41598-018-33574-w

Authors
Alexandr A. Kalinin, Ari Allyn-Feuer, Alex Ade, Gordon-Victor Fon, Walter Meixner, David Dilworth, Syed S. Husain, Jeffrey R. de Wet, Gerald A. Higgins, Gen Zheng, Amy Creekmore, John W. Wiley, James E. Verdone, Robert W. Veltri, Kenneth J. Pienta, Donald S. Coffey, Brian D. Athey, Ivo D. Dinov

Abstract
Quantitative analysis of morphological changes in a cell nucleus is important for the understanding of nuclear architecture and its relationship with pathological conditions such as cancer. However, dimensionality of imaging data, together with a great variability of nuclear shapes, presents challenges for 3D morphological analysis. Thus, there is a compelling need for robust 3D nuclear morphometric techniques to carry out population-wide analysis. We propose a new approach that combines modeling, analysis, and interpretation of morphometric characteristics of cell nuclei and nucleoli in 3D. We used robust surface reconstruction that allows accurate approximation of 3D object boundary. Then, we computed geometric morphological measures characterizing the form of cell nuclei and nucleoli. Using these features, we compared over 450 nuclei with about 1,000 nucleoli of epithelial and mesenchymal prostate cancer cells, as well as 1,000 nuclei with over 2,000 nucleoli from serum-starved and proliferating fibroblast cells. Classification of sets of 9 and 15 cells achieved accuracy of 95.4% and 98%, respectively, for prostate cancer cells, and 95% and 98% for fibroblast cells. To our knowledge, this is the first attempt to combine these methods for 3D nuclear shape modeling and morphometry into a highly parallel pipeline workflow for morphometric analysis of thousands of nuclei and nucleoli in 3D.

The effectiveness of parking policies to reduce parking demand pressure and car use


This study is a part of the “Reinventing Transportation and Urban Mobility” project, funded by the Michigan Institute for Data Science.

Title
The effectiveness of parking policies to reduce parking demand pressure and car use

Published in
Transport Policy, January 2019

DOI
10.1016/j.tranpol.2018.10.009

Authors
Xiang Yan, Jonathan Levine, Robert Marans

Abstract
Evaluating the effectiveness of parking policies to relieve parking demand pressure in central areas and to reduce car use requires an investigation of traveler responses to different parking attributes, including the money and time costs associated with parking. Existing parking studies on this topic are inadequate in two ways. First, few studies have modeled parking choice and mode choice simultaneously, thus ignoring the interaction between these two choice realms. Second, existing studies of travel choice behavior have largely focused on the money cost of parking while giving less attention to non-price-related variables such as parking search time and egress time from parking lot to destination. To address these issues, this paper calibrates a joint model of travel mode and parking location choice, using revealed-preference survey data on commuters to the University of Michigan, Ann Arbor, a large university campus. Key policy variables examined include parking cost, parking search time, and egress time. A comparison of elasticity estimates suggested that travelers were very sensitive to changes in egress time, even more so than parking cost, but they were less sensitive to changes in search time. Travelers responded to parking policies primarily by shifting parking locations rather than switching travel mode. Finally, our policy simulation results imply some synergistic effects between policy measures; that is, when pricing and policy measures that reduce search and egress time are combined, they shape parking demand more than the sum of their individual effects if implemented in isolation.
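To make the idea of a joint mode and parking-location choice model more concrete, here is a minimal sketch using a plain multinomial logit over combined (mode, parking location) alternatives. This is an illustration only, not the model calibrated in the paper: the coefficients, alternatives, and attribute values are all hypothetical, and the paper's actual model structure may differ (for example, it may not be a simple multinomial logit).

```python
import math

# Hypothetical coefficients (NOT estimates from the paper). All are negative
# because higher cost or longer times lower the utility of an alternative.
BETA = {"cost": -0.40, "search": -0.05, "egress": -0.15}

# Hypothetical joint alternatives: (mode, parking location).
# cost is in dollars per day; search and egress times are in minutes.
ALTERNATIVES = {
    ("drive", "central lot"): {"cost": 4.0, "search": 8.0, "egress": 3.0},
    ("drive", "remote lot"): {"cost": 1.0, "search": 2.0, "egress": 12.0},
    ("bus", None): {"cost": 0.0, "search": 0.0, "egress": 6.0},
}


def utility(attrs):
    """Linear-in-parameters utility of one alternative."""
    return sum(BETA[name] * attrs[name] for name in BETA)


def choice_probabilities(alternatives):
    """Multinomial logit probabilities over all joint alternatives."""
    utils = {alt: utility(attrs) for alt, attrs in alternatives.items()}
    max_u = max(utils.values())  # subtract the max for numerical stability
    exp_u = {alt: math.exp(u - max_u) for alt, u in utils.items()}
    total = sum(exp_u.values())
    return {alt: e / total for alt, e in exp_u.items()}


def direct_elasticity(alternatives, alt, attribute):
    """Point elasticity of P(alt) with respect to one of its own attributes.

    For a multinomial logit this is beta * x * (1 - P).
    """
    p = choice_probabilities(alternatives)[alt]
    return BETA[attribute] * alternatives[alt][attribute] * (1.0 - p)


def with_change(alternatives, alt, attribute, new_value):
    """Return a copy of the alternatives with one attribute changed."""
    changed = {a: dict(attrs) for a, attrs in alternatives.items()}
    changed[alt][attribute] = new_value
    return changed
```

A simple policy simulation under these assumptions would call `with_change` to raise the cost or shorten the egress time of one lot, then compare `choice_probabilities` before and after to see whether demand shifts mainly between parking locations or between modes.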

VIPER: variability-preserving imputation for accurate gene expression recovery in single-cell RNA sequencing studies


This research was supported by funding from the Michigan Center for Single-Cell Genomic Data Analytics—a part of the Michigan Institute for Data Science.

Title
VIPER: variability-preserving imputation for accurate gene expression recovery in single-cell RNA sequencing studies

Published in
Genome Biology, November 12, 2018

DOI
10.1186/s13059-018-1575-1

Authors
Mengjie Chen and Xiang Zhou

Abstract
We develop a method, VIPER, to impute the zero values in single-cell RNA sequencing studies to facilitate accurate transcriptome quantification at the single-cell level. VIPER is based on nonnegative sparse regression models and is capable of progressively inferring a sparse set of local neighborhood cells that are most predictive of the expression levels of the cell of interest for imputation. A key feature of our method is its ability to preserve gene expression variability across cells after imputation. We illustrate the advantages of our method through several well-designed real data-based analytical experiments.
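As a rough illustration of what nonnegative sparse regression for imputation can look like, the sketch below fits a nonnegative lasso by coordinate descent and uses the resulting weights to fill in a cell's zero values from a few candidate neighbor cells. This is not the VIPER implementation: the function names, the penalty value, and the choice to fit only on genes where the target cell is nonzero are all simplifying assumptions made for this example, and VIPER's progressive neighbor selection is not reproduced here.

```python
def nonneg_lasso(X, y, lam=0.1, n_iter=200):
    """Coordinate descent for min (1/2n)||y - Xw||^2 + lam * sum(w), w >= 0.

    X is a list of n rows (genes), each holding p values (candidate cells).
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - Xw while w is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the residual that excludes w[j].
            rho = sum(
                X[i][j] * (residual[i] + X[i][j] * w[j]) for i in range(n)
            ) / n
            # Soft-threshold, then clip at zero to keep the weight nonnegative.
            new_w = max(0.0, rho - lam) / col_sq[j]
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                w[j] = new_w
    return w


def impute_zeros(target, neighbors, lam=0.1):
    """Fill zeros in `target` with a nonnegative weighted mix of neighbors.

    Assumption for this sketch: weights are fit only on genes where the
    target cell has a nonzero value.
    """
    observed = [g for g, value in enumerate(target) if value > 0]
    X = [[cell[g] for cell in neighbors] for g in observed]
    y = [target[g] for g in observed]
    w = nonneg_lasso(X, y, lam)
    imputed = list(target)
    for g, value in enumerate(target):
        if value == 0:
            imputed[g] = sum(wj * cell[g] for wj, cell in zip(w, neighbors))
    return imputed, w
```

The L1 penalty pushes most neighbor weights to exactly zero, so only a small set of predictive cells contributes to each imputed value, while the nonnegativity constraint keeps imputed expression levels from going below zero.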

TAIJI: Approaching Experimental Replicates-Level Accuracy for Drug Synergy Prediction


MIDAS-affiliated researchers recently published a paper on accurate and fast computational tools for predicting drug synergistic effects.

Title
TAIJI: Approaching Experimental Replicates-Level Accuracy for Drug Synergy Prediction

Published in
Bioinformatics, November 21, 2018

DOI
10.1093/bioinformatics/bty955

Authors
Hongyang Li, Shuai Hu, Nouri Neamati, Yuanfang Guan

Abstract

Motivation

Combination therapy is widely used in cancer treatment to overcome drug resistance. High-throughput drug screening is the standard approach to study the drug combination effects, yet it becomes impractical when the number of drugs under consideration is large. Therefore, accurate and fast computational tools for predicting drug synergistic effects are needed to guide experimental design for developing candidate drug pairs.

Results

Here, we present TAIJI, a high-performance software for fast and accurate prediction of drug synergism. It is based on the winning algorithm in the AstraZeneca-Sanger Drug Combination Prediction DREAM Challenge, which is a unique platform to unbiasedly evaluate the performance of current state-of-the-art methods, and includes 160 team-based submission methods. When tested across a broad spectrum of 85 different cancer cell lines and 1089 drug combinations, TAIJI achieved a high prediction correlation (0.53), approaching the accuracy level of experimental replicates (0.56). The runtime is at the scale of minutes to achieve this state-of-the-field performance.
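The comparison between a model's prediction correlation and the correlation between experimental replicates can be sketched as follows. This example assumes a plain Pearson correlation, which the abstract does not specify, and the function name and any input values a reader supplies are hypothetical; it does not reproduce the challenge's scoring code.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0.0 or spread_y == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)
```

Under this assumption, one would compute `pearson(predicted_scores, observed_scores)` for the model and `pearson(replicate_a, replicate_b)` for repeated measurements of the same drug pairs. The replicate correlation then acts as a practical ceiling: a model cannot be expected to agree with the data better than the data agrees with itself.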