Libby Hemphill

Dr. Hemphill studies conversations in social media and aims to promote just access to social media spaces and their data. She uses computational approaches to modeling political topics, predicting and addressing toxicity in online discussions, and tracing linguistic adaptations among extremists. She also studies digital data curation and is especially interested in ways to measure and model data reuse so that we can make informed decisions about how to allocate data resources.

Sol Bermann

I am interested in the intersection of big data, data science, privacy, security, public policy, and law. At U-M, this includes co-convening the Dissonance Event Series, a multi-disciplinary collaboration of faculty and graduate students that explores the confluence of technology, policy, privacy, security, and law. I frequently guest lecture on these subjects across campus, including at the School of Information, Ford School of Public Policy, and the Law School.

Hernán López-Fernández

I am interested in the evolutionary processes that originate “mega-diverse” biotic assemblages and the role of ecology in shaping the evolution of diversity. My program studies the evolution of Neotropical freshwater fishes, the most diverse freshwater fish fauna on earth, with an estimate exceeding 7,000 species. My lab combines molecular phylogenetics and phylogeny-based comparative methods to integrate ecology, functional morphology, life histories and geography into analyses of macroevolutionary patterns of freshwater fish diversification. We are also comparing patterns of diversification across major Neotropical fish clades. Relying on fieldwork and natural history collections, we use methods that span

Andrea Thomer

Andrea Thomer is an assistant professor of information at the University of Michigan School of Information. She conducts research in the areas of data curation, museum informatics, earth science and biodiversity informatics, information organization, and computer-supported cooperative work. She is especially interested in how people use and create data and metadata; the impact of information organization on information use; issues of data provenance, reproducibility, and integration; and long-term data curation and infrastructure sustainability. She is studying a number of these issues through the “Migrating Research Data Collections” project, which is funded by a recently awarded Laura Bush 21st Century Librarianship Early Career Research Grant from the Institute of Museum and Library Services. Dr. Thomer received her doctorate in Library and Information Science from the School of Information Sciences at the University of Illinois at Urbana‐Champaign in 2017.

John E Marcotte

John E. Marcotte, PhD is a statistician and data security expert. His research concerns data sharing, data security, data management, disclosure, health policy, nursing staffing and patient outcomes. He has over 25 years of experience implementing computing systems and performing quantitative analysis. During his career, Marcotte has served as a quantitative researcher, biostatistician, data archivist, data security officer and computing director. Among Marcotte’s statistical fortes are linear and logistic regression, survival analysis and sampling while his computing specialties include secure systems, high performance systems and numerical methods. He has collaborated with social and natural scientists as well as nurses and physicians. Marcotte regularly presents at professional conferences and contributes to invited panels on data security and disclosure. He has formal training in Demography, Statistics and Computer Science.

Research Data Security Options

Aaron A. King

The long temporal and large spatial scales of ecological systems make controlled experimentation difficult and the amassing of informative data challenging and expensive. The resulting sparsity and noise are major impediments to scientific progress in ecology, which therefore depends on efficient use of data. In this context, it has in recent years been recognized that the onetime playthings of theoretical ecologists, mathematical models of ecological processes, are no longer exclusively the stuff of thought experiments, but have great utility in the context of causal inference. Specifically, because they embody scientific questions about ecological processes in sharpest form—making precise, quantitative, testable predictions—the rigorous confrontation of process-based models with data accelerates the development of ecological understanding. This is the central premise of my research program and the common thread of the work that goes on in my laboratory.

Harm Derksen

Current research includes a project funded by Toyota that uses Markov Models and Machine Learning to predict heart arrhythmia, an NSF-funded project to detect Acute Respiratory Distress Syndrome (ARDS) from x-ray images and projects using tensor analysis on health care data (funded by the Department of Defense and National Science Foundation).

Brian Perron

Brian E. Perron, Ph.D., is an Associate Professor at the University of Michigan’s School of Social Work. Dr. Perron received his Ph.D. from Washington University in St. Louis and a specialization in Data Science from Johns Hopkins University. Dr. Perron has extensive experience in services research for persons with mental health and substance use disorders. His research (NCBI, Google Scholar) has been supported by the National Institutes of Health, Department of Veterans Affairs, and the State of Michigan. He recently published books on the topics of measurement (Oxford University Press) and social work practice (Sage Publications). Dr. Perron’s recent work focuses on helping community-based organizations more effectively use administrative data to improve service delivery and other business processes. This includes developing user-friendly and sustainable data management systems; using data visualizations to facilitate interpretation of data, especially for non-technical users; and building organizational capacity to promote data-driven decision making.

Carol Flannagan

As a faculty member within the University of Michigan Transportation Research Institute, Dr. Flannagan currently serves as Director of the Center for Management of Information for Safe and Sustainable Transportation (CMISST) and Head of the Statistics and Methods Group for the CDC-funded UM Injury Center. Dr. Flannagan has over 20 years of experience conducting data analysis and research on injury risk related to motor vehicle crashes and was responsible for the development of a model of injury outcome that allows side-by-side comparison of public health, vehicle, roadway and post-crash interventions. She has also applied statistical methods to understanding and evaluating benefits of crash-avoidance technologies, including evaluating safety in automated vehicles, and works to develop novel applications of statistics to analysis of driving data. Her current work with CMISST involves the fusion and analysis of large state-level crash databases, which are useful in analyzing the effect of a variety of countermeasures on crash involvement and injury risk. In addition, her group is working to make data available to researchers to expand the community of experts in transportation data analysis.

Q & A with Dr. Carol Flannagan

Q: When in your career did you realize that data (broadly speaking) could open a lot of doors, in terms of research?

I have loved data analysis since my undergraduate Experimental Design course sophomore year when I learned about analysis of 2×2 tables. Beyond that, I didn’t necessarily see it as a career move or a research program. It was much later at UMTRI, when I started thinking about how much data we had scattered around the building and how great it would be to round up those data and share them, that I thought explicitly about data per se as opening research doors.

In 2010, I was asked to head up the Transportation Data Center at UMTRI. In spite of its name, the group was doing very limited things with crash data at the time. After a few years, the group got a new name and some support to grow from UMOR. That, along with a number of research projects over the years, has led to our current incarnation with substantially upgraded data hosting for the state of Michigan’s crash data and strong capabilities in data analysis of all kinds of newer transportation data. For example, we have collaborated with the Engineering Systems group, GM and OnStar to conduct a series of studies funded by NHTSA to analyze driver response to crash avoidance systems using OnStar data on a large scale.

With the MIDAS transportation project, we are moving forward on several fronts. Most importantly, we are developing a high-performance data access and processing system to handle the large driving datasets that UMTRI has collected (and continues to collect). This system can integrate automated video processing with analysis and/or querying of time series data (e.g., speed, acceleration, etc.). For my group, this new capability opens new doors in both data sharing and data analysis.

Q: How has the field, and specifically the use of “Big Data,” changed during your career?

Wow… My first analyses as an undergrad were computed by hand as a matter of course. In grad school, since we needed to access terminals to use MTS (the Michigan Terminal System), it was often faster to just do it by hand (using a calculator and paper and pencil). What that meant is that computation was a true limitation on how much data could be analyzed. Even after personal computers were a regular fixture in research (and at home), computation of any size (e.g., more subjects or more variables) was still slow. My dissertation included a simulation component and I used to set simulations running every night before I went to bed. The simulation would finish around 2-3 a.m., and the computer would suddenly turn on, sending bright light into the room and waking my husband and me up. Those simulations can all be done in minutes or seconds now. Imagine how much more I could have done with current systems!

In the last 5 years, I’ve gotten much more involved in analysis of driving data to understand benefits of crash avoidance systems. We are often searching these data for extracts that contain certain kinds of situations that are relevant to such systems (e.g., hard braking events related to forward collision warnings). This is one use of Big Data—to observe enough that you can be sure of finding examples of what you’re interested in. However, in the last year or two, I have been more involved in full-dataset analyses, large-scale triggered data collection, and large-scale kinematic simulations. These are all enabled by faster computing and because of that, we can find richer answers to more questions.

Q: What role does your association with MIDAS play in this continuing development?

One of the advantages of being associated with MIDAS is that it gives me access to faculty who are interested in data as opposed to transportation. It’s not that I need other topic areas, but I often find that when I listen to data and methods talks in totally different content areas, I can see how those methods could apply to problems I care about. For example, the challenge grant in the social sciences entitled “Computational Approaches for the Construction of Novel Macroeconomic Data” is tackling a problem for economists that transportation researchers share. That is, how do you help researchers who may not have deep technical skill in data querying get at extracts from very large, complex datasets easily? Advances that they make in that project could be transferred to transportation datasets, which share many of the characteristics of social-media datasets.

Another advantage is the access to students who are in data-oriented programs (e.g., statistics, data science, computer science, information, etc.) who want real-world experiences using their skills. I have had a number of data science students reach out to me and have worked closely with two of them on independent studies where we worked together on Big Data analysis problems in transportation. One was related to measuring and tracking vehicle safety in automated vehicles and the other was in text mining of a complaints dataset to try to find failure patterns that might indicate the presence of a vehicle defect.

Q: What are the next research challenges for transportation generally and your work specifically?

Transportation is in the midst of a once-in-a-lifetime transformation. In my lifetime, I expect to be driven to the senior center and my gerontologist appointments by what amounts to a robot (on wheels). Right now, I’m working on research problems that will help bring that to fruition. Data science is absolutely at the core of that transformation and the problems are wide open. I’m particularly concerned that our research datasets need to be managed in a way that opens them to a broad audience where we support both less technical interactions (e.g., data extracts with a very simple querying system) and much more technical interactions (e.g., full-dataset automatic video processing to identify drivers using cell phones in face video). I’m also interested in new and efficient large-scale data collection methods, particularly those based on the idea of “smart” or triggered sampling rather than just acquisition of huge datasets “because the information I want is somewhere in there…if only I can find it…”

My own work has mostly been about safety, and although we envision a future without traffic fatalities (or ideally without crashes), the current fatality count has gone up over the past couple of years. Thus, I spend time analyzing crash data for new insights into changes in safety over time and even how the economy influences fatality counts. Much of my work is for the State of Michigan around predicting how many fatalities there will be in the next few years and identifying the benefits and potential benefits of various interventions. My web tool, UTMOST (the Unified Theory Mapping Opportunities for Safety Technologies), allows visualization of the benefits, or potential benefits, of many different kinds of interventions, including public health policy, vehicle technology, and infrastructure improvements. I think this kind of integrated approach will be necessary to achieve the goal of zero fatalities over the next 20 years. Finally, a significant part of my research program will continue to be development and use of statistical methods to measure, predict, and understand safety in transportation. How can we tell if AVs are safer than humans? What should future traffic safety data systems look like? How can we integrate data systems (e.g., crash and injury outcome) to better understand safety and figure out how to prioritize countermeasure development? How can machine learning and other big data statistical tools help us make the most of driving data?