As a faculty member within the University of Michigan Transportation Research Institute, Dr. Flannagan currently serves as Director of the Center for Management of Information for Safe and Sustainable Transportation (CMISST) and Head of the Statistics and Methods Group for the CDC-funded UM Injury Center. Dr. Flannagan has over 20 years of experience conducting data analysis and research on injury risk related to motor vehicle crashes and was responsible for the development of a model of injury outcome that allows side-by-side comparison of public health, vehicle, roadway and post-crash interventions (utmost.umtri.umich.edu). She has also applied statistical methods to understanding and evaluating benefits of crash-avoidance technologies, including evaluating safety in automated vehicles, and works to develop novel applications of statistics to analysis of driving data. Her current work with CMISST involves the fusion and analysis of large state-level crash databases, which are useful in analyzing the effect of a variety of countermeasures on crash involvement and injury risk. In addition, her group is working to make data available to researchers to expand the community of experts in transportation data analysis.
Q & A with Dr. Carol Flannagan
Q: When in your career did you realize that data (broadly speaking) could open a lot of doors, in terms of research?
I have loved data analysis since my undergraduate Experimental Design course sophomore year, when I learned about analysis of 2x2 tables. Beyond that, I didn’t necessarily see it as a career move or a research program. It was much later at UMTRI, when I started thinking about how much data we had scattered around the building and how great it would be to round up those data and share them, that I thought explicitly about data per se as opening research doors.
In 2010, I was asked to head up the Transportation Data Center at UMTRI. In spite of its name, the group was doing very limited things with crash data at the time. After a few years, the group got a new name and some support to grow from UMOR. That, along with a number of research projects over the years, has led to our current incarnation with substantially upgraded data hosting for the state of Michigan’s crash data and strong capabilities in data analysis of all kinds of newer transportation data. For example, we have collaborated with the Engineering Systems group, GM and OnStar to conduct a series of studies funded by NHTSA to analyze driver response to crash avoidance systems using OnStar data on a large scale.
With the MIDAS transportation project, we are moving forward on several fronts. Most importantly, we are developing a high-performance data access and processing system to handle the large driving datasets that UMTRI has collected (and continues to collect). This system can integrate automated video processing with analysis and/or querying of time series data (e.g., speed, acceleration, etc.). For my group, this new capability opens new doors in both data sharing and data analysis.
Q: How has the field, and specifically the use of “Big Data,” changed during your career?
Wow… My first analyses as an undergrad were computed by hand as a matter of course. In grad school, since we needed to access terminals to use MTS (the Michigan Terminal System), it was often faster to just do it by hand (using a calculator and paper and pencil). What that meant is that computation was a true limitation on how much data could be analyzed. Even after personal computers were a regular fixture in research (and at home), computation of any size (e.g., more subjects or more variables) was still slow. My dissertation included a simulation component and I used to set simulations running every night before I went to bed. The simulation would finish around 2-3 a.m., and the computer would suddenly turn on, sending bright light into the room and waking my husband and me up. Those simulations can all be done in minutes or seconds now. Imagine how much more I could have done with current systems!
In the last 5 years, I’ve gotten much more involved in analysis of driving data to understand benefits of crash avoidance systems. We are often searching these data for extracts that contain certain kinds of situations that are relevant to such systems (e.g., hard braking events related to forward collision warnings). This is one use of Big Data—to observe enough that you can be sure of finding examples of what you’re interested in. However, in the last year or two, I have been more involved in full-dataset analyses, large-scale triggered data collection, and large-scale kinematic simulations. These are all enabled by faster computing and because of that, we can find richer answers to more questions.
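As a rough illustration of the kind of search described above, the sketch below scans a speed time series for hard braking events. It is only a minimal example under stated assumptions: the function name, the fixed sampling interval, and the -4.0 m/s² threshold are hypothetical choices made for illustration, not values or methods taken from UMTRI's systems.

```python
from typing import List, Sequence, Tuple


def find_hard_braking_events(
    speed_mps: Sequence[float],
    dt_s: float = 0.1,                   # assumed fixed sampling interval (seconds)
    decel_threshold_mps2: float = -4.0,  # hypothetical "hard braking" cutoff
) -> List[Tuple[int, int]]:
    """Return (start, end) sample-index pairs for each run of consecutive
    samples whose finite-difference acceleration is at or below the threshold."""
    if dt_s <= 0:
        raise ValueError("dt_s must be positive")

    events: List[Tuple[int, int]] = []
    start = None  # index where the current braking run began, if any
    for i in range(1, len(speed_mps)):
        # Acceleration estimated from two neighboring speed samples.
        accel = (speed_mps[i] - speed_mps[i - 1]) / dt_s
        if accel <= decel_threshold_mps2:
            if start is None:
                start = i - 1
        elif start is not None:
            # The run ended at the previous sample.
            events.append((start, i - 1))
            start = None
    if start is not None:
        # The series ended while still braking hard.
        events.append((start, len(speed_mps) - 1))
    return events


# Speed drops from 20 m/s to 17 m/s between samples 1 and 4 (5 m/s² deceleration).
speeds = [20.0, 20.0, 19.0, 18.0, 17.0, 17.0, 17.0]
print(find_hard_braking_events(speeds, dt_s=0.2))  # prints [(1, 4)]
```

In practice, the returned index ranges would be used to pull the matching extracts (for example, the surrounding video or warning signals) out of the larger dataset.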
Q: What role does your association with MIDAS play in this continuing development?
One of the advantages of being associated with MIDAS is that it gives me access to faculty who are interested in data as opposed to transportation. It’s not that I need other topic areas, but I often find that when I listen to data and methods talks in totally different content areas, I can see how those methods could apply to problems I care about. For example, the challenge grant in the social sciences entitled “Computational Approaches for the Construction of Novel Macroeconomic Data” is tackling a problem for economists that transportation researchers share. That is, how do you help researchers who may not have deep technical skill in data querying get at extracts from very large, complex datasets easily? Advances that they make in that project could be transferred to transportation datasets, which share many of the characteristics of social-media datasets.
Another advantage is the access to students who are in data-oriented programs (e.g., statistics, data science, computer science, information) who want real-world experiences using their skills. I have had a number of data science students reach out to me and have worked closely with two of them on independent studies where we worked together on Big Data analysis problems in transportation. One was related to measuring and tracking vehicle safety in automated vehicles and the other was in text mining of a complaints dataset to try to find failure patterns that might indicate the presence of a vehicle defect.
Q: What are the next research challenges for transportation generally and your work specifically?
Transportation is in the midst of a once-in-a-lifetime transformation. In my lifetime, I expect to be driven to the senior center and my gerontologist appointments by what amounts to a robot (on wheels). Right now, I’m working on research problems that will help bring that to fruition. Data science is absolutely at the core of that transformation and the problems are wide open. I’m particularly concerned that our research datasets need to be managed in a way that opens them to a broad audience where we support both less technical interactions (e.g., data extracts with a very simple querying system) and much more technical interactions (e.g., full-dataset automatic video processing to identify drivers using cell phones in face video). I’m also interested in new and efficient large-scale data collection methods, particularly those based on the idea of “smart” or triggered sampling rather than just acquisition of huge datasets “because the information I want is somewhere in there…if only I can find it…”
My own work has mostly been about safety, and although we envision a future without traffic fatalities (or ideally without crashes), the current fatality count has gone up over the past couple of years. Thus, I spend time analyzing crash data for new insights into changes in safety over time and even how the economy influences fatality counts. Much of my work is for the State of Michigan around predicting how many fatalities there will be in the next few years and identifying the benefits and potential benefits of various interventions. My web tool, UTMOST (the Unified Theory Mapping Opportunities for Safety Technologies), allows visualization of the benefits, or potential benefits, of many different kinds of interventions, including public health policy, vehicle technology, and infrastructure improvements. I think this kind of integrated approach will be necessary to achieve the goal of zero fatalities over the next 20 years. Finally, a significant part of my research program will continue to be development and use of statistical methods to measure, predict, and understand safety in transportation. How can we tell if AVs are safer than humans? What should future traffic safety data systems look like? How can we integrate data systems (e.g., crash and injury outcome) to better understand safety and figure out how to prioritize countermeasure development? How can machine learning and other big data statistical tools help us make the most of driving data?