I study cybercrime using data-driven methods to analyze, characterize, and measure the infrastructure and modus operandi of criminal activities on the Internet. In particular, I focus on the collection, analysis, and semantic characterization of cyber threat intelligence, which comes in many shapes and forms (e.g., natural language, network traffic, system audit logs). The ultimate goal is to gain insights that will inform decisions on building robust defenses against online criminal activities involving threats such as ransomware, exploit kits, and botnets. To achieve these goals, I find graph theory and analytics, machine learning (including deep learning), longitudinal analysis, and causal inference to be the natural methods. I also study the training and deployment of cyber threat classification/prediction systems in adversarial settings.
Current research includes a project funded by Toyota that uses Markov models and machine learning to predict heart arrhythmia, an NSF-funded project to detect Acute Respiratory Distress Syndrome (ARDS) from X-ray images, and projects using tensor analysis on health care data (funded by the Department of Defense and the National Science Foundation).
Dr. Veera Baladandayuthapani is currently a Professor in the Department of Biostatistics at the University of Michigan (UM), where he is also the Associate Director of the Center for Cancer Biostatistics. He joined UM in Fall 2018 after spending 13 years in the Department of Biostatistics at the University of Texas MD Anderson Cancer Center in Houston, Texas, where he was a Professor and Institute Faculty Scholar and held adjunct appointments at Rice University, Texas A&M University, and the UT School of Public Health. His research interests are mainly in high-dimensional data modeling and Bayesian inference, including functional data analysis, Bayesian graphical models, Bayesian semi-/non-parametric models, and Bayesian machine learning. These methods are motivated by large and complex datasets (a.k.a. Big Data) such as high-throughput genomics, epigenomics, transcriptomics, and proteomics, as well as high-resolution neuro- and cancer imaging. His work has been published in top statistical, biostatistical, bioinformatics, and biomedical/oncology journals. He has also co-authored a book on Bayesian analysis of gene expression data. He currently holds multiple PI-level grants from NIH and NSF to develop innovative biostatistical and bioinformatics methods for big datasets in oncology. He has also served as Director of the Biostatistics and Bioinformatics Cores for the Specialized Programs of Research Excellence (SPOREs) in Multiple Myeloma and Lung Cancer, and as Biostatistics and Bioinformatics platform leader for the Myeloma and Melanoma Moonshot Programs at MD Anderson. He is a Fellow of the American Statistical Association and an elected member of the International Statistical Institute. He currently serves as an Associate Editor for the Journal of the American Statistical Association, Biometrics, and Sankhya.
The future of transportation lies at the intersection of two emerging trends, namely, the sharing economy and connected and automated vehicle technology. Our research group investigates the impact of these two major trends on the future of mobility, quantifying the benefits and identifying the challenges of integrating these technologies into our current systems.
Our research on shared-use mobility systems focuses on peer-to-peer (P2P) ridesharing and multi-modal transportation. We provide: (i) operational tools and decision support systems for shared-use mobility in legacy as well as connected and automated transportation systems, with a focus on system design and on routing, scheduling, and pricing mechanisms to serve on-demand transportation requests; (ii) insights for regulators and policy makers on the mobility benefits of multi-modal transportation; and (iii) planning tools that allow for informed regulation of the sharing economy.
In another line of research, we investigate the challenges that connected and automated vehicle technology must overcome before mass adoption can occur. Our research mainly focuses on (i) transition of control authority between the human driver and the autonomous entity in semi-autonomous (SAE Level 3) vehicles; (ii) incorporating network-level information supplied by connected vehicle technology into traditional trajectory planning; (iii) improving vehicle localization by taking advantage of opportunities provided by connected vehicles; and (iv) cybersecurity challenges in connected and automated systems. We seek to quantify the mobility and safety implications of this disruptive technology and to provide insights that support informed regulation.
Dr. Lee’s research in data science addresses biological questions in systems biology and network medicine by developing algorithms and models that combine statistical/machine learning, information theory, and network theory, applied to multi-dimensional, large-scale data. His projects have covered genomics, transcriptomics, proteomics, and metabolomics, from yeast to mouse to human, for integrative analysis of regulatory networks on multiple molecular levels, incorporating large-scale public databases such as GO for functional annotation, PDB for molecular structures, and PubChem and LINCS for drugs and small compounds. He previously carried out proteomics and metabolomics along with a computational derivation of dynamic protein complexes for IL-3 activation and the cell cycle in murine pro-B cells (Lee et al., Cell Reports 2017), for which he developed integrative analytical tools using diverse approaches from machine learning and network theory. His ongoing methodological interests include machine/deep learning and topological Kolmogorov-Sinai entropy-based network theory, which he applies to (1) multi-level dynamic regulatory networks in immune response, cell cycle, and cancer metabolism and (2) mass spectrometry-based omics data analysis.
The goal of my research is to leverage network analysis techniques to uncover how the brain mediates sex hormone influences on gendered behavior across the lifespan. Specifically, my data science research concerns the creation and application of person-specific connectivity analyses, such as unified structural equation models, to time series data; these are intensive longitudinal data, including functional neuroimages, daily diaries, and observations. I then use these data science methods to investigate links between sex hormones, namely androgens (e.g., testosterone) and estradiol, at key developmental periods such as puberty, and behaviors that typically show sex differences, including aspects of cognition and psychopathology.
My research spans security, privacy, and optimization of data collection, particularly as applied to the Smart Grid, an augmented and enhanced paradigm for the conventional power grid. I am particularly interested in optimization approaches that explicitly incorporate notions of security and/or privacy into the model. At the intersection of Intelligent Transportation Systems, the Smart Grid, and Smart Cities, I am interested in data privacy and energy usage in smart parking lots. Protecting data and availability, especially under Denial-of-Service (DoS) attacks, is another dimension of my research interests. I am working on developing privacy-aware bidding applications for Smart Grid Demand Response systems that do not rely on trusted third parties. Finally, I am interested in educational and pedagogical research on teaching computer science, the Smart Grid, cyber security, and data privacy.
My research focuses on developing and applying computational and data-enabled methodology in the broader area of sustainability. Main thrusts are as follows:
- Human mobility dynamics. I am interested in mining large-scale, real-world travel trajectory data to understand human mobility dynamics. This involves processing and analyzing travel trajectory data, characterizing individual mobility patterns, and evaluating the environmental impacts of transportation systems and technologies (e.g., electric vehicles, ride-sharing) based on individual mobility dynamics.
- Global supply chains. Increasingly intensive international trade has created a highly connected global supply chain network. I am interested in understanding the structure of the global supply chain network and the economic and environmental performance of nations.
- Networked infrastructure systems. Many infrastructure systems (e.g., power grid, water supply infrastructure) are networked systems. I am interested in understanding the basic structural features of these systems and how they relate to the system-level properties (e.g., stability, resilience, sustainability).
A network visualization (force-directed graph) of the 2012 US economy based on the industry-by-industry Input-Output Table (15 sectors) provided by the Bureau of Economic Analysis (BEA). Each node represents a sector, and node size represents the sector's economic output. The width and darkness of each link represent the value of goods and services exchanged between sectors. An interactive version and other data visualizations are available at http://mingxugroup.org/
The GEMS (Graph Exploration and Mining at Scale) Lab develops new, fast, and principled methods for mining and making sense of large-scale data. Within data mining, we focus particularly on interconnected or graph data, which are ubiquitous. Examples include social networks, brain graphs or connectomes, traffic networks, computer networks, phone-call and email communication networks, and more. We leverage ideas from a diverse set of fields, including matrix algebra, graph theory, information theory, machine learning, optimization, statistics, databases, and social science.
At a high level, we enable single-source and multi-source data analysis by providing scalable methods for fusing data sources, relating and comparing them, and summarizing patterns in them. Our work has applications to exploration of scientific data (e.g., connectomics or brain graph analysis), anomaly detection, re-identification, and more. Some of our current research directions include:
*Scalable Network Discovery from non-Network Data*: Although graphs are ubiquitous, they are not always directly observed. Discovering and analyzing networks from non-network data is a task with applications in fields as diverse as neuroscience, genomics, energy, economics, and more. However, traditional network discovery approaches are computationally expensive. We are currently investigating network discovery methods (especially from time series) that are both fast and accurate.
*Graph Similarity and Alignment with Representation Learning*: Graph similarity and alignment (or fusion) are core components of many data mining tasks, such as anomaly detection, classification, clustering, transfer learning, sense-making, de-identification, and more. We are exploring representation learning methods that generalize across networks and can be used in such multi-source network settings.
*Scalable Graph Summarization and Interactive Analytics*: Recent advances in computing resources have made processing enormous amounts of data possible, but the human ability to quickly identify patterns in such data has not scaled accordingly. Thus, computational methods for condensing and simplifying data are becoming an important part of the data-driven decision making process. We are investigating ways of summarizing data in a domain-specific way, as well as leveraging such methods to support interactive visual analytics.
*Distributed Graph Methods*: Many mining tasks for large-scale graphs involve solving iterative equations efficiently. For example, classifying entities in a network with limited supervision, finding similar nodes, and evaluating the importance of a node in a graph can all be expressed as linear systems that are solved iteratively. As the volume of generated data grows, the need for faster methods has permeated all of these applications, and many more. Our focus is on speeding up such methods for large-scale graphs in both sequential and distributed environments.
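To make the idea of "a linear system solved iteratively" concrete, here is a minimal sketch of one familiar instance of node-importance scoring: PageRank computed by Jacobi-style fixed-point iteration. PageRank is chosen here purely as an illustration and is not presented as the lab's own method; the function name `pagerank_iterative`, the damping factor 0.85, the tolerance, and the uniform handling of dangling nodes are all assumptions for this example. Each sweep reads only the previous score vector, which is the property that makes such updates straightforward to parallelize or distribute across edges.

```python
def pagerank_iterative(edges, n, damping=0.85, tol=1e-10, max_iter=1000):
    """Hypothetical helper: PageRank scores for nodes 0..n-1 via Jacobi iteration.

    Solves x = (1 - d)/n + d * (M x + dangling_mass / n), where M is the
    column-stochastic transition matrix implied by the directed `edges`.
    """
    out_deg = [0] * n
    for u, _ in edges:
        out_deg[u] += 1

    x = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(max_iter):
        # Nodes with no out-links spread their score uniformly over all nodes.
        dangling = sum(x[u] for u in range(n) if out_deg[u] == 0)
        base = (1.0 - damping) / n + damping * dangling / n
        new_x = [base] * n
        # Each node passes an equal share of its current score along each out-link.
        for u, v in edges:
            new_x[v] += damping * x[u] / out_deg[u]
        # Stop once the L1 change between sweeps falls below the tolerance.
        delta = sum(abs(a - b) for a, b in zip(new_x, x))
        x = new_x
        if delta < tol:
            break
    return x


if __name__ == "__main__":
    # Tiny directed graph: 0->1, 1->2, 2->0, 2->1
    scores = pagerank_iterative([(0, 1), (1, 2), (2, 0), (2, 1)], n=3)
    for node, s in enumerate(scores):
        print(f"node {node}: {s:.4f}")
```

Because the damping factor is below 1, each sweep contracts the error, so the scores converge regardless of the starting vector; distributed variants typically partition the edge loop across workers and combine the partial sums.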
*User Modeling*: The large amount of online user information (e.g., in social networks, online marketplaces, and streaming music and video services) has made it possible to analyze user behavior over time at a very large scale. Analyzing user behavior can lead to a better understanding of user needs, better recommendations by service providers that improve customer retention and user satisfaction, and detection of outlying behaviors and events (e.g., malicious actions or significant life events). Our current focus is on understanding career changes and predicting job transitions.
Elizaveta (Liza) Levina and her group work on various questions arising in the statistical analysis of large and complex data, especially networks and graphs. The group's current focus is on developing rigorous and computationally efficient statistical inference for realistic network models. Current directions include community detection in networks (overlapping communities, networks with additional information about the nodes and edges, and estimating the number of communities), link prediction (networks with missing or noisy links, and networks evolving over time), prediction with data connected by a network (e.g., the role of friendship networks in the spread of risky behaviors among teenagers), and statistical analysis of samples of networks, with applications to brain imaging (especially fMRI data from studies of mental health).