Prof. Liu’s research interests lie in optimal resource allocation, sequential decision theory, online and machine learning, and the performance modeling, analysis, and design of large-scale, decentralized, stochastic, and networked systems, using tools including stochastic control, optimization, game theory, and mechanism design. Her most recent research activities involve sequential learning, modeling and mining of large-scale Internet measurement data concerning cyber security, and incentive mechanisms for interdependent security games. Within this context, her research group is actively working on the following directions.
1. Cyber security incident forecasting. The goal is to predict an organization’s likelihood of experiencing a cyber security incident in the near future using a variety of externally collected Internet measurement data, some of which capture active maliciousness (e.g., spam and phishing/malware activities) while others capture more latent factors (e.g., misconfiguration and mismanagement). While machine learning techniques have been used extensively for detection in the cyber security literature, they have rarely been used for prediction. This is the first study on predicting broad categories of security incidents at the organizational level. Our work to date shows that, with the right choice of feature set, highly accurate predictions can be achieved with a forecasting window of 6-12 months. Given the increasing number of high-profile security incidents (Target, Home Depot, JP Morgan Chase, and Anthem, to name a few) and the social and economic costs they inflict, this work will have a major impact on cyber security risk management.
2. Detecting propagation in temporal data and its application to identifying phishing activities. Phishing activities propagate from one network to another in a highly regular fashion, a phenomenon known as fast-flux, though how the malicious campaign chooses its destination networks remains unknown. An interesting question is whether community detection methods can automatically extract the networks involved in a single phishing campaign; the ability to do so would be critical to forensic analysis. While there are many results on detecting communities defined as subsets of relatively strongly connected entities, phishing activity exhibits a distinctive propagating property that is better captured by an epidemic model. By combining epidemic modeling and regression, we can identify this type of propagating community with reasonable accuracy; we are working on alternative methods as well.
3. Data-driven modeling of organizational and end-user security posture. We are building models that accurately capture the cyber security postures of end users as well as organizations, using large quantities of Internet measurement data. One domain concerns how software vendors disclose security vulnerabilities in their products, how they deploy software upgrades and patches, and, in turn, how end users install these patches; taken together, these elements yield a better understanding of the overall vulnerability of a given machine and how it relates to user behavior. Another domain concerns the interconnectedness of today’s Internet, which implies that what we observe from one network is inevitably related to others. We use this connection to gain better insight into the conditions not just of a single network viewed in isolation, but of multiple networks viewed together.
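The propagation idea in item 2 can be illustrated with a toy sketch. It assumes a binary activity matrix (rows are time steps, columns are networks, 1 meaning a network is observed hosting phishing content) and swaps in a plain lagged least-squares regression as a simplified stand-in for an epidemic model: whether a network is "infected" at step t+1 is explained by which networks were active at step t. Networks joined by strong lagged links are grouped into one propagating community. The function names, the 0.5 threshold, and the synthetic data are hypothetical; this is not the group's method.

```python
import numpy as np

def lagged_influence(activity):
    """Least-squares estimate of W[i, j]: how strongly activity in network i
    at time t predicts activity in network j at time t + 1."""
    T = activity.shape[0]
    past = np.hstack([np.ones((T - 1, 1)), activity[:-1]])  # intercept + lagged activity
    future = activity[1:]
    coef, *_ = np.linalg.lstsq(past, future, rcond=None)
    return coef[1:].copy()  # drop the intercept row -> shape (n, n)

def propagating_community(activity, threshold=0.5):
    """Networks joined by at least one strong lagged (source -> target) link."""
    W = lagged_influence(activity)
    np.fill_diagonal(W, 0.0)  # ignore a network's persistence with itself
    src, dst = np.nonzero(W > threshold)
    return set(src.tolist()) | set(dst.tolist())

# Hypothetical data: 1 = network observed hosting phishing content at step t.
rng = np.random.default_rng(0)
T = 500
origin = (rng.random(T) < 0.3).astype(float)     # network 0: where the campaign starts
unrelated = (rng.random(T) < 0.3).astype(float)  # network 3: independent activity
X = np.zeros((T, 4))
X[:, 0] = origin
X[1:, 1] = origin[:-1]  # network 1 follows network 0 one step later
X[2:, 2] = origin[:-2]  # network 2 follows network 1 one step later
X[:, 3] = unrelated
print(propagating_community(X))
```

On this synthetic data, networks 0, 1, and 2 form a chain and should be grouped together, while network 3 should be left out. Real measurement data would call for an actual epidemic model and some form of significance testing rather than a fixed threshold.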
My research spans security, privacy, and optimization of data collection, particularly as applied to the Smart Grid, an augmented and enhanced paradigm for the conventional power grid. I am particularly interested in optimization approaches that explicitly incorporate a notion of security and/or privacy into the model. At the intersection of Intelligent Transportation Systems, the Smart Grid, and Smart Cities, I am interested in data privacy and energy usage in smart parking lots. Protecting data and preserving availability, especially against Denial-of-Service attacks, is another dimension of my research. I am also developing privacy-aware bidding applications for Smart Grid Demand Response systems that do not rely on trusted third parties. Finally, I am interested in educational and pedagogical research on teaching computer science, the Smart Grid, cyber security, and data privacy.
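To make "without relying on trusted third parties" concrete, the sketch below uses additive secret sharing, a standard technique swapped in for illustration rather than the author's protocol. Each participant splits its bid into random shares modulo a public prime and sends one share to every participant; each participant publishes only the sum of the shares it received, and those sums add up to the total bid. It assumes honest-but-curious participants, private pairwise channels, and integer bids; `PRIME`, the function names, and the bid values are hypothetical.

```python
import secrets

PRIME = 2**61 - 1  # public modulus; assumed much larger than any possible total bid

def make_shares(value, n_parties):
    """Split a non-negative integer bid into n additive shares modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares

def aggregate_bids(bids):
    """Total of all bids, computed without any participant seeing another's bid."""
    n = len(bids)
    # Row i holds the shares participant i sends, one to each participant j.
    sent = [make_shares(bid, n) for bid in bids]
    # Participant j adds up the n shares it received; each share alone looks uniformly random.
    partial_sums = [sum(sent[i][j] for i in range(n)) % PRIME for j in range(n)]
    # Only the partial sums are published; together they reveal just the total.
    return sum(partial_sums) % PRIME

# Hypothetical demand-response bids (e.g., kW of curtailment offered by four households).
print(aggregate_bids([120, 75, 300, 5]))  # 500
```

The aggregator learns only the total, which is what a demand-response market needs to clear; colluding participants or malicious share tampering would require stronger protocols than this sketch.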
My research group develops models and algorithms for large-scale inverse problems, especially image reconstruction for X-ray CT and MRI. The models include those based on sparsity using dictionaries learned from large-scale data sets. Developing efficient and accurate methods for dictionary learning is a recent focus.
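As a rough illustration of dictionary-based sparse modeling, the sketch below alternates ISTA sparse coding (an L1-regularized least-squares step) with a MOD-style least-squares dictionary update. Both are textbook choices swapped in for illustration, not the group's algorithms. The sketch also omits the CT/MRI forward operator and works directly on hypothetical vectorized image patches; `lam`, `n_atoms`, and the iteration counts are arbitrary.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (elementwise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def sparse_code(D, Y, lam, n_iter=200):
    """ISTA for min_X 0.5 * ||D X - Y||_F^2 + lam * ||X||_1 (one code per column of Y)."""
    L = max(np.linalg.norm(D, 2) ** 2, 1e-12)  # Lipschitz constant of the smooth term's gradient
    X = np.zeros((D.shape[1], Y.shape[1]))
    for _ in range(n_iter):
        grad = D.T @ (D @ X - Y)
        X = soft_threshold(X - grad / L, lam / L)
    return X

def update_dictionary(Y, X, eps=1e-12):
    """MOD-style step: least-squares D for fixed codes, then rescale atoms to unit norm."""
    D = Y @ np.linalg.pinv(X)
    return D / np.maximum(np.linalg.norm(D, axis=0), eps)

def learn_dictionary(Y, n_atoms, lam=0.1, n_outer=20, seed=0):
    """Alternate sparse coding and dictionary updates on training patches Y (d x N)."""
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((Y.shape[0], n_atoms))
    D /= np.linalg.norm(D, axis=0)
    for _ in range(n_outer):
        X = sparse_code(D, Y, lam)
        D = update_dictionary(Y, X)
    return D, sparse_code(D, Y, lam)

# Hypothetical training set: 500 vectorized 4x4 patches (random here, for illustration only).
patches = np.random.default_rng(1).standard_normal((16, 500))
D, codes = learn_dictionary(patches, n_atoms=32, n_outer=5)
print(D.shape, codes.shape)  # (16, 32) (32, 500)
```

With an orthonormal dictionary, the ISTA step reduces to soft-thresholding the coefficients, which makes a convenient sanity check. In an actual reconstruction, a learned dictionary would typically act as a sparsity prior inside an iterative solver that also enforces consistency with the measured data.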
My research focuses on data management problems that arise from extreme diversity in large data collections. Big data is big not just in terms of bytes, but also in type (e.g., a single hard disk likely contains relations, text, images, and spreadsheets) and structure (e.g., a large corpus of relational databases may have millions of unique schemas). As a result, certain long-held assumptions (e.g., that the database schema is always known before a query is written) are no longer useful guides for building data management systems. My work therefore focuses heavily on information extraction and data mining methods that can either improve the quality of existing information or work in spite of lower-quality information.
We develop the scientific foundations and associated algorithmic tools for compactly representing and analyzing heterogeneous data streams from sensor/information-rich networked dynamical systems. We take a unified dynamics-based and data-driven approach to the design of passive and active monitors for anomaly detection in such systems. Dynamical models naturally capture temporal (i.e., causal) relations within data streams. Moreover, hybrid and networked dynamical models can capture, respectively, logical relations and interactions between different data sources. We study structural properties of networks and dynamics to understand the fundamental limitations of anomaly detection from data. By recasting the information extraction problem as a networked hybrid system identification problem, we bring to bear tools from computer science, systems and control theory, and convex optimization to efficiently and rigorously analyze and organize information. Applications include diagnostics and anomaly and change detection in critical infrastructure such as building management systems and transportation and energy networks.
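A minimal version of the dynamics-based monitoring idea: fit a linear autoregressive model to anomaly-free data, then flag time steps whose one-step prediction error exceeds a multiple of the nominal residual spread. This single linear model is a much simpler stand-in for the hybrid and networked system identification described above; the AR order, the threshold `k`, and the simulated second-order system are assumptions for illustration.

```python
import numpy as np

def fit_ar(x, order):
    """Least-squares AR model x[t] ~ a[0]*x[t-1] + ... + a[order-1]*x[t-order]."""
    A = np.array([x[t - order:t][::-1] for t in range(order, len(x))])
    b = x[order:]
    a, *_ = np.linalg.lstsq(A, b, rcond=None)
    return a, np.std(b - A @ a)  # coefficients and nominal residual scale

def one_step_residuals(x, a):
    """Prediction error of the AR model at every step where it can be evaluated."""
    order = len(a)
    return np.array([x[t] - a @ x[t - order:t][::-1] for t in range(order, len(x))])

def detect_anomalies(x, a, sigma, k=5.0):
    """Time indices whose one-step prediction error exceeds k nominal standard deviations."""
    r = one_step_residuals(x, a)
    return np.nonzero(np.abs(r) > k * sigma)[0] + len(a)  # shift back to time indices

def simulate(noise):
    """Hypothetical nominal dynamics: a stable second-order linear system."""
    x = np.zeros(len(noise))
    for t in range(2, len(x)):
        x[t] = 1.5 * x[t - 1] - 0.7 * x[t - 2] + noise[t]
    return x

rng = np.random.default_rng(42)
train = simulate(rng.normal(0.0, 0.1, 1000))  # anomaly-free data
a, sigma = fit_ar(train, order=2)

test = simulate(rng.normal(0.0, 0.1, 300))
test[150] += 2.0  # injected sensor fault
print(a, detect_anomalies(test, a, sigma))
```

The injected fault at t = 150 should be flagged, along with the next two steps, whose predictions use the corrupted sample. A hybrid model would additionally switch among several such modes, and a networked model would couple the predictors of different data sources.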
In next-generation power systems (the Smart Grid), a large number of distributed energy devices (e.g., distributed generators, distributed energy storage, loads, smart meters) are connected to each other in an Internet-like structure. Incorporating millions of new energy devices will require a wide-ranging transformation of the nation’s aging electrical grid infrastructure. The key challenge is to efficiently manage this large number of devices through distributed intelligence. The distributed grid intelligence (DGI) agent is the brain of a distributed energy device. DGI enables every energy device not only to manage itself optimally using local intelligence, but also to coordinate with others to achieve a common goal. The massive volume of real-time data collected by DGI agents will help grid operators better understand large-scale, highly dynamic power systems. In conventional power systems, operation relies on purely centralized data storage and processing. However, as the number of DGIs grows into the hundreds of thousands and beyond, the state-of-the-art centralized information processing architecture will no longer be sustainable under such a data explosion. The ongoing research work illustrates how advanced ideas from the IT and power industries can be combined in a unique way. The proposed high-availability distributed file system and data processing framework can be easily tailored to support other data-intensive applications in large-scale, complex power grids. For example, the proposed DGI nodes can be embedded into any distributed generator (e.g., a rooftop PV panel), distributed energy storage device (e.g., an electric vehicle), or load (e.g., a smart home) in a future residential distribution system.
If implemented successfully, this work can transform the Smart Grid, with its high-volume, high-velocity, and high-variety data, into a completely distributed cyber-physical system architecture. In addition, the proposed work can be easily extended to support other cyber-physical system applications (e.g., intelligent transportation systems).
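One way DGI nodes could coordinate toward a common goal without a central data store is distributed averaging consensus, sketched below as an illustrative stand-in rather than the proposed DGI framework. Each node holds a local measurement and repeatedly moves toward its neighbors' values, exchanging data only with those neighbors; on a connected undirected graph with a step size below 1/(maximum degree), every node converges to the network-wide average. The ring topology, load values, and step size are hypothetical.

```python
import numpy as np

def consensus_average(values, neighbors, step=0.3, n_iter=200):
    """Each node repeatedly moves toward its neighbors' values using only local exchanges.
    On a connected undirected graph with step < 1 / max_degree, all nodes converge to the mean."""
    x = np.asarray(values, dtype=float)
    for _ in range(n_iter):
        # Synchronous update: every node uses its neighbors' values from the previous round.
        x = x + step * np.array([sum(x[j] - x[i] for j in neighbors[i]) for i in range(len(x))])
    return x

# Hypothetical setup: five DGI nodes on a ring, each knowing only its own load (kW).
ring = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}
local_loads = [1.0, 2.0, 3.0, 4.0, 5.0]
print(consensus_average(local_loads, ring))  # every entry approaches 3.0
```

No node ever sees the full data set, yet each ends up with the system-wide average, which is the kind of quantity a distributed controller could then act on locally.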
The Corso group’s main research thrust is high-level computer vision and its relationship to human language, robotics, and data science. The group primarily focuses on problems in video understanding such as video segmentation, activity recognition, and video-to-text. Methodologically, it emphasizes models that leverage cross-modal cues to learn structured embeddings from large-scale data sources, as well as graphical models for structured prediction over such data. From biomedicine to recreational video, imaging data is ubiquitous. Yet imaging scientists and intelligence analysts lack an adequate language and set of tools to fully tap information-rich images and video. His group works to provide such a language. His long-term goal is a comprehensive and robust methodology for automatically mining, quantifying, and generalizing information in large sets of projective and volumetric images and video, to enable intelligent computational and robotic agents that can naturally interact with humans and within the natural world.