Mingyan Liu

Prof. Liu’s research interests lie in optimal resource allocation, sequential decision theory, online and machine learning, and the performance modeling, analysis, and design of large-scale, decentralized, stochastic, and networked systems, using tools including stochastic control, optimization, game theory, and mechanism design. Her most recent research activities involve sequential learning, modeling and mining of large-scale Internet measurement data concerning cyber security, and incentive mechanisms for interdependent security games. Within this context, her research group is actively working in the following directions.

1. Cyber security incident forecasting. The goal is to predict an organization’s likelihood of having a cyber security incident in the near future using a variety of externally collected Internet measurement data, some of which capture active maliciousness (e.g., spam and phishing/malware activities) while others capture more latent factors (e.g., misconfiguration and mismanagement). While machine learning techniques have been used extensively for detection in the cyber security literature, they have rarely been used for prediction. This is the first study on the prediction of broad categories of security incidents at the organizational level. Our work to date shows that, with the right choice of feature set, highly accurate predictions can be achieved with a forecasting window of 6-12 months. Given the growing number of high-profile security incidents (Target, Home Depot, JP Morgan Chase, and Anthem, to name just a few) and the social and economic costs they inflict, this work will have a major impact on cyber security risk management.

2. Detecting propagation in temporal data and its application to identifying phishing activities. Phishing activities propagate from one network to another in a highly regular fashion, a phenomenon known as fast-flux, though how the malicious campaign chooses its destination networks remains unknown. An interesting challenge is whether community detection methods can automatically extract the networks involved in a single phishing campaign; the ability to do so would be critical to forensic analysis. While there are many results on detecting communities defined as subsets of relatively strongly connected entities, phishing activity exhibits a unique propagating property that is better captured by an epidemic model. By combining epidemic modeling and regression, we can identify this type of propagating community with reasonable accuracy; we are working on alternative methods as well.

3. Data-driven modeling of organizational and end-user security posture. We are working to build models that accurately capture the cyber security postures of end users as well as organizations, using large quantities of Internet measurement data. One domain concerns how software vendors disclose security vulnerabilities in their products, how they deploy software upgrades and patches, and, in turn, how end users install these patches; together, these elements lead to a better understanding of the overall state of vulnerability of a given machine and how it relates to user behavior. Another domain concerns the interconnectedness of today’s Internet, which implies that what we see from one network is inevitably related to others. We use this connection to gain better insight into the conditions of not just a single network viewed in isolation, but multiple networks viewed together.

A predictive analytics approach to forecasting cyber security incidents. We start from Internet-scale measurements of the security postures of network entities. We also collect security incident reports to use as labels in a supervised learning framework. The collected data then go through extensive processing and domain-specific feature extraction. The features are used to train a classifier; given the features of a new entity, the classifier predicts the likelihood of a future incident for that entity. We are also actively seeking to understand the causal relationships among different features and the security interdependence among different network entities. Lastly, risk prediction helps us design better incentive mechanisms, which is another facet of our research in this domain.
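The supervised learning pipeline described above can be sketched in a few lines. This is only an illustrative sketch, not the group's actual implementation: the two features (a spam-activity score and a misconfiguration score), the toy training values, the labels, and the choice of logistic regression are all assumptions made for the example.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(features, labels, lr=0.5, epochs=2000):
    """Fit logistic-regression weights with plain batch gradient descent."""
    n = len(features)
    n_features = len(features[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(features, labels):
            score = bias + sum(w * xi for w, xi in zip(weights, x))
            err = sigmoid(score) - y  # prediction error for this entity
            for k in range(n_features):
                grad_w[k] += err * x[k]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def incident_likelihood(weights, bias, x):
    """Predicted probability of a future incident for feature vector x."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


# Hypothetical toy data: each row is [spam_score, misconfiguration_score]
# for one organization; the label is 1 if an incident was later reported.
train_features = [
    [0.05, 0.10], [0.10, 0.05], [0.15, 0.20], [0.20, 0.10],
    [0.70, 0.80], [0.80, 0.60], [0.90, 0.75], [0.65, 0.90],
]
train_labels = [0, 0, 0, 0, 1, 1, 1, 1]

weights, bias = train_logistic(train_features, train_labels)

# Score two previously unseen (hypothetical) organizations.
low_risk = incident_likelihood(weights, bias, [0.10, 0.10])
high_risk = incident_likelihood(weights, bias, [0.85, 0.80])
print(f"low-posture-risk org:  {low_risk:.3f}")
print(f"high-posture-risk org: {high_risk:.3f}")
```

In practice the feature vectors would come from the domain-specific feature extraction step, and the classifier would be evaluated on incidents held out from the forecasting window rather than on hand-picked points.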

Omid Dehzangi

Wearable health technology is drawing significant attention for good reasons. The pervasive nature of such systems, which provide ubiquitous access to continuous personalized data, will transform the way people interact with each other and their environment. The information extracted from these systems will enable emerging applications in healthcare, wellness, emergency response, fitness monitoring, elderly care support, long-term preventive chronic care, assistive care, smart environments, sports, gaming, and entertainment. These applications create many new research opportunities and draw research from various disciplines into data science, the methodological umbrella for data collection, data management, data analysis, and data visualization.

Despite this ground-breaking potential, designing and developing wearable medical embedded systems poses a number of interesting challenges. Because wearable processing architectures have limited resources, power efficiency is required to allow unobtrusive and long-term operation of the hardware. In addition, the data-intensive nature of continuous health monitoring requires efficient signal processing and data analytics algorithms for real-time, scalable, reliable, accurate, and secure extraction of relevant information from an overwhelmingly large amount of data. Extensive research in the design, development, and assessment of these systems is therefore necessary.

Embedded Processing Platform Design

The majority of my work concentrates on designing wearable embedded processing platforms that shift the conventional paradigm from hospital-centric healthcare, with its episodic and reactive focus on diseases, to patient-centric and home-based healthcare. This alternative demands specialized design in terms of hardware design, software development, signal processing and uncertainty reduction, data analysis, predictive modeling, and information extraction. The objective is to reduce the costs and improve the effectiveness of healthcare through proactive (i.e., preventive) early monitoring, diagnosis, and treatment of diseases, as shown in Figure 1.

Figure 1: Embedded processing platform in healthcare

Stilian A. Stoev

Stilian Stoev’s research is in the area of applied probability and statistics for stochastic processes, with emphasis on extremes, heavy tails, self-similarity, and long-range dependence. His recent theoretical contributions are in the area of max-stable processes, a class of processes that has emerged as a canonical model for dependence in extremes. These contributions include the representation, characterization, ergodicity, mixing, and prediction of this class of processes. Dr. Stoev is also working on applied problems in computer network traffic monitoring, analysis, and modeling. A recent joint project focuses on developing efficient statistical methods and algorithms for the visualization and analysis of fast multi-gigabit network traffic streams, which can help unveil the structure of traffic flows and detect anomalies and cyber attacks in real time. This involves advanced low-level packet capture, efficient computation, and rapid communication of summary statistics using non-relational databases. More broadly, Dr. Stoev’s research is motivated by large-scale and data-intensive applied problems arising in the areas of:

  1. environmental, weather and climate extremes.
  2. insurance and finance.
  3. Internet traffic monitoring, modeling and prediction.

Hash-binned array of a 10+ Gbps traffic stream measured at Merit Network. Bin (i,j) corresponds to the traffic intensity, in bytes, of the data transferred from source IPs hashed to bin i to destination IPs hashed to bin j. The picture corresponds to a 10-second aggregation period. Bright horizontal lines indicate server-type communication from one bin to many, while unusual vertical lines are indicative of many-to-one attacks of the distributed denial of service (DDoS) type.
The data were obtained using the PF_RING module in zero-copy mode, which bypasses the OS kernel and processes all packets passing through the interface. These and related statistical summaries, derived via the recently developed AMON (All packet MONitoring) framework, allow for near-instantaneous visualization and automatic detection of structural changes in network traffic conditions.
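The hash-binned array in the caption can be illustrated with a small sketch. It is not the AMON implementation: the CRC32 hash, the number of bins, the helper names, and the toy flow records (which use reserved documentation IP ranges) are all assumptions made for the example.

```python
import zlib


def bin_index(ip, n_bins):
    """Map an IP address string to a bin with a stable hash (CRC32 assumed)."""
    return zlib.crc32(ip.encode("ascii")) % n_bins


def hash_binned_matrix(flows, n_bins):
    """Accumulate bytes into cell (i, j): source bin i, destination bin j."""
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for src_ip, dst_ip, n_bytes in flows:
        matrix[bin_index(src_ip, n_bins)][bin_index(dst_ip, n_bins)] += n_bytes
    return matrix


def row_fanout(matrix, i):
    """Number of destination bins reached from source bin i (horizontal line)."""
    return sum(1 for cell in matrix[i] if cell > 0)


def column_fanin(matrix, j):
    """Number of source bins sending to destination bin j (vertical line)."""
    return sum(1 for row in matrix if row[j] > 0)


# Hypothetical flow records (source IP, destination IP, bytes) for one
# aggregation period: many sources sending to a single destination.
victim = "203.0.113.10"
flows = [(f"198.51.100.{k}", victim, 1500) for k in range(1, 41)]
flows.append(("192.0.2.1", "192.0.2.2", 400))  # unrelated background flow

n_bins = 16
matrix = hash_binned_matrix(flows, n_bins)
victim_bin = bin_index(victim, n_bins)
print("victim column fan-in:", column_fanin(matrix, victim_bin))
print("bytes into victim bin:", sum(row[victim_bin] for row in matrix))
```

A column with high fan-in corresponds to the vertical lines that the caption associates with many-to-one traffic, while a row with high fan-out corresponds to the horizontal lines of one-to-many, server-type traffic. Because distinct IPs can collide in the same bin, the matrix is a compact summary rather than an exact per-host record.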

Ivo D. Dinov

Dr. Ivo Dinov directs the Statistics Online Computational Resource (SOCR), co-directs the multi-institutional Probability Distributome Project, and is an associate director for education of the Michigan Institute for Data Science (MIDAS).

Dr. Dinov is an expert in mathematical modeling, statistical analysis, computational processing and visualization of Big Data. He is involved in longitudinal morphometric studies of human development (e.g., Autism, Schizophrenia), maturation (e.g., depression, pain) and aging (e.g., Alzheimer’s and Parkinson’s diseases). Dr. Dinov is developing, validating and disseminating novel technology-enhanced pedagogical approaches for scientific education and active learning.

Analyzing big observational data on thousands of Parkinson's disease patients, based on tens of thousands of signature biomarkers derived from multi-source imaging, genetic, clinical, physiologic, phenomic, and demographic data elements, is challenging. We are developing Big Data representation strategies, implementing efficient algorithms, and introducing software tools for managing, analyzing, modeling, and visualizing large, complex, incongruent, and heterogeneous data. Such service-oriented platforms and methodological advances enable Big Data Discovery Science and present exciting opportunities for learners, educators, researchers, practitioners, and policy makers.
