Prof. Liu’s research interest lies in optimal resource allocation, sequential decision theory, online and machine learning, performance modeling, analysis, and design of large-scale, decentralized, stochastic and networked systems, using tools including stochastic control, optimization, game theory and mechanism design. Her most recent research activities involve sequential learning, modeling and mining of large scale Internet measurement data concerning cyber security, and incentive mechanisms for inter-dependent security games. Within this context, her research group is actively working on the following directions.
1. Cyber security incident forecast. The goal is to predict an organization’s likelihood of having a cyber security incident in the near future using a variety of externally collected Internet measurement data, some of which capture active maliciousness (e.g., spam and phishing/malware activities) while others capture more latent factors (e.g., misconfiguration and mismanagement). While machine learning techniques have been extensively used for detection in the cyber security literature, they have rarely been used for prediction. This is the first study to predict broad categories of security incidents at the organizational level. Our work to date shows that, with the right choice of feature set, highly accurate predictions can be achieved with a forecasting window of 6-12 months. Given the increasing number of high-profile security incidents (Target, Home Depot, JP Morgan Chase, and Anthem, to name a few) and the social and economic costs they inflict, this work will have a major impact on cyber security risk management.
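To make the feature-based prediction idea concrete, here is a minimal sketch of a logistic risk score computed over externally observed signals. The signal names, weights, and bias below are invented for illustration; they are not the group's published model or feature set.

```python
import math

def incident_risk_score(features, weights, bias=-2.0):
    """Toy logistic risk score over externally observed signals.

    `features` and `weights` are dicts keyed by signal name
    (e.g., counts of spam blacklistings or misconfiguration
    flags); the names and weights used here are illustrative,
    not the published model.
    """
    z = bias + sum(weights[k] * v for k, v in features.items())
    return 1.0 / (1.0 + math.exp(-z))  # probability-like score in (0, 1)

# Hypothetical organization profiles.
weights = {"spam_listings": 0.8, "phishing_hits": 1.2, "untrusted_certs": 0.5}
clean = {"spam_listings": 0.0, "phishing_hits": 0.0, "untrusted_certs": 0.0}
noisy = {"spam_listings": 2.0, "phishing_hits": 1.5, "untrusted_certs": 3.0}

low_risk = incident_risk_score(clean, weights)
high_risk = incident_risk_score(noisy, weights)
```

In practice the weights would be learned from labeled incident data rather than set by hand; the sketch only shows how heterogeneous measurement signals can be folded into a single forecast score.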
2. Detecting propagation in temporal data, with application to identifying phishing activities. Phishing activities propagate from one network to another in a highly regular fashion, a phenomenon known as fast-flux, though how the destination networks are chosen by the malicious campaign remains unknown. An interesting challenge is whether community detection methods can automatically extract the networks involved in a single phishing campaign; the ability to do so would be critical for forensic analysis. While there have been many results on detecting communities defined as subsets of relatively strongly connected entities, phishing activity exhibits a unique propagating property that is better captured by an epidemic model. By combining epidemic modeling with regression, we can identify this type of propagating community with reasonable accuracy; we are working on alternative methods as well.
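The epidemic view of a campaign can be sketched with a toy susceptible-infected spread over a graph of networks; the graph, seed set, and deterministic update rule below are stand-ins for illustration, not the group's actual model.

```python
def si_spread(adj, seeds, steps):
    """Deterministic susceptible-infected spread over a network graph.

    adj maps each network (node) to the set of networks it can
    reach next; seeds are the networks where the phishing campaign
    starts. Returns the infected set after each step. A toy
    stand-in for the epidemic component of a propagating-community
    model.
    """
    infected = set(seeds)
    history = [set(infected)]
    for _ in range(steps):
        newly = {v for u in infected for v in adj.get(u, ())
                 if v not in infected}
        if not newly:
            break  # propagation has died out
        infected |= newly
        history.append(set(infected))
    return history

# A small hypothetical chain of networks: the campaign moves A -> B -> C.
adj = {"A": {"B"}, "B": {"C"}, "C": set()}
history = si_spread(adj, seeds={"A"}, steps=5)
```

Detecting the propagating community then amounts to recovering which nodes appear in this infection history from observed activity timestamps, which is where the regression component comes in.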
3. Data-driven modeling of organizational and end-user security posture. We are working to build models that accurately capture the cyber security postures of end users as well as organizations, using large quantities of Internet measurement data. One domain concerns how software vendors disclose security vulnerabilities in their products, how they deploy software upgrades and patches, and, in turn, how end users install these patches; combined, these elements lead to a better understanding of the overall state of vulnerability of a given machine and how that relates to user behavior. Another domain concerns the interconnectedness of today’s Internet, which implies that what we see from one network is inevitably related to others. We use this connection to gain better insight into the conditions of not just a single network viewed in isolation, but multiple networks viewed together.
My research focuses on technology and innovation in health care with an emphasis on information technology (IT), pharmaceuticals, and empirical methods. Many of my studies explored the effect of electronic health record (EHR) systems on health care quality and productivity. While the short-run gains from health IT adoption may be modest, these technologies form the foundation for a health information infrastructure. We are just beginning to understand how to harness and apply medical information. This problem is complicated by the sheer complexity of medical care, the heterogeneity across patients, and the importance of treatment selection. My current work draws on methods from both machine learning and econometrics to address these issues. Current pharmaceutical studies examine the roles of consumer heterogeneity and learning about the value of products as well as the effect of direct-to-consumer advertising on health.
My research interests focus on improving how we care for our patients by developing analytics tools that automate the provision of quantitative and statistical measures to augment qualitative and anecdotal evaluation. This requires both technical efforts, to create databases and software, and clinical efforts, to integrate data aggregation, analysis, and use into routine processes. One facet of this work is the construction of knowledge-based clinical practice improvement databases, along with the standardized nomenclatures and ontologies needed to automate aggregation for all patients in a practice and to enable data exchange within and among institutions. A recent example is the design, implementation, and use of an electronic prescription database to improve per-patient treatment plan evaluation and to enable longitudinal monitoring of practice quality improvement efforts. We are also leading a group, sponsored by our professional societies, to define national naming standards for data exchange in clinical trials. Another facet is the improvement of patient treatment plan evaluation. Traditionally, plans are evaluated by qualitative, visual inspection of spatial dose relationships to target and normal tissues. Algorithms that calculate vectorized dose-volume histograms and other vector-based spatial-dose objects provide a means to quantify those evaluations. Recently, databases of dose information have enabled the construction of statistical metrics that improve treatment plan evaluation and the development of models quantifying relationships to outcomes.
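The core of a cumulative dose-volume histogram is simple to state: for each dose level, the fraction of a structure's volume receiving at least that dose. A minimal sketch, with invented voxel doses for illustration:

```python
def cumulative_dvh(voxel_doses, dose_bins):
    """Cumulative dose-volume histogram for one structure.

    voxel_doses: dose (in Gy) received by each voxel of the
    structure; dose_bins: dose levels at which to evaluate the
    curve. Returns, for each level, the fraction of the structure
    volume receiving at least that dose. A schematic version of
    the quantitative plan-evaluation objects described above.
    """
    n = len(voxel_doses)
    return [sum(1 for d in voxel_doses if d >= b) / n for b in dose_bins]

# Hypothetical voxel doses for a small target volume.
doses = [60.0, 58.0, 61.0, 30.0, 59.5]
dvh = cumulative_dvh(doses, dose_bins=[0.0, 30.0, 58.0, 60.0])
```

The resulting curve is non-increasing in dose by construction, which is what makes DVH-derived metrics comparable across plans and suitable for aggregation in a practice-wide database.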
Data science applications: data-driven clinical practice improvement, multi-institutional analysis of factors affecting patient outcomes and practice characterization, and nomenclature and ontology development.
Dr. Schnell works at the interface between biophysical chemistry, mathematical and computational biology, and pathophysiology. As an independent scientist, his primary research interest is to use mathematical, computational and statistical methods to design or select optimal procedures and experiments, and to provide maximum information by analyzing biochemical data. His laboratory deals with the following topics:
(i) Development and implementation of mathematical, computational, and statistical methods to identify and characterize reaction mechanisms.
(ii) Investigation and performance testing of experimental designs and standards to quantify, interpret, and analyze biochemical data.
(iii) Development of new algorithms and software to analyze biochemical data.
The key objective of my research is to create suitable standards and appropriate support of standards leading to reproducible results in the biochemical sciences. Reproducibility is central to scientific credibility. Meta-research has repeatedly shown that accurate reporting and sound peer-review do not by themselves guarantee the reproducibility of scientific results. One of the leading causes of poor reproducibility is limited research efforts in quantitative biology and chemometrics. In my laboratory, we are developing new ways to assess the reproducibility of quantitative findings in the biochemical sciences.
As a team scientist, Dr. Schnell’s research interest is to investigate complex biomedical systems comprising many interacting components, where modeling and theory may aid in identifying the key mechanisms underlying the behavior of the system as a whole. His collaborators are primarily basic scientists who focus on the identification of molecular, biochemical, or developmental mechanisms associated with diseases, and Dr. Schnell’s expertise plays a central role in identifying these mechanisms. Using mathematical and computational models, Dr. Schnell can formulate several hypothetical mechanisms in parallel and compare them against experimental data independent of the data used to construct the models. The resulting comparisons are then independent across models, and any model that satisfies statistical measures of similarity is used to make predictions, which are tested experimentally by his collaborators. The model validated by the experiments is considered the mechanism capable of explaining the behavior of the system.
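One standard way to compare hypothetical mechanisms against the same data is an information criterion such as AIC, which balances goodness of fit against the number of parameters. The sketch below uses two invented candidate rate laws and synthetic data; it illustrates the model-comparison step, not the laboratory's specific models.

```python
import math

def aic(rss, n, k):
    """Akaike information criterion for a least-squares fit
    (up to an additive constant shared by all models)."""
    return n * math.log(rss / n) + 2 * k

def compare_mechanisms(models, xs, ys):
    """Rank candidate mechanism models by AIC against shared data.

    models: name -> (predict_fn, n_params). The candidates below
    are illustrative stand-ins for hypothesized reaction mechanisms.
    """
    n = len(ys)
    scores = {}
    for name, (f, k) in models.items():
        rss = sum((f(x) - y) ** 2 for x, y in zip(xs, ys))
        scores[name] = aic(rss, n, k)
    best = min(scores, key=scores.get)
    return best, scores

# Synthetic rate data roughly following a saturating rate law.
xs = [0.5, 1.0, 2.0, 4.0, 8.0, 16.0]
ys = [10 * s / (2 + s) + (0.1 if i % 2 == 0 else -0.1)
      for i, s in enumerate(xs)]

models = {
    "saturating": (lambda s: 10 * s / (2 + s), 2),  # assumed parameters
    "linear": (lambda s: 0.5 * s, 1),
}
best, scores = compare_mechanisms(models, xs, ys)
```

In real use the parameters of each candidate mechanism would be fitted rather than fixed, and the surviving model's predictions would then be tested experimentally, as described above.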
The goal of the research is to design, develop, and test an inconspicuous awareness-enhancement and monitoring device (AEMD) to assist the treatment of trichotillomania (TTM), a disorder involving recurrent pulling of one’s hair that results in noticeable hair loss. TTM is associated with significant impairments in social functioning and often has a profound negative impact on self-esteem and well-being. Best-practice treatment for TTM involves a form of behavioral therapy known as habit reversal therapy (HRT). HRT requires persons with trichotillomania to be aware of their hair pulling behaviors, yet the majority of persons with TTM pull most of their hair outside of their awareness. HRT also requires TTM sufferers to record the frequency and duration of their hair pulling behaviors, yet it is obviously impossible for a person to monitor behaviors they are unaware of. Our Phase I efforts have produced a prototype device (AEMD) that solves these two problems. The prototype AEMD signals the TTM sufferer if their hand approaches their hair, thereby bringing pulling-related behavior into awareness. It also logs the time, date, duration, and user classification of hair-pulling-related events and can later transfer the logged data to a personal computer for analysis and presentation. We continue to refine this device and seek to integrate it with smartphones to better understand the activities and locations associated with hair pulling and other body-focused repetitive behaviors (e.g., skin picking). In the future, we seek to pool data from users to get a better sense of common situations and other factors associated with elevated pulling rates, and we intend to develop other electronic tools to detect, monitor, and intervene in other mental disorders.
The basis of my work is to make the often invisible traces created by students’ interactions with learning technologies available to instructors, technology solutions, and students themselves. This often requires the creation of novel educational technologies designed from the beginning with detailed tracking of user activities. Coupled with machine learning and data mining techniques (e.g., classification, regression, and clustering methods), clickstream data from these technologies is used to build predictive models of student success and to better understand how technology affords benefits in teaching and learning. I am interested in broadly scaled teaching and learning through Massive Open Online Courses (MOOCs), how predictive models can be used to understand student success, and the analysis of educational discourse and student writing.
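A typical first step with clickstream data is turning a raw event log into model-ready features. The sketch below is a minimal example; the action names ("video_play", "forum_post", "quiz_submit") and the feature set are invented for illustration, not a fixed schema from any particular platform.

```python
from collections import Counter

def clickstream_features(events, as_of_day):
    """Summarize a student's raw event log into model-ready features.

    events: (day, action) pairs from a learning platform. The
    resulting features (activity counts, recency of activity)
    could feed a classifier predicting course success; both the
    actions and features here are illustrative.
    """
    counts = Counter(action for _, action in events)
    last_day = max((day for day, _ in events), default=0)
    return {
        "n_video_plays": counts.get("video_play", 0),
        "n_forum_posts": counts.get("forum_post", 0),
        "n_quiz_submits": counts.get("quiz_submit", 0),
        "days_inactive": as_of_day - last_day,
    }

# A small hypothetical log for one student, observed on day 10.
log = [(1, "video_play"), (1, "quiz_submit"),
       (3, "video_play"), (4, "forum_post")]
feats = clickstream_features(log, as_of_day=10)
```

Features like `days_inactive` are often among the strongest early-warning signals in such models, since disengagement tends to precede dropout.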