In next-generation power systems (the Smart Grid), a large number of distributed energy devices (e.g., distributed generators, distributed energy storage, loads, and smart meters) are connected to each other in an internet-like structure. Incorporating millions of new energy devices will require a wide-ranging transformation of the nation’s aging electrical grid infrastructure. The key challenge is to efficiently manage this great number of devices through distributed intelligence. The distributed grid intelligence (DGI) agent is the brain of a distributed energy device: it enables every single energy device not only to have enough intelligence to achieve optimal management locally, but also to coordinate with others toward a common goal. The massive volume of real-time data collected by DGI agents will help grid operators gain a better understanding of large-scale, highly dynamic power systems. In conventional power systems, system operation is performed using purely centralized data storage and processing. However, as the number of DGIs grows beyond hundreds of thousands, it is rather intuitive that the state-of-the-art centralized information-processing architecture will no longer be sustainable under such a big data explosion. The ongoing research illustrates how advanced ideas from the IT and power industries can be combined in a unique way. The proposed high-availability distributed file system and data processing framework can be easily tailored to support other data-intensive applications in large-scale, complex power grids. For example, the proposed DGI nodes can be embedded into any distributed generator (e.g., a rooftop PV panel), distributed energy storage device (e.g., an electric vehicle), or load (e.g., a smart home) in a future residential distribution system.
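The design of the proposed file system is not detailed here, but one standard ingredient of high-availability distributed stores is consistent hashing, which assigns data keys to storage nodes so that adding or removing a node remaps only a small fraction of keys. The sketch below is purely illustrative of that ingredient, not of the proposed system; the class name, node names, and meter key are all invented for the example.

```python
import hashlib
from bisect import bisect

class ConsistentHashRing:
    """Map data keys (e.g., meter readings keyed by device ID and time)
    to storage nodes.  Each node is hashed to many points ("virtual
    nodes") on a ring; a key belongs to the first node clockwise from
    the key's own hash."""

    def __init__(self, nodes, vnodes=64):
        self._ring = []  # (hash, node) pairs, kept sorted by hash
        for node in nodes:
            for i in range(vnodes):
                self._ring.append((self._hash(f"{node}#{i}"), node))
        self._ring.sort()
        self._hashes = [h for h, _ in self._ring]

    @staticmethod
    def _hash(key):
        # Any stable, well-spread hash works; MD5 is fine for placement.
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def node_for(self, key):
        """Return the node responsible for a key, wrapping past the
        end of the ring back to the beginning."""
        idx = bisect(self._hashes, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]

ring = ConsistentHashRing(["dgi-node-1", "dgi-node-2", "dgi-node-3"])
owner = ring.node_for("meter-00421:2024-01-01T00:00")
```

Because placement depends only on hashes, every DGI node can compute a key's owner locally, with no central directory, which is what makes the scheme attractive in a fully distributed architecture.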
If implemented successfully, this work will transform the Smart Grid, with its high-volume, high-velocity, and high-variety data, into a completely distributed cyber-physical system architecture. In addition, the proposed work can be easily extended to support other cyber-physical system applications (e.g., intelligent transportation systems).
My current research focuses on improving the efficiency and utilization of outpatient clinics using data mining techniques such as decision tree analysis, Bayesian networks, and neural networks. While our previous and continuing research has focused on using some of these techniques to develop more sophisticated methods of patient scheduling within physical therapy clinics, the techniques are applicable to other types of health services providers. They also apply to university administration, where predictive models built with data mining techniques can be used to assess student success.
Dr. Zhu’s group conducts research on various topics in data science, ranging from foundational methodologies to challenging applications. In particular, the group has been investigating the fundamental issues and techniques for supporting various types of queries (including range queries, box queries, k-NN queries, and hybrid queries) on large datasets in a non-ordered discrete data space (NDDS). A number of novel indexing and searching techniques that exploit the unique characteristics of an NDDS have been developed. The group has also been studying the issues and techniques for storing and searching large-scale k-mer datasets for various genome sequence analysis applications in bioinformatics; a virtual approximate store approach to supporting repetitive big data in genome sequence analyses and several new sequence analysis techniques have been proposed. In addition, the group has been researching the challenges and methods of processing and optimizing a new type of so-called progressive queries, which are formulated on the fly by a user in multiple steps. Such queries are widely used in many application domains, including e-commerce, social media, business intelligence, and decision support. Other research topics studied by the group include streaming data processing, self-managing databases, spatio-temporal data indexing, data privacy, Web information management, and vehicle drive-through wireless services.
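As a concrete illustration of one query type mentioned above: in an NDDS, element values (e.g., nucleotides) have no natural ordering, so Hamming distance, the number of dimensions in which two vectors differ, is the usual metric. The brute-force k-NN search below only illustrates the query semantics; the group's index structures exist precisely to avoid this linear scan on large datasets.

```python
from heapq import nsmallest

def hamming(u, v):
    """Distance in a non-ordered discrete data space: the number of
    dimensions where the two vectors take different discrete values.
    No ordering among element values is assumed or used."""
    return sum(a != b for a, b in zip(u, v))

def knn(query, dataset, k):
    """Brute-force k-NN under Hamming distance.  An NDDS index tree
    would prune most of the dataset but return the same answer."""
    return nsmallest(k, dataset, key=lambda v: hamming(query, v))

db = ["ACGT", "ACGA", "TTTT", "ACCA"]
print(knn("ACGT", db, 2))  # -> ['ACGT', 'ACGA']
```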
The basis of my work is to make the often-invisible traces created by students’ interactions with learning technologies available to instructors, technology solutions, and the students themselves. This often requires creating novel educational technologies designed from the beginning with detailed tracking of user activities. Coupled with machine learning and data mining techniques (e.g., classification, regression, and clustering methods), clickstream data from these technologies is used to build predictive models of student success and to better understand how technology affords benefits in teaching and learning. I am interested in broadly scaled teaching and learning through Massive Open Online Courses (MOOCs), in how predictive models can be used to understand student success, and in the analysis of educational discourse and student writing.
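A minimal sketch of that pipeline, under invented assumptions about the event format: raw clickstream events are aggregated into per-student features, and a one-feature decision stump (a depth-1 classifier, standing in for the richer classification models mentioned above) is fit to predict success.

```python
def clickstream_features(events):
    """Aggregate raw clickstream events (one dict per logged action,
    with hypothetical 'student' and 'action' fields) into per-student
    feature dicts."""
    feats = {}
    for e in events:
        f = feats.setdefault(e["student"], {"clicks": 0, "videos": 0})
        f["clicks"] += 1
        if e["action"] == "play_video":
            f["videos"] += 1
    return feats

def fit_stump(xs, ys):
    """Fit a one-feature decision stump: the threshold on x that
    minimises misclassifications of y (1 = success, 0 = not)."""
    best = None
    for t in sorted(set(xs)):
        errors = sum((x >= t) != y for x, y in zip(xs, ys))
        if best is None or errors < best[1]:
            best = (t, errors)
    return best[0]

events = (
    [{"student": "A", "action": "play_video"}] * 10
    + [{"student": "B", "action": "click"}] * 2
    + [{"student": "C", "action": "play_video"}] * 8
    + [{"student": "D", "action": "click"}] * 1
)
feats = clickstream_features(events)
passed = {"A": 1, "B": 0, "C": 1, "D": 0}
students = sorted(feats)
threshold = fit_stump([feats[s]["clicks"] for s in students],
                      [passed[s] for s in students])
print(threshold)  # -> 8 (predict success when clicks >= 8)
```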
I develop fast and principled methods for exploring and understanding one or more massive graphs. In addition to fast algorithmic methodologies, my research also contributes graph-theoretical ideas and models, and real-world applications, in two main areas: (i) single-graph exploration, which includes graph summarization and inference; and (ii) multiple-graph exploration, which includes summarization of time-evolving graphs, graph similarity, and network alignment. My research is applied mainly to social, collaboration, and web networks, as well as brain connectivity graphs.
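As a toy illustration of the graph-similarity problem mentioned above, the sketch below compares two snapshots of an evolving graph by the Jaccard similarity of their edge sets. Real graph-similarity methods are far more sophisticated (and handle unaligned node sets); the data here is invented.

```python
def edge_jaccard(g1, g2):
    """Jaccard similarity of two graphs' edge sets: |shared edges| /
    |all edges|.  Edges are undirected, so each is stored as a
    frozenset of its two endpoints."""
    e1 = {frozenset(e) for e in g1}
    e2 = {frozenset(e) for e in g2}
    return len(e1 & e2) / len(e1 | e2)

snap_t0 = [("a", "b"), ("b", "c"), ("c", "d")]
snap_t1 = [("b", "a"), ("b", "c"), ("c", "e")]  # one edge dropped, one added
print(edge_jaccard(snap_t0, snap_t1))  # -> 0.5
```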
The Ye Lab has been conducting fundamental research in machine learning and data mining, developing computational methods for biomedical data analysis, and building informatics software. We have developed novel machine learning algorithms for feature extraction from high-dimensional data, sparse learning, multi-task learning, transfer learning, active learning, multi-label classification, and matrix completion. We have developed the SLEP (Sparse Learning with Efficient Projections) package, which includes implementations of large-scale sparse learning models, and the MALSAR (Multi-tAsk Learning via StructurAl Regularization) package, which includes implementations of state-of-the-art multi-task learning models. SLEP achieves state-of-the-art performance for many sparse learning models and has become one of the most popular sparse learning software packages. In close collaboration with researchers in the biomedical field, we have successfully applied these methods to the analysis of biomedical data, including clinical image data and genotype data.
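The “efficient projections” in SLEP refer to fast proximal/projection operators inside sparse-learning solvers. As an illustrative sketch (not SLEP’s actual code, and without its scale or generality), the block below implements the soft-thresholding operator, the proximal operator of the L1 norm, and a plain ISTA loop for the lasso on tiny dense data.

```python
def soft_threshold(z, t):
    """Proximal operator of t*|.|: shrink z toward zero by t, clipping
    to exactly zero inside [-t, t].  This is what produces sparsity."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def ista(A, b, lam, step, iters=100):
    """Iterative shrinkage-thresholding for the lasso
        min_x 0.5 * ||A x - b||^2 + lam * ||x||_1
    with A a list of m rows of length n (no numpy, for self-containment)."""
    m, n = len(A), len(A[0])
    x = [0.0] * n
    for _ in range(iters):
        # residual r = A x - b, then gradient g = A^T r of the smooth part
        r = [sum(A[i][j] * x[j] for j in range(n)) - b[i] for i in range(m)]
        g = [sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]
        # gradient step, then the "efficient projection" (prox) step
        x = [soft_threshold(x[j] - step * g[j], step * lam) for j in range(n)]
    return x

# Orthonormal design: the lasso solution is just soft-thresholded b.
x = ista([[1.0, 0.0], [0.0, 1.0]], [3.0, 0.5], lam=1.0, step=1.0)
print(x)  # -> [2.0, 0.0]
```

Note how the second coefficient is driven exactly to zero, which is the point of L1 regularization: the prox step performs feature selection as a by-product of optimization.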
The Smith lab group is primarily interested in examining evolutionary processes using new data sources and analysis techniques. We develop new methods to address questions about the rates and modes of evolution using the large data sources that have become common in the biological disciplines over the last ten years. In particular, we use DNA sequence data to construct phylogenetic trees and to conduct additional analyses of evolutionary processes on these trees. In addition to this research program, we also address how new data sources can facilitate new research in evolutionary biology. To this end, we sequence transcriptomes, primarily in plants, with the goal of better understanding where, within the genome and within the phylogeny, processes such as gene duplication and loss, horizontal gene transfer, and increased rates of molecular evolution occur.
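One elementary first step in distance-based phylogenetic tree construction is a pairwise distance matrix over aligned sequences. The sketch below computes the simple p-distance (the proportion of aligned sites that differ, skipping gaps) on invented toy sequences; a tree-building method such as neighbor joining would then consume this matrix. This is an illustration of the general workflow, not of the lab's specific methods.

```python
def p_distance(s1, s2):
    """Proportion of aligned sites at which two sequences differ,
    ignoring positions where either sequence has a gap ('-')."""
    diffs = compared = 0
    for a, b in zip(s1, s2):
        if a == "-" or b == "-":
            continue
        compared += 1
        diffs += a != b
    return diffs / compared

# Toy alignment, three taxa (names and sequences are invented).
seqs = {"taxon1": "ACGTACGT", "taxon2": "ACGTACGA", "taxon3": "TTGTACGA"}
names = sorted(seqs)
matrix = {(a, b): p_distance(seqs[a], seqs[b]) for a in names for b in names}
print(matrix[("taxon1", "taxon2")])  # -> 0.125 (1 of 8 sites differs)
```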