My laboratory's data science research includes: (1) Ontology development. We have initiated and led the development of several community-based ontologies, including the Vaccine Ontology (VO), the Ontology of Adverse Events (OAE), the Cell Line Ontology (CLO), the Ontology of Genes and Genomes (OGG), and the Interaction Network Ontology (INO). (2) Ontology tool development. We have developed many ontology software programs, such as OntoFox and Ontobee, which are widely used for ontology reuse, ontology development, and ontology applications. (3) Literature mining, with a focus on ontology-based literature mining approaches. (4) Bayesian network (BN) modeling for the analysis of gene interaction networks. We have also applied these ontologies, ontology-related approaches, and BN modeling in different data science domains, including vaccinology, microbiology, immunology, and pharmacovigilance.
With ever-increasing quantities of big data, integrating, sharing, and analyzing these data has become a major challenge. Hundreds of biological interaction pathway resources are publicly available. While each of these resources is widely used, the data they contain typically overlap but are not integrated. This lack of integration results in redundant work and inefficient data use. An ontology is a human- and computer-interpretable set of terms and relations that represents the entities in a specific domain and how these entities relate to each other. As part of a funded MCubed Diamond project, we aim to ontologically and non-redundantly represent and integrate various molecular interactions, pathways, and networks. The integrated ontology of interaction pathways and networks will then be used by novel statistical and computational methods to efficiently address various scientific problems.