MIDAS Seminar Series Presents: Heng Ji, University of Illinois Urbana-Champaign
March 8 @ 4:00 pm - 5:00 pm
Professor, Computer Science Department, University of Illinois Urbana-Champaign
Knowledge Extraction to Accelerate Scientific Discovery
To combat COVID-19, clinicians and scientists need to digest the vast amount of relevant biomedical knowledge in the literature to understand the disease mechanism and the related biological functions. The first challenge is quantity. For example, nearly 2.7K new papers are published on PubMed per day. This knowledge bottleneck causes significant delay in the development of vaccines and drugs for COVID-19. The second challenge is quality, due to the rise of rapid, extensive publication of preprint manuscripts without pre-publication peer review. Many research results about coronavirus from different research labs and sources are redundant, complementary or even conflicting with each other.
Let’s consider drug repurposing as a case study. Besides the lengthy clinical trials and biomedical experiments, another major cause of the long process is the complexity of the problem and the difficulty of drug discovery in general. Current clinical trials for drug repurposing rely mainly on symptoms, considering drugs that can treat diseases with similar symptoms. However, there are too many drug candidates and too much misinformation published from multiple sources. In addition to a ranked list of drugs, clinicians and scientists also aim to gain new insights into the underlying molecular and cellular mechanisms of COVID-19, and into which pre-existing conditions may affect the mortality and severity of this disease.
To tackle these two challenges, we have developed a novel and comprehensive knowledge discovery framework, COVID-KG, to accelerate scientific discovery and build a bridge between clinicians and biology scientists. COVID-KG starts by reading existing papers to build multimedia knowledge graphs (KGs), in which nodes are entities/concepts and edges represent relations involving these entities, extracted from both text and images. Given the KGs enriched with path ranking and evidence mining, COVID-KG answers natural language questions effectively. Using drug repurposing as a case study, for 11 typical questions that human experts aim to explore, we integrate our techniques to generate a comprehensive report for each candidate drug. Preliminary assessment by expert clinicians and medical school students shows that our generated reports are informative and sound. I will also talk about our ongoing work to extend this framework to other domains including molecular synthesis and agriculture.
Heng Ji is a professor in the Computer Science Department and an affiliated faculty member in the Electrical and Computer Engineering Department of the University of Illinois at Urbana-Champaign. She is also an Amazon Scholar. She received her B.A. and M.A. in Computational Linguistics from Tsinghua University, and her M.S. and Ph.D. in Computer Science from New York University. Her research interests focus on Natural Language Processing, especially on Multimedia Multilingual Information Extraction, Knowledge Base Population and Knowledge-driven Generation. She was selected as “Young Scientist” and a member of the Global Future Council on the Future of Computing by the World Economic Forum in 2016 and 2017. The awards she has received include the “AI’s 10 to Watch” Award by IEEE Intelligent Systems in 2013, the NSF CAREER award in 2009, the Google Research Award in 2009 and 2014, the IBM Watson Faculty Award in 2012 and 2014, the Bosch Research Award in 2014-2018, and the ACL 2020 Best Demo Paper Award. She was invited by the Secretary of the U.S. Air Force and AFRL to join the Air Force Data Analytics Expert Panel to inform the Air Force Strategy 2030. She is the lead of many multi-institution projects and tasks, including the U.S. ARL projects on information fusion and knowledge networks construction, the DARPA DEFT Tinker Bell team and the DARPA KAIROS RESIN team. She has coordinated the NIST TAC Knowledge Base Population task since 2010. She has served as the Program Committee Co-Chair of many conferences, including NAACL-HLT 2018. She was elected secretary of the North American Chapter of the Association for Computational Linguistics (NAACL) for 2020-2021. Her research has been widely supported by U.S. government agencies (DARPA, ARL, IARPA, NSF, AFRL, DHS) and industry (Amazon, Google, Bosch, IBM, Disney).