Minimum Supervised Text Mining for Literature-based Scientific Discovery

Speaker: Xuan Wang (UIUC)

Date and Time: Friday, December 3 at 11am CST


Text mining is promising for advancing human knowledge in many fields, given the rapidly growing volume of text data (e.g., scientific articles, medical notes, and news reports) now available. In this talk, I will present my work on minimum supervised text mining methods that enable and accelerate scientific discovery. First, I will talk about information extraction with minimum human supervision. Given the growing volume of text data and the breadth of information it covers, it is inefficient, if not nearly impossible, for humans to manually find, integrate, and digest useful information. A major challenge is to develop methods that automatically understand massive unstructured text data with minimum human effort. To address this challenge, I have contributed a series of algorithms and systems for two tasks: (1) named entity recognition with distant supervision and (2) meta-pattern-guided open relation extraction. Second, I will talk about literature-based scientific discovery in biomedicine. This research direction aims to enable and accelerate real-world knowledge discovery using the rich information automatically extracted from scientific text. I have collaborated with experts in various scientific disciplines (e.g., biomedicine, chemistry, and health) to achieve this goal. Through these collaborations, I have developed algorithms and systems for two tasks: (1) textual evidence discovery and (2) scientific topic contrasting. Last, I will discuss future directions for developing multi-modal and multi-dimensional text mining methods with a broader impact on science and engineering.


Xuan Wang is a fifth-year Ph.D. student in the Computer Science Department at the University of Illinois at Urbana-Champaign (UIUC). She works in the Data Mining Group, and her thesis advisor is Dr. Jiawei Han. Xuan received an M.S. in Statistics and an M.S. in Biochemistry from UIUC, and a B.S. in Biological Science from Tsinghua University, China. Her research interests are in natural language processing and data mining, with an emphasis on applications to the biological and health sciences. Her research theme is developing effective and scalable algorithms and systems for automatically understanding massive text data to enable and accelerate scientific discovery. Her research has focused on two directions: (1) information extraction with minimum human supervision and (2) literature-based knowledge discovery in biomedicine. Xuan has published about 20 research/demo papers in top NLP conferences (e.g., ACL, EMNLP, and NAACL) as well as in biomedical informatics journals (e.g., Bioinformatics) and conferences (e.g., ACM-BCB and IEEE-BIBM). She received the YEE Fellowship Award at UIUC for 2020-2021.