The Road Towards Language Agnostic Information Extraction

Speaker: Radu Florian (IBM)

Date and Time: 11am CT, Friday, April 30


In this talk I will present my view on the evolution of Information Extraction, in particular mention detection, coreference resolution, relation extraction, and entity linking across multiple languages, going from the language-specific systems of CoNLL'02 and CoNLL'03 to current research that produces models able to process a large set of languages with one engine. If time permits, I will also present some newer experiments that allow a user to take a system in English (for instance) and produce good models in other languages, further enabling true multi-language Information Extraction.


Radu Florian wears two hats as Distinguished Research Scientist and Senior Manager, managing the Multilingual Natural Language Processing Group at the IBM Watson Research Center in Yorktown Heights, NY. His research interests include multi-language statistical information extraction, question answering, semantic parsing, and machine learning. He has participated in and led teams in several competitions, including CoNLL information extraction, ACE, and TAC-KBP, and in DARPA projects such as GALE, BOLT, MRP, and KAIROS.

One recent focus of Radu's research is building models that operate cross-lingually: using multilingual language models such as multilingual BERT or XLM-RoBERTa to build information extraction, dependency parsing, and question answering models that can take a wide variety of languages as input and produce good output, even on languages they were not trained on. Not only can these models perform extraction on input in any of these languages (unfortunately, not on Klingon yet), but they usually work better than models built on one language alone, and they degrade gracefully when tested on completely new languages.