Recent years have witnessed a big data boom spanning a wide spectrum of heterogeneous data types, from image, speech, and multimedia signals to text documents and labels. Much of this information is encoded in natural language, which makes it accessible to some people (for example, those who can read that particular language) but much less amenable to computer processing beyond simple keyword search. The research area of Blender Lab, cross-source information extraction (IE) on a massive scale, aims to create the next generation of information access, in which humans can communicate with computers in natural language beyond keyword search, and computers can discover the accurate, concise, and trustworthy information embedded in big data from heterogeneous sources.

Traditional IE techniques pull information from individual documents in isolation, but users might need to gather information that’s scattered among a variety of sources (for example, in multiple languages, documents, genres, and data modalities). Complicating matters, these facts might be redundant, complementary, incorrect, or ambiguously worded; the extracted information might also need to augment an existing Knowledge Base (KB), which requires the ability to link events, entities, and associated relations to the KB. In our research, we aim to define several new extensions to the state-of-the-art IE paradigm beyond “slot filling,” and to systematically develop the foundations, methodologies, algorithms, and implementations needed for more accurate, coherent, complete, concise, and, most importantly, dynamic and resilient extraction capabilities.

The general principle of Blender Lab is to do creative, ground-breaking, and enjoyable research. Each member is a serious researcher, a critical thinker, and an efficient engineer. We aim to deliver each research project as a piece of art, and to have great fun with each creative idea.

We are part of the larger UIUC NLP community.