OBIE Research at Aimlab

 

Multiple-Ontology-Based Information Extraction (MOBIE)

This page presents the datasets and source code related to our work on the use of multiple ontologies in information extraction. A paper based on this work was published at the CIKM 2009 conference. The full paper can be found here.

Abstract:

Ontology-Based Information Extraction (OBIE) has recently emerged as a subfield of Information Extraction (IE). Here, ontologies - which provide formal and explicit specifications of conceptualizations - play a crucial role in the information extraction process. Several OBIE systems have been implemented previously but all of them use a single ontology although multiple ontologies have been designed for many domains. We have studied the theoretical basis for using multiple ontologies in information extraction and have developed information extraction systems that use them. These systems investigate the two major scenarios for having multiple ontologies for the same domain: specializing in sub-domains and providing different perspectives. The domain of universities has been used for the former scenario through a corpus collected from university websites. For the latter, the domain of terrorist attacks and a corpus used by a previous Message Understanding Conference (MUC) have been used. The results from these two case studies indicate that using multiple ontologies in information extraction has led to a clear improvement in performance measures.

Ontologies:

OWL files for the University Data Case Study are as follows.

OWL files for the Terrorist Data Case Study are as follows.

Text Corpora:

The corpus used by the University Data Case Study: udcs-corpus.zip

The corpus used by the Terrorist Data Case Study: tdcs-corpus.zip

Source Code:

Both case studies are implemented in Java. The Java source files are available in the following zip files.

University Data Case Study: udcs-source-code.zip

Terrorist Data Case Study: tdcs-source-code.zip

JAPE rules, the linguistic extraction rules of GATE, are used by all information extractors of the University Data Case Study and by the extractors for the MindSwap ontology in the Terrorist Data Case Study. These JAPE rules are available from the following files.

University Data Case Study: udcs-jape.zip

Terrorist Data Case Study: tdcs.jape

If you need any other details regarding this work, please contact Dejing Dou (dou AT cs.uoregon.edu).