OBIE Research at Aimlab


Discovering Inconsistencies in PubMed Abstracts through Ontology-Based Information Extraction

This page presents the datasets and source code for our work on discovering inconsistencies in PubMed abstracts through Ontology-Based Information Extraction. A paper based on this work was published at the ACM-BCB 2017 conference. The full paper can be found here.


Searching for a cure for cancer is one of the most vital pursuits in modern medicine, and microRNA research plays a key role in it. Keeping track of shifts and changes in established knowledge in the microRNA domain is very important. In this paper, we introduce an Ontology-Based Information Extraction method to detect inconsistencies in microRNA research paper abstracts. We first use the Ontology for MIcroRNA Targets (OMIT) to extract triples from the abstracts. We then introduce a new algorithm to calculate the oppositeness of these candidate relationships. Finally, we present the discovered inconsistencies in an easy-to-read manner for use by medical professionals. To the best of our knowledge, this study is the first ontology-based information extraction model introduced to find shifts in established knowledge in the medical domain using research paper abstracts. We downloaded 36877 abstracts from the PubMed database. From those, we found 102 inconsistencies relevant to the microRNA domain.
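As a rough illustration of the pipeline described above (extract triples, compare their relations for oppositeness, report the conflicting pairs), the following is a minimal Java sketch. It is not the algorithm from the paper: the oppositeness calculation is replaced here by a small hand-written lookup table of opposing relation words, and the class name `InconsistencySketch`, the `Triple` type, the `OPPOSITES` table, and the example entity names (`miR-X`, `GENE-Y`, and so on) are all hypothetical placeholders introduced for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

class InconsistencySketch {

    /** A (subject, relation, object) triple plus the abstract it was taken from. */
    static final class Triple {
        final String subject;
        final String relation;
        final String object;
        final String sourceId;

        Triple(String subject, String relation, String object, String sourceId) {
            this.subject = subject;
            this.relation = relation;
            this.object = object;
            this.sourceId = sourceId;
        }

        @Override
        public String toString() {
            return "(" + subject + ", " + relation + ", " + object + ") from " + sourceId;
        }
    }

    // ASSUMPTION: a tiny hand-made table standing in for the paper's
    // oppositeness calculation, which is not reproduced here.
    static final Map<String, Set<String>> OPPOSITES = Map.of(
            "upregulates", Set.of("downregulates"),
            "downregulates", Set.of("upregulates"),
            "promotes", Set.of("inhibits", "suppresses"),
            "inhibits", Set.of("promotes"),
            "suppresses", Set.of("promotes"));

    /** Returns true if the table lists relation b as an opposite of relation a. */
    static boolean areOpposite(String a, String b) {
        return OPPOSITES.getOrDefault(a, Set.of()).contains(b);
    }

    /** Pairs up triples that share subject and object but carry opposing relations. */
    static List<Triple[]> findInconsistencies(List<Triple> triples) {
        List<Triple[]> result = new ArrayList<>();
        for (int i = 0; i < triples.size(); i++) {
            for (int j = i + 1; j < triples.size(); j++) {
                Triple a = triples.get(i);
                Triple b = triples.get(j);
                if (a.subject.equals(b.subject)
                        && a.object.equals(b.object)
                        && areOpposite(a.relation, b.relation)) {
                    result.add(new Triple[] {a, b});
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up triples for illustration only; not data from the study.
        List<Triple> triples = List.of(
                new Triple("miR-X", "upregulates", "GENE-Y", "abstract-1"),
                new Triple("miR-X", "downregulates", "GENE-Y", "abstract-2"),
                new Triple("miR-Z", "promotes", "GENE-W", "abstract-3"));

        for (Triple[] pair : findInconsistencies(triples)) {
            System.out.println("Possible inconsistency:");
            System.out.println("  " + pair[0]);
            System.out.println("  " + pair[1]);
        }
    }
}
```

The pairwise comparison is quadratic in the number of triples, which is acceptable for a sketch; at the scale of tens of thousands of abstracts, grouping triples by subject and object before comparing relations would likely be a better starting point.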

Source Files (collected data):

Intermediate Outputs:

Output Files:

Source Code:

The entire project is implemented in Java. The Java source files are available from the GitHub organization OMIT-PubMed-Inconsistencies. It contains source code for the following projects:

If you need any other details regarding this work please contact Dejing Dou (dou AT