Statistical Knowledge Translation and Integration


Knowledge translation is the task of applying knowledge learned or developed in one domain to a semantically different one. Knowledge integration is the related task of building a unified knowledge base from multiple sources that may differ greatly from one another. These tasks arise whenever individuals or organizations wish to exchange knowledge or combine diverse knowledge resources. Two motivating examples are distributed data mining (DDM) and the Semantic Web. In DDM, we wish to share knowledge gained from different databases without transferring or translating the original data among the parties involved. On the Semantic Web, an application or analyst may wish to use resources defined in several ontologies whose semantics are not consistent with one another.

The “Statistical Knowledge Translation and Integration” (SKTI) framework combines Markov logic and Semantic Web ontologies to address the challenging problem of translating and integrating knowledge. Markov logic is well suited to this task because it combines first-order logic with probabilistic graphical models. By expressing both knowledge and semantic mappings in Markov logic, we obtain a unified probabilistic model for jointly translating knowledge and refining semantic mappings. Within this framework, we address four challenges: integrating heterogeneous knowledge with uncertain mappings; defining and evaluating the correctness of translated knowledge; simplifying our models to obtain more compact “approximate translations”; and evaluating our methods in realistic scenarios based on real-world ontologies and knowledge bases.
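To illustrate how an uncertain semantic mapping can be treated probabilistically in Markov logic, here is a minimal sketch of the standard Markov logic semantics, in which a world x has probability proportional to exp(Σ w_i n_i(x)), where n_i(x) is the number of true groundings of formula i. It is not the SKTI implementation. The predicates `Student_A` and `Pupil_B`, the individual `alice`, and the weight 1.5 are all hypothetical, and inference is done by brute-force enumeration, which is only feasible for a handful of ground atoms.

```python
import itertools
import math

# Hypothetical ground atoms: one concept from ontology A, one from ontology B.
ATOMS = ["Student_A(alice)", "Pupil_B(alice)"]


def mapping_formula(world):
    """Hypothetical uncertain mapping Student_A(x) => Pupil_B(x), grounded for alice.

    Returns the number of true groundings (0 or 1 here).
    """
    return 1 if (not world["Student_A(alice)"]) or world["Pupil_B(alice)"] else 0


# Weighted formulas (weight, grounding counter); the weight 1.5 is an assumption.
FORMULAS = [(1.5, mapping_formula)]


def unnormalized(world):
    """exp(sum_i w_i * n_i(world)), the unnormalized Markov logic weight of a world."""
    return math.exp(sum(w * n(world) for w, n in FORMULAS))


def query(query_atom, evidence):
    """P(query_atom is True | evidence), by enumerating all non-evidence atoms."""
    free = [a for a in ATOMS if a not in evidence]
    num = den = 0.0
    for values in itertools.product([False, True], repeat=len(free)):
        world = {**evidence, **dict(zip(free, values))}
        p = unnormalized(world)
        den += p
        if world[query_atom]:
            num += p
    return num / den


# Translate the source-side fact Student_A(alice) into the target vocabulary.
print(round(query("Pupil_B(alice)", {"Student_A(alice)": True}), 4))
```

With the evidence `Student_A(alice)` true, the only free atom is `Pupil_B(alice)`, so the translated fact holds with probability e^1.5 / (e^1.5 + 1) ≈ 0.8176. A larger weight expresses more confidence in the mapping; learning or refining such weights jointly with the translated knowledge is the kind of task the framework above targets.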

To the best of our knowledge, the SKTI research is the first to address the knowledge translation and knowledge integration problems among semantically heterogeneous resources while accounting for the uncertainty of both the knowledge and the semantic mappings. It lays a foundation for automatic knowledge translation and integration in data mining and the Semantic Web. It not only captures correctness and accuracy in translation but also develops new applications of Markov logic and Semantic Web ontologies that will benefit both data mining and Semantic Web research. In addition, the benchmarks we develop will allow current and future prototype knowledge translation and integration systems to be formally evaluated and compared, thereby improving their overall quality. This research also contributes to the broader theme of semantic data mining, in which formal semantics (e.g., ontologies) and the semantic linkages that exist in data are discovered and incorporated into the knowledge discovery process.

The general nature of SKTI makes it applicable to many domains, especially the biomedical and decision sciences, where large amounts of data are already publicly available but are structurally and semantically heterogeneous. Through integrated research and education, this work will contribute to curriculum development for undergraduate and graduate students and will facilitate collaboration with biomedical, health, and decision science groups. Other data mining and Semantic Web research groups will be able to use our knowledge translation and integration tools and our benchmarking system to test and compare their own systems on synthetic or real heterogeneous data. We will release our source code, benchmarks, and services via this site.

This project is funded by a three-year NSF research grant (Award Number: IIS-1118050; $495K; PI: Dejing Dou; Co-PI: Daniel Lowd; 7/1/2011 - 6/30/2014).