Research Projects


  • Semantic Mining of Activity, Social, and Health data (SMASH) is a multidisciplinary research project involving computer scientists, medical doctors, and social scientists. Traditionally, support groups and other social reinforcement approaches have been popular and effective in dealing with unhealthy behaviors and conditions, including being overweight. Research on the design and implementation of the SMASH system will address a critical need for formal ontologies and data mining tools that help explain the influence of healthcare social networks, such as YesiWell, on sustained weight loss, where the data are multi-dimensional, temporal, semantically heterogeneous, and very sensitive. SMASH will develop new methods in social network analysis, Semantic Web ontologies, and privacy-preserving data mining, building on the current state of the art. This project is being supported with a three-year R01 grant by the NIH/NIGMS (Grant Number: R01GM103309; $1.54M; PI: Dejing Dou; Co-I: Brigitte Piniewski, Ruoming Jin, Xintao Wu, Jessica Greene, Daniel Lowd, Junfeng Sun; Consultant: David Kil; 5/1/2013 - 2/29/2016).

  • Statistical Knowledge Translation and Integration (SKTI) combines formal ontologies and Markov logic to address, in a systematic way, the challenging problem of translating and integrating semantically heterogeneous knowledge. By expressing both knowledge and semantic mappings in formal ontologies and Markov logic, a unified probabilistic model can jointly translate and integrate knowledge with uncertain mappings. The methods will be evaluated on real-world ontologies, knowledge bases, and a benchmarking system for heterogeneous data. This research will contribute to distributed data mining, knowledge transfer, and the larger theme of semantic data mining, in which formal semantics (e.g., ontologies) and semantic linkages that exist in data can be discovered and incorporated into the knowledge discovery process. This project is being funded with a three-year research grant by the NSF (Award Number: IIS-1118050; $495K; PI: Dejing Dou; Co-PI: Daniel Lowd; 7/1/2011 - 6/30/2014).

  • Neural ElectroMagnetic Ontologies (NEMO) addresses the need for formal representation, storage, mining, and dissemination of brain electromagnetic (e.g., EEG) data. NEMO is a collaborative project between computer scientists and neuroscientists. It aims to develop ontologies and ontology-based methods for representing and sharing event-related brain potential (ERP) data from experimental studies of the neural processes underlying human language and cognition. The NEMO project is the first to develop formal ontologies for the ERP domain. These ontologies are used to represent the current state of knowledge in the ERP domain and to support ontology-based mark-up (annotation) of ERP experiment data collected within our NEMO consortium. With the ontology-based mining, mapping, and integration tools developed for this project, the NEMO research team aims to conduct meta-analyses of ERP patterns in language and cognition, combining results that our international team of ERP researchers obtained with a variety of ERP research paradigms and analysis methods. The NEMO project is being funded with a four-year R01 grant by the NIH/NIBIB (Grant Number: R01EB007684; $2.22M; PI: Dejing Dou; Co-I: Gwen Frishkoff, Allen Malony, Don Tucker; 5/1/2009 - 4/30/2013).

  • Ontology-Based Information Extraction (OBIE) has recently emerged as a sub-field of information extraction (IE). The general idea is to use ontologies to guide the information extraction process and to formally present its results. We have focused on three directions. First, identifying components of information extraction systems that make extractions with respect to particular components of an ontology (which we call information extractors) and reusing those information extractors in other IE processes. Second, using multiple ontologies in the same domain, together with their semantic mappings, to improve the performance of IE. Third, constructing and enriching ontologies automatically during the IE process.


  • OntoGrate is an ontology-based information integration framework. The general goal is to integrate information that is heterogeneous in both structure and semantics in a highly automated way. Key innovations in OntoGrate include broadening the typical scope of integration to span databases, XML data, and the Semantic Web; strengthening and formalizing the derivation of mapping rules by introducing machine learning and data mining techniques; and extending our inference engine, OntoEngine, and first-order ontology language, Web-PDDL, to solve the problem of integration using formal mapping rules with uncertainty. The application of multi-relational data mining to discover complex mapping rules is novel. As one application of OntoGrate, we have collaborated with the ZFIN (Zebrafish Model Organism Database) research group to integrate heterogeneous gene databases (ZFIN-MGI Integration). The OntoGrate project was supported by the start-up fund of Dejing Dou from the University of Oregon.

  • Internet Routing Forensics (IRF) is a collaborative project with Jun Li at the UO's Network Security Lab and David Meyer at the Advanced Network Technology Center. We are extending several data mining techniques, such as classification and clustering, to discover and analyze abnormal BGP (Border Gateway Protocol) events, such as those associated with worms and blackouts. This project was funded with a three-year research grant by the NSF (Award Number: CNS-0520326; $350K; PI: Jun Li; Co-PI: Dejing Dou, David Meyer; 10/1/2005 - 9/30/2008).