AIM Laboratory
Introduction:
The Advanced Integration and Mining Lab (AIM Lab) was established in
2005 to conduct data integration and data mining research in the Computer and Information Science Department at the University
of Oregon. Large amounts of data are stored in different data
repositories, such as databases, data warehouses, the WWW and the emerging
Semantic Web. These repositories may use different structures (schemas) or semantics
(ontologies) to describe their data, even within the same domain. Integrating
such heterogeneous data resources remains a major challenge for
both database and Semantic Web research. Finding the relationships
(mappings) between data resources is the first step in data integration, and it may require human
involvement. The mappings can then be used for data translation or for
answering queries across different data resources. Our lab is developing
ontology-based information systems to help both general users and
domain experts (e.g., biologists and neuroscientists) integrate,
process and analyze their data.
Data mining is a useful technique for finding interesting trends
or patterns in large datasets to guide decisions about future
activities, and it has been applied in many fields. In our lab, we
are interested in using data mining to study the semantic structures of biomedical data, web
data and network data. We are also interested in mining relationships
across heterogeneous data resources. Data mining and data integration
can support each other. For example, rules mined across different data
resources can be used to guide data integration, and the integrated data
can in turn be used to find more interesting trends and patterns.
Research Projects:
- OntoGrate is an ontology-based system for
interactive information integration. The goal is to help
human experts in multiple domains interactively integrate information
that is heterogeneous in both structure and semantics. Key innovations in OntoGrate include
broadening the typical scope of integration to span databases, XML
data, knowledge bases and the Semantic Web; strengthening and
formalizing the derivation of mapping rules by introducing machine
learning and data mining techniques; and extending our inference
engine, OntoEngine, and our first-order ontology
language, Web-PDDL, to perform integration using
formal mapping rules. A sub-project of OntoGrate studies
how to use inductive logic programming and multi-relational data mining to find complex mapping
rules. As an application of OntoGrate, we are collaborating with the ZFIN (Zebrafish Model Organism Database) research group to integrate heterogeneous gene databases.
- The Neural ElectroMagnetic Ontology (NEMO) system addresses
the need for tools to support the representation, storage, mining and
dissemination of brain electromagnetic (EEG and MEG) data. NEMO is
a collaborative project with Gwen Frishkoff, Allen Malony, Don Tucker and Robert Frank at the NIC (NeuroInformatics Center),
EGI (Electrical Geodesics, Inc.), the Department of Psychology at UO,
and the Learning Research and Development Center at the
University of Pittsburgh. We are mining and developing several temporal, spatial and functional ontologies that are new to the EEG/MEG
neuroscience field. We are also building ontology-based databases for EEG/MEG data. We expect this
work to receive significant attention.
- Internet Routing Forensics (IRF) is a collaborative
project with Jun Li of UO's Network Security Lab
and David Meyer of the Advanced Network Technology Center. We are
extending several data mining techniques, such as classification
and clustering, to discover and analyze abnormal BGP (Border
Gateway Protocol) events, such as worms and blackouts. This project
is funded by a three-year NSF research grant (PI: Jun Li; Co-PIs: Dejing Dou and David Meyer).
- In collaboration with Dr. Jongwan Kim, we are constructing a
Personalized Ontology-based Anti-spam email System
(POAS) using data mining and logic inference.
Software for download and online services:
A prototype of our OntoGrate online service is available for querying relational databases with Semantic Web ontologies.
We will release the software and a complete online service to the public when the final version of
OntoGrate is finished.
Our inference engine, OntoEngine, and the OntoMerge online service
are designed to translate Semantic Web data. (Both OntoEngine and OntoMerge were originally developed by Dejing Dou, his Yale colleague Peishen Qi, and their advisor, Drew McDermott.)
A first version of OntoGUI, an ontology editing and creation tool, is available for download. OntoGUI requires Java version 1.5 or later.
All of our software uses the Web-PDDL language, an expressive, strongly typed, first-order language for describing semantic
mapping rules between different ontologies. Web-PDDL's syntax and semantics are described in this white paper.
Heterogeneous Data Repository:
A prototype repository is available that contains the heterogeneous data and metadata (e.g., ontologies and database schemas) we have collected or created. We have used most of them in the experiments reported in our publications and in our online services. We will continue to release more heterogeneous data, with the goal of building a benchmark for both the distributed data mining and information integration communities.
Faculty members:
Student members:
- Paea LePendu (Ph.D. student since 2004, NPSC Graduate Research Fellow)
- Han Qin (Ph.D. student since 2005)
- Haishan Liu (Ph.D. student since 2006)
- Daya Wimalasuriya (Ph.D. student since 2005)
- Brad Pitcher (Master's student since 2007)
- Endei Noda (visiting scholar/student)
Former members:
- Vikash Agarwal (MS'05)
- Amanda Hosler (BS'06, with honors thesis)
- Shiwoong Kim (MS'06)
- Mike Matloff (BS'06)
- Jongwan Kim (visiting professor, 2006-2008)
- Jiawei Rong (MS'08)
- Jigme Tenzing (MS'08)
- Darren Brown
- DongHwi Kwak
Publications:
(A complete list of our publications in BibTeX format)
Paea LePendu, Dejing Dou, Gwen Frishkoff and Jiawei Rong 2008. Ontology Database: A New Method for Semantic Modeling and an Application to Brainwave Data. In Proc. 20th International Conference on Scientific and Statistical Database Management (SSDBM 2008). (to appear)
Gwen A. Frishkoff, Robert M. Frank, Jiawei Rong, Dejing Dou, Joseph Dien and Laura K. Halderman 2007. A Framework to Support Automated Classification and Labeling of Brain Electromagnetic Patterns. Computational Intelligence and Neuroscience, Special Issue on EEG/MEG Analysis and Signal Processing. Volume 7, Number 3, pp. 1-13, 2007.
Han Qin, Dejing Dou and Paea LePendu 2007. Discovering Executable Semantic Mappings Between Ontologies. In Proc. International Conference on Ontologies, Databases and Applications of SEmantics (ODBASE 2007). LNCS 4803, pp. 832-849.
Dejing Dou, Gwen Frishkoff, Jiawei Rong, Robert Frank, Allen Malony and Don Tucker 2007. Development of NeuroElectroMagnetic Ontologies (NEMO): A Framework for Mining Brainwave Ontologies. In Proc. 13th ACM International Conference on Knowledge Discovery and Data Mining (KDD'07). pp. 270-279. (Candidate for Best Research Paper Award.)
Jongwan Kim, Dejing Dou, Haishan Liu and Donghwi Kwak 2007. Constructing a User Preference Ontology for Anti-spam Mail Systems. In Proc. 20th Canadian Conference on Artificial Intelligence (Canadian AI'07). LNCS/LNAI 4509, pp. 272-283.
Jiawei Rong, Dejing Dou, Gwen Frishkoff, Robert Frank, Allen Malony and Don Tucker 2007. A Semi-automatic Framework for Mining ERP Patterns. In Proc. 2007 IEEE International Symposium on Data Mining and Information Retrieval (IEEE DMIR-07), pp. 329-334.
Dejing Dou, Jun Li, Han Qin, Shiwoong Kim and Sheng Zhong 2007. Understanding and Utilizing the Hierarchy of Abnormal BGP Events. In Proc. SIAM International Conference on Data Mining 2007 (SDM 2007) (short paper). pp. 467-472.
Daya Wimalasuriya, Sridhar Ramachandran and Dejing Dou 2007. Clustering Zebrafish Genes Based on Frequent-Itemsets and Frequency Levels. In Proc. Pacific-Asia Conference on Knowledge Discovery and Data Mining 2007 (PAKDD 2007) (short paper). LNCS 4426, pp. 912-920.
Dejing Dou, Jeff Z. Pan, Han Qin and Paea LePendu 2006. Towards Populating and Querying the Semantic Web. In Proc. 2nd Int'l workshop on Scalable Semantic Web Knowledge Base Systems (SSWS 2006), co-located with ISWC 2006.
Dejing Dou and Drew McDermott 2006. Deriving Axioms Across Ontologies. In Proc. Int'l Joint Conference on Autonomous Agents and Multi-Agent Systems (AAMAS'06) (short paper). pp. 952-954. (We were invited to submit an extended version of this paper to the post-proceedings of DALT 2006.)
Dejing Dou, Paea LePendu, Shiwoong Kim and Peishen Qi 2006. Integrating Databases into the Semantic Web through an Ontology-based Framework. In Proc. 3rd Int'l Workshop on Semantic Web and Databases (SWDB'06), co-located with ICDE 2006. p. 54.
Dejing Dou and Paea LePendu 2005. Ontology-based Integration for Relational Databases. In Proc. ACM Symposium on Applied Computing (SAC'06). pp. 461-466. (A preliminary short version appeared in ODBASE 2005 as a poster paper, LNCS 3762, pp. 35-36.)
Jun Li, Dejing Dou, Zhen Wu, Shiwoong Kim and Vikash Agarwal 2005. An Internet Routing Forensics Framework for Discovering Rules of Abnormal BGP Events. ACM Computer Communication Review. Volume 35, Number 5, pp. 58-66, October 2005.
Dejing Dou, Drew McDermott and Peishen Qi 2004. Ontology Translation on the Semantic Web. Journal on Data Semantics, Volume II, LNCS 3360, pp. 35-57. (invited submission)
send feedback to: dou AT cs.uoregon.edu