DBMiner Project

DBMiner:
A data mining tool for large relational databases

Project Overview

Major functional modules

Implementation of DBMiner

Further Development of DBMiner

Research Project Team

Research Funding

Publications

Try DBMiner...


LogicBase DBMiner GeoMiner WebMiner

DBMiner, a data mining system for interactive mining of multiple-level knowledge in large relational databases, has been developed based on our years of research. The system implements a wide spectrum of data mining functions, including generalization, characterization, discrimination, association, classification, and prediction. By incorporating several interesting data mining techniques, including attribute-oriented induction, progressive deepening for mining multiple-level rules, and meta-rule guided knowledge mining, the system provides a user-friendly, interactive data mining environment with good performance.

Project Overview

A data mining system, DBMiner, has been developed for interactive mining of multiple-level knowledge in large relational databases. It is based on our studies of data mining techniques and our experience in the development of an early system prototype, DBLearn. The system implements a wide spectrum of data mining functions, including generalization, characterization, association, classification, and prediction. By incorporating several interesting data mining techniques, including attribute-oriented induction, statistical analysis, progressive deepening for mining multiple-level knowledge, and meta-rule guided mining, the system provides a user-friendly, interactive data mining environment with good performance.

Project Description

Figure: General architecture of DBMiner

The system has the following distinct features:

It incorporates several interesting data mining techniques, including attribute-oriented induction, progressive deepening for mining multiple-level rules and meta-rule guided knowledge mining, etc., and implements a wide spectrum of data mining functions including generalization, characterization, association, classification, and prediction.
It performs interactive data mining at multiple concept levels on any user-specified set of data in a database using an SQL-like Data Mining Query Language, DMQL, or a graphical user interface. Users may interactively set and adjust various thresholds, control a data mining process, perform roll-up or drill-down at multiple concept levels, and generate different forms of outputs, including generalized relations, generalized feature tables, multiple forms of generalized rules, visual presentation of rules, charts, curves, etc.
Efficient implementation techniques have been explored using different data structures, including generalized relations and multiple-dimensional data cubes, and have been integrated with relational database techniques. The data mining process may utilize user- or expert-defined set-grouping or schema-level concept hierarchies, which can be specified flexibly, adjusted dynamically based on data distribution, and generated automatically for numerical attributes.
Both UNIX and PC (Windows/NT) versions of the system adopt a client/server architecture. The latter communicates with various commercial database systems for data mining using the ODBC technology.
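As an illustration of generating a concept hierarchy automatically for a numerical attribute, the sketch below splits the observed values into ranges that hold roughly equal numbers of records and maps each raw value to a range label. It is only a sketch under stated assumptions: equal-frequency binning is one simple choice and is not claimed to be the method DBMiner uses, and the function names, labels, and salary figures are all hypothetical.

```python
from bisect import bisect_right

def equal_frequency_cuts(values, n_bins):
    """Return n_bins - 1 cut points so that each range holds roughly
    the same number of observed values (equal-frequency binning)."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[i * n // n_bins] for i in range(1, n_bins)]

def generalize(value, cuts, labels):
    """Map a raw numeric value to the label of the range it falls into."""
    return labels[bisect_right(cuts, value)]

# Hypothetical salaries in thousands of dollars.
salaries = [21, 23, 25, 30, 34, 41, 48, 52, 75, 90]
cuts = equal_frequency_cuts(salaries, 3)
labels = ["low", "medium", "high"]

for s in (25, 34, 90):
    print(s, "->", generalize(s, cuts, labels))
```

Because the cut points come from the data, rerunning `equal_frequency_cuts` on a different set of task-relevant data adjusts the ranges to that data's distribution.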

Major functional modules:

Figure: Knowledge discovery modules of DBMiner
DBMiner characterizer

The characterizer generalizes a set of task-relevant data into a generalized relation which can then be viewed at multiple concept levels from different angles. In particular, it derives a set of characteristic rules which summarize the general characteristics of a set of user-specified data (called the target class). For example, the symptoms of a specific disease can be summarized by a characteristic rule.
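The generalization step can be sketched as follows: each attribute value is replaced by its parent in a concept hierarchy, and tuples that become identical are merged while a count is kept. This is a minimal sketch of the general idea of attribute-oriented induction, not DBMiner's implementation; the hierarchy, the patient records, and the function name are hypothetical.

```python
from collections import Counter

# Hypothetical concept hierarchy: each low-level value maps to its parent concept.
HIERARCHY = {
    "fever": "systemic", "fatigue": "systemic",
    "cough": "respiratory", "wheezing": "respiratory",
}

def generalize_relation(tuples, hierarchy):
    """Replace each value by its parent concept (if it has one) and merge
    identical generalized tuples, keeping a count for each."""
    generalized = Counter()
    for row in tuples:
        generalized[tuple(hierarchy.get(v, v) for v in row)] += 1
    return generalized

# Hypothetical target class: (symptom, age group) of patients with one disease.
patients = [
    ("fever", "adult"), ("fatigue", "adult"),
    ("cough", "child"), ("fever", "adult"),
]
relation = generalize_relation(patients, HIERARCHY)
total = sum(relation.values())
for row, count in relation.most_common():
    print(row, f"{100 * count / total:.0f}% of the target class")
```

Each row of the resulting generalized relation, together with its share of the target class, can be read as one candidate characteristic rule.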

DBMiner discriminator

A discriminator discovers a set of discriminant rules which summarize the features that distinguish the class being examined (the target class) from other classes (called contrasting classes). For example, to distinguish one disease from others, a discriminant rule summarizes the symptoms that discriminate this disease from others.
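One simple way to quantify how well a generalized tuple separates the target class from the contrasting classes is to divide its count in the target class by its count across all classes. The sketch below computes such a weight; the measure, the function name, and the symptom data are assumptions chosen for illustration and are not taken from the DBMiner implementation.

```python
from collections import Counter

def discrimination_weights(target_rows, contrast_rows):
    """For each generalized tuple seen in the target class, return the
    fraction of its occurrences that belong to the target class."""
    target, contrast = Counter(target_rows), Counter(contrast_rows)
    return {
        row: target[row] / (target[row] + contrast[row])
        for row in target
    }

# Hypothetical generalized symptom tuples for one disease and for other diseases.
target_class = [("fever", "rash"), ("fever", "rash"), ("fever", "cough")]
contrasting = [("fever", "cough"), ("fever", "cough"), ("fever", "cough")]

weights = discrimination_weights(target_class, contrasting)
for row, w in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(row, f"weight={w:.2f}")
```

A weight near 1 means the tuple occurs almost only in the target class and is a good basis for a discriminant rule; a weight near 0 means it is shared with the contrasting classes.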

DBMiner association rule finder

An association rule finder discovers a set of association rules (in the form of "A1 ∧ ... ∧ Ai → B1 ∧ ... ∧ Bj") at multiple concept levels from the relevant set(s) of data in a database. For example, one may discover a set of symptoms frequently occurring together with certain kinds of diseases and further study the reasons behind them.

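Association rules are commonly evaluated by support (how often the items occur together) and confidence (how often the consequent holds when the antecedent does). The sketch below computes both on a handful of hypothetical patient records, then lifts the items one level up a hypothetical concept hierarchy to show how the same computation applies at a higher concept level. It is not DBMiner's algorithm; all names and data are assumptions.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """support(antecedent and consequent) / support(antecedent).
    Assumes the antecedent occurs at least once."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

# Hypothetical patient records: observed symptoms plus the diagnosed disease.
records = [
    {"fever", "cough", "flu"},
    {"fever", "flu"},
    {"cough", "cold"},
    {"fever", "cough", "flu"},
]
print(support(records, {"fever", "flu"}))       # fever and flu together
print(confidence(records, {"fever"}, {"flu"}))  # fever -> flu

# Lift diseases to a higher concept level (hypothetical hierarchy).
PARENT = {"flu": "respiratory illness", "cold": "respiratory illness"}
lifted = [{PARENT.get(item, item) for item in t} for t in records]
print(support(lifted, {"cough", "respiratory illness"}))
```

Rules at a higher concept level tend to have higher support, which is why a multiple-level miner can start at a high level and deepen only where support is sufficient.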
DBMiner data classifier

A classifier analyzes a set of training data (i.e., a set of objects whose class label is known) and constructs a model for each class based on the features in the data. A set of classification rules is generated by such a classification process, which can be used to classify future data and develop a better understanding of each class in the database. For example, one may classify diseases and provide the symptoms which describe each class or subclass of diseases.
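To make the idea of deriving classification rules from labeled training data concrete, the sketch below uses a deliberately simple single-attribute rule learner: for each value of one attribute it predicts the majority class seen in training. This technique is swapped in purely for illustration and is not claimed to be the method used by the DBMiner classifier; the function names and the training records are hypothetical.

```python
from collections import Counter, defaultdict

def learn_rules(training, attribute):
    """Map each value of `attribute` to the majority class label
    among the training objects that have that value."""
    by_value = defaultdict(Counter)
    for features, label in training:
        by_value[features[attribute]][label] += 1
    return {
        value: counts.most_common(1)[0][0]
        for value, counts in by_value.items()
    }

def classify(rules, features, attribute, default=None):
    """Apply the learned rules to a new object; unseen values get `default`."""
    return rules.get(features[attribute], default)

# Hypothetical training data: (features, known class label).
training = [
    ({"symptom": "rash"}, "measles"),
    ({"symptom": "rash"}, "measles"),
    ({"symptom": "cough"}, "flu"),
    ({"symptom": "cough"}, "flu"),
    ({"symptom": "cough"}, "measles"),
]
rules = learn_rules(training, "symptom")
print(rules)
print(classify(rules, {"symptom": "cough"}, "symptom"))
```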

DBMiner predictor

A predictor predicts the possible values of some missing data or the value distribution of certain attributes in a set of objects. This involves finding the set of attributes relevant to the attribute of interest (by some statistical analysis) and predicting the value distribution based on the set of data similar to the selected object(s). For example, an employee's potential salary can be predicted based on the salary distribution of similar employees in the company.
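The salary example can be sketched as follows: select the objects that match the query on the relevant attributes, then report the distribution of the attribute of interest among them. In this sketch the relevant attributes are simply given as input, whereas the text above describes finding them by statistical analysis; the function name, attribute names, and employee records are hypothetical.

```python
from collections import Counter

def predict_distribution(objects, query, relevant, target):
    """Return the value distribution of `target` among the objects that
    agree with `query` on every attribute listed in `relevant`."""
    similar = [o for o in objects if all(o[a] == query[a] for a in relevant)]
    counts = Counter(o[target] for o in similar)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()} if total else {}

# Hypothetical employee records with salary already generalized into ranges.
employees = [
    {"dept": "sales", "level": "senior", "salary": "60-80K"},
    {"dept": "sales", "level": "senior", "salary": "60-80K"},
    {"dept": "sales", "level": "senior", "salary": "80-100K"},
    {"dept": "sales", "level": "junior", "salary": "40-60K"},
    {"dept": "research", "level": "senior", "salary": "80-100K"},
]
dist = predict_distribution(
    employees, {"dept": "sales", "level": "senior"}, ["dept", "level"], "salary"
)
for salary_range, share in dist.items():
    print(salary_range, f"{share:.0%}")
```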

DBMiner meta-rule guided miner

A meta-rule guided miner is a data mining mechanism which takes a user-specified meta-rule form, such as "P(x, y) ∧ Q(y, z) → R(x, z)", as a pattern to confine the search for desired rules. For example, one may specify the discovered rules to be in the form of "major(s: student, x) ∧ P(s, y) → gpa(s, z)" in order to find the relationships between a student's major and his/her gpa in a university database.

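The student example can be sketched as a search in which the meta-rule fixes one antecedent attribute (major) and the consequent attribute (gpa), while a predicate variable ranges over a list of candidate attributes. Only rules matching that pattern are generated and checked. This is a minimal sketch under assumed thresholds; the function name, the candidate attributes, and the student records are hypothetical and do not come from the DBMiner implementation.

```python
from collections import Counter, defaultdict

def mine_with_meta_rule(rows, fixed, candidates, target,
                        min_conf=0.75, min_count=2):
    """Instantiate the predicate variable with each candidate attribute and
    keep rules (fixed=x AND candidate=y -> target=z) that are frequent and
    confident enough."""
    rules = []
    for p in candidates:  # each candidate attribute instantiates the variable
        groups = defaultdict(Counter)
        for r in rows:
            groups[(r[fixed], r[p])][r[target]] += 1
        for (x, y), outcomes in groups.items():
            z, hits = outcomes.most_common(1)[0]
            total = sum(outcomes.values())
            if hits >= min_count and hits / total >= min_conf:
                rules.append((fixed, x, p, y, target, z, hits / total))
    return rules

# Hypothetical university data.
students = [
    {"major": "cs", "birthplace": "Canada", "status": "grad", "gpa": "high"},
    {"major": "cs", "birthplace": "China", "status": "grad", "gpa": "high"},
    {"major": "cs", "birthplace": "Canada", "status": "undergrad", "gpa": "medium"},
    {"major": "cs", "birthplace": "China", "status": "undergrad", "gpa": "low"},
    {"major": "math", "birthplace": "Canada", "status": "grad", "gpa": "high"},
]
rules = mine_with_meta_rule(students, "major", ["birthplace", "status"], "gpa")
for f, x, p, y, t, z, conf in rules:
    print(f"{f}={x} AND {p}={y} -> {t}={z} (confidence {conf:.0%})")
```

Restricting the search to the shape given by the meta-rule means that rules about unrelated attribute combinations are never enumerated.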
DBMiner evolution evaluator

A data evolution evaluator evaluates the data evolution regularities for certain objects whose behavior changes over time. This may include characterization, classification, association, or clustering of time-related data. For example, one may find the general characteristics of the companies whose stock price has gone up over 20% last year or evaluate the trend or particular growth patterns of certain stocks.

DBMiner deviation evaluator

A deviation evaluator evaluates the deviation patterns for a set of task-relevant data in the database. For example, one may discover and evaluate a set of stocks whose behavior deviates from the trend of the majority of stocks during a certain period of time. The module contains the following three functions:

recognizes or identifies the general trend and/or behavior for data in the database,
detects the set of data which deviates from such a trend or behavior, and
summarizes the general characteristics of deviation data.
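The three functions above can be sketched with a basic statistical test: take the mean as the general trend, flag objects that lie more than k standard deviations from it, and summarize the flagged set. The threshold, the function name, and the stock returns are hypothetical, and the mean/standard-deviation test is only one simple choice, not necessarily what the DBMiner deviation evaluator uses.

```python
from statistics import mean, pstdev

def find_deviations(returns, k=1.5):
    """Return the general trend (mean) and the objects whose value lies
    more than k population standard deviations away from it."""
    values = list(returns.values())
    trend, spread = mean(values), pstdev(values)
    deviating = {
        name: r for name, r in returns.items()
        if abs(r - trend) > k * spread
    }
    return trend, deviating

# Hypothetical returns of five stocks over one period.
returns = {"A": 0.04, "B": 0.05, "C": 0.06, "D": 0.05, "E": 0.45}

trend, deviating = find_deviations(returns)            # steps 1 and 2
summary = {                                            # step 3
    "count": len(deviating),
    "mean_return": mean(list(deviating.values())) if deviating else None,
}
print(f"trend={trend:.2f}", deviating, summary)
```

With very small samples a single extreme value inflates the standard deviation, so the threshold k has to be chosen with the sample size in mind.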

DBMiner user interfaces

Three graphical user interfaces (UNIX-based, Windows/NT-based, and WWW/Netscape-based) have been developed to allow users to interactively discover multiple-level knowledge in large relational databases. The system integrates well with existing commercial database systems, delivers high performance, and is robust in handling noise and exceptional data.

Implementation of DBMiner

Refer to the KDD publication of DBLab, SFU.

Further Development of DBMiner

The DBMiner system is currently being extended in several directions, as listed below.

Further enhancement of the power and efficiency of data mining in relational database systems, including the improvement of system performance and rule discovery quality for the existing functional modules, and the development of techniques for mining new kinds of rules, especially on time-related data.
Integration, maintenance and application of discovered knowledge, including incremental update of discovered rules, removal of redundant or less interesting rules, merging of discovered rules into a knowledge-base, intelligent query answering using discovered knowledge, and the construction of multiple layered databases.
Extension of data mining techniques towards advanced and/or special-purpose database systems, including extended-relational, object-oriented, text, spatial, temporal, and heterogeneous databases. Currently, two such data mining systems, GeoMiner and WebMiner, for mining knowledge in spatial databases and the Internet information-base respectively, are under design and construction.

Data Mining Research Project Team,
Database Systems Research Laboratory

Jiawei Han.
Ph.D., University of Wisconsin-Madison, 1985.
Professor of the School of Computing Science and Director of Database Systems Research Laboratory, Simon Fraser University, Canada.

He has conducted research in the areas of knowledge discovery in databases, deductive databases, object-oriented databases, spatial databases, multimedia databases, and logic programming, with over 100 journal and conference publications. He is known for his work on knowledge discovery in databases and has been invited to give talks or tutorials at several international conferences (including SSD'93, ICDE'95, CIKM'95, and SIGMOD'96), universities, and industry firms in many countries. His research has been supported by the Natural Sciences and Engineering Research Council (NSERC) of Canada (1988-present), the Networks of Centres of Excellence of Canada (IRIS-2 Project Leader for the project "data mining and knowledge discovery in large databases", 1994-1998), Hughes Research Laboratories (1995-1996), B.C. Science Council, MPR Teltech Ltd., and other funding agencies.
He has served as a program committee member for over 20 international conferences and workshops, including ICDE'95 (PC vice-chairman), DOOD'95, VLDB'96, SIGMOD'96, and SSD'97. He is currently the program committee co-chairman of the Second Int'l Conf. on Knowledge Discovery and Data Mining (KDD'96), the workshop co-organizer of the SIGMOD'96 workshop on Research Issues on Data Mining and Knowledge Discovery (DMKD'96), and the guest co-editor of the special issue on data mining and knowledge discovery for the IEEE Transactions on Knowledge and Data Engineering. He is also an editor for IEEE Transactions on Knowledge and Data Engineering, Journal of Intelligent Information Systems, and Journal of Data Mining and Knowledge Discovery.
Sonny Chee. (NSERC postgraduate scholarship holder)
Ph.D. student, Computing Science, Simon Fraser University.
He plans to work on the development of the new modules for mining unstructured data in the DBMiner system.
Shan Chen.
M.Sc. student, Computing Science, Simon Fraser University. She has been working on the application of statistical techniques in data mining, and, in particular, the DBMiner deviation evaluator.
Jenny Chiang. (NSERC postgraduate scholarship holder)
M.Sc. student, Computing Science, Simon Fraser University.
She is working on the cube (multiple-dimensional database)-based DBMiner and the performance improvements of the DBMiner system.
Yongjian Fu. (BC Science Council GREAT award scholarship holder, CATA'95 (Canadian Advanced Technology Association) scholarship awardee)
Ph.D. student, Computing Science, Simon Fraser University.
He is the major implementor of DBMiner version 1.0 and is currently working on multiple-level mining of association rules and meta-rule guided data mining.
Wan Gong.
M.Sc. student, Computing Science, Simon Fraser University.
She is working on the classifier of the DBMiner system.
Micheline Kamber. (NSERC postgraduate scholarship awardee)
Ph.D. student, Computing Science, Simon Fraser University. She has been working on interestingness measurements for discovered rules and is planning to work on meta-rule guided mining of different kinds of rules.
Krzysztof Koperski.
Ph.D. student, Computing Science, Simon Fraser University.
He is working on spatial data mining and the GeoMiner project. He is also interested in spatial reasoning and spatial object-oriented databases. He has also been working on the testing of the DBMiner System.
Deyi Li.
Visiting Professor, Computing Science, Simon Fraser University. Ph.D. in Computing Science (1983), University of Edinburgh, U.K. Author of a few scientific books including "A Prolog Database System" and "A Fuzzy Prolog Database System".
His major research interests include database and knowledge-base systems, knowledge discovery in databases, deductive databases, logic programming, and artificial intelligence.
Yijun Lu.
M.Sc. student, Computing Science, Simon Fraser University. He holds an M.Sc. degree, Mathematics and Statistics, Simon Fraser University.
He is working on the generation, specification, and adjustment of concept hierarchies in the DBMiner system.
Amynmohamed Rajan.
M.Sc. student, Computing Science, Simon Fraser University.
He is working on the development of the user-interfaces in the DBMiner system.
Nebojsa Stefanovic.
M.Sc. student, Computing Science, Simon Fraser University.
He is working on data mining in spatial database systems and the GeoMiner project. He has also been doing the association and classification visualization for the DBMiner System.
Wei Wang.
M.Sc. student, Computing Science, Simon Fraser University.
He has been implementing the GUI for the DBMiner system on PCs and is working on the predictor of the DBMiner system.
Lara Winstone. (NSERC postgraduate scholarship holder)
M.Sc. student, Computing Science, Simon Fraser University.
She plans to work on the development of the classification techniques in the DBMiner system.
Betty Xia.
M.Sc. student, Computing Science, Simon Fraser University.
She holds an M.Sc. degree, Computer Science, Jilin University, China. She is working on the development of the PC interface of the DBMiner system.
Osmar R. Zaïane. (Quebec postgraduate scholarship holder)
Ph.D. student, Computing Science, Simon Fraser University.
His current research focuses on resource and knowledge discovery in global network information systems (inter/intranet), and the WebMiner project.
He developed the WWW/Netscape-based DBMiner user interface.
He is also working on the TeleLearning project, designing and implementing a multimedia database.

Research Funding

The research has been supported by the following funding agencies and industry.

Natural Sciences and Engineering Research Council of Canada (NSERC)

Networks of Centres of Excellence of Canada (IRIS-II:HMI-5, IC-2), administered by PRECARN Associates, Inc.
British Columbia Science Council.
MPR Teltech Ltd.
Hughes Research Laboratories, USA.
Centre for Systems Science, Simon Fraser University.

Selected Publications


Last updated: June 11, 1996. Page maintained by Osmar R. Zaïane (zaiane@cs.sfu.ca)
