DBMiner: A data mining tool for large relational databases
|
LogicBase | DBMiner | GeoMiner | WebMiner |
---|
DBMiner, a data mining system for interactive mining of multiple-level
knowledge in large relational databases, has been developed based on our
years of research.
The system implements a wide spectrum of data mining functions,
including generalization, characterization, discrimination,
association, classification, and prediction.
By incorporating several interesting data mining techniques,
including attribute-oriented induction, progressive deepening for mining
multiple-level rules, and meta-rule guided knowledge mining, the system
provides a user-friendly, interactive data mining environment with good
performance.
The system has the following distinct features:
Project Overview
A data mining system, DBMiner, has been developed for interactive mining of multiple-level knowledge in large relational databases. It is based on our studies of data mining techniques and our experience in the development of an early system prototype, DBLearn. The system implements a wide spectrum of data mining functions, including generalization, characterization, association, classification, and prediction. By incorporating several interesting data mining techniques, including attribute-oriented induction, statistical analysis, progressive deepening for mining multiple-level knowledge, and meta-rule guided mining, the system provides a user-friendly, interactive data mining environment with good performance.
Project Description
Figure: General architecture of DBMiner
It incorporates several interesting data mining techniques, including attribute-oriented induction, progressive deepening for mining multiple-level rules, and meta-rule guided knowledge mining, and implements a wide spectrum of data mining functions, including generalization, characterization, association, classification, and prediction.
It performs interactive data mining at multiple concept levels on any user-specified set of data in a database using an SQL-like Data Mining Query Language, DMQL, or a graphical user interface. Users may interactively set and adjust various thresholds, control a data mining process, perform roll-up or drill-down at multiple concept levels, and generate different forms of outputs, including generalized relations, generalized feature tables, multiple forms of generalized rules, visual presentation of rules, charts, curves, etc.
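A roll-up of the kind described above can be sketched in Python as follows. This is a minimal illustration, not DBMiner's implementation: the concept hierarchy, the attribute values, and the function name `roll_up` are all hypothetical. It assumes a generalized relation is stored as tuples with counts, and that rolling up replaces a value by its parent concept and merges tuples that become identical.

```python
from collections import Counter

# Hypothetical concept hierarchy: each city maps to its parent concept (province).
CITY_TO_PROVINCE = {
    "Vancouver": "British Columbia",
    "Victoria": "British Columbia",
    "Toronto": "Ontario",
    "Ottawa": "Ontario",
}


def roll_up(relation, attribute_index, hierarchy):
    """Generalize one attribute to its parent concept and merge identical tuples.

    `relation` maps each tuple to the number of base records it represents.
    """
    generalized = Counter()
    for row, count in relation.items():
        value = row[attribute_index]
        parent = hierarchy.get(value, value)  # values without a parent stay as-is
        new_row = row[:attribute_index] + (parent,) + row[attribute_index + 1:]
        generalized[new_row] += count  # identical generalized tuples are merged
    return generalized


# Hypothetical generalized relation: (city, faculty) -> count
base = Counter({
    ("Vancouver", "Science"): 40,
    ("Victoria", "Science"): 10,
    ("Toronto", "Science"): 25,
    ("Ottawa", "Arts"): 5,
})

by_province = roll_up(base, 0, CITY_TO_PROVINCE)
for row, count in sorted(by_province.items()):
    print(row, count)
```

Drill-down is the reverse direction: it needs the finer-grained relation (or the base data) to be retained, since merged counts cannot be split again from the rolled-up result alone.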
Efficient implementation techniques have been explored using different data structures, including generalized relations and multi-dimensional data cubes, and have been integrated with relational database techniques. The data mining process may use user- or expert-defined set-grouping or schema-level concept hierarchies, which can be specified flexibly, adjusted dynamically based on data distribution, and generated automatically for numerical attributes.
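One plausible way to generate a concept hierarchy level for a numerical attribute from the data distribution is equal-frequency binning, sketched below. The source does not say which method DBMiner uses, so the technique, the function names, the sample values, and the level names are all assumptions made for illustration.

```python
from bisect import bisect_left


def equal_frequency_cuts(values, num_bins):
    """Return the upper bounds of all bins except the last.

    Each bin holds roughly the same number of values, so the cut points
    follow the data distribution rather than fixed-width intervals.
    """
    if num_bins < 1 or len(values) < num_bins:
        raise ValueError("need at least one value per bin")
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // num_bins - 1] for i in range(1, num_bins)]


def generalize(value, cuts, labels):
    """Map a numeric value to the label of the bin it falls into."""
    return labels[bisect_left(cuts, value)]


# Hypothetical salaries in thousands of dollars.
salaries = [20, 22, 25, 30, 31, 35, 60, 80, 120]
labels = ["low", "medium", "high"]  # hypothetical higher-level concepts

cuts = equal_frequency_cuts(salaries, len(labels))
print(cuts)
print(generalize(27, cuts, labels))
print(generalize(80, cuts, labels))
```

Because the cut points are recomputed from the values, rerunning the function on a different data set adjusts the ranges to that set's distribution.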
Both UNIX and PC (Windows/NT) versions of the system adopt a client/server architecture. The latter communicates with various commercial database systems for data mining using the ODBC technology.
Major functional modules:
Figure: Knowledge discovery modules of DBMiner
The characterizer generalizes a set of task-relevant data into a generalized relation which can then be viewed at multiple concept levels from different angles. In particular, it derives a set of characteristic rules which summarize the general characteristics of a set of user-specified data (called the target class). For example, the symptoms of a specific disease can be summarized by a characteristic rule.
A discriminator discovers a set of discriminant rules which summarize the features that distinguish the class being examined (the target class) from other classes (called contrasting classes). For example, to distinguish one disease from others, a discriminant rule summarizes the symptoms that discriminate this disease from others.
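To make the idea of discrimination concrete, the sketch below scores each feature by the share of its occurrences that fall in the target class. This is one simple measure chosen for illustration, not necessarily the one DBMiner uses, and the symptom counts and the function name are hypothetical.

```python
def discrimination_weights(target_counts, contrast_counts):
    """Score each target-class feature by its share of occurrences in the target class.

    A weight near 1.0 means the feature appears almost only in the target
    class; a weight near 0.0 means it is mostly seen in contrasting classes.
    """
    weights = {}
    for feature, in_target in target_counts.items():
        in_contrast = contrast_counts.get(feature, 0)
        total = in_target + in_contrast
        if total > 0:  # skip features that never occur
            weights[feature] = in_target / total
    return weights


# Hypothetical counts of patients showing each symptom.
target = {"fever": 30, "rash": 45, "cough": 10}   # the disease being examined
contrast = {"fever": 90, "cough": 40}             # all other diseases combined

weights = discrimination_weights(target, contrast)
for symptom, weight in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"{symptom}: {weight:.2f}")
```

In this made-up data, "rash" never occurs in the contrasting classes, so it would be the strongest candidate for a discriminant rule.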
An association rule finder discovers a set of association rules (in the form of "") at multiple concept levels from the relevant set(s) of data in a database. For example, one may discover a set of symptoms frequently occurring together with certain kinds of diseases and further study the reasons behind them.
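Association rules are commonly evaluated with two standard measures, support and confidence. The sketch below computes both for the symptom and disease example; it assumes these measures for illustration (the text above does not name the ones DBMiner applies), and the patient records and function names are hypothetical.

```python
def support(records, itemset):
    """Fraction of records that contain every item in `itemset`."""
    itemset = set(itemset)
    matches = sum(1 for record in records if itemset <= set(record))
    return matches / len(records)


def confidence(records, antecedent, consequent):
    """Among records containing the antecedent, the fraction that also contain the consequent."""
    base = support(records, antecedent)
    if base == 0:
        return 0.0
    return support(records, set(antecedent) | set(consequent)) / base


# Hypothetical patient records: observed symptoms plus the diagnosed disease.
records = [
    {"fever", "cough", "flu"},
    {"fever", "cough", "flu"},
    {"fever", "rash", "measles"},
    {"cough", "cold"},
]

print(support(records, {"fever", "cough"}))
print(confidence(records, {"fever", "cough"}, {"flu"}))
print(confidence(records, {"fever"}, {"flu"}))
```

Mining at a higher concept level would amount to replacing items by their parent concepts before counting, which generally raises support because more records match.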
A classifier analyzes a set of training data (i.e., a set of objects whose class label is known) and constructs a model for each class based on the features in the data. A set of classification rules is generated by such a classification process, which can be used to classify future data and develop a better understanding of each class in the database. For example, one may classify diseases and provide the symptoms which describe each class or subclass of diseases.
A predictor predicts the possible values of some missing data or the value distribution of certain attributes in a set of objects. This involves finding the set of attributes relevant to the attribute of interest (by some statistical analysis) and predicting the value distribution based on the set of data similar to the selected object(s). For example, an employee's potential salary can be predicted based on the salary distribution of similar employees in the company.
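The salary example can be sketched as follows. The sketch assumes the relevance analysis has already selected the relevant attributes and treats "similar" as an exact match on them; both are simplifications, and the employee records, attribute names, and function name are hypothetical.

```python
from collections import Counter


def predict_distribution(records, target, query, relevant):
    """Estimate the value distribution of `target` from records similar to `query`.

    A record counts as similar when it matches `query` on every attribute
    listed in `relevant` (assumed to come from a prior relevance analysis).
    """
    similar = [r for r in records if all(r[a] == query[a] for a in relevant)]
    counts = Counter(r[target] for r in similar)
    total = sum(counts.values())
    if total == 0:
        return {}  # no similar records, so no prediction
    return {value: count / total for value, count in counts.items()}


# Hypothetical employee data with salaries already generalized to ranges.
employees = [
    {"dept": "sales", "level": "senior", "salary": "60-80K"},
    {"dept": "sales", "level": "senior", "salary": "60-80K"},
    {"dept": "sales", "level": "senior", "salary": "80-100K"},
    {"dept": "sales", "level": "junior", "salary": "40-60K"},
    {"dept": "research", "level": "senior", "salary": "80-100K"},
]

query = {"dept": "sales", "level": "senior"}
distribution = predict_distribution(employees, "salary", query, ("dept", "level"))
for salary_range, probability in sorted(distribution.items()):
    print(f"{salary_range}: {probability:.2f}")
```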
A meta-rule guided miner is a data mining mechanism which takes a user-specified meta-rule form, such as "" as a pattern to confine the search for desired rules. For example, one may specify the discovered rules to be in the form of "" in order to find the relationships between a student's major and his/her GPA in a university database.
A data evolution evaluator evaluates the data evolution regularities for certain objects whose behavior changes over time. This may include characterization, classification, association, or clustering of time-related data. For example, one may find the general characteristics of the companies whose stock price has gone up over 20% last year or evaluate the trend or particular growth patterns of certain stocks.
A deviation evaluator evaluates the deviation patterns for a set of
task-relevant data in the database.
For example, one may discover and evaluate a set of stocks whose behavior
deviates from the trend of the majority of stocks during a certain period
of time.
The module contains the following three functions:
Three user interfaces (UNIX-based, Windows/NT-based, and WWW/Netscape-based GUIs) have been developed to allow users to interactively discover multiple-level knowledge in large relational databases. The system integrates well with existing commercial database systems, delivers high performance, and is robust at handling noise and exceptional data.
The DBMiner system is currently being extended in several directions, as illustrated below.
Data Mining Research Project Team,
Database Systems Research Laboratory
Ph.D., University of Wisconsin-Madison, 1985.
Professor of the School of Computing Science and Director of Database
Systems Research Laboratory, Simon Fraser University, Canada.
He has conducted research in the areas of knowledge discovery in databases, deductive databases, object-oriented databases, spatial databases, multimedia databases, and logic programming, with over 100 journal and conference publications. He is known for his work on knowledge discovery in databases and has been invited to give talks or tutorials at several international conferences (including SSD'93, ICDE'95, CIKM'95, and SIGMOD'96), universities, and industry firms in many countries. His research has been supported by the Natural Sciences and Engineering Research Council (NSERC) of Canada (1988-present), the Network of Centres of Excellence of Canada (IRIS-2 Project Leader for the project "data mining and knowledge discovery in large databases", 1994-1998), Hughes Research Laboratories (1995-1996), B.C. Science Council, MPR Teltech Ltd., and some other funding agencies.
He has served as a program committee member for over 20 international conferences and workshops, including ICDE'95 (PC vice-chairman), DOOD'95, VLDB'96, SIGMOD'96, and SSD'97. He is currently the program committee co-chairman of the Second Int'l Conf. on Knowledge Discovery and Data Mining (KDD'96), the workshop co-organizer of the SIGMOD'96 workshop on Research Issues on Data Mining and Knowledge Discovery (DMKD'96), and the guest co-editor of the special issue on data mining and knowledge discovery for the IEEE Transactions on Knowledge and Data Engineering. He is also an editor for IEEE Transactions on Knowledge and Data Engineering, Journal of Intelligent Information Systems, and Journal of Data Mining and Knowledge Discovery.
The research has been supported by the following funding agencies and
industry.