MyDataMining blog has new face!

January 11, 2010

Hi, I have moved my blog to new website named mydatamine.com. It is a Compilation of the Web’s Best Resources for Data Mining geeks. The blog specialized on algorithms, data modeling, web mining, evolutionary computation, predictive analytic, business intelligence, data visualization, databases, datasets, software and tools. Have a look :)

Ridzuan Md


Data mining tasks

May 1, 2008

There are four major tasks in Data Mining:

  • Classification
  • Clustering
  • Association Rule Discovery
  • Sequential Pattern Discovery

Classification

In this task, data will be defined in terms of attributes, one of which is the class. It will find a model for class attribute as a function of the values of other (predictor) attributes, such that previously unseen records can be assigned a class as accurately as possible.


Fuzzy rules

April 16, 2008

I just get started on research in fuzzy rules which I believed can increase the understanding of mining complex rules from medical datasets. Here are some readings that I should begin with:

Fuzzy set

Fuzzy logic

Fuzzy system

Genetic fuzzy systems


Papers on evolutionary fuzzy rules:

Combining Evolutionary and Fuzzy..

Balancing accuracy and interpretability ..

Structure identification of fuzzy classifiers..

Learning fuzzy classification rules..

Hybrid evolutionary MO…


Rule learner (or Rule Induction)

April 14, 2008

It is also known as Separate-And-Conquer method. This method apply an iterative process consisting in first generating a rule that covers a subset of the training examples and then removing all examples covered by the rule from the training set. This process is repeated iteratively until there are no examples left to cover. The final rule set is the collection of the rules discovered at every iteration of the process [13]. Some examples of these kinds of systems are:

  • OneR

OneR or “One Rule” is a simple algorithm proposed by Holt. The OneR builds one rule for each attribute in the training data and then selects the rule with the smallest error rate as its ‘one rule’. To create a rule for an attribute, the most frequent class for each attribute value must be determined. The most frequent class is simply the class that appears most often for that attribute value. A rule is simply a set of attribute values bound to their majority class. OneR selects the rule with the lowest error rate. In the event that two or more rules have the same error rate, the rule is chosen at random.

R.C. Holte (1993). Very simple classification rules perform well on most commonly used datasets. Machine Learning. 11:63-91.

  • Ridor

Ridor algorithm is the implementation of a RIpple-DOwn Rule learner proposed by Gaines and Compton. It generates a default rule first and then the exceptions for the default rule with the least (weighted) error rate. Then it generates the “best” exceptions for each exception and iterates until pure. Thus it performs a tree-like expansion of exceptions. The exceptions are a set of rules that predict classes other than the default. IREP is used to generate the exceptions.

Brian R. Gaines, Paul Compton (1995). Induction of Ripple-Down Rules Applied to Modeling Large Databases. J. Intell. Inf. Syst.. 5(3):211-228.

  • PART

PART is a separate-and-conquer rule learner proposed by Eibe and Witten. The algorithm producing sets of rules called ‘decision lists’ which are ordered set of rules. A new data is compared to each rule in the list in turn, and the item is assigned the category of the first matching rule (a default is applied if no rule successfully matches). PART builds a partial C4.5 decision tree in each iteration and makes the “best” leaf into a rule. The algorithm is a combination of C4.5 and RIPPER rule learning.

Eibe Frank, Ian H. Witten: Generating Accurate Rule Sets Without Global Optimization. In: Fifteenth International Conference on Machine Learning, 144-151, 1998.

  • JRip (RIPPER)

JRip implements a propositional rule learner, Repeated Incremental Pruning to Produce Error Reduction (RIPPER), which was proposed by William W. Cohen as an optimized version of IREP. Ripper builds a ruleset by repeatedly adding rules to an empty ruleset until all positive examples are covered. Rules are formed by greedily adding conditions to the antecedent of a rule (starting with empty antecendent) until no negative examples are covered. After a ruleset is constructed, an optimization postpass massages the ruleset so as to reduce its size and improve its fit to the training data. A combination of cross-validation and minimum-description length techniques is used to prevent overfitting.

Cohen, W. W. 1995. Fast effective rule induction. In Machine Learning: Proceedings of the Twelfth International Conference, Lake Tahoe, California. http://citeseer.ist.psu.edu/cohen95fast.html

  • DecisionTable

DecisionTable algorithm builds and using a simple decision table majority classifier as proposed by Kohavi. It summarizes the dataset with a ‘decision table’ which contains the same number of attributes as the original dataset. Then, a new data item is assigned a category by finding the line in the decision table that matches the non-class values of the data item. DecisionTable employs the wrapper method to find a good subset of attributes for inclusion in the table. By eliminating attributes that contribute little or nothing to a model of the dataset, the algorithm reduces the likelihood of over-fitting and creates a smaller and condensed decision table.

Ron Kohavi: The Power of Decision Tables. In: 8th European Conference on Machine Learning, 174-189, 1995.

  • ConjunctiveRule

ConjuctiveRule algorithm implements a single conjunctive rule learner that can predict for numeric and nominal class labels. A rule consists of antecedents “AND”ed together and the consequent (class value) for the classification/regression. In this case, the consequent is the distribution of the available classes (or mean for a numeric value) in the dataset. If the test instance is not covered by this rule, then it’s predicted using the default class distributions/value of the data not covered by the rule in the training data. This learner selects an antecedent by computing the Information Gain of each antecedent and prunes the generated rule using Reduced Error Pruning (REP) or simple pre-pruning based on the number of antecedents. For classification, the Information of one antecedent is the weighted average of the entropies of both the data covered and not covered by the rule.

http://www.dbs.informatik.uni-muenchen.de/Lehre/KDD_Praktikum/weka/doc/weka/ classifiers/rules/ConjunctiveRule.html


Decision Trees

April 14, 2008

It is also known as Divide-And-Conquer method. This method constructs a rule by dividing overly general rules into a set of rules, which correspond to conjunction subsets of the examples. It then continues recursively with those rules for which the corresponding subsets contain both positive and negative examples. The final rule set consists of all specialized rules for which the corresponding sets contain positive examples only. Some examples of these systems are:

  • J48 (C4.5)

J48 algorithm is the Weka implementation of the C4.5 top-down decision tree learner proposed by Quinlan. The algorithm uses the greedy technique and is a variant of ID3, which determines at each step the most predictive attribute, and splits a node based on this attribute. Each node represents a decision point over the value of some attribute. J48 attempts to account for noise and missing data. It also deals with numeric attributes by determining where thresholds for decision splits should be placed. The main parameters that can be set for this algorithm are the confidence threshold, the minimum number of instances per leaf and the number of folds for reduced error pruning.

Ross Quinlan (1993). C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers, San Mateo, CA.

  • ADTree

Alternating decision trees (ADTree) algorithm is a generalization of decision trees, voted decision trees and voted decision stumps. The algorithm applied boosting procedures to decision tree algorithms to produce accurate classifiers. The classifiers are in the form of a majority vote over a number of decision trees but having a smaller and easier to understand classification rules.

  • DecisionStump

DecisionStump algorithm builds simple binary decision ‘stumps’ (1 level decision tress) for both numeric and nominal classification problems. It copes with mission values by extending a third branch from the stump or treating ‘missing’ as a separate attribute value. DecisionStump is usually used in conjunction with a boosting algorithm such as LogitBoost. It does regression (based on mean-squared error) or classification (based on entropy).

Witten, I.H., Frank, E., Trigg, L., Hall, M., Holmes, G., Cunningham, S.J. Weka: Practical machine learning tools and techniques with java implementations. In Proc. ICONIP/ANZIIS/ANNES’99 Int. Workshop: Emerging Knowledge Engineering and Connectionist-Based Info. Systems. (1999) 192-196

  • RandomTree

RandomTree is an algorithm for constructing a tree that considers K randomly chosen attributes at each node. It performs no pruning.

http://www.lsi.upc.es/~bejar/apren/docum/doc/weka/classifiers/trees/RandomTree.html

  • REPTree

REPTree algorithm is a fast decision tree learner. It builds a decision/regression tree using information gain/variance and prunes it using reduced-error pruning (with back-fitting). The algorithm only sorts values for numeric attributes once. Missing values are dealt with by splitting the corresponding instances into pieces (i.e. as in C4.5).

http://www.dbs.informatik.uni-muenchen.de/Lehre/KDD_Praktikum/weka/doc/weka/classifiers/trees/REPTree.html


Follow

Get every new post delivered to your Inbox.