Rule Extraction systems are a family of inductive learning techniques that build a predictive model from a dataset. They use a separate-and-conquer strategy to induce rules directly from a given training set and build a knowledge repository that aids decision making.
We want to build a system that takes a dataset as input and returns the set of rules that best splits the dataset across the defined label using the descriptive features.
Let's say we input a dataset whose features are the sociodemographic variables of each data point, and we want a set of rules that tell us the sociodemographic makeup of people who visited our homepage vs people who did not. Our dataset looks like:
| AgeBracket | IncomeBracket | Gender | Visited Home Page |
|---|---|---|---|
| 3 | 1 | F | 1 |
| 2 | 2 | M | 0 |
| 2 | 3 | F | 0 |
We want to know which combination of features influences the propensity of a user to visit the homepage vs the alternative. This is very similar to training a decision tree classifier on the dataset. However, there are specific differences between decision trees and rule extraction classifiers. Let us define a few terms to better motivate our description:
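To keep the later sketches concrete, the toy table above can be held in a pandas DataFrame; the variable name `df` and the space-free column name `VisitedHomePage` are assumptions made for illustration only.

```python
import pandas as pd

# Toy dataset from the table above; values are encoded exactly as shown.
df = pd.DataFrame({
    "AgeBracket": [3, 2, 2],
    "IncomeBracket": [1, 2, 3],
    "Gender": ["F", "M", "F"],
    "VisitedHomePage": [1, 0, 0],
})
```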
Let's say our rule is

IF Gender == F THEN Visited Home Page == 1

with the LHS (antecedent) being `Gender == F` and the RHS (consequent) being `Visited Home Page == 1`.
- Coverage: the fraction of all records that satisfy the antecedent (LHS) of the rule
  $\text{Coverage} = \frac{\text{Number of records satisfying the LHS}}{\text{Total number of records}}$
- Accuracy: of the records covered by the rule (those satisfying the LHS), the fraction for which the consequent (RHS) also holds
  $\text{Accuracy} = \frac{\text{Number of records satisfying both the LHS and the RHS}}{\text{Number of records satisfying the LHS}}$
For the above rule, Coverage = 2/3 = 66.7% and Accuracy = 1/2 = 50%
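A minimal sketch of how these two metrics could be computed on the toy DataFrame defined above; the masks encode the example rule IF Gender == F THEN VisitedHomePage == 1.

```python
# Antecedent (LHS) and consequent (RHS) of the example rule as boolean masks.
lhs = df["Gender"] == "F"
rhs = df["VisitedHomePage"] == 1

coverage = lhs.mean()                      # fraction of all records satisfying the LHS -> 2/3
accuracy = (lhs & rhs).sum() / lhs.sum()   # fraction of covered records where the RHS holds -> 1/2
print(f"coverage={coverage:.3f}, accuracy={accuracy:.3f}")
```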
There are two ways in which we can extract rules from the dataset. One is to ask for a set of rules chosen to maximize coverage and accuracy. The other is to supply part of a rule in advance, such as Gender == F, and ask for complementary conditions that best specialize the rule set.
Methodology:
Generate the rule that maximizes the rule's accuracy by choosing the best attribute to split on and the best value to split that attribute on. Here the difference from a decision tree lies in the fact that a decision tree would instead try to maximize a node's purity.
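A minimal sketch of this greedy step, assuming the pandas DataFrame defined above; `learn_one_rule` is a hypothetical helper name, reused by the pseudocode further down.

```python
def learn_one_rule(target_value, attributes, D, label_col="VisitedHomePage"):
    """Return the (attribute, value) test whose rule `attribute == value -> label == target_value`
    has the highest accuracy on D, or None when D is empty."""
    best_test, best_accuracy = None, -1.0
    for att in attributes:
        for v in D[att].unique():
            covered = D[D[att] == v]                           # records satisfying the candidate LHS
            acc = (covered[label_col] == target_value).mean()  # rule accuracy, not node purity
            if acc > best_accuracy:
                best_test, best_accuracy = (att, v), acc
    return best_test
```

A decision tree at the same step would instead score a split by the purity of all the resulting child nodes rather than by the accuracy of a single branch.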
| Covering Algorithms | Decision Tree |
|---|---|
| Covering approach: at each stage a rule is identified that covers some of the examples, which are then set aside while learning the next rules | Sequential, step-wise search for the best splits, guided by evaluation measures such as node purity |
| Flexibility to optimize for coverage, accuracy, or a combination of both | Rules become more complex as we move down the tree |
| More readable than decision trees | A new set of rules requires reshaping the entire tree |
| Empirically shown to be competitive with, and often better than, decision trees | Rules read off a tree tend to be less compact and less accurate than rules induced directly |
| Easier to analyze and modify; the rules have no inherent order | Rules carry an implied order, given by the sequence in which the splits are performed |
| More flexibility and more combinations of rules, since they do not all grow recursively out of a single root node | |
| Covering Algorithm | Rule Extraction Method |
|---|---|
| Prism | Induces modular rules one class at a time, choosing at each step the attribute-value test that maximizes the rule's accuracy |
| RIPPER | Repeated Incremental Pruning to Produce Error Reduction: grows and prunes rules, then globally optimizes the rule set |
| LEM2 | Rough-set based learner that builds rules from local coverings of attribute-value blocks |
| AQ | Generates rules by generalizing a seed positive example against the negative examples (star generation) |
| CN2 | Combines AQ-style search with entropy-based heuristics and beam search, tolerating noisy data |
| MODLEM | Rough-set based extension of LEM2 that handles numeric attributes without prior discretization |
| 1R | Learns one rule per value of a single attribute and keeps the attribute whose rule set has the lowest error rate |
A sketch of the generic sequential covering loop; the helpers `learn_one_rule`, `evaluate`, and `covers` are assumed.

```python
def sequential_covering(target_class, attributes, D, T):
    """Separate-and-conquer: learn rules one at a time and remove the records they cover."""
    R = []                                               # accepted rules so far
    r = learn_one_rule(target_class, attributes, D)
    while r is not None and len(D) > 0:
        if evaluate(r, D) <= T:                          # stop once no rule clears the acceptance threshold T
            break
        R.append(r)
        D = D[~covers(r, D)]                             # separate: drop the records covered by the accepted rule
        r = learn_one_rule(target_class, attributes, D)  # conquer: learn the next rule on what remains
    return R
```
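A hypothetical run on the toy DataFrame, with `covers` and `evaluate` filled in as simple illustrative helpers (rule accuracy as the evaluation measure); none of these names come from a library.

```python
def covers(rule, D):
    att, v = rule                           # a rule here is the (attribute, value) test from learn_one_rule
    return D[att] == v                      # boolean mask of the records the rule covers

def evaluate(rule, D, target_value=1, label_col="VisitedHomePage"):
    covered = D[covers(rule, D)]
    return (covered[label_col] == target_value).mean() if len(covered) else 0.0

rules = sequential_covering(target_class=1,
                            attributes=["AgeBracket", "IncomeBracket", "Gender"],
                            D=df, T=0.5)
print(rules)                                # e.g. [('AgeBracket', 3)] on the toy data
```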
The 1R learner builds one candidate rule set per attribute and keeps the attribute whose rules give the lowest error rate; `get_lowest_error_value_split` and `get_lowest_error_attribute` are assumed scoring helpers.

```python
def R_1(C, A, D):
    """1R: C is the label column, A the attribute names, D the DataFrame."""
    candidate_rule_sets = []
    for att in A:
        R_a = []
        for v in D[att].unique():
            if not pd.api.types.is_numeric_dtype(D[att]):     # categorical feature
                D_a_v = D[D[att] == v]
                class_v = D_a_v[C].value_counts().index[0]    # most frequent class among covered records
                R_a.append((f"{att} == {v}", class_v))
            else:                                             # continuous numerical feature
                D_a_v = D[D[att] >= v]                        # threshold split, mirroring the categorical case
                class_v = D_a_v[C].value_counts().index[0]
                R_a.append((f"{att} >= {v}", class_v))
        R_a = get_lowest_error_value_split(R_a)               # keep the best value split(s) for this attribute
        candidate_rule_sets.append(R_a)
    # keep the attribute whose rule set misclassifies the fewest records
    return get_lowest_error_attribute(candidate_rule_sets)
```
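To make the error-rate comparison concrete, here is a hypothetical end-to-end 1R-style computation on the toy DataFrame, spelling out what the assumed scoring helpers above would measure.

```python
errors = {}
for att in ["AgeBracket", "IncomeBracket", "Gender"]:
    mistakes = 0
    for v, group in df.groupby(att):
        majority = group["VisitedHomePage"].value_counts().index[0]  # majority-class prediction for this value
        mistakes += (group["VisitedHomePage"] != majority).sum()
    errors[att] = mistakes / len(df)                                  # error rate of this attribute's rule set
print(min(errors, key=errors.get), errors)                            # the attribute 1R would keep
```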
Per-class rule extraction, growing each rule greedily by the most accurate attribute-value test:

```python
def get_rules(C, A, D, label_col):
    """For each class in C, grow a rule by repeatedly adding the attribute-value test with the best accuracy."""
    rules = {}
    for c_i in C:
        D_sub = D
        conditions = []                                        # the antecedent starts out empty (always true)
        while True:
            accuracy = {}
            for att in A:
                for v in D_sub[att].unique():
                    covered = D_sub[D_sub[att] == v]
                    accuracy[(att, v)] = (covered[label_col] == c_i).mean()
            if not accuracy:
                break
            # best test; ties can be broken by the size of the covered set
            att_best, v_best = max(accuracy, key=accuracy.get)
            if (att_best, v_best) in conditions:               # no further refinement possible
                break
            conditions.append((att_best, v_best))
            D_sub = D_sub[D_sub[att_best] == v_best]           # refine the rule on the records it now covers
            if accuracy[(att_best, v_best)] == 1.0:            # the rule is pure, stop specializing
                break
        rules[c_i] = conditions
        # remove all instances covered by the finished rule and repeat on the remaining dataset
    return rules
```

Decision trees are flowcharts that aid in the decision-making process. The different kinds of decision tree algorithms used in practice all take a top-down, recursive, divide-and-conquer approach to decision tree induction:
| ID3 (Iterative Dichotomiser) | C4.5 | CART |
|---|---|---|
| Precursor to C4.5; splits on the attribute with the highest information gain | Extends ID3 with gain ratio, continuous attributes, missing values, and pruning | Classification and Regression Trees; binary splits chosen with the Gini index (classification) or variance reduction (regression) |
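For contrast with the covering approach, a CART-style tree can be fit with scikit-learn and its induced splits printed back as nested if/else rules; this is a minimal sketch on the toy DataFrame, and the Gender encoding below is an arbitrary illustrative choice.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# scikit-learn needs numeric inputs, so encode Gender as 1 for F and 0 for M.
X = df[["AgeBracket", "IncomeBracket"]].assign(Gender=(df["Gender"] == "F").astype(int))
y = df["VisitedHomePage"]

tree = DecisionTreeClassifier(criterion="gini", max_depth=3).fit(X, y)
print(export_text(tree, feature_names=list(X.columns)))   # the divide-and-conquer splits, read as rules
```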