6 User guided induction
The RuleKit suite supports user-guided rule induction, which follows the scheme introduced by the GuideR algorithm (Sikora et al., 2019).
The user's knowledge is specified by the following parameters:
<parameter_set name="paramset_1">
...
<param name="use_expert">true</param>
<param name="extend_using_preferred">...</param>
<param name="extend_using_automatic">...</param>
<param name="induce_using_preferred">...</param>
<param name="induce_using_automatic">...</param>
<param name="preferred_conditions_per_rule">...</param>
<param name="preferred_attributes_per_rule">...</param>
<param name="consider_other_classes">...</param>
<param name ="expert_rules">
<entry name="rule-0">...</entry>
<entry name="rule-1">...</entry>
...
</param>
<param name ="expert_preferred_conditions">
<entry name="preferred-condition-0">...</entry>
<entry name="preferred-condition-1">...</entry>
...
</param>
<param name ="expert_forbidden_conditions">
<entry name="forbidden-condition-0">...</entry>
<entry name="forbidden-condition-1">...</entry>
...
</param>
</parameter_set>
Parameter meaning (symbols from the GuideR paper are given in parentheses):
- use_expert - boolean indicating whether the user's knowledge should be used,
- expert_rules (R⊕) - set of initial rules,
- expert_preferred_conditions (C⊕, A⊕) - multiset of preferred conditions (also used for specifying preferred attributes by means of the special value Any),
- expert_forbidden_conditions (C⊖, A⊖) - set of forbidden conditions (also used for specifying forbidden attributes by means of the special value Any),
- extend_using_preferred (Σpref) / extend_using_automatic (Σauto) - booleans indicating whether initial rules should be extended using preferred/automatic conditions and attributes,
- induce_using_preferred (Υpref) / induce_using_automatic (Υauto) - booleans indicating whether new rules should be induced using preferred/automatic conditions and attributes,
- preferred_conditions_per_rule (KC) / preferred_attributes_per_rule (KA) - maximum number of preferred conditions/attributes per rule,
- consider_other_classes - boolean indicating whether automatic induction should be performed for classes for which no user's knowledge has been defined (classification only).
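As a rough illustration of how these parameters fit together, the sketch below assembles a parameter set with Python's standard xml.etree.ElementTree module. The helper build_parameter_set and its arguments are hypothetical and not part of RuleKit; only the parameter names are taken from the listing above.

```python
import xml.etree.ElementTree as ET

def build_parameter_set(name, flags, expert_rules=(), preferred=(), forbidden=()):
    """Assemble a <parameter_set> element (hypothetical helper, not a RuleKit API).

    flags holds the scalar parameters, e.g. {"use_expert": True}.
    The three sequences hold (entry name, entry text) pairs.
    """
    root = ET.Element("parameter_set", name=name)
    for key, value in flags.items():
        param = ET.SubElement(root, "param", name=key)
        # Booleans are written in lowercase, as in the listings of this section.
        param.text = str(value).lower() if isinstance(value, bool) else str(value)
    for key, entries in (("expert_rules", expert_rules),
                         ("expert_preferred_conditions", preferred),
                         ("expert_forbidden_conditions", forbidden)):
        param = ET.SubElement(root, "param", name=key)
        for entry_name, text in entries:
            ET.SubElement(param, "entry", name=entry_name).text = text
    return root

params = build_parameter_set(
    "paramset_1",
    {"use_expert": True, "induce_using_automatic": True,
     "preferred_attributes_per_rule": 2},
    preferred=[("preferred-attribute-0", "inf: IF [[gimpuls = Any]] THEN class = {0}")],
)
xml_text = ET.tostring(params, encoding="unicode")
print(xml_text)
```

Generating the element tree instead of concatenating strings leaves the escaping of reserved characters to the library.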
Let us consider the following user's knowledge (superscripts next to C⊕, A⊕, C⊖, and A⊖ symbols indicate class labels):
- R⊕ = { (IF gimpuls < 750 THEN class = 0), (IF gimpuls >= 750 THEN class = 1)},
- C⊕^0 = { (seismic = a) },
- C⊕^1 = { (seismic = b ∧ seismoacoustic = c)^5 },
- A⊕^1 = { gimpuls^inf },
- C⊖^0 = { (seismoacoustic = b) },
- A⊖^1 = { ghazard }.
The XML definition of this knowledge is presented below.
<param name ="expert_rules">
<entry name="rule-1">IF [[gimpuls = (-inf, 750)]] THEN class = {0}</entry>
<entry name="rule-2">IF [[gimpuls = &lt;750, inf)]] THEN class = {1}</entry>
</param>
<param name ="expert_preferred_conditions">
<entry name="preferred-condition-1">1: IF [[seismic = {a}]] THEN class = {0}</entry>
<entry name="preferred-condition-2">5: IF [[seismic = {b} AND seismoacoustic = {c}]] THEN class = {1}</entry>
<entry name="preferred-attribute-1">inf: IF [[gimpuls = Any]] THEN class = {1}</entry>
</param>
<param name ="expert_forbidden_conditions">
<entry name="forbidden-condition-1">IF [[seismoacoustic = b]] THEN class = {0}</entry>
<entry name="forbidden-attribute-1">IF [[ghazard = Any]] THEN class = {1}</entry>
</param>
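To make the entry syntax more concrete, here is a minimal Python sketch that splits an entry into its multiplicity, elementary conditions, target attribute, and label. The function parse_entry and its regular expression are illustrative assumptions derived only from the entries shown in this section; they do not reproduce RuleKit's own parser.

```python
import math
import re

# Optional "<multiplicity>:" prefix, then "IF <premise> THEN <target> = {<label>}".
ENTRY = re.compile(r"^(?:(\d+|inf):\s*)?IF\s+(.+?)\s+THEN\s+(\w+)\s*=\s*\{(.+?)\}$")

def parse_entry(text):
    """Split an expert-knowledge entry into (multiplicity, conditions, target, label)."""
    match = ENTRY.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised entry: {text!r}")
    count, premise, target, label = match.groups()
    if count is None:
        multiplicity = None          # entry without a multiplicity prefix
    elif count == "inf":
        multiplicity = math.inf      # unlimited number of uses
    else:
        multiplicity = int(count)
    # Drop the square brackets, then split the conjunction into its conditions.
    premise = re.sub(r"[\[\]]", "", premise)
    conditions = [part.strip() for part in premise.split(" AND ")]
    return multiplicity, conditions, target, label

print(parse_entry("5: IF [[seismic = {b} AND seismoacoustic = {c}]] THEN class = {1}"))
print(parse_entry("inf: IF [[gimpuls = Any]] THEN class = {1}"))
print(parse_entry("IF [[gimpuls = (-inf, 750)]] THEN class = {0}"))
```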
Please note the following:
- Infinity is represented by the inf string (rule-1, preferred-attribute-1).
- Conditions based on continuous attributes are represented as intervals. Left-closed intervals are specified using the &lt; symbol, as < is reserved by XML syntax (rule-2).
- Multiplicity is specified before the multiset element (preferred-condition-1 and preferred-condition-2).
- Preferred/forbidden attributes are defined as conditions with the special value Any (preferred-attribute-1, forbidden-attribute-1).
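The remark on left-closed intervals can be checked with a short Python sketch. The interval helper is hypothetical; escape from xml.sax.saxutils and xml.etree.ElementTree are standard-library tools, and both turn the < character into &lt; when the rule text is written to XML.

```python
import math
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

def interval(lower, upper, closed_left=False):
    """Render an interval such as (-inf, 750) or <750, inf) (hypothetical helper)."""
    def bound(value):
        if math.isinf(value):
            return "-inf" if value < 0 else "inf"
        return str(value)
    opening = "<" if closed_left else "("
    return f"{opening}{bound(lower)}, {bound(upper)})"

rule = f"IF [[gimpuls = {interval(750, math.inf, closed_left=True)}]] THEN class = {{1}}"

# Hand-written XML needs the escaped form of the rule text ...
escaped = escape(rule)

# ... whereas ElementTree escapes element text automatically on serialization.
entry = ET.Element("entry", name="rule-2")
entry.text = rule
serialized = ET.tostring(entry, encoding="unicode")

print(escaped)
print(serialized)
```

Parsing the serialized entry again yields the original text with the plain < character, so the escaping only matters in the XML file itself.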
User-guided induction may also be executed from the RapidMiner plugin and the R package. In the former case, convenient wizards are provided for specifying expert rules, preferred conditions/attributes, and forbidden conditions/attributes (Figure 6.1). However, the traditional, parameter-based method of defining the expert's knowledge may also be used.
Figure 6.1. RapidMiner wizard for specifying user's rules, preferred conditions/attributes, and forbidden conditions/attributes.
The datasets investigated in the GuideR study are:
- classification: seismic-bumps - forecasting high energy seismic bumps in coal mines,
- regression: methane - predicting methane concentration in a coal mine,
- survival analysis: bmt - analyzing factors contributing to the patients’ survival following bone marrow transplants.
In the following subsections we present all examined guided-induction scenarios with the relevant XML parameters. The complete XML experiment files for the test cases discussed in the GuideR paper can be found here.
guided-c1 The model consists of two initial rules:
- IF gimpuls < 750 THEN class = 0
- IF gimpuls >= 750 THEN class = 1
<param name="use_expert">true</param>
<param name="extend_using_preferred">false</param>
<param name="extend_using_automatic">false</param>
<param name="induce_using_preferred">false</param>
<param name="induce_using_automatic">false</param>
<param name ="expert_rules">
<entry name="rule-0">IF [[gimpuls = (-inf, 750)]] THEN class = {0}</entry>
<entry name="rule-1">IF [[gimpuls = &lt;750, inf)]] THEN class = {1}</entry>
</param>
<param name ="expert_preferred_conditions">
</param>
<param name ="expert_forbidden_conditions">
</param>
guided-c2 Attribute gimpuls is used in rules for both classes at least once:
<param name="use_expert">true</param>
<param name="extend_using_preferred">false</param>
<param name="extend_using_automatic">true</param>
<param name="induce_using_preferred">true</param>
<param name="induce_using_automatic">true</param>
<param name ="expert_rules">
</param>
<param name ="expert_preferred_conditions">
<entry name="preferred-attribute-0">1: IF [[gimpuls = Any]] THEN class = {0}</entry>
<entry name="preferred-attribute-1">1: IF [[gimpuls = Any]] THEN class = {1}</entry>
</param>
<param name ="expert_forbidden_conditions">
</param>
guided-c3 Every rule contains at least two out of gimpuls, genergy, and senergy attributes:
<param name="use_expert">true</param>
<param name="extend_using_preferred">false</param>
<param name="extend_using_automatic">false</param>
<param name="induce_using_preferred">true</param>
<param name="induce_using_automatic">true</param>
<param name="preferred_attributes_per_rule">2</param>
<param name ="expert_rules">
</param>
<param name ="expert_preferred_conditions">
<entry name="preferred-attribute-0">inf: IF [[genergy = Any]] THEN class = {0}</entry>
<entry name="preferred-attribute-1">inf: IF [[senergy = Any]] THEN class = {0}</entry>
<entry name="preferred-attribute-2">inf: IF [[gimpuls = Any]] THEN class = {0}</entry>
<entry name="preferred-attribute-3">inf: IF [[genergy = Any]] THEN class = {1}</entry>
<entry name="preferred-attribute-4">inf: IF [[senergy = Any]] THEN class = {1}</entry>
<entry name="preferred-attribute-5">inf: IF [[gimpuls = Any]] THEN class = {1}</entry>
</param>
<param name ="expert_forbidden_conditions">
</param>
guided-c4 At least one of seismic, seismoacoustic, and ghazard attributes is used in each rule, with an additional requirement on value sets - class 0 may use values a, b, class 1 may use values b, c, d:
<param name="use_expert">true</param>
<param name="extend_using_preferred">false</param>
<param name="extend_using_automatic">true</param>
<param name="induce_using_preferred">true</param>
<param name="induce_using_automatic">true</param>
<param name="consider_other_classes">false</param>
<param name="preferred_conditions_per_rule">1</param>
<param name ="expert_rules">
</param>
<param name ="expert_preferred_conditions">
<entry name="preferred-condition-01">inf: IF [[seismic = {a}]] THEN class = {0}</entry>
<entry name="preferred-condition-02">inf: IF [[seismic = {b}]] THEN class = {0}</entry>
<entry name="preferred-condition-03">inf: IF [[seismoacoustic = {a}]] THEN class = {0}</entry>
<entry name="preferred-condition-04">inf: IF [[seismoacoustic = {b}]] THEN class = {0}</entry>
<entry name="preferred-condition-05">inf: IF [[ghazard = {a}]] THEN class = {0}</entry>
<entry name="preferred-condition-06">inf: IF [[ghazard = {b}]] THEN class = {0}</entry>
<entry name="preferred-condition-11">inf: IF [[seismic = {b}]] THEN class = {1}</entry>
<entry name="preferred-condition-12">inf: IF [[seismic = {c}]] THEN class = {1}</entry>
<entry name="preferred-condition-13">inf: IF [[seismic = {d}]] THEN class = {1}</entry>
<entry name="preferred-condition-14">inf: IF [[seismoacoustic = {b}]] THEN class = {1}</entry>
<entry name="preferred-condition-15">inf: IF [[seismoacoustic = {c}]] THEN class = {1}</entry>
<entry name="preferred-condition-16">inf: IF [[seismoacoustic = {d}]] THEN class = {1}</entry>
<entry name="preferred-condition-17">inf: IF [[ghazard = {b}]] THEN class = {1}</entry>
<entry name="preferred-condition-18">inf: IF [[ghazard = {c}]] THEN class = {1}</entry>
<entry name="preferred-condition-19">inf: IF [[ghazard = {d}]] THEN class = {1}</entry>
</param>
<param name ="expert_forbidden_conditions">
</param>
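Because the guided-c4 entries follow a regular pattern (every attribute combined with every value permitted for a class), they can also be generated rather than typed by hand. The following Python sketch is one way to do this; the function preferred_conditions is a hypothetical helper, and the entry naming scheme simply mirrors the listing above.

```python
import xml.etree.ElementTree as ET

ATTRIBUTES = ["seismic", "seismoacoustic", "ghazard"]
ALLOWED_VALUES = {0: ["a", "b"], 1: ["b", "c", "d"]}  # class label -> permitted values

def preferred_conditions(attributes, allowed_values, multiplicity="inf"):
    """Build the expert_preferred_conditions parameter (hypothetical helper)."""
    param = ET.Element("param", name="expert_preferred_conditions")
    for label, values in allowed_values.items():
        index = 0
        for attribute in attributes:
            for value in values:
                index += 1
                # Entry names combine the class label with a running index.
                entry = ET.SubElement(param, "entry",
                                      name=f"preferred-condition-{label}{index}")
                entry.text = (f"{multiplicity}: IF [[{attribute} = {{{value}}}]] "
                              f"THEN class = {{{label}}}")
    return param

param = preferred_conditions(ATTRIBUTES, ALLOWED_VALUES)
for entry in param:
    print(ET.tostring(entry, encoding="unicode"))
```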
guided-c5 Attributes gimpuls, goimpuls, ghazard, and seismoacoustic are forbidden:
<param name="use_expert">true</param>
<param name="extend_using_preferred">false</param>
<param name="extend_using_automatic">false</param>
<param name="induce_using_preferred">false</param>
<param name="induce_using_automatic">true</param>
<param name="consider_other_classes">false</param>
<param name ="expert_rules">
</param>
<param name ="expert_preferred_conditions">
</param>
<param name ="expert_forbidden_conditions">
<entry name="forb-attribute-00">1: IF [[seismoacoustic = Any]] THEN class = {0}</entry>
<entry name="forb-attribute-01">1: IF [[gimpuls = Any]] THEN class = {0}</entry>
<entry name="forb-attribute-02">1: IF [[goimpuls = Any]] THEN class = {0}</entry>
<entry name="forb-attribute-03">1: IF [[ghazard = Any]] THEN class = {0}</entry>
<entry name="forb-attribute-10">1: IF [[seismoacoustic = Any]] THEN class = {1}</entry>
<entry name="forb-attribute-11">1: IF [[gimpuls = Any]] THEN class = {1}</entry>
<entry name="forb-attribute-12">1: IF [[goimpuls = Any]] THEN class = {1}</entry>
<entry name="forb-attribute-13">1: IF [[ghazard = Any]] THEN class = {1}</entry>
</param>
guided-c6 Attributes from the nbumps family as well as senergy, maxenergy, and seismic are forbidden: analogous to guided-c5.
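Since guided-c6 differs from guided-c5 only in the list of forbidden attributes, both can be produced by the same routine. The Python sketch below reproduces the guided-c5 entries; forbidden_attributes is a hypothetical helper. For guided-c6 one would pass senergy, maxenergy, seismic, and the nbumps-family attribute names, which are not enumerated in this section and have to be taken from the dataset.

```python
def forbidden_attributes(attributes, class_labels):
    """Yield (entry name, entry text) pairs forbidding each attribute for each class."""
    for label in class_labels:
        for index, attribute in enumerate(attributes):
            name = f"forb-attribute-{label}{index}"
            text = f"1: IF [[{attribute} = Any]] THEN class = {{{label}}}"
            yield name, text

# guided-c5, reproduced from the listing above.
c5 = list(forbidden_attributes(
    ["seismoacoustic", "gimpuls", "goimpuls", "ghazard"], [0, 1]))

for name, text in c5:
    print(f'<entry name="{name}">{text}</entry>')
```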
guided-r1 The model contains PD = 0 and PD = 1 conditions, both appearing in three rules:
<param name="use_expert">true</param>
<param name="extend_using_preferred">false</param>
<param name="extend_using_automatic">false</param>
<param name="induce_using_preferred">true</param>
<param name="induce_using_automatic">true</param>
<param name ="expert_rules">
</param>
<param name ="expert_preferred_conditions">
<entry name="preferred-condition-0">3: IF PD = &lt;0.5, inf) THEN MM116_pred = {NaN}</entry>
<entry name="preferred-condition-1">3: IF PD = (-inf, 0.5) THEN MM116_pred = {NaN}</entry>
</param>
guided-r2 The conjunction PD = 1 AND MM116 < 1 appears in five rules:
<param name="use_expert">true</param>
<param name="extend_using_preferred">false</param>
<param name="extend_using_automatic">false</param>
<param name="induce_using_preferred">true</param>
<param name="induce_using_automatic">true</param>
<param name ="expert_rules">
</param>
<param name ="expert_preferred_conditions">
<entry name="preferred-condition-0">5: IF PD = &lt;0.5, inf) AND MM116 = (-inf, 1.0) THEN MM116_pred = {NaN}</entry>
</param>
guided-r3 The conjunction PD = 0 AND MM116 > 1 appears in five rules: analogous to guided-r2.
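A possible guided-r3 entry can be derived by analogy with guided-r1 and guided-r2, where PD = 0 is encoded as (-inf, 0.5) and PD = 1 as a left-closed interval starting at 0.5. The Python sketch below builds both conjunctions. The helper preferred_conjunction is hypothetical, and the encoding of MM116 > 1 as the open interval (1.0, inf) is an assumption rather than something stated in this section.

```python
from xml.sax.saxutils import escape

# Interval encodings taken from the guided-r1 and guided-r2 listings.
PD_IS_ZERO = "PD = (-inf, 0.5)"
PD_IS_ONE = "PD = <0.5, inf)"

def preferred_conjunction(multiplicity, conditions, target="MM116_pred"):
    """Format a preferred conjunction for a regression problem (hypothetical helper)."""
    premise = " AND ".join(conditions)
    return f"{multiplicity}: IF {premise} THEN {target} = {{NaN}}"

r2 = preferred_conjunction(5, [PD_IS_ONE, "MM116 = (-inf, 1.0)"])
# Assumption: MM116 > 1 maps to the open interval (1.0, inf).
r3 = preferred_conjunction(5, [PD_IS_ZERO, "MM116 = (1.0, inf)"])

for text in (r2, r3):
    # escape() converts the left-closed bracket < into &lt; for the XML file.
    print(f'<entry name="preferred-condition-0">{escape(text)}</entry>')
```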
guided-r4 Attributes DMM116, MM116, and PD appear in every rule:
<param name="use_expert">true</param>
<param name="extend_using_preferred">false</param>
<param name="extend_using_automatic">false</param>
<param name="induce_using_preferred">true</param>
<param name="induce_using_automatic">true</param>
<param name ="expert_rules">
</param>
<param name ="expert_preferred_conditions">
<entry name="preferred-attribute-0">inf: IF PD = Any THEN MM116_pred = {NaN}</entry>
<entry name="preferred-attribute-1">inf: IF MM116 = Any THEN MM116_pred = {NaN}</entry>
<entry name="preferred-attribute-2">inf: IF DMM116 = Any THEN MM116_pred = {NaN}</entry>
</param>
guided-s1 Every rule contains CD34 and does not contain ANCRecovery and PLTRecovery attributes:
<param name="use_expert">true</param>
<param name="extend_using_preferred">false</param>
<param name="extend_using_automatic">false</param>
<param name="induce_using_preferred">true</param>
<param name="induce_using_automatic">true</param>
<param name="preferred_attributes_per_rule">1</param>
<param name ="expert_rules">
</param>
<param name ="expert_preferred_conditions">
<entry name="attr-preferred-0">inf: IF [CD34kgx10d6 = Any] THEN survival_status = {NaN}</entry>
</param>
<param name ="expert_forbidden_conditions">
<entry name="condition-forbidden-0">IF ANCrecovery = Any THEN survival_status = {NaN}</entry>
<entry name="condition-forbidden-1">IF PLTrecovery = Any THEN survival_status = {NaN}</entry>
</param>
guided-s2 The model consists of four initial rules:
- IF extcGvHD = No AND CD34 < 10 THEN ...
- IF extcGvHD = No AND CD34 >= 10 THEN ...
- IF extcGvHD = Yes AND CD34 < 10 THEN ...
- IF extcGvHD = Yes AND CD34 >= 10 THEN ...
<param name="use_expert">true</param>
<param name="extend_using_preferred">false</param>
<param name="extend_using_automatic">false</param>
<param name="induce_using_preferred">false</param>
<param name="induce_using_automatic">false</param>
<param name ="expert_rules">
<entry name="rule-0">IF [[CD34kgx10d6 = (-inf, 10.0)]] AND [[extcGvHD = {0}]] THEN survival_status = {NaN}</entry>
<entry name="rule-1">IF [[extcGvHD = {0}]] AND [[CD34kgx10d6 = &lt;10.0, inf)]] THEN survival_status = {NaN}</entry>
<entry name="rule-2">IF [[CD34kgx10d6 = (-inf, 10.0)]] AND [[extcGvHD = {1}]] THEN survival_status = {NaN}</entry>
<entry name="rule-3">IF [[CD34kgx10d6 = &lt;10.0, inf)]] AND [[extcGvHD = {1}]] THEN survival_status = {NaN}</entry>
</param>
<param name ="expert_preferred_conditions">
</param>
<param name ="expert_forbidden_conditions">
</param>
guided-s3 As in the previous case, but the CD34 ranges may be altered and the rules can be extended with automatic conditions:
<param name="use_expert">true</param>
<param name="extend_using_preferred">true</param>
<param name="extend_using_automatic">true</param>
<param name="induce_using_preferred">false</param>
<param name="induce_using_automatic">false</param>
<param name="preferred_attributes_per_rule">1</param>
<param name ="expert_rules">
<entry name="rule-0">IF [[extcGvHD = {0}]] THEN survival_status = {NaN}</entry>
<entry name="rule-1">IF [[extcGvHD = {0}]] THEN survival_status = {NaN}</entry>
<entry name="rule-2">IF [[extcGvHD = {1}]] THEN survival_status = {NaN}</entry>
<entry name="rule-3">IF [[extcGvHD = {1}]] THEN survival_status = {NaN}</entry>
</param>
<param name ="expert_preferred_conditions">
<entry name="attr-0">4: IF [CD34kgx10d6 = Any] THEN survival_status = {NaN}</entry>
</param>
<param name ="expert_forbidden_conditions">
</param>
guided-s4 The model consists of two initial rules:
- IF CD34 < 10 THEN ...
- IF CD34 >= 10 THEN ...
<param name="use_expert">true</param>
<param name="extend_using_preferred">false</param>
<param name="extend_using_automatic">false</param>
<param name="induce_using_preferred">false</param>
<param name="induce_using_automatic">false</param>
<param name ="expert_rules">
<entry name="rule-0">IF [[CD34kgx10d6 = (-inf, 10.0)]] THEN survival_status = {NaN}</entry>
<entry name="rule-1">IF [[CD34kgx10d6 = &lt;10.0, inf)]] THEN survival_status = {NaN}</entry>
</param>
<param name ="expert_preferred_conditions">
</param>
<param name ="expert_forbidden_conditions">
</param>