ENHANCING THE EFFECTIVENESS AND INTERPRETABILITY OF DECISION TREE AND RULE
ABSTRACT
Classification in imbalanced domains is a recent challenge in data mining. A classification
problem is said to be imbalanced when the data contain many examples from one class and few from
the other, and the less represented class is the one of greater interest from the
point of view of the learning task. One of the most widely used techniques to tackle this problem
consists in preprocessing the data prior to the learning process. This preprocessing
can be done through under-sampling, which removes examples, mainly belonging to the
majority class, or through over-sampling, which replicates or generates new minority
examples. In this paper, we propose an under-sampling procedure guided by evolutionary
algorithms to perform a training set selection for enhancing the decision trees obtained by the
C4.5 algorithm and the rule sets obtained by the PART rule induction algorithm. The proposal
has been compared with other under-sampling and over-sampling techniques and the results
indicate that the new approach is very competitive in terms of accuracy when compared
with over-sampling and that it outperforms standard under-sampling. Moreover, the obtained
models are smaller in terms of the number of leaves or rules generated and can be considered
more interpretable.
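
To make the two preprocessing families mentioned in the abstract concrete, the following is a minimal sketch of random under-sampling and of over-sampling by replication. It illustrates only the standard random baselines, not the evolutionary procedure proposed in the paper. The function names, the toy data, and the choice to balance both classes to equal size are assumptions made for illustration.

```python
import random
from collections import Counter


def split_indices(labels, majority_label):
    """Return (majority indices, minority indices) for a two-class label list."""
    majority = [i for i, y in enumerate(labels) if y == majority_label]
    minority = [i for i, y in enumerate(labels) if y != majority_label]
    return majority, minority


def random_under_sample(examples, labels, majority_label, seed=0):
    """Randomly remove majority examples until both classes have equal size."""
    rng = random.Random(seed)
    majority, minority = split_indices(labels, majority_label)
    # Keep as many majority examples as there are minority examples.
    kept = rng.sample(majority, min(len(majority), len(minority)))
    idx = sorted(kept + minority)
    return [examples[i] for i in idx], [labels[i] for i in idx]


def random_over_sample(examples, labels, majority_label, seed=0):
    """Replicate randomly chosen minority examples until both classes have equal size."""
    rng = random.Random(seed)
    majority, minority = split_indices(labels, majority_label)
    if not minority:
        # Nothing to replicate; return unchanged copies.
        return list(examples), list(labels)
    n_extra = max(0, len(majority) - len(minority))
    extra = [rng.choice(minority) for _ in range(n_extra)]
    idx = list(range(len(labels))) + extra
    return [examples[i] for i in idx], [labels[i] for i in idx]


# Toy data (hypothetical): 8 majority examples (label 0), 2 minority examples (label 1).
X = list(range(10))
y = [0] * 8 + [1] * 2

X_under, y_under = random_under_sample(X, y, majority_label=0)
X_over, y_over = random_over_sample(X, y, majority_label=0)

under_counts = Counter(y_under)
over_counts = Counter(y_over)
```

Under-sampling shrinks the training set to the size dictated by the minority class, whereas over-sampling grows it to match the majority class. A guided procedure such as the one proposed here would replace the random choice of which majority examples to keep with a search over candidate subsets.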