A comparative study of the decision tree algorithms ID3 and C4.5
Badr HSSINA, Abdelkarim MERBOUHA, Hanane EZZIKOURI, Mohammed ERRITALI
TIAD Laboratory, Computer Sciences Department, Faculty of Sciences and Techniques
Sultan Moulay Slimane University
Beni-Mellal, BP: 523, Morocco
Abstract—Data mining is a useful tool for discovering knowledge in large volumes of data, and many methods and algorithms are available for this purpose. Classification is the most common method for extracting rules from large databases. Decision trees are widely used for classification because their simple hierarchical structure aids user understanding and decision making. Various classification algorithms exist, based on artificial neural networks, the nearest-neighbour rule and Bayesian classifiers, but decision tree mining is among the simplest. The ID3 and C4.5 algorithms, introduced by J. R. Quinlan, produce reasonable decision trees. The objective of this paper is to present these algorithms. We first present the classical ID3 algorithm, then discuss in more detail C4.5, a natural extension of ID3. Finally, we compare these two algorithms with others such as C5.0 and CART.
Keywords—Data mining; classification algorithm; decision tree; ID3 algorithm; C4.5 algorithm
I. INTRODUCTION
The construction of decision trees from data is a longstanding discipline. Statisticians attribute its paternity to Sonquist and Morgan (1963) [4], who used regression trees for prediction and explanation (AID - Automatic Interaction Detection). A whole family of methods followed, extended to problems of discrimination and classification, all based on the same tree-representation paradigm (THAID - Morgan and Messenger, 1973; CHAID - Kass, 1980). It is generally considered that this approach culminated in the CART (Classification and Regression Tree) method of Breiman et al. (1984), described in detail in a monograph that remains a reference today [4].
In machine learning, most studies are based on information theory. It is customary to cite Quinlan's ID3 method (Induction of Decision Tree - Quinlan 1979), which itself relates to the work of Hunt (1962) [4]. Quinlan was a very active contributor in the second half of the 1980s, with a large number of publications in which he proposed heuristics to improve the behavior of the system. His approach took a significant turn in the 1990s when he presented the C4.5 method (1993), the other essential reference on decision trees. There have been many further evolutions of this algorithm, such as C5.0, but it is implemented in commercial software.
Classification methods aim to identify the classes to which objects belong based on descriptive traits. They find utility in a wide range of human activities, and particularly in automated decision making.
Decision trees are a very effective method of supervised learning. Their aim is to partition a dataset into groups that are as homogeneous as possible in terms of the variable to be predicted. A decision tree takes as input a set of classified data and outputs a tree resembling a flowchart, in which each terminal node (leaf) is a decision (a class) and each non-terminal (internal) node represents a test. Each leaf represents the decision of belonging to a class for the data satisfying all the tests on the path from the root to that leaf.
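As an illustration, here is a minimal sketch in Python of such a structure (the attribute and class names are hypothetical, chosen only for the example): internal nodes carry a test on an attribute, leaves carry a class, and classification follows the path of tests from the root to a leaf.

class Node:
    """An internal node tests an attribute; a leaf holds a class label."""
    def __init__(self, attribute=None, children=None, label=None):
        self.attribute = attribute      # attribute tested at this node (None for a leaf)
        self.children = children or {}  # maps each attribute value to a subtree
        self.label = label              # decision (class) stored at a leaf

def classify(node, example):
    """Follow the path of tests from the root down to a leaf."""
    while node.attribute is not None:
        node = node.children[example[node.attribute]]
    return node.label

# A toy tree: test "outlook"; sunny examples are classified "no", others "yes".
tree = Node(attribute="outlook",
            children={"sunny": Node(label="no"),
                      "overcast": Node(label="yes"),
                      "rain": Node(label="yes")})
print(classify(tree, {"outlook": "overcast"}))  # prints: yes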
The simpler the tree, the easier it is to use. In fact, it is more interesting to obtain a tree that is adapted to the probabilities of the variables to be tested; a well-balanced tree usually gives good results. If a sub-tree can only lead to a single conclusion, the whole sub-tree can be reduced to that conclusion; this simplifies the process without changing the final result. Ross Quinlan worked on this kind of decision tree.
II. INFORMATION THEORY
Shannon's information theory is the basis of the ID3 algorithm, and hence of C4.5. Shannon entropy is the best known and most widely applied measure. It first defines the amount of information provided by an event: the lower the probability of an event (the rarer it is), the greater the information it provides [2]. (In the following, all logarithms are base 2.)
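Formally (a standard formulation of this idea, not spelled out in the text above), the self-information of an event $e$ of probability $P(e)$ is $h(e) = -\log_2 P(e)$: an event of probability 1 carries no information, while a rare event carries a great deal.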
A. Shannon Entropy
In general, given a probability distribution $P = (p_1, p_2, \ldots, p_n)$ and a sample $S$, the information carried by this distribution, also called the entropy of $P$, is given by:

$$E(P) = -\sum_{i=1}^{n} p_i \log_2(p_i)$$
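A minimal sketch of this computation in Python (illustrative, not taken from the paper):

import math

def entropy(probabilities):
    """Shannon entropy E(P) = -sum(p_i * log2(p_i)), in bits.
    Terms with p_i = 0 contribute nothing, since p*log(p) -> 0."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

print(entropy([0.5, 0.5]))  # 1.0 bit: maximal uncertainty for two classes
print(entropy([1.0]))       # 0.0: a pure sample carries no information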
B. The information gain G(p, T)
We have functions that measure the degree of class mixing for any sample, and therefore for any position of the tree under construction. It remains to define a function that selects the test which must label the current node.
The gain of a test $T$ at a position $p$ is defined as:

$$Gain(p, T) = E(p) - \sum_{j \in values(T)} \frac{|p_j|}{|p|}\, E(p_j)$$

where $values(T)$ is the set of all possible values for attribute $T$ and $p_j$ is the subset of examples at position $p$ for which $T$ takes the value $j$. We can use this measure to rank attributes and build the decision tree, where at each node is located the attribute with the highest information gain among the attributes not yet considered on the path from the root.
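A sketch of this computation in Python, reusing the entropy function above (the example representation, a list of dictionaries with a "class" key, is an assumption made for illustration):

from collections import Counter

def class_entropy(examples, target):
    """Entropy of the class distribution of a set of labelled examples."""
    counts = Counter(ex[target] for ex in examples)
    total = sum(counts.values())
    return entropy([c / total for c in counts.values()])

def information_gain(examples, attribute, target):
    """Gain(p, T): entropy reduction from splitting `examples` on `attribute`."""
    before = class_entropy(examples, target)
    after = 0.0
    for value in {ex[attribute] for ex in examples}:
        subset = [ex for ex in examples if ex[attribute] == value]
        after += len(subset) / len(examples) * class_entropy(subset, target)
    return before - after

# The attribute chosen for the current node is the one maximizing the gain:
# best = max(candidate_attributes, key=lambda a: information_gain(data, a, "class"))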