1.2.2 Classifier
Any learning algorithm in WEKA is derived from the abstract weka.classifiers.AbstractClassifier
class. This, in turn, implements weka.classifiers.Classifier. Surprisingly
little is needed for a basic classifier: a routine which generates a classifier model
from a training dataset (= buildClassifier) and another routine which evaluates
the generated model on an unseen test dataset (= classifyInstance), or
generates a probability distribution for all classes (= distributionForInstance).
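As a minimal sketch of this API (the class name ClassifierDemo is just a
placeholder; weather.arff ships with the WEKA distribution and is assumed to
sit in the current directory):

import weka.classifiers.Classifier;
import weka.classifiers.rules.ZeroR;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class ClassifierDemo {
  public static void main(String[] args) throws Exception {
    // load the dataset and declare the last attribute to be the class
    Instances data = DataSource.read("weather.arff");
    data.setClassIndex(data.numAttributes() - 1);

    // build a model from the training data
    Classifier cls = new ZeroR();
    cls.buildClassifier(data);

    // query the model for a single (here: the first) instance
    double pred = cls.classifyInstance(data.instance(0));
    double[] dist = cls.distributionForInstance(data.instance(0));
    System.out.println("predicted: " + data.classAttribute().value((int) pred));
    System.out.println("distribution: " + java.util.Arrays.toString(dist));
  }
}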
A classifier model is an arbitrarily complex mapping from all but one of the
dataset's attributes to the class attribute. The specific form and creation of
this mapping, or model, differs from classifier to classifier. For example,
ZeroR's (= weka.classifiers.rules.ZeroR) model just consists of a single value:
the most common class, or the mean of all numeric values in case of predicting
a numeric value (= regression learning). ZeroR is a trivial classifier, but it
gives a lower bound on the performance achievable on a given dataset, which
more complex classifiers should improve on significantly. As such it is a
reasonable test of how well the class can be predicted without considering the
other attributes.
Later, we will explain how to interpret the output from classifiers in detail.
For now, just focus on the Correctly Classified Instances figure in the
Stratified cross-validation section of the output and notice how it improves
from ZeroR to J48:
java weka.classifiers.rules.ZeroR -t weather.arff
java weka.classifiers.trees.J48 -t weather.arff
There are various approaches to determining the performance of classifiers.
Most simply, performance can be measured by counting the proportion of
correctly predicted examples in an unseen test dataset. This value is the
accuracy, which is also 1 - ErrorRate; both terms are used in the literature.
The simplest case uses a training set and a test set which are mutually
independent. This is referred to as a hold-out estimate. To estimate the
variance of these performance estimates, hold-out estimates may be computed by
repeatedly resampling the same dataset, i.e. randomly reordering it and then
splitting it into training and test sets with a specific proportion of the
examples, collecting all estimates on test data, and computing the average and
standard deviation of the accuracy.
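The following sketch computes one such hold-out estimate with the WEKA API;
the 66% split proportion, the seed and the class name HoldOut are arbitrary
choices for illustration:

import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class HoldOut {
  public static void main(String[] args) throws Exception {
    Instances data = DataSource.read("weather.arff");
    data.setClassIndex(data.numAttributes() - 1);
    data.randomize(new Random(42)); // randomly reorder the examples

    // split into 66% training and 34% test data
    int trainSize = (int) Math.round(data.numInstances() * 0.66);
    Instances train = new Instances(data, 0, trainSize);
    Instances test = new Instances(data, trainSize,
        data.numInstances() - trainSize);

    J48 cls = new J48();
    cls.buildClassifier(train); // train on the training split only

    Evaluation eval = new Evaluation(train);
    eval.evaluateModel(cls, test); // evaluate on the unseen test split
    System.out.println("accuracy: " + eval.pctCorrect() + " %");
  }
}

Wrapping the randomize/split/evaluate steps in a loop with different seeds and
collecting the pctCorrect() values then gives the average and standard
deviation described above.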
A more elaborate method is cross-validation. Here, a number of folds n is
specified. The dataset is randomly reordered and then split into n folds of
equal size. In each iteration, one fold is used for testing and the other n-1
folds are used for training the classifier. The test results are collected and
averaged over all folds. This gives the cross-validation estimate of the
accuracy. The folds can be purely random or slightly modified to create the
same class distribution in each fold as in the complete dataset. In the latter
case the cross-validation is called stratified. Leave-one-out (= loo)
cross-validation signifies that n is equal to the number of examples. Out of
necessity, loo cv has to be non-stratified: a test set consisting of a single
example cannot reflect the class distribution of the training data. Therefore
loo cv tends to give less reliable results. However, it is still quite useful
in dealing with small datasets, since it utilizes the greatest amount of
training data from the dataset.
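As a sketch, the same kind of cross-validation estimate that the command line
prints can be obtained via WEKA's Evaluation class; the 10 folds and the seed
are arbitrary choices, and WEKA stratifies the folds when the class is nominal:

import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class CrossValidate {
  public static void main(String[] args) throws Exception {
    Instances data = DataSource.read("weather.arff");
    data.setClassIndex(data.numAttributes() - 1);

    // 10-fold cross-validation; each instance is tested exactly once
    Evaluation eval = new Evaluation(data);
    eval.crossValidateModel(new J48(), data, 10, new Random(1));
    System.out.println(eval.toSummaryString());
  }
}

In principle, passing data.numInstances() as the fold count yields the
leave-one-out estimate discussed above.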