Can under-exploited structure of original-classes help ECOC-based
multi-class classification?
Yunyun Wang a, Songcan Chen a,*, Hui Xue b
a School of Computer Science and Engineering, Nanjing University of Aeronautics & Astronautics, Nanjing 210016, PR China
b School of Computer Science and Engineering, Southeast University, Nanjing 210096, PR China
Article info
Article history:
Received 13 July 2011
Received in revised form 29 November 2011
Accepted 26 February 2012
Communicated by Weifeng Liu
Available online 24 March 2012
Keywords:
Multi-class classification
Error correcting output codes
Support vector machine
Cluster assumption
Manifold assumption
Abstract
Error correcting output codes (ECOC) is a popular framework for addressing multi-class classification problems by combining multiple binary sub-problems. In each binary sub-problem, at least one class is actually a ''meta-class'' consisting of multiple original classes, yet it is treated as a single class in the learning process. This strategy yields a simple and common implementation of multi-class classification, but at the same time under-exploits the structure knowledge already provided within the individual original classes. In this paper, we present a new methodology showing that utilizing such prior structure knowledge can further strengthen the performance of ECOCs, where the structure knowledge is formulated under the cluster and manifold assumptions, respectively. Finally, we validate our methodology on both toy and real benchmark datasets (UCI, face recognition and object category), which in turn confirms the value of the structure knowledge of individual original classes for ECOC-based multi-class classification.
© 2012 Elsevier B.V. All rights reserved.
1. Introduction
In real applications, we frequently encounter multi-class classification problems, in which the observed data belong to more than two classes [1,2]. Examples of such applications include optical character recognition, text classification and medical analysis.
There are mainly two independent lines of research for designing multi-class classification methods. One line is ''direct design'', i.e., directly designing a multi-class classifier by adopting multi-class output encodings; typical examples include decision trees, neural networks, logistic regression [3], least-squares classifiers, and multi-class SVMs [4–6]. The other line is ''(indirect) decomposition or ECOC design'', i.e., decomposing the original multi-class problem into multiple binary sub-problems, each of which can be efficiently solved by any binary classification method [7–9], and then combining the results from all binary sub-classifiers for the final classification. This strategy is simple and common, and has thus spawned an independent and broad area of research. In this paper, we focus on the second line.
The simplest decomposition strategy is One-Vs-All (OVA), in which each class is compared against all the other classes, generating C binary sub-problems (and corresponding binary classifiers), where C is the number of classes. A new instance is then assigned to the class with the maximum classification score among all the corresponding binary classifiers. Friedman [10] suggested the One-Vs-One (OVO) strategy, in which all pairs of classes are compared, resulting in C(C−1)/2 binary sub-problems, and the prediction for a new instance is made by voting over all the corresponding binary classifiers. Dietterich et al. [11] developed the general (binary) error correcting output codes (ECOC) framework, in which each class is given an N-length error correcting output codeword with each component valued from {−1, +1}, and the codewords of the individual classes have optimal separation from each other. Arranging those codewords as rows yields a C×N codeword matrix, whose individual columns indicate the class-set partitions for the N generated binary sub-problems, respectively. For a new instance, an N-length code can be obtained from the corresponding binary classifiers, and the instance is assigned to the ''closest'' class as measured by the Hamming distance between the instance code and the individual class codewords. Allwein et al. [7] extended the ECOC framework and developed ternary ECOC, in which each component in the codeword matrix is allowed to take values from {−1, +1, 0}, where the zero value indicates that the corresponding class is not considered in the current binary sub-problem. The prediction for a new instance then adopts a loss-based decoding function instead of the original Hamming distance. Ternary ECOC thus covers OVA, OVO and binary ECOC in a unified framework.
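As a minimal illustration (not taken from the paper), the Hamming-distance decoding step described above can be sketched as follows; the codeword matrix, the binary classifiers' scores and the class indices are all hypothetical:

```python
import numpy as np

# Hypothetical C x N = 4 x 6 binary ECOC codeword matrix;
# each row is one class's codeword, with components in {-1, +1}.
codewords = np.array([
    [+1, +1, +1, -1, -1, -1],
    [+1, -1, -1, +1, +1, -1],
    [-1, +1, -1, +1, -1, +1],
    [-1, -1, +1, -1, +1, +1],
])

def ecoc_decode(binary_outputs, codewords):
    """Assign the class whose codeword is closest, in Hamming distance,
    to the sign vector produced by the N binary classifiers."""
    code = np.sign(binary_outputs)            # N-length code in {-1, +1}
    # Hamming distance = number of disagreeing components per class row
    dists = np.sum(codewords != code, axis=1)
    return int(np.argmin(dists))

# Suppose the six binary classifiers output these (hypothetical) scores
# for a new instance; its sign code is [1, -1, 1, -1, -1, -1].
scores = np.array([0.9, -0.4, 0.7, -0.8, -0.2, -0.5])
print(ecoc_decode(scores, codewords))  # -> 0 (row 0 disagrees in only one bit)
```

Loss-based decoding for ternary ECOC would replace the Hamming count with a margin-based loss on the raw scores, which lets zero-valued codeword entries contribute nothing to the decision.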
Since then, further improvements to ECOC have been developed, focusing on both the design of its encoding (w.r.t. the construction
http://dx.doi.org/10.1016/j.neucom.2012.02.035
* Corresponding author. Tel.: +86 25 848 96481; fax: +86 25 844 98069.
E-mail addresses: wangyunyun@nuaa.edu.cn (Y. Wang), s.chen@nuaa.edu.cn (S. Chen), hxue@seu.edu.cn (H. Xue).
Neurocomputing 89 (2012) 158–167