Can under-exploited structure of original-classes help ECOC-based
multi-class classification?
Yunyun Wang a, Songcan Chen a,*, Hui Xue b
a School of Computer Science and Engineering, Nanjing University of Aeronautics & Astronautics, Nanjing 210016, PR China
b School of Computer Science and Engineering, Southeast University, Nanjing 210096, PR China
Article info
Article history:
Received 13 July 2011
Received in revised form 29 November 2011
Accepted 26 February 2012
Communicated by Weifeng Liu
Available online 24 March 2012
Keywords:
Multi-class classification
Error correcting output codes
Support vector machine
Cluster assumption
Manifold assumption
Abstract
Error correcting output codes (ECOC) is a popular framework for addressing multi-class classification problems by combining multiple binary sub-problems. In each binary sub-problem, at least one class is actually a ''meta-class'' consisting of multiple original classes, yet it is treated as a single class in the learning process. This strategy yields a simple and common implementation of multi-class classification, but at the same time under-exploits the structure knowledge already provided within the individual original classes. In this paper, we present a new methodology showing that utilizing such prior structure knowledge can further strengthen the performance of ECOCs, where the structure knowledge is formulated under the cluster and manifold assumptions, respectively. Finally, we validate our methodology on both toy and real benchmark datasets (UCI, face recognition and object category), which in turn confirms the value of the structure knowledge of individual original classes for ECOC-based multi-class classification.
© 2012 Elsevier B.V. All rights reserved.
1. Introduction
In real applications, we frequently encounter multi-class classification problems, in which the observed data belong to more than two classes [1,2]. Examples of such applications include optical character recognition, text classification and medical analysis.
There are mainly two independent lines of research for designing multi-class classification methods. One line is ''direct design'', i.e., directly designing a multi-class classifier by adopting multi-class output encodings; typical examples include decision trees, neural networks, logistic regression [3], least-squares classifiers, and multi-class SVMs [4–6]. The other line is ''(indirect) decomposition or ECOC design'', i.e., decomposing the original multi-class problem into multiple binary sub-problems, each of which can be efficiently solved by any binary classification method [7–9], and then combining the results from all binary sub-classifiers for the final classification. This strategy is simple and common, and has thus spawned an independent and broad area of research. In this paper, we focus on the second line.
The simplest decomposition strategy is One-Vs-All (OVA), in which each class is compared against all the other classes, generating C binary sub-problems (and corresponding binary classifiers), where C is the number of classes. A new instance is then assigned to the class with the maximum classification score among all the corresponding binary classifiers. Friedman [10] suggested the One-Vs-One (OVO) strategy, in which all pairs of classes are compared, resulting in C(C−1)/2 binary sub-problems, and the prediction for a new instance is made by voting over all the corresponding binary classifiers. Dietterich et al. [11] developed the general (binary) error correcting output codes (ECOC) framework, in which each class is given an N-length error correcting output codeword with each component valued from {−1, +1}, and the codewords of the individual classes have optimal separation from each other. Arranging those codewords as rows yields a C×N codeword matrix, whose individual columns indicate the class-set partitions for the N generated binary sub-problems, respectively. For a new instance, an N-length code can be obtained from the corresponding binary classifiers, and the instance is assigned to the ''closest'' class as measured by the Hamming distance between the instance code and the individual class codewords. Allwein et al. [7] extended the ECOC framework and developed ternary ECOC, in which each component in the codeword matrix is allowed to take values from {−1, +1, 0}, where the zero value indicates that the corresponding class is not considered in the current binary sub-problem. The prediction for a new instance then adopts a loss-based decoding function instead of the original Hamming distance. Ternary ECOC thus covers OVA, OVO and binary ECOC in a unified framework.
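As a minimal illustration (not taken from the paper), the Hamming-distance decoding step described above can be sketched as follows; the codeword matrix, the binary classifiers' scores and the class indices are all hypothetical:

```python
import numpy as np

# Hypothetical C x N = 4 x 6 binary ECOC codeword matrix;
# each row is one class's codeword, with components in {-1, +1}.
codewords = np.array([
    [+1, +1, +1, -1, -1, -1],
    [+1, -1, -1, +1, +1, -1],
    [-1, +1, -1, +1, -1, +1],
    [-1, -1, +1, -1, +1, +1],
])

def ecoc_decode(binary_outputs, codewords):
    """Assign the class whose codeword is closest, in Hamming distance,
    to the sign vector produced by the N binary classifiers."""
    code = np.sign(binary_outputs)            # N-length code in {-1, +1}
    # Hamming distance = number of disagreeing components per class row
    dists = np.sum(codewords != code, axis=1)
    return int(np.argmin(dists))

# Suppose the six binary classifiers output these (hypothetical) scores
# for a new instance; its sign code is [1, -1, 1, -1, -1, -1].
scores = np.array([0.9, -0.4, 0.7, -0.8, -0.2, -0.5])
print(ecoc_decode(scores, codewords))  # -> 0 (row 0 disagrees in only one bit)
```

Loss-based decoding for ternary ECOC would replace the Hamming count with a margin-based loss on the raw scores, which lets zero-valued codeword entries contribute nothing to the decision.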
Since then, further improvements to ECOC have been developed, focusing on both the design of its encoding (w.r.t. the construction
http://dx.doi.org/10.1016/j.neucom.2012.02.035
* Corresponding author. Tel.: +86 25 848 96481; fax: +86 25 844 98069.
E-mail addresses: wangyunyun@nuaa.edu.cn (Y. Wang), s.chen@nuaa.edu.cn (S. Chen), hxue@seu.edu.cn (H. Xue).
Neurocomputing 89 (2012) 158–167