A Survey of Hierarchical Classification Tasks and a Comparison of Methods


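This survey of hierarchical classification examines a key problem in machine learning: classification tasks whose classes are organized in a hierarchy. In real-world applications, fields such as bioinformatics, computer vision, and natural language processing may all need to label data hierarchically in order to understand and organize complex relationships in the data more effectively. In their 2011 article in Data Mining and Knowledge Discovery (Volume 22, Pages 31-72, DOI: 10.1007/s10618-010-0175-9), Carlos N. Silla Jr. and Alex A. Freitas give a comprehensive overview of the topic.

They first define the hierarchical classification task: classification that takes into account the parent-child relationships among classes, such as a species taxonomy or a semantic hierarchy of text topics. They point out that, although the task arises in many fields, research in different fields often proceeds with little exchange, so methods developed in one field may not take full account of best practice in the others.

The authors propose a new perspective on existing hierarchical classification approaches and attempt to build a unifying framework that integrates and categorizes the different techniques. They also identify related tasks that look similar but are not truly hierarchical classification, such as simple hierarchical processing based only on clustering. Their aim is a systematic understanding that helps researchers choose and design hierarchical classification algorithms suited to a particular application.

The authors then review the empirical comparisons of existing hierarchical classification methods reported in the literature and analyze the strengths and weaknesses of each method. This covers implementations based on probabilistic Bayesian network methods, decision trees, neural networks and others, together with their relative performance on hierarchically structured data. The comparisons consider not only accuracy but also efficiency, interpretability, and adaptability.

Finally, the authors present a high-level conceptual comparison, emphasizing the flexibility and complexity of hierarchical classification in practice and the trade-off between model accuracy and computational cost. They encourage cross-disciplinary collaboration to advance the field as a whole. The survey offers valuable guidance for understanding, evaluating, and developing more effective hierarchical classification methods, helps readers make informed choices when facing complex datasets, and promotes the integration of machine learning theory and practice.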

38 C. N. Silla Jr., A. A. Freitas

It should be noted that, although the three types of local hierarchical classification algorithms discussed in the next three sub-sections differ significantly in their training phase, they share a very similar top-down approach in their testing phase. In essence, in this top-down approach, for each new example in the test set, the system first predicts its first-level (most generic) class, then it uses that predicted class to narrow the choices of classes to be predicted at the second level (the only valid candidate second-level classes are the children of the class predicted at the first level), and so on, recursively, until the most specific prediction is made.
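The top-down testing phase described above can be illustrated with a minimal Python sketch. It is not taken from the paper: the class tree `CHILDREN`, the `score` callable, and the toy scores are hypothetical stand-ins for whatever local classifiers a real system would train.

```python
from typing import Callable, Dict, List

# Hypothetical class tree: each class maps to its child classes.
CHILDREN: Dict[str, List[str]] = {
    "root": ["1", "2"],
    "1": ["1.1", "1.2"],
    "2": ["2.1", "2.2"],
}


def predict_top_down(
    example: dict,
    score: Callable[[str, dict], float],
    children: Dict[str, List[str]] = CHILDREN,
    root: str = "root",
) -> List[str]:
    """Return the predicted classes, from most generic to most specific."""
    path: List[str] = []
    node = root
    while children.get(node):
        # Only the children of the class predicted at the previous level
        # are valid candidates at the current level.
        node = max(children[node], key=lambda c: score(c, example))
        path.append(node)
    return path


def toy_score(label: str, example: dict) -> float:
    """Stand-in for a trained local classifier: look up a fixed score."""
    return example.get(label, 0.0)


# Class "1.1" has the highest score overall, but it is never considered
# because its parent "1" loses to "2" at the first level.
example = {"1": 0.2, "2": 0.8, "1.1": 0.99, "2.1": 0.3, "2.2": 0.7}
print(predict_top_down(example, toy_score))  # ['2', '2.2']
```

The toy example also shows why a first-level decision constrains everything below it: once class 2 is chosen, no descendant of class 1 can be predicted.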

As a result, a disadvantage of the top-down class-prediction approach (which is shared by all three types of local classifiers discussed next) is that an error at a certain class level is propagated down the hierarchy, unless some procedure for avoiding this problem is used. If the problem is non-mandatory leaf node prediction, a blocking approach (where an example is passed down to the next lower level only if the confidence in the prediction at the current level is greater than a threshold) can prevent misclassifications from being propagated downwards, at the expense of providing the user with less specific (less useful) class predictions. Some authors use methods that give better estimates of class probabilities, such as shrinkage (McCallum et al. 1998) and isotonic smoothing (Punera and Ghosh 2008). The issues of non-mandatory leaf node prediction and blocking are discussed in Sect. 4.4.
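As a rough illustration of blocking, the sketch below stops the descent as soon as the confidence at the current level is not greater than a threshold. The tree, the `score` callable, and the threshold values are hypothetical. Keeping the low-confidence prediction itself in the output, rather than discarding it, is an assumption made here; the text above leaves that choice open.

```python
from typing import Callable, Dict, List

# Hypothetical three-level class tree.
CHILDREN: Dict[str, List[str]] = {
    "root": ["1", "2"],
    "2": ["2.1", "2.2"],
    "2.1": ["2.1.1", "2.1.2"],
}


def predict_with_blocking(
    example: dict,
    score: Callable[[str, dict], float],
    threshold: float,
    children: Dict[str, List[str]] = CHILDREN,
    root: str = "root",
) -> List[str]:
    """Top-down prediction that stops descending on low confidence."""
    path: List[str] = []
    node = root
    while children.get(node):
        best = max(children[node], key=lambda c: score(c, example))
        path.append(best)
        if score(best, example) <= threshold:
            break  # blocked: the example is not passed to the next level
        node = best
    return path


def toy_score(label: str, example: dict) -> float:
    """Stand-in for a trained local classifier: look up a fixed score."""
    return example.get(label, 0.0)


example = {"2": 0.9, "2.1": 0.55, "2.2": 0.45, "2.1.1": 0.8}
print(predict_with_blocking(example, toy_score, threshold=0.6))  # ['2', '2.1']
print(predict_with_blocking(example, toy_score, threshold=0.5))  # ['2', '2.1', '2.1.1']
```

With the higher threshold the prediction ends at an internal class, which is the less specific output that the trade-off described above refers to.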

4.1 Local classifier per node approach

This is by far the most used approach in the literature. It often appears under the name of a top-down approach but, as mentioned earlier, this is not a good name, because the top-down approach is essentially a method to avoid inconsistencies in class predictions at different levels in the class hierarchy. The local classifier per node (LCN) approach consists of training one binary classifier for each node of the class hierarchy (except the root node). Figure 4 illustrates this approach.

Fig. 4 Local classifier per node approach (circles represent classes and dashed squares with round corners represent binary classifiers)


A survey of hierarchical classification 39

Table 1 Notation for negative and positive training examples

Symbol                    Meaning
$Tr$                      The set of all training examples
$Tr^{+}(c_j)$             The set of positive training examples of $c_j$
$Tr^{-}(c_j)$             The set of negative training examples of $c_j$
$\uparrow(c_j)$           The parent category of $c_j$
$\downarrow(c_j)$         The set of children categories of $c_j$
$\Uparrow(c_j)$           The set of ancestor categories of $c_j$
$\Downarrow(c_j)$         The set of descendant categories of $c_j$
$\leftrightarrow(c_j)$    The set of sibling categories of $c_j$
$\ast(c_j)$               Denotes examples whose most specific known class is $c_j$
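The structural operators in Table 1 map directly onto small helper functions. The sketch below is only an illustration: the `PARENT` map is an assumed reconstruction of the class tree of Fig. 4, inferred from the class labels used in the running example (1, 1.1, ..., 2.2.2), and leaving the root `R` out of the ancestor sets is a further assumption.

```python
from typing import Dict, Optional, Set

ROOT = "R"

# Assumed class tree for Fig. 4: each class maps to its parent.
PARENT: Dict[str, str] = {
    "1": ROOT, "2": ROOT,
    "1.1": "1", "1.2": "1",
    "2.1": "2", "2.2": "2",
    "2.1.1": "2.1", "2.1.2": "2.1",
    "2.2.1": "2.2", "2.2.2": "2.2",
}


def parent(c: str) -> Optional[str]:
    """Parent category of c (None for the root)."""
    return PARENT.get(c)


def children(c: Optional[str]) -> Set[str]:
    """Set of children categories of c."""
    return {node for node, p in PARENT.items() if p == c}


def ancestors(c: str) -> Set[str]:
    """Set of ancestor categories of c, excluding the root (assumption)."""
    found: Set[str] = set()
    p = PARENT.get(c)
    while p is not None and p != ROOT:
        found.add(p)
        p = PARENT.get(p)
    return found


def descendants(c: str) -> Set[str]:
    """Set of descendant categories of c."""
    return {node for node in PARENT if c in ancestors(node)}


def siblings(c: str) -> Set[str]:
    """Set of sibling categories of c."""
    return children(PARENT.get(c)) - {c}


print(parent("2.1"))               # 2
print(sorted(children("2")))       # ['2.1', '2.2']
print(sorted(ancestors("2.1.1")))  # ['2', '2.1']
print(sorted(descendants("2.1")))  # ['2.1.1', '2.1.2']
print(sorted(siblings("2.1")))     # ['2.2']
```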

There are different ways to define the sets of positive and negative examples for training the binary classifiers. Most works in the literature use a single approach, and studies that compare different approaches, such as Eisner et al. (2005) and Fagni and Sebastiani (2007), are not common. Eisner et al. (2005) identify and experiment with four different policies for defining the sets of positive and negative examples. Fagni and Sebastiani (2007) focus on the selection of the negative examples and empirically compare four policies (two standard ones and two novel ones). However, the novel approaches are limited to text categorization problems and achieved results similar to those of the standard approaches; for that reason they are not discussed further in this paper. The notation used to define the sets of positive and negative examples is based on the one used in Fagni and Sebastiani (2007) and is presented in Table 1.

– The “exclusive” policy [as defined by Eisner et al. (2005)]: $Tr^{+}(c_j) = \ast(c_j)$ and $Tr^{-}(c_j) = Tr \setminus \ast(c_j)$. This means that only examples explicitly labeled with $c_j$ as their most specific class are selected as positive examples, while everything else is used as negative examples. For example, using Fig. 4, for $c_j = 2.1$, $Tr^{+}(c_{2.1})$ consists of all examples whose most specific class is 2.1; and $Tr^{-}(c_{2.1})$ consists of the set of examples whose most specific class is 1, 1.1, 1.2, 2, 2.1.1, 2.1.2, 2.2, 2.2.1 or 2.2.2. This approach has a few problems. First, it does not consider the hierarchy to create the local training sets. Second, it is limited to problems where partial depth labeling instances are available. By partial depth labeling instances we mean instances whose class label is known just for shallower levels of the hierarchy, and not for deeper levels. Third, using the descendant nodes of $c_j$ as negative examples seems counter-intuitive, considering that examples that belong to a class in $\Downarrow(c_j)$ also implicitly belong to class $c_j$ according to the “IS-A” hierarchy concept.

– The “less exclusive” policy [as defined by Eisner et al. (2005)]: $Tr^{+}(c_j) = \ast(c_j)$ and $Tr^{-}(c_j) = Tr \setminus (\ast(c_j) \cup \Downarrow(c_j))$. In this case, using Fig. 4 as example, $Tr^{+}(c_{2.1})$ consists of the set of examples whose most specific class is 2.1; and $Tr^{-}(c_{2.1})$ consists of the set of examples whose most specific class is 1, 1.1, 1.2, 2, 2.2, 2.2.1 or 2.2.2. This approach avoids the aforementioned first and third
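The two policies listed above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' code: the `PARENT` map is an assumed reconstruction of the Fig. 4 class tree, and `TRAIN` is a made-up training set holding exactly one example per class, each labeled with its most specific known class.

```python
from typing import Dict, Set, Tuple

# Assumed class tree for Fig. 4: each class maps to its parent ("R" is the root).
PARENT: Dict[str, str] = {
    "1": "R", "2": "R",
    "1.1": "1", "1.2": "1",
    "2.1": "2", "2.2": "2",
    "2.1.1": "2.1", "2.1.2": "2.1",
    "2.2.1": "2.2", "2.2.2": "2.2",
}

# Hypothetical training set: example id -> most specific known class.
TRAIN: Dict[str, str] = {f"x_{c}": c for c in PARENT}


def descendants(c: str) -> Set[str]:
    """Classes that have c somewhere on their path to the root."""
    found: Set[str] = set()
    for node in PARENT:
        p = PARENT.get(node)
        while p is not None:
            if p == c:
                found.add(node)
                break
            p = PARENT.get(p)
    return found


def most_specific(train: Dict[str, str], c: str) -> Set[str]:
    """*(c): examples whose most specific known class is c."""
    return {x for x, label in train.items() if label == c}


def exclusive(train: Dict[str, str], c: str) -> Tuple[Set[str], Set[str]]:
    """Positives are *(c); every other training example is a negative."""
    pos = most_specific(train, c)
    return pos, set(train) - pos


def less_exclusive(train: Dict[str, str], c: str) -> Tuple[Set[str], Set[str]]:
    """As exclusive, but examples of descendant classes are not negatives."""
    pos = most_specific(train, c)
    below = descendants(c)
    in_descendants = {x for x, label in train.items() if label in below}
    return pos, set(train) - pos - in_descendants


_, neg_exclusive = exclusive(TRAIN, "2.1")
_, neg_less = less_exclusive(TRAIN, "2.1")
print(sorted(neg_exclusive - neg_less))  # ['x_2.1.1', 'x_2.1.2']
```

For class 2.1 the two negative sets differ only in the examples of 2.1.1 and 2.1.2, which matches the class lists given in the two bullets.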
