size of GBDT to make it suitable for XMLC. CRAFTML [77]
uses fast partitioning strategies and exploits a random forest
algorithm. CRAFTML first randomly projects the feature and
label vectors into lower-dimensional spaces. A k-means
algorithm is then applied to the projected labels to partition the
instances into k temporary subsets. At prediction time, a test
instance follows a root-to-leaf path in each tree and retrieves the
average label vector stored in the leaf node; the forest then
aggregates these label vectors across trees. GBDT-SPARSE and
CRAFTML also open the way to parallelization.
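To make the partitioning step concrete, the following sketch (an illustration under our own assumptions, not the authors' implementation) shows one CRAFTML-style node split: features and labels are randomly projected, and k-means on the projected labels assigns instances to k child subsets whose average label vectors would be stored at the leaves.

```python
import numpy as np
from sklearn.cluster import KMeans

def split_node(X, Y, k=10, feat_dim=128, label_dim=64, seed=0):
    """One CRAFTML-style node split (illustrative sketch).

    X: (n, d) feature matrix, Y: (n, L) binary label matrix.
    Returns the k instance subsets, their mean label vectors,
    and the random projections used.
    """
    rng = np.random.default_rng(seed)
    # Random projections of features and labels into low-dimensional spaces.
    P_x = rng.standard_normal((X.shape[1], feat_dim)) / np.sqrt(feat_dim)
    P_y = rng.standard_normal((Y.shape[1], label_dim)) / np.sqrt(label_dim)
    Xp, Yp = X @ P_x, Y @ P_y

    # k-means on the projected *labels* partitions the instances into k subsets.
    assign = KMeans(n_clusters=k, n_init=5, random_state=seed).fit_predict(Yp)
    subsets = [np.where(assign == c)[0] for c in range(k)]

    # Each leaf stores the average label vector of its instances; at test time
    # a point is routed towards the closest child (in the full method, using
    # the projected feature space) until it reaches a leaf.
    leaf_means = [Y[idx].mean(axis=0) if len(idx) else np.zeros(Y.shape[1])
                  for idx in subsets]
    return subsets, leaf_means, (P_x, P_y)
```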
2.3 One-vs-all Methods
One-vs-all methods are among the most popular strategies
for multi-label classification: they independently train a
binary classifier for each label. However, this technique
suffers from two major limitations in XMLC: 1) training one-vs-all
classifiers for XMLC problems with off-the-shelf solvers
such as Liblinear can be infeasible in both computation and
memory; 2) the model size for an XMLC data set can be
extremely large, which leads to slow prediction. Recently,
many works have been developed to address these
issues of one-vs-all methods in XMLC.
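As a point of reference, the naive baseline below makes the linear dependence on the number of labels explicit: it fits one independent binary classifier per label with an off-the-shelf solver and linearly scans all labels at prediction time. This is an illustrative sketch only (the solver choice and function names are ours), not the setup of any particular paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_one_vs_all(X, Y):
    """Naive one-vs-all: one binary classifier per label (O(L) fits)."""
    models = []
    for l in range(Y.shape[1]):
        clf = LogisticRegression(max_iter=200)
        clf.fit(X, Y[:, l])          # training cost grows linearly with L
        models.append(clf)
    return models

def predict_top_k(models, x, k=5):
    """Score every label (a linear scan over L models) and return the top-k."""
    scores = np.array([m.decision_function(x.reshape(1, -1))[0] for m in models])
    return np.argsort(-scores)[:k]
```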
By exploiting the sparsity of the data, several sub-linear
algorithms have been proposed to adapt one-vs-all methods
to the extreme classification setting. For example, PD-Sparse
[74] minimizes the separation ranking loss [78] with an
ℓ1 penalty in an Empirical Risk Minimization (ERM)
framework for XMLC. The separation ranking loss penalizes
the prediction on an instance by the highest response from
the set of negative labels minus the lowest response from the
set of positive labels. PD-Sparse obtains an extremely sparse
solution in both the primal and the dual with sub-linear time
cost, while yielding higher accuracy than SLEEC, FastXML
and several other XMLC methods. By introducing separable
loss functions, PPDSparse [7] parallelizes PD-Sparse with
sub-linear algorithms to scale out training. PPDSparse
also reduces the memory cost of PD-Sparse by orders of
magnitude, since training is separated across labels.
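In symbols, writing f_l(x) for the score of label l on instance x, and P and N for the positive and negative label sets of x, the separation ranking loss described above takes the hinge form (the unit margin is the standard convention and is our addition here):

```latex
L_{\mathrm{sep}}(x) \;=\; \max\Bigl(0,\; 1 + \max_{l \in N} f_l(x) \;-\; \min_{l \in P} f_l(x)\Bigr)
```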
DiSMEC [6] also presents a sparse model with a parameter-thresholding
strategy, and employs a double layer of parallelization
to scale one-vs-all methods to problems involving
hundreds of thousands of labels. ProXML [79] proposes an
ℓ1-regularized Hamming loss to address the tail-label issue,
and shows, using arguments from graph theory, that minimizing
a one-vs-all objective based on the Hamming loss works well
for tail-label prediction in XMLC.
PD-Sparse, PPDSparse, DiSMEC and ProXML achieve high
prediction accuracy with small model sizes.
However, they still train a separate linear classifier for each
label and linearly scan every label at test time to decide whether
it is relevant. Thus the training and testing costs of
these methods grow linearly with the number of labels.
Several more advanced methods address this issue.
For example, to reduce the linear prediction cost of one-vs-all
methods, [75] proposes to predict on a small candidate set of
labels, generated by projecting a test instance onto a
filtering line and retaining only the labels that have training
instances in the vicinity of this projection. The candidate
label set should retain most of the true labels of the test
instances while being as small as possible; the label
filters are trained by optimizing these two objectives as a mixed-integer
problem. Label filters can reduce the testing time of existing
XMLC classifiers by orders of magnitude while yielding
comparable prediction accuracy. [75] thus offers an interesting
technique for finding a small number of potentially relevant
labels instead of scanning a very long label list.
How to use label filters to speed up training is left
as an open problem.
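The prediction-time idea can be sketched as follows (a simplified illustration; the actual filters of [75] learn the projection direction and the label intervals jointly via a mixed-integer formulation, whereas here the direction w is assumed given): each label keeps the interval of projections of its positive training instances on the filtering line, and a test instance is scored only against labels whose interval contains its own projection.

```python
import numpy as np

def build_label_filter(X, Y, w, slack=0.0):
    """For each label, store the [low, high] interval of projections w.x
    over its positive training instances (simplified stand-in for the
    learned filters of [75])."""
    proj = X @ w
    intervals = []
    for l in range(Y.shape[1]):
        p = proj[Y[:, l] == 1]
        if len(p) == 0:
            intervals.append((np.inf, -np.inf))   # label never active
        else:
            intervals.append((p.min() - slack, p.max() + slack))
    return np.array(intervals)

def candidate_labels(x, w, intervals):
    """Return the (hopefully small) set of labels whose interval
    contains the projection of the test instance."""
    z = float(x @ w)
    lo, hi = intervals[:, 0], intervals[:, 1]
    return np.where((lo <= z) & (z <= hi))[0]
```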
Parabel [8] reduces the training time of one-vs-all methods
from O(ndL) to O((nd log L)/L) by learning balanced binary
label trees over an efficient and informative label
representation. It also introduces a probabilistic hierarchical
multi-label model that generalizes hierarchical softmax to
the multi-label setting, together with a logarithmic-time
prediction algorithm for XMLC. Experiments show
that Parabel can be orders of magnitude faster at training
and prediction than state-of-the-art one-vs-all
extreme classifiers. However, Parabel is less accurate on low-dimensional
data sets, because it cannot guarantee that
similar labels are placed in the same group, and errors
propagate down its deep trees. To reduce this
error propagation, Bonsai [80] uses a shallow k-ary label
tree structure with a generalized label representation. A novel
negative sampling technique is presented in Slice [9] to
improve prediction accuracy on low-dimensional dense
feature representations. Slice cuts the training cost of
one-vs-all methods from linear in L to O(nd log L)
by training each classifier on only O((n/L) log L) of the most
confusing negative examples rather than on all n training points.
Slice employs a generative model to estimate the O((n/L) log L)
negative examples for each label via approximate
nearest neighbour search (ANNS) in time O((n+L)d log L),
and predicts on only the O(log L) most probable
labels for each test point. Slice is up to 15% more accurate
than Parabel, and scales to 100 million labels and
240 million training points. The experiments in [9] show
that negative sampling is a powerful tool in XMLC, and
the performance gains from more advanced negative sampling
techniques deserve further study.
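The following sketch conveys the flavour of ANNS-based negative sampling. It is our own simplification: an exact k-nearest-neighbour search over label centroids stands in for Slice's generative model and HNSW index, and all names are illustrative. Each label's one-vs-all classifier would then be trained on its positives plus the returned hard negatives instead of on all n points.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def hardest_negatives(X, Y, n_neighbors=50):
    """Slice-style negative sampling (illustrative sketch).

    For each label, the 'most confusing' negatives are approximated by the
    training points closest to that label's positives in feature space.
    Slice itself uses a generative model plus an HNSW ANNS index; here an
    exact k-NN search around the label centroid stands in for that step.
    """
    nn = NearestNeighbors(n_neighbors=n_neighbors).fit(X)
    negatives = {}
    for l in range(Y.shape[1]):
        pos = np.where(Y[:, l] == 1)[0]
        if len(pos) == 0:
            negatives[l] = np.array([], dtype=int)
            continue
        # Neighbours of the label centroid approximate the confusing region.
        centroid = X[pos].mean(axis=0, keepdims=True)
        _, idx = nn.kneighbors(centroid)
        cand = idx[0]
        negatives[l] = cand[Y[cand, l] == 0]   # keep only true negatives
    return negatives
```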
The training and testing time complexities of some XMLC
methods are summarized in Table 1. From Table 1, we
can see that tree-based methods usually have much lower
training and testing time complexity than embedding
and one-vs-all methods. The testing time of one-vs-all
methods can be reduced from linear to logarithmic in
the number of labels by combining tree structures with
negative sampling. In the future, more advanced
techniques are needed to further reduce the time cost of XMLC.
3 MULTI-LABEL LEARNING WITH LIMITED SUPERVISION
Collecting fully supervised data is usually hard and expensive,
and is thus a critical bottleneck in real-world classification
tasks. In MLC problems, there are many ground-truth
labels and the output space can be very large, which further
aggravates the difficulty of precise annotation. To mitigate
this problem, many works have studied different settings
of MLC with limited supervision. How to model label
dependencies and handle incomplete supervision pose two