Learning with Feature Network and Label Network Simultaneously
Yingming Li,† Ming Yang,† Zenglin Xu,‡ Zhongfei (Mark) Zhang†
†College of Information Science & Electronic Engineering, Zhejiang University, China
‡School of Computer Science and Engineering, Big Data Research Center,
University of Electronic Science and Technology of China
yingming@zju.edu.cn, cauchym@zju.edu.cn, zenglin@gmail.com, zhongfei@zju.edu.cn
Abstract
For many supervised learning problems, limited training samples and incomplete labels are two difficult challenges, which usually lead to degraded performance on label prediction. To improve generalization performance, in this paper we propose Doubly Regularized Multi-Label learning (DRML), which exploits feature network and label network regularization simultaneously. More specifically, the proposed algorithm first constructs a feature network and a label network with marginalized linear denoising autoencoders on the data feature set and label set, respectively, and then learns a robust predictor regularized by both the feature network and the label network. While DRML is a general method for multi-label learning, our evaluations focus on the specific application of multi-label text tagging. Extensive evaluations on three benchmark data sets demonstrate that DRML achieves superior performance in comparison with existing multi-label learning methods.
Introduction
Building on decades of research on tag learning (Nigam et al. 1998; Elisseeff and Weston 2001; Yu, Yu, and Tresp 2005; Hsu et al. 2009; Liu and Tsang 2015), recent years have witnessed increasing applications of tag learning in many fields, ranging from social media search to the classification of medical reports, owing to its capability of improving data organization and management. Consequently, many tagging methods (Liu, Jin, and Yang 2006; Zhang and Zhou 2007; 2014; Li, Yang, and Zhang 2016) have been developed to meet the requirements of different areas. However, most existing tagging methods assume that the amount of given training data is sufficient and that the given training labels are complete. In practice, many supervised learning problems face two challenges: limited training samples and incomplete training labels, which usually lead to degraded performance on label prediction.
Given a limited amount of labeled training data and a
very high-dimensional feature space, a common solution is
to regularize a model by penalizing a specific norm of its
parameters. The most commonly used norms in supervised
Copyright © 2017, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
learning are L1 and L2, which assume that model parameters are independent. However, dependencies between parameters usually exist in real-world applications. For example, in the biomedical domain, gene features have structured
input since genes are organized as pathways; the learned
model parameters (feature weights for a linear classifier)
should be more effective by keeping the structural relation-
ship between features. Further, dependencies can also be in-
ferred from data, e.g., manifold-based feature graph can be
used to regularize the model parameters and show its effec-
tivity (Li and Li 2008). However, the feature network based
on feature manifold only considers the positive correlation
between features and ignores negative correlations between
features. It is inappropriate since negative correlations also
help to reduce the search space of the model parameters.
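To make the manifold-style feature-graph regularizer concrete, the following sketch (our own illustration, not code from the paper; `graph_penalty` and its arguments are hypothetical names) computes the classical penalty Σᵢⱼ Aᵢⱼ (wᵢ − wⱼ)² = 2 wᵀLw over a feature affinity matrix A. Note that the penalty is only meaningful for non-negative edge weights, which is exactly the limitation discussed above: negative feature correlations cannot be encoded.

```python
import numpy as np

def graph_penalty(w, A):
    """Manifold-based feature-graph penalty on model weights w:
        sum_ij A_ij * (w_i - w_j)^2  ==  2 * w^T L w,
    where L = D - A is the graph Laplacian of the (symmetric,
    non-negative) feature affinity matrix A.  Similar features
    are pushed toward similar weights; dissimilar-but-negatively-
    correlated features cannot be expressed in this scheme."""
    D = np.diag(A.sum(axis=1))  # degree matrix
    L = D - A                   # graph Laplacian
    return 2.0 * w @ L @ w
```

A penalty of this form would be added to the supervised loss in place of (or alongside) a plain L2 term, coupling the parameters along the graph edges instead of treating them as independent.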
On the other hand, recent work, e.g., (Chen, Zheng, and Weinberger 2013), considers regularized learning with a label network to mitigate the influence of an incomplete training label set. It assumes that the given label set is incomplete and builds a label network based on a marginalized linear denoising autoencoder to exploit the relationships among tags. A label-network-regularized learning method is then presented to cope with the incomplete
tagging problem. This method significantly improves over the prior state of the art; however, it still suffers from learning with limited training samples, which degrades its generalization performance.
To improve the generalization performance of tagging, it is necessary to consider both the feature network and the label network. To this end, we propose to train robust predictors with feature network and label network regularization simultaneously. In particular, we first learn a feature network and a label network with marginalized linear denoising autoencoders on the
feature set and label set, respectively. Taking the feature network as an example, we learn it with a marginalized linear denoising autoencoder, which is a
one-layer linear denoising neural network: we train a network weight matrix B_x such that B_x x̃ approximates x, where x̃ is a corrupted version of a sample x ∈ R^d obtained by random dropout corruption on each feature dimension. The learned network weight (B_x)_ij indicates the relationship between feature i and feature j.
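As an illustration of this construction, the marginalized linear denoising autoencoder (Chen et al.'s formulation) admits a closed-form solution that averages over all possible dropout corruptions instead of sampling them. The sketch below is our own minimal NumPy rendering under that assumption, not the authors' code; function and variable names are ours.

```python
import numpy as np

def marginalized_da(X, p=0.5, eps=1e-6):
    """Closed-form marginalized linear denoising autoencoder.
    X:   (d, n) data matrix (one sample per column).
    p:   dropout (corruption) probability per feature.
    Returns the (d, d) reconstruction weights B minimizing the
    expected loss E || X - B X_tilde ||^2 over random dropout,
    so that B @ x_tilde approximates x in expectation."""
    d = X.shape[0]
    q = np.full(d, 1.0 - p)          # per-feature survival probability
    S = X @ X.T                      # scatter matrix, (d, d)
    # Q = E[x_tilde x_tilde^T]: off-diagonals scale by q_i q_j,
    # the diagonal only by q_i (a feature co-occurs with itself).
    Q = S * np.outer(q, q)
    np.fill_diagonal(Q, q * np.diag(S))
    # P = E[x x_tilde^T]: columns scale by q_j.
    P = S * q[np.newaxis, :]
    # B solves B Q = P; eps * I is a small ridge for stability.
    return P @ np.linalg.inv(Q + eps * np.eye(d))
```

The learned B plays the role of B_x above; applying the same routine to the label matrix would yield the label-network weights in the same way. With p = 0 (no corruption) the solution degenerates to the identity map, which is a quick sanity check.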
Further, we present the Doubly Regularized Multi-Label learning (referred to as DRML) model, which learns a robust
Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (AAAI-17)