Towards designing risk-based safe Laplacian Regularized Least Squares
Haitao Gan a,∗, Zhizeng Luo a, Yao Sun a, Xugang Xi a, Nong Sang b, Rui Huang b

a School of Automation, Hangzhou Dianzi University, Hangzhou 310018, China
b School of Automation, Huazhong University of Science and Technology, Wuhan 430074, China

∗ Corresponding author. Tel.: +86 571 86919130. E-mail addresses: htgan@hdu.edu.cn (H. Gan), luo@hdu.edu.cn (Z. Luo), sunyao@hdu.edu.cn (Y. Sun), xixugang@hdu.edu.cn (X. Xi), nsang@hust.edu.cn (N. Sang), ruihuang@hust.edu.cn (R. Huang).
Keywords:
Semi-supervised learning
Laplacian Regularized Least Squares
Safe mechanism
Risk degree
Abstract
Recently, Safe Semi-Supervised Learning (S3L) has become an active topic in the Semi-Supervised Learning (SSL) field. In S3L, unlabeled data, which may affect the performance of SSL both positively and negatively, are exploited more safely through different risk-based strategies, and such S3L methods are expected to perform at least as well as the corresponding Supervised Learning (SL) methods. While previously proposed S3L methods considered the risk of unlabeled data, they did not explicitly model the different risk degrees of unlabeled data during the learning procedure. Hence, in this paper we propose risk-based safe Laplacian Regularized Least Squares (RsLapRLS) by analyzing the different risk degrees of unlabeled data. Our motivation is that unlabeled data may be risky in SSL and that their risk degrees differ. We assign different risk degrees to unlabeled data according to their different characteristics under supervised and semi-supervised learning. A risk-based tradeoff term between supervised and semi-supervised learning is then integrated into the objective function of SSL. The role of the risk degrees is to determine how each unlabeled example is exploited: unlabeled data with large risk degrees should be exploited by SL and the others by SSL. In particular, we employ Regularized Least Squares (RLS) and Laplacian RLS (LapRLS) for SL and SSL, respectively. Experimental results on several UCI and benchmark datasets show that the performance of our algorithm is never significantly inferior to that of RLS and LapRLS. In this way, our algorithm improves the practicability of SSL.
© 2015 Elsevier Ltd. All rights reserved.
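To make the abstract's description concrete, the following is a minimal schematic, not the paper's exact formulation, of how a risk-based tradeoff term might couple an SL predictor with an SSL objective. The risk degrees r_i, the auxiliary RLS prediction f_RLS, and the weight gamma_R are illustrative assumptions; the precise RsLapRLS objective is developed in the body of the paper.

% Illustrative sketch only (amsmath assumed): a LapRLS-style objective plus a
% risk-weighted tradeoff term that pulls high-risk unlabeled points toward the
% supervised (RLS) prediction. r_i, f_RLS, and gamma_R are assumed notation,
% not the authors' own.
\begin{equation*}
  \min_{f \in \mathcal{H}_K}\;
  \frac{1}{l}\sum_{i=1}^{l}\bigl(y_i - f(x_i)\bigr)^{2}
  + \gamma_A \|f\|_K^{2}
  + \frac{\gamma_I}{(l+u)^{2}}\,\mathbf{f}^{\top} L\,\mathbf{f}
  + \gamma_R \sum_{i=l+1}^{l+u} r_i \bigl(f(x_i) - f_{\mathrm{RLS}}(x_i)\bigr)^{2}
\end{equation*}

Under such a scheme, r_i close to 1 effectively hands x_i over to the supervised learner, while r_i close to 0 leaves it to the Laplacian (manifold) term, matching the abstract's statement that high-risk unlabeled data should be exploited by SL and the rest by SSL.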
1. Introduction
Past decades have witnessed the success of Semi-Supervised Learning (SSL) (Zhu, 2005; Zhu & Goldberg, 2009) in the machine learning field and in various tasks, such as object detection and tracking (Chen, Li, Su, Cao, & Ji, 2014; Grabner Helmut & Horst, 2008; Qi, Xu, Wang, & Song, 2011; Tan, Zhang, & Wang, 2011), image classification (Cao, He, & Huang, 2011; Gan, Sang, & Huang, 2014; Lu & Wang, 2015; Richarz, Vajda, Grzeszick, & Fink, 2014; Van Vaerenbergh, Santamaria, & Barbano, 2011), and speech recognition (Tur, Hakkani-Tur, & Schapire, 2005; Varadarajan, Yu, Deng, & Acero, 2009). SSL aims at exploiting the information of both labeled and unlabeled data to achieve better performance than Supervised Learning (SL). How to utilize the unlabeled data is the core problem. Generally speaking, SSL relies on the following assumptions about the data space: (1) smoothness; (2) cluster; (3) manifold; and (4) disagreement. Many algorithms (Adankon & Cheriet, 2011; Belkin, Niyogi, & Sindhwani, 2006; Blum & Mitchell, 1998; Reddy, Shevade, & Murty, 2011; Zhou & Li, 2005) have been proposed and have achieved encouraging
performance using one or more assumptions in many tasks. Among these assumptions, methods based on manifold regularization (Belkin et al., 2006; Gan, Sang, & Chen, 2013) have received much attention; they exploit the intrinsic manifold structure of both labeled and unlabeled data. Belkin et al. (2006) proposed Laplacian Regularized Least Squares (LapRLS) and Laplacian Support Vector Machines (LapSVM), both of which employ a Laplacian regularization term to learn from labeled and unlabeled data. Experimental results show that the manifold regularization technique can effectively exploit the information of unlabeled data.
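For reference, the LapRLS objective from Belkin et al. (2006) has the following form, where l and u are the numbers of labeled and unlabeled examples, \|f\|_K is the RKHS norm, L is the graph Laplacian built over all l+u points, and \mathbf{f} collects the evaluations of f on those points. The sketch below follows that paper's notation; any deviation from the notation used later in this article is incidental.

% LapRLS objective (squared loss plus ambient and intrinsic regularizers),
% as formulated in Belkin et al. (2006); amsmath assumed.
\begin{equation*}
  f^{*} = \arg\min_{f \in \mathcal{H}_K}
  \frac{1}{l}\sum_{i=1}^{l}\bigl(y_i - f(x_i)\bigr)^{2}
  + \gamma_A \|f\|_K^{2}
  + \frac{\gamma_I}{(l+u)^{2}}\,\mathbf{f}^{\top} L\,\mathbf{f}
\end{equation*}

The first two terms alone recover standard RLS; the third term enforces smoothness of f along the data manifold estimated from both labeled and unlabeled points.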
Among these SSL methods, a common assumption is that all the unlabeled data are safe to exploit. However, several studies (Li & Zhou, 2011b; Wang & Chen, 2013; Yang & Priebe, 2011) show that the information of unlabeled data has a dual character: (1) helpfulness; and (2) harmfulness. For a given SSL method, if the unlabeled data improve the performance, they can be considered helpful; if the unlabeled data degrade the performance, they can be considered harmful. Since different SSL methods rely on the different assumptions mentioned above, the unlabeled data may behave differently in different SSL methods. When the employed assumption is not consistent with the data distribution disclosed by the whole dataset, the unlabeled data may be harmful for learning in SSL. Some previous studies (Cohen, Cozman, Sebe, Cirelo, & Huang, 2004; Singh, Nowak, & Zhu, 2009; Yang & Priebe, 2011) have