Towards designing risk-based safe Laplacian Regularized Least Squares
Haitao Gan a,∗, Zhizeng Luo a, Yao Sun a, Xugang Xi a, Nong Sang b, Rui Huang b

a School of Automation, Hangzhou Dianzi University, Hangzhou 310018, China
b School of Automation, Huazhong University of Science and Technology, Wuhan 430074, China

∗ Corresponding author. Tel.: +86 571 86919130. E-mail addresses: htgan@hdu.edu.cn (H. Gan), luo@hdu.edu.cn (Z. Luo), sunyao@hdu.edu.cn (Y. Sun), xixugang@hdu.edu.cn (X. Xi), nsang@hust.edu.cn (N. Sang), ruihuang@hust.edu.cn (R. Huang).
Keywords:
Semi-supervised learning
Laplacian Regularized Least Squares
Safe mechanism
Risk degree
Abstract
Recently, Safe Semi-Supervised Learning (S3L) has become an active topic in the Semi-Supervised Learning (SSL) field. In S3L, unlabeled data, which may affect the performance of SSL both positively and negatively, are exploited more safely through different risk-based strategies, and such S3L methods are expected to perform at least as well as the corresponding Supervised Learning (SL) methods. While previously proposed S3L methods considered the risk of unlabeled data, they did not explicitly model the different risk degrees of unlabeled data during the learning procedure. Hence, in this paper we propose risk-based safe Laplacian Regularized Least Squares (RsLapRLS) by analyzing the different risk degrees of unlabeled data. Our motivation is that unlabeled data may be risky in SSL and that their risk degrees differ. We assign different risk degrees to unlabeled data according to their different characteristics under supervised and semi-supervised learning. A risk-based tradeoff term between supervised and semi-supervised learning is then integrated into the objective function of SSL. The role of the risk degrees is to determine how each unlabeled example is exploited: unlabeled data with large risk degrees should be exploited by SL and the others by SSL. In particular, we employ Regularized Least Squares (RLS) and Laplacian RLS (LapRLS) for SL and SSL, respectively. Experimental results on several UCI and benchmark datasets show that the performance of our algorithm is never significantly inferior to that of RLS and LapRLS. In this way, our algorithm improves the practicability of SSL.
© 2015 Elsevier Ltd. All rights reserved.
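To make the abstract's description concrete, the following is a minimal schematic, not the paper's exact formulation, of how a risk-based tradeoff term might couple an SL predictor with an SSL objective. The risk degrees r_i, the auxiliary RLS prediction f_RLS, and the weight gamma_R are illustrative assumptions; the precise RsLapRLS objective is developed in the body of the paper.

% Illustrative sketch only (amsmath assumed): a LapRLS-style objective plus a
% risk-weighted tradeoff term that pulls high-risk unlabeled points toward the
% supervised (RLS) prediction. r_i, f_RLS, and gamma_R are assumed notation,
% not the authors' own.
\begin{equation*}
  \min_{f \in \mathcal{H}_K}\;
  \frac{1}{l}\sum_{i=1}^{l}\bigl(y_i - f(x_i)\bigr)^{2}
  + \gamma_A \|f\|_K^{2}
  + \frac{\gamma_I}{(l+u)^{2}}\,\mathbf{f}^{\top} L\,\mathbf{f}
  + \gamma_R \sum_{i=l+1}^{l+u} r_i \bigl(f(x_i) - f_{\mathrm{RLS}}(x_i)\bigr)^{2}
\end{equation*}

Under such a scheme, r_i close to 1 effectively hands x_i over to the supervised learner, while r_i close to 0 leaves it to the Laplacian (manifold) term, matching the abstract's statement that high-risk unlabeled data should be exploited by SL and the rest by SSL.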
1. Introduction
Past decades have witnessed the success of Semi-Supervised Learning (SSL) (Zhu, 2005; Zhu & Goldberg, 2009) in the machine learning field and in various tasks, such as object detection and tracking (Chen, Li, Su, Cao, & Ji, 2014; Grabner Helmut & Horst, 2008; Qi, Xu, Wang, & Song, 2011; Tan, Zhang, & Wang, 2011), image classification (Cao, He, & Huang, 2011; Gan, Sang, & Huang, 2014; Lu & Wang, 2015; Richarz, Vajda, Grzeszick, & Fink, 2014; Van Vaerenbergh, Santamaria, & Barbano, 2011), and speech recognition (Tur, Hakkani-Tur, & Schapire, 2005; Varadarajan, Yu, Deng, & Acero, 2009). SSL aims at exploiting the information of both labeled and unlabeled data to achieve better performance than Supervised Learning (SL). How to utilize the unlabeled data is the core problem. Generally speaking, SSL relies on the following assumptions about the data space: (1) smoothness; (2) cluster; (3) manifold; and (4) disagreement. Many algorithms (Adankon & Cheriet, 2011; Belkin, Niyogi, & Sindhwani, 2006; Blum & Mitchell, 1998; Reddy, Shevade, & Murty, 2011; Zhou & Li, 2005) have been proposed and have achieved encouraging
performance using one or more assumptions in many tasks. Among these assumptions, methods based on manifold regularization (Belkin et al., 2006; Gan, Sang, & Chen, 2013) have received much attention; they exploit the intrinsic manifold structure of both labeled and unlabeled data. Belkin et al. (2006) proposed Laplacian Regularized Least Squares (LapRLS) and Laplacian Support Vector Machines (LapSVM), both of which employ a Laplacian regularization term to learn from labeled and unlabeled data. Experimental results show that the manifold regularization technique can effectively exploit the information of unlabeled data.
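For reference, the LapRLS objective from Belkin et al. (2006) has the following form, where l and u are the numbers of labeled and unlabeled examples, \|f\|_K is the RKHS norm, L is the graph Laplacian built over all l+u points, and \mathbf{f} collects the evaluations of f on those points. The sketch below follows that paper's notation; any deviation from the notation used later in this article is incidental.

% LapRLS objective (squared loss plus ambient and intrinsic regularizers),
% as formulated in Belkin et al. (2006); amsmath assumed.
\begin{equation*}
  f^{*} = \arg\min_{f \in \mathcal{H}_K}
  \frac{1}{l}\sum_{i=1}^{l}\bigl(y_i - f(x_i)\bigr)^{2}
  + \gamma_A \|f\|_K^{2}
  + \frac{\gamma_I}{(l+u)^{2}}\,\mathbf{f}^{\top} L\,\mathbf{f}
\end{equation*}

The first two terms alone recover standard RLS; the third term enforces smoothness of f along the data manifold estimated from both labeled and unlabeled points.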
Among these SSL methods, a common assumption is that all the unlabeled data are safe to exploit. However, several studies (Li & Zhou, 2011b; Wang & Chen, 2013; Yang & Priebe, 2011) show that the information of unlabeled data has a dual character: (1) helpfulness; and (2) harmfulness. For a given SSL method, if the unlabeled data improve the performance, they can be considered helpful; if the unlabeled data degrade the performance, they can be considered harmful. Since different SSL methods rely on the different assumptions mentioned above, the unlabeled data may behave differently in different SSL methods. When the employed assumption is not consistent with the data distribution disclosed by the whole dataset, the unlabeled data may be harmful for learning in SSL. Some previous studies (Cohen, Cozman, Sebe, Cirelo, & Huang, 2004; Singh, Nowak, & Zhu, 2009; Yang & Priebe, 2011) have