THIS IS THE AUTHOR’S VERSION OF AN ARTICLE THAT HAS BEEN PUBLISHED IN THIS JOURNAL. CHANGES WERE MADE TO THIS VERSION PRIOR TO PUBLICATION. DOI: 10.1109/MIS.2020.2988525
A Secure Federated Transfer Learning
Framework
Yang Liu, Yan Kang, Chaoping Xing, Tianjian Chen, Qiang Yang, Fellow, IEEE
Abstract—Machine learning relies on the availability of vast amounts of data for training. However, in reality, data are mostly scattered
across different organizations and cannot be easily integrated due to many legal and practical constraints. To address this important
challenge in the field of machine learning, we introduce a new technique and framework, known as federated transfer learning (FTL), to
improve statistical modeling under a data federation. FTL allows knowledge to be shared without compromising user privacy and
enables complementary knowledge to be transferred across domains in a data federation, thereby enabling a target-domain party to
build flexible and effective models by leveraging rich labels from a source domain. This framework requires minimal modifications to the
existing model structure and provides the same level of accuracy as non-privacy-preserving transfer learning. It is flexible and can
be effectively adapted to various secure multi-party machine learning tasks.
Index Terms—Federated Learning, Transfer Learning, Multi-party Computation, Secret Sharing, Homomorphic Encryption.
1 INTRODUCTION
Recent Artificial Intelligence (AI) achievements have
depended on the availability of massive amounts of labeled
data. For example, AlphaGo has been trained using a dataset
containing 30 million moves from 160,000 actual games. The
ImageNet dataset has over 14 million images. However, across
various industries, most applications only have access to small or
poor quality datasets. Labeling data is very expensive, especially
in fields which require human expertise and domain knowledge.
In addition, data needed for a specific task may not all be stored in
one place. Many organizations may only have unlabeled data, and
some other organizations may have very limited amounts of labels.
From a legislative perspective, it has also become increasingly
difficult for organizations to combine their data. For example, the
General Data Protection Regulation (GDPR) [1], enacted by
the European Union, contains many terms that protect user privacy
and prohibit organizations from exchanging data without explicit
user approval. How to enable the large number of businesses and
applications that have only small data (few samples and features)
or weak supervision (few labels) to build effective and accurate AI
models while complying with data privacy and security laws is a
difficult challenge.
To address this challenge, Google introduced a federated
learning (FL) system [2] in which a global machine learning model
is updated by a federation of distributed participants while keeping
their data stored locally. Their framework requires that all contributors
share the same feature space. On the other hand, secure machine
learning with data partitioned in the feature space has also been
studied [3]. These approaches are only applicable in the context
of data with either common features or common samples under
a federation. In reality, however, the set of such common entities
• Yang Liu, Yan Kang and Tianjian Chen are with WeBank, Shenzhen, China.
• Chanping Xing is with the Shanghai Jiao Tong University, Shanghai
China.
• Qiang Yang is with the Hong Kong University of Science and Technology,
Hong Kong, China.
may be small, making a federation less attractive and leaving the
majority of the non-overlapping data under-utilized.
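To make the contrast between these settings concrete, here is a toy sketch (all user IDs and feature names are hypothetical, chosen only for illustration) of two parties whose sample and feature overlaps are both small — the regime FTL targets:

```python
# Hypothetical parties: A holds source-domain data, B holds target-domain data.
party_A_samples = {"u1", "u2", "u3", "u4"}
party_B_samples = {"u4", "u5", "u6", "u7"}
party_A_features = {"age", "income", "purchases"}
party_B_features = {"purchases", "clicks", "dwell_time"}

# Horizontal FL assumes a shared feature space; vertical (feature-partitioned)
# secure learning assumes a shared sample space. Here both overlaps are tiny.
common_samples = party_A_samples & party_B_samples      # only "u4"
common_features = party_A_features & party_B_features   # only "purchases"

assert len(common_samples) == 1 and len(common_features) == 1
```

In this regime neither existing setting can use most of the data, which is the gap the proposed framework addresses.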
In this paper, we propose Federated Transfer Learning (FTL)
to address the limitations of existing federated learning ap-
proaches. It leverages transfer learning [4] to provide solutions
for the entire sample and feature space under a federation. Our
contributions are as follows:
1) We formalize the research problem of federated transfer
learning in a privacy-preserving setting to provide solu-
tions for federation problems beyond the scope of existing
federated learning approaches;
2) We provide an end-to-end solution to the proposed FTL
problem and show that the performance of the proposed
approach in terms of convergence and accuracy is com-
parable to non-privacy-preserving transfer learning; and
3) We provide some novel approaches to incorporate addi-
tively homomorphic encryption (HE) and secret sharing
using Beaver triples into two-party computation (2PC)
with neural networks under the FTL framework such
that only minimal modifications to the neural network
are required and the accuracy is almost lossless.
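As an illustrative sketch of the Beaver-triple technique named in contribution 3 (the trusted dealer generating the triple, the modulus, and all names here are our own simplifying assumptions, not the paper's actual protocol), two parties holding additive shares of x and y can compute shares of x·y without revealing either value:

```python
import random

P = 2**61 - 1  # prime modulus; all shares live in Z_P

def share(v):
    """Split v into two additive shares that sum to v mod P."""
    s0 = random.randrange(P)
    return (s0, (v - s0) % P)

def reconstruct(s0, s1):
    return (s0 + s1) % P

def beaver_mul(x_sh, y_sh):
    """Multiply secret-shared x and y using one Beaver triple (a, b, c = a*b).
    Here a trusted dealer plays the role of the offline triple generator."""
    a, b = random.randrange(P), random.randrange(P)
    a_sh, b_sh, c_sh = share(a), share(b), share((a * b) % P)

    # Parties open their masked shares; d and e reveal nothing about x or y.
    d = reconstruct((x_sh[0] - a_sh[0]) % P, (x_sh[1] - a_sh[1]) % P)  # d = x - a
    e = reconstruct((y_sh[0] - b_sh[0]) % P, (y_sh[1] - b_sh[1]) % P)  # e = y - b

    # Each party computes its output share locally; only party 0 adds d*e.
    z0 = (c_sh[0] + d * b_sh[0] + e * a_sh[0] + d * e) % P
    z1 = (c_sh[1] + d * b_sh[1] + e * a_sh[1]) % P
    return z0, z1

z0, z1 = beaver_mul(share(12345), share(67890))
assert reconstruct(z0, z1) == (12345 * 67890) % P
```

Correctness follows from c + d·b + e·a + d·e = ab + (x−a)b + (y−b)a + (x−a)(y−b) = xy; in practice the triples would come from an offline preprocessing phase rather than a trusted dealer.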
2 RELATED WORK
Recent years have witnessed a surge of studies on encrypted
machine learning. For example, Google introduced a secure ag-
gregation scheme to protect the privacy of aggregated user updates
under their federated learning framework [5]. CryptoNets [6]
adapted neural network computations to work with data encrypted
via Homomorphic Encryption (HE). SecureML [7] is a multi-party
computing scheme which uses secret-sharing and Yao’s Garbled
Circuit for encryption and supports collaborative training for linear
regression, logistic regression and neural networks.
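For concreteness, additively homomorphic encryption lets a party add values that remain encrypted throughout. A minimal textbook-Paillier sketch (toy key sizes and helper names chosen purely for illustration; not a secure implementation) shows the property:

```python
import random
from math import gcd

# Toy Paillier keypair (primes far too small to be secure)
p, q = 101, 113
n = p * q                    # public modulus
n2 = n * n
g = n + 1                    # standard generator choice g = n + 1
lam = (p - 1) * (q - 1)      # private key
mu = pow(lam, -1, n)         # precomputed decryption inverse (g = n + 1 case)

def encrypt(m):
    """Enc(m) = g^m * r^n mod n^2 for random r coprime to n."""
    r = random.randrange(1, n)
    while gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    """Dec(c) = L(c^lam mod n^2) * mu mod n, with L(u) = (u - 1) // n."""
    return ((pow(c, lam, n2) - 1) // n * mu) % n

# Additive homomorphism: multiplying ciphertexts adds the plaintexts
c_sum = (encrypt(41) * encrypt(1)) % n2
assert decrypt(c_sum) == 42
```

Because only addition (and multiplication by plaintext constants) is supported, schemes like CryptoNets and the FTL framework must arrange their computations so that the operations applied under encryption stay within this additive structure.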
Transfer learning aims to build an effective model for an appli-
cation with a small dataset or limited labels in a target domain by
leveraging knowledge from a different but related source domain.
In recent years, there has been tremendous progress in applying
transfer learning to various fields such as image classification and
Copyright (c) 2020 IEEE. Personal use is permitted. For any other purposes, permission must be obtained from the IEEE by emailing pubs-permissions@ieee.org.
arXiv:1812.03337v2 [cs.LG] 24 Jun 2020