THIS IS THE AUTHOR’S VERSION OF AN ARTICLE THAT HAS BEEN PUBLISHED IN THIS JOURNAL. CHANGES WERE MADE TO THIS VERSION PRIOR TO PUBLICATION. DOI: 10.1109/MIS.2020.2988525
A Secure Federated Transfer Learning
Framework
Yang Liu, Yan Kang, Chaoping Xing, Tianjian Chen, Qiang Yang, Fellow, IEEE
Abstract—Machine learning relies on the availability of vast amounts of data for training. However, in reality, data are mostly scattered
across different organizations and cannot be easily integrated due to many legal and practical constraints. To address this important
challenge in the field of machine learning, we introduce a new technique and framework, known as federated transfer learning (FTL), to
improve statistical modeling under a data federation. FTL allows knowledge to be shared without compromising user privacy and
enables complementary knowledge to be transferred across domains in a data federation, thereby enabling a target-domain party to
build flexible and effective models by leveraging rich labels from a source domain. This framework requires minimal modifications to the
existing model structure and provides the same level of accuracy as non-privacy-preserving transfer learning. It is flexible and can
be effectively adapted to various secure multi-party machine learning tasks.
Index Terms—Federated Learning, Transfer Learning, Multi-party Computation, Secret Sharing, Homomorphic Encryption.
1 INTRODUCTION
Recent Artificial Intelligence (AI) achievements have
depended on the availability of massive amounts of labeled
data. For example, AlphaGo has been trained using a dataset
containing 30 million moves from 160,000 actual games. The
ImageNet dataset has over 14 million images. However, across
various industries, most applications only have access to small or
poor quality datasets. Labeling data is very expensive, especially
in fields which require human expertise and domain knowledge.
In addition, data needed for a specific task may not all be stored in
one place. Many organizations may only have unlabeled data, and
some other organizations may have very limited amounts of labels.
From a legislative perspective, it has also become increasingly
difficult for organizations to combine their data. For example, the
General Data Protection Regulation (GDPR) [1], enacted by
the European Union, contains many terms that protect user privacy
and prohibit organizations from exchanging data without explicit
user approval. How to enable the large number of businesses and
applications that have only small data (few samples and features)
or weak supervision (few labels) to build effective and accurate AI
models while complying with data privacy and security laws is a
difficult challenge.
To address this challenge, Google introduced a federated
learning (FL) system [2] in which a global machine learning model
is updated by a federation of distributed participants while keeping
their data stored locally. Their framework requires that all contributors
share the same feature space. On the other hand, secure machine
learning with data partitioned in the feature space has also been
studied [3]. These approaches are only applicable in the context
of data with either common features or common samples under
a federation. In reality, however, the set of such common entities
• Yang Liu, Yan Kang and Tianjian Chen are with WeBank, Shenzhen, China.
• Chanping Xing is with the Shanghai Jiao Tong University, Shanghai
China.
• Qiang Yang is with the Hong Kong University of Science and Technology,
Hong Kong, China.
may be small, making a federation less attractive and leaving the
majority of the non-overlapping data under-utilized.
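To make the contrast between these settings concrete, here is a toy sketch (all user IDs and feature names are hypothetical, chosen only for illustration) of two parties whose sample and feature overlaps are both small — the regime FTL targets:

```python
# Hypothetical parties: A holds source-domain data, B holds target-domain data.
party_A_samples = {"u1", "u2", "u3", "u4"}
party_B_samples = {"u4", "u5", "u6", "u7"}
party_A_features = {"age", "income", "purchases"}
party_B_features = {"purchases", "clicks", "dwell_time"}

# Horizontal FL assumes a shared feature space; vertical (feature-partitioned)
# secure learning assumes a shared sample space. Here both overlaps are tiny.
common_samples = party_A_samples & party_B_samples      # only "u4"
common_features = party_A_features & party_B_features   # only "purchases"

assert len(common_samples) == 1 and len(common_features) == 1
```

In this regime neither existing setting can use most of the data, which is the gap the proposed framework addresses.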
In this paper, we propose Federated Transfer Learning (FTL)
to address the limitations of existing federated learning ap-
proaches. It leverages transfer learning [4] to provide solutions
for the entire sample and feature space under a federation. Our
contributions are as follows:
1) We formalize the research problem of federated transfer
learning in a privacy-preserving setting to provide solu-
tions for federation problems beyond the scope of existing
federated learning approaches;
2) We provide an end-to-end solution to the proposed FTL
problem and show that the performance of the proposed
approach in terms of convergence and accuracy is com-
parable to non-privacy-preserving transfer learning; and
3) We provide some novel approaches to incorporate addi-
tively homomorphic encryption (HE) and secret sharing
using Beaver triples into two-party computation (2PC)
with neural networks under the FTL framework such
that only minimal modifications to the neural network
are required and the accuracy is almost lossless.
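As an illustrative sketch of the Beaver-triple technique named in contribution 3 (the trusted dealer generating the triple, the modulus, and all names here are our own simplifying assumptions, not the paper's actual protocol), two parties holding additive shares of x and y can compute shares of x·y without revealing either value:

```python
import random

P = 2**61 - 1  # prime modulus; all shares live in Z_P

def share(v):
    """Split v into two additive shares that sum to v mod P."""
    s0 = random.randrange(P)
    return (s0, (v - s0) % P)

def reconstruct(s0, s1):
    return (s0 + s1) % P

def beaver_mul(x_sh, y_sh):
    """Multiply secret-shared x and y using one Beaver triple (a, b, c = a*b).
    Here a trusted dealer plays the role of the offline triple generator."""
    a, b = random.randrange(P), random.randrange(P)
    a_sh, b_sh, c_sh = share(a), share(b), share((a * b) % P)

    # Parties open their masked shares; d and e reveal nothing about x or y.
    d = reconstruct((x_sh[0] - a_sh[0]) % P, (x_sh[1] - a_sh[1]) % P)  # d = x - a
    e = reconstruct((y_sh[0] - b_sh[0]) % P, (y_sh[1] - b_sh[1]) % P)  # e = y - b

    # Each party computes its output share locally; only party 0 adds d*e.
    z0 = (c_sh[0] + d * b_sh[0] + e * a_sh[0] + d * e) % P
    z1 = (c_sh[1] + d * b_sh[1] + e * a_sh[1]) % P
    return z0, z1

z0, z1 = beaver_mul(share(12345), share(67890))
assert reconstruct(z0, z1) == (12345 * 67890) % P
```

Correctness follows from c + d·b + e·a + d·e = ab + (x−a)b + (y−b)a + (x−a)(y−b) = xy; in practice the triples would come from an offline preprocessing phase rather than a trusted dealer.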
2 RELATED WORK
Recent years have witnessed a surge of studies on encrypted
machine learning. For example, Google introduced a secure ag-
gregation scheme to protect the privacy of aggregated user updates
under their federated learning framework [5]. CryptoNets [6]
adapted neural network computations to work with data encrypted
via Homomorphic Encryption (HE). SecureML [7] is a multi-party
computing scheme which uses secret-sharing and Yao’s Garbled
Circuit for encryption and supports collaborative training for linear
regression, logistic regression and neural networks.
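For concreteness, additively homomorphic encryption lets a party add values that remain encrypted throughout. A minimal textbook-Paillier sketch (toy key sizes and helper names chosen purely for illustration; not a secure implementation) shows the property:

```python
import random
from math import gcd

# Toy Paillier keypair (primes far too small to be secure)
p, q = 101, 113
n = p * q                    # public modulus
n2 = n * n
g = n + 1                    # standard generator choice g = n + 1
lam = (p - 1) * (q - 1)      # private key
mu = pow(lam, -1, n)         # precomputed decryption inverse (g = n + 1 case)

def encrypt(m):
    """Enc(m) = g^m * r^n mod n^2 for random r coprime to n."""
    r = random.randrange(1, n)
    while gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    """Dec(c) = L(c^lam mod n^2) * mu mod n, with L(u) = (u - 1) // n."""
    return ((pow(c, lam, n2) - 1) // n * mu) % n

# Additive homomorphism: multiplying ciphertexts adds the plaintexts
c_sum = (encrypt(41) * encrypt(1)) % n2
assert decrypt(c_sum) == 42
```

Because only addition (and multiplication by plaintext constants) is supported, schemes like CryptoNets and the FTL framework must arrange their computations so that the operations applied under encryption stay within this additive structure.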
Transfer learning aims to build an effective model for an appli-
cation with a small dataset or limited labels in a target domain by
leveraging knowledge from a different but related source domain.
In recent years, there has been tremendous progress in applying
transfer learning to various fields such as image classification and
Copyright (c) 2020 IEEE. Personal use is permitted. For any other purposes, permission must be obtained from the IEEE by emailing pubs-permissions@ieee.org.
arXiv:1812.03337v2 [cs.LG] 24 Jun 2020