Communication-Efficient Federated Deep Learning
with Asynchronous Model Update and Temporally
Weighted Aggregation
Yang Chen, Xiaoyan Sun, and Yaochu Jin
Abstract—Federated learning obtains a central model on
the server by aggregating models trained locally on clients.
As a result, federated learning does not require clients to
upload their data to the server, thereby preserving the data
privacy of the clients. One challenge in federated learning
is to reduce the client-server communication since the end
devices typically have very limited communication bandwidth.
This paper presents an enhanced federated learning technique
by proposing an asynchronous learning strategy on the clients
and a temporally weighted aggregation of the local models
on the server. In the asynchronous learning strategy, different
layers of the deep neural networks are categorized into shallow
and deep layers, and the parameters of the deep layers are
updated less frequently than those of the shallow layers.
Furthermore, a temporally weighted aggregation strategy is
introduced on the server to make use of the previously trained
local models, thereby enhancing the accuracy and convergence
of the central model. The proposed algorithm is empirically evaluated on
two datasets with different deep neural networks. Our results
demonstrate that the proposed asynchronous federated deep
learning outperforms the baseline algorithm both in terms of
communication cost and model accuracy.
Index Terms—Federated learning, Deep neural network, aggregation, asynchronous learning, temporally weighted aggregation
I. INTRODUCTION
Smart phones, wearable gadgets, and distributed wireless
sensors usually generate huge volumes of privacy sensitive
data. In many cases, service providers are interested in
mining information from these data to provide personalized
services, for example, to make more relevant recommenda-
tions to clients. However, the clients are usually not willing
to allow the service provider to access the data for privacy
reasons.
This work is supported by the National Natural Science Foundation of China with Grant No. 61473298 and 61876184. (Corresponding author: Yaochu Jin)
Y. Chen and X. Sun are with the School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, China. Y. Chen and X. Sun contributed equally to this work and are co-first authors. (e-mail: fedora.cy@gmail.com; xysun78@hotmail.com)
Y. Jin is with the Department of Computer Science, University of Surrey, Guildford, GU2 7XH, United Kingdom. (Email: yaochu.jin@surrey.ac.uk)

Federated learning is a recently proposed privacy-preserving machine learning framework [1]. The main idea is to train local models on the clients, send the model parameters to the server, and then aggregate the local models on the server. Since all local models are trained upon data that are locally stored on the clients, the data privacy can be preserved. The whole process of typical federated learning is divided into communication rounds, in which the local models on the clients are trained on their local datasets. For the k-th client, where k ∈ S and S denotes the participating subset of m clients, its training samples are denoted as P_k and the trained local model is represented by the model parameter vector ω^k. In each communication round, only the clients belonging to the subset S download the parameters of the central model from the server and use them as the initial values of their local models. Once the local training is completed, the participating clients send the updated parameters back to the server. Consequently, the central model can be updated by aggregating the updated local models, i.e., ω = Agg(ω^k) [2], [3], [1].
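To make the aggregation step concrete, the following is a minimal sketch of one common choice of the operator Agg, namely the data-size-weighted average used in FedAvg [1]; the function and variable names are illustrative and not part of the original formulation.

```python
import numpy as np

def aggregate(local_weights, local_sizes):
    """Aggregate the uploaded local parameter vectors (one per client in S)
    into the central model parameters.

    local_weights: list of flattened parameter vectors ω^k
    local_sizes:   list of local sample counts |P_k|, used as aggregation weights
    """
    total = float(sum(local_sizes))
    # Data-size-weighted average of the local models (FedAvg-style aggregation)
    return sum((n_k / total) * np.asarray(w_k)
               for w_k, n_k in zip(local_weights, local_sizes))
```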
In this setting, the local model of each client can be any type of machine learning model, chosen according to the task to be accomplished. In most existing work on federated learning [1], deep neural networks (DNNs), e.g., long short-term memory (LSTM) networks, are employed to conduct word- and character-level text prediction tasks. In recent years, DNNs have been successfully applied to many complex problems, including text classification, image classification, and speech recognition [4], [5], [6]. Therefore, DNNs are widely adopted as the local model in federated learning, and stochastic gradient descent (SGD) is the most popular learning algorithm for training the local models.
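As an illustration of such a local update, a minimal client-side training routine with mini-batch SGD might look as follows (a PyTorch-style sketch; the function name, loss, and hyperparameters are assumptions for illustration, not the authors' exact configuration):

```python
import torch

def local_update(model, data_loader, epochs=1, lr=0.01):
    """Train the downloaded central model on one client's local data with SGD.

    Only the updated parameters are returned to the server;
    the raw local data never leave the client.
    """
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for x, y in data_loader:
            optimizer.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            optimizer.step()
    return model.state_dict()
```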
As aforementioned, one communication round includes
parameter download (on clients), local training (on clients),
trained parameter upload (on clients), and model aggregation
(on the server). Such a framework appears to be similar to
distributed machine learning algorithms [7], [8], [9], [10],
[11], [12]. In federated learning, however, only the models’
parameters are uploaded and downloaded between the clients
and server, and the data of local clients are not uploaded to
the server or exchanged between the clients. Accordingly,
the data privacy of each client can be preserved.
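Putting these steps together, one communication round can be sketched as follows, building on the local_update and aggregate sketches above; the client sampling fraction and the Client object with a data_loader attribute are illustrative assumptions.

```python
import copy
import random

def communication_round(server_model, clients, fraction=0.1):
    """One round: parameter download, local training, parameter upload,
    and model aggregation. Only model parameters travel between the
    clients and the server; the local datasets stay on the clients.
    """
    # Select the participating subset S of clients for this round
    m = max(1, int(fraction * len(clients)))
    subset = random.sample(clients, m)

    local_states, local_sizes = [], []
    for client in subset:
        local_model = copy.deepcopy(server_model)              # download central parameters
        state = local_update(local_model, client.data_loader)  # train on the local data
        local_states.append(state)                              # upload updated parameters
        local_sizes.append(len(client.data_loader.dataset))

    # Aggregate the uploaded parameters into the new central model
    # (assumes all state-dict entries are floating-point tensors)
    total = float(sum(local_sizes))
    new_state = {
        key: sum((n / total) * s[key] for s, n in zip(local_states, local_sizes))
        for key in local_states[0]
    }
    server_model.load_state_dict(new_state)
    return server_model
```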
Compared with other machine learning paradigms, federated learning is subject to the following challenges [1], [13]:
1) Unbalanced data: The data amount on different
clients may be highly imbalanced because there are
light and heavy users.