[Fig. 2, panels: accuracy vs. training iterations for VGG11* @ CIFAR (IID, non-IID (2), non-IID (1)) and Logistic @ MNIST (IID, non-IID (2), non-IID (1)); curves: no compression, FedAvg (n=100), signSGD, sparse top-k (p=0.01).]
Fig. 2: Convergence speed when using different compression methods during the training of VGG11*¹ on CIFAR-10 and Logistic Regression on MNIST and Fashion-MNIST in a distributed setting with 10 clients for iid and non-iid data. In the non-iid cases, every client only holds examples from exactly two (respectively one) of the 10 classes in the dataset. All compression methods suffer from degraded convergence speed in the non-iid situation, but sparse top-k is affected by far the least.
reducing the bit size per update by a factor of 32. signSGD also incorporates download compression by aggregating the binary updates from all clients by means of a majority vote. Other authors propose to stochastically quantize the gradients during upload in an unbiased way (TernGrad [19], QSGD [20], ATOMO [21]). These methods are theoretically appealing, as they inherit the convergence properties of regular SGD under relatively mild assumptions. However, their empirical performance and compression rates do not match those of sparsification methods.
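To make the sign-based scheme concrete, the following is a minimal sketch of 1-bit upload compression with a server-side majority vote in the spirit of signSGD; the three-client setup, the flattened update vectors, and all function names are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def compress_sign(update):
    """Upload compression: transmit only the sign of each entry (1 bit instead of 32)."""
    return np.sign(update).astype(np.int8)

def majority_vote(sign_updates):
    """Download compression: aggregate the binary client updates by an element-wise
    majority vote. Summing the +/-1 entries and taking the sign again implements the vote."""
    return np.sign(np.sum(sign_updates, axis=0)).astype(np.int8)

# Illustrative example: 3 clients, each with a flattened parameter update of size 5.
rng = np.random.default_rng(0)
client_updates = [rng.normal(size=5) for _ in range(3)]
signs = [compress_sign(u) for u in client_updates]   # what each client uploads
aggregated = majority_vote(signs)                    # what the server broadcasts back
print(aggregated)
```

Since the server only broadcasts the voted signs, the downstream traffic is compressed by roughly the same factor as the upstream traffic.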
Of all the methods listed above, only Federated Averaging and signSGD compress both the upstream and the downstream communication. All other methods are of limited utility in the Federated Learning setting defined in Section II, as they leave the communication from the server to the clients uncompressed.
Notation: In the following, calligraphic $\mathcal{W}$ will refer to the entirety of parameters of a neural network, while regular uppercase $W$ refers to one specific tensor of parameters within $\mathcal{W}$ and lowercase $w$ refers to a single scalar parameter of the network. Arithmetic operations between neural network parameters are to be understood element-wise.
IV. LIMITATIONS OF EXISTING COMPRESSION METHODS
The related work on efficient distributed deep learning almost exclusively considers iid data distributions among the clients, i.e. it assumes that the local gradients are unbiased with respect to the full-batch gradient according to
$$\mathbb{E}_{x\sim p_i}\big[\nabla_{\mathcal{W}}\, l(x,\mathcal{W})\big] = \nabla_{\mathcal{W}} R(\mathcal{W}) \quad \forall i = 1,..,n \qquad (2)$$
where $p_i$ is the distribution of data on the $i$-th client and $R(\mathcal{W})$ is the empirical risk function over the combined training data.
While this assumption is reasonable for parallel training
where the distribution of data among the clients is chosen by the
practitioner, it is typically not valid in the Federated Learning
setting where we can generally only hope for unbiasedness in
the mean
$$\frac{1}{n}\sum_{i=1}^{n}\mathbb{E}_{x_i\sim p_i}\big[\nabla_{\mathcal{W}}\, l(x_i,\mathcal{W})\big] = \nabla_{\mathcal{W}} R(\mathcal{W}) \qquad (3)$$
while the individual client’s gradients will be biased towards
the local dataset according to
$$\mathbb{E}_{x\sim p_i}\big[\nabla_{\mathcal{W}}\, l(x,\mathcal{W})\big] = \nabla_{\mathcal{W}} R_i(\mathcal{W}) \neq \nabla_{\mathcal{W}} R(\mathcal{W}) \quad \forall i = 1,..,n. \qquad (4)$$
As it violates assumption (2), a non-iid distribution of the local data renders existing convergence guarantees as formulated in [19], [20], [29], [21] inapplicable and has dramatic effects on the practical performance of communication-efficient distributed training algorithms, as we will demonstrate in the following experiments.
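To see the effect of (2)-(4) numerically, the short sketch below (our own illustration, not the paper's code) computes the gradient of a linear model with squared loss on synthetic data partitioned by label: the average of the client gradients coincides with the full-batch gradient as in (3), while every individual client gradient deviates from it as in (4).

```python
import numpy as np

rng = np.random.default_rng(0)
n_clients, n_per_client, d = 5, 200, 10
X = rng.normal(size=(n_clients * n_per_client, d))
y = (X @ rng.normal(size=d) > 0).astype(float)      # synthetic binary labels
W = rng.normal(size=d)                               # current model parameters

def grad(Xb, yb, W):
    """Gradient of the squared-error risk 0.5 * mean((Xb @ W - yb)^2)."""
    return Xb.T @ (Xb @ W - yb) / len(yb)

# Non-iid split: sort by label so that each client sees (almost) only one class.
shards = np.split(np.argsort(y), n_clients)

g_full = grad(X, y, W)                               # gradient of the empirical risk R(W)
g_clients = [grad(X[s], y[s], W) for s in shards]    # local gradients, as in eq. (4)

print(np.linalg.norm(np.mean(g_clients, axis=0) - g_full))     # ~0: unbiased in the mean, eq. (3)
print([float(np.linalg.norm(g - g_full)) for g in g_clients])  # clearly non-zero: local bias, eq. (4)
```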
A. Preliminary Experiments
We run preliminary experiments with a simplified version of the well-studied 11-layer VGG11 network [28], which we train on the CIFAR-10 [30] dataset in a Federated Learning setup using 10 clients. For the iid setting we split the training data randomly into equally sized shards and assign one shard to each of the clients. For the "non-iid ($m$)" setting we assign every client samples from exactly $m$ classes of the dataset. The data splits are non-overlapping and balanced such that every client ends up with the same number of data points. The detailed procedure that generates the split of data is described in Section B of the appendix. We also perform experiments with a simple logistic regression classifier, which we train on the MNIST dataset [31] under the same Federated Learning setup. Both models are trained using momentum SGD. To make the results comparable, all compression methods use the same learning rate and batch size.
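As a concrete illustration, the sketch below generates one possible "non-iid (m)" split in which every client holds samples from exactly m classes and all shards are balanced and non-overlapping; it is our own construction for illustration, not the exact procedure from Section B of the appendix.

```python
import numpy as np

def non_iid_split(labels, n_clients, m, seed=0):
    """Sketch of a 'non-iid (m)' split: every client holds samples from exactly m classes,
    shards are balanced and non-overlapping. Assumes n_clients * m is a multiple of the
    number of classes (as in the 10-client CIFAR-10 setup)."""
    rng = np.random.default_rng(seed)
    classes = np.unique(labels)
    # Assign classes to clients cyclically so that every class is used equally often.
    client_classes = [[classes[(i * m + j) % len(classes)] for j in range(m)]
                      for i in range(n_clients)]
    # Split every class's (shuffled) indices evenly among the clients assigned to it.
    chunks_per_class = n_clients * m // len(classes)
    queues = {c: list(np.array_split(rng.permutation(np.where(labels == c)[0]),
                                     chunks_per_class))
              for c in classes}
    return [np.concatenate([queues[c].pop() for c in cls_list])
            for cls_list in client_classes]

# Example: 10 clients, 10 classes with 500 samples each, every client holds exactly 2 classes.
labels = np.repeat(np.arange(10), 500)
splits = non_iid_split(labels, n_clients=10, m=2)
print([len(s) for s in splits])                           # balanced: 500 samples per client
print([sorted(set(labels[s].tolist())) for s in splits])  # exactly 2 classes per client
```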
¹We denote by VGG11* a simplified version of the original VGG11 architecture described in [28], where all dropout and batch normalization layers are removed and the number of convolutional filters and the size of all fully-connected layers are reduced by a factor of 2.