TABLE II: Loss functions of common ML models

Model | Loss function $L(w_i^t)$
Neural network | $\frac{1}{n}\sum_{j=1}^{n} (y_j - f(x_j; w))^2$ (Mean Squared Error)
Linear regression | $\frac{1}{2}\|y_j - w^T x_j\|^2$
K-means | $\sum_{j} \|x_j - f(x_j; w)\|$ ($f(x_j; w)$ is the centroid of all objects assigned to $x_j$'s class)
squared-SVM | $\left[\frac{1}{n}\sum_{j=1}^{n} \max(0,\, 1 - y_j(w^T x_j - \mathrm{bias}))\right] + \frac{\lambda}{2}\|w\|^2$ (bias is the bias parameter and $\lambda$ is a constant)
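For illustration, the short Python sketch below evaluates two of the losses in Table II, the mean squared error and the regularized hinge loss of the squared-SVM row, for a toy weight vector; the helper names and the choice of a linear predictor f(x; w) = w^T x are assumptions made for this example only.

import numpy as np

def mse_loss(w, X, y):
    # (1/n) * sum_j (y_j - f(x_j; w))^2, with an assumed linear predictor f(x; w) = w^T x.
    return np.mean((y - X @ w) ** 2)

def svm_loss(w, X, y, bias=0.0, lam=0.01):
    # [(1/n) * sum_j max(0, 1 - y_j (w^T x_j - bias))] + (lambda/2) * ||w||^2
    margins = y * (X @ w - bias)
    return np.mean(np.maximum(0.0, 1.0 - margins)) + 0.5 * lam * np.dot(w, w)

# Toy data: four samples with three features; labels in {-1, +1} for the SVM loss.
X = np.array([[0.5, 1.0, -0.2], [1.5, -0.3, 0.8], [-0.7, 0.4, 1.1], [0.2, -1.2, 0.6]])
y = np.array([1.0, -1.0, 1.0, -1.0])
w = np.zeros(3)
print(mse_loss(w, X, y), svm_loss(w, X, y))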
Algorithm 1 Federated averaging algorithm [23]
Require: Local minibatch size B, number of participants m per iteration, number of local epochs E, and learning rate η.
Ensure: Global model $w_G$.
1: [Participant i]
2: LocalTraining(i, w):
3: Split the local dataset $D_i$ into minibatches of size B, which are included in the set $B_i$.
4: for each local epoch j from 1 to E do
5:   for each $b \in B_i$ do
6:     $w \leftarrow w - \eta \nabla L(w; b)$ ($\eta$ is the learning rate and $\nabla L$ is the gradient of L on b)
7:   end for
8: end for
9:
10: [Server]
11: Initialize $w_G^0$
12: for each iteration t from 1 to T do
13:   Randomly choose a subset $S_t$ of m participants from N
14:   for each participant $i \in S_t$ in parallel do
15:     $w_i^{t+1} \leftarrow$ LocalTraining$(i, w_G^t)$
16:   end for
17:   $w_G^t = \frac{1}{\sum_{i \in N} D_i} \sum_{i=1}^{N} D_i \, w_i^t$ (averaging aggregation)
18: end for
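A minimal, framework-free Python sketch of the participant-side LocalTraining routine (lines 2-8 of Algorithm 1) is given below; it assumes a linear model trained with the mean squared error from Table II, and the function name and gradient expression are illustrative assumptions rather than part of [23].

import numpy as np

def local_training(w_global, X, y, batch_size=32, epochs=5, lr=0.01, seed=0):
    # Participant-side update of Algorithm 1 (lines 2-8): E epochs of minibatch SGD
    # on the local dataset D_i = (X, y), starting from the received global model.
    rng = np.random.default_rng(seed)
    w = w_global.copy()
    n = X.shape[0]
    for _ in range(epochs):                      # for each local epoch j from 1 to E
        order = rng.permutation(n)               # split D_i into random minibatches B_i
        for start in range(0, n, batch_size):    # for each minibatch b in B_i
            idx = order[start:start + batch_size]
            Xb, yb = X[idx], y[idx]
            # Gradient of the (assumed) MSE loss (1/|b|) * sum (Xb w - yb)^2 on minibatch b.
            grad = 2.0 * Xb.T @ (Xb @ w - yb) / len(idx)
            w -= lr * grad                       # w <- w - eta * grad_L(w; b)
    return w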
Thereafter, in Step 2, participant i performs local training and optimizes the target in (3) on minibatches drawn from its original local dataset (lines 2-8). Note that a minibatch refers to a randomized subset of a participant's dataset. At the $t$-th iteration (line 17), the server minimizes the global loss in (4) by averaging aggregation, which is formally defined as
$w_G^t = \frac{1}{\sum_{i \in N} D_i} \sum_{i=1}^{N} D_i \, w_i^t$.   (5)
The FL training process is iterated until the global loss function converges or a desired accuracy is achieved.
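As a sketch of the server-side averaging aggregation in (5), the Python snippet below computes a weighted average of the participants' updated models; interpreting $D_i$ as the size of participant i's dataset is an assumption made here for the weighting.

import numpy as np

def federated_averaging(local_models, dataset_sizes):
    # w_G = (1 / sum_i D_i) * sum_i D_i * w_i, with D_i taken here as the size of
    # participant i's local dataset.
    total = float(sum(dataset_sizes))
    return sum((d / total) * w for w, d in zip(local_models, dataset_sizes))

# Example: three participants holding different amounts of local data.
w_updates = [np.array([0.1, 0.3]), np.array([0.2, 0.1]), np.array([0.4, 0.5])]
sizes = [100, 400, 500]
w_global = federated_averaging(w_updates, sizes)   # array([0.29, 0.32])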
C. Statistical Challenges of FL
Following an elaboration of the FL training process in the
previous section, we now proceed to discuss the statistical
challenges faced in FL.
In traditional distributed ML, the central server has access
to the whole training dataset. As such, the server can split the
dataset into subsets that follow similar distributions. The subsets are subsequently sent to participating nodes for distributed
training. However, this approach is impractical for FL since
the local dataset is only accessible by the data owner.
In the FL setting, the participants may have local datasets that follow different distributions, i.e., the datasets of participants are non-IID. While the authors in [23] show that the aforementioned FedAvg algorithm is able to achieve desirable accuracy even when data is non-IID across participants, the authors in [66] found otherwise. For example, a FedAvg-trained CNN model achieves 51% lower accuracy than a centrally trained CNN model on CIFAR-10 [67]. This deterioration in accuracy is further shown to be quantified by the earth mover's distance (EMD) [68], i.e., the difference between an FL participant's data distribution and the population distribution. As such, when data is non-IID and highly skewed, data sharing is proposed, in which a shared dataset with a uniform distribution across all classes is sent by the FL server to each FL participant. Each participant then trains its local model on its private data together with the received data. The simulation results show that accuracy can be increased by 30% with 5% shared data due to the reduced EMD. However, a common dataset may not always be available for sharing by the FL server. An alternative solution is subsequently discussed in Section IV.
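To make the EMD-based argument concrete, the Python sketch below (an illustrative example; the helper names and the use of label histograms are assumptions) measures how far a skewed participant's label distribution is from a balanced population distribution, and how mixing in a small, class-uniform shared dataset reduces that distance.

import numpy as np

def class_distribution(labels, num_classes):
    # Empirical label distribution of a dataset.
    counts = np.bincount(labels, minlength=num_classes)
    return counts / counts.sum()

def emd(p, q):
    # Class-wise L1 distance between two label distributions, used here
    # as a simple stand-in for the EMD measure in [68].
    return np.abs(p - q).sum()

num_classes = 10
population = np.full(num_classes, 1.0 / num_classes)        # assume a balanced population

# A highly skewed participant that only holds two classes.
local_labels = np.array([0] * 900 + [1] * 100)
p_local = class_distribution(local_labels, num_classes)

# Data sharing: append a small, class-uniform shared dataset (5% of the local size).
shared_labels = np.tile(np.arange(num_classes), 5)           # 50 uniformly distributed samples
p_mixed = class_distribution(np.concatenate([local_labels, shared_labels]), num_classes)

print(emd(p_local, population), emd(p_mixed, population))    # the distance decreases after sharing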
The authors in [69] also find that global imbalance, i.e., the situation in which the collection of data held across all FL participants is class imbalanced, leads to a deterioration in model accuracy. As such, the Astraea framework is proposed. On initialization, the FL participants first send their data distributions to the FL server. A rebalancing step is introduced before training begins, in which each participant performs data augmentation [70] on the minority classes, e.g., through random rotations and shifts. After training on the augmented data, a mediator is created to coordinate intermediate aggregation, i.e., before the updated parameters are sent to the FL server for global aggregation. The mediator selects participants whose data distributions best contribute to a uniform distribution when aggregated. This is done through a greedy algorithm that minimizes the Kullback-Leibler divergence [71] between the aggregated local data distribution and the uniform distribution. The simulation results show accuracy improvements when tested on imbalanced datasets.
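A minimal Python sketch of the mediator's greedy selection step is shown below, assuming each participant has reported its label distribution and dataset size; the function names are illustrative and not part of the Astraea implementation. It repeatedly adds the participant whose data brings the aggregated label distribution closest to uniform, as measured by the KL divergence.

import numpy as np

def kl_to_uniform(p):
    # KL divergence KL(p || uniform) for a label distribution p; 0 * log(0) is treated as 0.
    u = np.full_like(p, 1.0 / len(p))
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / u[mask])))

def greedy_mediator_selection(distributions, counts, group_size):
    # Greedily pick participants whose combined label distribution is closest to uniform.
    chosen, agg, agg_count = [], np.zeros_like(distributions[0]), 0
    remaining = set(range(len(distributions)))
    while remaining and len(chosen) < group_size:
        best, best_kl = None, np.inf
        for i in remaining:
            # Aggregated label distribution if participant i were added to the group.
            cand = (agg * agg_count + distributions[i] * counts[i]) / (agg_count + counts[i])
            if kl_to_uniform(cand) < best_kl:
                best, best_kl = i, kl_to_uniform(cand)
        agg = (agg * agg_count + distributions[best] * counts[best]) / (agg_count + counts[best])
        agg_count += counts[best]
        chosen.append(best)
        remaining.remove(best)
    return chosen

# Example: three skewed participants whose union is roughly balanced over 4 classes.
dists = [np.array([0.7, 0.1, 0.1, 0.1]),
         np.array([0.1, 0.7, 0.1, 0.1]),
         np.array([0.1, 0.1, 0.4, 0.4])]
sizes = [100, 100, 200]
print(greedy_mediator_selection(dists, sizes, group_size=2))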
The data on each participant's device can also be heterogeneous in other ways, e.g., the amount of training data owned by each participant can differ. The authors in [72] propose learning separate but structurally related models for each participant. As such, concepts from multi-task learning [73] can naturally be adopted to model such relationships. Instead of minimizing the conventional loss functions presented previously in Table II, the loss function is modified to also model the relationships amongst tasks. The MOCHA algorithm is then proposed, in which an alternating optimization approach [74] is used to approximately solve the minimization problem. Interestingly, MOCHA can be calibrated based on the resource constraints of a participating device. For example, the quality of approximation can be adaptively adjusted based on the network conditions and CPU states of the participating devices. However, MOCHA cannot be applied to non-convex DL models.
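For intuition, the multi-task objective that MOCHA approximately solves can be written roughly as follows (a paraphrase of the formulation in [72]; the exact regularizer $R$ is a modeling choice, and the notation below is not taken verbatim from that work):

$\min_{W,\,\Omega} \; \sum_{i=1}^{N} \sum_{j=1}^{n_i} L\big(w_i;\, x_j^{(i)},\, y_j^{(i)}\big) + R(W, \Omega),$

where $W = [w_1, \dots, w_N]$ stacks the per-participant models, $\Omega$ encodes the relationships amongst tasks, and $R(W, \Omega)$ (e.g., $\lambda_1 \mathrm{tr}(W \Omega W^T) + \lambda_2 \|W\|_F^2$) penalizes the models of related participants for differing too much.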
Apart from data heterogeneity, the convergence of a distributed learning algorithm is always a concern. A higher convergence rate saves a large amount of time and resources for the FL participants, and also significantly increases the success rate of federated training, since fewer communication rounds reduce participant dropouts. To ensure convergence, the study in [75] proposes FedProx, which