Federated Learning for Large-Scale Brain Data Analysis in Distributed Medical Databases


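This document discusses a 2018 study that applies federated learning (FL) to distributed medical databases, specifically to the meta-analysis of large-scale subcortical brain data. The author team comprises researchers from several countries and institutions, including a university in France, the Illinois Institute of Technology in the United States, the National University of Colombia, the USC Stevens Neuroimaging and Informatics Institute in Los Angeles, and the Centre for Medical Image Computing at University College London.

Databases worldwide now store an unprecedented number of brain images, which offer great potential for understanding the genetic mechanisms underlying brain diseases. Because of privacy and legal constraints, however, different datasets usually cannot be shared directly, which limits the full use of big data in brain disease research. To address this, the researchers propose a framework based on federated learning that enables cross-institution collaboration and model training while protecting data privacy.

Federated learning is a distributed machine learning approach in which data remain on local devices or in local data centers, so the raw datasets never have to be transferred to a central location. In the medical domain, this means that institutions can train models on their own patient data while complying with regulations, improving model accuracy and reliability while protecting patient privacy.

The core content of the paper includes an analysis of how well existing federated learning algorithms adapt to medical imaging data, how to handle heterogeneous data sources, and how secure communication protocols can keep data exchange confidential. The study may also discuss model performance evaluation, such as accuracy, generalization ability, and behavior on large-scale data from subcortical structures (such as the hippocampus, amygdala, and thalamus).

To implement the framework, a layered system architecture may have been adopted, comprising data pre-processing steps, encryption techniques (such as homomorphic encryption or differential privacy), model aggregation and update strategies, and possibly model compression to reduce communication overhead. The researchers may also have explored how to handle imbalanced data distributions, the use of transfer learning, and how to share knowledge across organizations while preserving data privacy.

The paper offers an innovative solution to the tension between medical data privacy and large-scale data analysis, demonstrates the practical potential of federated learning in distributed medical databases, and paves the way for future research on brain diseases.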

on schemes analysis through the Alternating Direction Method of Multipliers (ADMM), reducing the number of iterations.

We illustrate the framework using the ENIGMA Shape tool, to provide a first application of federated analysis compatible with the standard ENIGMA pipelines. It should be noted that, even though this work is illustrated here for the analysis of subcortical brain changes in neurological diseases, it can be extended to general multimodal multivariate analysis, such as imaging-genetics studies.

The framework is benchmarked on synthetic data (section 3.1). It is then applied to the analysis of subcortical thickness and shape features across diseases from multi-centric, multi-database data including: Alzheimer’s disease (AD), progressive and non-progressive mild cognitive impairment (MCIc, MCInc), Parkinson’s disease (PD) and healthy individuals (HC) (section 3.2).

2 Methods

Biomedical data are assumed to be partitioned across different centers, which restrict access to individual-level information. However, centers can individually share model parameters and run pipelines for feature extraction.

We denote the global data (e.g., image arrays) and covariates (e.g., age, sex information) as X and Y respectively, obtained by concatenating the data and the covariates of each center. Although these matrices cannot be computed in practice, this notation will be used to illustrate the proposed methodology. In the global setting, variability analysis can be performed by analyzing the global data covariance matrix S.
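As a rough illustration of what analyzing S means in the global setting, the sketch below builds a hypothetical global matrix from synthetic data, standardizes it, and extracts the main axes of variability from the covariance matrix. The center sizes, feature count, and variable names are assumptions made for this example, and concatenating the centers is only possible here because the data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-ins for two centers (rows = subjects, columns = features).
X_1 = rng.normal(size=(40, 3))
X_2 = rng.normal(size=(60, 3))

# Hypothetical global matrix: in a real study this concatenation is not allowed.
X = np.vstack([X_1, X_2])
X = (X - X.mean(axis=0)) / X.std(axis=0)  # feature-wise standardization

N = X.shape[0]
S = X.T @ X / N  # global covariance of the standardized features

# Axes of variability: eigenvectors of S, sorted by decreasing eigenvalue.
eigvals, eigvecs = np.linalg.eigh(S)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
explained = eigvals / eigvals.sum()  # share of variance per component
```

Because the features are standardized, the diagonal of S is 1 and the eigenvalues sum to the number of features.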

For each center $c \in \{1, \ldots, C\}$ with $N_c$ subjects each, we denote by $X_c = (x_i)_{i=1}^{N_c}$ and $Y_c = (y_i)_{i=1}^{N_c}$ the local data and covariates. The feature-wise mean and standard deviation vectors of each center are denoted $\bar{x}_c$ and $\sigma_c$.

The proposed framework is illustrated in Figure 1 and discussed in section 2.1. It is based on three main steps: 1) data standardization, 2) correction for confounding factors and 3) variability analysis.

Data standardization is a pre-processing step that aims to enhance the stability of the analysis and ease the comparison across features. In practice, each feature is mapped to the same space by centering the data feature-wise to zero mean and scaling to unit standard deviation. Ideally, however, this is done with respect to the statistics of the whole study (global statistics), which no single center can compute on its own. This issue is addressed by the distributed standardization method proposed in section 2.1.1.
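One way global statistics can be recovered without pooling raw data is for each center to share only its subject count, feature-wise mean, and feature-wise standard deviation. The sketch below combines such summaries exactly. It illustrates the idea and is not necessarily the procedure of section 2.1.1; the function names `local_summary`, `pooled_stats`, and `standardize` are hypothetical, and the standard deviations are population values (divided by N).

```python
import math

def local_summary(rows):
    """Computed inside a center: subject count, feature-wise mean and std."""
    n, p = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(p)]
    std = [math.sqrt(sum((r[j] - mean[j]) ** 2 for r in rows) / n) for j in range(p)]
    return n, mean, std

def pooled_stats(summaries):
    """Combine per-center (n, mean, std) into global feature-wise mean and std."""
    n_total = sum(n for n, _, _ in summaries)
    p = len(summaries[0][1])
    g_mean, g_std = [], []
    for j in range(p):
        mean_j = sum(n * m[j] for n, m, _ in summaries) / n_total
        # Per-center E[x^2] is std^2 + mean^2; pool it, then subtract the global mean^2.
        second = sum(n * (s[j] ** 2 + m[j] ** 2) for n, m, s in summaries) / n_total
        g_mean.append(mean_j)
        g_std.append(math.sqrt(max(second - mean_j ** 2, 0.0)))
    return g_mean, g_std

def standardize(rows, g_mean, g_std):
    """Applied inside a center, using the shared global statistics.

    Assumes no feature is constant across the study (g_std > 0).
    """
    return [[(r[j] - g_mean[j]) / g_std[j] for j in range(len(r))] for r in rows]

# Toy data for two centers with two features each.
center_a = [[1.0, 10.0], [3.0, 14.0]]
center_b = [[5.0, 12.0], [7.0, 16.0], [9.0, 18.0]]

g_mean, g_std = pooled_stats([local_summary(center_a), local_summary(center_b)])
z_a = standardize(center_a, g_mean, g_std)
```

Only three small vectors per center leave the site, and the result matches what would be obtained from the concatenated data.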

Confounding factors have a biasing effect on the data. To correct for this bias, a linear effect of the confounders, $X = YW$, is usually assumed; this effect must be estimated and removed. However, in a distributed scenario, computing $W$ is not straightforward, since the global data matrix cannot be computed. We propose in section 2.1.2 to use the Alternating Direction Method of Multipliers (ADMM) to estimate a matrix $\tilde{W}$ shared among centers, closely approximating $W$. In particular, we
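The idea of estimating a shared matrix with ADMM can be sketched with a standard consensus formulation for least squares, in which every center keeps a local copy of W and the copies are driven toward a common value. This is a generic textbook scheme under stated assumptions, not necessarily the exact update rules of section 2.1.2; the function name `admm_consensus_lstsq`, the penalty `rho`, the iteration count, and the synthetic data are all hypothetical.

```python
import numpy as np

def admm_consensus_lstsq(Y_list, X_list, rho=50.0, n_iter=300):
    """Estimate a shared W with X_c ~= Y_c @ W via consensus ADMM (scaled form).

    Each center works with its own Y_c and X_c and exchanges only
    parameter-sized matrices with the coordinator.
    """
    q, p = Y_list[0].shape[1], X_list[0].shape[1]
    n_centers = len(Y_list)
    W_local = [np.zeros((q, p)) for _ in range(n_centers)]
    U = [np.zeros((q, p)) for _ in range(n_centers)]  # scaled dual variables
    W_shared = np.zeros((q, p))
    # Local normal-equation terms, computed once inside each center.
    gram = [Y.T @ Y + rho * np.eye(q) for Y in Y_list]
    cross = [Y.T @ X for Y, X in zip(Y_list, X_list)]
    for _ in range(n_iter):
        # Local step: regularized least squares pulled toward the shared estimate.
        for c in range(n_centers):
            W_local[c] = np.linalg.solve(gram[c], cross[c] + rho * (W_shared - U[c]))
        # Aggregation step: average the local estimates plus their duals.
        W_shared = sum(W + u for W, u in zip(W_local, U)) / n_centers
        # Dual step: accumulate each center's disagreement with the consensus.
        for c in range(n_centers):
            U[c] = U[c] + W_local[c] - W_shared
    return W_shared

# Synthetic example: 3 centers, 2 covariates, 4 features.
rng = np.random.default_rng(1)
W_true = rng.normal(size=(2, 4))
Y_list = [rng.normal(size=(n, 2)) for n in (30, 50, 80)]
X_list = [Y @ W_true + 0.1 * rng.normal(size=(Y.shape[0], 4)) for Y in Y_list]

W_tilde = admm_consensus_lstsq(Y_list, X_list)
# Confounder-corrected data, computed locally at each center.
X_corrected = [X - Y @ W_tilde for Y, X in zip(Y_list, X_list)]
```

On this synthetic example the shared estimate agrees with the least-squares solution that would be obtained from the concatenated data. The value of `rho` affects only the speed of convergence and would need tuning on real data.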
