• Metadata management: Data governance policies should include metadata management to ensure that data is properly tagged, categorized, and classified. Metadata helps organizations understand the meaning and context of their data and facilitates data discovery and reuse (see the sketch after this list).
• Data privacy and security: Data governance policies should include measures to ensure data privacy and security, such as access controls, data encryption, and data masking.
• Data lineage: Data governance policies should include data lineage to track the origin, transformation, and movement of data across the organization. Data lineage helps organizations understand how data is used and facilitates compliance with regulatory requirements.
• Data ownership and stewardship: Data governance policies should define data ownership and stewardship to ensure that data is managed and maintained by the appropriate individuals and teams.
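To make these governance elements concrete, the following is a minimal Python sketch of how a single data asset could carry metadata tags, lineage, ownership, and a simple access check. The class and field names (DataAssetRecord, allowed_roles, and so on) are illustrative assumptions, not components of the architecture proposed in this paper.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical, minimal governance record; all names are illustrative only.
@dataclass
class DataAssetRecord:
    asset_id: str
    tags: List[str]                                         # metadata: tagging/categorization
    classification: str                                     # e.g. "PHI", "internal", "public"
    lineage: List[str] = field(default_factory=list)        # upstream sources and transformations
    owner: str = ""                                         # accountable data owner
    steward: str = ""                                       # day-to-day data steward
    allowed_roles: List[str] = field(default_factory=list)  # very simple access control

    def can_access(self, role: str) -> bool:
        """Tiny access-control check used purely for illustration."""
        return role in self.allowed_roles


# Example: a brain-MRI asset tagged as PHI, with lineage and stewardship recorded.
mri_asset = DataAssetRecord(
    asset_id="mri/brain/scan_0001",
    tags=["imaging", "MRI", "brain"],
    classification="PHI",
    lineage=["hospital_pacs_export", "deidentification_step"],
    owner="radiology_dept",
    steward="data_engineering_team",
    allowed_roles=["radiologist", "ml_researcher"],
)
print(mri_asset.can_access("ml_researcher"))  # True
```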
(b) Compliance: Data compliance refers to adhering to relevant laws, regulations, and industry standards related to the handling, processing, and storage of data. In the context of a data fabric, this means ensuring that data is managed in accordance with these requirements across the entire fabric. To ensure data compliance, it is necessary to establish policies and procedures that cover the entire data lifecycle, from ingestion to archival and deletion. Personal data must be collected, processed, and stored in compliance with privacy regulations such as GDPR, CCPA, and HIPAA. In this architecture, the feature-selected weights are trained on various models such as VGG16, VGG19, ResNet50, and ResNet152, and the resulting updates are stored in a data lake, structured according to the models they were trained on, in compliance with HIPAA regulations.
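As a rough illustration of this storage layout only (the directory structure, the store_update helper, and the audit-log format below are assumptions made for the sketch, not details reported in this paper), weight updates could be written into model-specific folders in the data lake together with a simple audit trail:

```python
import json
import time
from pathlib import Path

# Hypothetical data-lake location for serialized (e.g. encrypted) weight updates.
DATA_LAKE_ROOT = Path("data_lake/updates")

def store_update(model_name: str, client_id: str, weights_blob: bytes) -> Path:
    """Store one weight update under the folder of the model it was trained on."""
    target_dir = DATA_LAKE_ROOT / model_name              # e.g. data_lake/updates/ResNet50
    target_dir.mkdir(parents=True, exist_ok=True)
    path = target_dir / f"{client_id}_{int(time.time())}.bin"
    path.write_bytes(weights_blob)
    # Append a small audit record (who wrote what, when) to support compliance reviews.
    audit = {"model": model_name, "client": client_id, "file": path.name, "ts": time.time()}
    with open(DATA_LAKE_ROOT / "audit_log.jsonl", "a") as log:
        log.write(json.dumps(audit) + "\n")
    return path

# Usage (hypothetical): store_update("ResNet50", "hospital_a", serialized_encrypted_weights)
```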
(iii) Exposing Data: Data exposure refers to making data available
for consumption and analysis by users or applications within an
organization. Exposing data in a data fabric involves providing
access to the data for authorized users or applications. There are
several ways to expose data in a data fabric, including:
• APIs: Application Programming Interfaces (APIs) enable applications to access and retrieve data from the data fabric (a minimal endpoint is sketched after this list).
• Data Catalogs: A data catalog provides a searchable inventory of data assets in the data fabric. Users can discover and access data assets through the data catalog.
• Self-Service Analytics: A self-service analytics platform enables users to create their own queries and reports using the data available in the data fabric.
• Data Virtualization: Data virtualization enables users to access and combine data from multiple sources as if it were in a single location.
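As an illustration of the API route only, the following is a minimal sketch using Flask; the endpoints, asset identifiers, and in-memory catalog are assumptions standing in for the fabric's real catalog service, not part of the proposed system:

```python
from flask import Flask, jsonify, abort

app = Flask(__name__)

# Hypothetical in-memory inventory standing in for the data fabric's catalog.
ASSETS = {
    "mri/brain/scan_0001": {"classification": "PHI", "format": "DICOM"},
    "lab/results_2023": {"classification": "internal", "format": "parquet"},
}

@app.get("/assets")
def list_assets():
    """Expose a searchable inventory of asset identifiers (data-catalog style)."""
    return jsonify(sorted(ASSETS.keys()))

@app.get("/assets/<path:asset_id>")
def get_asset(asset_id: str):
    """Return metadata for one asset; authorization checks are omitted in this sketch."""
    if asset_id not in ASSETS:
        abort(404)
    return jsonify(ASSETS[asset_id])

if __name__ == "__main__":
    app.run(port=8080)
```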
2.2. Federated learning
Federated Learning is a distributed machine learning technique that
enables multiple clients to collaboratively learn a shared model without
exchanging their raw data. This technique has gained popularity in
recent years due to its ability to preserve data privacy and security
while improving model performance. As shown in Fig. 2, each client
trains a local model using its own data and then sends the local model
weights to a central server. The central server then aggregates the local
model weights to update a global model that is shared among all clients.
This process continues iteratively until the global model achieves the
desired level of accuracy. According to the report [18], Federated
Learning has been successfully applied to various domains, such as
speech recognition, natural language processing, and healthcare, where
data privacy is a major concern.
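The following is a small, self-contained sketch of the federated averaging loop just described, using a toy linear model in Python/NumPy. The local update rule, client data, and hyperparameters are illustrative assumptions and are not the models used later in this work:

```python
import numpy as np

def local_update(global_weights: np.ndarray, X: np.ndarray, y: np.ndarray,
                 lr: float = 0.1, epochs: int = 5) -> np.ndarray:
    """Each client refines the global weights on its own data (linear model, MSE loss)."""
    w = global_weights.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)   # gradient of mean squared error
        w -= lr * grad
    return w

def federated_round(global_weights, client_datasets):
    """Server aggregates the clients' weights, weighted by local dataset size (FedAvg-style)."""
    updates, sizes = [], []
    for X, y in client_datasets:                 # raw data never leaves the client
        updates.append(local_update(global_weights, X, y))
        sizes.append(len(y))
    return np.average(np.stack(updates), axis=0, weights=np.array(sizes, dtype=float))

# Toy run: three clients, each holding private data drawn from y = 2x plus noise.
rng = np.random.default_rng(0)
clients = []
for _ in range(3):
    X = rng.normal(size=(50, 1))
    clients.append((X, X @ np.array([2.0]) + 0.01 * rng.normal(size=50)))

w = np.zeros(1)
for _ in range(10):                              # iterative rounds until convergence
    w = federated_round(w, clients)
print(w)                                         # approaches [2.0] without sharing raw data
```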
2.3. Homomorphic encryption
A cryptographic method called homomorphic encryption enables
mathematical operations to be carried out on ciphertext without ex-
posing the underlying plaintext. Table 1 offers a comprehensive view of
different types of homomorphic encryption along with the distinctions
that set them apart. In our research, we used partially homomorphic
encryption to encrypt sensitive medical images, specifically brain MRI
scans.
Partially homomorphic encryption (PHE) is a type of homomorphic
encryption that only supports a limited set of mathematical operations,
such as addition or multiplication. By encrypting the medical images
using this technique, we were able to process and analyze the data
without exposing the sensitive information contained within it. Fig. 3 provides an illustrative overview of the partially homomorphic encryption (PHE) technique applied at the pixel level to the dataset images.
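For illustration, pixel-level encryption of an image patch can be sketched with the python-paillier (phe) library roughly as follows; the toy 4 × 4 patch and key size are assumptions, and the actual MRI preprocessing pipeline may differ:

```python
import numpy as np
from phe import paillier

# Generate a Paillier key pair (small key size chosen only to keep the sketch fast).
public_key, private_key = paillier.generate_paillier_keypair(n_length=1024)

image = np.random.randint(0, 256, size=(4, 4))             # toy grayscale patch
encrypted = [[public_key.encrypt(int(p)) for p in row] for row in image]

# Downstream components can work with `encrypted` without ever seeing pixel values;
# only the private-key holder can recover them.
recovered = np.array([[private_key.decrypt(c) for c in row] for row in encrypted])
assert np.array_equal(recovered, image)
```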
One major benefit of using partially homomorphic encryption in this
context is that it ensures the confidentiality of medical data. As medical
information is often highly sensitive and personal, it is important
to protect it from unauthorized access. By encrypting the data, we
were able to securely process and analyze it without compromising its
confidentiality.
In addition, partially homomorphic encryption allows for more
efficient processing of the encrypted data. Because the mathematical
operations can be performed directly on the ciphertext, there is no
need to decrypt the data first, which can be a time-consuming process.
This was particularly useful when working with large datasets or when
processing data in real time [19].
Overall, our use of partially homomorphic encryption proved to be
a successful and effective method for protecting the confidentiality of
sensitive medical images while still enabling their analysis.
2.3.1. The Paillier encryption scheme
As previously mentioned, we use partially homomorphic encryption for our dataset, following the Paillier encryption scheme.
The Paillier encryption scheme [20] is an additively homomorphic
cryptosystem based on the computational difficulty of the decisional
composite residuosity assumption. The scheme’s security relies on the
difficulty of factoring the product of two large prime numbers. Key
components of the Paillier encryption scheme are:
1. Key Generation: The key generation process creates a public
key (𝑛, 𝑔) and a private key (𝜆, 𝜇).
• 𝑛 is the product of two large prime numbers 𝑝 and 𝑞; the primes are kept secret and known only to the data owner, while 𝑛 itself is part of the public key.
• 𝑔 is a public system parameter, typically set as 𝑔 = 𝑛 + 1.
• 𝜆 is Carmichael’s totient function [21], 𝜆 = 𝑙𝑐𝑚(𝑝 − 1, 𝑞 − 1),
where 𝑝 and 𝑞 are the large prime factors of 𝑛.
• 𝜇 is the modular multiplicative inverse of 𝜆 modulo 𝑛.
2. Encryption: Given a plaintext value 𝑥 (e.g., a pixel of an image), the encryption is performed as follows:
𝐸𝑛𝑐(𝑥) = (𝑔^𝑥 ⋅ 𝑟^𝑛) mod 𝑛²  (1)
where 𝑟 is a random value chosen afresh for each encryption, ensuring probabilistic encryption.
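To make key generation and Eq. (1) concrete, the following is a minimal from-scratch Python sketch with deliberately tiny primes; the helper names are illustrative, and a vetted library (such as python-paillier) should be preferred over hand-rolled code in practice:

```python
import math
import secrets

def keygen(p: int, q: int):
    """Paillier key generation (requires Python 3.9+ for math.lcm)."""
    n = p * q
    g = n + 1                                   # common choice for the public parameter g
    lam = math.lcm(p - 1, q - 1)                # Carmichael's function of n
    mu = pow(lam, -1, n)                        # modular inverse of lam mod n (valid for g = n + 1)
    return (n, g), (lam, mu)

def encrypt(public_key, x: int) -> int:
    """Eq. (1): Enc(x) = g^x * r^n mod n^2, with fresh randomness r per encryption."""
    n, g = public_key
    n_sq = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:                 # r must be invertible modulo n
            break
    return (pow(g, x, n_sq) * pow(r, n, n_sq)) % n_sq

def decrypt(public_key, private_key, c: int) -> int:
    """m = L(c^lam mod n^2) * mu mod n, where L(u) = (u - 1) / n."""
    n, _ = public_key
    lam, mu = private_key
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n

pub, priv = keygen(61, 53)                      # toy primes, far too small for real use
c = encrypt(pub, 42)
print(decrypt(pub, priv, c))                    # 42
```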
2.3.2. Homomorphic operations
1. Addition: Adding two encrypted values yields a ciphertext that decrypts to the sum of the original plaintexts. Given two encrypted images 𝐸𝑛𝑐(𝑥) and 𝐸𝑛𝑐(𝑦), the homomorphic addition is performed as:
𝐸𝑛𝑐(𝑥 + 𝑦) = (𝐸𝑛𝑐(𝑥) ⋅ 𝐸𝑛𝑐(𝑦)) mod 𝑛²  (2)
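Continuing the sketch given after Eq. (1) (its keygen, encrypt, and decrypt helpers are assumed to be in scope), Eq. (2) can be checked directly:

```python
# Multiplying ciphertexts modulo n^2 adds the underlying plaintexts.
pub, priv = keygen(61, 53)
n, _ = pub
c_x = encrypt(pub, 17)
c_y = encrypt(pub, 25)
c_sum = (c_x * c_y) % (n * n)        # Eq. (2): Enc(x + y) = Enc(x) * Enc(y) mod n^2
print(decrypt(pub, priv, c_sum))     # 42, without ever decrypting c_x or c_y individually
```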