• Metadata management: Data governance policies should include metadata management to ensure that data is properly tagged, categorized, and classified. Metadata helps organizations understand the meaning and context of their data and facilitates data discovery and reuse (see the sketch after this list).
• Data privacy and security: Data governance policies should include measures to ensure data privacy and security, such as access controls, data encryption, and data masking.
• Data lineage: Data governance policies should include data lineage to track the origin, transformation, and movement of data across the organization. Data lineage helps organizations understand how data is used and facilitates compliance with regulatory requirements.
• Data ownership and stewardship: Data governance policies should define data ownership and stewardship to ensure that data is managed and maintained by the appropriate individuals and teams.
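To make these governance elements concrete, the following is a minimal Python sketch of how a single data asset could carry metadata tags, lineage, ownership, and a simple access check. The class and field names (DataAssetRecord, allowed_roles, and so on) are illustrative assumptions, not components of the architecture proposed in this paper.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical, minimal governance record; all names are illustrative only.
@dataclass
class DataAssetRecord:
    asset_id: str
    tags: List[str]                                         # metadata: tagging/categorization
    classification: str                                     # e.g. "PHI", "internal", "public"
    lineage: List[str] = field(default_factory=list)        # upstream sources and transformations
    owner: str = ""                                         # accountable data owner
    steward: str = ""                                       # day-to-day data steward
    allowed_roles: List[str] = field(default_factory=list)  # very simple access control

    def can_access(self, role: str) -> bool:
        """Tiny access-control check used purely for illustration."""
        return role in self.allowed_roles


# Example: a brain-MRI asset tagged as PHI, with lineage and stewardship recorded.
mri_asset = DataAssetRecord(
    asset_id="mri/brain/scan_0001",
    tags=["imaging", "MRI", "brain"],
    classification="PHI",
    lineage=["hospital_pacs_export", "deidentification_step"],
    owner="radiology_dept",
    steward="data_engineering_team",
    allowed_roles=["radiologist", "ml_researcher"],
)
print(mri_asset.can_access("ml_researcher"))  # True
```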
(b) Compliance: Data compliance refers to adhering to relevant laws, regulations, and industry standards related to the handling, processing, and storage of data. In the context of a data fabric, this means ensuring that data is managed in accordance with these requirements across the entire fabric. To ensure data compliance, it is necessary to establish policies and procedures that cover the entire data lifecycle, from ingestion to archival and deletion. Personal data must be collected, processed, and stored in compliance with privacy regulations such as GDPR, CCPA, and HIPAA. In this architecture, the feature-selected weights are trained on various models such as VGG16, VGG19, ResNet50, and ResNet152, and the resulting updates are stored in a data lake, structured according to the models they were trained on, in compliance with HIPAA regulations.
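As a rough illustration of this storage layout only (the directory structure, the store_update helper, and the audit-log format below are assumptions made for the sketch, not details reported in this paper), weight updates could be written into model-specific folders in the data lake together with a simple audit trail:

```python
import json
import time
from pathlib import Path

# Hypothetical data-lake location for serialized (e.g. encrypted) weight updates.
DATA_LAKE_ROOT = Path("data_lake/updates")

def store_update(model_name: str, client_id: str, weights_blob: bytes) -> Path:
    """Store one weight update under the folder of the model it was trained on."""
    target_dir = DATA_LAKE_ROOT / model_name              # e.g. data_lake/updates/ResNet50
    target_dir.mkdir(parents=True, exist_ok=True)
    path = target_dir / f"{client_id}_{int(time.time())}.bin"
    path.write_bytes(weights_blob)
    # Append a small audit record (who wrote what, when) to support compliance reviews.
    audit = {"model": model_name, "client": client_id, "file": path.name, "ts": time.time()}
    with open(DATA_LAKE_ROOT / "audit_log.jsonl", "a") as log:
        log.write(json.dumps(audit) + "\n")
    return path

# Usage (hypothetical): store_update("ResNet50", "hospital_a", serialized_encrypted_weights)
```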
(iii) Exposing Data: Data exposure refers to making data available
for consumption and analysis by users or applications within an
organization. Exposing data in a data fabric involves providing
access to the data for authorized users or applications. There are
several ways to expose data in a data fabric, including:
• APIs: Application Programming Interfaces (APIs) enable applications to access and retrieve data from the data fabric (a minimal endpoint is sketched after this list).
• Data Catalogs: A data catalog provides a searchable inventory of data assets in the data fabric. Users can discover and access data assets through the data catalog.
• Self-Service Analytics: A self-service analytics platform enables users to create their own queries and reports using the data available in the data fabric.
• Data Virtualization: Data virtualization enables users to access and combine data from multiple sources as if it were in a single location.
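As an illustration of the API route only, the following is a minimal sketch using Flask; the endpoints, asset identifiers, and in-memory catalog are assumptions standing in for the fabric's real catalog service, not part of the proposed system:

```python
from flask import Flask, jsonify, abort

app = Flask(__name__)

# Hypothetical in-memory inventory standing in for the data fabric's catalog.
ASSETS = {
    "mri/brain/scan_0001": {"classification": "PHI", "format": "DICOM"},
    "lab/results_2023": {"classification": "internal", "format": "parquet"},
}

@app.get("/assets")
def list_assets():
    """Expose a searchable inventory of asset identifiers (data-catalog style)."""
    return jsonify(sorted(ASSETS.keys()))

@app.get("/assets/<path:asset_id>")
def get_asset(asset_id: str):
    """Return metadata for one asset; authorization checks are omitted in this sketch."""
    if asset_id not in ASSETS:
        abort(404)
    return jsonify(ASSETS[asset_id])

if __name__ == "__main__":
    app.run(port=8080)
```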
2.2. Federated learning
Federated Learning is a distributed machine learning technique that
enables multiple clients to collaboratively learn a shared model without
exchanging their raw data. This technique has gained popularity in
recent years due to its ability to preserve data privacy and security
while improving model performance. As shown in Fig. 2, each client
trains a local model using its own data and then sends the local model
weights to a central server. The central server then aggregates the local
model weights to update a global model that is shared among all clients.
This process continues iteratively until the global model achieves the
desired level of accuracy. According to the report [18], Federated
Learning has been successfully applied to various domains, such as
speech recognition, natural language processing, and healthcare, where
data privacy is a major concern.
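The following is a small, self-contained sketch of the federated averaging loop just described, using a toy linear model in Python/NumPy. The local update rule, client data, and hyperparameters are illustrative assumptions and are not the models used later in this work:

```python
import numpy as np

def local_update(global_weights: np.ndarray, X: np.ndarray, y: np.ndarray,
                 lr: float = 0.1, epochs: int = 5) -> np.ndarray:
    """Each client refines the global weights on its own data (linear model, MSE loss)."""
    w = global_weights.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)   # gradient of mean squared error
        w -= lr * grad
    return w

def federated_round(global_weights, client_datasets):
    """Server aggregates the clients' weights, weighted by local dataset size (FedAvg-style)."""
    updates, sizes = [], []
    for X, y in client_datasets:                 # raw data never leaves the client
        updates.append(local_update(global_weights, X, y))
        sizes.append(len(y))
    return np.average(np.stack(updates), axis=0, weights=np.array(sizes, dtype=float))

# Toy run: three clients, each holding private data drawn from y = 2x plus noise.
rng = np.random.default_rng(0)
clients = []
for _ in range(3):
    X = rng.normal(size=(50, 1))
    clients.append((X, X @ np.array([2.0]) + 0.01 * rng.normal(size=50)))

w = np.zeros(1)
for _ in range(10):                              # iterative rounds until convergence
    w = federated_round(w, clients)
print(w)                                         # approaches [2.0] without sharing raw data
```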
2.3. Homomorphic encryption
A cryptographic method called homomorphic encryption enables
mathematical operations to be carried out on ciphertext without ex-
posing the underlying plaintext. Table 1 offers a comprehensive view of
different types of homomorphic encryption along with the distinctions
that set them apart. In our research, we used partially homomorphic
encryption to encrypt sensitive medical images, specifically brain MRI
scans.
Partially homomorphic encryption (PHE) is a type of homomorphic
encryption that only supports a limited set of mathematical operations,
such as addition or multiplication. By encrypting the medical images
using this technique, we were able to process and analyze the data
without exposing the sensitive information contained within it. Fig. 3 provides an illustrative overview of the partially homomorphic encryption (PHE) technique applied at the pixel level to the dataset images.
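For illustration, pixel-level encryption of an image patch can be sketched with the python-paillier (phe) library roughly as follows; the toy 4 × 4 patch and key size are assumptions, and the actual MRI preprocessing pipeline may differ:

```python
import numpy as np
from phe import paillier

# Generate a Paillier key pair (small key size chosen only to keep the sketch fast).
public_key, private_key = paillier.generate_paillier_keypair(n_length=1024)

image = np.random.randint(0, 256, size=(4, 4))             # toy grayscale patch
encrypted = [[public_key.encrypt(int(p)) for p in row] for row in image]

# Downstream components can work with `encrypted` without ever seeing pixel values;
# only the private-key holder can recover them.
recovered = np.array([[private_key.decrypt(c) for c in row] for row in encrypted])
assert np.array_equal(recovered, image)
```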
One major benefit of using partially homomorphic encryption in this
context is that it ensures the confidentiality of medical data. As medical
information is often highly sensitive and personal, it is important
to protect it from unauthorized access. By encrypting the data, we
were able to securely process and analyze it without compromising its
confidentiality.
In addition, partially homomorphic encryption allows for more
efficient processing of the encrypted data. Because the mathematical
operations can be performed directly on the ciphertext, there is no
need to decrypt the data first, which can be a time-consuming process.
This was particularly useful when working with large datasets or when
processing data in real time [19].
Overall, our use of partially homomorphic encryption proved to be
a successful and effective method for protecting the confidentiality of
sensitive medical images while still enabling their analysis.
2.3.1. The Paillier encryption scheme
As previously mentioned, we use partially homomorphic encryption for our dataset, following the Paillier encryption scheme.
The Paillier encryption scheme [20] is an additively homomorphic
cryptosystem based on the computational difficulty of the decisional
composite residuosity assumption. The scheme’s security relies on the
difficulty of factoring the product of two large prime numbers. Key
components of the Paillier encryption scheme are:
1. Key Generation: The key generation process creates a public
key (𝑛, 𝑔) and a private key (𝜆, 𝜇).
• 𝑛 is the product of two large prime numbers 𝑝 and 𝑞; the primes are kept secret and known only to the data owner, while 𝑛 itself is part of the public key.
• 𝑔 is a public system parameter, typically set as 𝑔 = 𝑛 + 1.
• 𝜆 is Carmichael’s totient function [21], 𝜆 = 𝑙𝑐𝑚(𝑝 − 1, 𝑞 − 1),
where 𝑝 and 𝑞 are the large prime factors of 𝑛.
• 𝜇 is the modular multiplicative inverse of 𝜆 modulo 𝑛.
2. Encryption: Given a plaintext value 𝑥 (e.g., a pixel of an image), the encryption is performed as follows:
𝐸𝑛𝑐(𝑥) = (𝑔^𝑥 ⋅ 𝑟^𝑛) mod 𝑛²  (1)
where 𝑟 is a random value chosen afresh for each encryption, ensuring probabilistic encryption.
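To make key generation and Eq. (1) concrete, the following is a minimal from-scratch Python sketch with deliberately tiny primes; the helper names are illustrative, and a vetted library (such as python-paillier) should be preferred over hand-rolled code in practice:

```python
import math
import secrets

def keygen(p: int, q: int):
    """Paillier key generation (requires Python 3.9+ for math.lcm)."""
    n = p * q
    g = n + 1                                   # common choice for the public parameter g
    lam = math.lcm(p - 1, q - 1)                # Carmichael's function of n
    mu = pow(lam, -1, n)                        # modular inverse of lam mod n (valid for g = n + 1)
    return (n, g), (lam, mu)

def encrypt(public_key, x: int) -> int:
    """Eq. (1): Enc(x) = g^x * r^n mod n^2, with fresh randomness r per encryption."""
    n, g = public_key
    n_sq = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:                 # r must be invertible modulo n
            break
    return (pow(g, x, n_sq) * pow(r, n, n_sq)) % n_sq

def decrypt(public_key, private_key, c: int) -> int:
    """m = L(c^lam mod n^2) * mu mod n, where L(u) = (u - 1) / n."""
    n, _ = public_key
    lam, mu = private_key
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n

pub, priv = keygen(61, 53)                      # toy primes, far too small for real use
c = encrypt(pub, 42)
print(decrypt(pub, priv, c))                    # 42
```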
2.3.2. Homomorphic operations
1. Addition: Adding two encrypted values yields a ciphertext that decrypts to the sum of the original plaintexts. Given two encrypted images 𝐸𝑛𝑐(𝑥) and 𝐸𝑛𝑐(𝑦), the homomorphic addition is performed as:
𝐸𝑛𝑐(𝑥 + 𝑦) = (𝐸𝑛𝑐(𝑥) ⋅ 𝐸𝑛𝑐(𝑦)) mod 𝑛²  (2)
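Continuing the sketch given after Eq. (1) (its keygen, encrypt, and decrypt helpers are assumed to be in scope), Eq. (2) can be checked directly:

```python
# Multiplying ciphertexts modulo n^2 adds the underlying plaintexts.
pub, priv = keygen(61, 53)
n, _ = pub
c_x = encrypt(pub, 17)
c_y = encrypt(pub, 25)
c_sum = (c_x * c_y) % (n * n)        # Eq. (2): Enc(x + y) = Enc(x) * Enc(y) mod n^2
print(decrypt(pub, priv, c_sum))     # 42, without ever decrypting c_x or c_y individually
```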