User Sensitive Data Identification Method Based on Constraint Gaussian Mixture-Probability Hypothesis Density Filter

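This paper presents a new method for identifying users' sensitive data on the Internet. The method combines a constrained Gaussian Mixture-Probability Hypothesis Density (GM-PHD) filter with a Restricted Boltzmann Machine (RBM). Protecting users' sensitive data is critical in information security, because such data can include personal privacy details, financial information, and other sensitive information whose disclosure may have serious consequences. Identifying and protecting this data effectively is therefore a focus of network security research.

First, the method normalizes the raw data under a weight constraint, which reduces the influence of noise and outliers and makes the data easier to process. The weight constraint ensures that the more sensitive data features receive greater emphasis throughout processing.

Next, an RBM is used to build a random network. An RBM is an unsupervised learning model that learns latent feature representations from data. By defining an energy function over the collected features, the RBM captures the key patterns and structure in the data, which helps identify potentially sensitive information.

The GM-PHD filter then operates within this framework to generate the feature weights of the sensitive data. GM-PHD filtering is a multi-target tracking algorithm that is well suited to detecting and tracking targets in dynamic environments. It estimates target states through a probability density function, which is effective when the distribution of sensitive data keeps changing.

In the experiments, the method was compared in MATLAB simulations with the GM-PGD filter and a conventional Gaussian filter. The evaluation covered filtering and tracking performance, relevancy degree, sensitive word weight, cluster mapping, and high-frequency approximation. The results show that the method outperformed the alternatives in every comparison and, in particular, maintained high accuracy and stability in complex data environments.

Keywords: sensitive data, weight constraint, Gaussian mixture-probability hypothesis density, restricted Boltzmann machine.

In summary, the paper proposes a user sensitive data identification method that combines the feature learning ability of the RBM with the dynamic target tracking of the GM-PHD filter, improving the efficiency and accuracy of identifying sensitive information in massive volumes of network data. The method is significant for network security, and especially for the protection of personal information.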

Abstract—To identify users' sensitive data on the Internet, this paper proposes a sensitive data identification method that combines a weight-constrained Gaussian Mixture-Probability Hypothesis Density (GM-PHD) filter with Restricted Boltzmann Machines (RBM). First, the data are normalized under a weight constraint, and a random network is formed by defining the RBM energy function over the collected features. Then, the GM-PHD filter generates the sensitive feature weights of the sensitive data. Finally, simulation experiments in MATLAB compare the method with the GM-PGD filter and the Gaussian filter in terms of filtering and tracking performance, relevancy degree, sensitive word weight, cluster mapping, and high-frequency approximation. The results show that the proposed method performs better than the other methods.

Keywords—Sensitive data, weight constraint, Gaussian mixture-probability hypothesis density, restricted Boltzmann machine.
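The abstract says the random network is formed by defining an RBM energy function over the collected features, but this excerpt does not give the function itself. As a minimal sketch, assuming the standard binary RBM energy E(v, h) = -a·v - b·h - vᵀWh (the paper's own energy function may differ), the energy and the hidden-unit activation probabilities can be computed as follows. The weights, biases, and the 3-feature, 2-hidden-unit sizes are hypothetical toy values.

```python
import numpy as np

def rbm_energy(v, h, W, a, b):
    """Energy of a joint configuration (v, h) of a binary RBM.

    Standard textbook form: E(v, h) = -a.v - b.h - v^T W h.
    Assumption: the paper's energy function is not shown in the excerpt.
    """
    v = np.asarray(v, dtype=float)
    h = np.asarray(h, dtype=float)
    return float(-(a @ v) - (b @ h) - (v @ W @ h))

def hidden_activation_prob(v, W, b):
    """P(h_j = 1 | v) = sigmoid(b_j + sum_i v_i W_ij) for a binary RBM."""
    v = np.asarray(v, dtype=float)
    return 1.0 / (1.0 + np.exp(-(b + v @ W)))

# Hypothetical toy setup: 3 visible features, 2 hidden units.
W = np.array([[1.0, -1.0],
              [0.5,  0.0],
              [0.0,  2.0]])
a = np.array([0.1, 0.2, 0.3])   # visible biases
b = np.array([0.0, -1.0])       # hidden biases

v = np.array([1, 0, 1])         # which features are present
h = np.array([1, 1])            # a hidden configuration

energy = rbm_energy(v, h, W, a, b)
probs = hidden_activation_prob(v, W, b)
```

Lower energy corresponds to a more probable joint configuration, which is why training an RBM pushes the energy of observed feature vectors down.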

I. INTRODUCTION

WITH the development of science and technology, people's dependency on the Internet is intensifying. As applications become more popular in modern life, sensitive data has received greater attention [1-4]. Sensitive data is commonly collected in Oracle Database, Android, and cloud environments [5-7]. Because the data flow generated from big data is dynamic, it is influenced by actual usage and by the network environment. As a result, some sensitive data cannot be identified effectively when data is exchanged in large-scale network integration.

This work was supported in part by the Foundation of Zhejiang Educational Committee (contract Y201738610) and the National Natural Science Foundation of China (41275116, 61202464, 61472136, and 61772196).

Zhengqiu Lu is with the Department of Information & Media, Zhejiang Fashion Institute of Technology, Ningbo 315175, Zhejiang, China (corresponding author; e-mail: 459246322@qq.com).

Shengjun Xue is with the Department of Computer & Software, Nanjing University of Information Science & Technology, Nanjing 210044, Jiangsu, China.

Chunliang Zhou is with the Department of Information & Engineering, Dahongying University, Ningbo 315175, Zhejiang, China.

Quanping Hua is with the Department of Information & Media, Zhejiang Fashion Institute of Technology, Ningbo 315175, Zhejiang, China.

Defa Hu is with Computer and Information Engineering, Hunan University of Commerce, Changsha 410205, Hunan, China.

Weijin Jiang is with Computer and Information Engineering, Hunan University of Commerce, Changsha 410205, Hunan, China.

The generated data flow can be clustered during data transmission, and some applications actively broadcast clustered data flow. Some research indicates that sensitive data is disclosed in the course of this active notification of cluster information, so identifying sensitive data effectively is of great significance. At present, there are two major approaches to sensitive data identification: data dictionary matching and manual identification. To prevent the economic and reputational losses caused by disclosure, sensitive data can be secured with secret key encryption, or a protection barrier can be set up by means of cloud computing [8]. Among these, the main protective method is to use labels to identify sensitive data within large volumes of data. Smartphones are now an important collection point for sensitive data, and some Android malware can associate with other malware automatically [9]. Reference [10] proposes an Android malware detection method based on a permission sequential pattern mining algorithm: it mines permission sequences to detect malware and warns about the sensitive information that could be exposed when the malware is used. However, this method lacks accuracy because the same permission patterns can also appear in normal applications. Sensitive data also plays a significant role in other areas of information processing. Databases normally protect sensitive data with encryption algorithms, for example by using transparent data encryption [11] for the sensitive data in an Oracle database. However, access control then depends on the authorization of external functions and lacks targeted identification.
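To make the data dictionary matching approach mentioned above concrete, here is a minimal sketch in Python. It is an illustration only, not a method taken from the paper: the term list `SENSITIVE_TERMS` and the function `find_sensitive_terms` are hypothetical, and a real dictionary would be curated and far larger.

```python
import re

# Hypothetical dictionary of sensitive terms (assumption for illustration).
SENSITIVE_TERMS = {"password", "credit card", "id number"}

def find_sensitive_terms(text, terms=SENSITIVE_TERMS):
    """Return the sorted dictionary terms found in `text`.

    Matching is case-insensitive and restricted to whole words, so
    "passwords" does not match the entry "password".
    """
    found = set()
    for term in terms:
        pattern = r"\b" + re.escape(term) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.add(term)
    return sorted(found)

hits = find_sensitive_terms("Please enter your Password and credit card details.")
```

A fixed dictionary only finds terms that someone has already listed, which is one reason a learned approach such as the one proposed in this paper is attractive for dynamic data flows.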

This paper therefore proposes a method to identify sensitive data based on a weight-constrained GM-PHD filter [12-16] and RBM [17-19]. The method is built primarily on a probabilistic random neural network model that is normalized with a weight constraint, so it can effectively extract the features of sensitive data and the structure of the belief network. Meanwhile, the detection rate for sensitive data in the simulated neural network can be improved by calculating the probability of frequently occurring sensitive words and maximizing the probability of eligible samples.
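The excerpt does not define the weight constraint or show how the GM-PHD filter produces feature weights. As background, the sketch below shows the weight update of the standard GM-PHD filter for a single scalar measurement, assuming an identity measurement model; it is not the paper's constrained variant. The function names and all numeric values are hypothetical.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a scalar Gaussian N(x; mean, var)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def gmphd_update_weights(weights, means, variances, z, meas_var, p_detect, clutter):
    """Updated Gaussian component weights for one scalar measurement z.

    Standard GM-PHD form with identity measurement model (H = 1):
        w_j(z) = p_D * w_j * q_j(z) / (clutter + p_D * sum_l w_l * q_l(z))
        q_j(z) = N(z; m_j, P_j + R)
    The paper's weight constraint is not reproduced here.
    """
    likelihoods = [gaussian_pdf(z, m, p + meas_var) for m, p in zip(means, variances)]
    unnormalized = [p_detect * w * q for w, q in zip(weights, likelihoods)]
    denom = clutter + sum(unnormalized)
    return [u / denom for u in unnormalized]

# Hypothetical example: two predicted components, one measurement near the first.
updated = gmphd_update_weights(
    weights=[0.5, 0.5], means=[0.0, 10.0], variances=[1.0, 1.0],
    z=0.2, meas_var=1.0, p_detect=0.9, clutter=0.01,
)

# With no clutter, the updated weights for a measurement sum to one.
no_clutter = gmphd_update_weights(
    weights=[0.5, 0.5], means=[0.0, 10.0], variances=[1.0, 1.0],
    z=0.2, meas_var=1.0, p_detect=0.9, clutter=0.0,
)
```

The component whose mean lies close to the measurement receives almost all of the updated weight, while the clutter term keeps the total below one to account for the chance that the measurement is a false alarm.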

II. SENSITIVE DATA FEATURE MODEL

Sensitive data occur frequently in online applications. In general, malware that involves sensitive data relies on cooperation among several frequent permission itemsets. In addition, association rules and cluster mapping are formed

User Sensitive Data Identification Method Based on Constraint Gaussian Mixture-Probability Hypothesis Density Filter

Zhengqiu Lu, Shengjun Xue, Chunliang Zhou, Quanping Hua, Defa Hu, and Weijin Jiang

INTERNATIONAL JOURNAL OF CIRCUITS, SYSTEMS AND SIGNAL PROCESSING, Volume 12, 2018, ISSN: 1998-4464, p. 367

