Characteristics and classification of outlier detection
3.1 Statistical based techniques
These techniques require an underlying data distribution model for the detection of outliers.
They assume or estimate a statistical (probability distribution) model which captures the
distribution of the data and evaluate data instances with respect to how well they fit the
model. A data instance is declared an outlier if the probability of it being generated by the model is very low. These techniques
can further be classified as parametric or non-parametric. Parametric techniques assume knowledge of the underlying data distribution, i.e., that the data is generated from a known distribution. The distribution parameters are then estimated from the available data. These techniques are based on either a Gaussian or a non-Gaussian model.
Non-parametric techniques do not assume knowledge of the underlying data distribution. They typically define a distance measure between a new test instance and the statistical model and apply a threshold on this distance to determine whether the observation is an outlier. The most widely used approaches in this respect are histogram-, kernel density- and wavelet-based approaches. Some of the statistical based techniques considered in this paper are Dereszynski and Dietterich (2011), Zhang et al. (2012), Wu et al. (2007), Yozo et al. (2004), Jun et al. (2005), Bettencourt et al. (2007), Sheng et al. (2007), Palpanas et al. (2003), Subramaniam et al. (2006).
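A minimal sketch of the parametric, Gaussian-based idea: fit a mean and standard deviation to the data, then flag any instance whose deviation from the fitted mean exceeds a chosen threshold. The sample readings and the threshold value here are illustrative assumptions, not taken from the surveyed papers.

```python
import math

def gaussian_outliers(data, threshold=2.0):
    """Fit a Gaussian (mean, std) to the data and flag instances
    whose absolute z-score exceeds the threshold."""
    n = len(data)
    mean = sum(data) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in data) / n)
    return [x for x in data if abs(x - mean) / std > threshold]

# Hypothetical sensor readings with one anomalous value.
readings = [20.1, 20.3, 19.8, 20.0, 20.2, 35.0, 19.9]
print(gaussian_outliers(readings))  # → [35.0]
```

Note that the outlier itself inflates the estimated standard deviation (the masking effect), which is one reason the threshold must be chosen with care.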
3.2 Nearest neighbor based techniques
Nearest neighbor-based approaches have long been among the most commonly used approaches in the data mining and machine learning communities: they analyze a data instance with respect to its nearest neighbors. They use well-defined distance notions to compute the distance (similarity measure) between two data instances, and a data instance is declared an outlier if it is located far from its neighbors. Euclidean distance is a popular choice for univariate data, whereas multivariate continuous attributes are typically handled with the Mahalanobis distance metric.
Some of the nearest neighbor based techniques considered in this paper are presented in
Branch et al. (2006), Zhang et al. (2007b), Zhuang and Chen (2006). These techniques have not recently been the focus of the research community due to limitations that will be discussed in the forthcoming sections.
3.3 Clustering based techniques
Grouping similar data instances into clusters with similar behavior is known as clustering.
Clustering algorithms can be either centralized or distributed. In centralized clustering algorithms, each node transmits its entire data to the gateway/central node, which then performs the data clustering; this approach is, however, communication-inefficient. In a distributed clustering approach, each node performs clustering on its own sensed data vectors and then sends specific parameters of the clustered data to the gateway node, reducing communication overhead. The nodes then use some distance measure from the nearest cluster to identify outliers (Bezdek et al. 2011; Rajasegarar et al. 2008a, 2010b, 2012; Moshtaghi et al. 2011a,b,c; Bezdek et al. 2010; Suthaharan et al. 2010a,b).
3.4 Classification based techniques
Classification based techniques learn a classification model using the set of data instances
during the training phase and then classify the data instance to one of the training classes