2 CHAPTER 1. AN INTRODUCTION TO OUTLIER ANALYSIS
• Interesting sensor events: Sensors are often used to track various environmental
and location parameters in many real-world applications. Sudden changes in the
underlying patterns may represent events of interest. Event detection is one of the
primary motivating applications in the field of sensor networks. As discussed later in
this book, event detection is an important temporal version of outlier detection.
• Medical diagnosis: In many medical applications, the data is collected from a variety
of devices such as magnetic resonance imaging (MRI) scans, positron emission
tomography (PET) scans, or electrocardiogram (ECG) time series. Unusual patterns
in such data typically reflect disease conditions.
• Law enforcement: Outlier detection finds numerous applications in law enforcement,
especially in cases where unusual patterns can only be discovered over time through
multiple actions of an entity. Determining fraud in financial transactions, trading
activity, or insurance claims typically requires the identification of unusual patterns
in the data generated by the actions of the criminal entity.
• Earth science: A significant amount of spatiotemporal data about weather patterns,
climate changes, or land-cover patterns is collected through a variety of mechanisms
such as satellites or remote sensing. Anomalies in such data provide significant insights
about human activities or environmental trends that may be the underlying causes.
In all these applications, the data has a “normal” model, and anomalies are recognized as
deviations from this normal model. Normal data points are sometimes also referred to as
inliers. In some applications, such as intrusion or fraud detection, outliers correspond to
sequences of multiple data points rather than individual data points. For example, a fraud
event may reflect the actions of an individual in a particular sequence, and the specific
order of those actions is relevant to identifying the event as anomalous. Such anomalies are also
referred to as collective anomalies, because they can only be inferred collectively from a set
or sequence of data points. Such collective anomalies are often a result of unusual events
that generate anomalous patterns of activity. This book will address these different types
of anomalies.
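To make the idea of a collective anomaly concrete, here is a minimal Python sketch. It is an illustrative assumption, not a method from this book: the "normal" model is simply the frequency of each length-k window in a reference sequence, and each window of a test sequence is scored by its rarity as 1/(1 + count). The function names, the window length k, and the scoring formula are all hypothetical choices.

```python
from collections import Counter


def window_counts(seq, k):
    """Count every contiguous length-k window in a sequence."""
    return Counter(tuple(seq[i:i + k]) for i in range(len(seq) - k + 1))


def collective_scores(normal_seq, test_seq, k=3):
    """Score each length-k window of test_seq by its rarity in normal_seq.

    Returns (start_index, window, score) triples with score = 1 / (1 + count),
    where count is how often the window occurs in the normal data. An unseen
    window scores 1.0 even if every individual symbol in it is common.
    """
    counts = window_counts(normal_seq, k)  # Counter returns 0 for unseen windows
    scores = []
    for i in range(len(test_seq) - k + 1):
        window = tuple(test_seq[i:i + k])
        scores.append((i, window, 1.0 / (1 + counts[window])))
    return scores


# Hypothetical activity log: each action is common on its own,
# but the final run of repeated purchases never occurs in the normal data.
normal = ["login", "browse", "buy", "logout"] * 5
test = ["login", "browse", "buy", "logout", "login", "buy", "buy", "buy"]
for start, window, score in collective_scores(normal, test):
    print(start, window, round(score, 3))
```

Every symbol in `("buy", "buy", "buy")` appears often in the normal log, so a per-point model would not flag it; only the window as a whole is rare, which is the sense in which the anomaly is collective.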
The output of an outlier detection algorithm can be one of two types:
• Outlier scores: Most outlier detection algorithms output a score quantifying the
level of “outlierness” of each data point. This score can also be used to rank the data
points in order of their outlier tendency. This is a very general form of output, which
retains all the information provided by a particular algorithm, but it does not provide
a concise summary of the small number of data points that should be considered
outliers.
• Binary labels: A second type of output is a binary label indicating whether a data
point is an outlier or not. Although some algorithms might directly return binary
labels, outlier scores can also be converted into binary labels. This is typically achieved
by imposing a threshold on the outlier scores, chosen on the basis of the statistical
distribution of the scores. A binary labeling contains less information than
a scoring mechanism, but it is the final result that is often needed for decision making
in practical applications.
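Both output types can be illustrated with a minimal Python sketch. It assumes, purely for illustration, a single Gaussian "normal" model on one-dimensional data, uses the absolute z-score as the outlier score, ranks points by that score, and converts scores to binary labels with a threshold of the mean plus `num_std` standard deviations of the scores themselves. The function names and the default `num_std=2.0` are hypothetical choices, not the book's.

```python
import statistics


def zscore_scores(values):
    """Outlier score per point: absolute z-score under a single-Gaussian normal model."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        return [0.0] * len(values)  # no spread: no point deviates
    return [abs(v - mu) / sigma for v in values]


def rank_by_score(scores):
    """Indices of the points, most outlying first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def scores_to_labels(scores, num_std=2.0):
    """Binary labels from a threshold set on the distribution of the scores."""
    threshold = statistics.mean(scores) + num_std * statistics.pstdev(scores)
    return [s > threshold for s in scores], threshold


data = [10, 11, 9, 10, 12, 10, 11, 9, 10, 50]
scores = zscore_scores(data)
print(rank_by_score(scores)[0])  # 9: the point with value 50 ranks first
labels, threshold = scores_to_labels(scores)
print(labels)  # only the last entry is True
```

The scores keep the full ordering, while the labels collapse it to a yes/no decision that depends entirely on the threshold. The choice of `num_std` matters: with the population standard deviation used here, no z-score computed from n points can exceed √(n − 1), so on small samples a conventional cutoff of 3 can fail to flag even an obvious outlier.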
What constitutes a “sufficient” deviation for a point to be considered an outlier is often
a subjective judgement. In real applications, the data may be embedded in a