Application of Cluster-Based Local Outlier Factor
Algorithm in Anti-Money Laundering
Gao Zengan
The research is supported by the National Social Science Foundation of China (No. 08BGJ013).
Post Doctoral Station of Theoretical Economics
China Center for Anti-Money Laundering Studies
Fudan University
Shanghai, P. R. China
School of Economics and Management
Southwest Jiaotong University
Chengdu, P. R. China
E-mail address: gaozengan133@163.com
Abstract—
Financial institutions’ capability to recognize
suspicious money laundering transactional behavioral patterns
(SMLTBPs) is critical to anti-money laundering. Combining
distance-based unsupervised clustering and local outlier
detection, this paper designs a new cluster-based local outlier
factor (CBLOF) algorithm to identify SMLTBPs and uses
authentic and synthetic data to test its applicability and
effectiveness experimentally.
Keywords-clustering; outlier detection; local outlier factor
(LOF); suspicious money laundering transactional behavioral
patterns (SMLTBPs); anti-money laundering (AML)
I. INTRODUCTION
Anti-money laundering (AML) in the financial industry is
based on the analysis and processing of Suspicious Activity
Reports (SARs) filed by financial institutions (FIs). However,
the very large number of SARs usually makes the analysis by
financial intelligence units (FIUs) a waste of time and
resources, because only a few of the reported transactions are
truly suspicious [1]. Financial AML is therefore far from
achieving real-time, dynamic, and self-adaptive recognition of
suspicious money laundering transactional behavioral patterns
(SMLTBPs). A review of the literature finds that artificial
intelligence [2], support vector machines (SVM) [3], outlier
detection [4], and break-point analysis (BPA) [5] have been
used to improve FIs’ ability to process suspicious data;
various approaches to novelty detection on time series data
are examined in [6]; outlier detection methodologies are
surveyed in [7]; and a data mining-based framework for AML
research is proposed in [8] after a comprehensive review of
related studies. Nevertheless, the effectiveness and efficiency
of SMLTBP identification remain an active research topic,
since the passage of the USA Patriot Act and the creation of
the U.S. Department of Homeland Security signaled a new era in
applying information technology and data mining to the
detection of money laundering and terrorist financing [9].
Because SMLTBP recognition lacks training data, the number of
clusters is usually unknown in advance, and clustering results
change dynamically [10, 11], this paper designs a cluster-based
local outlier factor (CBLOF) algorithm to help FIUs concentrate
on a desirable number of SMLTBPs with a suitable degree of
suspiciousness, as determined by their actual needs and
resource endowments. Following this introduction, Section II
describes the design of the algorithm, Section III presents the
experimental process, and Section IV concludes the paper with a
suggestion for future research.
II. ALGORITHM DESIGN
The CBLOF algorithm combines distance-based unsupervised
clustering with local outlier [12] detection; clustering
serves to pre-process the data for the subsequent anomaly
identification.
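The clustering-as-pre-processing stage can be illustrated with a minimal sketch, which is not the paper's implementation. It groups objects with a single distance threshold, so the number of clusters emerges from the data, and then orders the clusters by size. Several details are assumptions made only for illustration: the function names `centroid` and `threshold_cluster`, the parameter name `eps`, the use of Euclidean distance to the cluster centroid as the object-to-cluster distance, and the rule of opening a new cluster when no existing cluster lies within the threshold.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def centroid(members: List[Point]) -> List[float]:
    """Coordinate-wise mean of the objects in one cluster."""
    n = len(members)
    return [sum(col) / n for col in zip(*members)]


def threshold_cluster(data: List[Point], eps: float) -> List[List[Point]]:
    """Group objects by a distance threshold; return clusters, largest first.

    Assumptions (not taken from the paper): the distance from an object to a
    cluster is the Euclidean distance to the cluster centroid, and an object
    farther than eps from every cluster starts a new cluster.
    """
    clusters: List[List[Point]] = []
    for q in data:
        if not clusters:
            clusters.append([q])  # the first object seeds the initial cluster
            continue
        # Distance from q to every existing cluster, then the nearest one.
        dists = [math.dist(q, centroid(c)) for c in clusters]
        nearest = min(range(len(dists)), key=dists.__getitem__)
        if dists[nearest] <= eps:
            clusters[nearest].append(q)
        else:
            clusters.append([q])  # assumed rule: open a new cluster
    # Rank clusters by the number of objects each contains.
    return sorted(clusters, key=len, reverse=True)


if __name__ == "__main__":
    sample = [(0, 0), (0.1, 0), (0, 0.1), (5, 5), (5.1, 5), (20, 20)]
    for cluster in threshold_cluster(sample, eps=1.0):
        print(len(cluster), cluster)
```

In this sketch the outcome depends on the order in which objects are visited and on the choice of `eps`, so both would need care on real transaction data.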
A. Clustering
Given the nature of money laundering (ML), the chosen
clustering algorithm should generate the number of clusters
automatically (with no need to set it in advance), and all
clusters should be ranked by the number of objects each
contains. We therefore propose the following procedure:
1) Start with any object (say p) in a dataset and create a
cluster. The initial cluster is denoted $C_1$.
2) Choose any other object q, calculate its distance to the
existing clusters $C_1, C_2, C_3, \ldots, C_i$, denote it by
$\mathrm{distance}(q, C_i)$, and then find the minimal distance
value $\mathrm{distance}(q, C_{\min})$.
3) Let the threshold be $\varepsilon$. If
$\mathrm{distance}(q, C_{\min}) \le \varepsilon$ holds and q has
not yet been clustered, add q to the cluster $C_i$ which is
assumed to be nearest to q when compared with all
978-1-4244-4639-1/09/$25.00 ©2009 IEEE