STING : A Statistical Information Grid Approach to Spatial Data
Mining
Wei Wang
Department of Computer Science
University of California, Los
Angeles
CA 90095, U.S.A.
weiwang@cs.ucla.edu
Jiong Yang
Department of Computer Science
University of California, Los
Angeles
CA 90095, U.S.A.
jyang@cs.ucla.edu
Richard Muntz
Department of Computer Science
University of California, Los
Angeles
CA 90095, U.S.A.
muntz@cs.ucla.edu
Abstract
1 Introduction
Spatial data mining, i.e., discovery of interesting
characteristics and patterns that may implicitly
exist in spatial databases, is a challenging task
due to the huge amounts of spatial data and to the
new conceptual nature of the problems which
must account for spatial distance. Clustering and
region oriented queries are common problems in
this domain. Several approaches have been
presented in recent years, all of which require at
least one scan of all individual objects (points).
Consequently, the computational complexity is at
least linearly proportional to the number of
objects to answer each query. In this paper, we
propose a hierarchical statistical information grid
based approach for spatial data mining to reduce
the cost further. The idea is to capture statistical
information associated with spatial cells in such a
manner that whole classes of queries and
clustering problems can be answered without
recourse to the individual objects. In theory, and
confirmed by empirical studies, this approach
outperforms the best previous method by at least
an order of magnitude, especially when the data
set is very large.
Permission to copy without fee all or part of this material is
granted provided that the copies are not made or distributed
for direct commercial advantage, the VLDB copyright notice
and the title of the publication and its date appear, and notice
is given that copying is by permission of the Very Large Data
Base Endowment. To copy otherwise, or to republish, requires
a fee and/or special permission from the Endowment.
Proceedings of the 23rd VLDB Conference
Athens, Greece, 1997
In general, spatial data mining, or knowledge discovery in
spatial databases, is the extraction of implicit knowledge,
spatial
relations and discovery of interesting
characteristics and patterns that are not explicitly
represented in the databases. These techniques can play an
important role in understanding spatial data and in
capturing intrinsic relationships between spatial and
nonspatial data. Moreover, such discovered relationships
can be used to present data in a concise manner and to
reorganize
spatial databases to accommodate data
semantics and achieve high performance. Spatial data
mining has wide applications in many fields, including
GIS systems,
image database exploration, medical
imaging, etc.[Che97, Fay96a, Fay96b, Kop96a, Kop96b]
The amount of spatial data obtained from satellite,
medical imagery and other sources has been growing
tremendously in recent years. A crucial challenge in
spatial data mining is the efficiency of spatial data mining
algorithms due to the often huge amount of spatial data
and the complexity of spatial data types and spatial
accessing methods. In this paper, we introduce a new
STatistical INformation Grid-based method (STING) to
efficiently process many common “region oriented”
queries on a set of points. Region oriented queries are
defined later more precisely but informally, they ask for
the selection of regions satisfying certain conditions on
density, total area, etc. This paper is organized as follows.
We first discuss related work in Section 2. We propose
our statistical information grid hierarchical structure and
discuss the query types it can support in Sections 3 and 4,
respectively. The general algorithm as well as a detailed
example of processing a query are given in Section 5. We
analyze the complexity of our algorithm in Section 6. In
Section 7, we analyze the quality of STING’s result and
propose a sufficient condition under which STING is
guaranteed to return the correct result. Limiting Behavior
of STING is in Section 8 and, in Section 9, we analyze the