
Spatial data mining and geographic knowledge discovery—An introduction
Jeremy Mennis b,1, Diansheng Guo a,*
a Department of Geography, University of South Carolina, 709 Bull Street, Room 127, Columbia, SC 29208, United States
b Department of Geography and Urban Studies, Temple University, 1115 W. Berks Street, 309 Gladfelter Hall, Philadelphia, PA 19122, United States
Keywords: Spatial data mining; Geographic knowledge discovery
Abstract
Voluminous geographic data have been, and continue to be, collected with modern data acquisition
techniques such as global positioning systems (GPS), high-resolution remote sensing, location-aware
services and surveys, and internet-based volunteered geographic information. There is an urgent need
for effective and efficient methods to extract unknown and unexpected information from spatial data
sets of unprecedentedly large size, high dimensionality, and complexity. To address these challenges,
spatial data mining and geographic knowledge discovery has emerged as an active research field,
focusing on the development of theory, methodology, and practice for the extraction of useful
information and knowledge from massive and complex spatial databases.
This paper highlights recent theoretical and applied research in spatial data mining and knowledge
discovery. We first briefly review the literature on several common spatial data-mining tasks,
including spatial classification and prediction; spatial association rule mining; spatial cluster
analysis; and geovisualization. The articles included in this special issue contribute to spatial
data mining research by developing new techniques for point pattern analysis, prediction in
space–time data, and analysis of moving object data, as well as by demonstrating applications of
genetic algorithms for optimization in the context of image classification and spatial interpolation.
The paper concludes with some thoughts on the contribution of spatial data mining and geographic
knowledge discovery to geographic information sciences.
© 2009 Elsevier Ltd. All rights reserved.
1. Introduction
Many fields of geographic research are observational rather than experimental, because the spatial
scale is often too large and geographic problems are too complex for experimentation. Researchers
acquire new knowledge by searching for patterns, formulating theories, and testing hypotheses with
observations. Through the continuing efforts of scientific projects, government agencies, and the
private sector, voluminous geographic data have been, and continue to be, collected. We can now
obtain much more diverse, dynamic, and detailed data than was ever possible before with modern data
collection techniques, such as global positioning systems (GPS), high-resolution remote sensing,
location-aware services and surveys, and internet-based volunteered geographic information
(Goodchild, 2007). Generally speaking, geography and related spatial sciences have moved from a
data-poor era to a data-rich era (Miller & Han, 2009). The availability of vast and high-resolution
spatial and spatiotemporal data provides opportunities for gaining new knowledge and a better
understanding of complex geographic phenomena, such as human–environment interaction and
social–economic dynamics, and for addressing urgent real-world problems, such as global climate
change and pandemic flu spread.
However, traditional spatial analysis methods were developed in an era when data were relatively
scarce and computing power was far more limited than it is today (Miller & Han, 2009). Faced with
the massive data that are increasingly available and the complex analysis questions that they may
potentially answer, traditional analysis methods often have one or more of the following three
limitations. First, most existing methods focus on a limited perspective (such as univariate spatial
autocorrelation) or a specific type of relation model (e.g., linear regression). If the chosen
perspective or assumed model is inappropriate for the phenomenon being analyzed, the analysis can at
best indicate that the data do not show interesting relationships, but cannot suggest any better
alternatives. Second, many traditional methods cannot process very large data volumes. Third, newly
emerged data types (such as trajectories of moving objects, geographic information embedded in web
pages, and surveillance videos) and new application needs demand new approaches to analyze such data
and discover embedded patterns and information.
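The first limitation can be illustrated with a small sketch. It is not taken from the paper: the data are synthetic and hypothetical, and the Pearson correlation coefficient stands in for "an assumed linear model". The response depends on the predictor exactly, but through a quadratic relation, so the linear perspective reports essentially no relationship and gives no hint that a different model would fit perfectly.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical data (an assumption for illustration only):
# y is fully determined by x, but the dependence is quadratic, not linear.
xs = [i / 10 for i in range(-50, 51)]
ys = [x * x for x in xs]

# Assumed linear relation between x and y: correlation is close to zero.
r_linear = pearson_r(xs, ys)

# Same data viewed through a quadratic term: correlation is close to one.
r_quadratic = pearson_r([x * x for x in xs], ys)

print(f"linear perspective:    r = {r_linear:.3f}")
print(f"quadratic perspective: r = {r_quadratic:.3f}")
```

The near-zero linear correlation only says that this particular model finds nothing. Choosing the quadratic view required knowing the answer in advance, which is the gap that exploratory data mining methods aim to narrow.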
There is an urgent need for effective and efficient methods to
extract unknown and unexpected information from datasets of
unprecedentedly large size (e.g., millions of observations), high
dimensionality (e.g., hundreds of variables), and complexity (e.g.,
0198-9715/$ - see front matter © 2009 Elsevier Ltd. All rights reserved.
doi:10.1016/j.compenvurbsys.2009.11.001
* Corresponding author. Tel.: +1 803 777 2989; fax: +1 803 777 4972.
E-mail addresses: jmennis@temple.edu (J. Mennis), guod@sc.edu (D. Guo).
1 Tel.: +1 215 204 4748; fax: +1 215 204 7833.
Computers, Environment and Urban Systems 33 (2009) 403–408