Surf. Topogr.: Metrol. Prop. 10 (2022) 044001
imitating human intelligence (e.g., Dey, 2016;
Copeland 2020).
Over the past three decades, applications of machine learning (ML) methods have increased significantly in archaeology. ML algorithms such as support vector machines (Cortes and Vapnik 1995; Kao et al 2004), random forests (Ho 1995; Ho 1998), K-means (Cao et al 2009; Jin and Han 2011; Qi et al 2017) and other similar approaches have been widely adopted with considerable success in detecting or classifying archaeological sites and artifacts (e.g., Kintigh and Ammerman 1982; Baxter 2009; Menze and Ur 2012; Flores et al 2019; Orengo et al 2020). These
methods, often referred to as traditional ML algorithms, require the careful selection by human experts of input features (e.g., various spectral indices in satellite imaging) that are important for the outcome. Then, through an iterative optimization process fed with exemplar data, the algorithm is trained on the basis of multivariate statistics and progressively improves its performance. Because this approach requires the determination and prior calculation of a range of potentially statistically significant input features, it inevitably suffers from a level of bias: although the training procedure can indicate which of the features are statistically insignificant, it cannot suggest or extract features beyond those provided. Moreover, the relatively limited number of features in most applications often cannot fully describe the targets in different situations or environmental conditions. The applicability of these algorithms is therefore often limited to specific cases and restricts identification to features with limited spectral and geometric variations.
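The hand-crafted-feature workflow described above can be illustrated with a minimal K-means clustering sketch, one of the traditional algorithms mentioned. The two "spectral index" features, their values and the two groups are all hypothetical, chosen only to show how expert-selected inputs drive the outcome:

```python
import numpy as np

def kmeans(features, init_centroids, iters=20):
    """Minimal k-means: repeatedly assign each sample to its nearest
    centroid, then move each centroid to the mean of its members."""
    centroids = init_centroids.astype(float).copy()
    labels = np.zeros(len(features), dtype=int)
    for _ in range(iters):
        # Euclidean distance of every sample to every centroid
        dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(len(centroids)):
            if np.any(labels == j):
                centroids[j] = features[labels == j].mean(axis=0)
    return labels, centroids

# Hypothetical hand-crafted features: two spectral indices per location,
# forming two well-separated groups (e.g., bare soil vs. vegetation)
rng = np.random.default_rng(1)
soil = rng.normal([0.2, 0.1], 0.05, size=(50, 2))
vegetation = rng.normal([0.7, 0.6], 0.05, size=(50, 2))
features = np.vstack([soil, vegetation])

# Seed one centroid in each group, then cluster
labels, centroids = kmeans(features, features[[0, -1]])
```

The algorithm can only separate what the chosen indices happen to capture: a target whose signature is not expressed in those two features would be invisible to it, which is exactly the bias discussed above.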
In the early 2000s, a new machine learning technology known as Deep Learning (DL) emerged, based on Artificial Neural Networks (ANNs) and, in the case of image applications, Convolutional Neural Networks (CNNs). This new technology built largely on the seminal work of Hubel and Wiesel (1959) on the visual cortex and of Fukushima, who introduced the 'neocognitron' (Fukushima 1980; 1983; 2003) and established the use of convolutional and down-sampling layers. In 1986, Rina Dechter was among the first to introduce the term 'deep learning' to the machine learning community, where 'deep' described the use of multiple layers in a network. Later, Waibel (1987) proposed the time delay neural network (TDNN), one of the first convolutional networks, followed by LeCun et al (1989), who applied the approach to a handwritten character recognition problem using a 7-level Convolutional Neural Network (CNN) called LeNet-5 (LeCun et al 1998). A significant advantage of deep learning methods is that the feature extraction and selection stage is
performed by the learning algorithm automatically
and not by a person. Yet, this usually requires sig-
nificant amounts of labeled data and considerable
computational resources for the training process. The
utilization of GPUs in the training process was the
turning point for using CNNs in image recognition. In
the 2012 ImageNet competition, AlexNet (Krizhevsky et al 2012), the first CNN ever submitted, won the contest. Training AlexNet used over one million labeled images spanning ∼1000 object categories and took ∼6 days on 2 GPUs (Krizhevsky et al 2012). Since then, deep neural networks have won many international pattern recognition competitions and have attracted broad attention by outperforming legacy machine learning methods and handling large amounts of data better with minimal user intervention (Schmidhuber 2015). As such, they offer considerable potential for archaeology.
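The convolutional and down-sampling layers mentioned above can be sketched in a few lines of NumPy. The toy image and the edge-detecting kernel are invented for illustration and do not correspond to any particular network:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution (cross-correlation, as used in CNNs)."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(feature_map, size=2):
    """Non-overlapping max-pooling: the down-sampling layer."""
    h, w = feature_map.shape
    h, w = h - h % size, w - w % size  # trim to a multiple of the window
    return feature_map[:h, :w].reshape(h // size, size, w // size, size).max(axis=(1, 3))

# Toy 6x6 "image": dark on the left, bright on the right
image = np.zeros((6, 6))
image[:, 3:] = 1.0
# Kernel that responds to dark-to-bright vertical edges
kernel = np.array([[-1.0, 1.0],
                   [-1.0, 1.0]])
response = conv2d(image, kernel)  # 5x5 feature map, peaks along the edge
pooled = max_pool(response)       # 2x2 down-sampled map
```

In a trained CNN the kernel weights are learned from labeled examples rather than designed by hand, which is precisely what removes the manual feature-engineering step.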
Among the common tasks assigned to deep learn-
ing CNN networks are image classification, object
detection, and semantic segmentation. Classification is a basic process routinely performed in archaeology, with the objective of assigning groups of images, or objects that share some common features, to one of a number of predefined classes. For example, AI methods have been used to analyze use-wear on lithic tools
(e.g., Van den Dries 1998) and to classify and identify
types of pottery (e.g., Hörr et al 2008; Anichini et al,
2021; Pawlowicz and Downum 2021). Caspari and
Crespo (2019) used an object detection-based method
to identify Iron Age burial mounds in aerial imagery.
More recently, Agapiou et al (2021) applied an object detection method to detect surface ceramics in drone
images. Finally, semantic segmentation algorithms analyze images further by partitioning them into semantically meaningful parts and then classifying each part into one of a number of predetermined classes, i.e., interpretable image regions such as archaeological sites, regions of vegetation, modern structures and others (e.g., Garcia-Garcia et al 2018; Minaee et al 2020). Semantic segmentation
operates at pixel-level in the sense that each pixel of an
image is labeled according to the class it belongs to.
This makes semantic segmentation a much more
complicated and computationally intensive task, yet it
can produce more informative and detailed results
compared to classification and object identification
(e.g., Kendall et al 2015; Garcia-Garcia et al 2018;
Minaee et al 2020). The value of this approach for
geophysical analysis has been demonstrated in the
work of Küçükdemirci and Sarris (2020) using
ground-penetrating radar images.
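As a rough sketch of what pixel-level labeling means in practice, the following NumPy fragment assigns every pixel of a toy score volume to its highest-scoring class. The class names, scores and patch size are hypothetical, standing in for the output of a real segmentation network:

```python
import numpy as np

# Hypothetical per-class score maps for a 4x4 image patch: one score per
# pixel for each of three classes (0 = background, 1 = vegetation,
# 2 = archaeological site). In a real network these would come from the
# final convolutional layer.
rng = np.random.default_rng(0)
scores = rng.random((3, 4, 4))   # shape: (classes, height, width)
scores[2, 1:3, 1:3] += 2.0       # boost the "site" scores in the centre

# Semantic segmentation labels every pixel with its highest-scoring class
label_map = scores.argmax(axis=0)  # shape: (height, width)

# Per-class masks can then be extracted for mapping or measurement
site_mask = label_map == 2
```

Because every pixel receives a label, the result delineates the extent of each region rather than merely reporting its presence, which is what makes the output more detailed than classification or object detection.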
For all this success, it is only recently that a limited yet growing body of work has adopted CNN approaches
for the automated detection of archaeological sites
(Trier et al 2018; Caspari and Crespo, 2019; Kazimi
et al 2019; Lambers et al 2019; Rayne et al 2020;
Somrak et al 2020; Soroush et al 2020; Bonhage et al
2021; Verschoof-van der Vaart and Landauer 2021)
from Earth observation (EO) data. In part, this is due
to the need for an abundance of labeled data to enable
the CNN to accurately identify different signatures.
For example, ImageNet, an openly available visual
database designed for use in everyday contemporary