Semantic Texton Forests for Image Categorization and Segmentation
Jamie Shotton†, Matthew Johnson*, Roberto Cipolla*
†Toshiba Corporate R&D Center, Kawasaki, Japan
*Department of Engineering, University of Cambridge, UK
Abstract
We propose semantic texton forests, efficient and pow-
erful new low-level features. These are ensembles of deci-
sion trees that act directly on image pixels, and therefore do
not need the expensive computation of filter-bank responses
or local descriptors. They are extremely fast to both train
and test, especially compared with k-means clustering and
nearest-neighbor assignment of feature descriptors. The
nodes in the trees provide (i) an implicit hierarchical clus-
tering into semantic textons, and (ii) an explicit local clas-
sification estimate. Our second contribution, the bag of se-
mantic textons, combines a histogram of semantic textons
over an image region with a region prior category distri-
bution. The bag of semantic textons is computed over the
whole image for categorization, and over local rectangu-
lar regions for segmentation. Including both histogram and
region prior allows our segmentation algorithm to exploit
both textural and semantic context. Our third contribution
is an image-level prior for segmentation that emphasizes
those categories that the automatic categorization believes
to be present. We evaluate on two datasets including the
very challenging VOC 2007 segmentation dataset. Our results
significantly advance the state of the art in segmentation
accuracy, and furthermore, our use of efficient decision
forests gives at least a five-fold increase in execution speed.
1. Introduction
This paper introduces semantic texton forests, and
demonstrates their use for image categorization and semantic
segmentation; see Figure 1. Our aim is to show that one
can build powerful texton codebooks without computing ex-
pensive filter-banks or descriptors, and without performing
costly k-means clustering and nearest-neighbor assignment.
Semantic texton forests (STFs) fulfill both criteria. They
are randomized decision forests that use only simple pixel
comparisons on local image patches, performing both an
implicit hierarchical clustering into semantic textons and
an explicit local classification of the patch category. Our
results show that STFs improve the state of the art in both
quantitative performance and execution speed.

Figure 1. Semantic texton forests. (a) Test image, with ground
truth inset. Semantic texton forests very efficiently compute (b) a
set of semantic textons per pixel and (c) a rough local segmentation
prior. Our algorithm uses both textons and priors as features to
give coherent semantic segmentation (d), and even finds the building
unmarked in the ground truth. Colors show texton indices in
(b), but categories corresponding to the ground truth in (c) and (d).

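As a minimal sketch of the idea above, the following Python code walks one decision tree for a single pixel. The split test (thresholding the difference of two pixels at fixed offsets), the `Node` layout, the border clamping, and all names are assumptions made for illustration, not the paper's implementation. The visited node indices stand in for the hierarchical semantic textons, and the distribution stored at the leaf stands in for the local classification estimate.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class Node:
    """One tree node; every field here is a hypothetical layout."""
    node_id: int
    offset_a: Optional[Tuple[int, int]] = None  # (dy, dx) of first pixel
    offset_b: Optional[Tuple[int, int]] = None  # (dy, dx) of second pixel
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    class_dist: Optional[List[float]] = None    # set only on leaves


def pixel(image: Sequence[Sequence[float]], y: int, x: int) -> float:
    """Read a pixel, clamping coordinates to the image border (assumed)."""
    h, w = len(image), len(image[0])
    return image[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]


def descend(root: Node, image: Sequence[Sequence[float]], y: int, x: int):
    """Return (node ids on the root-to-leaf path, leaf class distribution)."""
    path = []
    node = root
    while True:
        path.append(node.node_id)
        if node.class_dist is not None:  # reached a leaf
            return path, node.class_dist
        a = pixel(image, y + node.offset_a[0], x + node.offset_a[1])
        b = pixel(image, y + node.offset_b[0], x + node.offset_b[1])
        # Assumed split test: compare a pixel difference to a threshold.
        node = node.left if a - b < node.threshold else node.right


# Toy depth-1 tree over two categories, and a 2x2 grayscale image.
toy_tree = Node(
    0, offset_a=(0, 0), offset_b=(0, 1), threshold=0.0,
    left=Node(1, class_dist=[0.9, 0.1]),
    right=Node(2, class_dist=[0.2, 0.8]),
)
toy_image = [[10, 50], [50, 10]]
path, dist = descend(toy_tree, toy_image, 0, 0)
```

No filter-bank response or descriptor is computed here: each split reads only two raw pixel values, which is the source of the speed advantage claimed in the text. A forest would repeat this traversal once per tree.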
We look at two applications of STFs: image categoriza-
tion (inferring the object categories present in an image)
and semantic segmentation (dividing the image into coher-
ent regions and simultaneously categorizing each region).
To these ends, we propose the bag of semantic textons. This
is computed over a given image region, and extends the bag
of words model [4] by combining a histogram of the hierarchical
semantic textons with a region prior category distribution.
By considering the image as a whole, we obtain a
highly discriminative descriptor for categorization. For seg-
mentation, we use many local rectangular regions and build
a second randomized decision forest that achieves efficient
and accurate segmentation.
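The bag of semantic textons described above can be sketched as follows. The sketch assumes that each pixel in the region has already been assigned a root-to-leaf path of node indices and a leaf class distribution; it also assumes that the region prior is the plain average of those per-pixel distributions. Both the input format and the averaging rule are illustrative assumptions, and the function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def bag_of_semantic_textons(
    paths: Sequence[Sequence[int]],
    leaf_dists: Sequence[Sequence[float]],
    num_nodes: int,
) -> Tuple[List[int], List[float]]:
    """Build (hierarchical texton histogram, region prior) for one region.

    paths      -- per-pixel node ids visited from root to leaf
    leaf_dists -- per-pixel category distribution stored at the leaf
    num_nodes  -- total number of nodes, which fixes the histogram length
    """
    if not paths or len(paths) != len(leaf_dists):
        raise ValueError("need one path and one distribution per pixel")

    # Hierarchical histogram: every node on the path is counted, so
    # coarse (near-root) and fine (near-leaf) textons both contribute.
    histogram = [0] * num_nodes
    for path in paths:
        for node_id in path:
            histogram[node_id] += 1

    # Region prior (assumed rule): mean of the per-pixel distributions.
    num_pixels = len(leaf_dists)
    num_classes = len(leaf_dists[0])
    prior = [
        sum(dist[c] for dist in leaf_dists) / num_pixels
        for c in range(num_classes)
    ]
    return histogram, prior


# Four pixels in a region, a three-node tree, two categories.
example_paths = [[0, 1], [0, 2], [0, 1], [0, 2]]
example_dists = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
hist, prior = bag_of_semantic_textons(example_paths, example_dists, 3)
```

Passing every pixel of the image as the region gives a single image-level descriptor of the kind used for categorization, while passing the pixels of a local rectangle gives a feature of the kind used for segmentation.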
Inferring the correct segmentation depends on local im-
age information that can often be ambiguous. The global
statistics of the image, however, are more discriminative
and may be sufficient to accurately estimate the image cate-
gorization. We therefore investigate how categorization can