Proceedings of “Internation Conference on Computer Vision”, Vancouver, Canada, July 2001 vol.I, p.
105
Interactive Graph Cuts
for Optimal Boundary & Region Segmentation of Objects in N-D Images
Yuri Y. Boykov Marie-Pierre Jolly
Siemens Corporate Research
755 College Road East, Princeton, NJ 08540, USA
yuri@csd.uwo.ca
Abstract
In this paper we describe a new technique for general
purpose interactive segmentation of N-dimensional images.
The user marks certain pixels as “object” or “background”
to provide hard constraints for segmentation. Additional
soft constraints incorporate both boundary and region in-
formation. Graph cuts are used to find the globally optimal
segmentation of the N-dimensional image. The obtained so-
lution gives the best balance of boundary and region prop-
erties among all segmentations satisfying the constraints.
The topology of our segmentation is unrestricted and both
“object” and “background” segments may consist of sev-
eral isolated parts. Some experimental results are presented
in the context of photo/video editing and medical image seg-
mentation. We also demonstrate an interesting Gestalt ex-
ample. A fast implementation of our segmentation method
is possible via a new max-flow algorithm in [2].
1. Introduction
Interactive segmentation is becoming more and more
popular to alleviate the problems inherent to fully automatic
segmentation which seems to never be perfect. Our goal is
a general purpose interactive segmentation technique that
divides an image into two segments: “object” and “back-
ground”. A user imposes certain hard constraints for seg-
mentation by indicating certain pixels (seeds) that abso-
lutely have to be part of the object and certain pixels that
have to be part of the background. Intuitively, these hard
constraints provide clues on what the user intends to seg-
ment. The rest of the image is segmented automatically by
computing a global optimum among all segmentations sat-
isfying the hard constraints. The cost function is defined in
terms of boundary and region properties of the segments.
These properties can be viewed as soft constraints for seg-
mentation. A globally optimal segmentation can be very
efficiently recomputed when the user adds or removes any
hard constraints (seeds). This allows the user to get any
desired segmentation results quickly via very intuitive in-
teractions. Our method applies to N-D images (volumes).
One of the main advantages of our interactive segmen-
tation method is that it provides a globally optimal solution
for an N-dimensional segmentation when the cost function
is clearly defined. Some earlier techniques (snakes [14, 4],
deformable templates [21], shortest path [15], ratio regions
[5], and other [13]) can do that only in 2D applications when
a segmentation boundary is a 1D curve. Many techniques
either don’t have a clear cost function at all (region growing,
split and merge, see Chapter 10 in [11]) or compute only an
approximate solution (e.g. a local minimum) that can be
arbitrarily far from the global optimum (region competition
[22], level set methods [16], normalized cuts [18]). Global
properties of such segmentations may be difficult to analyze
or predict. Imperfections in the result might come from de-
ficiencies at the minimization stage. In contrast, imperfec-
tions of a globally optimal solution are directly related to
the definition of the cost function. Thus, the segmentation
can be controlled more reliably.
It is also important that the cost function that we use
as a soft constraint for segmentation is general enough to
include both region and boundary properties of segments.
In fact, our cost function is similar to one in [9] obtained
in a context of MAP-MRF estimation. Consider an arbi-
trary set of data elements P and some neighborhood sys-
tem represented by a set N of all unordered
1
pairs {p, q}
of neighboring elements in P. For example, P can con-
tain pixels (or voxels) in a 2D (or 3D) grid and N can
1
For simplicity, we present the case of unordered pairs of neighbors
{p, q}. If pairs of neighbors are ordered then we have two distinct pairs
(p, q) and (q, p). This would allow different (directed) discontinuity
penalties for two cases when p ∈ “object”, q ∈ “background” and when
p ∈ “background”, q ∈ “object”. To set such directed costs one should
have prior information on properties of the boundary from object to back-
ground. Algorithmically, generalization from unordered to ordered neigh-
bors means switching from undirected (presented here) to directed graphs.