A Reinforcement Learning Framework
for Medical Image Segmentation
Farhang Sahba, Member, IEEE, and Hamid R. Tizhoosh, and Magdy M.A. Salama, Fellow, IEEE
Abstract— This paper introduces a new method for medical
image segmentation using a reinforcement learning scheme.
We use this novel idea as an effective way to find optimal
local thresholding and structuring element values and segment
the prostate in ultrasound images. The reinforcement learning
agent uses an ultrasound image and its manually segmented
version and takes actions (i.e., different thresholding and
structuring element values) to change its environment (the
quality of the segmented image). The agent is provided with
a scalar reinforcement signal determined objectively. The agent
uses these objective rewards/punishments to explore/exploit the
solution space. The values obtained in this way can be used as
valuable knowledge to fill a Q-matrix. The reinforcement learning
agent can then apply this knowledge to similar ultrasound
images as well. The results demonstrate high potential for
applying reinforcement learning to medical image segmentation.
I. INTRODUCTION
Many applications in medical imaging require segmenting
an object in the image [1]. Ultrasound imaging is an important
modality for clinical applications. The accurate
detection of the prostate boundary in ultrasound images is
crucial for diagnostic tasks [2]. However, in these images
the contrast is usually low and the boundaries between the
prostate and the background are fuzzy. Speckle noise and weak
edges also make ultrasound images inherently difficult to
segment. The prostate boundaries are generally extracted
from transrectal ultrasound (TRUS) images [2]. Prostate segmentation
methods generally have limitations when shadows
with gray level and texture similar to the prostate are attached
to it, and/or when boundary segments are missing. In these cases
the segmentation error may increase considerably. Another
obstacle may be the lack of a sufficient number of training
(gold) samples when a learning technique is employed and the
samples must be prepared by an expert, as in supervised
methods. Algorithms based on active contours have been
implemented quite successfully, but with the major
drawback that they depend on user interaction to determine
the initial snake. Therefore, a more universal approach should
require a minimum of user interaction and training data.
Farhang Sahba is with the Pattern Analysis and Machine Intelligence Laboratory,
Department of System Design Engineering, University of Waterloo,
Waterloo, Ontario, Canada (email: fsahba@uwaterloo.ca).
Hamid R. Tizhoosh is with the Pattern Analysis and Machine Intelligence
Laboratory, Department of System Design Engineering, University of
Waterloo, Waterloo, Ontario, Canada (email: tizhoosh@uwaterloo.ca).
Magdy M.A. Salama is with the Department of Electrical and Computer
Engineering, University of Waterloo, Waterloo, Ontario, Canada (email:
msalama@hivolt.uwaterloo.ca).
Considering the above factors, our new algorithm based on
reinforcement learning (RL) is introduced to locally segment
the prostate in ultrasound images. The most important concept
of RL is learning by trial and error based on interaction
with the environment [3], [4]. This makes an RL agent suitable
for dynamic environments. Its goal is to find an action
policy that controls the behavior of the dynamic process,
guided by signals (reinforcements) that indicate how well it
has been performing the required task.
When applying this method to medical image
segmentation, the agent takes actions (i.e., different
values for the threshold and for the structuring element of a morphological
operator) to change its environment (the quality
of the segmented object). States are also defined based on
the quality of this segmented object. First, the agent takes the
image and applies some values. Then it receives an objective
reward or punishment based on a comparison of
its result with the goal image. The agent tries to learn
which actions gain the highest reward. After this stage,
based on the accumulated rewards, the agent has appropriate
knowledge for similar images as well.
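This trial-and-error loop can be sketched in a few lines of code. The following is an illustrative toy example, not the authors' implementation: a single state, threshold values as the only actions, a Dice-overlap reward against a gold-standard mask, and a running Q-value per action; all variable names and the toy data are our own assumptions.

```python
import numpy as np

# Toy "ultrasound" sub-image and its gold-standard mask (illustrative only).
rng = np.random.default_rng(0)
image = rng.random((8, 8))
gold = image > 0.5

# Candidate threshold actions the agent may try.
actions = [0.3, 0.4, 0.5, 0.6, 0.7]

def reward(threshold):
    """Dice overlap between the thresholded result and the gold mask."""
    seg = image > threshold
    inter = np.logical_and(seg, gold).sum()
    return 2.0 * inter / (seg.sum() + gold.sum())

q = np.zeros(len(actions))   # one Q-value per action (single state)
alpha = 0.5                  # learning rate
for _ in range(200):         # explore actions by trial and error
    a = rng.integers(len(actions))
    q[a] += alpha * (reward(actions[a]) - q[a])

best = actions[int(np.argmax(q))]  # action with highest accumulated reward
```

In the paper's setting there would be many states (one per sub-image quality level) and the actions would also include structuring element sizes, but the reward-driven filling of the Q-matrix follows the same pattern.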
In our algorithm we use this reinforced local parameter
adjustment to segment the prostate. The proposed method
controls the local threshold and the post-processing
parameter using a reinforcement learning agent. The main
purpose of this work is to demonstrate that, as an
intelligent technique, reinforcement learning can be trained
using a very limited number of samples and can also gain
extra knowledge during online training. This is a major
advantage over other approaches (such as supervised
methods), which need either a large training set or a significant
amount of expert or a priori knowledge.
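The local parameter adjustment above amounts to partitioning the image and thresholding each sub-image with its own value. A minimal sketch, assuming a fixed grid partition and externally supplied per-block thresholds (in the proposed method these would be chosen by the trained agent, not hard-coded as here):

```python
import numpy as np

def local_threshold(image, thresholds, grid=(2, 2)):
    """Apply a separate threshold to each sub-image of a grid partition."""
    h, w = image.shape
    rows, cols = grid
    out = np.zeros_like(image, dtype=bool)
    for i in range(rows):
        for j in range(cols):
            # Boundaries of sub-image (i, j) in the grid.
            r0, r1 = i * h // rows, (i + 1) * h // rows
            c0, c1 = j * w // cols, (j + 1) * w // cols
            out[r0:r1, c0:c1] = image[r0:r1, c0:c1] > thresholds[i][j]
    return out

# Toy 4x4 image with intensities increasing from 0 to 1.
img = np.arange(16, dtype=float).reshape(4, 4) / 15.0
mask = local_threshold(img, [[0.25, 0.45], [0.65, 0.85]])
```

A local scheme like this can adapt to the uneven contrast of ultrasound images, where a single global threshold would fail; the agent's remaining job is to pick good per-block parameters.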
This paper is organized as follows: Section II is a short
introduction to reinforcement learning. Section III describes
the proposed method. Section IV presents the results, and
Section V concludes the work.
II. REINFORCEMENT LEARNING
Reinforcement learning (RL) is based on the idea that an
artificial agent learns by interacting with its environment
[3], [4]. It allows agents to automatically determine the
ideal behavior within a specific context, maximizing
performance with respect to predefined measures. Several
components constitute the general idea behind reinforcement
learning. The RL agent is the decision-maker of the process
and takes actions that are recognized by the environment.
It receives a reward or punishment from its environment
depending on the action taken. The RL agent discovers which
0-7803-9490-9/06/$20.00/©2006 IEEE
2006 International Joint Conference on Neural Networks
Sheraton Vancouver Wall Centre Hotel, Vancouver, BC, Canada
July 16-21, 2006