quality which may be missed by approaches using too rigidly defined study protocols. One potentially useful approach in such situations involves subjective comparisons of image quality in which the observer's attention is focused systematically upon specific normal or pathological anatomical features in similar views of a particular patient imaged with two modalities (Vucich, 1979). After attending to each feature, the observer is required to report the relative fidelity with which it is demonstrated by the two modalities, using a five- or seven-point rating scale, for example. Although results obtained in this way are inevitably subject to bias and variations in different observers' use of the scale on which impressions are reported, the use of a common patient sample and the act of focusing attention on specific image features may help to guard against gross violations of objectivity.

A second technique involves the observer ranking versions of the same image, differing according to some imaging parameter, using a specific criterion such as image sharpness. Comparison of the rank order produced for many different images allows one to test for particular preferences for images displayed in one certain way. This has, e.g., been applied to the study of the effect of image pixel size on image quality, the ranking criteria being observer preference (Sharp et al., 1982). The development of techniques for "multidimensional scaling" (MDS) may also be of benefit to these rank-order type studies. Given rank ordering of image preference or similarity judgements, MDS techniques determine the number of relevant dimensions that yield the subjective determination of image preference or similarity (Kruskal and Wish, 1978).
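
As an illustration of how rank orders collected for many images might be compared, the following minimal sketch computes the mean rank of each display condition and Kendall's coefficient of concordance W. Kendall's W is used here only as one common summary of agreement among rankings; it is not a method prescribed by the studies cited above. The function names and the example data are hypothetical, and tied ranks are assumed not to occur.

```python
def mean_ranks(rankings):
    """Average rank of each condition across images.

    rankings[i][j] is the rank (1 = most preferred) given to display
    condition j for image i. Ties are assumed not to occur.
    """
    n_images = len(rankings)
    n_conditions = len(rankings[0])
    return [sum(r[j] for r in rankings) / n_images for j in range(n_conditions)]


def kendalls_w(rankings):
    """Kendall's coefficient of concordance W (formula without tie correction).

    W = 12 * S / (m**2 * (n**3 - n)), where m is the number of rankings,
    n the number of ranked conditions, and S the sum of squared deviations
    of the per-condition rank totals from their mean. W = 1 means every
    image produced the same rank order; W near 0 means no consistent order.
    """
    m = len(rankings)
    n = len(rankings[0])
    totals = [sum(r[j] for r in rankings) for j in range(n)]
    mean_total = m * (n + 1) / 2
    s = sum((t - mean_total) ** 2 for t in totals)
    return 12 * s / (m ** 2 * (n ** 3 - n))


# Hypothetical data: four images, each ranked under three pixel sizes.
example_rankings = [
    [1, 2, 3],
    [1, 2, 3],
    [2, 1, 3],
    [1, 3, 2],
]
print(mean_ranks(example_rankings))  # [1.25, 2.0, 2.75]
print(kendalls_w(example_rankings))  # 0.5625
```

A consistently low mean rank for one condition, together with a W well above zero, would suggest a preference for images displayed in that way; a formal significance test would still be needed before drawing conclusions.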

2.3.2 Method of Constant Stimulus

Historically, many experiments in the field of psychophysics have used the "method of constant stimulus," in which a sensory signal with constant characteristics is presented to an observer on multiple occasions. After each trial, the observer is required to report whether the signal, which was, in fact, always present, had been "detected." The level of performance achieved by the observer is represented by the fraction of trials in which the observer reports the signal to be detectable. This experimental paradigm was adopted in psychophysics at a time when most sensory detection processes were believed to be well-represented by "threshold theory," according to which a stimulus is detected if and only if it exceeds a fixed sensory threshold and false positive reports are ascribed to observer error.

Beginning in the early 1950s, threshold theory was challenged and eventually supplanted by statistical decision theory in visual detection tasks (Tanner and Swets, 1954). According to statistical decision theory, visual detection involves a trade-off between the frequencies of true positive and false positive reports, with the balance achieved in an experiment depending upon the particular setting of a critical confidence level or "decision criterion" that the observer chooses to adopt. Thus, the observer can produce virtually any detection rate between zero and 100 percent by setting the decision criterion appropriately. From this perspective, experimental results obtained with the method of constant stimulus are compromised severely by the fact that potential effects of the observer's variable decision criterion are not taken into account; in effect, a potentially important source of variation is not controlled.
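
The dependence of the detection rate on the decision criterion can be sketched numerically. The example below assumes the simplest textbook form of statistical decision theory, an equal-variance Gaussian model in which the observer's internal response is distributed as N(0, 1) on signal-absent trials and N(d', 1) on signal-present trials. This model, the function name `report_fractions`, and the numerical values are illustrative assumptions and are not taken from the text above.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # zero mean, unit standard deviation


def report_fractions(d_prime, criterion):
    """Return (TPF, FPF) for an observer who reports "detected" whenever
    the internal response exceeds `criterion`.

    Assumed model: response ~ N(0, 1) when the signal is absent and
    response ~ N(d_prime, 1) when the signal is present.
    """
    fpf = 1.0 - _STD_NORMAL.cdf(criterion)
    tpf = 1.0 - _STD_NORMAL.cdf(criterion - d_prime)
    return tpf, fpf


# Same stimulus (fixed d'), different decision criteria: the detection
# rate alone ranges from nearly 100 percent to nearly zero.
for c in (-2.0, 0.0, 0.5, 1.0, 3.0):
    tpf, fpf = report_fractions(1.0, c)
    print(f"criterion={c:+.1f}  TPF={tpf:.3f}  FPF={fpf:.3f}")
```

Under these assumptions the detection rate reported by the method of constant stimulus corresponds to the TPF column only. Because the false positive fraction that accompanies it is never measured, two observers with identical underlying sensitivity could report very different detection rates.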

An apparent advantage of the method of constant stimulus is that it can be used to determine the dependence of detectability upon any physical parameter of the stimulus (e.g., object or imaging system in image-evaluation studies) in a direct and easily understood way. However, the validity of the method's results depends crucially upon the ability of each observer to hold constant the FPF that would be produced if actually negative trials were presented, and to do so across different imaging conditions, a notoriously difficult task. Clearly, the results depend also upon the observer's ability to resist the temptations of "wishful thinking," in which it is imagined that a virtually invisible stimulus is "seen" because it is known that it is present (Levison and Restle, 1968).

In view of these considerations, the method of constant stimulus cannot be recommended generally for the evaluation of image quality.

2.3.3 Diagnostic Accuracy

Many investigators have reported the results of medical tests in terms of the overall percentage of correct diagnoses produced by the test in a mixture of actually positive and actually negative cases. The validity of this index (often called "diagnostic accuracy" in the medical literature) is extremely limited, in part because its numerical value depends strongly on the prevalence of actually positive cases; in part because its value depends upon the observer's setting of his critical confidence level; and in part because it does not reveal the balance of false positive and false negative errors, which can have very different clinical consequences (Metz, 1978).
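
The prevalence dependence noted above follows from simple arithmetic: the fraction of correct diagnoses equals prevalence times sensitivity plus (1 - prevalence) times specificity. The short sketch below evaluates this expression for hypothetical operating points; the function name and all numerical values are assumptions chosen for illustration only.

```python
def fraction_correct(sensitivity, specificity, prevalence):
    """Overall fraction of correct diagnoses in a case mixture.

    sensitivity: fraction of actually positive cases called positive.
    specificity: fraction of actually negative cases called negative.
    prevalence:  fraction of cases that are actually positive.
    """
    return prevalence * sensitivity + (1.0 - prevalence) * specificity


# Two hypothetical operating points, e.g. the same reader using a lax
# and a strict critical confidence level.
lax = {"sensitivity": 0.95, "specificity": 0.60}
strict = {"sensitivity": 0.60, "specificity": 0.95}

for prevalence in (0.5, 0.1):
    acc_lax = fraction_correct(prevalence=prevalence, **lax)
    acc_strict = fraction_correct(prevalence=prevalence, **strict)
    print(f"prevalence={prevalence:.1f}  lax={acc_lax:.3f}  strict={acc_strict:.3f}")

# A reader who calls every case negative detects nothing, yet is
# "correct" in 90 percent of cases when prevalence is 10 percent.
always_negative = fraction_correct(sensitivity=0.0, specificity=1.0, prevalence=0.1)
print(f"always negative: {always_negative:.3f}")
```

With these numbers the two operating points give the same fraction correct at 50 percent prevalence (0.775) but quite different values at 10 percent prevalence (0.635 and 0.915), although nothing about the reader or the images has changed. The single index also conceals that the two settings make opposite kinds of errors.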

It should also be noted that some authors, e.g., Swets and Pickett (1982) and Getty et al. (1988), have used the term "diagnostic accuracy" more generally to indicate disease detection performance as measured by ROC analysis and summarized, e.g., by the area under the ROC curve (A_z) index (see Section 4.2.3).
Downloaded from https://academic.oup.com/jicru/article-abstract/os28/1/NP/2924011 by Oxford University Press USA user on 30 October 2018