and the input-output mutual information I(X; Y), expressed explicitly as a function of F, is^1

I(F) = \int_{-\infty}^{\infty} \sum_{i=1}^{K} W_i(x) \log \frac{W_i(x)}{R(y_i; F)} \, dF(x).    (5)
Under an average power constraint P, we wish to find the capacity of the channel (1), given by

C = \sup_{F \in \mathcal{F}} I(F),    (6)

where \mathcal{F} = \{ F : E[X^2] = \int_{-\infty}^{\infty} x^2 \, dF(x) \le P \}, i.e., the set of all average power constrained distributions on \mathbb{R}.
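To make the quantities in (5) and (6) concrete, the following Python sketch evaluates I(F) for a discrete input distribution. It assumes the usual AWGN-QO setup behind (1) and (2), namely a K-level quantizer with thresholds q_1 < · · · < q_{K−1} applied to X + N with Gaussian noise N; the thresholds, noise level, and input used below are illustrative choices, not values from the paper, and the function names mirror the paper's notation.

import numpy as np
from scipy.stats import norm

# Illustrative setup (hypothetical): a K = 4 level quantizer with
# thresholds q_1 < q_2 < q_3 and unit-variance Gaussian noise.
q = np.array([-1.0, 0.0, 1.0])  # quantizer thresholds q_1, ..., q_{K-1}
sigma = 1.0                     # noise standard deviation

def W(x):
    # Transition PMF {W_i(x)}: probability that x + N lands in bin i, as in (2).
    edges = np.concatenate(([-np.inf], q, [np.inf]))
    return np.diff(norm.cdf((edges - x) / sigma))

def R(points, probs):
    # Output PMF {R(y_i; F)} as in (4), for a discrete input F placing
    # mass probs[j] at points[j].
    return sum(p * W(x) for x, p in zip(points, probs))

def mutual_info(points, probs):
    # I(F) in (5) for a discrete input, in bits (base-2 logs, cf. footnote 1).
    r = R(points, probs)
    total = 0.0
    for x, p in zip(points, probs):
        w = W(x)
        m = w > 0
        total += p * np.sum(w[m] * np.log2(w[m] / r[m]))
    return total

# Example: equiprobable antipodal input at +/-1, feasible for P = 1 in (6).
points, probs = np.array([-1.0, 1.0]), np.array([0.5, 0.5])
print(mutual_info(points, probs))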
Structural Properties of Optimal Inputs: We begin by employing the Karush-Kuhn-Tucker (KKT) optimality condition to show that, even though we have not imposed a peak power constraint on the input, one is automatically induced by the average power constraint. Specifically, a capacity achieving distribution for the AWGN-QO channel (1) must have bounded support.^2
A. An Implicit Peak Power Constraint
Using convex optimization principles, the following
necessary and sufficient KKT optimality condition can
be derived for our problem, in a manner similar to the
development in [25], [26]. An input distribution F^* achieves the capacity C in (6) if and only if there exists γ ≥ 0 such that
\sum_{i=1}^{K} W_i(x) \log \frac{W_i(x)}{R(y_i; F^*)} + \gamma (P - x^2) \le C    (7)

for all x, with equality if x is in the support^3 of F^*, where the transition probability function W_i(x) and the output probability R(y_i; F^*) are as specified in (2) and (4), respectively.
The summation on the left-hand side (LHS) of (7) is the Kullback-Leibler divergence (or the relative entropy) between the transition PMF {W_i(x), i = 1, ..., K} and the output PMF {R(y_i; F), i = 1, ..., K}. For convenience, let us denote this divergence function by d(x; F), that is,

d(x; F) = \sum_{i=1}^{K} W_i(x) \log \frac{W_i(x)}{R(y_i; F)}.    (8)
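As an illustration of (7) and (8), the sketch below (continuing the hypothetical setup above) evaluates d(x; F) on a grid and checks whether d(x; F) + γ(P − x²) stays below a candidate C. The multiplier γ here is an arbitrary guess, so the check will typically fail; certifying optimality requires the correct Lagrange multiplier and input.

def d(x, points, probs):
    # Divergence function d(x; F) in (8): KL divergence between the
    # transition PMF {W_i(x)} and the output PMF {R(y_i; F)}, in bits.
    w, r = W(x), R(points, probs)
    m = w > 0
    return np.sum(w[m] * np.log2(w[m] / r[m]))

# Grid check of the KKT condition (7) for a candidate pair (F, gamma).
P, gamma = 1.0, 0.1                      # hypothetical values, for illustration
C_candidate = mutual_info(points, probs)
xs = np.linspace(-8.0, 8.0, 1601)
lhs = np.array([d(x, points, probs) for x in xs]) + gamma * (P - xs**2)
print("KKT inequality holds on grid:", bool(lhs.max() <= C_candidate + 1e-9))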
We begin by studying the behavior of this function in the
limit as x → ∞.
Lemma 1: For the AWGN-QO channel (1), the divergence function d(x; F) satisfies the following properties:
(a) lim_{x→∞} d(x; F) = −log R(y_K; F).
(b) There exists a finite constant A_0 such that d(x; F) < −log R(y_K; F) for x > A_0.^4
^1 The logarithm is base 2 throughout the paper, so the mutual information is measured in bits.
^2 That there exists a capacity achieving distribution follows by standard functional analytic arguments [23]. For details, see our technical report [24].
^3 The support of a distribution F (or the set of points of increase of F) is the set S_X(F) = {x : F(x + ε) − F(x − ε) > 0, ∀ε > 0}.
^4 The constant A_0 depends on the choice of the input F. For notational simplicity, we do not explicitly show this dependence.
Proof: We have
d(x; F) = \sum_{i=1}^{K} W_i(x) \log \frac{W_i(x)}{R(y_i; F)} = \sum_{i=1}^{K} W_i(x) \log W_i(x) - \sum_{i=1}^{K} W_i(x) \log R(y_i; F).
As x → ∞, the PMF {W_i(x), i = 1, ..., K} → 1(i = K), where 1(·) is the indicator function. This observation, combined with the fact that the entropy of a finite alphabet random variable is a continuous function of its probability law, gives lim_{x→∞} d(x; F) = 0 − log R(y_K; F) = −log R(y_K; F).
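A quick numerical look at this limit, in the hypothetical setup sketched earlier: as x grows, the transition PMF piles all of its mass on the top bin y_K.

# {W_i(x)} -> 1(i = K): the top-bin probability tends to 1, the rest to 0.
for x in [0.0, 2.0, 5.0, 10.0]:
    print(x, np.round(W(x), 6))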
Next we prove part (b). For x > q_{K−1}, W_i(x) is a strictly decreasing function of x for i ≤ K − 1 and a strictly increasing function of x for i = K. Since {W_i(x)} → 1(i = K) as x → ∞, there is a constant A_0 such that W_i(A_0) < R(y_i; F) for i ≤ K − 1 and W_K(A_0) > R(y_K; F). It follows that for x > A_0,
d(x; F) = \sum_{i=1}^{K} W_i(x) \log \frac{W_i(x)}{R(y_i; F)} < W_K(x) \log \frac{W_K(x)}{R(y_K; F)} < -\log R(y_K; F),

where the first inequality holds because each term with i ≤ K − 1 is negative, and the second because 0 < W_K(x) < 1 while −log R(y_K; F) > 0.
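Lemma 1 can also be sanity-checked numerically in the same hypothetical setup: d(x; F) approaches L = −log R(y_K; F) as x grows and, beyond some finite A_0, stays strictly below it.

# Saturation of d(x; F) toward L = -log2 R(y_K; F), per Lemma 1.
L = -np.log2(R(points, probs)[-1])
for x in [1.0, 2.0, 5.0, 10.0, 20.0]:
    print(x, d(x, points, probs), "L =", L)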
The saturating nature of the divergence function for the
AWGN-QO channel, as stated above, coupled with the KKT
condition, is now used to prove that a capacity achieving
distribution must have bounded support.
Proposition 1: For the average power constrained AWGN-
QO channel (1), an optimal input distribution must have
bounded support.
Proof: Let F^* be an optimal input, so that there exists γ ≥ 0 such that (7) is satisfied with equality at every point in the support of F^*. We exploit this necessary condition to show that the support of F^* is upper bounded. Specifically, we prove that there exists a finite constant A_2^* such that it is not possible to attain equality in (7) for any x > A_2^*.
From Lemma 1, we get lim_{x→∞} d(x; F^*) = −log R(y_K; F^*) =: L. Also, there exists a finite constant A_0 such that d(x; F^*) < L for x > A_0.
We consider two possible cases.
• Case 1: γ > 0.
For x > A_0, we have d(x; F^*) < L. For x > √(max{0, (L − C + γP)/γ}) =: Ã, we have γ(P − x²) < C − L. Defining A_2^* = max{A_0, Ã}, both bounds hold for x > A_2^*, so d(x; F^*) + γ(P − x²) < L + (C − L) = C, and equality in (7) is impossible.
• Case 2: γ = 0.
Since γ = 0, the KKT condition (7) reduces to d(x; F^*) ≤ C for all x. Taking the limit as x → ∞ on both sides, we get L = lim_{x→∞} d(x; F^*) ≤ C. Hence, choosing A_2^* = A_0, for x > A_2^* we get d(x; F^*) < L ≤ C, that is, d(x; F^*) + γ(P − x²) < C.
Combining the two cases, we have shown that the support of the distribution F^* has a finite upper bound A_2^*. Using