Fig. 3. A Gaussian pipe where noise is added gradually.
Taking the expectation over Y_2 on both sides of (42) yields

I(X; Y_1 \mid Y_2) = \frac{\delta}{2}\, \mathsf{E}\left\{ \left( X - \mathsf{E}\{X \mid Y_2\} \right)^2 \right\} + o(\delta),    (43)

which establishes (33) by (35), together with the fact that

\mathsf{E}\left\{ \left( X - \mathsf{E}\{X \mid Y_2\} \right)^2 \right\} = \mathsf{mmse}(\mathsf{snr}).    (44)

This completes the proof of Theorem 1.
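As an illustrative numerical check of this increment (not part of the original development; the helper names gauss_expect, mmse_bpsk, and mi_bpsk and the choice of an equiprobable binary input are ours), the following Python sketch compares the finite difference I(snr + δ) − I(snr) with (δ/2) mmse(snr), computing the expectations by Gauss-Hermite quadrature.

import numpy as np

def gauss_expect(f, n=80):
    # E[f(N)] for N ~ N(0, 1), computed by Gauss-Hermite quadrature.
    x, w = np.polynomial.hermite.hermgauss(n)
    return np.sum(w * f(np.sqrt(2.0) * x)) / np.sqrt(np.pi)

def mmse_bpsk(snr):
    # MMSE of X in {-1, +1} (equiprobable) observed through Y = sqrt(snr) X + N:
    # mmse(snr) = 1 - E[tanh(snr + sqrt(snr) N)].
    return 1.0 - gauss_expect(lambda n: np.tanh(snr + np.sqrt(snr) * n))

def mi_bpsk(snr):
    # Mutual information (in nats) for the same input:
    # I(snr) = log 2 - E[log(1 + exp(-2 snr - 2 sqrt(snr) N))].
    return np.log(2.0) - gauss_expect(
        lambda n: np.logaddexp(0.0, -2.0 * snr - 2.0 * np.sqrt(snr) * n))

snr, delta = 1.0, 1e-4
print(mi_bpsk(snr + delta) - mi_bpsk(snr))   # increment of the mutual information
print(0.5 * delta * mmse_bpsk(snr))          # (delta/2) mmse(snr); matches up to o(delta)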
Underlying the incremental-channel proof of Theorem 1 is
the chain rule for information

I(X; Y_1, \ldots, Y_L) = \sum_{l=1}^{L} I(X; Y_l \mid Y_{l+1}, \ldots, Y_L).    (45)

When X \to Y_1 \to \cdots \to Y_L is a Markov chain, (45) becomes

I(X; Y_1) = \sum_{l=1}^{L} I(X; Y_l \mid Y_{l+1}),    (46)

where we let Y_{L+1} \equiv 0, so that I(X; Y_L \mid Y_{L+1}) = I(X; Y_L). This
applies to a train of outputs Y_1, \ldots, Y_L tapped from a Gaussian pipe where
noise is added gradually until the SNR vanishes, as depicted in Fig. 3. The sum
in (46) converges to an integral as Y_1, \ldots, Y_L becomes a finer and finer
sequence of Gaussian channel outputs. To see this, note from (43)
that each conditional mutual information in (46) corresponds to
a low-SNR channel and is essentially proportional to the MMSE
times the SNR increment. This viewpoint leads us to an equivalent form of Theorem 1:

I(\mathsf{snr}) = \frac{1}{2} \int_0^{\mathsf{snr}} \mathsf{mmse}(\gamma)\, d\gamma.    (47)
Therefore, as illustrated by the curves in Fig. 1, the mutual
information is an accumulation of the MMSE over the SNR: an
infinitesimal increase in the SNR adds to the total mutual
information an increment proportional to the MMSE.
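As a further illustration (again ours, not the paper's), the integral form (47) can be checked numerically for an equiprobable binary input, for which both I(snr) and mmse(snr) reduce to one-dimensional Gaussian integrals; the helper names below are illustrative.

import numpy as np

def gauss_expect(f, n=80):
    # E[f(N)] for N ~ N(0, 1) via Gauss-Hermite quadrature.
    x, w = np.polynomial.hermite.hermgauss(n)
    return np.sum(w * f(np.sqrt(2.0) * x)) / np.sqrt(np.pi)

def mmse_bpsk(snr):
    # mmse(snr) = 1 - E[tanh(snr + sqrt(snr) N)] for equiprobable X in {-1, +1}.
    return 1.0 - gauss_expect(lambda n: np.tanh(snr + np.sqrt(snr) * n))

def mi_bpsk(snr):
    # I(snr) = log 2 - E[log(1 + exp(-2 snr - 2 sqrt(snr) N))]  (in nats).
    return np.log(2.0) - gauss_expect(
        lambda n: np.logaddexp(0.0, -2.0 * snr - 2.0 * np.sqrt(snr) * n))

snr = 1.0
gammas = np.linspace(0.0, snr, 401)
vals = np.array([mmse_bpsk(g) for g in gammas])
dx = gammas[1] - gammas[0]
# Trapezoidal approximation of (1/2) * integral_0^snr mmse(gamma) d gamma.
half_integral = 0.5 * dx * np.sum((vals[:-1] + vals[1:]) / 2.0)
print(mi_bpsk(snr), half_integral)   # the two values agree closely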
The infinite divisibility of Gaussian distributions, namely, the
fact that a Gaussian random variable can always be decomposed
as the sum of independent Gaussian random variables of smaller
variances, is crucial in establishing the incremental channel (or
the Markov chain). This property enables us to study the mutual
information increase due to an infinitesimal increase in the SNR,
and thus obtain the differential relationships (15) and (22) in Theorems 1
and 2.
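A minimal sketch of this construction (illustrative only; the variable names and parameter values are ours): the noise of a channel at SNR snr is split into two independent Gaussian components so that an intermediate output at SNR snr + δ is observed along the way, giving exactly the Markov chain used in the incremental-channel argument.

import numpy as np

rng = np.random.default_rng(0)
snr, delta = 1.0, 0.25                         # illustrative values
sigma1_sq = 1.0 / (snr + delta)                # first-stage noise variance
sigma2_sq = 1.0 / snr - 1.0 / (snr + delta)    # remaining noise variance (positive)

x = rng.choice([-1.0, 1.0], size=1_000_000)    # any unit-power input will do
y1 = x + np.sqrt(sigma1_sq) * rng.standard_normal(x.size)   # output at SNR = snr + delta
y2 = y1 + np.sqrt(sigma2_sq) * rng.standard_normal(x.size)  # output at SNR = snr

# The total noise y2 - x is Gaussian with variance 1/snr, so Y2 is exactly a
# channel output at SNR = snr, while X -> Y1 -> Y2 is a Markov chain by construction.
print(np.var(y2 - x), 1.0 / snr)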
The following corollaries are immediate from Theorem 1 together
with the fact that \mathsf{mmse}(\mathsf{snr}) is monotone decreasing in \mathsf{snr}.

Corollary 1: The mutual information I(\mathsf{snr}) is a concave function in \mathsf{snr}.
Corollary 2: The mutual information can be bounded as

\frac{\mathsf{snr}}{2}\, \mathsf{mmse}(\mathsf{snr}) \le I(\mathsf{snr})    (48)
\le \frac{\mathsf{snr}}{2}\, \mathsf{mmse}(0)    (49)
\le \frac{\mathsf{snr}}{2}\, \mathsf{E}\{X^2\}.    (50)
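A quick numerical illustration of these bounds (ours, not from the paper), using an equiprobable binary input for which mmse(0) = E{X^2} = 1:

import numpy as np

def gauss_expect(f, n=80):
    # E[f(N)] for N ~ N(0, 1) via Gauss-Hermite quadrature.
    x, w = np.polynomial.hermite.hermgauss(n)
    return np.sum(w * f(np.sqrt(2.0) * x)) / np.sqrt(np.pi)

def mmse_bpsk(snr):
    return 1.0 - gauss_expect(lambda n: np.tanh(snr + np.sqrt(snr) * n))

def mi_bpsk(snr):
    return np.log(2.0) - gauss_expect(
        lambda n: np.logaddexp(0.0, -2.0 * snr - 2.0 * np.sqrt(snr) * n))

snr = 1.0
lower = 0.5 * snr * mmse_bpsk(snr)   # (snr/2) mmse(snr)
upper = 0.5 * snr * mmse_bpsk(0.0)   # (snr/2) mmse(0), equal to (snr/2) E{X^2} here
print(lower, mi_bpsk(snr), upper)    # lower <= I(snr) <= upper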
D. Applications and Discussions
1) Some Applications of Theorems 1 and 2: The newly
discovered relationship between the mutual information and
MMSE finds one of its first uses in relating code-division
multiple-access (CDMA) channel spectral efficiencies (mutual
information per dimension) under joint and separate decoding
in the large-system limit [30], [31]. Under an arbitrary finite-power
input distribution, Theorem 1 is invoked in [30]
to show that the spectral efficiency under joint decoding is
equal to the integral of the spectral efficiency under separate
decoding as a function of the system load. The practical lesson
therein is the optimality in the large-system limit of successive
single-user decoding with cancellation of interference from
already decoded users, and an individually optimal detection
front end against yet undecoded users. This is a generalization
to arbitrary input signaling of previous results that successive
cancellation with a linear MMSE front end achieves the CDMA
channel capacity under Gaussian inputs [32]–[35].
Relationships between information theory and estimation
theory have been identified occasionally, yielding results in one
area taking advantage of known results from the other. This is
exemplified by the classical capacity-rate-distortion relations,
which have been used to develop lower bounds on estimation
errors [36]. The fact that the mutual information and the MMSE
determine each other by a simple formula also provides a new
means to calculate or bound one quantity using the other. An
upper (resp., lower) bound for the mutual information is immediate
by bounding the MMSE for all SNRs using a suboptimal
(resp., genie-aided) estimator. Lower bounds on the MMSE,
e.g., [37], lead to new lower bounds on the mutual information.
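As one concrete instance of this technique (the example and function names are ours), the sketch below upper-bounds the MMSE of a unit-power input by the mean-square error 1/(1 + γ) of the suboptimal linear estimator and integrates the bound as in (47), which yields I(snr) ≤ (1/2) log(1 + snr); the result is compared with the exact mutual information of an equiprobable binary input.

import numpy as np

def gauss_expect(f, n=80):
    # E[f(N)] for N ~ N(0, 1) via Gauss-Hermite quadrature.
    x, w = np.polynomial.hermite.hermgauss(n)
    return np.sum(w * f(np.sqrt(2.0) * x)) / np.sqrt(np.pi)

def mi_bpsk(snr):
    # Exact mutual information (nats) of an equiprobable binary input.
    return np.log(2.0) - gauss_expect(
        lambda n: np.logaddexp(0.0, -2.0 * snr - 2.0 * np.sqrt(snr) * n))

snr = 2.0
# The linear estimator sqrt(gamma)/(1+gamma) * Y is suboptimal, so for a
# unit-power input mmse(gamma) <= 1/(1+gamma) for every gamma; integrating this
# bound as in (47) gives I(snr) <= (1/2) log(1 + snr).
gammas = np.linspace(0.0, snr, 2001)
dx = gammas[1] - gammas[0]
bound = 0.5 * dx * np.sum((1.0 / (1.0 + gammas[:-1]) + 1.0 / (1.0 + gammas[1:])) / 2.0)
print(mi_bpsk(snr), bound, 0.5 * np.log1p(snr))   # exact I(snr) below the bound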
An important example of such relationships is the case of
Gaussian inputs. Under power constraints, Gaussian inputs are
most favorable for Gaussian channels in the information-theoretic
sense (they maximize the mutual information); on the other
hand, they are least favorable in the estimation-theoretic sense
(they maximize the MMSE). These well-known results are seen
to be immediately equivalent through Theorem 1 (or Theorem
2 for the vector case). This also points to a simple proof of the
result that Gaussian inputs achieve capacity by observing that
the linear estimation upper bound for MMSE is achieved for
Gaussian inputs.^5
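A brief worked illustration of this equivalence (ours, not from the paper): for a unit-variance Gaussian input, I(snr) = (1/2) log(1 + snr) and mmse(snr) = 1/(1 + snr), so the relation dI/dsnr = (1/2) mmse holds by inspection; the sketch below compares these closed forms with an equiprobable binary input of the same power, which has no larger MMSE and no larger mutual information at any SNR.

import numpy as np

def gauss_expect(f, n=80):
    # E[f(N)] for N ~ N(0, 1) via Gauss-Hermite quadrature.
    x, w = np.polynomial.hermite.hermgauss(n)
    return np.sum(w * f(np.sqrt(2.0) * x)) / np.sqrt(np.pi)

def mmse_bpsk(snr):
    return 1.0 - gauss_expect(lambda n: np.tanh(snr + np.sqrt(snr) * n))

def mi_bpsk(snr):
    return np.log(2.0) - gauss_expect(
        lambda n: np.logaddexp(0.0, -2.0 * snr - 2.0 * np.sqrt(snr) * n))

# Unit-variance Gaussian input: closed forms, with d/dsnr I = (1/2) mmse by inspection.
mi_gauss = lambda snr: 0.5 * np.log1p(snr)
mmse_gauss = lambda snr: 1.0 / (1.0 + snr)

for snr in (0.5, 1.0, 4.0):
    print(snr,
          mmse_bpsk(snr) <= mmse_gauss(snr),   # Gaussian input maximizes the MMSE
          mi_bpsk(snr) <= mi_gauss(snr))       # and, equivalently, the mutual information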
Another application of the new results is in the analysis of
sparse-graph codes, where [38] has recently shown that the
so-called generalized extrinsic information transfer (GEXIT)
function plays a fundamental role. This function is defined for
arbitrary codes and channels as minus the derivative of the
input–output mutual information per symbol with respect to a
channel quality parameter when the input is equiprobable on
the codebook. According to Theorem 2, in the special case of
the Gaussian channel the GEXIT function is equal to minus
one half of the average MMSE of individual input symbols
given the channel outputs. Moreover, [38] shows that (1) leads
to a simple interpretation of the “area property” for Gaussian
channels (cf. [39]). Inspired by Theorem 1, [40] also advocated
^5 The observations here are also relevant to continuous-time Gaussian channels in view of results in Section III.