Attraction Domain in Gradient Optimization-based Sample Maximum Likelihood Estimation ⋆

Yiqun Zou ∗, Xiafei Tang ∗∗

∗ School of Information Science and Engineering, Central South University, Changsha, 410083 China (e-mail: yiqunzou@csu.edu.cn)
∗∗ School of Electrical and Information Engineering, Changsha University of Science and Technology, Changsha, China (e-mail: xiafei.tang@csust.edu.cn)

⋆ This work is supported by NSFC (Project 61403427).
Abstract: The sample maximum likelihood (SML) method is frequently used to identify errors-in-variables (EIV) systems. It generates the estimate by minimizing a cost function built on the mean input-output data and the sample noise variances. To help gradient-based algorithms overcome local convergence, we examine the attraction domains of the SML cost. It is shown in this paper that the asymptotic convergence properties of the objective can be studied equivalently through its noiseless version. Moreover, we present some special attraction domains that contain the global minimum under certain model structures. For these particular models, a careful initialization located in the same domain leads the algorithm to the global minimum.

Keywords: EIV system, Maximum likelihood estimation, Gradient-based optimization, Global and local convergence, Attraction domain
1. INTRODUCTION
Identification of the dynamics of errors-in-variables (EIV) systems has been examined by researchers for a long time, and many methods have been developed to handle this problem. For example, total least squares is described explicitly in Van Huffel and Vandewalle (1991). As an alternative, the basic instrumental-variable approach and its extended version, which give consistent EIV estimates asymptotically in the time domain, are designed in Söderström (1981) and Söderström and Stoica (1983). The statistical properties of the two methods are analyzed and compared in Söderström and Mahata (2002), where it is suggested that their asymptotic covariance matrices are similar in mathematical form. The Koopmans-Levin (KL) method is presented in Guidorzi (1981) to estimate the noise variance matrix based on a known variance ratio between the input and output noise. Assuming the whiteness of the input and output noise, the Frisch scheme is designed in Beghelli, Guidorzi and Soverini (1990). Both the KL method and the Frisch scheme can be seen as special forms of the bias compensation least squares (BCLS) method (Zheng, 2002). Other methods, such as the prediction error method (PEM), frequency-domain approaches, and methods based on higher-order moment statistics, are discussed separately in Pintelon et al. (1992). Söderström et al. (2010) compares the statistical accuracy of the estimates between time-domain maximum likelihood (ML) and frequency-domain sample maximum likelihood (SML). Interested readers are recommended to see Söderström (2007) for a thorough survey.
Söderström (2006) discusses the time-domain PEM and ML methods in detail. The estimation criteria for both methods are built on prediction error sequences. Compared with PEM, the ML estimator is more accurate, with a lower covariance matrix for the parameter errors. The main drawback of the PEM and ML methods in the time domain is that a Riccati equation needs to be solved at each iteration step in the derivation of the prediction error innovation, which complicates the optimization process.
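For illustration, in generic state-space notation (the matrices F, C, Q, R and the covariance P_t below are ours, not this paper's), the innovation covariance obeys a Riccati recursion of roughly the form
\[
P_{t+1} = F P_t F^{\top} + Q - F P_t C^{\top}\left(C P_t C^{\top} + R\right)^{-1} C P_t F^{\top}, \qquad \Lambda_t = C P_t C^{\top} + R,
\]
and since the state-space matrices depend on the current parameter estimate, the recursion has to be rerun after every parameter update.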
Pintelon et al. (1992) discusses frequency-domain maximum likelihood (FML) estimation for EIV models, provided that the exact (co)variances of the input and output noise are available. Schoukens et al. (1997) further transforms FML into the sample maximum likelihood (SML) method, where the means of the input-output data and the sample (co)variances replace the real measurements and exact (co)variances in FML. Both the FML and SML estimators can be developed via the minimization of the relevant costs. A gradient-related algorithm is often suggested (Pintelon and Schoukens, 2001) to achieve this goal, with a good starting point generated by, for example, the BCLS scheme.
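As a sketch only (the notation follows the general style of Schoukens et al. (1997), but the exact scaling factors are omitted and should be checked against that reference), the SML cost is roughly of the form
\[
V_{\mathrm{SML}}(\theta)=\sum_{k=1}^{F}\frac{\bigl|A(\Omega_k,\theta)\hat{Y}(k)-B(\Omega_k,\theta)\hat{U}(k)\bigr|^{2}}{|A(\Omega_k,\theta)|^{2}\hat{\sigma}_Y^{2}(k)+|B(\Omega_k,\theta)|^{2}\hat{\sigma}_U^{2}(k)-2\,\mathrm{Re}\bigl(A(\Omega_k,\theta)\,\overline{B(\Omega_k,\theta)}\,\hat{\sigma}_{YU}^{2}(k)\bigr)},
\]
where \(\hat{Y}(k)\) and \(\hat{U}(k)\) are the averaged output and input spectra at frequency \(\Omega_k\), the \(\hat{\sigma}^{2}\) terms are the corresponding sample (co)variances, and \(B(\Omega_k,\theta)/A(\Omega_k,\theta)\) is the parametrized transfer function.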
If the optimization begins with a poor initialization, the search may get stuck at a local minimum. A local minimum in this paper particularly means a ‘false’ non-global minimum in the cost landscape.
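To make the role of attraction domains concrete, the following toy Python sketch (a scalar, hand-picked non-convex cost, not the SML cost itself) runs plain gradient descent from several starting points; each start converges to whichever minimum's attraction domain contains it.

def cost(theta):
    # Toy non-convex cost with a global minimum near theta = -1.04
    # and a 'false' local minimum near theta = 0.96.
    return (theta**2 - 1.0)**2 + 0.3 * theta

def grad(theta):
    # Analytic derivative of the toy cost above.
    return 4.0 * theta * (theta**2 - 1.0) + 0.3

def gradient_descent(theta0, step=0.02, iters=2000):
    theta = theta0
    for _ in range(iters):
        theta -= step * grad(theta)
    return theta

# Starting points on either side of the ridge near theta = 0.08
# fall into different attraction domains and reach different minima.
for theta0 in (-2.0, -0.5, 0.5, 2.0):
    print(f"start {theta0:+.1f} -> converged to {gradient_descent(theta0):+.4f}")

Here the starts at -2.0 and -0.5 recover the global minimum, while 0.5 and 2.0 end in the false minimum; a good initialization (for instance, from a consistent BCLS estimate) plays exactly this role for the SML cost.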
The existence of local minima relates to many factors, for instance, the model type, the structure of the input, and the magnitude of the signal-to-noise ratio (SNR). For output-error (OE) identification, in which the noise exists only at the output, how to tackle this problem has been described in various works. Åström and Söderström (1974) points out that there is no local minimum in the cost function regardless of the input, while Söderström (1975) shows that white noise as the input signal leads to global convergence for OE models. Zou and Heath (2012) summarizes these results