Int J Comput Vis
Fig. 2 Comparing computation times of our method against the
state-of-the-art ones introduced in Fig. 1. The computation times of
a MATLAB implementation on a standard PC, are plotted as a func-
tion of the number of correspondences. Our method is both more ac-
curate—see Fig. 1—and faster than the other non-iterative ones, espe-
cially for large amounts of noise, and is almost as accurate as the iter-
ative LHM. Furthermore, if maximal precision is required, the output
of our algorithm can be used to initialize a Gauss-Newton optimization
procedure which requires a negligible amount of additional time
urations, which cause problems for some methods as dis-
cussed in Oberkampf et al. (1996), Schweighofer and Pinz
(2006), by using three control points instead of four.
In the remainder of the paper, we first discuss related
work focusing on accuracy and computational complexity.
We then introduce our new formulation and derive our sys-
tem of linear and quadratic equations. Finally, we compare
our method against the state-of-the-art ones using synthetic
data and demonstrate it using real data. This paper is an
expanded version of that in Moreno-Noguer et al. (2007),
where a final Gauss-Newton optimization is added to the
original algorithm. In Sect. 4 we show that optimizing over
a reduced number of parameters, the accuracy of the closed-
solution proposed in Moreno-Noguer et al. (2007) is con-
siderably improved with almost no additional computational
cost.
2 Related Work
There is an immense body of literature on pose estimation
from point correspondences and, here, we focus on non-
iterative approaches since our method falls in this category.
In addition, we will also introduce the Lu et al. (2000) it-
erative method, which yields very good results and against
which we compare our own approach.
Most of the non-iterative approaches, if not all of them,
proceed by first estimating the points 3D positions in the
camera coordinate system by solving for the points depths.
It is then easy to retrieve the camera position and orientation
as the Euclidean motion that aligns these positions on the
given coordinates in the world coordinate system (Horn et
al. 1988;Arunetal.1987; Umeyama 1991).
The P3P case has been extensively studied in the liter-
ature, and many closed form solutions have been proposed
such as Dhome et al. (1989), Fischler and Bolles (1981),
Gao et al. (2003), Haralick et al. (1991), Quan and Lan
(1999). It typically involves solving for the roots of an eight-
degree polynomial with only even terms, yielding up to four
solutions in general, so that a fourth point is needed for
disambiguation. Fisher and Bolles (1981) reduced the P4P
problem to the P3P one by taking subsets of three points
and checking consistency. Similarly, Horaud et al. (1989)
reduced the P4P to a 3-line problem. For the 4 and 5 points
problem, Triggs (1999) derived a system of quadratic poly-
nomials, which solves using multiresultant theory. However,
as pointed out in Ansar and Daniilidis (2003), this does not
perform well for larger number of points.
Even if four correspondences are sufficient in general
to estimate the pose, it is nonetheless desirable to consider
larger point sets to introduce redundancy and reduce the sen-
sitivity to noise. To do so, Quan and Lan (1999) consider
triplets of points and for each one derive four-degree poly-
nomials in the unknown point depths. The coefficients of
these polynomials are then arranged in a
(n−1)(n−2)
2
×5ma-
trix and singular value decomposition (SVD) is used to es-
timate the unknown depths. This method is repeated for all
of the n points and therefore involves O(n
5
) operations.
2
It
should be noted that, even if it is not done in Quan and Lan
(1999), this complexity could be reduced to O(n
3
) by apply-
ing the same trick as we do when performing the SVD, but
even then, it would remain slower than our method. Ansar
and Daniilidis (2003) derive a set of quadratic equations
arranged in a
n(n−1)
2
× (
n(n+1)
2
+ 1) linear system, which,
as formulated in the paper, requires O(n
8
) operations to be
solved. They show their approach performs better than Quan
and Lan (1999).
The complexity of the previous two approaches stems
from the fact that quadratic terms are introduced from the
inter-point distances constraints. The linearization of these
equations produces additional parameters, which increase
the complexity of the system. Fiore’s method (Fiore 2001)
avoids the need for these constraints: He initially forms a
set of linear equations from which the world to camera rota-
tion and translation parameters are eliminated, allowing the
direct recovery of the point depths without considering the
inter-point distances. This procedure allows the estimation
of the camera pose in O(n
2
) operations, which makes real-
time performance possible for large n. Unfortunately, ignor-
ing nonlinear constraints produces poor results in the pres-
ence of noise (Ansar and Daniilidis 2003).
2
Following Golub and Van Loan (1996), we consider that the SVD
for a m × n matrix can be computed by a O(4m
2
n + 8mn
2
+ 9n
3
)
algorithm.