where s, k and t correspond to scaling, skew and translation applied in the direction of the u coordinate. In order to calculate these parameters, the images corresponding to the 3D patch are divided into l rows of pixels. When considering a single row of pixels, the skew and translation act together to produce a single horizontal offset, denoted as o, since the v coordinate of each pixel in the row is the same.
The normalized cross-correlation is computed between each pair of rows at various scale and offset values, and those corresponding to the best match are recorded. The scale s is calculated as the median of the scale values found for each row of the given patch. Any values deviating significantly from the median are deemed outliers and discarded. Using the offsets from all rows, the skew and translation parameters k and t are calculated by solving a linear system
$$
\begin{bmatrix} k \\ t \end{bmatrix} =
\begin{bmatrix} v_1 & 1 \\ \vdots & \vdots \\ v_l & 1 \end{bmatrix}^{-1}
\begin{bmatrix} o_1 \\ \vdots \\ o_l \end{bmatrix},
\qquad (11)
$$
where $v_l$ is the $v$ coordinate of the pixels in the $l$-th row. In order to improve the robustness further, the rows with the greatest residuals are removed from the system (11), and k, t are recalculated. This process is repeated until convergence. Now H can be calculated from $H_R$ taking into account the rectifying transformations
$$
H = R'^{-1} H_R R. \qquad (12)
$$
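As an illustration of this per-patch estimation, the sketch below fits k and t from the per-row offsets by least squares, iteratively discarding the rows with the largest residuals, and then composes H from the recovered parameters. This is a minimal NumPy sketch; the function names, the residual threshold, and the particular form assumed for $H_R$ (scale, skew and translation acting only on the u coordinate) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def estimate_row_transform(offsets, v_coords, n_iter=5, drop_frac=0.1):
    """Robustly fit skew k and translation t from per-row horizontal
    offsets o_l observed at row coordinates v_l, as in Eq. (11).
    Rows with the largest residuals are removed and the fit repeated."""
    o = np.asarray(offsets, dtype=float)
    v = np.asarray(v_coords, dtype=float)
    keep = np.ones(len(o), dtype=bool)
    for _ in range(n_iter):
        A = np.stack([v[keep], np.ones(keep.sum())], axis=1)  # rows [v_l, 1]
        (k, t), *_ = np.linalg.lstsq(A, o[keep], rcond=None)
        residuals = np.abs(A @ np.array([k, t]) - o[keep])
        thresh = np.quantile(residuals, 1.0 - drop_frac)
        worst = residuals > thresh
        if not worst.any():
            break  # converged: no remaining row exceeds the residual threshold
        idx = np.flatnonzero(keep)
        keep[idx[worst]] = False
    return k, t

def compose_homography(s, k, t, R, R_prime):
    """Combine scale, skew and translation acting on the u coordinate into
    one plausible form of H_R, then undo the rectifying transformations
    as in Eq. (12): H = R'^{-1} H_R R."""
    H_R = np.array([[s,   k,   t  ],
                    [0.0, 1.0, 0.0],
                    [0.0, 0.0, 1.0]])
    return np.linalg.inv(R_prime) @ H_R @ R
```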
Certain patches and their corresponding projections may not provide reliable matching and are not considered by the surface rectification mechanism described above. For the offsets to be found uniquely, the line-by-line matching based on cross-correlation requires significant detail in the quad regions of the images corresponding to the patch projection. The observed detail can be unreliable in patches with significant deviation from planarity which are viewed from oblique angles. In order to assess the appropriateness of using a certain image I for correcting a given 3D scene patch, we calculate a confidence score $\chi$ as a function depending on the angle between the pair of images and on the distance to the scene
$$
\chi\!\left( \vec{N} \cdot \frac{y - c}{\lVert y - c \rVert},\;
\lVert y - c \rVert,\;
\arccos \frac{(y - c) \cdot (y' - c)}{\lVert y - c \rVert \, \lVert y' - c \rVert} \right),
\qquad (13)
$$
where $\vec{N}$ is the surface normal of the 3D patch and c is the location from the 3D scene (usually an RBF center) viewed from the image location y. The first term in $\chi(\cdot)$ represents the cosine of the angle between the surface normal $\vec{N}$ and the viewing vector from y. The third term in $\chi(\cdot)$ enforces a minimum baseline angle formed by the two viewing directions from locations y and y′ to the center of the patch c for the images {I, I′}. Image pairs {I, I′} which are located too close to each other will not provide enough disparity to extract reliable information. By using the conditions from $\chi(\cdot)$, we assess which images are suitable for correcting the disparity for a specific 3D patch from the scene S.
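A minimal sketch of how such a suitability test could be implemented is given below; the threshold values and function names are hypothetical, and only the three quantities entering $\chi(\cdot)$ in (13) follow the text.

```python
import numpy as np

def patch_view_confidence(N, c, y, y_prime):
    """Compute the three quantities entering the confidence score chi of
    Eq. (13) for a 3D patch with surface normal N centred at c, viewed
    from camera locations y and y'."""
    d = y - c
    d_prime = y_prime - c
    # cosine of the angle between the surface normal and the viewing vector
    cos_view = np.dot(N, d / np.linalg.norm(d))
    # distance from the camera location to the patch centre
    dist = np.linalg.norm(d)
    # baseline angle between the two viewing directions towards c
    cos_base = np.dot(d, d_prime) / (np.linalg.norm(d) * np.linalg.norm(d_prime))
    baseline_angle = np.arccos(np.clip(cos_base, -1.0, 1.0))
    return cos_view, dist, baseline_angle

def is_pair_suitable(N, c, y, y_prime,
                     min_cos=0.3, max_dist=10.0, min_baseline=np.deg2rad(5)):
    """Hypothetical thresholds: accept the image pair only when the patch is
    seen frontally enough, from close enough, and with a sufficient baseline."""
    cos_view, dist, baseline = patch_view_confidence(N, c, y, y_prime)
    return cos_view > min_cos and dist < max_dist and baseline > min_baseline
```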
For a 3D patch corresponding to the RBF center $c_j$, let us consider a set of image pairs $\{I, I'\}_i$ for $i = 1, \ldots, K$ which fulfil the above conditions. Each of the images I and I′ is characterized by the projection matrices P and P′; we estimate $H_{R,i}$ and $H_i$ using (12), and calculate the corresponding displacement vector $v_i$ from (6). Consequently, we estimate the correct location of the plane $\psi_i$, which should contain the basis function center, in order to fulfil the consistency between the pair of images $\{I, I'\}_i$ and their corresponding 3D patch, using (8)
$$
\psi_i = \begin{bmatrix} P \\ 0 \; 0 \; 0 \; 1 \end{bmatrix}^{T}
\begin{bmatrix} v_i \\ 1 \end{bmatrix}. \qquad (14)
$$
The location of the basis function center is updated as
$$
\hat{c}_j = c_j + \frac{1}{K} \sum_{i=1}^{K}
\frac{\vec{N}_i \left( c_j^{T} \psi_i \right)}{\vec{N}_i^{T} \vec{N}_i}
\qquad (15)
$$
for $j = 1, \ldots, l$, with l the number of RBF centers to be updated, while $\vec{N}_i$ is the surface normal to the plane $\psi_i$, and the jth basis function center $c_j$ is updated to $\hat{c}_j$ by being constrained to lie on each of the planes $\psi_i$. This corresponds to the rectification due to the disparity identified in each image pair $\{I, I'\}_i$ for $i = 1, \ldots, K$. After updating the RBF centers we recalculate the output weights $w_i$, $i = 1, \ldots, M$, by solving (2). In order to avoid singularity when solving (2), if multiple basis function centers occur in the immediate neighborhood of each other, only one is preserved while the others are removed.
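The update of (15), together with the pruning of coincident centers, could be sketched as follows; the plane representation $\psi_i = (a, b, c, d)$ with normal $\vec{N}_i = (a, b, c)$, the function names, and the minimum-distance threshold are assumptions made for illustration.

```python
import numpy as np

def update_rbf_center(c_j, planes):
    """Update a basis-function centre c_j towards the planes psi_i estimated
    from each image pair, following Eq. (15):
        c_j_hat = c_j + (1/K) * sum_i N_i * (c_j~^T psi_i) / (N_i^T N_i),
    where c_j~ is c_j in homogeneous coordinates and the sign convention of
    psi_i follows its construction in Eq. (14)."""
    c_h = np.append(c_j, 1.0)        # homogeneous coordinates of the centre
    correction = np.zeros(3)
    for psi in planes:
        N = psi[:3]                  # plane normal
        correction += N * (c_h @ psi) / (N @ N)
    return c_j + correction / len(planes)

def prune_close_centers(centers, min_dist=1e-3):
    """Keep only one centre from any group of centres closer than min_dist,
    to avoid a singular system when re-solving Eq. (2)."""
    kept = []
    for c in centers:
        if all(np.linalg.norm(c - k) >= min_dist for k in kept):
            kept.append(c)
    return np.array(kept)
```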
4. Scene correction using shape-from-contours
In the following we assume that we have a 3D scene reconstructed from a set of images as described in Section 2. The 3D scene correction using image disparities, as described in Section 3, relies on the existence of textured areas in the given set of images. Large uniformly colored regions may not provide suitable matches for estimating image disparities between pairs of images. However, such image regions can be easily segmented, providing reliable object contours. In the following we propose using the contours of segmented objects for correcting the 3D scene.
4.1. Detecting disparities in object contours
Let us assume that the scene contains at least two distinct objects $\{A, B\} \in S$. The background is assumed to be a distinct object, part of the scene as well. We consider that each object outline from the 3D scene is projected onto contours in the input images, denoted as $\{a_i, b_i\} \in I_i$, $i = 1, \ldots, n$, where
$$
a_i = P_i A, \qquad b_i = P_i B, \qquad (16)
$$
where $P_i$ represents the projection matrix from the 3D scene to
the ith image. In some of the images one or both objects can be
occluded and their contours may not be everywhere visible. The
assumption in the following is that the objects from the 3D scene
are inconsistent with the given set of images I due to errors in
their shape estimation.
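For reference, projecting a sampled 3D outline into image i with a 3x4 projection matrix, as in (16), amounts to the following; the function name and the dense point sampling of the outline are illustrative.

```python
import numpy as np

def project_outline(P_i, X):
    """Project 3D outline points X (N x 3) of an object into image i using
    the 3x4 projection matrix P_i, as in Eq. (16), a_i = P_i A."""
    X_h = np.hstack([X, np.ones((X.shape[0], 1))])   # homogeneous 3D points
    x_h = (P_i @ X_h.T).T                            # homogeneous image points
    return x_h[:, :2] / x_h[:, 2:3]                  # back to pixel coordinates
```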
In this paper we consider segmentation for defining the contours of objects, such as $\{A, B\}$ in the 3D scene and $\{a_i, b_i\}$ in the image $I_i$ from the set $\mathcal{I}$. In the case of 3D objects we assume
that we have an initial scene as provided by the initialization described in the previous sections. Segmenting the given 3D scene is rather straightforward since the RBF surface delimits each object from the surrounding area, except for the case when two objects are in contact with each other. 3D objects are characterized by an additional location feature when compared to their corresponding projections into images, and are easier to segment even when they are only roughly modeled. A simple compactness criterion, or a clustering algorithm considering the three location and the corresponding three color components, can be used for
segmenting 3D objects. Let us denote by $P(x \in A \mid z_{3D})$ and $P(x \in B \mid z_{3D})$ the probability of segmenting the objects A and B in 3D, where $z_{3D}$ represents the feature vectors characterizing the set of locations from the 3D scene.
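A simple unsupervised realisation of this 3D segmentation is sketched below using k-means on the concatenated location and color features; the number of objects, the feature normalization, and the use of scikit-learn are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def segment_3d_points(xyz, rgb, n_objects=3):
    """Cluster scene points into objects using a 6-D feature vector of three
    location and three colour components, one simple realisation of the
    unsupervised 3D segmentation described above."""
    # scale each feature group to comparable ranges before clustering
    feats = np.hstack([
        (xyz - xyz.mean(0)) / (xyz.std(0) + 1e-9),
        (rgb - rgb.mean(0)) / (rgb.std(0) + 1e-9),
    ])
    return KMeans(n_clusters=n_objects, n_init=10).fit_predict(feats)
```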
We consider both unsupervised and supervised image segmentations for extracting object contours from images. The unsupervised
segmentation corresponds to clustering in the feature space [31,32].
In the case of supervised classification, the image segmentation is