proposed a hybrid descriptor formed by combining features
extracted from a depth-buffer and spherical-function based
representation, with enhanced translation and rotation in-
variance properties. The advantage of this method over similar approaches is its high discriminative power combined with low space and time requirements.
2.2 Relevance Feedback in 3D Object Retrieval
To enable the machine to retrieve information by adapting to individual categorization criteria, relevance feedback (RF) was introduced as a means to involve the user in the retrieval process and guide the retrieval system towards the target. Relevance feedback was first used to improve text retrieval (Rocchio 1971), was later successfully employed in image retrieval systems and has lately appeared in a few 3D object retrieval systems. It is the information acquired from the user's interaction with the retrieval system about the relevance of a subset of the retrieved results.
Further information on relevance feedback methods can be
found in Ruthven and Lalmas (2003), Crucianu et al. (2004),
Zhou and Huang (2001) and Papadakis et al. (2008b).
Local relevance feedback (LRF), also known as pseudo
or blind relevance feedback, is different from the conven-
tional approach in that the user does not actually provide
any feedback at all. Instead, the required training data are
obtained based only on the unsupervised retrieval result.
The procedure comprises two steps. First, the user submits a query to the system, which uses a set of low-level features to produce a ranked list of results; this list is not displayed to the user. Second, the system reconfigures itself using only the top m matches of the list, on the assumption that these are most likely relevant to the user's query.
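To make the procedure concrete, the following is a minimal sketch of the two-step LRF loop in Python, assuming features are stored as rows of a NumPy array, similarity is measured by Euclidean distance and the system reconfigures itself by averaging the query with the top m matches; the update rule shown is one common choice, not necessarily the one used by the systems cited below.

```python
import numpy as np

def local_relevance_feedback(query, database, m=5):
    """Pseudo/blind relevance feedback: re-rank without user input."""
    # Step 1: initial unsupervised ranking (not displayed to the user).
    dists = np.linalg.norm(database - query, axis=1)
    ranking = np.argsort(dists)
    # Step 2: assume the top m matches are relevant and reconfigure
    # the query, here by averaging it with those matches (a common
    # choice; actual systems may use different update rules).
    expanded = np.vstack([query, database[ranking[:m]]]).mean(axis=0)
    # Final ranking with the expanded query.
    return np.argsort(np.linalg.norm(database - expanded, axis=1))
```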
LRF was first employed in the context of text retrieval,
in order to extend the keywords comprising the query with
related words from the top ranked retrieved documents.
Apart from a few studies that incorporated RF in 3D ob-
ject retrieval (Elad et al. 2001; Bang and Chen 2002; Atmosukarto et al. 2005; Lou et al. 2003; Leifman et al. 2005;
Akbar et al. 2006; Novotni et al. 2005), LRF has only lately
been examined in Papadakis et al. (2008b).
3 Computation of the PANORAMA Descriptor
In this section, we first describe the steps for the compu-
tation of the proposed descriptor (PANORAMA), namely:
(i) pose normalization (Sect. 3.1), (ii) extraction of the
panoramic views (Sect. 3.2) and (iii) feature extraction
(Sect. 3.3). Finally, in Sect. 3.4 we describe a weighting scheme that is applied to the features and the procedure for
comparing two PANORAMA descriptors.
3.1 Pose Normalization
Prior to the extraction of the PANORAMA descriptor, we normalize the pose of a 3D object, since its translation, rotation and scale characteristics should not influence the measure of similarity between objects.
To normalize the translation of a 3D model we compute
its centroid using CPCA (Vranic 2004). In CPCA, the cen-
troid of a 3D mesh model is computed as the average of its triangle centroids, where every triangle is weighted proportionally to its surface area. We translate the model so that its centroid coincides with the origin; translation invariance is thus achieved, since the centroids of all 3D models coincide.
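For illustration, the following is a minimal sketch of this translation step in Python, assuming the mesh is given as a vertex array and a triangle index array (hypothetical input conventions, not prescribed by the method):

```python
import numpy as np

def translate_to_centroid(vertices, triangles):
    """Center a mesh at its area-weighted centroid, as in CPCA.

    vertices  : (V, 3) array of vertex coordinates
    triangles : (T, 3) array of vertex indices per triangle
    """
    a, b, c = (vertices[triangles[:, i]] for i in range(3))
    tri_centroids = (a + b + c) / 3.0
    # Triangle area = half the norm of the edge cross product.
    areas = 0.5 * np.linalg.norm(np.cross(b - a, c - a), axis=1)
    centroid = (areas[:, None] * tri_centroids).sum(axis=0) / areas.sum()
    # Translate so that the centroid coincides with the origin.
    return vertices - centroid
```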
To normalize for rotation, we use CPCA and NPCA (Pa-
padakis et al. 2007) in order to align the principal axes of a
3D model with the coordinate axes. First, we align the 3D model using CPCA, which determines the principal axes from the model's spatial surface distribution, and then using NPCA, which determines them from the surface orientation distribution. Both methods use Principal Component Analysis (PCA) to compute the principal axes of the 3D model.
The difference between the two methods lies in the input
data that are used for the computation of the covariance ma-
trix. In particular, CPCA uses the surface coordinates, whereas NPCA uses the surface orientation coordinates, which are obtained from the triangles' normal vectors. Detailed descriptions of the formulations of CPCA and NPCA can be found in Vranic (2004) and in our previous work (Papadakis et al. 2007), respectively.
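The full CPCA and NPCA formulations, including the continuous area-weighted covariance integrals and the disambiguation of axis signs, are given in the cited papers; the simplified sketch below only illustrates the shared PCA step and the different input data (triangle centroids for CPCA versus triangle normals for NPCA), with sign and handedness issues ignored:

```python
import numpy as np

def principal_axes(points, weights):
    """Eigenvectors of the weighted covariance matrix of the points."""
    mean = np.average(points, axis=0, weights=weights)
    centered = points - mean
    cov = (weights[:, None] * centered).T @ centered / weights.sum()
    eigvals, eigvecs = np.linalg.eigh(cov)   # ascending eigenvalues
    return eigvecs[:, ::-1]                  # descending order

def align(vertices, triangles, mode="cpca"):
    a, b, c = (vertices[triangles[:, i]] for i in range(3))
    areas = 0.5 * np.linalg.norm(np.cross(b - a, c - a), axis=1)
    if mode == "cpca":
        # CPCA: spatial surface distribution (triangle centroids).
        data = (a + b + c) / 3.0
    else:
        # NPCA: surface orientation distribution (unit normals).
        n = np.cross(b - a, c - a)
        data = n / np.linalg.norm(n, axis=1, keepdims=True)
    # Rotate the principal axes onto the coordinate axes.
    return vertices @ principal_axes(data, areas)
```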
Thus, we obtain two alternative aligned versions of the
3D model, which are separately used to extract two sets of
features that are integrated into a single feature vector (see
Sect. 3.4).
The PANORAMA shape descriptor is rendered scale invariant by normalizing the corresponding features to the unit L1 norm. As will be described later in Sects. 3.3.1 and 3.3.2, the features used by the PANORAMA descriptor are obtained from the 2D Discrete Fourier Transform and the 2D Discrete Wavelet Transform. The corresponding coefficients are proportional to the object's scale; therefore, by normalizing the coefficients to unit L1 norm we are in fact normalizing all objects to the same scale.
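In practice, this normalization amounts to a single division; a minimal sketch, assuming the coefficients are collected in a flat array:

```python
import numpy as np

def l1_normalize(features):
    # Divide by the L1 norm so the absolute coefficients sum to 1,
    # cancelling the proportionality to the object's scale.
    return features / np.abs(features).sum()
```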
3.2 Extraction of Panoramic Views
After the normalization of a 3D model’s pose, the next step
is to acquire a set of panoramic views.
To obtain a panoramic view, we project the model to the
lateral surface of a cylinder of radius R and height H =2R,
centered at the origin with its axis parallel to one of the co-
ordinate axes (see Fig. 1). We set the value of R to 3 · d_mean, where d_mean is the mean distance of the model's surface from its centroid. For each model, the value of d_mean is determined using the diagonal elements of the covariance matrix