Extension of CA to multiway data-sets through HOSVD 3
1 Introduction
Dimension reduction has been, and still is, one of the most widely used approaches to modeling
questions on data sets living in high-dimensional spaces [1, 2, and further detailed references
therein]. One of the most popular methods is Principal Component Analysis (PCA) [3,
4, 5, 6, 7, 8, 9, 10, 11]. It has three guises:
1. an algebraic framework, working with vector spaces and matrix algebra, based on the
Singular Value Decomposition (SVD) of the data matrix, see, e.g., [12]
2. a geometric framework, by associating a point cloud with the matrix representing the data
set (one point per row, one dimension per feature), see, e.g., [13, 14]
3. a statistical modeling approach, see, e.g., [15, 16].
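The algebraic and geometric guises can be related in a few lines. The following is a minimal sketch, assuming NumPy and a data matrix with one point per row and one feature per column; the function name `pca_svd` and its return values are illustrative choices, not notation from this report.

```python
import numpy as np

def pca_svd(X, k):
    """PCA of a point cloud (one point per row) through the SVD of the centered matrix."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                  # move the barycenter of the cloud to the origin
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :k] * s[:k]                # coordinates of the points on the first k axes
    axes = Vt[:k]                            # principal axes, one per row
    variances = s[:k] ** 2 / (X.shape[0] - 1)  # variance carried by each axis
    return scores, axes, variances
```

The scores are the orthogonal projections of the centered points onto the principal axes, and the variances coincide with the leading eigenvalues of the sample covariance matrix.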
PCA has been extended to a variety of settings, such as Correspondence Analysis (CA) [17,
18, 19, 20, 21, 22, 23] for the analysis of contingency tables, with metrics associated with χ²
distances [23, 24]. From an algebraic perspective, CA is the PCA of a contingency table with
the metric associated with the inverse of row and column marginals. CA has been explained in
a geometric framework as well [25, 26]. Indeed, as for PCA, in the geometric view of CA each
row of a contingency table corresponds to the coordinates of a point in a specific vector space.
Since a similar argument holds for both rows and columns, a contingency table is naturally
associated with two point clouds. Thanks to the underlying dimension reduction techniques,
CA makes possible the simultaneous visualization and interpretation of the two point clouds in
a low-dimensional space where a specific barycentric relation links the two objects. The dictionary
between the geometric and algebraic frameworks relies on the selection of weights and metrics in
the spaces where the point clouds live, see [13, 14].
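As an illustration of the algebraic view of classical CA and of the barycentric relation between the two point clouds, here is a minimal sketch assuming NumPy and a two-way contingency table with no empty row or column. The function name `correspondence_analysis`, the centering by the independence model, and the tolerance used to discard null singular values are choices made for this example, not conventions taken from this report.

```python
import numpy as np

def correspondence_analysis(N, tol=1e-10):
    """Classical CA of a two-way contingency table N through an SVD."""
    N = np.asarray(N, dtype=float)
    P = N / N.sum()                          # relative frequencies
    r = P.sum(axis=1)                        # row marginals (weights of the row points)
    c = P.sum(axis=0)                        # column marginals (weights of the column points)
    # Residuals from independence, scaled by the inverse square roots of the marginals
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, s, Vt = np.linalg.svd(S, full_matrices=False)
    keep = s > tol                           # drop directions with null singular value
    U, s, V = U[:, keep], s[keep], Vt[keep].T
    F = U * s / np.sqrt(r)[:, None]          # principal coordinates of the row cloud
    G = V * s / np.sqrt(c)[:, None]          # principal coordinates of the column cloud
    return r, c, s, F, G

N = np.array([[20, 5, 8, 12],
              [6, 18, 9, 4],
              [10, 7, 25, 3]])
r, c, s, F, G = correspondence_analysis(N)
P = N / N.sum()
# Barycentric relation: each row point is the barycenter of the column points,
# weighted by the row profile, then rescaled by 1/s along each axis.
F_from_G = (P / r[:, None]) @ G / s
```

In this sketch `F_from_G` reproduces `F`, and the symmetric relation recovers `G` from `F`; the sum of the squared singular values equals the χ² statistic of the table divided by its total count.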
Multiway data have appeared in a wide range of domains, calling for further development
of these analysis techniques. PCA and associated methods have been extended algebraically to
dimension reduction in multiway arrays, see, e.g., [27, 28, 29, 30, 31, 32, 33] and the many references
therein, in connection with tensor algebra [34]. These are algebraic extensions, relying on numerical
computations based on elementary operations in tensor algebra in the same way that PCA and
CA rely on numerical linear algebra. Starting from the generalization of PCA to multiway data
through the Tucker model [35], with the High Order Singular Value Decomposition (HOSVD) [30], an
algebraic development of CA to MultiWay Correspondence Analysis (MWCA) follows naturally.
Indeed, MWCA is the HOSVD of a multiway contingency table equipped, in each mode, with the
metric given by the inverse of the corresponding marginal, see [29]. Our work aims to study the
geometric perspective of MWCA, which remains little investigated. More precisely, we show that a point cloud is
associated with each mode of a tensor in MWCA in the same way that a point cloud is associated
with either rows or columns of a matrix. Finally, the correspondence between two point clouds
based on scaling and computing barycenters holding in the classical CA is generalized to d point
clouds in MWCA.
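The algebraic definition of MWCA recalled above can be sketched as follows, assuming NumPy and a multiway contingency table with strictly positive marginals. The helper names (`unfold`, `mode_product`, `hosvd`, `scale_by_marginals`) are hypothetical, and encoding the inverse-marginal metrics by dividing each mode by the square root of its marginal, without centering, is an assumption made for this example rather than the exact convention of this report.

```python
import numpy as np

def unfold(T, mode):
    """Matricize T so that rows are indexed by the chosen mode."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def mode_product(T, M, mode):
    """Multiply tensor T by matrix M along `mode`; M has shape (new_dim, T.shape[mode])."""
    out = np.tensordot(M, T, axes=(1, mode))   # the new axis comes out first
    return np.moveaxis(out, 0, mode)

def hosvd(T):
    """HOSVD: one orthonormal factor per mode, plus the core tensor."""
    factors = [np.linalg.svd(unfold(T, k), full_matrices=False)[0]
               for k in range(T.ndim)]
    core = T
    for k, U in enumerate(factors):
        core = mode_product(core, U.T, k)
    return core, factors

def scale_by_marginals(N):
    """Relative frequencies divided, in each mode, by the square root of the marginal."""
    F = np.asarray(N, dtype=float)
    F = F / F.sum()
    d = F.ndim
    margins = [F.sum(axis=tuple(a for a in range(d) if a != k)) for k in range(d)]
    X = F.copy()
    for k, w in enumerate(margins):
        shape = [1] * d
        shape[k] = -1
        X = X / np.sqrt(w).reshape(shape)      # broadcast the scaling along mode k
    return X, margins

rng = np.random.default_rng(0)
N = rng.integers(1, 20, size=(3, 4, 2))        # synthetic 3-way table, for illustration only
X, margins = scale_by_marginals(N)
core, factors = hosvd(X)
```

Each factor matrix plays, for its mode, the role that the left or right singular vectors play for the rows or columns in the two-way case; multiplying the core back by the factors along every mode recovers the scaled tensor.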
The remainder of this paper is organized as follows. Section 2 introduces the preliminary
tensor definitions, spaces and operations, and fixes the notation. Section 3
presents the algebraic link between the principal components. In particular, Subsection 3.3
extends the results of the previous subsections to a space with a generic metric. Section 4 presents
the barycentric relation characterizing Correspondence Analysis in the tensor case. Finally, we
apply the previous theoretical results to two data sets and highlight the different outcomes.
RR n° 9429