Matrix Factorization From a Few Dominators
Quan Liu , Yanbo Wang, and Bo Yuan
Abstract— The Pareto principle states that most effects are the result of
a few dominating causes. This principle also applies to most matrix completion
problems. In practice, most real-world data sets exhibit nonuniformly
distributed observations, yet most existing algorithms assume
uniformly distributed observations. In this brief,
we propose a matrix factorization approach to recover a large-scale
matrix from a dominating submatrix composed of a few of the most
important rows and columns of the original matrix. The importance of
a row or column is evaluated with a measure inspired by term
frequency-inverse document frequency (TF-IDF) from natural language processing.
The selected submatrix is recovered using an existing base matrix fac-
torization algorithm. Then, factors of the completed submatrix are used
to retrieve the factors of the whole matrix via a linear regression model.
Numerical experiments demonstrate the effectiveness of our approach
for recovering the matrix from nonuniformly distributed observations.
In addition, our framework is naturally suited to parallel and
distributed computing, which is very encouraging for massive-sized
data sets.
Index Terms—Divide-and-conquer, linear regression, matrix
completion, matrix factorization, parallel computing.
I. INTRODUCTION
Matrix completion has a great number of applications in learning
systems, including collaborative filtering in recommender sys-
tems [1], structure from motion in computer vision [2], global
positioning in sensor networks [3], multiclass learning in data analy-
sis [4], and deoxyribonucleic acid imputation in bioinformatics [5].
A commonly used assumption is that the unknown matrix M is
low rank or approximately low rank. Therefore, low-rank matrix
completion is a rank minimization problem as follows:
    \min_{X \in \mathbb{R}^{m \times n}} \operatorname{Rank}(X) \quad \text{s.t.} \quad P_{\Omega}(X) = P_{\Omega}(M)    (1)

where X and M are both m × n matrices and P_Ω is the orthogonal
projection onto the set of observed entries Ω, defined as
P_Ω(M)_{i,j} = M_{i,j} if (i, j) ∈ Ω, and 0 otherwise. To solve (1), two widely used
heuristic approaches are nuclear norm minimization [6], [7] and
low-rank matrix factorization [8]. Many algorithms have since been
developed following these two approaches, such as singular value
thresholding [9], accelerated proximal gradient [10], softImpute [11],
and singular value projection (SVP) [12]. Most existing algorithms
come with elegant theoretical guarantees for recovering the
unknown entries, in particular those based on nuclear
norm regularization [6], [7]. However, in practice, circumstances are
more complicated. First, in the era of big data, the size of most
real-world data sets is very large, and most existing algorithms are
not suitable for handling such massive-sized data sets on a single
machine and cannot be easily implemented in distributed systems.
Second, most real-world data sets do not obey the basic assumptions
on which these algorithms are established.
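Before turning to these two problems, it may help to make the notation in (1) concrete. The following Python/NumPy sketch is our own illustration (the function and variable names are not from the brief): it implements the projection operator P_Ω as a boolean mask and checks the feasibility constraint P_Ω(X) = P_Ω(M) on a toy rank-1 matrix.

```python
import numpy as np

def project(X, omega):
    """Orthogonal projection P_Omega: keep the observed entries, zero out the rest."""
    return np.where(omega, X, 0.0)

# Toy example: a rank-1 matrix M with roughly half of its entries observed.
rng = np.random.default_rng(0)
M = np.outer(rng.standard_normal(5), rng.standard_normal(4))
omega = rng.random(M.shape) < 0.5          # boolean mask encoding the set Omega

X = M.copy()                               # any candidate completion
feasible = np.allclose(project(X, omega), project(M, omega))
print("P_Omega(X) == P_Omega(M):", feasible)
```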
Regarding the first problem, most existing algorithms [8]–[13] require
an expensive singular value decomposition in each iteration, which is
a major obstacle to applying them to large-scale data sets. Tackling
this size problem calls for parallel and distributed computing.
Some notable works have been done in this area. For example,
Divide-Factor-Combine (DFC) [14] randomly splits the original
problem into several subproblems, solves these subproblems using
existing algorithms in parallel, and then combines the results of each
subproblem to generate the final solution. Local low-rank matrix
approximation [15] also divides the original matrix into several
local submatrices and then combines them using a weighted sum.
SoftImpute-alternating least squares [16] combines nuclear norm
regularization with alternating least squares to form an
algorithm that outperforms both; it also exploits parallel
computing.
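As a rough illustration of the divide-factor-combine idea (not the actual DFC algorithm, whose combine step uses column projection), the sketch below splits the columns into random blocks, completes each block independently with a placeholder SVD-imputation solver, and stitches the results back together; the block loop is trivially parallelizable. All names and the base solver are our illustrative choices.

```python
import numpy as np

def base_complete(sub, mask, rank, n_iter=50):
    """Placeholder base solver: iterative SVD imputation on one submatrix,
    standing in for any off-the-shelf completion algorithm."""
    X = np.where(mask, sub, 0.0)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        X = np.where(mask, sub, low_rank)   # keep observed entries fixed
    return low_rank

def divide_factor_combine(M_obs, mask, rank, n_blocks=2, seed=0):
    """Split the columns into blocks, complete each block independently
    (the loop body could run in parallel), and stitch the blocks back."""
    rng = np.random.default_rng(seed)
    blocks = np.array_split(rng.permutation(M_obs.shape[1]), n_blocks)
    X = np.zeros_like(M_obs, dtype=float)
    for cols in blocks:
        X[:, cols] = base_complete(M_obs[:, cols], mask[:, cols], rank)
    return X
```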
The second problem mentioned above is that most algorithms are
designed and analyzed based on unrealistic assumptions. Most theo-
ries and algorithms assume that the observed entries are distributed
uniformly at random in the matrix [6], [7], [9], [10], [17]. Unfortu-
nately, most real-world data sets do not obey this assumption. The
famous Netflix Prize data set [18] is an example: 20% of the movies
receive 89.20% of the ratings and 20% of the users contribute 63.48%
of the ratings. The data set exhibits power-law distributed observa-
tions instead [19]. Power-law distributions are a common phenomenon
in the real world, arising, for example, in social networks, the World Wide Web,
and biological networks [20]. There are some remarkable works on
nonuniformly distributed matrix completion [19], [21]–[24]. Matrix
completion from information cascading [19] is the first work on
power-law distributed observations. Srebro and Salakhutdinov [21] proposed a
weighted trace-norm regularization (WTR) model, in which the trace
norm is weighted by the observation frequencies of the rows and
columns. They showed empirically that the weighted
trace norm can lead to significant performance improvement on
nonuniformly distributed data sets. Chen et al. [24] provided a
theoretical recovery guarantee for WTR.
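To make the weighting concrete, one possible reading of the WTR idea is sketched below: rows and columns are scaled by the square roots of their empirical observation frequencies before the nuclear (trace) norm is taken. The normalization and the function name are our illustrative choices, not the exact formulation in [21].

```python
import numpy as np

def weighted_trace_norm(X, mask):
    """Trace (nuclear) norm with rows/columns scaled by the square roots of
    their empirical observation frequencies, in the spirit of WTR [21]."""
    p = mask.sum(axis=1) / mask.sum()       # row observation frequencies
    q = mask.sum(axis=0) / mask.sum()       # column observation frequencies
    weighted = np.sqrt(p)[:, None] * X * np.sqrt(q)[None, :]
    return np.linalg.norm(weighted, ord="nuc")
```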
A. Our Contributions
In this brief, we propose a distributed matrix factorization
approach, entitled Sample-Factor-Expand-Combine (SFEC), for
large-scale matrix completion problems. SFEC selects a few of the most
important rows and columns of the original matrix to compose a
small submatrix, solves the selected submatrix using a base matrix
completion algorithm and retains the result in factored form, expands
factors of the submatrix into factors of the original matrix via a
linear regression model, and finally combines the expanded factors
to recover the full large-scale matrix.
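Read as high-level pseudocode, the four steps might be arranged as in the sketch below. The importance score (plain observation counts), the base solver (a few rounds of SVD imputation), and the per-row and per-column least-squares expand step are placeholders for illustration; they are not the TF-IDF-style scoring or the exact regression model developed in the brief.

```python
import numpy as np

def sfec_sketch(M_obs, mask, rank, n_rows, n_cols, n_iter=50):
    # --- Sample: pick the rows/columns with the most observations
    # (a crude stand-in for the TF-IDF-inspired importance score).
    top_rows = np.argsort(-mask.sum(axis=1))[:n_rows]
    top_cols = np.argsort(-mask.sum(axis=0))[:n_cols]

    # --- Factor: complete the dominating submatrix; a few rounds of SVD
    # imputation stand in for the base completion algorithm.
    sub = M_obs[np.ix_(top_rows, top_cols)]
    sub_mask = mask[np.ix_(top_rows, top_cols)]
    X = np.where(sub_mask, sub, 0.0)
    for _ in range(n_iter):
        U_f, s, Vt = np.linalg.svd(X, full_matrices=False)
        X = np.where(sub_mask, sub, (U_f[:, :rank] * s[:rank]) @ Vt[:rank])
    U_f, s, Vt = np.linalg.svd(X, full_matrices=False)
    U_s, V_s = U_f[:, :rank] * s[:rank], Vt[:rank].T     # factored form

    # --- Expand: regress every remaining row onto V_s (and every remaining
    # column onto U_s) using only its observed entries; each regression is
    # independent, so this step parallelizes naturally.
    U = np.zeros((M_obs.shape[0], rank))
    U[top_rows] = U_s
    for i in np.setdiff1d(np.arange(M_obs.shape[0]), top_rows):
        obs = mask[i, top_cols]
        if obs.any():
            U[i], *_ = np.linalg.lstsq(V_s[obs], M_obs[i, top_cols][obs], rcond=None)

    V = np.zeros((M_obs.shape[1], rank))
    V[top_cols] = V_s
    for j in np.setdiff1d(np.arange(M_obs.shape[1]), top_cols):
        obs = mask[top_rows, j]
        if obs.any():
            V[j], *_ = np.linalg.lstsq(U_s[obs], M_obs[top_rows, j][obs], rcond=None)

    # --- Combine: the expanded factors give the completed full matrix.
    return U @ V.T
```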
SFEC attempts to solve two important problems: massive size
and nonuniform observations. For the massive size problem, SFEC
divides the original massive-sized matrix into small submatrices that
can be recovered much more easily. Furthermore, the expand step of SFEC
can be executed in parallel for each row or column, which is naturally