A nonnegative matrix factorization 631
There are generally two types of semi-supervised clustering methods: constraint-based
methods [22] and metric-based methods [42]. In constraint-based approaches, the algorithm
makes use of different constraints such as class labels or pairwise constraints to guide the
search for an appropriate clustering. The pairwise constraints (side information) usually take
the form of 'must-link' and 'cannot-link' relationships between objects and are used to enhance
unsupervised clustering algorithms. A must-link constraint specifies that the two
instances in the relation should be assigned to the same cluster, whereas a cannot-link
constraint specifies that they should not.
These sets of constraints act as a guide for constrained
clustering algorithms, which attempt to find clusters in a data set satisfying the specified must-
link and cannot-link constraints. Classical semi-supervised clustering with labeled seeding
points is covered in [1,2]. As for semi-supervised clustering with labeled constraints, Wagstaff
et al. [39] enforced the constraints during cluster assignment in the clustering
process. Davidson et al. [12] studied the feasibility of clustering under different types of
constraints. Lu et al. [29] proposed a probabilistic clustering method based on Gaussian mixture
models (GMM) of the data distribution, which provided a flexible framework encompassing
several other semi-supervised clustering models as special cases.
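To illustrate the must-link/cannot-link semantics described above, the following sketch enforces constraints during cluster assignment in the spirit of Wagstaff et al.'s approach [39]. The function names and the greedy nearest-feasible-center rule are illustrative choices, not a published implementation:

```python
import numpy as np

def violates(i, c, assign, must_link, cannot_link):
    """Check whether assigning point i to cluster c breaks any constraint."""
    for (a, b) in must_link:
        j = b if a == i else a if b == i else None
        # a must-linked partner already placed in a different cluster
        if j is not None and assign[j] is not None and assign[j] != c:
            return True
    for (a, b) in cannot_link:
        j = b if a == i else a if b == i else None
        # a cannot-linked partner already placed in this cluster
        if j is not None and assign[j] == c:
            return True
    return False

def constrained_assign(X, centers, must_link, cannot_link):
    """One constrained assignment pass: each point takes the nearest
    center whose choice satisfies all pairwise constraints."""
    assign = [None] * len(X)
    for i, x in enumerate(X):
        order = np.argsort([np.linalg.norm(x - m) for m in centers])
        for c in order:
            if not violates(i, int(c), assign, must_link, cannot_link):
                assign[i] = int(c)
                break
        if assign[i] is None:
            raise ValueError(f"no feasible cluster for point {i}")
    return assign
```

A cannot-link constraint can thus push a point away from its nearest center: the point falls back to the closest center for which no constraint is violated, and the pass fails if no feasible cluster exists.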
In metric-based approaches, an existing clustering algorithm that uses a distance measure
is employed. However, the distance measure is first trained to satisfy the labels or constraints in
the supervised data [21,47]. In constraint-based approaches, the clustering algorithm itself is
modified so that the constraints bias the search for an appropriate clustering
of the data [8]. Notable works include the following: Hu et al. [19] integrated the constraints
into the K-means objective function, which was expressed as an equivalent trace formulation.
Chang et al. [7] proposed a metric learning method which performs nonlinear transformation
globally and linear transformation locally. The trend for current research in semi-supervised
clustering is to combine both of these approaches. Yin et al. [45] proposed an adaptive
semi-supervised clustering kernel method based on metric learning, which can deal with the
problems of manually tuning kernel parameters and of violated pairwise constraints.
In addition, Gu et al. [17] proposed a dual regularized co-clustering method based on
semi-nonnegative matrix tri-factorization, which considered the geometric structure of the data.
The co-clustering method was formulated as semi-nonnegative matrix tri-factorization with
two graph regularizers.
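To make the metric-based idea concrete, the sketch below learns a simple diagonal feature weighting from must-link pairs, down-weighting features that vary within pairs that should cluster together. This is a deliberately crude heuristic stand-in for the metric-learning methods cited above, not the actual algorithm of any cited work:

```python
import numpy as np

def learn_diag_metric(X, must_link, eps=1e-8):
    """Heuristic diagonal metric: weight each feature inversely to its
    average squared difference over must-link pairs, so features that
    disagree within must-linked pairs count less in the distance."""
    diffs = np.array([(X[a] - X[b]) ** 2 for a, b in must_link])
    w = 1.0 / (diffs.mean(axis=0) + eps)
    return w / w.sum()  # normalize the weights to sum to one

def metric_dist(x, y, w):
    """Weighted (diagonal Mahalanobis-style) distance under the learned metric."""
    return np.sqrt(np.sum(w * (x - y) ** 2))
```

Any existing distance-based clustering algorithm can then be run unchanged with `metric_dist` in place of the Euclidean distance, which is the defining trait of the metric-based family.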
Generally speaking, co-clustering algorithms deal with dyadic data, and many
co-clustering algorithms were initially developed for bioinformatics [23]. In terms of document
clustering, the similarity between words and the similarity between documents can be used
to co-cluster the term-document matrix. Thereafter, the co-occurrence frequencies can be
encoded in co-occurrence matrices, and then, matrix factorizations can be adopted to solve
the clustering problem [6,16]. Dhillon [14] proposed a co-clustering approach, modeling the
algorithm as an information-theoretic partition of the empirical joint probability distribution
of two sets of discrete random variables. Banerjee et al. [1] extended Dhillon’s method to a
general Bregman co-clustering and matrix factorization framework. Rege et al. [33] presented
a graph-theoretic approach to the problem of document-word co-clustering, which used the
isoperimetric co-clustering algorithm (ICA) for partitioning the document-word bipartite
graph. A Bayesian interpretation was also introduced [35,41] to formulate a two-sided
generative model for document and word co-occurrence. More recently, Song et al. [36]
proposed an approach combining the benefits of information-theoretic co-clustering and
constrained clustering.
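As a concrete instance of solving the co-clustering problem through matrix factorization of a term-document co-occurrence matrix, the sketch below applies the standard Lee-Seung multiplicative updates; the rank k, iteration count, and argmax cluster read-out are illustrative assumptions rather than the procedure of any particular cited method:

```python
import numpy as np

def nmf_cocluster(A, k, iters=200, seed=0, eps=1e-9):
    """Factorize a nonnegative term-document matrix A (terms x docs)
    as A ~ W H with Lee-Seung multiplicative updates; the argmax over
    rows of W clusters terms and over columns of H clusters documents."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = rng.random((m, k)) + eps
    H = rng.random((k, n)) + eps
    for _ in range(iters):
        # multiplicative updates keep W and H nonnegative throughout
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    term_clusters = W.argmax(axis=1)
    doc_clusters = H.argmax(axis=0)
    return term_clusters, doc_clusters
```

On a co-occurrence matrix with block structure, the two factors recover matching term and document groupings, which is exactly the co-clustering of the term-document matrix described above.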
NMF was first proposed in 1994 by Paatero et al. [32] and began to be studied extensively
after the publication of an article by Lee et al. [24] in 1999. The classical NMF algorithms