Overlapping Matrix Pattern Visualization: a Hypergraph Approach
Ruoming Jin Yang Xiang David Fuhry Feodor F. Dragan
Department of Computer Science
Kent State University, Kent, OH 44242
{jin,yxiang,dfuhry, dragan}@cs.kent.edu
Abstract
In this work, we study a visual data mining problem:
Given a set of discovered overlapping submatrices of inter-
est, how can we order the rows and columns of the data
matrix to best display these submatrices and their relation-
ships? We find this problem can be converted to the hyper-
graph ordering problem, which generalizes the traditional
minimal linear arrangement (or graph ordering) problem
and then we are able to prove the NP-hardness of this prob-
lem. We propose a novel iterative algorithm which uti-
lize the existing graph ordering algorithm to solve the op-
timal visualization problem. This algorithm can always
converge to a local minimum. The detailed experimental
evaluation using a set of publicly available transactional
datasets demonstrates the effectiveness and efficiency of the
proposed algorithm.
1 Introduction
Given a set of discovered submatrices of interests, how
can we order the rows and columns of the data matrix to
best display these submatrices and their relationships? For
example, the right matrix r eveals much richer information
about the four submatrix patterns than the left one. This is
a central problem emerging from the visualization require-
ment of a wide range of data mining tasks [8; 13; 24]:
Figure 1. An example of matrix pattern visu-
alization
Overlapping Bicluster Visualization: Gene-expression
data is commonly represented as a matrix, where each gene
corresponds to a row and each experimental condition cor-
responds to a column. Each element of this matrix rep-
resents the expression level of the gene under a specific
condition. Often, this matrix can be converted into a bi-
nary matrix by considering that each gene is either “on” or
“off”. The typical pattern discovery task, often referred to
as bi-clustering [14], would find “homogeneous” submatri-
ces, which are composed subsets of genes and conditions:
the genes are coregulated or coexpressed under the condi-
tions in the corresponding submatrices. Recently, there is a
lot of interest in discovering overlapping bi-clusters [8; 13].
Considering we have a list of most interesting submatrices
(bi-clusters) which overlap each other, how can we reorder
the rows and columns of the entire matrix so that we can vi-
sually inspect the relationship between these submatrices?
Transactional Data Visualization: The shopping-basket
data is one of the most studied data types in data mining.
Here each transaction corresponds to a row and each item
corresponds to column. The element of the binary matrix
records if the transaction purchased the item or not. Re-
cently, there is an increasing interest in summarizing the
data using a set of “dense” binary matrices [26; 5; 27]. In
a nutshell, the dense submatrix contains almost all 1s, and
a list of them can cover all the 1s in the entire matrix with
small false positive rate. Thus, the dense submatrix is also
closely related to the approximate frequent itemset pattern.
Given this, a similar problem occurs: how can we visualize
the entire matrix so that the dense submatrices of interests
and their relationships can be inspected?
Clearly, this task is very important in its own right and
complementary to some of the most critical and widely used
data mining tasks, such as bi-clustering and association rule
mining. However, it is not a typical data mining task, but be-
longs to the area of visual data mining or information visual-
ization [11; 4]. Visual data mining can be largely partitioned
into two categories: data visualization [11] and pattern visu-
alization [28; 25]. In the first category, the goal is to provide
the user an overview of the data. This is especially impor-
tant for the high dimensional data and other types of struc-
ture or text data, where a direct 2D view does not exist. In
this study, we will visualize high-dimensional transactional
data through its matrix representation. Note that matrix vi-
sualization has been a useful tool for visualizing relational
datasets, such as graphs [11]. In the second category, we are
interested in a visual representation of those already discov-
ered patterns or other mining models, such as association