Hypergraph Approach: Overlapping Matrix Pattern Visualization


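The paper "Overlapping Matrix Pattern Visualization: a Hypergraph Approach" (2008, computer science) addresses a challenging problem in data mining and visualization: how to effectively display a set of discovered overlapping submatrices and their relationships. The authors, Ruoming Jin, Yang Xiang, David Fuhry, and Feodor F. Dragan of the Department of Computer Science at Kent State University, propose a hypergraph-based solution. Given a set of overlapping submatrices of interest, they study how to reorder the rows and columns of the data matrix so that the submatrices and their relationships are displayed as well as possible. They show that this problem can be converted to a hypergraph ordering problem, a generalization of the traditional minimal linear arrangement (graph ordering) problem, and prove that it is NP-hard. To tackle it, they propose a novel iterative algorithm that uses existing graph ordering algorithms to solve the optimal visualization problem. Although the algorithm may converge only to a local optimum, a detailed experimental evaluation on publicly available transactional datasets shows that it is both effective and efficient.

1. Introduction: The paper opens by noting that, given a set of submatrices of interest, arranging them sensibly is essential for understanding and analyzing the data. Traditional matrix visualization methods may not handle the overlapping parts well, so new methods are needed to reveal hidden patterns and relationships.

2. Hypergraph ordering problem: The authors convert the original problem into a hypergraph ordering problem. A hypergraph is a generalized graph structure in which an edge can connect any number of nodes, which makes it possible to represent and process overlapping submatrices.

3. Algorithm design: The iterative algorithm optimizes the matrix layout step by step. Each iteration tries to improve the current row and column orders so that the submatrices and their relationships are displayed better. The algorithm is not guaranteed to find the global optimum, but it is guaranteed to converge to a local optimum.

4. Experiments and evaluation: The experiments apply the algorithm to real transactional datasets. They verify that it is effective at revealing submatrix patterns and improving readability, and that it is computationally efficient enough to be applied to large datasets.

5. Conclusion and future work: The conclusion may discuss the limitations of the algorithm and possible directions for future research, such as improving the algorithm to find better solutions or applying it to other data types and problems.

The paper is a significant contribution to data visualization and data mining research in computer science: it provides a novel way to process and display overlapping matrix patterns, which helps improve the depth and quality of data analysis.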

Overlapping Matrix Pattern Visualization: a Hypergraph Approach

Ruoming Jin, Yang Xiang, David Fuhry, Feodor F. Dragan

Department of Computer Science

Kent State University, Kent, OH 44242

{jin,yxiang,dfuhry,dragan}@cs.kent.edu

Abstract

In this work, we study a visual data mining problem: Given a set of discovered overlapping submatrices of interest, how can we order the rows and columns of the data matrix to best display these submatrices and their relationships? We find that this problem can be converted to the hypergraph ordering problem, which generalizes the traditional minimal linear arrangement (or graph ordering) problem, and we are then able to prove the NP-hardness of this problem. We propose a novel iterative algorithm which utilizes an existing graph ordering algorithm to solve the optimal visualization problem. This algorithm always converges to a local minimum. A detailed experimental evaluation using a set of publicly available transactional datasets demonstrates the effectiveness and efficiency of the proposed algorithm.
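To make the link to graph ordering concrete, the sketch below computes the minimal linear arrangement cost of a vertex ordering, that is, the sum of |pos(u) - pos(v)| over all edges, and finds an optimal ordering of a tiny graph by brute force. The function hyperedge_span_cost is only one plausible way to extend this cost to hyperedges (the distance between the first and last member of each hyperedge); it is an assumption made for illustration and not necessarily the objective defined in the paper. All function and variable names are hypothetical.

```python
from itertools import permutations


def linear_arrangement_cost(edges, order):
    """Sum of |pos(u) - pos(v)| over all graph edges for a given vertex order."""
    pos = {v: i for i, v in enumerate(order)}
    return sum(abs(pos[u] - pos[v]) for u, v in edges)


def hyperedge_span_cost(hyperedges, order):
    """ASSUMED generalization: each hyperedge costs the distance between
    its first and last member in the order (not taken from the paper)."""
    pos = {v: i for i, v in enumerate(order)}
    return sum(max(pos[v] for v in e) - min(pos[v] for v in e)
               for e in hyperedges)


def brute_force_min_arrangement(vertices, edges):
    """Try every permutation; feasible only for a handful of vertices."""
    return min(permutations(vertices),
               key=lambda order: linear_arrangement_cost(edges, order))


# A path a-b-c-d: placing the vertices in path order is optimal.
edges = [("a", "b"), ("b", "c"), ("c", "d")]
hyperedges = [{"a", "b", "c"}, {"c", "d"}]

print(linear_arrangement_cost(edges, "abcd"))   # 3
print(linear_arrangement_cost(edges, "acbd"))   # 5
print(hyperedge_span_cost(hyperedges, "abcd"))  # 3
print(hyperedge_span_cost(hyperedges, "adbc"))  # 5
best = brute_force_min_arrangement("abcd", edges)
print(best, linear_arrangement_cost(edges, best))
```

The brute-force search enumerates all n! orderings, so it is useful only as a reference on very small inputs; since the problem is NP-hard, larger inputs call for heuristics such as the iterative algorithm the abstract describes.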

1 Introduction

Given a set of discovered submatrices of interest, how can we order the rows and columns of the data matrix to best display these submatrices and their relationships? For example, in Figure 1 the right matrix reveals much richer information about the four submatrix patterns than the left one. This is a central problem emerging from the visualization requirements of a wide range of data mining tasks [8; 13; 24]:

Figure 1. An example of matrix pattern visualization

Overlapping Bicluster Visualization: Gene-expression data is commonly represented as a matrix, where each gene corresponds to a row and each experimental condition corresponds to a column. Each element of this matrix represents the expression level of the gene under a specific condition. Often, this matrix can be converted into a binary matrix by treating each gene as either “on” or “off”. The typical pattern discovery task, often referred to as bi-clustering [14], finds “homogeneous” submatrices, which are composed of subsets of genes and conditions: the genes are coregulated or coexpressed under the conditions in the corresponding submatrices. Recently, there has been a lot of interest in discovering overlapping bi-clusters [8; 13]. Given a list of the most interesting submatrices (bi-clusters), which overlap each other, how can we reorder the rows and columns of the entire matrix so that we can visually inspect the relationships between these submatrices?
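As a small illustration of the setting described above, the following sketch binarizes a toy expression matrix and permutes its rows and columns so that one bi-cluster appears as a contiguous block. The threshold rule, the toy values, and all function names are assumptions made for this example; the text does not specify how the “on”/“off” conversion is done.

```python
def binarize(matrix, threshold):
    """Mark an entry as "on" (1) when it reaches the threshold.
    The thresholding rule is an assumption for this example."""
    return [[1 if x >= threshold else 0 for x in row] for row in matrix]


def reorder(matrix, row_order, col_order):
    """Return the matrix with rows and columns permuted as given."""
    return [[matrix[r][c] for c in col_order] for r in row_order]


def is_contiguous(members, order):
    """True if all members occupy consecutive positions in the order."""
    positions = sorted(order.index(m) for m in members)
    return positions[-1] - positions[0] + 1 == len(positions)


# Toy data: 4 genes (rows) x 4 conditions (columns).
expression = [
    [0.1, 2.3, 0.2, 1.9],
    [0.3, 0.2, 1.7, 0.1],
    [0.2, 2.1, 0.0, 2.5],
    [1.8, 0.1, 0.4, 0.3],
]
binary = binarize(expression, threshold=1.0)

# A bi-cluster: genes {0, 2} are "on" under conditions {1, 3}.
row_order = [0, 2, 1, 3]
col_order = [1, 3, 0, 2]
for row in reorder(binary, row_order, col_order):
    print(row)  # the bi-cluster now fills the top-left 2x2 block

print(is_contiguous({0, 2}, [0, 1, 2, 3]))  # False in the original row order
print(is_contiguous({0, 2}, row_order))     # True after reordering
```

With a single bi-cluster a good ordering is trivial to find. The difficulty arises when several bi-clusters share rows or columns, because one ordering must then serve all of them at once.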

Transactional Data Visualization: Shopping-basket data is one of the most studied data types in data mining. Here each transaction corresponds to a row and each item corresponds to a column. Each element of the binary matrix records whether the transaction purchased the item or not. Recently, there has been increasing interest in summarizing the data using a set of “dense” binary matrices [26; 5; 27]. In a nutshell, a dense submatrix contains almost all 1s, and a list of them can cover all the 1s in the entire matrix with a small false positive rate. Thus, the dense submatrix is also closely related to the approximate frequent itemset pattern. Given this, a similar problem occurs: how can we visualize the entire matrix so that the dense submatrices of interest and their relationships can be inspected?
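The notions of density and coverage used above can be sketched as follows. The code computes the fraction of 1s inside a submatrix, and, for a list of possibly overlapping submatrices, the fraction of the matrix's 1s that they cover together with the fraction of covered cells that are actually 0. Treating the latter as the "false positive rate" is an assumption for this example, since the text does not give the exact definition; the toy matrix and all names are likewise hypothetical.

```python
def density(matrix, rows, cols):
    """Fraction of cells equal to 1 in the submatrix rows x cols."""
    cells = [matrix[r][c] for r in rows for c in cols]
    return sum(cells) / len(cells)


def cover_summary(matrix, submatrices):
    """Return (fraction of all 1s covered, fraction of covered cells that are 0).
    The second value is an ASSUMED reading of "false positive rate"."""
    covered = set()
    for rows, cols in submatrices:
        covered.update((r, c) for r in rows for c in cols)
    total_ones = sum(sum(row) for row in matrix)
    covered_ones = sum(matrix[r][c] for r, c in covered)
    false_positives = len(covered) - covered_ones
    return covered_ones / total_ones, false_positives / len(covered)


# Toy data: 4 transactions (rows) x 4 items (columns).
baskets = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 1, 1],
    [0, 0, 1, 1],
]

# Two submatrices that overlap in cell (1, 1).
sub_a = ([0, 1], [0, 1])
sub_b = ([1, 2, 3], [1, 2, 3])

print(density(baskets, *sub_a))  # 1.0 (all four cells are 1)
print(density(baskets, *sub_b))  # 7 of 9 cells are 1
coverage, fp_rate = cover_summary(baskets, [sub_a, sub_b])
print(coverage, fp_rate)         # all 1s covered; 2 of 12 covered cells are 0
```

Collecting covered cells in a set ensures that a cell shared by overlapping submatrices is counted only once.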

Clearly, this task is very important in its own right and complementary to some of the most critical and widely used data mining tasks, such as bi-clustering and association rule mining. However, it is not a typical data mining task, but belongs to the area of visual data mining or information visualization [11; 4]. Visual data mining can be largely partitioned into two categories: data visualization [11] and pattern visualization [28; 25]. In the first category, the goal is to provide the user with an overview of the data. This is especially important for high-dimensional data and other types of structured or text data, where a direct 2D view does not exist. In this study, we will visualize high-dimensional transactional data through its matrix representation. Note that matrix visualization has been a useful tool for visualizing relational datasets, such as graphs [11]. In the second category, we are interested in a visual representation of those already discovered patterns or other mining models, such as association
