Application of Joint Non-negative Matrix Factorization to the Meta-Analysis of Transcriptomics Data

Research paper

"jNMFMA 是一种用于转录组学数据联合非负矩阵分解的元分析方法，旨在解决大量异质性omics数据的整合分析挑战。这种方法由 Wang、Zheng 和 Zhao 在2014年发表在《Bioinformatics》期刊上，卷31，第4期，页码572-580。文章的doi是10.1093/bioinformatics/btu679，属于系统生物学领域。" 正文：在当前的生物信息学研究中，随着高通量测序技术的飞速发展，产生了海量的转录组学数据。这些数据的积累为挖掘新的生物学知识提供了前所未有的机会，但同时也带来了巨大的挑战，尤其是在如何有效整合和分析这些异质性数据方面。传统的方法通常对每个基因单独进行分析，忽略了基因之间的相互作用，这可能导致检测差异表达基因（DEG）时产生较高的假阳性率。 jNMFMA（Joint Non-negative Matrix Factorization for Meta-Analysis）是一种创新的解决方案，它利用非负矩阵分解（Non-negative Matrix Factorization, NMF）技术来处理这个问题。NMF是一种有监督的机器学习方法，能发现数据中的潜在结构，并且在分解过程中保留了非负属性，使得结果更容易解释。在jNMFMA中，NMF被用于对多个独立实验的转录组数据进行联合分析，从而揭示不同样本集间的共同模式和差异。 jNMFMA的主要优点在于考虑了基因之间的依赖结构，通过同时分析所有基因，可以更准确地识别出共同的生物学信号。这种方法能够捕获基因共表达网络，降低假阳性率，提高DEG鉴定的可靠性。此外，jNMFMA还可以识别出在不同条件或疾病状态下的共变基因簇，这对于理解基因调控网络和疾病机制具有重要意义。在应用jNMFMA时，首先需要将来自不同实验的数据转换成统一的矩阵形式，然后通过NMF算法找到两个低秩的非负矩阵，它们的乘积尽可能接近原始数据矩阵。这两个矩阵分别代表了基因表达的潜在主题和样本的表达模式。通过这种方式，jNMFMA可以揭示基因之间的共表达关系，以及在不同研究中的共享表达模式。此外，jNMFMA的另一个关键点是其适应性，可以处理不同平台、不同样本量以及不同实验条件下的数据，增强了跨实验分析的通用性和稳健性。这种方法的提出为生物医学研究提供了一种有力的工具，有助于在大规模多组学数据中发现潜在的生物学规律。 jNMFMA是一种强大的转录组学数据整合分析工具，通过非负矩阵分解揭示基因间的依赖关系，提高了差异表达基因检测的准确性。这一方法对于促进系统生物学和精准医学的研究具有重要的科学价值和实际应用前景。

BIOINFORMATICS, ORIGINAL PAPER

Vol. 31 no. 4 2015, pages 572–580

doi:10.1093/bioinformatics/btu679

Systems biology

Advance Access publication October 16, 2014

jNMFMA: a joint non-negative matrix factorization meta-analysis of transcriptomics data

Hong-Qiang Wang, Chun-Hou Zheng and Xing-Ming Zhao

Machine Intelligence and Computational Biology Lab, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei 230031, China,

College of Electrical Engineering and Automation, Anhui University, Hefei 230031, China and

Department of Computer Science, School of Electronics and Information Engineering, Tongji University, Shanghai 201804, China

Associate Editor: Jonathan Wren

ABSTRACT

Motivation: The tremendous amount of omics data being accumulated poses a pressing challenge: meta-analyzing the heterogeneous data to mine new biological knowledge. Most existing methods deal with each gene independently, which often results in high false positive rates in detecting differentially expressed genes (DEGs). To our knowledge, little or no effort has been devoted to methods that consider the dependence structures underlying transcriptomics data for DEG identification in the meta-analysis context.

Results: This article proposes a new meta-analysis method for the identification of DEGs based on joint non-negative matrix factorization (jNMFMA). We mathematically extend non-negative matrix factorization (NMF) to a joint version (jNMF), which is used to simultaneously decompose multiple transcriptomics data matrices into one common submatrix plus multiple individual submatrices. With jNMF, the dependence structures underlying transcriptomics data can be interrogated and utilized, while the high-dimensional transcriptomics data are mapped into a low-dimensional space spanned by metagenes that represent hidden biological signals. jNMFMA finally identifies DEGs as genes that are associated with differentially expressed metagenes. The ability to extract dependence structures makes jNMFMA more efficient and robust in identifying DEGs in the meta-analysis context. Furthermore, jNMFMA is flexible enough to identify DEGs that are consistent among various types of omics data, e.g. gene expression and DNA methylation. Experimental results on both simulation data and real-world cancer data demonstrate the effectiveness of jNMFMA and its superior performance over other popular approaches.

Availability and implementation: R code for jNMFMA is available for non-commercial use via http://micblab.iim.ac.cn/Download/.

Contact: hqwang@ustc.edu

Supplementary information: Supplementary data are available at Bioinformatics online.

Received on July 10, 2014; revised on September 26, 2014; accepted on October 10, 2014
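The joint decomposition described in the Results paragraph can be sketched as follows. This is only an illustration under stated assumptions, not the authors' R implementation: it assumes that all studies measure the same genes, that the common submatrix is a shared genes-by-metagenes matrix `W`, that each study k has its own coefficient matrix `H_k`, and that the fit minimizes the summed squared error with multiplicative updates. The excerpt gives none of these details, and the function name `joint_nmf` is hypothetical.

```python
import numpy as np

def joint_nmf(datasets, rank, n_iter=300, eps=1e-9, seed=0):
    """Jointly factor X_k ~ W @ H_k for every study k (assumed formulation).

    datasets: list of non-negative arrays, each genes x samples_k with the
    same gene order. Returns the shared W, the list of H_k and the loss trace.
    """
    rng = np.random.default_rng(seed)
    n_genes = datasets[0].shape[0]
    W = rng.random((n_genes, rank)) + eps
    Hs = [rng.random((rank, X.shape[1])) + eps for X in datasets]
    losses = []
    for _ in range(n_iter):
        # Study-specific factors: the ordinary NMF update, one study at a time.
        for k, X in enumerate(datasets):
            Hs[k] *= (W.T @ X) / (W.T @ W @ Hs[k] + eps)
        # Shared factor: numerator and denominator are pooled over all studies.
        numer = sum(X @ H.T for X, H in zip(datasets, Hs))
        denom = W @ sum(H @ H.T for H in Hs) + eps
        W *= numer / denom
        losses.append(sum(np.linalg.norm(X - W @ H) ** 2
                          for X, H in zip(datasets, Hs)))
    return W, Hs, losses

# Two synthetic "studies" sharing 40 genes, with 10 and 15 samples (not real data).
rng = np.random.default_rng(1)
W_true = rng.random((40, 3))
studies = [W_true @ rng.random((3, n)) for n in (10, 15)]
W, Hs, losses = joint_nmf(studies, rank=3)
print(f"loss after first / last iteration: {losses[0]:.3f} / {losses[-1]:.3f}")
```

Under these assumptions, the rows of each `H_k` play the role of metagene expression profiles in study k; testing those rows for differential expression between sample groups would be the next step the abstract describes, which is not attempted here.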

1 INTRODUCTION

As high-throughput biotechnologies have become routine tools in biological and biomedical research, tremendous amounts of omics data have been generated that provide great opportunities for deciphering the molecular mechanisms of cancer and other diseases (Jiao et al., 2014; Natrajan and Wilkerson, 2013; TCGA, 2012; Zhang et al., 2013). Two well-known public gene expression databases, GEO (www.ncbi.nlm.nih.gov/geo/) and ArrayExpress (www.ebi.ac.uk/arrayexpress/), have deposited transcriptomic data with more than a million assays from more than 30 000 studies. Another valuable resource, the TCGA project (http://cancergenome.nih.gov/), has released various types of omics data for nearly 10 000 cancer patient samples. Reusing this flood of transcriptomics data through meta-analysis can reduce sample bias and increase statistical power, and thus allow for an in-depth understanding of the pathology of cancer and other diseases at the molecular level (Rung and Brazma, 2013). However, the key issue of meta-analysis, i.e. capturing consistent but subtle patterns of gene activity across multiple transcriptomics datasets, still remains challenging both theoretically and practically.

Differentially expressed genes (DEGs) across studies could reflect subtle but consistent biological effects and might be false negatives in individual analyses (Xia et al., 2013). To identify DEGs efficiently, meta-analysis methods need to overcome a variety of biological and non-biological variations introduced by the distinct protocols and data platforms used in individual studies (Rung and Brazma, 2013). In terms of the information to be combined, existing meta-analysis methods can be categorized into three classes: P-value-based, effect size-based and rank-based, each of which deals with non-specific variations at a different level of the data. Among them, the P-value-based method is statistically the most intuitive and allows for standardization of topic-related associations from studies to the common scale of significance (Li and Tseng, 2011). However, the performance of P-value-based methods depends heavily on the underlying method used for P-value calculation in the individual analyses (Tseng et al., 2012). Compared with P-value-based methods, the effect size-based methods estimate and directly synthesize effect sizes across studies by using a t-statistic-like model. Because the effect size provides a direct measure of differential expression, effect size methods tend to be more efficient in detecting DEGs than the P-value-based methods (Hong and Breitling, 2008). There are two types of effect size models that can be used for meta-analysis of transcriptomics data: the fixed-effect model (FEM) and the random-effect model (REM), which differ in whether between-study variation is ignorable. Generally, effect size-based methods suffer from unreliable error estimates due to improper distribution assumption

*To whom correspondence should be addressed.
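As a concrete illustration of the P-value-based and effect size-based classes mentioned in the introduction, the sketch below uses textbook formulas for a single gene: Fisher's method for combining P-values, inverse-variance weighting for the FEM, and the DerSimonian-Laird estimate of between-study variance for the REM. These choices are assumptions made for illustration; the excerpt does not say which variants the cited methods use, and the function names are hypothetical.

```python
import math

def fisher_combined_p(p_values):
    """Fisher's method: -2 * sum(ln p) follows chi-square with 2k d.o.f. under H0."""
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # half of the test statistic
    # Closed-form chi-square survival function for even degrees of freedom (2k).
    return math.exp(-half_stat) * sum(half_stat ** i / math.factorial(i)
                                      for i in range(k))

def fixed_effect(effects, variances):
    """FEM: inverse-variance weighted mean; between-study variation is ignored."""
    weights = [1.0 / v for v in variances]
    mu = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    return mu, 1.0 / sum(weights)

def random_effect(effects, variances):
    """REM: adds a between-study variance tau2 (DerSimonian-Laird estimate)."""
    weights = [1.0 / v for v in variances]
    mu_fe, _ = fixed_effect(effects, variances)
    q = sum(w * (y - mu_fe) ** 2 for w, y in zip(weights, effects))
    c = sum(weights) - sum(w * w for w in weights) / sum(weights)
    tau2 = max(0.0, (q - (len(effects) - 1)) / c)
    re_weights = [1.0 / (v + tau2) for v in variances]
    mu = sum(w * y for w, y in zip(re_weights, effects)) / sum(re_weights)
    return mu, 1.0 / sum(re_weights), tau2

# One hypothetical gene measured in three studies (made-up numbers).
p_comb = fisher_combined_p([0.04, 0.10, 0.03])
mu_fe, var_fe = fixed_effect([0.8, 0.5, 1.1], [0.10, 0.20, 0.15])
mu_re, var_re, tau2 = random_effect([0.8, 0.5, 1.1], [0.10, 0.20, 0.15])
print(p_comb, mu_fe, mu_re, tau2)
```

When the estimated between-study variance `tau2` is zero, the two effect size models coincide; a positive `tau2` widens the variance of the pooled estimate, which is the practical difference between FEM and REM.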
