Gene Expression Data Analysis: Exploring Matched-Pairs Feature Selection Methods


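"This review examines matched-pairs feature selection methods for gene expression data analysis, with the aim of supporting the understanding and analysis of high-dimensional gene data. As gene expression data from microarrays, RNA sequencing (RNA-seq), and single-cell RNA-seq accumulate rapidly, feature selection has become essential. These methods reduce dimensionality and identify key signature genes, which facilitates downstream data analysis and interpretation."

Feature selection is a core problem in gene expression data analysis because it helps researchers pick out key biomarkers from very large datasets and so better understand gene function, disease mechanisms, or drug targets. Matched-pairs feature selection is an effective strategy for this problem, particularly in clinical trials and case-control studies, where samples often come in pairs, such as normal versus diseased tissue. The article may cover the following points:

1. **Advantages of the matched-pairs design**: Samples taken from the same individual under different conditions are paired, which reduces the influence of between-individual variation and raises statistical power, so the selected features are more accurate and more biologically meaningful.
2. **Categories of feature selection methods**: The article may discuss filter, wrapper, and embedded methods. Filter methods are fast but may discard important information; wrapper methods are thorough but computationally expensive; embedded methods combine the strengths of both, for example regularization techniques such as LASSO (ridge regression shrinks coefficients but does not by itself remove features).
3. **Applications of matched-pairs methods**: These methods are widely used in disease diagnosis, prognosis prediction, and treatment response analysis, for example comparing gene expression in cancer patients before and after surgery to find genes associated with disease progression.
4. **Evaluation metrics and algorithms**: The article may cover evaluation criteria such as the ROC curve, AUC, and F-score, along with optimization algorithms such as genetic algorithms and particle swarm optimization.
5. **Case studies and empirical analysis**: Using concrete gene expression datasets, the authors may show how to apply these methods for feature selection and analyze the biological meaning of the selected features.
6. **Open challenges and trends**: As new technologies such as single-cell sequencing emerge, high-dimensional data pose growing challenges. The article may discuss new directions and unresolved problems for matched-pairs feature selection, such as handling noisy and sparse data.
7. **Conclusion**: Finally, the authors may summarize the limitations of current research, point out future directions, and stress the importance and potential value of matched-pairs feature selection in gene expression data analysis.

The review offers insight into matched-pairs feature selection methods for gene expression data analysis and their applications in biomedical research, and is a useful reference for researchers.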

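The benefit of pairing described in the summary above can be illustrated with a minimal sketch of a filter-style ranking based on the paired t-statistic. This is not the review's own implementation: the function names (`paired_t_statistics`, `rank_genes`) and the toy expression values are hypothetical, chosen only to show how within-pair differences cancel baseline variation between individuals.

```python
import math
import statistics


def paired_t_statistics(case, control):
    """Return one paired t-statistic per gene.

    case, control: lists of samples, each sample a list of expression
    values (one per gene). Row i of `case` is matched to row i of `control`.
    """
    if len(case) != len(control) or len(case) < 2:
        raise ValueError("need at least two matched pairs")
    n_pairs = len(case)
    n_genes = len(case[0])
    scores = []
    for g in range(n_genes):
        # Within-pair differences remove each individual's baseline level.
        diffs = [case[i][g] - control[i][g] for i in range(n_pairs)]
        mean_d = statistics.mean(diffs)
        sd_d = statistics.stdev(diffs)
        if sd_d == 0:
            # Identical differences in every pair: no variance to scale by.
            scores.append(0.0 if mean_d == 0 else math.copysign(math.inf, mean_d))
        else:
            scores.append(mean_d / (sd_d / math.sqrt(n_pairs)))
    return scores


def rank_genes(case, control, top_k):
    """Return indices of the top_k genes by absolute paired t-statistic."""
    scores = paired_t_statistics(case, control)
    order = sorted(range(len(scores)), key=lambda g: abs(scores[g]), reverse=True)
    return order[:top_k]


# Hypothetical toy data: 4 matched pairs, 3 genes.
# Gene 0 varies widely between individuals but shifts by about +1 in every pair.
control = [
    [5.0, 2.0, 7.0],
    [8.0, 2.5, 3.0],
    [6.0, 1.5, 9.0],
    [9.0, 2.2, 4.0],
]
case = [
    [6.1, 2.1, 6.0],
    [9.0, 2.3, 4.5],
    [6.9, 1.6, 8.0],
    [10.2, 2.2, 5.0],
]

top = rank_genes(case, control, top_k=1)
print(top)
```

In this toy example gene 0 ranks first because its shift is consistent within every pair, even though its baseline level differs a lot between individuals; an unpaired comparison would have to detect the same shift against that larger between-individual spread.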


Mini Review

A Review of Matched-pairs Feature Selection Methods for Gene Expression Data Analysis

Sen Liang a, Anjun Ma b,c, Sen Yang a, Yan Wang a,⁎, Qin Ma b,c,⁎⁎

a Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, China

b Bioinformatics and Mathematical Biosciences Lab, Department of Agronomy, Horticulture and Plant Science, Department of Mathematics and Statistics, South Dakota State University, Brookings, SD 57007, USA

c BioSNTR, Brookings, SD, USA

Article info

Article history:
Received 18 September 2017
Received in revised form 14 February 2018
Accepted 19 February 2018
Available online 25 February 2018

Abstract

With the rapid accumulation of gene expression data from technologies such as microarray, RNA-sequencing (RNA-seq), and single-cell RNA-seq, dimensionality reduction and feature (signature gene) selection are necessary to make sense of such high-dimensional data. These computational methods significantly facilitate further data analysis and interpretation, such as gene function enrichment analysis, cancer biomarker detection, and drug target identification in precision medicine. Although numerous methods have been developed for feature selection in bioinformatics, it remains a challenge to choose the appropriate method for a specific problem and to obtain the most reasonable feature ranking. Meanwhile, paired gene expression data under the matched case-control design (MCCD) are becoming increasingly popular; such data have often been used in multi-omics integration studies and may increase feature selection efficiency by offsetting similar distributions of confounding features. However, feature selection methods specifically designed for paired data, termed matched-pairs feature selection (MPFS), have not matured in parallel. In this review, we compare the performance of 10 feature selection methods (eight MPFS methods and two traditional unpaired methods) on two real datasets by applying three classification methods, and analyze the algorithmic complexity of these methods by running their programs. This review aims to summarize and comprehensively present MPFS so that readers can easily understand its characteristics and find guidance in selecting appropriate methods for their analyses.

This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

Keywords:

Matched-pairs feature selection

Matched case-control design

Paired data

Gene expression

Contents

1. Introduction
2. Feature Selection Techniques
2.1. Unpaired Feature Selection Methods
2.2. A Different Perspective of Feature Selection By Data Properties
3. Matched-pairs Feature Selection
3.1. Problem Description
3.2. Methods Survey
3.2.1. Test Statistic for MPFS
3.2.2. Conditional Logistic Regression for MPFS
3.2.3. Boosting Strategy for MPFS
4. Experimental Validation
5. Discussion
6. Conclusion

Computational and Structural Biotechnology Journal 16 (2018) 88–97

⁎ Corresponding author.

⁎⁎ Correspondence to: Q. Ma, Bioinformatics and Mathematical Biosciences Lab, Department of Agronomy, Horticulture and Plant Science, Department of Mathematics and Statistics, South Dakota State University, Brookings, SD 57007, USA.

E-mail addresses: wy6868@jlu.edu.cn (Y. Wang), qin.ma@sdstate.edu (Q. Ma).

https://doi.org/10.1016/j.csbj.2018.02.005


journal homepage: www.elsevier.com/locate/csbj
