Applied Multivariate Statistics with R
Points required: 14 · 73 views · Updated 2023-05-31 · 7.24 MB · PDF
Applied Multivariate Statistics with R textbook, with a table of contents, selectable text, and clickable navigation
Statistics for Biology and Health
Statistics with R
School of Public Health
New Haven, CT, USA
ISSN 1431-8776 ISSN 2197-5671 (electronic)
Statistics for Biology and Health
ISBN 978-3-319-14092-6 ISBN 978-3-319-14093-3 (eBook)
Library of Congress Control Number: 2015942244
Springer Cham Heidelberg New York Dordrecht London
© Springer International Publishing Switzerland 2015
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of
the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation,
broadcasting, reproduction on microfilms or in any other physical way, and transmission or information
storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology
now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the
relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, express or implied, with respect to the material contained herein or for any
errors or omissions that may have been made.
Printed on acid-free paper
Springer International Publishing AG Switzerland is part of Springer Science+Business Media (www.
Please condense and tighten the following text (note: content may be cut, errors corrected, and sentence order adjusted; follow American English conventions and native-speaker usage; keep sentences clear and concise and terminology accurate; preserve the structure and do not stray from the paper's main content): Multivariate statistics are used to reduce the dimensionality of the data and thereby extract features; the resulting feature components are then used to detect geochemical anomalies. The geochemical data are first summarized with common descriptive statistics, and ANOVA, correlation analysis, regression analysis, cluster analysis, discriminant analysis, factor analysis, and similar methods are then applied on that basis. However, multivariate statistics has clear shortcomings. It builds models with mathematical-statistical methods and makes predictions once the functional relationships between variables have been found, but it tends to focus on whether models or conclusions drawn from small-scale data are credible, and its predictive performance is poor. In addition, geochemical data usually follow neither a normal nor a log-normal distribution, which violates the assumptions of multivariate statistical analysis.
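The workflow described in the passage above (reduce dimensionality, then screen the resulting component for anomalies) can be sketched roughly as follows. This is a minimal illustration, not the method of any particular paper. It uses the first principal component as the feature component. The three-element synthetic data, the function names, and the z-score threshold of 3 are all assumptions made for the example. The threshold presumes roughly symmetric scores, which, as the passage cautions, real geochemical data may not provide.

```python
import math
import random


def standardize(rows):
    """Center each column and scale it to unit sample standard deviation."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / (n - 1))
           for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(r, means, sds)] for r in rows]


def covariance(rows):
    """Sample covariance matrix of rows whose columns are already centered."""
    n, p = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(p)]
            for i in range(p)]


def first_component(cov, iterations=200):
    """Leading eigenvector of a symmetric matrix, found by power iteration."""
    p = len(cov)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def flag_anomalies(rows, threshold=3.0):
    """Return indices of samples whose first-component score is extreme."""
    z = standardize(rows)
    direction = first_component(covariance(z))
    scores = [sum(a * b for a, b in zip(r, direction)) for r in z]
    mean = sum(scores) / len(scores)
    sd = math.sqrt(sum((s - mean) ** 2 for s in scores) / (len(scores) - 1))
    return [i for i, s in enumerate(scores) if abs(s - mean) / sd > threshold]


# Synthetic concentrations for three hypothetical elements (assumed data,
# not real measurements): 40 correlated background samples.
random.seed(0)
samples = []
for _ in range(40):
    base = random.gauss(0, 1)
    samples.append([10 + 2 * base + random.gauss(0, 0.5),
                    20 + 3 * base + random.gauss(0, 0.5),
                    5 + base + random.gauss(0, 0.5)])
samples.append([30.0, 50.0, 15.0])  # one deliberately enriched sample

print(flag_anomalies(samples))
```

Here the deliberately enriched sample at index 40 is the one that stands out along the first component. In practice a library routine such as a PCA or factor-analysis implementation would replace the hand-written power iteration, and a robust or distribution-free cutoff may suit skewed data better than a z-score.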
Simplify the following: Existing protein function prediction methods integrate PPI networks with multivariate bioinformatics data to improve prediction performance. Combining multiple sources of information diversifies the interactions between proteins, and different interactions play different roles in function prediction. Simply combining multiple interactions between two proteins can effectively reduce false negatives and increase the number of predicted functions, but it can also increase the number of false positives, so the overall gain in prediction performance is not obvious. In this article, we have presented a framework for protein function prediction algorithms based on the PPI network and semantic similarity, extended with hierarchical protein functions. The framework relies on diverse clustering algorithms and on the calculation of protein semantic similarity. Classification and similarity calculations for protein pairs clustered by functional feature are more accurate and reliable, which allows protein function to be predicted at different functional levels across different proteomes and gives biological applications greater flexibility. The proposed method performs well on protein data from wine yeast cells, but how well it generalizes to other data remains to be verified. Until now, the function of most unknown proteins could be predicted only by calculating their similarity to homologues. Predictions for unknown proteins without homologues are unstable because such proteins are relatively isolated in the protein interaction network, which makes it difficult to find a highly similar protein. In the proposed framework, the number of features selected after clustering and the number of protein features selected for each functional layer have a significant impact on the accuracy of subsequent function predictions.
Therefore, feature selection should retain as many functional features as possible that are important for the whole interaction network. When an incorrect feature is selected, the prediction results will deviate from the actual function. Overall, the proposed method improves the accuracy of PPI-network-based protein function prediction to a certain extent and reduces the probability of false positive predictions.