Introduction to Applied Multivariate Analysis with R, original PDF by ...
The majority of data sets collected by researchers in all disciplines are multivariate, meaning that several measurements, observations, or recordings are taken on each of the units in the data set. These units might be human subjects, archaeological artifacts, countries, or a vast variety of other things. In a few cases, it may be sensible to isolate each variable and study it separately, but in most instances all the variables need to be examined simultaneously in order to fully grasp the structure and key features of the data. For this purpose, one or another method of multivariate analysis might be helpful, and it is with such methods that this book is largely concerned. Multivariate analysis includes methods both for describing and exploring such data and for making formal inferences about them. The aim of all the techniques is, in a general sense, to display or extract the signal in the data in the presence of noise and to find out what the data show us in the midst of their apparent chaos.
Please condense the following text (note: content may be cut, errors corrected, and sentences reordered; the result should follow American English conventions, read naturally to native speakers, use concise, clear sentences and accurate terminology, preserve the article's structure, and stay on the paper's main topic): Multivariate statistical methods are used to reduce the dimensionality of the data and thereby extract features, and the resulting feature components are then used to detect geochemical anomalies. Geochemical data must first be analyzed with common summary statistics; ANOVA, correlation analysis, regression analysis, cluster analysis, discriminant analysis, factor analysis, and similar methods are then performed on that basis. However, multivariate statistics has obvious shortcomings. It builds models with mathematical-statistical methods and makes predictions once the functional relationships between variables have been found, but it tends to focus on whether models or conclusions drawn from small-scale data are valid, and its predictive performance is poor. In addition, geochemical data usually follow neither a normal nor a log-normal distribution, which violates the assumptions underlying multivariate statistical methods.
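The workflow described above (reduce dimensionality, then use the leading components to flag anomalies) can be sketched in Python with NumPy. This is only an illustration under stated assumptions, not the method of any particular study: the data are synthetic, the log transform is an assumed preprocessing step, the function names pca_scores and flag_anomalies are hypothetical, and the median/MAD threshold with k = 3 is a choice made here. A robust threshold is used because, as the text notes, geochemical data often do not satisfy normality assumptions.

```python
import numpy as np


def pca_scores(X, n_components):
    """Project the rows of X onto the leading principal components.

    Returns the component scores and the fraction of variance
    explained by each retained component.
    """
    Xc = X - X.mean(axis=0)                 # center each variable
    Xs = Xc / Xc.std(axis=0, ddof=1)        # scale to unit variance
    # SVD of the standardized data; rows of Vt are the principal axes,
    # ordered by decreasing singular value.
    _, S, Vt = np.linalg.svd(Xs, full_matrices=False)
    scores = Xs @ Vt[:n_components].T
    explained = S**2 / np.sum(S**2)
    return scores, explained[:n_components]


def flag_anomalies(scores, k=3.0):
    """Flag samples whose first-component score lies more than k robust
    standard deviations from the median (assumed rule, not from the text)."""
    pc1 = scores[:, 0]
    med = np.median(pc1)
    mad = np.median(np.abs(pc1 - med))
    robust_sd = 1.4826 * mad                # MAD scaled to match a normal SD
    return np.abs(pc1 - med) > k * robust_sd


# Synthetic stand-in for geochemical data: 200 samples, 5 element
# concentrations. These values are generated, not measured.
rng = np.random.default_rng(0)
conc = rng.lognormal(mean=0.0, sigma=0.3, size=(200, 5))
conc[:5] *= 4.0                             # five artificially enriched samples

X = np.log(conc)                            # assumed log transform
scores, explained = pca_scores(X, n_components=2)
mask = flag_anomalies(scores)

print("variance explained:", explained)
print("flagged sample indices:", np.flatnonzero(mask))
```

The sign of a principal component is arbitrary, which is why the rule works on the absolute deviation from the median rather than on the raw score.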
Condense the following: Existing protein function prediction methods integrate PPI networks and multivariate bioinformatics data to improve prediction performance. Combining multivariate information diversifies the interactions between proteins, and different interactions play different roles in function prediction. Simply combining multiple interactions between two proteins can effectively reduce the effect of false negatives and increase the number of predicted functions, but it can also increase the number of false-positive functions, so the overall gain in prediction performance is not obvious. In this article, we have presented a framework for protein function prediction algorithms based on the PPI network and semantic similarity, with hierarchical protein functions added to them. The framework relies on diverse clustering algorithms and on the calculation of protein semantic similarity. Classification and similarity calculations for protein pairs clustered by functional features are more accurate and reliable, allowing protein function to be predicted at different functional levels across different proteomes and giving biological applications greater flexibility. The method proposed in this paper performs well on protein data from wine yeast cells, but how well it generalizes to other data remains to be verified. Until now, the function of most unknown proteins could be predicted only by calculating their similarity to homologues. The prediction results for unknown proteins without homologues are unstable because these proteins are relatively isolated in the protein interaction network, which makes it difficult to find a protein with high similarity. In the framework proposed in this article, the number of features selected after clustering and the number of protein features selected for each functional layer have a significant impact on the accuracy of subsequent function predictions.
Therefore, feature selection should retain as many functional features as possible that are important for the whole interaction network. When an incorrect feature is selected, the prediction results will differ somewhat from the actual function. Overall, the method proposed in this article improves, to a certain extent, the accuracy of PPI-network-based protein function prediction and reduces the probability of false-positive predictions.