FEATURE SELECTION FOR KNOWLEDGE DISCOVERY AND DATA MINING
Updated 2023-05-27 · 12.74 MB · PDF
KLUWER INTERNATIONAL SERIES
IN ENGINEERING AND COMPUTER SCIENCE
Library of Congress Cataloging-in-Publication Data
Feature selection for knowledge discovery and data mining / by
Huan Liu and Hiroshi Motoda.
p. cm. -- (Kluwer international series in engineering and
computer science ; 454)
Includes bibliographical references and index.
1. Database management. 2. Data mining.
Title. III. Series : Kluwer international series in engineering
and computer science ; SECS 454.
Copyright © 1998 by Springer Science+Business Media
Originally published by Kluwer Academic Publishers, New York in 1998. Second
All rights reserved. No part of this publication may be reproduced, stored in a
retrieval system or transmitted in any form or by any means, mechanical,
photocopying, recording, or otherwise, without the prior written permission of the publisher,
Springer Science+Business Media, LLC.
Printed on acid-free paper.
1.1.1 Features
2.2.1 Search directions
2.3 Selection Criteria
2.3.3 Distance measures
2.3.4 Dependence measures
Revise the following paper paragraph (titled "An efficient not-only-linear correlation coefficient based on machine learning", with keywords "correlation coefficient, nonlinear relationships, gene expression"). Keep most of the citations to other academic papers, reduce technical jargon to a minimum, make the grammar correct, fix spelling errors, and use active voice and clear sentence structure. Paragraph: New technologies have vastly improved data collection, generating a deluge of information across different disciplines. This large amount of data provides new opportunities to address unanswered scientific questions, provided we have efficient tools capable of identifying multiple types of underlying patterns. Correlation analysis is an essential statistical technique for discovering relationships between variables [1]. Correlation coefficients are often used in exploratory data mining techniques, such as clustering or community detection algorithms, to compute a similarity value between a pair of objects of interest such as genes [2] or disease-relevant lifestyle factors [3]. Correlation methods are also used in supervised tasks, for example, for feature selection to improve prediction accuracy [4,5]. The Pearson correlation coefficient is ubiquitously deployed across application domains and diverse scientific areas. Thus, even minor and significant improvements in these techniques could have enormous consequences in industry and research.
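As a small illustration of why a "not-only-linear" coefficient can matter, the sketch below computes the Pearson correlation coefficient for a linear and a quadratic relationship. The data and the `pearson` helper are made up for this example; it is not the method of the paper quoted above.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances (unnormalized; the 1/n factors cancel).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Illustrative data: x is symmetric around zero.
xs = [-3, -2, -1, 0, 1, 2, 3]
linear = [2 * x + 1 for x in xs]   # y depends linearly on x
quadratic = [x * x for x in xs]    # y depends on x, but not linearly

r_linear = pearson(xs, linear)
r_quadratic = pearson(xs, quadratic)
print("linear:", r_linear)
print("quadratic:", r_quadratic)
```

Here the quadratic series is fully determined by `xs`, yet its Pearson coefficient is zero because the dependence is symmetric rather than linear.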
The following is part of a draft paper to be submitted to the journal Minerals (published by MDPI). Following the journal's formatting requirements, condense the content below (note: you may cut content, correct errors, and reorder sentences; the result should follow American English, read naturally to native English speakers, use concise and clear sentences and accurate terminology, preserve the article's structure, and stay on the paper's main content): Rocks and ore components directly enter the soil and water system sediments through physical weathering and chemical weathering, and the geochemical anomalies originally present in the rocks further spread with the entry into the soil or directly into the water system, forming soil anomalies and water system sediment anomalies. Geochemical anomaly detection is essentially the detection of signal anomalies in geochemical data, which refers to finding out the anomalous distribution of chemical elements themselves and the anomalous distribution of multiple elements in combination through feature extraction and analysis processing of geochemical data in the study area, and reflecting the mineral distribution through the distribution of geochemical elements. Through the method of geochemical anomaly finding, the detected anomalies may contain information indicating specific minerals, which facilitates the rapid tracing of prospective areas and favorable areas for mineralization, identifies possible mineralizing elements and distribution characteristics in the work area, provides basic information for the strategic deployment of mineralization search, and provides good indications for later mineralization search.
Condense the content below (note: you may cut content, correct errors, and reorder sentences; the result should follow American English, read naturally to native English speakers, use concise and clear sentences and accurate terminology, preserve the article's structure, and stay on the paper's main content): The multivariate statistics are used to reduce the dimensionality of the data and thus achieve feature extraction, after which the feature components are applied to detect geochemical anomalies. Statistical analysis of geochemical data based on common statistical variables is required first, and then ANOVA, correlation analysis, regression analysis, cluster analysis, discriminant analysis, factor analysis, etc. are performed on this basis. However, multivariate statistics has obvious shortcomings. It uses mathematical-statistical methods to establish models, and after finding the functional relationship between variables, predictions can be made, but they tend to discuss whether the models or conclusions drawn on small-scale data are true and credible, and the prediction effect is poor. In addition, geochemical data usually do not satisfy normal distribution as well as log-normal distribution, which contradicts the premise of using methods of multivariate statistical analysis.
Condense the content below (note: you may cut content, correct errors, and reorder sentences; the result should follow American English, read naturally to native English speakers, use concise and clear sentences and accurate terminology, preserve the article's structure, and stay on the paper's main content): A widely adopted approach in unsupervised learning is autoencoder, which is based on backpropagation algorithms with optimization methods (e.g., gradient descent) that use the input data itself as supervision to guide the neural network to try to learn a mapping relationship to obtain a reconstructed output. It has the advantage of high generalization ability and does not require extensive data labeling. Autoencoder networks can be used to explore the potential features present in the data and are often used for unsupervised data feature extraction and analysis.
Condense the following passage: Existing protein function prediction methods integrate PPI networks and multivariate bioinformatics data to improve the performance of function prediction. By combining multivariate information, the interactions between proteins become diverse. Different interactions’ functions in functional prediction are various. Combining multiple interactions simply between two proteins can effectively reduce the effect of false negatives and increase the number of predicted functions, but it can also increase the number of false positive functions, which contribute to nonobvious enhancement for the overall functional prediction performance. In this article, we have presented a framework for protein function prediction algorithms based on PPI network and semantic similarity with the addition of protein hierarchical functions to them. The framework relies on diverse clustering algorithms and the calculation of protein semantic similarity for protein function prediction. Classification and similarity calculations for protein pairs clustered by the functional feature are more accurate and reliable, allowing for the prediction of protein function at different functional levels from different proteomes, and giving biological applications greater flexibility. The method proposed in this paper performs well on protein data from wine yeast cells, but how well it matches other data remains to be verified. Yet until now, most unknown proteins have only been able to predict protein function by calculating similarities to their homologues. The predictions result of those unknown proteins without homologues are unstable because they are relatively isolated in the protein interaction network. It is difficult to find one protein with high similarity. In the framework proposed in this article, the number of features selected after clustering and the number of protein features selected for each functional layer has a significant impact on the accuracy of subsequent functional predictions.
Therefore, when making feature selection, it is necessary to select as many functional features as possible that are important for the whole interaction network. When an incorrect feature was selected, the prediction results will be somewhat different from the actual function. Thus as a whole, the method proposed in this article has improved the accuracy of protein function prediction based on the PPI network method to a certain extent and reduces the probability of false positive prediction results.
Local region image retrieval is a special type of image retrieval that uses only a part of an image to find the source image that contains the same or similar region with the query from a database which has many difficulties, especially in large datasets. Retrieval efficiency and accuracy are the focus points. The common way is to preprocess the data by extracting, segmenting and classifying it into several categories and then training them which transforms to a classifying task when retrieval. In this paper, we proposed a novel framework to achieve competitive retrieval performance, by directly retrieving other than transforming to other types. Firstly, we use a Region Divider to split images into several regions with hierarchy region information; Secondly, a Feature Extractor is used to translate the regions into feature vectors; And then, a dimension reduction method is used to reduce the vector size. Finally, using a similarity metric method to calculate similarity and re-ranks. Experiments and tests were conducted on the selfmade test set based on Google Landmarks Dataset-v2 and PASCAL-VOC2012, compared with the open-source software VGG Image Search Engine with the latest version released on July 12, 2022. The proposed method shows the effectiveness and superiority. Furthermore, an additional experiment has been made, the analysis of accuracy correlated with the feature’s dimension has significance for future research and other work. Are there any errors in this passage?
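The last step of the pipeline described above (compute similarity, then rank) can be sketched as follows. The abstract does not name its similarity metric, so cosine similarity is an assumption here, and the feature vectors and image IDs are made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, database):
    """Return (image_id, score) pairs sorted from most to least similar."""
    scores = [(image_id, cosine_similarity(query, vec))
              for image_id, vec in database.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical, already dimension-reduced feature vectors for three images.
database = {
    "img_a": [1.0, 0.0, 0.0],
    "img_b": [0.6, 0.8, 0.0],
    "img_c": [0.0, 0.0, 1.0],
}
query = [0.9, 0.1, 0.0]  # hypothetical feature vector of the query region

ranking = rank_by_similarity(query, database)
for image_id, score in ranking:
    print(image_id, round(score, 3))
```

A real system would replace the exhaustive loop with an index once the database grows, but the ranking logic stays the same.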
Give me a summary of the paper "A Hybrid Data Analytics Framework with Sentiment Convergence and Multi-Feature Fusion for Stock Trend Prediction".
import tushare as ts

def get_code_data(code, start_date='20230101', end_date='20230201'):
    # Create the Tushare Pro client and fetch daily bars for one stock code.
    pro = ts.pro_api('16c7b894d325e9a3d19e679bd73b4cb027503a1d933449e663bfffc6')
    code_data = pro.daily(ts_code=code, start_date=start_date, end_date=end_date)
    # Rename the returned columns to Chinese display names.
    feature_name = ['股票代码', '交易日期', '开盘价', '最高价', '最低价', '收盘价',
                    '前一日收盘价', '换手率', '涨跌幅', '交易量', '交易金额']
    code_data.columns = feature_name
    return code_data

How can I print the local variable code_data in this code?
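One possible answer, as a minimal sketch: a local variable only exists while the function runs, so either print it inside the function before `return`, or print the value the function returns. To keep the example runnable without a Tushare account, `fetch_daily` below is a hypothetical stand-in for `pro.daily(...)`, and the `verbose` parameter is an addition that is not in the original code.

```python
def fetch_daily(code, start_date, end_date):
    """Hypothetical stand-in for pro.daily(...); returns made-up rows."""
    return [
        {"ts_code": code, "trade_date": start_date, "close": 10.0},
        {"ts_code": code, "trade_date": end_date, "close": 10.5},
    ]


def get_code_data(code, start_date="20230101", end_date="20230201", verbose=False):
    code_data = fetch_daily(code, start_date, end_date)
    if verbose:
        # Option 1: print the local variable while it is still in scope.
        print(code_data)
    return code_data


# Option 2: the function returns code_data, so print the returned value.
result = get_code_data("000001.SZ")
print(result)
```

With the original function, option 2 amounts to calling `print(get_code_data(...))` with a stock code of your choice.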