Data Mining Cheat Sheet: Key Functions Explained


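Working effectively in data mining depends on knowing the core tools. The "R Reference Card for Data Mining", compiled by Yanchang Zhao, lists the key R functions and packages for common data mining tasks, so that readers can look up the right tool without first working through the underlying theory. The portion summarized here covers association rules and frequent itemsets, followed by sequential patterns.

For frequent itemsets, the card first lists the APRIORI algorithm, a level-wise, breadth-first method that counts transactions to find frequent itemsets. In R, the `apriori()` function in the `arules` package implements it and mines associations from transaction data. The second algorithm is ECLAT, which uses equivalence classes, depth-first search and set intersection instead of counting. The `eclat()` function in the same package implements it.

Beyond these two algorithms, `arules` mines frequent itemsets, maximal frequent itemsets, closed frequent itemsets and association rules. A separate companion package, `arulesViz`, visualizes association rules.

The card then turns to sequential patterns, that is, patterns in data that depend on time or order. The `cspade()` function in the `arulesSequences` package mines frequent sequential patterns with the cSPADE algorithm, and the `seqefsub()` function in the `TraMineR` package searches for frequent subsequences. `arulesSequences` is an add-on for `arules` that handles and mines frequent sequences, while `TraMineR` mines, describes and visualizes sequences of states or events.

By mapping each data mining task to the R functions that carry it out, the card serves both beginners and experienced analysts who need a quick lookup.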

R Reference Card for Data Mining

by Yanchang Zhao, yanchang@rdatamining.com, January 3, 2013

The latest version is available at http://www.RDataMining.com. Click the link also for document R and Data Mining: Examples and Case Studies.

The package names are in parentheses.

Association Rules & Frequent Itemsets

APRIORI Algorithm

a level-wise, breadth-first algorithm which counts transactions to find frequent itemsets

apriori() mine associations with APRIORI algorithm (arules)

ECLAT Algorithm

employs equivalence classes, depth-first search and set intersection instead of counting

eclat() mine frequent itemsets with the Eclat algorithm (arules)
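
The difference between counting transactions (APRIORI) and intersecting transaction-ID sets (ECLAT) can be illustrated with a minimal base-R sketch. It is not the arules implementation: the toy transactions and the helper names `support_by_counting` and `support_by_intersection` are assumptions made for illustration, and only the support computation for a single candidate itemset is shown.

```r
# Toy transaction database (hypothetical data, for illustration only).
transactions <- list(
  c("bread", "milk"),
  c("bread", "diapers", "beer"),
  c("milk", "diapers", "beer"),
  c("bread", "milk", "diapers"),
  c("bread", "milk", "beer")
)

# APRIORI-style support: scan every transaction and count those that
# contain all items of the candidate itemset.
support_by_counting <- function(itemset, tx) {
  sum(vapply(tx, function(tx_items) all(itemset %in% tx_items), logical(1)))
}

# ECLAT-style support: build one transaction-ID list per item once,
# then intersect the lists of the items in the candidate itemset.
items <- sort(unique(unlist(transactions)))
tid_lists <- lapply(setNames(items, items), function(item) {
  which(vapply(transactions, function(tx_items) item %in% tx_items, logical(1)))
})
support_by_intersection <- function(itemset, tids) {
  length(Reduce(intersect, tids[itemset]))
}

candidate <- c("bread", "milk")
count_support <- support_by_counting(candidate, transactions)
tid_support <- support_by_intersection(candidate, tid_lists)
print(c(counting = count_support, intersection = tid_support))
```

Both approaches return the same support for a given itemset. They differ in how the work is organized: counting rescans the transactions for each candidate, while intersection reuses the per-item ID lists.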

Packages

arules mine frequent itemsets, maximal frequent itemsets, closed frequent itemsets and association rules. It includes two algorithms, Apriori and Eclat.

arulesViz visualizing association rules

Sequential Patterns

Functions

cspade() mining frequent sequential patterns with the cSPADE algorithm (arulesSequences)

seqefsub() searching for frequent subsequences (TraMineR)

Packages

arulesSequences add-on for arules to handle and mine frequent sequences

TraMineR mining, describing and visualizing sequences of states or events

Classification & Prediction

Decision Trees

ctree() conditional inference trees, recursive partitioning for continuous, censored, ordered, nominal and multivariate response variables in a conditional inference framework (party)

rpart() recursive partitioning and regression trees (rpart)

mob() model-based recursive partitioning, yielding a tree with fitted models associated with each terminal node (party)
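
As a minimal sketch of the decision-tree functions above, the following fits a classification tree with rpart() on the built-in iris data. The choice of dataset and the variable names are assumptions made for illustration; rpart is assumed to be installed, as it normally ships with R as a recommended package.

```r
library(rpart)  # recommended package, normally shipped with R

# Fit a classification tree that predicts Species from the four measurements.
tree_fit <- rpart(Species ~ ., data = iris, method = "class")

# Predict class labels for the training rows and tabulate them
# against the true species.
tree_pred <- predict(tree_fit, newdata = iris, type = "class")
confusion <- table(predicted = tree_pred, actual = iris$Species)
print(confusion)
```

The table is computed on the training data, so it shows how well the tree fits and is not an estimate of out-of-sample accuracy.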

Random Forest

cforest() random forest and bagging ensemble (party)

randomForest() random forest (randomForest)

varimp() variable importance (party)

importance() variable importance (randomForest)

Neural Networks

nnet() fit single-hidden-layer neural network (nnet)

Support Vector Machine (SVM)

svm() train a support vector machine for regression, classification or density estimation (e1071)

ksvm() support vector machines (kernlab)

Performance Evaluation

performance() provide various measures for evaluating performance of prediction and classification models (ROCR)

roc() build a ROC curve (pROC)

auc() compute the area under the ROC curve (pROC)

ROC() draw a ROC curve (DiagnosisMed)

PRcurve() precision-recall curves (DMwR)

CRchart() cumulative recall charts (DMwR)
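
To make concrete what "area under the ROC curve" measures, here is a base-R sketch that computes it from ranks (the Mann-Whitney formulation). It does not call pROC or ROCR. The helper name `auc_rank` and the toy scores and labels are assumptions made for illustration.

```r
# Hypothetical classifier scores and true 0/1 labels.
labels <- c(1, 1, 0, 1, 0, 0)
scores <- c(0.9, 0.8, 0.7, 0.6, 0.4, 0.2)

# AUC is the probability that a randomly chosen positive case scores
# higher than a randomly chosen negative case. It can be computed from
# the rank sum of the positive cases.
auc_rank <- function(scores, labels) {
  r <- rank(scores)  # average ranks, so ties are handled
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

auc_value <- auc_rank(scores, labels)
print(auc_value)
```

Here 8 of the 9 positive-negative pairs are ordered correctly, so the sketch yields 8/9.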

Packages

rpart recursive partitioning and regression trees

party recursive partitioning

randomForest classification and regression based on a forest of trees using random inputs

rpartOrdinal ordinal classification trees, deriving a classification tree when the response to be predicted is ordinal

rpart.plot plots rpart models with an enhanced version of plot.rpart in the rpart package

ROCR visualize the performance of scoring classifiers

pROC display and analyze ROC curves

Regression

Functions

lm() linear regression

glm() generalized linear regression

nls() non-linear regression

predict() predict with models

residuals() residuals, the difference between observed values and fitted values

gls() fit a linear model using generalized least squares (nlme)

gnls() fit a nonlinear model using generalized least squares (nlme)
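
A minimal sketch of how lm(), predict() and residuals() fit together, using the built-in cars data. The dataset and the new speed values are assumptions chosen for illustration.

```r
# Fit a simple linear model: stopping distance as a function of speed.
fit <- lm(dist ~ speed, data = cars)

# Predict stopping distance for new speeds (hypothetical values).
new_speeds <- data.frame(speed = c(10, 20))
predicted <- predict(fit, newdata = new_speeds)

# Residuals are the observed values minus the fitted values.
res <- residuals(fit)
manual_res <- cars$dist - fitted(fit)

print(coef(fit))
print(predicted)
```

The same predict() and residuals() calls work on models fitted with glm() and nls(), although predict() takes additional arguments for some model classes (for example `type = "response"` for glm).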

Packages

nlme linear and nonlinear mixed effects models

Clustering

Partitioning based Clustering

partition the data into k groups first and then try to improve the quality of clustering by moving objects from one group to another

kmeans() perform k-means clustering on a data matrix

kmeansCBI() interface function for kmeans (fpc)

kmeansruns() calls kmeans for the k-means clustering method and includes estimation of the number of clusters and finding an optimal solution from several starting points (fpc)

pam() the Partitioning Around Medoids (PAM) clustering method (cluster)

pamk() the Partitioning Around Medoids (PAM) clustering method with estimation of number of clusters (fpc)

cluster.optimal() search for the optimal k-clustering of the dataset (bayesclust)

clara() Clustering Large Applications (cluster)

fanny(x,k,...) compute a fuzzy clustering of the data into k clusters (cluster)

kcca() k-centroids clustering (flexclust)

ccfkms() clustering with Conjugate Convex Functions (cba)

apcluster() affinity propagation clustering for a given similarity matrix (apcluster)

apclusterK() affinity propagation clustering to get K clusters (apcluster)

cclust() Convex Clustering, incl. k-means and two other clustering algorithms (cclust)

KMeansSparseCluster() sparse k-means clustering (sparcl)

tclust(x,k,alpha,...) trimmed k-means with which a proportion alpha of observations may be trimmed (tclust)
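
As a minimal sketch of partitioning-based clustering, the following runs the base-R kmeans() function on the numeric columns of the built-in iris data. The dataset, the choice of k = 3 and the number of random starts are assumptions made for illustration.

```r
set.seed(42)  # k-means starts from random centers; fix the seed for repeatability

# Use the four numeric measurements as the data matrix.
x <- as.matrix(iris[, 1:4])

# Partition into k = 3 groups, keeping the best of 10 random starts.
km <- kmeans(x, centers = 3, nstart = 10)

# Compare the cluster assignment with the known species.
print(table(cluster = km$cluster, species = iris$Species))
print(km$centers)
```

The `nstart` argument reruns the algorithm from several random starting points and keeps the solution with the lowest within-cluster sum of squares, which is the same idea kmeansruns() automates.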

Hierarchical Clustering

a hierarchical decomposition of data in either bottom-up (agglomerative) or top-down (divisive) way

hclust(d, method, ...) hierarchical cluster analysis on a set of dissimilarities d using the method for agglomeration

birch() the BIRCH algorithm that clusters very large data with a CF-tree (birch)

pvclust() hierarchical clustering with p-values via multi-scale bootstrap resampling (pvclust)

agnes() agglomerative hierarchical clustering (cluster)

diana() divisive hierarchical clustering (cluster)

mona() divisive hierarchical clustering of a dataset with binary variables only (cluster)

rockCluster() cluster a data matrix using the Rock algorithm (cba)

proximus() cluster the rows of a logical matrix using the Proximus algorithm (cba)

isopam() Isopam clustering algorithm (isopam)

LLAhclust() hierarchical clustering based on likelihood linkage analysis (LLAhclust)

flashClust() optimal hierarchical clustering (flashClust)

hclust(), hclust.vector() fast hierarchical clustering; hclust() here replaces the function of the same name in stats (fastcluster)

cutreeDynamic(), cutreeHybrid() detection of clusters in hierarchical clustering dendrograms (dynamicTreeCut)

HierarchicalSparseCluster() hierarchical sparse clustering (sparcl)
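
A minimal sketch of agglomerative (bottom-up) clustering with the base-R hclust() function. The dataset, the "average" linkage method and the cut into three groups are assumptions made for illustration.

```r
# Standardize the four numeric iris measurements so that no single
# variable dominates the distances.
x <- scale(iris[, 1:4])

# hclust() works on a dissimilarity object, here Euclidean distances.
d <- dist(x)
hc <- hclust(d, method = "average")

# Cut the dendrogram into 3 groups and inspect the group sizes.
groups <- cutree(hc, k = 3)
print(table(groups))
```

Calling plot(hc) draws the dendrogram. cutree() can also cut at a given height with its `h` argument instead of a fixed number of groups.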

Model based Clustering

Mclust() model-based clustering (mclust)

HDDC() a model-based method for high dimensional data clustering (HDclassif)

fixmahal() Mahalanobis Fixed Point Clustering (fpc)

fixreg() Regression Fixed Point Clustering (fpc)

mergenormals() clustering by merging Gaussian mixture components (fpc)

Density based Clustering

generate clusters by connecting dense regions

dbscan(data,eps,MinPts,...) generate a density based clustering of arbitrary shapes, with neighborhood radius set as eps and density threshold as MinPts (fpc)

pdfCluster() clustering via kernel density estimation (pdfCluster)

Other Clustering Techniques

mixer() random graph clustering (mixer)

nncluster() fast clustering with restarted minimum spanning tree (nnclust)

orclus() ORCLUS subspace clustering (orclus)

Plotting Clustering Solutions

plotcluster() visualisation of a clustering or grouping in data (fpc)

bannerplot() a horizontal barplot visualizing a hierarchical clustering (cluster)
