R Reference Card for Data Mining
by Yanchang Zhao, yanchang@rdatamining.com, January 3, 2013
The latest version is available at http://www.RDataMining.com, which also
hosts the document R and Data Mining: Examples and Case Studies.
The package names are in parentheses.
Association Rules & Frequent Itemsets
APRIORI Algorithm
a level-wise, breadth-first algorithm which counts transactions to find frequent
itemsets
apriori() mine associations with the APRIORI algorithm (arules)
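The level-wise counting idea can be sketched in base R. This is a minimal illustration on hypothetical toy transactions with an assumed support threshold of 0.5; it uses a simplified candidate-generation step (all k-combinations of items that are still frequent) and is not the implementation used by apriori() in arules.

```r
# Toy transactions (hypothetical data, for illustration only)
transactions <- list(
  c("bread", "milk"),
  c("bread", "butter", "milk"),
  c("butter", "milk"),
  c("bread", "butter")
)
min_support <- 0.5  # assumed threshold

# Support = fraction of transactions that contain every item of the itemset
support <- function(itemset) {
  mean(vapply(transactions, function(tr) all(itemset %in% tr), logical(1)))
}

# Level 1: frequent single items
items <- sort(unique(unlist(transactions)))
level <- Filter(function(s) support(s) >= min_support, as.list(items))

frequent <- list()
k <- 1
while (length(level) > 0) {
  frequent <- c(frequent, level)   # keep everything frequent at this level
  k <- k + 1
  # Simplified candidate generation: k-combinations of still-frequent items
  pool <- sort(unique(unlist(level)))
  if (length(pool) < k) break
  candidates <- combn(pool, k, simplify = FALSE)
  # Counting step: keep only candidates that meet the support threshold
  level <- Filter(function(s) support(s) >= min_support, candidates)
}

frequent_labels <- vapply(frequent, paste, character(1), collapse = ",")
print(frequent_labels)
```

Each pass of the loop corresponds to one level (itemsets of size k), which is what makes the search breadth-first.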
ECLAT Algorithm
employs equivalence classes, depth-first search and set intersection instead of
counting
eclat() mine frequent itemsets with the Eclat algorithm (arules)
Packages
arules mine frequent itemsets, maximal frequent itemsets, closed frequent
itemsets and association rules. It includes two algorithms, Apriori and Eclat.
arulesViz visualizing association rules
Sequential Patterns
Functions
cspade() mining frequent sequential patterns with the cSPADE algorithm
(arulesSequences)
seqefsub() searching for frequent subsequences (TraMineR)
Packages
arulesSequences add-on for arules to handle and mine frequent sequences
TraMineR mining, describing and visualizing sequences of states or events
Classification & Prediction
Decision Trees
ctree() conditional inference trees, recursive partitioning for continuous,
censored, ordered, nominal and multivariate response variables in a
conditional inference framework (party)
rpart() recursive partitioning and regression trees (rpart)
mob() model-based recursive partitioning, yielding a tree with fitted models
associated with each terminal node (party)
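As a usage sketch, the following fits a classification tree with rpart() on the built-in iris data. The train/test split, the seed and the variable names are assumptions chosen for illustration only.

```r
library(rpart)

set.seed(42)                                  # assumed seed, for reproducibility
train_idx <- sample(nrow(iris), 100)          # hypothetical 100/50 split

# Fit a classification tree predicting Species from all other columns
fit <- rpart(Species ~ ., data = iris[train_idx, ], method = "class")

# Predict class labels for the held-out rows
pred <- predict(fit, newdata = iris[-train_idx, ], type = "class")

# Confusion matrix and overall accuracy on the held-out rows
confusion <- table(predicted = pred, actual = iris$Species[-train_idx])
print(confusion)
accuracy <- sum(diag(confusion)) / sum(confusion)
print(accuracy)
```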
Random Forest
cforest() random forest and bagging ensemble (party)
randomForest() random forest (randomForest)
varimp() variable importance (party)
importance() variable importance (randomForest)
Neural Networks
nnet() fit single-hidden-layer neural network (nnet)
Support Vector Machine (SVM)
svm() train a support vector machine for regression, classification or
density estimation (e1071)
ksvm() support vector machines (kernlab)
Performance Evaluation
performance() provide various measures for evaluating performance of
prediction and classification models (ROCR)
roc() build a ROC curve (pROC)
auc() compute the area under the ROC curve (pROC)
ROC() draw a ROC curve (DiagnosisMed)
PRcurve() precision-recall curves (DMwR)
CRchart() cumulative recall charts (DMwR)
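To make the quantity behind auc() concrete, the area under the ROC curve can be computed in base R from ranks (the Mann-Whitney formulation). This is a sketch on hypothetical scores and labels, not the code used by pROC or ROCR.

```r
# Hypothetical classifier scores and true labels (1 = positive, 0 = negative)
labels <- c(1, 1, 0, 1, 0, 0, 1, 0)
scores <- c(0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.1)

r     <- rank(scores)            # ranks of all scores, ascending
n_pos <- sum(labels == 1)
n_neg <- sum(labels == 0)

# AUC = probability that a random positive is scored above a random negative
auc_value <- (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
print(auc_value)
```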
Packages
rpart recursive partitioning and regression trees
party recursive partitioning
randomForest classification and regression based on a forest of trees using
random inputs
rpartOrdinal ordinal classification trees, deriving a classification tree when the
response to be predicted is ordinal
rpart.plot plots rpart models with an enhanced version of plot.rpart in the
rpart package
ROCR visualize the performance of scoring classifiers
pROC display and analyze ROC curves
Regression
Functions
lm() linear regression
glm() generalized linear regression
nls() non-linear regression
predict() predict with models
residuals() residuals, the difference between observed values and fitted
values
gls() fit a linear model using generalized least squares (nlme)
gnls() fit a nonlinear model using generalized least squares (nlme)
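A short sketch of how lm(), glm(), predict() and residuals() fit together, using the built-in cars and mtcars datasets. The choice of datasets, formulas and new data values is an assumption made for illustration.

```r
# Linear regression: stopping distance as a function of speed
fit_lm <- lm(dist ~ speed, data = cars)
print(coef(fit_lm))

# Predict for new (hypothetical) speeds
new_speeds <- data.frame(speed = c(10, 20))
pred <- predict(fit_lm, newdata = new_speeds)

# Residuals are observed values minus fitted values
res <- residuals(fit_lm)
print(all.equal(unname(res), cars$dist - unname(fitted(fit_lm))))

# Generalized linear model: logistic regression of transmission type on weight
fit_glm <- glm(am ~ wt, data = mtcars, family = binomial)

# type = "response" returns a probability rather than the linear predictor
prob <- predict(fit_glm, newdata = data.frame(wt = 3), type = "response")
print(prob)
```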
Packages
nlme linear and nonlinear mixed effects models
Clustering
Partitioning based Clustering
partition the data into k groups first and then try to improve the quality of
clustering by moving objects from one group to another
kmeans() perform k-means clustering on a data matrix
kmeansCBI() interface function for kmeans (fpc)
kmeansruns() call kmeans() for k-means clustering, with estimation of the
number of clusters and selection of the best solution from several
starting points (fpc)
pam() the Partitioning Around Medoids (PAM) clustering method (cluster)
pamk() the Partitioning Around Medoids (PAM) clustering method with
estimation of the number of clusters (fpc)
cluster.optimal() search for the optimal k-clustering of the dataset
(bayesclust)
clara() Clustering Large Applications (cluster)
fanny(x,k,...) compute a fuzzy clustering of the data into k clusters
(cluster)
kcca() k-centroids clustering (flexclust)
ccfkms() clustering with Conjugate Convex Functions (cba)
apcluster() affinity propagation clustering for a given similarity matrix
(apcluster)
apclusterK() affinity propagation clustering to get K clusters (apcluster)
cclust() Convex Clustering, incl. k-means and two other clustering
algorithms (cclust)
KMeansSparseCluster() sparse k-means clustering (sparcl)
tclust(x,k,alpha,...) trimmed k-means, in which a proportion
alpha of observations may be trimmed (tclust)
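A minimal partitioning example with kmeans() from base R on the built-in iris measurements. The choice of k = 3, the scaling step and the number of random starts are assumptions for illustration.

```r
set.seed(1)                          # assumed seed; k-means starts are random
x <- scale(iris[, 1:4])              # standardize the four numeric columns

# Partition into k = 3 groups, keeping the best of 25 random starts
km <- kmeans(x, centers = 3, nstart = 25)

# Compare cluster assignments with the known species labels
print(table(cluster = km$cluster, species = iris$Species))

# Total within-cluster sum of squares: the quantity k-means tries to reduce
print(km$tot.withinss)
```

Using several starts (nstart) matters because the result of moving objects between groups depends on the initial partition.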
Hierarchical Clustering
a hierarchical decomposition of data in either a bottom-up (agglomerative) or
top-down (divisive) way
hclust(d, method, ...) hierarchical cluster analysis on a set of
dissimilarities d using the given agglomeration method
birch() the BIRCH algorithm that clusters very large data with a CF-tree
(birch)
pvclust() hierarchical clustering with p-values via multi-scale bootstrap
resampling (pvclust)
agnes() agglomerative hierarchical clustering (cluster)
diana() divisive hierarchical clustering (cluster)
mona() divisive hierarchical clustering of a dataset with binary variables only
(cluster)
rockCluster() cluster a data matrix using the Rock algorithm (cba)
proximus() cluster the rows of a logical matrix using the Proximus algorithm
(cba)
isopam() Isopam clustering algorithm (isopam)
LLAhclust() hierarchical clustering based on likelihood linkage analysis
(LLAhclust)
flashClust() optimal hierarchical clustering (flashClust)
hclust(), hclust.vector() fast hierarchical clustering (fastcluster)
cutreeDynamic(), cutreeHybrid() detection of clusters in hierarchical
clustering dendrograms (dynamicTreeCut)
HierarchicalSparseCluster() hierarchical sparse clustering (sparcl)
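A bottom-up (agglomerative) example with hclust() from base R on the built-in USArrests data. The average-linkage method and the cut into four groups are assumptions for illustration.

```r
# Dissimilarities between states, computed on standardized variables
d <- dist(scale(USArrests))

# Agglomerative clustering with average linkage
hc <- hclust(d, method = "average")

# Cut the dendrogram into k = 4 groups
groups <- cutree(hc, k = 4)
print(table(groups))

# plot(hc) would draw the dendrogram
```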
Model based Clustering
Mclust() model-based clustering (mclust)
hddc() a model-based method for high dimensional data clustering
(HDclassif)
fixmahal() Mahalanobis Fixed Point Clustering (fpc)
fixreg() Regression Fixed Point Clustering (fpc)
mergenormals() clustering by merging Gaussian mixture components (fpc)
Density based Clustering
generate clusters by connecting dense regions
dbscan(data,eps,MinPts,...) generate a density based clustering of
arbitrary shapes, with neighborhood radius set as eps and density
threshold as MinPts (fpc)
pdfCluster() clustering via kernel density estimation (pdfCluster)
Other Clustering Techniques
mixer() random graph clustering (mixer)
nncluster() fast clustering with restarted minimum spanning tree (nnclust)
orclus() ORCLUS subspace clustering (orclus)
Plotting Clustering Solutions
plotcluster() visualisation of a clustering or grouping in data (fpc)
bannerplot() a horizontal barplot visualizing a hierarchical clustering
(cluster)