Vol 9 (3) | January 2016 | www.indjst.org
Indian Journal of Science and Technology
T. Sajana, C. M. Sheela Rani and K. V. Narayana
further clusters until the desired number of clusters is formed.
BIRCH, CURE, ROCK, Chameleon, Echidna, Wards,
SNN, GRIDCLUST and CACTUS are some of the hierarchical
clustering algorithms, in which non-convex and arbitrary
hyper-rectangular clusters are formed.
3.3 Density based Clustering algorithms:
Data objects are categorized into core points, border
points and noise points. All the core points are connected
together based on their densities to form clusters. Arbitrary
shaped clusters are formed by various clustering
algorithms such as DBSCAN, OPTICS, DBCLASD,
GDBSCAN, DENCLUE and SUBCLU.
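
The categorization into core, border and noise points can be illustrated with a minimal sketch in the style of DBSCAN. The function name classify_points, the parameters eps (neighborhood radius) and min_pts (density threshold), and the sample points are assumptions made for illustration only; they are not taken from any of the algorithms' reference implementations.

```python
from math import dist

def classify_points(points, eps, min_pts):
    """Label each point as 'core', 'border' or 'noise' (DBSCAN-style sketch)."""
    # Neighborhood of each point: indices of all points within eps (itself included).
    neighbors = [
        [j for j, q in enumerate(points) if dist(p, q) <= eps]
        for p in points
    ]
    # A core point has at least min_pts points in its neighborhood.
    core = {i for i, nb in enumerate(neighbors) if len(nb) >= min_pts}
    labels = []
    for i, nb in enumerate(neighbors):
        if i in core:
            labels.append("core")
        elif any(j in core for j in nb):
            # Not dense itself, but within eps of a core point.
            labels.append("border")
        else:
            labels.append("noise")
    return labels

# Hypothetical sample data: a small dense group, one point on its edge, one far away.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (2.0, 1.0), (8.0, 8.0)]
labels = classify_points(points, eps=1.0, min_pts=3)
print(labels)
```

With these assumed values the four points of the unit square are core points, (2.0, 1.0) is a border point and (8.0, 8.0) is noise. The neighborhood search here is a naive all-pairs pass; practical implementations typically use a spatial index.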
3.4 Grid based Clustering algorithms:
A grid based algorithm partitions the data set into a
number of cells to form a grid structure. Clusters are
formed based on the grid structure. To form clusters,
grid algorithms use subspace and hierarchical clustering
techniques. Examples are STING, CLIQUE, WaveCluster, BANG,
OptiGrid, MAFIA, ENCLUS, PROCLUS, ORCLUS, FC
and STIRR. Compared to the other clustering algorithms, grid
algorithms process data very fast. Uniform
grid algorithms are not sufficient to form the desired clusters.
To overcome this problem, adaptive grid algorithms
such as MAFIA and AMR are used, in which arbitrary shaped
clusters are formed by the grid cells.
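
The idea of partitioning the data into cells and forming clusters from the grid structure can be sketched as follows. This is a simple uniform-grid illustration, not an implementation of STING, CLIQUE, MAFIA or any other named algorithm; the function name grid_clusters, the parameters cell_size and min_count, and the sample points are assumptions.

```python
from collections import defaultdict
from itertools import product

def grid_clusters(points, cell_size, min_count):
    """Group points by merging adjacent dense cells of a uniform grid."""
    # Assign each point to the grid cell that contains it.
    cells = defaultdict(list)
    for p in points:
        key = tuple(int(c // cell_size) for c in p)
        cells[key].append(p)
    # A cell is dense if it holds at least min_count points.
    dense = {k for k, pts in cells.items() if len(pts) >= min_count}
    clusters, seen = [], set()
    for start in dense:
        if start in seen:
            continue
        seen.add(start)
        stack, members = [start], []
        while stack:
            cell = stack.pop()
            members.extend(cells[cell])
            # Visit every neighboring cell (including diagonal neighbors).
            for offset in product((-1, 0, 1), repeat=len(cell)):
                nb = tuple(c + o for c, o in zip(cell, offset))
                if nb in dense and nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        clusters.append(sorted(members))
    return sorted(clusters)

# Hypothetical sample data: two dense regions and one isolated point.
points = [(0.1, 0.2), (0.4, 0.3), (1.2, 0.5), (1.4, 0.1),
          (5.1, 5.2), (5.3, 5.4), (9.0, 0.5)]
clusters = grid_clusters(points, cell_size=1.0, min_count=2)
print(clusters)
```

Each point is touched once when it is placed into its cell, and the remaining work is done on cells rather than on individual points, which is one reason grid methods are fast. The fixed cell_size also shows the limitation of a uniform grid: a poor choice splits or merges clusters, which adaptive grids try to avoid.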
3.5 Model based Clustering algorithms:
Sets of data points are connected together based on
various strategies such as statistical methods, conceptual
methods and robust clustering methods. There are two
approaches for model based algorithms: one is the neural
network approach and the other is the statistical approach.
Algorithms such as EM, COBWEB, CLASSIT, SOM and
SLINK are well known model based clustering algorithms.
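
As an illustration of the statistical approach, the following is a minimal sketch of EM for a mixture of two one-dimensional Gaussians. The function names, the initialization (smallest and largest value as starting means), the fixed iteration count, the variance floor min_var and the sample data are all assumptions chosen to keep the example short; they do not reproduce any particular published implementation.

```python
from math import exp, pi, sqrt

def gaussian_pdf(x, mean, var):
    """Density of a one-dimensional Gaussian at x."""
    return exp(-((x - mean) ** 2) / (2 * var)) / sqrt(2 * pi * var)

def em_two_gaussians(data, iterations=50, min_var=1e-6):
    """Fit a two-component 1-D Gaussian mixture with plain EM (sketch)."""
    n = len(data)
    overall_mean = sum(data) / n
    overall_var = max(min_var, sum((x - overall_mean) ** 2 for x in data) / n)
    # Assumed initialization: extremes as means, shared variance, equal weights.
    means = [min(data), max(data)]
    variances = [overall_var, overall_var]
    weights = [0.5, 0.5]
    for _ in range(iterations):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in data:
            p = [weights[k] * gaussian_pdf(x, means[k], variances[k])
                 for k in range(2)]
            total = sum(p)
            resp.append([pk / total for pk in p])
        # M-step: re-estimate the parameters from the weighted data.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var_k = sum(r[k] * (x - means[k]) ** 2
                        for r, x in zip(resp, data)) / nk
            variances[k] = max(min_var, var_k)  # floor avoids a collapsing component
            weights[k] = nk / n
    return means, variances, weights

# Hypothetical sample data drawn around two well separated centers.
data = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.2, 4.8, 5.1, 4.9]
means, variances, weights = em_two_gaussians(data)
print(means, weights)
```

Each point receives a soft membership in both components, and the cluster of a point can be read off as the component with the larger responsibility.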
4. Comparison of Clustering Algorithms
The clustering methods discussed above mine
data from Big Data. Every algorithm has its own
strengths and weaknesses. This paper compares various
clustering algorithms with respect to the 4 V’s of Big Data
characteristics.
4.1 Volume:
Refers to the ability of an algorithm to deal with large
amounts of data. With respect to the Volume property,
the criteria to be considered for clustering algorithms are
(a) size of the data set, (b) high dimensionality and
(c) handling outliers.
• Size of the data set: A data set is a collection of attributes.
The attributes are categorical, nominal, ordinal,
interval and ratio. Many clustering algorithms support
numerical and categorical data.
• High dimensionality: As the size of the data set
increases, the number of dimensions also increases.
This is the curse of dimensionality.
• Outliers: Many clustering algorithms are capable of
handling outliers. Noise data cannot be grouped
with the other data points.
4.2 Variety:
Refers to the ability of a clustering algorithm to handle
different types of data sets such as numerical, categorical,
nominal and ordinal. The criteria for clustering algorithms
are (a) type of data set and (b) cluster shape.
• Type of data set: The size of the data set may be small or big,
but many of the clustering algorithms support large
data sets for big data mining.
• Cluster shape: The shape of the cluster formed
depends on the data set size and type.
4.3 Velocity:
Refers to the computations of a clustering algorithm,
based on the criterion (a) running time complexity of the
clustering algorithm.
• Time complexity: If an algorithm requires fewer
computations, it has a lower run time.
The run time of the algorithms is calculated
using Big O notation.
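
As a small illustration of how Big O notation describes run time growth, the sketch below counts the point pairs that a naive all-pairs distance pass would examine. The function name count_pairwise_comparisons and the chosen sizes are assumptions for illustration; the count for n points is n(n-1)/2, which is O(n^2).

```python
from itertools import combinations

def count_pairwise_comparisons(n):
    """Count the point pairs a naive all-pairs distance pass would examine."""
    return sum(1 for _ in combinations(range(n), 2))

# Doubling the number of points roughly quadruples the number of pairs.
for n in (100, 200, 400):
    print(n, count_pairwise_comparisons(n))
```

An algorithm whose dominant step is such an all-pairs pass therefore slows down much faster on large data sets than one whose work grows linearly with the number of points.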
4.4 Value:
For a clustering algorithm to process the data
accurately and to form clusters with less computation,
the input parameters play a key role. The values for various
clustering algorithms are given in Table 1.