A Survey On Partitioning and Hierarchical Based Data Mining Clustering Techniques

1M.Kiruthika, 2Dr.S.Sukumaran
1Ph.D Research Scholar, 2Associate Professor
Department of Computer Science,
Erode Arts and Science College (Autonomous),
Erode, India.

International Journal of Applied Engineering Research ISSN 0973-4562 Volume 13, Number 24 (2018) pp. 16787-16791
© Research India Publications. http://www.ripublication.com
The clustering objectives are:
To reveal natural groupings.
To generate hypotheses about the data.
To find a dependable and convincing organization of the data.

Clustering groups objects based on the information found in the data describing the objects or their relationships. The objective of clustering is that the objects in a group are similar to one another and different from the objects in other groups [15]. A high-quality cluster has high intra-class similarity and low inter-class similarity. The clustering procedure is performed in four basic steps.

Feature selection or extraction
Feature selection chooses distinguishing features from a set of candidates, while feature extraction applies transformations to the data to generate useful and novel features from the original ones. Both are crucial to the effectiveness of clustering applications.

Partitioning methods optimize a certain criterion function. The criterion function may emphasize the local or global arrangement of the data, and its optimization is an iterative process. The main types of partitioning clustering algorithms are:

A. K-Means
The K-Means algorithm is the most popular clustering algorithm. It iteratively computes the clusters and their centroids. It is a top-down approach to clustering, used to build and analyze clusters from 'n' data points, which are separated into 'K' clusters according to a similarity-measure criterion. The result generated by the algorithm generally depends on the initial cluster centroids chosen. It is an iterative clustering algorithm in which items are moved among sets of clusters until the required set is reached. As such, it can be viewed as a kind of squared-error algorithm, though the convergence criterion need not be defined in terms of the squared error [18]. A high degree of similarity between items within a cluster is obtained, while at the same time a high degree of dissimilarity between items in different clusters is achieved.
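The iterative assign-and-update loop described above can be sketched in a few lines of plain Python; the sample points and the value of K below are our own illustration, not taken from the paper:

```python
import random

def k_means(points, k, iters=100):
    # Start from k randomly chosen data points as initial centroids;
    # as noted above, the final clusters depend on this choice.
    centroids = random.sample(points, k)
    for _ in range(iters):
        # Assignment step: move each item to its nearest centroid
        # (squared Euclidean distance as the similarity criterion).
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda i: sum((a - b) ** 2
                                            for a, b in zip(p, centroids[i])))
            clusters[nearest].append(p)
        # Update step: recompute each centroid as the mean of its cluster;
        # an empty cluster keeps its previous centroid.
        new_centroids = [
            tuple(sum(dim) / len(c) for dim in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:  # assignments stopped changing
            break
        centroids = new_centroids
    return centroids, clusters

# Two well-separated groups of 2-D points, K = 2.
data = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
        (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
centroids, clusters = k_means(data, k=2)
```

Because the convergence test simply checks that the recomputed centroids no longer move, the loop terminates as soon as the required set of clusters is reached, matching the description above.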
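The intra-cluster association versus inter-cluster variation trade-off just described is commonly quantified by the Silhouette value covered in the evaluation-metrics section; below is a minimal pure-Python sketch, where the helper function and the two toy clusters are our own illustration:

```python
from math import dist  # Euclidean distance (Python 3.8+)

def silhouette(point, own, others):
    """Silhouette value of `point`: (b - a) / max(a, b), where a is the
    mean distance to the other members of its own cluster and b is the
    mean distance to the nearest other cluster."""
    a = sum(dist(point, q) for q in own if q != point) / (len(own) - 1)
    b = min(sum(dist(point, q) for q in c) / len(c) for c in others)
    return (b - a) / max(a, b)

# Two tight, well-separated clusters: Silhouette values close to 1.
c1 = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)]
c2 = [(10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]

# Silhouette width = average Silhouette value over the observations.
width = sum(silhouette(p, c1, [c2]) for p in c1) / len(c1)
```

By the interpretation table given later in the paper, a width above 0.71 indicates a strongly separated structure, which is what this well-separated toy example produces.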
ignores the interconnectivity among the clusters and gives preference to the distance between the representative points of two clusters [6]. It also fails to consider important features of a particular cluster, thereby affecting the cluster-merging decisions. The CURE technique works only for metric data.

D. ROCK
ROCK (Robust Clustering using Links) is a hierarchical algorithm that uses a link strategy to form clusters. Links are merged from the bottom up to form a cluster. It introduced the concepts of link and neighbour. A link incorporates comprehensive information about other sufficiently similar neighbours, so that more than just two points are considered when merging or splitting clusters [17]. The larger the number of links, the higher the likelihood of the points being in the same cluster. Traditional algorithms used distance functions for categorical and Boolean attributes, but here the concept of links (common neighbours) is introduced instead. ROCK has demonstrated its power by being effectively applied to real datasets.

3. EVALUATION METRICS FOR CLUSTERING
The process of validating the results of a clustering algorithm is called cluster validity. The two cluster validation metrics are:
1. Internal measures: The quality of a clustering is measured using only the internal information of the clustering itself. Connectivity, Silhouette width and the Dunn index are the internal measures of clusters.
2. Stability measures: A special version of internal measures, which assess the reliability of a clustering outcome by comparing it with the clusters obtained after each column is removed, one at a time. Average Proportion of Non-overlap (APN), Average Distance (AD), Average Distance between Means (ADM) and Figure of Merit (FOM) are the stability measures of clusters.

Connectivity indicates the degree of connectedness of the clusters, as determined by the k-nearest neighbours. It ranges between 0 and ∞ and should be minimized. The Silhouette width is the average of every observation's Silhouette value [21]. Silhouette validation assesses the clustering outcome to determine the accuracy of the results obtained from the cluster values. Each value lies in the range -1 to 1 and is interpreted as:
< 0.25 no substantial structure
0.26 – 0.50 weak structure
0.51 – 0.70 reasonable structure
0.71 – 1.00 strong structure
The Dunn index is the ratio of the minimum distance between observations not in the same cluster to the largest intra-cluster distance. It ranges between 0 and ∞ and should be maximized.

The cluster stability measures are based on the cross-classification table of the actual clustering of the complete data with the clustering based on the removal of one column. The values of APN, ADM and FOM range from 0 to 1, with smaller values corresponding to highly consistent clustering results. AD takes values between 0 and infinity, and smaller values are likewise preferred [21]. Other external measures include the F-measure, pair-counting F-measures, the Rand measure, the Jaccard index, the Fowlkes–Mallows index, the confusion matrix and mutual information.

4. CONCLUSION
Clustering is significant in data analysis and data mining applications. This paper discussed various partitioning and hierarchical clustering techniques, along with evaluation metrics for clustering. Partitioning clustering algorithms are very helpful when the clusters are of convex shape and similar size and the number of clusters can be determined in advance. When the number of clusters cannot be predicted in advance, hierarchical clustering algorithms are used. They partition the dataset into numerous levels of partitioning termed dendrograms. These algorithms are very useful in mining, but the cost of constructing dendrograms is extremely high for huge datasets. All the clustering algorithms are validated using cluster validation metrics such as internal and stability measures.

REFERENCES
[1]. Abdullah, Z. and A. R. Hamdan, "Hierarchical Clustering Algorithms in Data Mining", World Academy of Science, Engineering and Technology International Journal of Computer, Electrical, Automation, Control and Information Engineering, Vol. 9, No. 10, 2015.
[2]. Amandeep Kaur Mann and Navneet Kaur, "Survey Paper on Clustering Techniques", International Journal of Science, Engineering and Technology Research (IJSETR), Vol. 2, Issue 4, April 2013.
[3]. Amudha, S., "An Overview of Clustering Algorithm in Data Mining", International Research Journal of Engineering and Technology (IRJET), Vol. 3, Issue 12, Dec 2016.
[4]. Anoop Kumar Jain and Satyam Maheswari, "Survey of Recent Clustering Techniques in Data Mining", International Journal of Computer Science and Management Research, pp. 72-78, 2012.
[5]. Ding, K., C. Huo, Y. Xu, Z. Zhong and C. Pan, "Sparse hierarchical clustering for VHR image change detection", IEEE Geoscience and Remote Sensing Letters, 12(3), pp. 577-581, 2015.
[6]. Fahad, A., N. Alshatri, Z. Tari and A. Alamri, "A survey of clustering algorithms for Big Data: Taxonomy and empirical analysis", IEEE Transactions on Emerging Topics in Computing, pp. 267-279, 2014.
[7]. Garima, Hina Gulati and P. K. Singh, "Clustering Techniques in Data Mining: A Comparison", International Conference on Computing for Sustainable Global Development (INDIACom), pp. 410-415, IEEE, 2015.
[8]. Harshada S. Deshmukh and P. L. Ramteke, "Comparing the Techniques of Cluster Analysis for Big Data", International Journal of Advanced Research in Computer Engineering & Technology (IJARCET), Vol. 4, Issue 12, 2015.
[9]. Namrata S. Gupta, Bijendra S. Agrawal and Rajkumar M. Chauhan, "Survey on Clustering Techniques of Data Mining", American International Journal of Research in Science, Technology, Engineering & Mathematics, pp. 206-111, 2015.
[10]. Lori Dalton, Virginia Ballarin and Marcel Brun, "Clustering Algorithms: On Learning, Validation, Performance, and Applications to Genomics", Current Genomics, Vol. 10, No. 6, 2009.
[11]. Pandove, D. and S. Goel, "A comprehensive study on clustering approaches for Big Data mining", IEEE International Conference on Electronics and Communication Systems, Coimbatore, pp. 26-27, Feb 2015.
[12]. Pradeep Rai and Shubha Singh, "A Survey of Clustering Techniques", International Journal of Computer Applications, October 2010.
[13]. Pragati Shrivastava and Hitesh Gupta, "A Review of Density-Based Clustering in Spatial Data", International Journal of Advanced Computer Research, pp. 2249-7277, Sep 2012.
[14]. Rashedi, E., A. Mirzaei and M. Rahmati, "An information theoretic approach to hierarchical clustering combination", Neurocomputing, 148, pp. 487-497, 2015.
[15]. Rama Kalaivani, E., G. Suganya and J. Kiruba, "Review on K-Means and Fuzzy C-Means Clustering Algorithm", Imperial Journal of Interdisciplinary Research (IJIR), Vol. 3, Issue 2, 2017.
[16]. Saroj and Tripti Chaudhary, "Study on Various Clustering Techniques", International Journal of Computer Science and Information Technologies, Vol. 6(3), pp. 3031-3033, 2015.
[17]. Sajana, T., C. M. Sheela Rani and K. V. Narayana, "A Survey on Clustering Techniques for Big Data Mining", Indian Journal of Science and Technology, Vol. 9(3), DOI: 10.17485/ijst/2016/v9i3/75971, January 2016.
[18]. Sonamdeep Kaur, Sarika Chaudhary and Neha Bishnoi, "A Survey: Clustering Algorithms in Data Mining", International Journal of Computer Applications, ISSN: 0975-8887, 2015.
[19]. Soni Madhulatha, T., "An Overview on Clustering Methods", IOSR Journal of Engineering, Vol. 2(4), pp. 719-725, Apr 2012.
[20]. Sukhvir Kaur, "Survey of Different Data Clustering Algorithms", International Journal of Computer Science and Mobile Computing, Vol. 5, Issue 5, pp. 584-588, May 2016.
[21]. Sukhdev Singh Ghuman, "Clustering Techniques - A Review", International Journal of Computer Science and Mobile Computing, Vol. 5, Issue 5, pp. 524-530, May 2016.
[22]. Suman and Pinki Rani, "A Survey on STING and CLIQUE Grid Based Clustering Methods", International Journal of Advanced Research in Computer Science, Vol. 8, No. 5, May-June 2017.
[23]. Sunil Chowdary, D. Sri Lakshmi Prasanna and P. Sudhakar, "Evaluating and Analyzing Clusters in Data Mining using Different Algorithms", International Journal of Computer Science and Mobile Computing, Vol. 3, Issue 2, pp. 86-99, 2014.
[24]. Vaishali R. Patel and Rupa G. Mehta, "Clustering Algorithms: A Comprehensive Survey", International Conference on Electronics, Information and Communication Systems Engineering, 2011.
[25]. Vijayalakshmi, M. and M. Renuka Devi, "A Survey of Different Issues of Different Clustering Algorithms Used in Large Data Sets", International Journal of Advanced Research in Computer Science and Software Engineering, pp. 305-307, 2012.
[26]. Zeynel Cebeci and Figen Yildiz, "Comparison of K-Means and Fuzzy C-Means Algorithms on Different Cluster Structures", Journal of Agricultural Informatics, Vol. 6, No. 3, 2015.

AUTHOR PROFILE
M. Kiruthika received the Bachelor of Computer Science (B.Sc.) degree from Anna University in 2014 and the Master of Computer Applications (MCA) degree from Anna University in 2016. She also received the M.Phil degree from Bharathiar University, Coimbatore, in 2018. She is currently pursuing her Ph.D in Computer Science at Erode Arts and Science College. Her research area includes Data Mining.

Dr. S. Sukumaran graduated in 1985 with a degree in Science. He obtained his Master's degree in Science and M.Phil in Computer Science from Bharathiar University. He received the Ph.D degree in Computer Science from Bharathiar University. He has 30 years of teaching experience, rising from Lecturer to Associate Professor. At present he is working as Associate Professor of Computer Science at Erode Arts and Science College, Erode, Tamil Nadu. He has guided more than 55 M.Phil research scholars in various fields and 13 Ph.D scholars. Currently he is guiding 3 M.Phil scholars and 6 Ph.D scholars. He is a member of the Board of Studies of various autonomous colleges and universities. He has published around 63 research papers in national and international journals and conferences. His current research interests include Image Processing, Network Security and Data Mining.