New scalable varied density clustering algorithm for large datasets

Ahmed Fahim; Abdel badeeh Salem; Gunter Saake

Research output: Chapter in Book/Report/Conference proceeding › Chapter › peer-review

1 Scopus citation

Abstract

Finding clusters in data is a challenging problem, especially when the clusters are of widely varied shapes, sizes, and densities. Herein, a new scalable clustering technique that addresses all these issues is proposed. In data mining, the purpose of data clustering is to identify useful patterns in the underlying dataset. Within the last several years, many clustering algorithms have been proposed in this area of research. Among all these proposed methods, density clustering methods are the most important due to their high ability to detect arbitrarily shaped clusters. Moreover, these methods often show good noise-handling capabilities, where clusters are defined as regions of typical densities separated by low or no density regions. In this chapter, we aim at enhancing the well-known algorithm DBSCAN, to make it scalable and able to discover clusters from uneven datasets in which clusters are regions of homogeneous densities. We achieved the scalability of the proposed algorithm by using the k-means algorithm to get an initial partition of the dataset, applying the enhanced DBSCAN on each partition, and then using a merging process to get the actual natural number of clusters in the underlying dataset. This means the proposed algorithm consists of three stages. Experimental results using synthetic datasets show that the proposed clustering algorithm is faster and more scalable than the enhanced DBSCAN counterpart.
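
The three stages named in the abstract (k-means partitioning, DBSCAN on each partition, merging) can be illustrated with a minimal Python sketch. It is not the chapter's implementation: it uses textbook DBSCAN rather than the enhanced variant, and the merge rule (join two local clusters from different partitions when any two of their points lie within eps) is an assumption made for illustration. The function names and the parameters k, eps, and min_pts are hypothetical. The merge step here is a brute-force pairwise scan, so the sketch shows the structure only, not the scalability reported in the chapter.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Stage 1: plain Lloyd's k-means; returns a partition index per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old center if a partition empties out
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return assign


def dbscan(points, eps, min_pts):
    """Stage 2: textbook DBSCAN (not the enhanced variant); -1 marks noise."""
    labels = [None] * len(points)

    def neighbors(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reach = neighbors(j)
            if len(reach) >= min_pts:  # j is a core point: keep expanding
                queue.extend(reach)
    return labels


def scalable_cluster(points, k, eps, min_pts):
    """Partition, cluster each partition, then merge touching local clusters."""
    part = kmeans(points, k)
    labels = [-1] * len(points)
    next_id = 0
    for c in range(k):
        idx = [i for i, a in enumerate(part) if a == c]
        local = dbscan([points[i] for i in idx], eps, min_pts)
        for i, lab in zip(idx, local):
            if lab != -1:
                labels[i] = next_id + lab
        next_id += max(local, default=-1) + 1

    # Stage 3 (assumed rule): union local clusters from different partitions
    # whenever two of their points are within eps of each other.
    parent = list(range(next_id))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if (labels[i] != -1 and labels[j] != -1 and part[i] != part[j]
                    and math.dist(points[i], points[j]) <= eps):
                parent[find(labels[i])] = find(labels[j])
    return [-1 if lab == -1 else find(lab) for lab in labels]
```

Calling scalable_cluster(points, k, eps, min_pts) with points given as a list of coordinate tuples returns one label per point, with -1 for noise. Points labelled noise inside a partition stay noise in this sketch, even if they would be border points in the full dataset.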

Original language	English
Title of host publication	Mathematical Modeling, Clustering Algorithms and Applications
Publisher	Nova Science Publishers, Inc.
Pages	179-194
Number of pages	16
ISBN (Print)	9781616686819
State	Published - Jan 2011
Externally published	Yes

Cite this

@inbook{bb2b1b99b7094001babc159e5d0fd63d,

title = "New scalable varied density clustering algorithm for large datasets",

abstract = "Finding clusters in data is a challenging problem, especially when the clusters are of widely varied shapes, sizes, and densities. Herein, a new scalable clustering technique that addresses all these issues is proposed. In data mining, the purpose of data clustering is to identify useful patterns in the underlying dataset. Within the last several years, many clustering algorithms have been proposed in this area of research. Among all these proposed methods, density clustering methods are the most important due to their high ability to detect arbitrarily shaped clusters. Moreover, these methods often show good noise-handling capabilities, where clusters are defined as regions of typical densities separated by low or no density regions. In this chapter, we aim at enhancing the well-known algorithm DBSCAN, to make it scalable and able to discover clusters from uneven datasets in which clusters are regions of homogeneous densities. We achieved the scalability of the proposed algorithm by using the k-means algorithm to get an initial partition of the dataset, applying the enhanced DBSCAN on each partition, and then using a merging process to get the actual natural number of clusters in the underlying dataset. This means the proposed algorithm consists of three stages. Experimental results using synthetic datasets show that the proposed clustering algorithm is faster and more scalable than the enhanced DBSCAN counterpart.",

author = "Ahmed Fahim and Salem, {Abdel badeeh} and Gunter Saake",

year = "2011",

month = jan,

language = "English",

isbn = "9781616686819",

pages = "179--194",

booktitle = "Mathematical Modeling, Clustering Algorithms and Applications",

publisher = "Nova Science Publishers, Inc.",

address = "United States",

}

TY - CHAP

T1 - New scalable varied density clustering algorithm for large datasets

AU - Fahim, Ahmed

AU - Salem, Abdel badeeh

AU - Saake, Gunter

PY - 2011/1

Y1 - 2011/1

N2 - Finding clusters in data is a challenging problem, especially when the clusters are of widely varied shapes, sizes, and densities. Herein, a new scalable clustering technique that addresses all these issues is proposed. In data mining, the purpose of data clustering is to identify useful patterns in the underlying dataset. Within the last several years, many clustering algorithms have been proposed in this area of research. Among all these proposed methods, density clustering methods are the most important due to their high ability to detect arbitrarily shaped clusters. Moreover, these methods often show good noise-handling capabilities, where clusters are defined as regions of typical densities separated by low or no density regions. In this chapter, we aim at enhancing the well-known algorithm DBSCAN, to make it scalable and able to discover clusters from uneven datasets in which clusters are regions of homogeneous densities. We achieved the scalability of the proposed algorithm by using the k-means algorithm to get an initial partition of the dataset, applying the enhanced DBSCAN on each partition, and then using a merging process to get the actual natural number of clusters in the underlying dataset. This means the proposed algorithm consists of three stages. Experimental results using synthetic datasets show that the proposed clustering algorithm is faster and more scalable than the enhanced DBSCAN counterpart.

UR - https://www.scopus.com/pages/publications/84892944712

M3 - Chapter

AN - SCOPUS:84892944712

SN - 9781616686819

SP - 179

EP - 194

BT - Mathematical Modeling, Clustering Algorithms and Applications

PB - Nova Science Publishers, Inc.

ER -
