An adaptive highly improving the accuracy of clustering algorithm based on kernel density estimation

Yue Pu; Wenbin Yao; Xiaoyong Li; Adi Alhudhaif

doi:10.1016/j.ins.2024.120187

Computer Sciences

Beijing University of Posts and Telecommunications

Research output: Contribution to journal › Article › peer-review

16 Scopus citations

Abstract

The Highly Improving the Accuracy of Clustering (HIAC) algorithm is designed to enhance clustering accuracy by introducing a gravitational force between data objects, drawing them closer together, and by employing a decision graph to establish a weight threshold for differentiating neighbor classes and outliers. Despite its strengths, HIAC has two shortcomings: (1) it cannot generate effective decision graphs for small-scale datasets, and (2) the probability curve within the decision graph is non-smooth, which makes threshold determination by visual inspection both difficult and imprecise. This study presents an improved adaptive algorithm based on Kernel Density Estimation (KDE-AHIAC). The approach automatically selects the bandwidth based on the density and distribution of the data and uses the kernel density function to create a decision graph that applies to any dataset. For threshold selection, we introduce an adaptive calculation method that leverages the smoothness and continuity of the kernel density curve, replacing the observational approach. Additionally, we incorporate an outlier test model using Analysis of Similarity (ANOSIM) to avert misclassification of valid samples as outliers. Comprehensive experiments show that KDE-AHIAC offers notable improvements over HIAC: it enhances the clustering accuracy of the dataset by 66.05% compared to the original data and by 6.22% over HIAC.
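
The abstract does not state which bandwidth rule or threshold formula KDE-AHIAC uses, so the following is only a minimal sketch of the general idea: smooth a set of one-dimensional weights with a Gaussian kernel density estimate whose bandwidth is chosen from the data, then read a threshold off the smooth curve instead of picking it by eye. Silverman's rule of thumb and the "deepest valley" criterion are stand-in assumptions, and every function name here is hypothetical rather than taken from the paper.

```python
import numpy as np


def silverman_bandwidth(x):
    """Silverman's rule-of-thumb bandwidth (an assumed choice, not the paper's)."""
    x = np.asarray(x, dtype=float)
    std = x.std(ddof=1)
    q75, q25 = np.percentile(x, [75, 25])
    iqr = q75 - q25
    # Use the smaller of the two spread estimates; fall back to std if IQR is 0.
    spread = min(std, iqr / 1.34) if iqr > 0 else std
    h = 0.9 * spread * x.size ** (-0.2)
    if not h > 0:
        raise ValueError("need at least two distinct values to pick a bandwidth")
    return h


def gaussian_kde_curve(x, grid, h):
    """Evaluate a Gaussian kernel density estimate of `x` at each point of `grid`."""
    z = (grid[:, None] - x[None, :]) / h
    return np.exp(-0.5 * z ** 2).sum(axis=1) / (x.size * h * np.sqrt(2 * np.pi))


def density_valley_threshold(weights, grid_size=512):
    """Return the location of the deepest interior local minimum of the KDE
    curve, or None if the curve has no interior local minimum.

    Hypothetical threshold criterion used for illustration only.
    """
    x = np.asarray(weights, dtype=float)
    h = silverman_bandwidth(x)
    grid = np.linspace(x.min() - 3 * h, x.max() + 3 * h, grid_size)
    dens = gaussian_kde_curve(x, grid, h)
    i = np.arange(1, grid_size - 1)
    # "<=" on the left lets a flat valley floor register at its right end.
    is_min = (dens[i] <= dens[i - 1]) & (dens[i] < dens[i + 1])
    minima = i[is_min]
    if minima.size == 0:
        return None
    return float(grid[minima[np.argmin(dens[minima])]])


# Synthetic weights: a dense group of small values and a sparse group of large ones.
rng = np.random.default_rng(0)
weights = np.concatenate([rng.normal(1.0, 0.1, 200), rng.normal(3.0, 0.2, 50)])
print(density_valley_threshold(weights))
```

Because the kernel estimate is smooth and continuous, the valley between the two groups can be located numerically even when there are too few points for a histogram-style curve to be readable, which is the motivation the abstract gives for replacing visual inspection.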

Original language	English
Article number	120187
Journal	Information Sciences
Volume	663
DOIs	https://doi.org/10.1016/j.ins.2024.120187
State	Published - Mar 2024

Keywords

Adaptive KDE-decision graph
Clustering algorithm accuracy
Kernel density estimation
Outliers test

Cite this

@article{daf03a83446944b68009fc3f18a551b3,
  title = "An adaptive highly improving the accuracy of clustering algorithm based on kernel density estimation",
  abstract = "Highly Improving the Accuracy of Clustering (HIAC) algorithm is designed to enhance clustering accuracy by introducing a gravitational force between data objects, drawing them closer together, and employing a decision graph to establish a weight threshold for differentiating neighbor classes and outliers. Despite its strengths, HIAC faces two shortcomings: (1) its inability to generate effective decision graphs for small-scale datasets and (2) the non-smooth probability curve within the decision graph, making threshold determination by visual inspection both difficult and imprecise. This study presents an improved adaptive algorithm based on Kernel Density Estimation (KDE-AHIAC). This approach automatically selects the bandwidth based on the density and distribution of the data, utilizing the kernel density function to create a decision graph that applies to any dataset. For threshold selection, we introduce an adaptive calculation method that leverages the smoothness and continuity of the kernel density curve, replacing the observational approach. Additionally, we incorporate an outlier test model using Analysis of Similarity (ANOSIM) to avert misclassification of valid samples as outliers. Through comprehensive experimentation, we tested KDE-AHIAC and found that it offers notable improvements over HIAC. KDE-AHIAC enhances the clustering accuracy of the dataset by 66.05\% compared to the original data and by 6.22\% over HIAC.",
  keywords = "Adaptive KDE-decision graph, Clustering algorithm accuracy, Kernel density estimation, Outliers test",
  author = "Yue Pu and Wenbin Yao and Xiaoyong Li and Adi Alhudhaif",
  note = "Publisher Copyright: {\textcopyright} 2024 Elsevier Inc.",
  year = "2024",
  month = mar,
  doi = "10.1016/j.ins.2024.120187",
  language = "English",
  volume = "663",
  journal = "Information Sciences",
  issn = "0020-0255",
  publisher = "Elsevier Inc.",
}

TY  - JOUR
T1  - An adaptive highly improving the accuracy of clustering algorithm based on kernel density estimation
AU  - Pu, Yue
AU  - Yao, Wenbin
AU  - Li, Xiaoyong
AU  - Alhudhaif, Adi
PY  - 2024/3
Y1  - 2024/3
N2  - Highly Improving the Accuracy of Clustering (HIAC) algorithm is designed to enhance clustering accuracy by introducing a gravitational force between data objects, drawing them closer together, and employing a decision graph to establish a weight threshold for differentiating neighbor classes and outliers. Despite its strengths, HIAC faces two shortcomings: (1) its inability to generate effective decision graphs for small-scale datasets and (2) the non-smooth probability curve within the decision graph, making threshold determination by visual inspection both difficult and imprecise. This study presents an improved adaptive algorithm based on Kernel Density Estimation (KDE-AHIAC). This approach automatically selects the bandwidth based on the density and distribution of the data, utilizing the kernel density function to create a decision graph that applies to any dataset. For threshold selection, we introduce an adaptive calculation method that leverages the smoothness and continuity of the kernel density curve, replacing the observational approach. Additionally, we incorporate an outlier test model using Analysis of Similarity (ANOSIM) to avert misclassification of valid samples as outliers. Through comprehensive experimentation, we tested KDE-AHIAC and found that it offers notable improvements over HIAC. KDE-AHIAC enhances the clustering accuracy of the dataset by 66.05% compared to the original data and by 6.22% over HIAC.
AB  - Highly Improving the Accuracy of Clustering (HIAC) algorithm is designed to enhance clustering accuracy by introducing a gravitational force between data objects, drawing them closer together, and employing a decision graph to establish a weight threshold for differentiating neighbor classes and outliers. Despite its strengths, HIAC faces two shortcomings: (1) its inability to generate effective decision graphs for small-scale datasets and (2) the non-smooth probability curve within the decision graph, making threshold determination by visual inspection both difficult and imprecise. This study presents an improved adaptive algorithm based on Kernel Density Estimation (KDE-AHIAC). This approach automatically selects the bandwidth based on the density and distribution of the data, utilizing the kernel density function to create a decision graph that applies to any dataset. For threshold selection, we introduce an adaptive calculation method that leverages the smoothness and continuity of the kernel density curve, replacing the observational approach. Additionally, we incorporate an outlier test model using Analysis of Similarity (ANOSIM) to avert misclassification of valid samples as outliers. Through comprehensive experimentation, we tested KDE-AHIAC and found that it offers notable improvements over HIAC. KDE-AHIAC enhances the clustering accuracy of the dataset by 66.05% compared to the original data and by 6.22% over HIAC.
KW  - Adaptive KDE-decision graph
KW  - Clustering algorithm accuracy
KW  - Kernel density estimation
KW  - Outliers test
UR  - https://www.scopus.com/pages/publications/85185000729
U2  - 10.1016/j.ins.2024.120187
DO  - 10.1016/j.ins.2024.120187
M3  - Article
AN  - SCOPUS:85185000729
SN  - 0020-0255
VL  - 663
JO  - Information Sciences
JF  - Information Sciences
M1  - 120187
ER  -
