TY - JOUR
T1 - An adaptive highly improving the accuracy of clustering algorithm based on kernel density estimation
AU - Pu, Yue
AU - Yao, Wenbin
AU - Li, Xiaoyong
AU - Alhudhaif, Adi
N1 - Publisher Copyright: © 2024 Elsevier Inc.
PY - 2024/3
Y1 - 2024/3
N2 - Highly Improving the Accuracy of Clustering (HIAC) algorithm is designed to enhance clustering accuracy by introducing a gravitational force between data objects, drawing them closer together, and employing a decision graph to establish a weight threshold for differentiating neighbor classes and outliers. Despite its strengths, HIAC faces two shortcomings: (1) its inability to generate effective decision graphs for small-scale datasets and (2) the non-smooth probability curve within the decision graph, making threshold determination by visual inspection both difficult and imprecise. This study presents an improved adaptive algorithm based on Kernel Density Estimation (KDE-AHIAC). This approach automatically selects the bandwidth based on the density and distribution of the data, utilizing the kernel density function to create a decision graph that applies to any dataset. For threshold selection, we introduce an adaptive calculation method that leverages the smoothness and continuity of the kernel density curve, replacing the observational approach. Additionally, we incorporate an outlier test model using Analysis of Similarity (ANOSIM) to avert misclassification of valid samples as outliers. Through comprehensive experimentation, we tested KDE-AHIAC and found that it offers notable improvements over HIAC. KDE-AHIAC enhances the clustering accuracy of the dataset by 66.05% compared to the original data and by 6.22% over HIAC.
AB - Highly Improving the Accuracy of Clustering (HIAC) algorithm is designed to enhance clustering accuracy by introducing a gravitational force between data objects, drawing them closer together, and employing a decision graph to establish a weight threshold for differentiating neighbor classes and outliers. Despite its strengths, HIAC faces two shortcomings: (1) its inability to generate effective decision graphs for small-scale datasets and (2) the non-smooth probability curve within the decision graph, making threshold determination by visual inspection both difficult and imprecise. This study presents an improved adaptive algorithm based on Kernel Density Estimation (KDE-AHIAC). This approach automatically selects the bandwidth based on the density and distribution of the data, utilizing the kernel density function to create a decision graph that applies to any dataset. For threshold selection, we introduce an adaptive calculation method that leverages the smoothness and continuity of the kernel density curve, replacing the observational approach. Additionally, we incorporate an outlier test model using Analysis of Similarity (ANOSIM) to avert misclassification of valid samples as outliers. Through comprehensive experimentation, we tested KDE-AHIAC and found that it offers notable improvements over HIAC. KDE-AHIAC enhances the clustering accuracy of the dataset by 66.05% compared to the original data and by 6.22% over HIAC.
KW - Adaptive KDE-decision graph
KW - Clustering algorithm accuracy
KW - Kernel density estimation
KW - Outliers test
UR - http://www.scopus.com/inward/record.url?scp=85185000729&partnerID=8YFLogxK
U2 - 10.1016/j.ins.2024.120187
DO - 10.1016/j.ins.2024.120187
M3 - Article
AN - SCOPUS:85185000729
SN - 0020-0255
VL - 663
JO - Information Sciences
JF - Information Sciences
M1 - 120187
ER -