TY - JOUR
T1 - A novel efficient Rank-Revealing QR matrix and Schur decomposition method for big data mining and clustering (RRQR-SDM)
AU - Paulraj, D.
AU - Mohamed Junaid, K. A.
AU - Sethukarasi, T.
AU - Vigilson Prem, M.
AU - Neelakandan, S.
AU - Alhudhaif, Adi
AU - Alnaim, Norah
N1 - Publisher Copyright:
© 2023 Elsevier Inc.
PY - 2024/2
Y1 - 2024/2
N2 - Big Data is an emerging technology with enormous potential to develop business and its administration. Because of the sheer volume of data, efficient data mining and clustering methods are crucial to extracting meaningful insights and patterns from large-scale datasets. Problems may arise from the need to analyze, capture, share, store, and visualize the data. Several methods have already been proposed for mining knowledge from big data. Handling such massive data with these existing methods on a single machine is inefficient or outright impractical, because big data are frequently acquired from dispersed locations and stored on several machines. Matrix decomposition is one of the critical strategies for retrieving knowledge from the diverse, noisy, and huge data generated by modern applications and stored in dispersed locations. This study proposes a novel approach called the Rank-Revealing QR Matrix and Schur Decomposition Method (RRQR-SDM), specifically designed for big data mining and clustering tasks. The RRQR-SDM reveals the rank of the data matrix in a computationally efficient manner by using a modified QR decomposition, eliminating the need for expensive Singular Value Decomposition (SVD) computations. The proposed RRQR-SDM method offers several advantages over existing approaches. First, by exploiting the inherent low-rank structure of the data, it reduces the computational complexity associated with large-scale datasets. By revealing the rank of the input matrix, it enables dimensionality reduction and efficient data compression. Second, the Schur decomposition enhances the interpretability of the data by providing a clear separation between the relevant and irrelevant components. This feature makes the RRQR-SDM method particularly suitable for data mining and clustering tasks where identifying the most significant features is essential. To evaluate the performance of the RRQR-SDM method, extensive experiments were conducted on various large-scale datasets. 
The results demonstrate that the proposed method outperforms state-of-the-art techniques in both computational efficiency and clustering accuracy.
KW - Big data
KW - Clustering
KW - Data mining
KW - Decomposition
KW - Noise
KW - Rank-Revealing matrix
KW - Schur decomposition
UR - http://www.scopus.com/inward/record.url?scp=85178127955&partnerID=8YFLogxK
U2 - 10.1016/j.ins.2023.119957
DO - 10.1016/j.ins.2023.119957
M3 - Article
AN - SCOPUS:85178127955
SN - 0020-0255
VL - 657
JO - Information Sciences
JF - Information Sciences
M1 - 119957
ER -