To Cluster or Not to Cluster: The Impact of Clustering on the Performance of Aspect-Based Collaborative Filtering

Sumaia Mohammed ALGhuribi; Shahrul Azman Mohd Noah; Mawal A. Mohammed; Sultan Noman Qasem; Belal Abdullah Hezam Murshed

doi:10.1109/ACCESS.2023.3270260

Software Engineering

Research output: Contribution to journal › Article › peer-review

11 Scopus citations

Abstract

Collaborative filtering (CF) is one of the most widely utilised approaches in recommendation techniques. It suggests items to users based on the ratings of other users who share their preferences, so one of the aims of CF is to find reliable neighbours. When only ratings are used to identify neighbours, however, the user-item rating matrix is typically sparse, which makes precise neighbours hard to find and results in poor performance. User reviews can help in such situations because of the diverse elements they contain. The most popular of these elements is aspects, which provide a fine-grained analysis of users' behaviour and thus improve personalised recommendations. However, increasing the number of aspects also increases sparsity, which may in turn degrade recommendation performance. Clustering aspects may lessen this sparsity, but it is not yet clear how much this would affect the performance of CF systems. This study proposes a CF approach based on aspect clustering that addresses this issue for rating prediction. The approach aims to reduce sparsity in the multi-criteria rating matrix by grouping aspects into clusters based on their semantic similarity, which makes discovering the neighbourhood set less expensive in both computation and memory. Our approach extracts aspects and represents them using Google's pre-trained Word2vec model. The aspects are then organised into clusters using the K-means clustering algorithm. Multi-dimensional Euclidean distance is used as the similarity measure for finding appropriate neighbours, and ratings of unseen items are then predicted using the kNN algorithm. This study also identifies the number of aspects that significantly impacts CF performance. Experiments are carried out on a real large-scale dataset, the Amazon movie dataset, and the proposed approach is evaluated against three baseline approaches.
Results show that the proposed approach improves CF performance compared to the other approaches in terms of three predictive accuracy metrics.
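The pipeline summarised above (embed aspects, cluster them with K-means, compare users with a multi-dimensional Euclidean distance, predict with kNN) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The 2-D aspect vectors are hypothetical stand-ins for Google's 300-dimensional Word2vec embeddings, and the ratings are invented toy data. Three further choices are assumptions: each cluster's rating is the mean of its aspect ratings, falling back to the overall rating; user distance is the mean Euclidean distance over co-rated items; and similarity is 1/(1 + d). All function and variable names are hypothetical.

```python
import math
from statistics import mean

# Hypothetical 2-D stand-ins for Google's 300-d pre-trained Word2vec vectors.
ASPECT_VECTORS = {
    "acting": [0.9, 0.1], "plot": [0.1, 0.9],
    "actor": [0.85, 0.15], "story": [0.15, 0.85],
    "cast": [0.8, 0.2], "script": [0.2, 0.8],
}
# Invented toy data: user -> item -> aspect ratings plus an "overall" rating.
RATINGS = {
    "u1": {"m1": {"overall": 5, "acting": 5, "story": 4},
           "m2": {"overall": 2, "plot": 1, "cast": 3}},
    "u2": {"m1": {"overall": 5, "actor": 5, "plot": 4},
           "m2": {"overall": 2, "script": 2, "acting": 3},
           "m3": {"overall": 4, "cast": 4, "story": 4}},
    "u3": {"m1": {"overall": 1, "acting": 1, "script": 2},
           "m2": {"overall": 5, "story": 5, "actor": 5},
           "m3": {"overall": 2, "plot": 2, "acting": 1}},
}

def kmeans(vectors, k, iters=20):
    """Lloyd's k-means; the first k vectors seed the centroids (deterministic)."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: each vector joins its nearest centroid (Euclidean).
        labels = [min(range(k), key=lambda c: math.dist(v, centroids[c]))
                  for v in vectors]
        # Update step: move each non-empty centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [mean(dim) for dim in zip(*members)]
    return labels

def cluster_vector(review, aspect_cluster, k):
    """Collapse aspect ratings into one rating per cluster (assumed: the mean).

    A cluster with no rated aspect falls back to the overall rating (assumption).
    """
    vec = []
    for c in range(k):
        vals = [r for a, r in review.items() if aspect_cluster.get(a) == c]
        vec.append(mean(vals) if vals else review["overall"])
    return vec

def user_distance(u, v, ratings, aspect_cluster, k):
    """Mean Euclidean distance between cluster vectors over co-rated items."""
    common = ratings[u].keys() & ratings[v].keys()
    if not common:
        return math.inf
    return mean(math.dist(cluster_vector(ratings[u][i], aspect_cluster, k),
                          cluster_vector(ratings[v][i], aspect_cluster, k))
                for i in common)

def predict(user, item, ratings, aspect_cluster, k_clusters, k_neighbours=2):
    """kNN: similarity-weighted mean of the nearest neighbours' overall ratings."""
    candidates = []
    for other in ratings:
        if other != user and item in ratings[other]:
            d = user_distance(user, other, ratings, aspect_cluster, k_clusters)
            if d != math.inf:
                candidates.append((d, other))
    neighbours = sorted(candidates)[:k_neighbours]
    if not neighbours:
        return None
    weighted = [(1.0 / (1.0 + d), ratings[o][item]["overall"]) for d, o in neighbours]
    return sum(w * r for w, r in weighted) / sum(w for w, _ in weighted)

names = list(ASPECT_VECTORS)
aspect_cluster = dict(zip(names, kmeans([ASPECT_VECTORS[a] for a in names], k=2)))
print(aspect_cluster)
# {'acting': 0, 'plot': 1, 'actor': 0, 'story': 1, 'cast': 0, 'script': 1}
print(round(predict("u1", "m3", RATINGS, aspect_cluster, 2), 2))  # 3.57
```

In this toy setting, clustering collapses six sparse aspect dimensions into two dense cluster dimensions before any distance is computed. This is the kind of sparsity reduction the abstract attributes to aspect clustering. Here `u2`, whose cluster vectors match `u1`'s closely, dominates the prediction for `m3`.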

Original language	English
Pages (from-to)	41979-41994
Number of pages	16
Journal	IEEE Access
Volume	11
DOIs	https://doi.org/10.1109/ACCESS.2023.3270260
State	Published - 2023

Keywords

aspects
Collaborative filtering
Euclidean distance
K-means clustering
user reviews
Word2vec

Cite this

@article{fdaa8f419eb64db08aa5114b710f8b77,

title = "To Cluster or Not to Cluster: The Impact of Clustering on the Performance of Aspect-Based Collaborative Filtering",

abstract = "Collaborative filtering (CF) is one of the most widely utilised approaches in recommendation techniques. It suggests items to users based on the ratings of other users who share their preferences. Thus, one of the aims of CF is to find reliable neighbours. Typically, CF produces a sparse user-item rating matrix, when relying only on the ratings to identify the precise neighbours, resulting in poor performance. User reviews can be essential in overcoming those situations because of the diverse elements available in reviews. The most popular element is aspects, which can provide a fine-grained analysis of users' behaviours, thus improving personalised recommendations. However, increasing the number of aspects also results in sparsity, therefore may deteriorate the recommendation performance. As a result, clustering of aspects may lessen this sparsity, but it is yet unclear how much this would affect the performance of CF systems. This study proposes a CF approach based on aspect clustering that addresses the above issue in terms of rating prediction. The approach aims to reduce the sparseness in the multi-criteria rating matrix by grouping aspects into clusters based on their semantic similarity, which will be less expensive and require less memory to discover the neighbourhood set. Our approach extracts aspects and represents them using Google's pre-trained Word2vec model. Then, aspects are organised into clusters using the K-means clustering algorithm. Multi-dimensional Euclidean distance is used as a similarity measure for finding the appropriate neighbours and predicted ratings of unseen items are then made using the k NN algorithm. This study also identifies the number of aspects that significantly impacts CF performance. Experiments are carried out using a real large-scale dataset: the Amazon movie dataset. Evaluation is also performed by comparing CF performance of the proposed approach with three different baseline approaches. 
Results show that the proposed approach improves CF performance compared to other approaches in terms of three predictive accuracy metrics.",

keywords = "aspects, Collaborative filtering, Euclidean distance, K-means clustering, user reviews, Word2vec",

author = "{Mohammed ALGhuribi}, Sumaia and Noah, {Shahrul Azman Mohd} and Mohammed, {Mawal A.} and Qasem, {Sultan Noman} and Murshed, {Belal Abdullah Hezam}",

note = "Publisher Copyright: {\textcopyright} 2013 IEEE.",

year = "2023",

doi = "10.1109/ACCESS.2023.3270260",

language = "English",

volume = "11",

pages = "41979--41994",

journal = "IEEE Access",

issn = "2169-3536",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY - JOUR

T1 - To Cluster or Not to Cluster: The Impact of Clustering on the Performance of Aspect-Based Collaborative Filtering

AU - Mohammed ALGhuribi, Sumaia

AU - Noah, Shahrul Azman Mohd

AU - Mohammed, Mawal A.

AU - Qasem, Sultan Noman

AU - Murshed, Belal Abdullah Hezam

PY - 2023

Y1 - 2023

AB - Collaborative filtering (CF) is one of the most widely utilised approaches in recommendation techniques. It suggests items to users based on the ratings of other users who share their preferences. Thus, one of the aims of CF is to find reliable neighbours. Typically, CF produces a sparse user-item rating matrix, when relying only on the ratings to identify the precise neighbours, resulting in poor performance. User reviews can be essential in overcoming those situations because of the diverse elements available in reviews. The most popular element is aspects, which can provide a fine-grained analysis of users' behaviours, thus improving personalised recommendations. However, increasing the number of aspects also results in sparsity, therefore may deteriorate the recommendation performance. As a result, clustering of aspects may lessen this sparsity, but it is yet unclear how much this would affect the performance of CF systems. This study proposes a CF approach based on aspect clustering that addresses the above issue in terms of rating prediction. The approach aims to reduce the sparseness in the multi-criteria rating matrix by grouping aspects into clusters based on their semantic similarity, which will be less expensive and require less memory to discover the neighbourhood set. Our approach extracts aspects and represents them using Google's pre-trained Word2vec model. Then, aspects are organised into clusters using the K-means clustering algorithm. Multi-dimensional Euclidean distance is used as a similarity measure for finding the appropriate neighbours and predicted ratings of unseen items are then made using the k NN algorithm. This study also identifies the number of aspects that significantly impacts CF performance. Experiments are carried out using a real large-scale dataset: the Amazon movie dataset. Evaluation is also performed by comparing CF performance of the proposed approach with three different baseline approaches. 
Results show that the proposed approach improves CF performance compared to other approaches in terms of three predictive accuracy metrics.

KW - aspects

KW - Collaborative filtering

KW - Euclidean distance

KW - K-means clustering

KW - user reviews

KW - Word2vec

UR - https://www.scopus.com/pages/publications/85159686462

U2 - 10.1109/ACCESS.2023.3270260

DO - 10.1109/ACCESS.2023.3270260

M3 - Article

AN - SCOPUS:85159686462

SN - 2169-3536

VL - 11

SP - 41979

EP - 41994

JO - IEEE Access

JF - IEEE Access

ER -
