An experimental study on the performance of collaborative filtering based on user reviews for large-scale datasets

Sumaia Mohammed ALGhuribi; Shahrul Azman Mohd Noah; Mawal Mohammed

doi:10.7717/peerj-cs.1525

Software Engineering

Research output: Contribution to journal › Article › peer-review

8 Scopus citations

Abstract

Collaborative filtering (CF) approaches generate user recommendations based on user similarities. These similarities are calculated from the overall (explicit) user ratings. However, in some domains, such ratings may be sparse or unavailable. User reviews can play a significant role in such cases, as implicit ratings can be derived from the reviews using sentiment analysis, a natural language processing technique. However, most current studies calculate the implicit ratings by simply aggregating the scores of all sentiment words appearing in reviews, thereby ignoring the sentiment degrees and aspects expressed in user reviews. This study addresses this issue by calculating the implicit rating differently, leveraging the rich information in user reviews by using both sentiment words and aspect–sentiment word pairs to enhance CF performance. It proposes four methods to calculate the implicit ratings on large-scale datasets: the first considers the degree of sentiment words, while the second exploits the aspects by extracting aspect–sentiment word pairs. The remaining two methods combine explicit ratings with the implicit ratings generated by the first two methods. The generated ratings are then incorporated into different CF rating prediction algorithms to evaluate their effectiveness in enhancing CF performance. Experiments evaluating the proposed methods are conducted on two large-scale datasets: Amazon and Yelp. The results show that the proposed ratings improved the accuracy of CF rating prediction algorithms and outperformed the explicit ratings on three predictive accuracy metrics.

Original language	English
Article number	e1525
Journal	PeerJ Computer Science
Volume	9
DOIs	https://doi.org/10.7717/peerj-cs.1525
State	Published - 2023

Keywords

Collaborative filtering
Recommender systems
Sentiment analysis
User reviews

Access to Document

10.7717/peerj-cs.1525

Cite this

@article{196c75185754407d8d8f9a20f3948ec0,

title = "An experimental study on the performance of collaborative filtering based on user reviews for large-scale datasets",

abstract = "Collaborative filtering (CF) approaches generate user recommendations based on user similarities. These similarities are calculated based on the overall (explicit) user ratings. However, in some domains, such ratings may be sparse or unavailable. User reviews can play a significant role in such cases, as implicit ratings can be derived from the reviews using sentiment analysis, a natural language processing technique. However, most current studies calculate the implicit ratings by simply aggregating the scores of all sentiment words appearing in reviews and, thus, ignoring the elements of sentiment degrees and aspects of user reviews. This study addresses this issue by calculating the implicit rating differently, leveraging the rich information in user reviews by using both sentiment words and aspect–sentiment word pairs to enhance the CF performance. It proposes four methods to calculate the implicit ratings on large-scale datasets: the first considers the degree of sentiment words, while the second exploits the aspects by extracting aspect-sentiment word pairs to calculate the implicit ratings. The remaining two methods combine explicit ratings with the implicit ratings generated by the first two methods. The generated ratings are then incorporated into different CF rating prediction algorithms to evaluate their effectiveness in enhancing the CF performance. Evaluative experiments of the proposed methods are conducted on two large-scale datasets: Amazon and Yelp. Results of the experiments show that the proposed ratings improved the accuracy of CF rating prediction algorithms and outperformed the explicit ratings in terms of three predictive accuracy metrics.",

keywords = "Collaborative filtering, Recommender systems, Sentiment analysis, User reviews",

author = "{Mohammed ALGhuribi}, Sumaia and Noah, {Shahrul Azman Mohd} and Mawal Mohammed",

note = "Publisher Copyright: {\textcopyright} 2023 AL-Ghuribi et al.",

year = "2023",

doi = "10.7717/peerj-cs.1525",

language = "English",

volume = "9",

journal = "PeerJ Computer Science",

issn = "2376-5992",

publisher = "PeerJ Inc.",

}

TY - JOUR

T1 - An experimental study on the performance of collaborative filtering based on user reviews for large-scale datasets

AU - Mohammed ALGhuribi, Sumaia

AU - Noah, Shahrul Azman Mohd

AU - Mohammed, Mawal

PY - 2023

Y1 - 2023

AB - Collaborative filtering (CF) approaches generate user recommendations based on user similarities. These similarities are calculated based on the overall (explicit) user ratings. However, in some domains, such ratings may be sparse or unavailable. User reviews can play a significant role in such cases, as implicit ratings can be derived from the reviews using sentiment analysis, a natural language processing technique. However, most current studies calculate the implicit ratings by simply aggregating the scores of all sentiment words appearing in reviews and, thus, ignoring the elements of sentiment degrees and aspects of user reviews. This study addresses this issue by calculating the implicit rating differently, leveraging the rich information in user reviews by using both sentiment words and aspect–sentiment word pairs to enhance the CF performance. It proposes four methods to calculate the implicit ratings on large-scale datasets: the first considers the degree of sentiment words, while the second exploits the aspects by extracting aspect-sentiment word pairs to calculate the implicit ratings. The remaining two methods combine explicit ratings with the implicit ratings generated by the first two methods. The generated ratings are then incorporated into different CF rating prediction algorithms to evaluate their effectiveness in enhancing the CF performance. Evaluative experiments of the proposed methods are conducted on two large-scale datasets: Amazon and Yelp. Results of the experiments show that the proposed ratings improved the accuracy of CF rating prediction algorithms and outperformed the explicit ratings in terms of three predictive accuracy metrics.

KW - Collaborative filtering

KW - Recommender systems

KW - Sentiment analysis

KW - User reviews

UR - https://www.scopus.com/pages/publications/85171271712

U2 - 10.7717/peerj-cs.1525

DO - 10.7717/peerj-cs.1525

M3 - Article

AN - SCOPUS:85171271712

SN - 2376-5992

VL - 9

JO - PeerJ Computer Science

JF - PeerJ Computer Science

M1 - e1525

ER -
