TY - GEN
T1 - Topic Modelling with Bag-of-concepts Document Representation
AU - Rashad, Metwally
AU - Reyad, Ibrahim
AU - Abdelfatah, Mohamed
N1 - Publisher Copyright: © 2022 IEEE.
PY - 2022
Y1 - 2022
AB - Traditionally, text mining tasks have been implemented by applying topic models such as Latent Dirichlet Allocation (LDA). These topic models occasionally assign high probability to noisy words in illogical topics. The problem is that topic model-based approaches are sparse, use binary term weighting, and lack semantic information. To solve these problems, the topic model is combined with a document representation technique called bag-of-concepts. The bag-of-concepts approach clusters word2vec word vectors into concepts and then represents each document by the occurrences of these concept clusters. Bag-of-concepts accounts for document proximity preservation when a suitable weighting formula, concept frequency-inverse document frequency, is used. LDA is adapted for document clustering and topic quality tasks. The results are compared with different LDA frameworks on text documents, as well as with the bag-of-concepts representation of documents. LDA with the bag-of-concepts representation generates more cohesive topics than the other techniques.
KW - Bag-of-concepts
KW - Document Representation
KW - Latent Dirichlet Allocation
UR - https://www.scopus.com/pages/publications/85142920760
U2 - 10.1109/NILES56402.2022.9942412
DO - 10.1109/NILES56402.2022.9942412
M3 - Conference contribution
AN - SCOPUS:85142920760
T3 - NILES 2022 - 4th Novel Intelligent and Leading Emerging Sciences Conference, Proceedings
SP - 216
EP - 220
BT - NILES 2022 - 4th Novel Intelligent and Leading Emerging Sciences Conference, Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 4th Novel Intelligent and Leading Emerging Sciences Conference, NILES 2022
Y2 - 22 October 2022 through 24 October 2022
ER -