Topic Modelling with Bag-of-concepts Document Representation

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Scopus citation

Abstract

Traditionally, text mining tasks have been implemented with topic models such as Latent Dirichlet Allocation (LDA). These topic models occasionally assign high probability to noisy words within incoherent topics. The underlying problem is that topic-model-based approaches rely on sparse document representations with binary term weighting and no semantic information. To address these problems, the topic model is combined with a document representation technique called Bag-of-Concepts. The bag-of-concepts approach clusters word vectors produced by word2vec into concepts, and each document is then represented as a vector of occurrences of these concept clusters. With a suitable weighting formula, concept frequency-inverse document frequency (CF-IDF), bag-of-concepts preserves the proximity between documents. LDA is adapted for document clustering and for topic quality tasks. The results are compared with different LDA frameworks on text documents, as well as with the bag-of-concepts representation of documents. LDA with the bag-of-concepts representation generates more cohesive topics than the other techniques.
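The bag-of-concepts pipeline described above can be sketched in plain Python. This is a minimal, hypothetical illustration, not the authors' implementation. The toy 2-D word vectors stand in for word2vec embeddings (assumed to be trained elsewhere). The clustering step uses spherical k-means with farthest-point seeding, which is an assumption because the abstract does not name the clustering algorithm. CF-IDF is assumed to follow the usual TF-IDF form, cf(c, d) × log(N / df(c)). The function names `spherical_kmeans`, `bag_of_concepts` and `cf_idf` are illustrative only. The final LDA step is omitted because the abstract does not specify how the concept vectors are passed to LDA.

```python
import math
from collections import Counter


def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def spherical_kmeans(vectors, k, iterations=20):
    """Cluster vectors by cosine similarity and return one label per vector.

    Assumption: spherical k-means with farthest-point seeding; the paper's
    actual clustering algorithm is not given in the abstract.
    """
    points = [normalize(v) for v in vectors]
    # Seed with the first point, then repeatedly add the point that is least
    # similar to its closest existing centroid.
    centroids = [points[0]]
    while len(centroids) < k:
        far = min(points, key=lambda p: max(dot(p, c) for c in centroids))
        centroids.append(far)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid = highest cosine similarity.
        labels = [max(range(k), key=lambda c: dot(p, centroids[c])) for p in points]
        # Update step: each centroid becomes the normalised mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = normalize([sum(col) / len(members) for col in zip(*members)])
    return labels


def bag_of_concepts(docs, word_vectors, k):
    """Represent each document by how often each concept (word cluster) occurs."""
    vocab = sorted(word_vectors)
    labels = spherical_kmeans([word_vectors[w] for w in vocab], k)
    concept_of = dict(zip(vocab, labels))
    # Words without a vector are skipped.
    return [Counter(concept_of[w] for w in doc if w in concept_of) for doc in docs]


def cf_idf(concept_counts, k):
    """Assumed CF-IDF form: cf(c, d) * log(N / df(c)), one row per document."""
    n_docs = len(concept_counts)
    df = Counter(c for counts in concept_counts for c in counts)
    return [
        [counts[c] * math.log(n_docs / df[c]) if df[c] else 0.0 for c in range(k)]
        for counts in concept_counts
    ]


# Hypothetical toy "word2vec" vectors: an animal-like group and a finance-like group.
word_vectors = {
    "cat": [1.0, 0.1], "dog": [0.9, 0.2], "pet": [1.0, 0.0],
    "stock": [0.1, 1.0], "market": [0.0, 0.9], "price": [0.2, 1.0],
}
docs = [["cat", "dog", "pet"], ["stock", "market", "price", "stock"], ["pet", "price", "cat"]]

counts = bag_of_concepts(docs, word_vectors, k=2)
weights = cf_idf(counts, k=2)
for doc_counts, row in zip(counts, weights):
    print(dict(doc_counts), [round(w, 3) for w in row])
```

In this toy run the animal words and the finance words fall into two separate concepts, so the third document, which mixes both, receives non-zero weight on both concepts. Because LDA is formulated over count data, either the raw concept counts or the CF-IDF weights could serve as its input; which one the paper uses is not stated in the abstract.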

Original language: English
Title of host publication: NILES 2022 - 4th Novel Intelligent and Leading Emerging Sciences Conference, Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 216-220
Number of pages: 5
ISBN (Electronic): 9781665452410
State: Published - 2022
Externally published: Yes
Event: 4th Novel Intelligent and Leading Emerging Sciences Conference, NILES 2022 - Giza, Egypt
Duration: 22 Oct 2022 – 24 Oct 2022

Publication series

Name: NILES 2022 - 4th Novel Intelligent and Leading Emerging Sciences Conference, Proceedings

Conference

Conference: 4th Novel Intelligent and Leading Emerging Sciences Conference, NILES 2022
Country/Territory: Egypt
City: Giza
Period: 22/10/22 – 24/10/22

Keywords

  • Bag-of-concepts
  • Document Representation
  • Latent Dirichlet Allocation
