A new hybrid schemes combining ontology and clustering for text documents

Research output: Contribution to journal › Article › peer-review

1 Scopus citation

Abstract

Data mining is the process of analyzing data from different perspectives and summarizing it into valuable information. It consists of two main activities, clustering and classification, and works mainly with numeric data, text data and web data. Text-based algorithms have problems when dealing with linguistic variation (synonyms, homonyms). In addition, web pages contain forms of information other than text, such as images or multimedia. As a consequence, hybrid document clustering approaches have been proposed to combine the advantages and limit the disadvantages of existing approaches. The main motivation behind using an ontology is that different people have different needs with regard to the clustering of texts. The hybrid schemes are developed using ontology and frequent item clustering: three algorithms, Ontology Based Apriori Based Clustering, Ontology Based FP-Growth Based Clustering and Ontology Based FP-Bonsai Clustering Algorithm (OFPBC), are proposed to resolve the disadvantages of existing approaches. The performance of these enhanced document clustering algorithms was tested rigorously on different datasets, using performance measures to show their clustering efficiency. OFPBC shows a significant improvement in terms of clustering purity. On the Reuters-21578, 20 Newsgroups and TDT2 datasets, OFPBC achieves accuracies of 0.840, 0.817 and 0.847, respectively.
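The abstract reports results in terms of clustering purity without defining the measure. The sketch below illustrates the standard textbook definition of purity (each cluster is credited with its majority class, and the credited documents are divided by the total number of documents). It is an assumption that the paper uses this definition; the function name `purity` and the toy labels are hypothetical and not taken from the article.

```python
from collections import Counter


def purity(cluster_ids, class_labels):
    """Fraction of documents that belong to the majority class of their cluster.

    Standard definition of purity; assumed here, not confirmed by the paper.
    """
    if len(cluster_ids) != len(class_labels):
        raise ValueError("inputs must have the same length")
    if not cluster_ids:
        raise ValueError("inputs must be non-empty")

    # Count the true class labels inside each assigned cluster.
    by_cluster = {}
    for cid, label in zip(cluster_ids, class_labels):
        by_cluster.setdefault(cid, Counter())[label] += 1

    # Credit each cluster with the size of its majority class.
    majority_total = sum(max(counts.values()) for counts in by_cluster.values())
    return majority_total / len(cluster_ids)


# Toy example (hypothetical data, not from the paper's datasets).
clusters = [0, 0, 0, 1, 1, 2, 2, 2]
labels = ["grain", "grain", "trade", "trade", "trade", "crude", "crude", "grain"]
print(purity(clusters, labels))  # 0.75: six of eight documents match their cluster's majority class
```

A purity of 1.0 means every cluster contains documents of a single class; the measure rises trivially as the number of clusters grows, so it is usually compared at a fixed cluster count.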

Original language: English
Pages (from-to): 2447-2453
Number of pages: 7
Journal: Information Technology Journal
Volume: 12
Issue number: 12
State: Published - 2013
Externally published: Yes

Keywords

  • Apriori algorithm
  • Document clustering
  • FP-growth algorithm
  • Ontology
