A new hybrid schemes combining ontology and clustering for text documents

S. C. Pumtha; V. Thavavel; M. Punithavalli

doi:10.3923/iti.2013.2447.2453

Research output: Contribution to journal › Article › peer-review

1 Scopus citations

Abstract

Data mining is the process of analyzing data from different perspectives and summarizing it into useful information. Its two main activities are clustering and classification, and it is applied mainly to numeric, text and web data. Text-based algorithms have difficulty with linguistic variation such as synonyms and homonyms, and web pages also contain non-textual information such as images and other multimedia. Hybrid document clustering approaches have therefore been proposed to combine the advantages and limit the disadvantages of existing approaches. The main motivation for using an ontology is that different people have different needs with regard to how texts are clustered. This work develops hybrid schemes that combine an ontology with frequent-itemset clustering and proposes three algorithms to overcome the drawbacks of existing approaches: Ontology-based Apriori Clustering, Ontology-based FP-Growth Clustering and the Ontology-based FP-Bonsai Clustering algorithm (OFPBC). The enhanced document clustering algorithms were tested rigorously on several datasets using a range of performance measures to evaluate clustering efficiency. OFPBC shows a significant improvement in clustering purity, achieving accuracies of 0.840, 0.817 and 0.847 on the Reuters-21578, 20 Newsgroups and TDT2 datasets, respectively.
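The abstract combines two ideas: mapping terms to ontology concepts (so synonyms count as one item) and clustering documents by frequent itemsets, evaluated with purity. The record does not describe the authors' algorithms, so the following is only a minimal illustrative sketch under stated assumptions. `ONTOLOGY` is a hypothetical toy synonym-to-concept dictionary, frequent concept sets are mined with plain Apriori (FP-Growth or FP-Bonsai would replace that step in the paper's variants), and the assignment rule in the hypothetical `cluster` function (each document joins the largest, then most frequent, frequent set it contains) is a deliberate simplification, not OFPBC. `purity` uses the standard definition: the fraction of documents that belong to their cluster's majority class.

```python
from collections import Counter
from itertools import combinations

# HYPOTHETICAL toy ontology: maps surface terms to shared concept labels so
# that synonyms (e.g. "car", "automobile") become the same item.
ONTOLOGY = {
    "car": "vehicle", "automobile": "vehicle", "truck": "vehicle",
    "stock": "finance", "share": "finance", "market": "finance",
    "election": "politics", "vote": "politics",
}


def to_concepts(doc):
    """Map each whitespace token to its ontology concept; unknown terms pass through."""
    return frozenset(ONTOLOGY.get(t, t) for t in doc.lower().split())


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    n = len(transactions)
    items = {item for t in transactions for item in t}
    current = [frozenset([item]) for item in items]
    frequent, k = {}, 1
    while current:
        # Support = fraction of transactions that contain the candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        k += 1
        # Join frequent (k-1)-itemsets into k-itemsets, then prune any candidate
        # that has an infrequent (k-1)-subset (the Apriori property).
        prev = list(level)
        joined = {a | b for a in prev for b in prev if len(a | b) == k}
        current = [c for c in joined
                   if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return frequent


def cluster(docs, min_support=0.3):
    """Simplified assignment (assumption): each document joins the largest, then
    most frequent, frequent concept set it contains, or None if it contains none."""
    transactions = [to_concepts(d) for d in docs]
    frequent = apriori(transactions, min_support)
    labels = []
    for t in transactions:
        covering = [s for s in frequent if s <= t]
        labels.append(max(covering, key=lambda s: (len(s), frequent[s], sorted(s)))
                      if covering else None)
    return labels


def purity(pred, gold):
    """Purity = (1/N) * sum over clusters of the size of that cluster's majority class."""
    groups = {}
    for p, g in zip(pred, gold):
        groups.setdefault(p, []).append(g)
    return sum(Counter(gs).most_common(1)[0][1] for gs in groups.values()) / len(gold)


# Toy usage example (hypothetical documents and gold labels).
docs = ["car engine repair", "automobile engine noise", "truck engine oil",
        "stock market rally", "share market crash", "election vote count"]
gold = ["auto", "auto", "auto", "finance", "finance", "politics"]
labels = cluster(docs)
print(labels)
print("purity:", purity(labels, gold))
```

Documents covered by no frequent set are left in a `None` group, which `purity` treats as one more cluster. The abstract reports both purity and "accuracy" without saying how they relate, so the sketch computes purity only.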

Original language	English
Pages (from-to)	2447-2453
Number of pages	7
Journal	Information Technology Journal
Volume	12
Issue number	12
DOIs	https://doi.org/10.3923/iti.2013.2447.2453
State	Published - 2013
Externally published	Yes

Keywords

Apriori algorithm
Document clustering
FP-growth algorithm
Ontology

Cite this

@article{ff1635067f0346d99735d3bfc8360e4b,

title = "A new hybrid schemes combining ontology and clustering for text documents",

keywords = "Apriori algorithm, Document clustering, FP-growth algorithm, Ontology",

author = "Pumtha, S. C. and Thavavel, V. and Punithavalli, M.",

year = "2013",

doi = "10.3923/iti.2013.2447.2453",

language = "English",

volume = "12",

pages = "2447--2453",

journal = "Information Technology Journal",

issn = "1812-5638",

publisher = "Asian Network for Scientific Information",

number = "12",

}

TY  - JOUR

T1  - A new hybrid schemes combining ontology and clustering for text documents

AU  - Pumtha, S. C.

AU  - Thavavel, V.

AU  - Punithavalli, M.

PY  - 2013

Y1  - 2013

KW  - Apriori algorithm

KW  - Document clustering

KW  - FP-growth algorithm

KW  - Ontology

UR  - https://www.scopus.com/pages/publications/84882439989

U2  - 10.3923/iti.2013.2447.2453

DO  - 10.3923/iti.2013.2447.2453

M3  - Article

AN  - SCOPUS:84882439989

SN  - 1812-5638

VL  - 12

SP  - 2447

EP  - 2453

JO  - Information Technology Journal

JF  - Information Technology Journal

IS  - 12

ER  -
