Clustering prominent named entities in topic-specific text corpora

Abdulkareem Alsudais; Hovig Tchalian

Information Systems

Claremont Graduate University

Research output: Contribution to conference (paper, peer-reviewed)

2 Scopus citations

Abstract

Named Entity Recognition (NER) refers to the computational task of identifying real-world entities in text documents. A research challenge is to use computational techniques to identify and utilize these entities to improve NLP applications. This paper proposes a method that clusters prominent names of people and organizations in a text corpus based on their semantic similarity. The method relies on common named entity recognition techniques and word embedding models: semantic similarity scores computed with word embedding models are used to cluster similar named entities of the person and organization types. A human judge evaluated ten variations of the method after it was run on a corpus of 4,821 articles on a specific topic, and performance was measured using three quantitative metrics. The results on these three metrics demonstrate that the method is effective in clustering semantically similar named entities.
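The abstract outlines a pipeline: extract person and organization entities, keep the prominent ones, score pairwise semantic similarity with word embeddings, and cluster similar entities. It does not specify the prominence criterion, the embedding model, or the clustering algorithm, so the following is only a minimal sketch under stated assumptions: prominence is approximated by mention frequency, similarity is cosine similarity over precomputed vectors, and clusters are the connected components of pairs at or above a similarity threshold (found with union-find). MENTIONS and VECTORS are hypothetical toy inputs standing in for NER output and a trained embedding model such as word2vec; they are not data from the paper, and the function names are illustrative.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical NER output: (entity text, entity label) pairs such as a tagger
# with PERSON/ORG labels might emit. Toy data, not taken from the paper.
MENTIONS = (
    [("Alice Smith", "PERSON")] * 3
    + [("A. Smith", "PERSON")] * 2
    + [("Bob Jones", "PERSON")]
    + [("Acme Corp", "ORG")] * 3
    + [("Acme", "ORG")] * 2
    + [("Paris", "GPE")] * 4  # ignored: neither a person nor an organization
)

# Hypothetical precomputed entity vectors standing in for a trained
# word embedding model; values are invented for illustration.
VECTORS = {
    "Alice Smith": [0.9, 0.1, 0.0],
    "A. Smith": [0.85, 0.15, 0.05],
    "Bob Jones": [0.4, 0.0, 0.9],
    "Acme Corp": [0.0, 0.9, 0.3],
    "Acme": [0.05, 0.95, 0.25],
}


def prominent_entities(mentions, label, top_k=5):
    """Return the top_k most frequent entities with the given label.

    Frequency is an assumed proxy for 'prominent'; the abstract does not
    state how prominence is measured.
    """
    counts = Counter(text for text, lab in mentions if lab == label)
    return [text for text, _ in counts.most_common(top_k)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cluster_by_similarity(entities, vectors, threshold=0.8):
    """Group entities whose pairwise cosine similarity reaches the threshold.

    Pairs at or above the threshold are linked, and clusters are the
    connected components of that graph, found with union-find.
    """
    parent = {e: e for e in entities}

    def find(e):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[e] != e:
            parent[e] = parent[parent[e]]
            e = parent[e]
        return e

    for a, b in combinations(entities, 2):
        if a in vectors and b in vectors and cosine(vectors[a], vectors[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for e in entities:
        groups.setdefault(find(e), []).append(e)
    # Sort for deterministic output.
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    # Cluster each entity type separately, since the abstract treats people
    # and organizations as distinct types.
    for label in ("PERSON", "ORG"):
        entities = prominent_entities(MENTIONS, label)
        print(label, cluster_by_similarity(entities, VECTORS))
```

With these toy vectors, the two spellings of each name ("Alice Smith" and "A. Smith", "Acme Corp" and "Acme") land in the same cluster, while "Bob Jones" stays on its own. Threshold linking is a form of single-link clustering, so long chains of moderately similar entities can merge; a real implementation might tune the threshold or use a stricter clustering algorithm.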

Original language	English
State	Published - 2019
Event	25th Americas Conference on Information Systems, AMCIS 2019, Cancun, Mexico
Duration	15 Aug 2019 → 17 Aug 2019

Conference

Conference	25th Americas Conference on Information Systems, AMCIS 2019
Country/Territory	Mexico
City	Cancun
Period	15/08/19 → 17/08/19

Keywords

Computational social science
Named entity recognition
Natural language processing

Cite this

@conference{db4b410fb3354a11b1467c056c4bf481,
  title = "Clustering prominent named entities in topic-specific text corpora",
  abstract = "Named Entity Recognition (NER) refers to the computational task of identifying real-world entities in text documents. A research challenge is to use computational techniques to identify and utilize these entities to improve several NLP applications. In this paper, a method that clusters prominent names of people and organizations based on their semantic similarity in a text corpus is proposed. The method relies on common named entity recognition techniques and word embeddings models. Semantic similarity scores generated using word embeddings models for named entities are used to cluster similar entities of the people and organizations types. A human judge evaluated ten variations of the method after it was run on a corpus that consists of 4,821 articles on a specific topic. The performance of the method was measured using three quantitative measures. The results of these three metrics demonstrate that the method is effective in clustering semantically similar named entities.",
  keywords = "Computational social science, Named entity recognition, Natural language processing",
  author = "Abdulkareem Alsudais and Hovig Tchalian",
  note = "Publisher Copyright: {\textcopyright} 2019 Association for Information Systems. All rights reserved.; 25th Americas Conference on Information Systems, AMCIS 2019 ; Conference date: 15-08-2019 Through 17-08-2019",
  year = "2019",
  language = "English",
}

TY  - CONF
T1  - Clustering prominent named entities in topic-specific text corpora
AU  - Alsudais, Abdulkareem
AU  - Tchalian, Hovig
PY  - 2019
Y1  - 2019
N2  - Named Entity Recognition (NER) refers to the computational task of identifying real-world entities in text documents. A research challenge is to use computational techniques to identify and utilize these entities to improve several NLP applications. In this paper, a method that clusters prominent names of people and organizations based on their semantic similarity in a text corpus is proposed. The method relies on common named entity recognition techniques and word embeddings models. Semantic similarity scores generated using word embeddings models for named entities are used to cluster similar entities of the people and organizations types. A human judge evaluated ten variations of the method after it was run on a corpus that consists of 4,821 articles on a specific topic. The performance of the method was measured using three quantitative measures. The results of these three metrics demonstrate that the method is effective in clustering semantically similar named entities.
AB  - Named Entity Recognition (NER) refers to the computational task of identifying real-world entities in text documents. A research challenge is to use computational techniques to identify and utilize these entities to improve several NLP applications. In this paper, a method that clusters prominent names of people and organizations based on their semantic similarity in a text corpus is proposed. The method relies on common named entity recognition techniques and word embeddings models. Semantic similarity scores generated using word embeddings models for named entities are used to cluster similar entities of the people and organizations types. A human judge evaluated ten variations of the method after it was run on a corpus that consists of 4,821 articles on a specific topic. The performance of the method was measured using three quantitative measures. The results of these three metrics demonstrate that the method is effective in clustering semantically similar named entities.
KW  - Computational social science
KW  - Named entity recognition
KW  - Natural language processing
UR  - https://www.scopus.com/pages/publications/85084023252
M3  - Paper
AN  - SCOPUS:85084023252
T2  - 25th Americas Conference on Information Systems, AMCIS 2019
Y2  - 15 August 2019 through 17 August 2019
ER  -
