Clustering prominent named entities in topic-specific text corpora

Abdulkareem Alsudais, Hovig Tchalian

Research output: Contribution to conference › Paper › peer-review

2 Scopus citations

Abstract

Named Entity Recognition (NER) is the computational task of identifying real-world entities in text documents. A research challenge is to use computational techniques to identify these entities and use them to improve NLP applications. This paper proposes a method that clusters prominent names of people and organizations in a text corpus based on their semantic similarity. The method relies on common named entity recognition techniques and word embedding models: semantic similarity scores that the embedding models generate for named entities are used to cluster similar entities of the person and organization types. A human judge evaluated ten variations of the method after it was run on a corpus of 4,821 articles on a specific topic. Performance was assessed with three quantitative measures, and the results on all three indicate that the method is effective in clustering semantically similar named entities.
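The similarity-based clustering step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the clustering algorithm, the embedding model, or any threshold, so the toy vectors, the entity names, the `threshold` value, and the single-linkage grouping below are all assumptions made for illustration. In practice the vectors would come from a word embedding model trained on the corpus, and the entity names from an NER tool.

```python
import math

# Hypothetical toy vectors standing in for embeddings of named entities.
# Real vectors would come from a word embedding model; these are made up.
TOY_EMBEDDINGS = {
    "Alice Smith": [0.9, 0.1, 0.0],
    "Bob Jones": [0.8, 0.2, 0.1],
    "Acme Corp": [0.1, 0.9, 0.2],
    "Globex Inc": [0.0, 0.8, 0.3],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cluster_entities(embeddings, threshold=0.8):
    """Group entities whose pairwise similarity meets the threshold.

    Assumed approach: single-linkage grouping via union-find, so two
    entities share a cluster if a chain of similar pairs connects them.
    """
    names = list(embeddings)
    parent = {name: name for name in names}

    def find(name):
        # Follow parent links to the cluster root, compressing the path.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for i, first in enumerate(names):
        for second in names[i + 1:]:
            score = cosine_similarity(embeddings[first], embeddings[second])
            if score >= threshold:
                parent[find(first)] = find(second)

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    # Sort for a deterministic result.
    return sorted(sorted(members) for members in clusters.values())


clusters = cluster_entities(TOY_EMBEDDINGS)
print(clusters)
```

With these toy vectors the two person names end up in one group and the two organization names in another. The threshold controls how strict the grouping is: raising it splits clusters, lowering it merges them.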

Original language: English
State: Published - 2019
Event: 25th Americas Conference on Information Systems, AMCIS 2019 - Cancun, Mexico
Duration: 15 Aug 2019 to 17 Aug 2019

Conference

Conference: 25th Americas Conference on Information Systems, AMCIS 2019
Country/Territory: Mexico
City: Cancun
Period: 15/08/19 to 17/08/19

Keywords

  • Computational social science
  • Named entity recognition
  • Natural language processing
