Corpus Periodization framework to periodize a temporally ordered text corpus

Abdulkareem Alsudais, Hovig Tchalian

Research output: Contribution to conference, Paper, peer-review

2 Scopus citations

Abstract

Corpus Periodization is the process of segmenting a corpus into a set of smaller, discursively coherent periods while preserving its chronological order. Social researchers in fields such as sociology and history often use Corpus Periodization to examine topic-specific, temporally ordered corpora. Currently, there are no robust, automated, and easy-to-implement methods for periodizing text corpora. In this paper, we propose a new framework that automates Corpus Periodization. The method relies on two components: a simple statistical significance test that assesses changes in the number of documents between neighboring segments, and a document similarity measure that evaluates the similarity of texts between neighboring segments. We tested the proposed solution on a corpus of 4,821 news articles containing the term "corporate governance." The framework reduced the original twenty-eight annual segments to seven or fewer relevant periods.
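The abstract does not name the specific significance test or similarity measure, so the following is only a hypothetical sketch of how the two components could be combined. It assumes a normal-approximation test comparing the document counts of two neighboring segments, bag-of-words cosine similarity between their texts, and a greedy left-to-right merge of neighboring annual segments. The function names and the `alpha` and `min_similarity` thresholds are illustrative assumptions, not the authors' settings.

```python
import math
import re
from collections import Counter


def count_change_pvalue(n_a: int, n_b: int) -> float:
    """Assumed test: two-sided p-value for H0 'both segments share the same
    document rate', via the normal approximation z = (a - b) / sqrt(a + b)."""
    total = n_a + n_b
    if total == 0:
        return 1.0
    z = (n_a - n_b) / math.sqrt(total)
    return math.erfc(abs(z) / math.sqrt(2))


def term_vector(docs):
    """Bag-of-words term frequencies pooled over all documents in a segment."""
    counts = Counter()
    for doc in docs:
        counts.update(re.findall(r"[a-z]+", doc.lower()))
    return counts


def cosine(u: Counter, v: Counter) -> float:
    """Assumed similarity measure: cosine similarity of two term-frequency vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(
        sum(c * c for c in v.values())
    )
    return dot / norm if norm else 0.0


def periodize(segments, alpha=0.05, min_similarity=0.5):
    """Hypothetical greedy periodization.

    segments: list of (label, [documents]) in chronological order.
    A segment joins the current period only if the change in document count
    from the previous segment is not significant AND the two neighboring
    segments are textually similar; otherwise it starts a new period.
    Returns a list of periods, each a list of segment labels.
    """
    if not segments:
        return []
    periods = [[segments[0][0]]]
    prev_docs = segments[0][1]
    for label, docs in segments[1:]:
        same_volume = count_change_pvalue(len(prev_docs), len(docs)) >= alpha
        similar = cosine(term_vector(prev_docs), term_vector(docs)) >= min_similarity
        if same_volume and similar:
            periods[-1].append(label)
        else:
            periods.append([label])
        prev_docs = docs
    return periods


if __name__ == "__main__":
    # Toy data (invented for illustration): four annual segments.
    toy = [
        ("2001", ["board oversight of executives"] * 10),
        ("2002", ["board oversight of executives"] * 12),
        ("2003", ["accounting scandal and reform"] * 40),
        ("2004", ["accounting scandal and reform"] * 42),
    ]
    print(periodize(toy))  # [['2001', '2002'], ['2003', '2004']]
```

In this toy run, the jump from 12 to 40 documents is significant and the vocabulary shifts completely, so 2003 opens a new period, while the two other neighboring pairs merge. Requiring both conditions is one possible design choice; a real implementation might weight the two signals differently or compare each segment against the whole current period instead of only its immediate predecessor.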

Original language: English
State: Published - 2016
Externally published: Yes
Event: 22nd Americas Conference on Information Systems: Surfing the IT Innovation Wave, AMCIS 2016 - San Diego, United States
Duration: 11 Aug 2016 to 14 Aug 2016

Conference

Conference: 22nd Americas Conference on Information Systems: Surfing the IT Innovation Wave, AMCIS 2016
Country/Territory: United States
City: San Diego
Period: 11/08/16 to 14/08/16

Keywords

  • Corpus Periodization
  • Corpus analysis
  • Temporal text mining

