Abstract
Corpus Periodization is the process of segmenting a corpus into a smaller set of discursively coherent periods while retaining its chronological order. Social researchers in fields such as sociology and history often use Corpus Periodization to examine topic-specific, temporally ordered corpora. Currently, there are no robust, automated, and easy-to-implement methods to periodize text corpora. In this paper, we propose a new framework that automates Corpus Periodization. The method relies on two components: a simple statistical significance test that assesses changes in the number of documents between neighboring segments, and a document similarity measure that evaluates the similarity of texts between neighboring segments. We tested the proposed solution on a corpus of 4,821 news articles containing the term "corporate governance." We were able to reduce the original twenty-eight annual segments to seven or fewer relevant periods.
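The abstract does not name the specific significance test or similarity measure, so the following is only a minimal sketch of the general idea under stated assumptions, not the paper's implementation. It assumes a normal-approximation test for equal document counts in two neighboring segments, cosine similarity over raw term frequencies, and a rule that merges a segment into the current period only when the count change is not significant and the texts are similar. The function names, the `sim_threshold` value, and the merge rule are all hypothetical.

```python
import math
from collections import Counter


def count_change_is_significant(n_a, n_b, z_crit=1.96):
    """Assumed test: under equal document rates, (n_a - n_b) / sqrt(n_a + n_b)
    is approximately standard normal. Not necessarily the paper's test."""
    total = n_a + n_b
    if total == 0:
        return False
    z = (n_a - n_b) / math.sqrt(total)
    return abs(z) > z_crit


def cosine_similarity(tf_a, tf_b):
    """Cosine similarity between two term-frequency Counters (assumed measure)."""
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(v * v for v in tf_a.values()))
    norm_b = math.sqrt(sum(v * v for v in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def periodize(segments, sim_threshold=0.8, z_crit=1.96):
    """Merge chronologically ordered segments into periods.

    segments: list of (label, documents) pairs, each document a string.
    Returns a list of periods, each a list of segment labels.
    Hypothetical merge rule: join the neighbor when the document count does
    not change significantly AND the texts are similar enough.
    """
    periods = []
    for label, docs in segments:
        tf = Counter(word for doc in docs for word in doc.lower().split())
        n = len(docs)
        if periods:
            prev = periods[-1]
            stable = not count_change_is_significant(prev["last_n"], n, z_crit)
            similar = cosine_similarity(prev["last_tf"], tf) >= sim_threshold
            if stable and similar:
                # Extend the current period and compare the next segment
                # against this most recent one.
                prev["labels"].append(label)
                prev["last_n"], prev["last_tf"] = n, tf
                continue
        periods.append({"labels": [label], "last_n": n, "last_tf": tf})
    return [p["labels"] for p in periods]


# Toy data, invented for illustration only.
toy_segments = [
    (1990, ["board governance reform"] * 10),
    (1991, ["board governance reform"] * 12),
    (1992, ["fraud scandal audit"] * 40),
    (1993, ["fraud scandal audit"] * 38),
]
print(periodize(toy_segments))  # [[1990, 1991], [1992, 1993]]
```

On the toy data, the jump from 12 to 40 documents combined with the change in vocabulary opens a new period, while the small year-to-year fluctuations are absorbed into the neighboring segment. A real corpus would need proper tokenization and thresholds chosen for the data.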
Original language | English |
---|---|
State | Published - 2016 |
Externally published | Yes |
Event | 22nd Americas Conference on Information Systems: Surfing the IT Innovation Wave, AMCIS 2016 - San Diego, United States; Duration: 11 Aug 2016 → 14 Aug 2016 |
Conference
Conference | 22nd Americas Conference on Information Systems: Surfing the IT Innovation Wave, AMCIS 2016 |
---|---|
Country/Territory | United States |
City | San Diego |
Period | 11 Aug 2016 → 14 Aug 2016 |
Keywords
- Corpus Periodization
- Corpus analysis
- Temporal text mining