Improving the identification of the discourse function of news article paragraphs

Deya Banisakher, Victor W.H. Yarlott, Mohammed Aldawsari, Napthali D. Rishe, Mark A. Finlayson

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

4 Scopus citations

Abstract

Identifying the discourse structure of documents is an important task in understanding written text. Building on prior work, we demonstrate an improved approach to automatically identifying the discourse function of paragraphs in news articles. We start with the hierarchical theory of news discourse developed by van Dijk (1988), which proposes how paragraphs function within news articles. This discourse information sits at a level intermediate between phrase- or sentence-sized discourse segments and document genre, characterizing how individual paragraphs convey information about the events in the storyline of the article. Specifically, the theory categorizes the relationships between narrated events and (1) the overall storyline (such as MAIN EVENTS, BACKGROUND, or CONSEQUENCES) as well as (2) commentary (such as VERBAL REACTIONS and EVALUATIONS). We trained and tested a linear-chain conditional random field (CRF) with new features to model van Dijk’s labels and compared it against several machine learning models presented in previous work. Our model significantly outperformed all baselines and prior approaches, achieving an average F1 score of 0.71, which represents a 31.5% improvement over the previously best-performing support vector machine model.
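As a rough illustration of how a linear-chain CRF assigns one label per paragraph, the following minimal sketch implements Viterbi decoding over a sequence of paragraphs. It is not the authors' implementation: the function name `viterbi`, the three-label subset, and the idea of passing precomputed per-paragraph scores (`emissions`) and label-to-label scores (`transitions`) are all assumptions made for illustration. In a trained CRF these scores would come from learned feature weights.

```python
from typing import Dict, List, Tuple

# Illustrative subset of van Dijk's labels named in the abstract.
LABELS = ["MAIN EVENTS", "BACKGROUND", "VERBAL REACTIONS"]


def viterbi(
    emissions: List[Dict[str, float]],
    transitions: Dict[Tuple[str, str], float],
) -> List[str]:
    """Return the highest-scoring label sequence for a list of paragraphs.

    emissions[i][y]      -- hypothetical score of label y for paragraph i
    transitions[(p, y)]  -- hypothetical score of label p followed by label y
    """
    if not emissions:
        return []

    # best[i][y] = best total score of any label sequence ending in y at i
    best = [{y: emissions[0][y] for y in LABELS}]
    back: List[Dict[str, str]] = []

    for emit in emissions[1:]:
        scores, pointers = {}, {}
        for y in LABELS:
            # Pick the previous label that maximizes the running score.
            prev = max(LABELS, key=lambda p: best[-1][p] + transitions[(p, y)])
            scores[y] = best[-1][prev] + transitions[(prev, y)] + emit[y]
            pointers[y] = prev
        best.append(scores)
        back.append(pointers)

    # Follow back-pointers from the best final label.
    path = [max(LABELS, key=lambda y: best[-1][y])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]
```

Because each paragraph's label is scored jointly with its neighbor's, a strong transition penalty can override a paragraph's locally preferred label, which is the property that distinguishes a sequence model from classifying each paragraph independently.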

Original language: English
Title of host publication: ACL 2020 - Narrative Understanding, Storylines, and Events, Proceedings of the 1st Joint Workshop
Publisher: Association for Computational Linguistics (ACL)
Pages: 17-25
Number of pages: 9
ISBN (Electronic): 9781952148132
State: Published - 2020
Externally published: Yes
Event: 1st Joint Workshop on Narrative Understanding, Storylines, and Events, NUSE 2020 at the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020 - Virtual, Online, United States
Duration: 9 Jul 2020 → …

Publication series

Name: Proceedings of the Annual Meeting of the Association for Computational Linguistics
ISSN (Print): 0736-587X

Conference

Conference: 1st Joint Workshop on Narrative Understanding, Storylines, and Events, NUSE 2020 at the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020
Country/Territory: United States
City: Virtual, Online
Period: 9/07/20 → …
