Web-Questionnaire-Based Corpus Creation Under Assumption of Human as Speech Targets

Kazuaki Shima; Jinhua She; Yasunari Obuchi; Abdullah M. Iliyasu

doi:10.20965/jaciii.2022.p0513

Electrical Engineering

Tokyo University of Technology

Research output: Contribution to journal › Article › peer-review

2 Scopus citations

Abstract

This paper presents a method that uses a web questionnaire to create a corpus containing spontaneous utterances of natural ideas, which may contain grammatical mistakes. In an experimental implementation of the method, the subjects were informed that they were receiving nursing care from a person, and they were required to answer a web-based questionnaire in which their responses were recorded as speech utterances. Compared to the Wizard of Oz approach and interview-based corpus-creation methods, the presented method simplifies the collection of utterances. Furthermore, we conducted a two-fold assessment to verify the effectiveness of the presented method. First, the approach exhibited a significant reduction in workload compared to interview-style utterance collection. Second, we compared the variety of expressions collected when subjects were informed that they were talking to a person with those collected when they were informed that they were communicating with a nursing robot. The results indicate that, although the number of utterances was larger for a robot than for a person, other metrics, such as the time efficiency index, the total number of morphemes, the average number of morphemes per utterance, the number of unique morphemes, and the coefficient of variation, were larger for a human speech target than for a robot.
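The morpheme-based metrics named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `corpus_metrics` and the dictionary keys are hypothetical, the utterances are assumed to be already split into morphemes by some morphological analyzer, and the coefficient of variation is assumed here to be taken over per-utterance morpheme counts, which the abstract does not specify. The time efficiency index is omitted because the abstract does not define it.

```python
from statistics import mean, pstdev


def corpus_metrics(utterances):
    """Compute simple variety metrics for a corpus.

    `utterances` is a list of utterances, each given as a list of
    morpheme strings (tokenization is assumed to be done beforehand).
    """
    # Number of morphemes in each utterance.
    counts = [len(u) for u in utterances]
    avg = mean(counts) if counts else 0.0
    # Assumption: coefficient of variation = population standard
    # deviation of per-utterance morpheme counts divided by their mean.
    cv = pstdev(counts) / avg if avg else 0.0
    return {
        "num_utterances": len(utterances),
        "total_morphemes": sum(counts),
        "avg_morphemes_per_utterance": avg,
        "unique_morphemes": len({m for u in utterances for m in u}),
        "coefficient_of_variation": cv,
    }


if __name__ == "__main__":
    # Toy data for illustration only; not taken from the paper.
    sample = [["a", "b"], ["a", "b", "c", "d"]]
    print(corpus_metrics(sample))
```

Computing the same dictionary for the human-target and robot-target conditions would allow the kind of side-by-side comparison the abstract describes.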

Original language	English
Pages (from-to)	513-520
Number of pages	8
Journal	Journal of Advanced Computational Intelligence and Intelligent Informatics
Volume	26
Issue number	4
DOIs	https://doi.org/10.20965/jaciii.2022.p0513
State	Published - Jul 2022

Keywords

corpus
morpheme
natural utterance
spontaneity
web questionnaire

Cite this

@article{1261453db8e84a56ba495fe997c0288d,
  title = "Web-Questionnaire-Based Corpus Creation Under Assumption of Human as Speech Targets",
  abstract = "This paper presents a method that uses a web questionnaire to create a corpus containing spontaneous utterances of natural ideas, which may contain grammatical mistakes. In an experimental implementation of the method, the subjects were informed that they were receiving nursing care from a person, and they were required to answer a web-based questionnaire in which their responses were recorded as speech utterances. Compared to the Wizard of Oz approach and interview-based corpus-creation methods, the presented method simplifies the collection of utterances. Furthermore, we conducted a two-fold assessment to verify the effectiveness of the presented method. First, the approach exhibited a significant reduction in workload compared to interview-style utterance collection. Second, we compared the variety of expressions collected when subjects were informed that they were talking to a person with those collected when they were informed that they were communicating with a nursing robot. The results indicate that, although the number of utterances was larger for a robot than for a person, in terms of other metrics such as time efficiency index, the total number of morphemes, the average number of morphemes per utterance, the number of unique morphemes, and coefficient of variation, the utterances were larger for a human speech target than for a robot.",
  keywords = "corpus, morpheme, natural utterance, spontaneity, web questionnaire",
  author = "Kazuaki Shima and Jinhua She and Yasunari Obuchi and Iliyasu, {Abdullah M.}",
  note = "Publisher Copyright: {\textcopyright} Fuji Technology Press Ltd.",
  year = "2022",
  month = jul,
  doi = "10.20965/jaciii.2022.p0513",
  language = "English",
  volume = "26",
  pages = "513--520",
  journal = "Journal of Advanced Computational Intelligence and Intelligent Informatics",
  issn = "1343-0130",
  publisher = "Fuji Technology Press",
  number = "4",
}

TY  - JOUR
T1  - Web-Questionnaire-Based Corpus Creation Under Assumption of Human as Speech Targets
AU  - Shima, Kazuaki
AU  - She, Jinhua
AU  - Obuchi, Yasunari
AU  - Iliyasu, Abdullah M.
N1  - Publisher Copyright: © Fuji Technology Press Ltd.
PY  - 2022/7
Y1  - 2022/7
N2  - This paper presents a method that uses a web questionnaire to create a corpus containing spontaneous utterances of natural ideas, which may contain grammatical mistakes. In an experimental implementation of the method, the subjects were informed that they were receiving nursing care from a person, and they were required to answer a web-based questionnaire in which their responses were recorded as speech utterances. Compared to the Wizard of Oz approach and interview-based corpus-creation methods, the presented method simplifies the collection of utterances. Furthermore, we conducted a two-fold assessment to verify the effectiveness of the presented method. First, the approach exhibited a significant reduction in workload compared to interview-style utterance collection. Second, we compared the variety of expressions collected when subjects were informed that they were talking to a person with those collected when they were informed that they were communicating with a nursing robot. The results indicate that, although the number of utterances was larger for a robot than for a person, in terms of other metrics such as time efficiency index, the total number of morphemes, the average number of morphemes per utterance, the number of unique morphemes, and coefficient of variation, the utterances were larger for a human speech target than for a robot.
AB  - This paper presents a method that uses a web questionnaire to create a corpus containing spontaneous utterances of natural ideas, which may contain grammatical mistakes. In an experimental implementation of the method, the subjects were informed that they were receiving nursing care from a person, and they were required to answer a web-based questionnaire in which their responses were recorded as speech utterances. Compared to the Wizard of Oz approach and interview-based corpus-creation methods, the presented method simplifies the collection of utterances. Furthermore, we conducted a two-fold assessment to verify the effectiveness of the presented method. First, the approach exhibited a significant reduction in workload compared to interview-style utterance collection. Second, we compared the variety of expressions collected when subjects were informed that they were talking to a person with those collected when they were informed that they were communicating with a nursing robot. The results indicate that, although the number of utterances was larger for a robot than for a person, in terms of other metrics such as time efficiency index, the total number of morphemes, the average number of morphemes per utterance, the number of unique morphemes, and coefficient of variation, the utterances were larger for a human speech target than for a robot.
KW  - corpus
KW  - morpheme
KW  - natural utterance
KW  - spontaneity
KW  - web questionnaire
UR  - http://www.scopus.com/inward/record.url?scp=85135255219&partnerID=8YFLogxK
U2  - 10.20965/jaciii.2022.p0513
DO  - 10.20965/jaciii.2022.p0513
M3  - Article
AN  - SCOPUS:85135255219
SN  - 1343-0130
VL  - 26
SP  - 513
EP  - 520
JO  - Journal of Advanced Computational Intelligence and Intelligent Informatics
JF  - Journal of Advanced Computational Intelligence and Intelligent Informatics
IS  - 4
ER  -
