TY - JOUR
T1 - Investigating the effect of classroom-based feedback on speaking assessment
T2 - a multifaceted Rasch analysis
AU - Bijani, Houman
AU - Hashempour, Bahareh
AU - Ibrahim, Khaled Ahmed Abdel Al
AU - Orabah, Salim Said Bani
AU - Heydarnejad, Tahereh
N1 - Publisher Copyright:
© 2022, The Author(s).
PY - 2022/12
Y1 - 2022/12
N2 - Due to the subjectivity of oral assessment, much attention has been devoted to obtaining a satisfactory degree of consistency among raters. However, the process of achieving greater consistency might not result in valid decisions. One matter at the core of both reliability and validity in oral assessment is rater training. Recently, multifaceted Rasch measurement (MFRM) has been adopted to address the problem of rater bias and inconsistency in scoring; however, no research has incorporated the facets of test takers’ ability, raters’ severity, task difficulty, group expertise, scale criterion category, and test version together in a single study along with their two-way interactions. Moreover, little research has investigated how long the effects of rater training last. Consequently, this study explored the influence of a training program and feedback by having 20 raters score the oral production of 300 test takers in three phases. The results indicated that training can lead to higher interrater reliability and reduced severity/leniency and bias. However, it will not bring raters into complete agreement, although it does make them more self-consistent. Even though rater training might result in higher internal consistency among raters, it cannot simply eradicate individual differences related to rater characteristics. That is, experienced raters, owing to their idiosyncratic characteristics, did not benefit as much as inexperienced ones. This study also showed that the effects of training might not endure in the long term; thus, ongoing training is required throughout the rating period to let raters regain consistency.
AB - Due to the subjectivity of oral assessment, much attention has been devoted to obtaining a satisfactory degree of consistency among raters. However, the process of achieving greater consistency might not result in valid decisions. One matter at the core of both reliability and validity in oral assessment is rater training. Recently, multifaceted Rasch measurement (MFRM) has been adopted to address the problem of rater bias and inconsistency in scoring; however, no research has incorporated the facets of test takers’ ability, raters’ severity, task difficulty, group expertise, scale criterion category, and test version together in a single study along with their two-way interactions. Moreover, little research has investigated how long the effects of rater training last. Consequently, this study explored the influence of a training program and feedback by having 20 raters score the oral production of 300 test takers in three phases. The results indicated that training can lead to higher interrater reliability and reduced severity/leniency and bias. However, it will not bring raters into complete agreement, although it does make them more self-consistent. Even though rater training might result in higher internal consistency among raters, it cannot simply eradicate individual differences related to rater characteristics. That is, experienced raters, owing to their idiosyncratic characteristics, did not benefit as much as inexperienced ones. This study also showed that the effects of training might not endure in the long term; thus, ongoing training is required throughout the rating period to let raters regain consistency.
KW - Bias
KW - Interrater consistency
KW - Intrarater consistency
KW - Multifaceted Rasch measurement (MFRM)
KW - Rater training
KW - Severity/leniency
UR - http://www.scopus.com/inward/record.url?scp=85137600345&partnerID=8YFLogxK
U2 - 10.1186/s40468-022-00176-3
DO - 10.1186/s40468-022-00176-3
M3 - Article
AN - SCOPUS:85137600345
SN - 2229-0443
VL - 12
JO - Language Testing in Asia
JF - Language Testing in Asia
IS - 1
M1 - 26
ER -