Document Type: Research Paper

Author

Islamic Azad University, Zanjan Branch, Zanjan, Iran

Abstract

The current popularity of second/foreign language oral performance assessment has led to growing interest in tasks as tools for assessing language learners’ oral abilities. However, most oral assessment studies to date have investigated tasks in isolation, so possible relationships among them have remained unexplored. In this study, twenty English as a foreign language (EFL) teachers rated the oral performances of 200 EFL learners before and after a rater training program, using description, narration, summarizing, role-play, and exposition tasks. The findings demonstrated the usefulness of many-facet Rasch measurement (MFRM) for detecting rater effects and for revealing both consistency and variability in rater behavior, thereby allowing the quality of ratings to be evaluated. The outcomes indicated that identifying task difficulty is a complex and multidimensional undertaking, and that test takers’ ability accounts for more of the variation in scores than other intervening variables. No relationship was found between task difficulty and raters’ inter-rater reliability measures. The findings suggest that tasks, and more importantly performance conditions, affect estimates of test takers’ oral ability in different ways. Because different groups of raters showed biases toward different tasks, the results also indicate that training programs can reduce rater bias and increase rater consistency. The practical implication is that decision makers need not be overly concerned with raters’ prior expertise in oral assessment; rather, they should establish sound rater training programs to increase the reliability of assessment.
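For reference, the MFRM approach invoked above conventionally models each rating as a function of separate facets for the test taker, the task, and the rater. A minimal sketch of the standard rating scale formulation (the symbols below are conventional, not taken from the paper itself):

$$\ln\!\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = \theta_n - \delta_i - \alpha_j - \tau_k,$$

where $\theta_n$ is the ability of test taker $n$, $\delta_i$ the difficulty of task $i$, $\alpha_j$ the severity of rater $j$, and $\tau_k$ the threshold between score categories $k-1$ and $k$. Under this decomposition, rater effects such as severity, leniency, and inconsistency surface as estimates of $\alpha_j$ together with per-rater fit statistics, which is what makes before/after comparisons of a training program possible.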

Keywords
