Iranian EFL Raters’ Cognitive Processes in Rating IELTS Speaking Tasks: The Effect of Expertise

Document Type: Research Paper


Imam Khomeini International University


Variation in ratings of EFL learners’ oral performance is often attributed to variation in raters’ cognitive processes. Han’s (2016) four-stage processing model was used to examine the cognitive processes that expert and novice raters follow when rating a recorded response to IELTS Speaking Task Two with the IELTS speaking rubric. Novice and expert raters took part in four-phase verbal protocol sessions designed to explore the cognitive processes underlying (a) their representations of the IELTS speaking rubric, (b) their qualitative assessment of a recorded sample response to IELTS Speaking Task Two, (c) their quantitative assignment of ratings to that response, and (d) their revision of the assigned ratings. The recorded verbal protocol reports were then transcribed, segmented, coded, and content-analyzed. Content analysis yielded 80 themes organized under the four categories of the IELTS speaking rubric: (1) grammatical range and accuracy, (2) fluency and coherence, (3) lexical resource, and (4) pronunciation. NVivo 8 and SPSS 19 were used for the qualitative and quantitative analyses, respectively. Both the qualitative and the statistical findings showed that L2 raters with different levels of expertise attend to different aspects of the spoken response, interpret it differently, and apply different criteria when judging it. Expertise, the findings suggest, can influence the reliability of ratings, which may carry implications for rater training and for the validity of ratings.


Article Title [Persian]

Cognitive Processes of Iranian Raters in Assessing IELTS Speaking Tasks: The Effect of Expertise

Authors [Persian]

  • رجب اسفندیاری
  • پیام نور
دانشگاه بین المللی امام خمینی(ره)
Abstract [Persian]

Second-language raters commonly apply biased and inconsistent patterns in assessment, which are referred to as “rater effects” or measurement errors. This study was conducted to identify inconsistencies in the ratings of experienced and novice raters, and in their cognitive representations of the rating criteria, when assessing the IELTS test. To this end, following Han’s (2016) cognitive model, experienced and novice IELTS raters were selected through purposive sampling and took part in four phases of verbal protocols designed to uncover their cognitive representations. With the help of NVivo 8 and SPSS, the data were processed through phonetic transcription, segmentation, coding, and content analysis of the participants’ verbal protocol reports in Phase 1 (conceptualizing the IELTS rubric), Phase 2 (qualitatively rating a sample oral response to Task 2 of the IELTS interview), Phase 3 (quantitatively scoring a sample oral response to Task 2 of the IELTS interview), and Phase 4 (revising or finalizing the scores). The results showed that experienced and novice raters differed significantly in the degree of attention they paid to the assessment criteria in each of the four phases of Han’s verbal protocols. The findings underline the effect of experience on raters’ performance and on their view of learners’ oral responses, and indicate that experienced and novice raters arrive at different interpretations and judgments in the qualitative and quantitative assessment of language data.

Keywords [Persian]

  • Expertise
  • Rater
  • Cognitive
  • IELTS
  • Speaking task

References

Baddeley, A., Eysenck, M. W., & Anderson, M. C. (2009). Memory. New York, NY: Psychology Press.

Barkaoui, K. (2010). Variability in ESL essay rating processes: The role of the rating scale and rater experience. Language Assessment Quarterly, 7(1), 54–74.

Bejar, I. I. (2012). Rater cognition: Implications for validity. Educational Measurement: Issues and Practice, 31(3), 2–9.

Berns, M. (2005). Expanding on the Expanding Circle: Where do WE go from here? World Englishes, 24(1), 85–93.

Brennan, R. L. (2013). Commentary on “Validating the interpretations and uses of test scores”. Journal of Educational Measurement, 50, 74–83.

Bridgeman, B., Powers, D., Stone, E., & Mollaun, P. (2011). TOEFL iBT speaking test scores as indicators of oral communicative language proficiency. Language Testing, 29(1), 91–108.

Chapelle, C. A. (2012). Validity argument for language assessment: The framework is simple. Language Testing, 29, 19–27.

Cumming, A., Kantor, R., & Powers, D. (2002). Scoring of TOEFL essays and TOEFL 2000 prototype writing tasks: An investigation into raters’ decision making and development of a preliminary analytic framework (TOEFL Monograph Series No. 22). Princeton, NJ: Educational Testing Service.

Davis, L. (2015). The influence of training and experience on rater performance in scoring spoken language. Language Testing, 33(1), 117–135.

Ducasse, A. M. (2010). Interaction in paired oral proficiency assessment in Spanish: Rater and candidate input into evidence based scale development and construct definition. Frankfurt: Peter Lang.

Eckes, T. (2008). Rater types in writing performance assessments: A classification approach to rater variability. Language Testing, 25(2), 155–185.

Eckes, T. (2011). Introduction to many-facet Rasch measurement: Analyzing and evaluating rater-mediated assessments. Frankfurt: Peter Lang.

Eckes, T. (2012). Operational rater types in writing assessment: Linking rater cognition to rater behavior. Language Assessment Quarterly, 9, 270–292.

Erdosy, M. U. (2004). Exploring variability in judging writing ability in a second language: A study of four experienced raters of ESL compositions (TOEFL Research Report No. RR-03-17). Princeton, NJ: Educational Testing Service.

Ericsson, K. A. (2006). The influence of experience and deliberate practice on the development of superior expert performance. In K. A. Ericsson, N. Charness, P. J. Feltovich, & R. R. Hoffman (Eds.), The Cambridge handbook of expertise and expert performance (pp. 683–704). Cambridge: Cambridge University Press.

Esfandiari, R., & Myford, C. M. (2013). Severity differences among self-assessors, peer assessors, and teacher assessors rating EFL essays. Assessing Writing, 18(2), 111–131.

Fahim, M., & Bijani, H. (2011). The effects of rater training on raters’ severity and bias in second language writing assessment. Iranian Journal of Language Testing, 1, 1–16.

Furneaux, C., & Rignall, M. (2007). The effect of standardization-training on rater judgements for the IELTS writing module. In L. Taylor & P. Falvey (Eds.), IELTS Collected Papers: Research in speaking and writing assessment (pp. 422–445). Cambridge: Cambridge University Press.

Govaerts, M. J. B., Schuwirth, L. W. T., Van der Vleuten, C. P. M., & Muijtjens, A. M. M. (2011). Workplace-based assessment: Effects of rater expertise. Advances in Health Sciences Education, 16, 151–165.

Hamilton, J., Reddel, S., & Spratt, M. (2001). Teachers’ perception of online rater training and monitoring. System, 29, 505–520.

Han, Q. (2016). Rater cognition in L2 speaking assessment: A review of the literature. Teachers College, Columbia University Working Papers in TESOL & Applied Linguistics, 16(1), 1–24.

Hsieh, C. N. (2011). Rater effects in ITA testing: ESL teachers’ versus American undergraduates’ judgments of accentedness, comprehensibility, and oral proficiency. Spaan Fellow Working Papers in Second or Foreign Language Assessment, 9, 47–74.

Kane, M. T. (2013). Validating the interpretations and uses of test scores. Journal of Educational Measurement, 50, 1–73.

Kim, H. J. (2015). Investigating raters’ development of rating ability on a second language speaking test (Unpublished doctoral dissertation). Teachers College, Columbia University, New York, NY.

Knoch, U. (2011). Investigating the effectiveness of individualized feedback to rating behavior: A longitudinal study. Language Testing, 28, 179–200.

Lazaraton, A. (2005). Non-native speakers as language assessors: Recent research and implications for assessment practice. Paper presented at the conference of the British Association for Applied Linguistics (BAAL), Bristol.

Lim, G. S. (2011). The development and maintenance of rating quality in performance writing assessment: A longitudinal study of new and experienced raters. Language Testing, 28, 543–560.

Lumley, T. (2002). Assessment criteria in a large-scale writing test: What do they really mean to the raters? Language Testing, 19, 246–276.

Lumley, T. (2005). Assessing second language writing: The rater’s perspective. New York: Peter Lang.

May, L. (2011). Interactional competence in a paired speaking test: Features salient to raters. Language Assessment Quarterly, 8(2), 127–145.

Mislevy, R. J. (2010). Some implications of expertise research for educational assessment. Research Papers in Education, 25, 253–270.

Myford, C. M., & Wolfe, E. W. (2003). Detecting and measuring rater effects using many-facet Rasch measurement: Part I. Journal of Applied Measurement, 4(4), 386–422.

O’Sullivan, B., & Rignall, M. (2007). Assessing the value of bias analysis feedback to raters for the IELTS writing module. In L. Taylor & P. Falvey (Eds.), IELTS Collected Papers: Research in speaking and writing assessment (pp. 446–478). Cambridge: Cambridge University Press.

Purpura, J. E. (2012). What is the role of strategic competence in a processing account of L2 learning or use? Paper presented at the American Association for Applied Linguistics Conference, Boston, MA.

Purpura, J. E. (2014). Cognition and language assessment. In A. J. Kunnan (Ed.), The companion to language assessment (pp. 1452–1476). Boston, MA: John Wiley & Sons, Inc.

Roever, C. (2011). Testing of second language pragmatics: Past and future. Language Testing, 28, 463–481.

Sakyi, A. A. (2003). A study of the holistic scoring behaviors of experienced and novice ESL instructors (Unpublished doctoral dissertation). Toronto: University of Toronto.

Tosuncuoglu, I. (2018). Importance of assessment in ELT. Journal of Education and Training Studies, 6(9), 163–167.

Wei, J., & Llosa, L. (2015). Investigating differences between American and Indian raters in assessing TOEFL iBT speaking tasks. Language Assessment Quarterly, 12(3), 283–304.

Wolfe, E. W., Chiu, C. W. T., & Myford, C. M. (1999). The manifestation of common rater effects in multi-faceted Rasch analyses. Princeton, NJ: Educational Testing Service, Center for Performance Assessment.

Wolfe, E. W., Matthews, S., & Vickers, D. (2010). The effectiveness and efficiency of distributed online, regional online, and regional face-to-face training for writing assessment raters. The Journal of Technology, Learning and Assessment, 10(1), 1–22.