Baddeley, A., Eysenck, M. W., & Anderson, M. C. (2009). Memory. New York, NY: Psychological Press.
Barkaoui, K. (2010). Variability in ESL essay rating processes: The role of the rating scale and rater experience. Language Assessment Quarterly, 7(1), 54–74.
Bejar, I. I. (2012). Rater cognition: Implications for validity. Educational Measurement: Issues and Practice, 31(3), 2-9.
Berns M (2005). Expanding on the Expanding Circle: Where do WE go from here? World Englishes, 24(1), 85–93.
Brennan, R. L. (2013). Commentary on “Validating the interpretations and uses of test scores”. Journal of Educational Measurement, 50, 74–83.
Bridgeman, B., Powers, D., Stone, E., & Mollaun, P. (2011). TOEFL iBT speaking test scores as indicators of oral communicative language proficiency. Language Testing, 29(1), 91–108.
Chapelle, C. A. (2012). Validity argument for language assessment: The framework is simple. Language Testing, 29, 19–27.
Cumming, A., Kantor, R., & Powers, D. (2002). Scoring of TOEFL Essays and TOEFL 2000 prototype writing tasks: An investigation into raters’ decision making and development of a preliminary analytic framework (TOEFL Monograph Series N 22). Princeton, NJ: Educational Testing Service.
Davis, L. (2015). The influence of training and experience on rater performance in scoring spoken language. Language Testing, 33(1), 117-135.
Ducasse, A. M. (2010). Interaction in paired oral proficiency assessment in Spanish: Rater and candidate input into evidence based scale development and construct definition. Frankfurt: Peter Lang.
Eckes, T. (2008). Rater types in writing performance assessments: A classification approach to rater variability. Language Testing, 25(2), 155–185.
Eckes, T. (2011). Introduction to many-facet Rasch measurement: Analyzing and evaluating rater-mediated assessments. Frankfurt: Peter Lang.
Eckes, T. (2012). Operational Rater Types in Writing Assessment: Linking Rater Cognition to Rater Behavior. Language Assessment Quarterly, 9, 270–292
Erdosy, M. U. (2004). Exploring variability in judging writing ability in a second language: A study of four experienced raters of ESL compositions (TOEFL Research Report No. RR-03-17). Princeton, NJ: Educational Testing Service.
Ericsson, K. A. (2006). The Influence of experience and deliberate practice on the development of superior expert performance. In K. A. Ericsson, N. Charness, P. J. Feltovich, & R. R. Hoffman (Eds.), The Cambridge handbook of expertise and expert performance (pp. 683–704). Cambridge: Cambridge University Press.
Esfandiari, R., & Myford, C. M. (2013). Severity differences among self-assessors, peer assessors, and teacher assessors rating EFL essays. Assessing Writing, 18(2), 111-131.
Fahim, M., & Bijani, H. (2011). The effects of rater training on raters’ severity and bias in second language writing assessment. Iranian Journal of Language Testing, 1, 1–16.
Furneaux, C., & Rignall, M. (2007). The effect of standardization-training on rater judgements for the IELTS writing module. In L. Taylor & P. Falvey (Eds.), IELTS Collected Papers: Research in speaking and writing assessment (pp. 422–445). Cambridge: Cambridge University Press.
Govaerts, M. J. B., Schuwirth, L. W. T., Van der Vleuten, C. P. M., & Muijtjens, A. M. M. (2011). Workplace-based assessment: effects of rater expertise. Advances in Health Sci Educ 16, 151–165
Hamilton, J., Reddel, S., & Spratt, M. (2001). Teachers’ perception of online rater training and monitoring . System, 29, 505-20.
Han, Q. (2016). Rater cognition in L2 speaking assessment: A review of the literature. Teachers College, Columbia University Working Papers in TESOL & Applied Linguistics, 16(1), 1-24.
Hsieh, C. N. (2011). Rater effects in ITA testing: ESL teachers’ versus American undergraduates’ judgments of accentedness, comprehensibility, and oral proficiency. Spaan Fellow Working Papers in Second or Foreign Language Assessment, 9, 47-74.
Kane, M. T. (2013). Validating the interpretations and uses of test scores. Journal of Educational Measurement, 50, 1–73.
Knoch, U. (2011). Investigating the effectiveness of individualized feedback to rating behavior – a longitudinal study. Language Testing, 28, 179–200.
Kim, H. J. (2015). Investigating raters’ development of rating ability on a second language speaking test (Unpublished doctoral dissertation). Teachers College, Columbia University, New York, NY.
Lazaraton, A. (2005). Non-native speakers as language assessors: Recent research and implications for assessment practice. Paper presented at the BAAL, Bristol.
Lim, G. S. (2011). The development and maintenance of rating quality in performance writing assessment: A longitudinal study of new and experienced raters. Language Testing, 28, 543–560.
Lumley, T. (2002). Assessment criteria in a large-scale writing test: What do they really mean to the raters? Language Testing, 19, 246–76.
Lumley, T. (2005). Assessing second language writing: The rater’s perspective. New York: Peter Lang.
May, L. (2011). Interactional competence in a paired speaking test: Features salient to raters. Language Assessment Quarterly, 8(2), 127-145.
Mislevy, R. J. (2010). Some implications of expertise research for educational assessment. Research papers in education, 25, 253-270
Myford, C. M., & Wolfe, E. W. (2003). Detecting and measuring rater effects using many-facet Rasch measurement: Part I. Journal of Applied Measurement, 4(4), 386-422.
O’Sullivan, B., & Rignall, M. (2007). Assessing the value of bias analysis feedback to raters for the IELTS writing module. In L. Taylor & P. Falvey (Eds.), IELTS Collected Papers: Research in speaking and writing assessment (pp. 446–478). Cambridge: Cambridge University Press.
Purpura, J. E. (2012). What is the role of strategic competence in a processing account of L2 learning or use? Paper presented at the American Association for Applied Linguistics Conference, Boston, MA.
Purpura, J. E. (2014). Cognition and language assessment. In A. J. Kunnan (Ed.), the companion to language assessment (pp.1452–1476). Boston, MA: John Wiley & Sons, Inc.
Roever, C. (2011). Testing of second language pragmatics: Past and future. Language Testing, 28, 463–481.
Sakyi, A. A. (2003). A study of the holistic scoring behaviors of experienced and novice ESL instructors (Unpublished doctoral dissertation). Toronto: University of Toronto.
Tosuncuoglu, I. (2018). Importance of Assessment in ELT. Journal of Education and Training Studies, 6(9), 163-167
Wei, J., & Llosa, L. (2015). Investigating differences between American and Indian raters in assessing TOEFL iBT speaking tasks. Language Assessment Quarterly, 12(3), 283-304.
Wolfe, E.W., Chiu, C.W. T., & Myford, C. M. (1999). The manifestation of common rater effects in multi-faceted Rasch analyses. Princeton, NJ: Educational Testing Service, Center for Performance Assessment.
Wolfe, E. W., Matthews, S., & Vickers, D. (2010). The effectiveness and efficiency of distributed online, regional online, and regional face-to-face training for writing assessment raters. The Journal of Technology, Learning and Assessment, 10(1). 1-22.