Bachman, L. F. (1990). Fundamental considerations in language testing. Oxford: Oxford University Press.
Bachman, L. F. (2004). Statistical analyses for language assessment. Cambridge: Cambridge University Press.
Bachman, L. F. (2010). Fundamental considerations in language testing. Oxford: Oxford University Press.
Bachman, L. F., & Palmer, A. S. (1996). Language testing in practice. Oxford: Oxford University Press.
Barkaoui, K. (2007). Rating scale impact on EFL essay marking: A mixed-method study. Assessing Writing, 12(2), 86–107.
Berry, V. (1993). Personality characteristics as a potential source of language test bias. Language testing: New openings. Jyvaskyla, Finland: Institute for Educational Research, University of Jyvaskyla.
Bond, T. G. & Fox, C. M. (2007). Applying the Rasch model: fundamental measurement in the human sciences. New York and London: Routledge.
Bouwer, R., Béguin, A., Sanders, T., & Van den Bergh, H. (2015). Effect of genre on the generalizability of writing scores. Language Testing, 32(1), 83-100.
Brennan, R. L. (2010). Generalizability theory and classical test theory. Applied Measurement in Education, 24(1), 1-21.
Brennan, R. L. (2001a). Generalizability theory. New York: Springer Verlag.
Brennan, R. L. (2001b). Manual for urGENOVA. Iowa City, IA: Iowa Testing Programs, University of Iowa.
Brennan, R. L. (2000). Performance assessment from the perspective of generalizability theory. Applied Psychological Measurement, 24(4), 339–353.
Brennan, R., Gao, X., & Colton, D. (1995). Generalizability analyses of Work Keys Listening and Writing tests. Educational and Psychological Measurement, 55(2), 157–176.
Brennan, R. L. (1992). Elements of generalizability theory. Iowa City, IA.
Brennan, R. L., & Kane, M. T. (1977). An index of dependability for mastery tests. Journal of Educational Measurement, 14(3), 277-289.
Brown, J. D. (2011). What do the L2 generalizability studies tell us? International Journal of Assessment and Evaluation in Education, 1, 1–37.
Brown Jr, J. and Glasner, A. (1999). Assessment matters in higher education. McGraw-Hill Education: UK.
Burt, C. (1936). The analysis of examination marks. In P. Hartog & E. C. Rhodes (Eds.), The marks of examiners (pp. 245-314). London: Macmillan.
Cardinet, J., Johnson, S., & Pini, G. (2011). Applying generalizability theory using EduG. Taylor & Francis.
Cardinet, J., Tourneur, Y., & Allal, L. (1981). Extensions of generalizability theory and its applications in educational measurement. Journal of Educational Measurement, 18, 183–204.
Cardinet, J., Tourneur, Y., & Allal, L. (1976). The symmetry of generalizability theory: Application to educational measurement. Journal of Educational Measurement, 13, 119–135.
Cerdan, R., Vidal-Abarca, E., Martinez, T., & Gil, L. (2009). Impact of question-answering tasks on search processes and reading comprehension. Learning and Instruction, 19(1), 13–27.
Crick, J. E. (1983). Manual for GENOVA: a generalized analysis of variance system, Iowa. American College Testing Program.
Cronbach, L. J., & Shavelson, R. J. (2004). My current thoughts on coefficient alpha and successor procedures. Educational and Psychological Measurement, 64(3), 391-418.
Cronbach, L. J., Rajaratnam, N., & Gleser, G. C. (1963). Theory of generalizability: A liberalization of reliability theory. The British Journal of Statistical Psychology, 16, 137–163.
Cronbach, L. J., Gleser, G. C., Nanda, H., & Rajaratnam, N. (1972). The dependability of behavioral measurements. New York: Wiley.
Ebel, R. L. (1951). Estimation of the reliability of ratings. Psychometrika, 16, 407–424.
Eason, S. H. (1989). Why Generalizability Theory Yields Better Results than Classical Test Theory.
Everitt, B. S., & Howell, D. C. (2005). Repeated measures analysis of variance. Encyclopedia of Statistics in Behavioral Science.
Fan, C. H., & Hansmann, P. R. (2015). Applying generalizability theory for making quantitative RTI progress-monitoring decisions. Assessment for Effective Intervention, 40(4), 205–215.
Fan, X., & Sun, S. (2014). Generalizability theory as a unifying framework of measurement reliability in adolescent research. Journal of Early Adolescence, 34(1), 38–65.
Feldt, L. S., & Brennan, R. L. (1989). Reliability. In R. L. Linn (Ed.), Educational measurement (pp. 105–146). Washington, DC: The American Council on Education/Macmillan.
Ferrara, S. (1993, April). Generalizability theory and scaling: Their roles in writing assessment and implications for performance assessments in other content areas. In annual meeting of the National Council on Measurement in Education, Atlanta.
Finlayson, D. S. (1951). The reliability of the marking of essays. British Journal of Educational Psychology, 21(2), 126-134.
Gebril, A. (2010). Bringing Reading-to-Write and Writing-Only Assessment Tasks Together: A Generalizability Analysis. Assessing Writing, 15(2), 100–117.
Gleser, G. C., Cronbach, L. J., & Rajaratnam, N. (1965). Generalizability of scores influenced by multiple sources of variance. Psychometrika, 30, 395–418.
Green, R. (2013). Statistical analyses for language testers. New York: Palgrave Macmillan.
Han, T., & Ege, İ. (2013). Using generalizability theory to examine classroom instructors' analytic evaluation of EFL writing. International Journal of Education, 5(3), 20.
Hoyt, C. J. (1941). Test reliability estimated by analysis of variance. Psychometrika, 6, 153–160.
Huang, C. (2009). Magnitude of task-sampling variability in performance assessment: A meta-analysis. Educational and Psychological Measurement, 69(6), 887–912.
Huang, J. (2012). Using generalizability theory to examine the accuracy and validity of large scale ESL writing assessment. Assessing Writing, 17, 123–139.
Huang, J. (2008). How accurate are ESL students’ holistic writing scores on large-scale assessments? A generalizability theory approach. Assessing Writing, 13, 201–218.
Huang, J., & Foote, C. J. (2010). Grading between the lines: What really impacts professors’ holistic evaluation of ESL graduate student writing? Language Assessment Quarterly, 7, 219–233.
Huang, J., & Han, T. (2013). Holistic or analytic – A Dilemma for Professors to Score EFL Essays? Leadership and Policy Quarterly, 2(1), 1–18.
In’nami, Y., & Koizumi, R. (2016). Task and rater effects in L2 speaking and writing: A synthesis of generalizability studies. Language Testing, 33(3), 341-366.
Kane, M. (2010). Validity and fairness. Language Testing, 27(2), 177-182.
Kendeou, P., McMaster, K. L., & Christ, T. J. (2016). Reading comprehension: Core components and processes. Policy Insights from the Behavioral and Brain Sciences, 3(1), 62-69.
Kieffer, K. M. (1998). Why Generalizability Theory is Essential and Classical Test Theory is Often Inadequate? [Proceeding]. Paper Presented at the Annual Meeting of the Southwestern Psychological Association. New Orleans, LA: USA.
Kirk, R. E. (1982). Experimental Design (2nd edition). Belmont, CA: Brooks/Cole.
Kunnan, A. J. (1992). An investigation of a criterion-referenced test using G-theory, and factor and cluster analyses. Language Testing, 9(1), 30-49.
Lee, Y. W. (2005). Dependability of scores for a new ESL speaking test: Evaluating prototype tasks. Monograph Series MS-28. Princeton, NJ: Educational Testing Service.
Lee, Y. W., & Kantor, R. (2007). Evaluating prototype tasks and alternative rating schemes for a new ESL writing test through G-theory. International Journal of Testing, 7, 353–385.
Lin, C. K. & Zhang, J. (2014). Investigating correspondence between language proficiency standards and academic content standards: A generalizability theory study. Language Testing, 7, 1–19.
Linn, R. L. (1981) ‘Curricular validity: Convincing the courts that it was taught without precluding the possibility of measuring it’, the Ford Foundation, Boston College, MA.
Lord, F. M. (1959). Tests of the same length do have the same standard errors of measurement? Educational and Psychological Measurement, 19, 233–239.
Lord, F. M. (1955). Estimating test reliability. Educational and Psychological Measurement, 15, 325–336.
Lynch, B. K., & McNamara, T. F. (1998). Using G-theory and many-facet Rasch measurement in the development of performance assessments of the ESL speaking skills of immigrants. Language Testing, 15, 158–180.
Welch, R. C. (2014). Action research from concept to presentation: A practical handbook to writing your master's thesis. Author House.
Marcoulides, G. A. (2000, March). Generalizability theory: Advancements and implementations. Invited colloquium presented at the 22nd Language Testing Research Colloquium, Vancouver, BC, Canada.
Marshall, E. (1998). The Marshall plan for novel writing. Cincinnati, OH: Writer's Digest Books.
Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (pp. 13-103).
Morrell, J. (2006). Between the lines: Master the subtle elements of fiction writing. Media, Inc.
Mostow, J., Huang, Y. T., Jang, H., Weinstein, A., Valeri, J., & Gates, D. (2017). Developing, evaluating, and refining an automatic generator of diagnostic multiple choice cloze questions to assess children's comprehension while reading. Natural Language Engineering, 23(2), 245-294.
Pearson, P., Valencia, W., & Wixson, K. (2014). Complicating the world of reading assessment: Toward better assessments for better teaching. Theory into Practice, 53(3), 236–246.
Popham, W. J. (2000). Modern educational measurement. Boston: Allyn & Bacon.
Rouet, J. F., Vidal-Abarca, E., Erboul, A. B., & Millogo, V. (2001). Effects of information search tasks on the comprehension of instructional text. Discourse Processes, 31(2), 163-186.
Rupp, A. A., Ferne, T., & Choi, H. (2006). How assessing reading comprehension with multiple-choice questions shapes the construct: A cognitive processing perspective. Language Testing, 23(4), 441-474.
Selgin, P. (2007). By cunning & craft: Sound advice and practical wisdom for fiction writers. Writer's Digest Books.
Shavelson, R.J., & Webb, N.M. (1991). Generalizability theory: A primer. Newbury Park, CA: Sage.
Smith, Jr., E. V., & Kulikowich, J. M. (2004). An application of generalizability theory and many-facet Rasch measurement using a complex problem solving skills assessment. Educational and Psychological Measurement, 64, 617–639.
Stansfield, C. W., & Kenyon, D. M. (1992). Research on the comparability of the oral proficiency interview and the simulated oral proficiency interview. System, 20(3), 347-364.
van de Watering, G., & van der Rijt, J. (2006). Teachers’ and students’ perceptions of assessments: A review and a study into the ability and accuracy of estimating the difficulty levels of assessment items. Educational Research Review, 1(2), 133-147.
van Steensel, R., Oostdam, R., & van Gelderen, A. (2013). Assessing reading comprehension in adolescent low achievers: Subskills identification and task specificity. Language Testing, 30(1), 3-21.
Strube, M. J. (2002). Reliability and generalizability theory. In L. G. Grimm & P. R. Yarnold (Eds.), Reading and understanding more multivariate statistics (pp. 23-66). Washington, DC: American Psychological Association.
Sudweeks, R. R., Reeve, S., & Bradshaw, W. S. (2005). A comparison of generalizability theory and many-facet Rasch measurement in an analysis of college sophomore writing. Assessing Writing, 9, 239–261.
Swartz, C.W., Hooper, S.R., Montgomery, J.W., Wakely, M.B., De Kruif, R.E.L., Reed, M., Brown, T.T., Levine, M.D., & White, K.P. (1999). Using generalizability theory to estimate the reliability of writing scores derived from holistic and analytic scoring methods. Educational and Psychological Measurement, 59, 492–506.
Thompson, B. (2003). A brief introduction to generalizability theory. In B. Thompson (Ed.), Score reliability: Contemporary thinking on reliability issues (pp. 43–58). Thousand Oaks, CA: Sage.
Van den Bergh, H., De Maeyer, S., Van Weijen, D., & Tillema, M. (2012). Generalizability of text quality scores. Measuring writing: Recent insights into theory, methodology and practices, 27, 23-32.
Vidal-Abarca, E., Gilabert, R., & Rouet, J. F. (1998). The role of question type on learning from scientific text. Paper presented at Seminario ‘Comprension y produccion de textos cientificos’, Aveiro, Portugal.
Vispoel, W. P., Morris, C. A., & Kilinc, M. (2018). Practical applications of generalizability theory for designing, evaluating, and improving psychological assessments. Journal of Personality Assessment, 100(1), 53-67.
Vispoel, W. P., Morris, C. A., & Kilinc, M. (2017). Applications of generalizability theory and their relations to classical test theory and structural equation modeling. Psychological Methods, 23(1), 1.
Vispoel, W. P., Morris, C. A., & Kilinc, M. (2016). Using G-theory to enhance evidence of reliability and validity for common uses of the Paulhus Deception Scales. Assessment, 25(1), 69-83.
Wang, H. (2010). Investigating the justifiability of an additional test use: An application of assessment use argument to an English as a foreign language test (Doctoral dissertation). Retrieved from ProQuest. (AAT 3441468)
Webb, N. M., Rowley, G. L., & Shavelson, R. J. (1988). Using generalizability theory in counseling and development. Measurement and Evaluation in Counseling and Development, 21, 81–90.
Webb, N. M., & Shavelson, R. J. (2005). Generalizability theory: Overview. Encyclopedia of statistics in behavioral science.
Wu, Y.-F., & Tzou, H. (2015). A multivariate generalizability theory approach to standard setting. Applied Psychological Measurement, 39(7), 507–524.
Zhang, J., & Lin, C. K. (2016). Generalizability theory with one-facet nonadditive models. Applied Psychological Measurement, 40(6), 367-386.