Abbott, ML. (2006). A confirmatory approach to differential item functioning on an ESL reading assessment. Language Testing, 24(1), 7–36. https://doi.org/10.1177/0265532207071510.
Ahmadi, A, & Jalili, T. (2014). A confirmatory study of differential item functioning on EFL reading comprehension. Applied Research on English Language, 3(6), 55–68. https://doi.org/10.22108/are.2014.15489.
Alavi, SM, & Ghaemi, H. (2011). Application of structural equation modeling in EFL testing: a report of two Iranian studies. Language Testing in Asia, 1(3), 22–35. https://doi.org/10.1186/2229-0443-1-3-22.
Alavi, SM, & Janbaz, F. (2014). Comparing two pre-listening supports with Iranian EFL learners: opportunity or obstacle. RELC Journal, 45(3), 253–267. https://doi.org/10.1177/0033688214546963.
Alavi, SM, Rezaee, AA, Amirian, SMR. (2011a). Academic discipline DIF in an English language proficiency test. Journal of English Language Teaching and Learning, 7, 39–65.
Alavi, SM, Kaivanpanah, S, Nayernia, A. (2011b). The factor structure of a written English proficiency test: a structural equation modeling. Iranian Journal of Applied Language Studies, 3(2). https://doi.org/10.22111/ijals.2011.1008.
Amirian, SMR, Alavi, SM, Fidalgo, AM. (2014). Detecting gender DIF with an English proficiency test in EFL context. Iranian Journal of Language Testing, 4(2), 187–203.
Aryadoust, V. (2011). Validity arguments of the speaking and listening modules of international English language testing system: a synthesis of existing research. The Asian ESP Journal, 7(1), 28–54.
Aryadoust, V. (2012). Differential item functioning in while-listening performance tests: the case of international English language testing system (IELTS) listening module. International Journal of Listening, 26(1), 40–60. https://doi.org/10.1080/10904018.2012.639649.
Aryadoust, V. (2013). Building a validity argument for a listening test of academic proficiency, (pp. 1–30). Cambridge: Cambridge Scholars.
Badger, R, & Yan, X. (2006). The use of tactics and strategies by Chinese students in the listening component of IELTS. IELTS Research Reports, 9. Retrieved on 15 Sept 2015 from http://www.ielts.org.
Bae, J, & Bachman, L. (2010). An investigation of four writing traits and two tasks across two languages. Language Testing, 27(2), 213–234. https://doi.org/10.1177/0265532209349470.
Barati, H, Ketabi, S, Ahmadi, A. (2006). Differential item functioning in high-stakes tests: the effect of field of study. IJAL, 19(2), 27–42.
Bentler, PM, & Chou, CP. (1987). Practical issues in structural modeling. Sociological Methods & Research, 16(1), 78–117. https://doi.org/10.1177/0049124187016001004.
Bodie, GD, & Worthington, DL. (2010). Revisiting the listening styles profile (LSP-16): a confirmatory factor analytic approach to scale validation and reliability estimation. The International Journal of Listening, 24(2), 69–88. https://doi.org/10.1080/10904011003744516.
Boomsma, A (1987). The robustness of maximum likelihood estimation in structural equation models. In P Cuttance, R Ecob (Eds.), Structural modelling by examples, (pp. 160–188). Cambridge: Cambridge University Press.
Cai, H. (2013). Partial dictation as a measure of EFL listening proficiency: evidence from confirmatory factor analysis. Language Testing, 30(2), 177–199. https://doi.org/10.1177/0265532212456833.
Cambridge IELTS (2016). Cambridge IELTS 11: Official examination papers from University of Cambridge: ESOL examinations. Cambridge: Cambridge Publications.
Cambridge IELTS (2017). Cambridge IELTS 12: Official examination papers from University of Cambridge: ESOL examinations. Cambridge: Cambridge Publications.
Carr, NT. (2006). The factor structure of test task characteristics and examinee performance. Language Testing, 23(3), 269–289. https://doi.org/10.1191/0265532206lt328oa.
Chen, J, de la Torre, J, Zhang, Z. (2013). Relative and absolute fit evaluation in cognitive diagnostic modelling. Journal of Educational Measurement, 50(2), 123–140. https://doi.org/10.1111/j.1745-3984.2012.00185.
Cronbach, LJ, & Meehl, PE. (1955). Construct validity in psychological tests. Psychological Bulletin, 52(4), 281–302. https://doi.org/10.1037/h0040957.
Ding, L, Velicer, WF, Harlow, LL. (1995). Effects of estimation methods, number of indicators per factor, and improper solutions on structural equation modeling fit indices. Structural Equation Modeling: A Multidisciplinary Journal, 2(2), 119–143. https://doi.org/10.1080/10705519509540000.
Drabinova, A, & Martinkova, P. (2017). Detection of differential item functioning with non-linear regression: a non-IRT approach accounting for guessing. Journal of Educational Measurement, 54(4), 498–517. https://doi.org/10.1111/jedm.12158.
Ferne, T, & Rupp, AA. (2007). A synthesis of 15 years of research on DIF in language testing: methodological advances, challenges, and recommendations. Language Assessment Quarterly, 4(2), 113–148. https://doi.org/10.1080/15434300701375923.
Fidalgo, AM, Alavi, SM, Amirian, SMR. (2014). Strategies for testing and practical significance in detecting DIF with logistic regression models. Language Testing, 31(4), 433–451. https://doi.org/10.1177/0265532214526748.
Field, J. (2005). The cognitive validity of the lecture-based question in the IELTS listening paper. IELTS Research Reports, 9, 17–65. Retrieved on 15 Oct 2015 from http://www.ielts.org.
Field, A (2009). Discovering statistics using SPSS. Los Angeles: Sage Publications.
George, AC, & Robitzsch, A. (2014). Multiple group cognitive diagnosis models, with an emphasis on differential item functioning. Psychological Test and Assessment Modeling, 56(4), 405–432.
Geranpayeh, A, & Kunnan, AJ. (2007). Differential item functioning in terms of age in the certificate in advanced English examination. Language Assessment Quarterly, 4(2), 190–222. https://doi.org/10.1080/15434300701375758.
Guilera, G, Gómez-Benito, J, Hidalgo, MD, Sánchez-Meca, J. (2013). Type I error and statistical power of the Mantel-Haenszel procedure for detecting DIF: a meta-analysis. Psychological Methods, 18(4), 553–571. https://doi.org/10.1037/a0034306.
IELTS Handbook. (2007). Retrieved on 5 Sept 2015 from http://www.bing.com/search?q=IELTS+handbook+2007.
Harding, L. (2011). Accent, listening assessment and the potential for a shared-L1 advantage: a DIF perspective. Language Testing, 29(2), 163–180.
Harding, L, Alderson, JC, Brunfaut, T. (2015). Diagnostic assessment of reading and listening in a second or foreign language: elaborating on diagnostic principles. Language Testing, 32(3), 317–336. https://doi.org/10.1177/0265532214564505.
Hooper, D, Coughlan, J, Mullen, M. (2008). Structural equation modelling: guidelines for determining model fit. Electronic Journal of Business Research Methods, 6(1), 53–60.
Hou, L, de la Torre, J, Nandakumar, R. (2014). Differential item functioning assessment in cognitive diagnosis modeling: applying Wald test to investigate DIF for DINA model. Journal of Educational Measurement, 51(1), 98–125. https://doi.org/10.1111/jedm.12036.
In'nami, Y, & Koizumi, R. (2011). Structural equation modelling in language testing and learning research: a review. Language Assessment Quarterly, 8(3), 250–273. https://doi.org/10.1080/15434303.2011.582203.
Jakeman, V, & McDowell, C (2004). Step up to IELTS. Cambridge: Cambridge University Press.
Jakeman, V, & McDowell, C (2006). Action plan for IELTS. Cambridge: Cambridge University Press.
Junker, BW, & Sijtsma, K. (2001). Cognitive assessment models with few assumptions, and connections with nonparametric item response theory. Applied Psychological Measurement, 25(3), 258–272. https://doi.org/10.1177/01466210122032064.
Kane, M. (2013). Validating the interpretation and use of test scores. Journal of Educational Measurement, 50(1), 13–14. https://doi.org/10.1111/jedm.12000.
Kane, MT. (2016). Validity as the evaluation of the claims based on test scores. Assessment in Education: Principles, Policy, & Practice, 23(2), 309–311. https://doi.org/10.1080/0969594x.
Khine, MS (2013). Application of structural equation modeling in educational research and practice. Rotterdam: Sense Publishers.
Kim, M. (2001). Detecting DIF across the different language groups in a speaking test. Language Testing, 18(1), 89–114. https://doi.org/10.1177/026553220101800104.
Kim, YH, & Jang, EE. (2009). Differential functioning of reading subskills on the OSSLT for L1 and ELL students: a multidimensionality model-based DBF/DIF approach. Language Learning, 59(4), 825–865. https://doi.org/10.1111/j.1467-9922.2009.00527.
Kimura, H. (2016). Foreign language listening anxiety: a self-presentational view. International Journal of Listening, 00, 1–21. https://doi.org/10.1080/10904018.2016.1222909.
Kök, I. (2017). Relationship between listening comprehension strategy use and listening comprehension proficiency. International Journal of Listening, 0, 1–17. https://doi.org/10.1080/10904018.2016.1276457.
Kunnan, AJ. (1994). Modeling relationships among some test-taker characteristics and performance on EFL tests: an approach to construct validation. Language Testing, 11(3). https://doi.org/10.1177/026553229401100301.
Kunnan, AJ. (1998). An introduction to structural equation modeling for language assessment. Language Testing, 15(3), 295–332. https://doi.org/10.1177/026553229801500302.
Le, L. (2006). Analysis of differential item functioning. Paper Prepared for the Annual Meetings of the American Educational Research Association in San Francisco, 7–11.
Li, FM (2008). A modified higher-order DINA model for detecting differential item functioning and differential attribute functioning. Unpublished doctoral dissertation. Athens: University of Georgia.
Li, H, & Suen, HK. (2013). Detecting native language group differences at the sub-skills level of reading: a differential skill functioning approach. Language Testing, 30, 273–298. https://doi.org/10.1177/0265532212459031.
Li, X, & Wang, WC. (2015). Assessment of differential item functioning under cognitive diagnosis models: the DINA model example. Journal of Educational Measurement, 52(1), 28–54. https://doi.org/10.1111/jedm.12061.
London Teacher Training College (2005). IELTS training course. London: London Teacher Training College.
Mantel, N, & Haenszel, W. (1959). Statistical aspects of the analysis of data from retrospective studies of disease. Journal of the National Cancer Institute, 22(4), 719–748. https://doi.org/10.1093/jnci/22.4.719.
McCarter, S (2006). Tips for IELTS: a must-have book for all IELTS candidates. Oxford: Macmillan.
Messick, S. (1974). The standard problem: meaning and values in measurement and evaluation. American Psychologist, 2(2). https://doi.org/10.1002/j.2333-8504.1974.tb01034.x.
Messick, S. (1986). The once and future issues of validity: assessing the meaning and consequences of measurement. American Psychologist, 2(12), 1–24. https://doi.org/10.1002/j.2330-8516.1986.tb00185.x.
Messick, S. (1995). Validity of psychological assessment: validation of inferences from person’s responses and performances as scientific inquiry into score meaning. American Psychologist, 50(9), 741–749.
Messick, S. (1996). Validity and washback in language testing. Language Testing, 13(1), 241–256.
Michaelides, MP. (2008). An illustration of a Mantel-Haenszel procedure to flag misbehaving common items in test equating. Practical Assessment, Research and Evaluation, 13(7).
Monahan, PO, & Ankenmann, RD. (2005). Effect of unequal variances in proficiency distributions on type-I error of the Mantel-Haenszel chi-square test for differential item functioning. Journal of Educational Measurement, 42, 101–131. https://doi.org/10.1177/014662169301700401.
Monahan, PO, & Ankenmann, RD. (2010). Alternative matching scores to control type I error of the Mantel-Haenszel procedure for DIF in dichotomously scored items conforming to 3PL IRT and nonparametric 4PBCB models. Applied Psychological Measurement, 34, 193–210. https://doi.org/10.1177/0146621609359283.
Nakatsuhara, F, Inoue, C, Taylor, L. (2017). An investigation into double-marking methods: comparing live, audio and video rating of performance on the IELTS speaking test. IELTS Research Reports, Online Series, 1.
Newton, PE, & Baird, JA. (2016). The great validity debate. Assessment in Education: Principles, Policy & Practice, 23(2), 173–177. https://doi.org/10.1080/0969594x.1172871.
Newton, PE, & Shaw, SD. (2015). Disagreement over the best way to use the word ‘validity’ and options for reaching consensus. Assessment in Education: Principles, Policy & Practice, 23(2), 281–283. https://doi.org/10.1080/0969594X.2016.1141750.
Ockey, G, & Choi, I. (2015). Structural equation modeling reporting practices for language assessment. Language Assessment Quarterly, 12(3), 305–319. https://doi.org/10.1080/15434303.2015.1050101.
Pae, T. (2004). DIF for examinees with different academic backgrounds. Language Testing, 21, 53–73. https://doi.org/10.1191/0265532204lt274oa.
Pae, T. (2012). Causes of gender DIF on an EFL language test: a multiple-data driven analysis over nine years. Language Testing, 29(4), 533–554.
Phakiti, A. (2008). Strategic competence as a fourth-order factor model: a structural equation modeling. Language Assessment Quarterly, 5(1), 20–42.
Phakiti, A. (2016). Test-takers’ performance appraisals, appraisal calibration, state-trait strategy use, and state-trait IELTS listening difficulty in a simulated IELTS listening test. IELTS Research Reports Online Series, 6, 1–3.
Pilcher, N, & Richards, K. (2017). Challenging the power invested in the international English language testing system (IELTS): why determining ‘English’ preparedness needs to be undertaken within the subject context. Power and Education, 9(1), 3–7.
Rezaee, A, & Shabani, E. (2010). Gender differential item functioning analysis of University of Tehran English Proficiency Test. Pazhuheshe- Zabanhaye Khareji, 56, 89–108.
Rogers, WT, & Yang, P. (1996). Test-wiseness: Its nature and application. European Journal of Psychological Assessment, 12(3), 247–259.
Roussel, S, Gruson, B, Galan, JP. (2017). What types of training improve learners’ performances in second language listening comprehension? International Journal of Listening, 00, 1–14. https://doi.org/10.1080/10904018.2017.1331133.
Sawaki, Y (2012). Factor analysis. The Encyclopedia of Applied Linguistics. Blackwell Publishing Ltd. https://doi.org/10.1002/9781405198431.wbeal0407.
Sawaki, Y, Stricker, LJ, Oranje, AH. (2009). Factor structure of the TOEFL Internet-based test. Language Testing, 26(1), 5–30.
Schmitt, T. (2011). Current methodological considerations in exploratory and confirmatory factor analysis. Journal of Psychoeducational Assessment, 29(4). https://doi.org/10.1177/0734282911406653.
Schoonen, R. (2005). Generalizability of writing scores: a structural equation modeling. Language Testing, 22(1), 1–30.
Sireci, SG. (2017). Interview with Stephen G. Sireci on validity. Journal of Measurement and Evaluation in Education and Psychology, 8(1), 158–168.
Song, MY. (2008). Do divisible sub-skills exist in second language comprehension? A structural equation modeling approach. Language Testing, 25(4), 435–464.
Song, X, Cheng, L, Klinger, D. (2015). DIF investigations across groups of gender and academic background in a large scale high-stakes language test. Papers in Language Testing and Assessment, 4(1), 97–124.
Su, YH, & Wang, WC. (2005). Efficiency of the Mantel, generalized Mantel-Haenszel, and logistic discriminant function analysis methods in detecting differential item functioning for polytomous items. Applied Measurement in Education, 18(4), 313–350. https://doi.org/10.1207/s15324818ame1804.
Tatsuoka, K, Linn, R, Tatsuoka, M, Yamamoto, K. (1988). Differential item functioning resulting from the use of different solution strategies. Journal of Educational Measurement, 25(4), 301–319. https://doi.org/10.1111/j.1745-3984.1988.tb00310.x.
Templin, J, & Henson, RA. (2006). Measurement of psychological disorders using cognitive diagnosis models. Psychological Methods, 11(3), 287–305.
de la Torre, J. (2011). The generalized DINA model framework. Psychometrika, 76(2), 179–199. https://doi.org/10.1007/s11336-011-9207-7.
de la Torre, J, & Douglas, J. (2004). Higher-order latent trait models for cognitive diagnosis. Psychometrika, 69(3), 333–353. https://doi.org/10.1007/BF02295640.
Vandergrift, L. (1997). The Cinderella of communication strategies: reception strategies in interactive listening. The Modern Language Journal, 81(4).
Vandergrift, L. (2006). Second language listening: listening ability or language proficiency? The Modern Language Journal, 90(1).
Vandergrift, L. (2007). Recent developments in second and foreign language listening comprehension research. Language Teaching, 40, 191–210.
Winke, P, & Lim, H. (2014). The effects of testwiseness and test-taking anxiety on L2 listening test performance: a visual (eye-tracking) and attentional investigation. IELTS Research Reports, 3, 5–6.
Wright, SP. (1992). Adjusted P-values for simultaneous inference. Biometrics, 48(1), 1005–1013.
Zhang, W (2006). Detecting differential item functioning using the DINA model. Unpublished doctoral dissertation. Greensboro: University of North Carolina.
Zhang, L. (2015). Recent research into IELTS reading and listening assessment by Linda Taylor and Cyril Weir (Eds.). Language Assessment Quarterly, 12, 234–238. https://doi.org/10.1080/15434303.2014.1003218.
Zumbo, BD. (2003). Does item-level DIF manifest itself in scale-level analyses? Implications for translating language tests. Language Testing, 20(2), 136–147.
Zumbo, BD. (2007). Three generations of DIF analyses: considering where it has been, where it is now, and where it is going. Language Assessment Quarterly, 4(2), 223–233.