Abbott, ML. (2006). A confirmatory approach to differential item functioning on an ESL reading assessment. Language Testing, 24(1), 7–36. https://doi.org/10.1177/0265532207071510.
Ahmadi, A., & Jalili, T. (2014). A confirmatory study of differential item functioning on EFL reading comprehension. Applied Research on English Language, 3(6), 55–68. https://doi.org/10.22108/are.2014.15489.
Alavi, S. M., & Ghaemi, H. (2011). Application of structural equation modeling in EFL testing: a report of two Iranian studies. Language Testing in Asia, 1(3), 22–35. https://doi.org/10.1186/2229-0443-1-3-22.
Alavi, S. M., & Janbaz, F. (2014). Comparing two pre-listening supports with Iranian EFL learners: opportunity or obstacle. RELC Journal, 45(3), 253–267. https://doi.org/10.1177/0033688214546963.
Alavi, SM, Rezaae, AA, Amirian, SMR. (2011a). Academic discipline DIF in an English language proficiency test. Journal of English Language Teaching and Learning, 7, 39–65.
Alavi, SM, Kaivanpanah, S, Nayernia, A. (2011b). The factor structure of a written English proficiency test: a structural equation modeling. Iranian Journal of Applied Language Studies, 3(2). https://doi.org/10.22111/ijals.2011.1008.
Amirian, SMR, Alavi, SM, Fidalgo, AM. (2014). Detecting gender DIF with an English proficiency test in EFL context. Iranian Journal of Language Testing, 4(2), 187–203.
Aryadoust, V. (2011). Validity arguments of the speaking and listening modules of international English language testing system: a synthesis of existing research. The Asian ESP Journal, 7(1), 28–54.
Aryadoust, V. (2012). Differential item functioning in while-listening performance tests: the case of international English language testing system (IELTS) listening module. International Journal of Listening, 26(1), 40–60. https://doi.org/10.1080/10904018.2012.639649.
Aryadoust, V. (2013). Building a validity argument for a listening test of academic proficiency (pp. 1–30). Cambridge: Cambridge Scholars.
Badger, R, & Yan, X. (2006). The use of tactics and strategies by Chinese students in the listening component of IELTS. IELTS Research Reports, 9. Retrieved on 15 Sept 2015, from http://www.ielts.org.
Bae, J, & Bachman, L. (2010). An investigation of four writing traits and two tasks across two languages. Language Testing, 27(2), 213–234. https://doi.org/10.1177/0265532209349470.
Barati, H, Ketabi, S, Ahmadi, A. (2006). Differential item functioning in high-stakes tests: the effect of field of study. IJAL, 19(2), 27–42.
Bentler, PM, & Chou, CP. (1987). Practical issues in structural modeling. Sociological Methods & Research, 16(1), 78–117. https://doi.org/10.1177/0049124187016001004.
Bodie, G. D., & Worthington, D. L. (2010). Revisiting the listening styles profile (LSP-16): a confirmatory factor analytic approach to scale validation and reliability estimation. The International Journal of Listening, 24(2), 69–88. https://doi.org/10.1080/10904011003744516.
Boomsma, A (1987). The robustness of maximum likelihood estimation in structural equation models. In P Cuttance, R Ecob (Eds.), Structural modelling by examples, (pp. 160–188). Cambridge: Cambridge University Press.
Cai, H. (2013). Partial dictation as a measure of EFL listening proficiency: evidence from confirmatory factor analysis. Language Testing, 30(2), 177–199. https://doi.org/10.1177/0265532212456833.
Cambridge IELTS (2016). Cambridge IELTS 11: Official examination papers from University of Cambridge: ESOL examinations. Cambridge: Cambridge Publications.
Cambridge IELTS (2017). Cambridge IELTS 12: Official examination papers from University of Cambridge: ESOL examinations. Cambridge: Cambridge Publications.
Carr, NT. (2006). The factor structure of test task characteristics and examinee performance. Language Testing, 23(3), 269–289. https://doi.org/10.1191/0265532206lt328oa.
Chen, J, de la Torre, J, Zhang, Z. (2013). Relative and absolute fit evaluation in cognitive diagnostic modelling. Journal of Educational Measurement, 50(2), 123–140. https://doi.org/10.1111/j.1745-3984.2012.00185.
Cronbach, LJ, & Meehl, PE. (1955). Construct validity in psychological tests. Psychological Bulletin, 52(4), 281–302. https://doi.org/10.1037/h0040957.
Ding, L, Velicer, WF, Harlow, LL. (1995). Effects of estimation methods, number of indicators per factor, and improper solutions on structural equation modeling fit indices. Structural Equation Modeling: A Multidisciplinary Journal, 2(2), 119–143. https://doi.org/10.1080/10705519509540000.
Drabinova, A, & Martinkova, P. (2017). Detection of differential item functioning with non-linear regression: a non-IRT approach accounting for guessing. Journal of Educational Measurement, 54(4), 498–517. https://doi.org/10.1111/jedm.12158.
Ferne, T, & Rupp, AA. (2007). A synthesis of 15 years of research on DIF in language testing: methodological advances, challenges, and recommendations. Language Assessment Quarterly, 4(2), 113–148. https://doi.org/10.1080/15434300701375923.
Fidalgo, AM, Alavi, SM, Amirian, SMR. (2014). Strategies for testing and practical significance in detecting DIF with logistic regression models. Language Testing, 31(4), 433–451. https://doi.org/10.1177/0265532214526748.
Field, J. (2005). The cognitive validity of the lecture-based question in the IELTS listening paper. IELTS Research Reports, 9, 17–65. Retrieved on 15 Oct 2015 from http://www.ielts.org.
Field, A (2009). Discovering statistics using SPSS. Los Angeles: Sage Publications.
George, AC, & Robitzsch, A. (2014). Multiple group cognitive diagnosis models, with an emphasis on differential item functioning. Psychological Test and Assessment Modeling, 56(4), 405–432.
Geranpayeh, A, & Kunnan, AJ. (2007). Differential item functioning in terms of age in the certificate in advanced English examination. Language Assessment Quarterly, 4(2), 190–222. https://doi.org/10.1080/15434300701375758.
Guilera, G, Gómez-Benito, J, Hidalgo, MD, Sánchez-Meca, J. (2013). Type I error and statistical power of the Mantel-Haenszel procedure for detecting DIF: a meta-analysis. Psychological Methods, 18(4), 553–571. https://doi.org/10.1037/a0034306.
IELTS Handbook. (2007). Retrieved on 5 Sept 2015 from http://www.bing.com/search?q=IELTS+handbook+2007.
Harding, L. (2011). Accent, listening assessment and the potential for a shared-L1 advantage: a DIF perspective. Language Testing, 29(2), 163–180.
Harding, L, Alderson, JC, Brunfaut, T. (2015). Diagnostic assessment of reading and listening in a second or foreign language: elaborating on diagnostic principles. Language Testing, 32(3), 317–336. https://doi.org/10.1177/0265532214564505.
Hooper, D, Coughlan, J, Mullen, M. (2008). Structural equation modelling: guidelines for determining model fit. Electronic Journal of Business Research Methods, 6(1), 53–60.
Hou, L., de la Torre, J., & Nandakumar, R. (2014). Differential item functioning assessment in cognitive diagnosis modeling: applying Wald test to investigate DIF for DINA model. Journal of Educational Measurement, 51(1), 98–125. https://doi.org/10.1111/jedm.12036.
In'nami, Y, & Koizumi, R. (2011). Structural equation modelling in language testing and learning research: a review. Language Assessment Quarterly, 8(3), 250–273. https://doi.org/10.1080/15434303.2011.582203.
Jakeman, V, & McDowell, C (2004). Step up to IELTS. Cambridge: Cambridge University.
Jakeman, V, & McDowell, C (2006). Action plan for IELTS. Cambridge: Cambridge University.
Junker, BW, & Sijtsma, K. (2001). Cognitive assessment models with few assumptions, and connections with nonparametric item response theory. Applied Psychological Measurement, 25(3), 258–272. https://doi.org/10.1177/01466210122032064.
Kane, M. (2013). Validating the interpretation and use of test scores. Journal of Educational Measurement, 50(1), 13–14. https://doi.org/10.1111/jedm.12000.
Kane, MT. (2016). Validity as the evaluation of the claims based on test scores. Assessment in Education: Principles, Policy, & Practice, 23(2), 309–311. https://doi.org/10.1080/0969594x.
Khine, MS (2013). Application of structural equation modeling in educational research and practice. Rotterdam: Sense Publishers.
Kim, M. (2001). Detecting DIF across the different language groups in a speaking test. Language Testing, 18(1), 89–114. https://doi.org/10.1177/026553220101800104.
Kim, YH, & Jang, EE. (2009). Differential functioning of reading subskills on the OSSLT for L1 and ELL students: a multidimensionality model-based DBF/DIF approach. Language Learning, 59(4), 825–865. https://doi.org/10.1111/j.1467-9922.2009.00527.
Kimura, H. (2016). Foreign language listening anxiety: a self-presentational view. International Journal of Listening, 00, 1–21. https://doi.org/10.1080/10904018.2016.1222909.
Kök, I. (2017). Relationship between listening comprehension strategy use and listening comprehension proficiency. International Journal of Listening, 0, 1–17. https://doi.org/10.1080/10904018.2016.1276457.
Kunnan, AJ. (1994). Modeling relationships among some test-taker characteristics and performance on EFL tests: an approach to construct validation. Language Testing, 11(3). https://doi.org/10.1177/026553229401100301.
Kunnan, AJ. (1998). An introduction to structural equation modeling for language assessment. Language Testing, 15(3), 295–332. https://doi.org/10.1177/026553229801500302.
Le, L. (2006). Analysis of differential item functioning. Paper Prepared for the Annual Meetings of the American Educational Research Association in San Francisco, 7–11.
Li, FM (2008). A modified higher-order DINA model for detecting differential item functioning and differential attribute functioning. Unpublished doctoral dissertation. Athens: University of Georgia.
Li, H, & Suen, HK. (2013). Detecting native language group differences at the sub-skills level of reading: a differential skill functioning approach. Language Testing, 30, 273–298. https://doi.org/10.1177/0265532212459031.
Li, X, & Wang, WC. (2015). Assessment of differential item functioning under cognitive diagnosis models: the DINA model example. Journal of Educational Measurement, 52(1), 28–54. https://doi.org/10.1111/jedm.12061/pd.
London Teacher Training College (2005). IELTS training course. London: London Teacher Training College.
Mantel, N, & Haenszel, W. (1959). Statistical aspects of the analysis of data from retrospective studies of disease. Journal of the National Cancer Institute, 22(4), 719–748. https://doi.org/10.1093/jnci/22.4.719.
McCarter, S (2006). Tips for IELTS: a must-have book for all IELTS candidates. Oxford: Macmillan.
Messick, S. (1974). The standard problem: meaning and values in measurement and evaluation. American Psychologist, 2(2). https://doi.org/10.1002/j.2333-8504.1974.tb01034.x/pdf.
Messick, S. (1986). The once and future issues of validity: assessing the meaning and consequences of measurement. American Psychologist, 2(12), 1–24. https://doi.org/10.1002/j.2330-8516.1986.tb00185.x/pdf.
Messick, S. (1995). Validity of psychological assessment: validation of inferences from person’s responses and performances as scientific inquiry into score meaning. American Psychologist, 50(9), 741–749.
Messick, S. (1996). Validity and washback in language testing. Language Testing, 13(1), 241–256.
Michaelides, MP. (2008). An illustration of a Mantel-Haenszel procedure to flag misbehaving common items in test equating. Practical Assessment, Research and Evaluation, 13(7).
Monahan, PO, & Ankenmann, RD. (2005). Effect of unequal variances in proficiency distributions on type-I error of the Mantel-Haenszel chi-square test for differential item functioning. Journal of Educational Measurement, 42, 101–131. https://doi.org/10.1177/014662169301700401.
Monahan, PO, & Ankenmann, RD. (2010). Alternative matching scores to control type I error of the Mantel-Haenszel procedure for DIF in dichotomously scored items conforming to 3PL IRT and nonparametric 4PBCB models. Applied Psychological Measurement, 34, 193–210. https://doi.org/10.1177/0146621609359283.
Nakatsuhara, F, Inoue, C, Taylor, L. (2017). An investigation into double-marking methods: comparing live, audio and video rating of performance on the IELTS speaking test. IELTS Research Reports, Online Series, 1.
Newton, PE, & Baird, JA. (2016). The great validity debate. Assessment in Education: Principles, Policy & Practice, 23(2), 173–177. https://doi.org/10.1080/0969594x.1172871.
Newton, PE, & Shaw, SD. (2015). Disagreement over the best way to use the word validity and options for reaching consensus. Assessment in Education: Principles, Policy & Practice, 23(2), 281–283. https://doi.org/10.1080/0969594X.2016.1141750.
Ockey, G, & Choi, I. (2015). Structural equation modeling reporting practices for language assessment. Language Assessment Quarterly, 12(3), 305–319. https://doi.org/10.1080/15434303.2015.1050101.
Pae, T. (2004). DIF for examinees with different academic backgrounds. Language Testing, 21, 53–73. https://doi.org/10.1191/0265532204lt274oa.
Pae, T. (2012). Causes of gender DIF on an EFL language test: a multiple-data driven analysis over nine years. Language Testing, 29(4), 533–554.
Phakiti, A. (2008). Strategic competence as a fourth-order factor model: a structural equation modeling. Language Assessment Quarterly, 5(1), 20–42.
Phakiti, A. (2016). Test-takers’ performance appraisals, appraisal calibration, state-trait strategy use, and state-trait IELTS listening difficulty in a simulated IELTS listening test. IELTS Research Reports Online Series, 6, 1–3.
Pilcher, N, & Richards, K. (2017). Challenging the power invested in the international English language testing system (IELTS): why determining ‘English’ preparedness needs to be undertaken within the subject context. Power and Education, 9(1), 3–7.
Rezaee, A, & Shabani, E. (2010). Gender differential item functioning analysis of University of Tehran English Proficiency Test. Pazhuheshe-Zabanhaye Khareji, 56, 89–108.
Rogers, WT, & Yang, P. (1996). Test-wiseness: Its nature and application. European Journal of Psychological Assessment, 12(3), 247–259.
Roussel, S, Gruson, B, Galan, JP. (2017). What types of training improve learners’ performances in second language listening comprehension? International Journal of Listening, 00, 1–14. https://doi.org/10.1080/10904018.2017.1331133.
Sawaki, Y (2012). Factor analysis. The Encyclopedia of Applied Linguistics. Blackwell Publishing Ltd. https://doi.org/10.1002/9781405198431.wbeal0407.
Sawaki, Y, Stricker, LJ, Oranje, AH. (2009). Factor structure of the TOEFL internet-based test. Language Testing, 26(1), 5–30.
Schmitt, T. (2011). Current methodological considerations in exploratory and confirmatory factor analysis. Journal of Psychoeducational Assessment, 29(4). https://doi.org/10.1177/0734282911406653.
Schoonen, R. (2005). Generalizability of writing scores: a structural equation modeling. Language Testing, 22(1), 1–30.
Sireci, SG. (2017). Interview with Stephen G. Sireci on validity. Journal of Measurement and Evaluation in Education and Psychology, 8(1), 158–168.
Song, MY. (2008). Do divisible sub-skills exist in second language comprehension? A structural equation modeling approach. Language Testing, 25(4), 435–464.
Song, X, Cheng, L, Klinger, D. (2015). DIF investigations across groups of gender and academic background in a large scale high-stakes language test. Papers in Language Testing and Assessment, 4(1), 97–124.
Su, YH, & Wang, WC. (2005). Efficiency of the Mantel, generalized Mantel-Haenszel, and logistic discriminant function analysis methods in detecting differential item functioning for polytomous items. Applied Measurement in Education, 18(4), 313–350. https://doi.org/10.1207/s15324818ame1804.
Tatsuoka, K, Linn, R, Tatsuoka, M, Yamamoto, K. (1988). Differential item functioning resulting from the use of different solution strategies. Journal of Educational Measurement, 25(4), 301–319. https://doi.org/10.1111/j.1745-3984.1988.tb00310.x.
Templin, J, & Henson, RA. (2006). Measurement of psychological disorders using cognitive diagnosis models. Psychological Methods, 11(3), 287–305.
de la Torre, J. (2011). The generalized DINA model framework. Psychometrika, 76(2), 179–199. https://doi.org/10.1007/s11336-011-9207-7.
de la Torre, J, & Douglas, J. (2004). Higher-order latent trait models for cognitive diagnosis. Psychometrika, 69(3), 333–353. https://doi.org/10.1007/BF02295640.
Vandergrift, L. (1997). The Cinderella of communication strategies: reception strategies in interactive listening. The Modern Language Journal, 81(4).
Vandergrift, L. (2006). Second language listening: listening ability or language proficiency? The Modern Language Journal, 90(1).
Vandergrift, L. (2007). Recent developments in second and foreign language listening comprehension research. Language Teaching, 40, 191–210.
Winke, P, & Lim, H. (2014). The effects of testwiseness and test-taking anxiety on L2 listening test performance: a visual (eye-tracking) and attentional investigation. IELTS Research Reports, 3, 5–6.
Wright, SP. (1992). Adjusted P-values for simultaneous inference. Biometrics, 48(1), 1005–1013.
Zhang, W (2006). Detecting differential item functioning using the DINA model. Unpublished doctoral dissertation. Greensboro: University of North Carolina.
Zhang, L. (2015). Recent research into IELTS reading and listening assessment by Linda Taylor and Cyril Weir (Eds.). Language Assessment Quarterly, 12, 234–238. https://doi.org/10.1080/15434303.2014.1003218.
Zumbo, BD. (2003). Does item-level DIF manifest itself in scale-level analyses? Implications for translating language tests. Language Testing, 20(2), 136–147.
Zumbo, BD. (2007). Three generations of DIF analyses: considering where it has been, where it is now, and where it is going. Language Assessment Quarterly, 4(2), 223–233.