On the construct and perceptual validity measures of L1-based vs. L2-based elicitation as a measure of L2 classroom performance assessment

Mohammadi Darabad, Ali; Abbasian, Gholam-Reza; Mowlaie, Bahram; Rostami Abusaeedi, Ali-Asghar

doi:10.1186/s40468-023-00218-4

Research
Open access
Published: 22 February 2023


Ali Mohammadi Darabad¹,
Gholam-Reza Abbasian ORCID: orcid.org/0000-0003-1507-1736²,
Bahram Mowlaie¹ &
…
Ali-Asghar Rostami Abusaeedi³

Language Testing in Asia volume 13, Article number: 13 (2023)


Abstract

This study investigated English as a foreign language (EFL) learners’ perceptions of L1-based and L2-based elicitation in the English classroom, employing an explanatory sequential mixed-method design. Ninety-seven Iranian intermediate EFL learners were selected from Islamic Azad University (Science and Research Branch) in Tehran Province using a convenience sampling method. Of these, 15 individuals were selected through convenience sampling as the focus group for the qualitative phase (N = 15; n = 8 for the L1 group and n = 7 for the L2 group). In the quantitative phase, 90 intermediate EFL learners were selected. The selected participants’ L2 performances were assessed through L1-based and L2-based elicitation techniques. They completed two validated researcher-made questionnaires designed to capture their perceptions of the elicitation techniques. Accordingly, five separate exploratory factor analyses were run to investigate the underlying constructs of the five components of the L1-based and L2-based perception questionnaires, the results of which showed that the correlation matrices were not singular and there were perfect correlations among all variables of L1-based and L2-based perception questionnaires. The findings show that the majority of respondents prefer to use their L2 in speaking classes and believe that L2 should take priority in general speaking classes. More than two-thirds of the respondents prefer to use L2 when communicating with each other inside and outside the classroom and when doing their assignments or performing orally in class. More than half of the respondents prefer to use L2 in assessment sessions.

Introduction

The use of the student’s first language (L1) in the EFL classroom has long been debated, and the findings are contradictory. These contradictions revolve around the context-dependent nature of language learning. Some authors advocate the use of L1 in English classes (e.g., Auerbach, 1993; Schweers, 1999), while others reject any role for L1 use. Other scholars (e.g., Ellis, 1984) hold a moderate position: too much L1 use could hinder language learning and deprive learners of L2 exposure.

In the Iranian EFL setting, learners’ only experience of using the target language happens in the classroom. Consequently, it is common for EFL teachers to use the students’ mother tongue as a means of interaction and a tool for conveying meaning. Although many researchers (e.g., Chiou, 2014; Liu & Zeng, 2015; Shin et al. 2019) believe that teaching through the target language (L2) gives better results, the issue requires further research in general and in the Iranian EFL setting in particular.

Different aspects of L1 use have been investigated in many empirical studies in EFL education. For example, Mohammadi Darabad et al. (2021) investigated the L1-based elicitation technique in L2 performance assessment, seeking validity measures of transfer. Their findings showed that the L1-based elicitation technique was a valid measure of L2 performance assessment. However, validity is best approached from a unitary perspective (Messick, 1989). In this sense, a validity claim cannot be supported by a single source of evidence. With construct validity as the central constituent in validation studies, five sources of evidence are still required to justify the interpretation and use of the scores of a testing instrument. These sources include evidence based on test content, evidence based on response processes, evidence based on internal structure, evidence based on relations to other variables, empirical or criterion-related validity, and consequences. Based on this, the stakeholders’ judgment can also be taken as a source of inference. This is one of the many points left untouched so far in the literature in general and in the Iranian EFL context in particular. To address this gap, this study investigates the perceptual validity of the L1-based elicitation technique as a teaching device in an L2 classroom from the learners’ perspectives. Accordingly, an attempt has been made to explore the students’ perceptions of L1-based and L2-based elicitations as the construct of the questionnaires using various statistical techniques, including factor analysis. The validated questionnaires were then administered to examine the participants’ perceptions of the L1-based and L2-based elicitations concerning their L2 performance assessment.

Literature review

Whether the first language should be involved in language classrooms has long been a contentious topic. The discussion revolves around L1 use (Cook, 2008; Mohebbi & Alavi, 2014), the role of learners’ first language in the foreign language classroom (Al Sharaeai, 2012; Littlewood & Shufang, 2022; Rivers, 2011; Tsagari & Diakou, 2015; Turnbull, 1999), functions of L1 in L2 learning (Molway et al., 2022), language transfer and skills, i.e., speaking, listening, reading, and writing (Perkins & Zhang, 2022), and language teachers’ and learners’ perceptions of L1 use in an L2 classroom context, elicitation techniques, assessment mechanisms, and validity issues concerning the assessment mechanism and perceptions. It seems that these contentious subjects will remain at the center of the educational agenda in the future. Regarding the use of L1 and focusing on various areas of language features and skills, i.e., morphosyntax, vocabulary, phonology, reading, and writing, many researchers (e.g., Eckman, 2014; Lardiere, 2014; McManus, 2022; Perkins & Jiang, 2019; Perkins & Zhang, 2022; Polio, 2014; Ringbom & Jarvis, 2009) concluded that encouraging learners to attend to similarities and differences between L1 and L2 is an important instructional strategy for improving L2 abilities. They also concluded that learners depend on perceiving L1-L2 similarities between individual items and functional equivalences between the two underlying grammatical systems. Therefore, similarities and differences between L1 and L2 play an important role.

Given the importance of collecting the stakeholders’ perspectives for validation (e.g., Bachman & Palmer, 2010; Ferrando, 2013; Gokturk Saglam & Tsagari, 2022; Im et al., 2019; Macaro & Lee, 2013; Nguyen & Habok, 2021; Shohamy, 2001; Yao, 2011), some attitudinal studies, though sporadic, are reviewed here. Some researchers (e.g., Afzal, 2012; Alshammari, 2011; Khati, 2011; Saito & Ebsworth, 2004; Sharma, 2006) concluded that many teachers and learners prefer to employ L1 to explain new vocabulary, concepts, and grammar rules, to give instructions for activities, to understand the subject, and to communicate with the teacher or other students. In some local studies (e.g., Mahmoudi & Amirkhiz, 2011; Nazary, 2008), students believed that the dominant language in English classrooms should be English rather than the students’ L1, and the students were reluctant to use their L1 in English class.

In another study, Molway et al. (2022) focused on teachers’ reported practices regarding the amounts and functions of L1 and L2 use in the L2 classroom and explored some of the many possible factors shaping those practices, including experiences during pre-service training, number of years in service and national context in terms of language education policy, and the social value of the L2s. Their main finding is that teachers’ reported practices vary significantly by location (whether they were teaching in Spain or in England). Teachers of English in Spain report more frequent use of the L2 across all classroom language functions investigated (i.e., grammar teaching, giving details about tests and exams, and teaching cultural content). Despite the existing body of research regarding the use of L1 for grammar teaching in both L1 Spanish and L1 English contexts, over 50% of teachers based in Spain reported a predominant use of L2 for this function. Regarding details about tests and exams, the teachers in England made use of L1 to ensure student comprehension of critical aspects of examination techniques. The teachers also use L1 for administrative matters to save valuable classroom time for more meaningful learning opportunities. Finally, regarding teaching cultural content, teachers in England reported very high levels of L1 use for this purpose.

Concerning the elicitation and assessment mechanisms, Gass (2018) believes that elicitation methods and the types of eliciting data are selected by the researchers to understand how the languages are learned. Two common elicitation tasks, namely judgments and elicited imitation, were reviewed by Gass (2018). The effectiveness of four types of judgment tasks, including magnitude estimation, grammaticality judgments, truth-value judgments, and preference judgments, was investigated by Plonsky et al. (2019). Their findings supported the effectiveness of these elicitation types on L2 assessment. Investigating the ways of language proficiency measurement, Wu and Ortega (2013) and Gaillard and Tremblay (2016) emphasized the use of elicited imitation as a worthy measure of general proficiency. This claim was also advocated by Yan et al.’s (2016) meta-analysis study in which 21 studies were analyzed. According to the obtained results, elicited imitation was a discrimination factor across proficiency levels.

Concerning validity issues, Stansfield and Kenyon (1992) examined the concurrent validity of semi-direct and direct tests in a number of languages, the results of which demonstrated high correlations between the two types of tests. The use of semi-direct tests was recommended as a valid and practical substitute for direct tests. Wigglesworth and O’Loughlin (1993) investigated the comparability of two versions of an oral interaction test, i.e., a tape-based (semi-direct) version, and a live interview (direct) version. They showed that the two versions were highly comparable.

Using quantitative and qualitative procedures, Shohamy (1994) explored the validity of semi-direct versus direct tests. The correlational analyses revealed high concurrent validity of the two tests (Shohamy & Stansfield, 1991; Shohamy et al., 1989); however, the tests differed in a number of aspects. Qualitative analyses specified that the differences were in the topics, number of functions employed in the elicitation tasks, and the communicative strategies, i.e., more paraphrasing and self-correction on the semi-direct test and more shifts to L1 resources on the direct test.

These and other studies show that there has been a considerable amount of research on L1 use in English classrooms, most of which has addressed teachers’ ideas and attitudes toward the issue; less research has focused on students’ perceptions of L1 use in these settings. The issue has been investigated more in English as a second language (ESL) settings than in EFL contexts. More importantly, studies like those reported above mainly address attitudes toward L1 use, but they have never approached it from a validity perspective. What adds to the novelty of this study is, first, to approach L1 as a valid device and, second, to address and reinforce its validity in the light of the unitary concept. The theoretical notion of validity has shifted from many distinct types to a unified notion with multiple features, closely resembling that of Messick (1989). Hubley and Zumbo (1996) believed that despite theoretical advances, actual validation practice still tends to be based on the classical notion of validity, creating a persistent gap between psychometric theory and research practice. According to Messick (1998), the disjunction between validation practice and validity theory would be solved by technology-based assessment, and both theory and practice would be unified. Messick recommended that all components of validity be incorporated under the concept of construct validity.

This study along with others (e.g., Nakatsuhara & Jaiyote, 2015; Nakatsuhara et al., 2018; Zhou, 2016) employed Messick’s framework to lead validation practice. This framework supported our findings in the validation of performance assessment and filled the existing gap between validation practice and validity theory.

Method

Participants

Ninety-seven Iranian intermediate EFL learners (18–25 years old) were selected through convenience sampling from Islamic Azad University (Science and Research Branch) in Tehran province. To compensate for the sampling issue on the one hand and the relatively limited sample size on the other, a mixed-methods research design in two complementary phases was followed. Integrating the qualitative and quantitative strands thus added to the depth and rigor of the findings. In the quantitative phase, 90 participants were selected based on their performance in the Cambridge Preliminary English Test (PET), 2015. The selected participants’ second language (L2) performance was assessed through L1-based and L2-based elicitation techniques. They completed two researcher-made perception questionnaires. For the qualitative phase, 15 intermediate English language learners were selected through a convenience sampling method as the focus group of the study (N = 15: n = 8 for the L1 group and n = 7 for the L2 group). According to Denscombe (2007, p. 115), a “focus group consists of a small group of people, usually between six and nine in number.” Because a focus-group interview provides a setting in which a relatively homogeneous group can reflect on the questions asked by the interviewer, the selected participants were interviewed.

Instrumentation

Two researcher-made Likert-scale perception questionnaires, constructed from the results of the focus-group interviews and the literature review, were employed to capture the EFL learners’ perceptions of L1-based and L2-based elicitation techniques. To this end, after the existing literature was reviewed, a focus-group interview was carried out for the qualitative phase of the study, following the procedures offered by Elliot (2005). A focus group was first defined, and the questions were formulated. Then, the focus-group members, 15 individuals (n = 8 for the L1 group; n = 7 for the L2 group), were carefully recruited. The focus group comprised a homogeneous set of participants. The following criteria were considered in the selection of the individual groups: age, gender, power, and language proficiency (Elliot, 2005).

Regarding the number of questions posed in the discussion sessions, Elliot considers 12 as a maximum, 10 to be better, and 8 as an ideal number of questions. After a review of the existing literature on these issues, 8 general questions were developed and raised during the discussion sessions. The participants in the focus groups were not aware of the contents of the questions in advance. Several criteria were set to ensure that the participants had no problems in understanding and responding to the questions posed. The questions were short, to the point, one dimensional, unambiguous, precisely worded, open ended, and neither threatening nor embarrassing (Elliot, 2005). The questions were classified into three types: (1) engagement questions, (2) exploration questions, and (3) exit questions. Engagement questions introduce the topic to participants and make them comfortable with the topic of discussion. Exploration questions take the participants to the main body of the discussion. Exit questions check whether anything was missed in the discussion. Examples are as follows:

Engagement question: Do you ever use your mother tongue when you are doing a task in English (e.g., retelling a story, reading comprehension, listening, speaking, writing a short story)?

Exploration question: What do you do when you cannot understand what your teacher says in English?

Exit question: Is there anything else you would like to say about your preferences toward L1- and/or L2-based elicitation?

In line with the abovementioned considerations and criteria, the researcher (moderator) and his assistant conducted the focus groups. The moderator facilitated the discussion, and the assistant took notes. The sessions were also recorded for further analysis. The focus-group interviews were conducted in two 90-min sessions (see Appendix for the interview questions and protocols). The data obtained from the focus groups were analyzed using the constant comparison analysis technique within an emergent-systematic focus-group design (Onwuegbuzie et al., 2009). The extracted codes (30 items), categories (5 items), and themes (2 items) were employed to construct the questionnaires. Accordingly, two 30-item Likert-scale questionnaires (see Appendix) were constructed and piloted; their reliability was confirmed using Cronbach’s alpha, and their face and content validity were confirmed by experts.
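As an illustration of the reliability check mentioned above, the following is a minimal sketch of how Cronbach’s alpha can be computed from Likert-scale responses using the standard formula α = k/(k − 1) × (1 − Σ item variances / variance of total scores). The function name `cronbach_alpha` and the response data are assumptions made for illustration only; the data are invented and are not the study’s pilot data, and the article does not state which software the authors used.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Return Cronbach's alpha for questionnaire responses.

    `responses` holds one row per respondent and one column per item.
    """
    n_items = len(responses[0])
    if n_items < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item (column) across respondents.
    item_variances = [
        variance([row[i] for row in responses]) for i in range(n_items)
    ]
    # Sample variance of the respondents' total scores.
    total_variance = variance([sum(row) for row in responses])
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


# Invented 5-point Likert responses (4 respondents x 3 items), illustration only.
toy_responses = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [3, 3, 3],
]
alpha = cronbach_alpha(toy_responses)
print(round(alpha, 3))
```

Values closer to 1 indicate that the items vary together across respondents. The sketch does not handle the degenerate case in which every respondent has the same total score.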

Data collection procedure

Those participants whose scores lay between 140 and 170, based on the Cambridge English Language Assessment rating scale, were identified as qualified to participate in the study. In line with the purposes of the study, five kinds of elicitation techniques were employed, namely: (1) asking questions, (2) asking questions combined with pictures, (3) asking questions combined with activities, (4) asking questions combined with texts and dialogues, and (5) asking questions combined with nonverbal language. The focus was on defining, giving synonyms, paraphrasing, forgetting, and asking multiple questions via the participants’ L1 (Farsi). Each technique followed three steps: opening, questioning, and main activity. In the opening step, the teacher opened the teaching–learning process. In the questioning step, the teacher asked a simple question related to the topic of the descriptive text, for example, about animals, to prompt the students to talk. In the main activity step, the teacher explained a descriptive text. While the students were retelling a story that referred to a situation and experience similar to their own (as a task), the teacher provided them with definitions of the target materials, e.g., words, in their L1 (Farsi) and L2 (English) and asked them to come up with the matching word in English. To make the elicitation process feel natural, the teacher would pretend to forget the word, the grammatical structure, the pronunciation, etc., so that the students had an opening to supply the target answer. The teacher would ask questions in Farsi (L1 group) whose answers would require the students to use the target linguistic feature. Some grammar-eliciting techniques such as picture description, conversations, readings, retelling stories, and examples were employed, and the required explanations were also provided by shifting to the learners’ L1. Headlines, words, pictures, proverbs, personal notes, free writing, etc. were also provided as tools for eliciting the learners’ ideas. As a formative performance assessment, three similar speaking tests were conducted at 1-week intervals. Finally, the piloted questionnaires were distributed among the participants and completed. Before the tests, the necessary explanations about the tests and their objectives had been given to the participants.

Data analysis

Researchers working with focus groups do not have a single, fixed framework for analyzing the qualitative data obtained from focus-group discussion sessions. However, some qualitative data analysis techniques have been identified as appropriate for these types of data. One of the frameworks, suggested by Leech and Onwuegbuzie (2008), encompasses several analytical techniques, including constant comparison analysis, classical content analysis, keywords in context, and discourse analysis. The data obtained from the focus groups were analyzed using the constant comparison analysis technique within an emergent-systematic focus-group design (Onwuegbuzie et al., 2009).

Table 1 summarizes the extracted codes, categories, and themes. Initially, in the open coding phase, 30 codes were extracted from the respondents’ statements during the interview sessions. These codes were then grouped into 5 categories in the axial coding stage. Finally, these categories were grouped into 2 themes (L1/L2 use for learning purposes and L1/L2 use for testing purposes) that expressed the content of each of the groups. In doing so, the researchers used two groups to assess whether the themes that emerged from one group (L1-based group) also emerged from the other group (L2-based group, n = 7). This assisted the researchers in reaching data saturation.
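The roll-up from codes to categories to themes, and the cross-group comparison used to judge saturation, can be sketched as a pair of lookups followed by a set intersection. This is only an illustrative sketch: the two theme names come from the text, but the code and category labels, the function names, and the per-group code lists are hypothetical placeholders and not the entries of Table 1.

```python
# Hypothetical code-to-category mapping (placeholder labels, not Table 1 entries).
CODE_TO_CATEGORY = {
    "asks peers in L1 when stuck": "classroom communication",
    "translates new words mentally": "vocabulary learning",
    "wants test instructions in L1": "test instructions",
    "prefers answering orally in L2": "oral assessment",
}

# Hypothetical category-to-theme mapping; the two theme names are from the text.
CATEGORY_TO_THEME = {
    "classroom communication": "L1/L2 use for learning purposes",
    "vocabulary learning": "L1/L2 use for learning purposes",
    "test instructions": "L1/L2 use for testing purposes",
    "oral assessment": "L1/L2 use for testing purposes",
}


def themes_for(codes):
    """Roll a group's open codes up to categories, then to themes."""
    categories = {CODE_TO_CATEGORY[code] for code in codes}
    return {CATEGORY_TO_THEME[category] for category in categories}


def shared_themes(group_a_codes, group_b_codes):
    """Return the themes that emerged in both focus groups."""
    return themes_for(group_a_codes) & themes_for(group_b_codes)


# Invented code lists for the two focus groups, for illustration only.
l1_group_codes = ["asks peers in L1 when stuck", "wants test instructions in L1"]
l2_group_codes = ["translates new words mentally", "prefers answering orally in L2"]

overlap = shared_themes(l1_group_codes, l2_group_codes)
print(sorted(overlap))
```

In this toy example the two groups raise different codes but arrive at the same two themes, which is the kind of overlap the cross-group comparison is meant to detect.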

Table 1 The extracted codes, categories, and themes for the role of L1 and L2 using the constant comparison analysis technique
