A phenomenographic study on language assessment literacy: hearing from Iranian university teachers

Abstract

This study aims to disclose Iranian university teachers’ perceptions of the fundamentals of language assessment literacy (LAL). To this aim, using purposive sampling, eighteen university teachers from two Iranian universities were invited to participate in semi-structured interviews. Their viewpoints were audio-recorded, transcribed, and subjected to a phenomenographic analysis. Findings yielded two overarching LAL domains: knowledge (e.g., having an acceptable level of digital LAL, satisfying ethical requirements, benefiting more from performance assessment, considering students’ individual differences, making assessment valid, assuring that tests are reliable, and having an acceptable level of pedagogical content knowledge) and skills (e.g., involving students in assessment, using alternative assessment methods, employing traditional assessment methods judiciously, informing students about test results, administering tests in standardized ways, using valid grading procedures, and bringing positive wash-back effects). After discussing the results, the study concludes by proposing a range of implications for different testing stakeholders and highlighting some avenues for further research.

Introduction

Assessment is construed as one of the most prominent facets of instructional contexts since it can have a considerable impact on the quality of teaching and hence learning. One of the branches of assessment which has gained noticeable momentum since the seminal work of Black and Wiliam (1998) is “assessment for learning.” In the succeeding years, more studies validated that assessment should be considered an influential factor to boost deep learning (Coombe et al., 2020; Elwood & Klenowski, 2002), raise second language (L2) learners’ motivation (Brookhart & Bronowicz, 2003), improve L2 learners’ self-concept (Black & Wiliam, 2010), and increase L2 learners’ understanding of quality assessment (Smith et al., 2013). As Malone (2013) argues, “strong, properly implemented assessment provides teachers, students, and all testing stakeholders with important information about student performance and about the extent to which learning objectives have been attained in the classroom” (p. 330). There is, indeed, a reciprocal relationship between teaching and assessment such that “assessment informs and improves teaching and vice versa” (Malone, 2013, p. 330). However, this interrelated linkage between teaching and assessment cannot be established and strengthened unless L2 teachers are equipped with sufficient assessment knowledge. This undeniable need for assessment knowledge provoked Stiggins (1991, 1995) to propose the very construct of “assessment literacy.”

In L2 education, assessment literacy is concerned with the understanding of testing stakeholders (with special focus on teachers and test-makers) of measurement principles and procedures to systematically design, appropriately implement, and fairly rate tests (Inbar-Lourie, 2008; Stiggins, 2001; Taylor, 2009). Despite the lack of common consensus on the concept of LAL (Fulcher, 2012), in light of the Standards for Teacher Competence in Educational Assessment of Students by the American Federation of Teachers (AFT), it can be argued that an L2 teacher with a high level of LAL is specialized in “selecting and developing assessments for the classroom, administering and scoring tests, using scores to aid instruction, communicating results to stakeholders, and being aware of the inappropriate and unethical uses of tests” (AFT, 1990, p. 74).

Given the pivotal roles that L2 teachers are expected to play in classroom assessment, such as collecting consistent and accurate information about L2 learners’ progress, using this information appropriately to modify their instructional practices, and offering effective feedback on their students’ performance, it should be assured that L2 teachers possess the required LAL (Gebril, 2017; Shepard, 2000; Sultana, 2019). In other words, due to these growing responsibilities and increasing expectations, L2 teachers need the required assessment knowledge and skills to pave the way for the success of the highly complex, demanding, vital, and multifaceted process of L2 learning. The very same necessity also applies to the higher education context since, as Stiggins (1995) notes, university teachers need to “know the difference between sound and unsound assessment” (p. 240).

The nature of university teachers’ perceptions of LAL in the higher education context is of central importance. Along with Brown et al. (2011), it can be argued that assessment objectives may not be met owing to “lack of teacher cooperation, knowledge, or belief in the proposed new usage of assessment” (p. 75). For both theoretical and practical reasons, it is crucial to examine how LAL is perceived by university teachers. In this way, university authorities can see whether university teachers are equipped with an acceptable level of LAL. If not, they can take the required steps to remedy the shortcomings and boost LAL by running pre-service and in-service teacher training courses. Hence, the present research explored university teachers’ perceptions of LAL in the Iranian EFL higher education context to provide valuable insights for different testing stakeholders.

Review of the literature

Dimensions of LAL

Over the years, a plethora of attempts have been made by leading scholars to provide a comprehensive model of LAL. For example, Boyles (2005) presents a collection of LAL competencies required of foreign language teachers in the USA. These competencies cover a range of abilities, including the ability to identify appropriate assessment methods, use various assessment tools, analyze and interpret test results, respond appropriately to test results and their meanings, and situate assessment results properly in teaching.

In a similar vein, a framework of fundamental competencies of LAL was suggested by Inbar-Lourie (2008). In her framework, she considers LAL “as a body of knowledge and research grounded in theory and epistemological beliefs, and connected to other bodies of knowledge in education, linguistics and applied linguistics” (p. 396) instead of taking it “as a collection of assessment tools and forms of analysis” (p. 396). The core competencies in her framework include the social role of language assessment, contemporary views about the essence of language competence, classroom and external testing practices, and the effects of language assessments on different testing stakeholders (Inbar-Lourie, 2008).

Informed by previous works (Coombe et al., 2012; Davies, 2008; Fulcher, 2010, 2012; Inbar-Lourie, 2013; Malone, 2013; Hill & McNamara, 2011; Scarino, 2013; Taylor, 2009), Giraldo (2018) presents a comprehensive framework to illuminate different dimensions of LAL. His framework entails three central components. The first component is knowledge, comprising three dimensions that deal with theoretical considerations: awareness of the basic tenets of applied linguistics, awareness of theories and concepts of assessment, and awareness of the assessment context. The second component is skills, covering instructional skills, design skills for language assessments, skills in educational measurement, and technological skills. The third component, language assessment principles, refers to familiarity with and actions toward critical issues in language testing practices. Based on this component, for example, teachers need to treat all students or test-takers with respect and make testing practices democratic and people-oriented by considering students’ voices and concerns in the test processes. It should, nevertheless, be noted that this framework suffers from two major drawbacks. Firstly, it is essentially theoretical in orientation and has not been built on empirical findings. Secondly, the components of LAL can simply be collapsed into two broad categories, knowledge and skills, since the components of language assessment principles can be subsumed under those two categories.

Research on teachers’ perceptions of LAL

There has been an increasing interest in LAL among researchers and practitioners over the last decades (Davies, 2008; Fulcher, 2010, 2012; Malone, 2013) which has led to a large body of research across different settings (Bøhn & Tsagari, 2021; Coombe et al., 2012; Homayounzadeh & Razmjoo, 2021; Janatifar & Marandi, 2018; Popham, 2009; Shahzamani & Tahririan, 2021; Stiggins, 1995; Vogt & Tsagari, 2014; Wang et al., 2006; Watmani et al., 2020; Xu & Brown, 2016; Yan & Fan, 2021; Zhou et al., 2020). Due to space limitations, only some are reviewed to lay the groundwork for the current study.

In the Chinese context, Xu and Brown (2016) explored EFL teachers’ (n=891) conceptions of LAL using an adapted version of the Teacher Assessment Literacy Questionnaire (TALQ). They found that the participants suffered from insufficient levels of assessment literacy. Furthermore, Lam (2019) qualitatively investigated Hong Kong secondary school teachers’ (n=66) assessment literacy for measuring L2 writing. The findings revealed that although most of the participants had the required assessment literacy, some could not even make a distinction between “assessment of learning” and “assessment for learning.” Bøhn and Tsagari (2021) explored teacher educators’ understandings of LAL in Norway, finding that, for the participants, LAL includes four primary competences, namely disciplinary, assessment-specific, pedagogical, and collaboration competences. More recently, Yan and Fan (2021) conducted a qualitative study to explore the contextual and experiential factors shaping the LAL profiles of language testers, EFL teachers, and graduate students in the Chinese context. Their results disclosed that the LAL profile differed for the three groups at the individual and group levels. At the individual level, LAL development processes were different for each participant even though they shared some common patterns. At the group level, the language testers and graduate students enjoyed a higher level of LAL than the EFL teachers.

In Iranian contexts, Janatifar and Marandi (2018) carried out a study to verify the components of LAL through a modified version of Fulcher’s (2012) LAL survey. The responses of high school teachers (n=280) indicated that in the EFL context of Iran, LAL includes four major components: test design and development, large-scale standardized testing and classroom assessment, beyond-the-test aspects, and reliability and validity. In addition, in a recent study, Watmani et al. (2020) used the teacher assessment literacy scale of Plake et al. (1993) to compare the assessment literacy of EFL teachers (n=88) with that of non-EFL teachers (n=112). The findings indicated that the participants largely lacked the required assessment literacy and that the EFL and non-EFL teachers perceived assessment literacy differently. Additionally, Homayounzadeh and Razmjoo (2021) conducted a study to measure the LAL of instructors (n=12) and graduate and post-graduate students (n=46). They used Pastore and Andrade’s (2019) LAL framework, which includes conceptual, praxiological, and socio-emotional dimensions. Their findings disclosed discrepancies between the instructors’ and students’ LAL. For example, the two groups differed on how ethical requirements should be exercised at the socio-emotional dimension. Recently, Shahzamani and Tahririan (2021) carried out a study to gauge ESP instructors’ (n=21), content instructors’ (n=8), and ELT instructors’ (n=13) LAL in assessing reading comprehension with respect to formative assessment. Their findings documented that the three groups had largely the same levels of LAL for assessing reading comprehension.

As can be implied from the above-mentioned studies, scant attention has been given to university teachers’ perceptions of the fundamentals of LAL in the Iranian higher education context. In addition, to the best knowledge of the researchers, no qualitative study has yet addressed teachers’ perceptions of LAL in the EFL context of Iran. Hence, the present qualitative study aims to further our understanding of the fundamentals of LAL by investigating university teachers’ perceptions in the Iranian higher education context. The hope is that this study’s results can improve the quality of classroom assessment practices by making a positive shift in university teachers’ conceptualizations of LAL.

Context of the study

This study was conducted in the academic context of Iran, namely at Lorestan University and Ayatollah Burojerdi University, both of which are state universities. In general, Iranian universities fall into three categories: state universities, non-governmental sectors (Azad), and distance education (Payam-e-Noor University) (Rasian, 2009). Given that the present study was carried out at two state universities, a description of them is in order. The state universities are managed by the Ministry of Science, Research, and Technology and depend on government funding. They, in fact, represent the main higher education system, to which students who obtain a good rank on the National Entrance Examination (Konkur) can be admitted. Studying at the state universities is nearly tuition-free for students, and university teachers with a robust professional resume and outstanding professional competence can be employed at these universities. The university teachers need to be present on the university campus 4 days a week. Their working week is forty hours, and their job duties include research, teaching, and academic services. As university teachers become qualified as associate and full professors, they allocate more time to research and less time to teaching. Furthermore, they are fully free to choose and deliver the course materials, opt for their own method of pedagogy, and decide on the type of assessment. It should also be noted that tertiary education in Iran is divided into four levels: Associate degree (A.A.), Bachelor of Arts (B.A.), Master of Arts (M.A.), and Doctor of Philosophy (Ph.D.). Every academic year is run in two semesters, which start in September and end in June (Rasian, 2009).

Method

Research design

LAL is construed as a multi-faceted and complex phenomenon which could be interpreted differently by various individuals in diverse contexts. To capture such diversity, a phenomenographic approach was employed to collect the required data in this study. Phenomenography is an appropriate qualitative approach for disclosing a group of people’s perceptions of a particular phenomenon (Creswell & Poth, 2018). According to Marton (1986), phenomenography is a research approach for mapping “the qualitatively different ways in which people experience, conceptualize, perceive, and understand various aspects of, and various phenomena in, the world around them” (p. 31). It aims, as Marton (1986) adds, to “discover and systematize ways of thinking that synthesize how people interpret different aspects of reality” (p. 180).

Participants

A sample of 18 university teachers specialized in applied linguistics (n=8), linguistics (n=6), and English literature (n=4) was selected using purposive sampling. The reason for using purposive sampling was that it allows researchers to identify, select, and categorize information-rich cases in the qualitative paradigm (Miles & Huberman, 1994). It should be noted that the data collection process continued until data saturation occurred; that is, the data became repetitive and nothing new was shared by the participants. The participants included 10 male university teachers (55.6%) and 8 female ones (44.4%). Their teaching experience ranged from 1 to 10 years (27.8%), 11 to 20 years (44.4%), and 21 to 30 years (27.8%). The participants’ demographic information is presented in Table 1.

Table 1 Demographic information of the participants

It is worth noting that some ethical considerations were observed throughout the study. Prior to running the semi-structured interviews, the researchers informed the participants about the goals of the current study. The participants were assured that their privacy would be guaranteed such that their identities and their responses to the questions would be kept confidential. They signed an informed consent form in Persian. The researchers tried to create a positive and warm climate during the semi-structured interviews to let the participants express their perceptions with ease. Moreover, the participants were told that they could withdraw from the study whenever they wished. Finally, the researchers tried their best to present the participants’ perceptions as accurately as possible by avoiding any biases and misinterpretations.

Instruments and data collection procedures

To collect the required data, a phenomenographic semi-structured interview (Marton & Booth, 1997) was administered wherein the participants were invited to reflect upon their perceptions of LAL. Initially, an interview checklist was developed by undertaking the following steps. Firstly, a thorough review of the literature pertaining to LAL was conducted (e.g., Brookhart, 2013; Eyal, 2012; Firoozi et al., 2019; Fulcher, 2012; Giraldo, 2018, 2021; Inbar-Lourie, 2013; Pastore & Andrade, 2019; Taylor, 2013; Zhou et al., 2020). In a sense, the researchers examined whether any theory, framework, or instrument had already been used to measure the relevant constructs (Dörnyei, 2007). Based on the prior literature, an initial draft of the broad concepts of LAL was prepared. Afterward, these broad concepts were analyzed and broken down into smaller segments in the form of questions. Then, to appraise the validity of the checklist, two university professors specialized in second language assessment at Tehran University were invited to comment on its content and language. In accordance with their comments, the vague and problematic items were either amended or discarded. In principle, the checklist revolved around some fundamental concepts of LAL, including the involvement of students in assessment, alternative assessment methods, traditional assessment methods, digital language assessment literacy, ethical requirements, standardized test administration, reporting test results, performance assessment, students’ individual differences, grading procedures, test validity, test reliability, wash-back effects, and university teachers’ pedagogical content knowledge.

After the checklist had been designed and validated, the semi-structured interviews were administered. They began with an open-ended question: What does language assessment literacy mean to you? Then, the first researcher continued the interviews with the rest of the questions in the checklist, yet when needed, he also raised additional, relevant questions to elicit more expanded and elaborate responses. The checklist, in fact, helped the researchers run the interviews more systematically (Patton, 1990). The semi-structured interviews were run in the participants’ mother tongue (Persian) to allow the participants to express their ideas with greater ease and to avoid the barriers that a second language might put up in their way. The participants’ words were recorded by a voice recorder to be meticulously transcribed and analyzed later. To increase variation among the participants’ perceptions, as noted by Durden (2018), the first researcher ran the semi-structured interviews in diverse temporal and spatial contexts. The underlying reason was not to let the time and place of the interviews affect the participants’ conceptions of LAL. It should be noted that each interview lasted around 45–60 min.

Data analysis

The collected data were analyzed according to the multi-stage approach proposed by Marton (1986). This phenomenographic approach includes six distinct stages, detailed below. At the first stage, prior to analyzing the data, the researchers tried to identify their preconceived conceptions. For this purpose, they meticulously went through the collected data and wrote down an initial draft of these preconceptions. Then, having identified the primary conceptions about LAL, the researchers focused on the participants’ actual words and extracted pre-categories from them, trying their best to stay faithful to the conceptions presented in the participants’ actual words. At the next stage, the researchers compared and contrasted each participant’s words with those of the other participants. During this stage, they tried to spot the regularities embedded in the participants’ words and created increasingly larger pre-categories. In addition, they sought to identify the connections among the diverse pre-categories. As suggested by Sjostrom and Dahlgren (2002), at the fourth stage, the pre-categories were analyzed based on three principles: frequency (i.e., the number of times a conception related to LAL was mentioned by the participants), position (i.e., how relevant a given conception is to the other conceptions), and pregnancy (i.e., identifying the conceptions related to LAL that received the most attention from the participants). At the fifth stage, the researchers selected the excerpts that best represented the categories, taking special care to stick to the intended meanings of the conceptions expressed by the participants. At the sixth stage, the researchers analyzed the pre-categories in different meaning pools. To establish the final categories, the researchers examined these meaning pools first individually and then cross-sectionally. As Åkerlind (2017) stresses, this strict multi-stage approach was followed to guarantee the quality and transferability of the results.
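
To make the frequency principle at the fourth stage concrete, the tallying step can be illustrated with a few lines of code. The following Python sketch is purely illustrative; the participant IDs and pre-category labels are hypothetical and do not come from the study’s data.

    # Minimal sketch of the stage-four "frequency" principle: counting how
    # often each LAL-related pre-category was mentioned across transcripts.
    # Participant IDs and category labels below are hypothetical examples.
    from collections import Counter

    coded_transcripts = {
        "UT1": ["alternative assessment", "wash-back effects"],
        "UT2": ["student involvement", "alternative assessment"],
        "UT3": ["test reliability", "basic statistics"],
    }

    frequency = Counter(code for codes in coded_transcripts.values() for code in codes)
    for conception, count in frequency.most_common():
        print(f"{conception}: mentioned {count} time(s)")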

It should be stressed that some measures were undertaken to enhance the reliability and validity of the findings. Regarding the findings’ reliability, the data were coded concurrently by two researchers. The inter-coder reliability of their codings was 0.87, which was acceptable for the current study’s purposes. When a disagreement arose, they resolved it by discussing the coding. Concerning the accuracy and credibility of the findings, member checking (respondent validation) was used. As Lincoln and Guba (1985) note, member checking is a common strategy to establish the credibility of findings. To this aim, all the extracted constructs and concepts were reviewed by three analysts. Then, a copy of them was given to five of the participants to approve them, add further comments, and confirm their meanings. This technique allowed the respondents to check their intentions, rectify mismatches, and include additional information. Based on their initial feedback, some modifications were made, and the participants were asked to check the modifications. The participants confirmed that the extracted constructs and concepts represented their intended conceptions.
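
As an illustration of how such an agreement index might be computed: the study reports 0.87 without naming the statistic used, so the Python sketch below assumes Cohen’s kappa, a common chance-corrected measure of agreement between two coders. The codings shown are hypothetical.

    # Minimal sketch of Cohen's kappa for two coders (an assumption; the
    # study does not state which agreement statistic produced the 0.87).
    from collections import Counter

    def cohens_kappa(coder1, coder2):
        # Observed agreement: proportion of items both coders labeled alike.
        n = len(coder1)
        observed = sum(a == b for a, b in zip(coder1, coder2)) / n
        # Expected chance agreement, from each coder's label frequencies.
        freq1, freq2 = Counter(coder1), Counter(coder2)
        expected = sum(freq1[lab] * freq2[lab] for lab in freq1) / n ** 2
        return (observed - expected) / (1 - expected)

    # Hypothetical codings of five interview segments by two researchers.
    coder_a = ["knowledge", "skills", "skills", "knowledge", "skills"]
    coder_b = ["knowledge", "skills", "knowledge", "knowledge", "skills"]
    print(round(cohens_kappa(coder_a, coder_b), 2))  # 0.62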

Results and discussion

Having analyzed the collected data, the researchers identified two overarching categories of LAL: knowledge (e.g., having an acceptable level of digital LAL, satisfying ethical requirements, benefiting more from performance assessment, considering students’ individual differences, making assessment valid, assuring that tests are reliable, and having an acceptable level of pedagogical content knowledge) and skills (e.g., involving students in assessment, using alternative assessment methods, employing traditional assessment methods judiciously, informing students about test results, administering tests in standardized ways, using valid grading procedures, and bringing positive wash-back effects). They are detailed below (Fig. 1).

Fig. 1 The model of fundamentals of LAL in higher education contexts

Knowledge

Having an acceptable level of digital LAL

Digital language assessment literacy (DLAL) was the first theme catching the participants’ attention. Digital assessment literacy is defined as “the role of the teacher as an assessor in a technology-rich environment” (Eyal, 2012, p. 37). Concerning the significance of DLAL, one of the participants commented:

1.

    Despite the paramount importance of digital language assessment literacy, most university teachers are not equipped with it. For example, I myself don’t know how to use the options of a platform like Adobe Connect to administer an online test. (UT17)

Corroborating the previous statement, another participant stated:

2.

    With the ever-increasing spread of online classes across the country, we need to update our digital assessment literacy. To be honest, we all have difficulties assessing our students’ learning in online classes. We most often just send some questions via WhatsApp and students usually answer them in an open-book format. (UT11)

The statements above show that DLAL is perceived as an integral part of LAL. In alignment with Eyal (2012), the study’s findings evidence that teachers should be equipped with the required literacy to assess students’ performance using modern digital technologies. The findings may be explained by the view that the increasing interest in using modern digital technologies in language assessment programs is associated with the efficiency and availability they provide for testing stakeholders (Chapelle, 2008). Furthermore, these results may be discussed from the perspective that, given the constant advances in technology and the primacy of computer-assisted language learning (CALL), university teachers need to stay updated in this domain (Winke & Fei, 2008). Also, the findings support Eyal (2012), who argues that teachers should possess the required level of literacy to be efficient assessors in the digital era.

Satisfying ethical requirements

“Satisfying ethical requirements” was another recurring theme elicited from the data. The university teachers maintained that assessment practices should meet ethical requirements. In this regard, one of the participants remarked:

3.

    Students should perceive testing practices as fair. For this, I try to clarify the grading criteria and announce the objectives of tests in advance. In this way, my students can become motivated to demonstrate their abilities better. (UT17)

In line with the previous statement, another participant stressed the importance of the ethical requirements in assessment practices as follows:

4.

    I never let my students’ scores be polluted by irrelevant constructs, such as cheating and gender bias. I mean that I do not allow the quality of my assessment practices to be adversely affected by test-takers’ gender. Or, I do not permit my students to cheat during test administrations to increase their scores unfairly. (UT12)

These statements documented the significance of considering ethical requirements in assessment practices. This could be explained by the point that when tests are perceived as unfair by students, they may make students feel angry, confused, powerless, embarrassed, and frustrated (Buttner, 2004; Chory et al., 2017; Horan et al., 2010). Another possible explanation is that students’ perceptions of the un/fairness of testing practices may cause them to show positive or negative affective reactions, as well as behavioral responses (Rasooli et al., 2019). Additionally, in alignment with Fan et al. (2020), the study’s results can be discussed from the perspective that if assessment practices do not satisfy ethical requirements, the decisions taken based on test results may not be very accurate. In turn, these incorrect decisions may adversely affect students’ lives and destinies. The study’s findings support previous studies revealing that ethics in assessment practices has an immense effect on student learning (Holmgren & Bolkan, 2014), students’ engagement (Berti et al., 2010), and students’ motivation to continue learning (Chory-Assad, 2002).

Benefiting more from performance assessment

Another theme which was frequently highlighted by the participants was “benefiting more from performance assessment.” In this respect, one of the university teachers noted:

5.

    Students’ activities over the course are of paramount importance. In my final evaluation, students’ class attendance, class participation, and responses to my oral questions are important. (UT13)

Besides, one of the respondents underscored the significance of performance assessment as follows:

6.

    A major part of students’ final scores is the tasks that they do during the course. For example, every week I encourage my students to read passages, summarize them in their own words, and present them in the classroom. Or, the stage is ready for students who like to prepare a reading and present it in the classroom. (UT10)

The extracts evidence that knowing how to implement performance-based assessment is essential for university teachers. The findings can be discussed from the perspective that performance-based assessment, such as doing a project, may give a better picture of the processes involved in student learning (Bachman, 2002). Furthermore, along with Bachman and Palmer (2010), it may be argued that performance-based assessment is more useful because it permits students to apply their knowledge and skills to solve real-life problems. This argument receives support from Brindley (1994), who considers performance assessment a way to appraise “both knowledge and ability for use” (p. 75). For example, as the findings demonstrated, valuing students’ participation in instruction can greatly facilitate students’ use of their L2 competence to do a task. Additionally, in alignment with Bachman (2002), it could be noted that performance assessment should be practiced since it measures complex abilities that traditional assessment practices cannot easily measure. That is, performance assessment exposes students to more cognitively and affectively challenging tasks compared to traditional assessment. The study’s findings are consistent with Giraldo (2018, 2021), who asserts that LAL includes knowledge of the advantages and disadvantages of implementing performance assessment.

Considering students’ individual differences

“Considering students’ individual differences” was the other theme that emerged from the respondents’ words. In this regard, a university teacher remarked:

7.

    Testing practices would be effective if students’ individual differences, such as learning styles, cultural background, age, and first language, are taken into account. For one thing, some students are good at answering close-ended items while other students may feel more at ease with open-ended items. (UT6)

Additionally, another participant complained about ignoring students’ individual differences in assessment practices in the classroom. He commented:

8.

    Since in Iran, students come from different cultural and ethnic backgrounds, tests should be value-free in terms of culture and ethnicity. For example, I remember that some of my students were irritated in the last semester due to some value-laden questions in the final exam. (UT14)

The participants’ words disclosed that university teachers should be aware of students’ individual differences in assessment practices. The study’s findings may be explained by the well-recognized fact that student learning is highly affected by individual characteristics such as interests, learning styles, and motivation; likewise, student performance is likely to be influenced by such individual characteristics during test taking (Tomlinson & Moon, 2013; Wang et al., 2006). Furthermore, consistent with AfL (2009), the results may receive support from the view that since students with different cognitive and affective individual differences approach tests differently, test results are likely not to be the same for all test-takers. Another possible explanation for the study’s results, as Tomlinson and Moon (2013) and Moon (2016) note, is that to provide enough opportunities for students to demonstrate their knowledge and understanding of materials, differentiated assessment should be practiced. Moreover, the study’s findings may trace back to the view that when assessment is differentiated for varying students’ individual differences, it is more equitable (Moon, 2016; Tierney, 2013). The study’s findings accord with those of Crosthwaite et al. (2015), who reported that students with different cognitive and affective individual differences view teachers’ assessment differently. For example, teachers’ assessment of class participation was fair for students who valued class participation while it was unfair for students who were more interested in working alone.

Making assessment valid

The other recurrent theme emerging from the data was “making assessment valid.” To stress the significance of validity, one of the participants asserted:

9.

    I make my best effort to assure that my tests are visually appealing, that their contents represent the course syllabus, and that they meet the principles of communicative approaches. This makes the test results positively affect students’ life and destiny. (UT15)

In addition, one of the participants lamented that the validity of tests usually remains unnoticed in assessment practices.

10.

    Though validity guarantees test quality, most of the testing practices lack the required validity. The possible reason for this problem is that test validation is highly complex, demanding, and, of course, expensive for teachers. (UT18)

The statements above highlight the significance of validity in assessment practices. One possible reason for the study’s findings is that if the requirements of test validation are not satisfied, tests cannot measure what they are intended to measure (Fulcher & Davidson, 2007; McNamara, 2006). Furthermore, the study’s results support the long-standing view that test validation incorporates three essential elements: content, construct, and criterion (Bachman & Palmer, 2010; Hamp-Lyons & Lynch, 1998). In alignment with Messick (1994, 1996), the study’s findings may be explained from the viewpoint that validity gives meaning to test scores; that is, validity evidence shows that test performance is closely linked to performance outside the class. Moreover, as McNamara (2006) asserts, validity increases the degree to which correct interpretations and decisions can be made about students’ lives. Likewise, as the study’s findings demonstrated, despite the utmost importance of test validation, it is not widely practiced in classroom assessment. The underlying reason for this problem is that measuring test validity is complex, demanding, time-consuming, and expensive for university teachers (Bachman, 1988, 1990). In brief, validity is the backbone of any assessment practice and should be given enough attention by university teachers.

Assuring that tests are reliable

“Assuring that tests are reliable” was the other theme that gained remarkable attention from the participants. In this respect, one of the participants stated:

11.

    The reliability and consistency of test results should be acceptable. For this, for example, the number of items should be large enough, they need to cover all the contents of the course, and they should be of different kinds, from multiple-choice to essay items. (UT11)

In congruence with the preceding statement, the participants stressed that an integral part of LAL is knowledge of basic statistics.

12.

    Having knowledge of basic statistics is important for a teacher. I mean, if we aim to have reliable tests, we should know how to estimate reliability through the available statistical formulae. (UT3)

These statements are indicative of the significance of test reliability in assessment practices. One possible reason for the study’s findings, as Bachman (1988, 1990) highlights, is that reliability is essential to assure that our tests consistently measure the intended characteristics. Moreover, the study’s results could be explained from the perspective that the results of reliable tests are of great value in terms of time-saving for university teachers. That is, for busy university teachers, when a test enjoys a high level of reliability, it can be assumed that test results have not been adversely affected by students’ temporary psychological and physical states, environmental factors, test forms, or multiple raters (Bachman & Palmer, 2010; Fulcher & Davidson, 2007). Additionally, in alignment with McNamara (2006), reliable tests generate dependable, repeatable, and consistent information about students’ abilities. In consequence, this information makes way for more meaningful interpretations of test results and correct decisions about students’ lives and destinies.
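
To illustrate the kind of “available statistical formulae” the participant in excerpt 12 refers to, the Python sketch below computes Cronbach’s alpha, one standard estimate of internal-consistency reliability. The item scores are hypothetical and not drawn from the study.

    # Minimal sketch of Cronbach's alpha: rows are students, columns are
    # dichotomously scored test items (hypothetical data).
    import statistics

    def cronbach_alpha(scores):
        k = len(scores[0])  # number of items
        item_vars = [statistics.pvariance([row[i] for row in scores])
                     for i in range(k)]
        total_var = statistics.pvariance([sum(row) for row in scores])
        return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

    scores = [
        [1, 1, 1, 1],
        [1, 1, 1, 0],
        [1, 0, 1, 0],
        [0, 0, 0, 0],
        [0, 1, 0, 0],
    ]
    print(round(cronbach_alpha(scores), 2))  # 0.75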

Having an acceptable level of pedagogical content knowledge

The last theme in the domain of knowledge was “having an acceptable level of pedagogical content knowledge.” Pedagogical content knowledge, as defined by Rodabaugh (1994), is “a teacher’s internalized procedural knowledge of how particular topics, problems and issues can be organized, adapted and presented to learners with diverse interests and capabilities” (p. 12). Concerning the importance of this factor, one of the participants asserted:

13.

    Though it seems irrelevant, pedagogical content knowledge is an integral part of LAL, I believe. My reason for this is that when teaching is not done effectively, the testing practices lose their meanings and effects. So, familiarity with the basics of applied linguistics is quite essential for university teachers. (UT11)

This quote indicates that university teachers should be equipped with sufficient pedagogical content knowledge. The study’s findings may be explained by the fact that, with an acceptable level of pedagogical content knowledge, university teachers know teaching methodologies and subject knowledge, and how to adapt them to learning purposes. When this is done well, testing practices can be found meaningful by students (Rasooli et al., 2018). Additionally, in alignment with Rasooli et al. (2018), it could be argued that when university teachers are not equipped with the required pedagogical content knowledge, they are likely to fail to explain and deliver the learning contents adequately. This, in turn, adversely affects students’ opportunities to learn and show their abilities. The study’s findings are in accordance with previous studies (Chory, 2007; Lankiewicz, 2014; Rodabaugh, 1994) reporting that students’ perceptions of the un/fairness of assessment practices are highly affected by teachers’ pedagogical content knowledge. Likewise, in line with the study’s findings, Bempechat et al. (2013) found that students regarded teachers’ testing practices as unfair and ineffective due to the failure of their teachers to illuminate the assignments in depth, allot time to address students’ confusions, and cover the lessons patiently.

Skills

Involving students in assessment

The first theme related to the skills domain was “involving students in assessment” (Fig. 1). As its name suggests, the participants maintained that university teachers should engage students in testing practices. In support of this, one of the participants noted:

14.

    When I involve my students in testing practices, I feel that they become more motivated to be in class. When their voices are heard, they feel more self-confident and their self-efficacy increases. (UT2)

In accordance with the preceding statement, another participant approached the issue as follows:

15.

    Well, we need to involve our students in our assessment practices, since this makes them monitor and self-regulate their learning and get a better picture of their actual abilities. (UT5)

The above statements disclosed that students should be engaged in assessment practices. One possible reason for the study’s findings, as Nicol and Macfarlane (2006) note, could lie in the point that when students are involved in assessing their abilities, their self-regulation competence increases. According to Zimmerman and Schunk (2011), self-regulated individuals are capable of monitoring, controlling, and regulating their own cognitions, affect, and behaviors; thus, they are in a better position to reach their goals. Additionally, along with Asghar (2010), the study’s results may be explained from the view that involving students in testing practices may positively affect their motivation and self-efficacy by helping them believe that they have the required abilities to do testing tasks. Furthermore, students may better control their anxiety (Hayes & Embretson, 2013) and disturbing emotions (Tapia & Panadero, 2010) in a testing context where they see themselves as an integral part of it. The study’s findings provide support to those of Pintrich and De Groot (1990), indicating that involving students in assessment practices brings about noticeable benefits for student learning and achievement. Likewise, the study’s results support previous studies (Sivan, 2000; Tillema et al., 2011) reporting that an effective strategy for developing quality assessment is incorporating students’ voices and concerns in the design, implementation, and scoring of tests.

Using alternative assessment methods

“Using alternative assessment methods” was another theme that gained noticeable attention from the respondents. In this respect, one of the participants expressed dissatisfaction that alternative assessment methods, such as self-assessment, peer-assessment, and portfolio assessment, are rarely used in Iranian higher education contexts. His exact words are reported below:

16.

    Though using alternative assessment methods may be very advantageous, they are rarely exercised by university teachers. This reservation may be due to the implementation difficulties with such assessment methods. For example, administering peer-assessment is really time-consuming and using portfolio assessment demands diverse skills. (UT7)

Additionally, the participants pinpointed that when university teachers are skilled enough to implement alternative assessment methods, their assessment practices are more advantageous for student learning. For this, a participant commented:

17.

    Well, sometimes I try to use alternative assessment methods such as self-assessment. I’ve found that self-assessment helps students reflect upon their performance. And based on their reflection, they can better self-regulate their future learning by rectifying their errors. (UT1)

The university teachers’ words represented that LAL encompasses the required skills to implement alternative assessment methods. One possible explanation for the findings may be associated with the increasing interest in learner-centered pedagogy as opposed to teacher-centered pedagogy over the last decades (Aryadoust, 2016; Topping, 2009). In this regard, alternative assessments, such as peer-assessment and self-assessment, have been introduced and practiced as viable ways to realize assessment for learning (Rosman et al., 2015). Moreover, in alignment with Black and Wiliam (1998), the study’s results may be explained from the perspective that alternative assessment methods can generate valuable information that can be used to offer feedback tailored to students’ needs and interests. This, in turn, may promote student learning. Also, the study’s findings can be discussed further with reference to self-regulation (Pintrich, 2000). Alternative assessment methods, most especially self-assessment, as Andrade and Brookhart (2014) stress, increase students’ self-regulation competence. As a consequence, students’ awareness of task goals is raised and they can check their progress toward them. The study’s findings are partially consistent with those of Andrade and Du (2007), whose participants acknowledged the benefits of self-assessment in helping them find out the assignments’ expectations, identify their weaknesses, and plan to rectify them. Additionally, the study’s findings lend credence to the studies by Black and Wiliam (1998). In brief, their results revealed that owing to the implementation of peer-assessment, the quality of learning increased and both low-achieving and high-achieving students could benefit from the instructions.

Employing traditional assessment methods judiciously

The other theme extracted from the collected data was “employing traditional assessment methods judiciously.” The participants pinpointed that traditional assessment methods should not be frowned upon completely when it comes to classroom assessment.

18.

    It is not fair to say that traditional assessment methods should be completely discarded. For example, one of their biggest advantages is their practicality. I can implement them in diverse contexts for different purposes. (UT5)

Furthermore, another university teacher confirmed the previous perception, maintaining that:

19.

    This is a reality that most university teachers use traditional assessment methods, such as multiple-choice, true-false, matching, and gap-fill questions. Well, one underlying reason for such extensive interest is that they can easily compare students’ performance. (UT4)

As the participants’ words evidence, the university teachers stressed that traditional assessment methods have some undeniable advantages if used appropriately. Easy, fast, and economical design, implementation, and grading seem to be the reasons for such strong advocacy of traditional assessment methods (Bachman & Palmer, 2010; Bailey, 1998). Another possible reason may be the easy analysis of their results. That is, as Bachman (1990) argues, teachers have fewer difficulties analyzing and comparing students’ scores over time and across a large, diverse group of students. Another possible explanation for the findings, as Dikli (2003) points out, is that traditional assessment methods are easy to score and let teachers determine which areas their students have excelled in and which they are struggling with. In light of the findings, as Bailey (1998) and Dikli (2003) discuss, the tendency of the university teachers toward traditional assessment methods lies in the fact that they “look like” tests, and hence, there is less resistance against them on the part of students.

Informing test-takers about test results

“Informing test-takers about test results” was another recurring theme extracted from the participants’ responses. To confirm the significance of informing test-takers about test results, one of the participants noted:

20.

    I don’t know why my colleagues often do not return students’ test sheets with enough feedback. When I do it, not only do my students understand that tests are important but they also see their mistakes and can modify their inter-language systems. (UT10)

Another relevant issue related to this theme was keeping students’ scores confidential. In this regard, one of the university teachers noted:

21.

    I think it is really unethical to announce students’ scores in front of the class. This may damage the self-efficacy and self-confidence of the students who have received a low score. In consequence, they may lose their motivation to continue learning. (UT12)

As can be implied from the participants’ responses, university students need to be informed about test results and receive enough feedback about their performance, and their scores should be kept confidential. One possible explanation for the study’s findings is that a large segment of teachers’ career duties is to extract, diagnose, and offer appropriate feedback on student learning in a constant, ongoing, communicative manner (Cowie & Bell, 1999). What is more, as Moss and Brookhart (2009) highlight, the feedback given on students’ errors opens valuable opportunities for the students to modify or consolidate their learning. Furthermore, along with Tunstall and Gipps (1996), keeping students informed about test results can increase their motivation. That is, when students receive positive feedback from their teachers and see their own scores, they may feel a sense of achievement and increase their future effort. Moreover, the study’s results can be justified from the view that fair assessment requires keeping students’ scores confidential (Rasooli et al., 2018), since publicizing students’ scores may hurt the self-efficacy of those students who have not got high scores (Green et al., 2007). The study’s findings are in accordance with previous studies (Carless, 2006; Lizzio & Wilson, 2008), which reported that the presence of effective feedback was conceived as an overarching dimension of quality and fair testing practices. To close, university teachers with a high level of LAL are aware of the value of informing students of their own test results but not publicizing them.

Administering tests in standardized ways

“Administering tests in standardized ways” was another theme that was repeatedly highlighted by the participants.

22.

    I try my best to administer tests in a standardized way. For example, prior to running tests, I consult with my students on the date, I explain the stuff they need to bring to test sessions, I clearly determine the location where tests will be run, and I inform them about the time duration of tests. (UT13)

Another respondent also expressed disdain for the lack of standardized administration of end-of-semester exams. She remarked:

23.

    Test administrations at universities are rarely standardized. For example, the test setting is usually noisy, proctors are not well trained to treat test-takers appropriately, there is not enough accommodation for students who are left-handed or disabled, and there is no break during the test administration. (UT8)

The statements above indicate that a major part of LAL relates to the skills required to administer tests in standardized ways. In alignment with DeLuca et al. (2016) and Abedi et al. (2004), the study’s findings can be discussed from the view that standardized administration aims to present students with equal opportunities to show what they have learned and what they can do with it. That is, there should be fairness and equity in test administration and accommodations so that no single group of students receives special advantages. As Haladyna et al. (1991) assert, another reason for the study’s findings is that one of the major sources of students’ score pollution is the actual administration of tests. For example, when students are interrupted while responding, are ridiculed or scorned by proctors and other test-takers on test days, or get anxious and tired due to uncomfortable test settings, their scores may be polluted. The study’s findings provide support to Abedi et al. (2004), who found that students may feel a sense of unfairness when test administration is not standardized and advantages some students. Additionally, the study’s findings are in keeping with those of Siegel (2007, 2014), showing that while accommodation and standardized test administration are, on the surface, considered an integral part of LAL, the practices need more attention to respond differentially to students’ special learning needs.

Using valid grading procedures

“Using valid grading procedures” was another theme that gained remarkable attention from the participants. The participants stressed that grading procedures should be transparent, consistent, and logical.

24.

    I believe that the grading criteria should be clear and fair for all students. They should know how their performance will be graded and what parts of the materials have more weight in the final evaluation. In this way, students’ scores are less polluted by irrelevant factors. (UT14)

Corroborating the previous statement, another respondent was of the following opinion:

25.

    When the grading procedures are not vague and we communicate about them, students can manage their time and energy in a better way and demonstrate their abilities well. (UT4)

These remarks show that an integral part of LAL is using valid grading procedures. Grading procedures, according to Tata (1999), should be transparent, consistent, and unbiased. That is, grading procedures are perceived as fair when they are consistent, use accurate information, and maintain an impartial process. The study’s findings may be discussed from the perspective that university teachers are expected to apply valid grading procedures consistently and impartially to all test-takers. This is because, by implementing valid grading procedures, students’ scores are not polluted by irrelevant constructs (Gordon & Fay, 2010). Another possible explanation for the study’s results is that, by being informed about the grading procedures, students can manage their studies to get better scores. In turn, as Tata (1999) puts it, high grades offer both immediate benefits for students, like intrinsic motivation and the approval of parents, and long-term benefits, like admission to graduate school and preferred employment. The study’s findings are consistent with previous studies (Gordon & Fay, 2010; Rasooli et al., 2018; Tata, 1999) reporting that grading procedures have a great impact on students’ perceptions of test fairness, teachers’ assessment quality, and student learning.

Bringing positive wash-back effects

The last theme highlighted by the participants was “bringing positive wash-back effects.” That is, the participants maintained that university teachers need the required level of LAL to implement tests in such a way that the tests have positive effects on test-takers’ lives and education.

26.

    An important part of LAL, I think, is the ability to design tests that can bring positive wash-back effects. For example, when they are developed based on the tenets of communicative approaches, students are encouraged to spend more time improving their communicative competence. Because they know that to pass the tests, they need to handle communicative tasks. (UT16)

In alignment with the previous statement, another respondent commented:

27.

    Given the effects of testing practices on my teaching, I usually pay particular attention to the design, administration, and grading of my tests. By ensuring the quality of my tests, my students know that they should put more time and energy into their studies. (UT1)

These words indicate that university teachers need to develop their tests in a way that creates positive wash-back effects. One possible reason for these findings is the reality that instruction and assessment go hand-in-hand (Bachman & Palmer, 2010). That is, along with Lantolf and Poehner (2014), if instruction is going to be effective, it must entail assessment, and simultaneously, fair assessment is not possible without considering instruction. Additionally, coupled with Green and Johnson (2010), the study’s findings may be explained from the perspective that testing practices are administered to achieve two purposes: assessment of learning and assessment for learning. Test results, on the one hand, are summative and can be used to make high-stakes decisions like college admission; on the other hand, they are diagnostic and formative, used to inform instruction (Fan et al., 2020). Moreover, the study’s findings are indicative of the view that, as Tsagari (2009) notes, an examination is a strong instrumental motivation for students to learn during examination preparation. The study’s findings receive support from previous studies (Cheng et al., 2011; Tsagari, 2009), which found that students tended to select activities intended for test orientation or test-specific coaching to prepare for a particular examination. To close, as Brown (1997) points out, “if you want to change student learning then change the methods of assessment” (p. 7).

Conclusion

The present study set out to examine the fundamentals of LAL from Iranian university teachers’ perspectives. Findings yielded two overarching LAL domains: knowledge (e.g., having an acceptable level of digital LAL, satisfying ethical requirements, benefiting more from performance assessment, considering students’ individual differences, making assessment valid, assuring that tests are reliable, and having an acceptable level of pedagogical content knowledge) and skills (e.g., involving students in assessment, using alternative assessment methods, employing traditional assessment methods logically, informing students about test results, administering tests in standardized ways, using valid grading procedures, and bringing positive wash-back effects). Based on these findings, it can be concluded that university teachers should be equipped with the required LAL to pave the way for quality education.

In light of the study’s findings, several implications are presented. The first is that university students’ voices should be heard and they should be given the right to question test processes; by doing so, university students feel fairly treated and empowered when they see that their voices and concerns influence their lives and destinies (Hamid & Hoang, 2016). Another implication is that alternative assessment methods, such as peer-assessment, self-assessment, and portfolio assessment, should be practiced more widely. Their consequences are likely to be positive if they are implemented with attention to contextual factors and if students are supported and trained in how to use them (Brown & Harris, 2014; McDonald, 2013), discuss and agree on grading criteria (Brookhart, 2013), and have experience with the subject (Morrison et al., 2014). A further implication is that university teachers’ assessment can be perceived as effective, efficient, and equitable if it accommodates students’ individual differences (e.g., interests, readiness levels, age, gender, culture, first language, learning styles) (Harris & Brown, 2013; Rasooli et al., 2018). In addition, university teachers should design and implement tests that create positive wash-back effects; to this end, they can implement more performance-based assessment, for example by having students participate in class discussions or carry out term projects. Moreover, university officials should be aware of the central importance of LAL and take the steps required to boost it by running pre-service and in-service teacher training programs. Indeed, the study’s findings, in line with Xu and Brown (2016), call for joint efforts by university officials, teacher educators, and university teachers to develop consistent assessment literacy programs throughout university teachers’ professional lives. The final implication is for researchers in the LAL field: they should remain alert to the possibility that LAL components may change over time as views on L2 teaching change.

While the current study’s findings provide a clear picture of the fundamentals of LAL in the Iranian higher education context, they should be interpreted in light of the study’s limitations. The present study probed university teachers’ perceptions of LAL components in classroom contexts; to reach a more comprehensive conceptualization, future research can explore university students’ conceptions of LAL components. Furthermore, while this study provided valuable insights into the effects of pedagogical content knowledge on the efficacy of university teachers’ assessment practices, interested researchers can investigate how university teachers’ pedagogical content knowledge affects students’ perceptions of quality assessment. In addition, in line with the findings, future research can empirically explore the kind and amount of wash-back that classroom assessment produces when LAL components are applied. Additional research is also required to identify grading criteria considered efficient by both university teachers and students. Future studies can also examine digital assessment literacy from the perspectives of both university teachers and university students. Last but not least, more research is required to explore university teachers’ and students’ perceptions of the effects of informing students about test results on their learning.

Availability of data and materials

The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

Abbreviations

LAL: Language assessment literacy

CA: Classroom assessment

DLAL: Digital language assessment literacy

UT: University teacher

References


Funding

No grant, funding, or other support was received by the authors.

Author information

Contributions

The study was planned and implemented by Dr. Rezai, the major author, with the cooperation of Dr. Alibakhshi, Dr. Farokhipour, and Dr. Miri in the data collection and analysis phases as well as in revising the manuscript. All four authors read and approved the final manuscript.

Corresponding author

Correspondence to Afsheen Rezai.

Ethics declarations

Ethics approval and consent to participate

The authors affirm that all ethical requirements were considered and satisfied in designing and completing the current study.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Interview checklist

1. What does language assessment literacy mean to you?

2. In your opinion, who is an assessment-literate university teacher?

3. Do you think that you should involve your students in CA?

4. Do you think that ethical requirements should be met in CA?

5. What is your opinion about traditional assessment methods such as multiple-choice, true-false, and so on?

6. What is the importance of making tests valid in CA?

7. Why should university teachers make their tests as reliable as possible?

8. Do you think that a university teacher should make the grading valid?

9. Can we consider the skills of administering tests in a standardized way as a part of LAL?

10. Do you agree that bringing a positive wash-back effect is a component of LAL?

11. Can we say that pedagogical content knowledge is an integral part of LAL?

12. Why should we regard digital language assessment literacy as a part of LAL?

13. Should a university teacher take the individual differences of test-takers into account?

14. Do you agree that a university teacher should inform the students about test results?

15. Why should we accept that using alternative assessment methods is an essential part of LAL?

16. Can we say that assessment-literate university teachers use more performance assessment?

17. Is there anything else that you may want to add?

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


About this article


Cite this article

Rezai, A., Alibakhshi, G., Farokhipour, S. et al. A phenomenographic study on language assessment literacy: hearing from Iranian university teachers. Lang Test Asia 11, 26 (2021). https://doi.org/10.1186/s40468-021-00142-5


Keywords