
The comparative impacts of portfolio-based assessment, self-assessment, and scaffolded peer assessment on reading comprehension, vocabulary learning, and grammatical accuracy: insights from working memory capacity

Abstract

This study comparatively examined the impacts of portfolio-based assessment, self-assessment, and scaffolded peer assessment on the reading comprehension, vocabulary learning, and grammatical accuracy of Afghan English as a foreign language (EFL) learners. From 172 learners enrolled at a language institute, 120 lower-intermediate and five higher-intermediate learners were selected through an Oxford Quick Placement Test (OQPT). These participants were assigned to four groups: a portfolio group (N = 30), a self-assessment group (N = 30), a scaffolded peer assessment group (N = 35), and a control group (N = 30). The five higher-intermediate learners were added to the scaffolded peer assessment group to function as mediators, hence the larger number of participants in that group. After the participants had been selected, their working memory (WM) span was determined through a reading-span test developed by Shahnazari (2013). It was found that 16 subjects in the portfolio condition, 14 in the self-assessment group, 18 in the peer assessment group, and 13 in the control condition had high WM, while the remaining participants had low WM. Thereafter, participants’ baseline reading comprehension, knowledge of the targeted lexical items, and grammatical accuracy were measured through validated instructor-made tests. A ten-session treatment then began, followed by a post-test. The results of three two-way between-groups MANOVAs showed that all three experimental conditions outstripped the comparison group on the second occasion and that high-WM learners outstripped low-WM learners, with a large effect size on the reading comprehension test (partial eta squared = .365) and a moderate effect size for high vs. low WM on the same test (partial eta squared = .095); a large effect size on the vocabulary post-test (partial eta squared = .465) and a moderate effect size for high vs. low WM (partial eta squared = .083); and a large effect size on the grammar test (partial eta squared = .500) and a moderate effect size for high vs. low WM (partial eta squared = .072). The results further revealed that subjects in the scaffolded peer assessment group outstripped those in the other experimental conditions, although the difference was non-significant. Likewise, the difference between the portfolio assessment and self-assessment groups was not statistically significant. The implications of the study are reported.

Introduction

L2 practitioners have always sought to improve their learners’ language proficiency. To this end, they have tried and tested different instructional techniques and instruments to facilitate language learning. In ESL/EFL environments, alternative assessment techniques are frequently utilized to enhance learning. According to Hargreaves et al. (2002), alternative assessment is intended to create strong, productive learning for students themselves, in contrast to standardized testing. As examples of alternative evaluation methods, they list conferences, observational checklists, self- and peer assessment, diaries, and learning logs, in addition to portfolios. Among these, portfolio evaluation is arguably the most well-known and significant example of an alternative assessment approach.

A topic that has gained some research attention in the ESL literature is portfolio assessment (Lam, 2017), a well-studied assessment-as-learning strategy (Alam, 2019; Lam, 2020). The portfolio is a planned collection of pupil work that demonstrates the student’s efforts, development, and achievements in one or more areas of the curriculum, according to Paulson et al. (1991). Portfolio evaluation, which is typically utilized in writing classes, has been shown to support writing improvement, self-assessment, and peer assessment (Barrot, 2016; Lam, 2017). In a similar vein, it has a favorable impact on students’ autonomy, motivation, and reflective thinking (Lee, 2017; Sultana et al., 2020).

Another example of alternative assessment is self-assessment, a self-monitoring procedure that trains language learners in the use of metacognition (Esteve et al., 2012). This implies that when students evaluate themselves, they exercise control over metacognitive processes (Takarroucht, 2021). Metacognitive processes include self-regulation abilities, metacognitive knowledge, and metacognitive experiences (Iwai, 2011). Metacognitive knowledge is defined as the understanding of task demands and approaches, while metacognition more broadly denotes the capacity to identify performance problems through reflection and problem-solving (Tarricone, 2011). Executing metacognitive strategies, including planning, monitoring, and assessing, is a prerequisite for developing self-regulation skills (Iwai, 2011). Metacognitive strategies are a group of higher-order processes in charge of identifying performance flaws and carrying out cognitive techniques (Tarricone, 2011); as such, they are a form of self-regulation.

Sorting through the literature reveals that peer assessment is another instantiation of alternative assessment. Through peer evaluation, a communication strategy, students can discuss their own performance and academic requirements with their peers. Peer assessment is a type of collaborative learning and formative evaluation that can be utilized in EFL/ESL courses. Peer assessment can enhance students’ production abilities by incorporating them into revisions (Zhao, 2010), make learners more interested in production (Shih, 2010), scaffold students’ production process, and enhance critical thinking (Hyland, 2000). Writers are able to try out their texts and obtain others’ interpretations of them (Joordens et al., 2009). Moreover, peer evaluation might promote learner autonomy (Yang et al., 2006).

In addition to the above, individual differences are thought to be crucial in language learning and processing (Kidd et al., 2018), and they can reduce or even modify the effects of instruction (Li, 2017). They have been demonstrated to have significant explanatory value when predicting learning outcomes in second or foreign language learning (Pawlak, 2017). One such individual difference is working memory (WM), an attentional mechanism with a finite capacity that facilitates sophisticated cognitive processing (Cowan, 2017). WM, according to Baddeley (2017), is a system made up of storage subsystems in charge of temporarily storing and processing verbal and visual-spatial information (the phonological loop and the visuospatial sketchpad, respectively); a domain-general component in charge of controlling and regulating attention; and an episodic buffer that acts as a link between the storage subsystems and long-term memory. Attention management, analogical reasoning, explicit deduction, information retrieval, and decision-making are just a few of the cognitive processes for which WM is critical in L2 learning (Tagarelli et al., 2016), as is the storage of metalinguistic knowledge as L2 learners comprehend and produce it.

It is impossible to overstate the role that reading comprehension plays in academic success. People’s lives are significantly impacted by learning to read (Alawajee & Almutairi, 2022). Reading is the key to learning new things and succeeding at work (Castles et al., 2018). According to Seymour (cited in Pallathadka et al., 2022), reading comprehension is the capacity to interpret information from texts. Reading comprehension is a cognitive process that involves deriving meaning from texts, according to Woolley (2011), and it strongly depends on the reader’s ability to comprehend written texts accurately and fluently.

It is undeniable that vocabulary is crucial to learning a second language (Kargar Behbahani & Kooti, 2022). Among those who have studied vocabulary, Harmer (2001) likens it to the vital organs and flesh of language. Furthermore, according to Mediha and Enisa (2014), vocabulary is essential to the communication of any message. Wilkins (1972), moreover, holds that a large vocabulary is more crucial than grammar when acquiring an L2. Consequently, learning new words is a crucial component of studying any second or foreign language.

Growing concerns about learners’ language accuracy in recent years have led to a reassertion of the importance of grammar in syllabus design and class materials, even to the point of paying explicit attention to grammatical forms and rules. It has become essential for English teachers to instruct students in grammar correctly. But as Ellis (1997) emphasized, several pedagogical approaches are available to language practitioners; the question is which of them to use to teach grammar. In high schools, grammar instruction receives significantly greater attention from English teachers than other areas of language instruction, primarily because the school final exam focuses mainly on grammar and uses students’ pass percentage as a measure of teachers’ effectiveness (Torkabad & Fazilatfar, 2014).

As an English teacher working for Afghanistan’s Ministry of Education, I frequently observe my students’ less-than-satisfactory performance on language tests, both teacher-made exams and high-stakes standardized ones. One contributing factor to this low performance is that insufficient time is dedicated to language instruction in the government-initiated curriculum. Therefore, there is certainly a need to look for alternatives that make the most of the time at hand and ensure learners’ language growth.

Despite the plethora of research on the above-mentioned alternative assessment examples and individual differences (i.e., WM capacity), studies investigating the interplay between individual variations and instructional circumstances or approaches are still somewhat rare (Benson & Dekeyser, 2019; Ruiz et al., 2018). Additionally, it is well-acknowledged that vocabulary serves as the foundation of language. Notwithstanding, according to Ritonga et al. (2022), almost no study has ever attempted to investigate the effect of alternative assessment on vocabulary learning. Furthermore, in a world wherein English is seen as the most significant lingua franca, Afghan EFL learners’ general English proficiency, and particularly their reading comprehension skill, is extremely insufficient (Pallathadka et al., 2022). Grammar is given more emphasis in Afghan classrooms than other linguistic skills because high-stakes exams in Afghanistan are mostly dependent on grammar. Yet although grammar is highly valued in Afghan high schools, Afghan EFL students fail to acquire the grammatical features to which they are exposed, and hence their grammatical knowledge is inadequate (Patra et al., 2022).

In addition to what has been explained above, numerous researchers have examined the impact of a single example of alternative assessment on language learning. Still, one question remains: which alternative assessment is more facilitative of different language skills or language components? This investigation therefore seeks to fill this lacuna, add to the literature, and help language practitioners around the globe understand which alternative assessment is more helpful in developing learners’ language abilities. Moreover, to gain a wider perspective on how these different alternative assessment procedures might help language learners sharpen their linguistic skills, the potential role of WM capacity is also investigated to see how this individual difference might mediate language learning through different examples of alternative assessment.

Based on the above explanations, this study has three major objectives. The first is to investigate the comparative effect of portfolio assessment, self-assessment, and scaffolded peer assessment on vocabulary learning with WM as an intervening variable. The second is to determine which of the aforementioned assessment types is more facilitative of reading comprehension with WM as a moderating variable. Finally, the study examines the comparative effect of these kinds of alternative assessment on the grammatical accuracy of language learners across different WM capacities. Therefore, this study seeks to answer the following research questions:

  • Research question 1: Is there any significant difference between learners receiving portfolio assessment, those receiving self-assessment, and those receiving scaffolded peer assessment on reading comprehension across different WM capacities?

  • Research question 2: Is there a noticeable difference in vocabulary learning across various WM capacities between students who receive portfolio evaluation, those who receive self-assessment, and those who receive scaffolded peer assessment?

  • Research question 3: In terms of grammatical accuracy across various WM capacities, are there any notable differences between students who receive portfolio evaluation, those who receive self-assessment, and those who receive scaffolded peer assessment?

As mentioned above, language instructors have always been on the lookout for a panacea for their learners’ language growth. Despite the numerous studies that have independently explored the efficacy of the above-cited alternative assessment varieties, to the best of the researcher’s knowledge, no research has attempted to compare different alternative assessment strategies to see which is more facilitative of language skills. Additionally, this study examines WM’s contributing role to see whether individuals with different WM capacities develop their language skills similarly. The researcher hopes that the results of this study will help language teachers understand which strategy is of more help in actual language classrooms, and that they will carry theoretical implications for researchers in the instructed SLA domain. Besides, this study’s findings might help course designers, materials developers, learners, and other stakeholders.

Literature review

In this section, theoretical considerations regarding the alternative assessment types at stake, namely portfolio assessment, self-assessment, and peer assessment, are first discussed. Experimental studies on these instantiations of alternative assessment are then reviewed.

Theoretical underpinnings

Portfolio assessment

Electronic or printed dossiers containing student-written scripts are called portfolios. These scripts are selected over time and are often supported by a reflective journal. In the field of education, portfolio assessment is frequently considered preferable to the more common, product-focused standardized tests (Kirkpatrick & Gyem, 2012). Numerous studies in second/foreign language (L2) learning have highlighted the benefits of portfolio assessment in terms of L2 teachers’ positive experiences with various types of it (Lee, 2017); the contribution of the portfolio to L2 learners’ autonomy, self-regulated learning, social awareness, and metacognitive awareness (Behbahani et al., 2011); and the mediating role of portfolio assessment in revising works-in-progress (Azizi & Namaziandost, 2023; Mphahlele, 2022). Despite these claimed educational benefits, however, portfolio assessment has remained highly contentious in actual classroom settings because of the rigidity of L2 teachers (Xu & Brown, 2016), insufficient literacy in language assessment (Gan & Lam, 2020), low student involvement (Lee & Coniam, 2013), its complex and comprehensive grading (Song & August, 2002), and the test-driven culture dominant in most educational systems (Lam, 2018). As a result, there have been several difficulties with fully implementing portfolio assessment in L2 contexts, prompting Hyland and Hyland (2019) to call for more extensive study of these issues.

From a pedagogical standpoint, the process-oriented portfolio assessment approach redefines L2 writing as a recursive and metacognitive activity that involves L2 learners in routine reflection on their language development (Lam, 2019). According to Vygotsky’s (1987) social constructivist model of learning, which serves as the foundation for portfolio-based assessment, second-language learners learn best when they actively create their knowledge of the target language through social interactions rather than just receiving it. Writing portfolios, for instance, strengthen L2 learners’ “knowledge of writing as a socially situated practice in academic discourse groups” (Duff, 2010, p. 169). As a result, portfolio assessment can evaluate the development of L2 writers’ higher-level writing abilities (such as textual and discursive writing) as well as their lower-level abilities (such as writing mechanics and punctuation) (Steen-Utheima & Hopfenbeck, 2018).

Successful learner engagement, according to Chappuis (2014), depends on how well L2 learners grasp the aims of writing portfolios, how readily they can visualize the gap between their current situation and those aims, and how they can attain those aims. In a similar vein, it is advised that L2 writing instructors foster self-reflection by scaffolding students through the entire portfolio assessment process in tutorials (Kusuma et al., 2021; Rezai et al., 2023), using examples and prompts (Gregory et al., 2001), extending deadlines to further engage students (Lam, 2020), and disclosing the assessment rubrics to them (Panadero & Romero, 2014).

Self-assessment

Self-assessment and other alternative modes of assessment have received much research attention and support from language teachers and academics. Numerous studies in the area have shown that self-assessment is very important and effective in fostering different language learning techniques and skills as well as in increasing the awareness and motivation required for language acquisition (Birjandi & Hadidi Tamjid, 2010). Self-assessment hence looks particularly appropriate for inclusion in the language learning curriculum.

To provide a thorough picture of what pupils know and need to learn, assessment describes the procedures used to gather, exchange, and negotiate data from a variety of important sources (Ebadi & Rahimi, 2019). Bachman et al. (2010) speak of self-assessment when learners assess their own work. The technique of self-assessment should therefore be promoted and taught to every learner. The core of self-assessment, according to Locke et al. (1996), is the basic evaluation of one’s deservingness, effectiveness, and competence as a person. This construct is a broad, latent, higher-order attribute that encompasses neuroticism, self-efficacy, and self-esteem.

High levels of self-evaluation enable people to adapt to new circumstances and strive to fulfill their obligations to the best of their abilities (Al-Mamoory & Abathar Witwit, 2021; Jiang et al., 2022). Those with high levels of self-awareness can pause, reflect, and alter their emotional experiences (Putro et al., 2022). To enhance their learning, learners with high self-awareness control their emotional experiences (Hu, 2022). In this regard, Eysenck (1990) claimed that core self-assessment (CSA) can be used as a gauge of emotional stability. Additionally, self-evaluation promotes students’ wellbeing (Jahara et al., 2022). To implement self-assessment, learners should exercise their metacognitive skills, critical thinking, affective thinking, self-efficacy beliefs, and academic emotion (Wei, 2020; Zhang, 2022; Davoudi & Heydarnejad, 2020; Khajavy, 2021; Khajavy et al., 2020; Namaziandost & Cakmak, 2020).

Scaffolded peer assessment

Feedback is the process through which students analyze critiques of their learning and apply them to their own work to become better learners (Carless & Boud, 2018). For students to provide constructive critiques and comments on each other’s work in an organized learning process, there are two options: peer assessment and peer review. Peer assessment procedures allow for building critical judgment in addition to improving the activities being evaluated (Lipnevich & Smith, 2022; Malecka et al., 2020; Nicol, 2020).

There are various ways in which peer assessment can assist learners. First of all, learning through peer assessment can help assessors improve their own work. In particular, they can improve their knowledge of the project’s specifications, evaluation criteria, and topic (Noroozi et al., 2016); generate additional ideas; learn from the work of their peers; and critically evaluate their own work (Hsia et al., 2016). As assessees whose work is evaluated by peers, students can gain insight into how to enhance their performance (Hsia et al., 2016). The advantages of obtaining peer feedback depend mostly on how useful the feedback is and, more crucially, how effectively pupils apply it. The uptake of feedback also has a substantial impact on the quality of students’ final projects.

Unfortunately, pupils lack subject-matter expertise, and some comments might be false or misleading. Assessees may become confused when many assessors make conflicting comments (Mostert & Snowball, 2013). Students also doubt their peers’ ability to offer feedback and do not regard them as “knowledge authorities” (Gielen et al., 2010, p. 305). This skepticism can affect assessees in both good and bad ways. In particular, it may lead to resistance to peer feedback or a reluctance to follow the recommendations of peer assessors. On the other hand, a skeptical mindset might inspire assessees to come up with their own suggestions for improvement (Gielen et al., 2010; Jiangmei, 2023).

Peer learning, often referred to as collaborative learning, is based on social constructivism and holds that learning occurs more actively when learners socially interact with their peers outside the classroom (Roschelle & Teasley, 1995). Through exchanging personal stories, perceptions, and reflections, students positively rely on one another and build one another’s mental models (Johnson & Johnson, 1987). In a cooperative learning environment, members of the group each attempt to contribute to advancing learning and accomplishing a group goal (Johnson et al., 2014). This approach supports students’ cooperative knowledge-building (Naserpour & Zarei, 2021). While everyone in the group accepts responsibility for their own learning, there is a strong interdependence among them (Bolukbas et al., 2011).

In Sawyer’s (2006) work, scaffolding refers to the help provided during the educational process to meet students’ needs when they are introduced to novel concepts and skills. This can lead to higher and more thorough levels of learning (Naserpour & Zarei, 2021). Scaffolding is closely associated with the zone of proximal development (ZPD), a central idea in socio-cultural theory. According to Vygotsky (1987), the ZPD is the difference between a child’s actual and potential levels of development, as determined by how well they can manage problems when given direction from adults or more proficient peers (Verenikina, 2008). Scaffolding is the temporary assistance an expert gives a beginner to boost their independence; this help is gradually lessened or withdrawn as students demonstrate mastery, complete activities on their own, and develop their skills and capabilities (Diaz-Rico & Weed, 2002, cited in Homayouni, 2022).

Working memory

The term “working memory” describes the capacity to retain and process data while performing continuous cognitive tasks (Li, 2023). It was first used to refer to a revised understanding of short-term memory as a cognitive resource for concurrent information storage and manipulation, as opposed to a merely passive storage device. WM is the subject of many studies in second language acquisition (SLA) because of its alleged impact on the process and results of language learning. Harrington and Sawyer (1992), who looked into the function of WM in text understanding, and Mackey et al. (2002), who looked into the relationships between WM and L2 interaction, are two pioneering studies on WM in SLA. Since these landmark findings, interest in the mediating function of WM in numerous facets of L2 learning has steadily increased. Despite this increasing interest, there has been a lack of consensus regarding WM’s conceptualization, measurement, and process, which has led to a variety of inconsistent, and occasionally contradictory, research results.

Several theories have been put forward to explain the connections between the various WM components (Miyake & Shah, 1999; Namaziandost et al., 2022). Two models, the multi-componential model and the unitary model, serve as the fundamental representations of these theories. Baddeley (2017) promoted the multi-component model, which divides WM into four parts: the central executive, the phonological loop, the visual-spatial sketchpad, and the episodic buffer. According to Baddeley (2017), the central executive coordinates across the various components, focuses and shifts attention, allocates resources, and communicates with long-term memory. The phonological loop is a passive storage system for keeping and rehearsing auditory information; it is a tool for acquiring vocabulary and is crucial for learning new words, not just arbitrary associations between well-known words. The visual-spatial sketchpad stores and rehearses knowledge in the form of pictures, shapes, colors, directions, places, and their arrangements. The episodic buffer serves as a temporary storage area for combining discrete bits of information into larger units, connecting short-term and long-term memory, and linking data from various sources and in various formats.

The unitary model has its North American origins in the reading span test devised by Daneman and Carpenter (1980), which examines the storage and processing components simultaneously. In this architecture, the storage and processing tasks are interdependent and share the same resource pool; they trade off against each other, so allocating more resources to one leaves fewer resources for the other. The unitary model holds that neither executive control, despite its significant role, nor storage alone (e.g., the phonological loop and the visuospatial sketchpad) can fully describe WM.
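As an illustrative sketch only (the present study used Shahnazari’s (2013) reading-span test, whose exact scoring procedure is not reproduced here), the logic of a span score with a processing-accuracy criterion, followed by a median split into high- vs. low-WM groups, could look like this; all thresholds, field names, and data are hypothetical:

```python
def score_reading_span(trials, processing_threshold=0.85):
    """Score a reading-span test of the Daneman and Carpenter (1980) type.

    Each trial is a dict with:
      - 'judged_correctly': one bool per sentence (the processing task)
      - 'recalled': number of sentence-final words recalled (the storage task)
    Only trials whose sentence-judgement accuracy meets the threshold
    count toward the span, so storage cannot be traded off against processing.
    """
    total = 0
    for trial in trials:
        judged = trial['judged_correctly']
        accuracy = sum(judged) / len(judged)
        if accuracy >= processing_threshold:
            total += trial['recalled']
    return total


def split_by_span(scores, cutoff=None):
    """Classify participants as high vs. low WM via a median split
    (or an explicit cutoff), a common practice in WM research."""
    if cutoff is None:
        ordered = sorted(scores.values())
        mid = len(ordered) // 2
        if len(ordered) % 2 == 0:
            cutoff = (ordered[mid - 1] + ordered[mid]) / 2
        else:
            cutoff = ordered[mid]
    return {pid: ('high' if s > cutoff else 'low')
            for pid, s in scores.items()}
```

Under this sketch, a participant who recalls two words on a trial with perfect sentence judgements, but whose second trial falls below the accuracy criterion, receives a span score of 2; participants above the median span are then labeled high WM.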

Experimental underpinnings

Portfolio assessment

As an example of alternative assessment, portfolio-based assessment has been widely investigated for its efficacy in language growth. For example, Barrot (2021) looked into the impacts of e-portfolios on ESL learners’ writing. Eighty-nine L2 English speakers from four English classrooms participated in the study. An e-portfolio was used by two classes in the treatment group (N = 48), whereas a traditional portfolio was utilized by the other two classes in the control group (N = 41). Findings showed that the e-portfolio learners outstripped the traditional portfolio group. These outcomes were linked to the e-portfolio’s flexible, accessible, and interactive capabilities, and its capacity to expose learners to peer pressure.

Another study that examined the potential of portfolio assessment for reading comprehension in an EFL setting is that of Amani and Salehi (2017). Their study aimed to evaluate the effects of the portfolio as a descriptive evaluation technique on the growth of Iranian EFL students’ text-understanding skills, using Prospect 2 as the foundational text. To achieve this, 20 female EFL students from an Iranian guidance school were chosen. Members of the experimental group received the portfolio assessment, whereas the control group members received traditional assessment. The students in both groups took two text-understanding assessments as a pretest and post-test to gauge their level of reading comprehension before and after the intervention. Descriptive and inferential statistical techniques were used for the analysis. The results did not demonstrate that the portfolio was superior to the traditional scoring method in helping the students develop their reading comprehension.

In another study, Nourdad and Banagozar (2022) examined the potential role of e-portfolio evaluation in vocabulary learning and retention. To achieve this goal, ninety-two guidance-school students were chosen as the study’s subjects. They were randomly split into an experimental and a control group. The experimental group practiced e-portfolio evaluation while the control group adhered to traditional in-class quizzes. The experimental group’s members were instructed to make their own e-portfolios and keep a log of the lessons they learnt both during and after the online sessions. They were also asked to upload their reflection sheets to their e-portfolios. To collect information regarding the impact of portfolio assessment in each grade, three parallel tests were used: a pretest, an immediate post-test, and a delayed post-test (a total of nine tests). According to the findings of a one-way ANCOVA, the treatment participants outstripped the control condition in terms of acquisition and retention of EFL vocabulary.

Examining the efficacy of portfolio-based assessment in language growth is not restricted to the above-mentioned studies. In recently published research, Rezai et al. (2022a, 2022b) wondered whether e-portfolio assessment can cultivate EFL learners’ vocabulary, motivation, and attitudes. After homogenizing 100 male EFL students for this project, 50 were randomly assigned to the experimental group and 50 were placed in the control group. Following that, the participants completed the pretest, interventions, and post-test procedures. Eighteen one-hour sessions were held twice a week; the experimental group received their training using e-portfolios, whereas the control group received their training through more traditional means. The acquired data were examined using an independent-samples t test, mean calculations, and percentage calculations. The post-test results showed that the experimental group fared better than the control group in terms of vocabulary knowledge improvements. The results also showed a significant difference between the two groups in terms of motivation after the interventions, and demonstrated that the participants’ sentiments about the e-portfolios were quite favorable.

Self-assessment

Although numerous studies on the impacts of self-assessment on L2 learning have been undertaken over the past 10 years, none has looked into how self-assessment reports affect L2 learning. It was for this reason that Rezai et al. (2022a, 2022b) sought to examine Iranian teenagers’ perceptions of the efficiency of self-assessment reports in developing writing skills as well as how self-assessment reports enhance their writing abilities. The researchers chose one whole grade 11 class for this study. A self-assessment report based on Nunan’s (2004) template was created and distributed to the students to help them evaluate their writing each week during the 15 sessions of instruction, which were held twice a week. Six students then participated in a focus group interview. According to the findings, the students’ writing abilities in terms of content, language, and organization showed considerable improvement. The focus group interview results also revealed four themes: improving students’ understanding of evaluation standards, fostering greater self-control, giving students a say in their academic futures, and boosting students’ motivation to write.

The effects of self-evaluation, planning, goal-setting, and reflection on students’ self-efficacy and writing performance before and after revision were examined by Chung et al. (2021). Their findings revealed that the treatment condition had significantly improved on the post-test in terms of writing performance. In addition, they discovered that participants’ self-efficacy changed dramatically from before to after the revision.

Self-evaluation is one alternative method for gauging students’ English-speaking prowess; it allows students to learn about, practice, and improve their speaking skills. Nonetheless, projects of this nature were uncommon in Indonesia. Alek et al. (2020) wanted to understand how pupils at a Link and Match vocational high school felt about using self-assessment to evaluate their speaking performance. The questionnaire used to collect the data for this study included five items about the use of self-assessment. The data in this qualitative study were analyzed descriptively. Thirty vocational high school students majoring in multimedia took part. The majority of students believed that self-evaluation was highly beneficial since it helped them understand their functional capabilities and how to improve them to meet course objectives, particularly those of the speaking course. Furthermore, some students found self-assessment especially helpful because the teacher did not frequently use this kind of task, even though the students did not enjoy trying to evaluate themselves. These researchers concluded that self-assessment is highly helpful for investigating and evaluating pupils’ speaking abilities.

Peer assessment

Peer assessment has become more prevalent in classrooms and other learning environments in recent years. Although peer assessment is widely believed to improve learning, the outcomes of empirical investigations are conflicting. Li et al. (2020) combined findings based on 134 effect sizes from 58 trials in a meta-analysis. The performance of learners who engage in peer assessment is improved by 0.291 standard deviation units compared to those who do not. The researchers also conducted a meta-regression to look at the variables that may affect the peer assessment effect. The most important element is rater training: the peer assessment effect size is significantly greater when students have received rater training than when they have not. Peer assessment that is computer-mediated rather than paper-based is also linked to larger learning gains. Other factors (including rating format, rating standards, and peer assessment frequency) have observable but statistically non-significant effects. Finally, these L2 researchers suggested that researchers and educators can use the findings of the meta-analysis as a guide to deciding how to use peer assessment effectively as a learning tool.

In another study, Moghimi (2022) explored the comparative effects of peer assessment, self-assessment, and gender on Iranian EFL learners’ accuracy in speech. Based on the Oxford Quick Placement Test, 60 homogeneous learners were chosen. The OQPT and peer and self-assessment questionnaires served as the study’s tools. SPSS version 20 was used to calculate the results. The means were similar, but the male students’ mean score was slightly higher than the female students’. Furthermore, assessment type had a substantial impact on speech accuracy, and peer assessment was superior to self-assessment in this area.

Another study that has dealt with the efficacy of peer assessment coupled with scaffolding on oral skills and lexical growth is that of Homayouni (2022). To achieve this goal, the researcher chose 5 intermediate English learners and 37 lower-intermediate English learners through cluster sampling. Then, the 5 more proficient students and 20 lower-intermediate participants were assigned at random to the experimental group. The intermediate learners were given the role of mediator, one per group of five, and were in charge of providing feedback to their peers. No mediator was assigned to the control group, which included the remaining individuals. Throughout four training sessions, scaffolded peer assessment of both speaking and vocabulary learning was conducted. A one-way repeated measures ANOVA and an independent-sample t test were performed in this randomized pretest-post-test-delayed post-test trial. The statistical analysis showed that scaffolded peer assessment had a significant positive influence on learners’ vocabulary growth and speaking ability. That is, both speaking abilities and vocabulary knowledge can be developed by using scaffolded peer assessment in a group-oriented setting. The study’s pedagogical implication is that language instructors can use the sociocultural theory and social constructivism concepts put forward by Vygotsky (1987) to widen and deepen students’ ZPD.

Working memory

As an individual difference trait, WM is claimed to mediate language learning. To verify this claim, Chow et al. (2021) investigated the roles of reading anxiety and WM in text comprehension among 105 Chinese ESL undergraduates. The results revealed that verbal WM and reading anxiety, as reflected by reading trait and state anxiety, were the only two independent predictors of ESL reading comprehension. Moreover, there was no discernible connection between reading anxiety and WM. According to mediation analyses, the association between verbal WM and ESL reading comprehension was partially mediated by reading anxiety. These findings provide insight into strategies for improving ESL learning and emphasize the significance of affective and cognitive components in ESL text comprehension.

In another study, Teng and Zhang (2021) set out to investigate how WM functions in vocabulary learning with multimedia input. They focused on the potential connections between executive WM and phonological short-term memory (PSTM), as well as the effects of three different input conditions (definition + word information + video, definition + word information, and definition only) on L2 vocabulary acquisition. In all, 95 students completed the three learning scenarios and took two WM tests: the reading-span test, which assesses complex executive WM, and the non-word span test, which evaluates PSTM. Both receptive and productive vocabulary knowledge were tested at the beginning and the end of the two-week period. Based on a repeated-measures analysis of covariance (ANCOVA), their results showed that complex and phonological WM play a significant role in vocabulary learning and retention under all three conditions. They also showed that the definition + word information + video condition has pronounced effects on vocabulary learning and retention.

In another study, Patra et al. (2022) looked at how learning the English future tense was impacted by processing instruction (PI) and output-based activities, with WM serving as a mediating factor. To achieve this, 99 participants with pre-intermediate English proficiency, as determined by the Oxford Placement Test, were chosen for the study. They were split into three groups of 33 learners each: PI, output, and control. Using a reading-span test, it was discovered that only 14 of the PI group’s subjects, 15 of the output group’s participants, and 13 of the comparison group’s students had low WM levels, while the other participants had high WM levels. Then, a two-way between-group analysis of variance and a Bonferroni-adjusted post hoc test were carried out. The findings demonstrated that the output and PI groups both outstripped the control group, and that the PI and output groups made comparable grammatical gains. Moreover, students with high WM did better than those with low WM. These L2 researchers concluded that output-based learning activities and PI can help teachers adopt powerful tactics to increase the knowledge and awareness of L2 learners.

All in all, the abovementioned studies point to the efficacy of portfolio assessment, self-assessment, and peer assessment. However, sorting through the literature reveals that there remains a paucity of research examining the comparative effects of these types of assessment on language development. Among the studies cited above, only Moghimi (2022) examined the comparative effects of peer assessment and self-assessment on learners’ accuracy in speaking, and a single study is not enough to establish whether peer assessment is superior to self-assessment. Additionally, to the best of the researcher’s knowledge, no study has ever attempted to examine the mediating role of WM in the effects of different types of alternative assessment on language development. It is for these reasons that this study attempts to fill the gap and comparatively examine the effects of portfolio assessment, self-assessment, and scaffolded peer assessment on reading comprehension, vocabulary learning, and grammatical accuracy in an EFL setting. The researcher hopes that the results gleaned from this study will add to the literature, fill a knowledge gap, help language teachers assist their learners’ language development, and guide materials designers in designing better textbooks.

Method

In this section, the study’s design, setting and subjects, instruments, data collection procedure, and method of data analysis are discussed in detail.

Design

Since it was impossible for the researcher to randomly select the participants of the study, a quasi-experimental pretest-post-test control design (Ary et al., 2019) was employed in this current quantitative investigation. Four groups participated in this exploration: three treatment groups and a control group. The experimental groups included a portfolio group, a self-assessment group, and a peer assessment group. The variables of the study include an independent variable (i.e., type of treatment) with four levels discussed just above, three dependent variables (i.e., scores on tests of reading comprehension, vocabulary, and grammar), along with a moderating variable (WM capacity). It needs to be mentioned that learners’ reading comprehension, vocabulary growth, and grammatical accuracy were checked on two occasions, once before the treatment (pretest), and once right after the treatment (post-test).

Setting and participants

A hundred and twenty-five students studying English at a private language institute in Kandahar, Afghanistan, participated in this study. They were chosen through convenience sampling out of an initial pool of 172 subjects. To be more specific, through an Oxford Quick Placement Test (OQPT), 120 subjects with a lower-intermediate command of English and five higher-intermediate learners were chosen. The rationale for selecting the higher-intermediate learners was to place more proficient learners in the peer assessment condition to serve as mediators in the groups. The participants were between the ages of 15 and 19. All participants had Persian as their L1, with English serving as their target language. The selected subjects were then assigned to four conditions: portfolio condition (N = 30), self-assessment condition (N = 30), peer assessment condition (N = 35), and control condition (N = 30). According to the results of the reading-span test (to be discussed in the following section), 16 subjects in the portfolio condition, 14 learners in the self-assessment group, 18 participants in the peer assessment group, and 13 participants in the control condition had high WM, while the rest of the participants had low WM. Additionally, a signed consent form was obtained from all the participants before the research; for students below the legal age of 18, their parents were asked to sign the form.

Instruments

At the beginning of the research, the researcher, functioning as the teacher of the classrooms, used an OQPT to determine the subjects’ proficiency level. Thereafter, the researcher developed three instructor-made tests of reading comprehension, vocabulary knowledge, and grammar. The tests of vocabulary and text comprehension were based on Focus on Vocabulary 1: Bridging Vocabulary, designed by Schmitt et al. (2011), and the grammar test was based on Oxford Living Grammar (pre-intermediate level), designed by Harrison (2009). Furthermore, to check participants’ WM capacity, a reading-span test developed and validated by Shahnazari (2013) was used. In this measure of WM, testees read sentences and judge whether each is grammatically plausible while memorizing the last word of each sentence. According to Shahnazari (2013), the number of words each examinee can recall constitutes their WM span. Because the researcher himself designed the items based on the aforementioned textbooks, the instructor-made tests were developed by him, while the OQPT and the reading-span test were adapted for the study. To ensure the validity and reliability of these instruments, certain procedures were undertaken. First, to validate the instruments, the researcher used the known-groups technique (Ary et al., 2019). In this group-differential strategy, the researcher administered the instruments to a group of English language teachers who knew the answers to the items. The difference between their performance and that of the participants at the pretest turned out to be statistically significant based on independent-sample t test results at p < 0.05, hence the validity of the instruments.
Moreover, to check the reliability of the instruments, Cronbach’s alpha was computed using SPSS and turned out to be 0.76, verifying the reliability of the instruments. The instruments had multiple-choice, fill-in-the-blank, and open-ended items. In addition, two versions of each instrument were prepared: one version was administered at the onset of the study (i.e., pretest), and another version, similar in form but with different items, at the end of the treatment (i.e., post-test). It should not be forgotten that this study targeted the present continuous as the linguistic feature. Furthermore, as far as the validity of the portfolio-assessment instrument is concerned, according to Lynch (2001), a valid portfolio instrument requires fairness and consequential validity. Thus, learners were allowed to select the materials of their choice from among the submitted materials to raise the fairness of the instrument. Additionally, if the participants in the portfolio assessment condition master the materials, the consequential validity of the instrument is automatically confirmed.
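The reliability check above can be illustrated with a short sketch. The item-score matrix below is hypothetical (the study’s raw responses are not reproduced here); the function implements the standard Cronbach’s alpha formula that SPSS applies.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an item-score matrix (rows = respondents, columns = items)."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1).sum()
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances / total_variance)

# Hypothetical responses of five test-takers to three items
scores = np.array([
    [1, 1, 1],
    [2, 2, 1],
    [3, 3, 2],
    [4, 4, 3],
    [5, 5, 4],
])
print(round(cronbach_alpha(scores), 2))  # -> 0.99 (high consistency for these correlated items)
```

Values above roughly 0.70, such as the 0.76 reported here, are conventionally taken as acceptable internal consistency.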

Data collection procedure

First of all, an OQPT was administered to come up with a homogenized sample. Based on the OQPT results, only lower-intermediate learners of English were selected, along with five higher-intermediate learners. The lower-intermediate learners were assigned to four conditions: a portfolio condition, a self-assessment condition, a scaffolded peer assessment condition, and a control group. In addition, the higher-intermediate learners were injected into the peer assessment group to function as group heads and mediators. To be more specific, the participants in the scaffolded peer assessment condition were divided into five groups, each with six learners plus a higher-intermediate learner as the mediator. Then, the first version of the instructor-made tests of reading comprehension, lexis, and grammar was given. After that, the researcher administered the adapted reading-span test discussed above to determine the participants’ WM span. Thereafter, the treatment began. Of the 10 treatment sessions, the first two were devoted to the administration of the OQPT, pretest, and reading-span test. The researcher split the treatment into three phases. In the first phase, the researcher gave students enough guidance on how to choose, gather, and reflect on their activities in their portfolios as well as complete the self-assessment checklists, so they could become more independent and autonomous in their reading comprehension, lexical expansion, and grammatical accuracy.

In the first phase, the students in the portfolio and self-assessment conditions were given instructions during the first two instructional sessions. Two assignments on different subjects were due each session, one completed in the classroom and one outside it. To keep track of their tasks in chronological order, the students created files. Because the researcher discovered that self-assessment using checklists requires comprehensive teaching, he corrected the students’ work using the checklists each session and addressed its substance in class and in individual conferences. After four weeks of teaching, students believed they could use the checklist to self-evaluate their papers. Based on qualitative observations, they improved in self-correction starting with the fifth instructional session.

In the second phase, students improved at using the checklist to self-evaluate their work. Except for some learners who required additional assistance, the teacher opted to reduce and eventually discontinue the teacher-student conferences. Throughout the second half of the treatment, nearly all of the students had the opportunity to self-evaluate their work, complete the checklists, and add the papers to their portfolios for random inspection by the instructor. Following that, the researcher reviewed the pupils’ portfolios every other session and noted comments in the portfolio checklists. This allowed both the students and the teacher to reflect on all of the activities documented in the portfolio.

In the third phase, in the scaffolded peer assessment condition, participants were divided into groups, with a more proficient learner selected for each group to function as its head and mediator. Then, the instructional materials were given to the participants. In this cooperative, scaffolded type of alternative assessment, attempts were made to develop learners’ ZPD; that is, to help learners do, under the guidance of a more proficient peer (i.e., the mediator), what they could not do on their own. In this experimental condition, under the teacher’s guidance, the mediators provided mediation to their peers. In other words, peers evaluated the responses produced by their groupmates and advised them on how to fix inaccurate responses. This procedure was repeated every session until the end of the treatment.

In the last session of the treatment, the post-test was given, and learners’ scores on the pre- and post-test were statistically compared using SPSS software, which allowed the researcher to conduct statistical tests of significance.

Data analysis

To perform tests of statistical significance, the researcher resorted to SPSS software. First, because the researcher needed to ensure the normal distribution of the data, a one-sample Kolmogorov–Smirnov (K-S) test was conducted. Then, to check the effects of the treatment with respect to the moderating role of WM, three two-way between-group MANOVAs were carried out. Post hoc tests were also conducted to check the interaction effects.
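The normality screening can be sketched as follows. The scores are simulated stand-ins for the study’s data, and note that SPSS’s one-sample K-S routine can differ slightly from a plain K-S test when the normal parameters are estimated from the sample (a Lilliefors correction gives exact p values).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
scores = rng.normal(loc=10, scale=4, size=120)  # simulated post-test scores

# One-sample K-S test against a normal with the sample's own mean and SD
d_stat, p_value = stats.kstest(scores, "norm",
                               args=(scores.mean(), scores.std(ddof=1)))
print(f"D = {d_stat:.3f}, p = {p_value:.3f}")
# A p value above 0.05 would leave the normality assumption intact
```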

Results

This section presents the statistical analyses conducted to address the study’s research questions.

Research question 1: Is there any significant difference between learners receiving portfolio assessment, those receiving self-assessment, and those receiving scaffolded peer assessment on reading comprehension across different WM capacities?

This research question involves an independent variable (i.e., type of assessment) with three levels (i.e., portfolio assessment, self-assessment, and scaffolded peer assessment), a moderating variable (i.e., WM capacity) with two levels, and two interval dependent variables (i.e., pre- and post-test scores on a reading comprehension test). In such a scenario, one needs to run a two-way between-group MANOVA (Rezai, 2015). However, this test of statistical significance has some assumptions. First, we need to make sure that the data are normally distributed; thus, a one-sample K-S test must be performed (Pallant, 2020).
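To make the logic of MANOVA concrete, the sketch below computes Wilks’ lambda, the multivariate statistic that such an analysis tests, from the between- and within-group sums-of-squares-and-cross-products (SSCP) matrices. The data and group structure are illustrative, not the study’s, and the sketch covers only the one-way case rather than the full two-way design.

```python
import numpy as np

def wilks_lambda(groups):
    """Wilks' lambda for a one-way multivariate design.

    groups: list of (n_i x p) arrays, one per condition,
    columns = dependent variables (e.g., two test scores).
    """
    all_data = np.vstack(groups)
    grand_mean = all_data.mean(axis=0)
    p = all_data.shape[1]
    E = np.zeros((p, p))  # within-group (error) SSCP
    H = np.zeros((p, p))  # between-group (hypothesis) SSCP
    for g in groups:
        m = g.mean(axis=0)
        c = g - m
        E += c.T @ c
        d = (m - grand_mean).reshape(-1, 1)
        H += len(g) * (d @ d.T)
    return np.linalg.det(E) / np.linalg.det(E + H)

# Two well-separated illustrative groups measured on two dependent variables
a = np.array([[0, 0], [1, 1], [0, 1], [1, 0]], dtype=float)
b = np.array([[10, 10], [11, 11], [10, 11], [11, 10]], dtype=float)
print(round(wilks_lambda([a, b]), 3))  # -> 0.005 (small lambda = strong separation)
```

Lambda near 1 means the group means barely differ relative to within-group spread; lambda near 0 means the groups are clearly separated.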

Table 1 presents the results of a one-sample K-S test. As Table 1 shows, the Sig. (2-tailed) value in all four sub-parts of the table exceeds 0.05, so the normality assumption is confirmed. Now, we need to ensure the homogeneity assumption (Pallant, 2020). To ensure the homogeneity assumption, one needs to run Levene’s test of equality of error variances (Rezai, 2015).
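A Levene’s test of the kind reported in Table 2 can be run as below; the four score vectors are simulated placeholders constructed with equal spread.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
# Simulated reading scores for the four conditions (equal SDs by construction)
portfolio = rng.normal(10, 3, 30)
self_assess = rng.normal(9, 3, 30)
peer = rng.normal(12, 3, 30)
control = rng.normal(4, 3, 30)

w_stat, p_value = stats.levene(portfolio, self_assess, peer, control)
print(f"W = {w_stat:.3f}, p = {p_value:.3f}")
# A p value above 0.05 would support the homogeneity-of-variance assumption
```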

Table 1 One-sample Kolmogorov–Smirnov test

As Table 2 demonstrates, the p value regarding reading comprehension on both pre- and post-test exceeds 0.05; thus, the homogeneity assumption is confirmed. Now, we can safely carry out the MANOVA.

Table 2 Levene’s test of equality of error variances

Table 3 presents the descriptive statistics for all conditions on both the pre- and post-test of reading comprehension. On the pretest, high WM spanners in the portfolio group had a mean of 2.93 (SD = 1.34), while low WM learners had 3.5 (SD = 1.22). In the self-assessment group, high WM learners scored 3.00 (SD = 1.35), whereas low WM learners scored 3.27 (SD = 1.14). In the peer assessment condition, high WM learners scored 3.05 (SD = 1.34), while low WM spanners scored 2.76 (SD = 1.29). High WM subjects in the control group had a mean of 3.07 (SD = 1.32), whereas low WM participants had 3.05 (SD = 1.02). On the post-test, learners with high WM in the portfolio condition scored 11.12 (SD = 4.20), and low WM participants scored 8.92 (SD = 3.60). In the self-assessment condition, high WM learners scored 10.00 (SD = 5.02), while low WM learners scored 8.68 (SD = 2.86). High WM learners in the scaffolded peer assessment group had a mean of 14.88 (SD = 4.70), while their low WM counterparts had 7.70 (SD = 5.10). In the control group, high WM subjects had a mean of 3.69 (SD = 1.54), and low WM learners 3.94 (SD = 1.51). Overall, on the pretest, high WM participants had a mean of 3.01 (SD = 1.31) and low WM subjects 3.15 (SD = 1.17); on the post-test, these figures rose dramatically, with high WM subjects at 10.39 (SD = 5.71) and low WM learners at 7.21 (SD = 4.00).

Table 3 Descriptive statistics

Table 4 presents the tests of between-subject effects. According to the table, on the pretest of reading comprehension, at 3 degrees of freedom and with F = 0.408, the difference between groups was not statistically significant (p = 0.748). The table further reveals that on the post-test, at 3 degrees of freedom with F = 22.421, the difference between conditions was statistically significant at p < 0.05 with a large effect size (partial eta squared = 0.365). Concerning subjects’ WM capacity on the pretest, at 1 degree of freedom with F = 0.485, no statistical difference between subjects was found (p = 0.523). However, on the post-test, at 1 degree of freedom with F = 14.116, there was a statistical difference between subjects with a moderate effect size (p < 0.05, partial eta squared = 0.108). Concerning the interaction between condition and WM capacity, on the pretest, at 3 degrees of freedom with F = 0.573, there was no statistical difference between conditions, as the p value exceeds the 0.05 threshold; however, on the post-test, at 3 degrees of freedom with F = 5.714, a statistical difference was observed at p < 0.05 with a moderate effect size (partial eta squared = 0.128).
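The reported effect sizes can be recovered from the F ratios and their degrees of freedom via the standard identity partial eta squared = F·df_effect / (F·df_effect + df_error). In the check below, df_error = 117 is an assumption (N = 125 minus the eight condition-by-WM cells), not a figure reported in the table.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# Condition effect on the reading post-test: F(3, df_error) = 22.421,
# with df_error = 117 assumed (N = 125 minus 8 condition-by-WM cells)
print(round(partial_eta_squared(22.421, 3, 117), 3))  # -> 0.365, matching the text
```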

Table 4 Tests of between-subject effects

Table 5 reveals pairwise comparisons between groups based on the Bonferroni adjustment test. According to the table, on the pretest, the difference between portfolio assessment and self-assessment was not statistically significant (mean difference = 0.031, p > 0.05). Additionally, the difference between portfolio assessment and scaffolded peer assessment did not turn out to be significant (mean difference = 0.309, p > 0.05), nor was there a statistical difference between the portfolio assessment group and the control condition (mean difference = 0.151, p > 0.05). The table further discloses that no group differed statistically from the control condition on the pretest (p > 0.05). On the post-test, the post hoc analyses reveal that the mean difference between portfolio assessment and self-assessment is non-significant (mean difference = 0.683, p > 0.05), as are the differences between portfolio assessment and scaffolded peer assessment (mean difference = −1.271, p > 0.05) and between self-assessment and scaffolded peer assessment (mean difference = −1.954, p > 0.05). A further inspection of the table shows that the difference between each of the three experimental conditions and the control group turns out to be statistically significant (p < 0.05) (Table 6).
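The logic of Bonferroni-adjusted pairwise comparisons of the kind shown in Table 5 can be sketched as below. The group scores are simulated, and the sketch uses plain independent-samples t tests rather than SPSS’s estimated-marginal-means comparisons, so it approximates the procedure rather than reproducing it.

```python
from itertools import combinations

import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
# Simulated post-test reading scores per condition
groups = {
    "portfolio": rng.normal(10.0, 4.0, 30),
    "self": rng.normal(9.3, 4.0, 30),
    "peer": rng.normal(11.5, 5.0, 30),
    "control": rng.normal(3.8, 1.5, 30),
}

pairs = list(combinations(groups, 2))  # 6 pairwise comparisons for 4 groups
for a, b in pairs:
    t_stat, p_raw = stats.ttest_ind(groups[a], groups[b])
    p_adj = min(p_raw * len(pairs), 1.0)  # Bonferroni: multiply p by number of tests
    print(f"{a} vs {b}: adjusted p = {p_adj:.3f}")
```

Multiplying each raw p value by the number of comparisons (capping at 1.0) keeps the family-wise error rate at the nominal 0.05 level.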

Table 5 Pairwise comparisons
Table 6  Condition (WM)

Further post hoc analyses based on the Bonferroni adjustment test reveal that on the post-test in the portfolio assessment condition, high WM learners had a higher mean than their low WM counterparts (mean difference = 2.196). In the self-assessment condition, high WM subjects also had a higher mean than their low WM peers (mean difference = 1.312). In the scaffolded peer assessment condition, learners with high WM had a markedly higher mean than their low WM peers (mean difference = 7.183). However, in the control condition, low WM learners had a higher mean than their high WM peers (mean difference = −0.249) (Table 7).

Table 7 Pairwise comparisons

In addition to the above, further pairwise comparisons concerning WM capacity reveal that the mean difference between high and low WM learners on the pretest was not statistically significant (mean difference = 0.157, p > 0.05); however, on the post-test, the difference turned out to be significant (mean difference = 2.611, p < 0.05). Additionally, calculations by hand revealed that the effect size was moderate (partial eta squared = 0.095).

Research question 2: Is there a noticeable difference in vocabulary learning across various WM capacities between students who receive portfolio evaluation, those who receive self-assessment, and those who receive scaffolded peer assessment?

In this scenario, the same independent and moderating variables as in the first research question are at work. The only difference is that instead of scores on a reading comprehension test, the dependent variables are scores on a vocabulary test on two occasions (pre- and post-test). Thus, a further two-way between-group MANOVA needs to be conducted (Pallant, 2020). The normality and homogeneity assumptions were checked through a one-sample K-S test and Levene’s test of equality of variances, respectively; due to space limitations, their respective tables are not presented here. The results showed that the Sig. (2-tailed) values for both tests exceeded the 0.05 threshold, hence the confirmation of the normality and homogeneity assumptions. Now, there is room to conduct the MANOVA.

Table 8 presents the descriptive statistics for all conditions on both the pre- and post-test of vocabulary. On the pretest, high WM spanners in the portfolio group had a mean of 3.187 (SD = 1.600), while learners with low WM had 3.785 (SD = 2.044). In the self-assessment group, high WM learners scored 3.214 (SD = 1.625), whereas low WM learners scored 3.812 (SD = 1.558). In the peer assessment condition, high WM learners scored 3.277 (SD = 1.447), while low WM spanners scored 3.235 (SD = 1.200). High WM subjects in the control group had a mean of 3.538 (SD = 1.391), whereas low WM participants had 3.588 (SD = 1.175). On the post-test, learners with high WM in the portfolio condition scored 11.375 (SD = 3.896), and low WM participants scored 9.214 (SD = 3.533). In the self-assessment condition, high WM learners scored 10.571 (SD = 4.847), while learners with low WM scored 9.312 (SD = 2.242). High WM learners in the scaffolded peer assessment group had a mean of 15.333 (SD = 3.613), while their low WM counterparts had 8.352 (SD = 4.581). In the control group, high WM subjects had a mean of 3.769 (SD = 1.535), and low WM learners 3.705 (SD = 1.64). Overall, on the pretest, high WM participants had a mean of 3.291 (SD = 1.487) and low WM subjects 3.593 (SD = 1.487); on the post-test, these figures rose dramatically, with high WM subjects at 10.737 (SD = 5.479) and low WM learners at 7.546 (SD = 3.919).

Table 8 Descriptive statistics

Table 9 presents the tests of between-subject effects. According to the table, on the pretest of vocabulary knowledge, at 3 degrees of freedom and with F = 0.269, the difference between groups was not statistically significant (p = 0.848). The table further reveals that on the post-test, at 3 degrees of freedom with F = 32.696, the difference between conditions was statistically significant at p < 0.05 with a large effect size (partial eta squared = 0.456). Concerning subjects’ WM capacity on the pretest, at 1 degree of freedom with F = 1.223, no statistical difference between subjects was found (p = 0.271). However, on the post-test, at 1 degree of freedom with F = 17.655, there was a statistical difference between subjects with a moderate effect size (p < 0.05, partial eta squared = 0.131). Concerning the interaction between condition and WM capacity, on the pretest, at 3 degrees of freedom with F = 0.410, there was no statistical difference between conditions, as the p value exceeds the 0.05 threshold; however, on the post-test, at 3 degrees of freedom with F = 6.396, a statistical difference was observed at p < 0.05 with a large effect size (partial eta squared = 0.140).

Table 9 Tests of between-subject effects

Table 10 reveals pairwise comparisons between groups based on the Bonferroni adjustment test. According to the table, on the pretest, the difference between portfolio assessment and self-assessment was not statistically significant (mean difference = −0.027, p > 0.05). Additionally, the difference between portfolio assessment and scaffolded peer assessment did not turn out to be significant (mean difference = 0.230, p > 0.05), nor was there a statistical difference between the portfolio assessment group and the control condition (mean difference = −0.077, p > 0.05). The table further discloses that no group differed statistically from the control condition on the pretest (p > 0.05). However, on the post-test, the post hoc analyses reveal that the mean difference between portfolio assessment and self-assessment is non-significant (mean difference = 0.353, p > 0.05), as are the differences between portfolio assessment and scaffolded peer assessment (mean difference = −1.548, p > 0.05) and between self-assessment and scaffolded peer assessment (mean difference = 0.862, p > 0.05). A further inspection of the table shows that the difference between each of the three experimental conditions and the control group turns out to be statistically significant (p < 0.05) (Table 11).

Table 10 Pairwise comparisons
Table 11 Condition (WM)

Further post hoc analyses based on the Bonferroni adjustment reveal that on the post-test, in the portfolio assessment condition, high WM learners had a higher mean than their low WM counterparts (mean difference = 2.161). In the self-assessment condition, high WM subjects also had a higher mean than their low WM peers (mean difference = 1.258). In the scaffolded peer assessment condition, learners with high WM had a markedly higher mean than their low WM peers (mean difference = 6.980). In the control condition, high WM learners had a slightly higher mean than their low WM peers (mean difference = 0.063) (Table 12).

Table 12 Pairwise comparisons

In addition to the above, further pairwise comparisons concerning WM capacity reveal that the mean difference between high and low WM learners on the pretest was not statistically significant (mean difference = 0.301, p > 0.05); however, on the post-test, the difference turned out to be significant (mean difference = 2.616, p < 0.05). Additionally, calculations by hand revealed that the effect size was moderate (partial eta squared = 0.083).

Research question 3: In terms of grammatical accuracy across various WM capacities, are there any notable differences between students who receive portfolio evaluation, those who receive self-assessment, and those who receive scaffolded peer assessment?

In this scenario, the same independent and moderating variables as in the first two research questions are at work. The only difference is that, instead of scores on a reading comprehension test or a vocabulary test, the analysis deals with scores on a grammar test on two occasions (pre- and post-test) as the dependent variables. Thus, a further two-way between-group MANOVA was conducted (Pallant, 2020). The assumptions of normality and homogeneity of variances were checked through a one-sample K-S test and Levene’s test of equality of variances, respectively. Due to space limitations, their respective tables are not presented here. The results showed that the sig. (2-tailed) values for both tests exceeded the threshold of 0.05, hence confirming the normality and homogeneity assumptions. Accordingly, the MANOVA could be conducted.
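As a sketch of how these two assumption checks can be run, assuming SciPy is available and substituting synthetic scores for the study's data (which are not reproduced here):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Synthetic grammar pretest scores for four conditions (NOT the study's data)
groups = [rng.normal(loc=3.7, scale=1.5, size=30) for _ in range(4)]

# One-sample Kolmogorov-Smirnov test of normality on the pooled scores
pooled = np.concatenate(groups)
ks_stat, ks_p = stats.kstest(pooled, 'norm', args=(pooled.mean(), pooled.std()))

# Levene's test of equality of variances across the four conditions
lev_stat, lev_p = stats.levene(*groups)

# p values above 0.05 would support the normality and homogeneity assumptions
print(ks_p, lev_p)
```

This is only an illustration of the procedure; the study itself reports the equivalent SPSS output, and the conclusions depend on the actual scores.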

Table 13 presents the descriptive statistics regarding subjects’ performance in all conditions on both the pretest and post-test of grammar. According to Table 13, on the pretest, high WM learners in the portfolio group had a mean of 3.687 (SD = 1.537), while learners with low WM had a mean of 4.285 (SD = 1.728). In the self-assessment group, learners with high WM had a mean of 3.857 (SD = 1.747), whereas low WM learners had a mean of 3.437 (SD = 1.711). In the peer assessment condition, high WM learners had a mean of 3.500 (SD = 1.886), while low WM learners had a mean of 3.705 (SD = 1.263). High WM subjects in the control group had a mean of 3.384 (SD = 1.445), whereas low WM participants in the same condition had a mean of 3.882 (SD = 1.317). The table also summarizes the results of the post-test. In the portfolio condition, learners with high WM had a mean of 11.562 (SD = 3.723), and low WM participants had a mean of 9.357 (SD = 3.650). In the self-assessment condition, high WM learners had a mean of 10.928 (SD = 4.322), while learners with low WM had a mean of 9.562 (SD = 1.931). High WM learners in the scaffolded peer assessment group had a mean of 15.666 (SD = 3.217), while their low WM counterparts had a mean of 8.588 (SD = 4.302). Furthermore, in the control group, high WM subjects had a mean of 3.923 (SD = 1.320), and low WM learners had a mean of 3.823 (SD = 1.590). Overall, the table shows that on the pretest, high WM participants had a mean of 3.606 (SD = 1.645), and low WM subjects had a mean of 3.812 (SD = 1.500). On the post-test, these figures rose markedly: high WM subjects had a mean of 11.000 (SD = 5.316), and low WM learners had a mean of 7.734 (SD = 3.838).
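As a consistency check, the overall pretest means in Table 13 can be reproduced by weighting each cell mean by its cell size, taking the high/low WM counts per condition from the WM classification reported earlier (16/14, 14/16, 18/17, 13/17 for the portfolio, self-assessment, peer assessment, and control groups, respectively):

```python
# Cell sizes: high and low WM per condition (portfolio, self, peer, control)
n_high = [16, 14, 18, 13]
n_low = [14, 16, 17, 17]

# Pretest grammar means per cell, as reported in Table 13
m_high = [3.687, 3.857, 3.500, 3.384]
m_low = [4.285, 3.437, 3.705, 3.882]

def pooled_mean(ns, ms):
    """Size-weighted mean across cells."""
    return sum(n * m for n, m in zip(ns, ms)) / sum(ns)

print(round(pooled_mean(n_high, m_high), 3))  # 3.606, matching Table 13
print(round(pooled_mean(n_low, m_low), 3))    # 3.812, matching Table 13
```

That the weighted cell means recover the reported overall means supports the internal consistency of the descriptive statistics.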

Table 13 Descriptive statistics

Table 14 presents the tests of between-subject effects. According to Table 14, on the pretest of grammatical accuracy, at 3 degrees of freedom and with F = 0.391, the difference between groups was not statistically significant (p = 0.760). The table further reveals that on the post-test, at 3 degrees of freedom with F = 39.021, the difference between conditions was statistically significant at p < 0.05 with a large effect size (partial eta squared = 0.500). Concerning subjects’ WM capacity, on the pretest, at 1 degree of freedom with F = 0.592, no statistically significant difference between subjects was found (p = 0.443). However, on the post-test, at 1 degree of freedom with F = 21.507, there was a statistically significant difference between subjects with a moderate effect size (p < 0.05, partial eta squared = 0.155). Concerning the interaction between condition and WM capacity, on the pretest, at 3 degrees of freedom with F = 0.617, there was no statistically significant difference between conditions, as the p value exceeded the threshold of 0.05; however, on the post-test, at 3 degrees of freedom with F = 7.442, a statistically significant difference was observed at p < 0.05 with a large effect size (partial eta squared = 0.160).

Table 14 Tests of between-subject effects

Table 15 reveals pairwise comparisons between groups based on the Bonferroni adjustment. According to the table, on the pretest, the difference between portfolio assessment and self-assessment was not statistically significant (mean difference = 0.339, p > 0.05). Additionally, the difference between portfolio assessment and scaffolded peer assessment was not significant (mean difference = 0.384, p > 0.05), and neither was the difference between the portfolio assessment group and the control condition (mean difference = 0.353, p > 0.05). The table further discloses that no group differed significantly from the control condition on the pretest (p > 0.05). On the post-test, the post hoc analyses reveal that the mean difference between portfolio assessment and self-assessment was non-significant (mean difference = 0.214, p > 0.05), as were the differences between portfolio assessment and scaffolded peer assessment (mean difference =  − 1.668, p > 0.05) and between self-assessment and scaffolded peer assessment (mean difference =  − 0.214, p > 0.05). A further inspection of the table shows that the differences between all three experimental conditions and the control group were statistically significant (p < 0.05) (Table 16).

Table 15 Pairwise comparisons
Table 16 Condition (WM)

Further post hoc analyses based on the Bonferroni adjustment reveal that on the post-test, in the portfolio assessment condition, high WM learners had a higher mean than their low WM counterparts (mean difference = 2.205). In the self-assessment condition, high WM subjects also had a higher mean than their low WM peers (mean difference = 1.366). In the scaffolded peer assessment condition, learners with high WM had a markedly higher mean than their low WM peers (mean difference = 7.079). In the control condition, high WM learners had a slightly higher mean than their low WM peers (mean difference = 0.099) (Table 17).

Table 17 Pairwise comparisons

In addition to the above, further pairwise comparisons concerning WM capacity reveal that the mean difference between high and low WM learners on the pretest was not statistically significant (mean difference =  − 0.221, p > 0.05); however, on the post-test, the difference turned out to be significant (mean difference = 2.687, p < 0.05). Additionally, calculations by hand revealed that the effect size was moderate (partial eta squared = 0.072).

Discussion

In this section, the impacts of portfolio assessment, self-assessment, and scaffolded peer assessment on reading comprehension, lexical growth, and grammatical accuracy, each in relation to WM capacity, are discussed. For each research question, a two-way between-group MANOVA was performed. The results disclosed that all three experimental conditions outstripped the comparison condition on all dependent variables of text understanding, lexical gain, and grammatical accuracy. The results further revealed that, regarding WM capacity, in all experimental conditions high WM participants outperformed their low WM counterparts. In addition, no statistically significant difference was found among the three experimental conditions. To be more specific, subjects in the scaffolded peer assessment condition made greater gains in reading comprehension, vocabulary knowledge, and grammatical accuracy, but their difference from participants in the other experimental settings was negligible (p > 0.05).

In terms of the promising effects of portfolio assessment established by the results, the findings are in sharp contrast with those of Amani and Salehi (2017). These L2 researchers had shown that portfolio assessment cannot facilitate reading comprehension any more than traditional methods can, and they were therefore skeptical about the enhancing role of this type of alternative assessment. However, based on this study’s findings, portfolio assessment can result in improved text comprehension as well as vocabulary growth and grammatical accuracy. The findings are also in line with those of Barrot (2021), Nourdad and Banagozar (2020), and Rezai et al., (2022a, 2022b). Barrot (2021) found that portfolio assessment can improve learners’ writing performance. Although this study did not assess EFL learners’ writing skills, the results imply that portfolio assessment can foster overall language development; in this way, the results are consistent with those of Barrot. Additionally, Nourdad and Banagozar (2020) investigated the effect of portfolio assessment on vocabulary gain and retention, and their results pointed to the efficacy of this type of alternative assessment on both the immediate and delayed post-tests. Although the current study did not measure the long-term effects of portfolio assessment on vocabulary development, the findings are in line with those of the abovementioned researchers. In another study, Rezai et al., (2022a, 2022b) found that portfolio assessment can improve vocabulary knowledge, which is in line with this study’s results.

This study also found support for the facilitative role of self-assessment as a teaching technique in reading comprehension, lexical growth, and grammatical accuracy in an EFL context. The results are consistent with Rezai et al., (2022a, 2022b), Chung et al. (2021), and Alek et al. (2020). Rezai and his associates examined the contribution of the self-assessment procedure to writing development, and their study found support for the procedure. Although the current study did not directly measure EFL learners’ writing ability, Rezai et al.’s findings are to some extent relevant to this paper’s results, as the researcher also found support for the enhancing role of self-assessment in reading comprehension, vocabulary knowledge, and grammatical accuracy. Chung et al. (2021) likewise found that self-assessment can result in writing improvement. Additionally, Alek et al. (2020), conducting a mixed-methods investigation, found that self-assessment can improve learners’ speaking skills.

Our results also indicated that scaffolded peer assessment can improve learners’ text understanding, knowledge of lexical items, and structural accuracy. Thus, the results are consistent with Li et al. (2020) and Homayouni (2022). In a meta-analysis, Li and his colleagues (2020) found that peer assessment can result in language learning gains, which is consistent with the findings of this exploration. Additionally, Homayouni (2022) found that peer assessment coupled with scaffolding and group work can improve vocabulary knowledge on both immediate and delayed post-tests as well as learners’ oral skills. Homayouni’s (2022) findings are consistent with ours in that we also found support for the efficacy of scaffolded peer assessment in lexical growth. However, our study’s results are in contrast with those of Moghimi (2022), who compared the effects of peer assessment and self-assessment on learners’ accuracy in speech and found peer assessment to be statistically superior to self-assessment. This finding is somewhat at odds with ours: although we found that scaffolded peer assessment can result in more learning gain than self-assessment does, the difference was not significant. That is, both types of assessment can improve learners’ text comprehension, vocabulary knowledge, and structural accuracy.

The results of this exploration also corroborated that WM, as an individual difference, can facilitate language learning. This study found that high WM learners learn more than their low WM peers. This finding supports the earlier claim made by Chow et al. (2021), who found that verbal WM and reading anxiety were two independent predictors of ESL reading comprehension. Additionally, Teng and Zhang (2021) found that complex and phonological WM play a decisive role in vocabulary learning and retention; thus, their results are consistent with this study’s findings. In addition to these studies, Patra et al. (2022) found that learners with high WM gain more grammatical knowledge than learners with low WM. This finding is completely in line with ours, as this study also found that learners with high WM who are exposed to portfolio assessment, self-assessment, and scaffolded peer assessment fare better not only on a test of reading comprehension but also on tests of vocabulary knowledge and grammatical accuracy.

This study sought to add a cognitive individual difference moderating variable (i.e., WM capacity) to the contribution of different types of assessment, namely portfolio assessment, self-assessment, and scaffolded peer assessment, to text understanding, lexical gain, and structural accuracy. The novelty of the study lies in examining the moderating role of WM in the gains resulting from the abovementioned types of assessment. The results showed that all experimental groups outperformed the control group on the post-test; however, there was no statistically significant difference between subjects across the experimental conditions. To be more specific, learners in the scaffolded peer assessment condition gained more in terms of reading comprehension, vocabulary knowledge, and grammar, but the difference from subjects in the other treatment groups was not significant in a statistical sense. The findings further elucidated that learners with high WM outperformed learners with low WM. This does not imply that low WM learners cannot improve in text comprehension, vocabulary, and grammar as a result of the different levels of the independent variable (i.e., portfolio assessment, self-assessment, and scaffolded peer assessment); rather, it implies that learners with high WM have an advantage in learning text understanding, lexical items, and grammatical structures over learners with low WM.

On the whole, the results were most supportive of peer assessment as an instantiation of alternative assessment for the improvement of EFL learners’ text understanding, vocabulary growth, and grammatical accuracy. Recently, researchers have made a growing case for the use of assessment to encourage learning in academic practice (Wiliam, 2018). Peer assessment is a crucial part of formative assessment theories, since it is believed to provide teachers or students with new information about the learning process, improving subsequent performance. The results of this study support the notion that peer evaluation, at least in language programs that focus on reading comprehension, vocabulary, and grammar development, might be a useful instructional strategy for increasing student progress. The results suggest that peer evaluation, which proved at least as effective as the other types of assessment examined, can play a key formative role in classrooms. According to the findings, designing classroom activities that incorporate peer assessment can be a helpful way to promote learning and make the most of instructional resources by allowing the teacher to focus on assisting students with harder and more involved tasks. Practically speaking, this demonstrates that teachers can implement peer assessment in several ways and tailor the design to the particular features and constraints of their contexts (Double et al., 2020).

Although the benefits of peer evaluation for productive language skills have been extensively studied, very few studies have examined its effect on vocabulary learning (Ritonga et al., 2022). In light of the potential effects of this type of assessment on a novel dependent variable, and the extent to which the modified independent variable (i.e., scaffolded peer assessment) can explain variance in that variable, this research can be viewed as innovative in the strictest sense.

Through peer assessment, students can participate in cooperative learning in which they are enthusiastic to assist and evaluate their classmates and take responsibility for their language learning accomplishments. This may result in enhanced social abilities, better evaluation skills, and more precise feedback (Homayouni, 2022). According to social interdependence theory (Slavin, 2011), learners learn better when they assist one another because they care about the group members and want to achieve the same objective. The results of this study thus support the social constructivism proposed by Vygotsky (1987), given that learners of a similar age cohort jointly cooperate and widen each other’s ZPD (Webb, 2008).

Implications of the study

This study has several theoretical and pedagogical implications. The first theoretical implication is that if learners are given voice and choice in assessing their own learning and that of their peers, and in decision-making, their learning will improve. Another theoretical implication is that, through collaborative learning with the help of a more proficient peer (i.e., a mediator), learners can accomplish tasks they cannot do on their own, and in this way their ZPD is broadened (Vygotsky, 1987). A further implication is that when a teacher decides to use the portfolio as a classroom-based assessment tool, they should plan and prepare well in advance (Mathur & Mahapatra, 2022). Identifying specific skill areas (subskills), task types, materials, and a progress-check system can greatly aid effective implementation. Moreover, the researcher’s own experience as an English instructor suggests that English teachers are often not inclined to allow learners to self-assess their progress in the language; this study can therefore shed new light on self-evaluation as an English teaching strategy. A final theoretical implication is that self-assessment can make learners more independent (Masruria & Anam, 2021).

In addition to the abovementioned theoretical implications, this study has several pedagogical implications as well. One way teachers might promote cooperative learning is through exercises that incorporate peer review. Students learning English as a second language may find it helpful to become familiar with various methods of assessment in general, and peer assessment in particular, in order to advance their language learning. Also, through peer and self-assessment tasks, students can identify exactly where they require help and support so that they can ask their teachers for it.

A second pedagogical implication is that peer evaluation and scaffolded learning can offer an engaging, stimulating, and nearly stress-free atmosphere for learning the nuts and bolts of language. Students can increase their language proficiency, reading comprehension, vocabulary size, and grammatical accuracy by using cooperative learning strategies through peer assessment and scaffolded learning. Additionally, a stress-free learning environment will be created in which cooperation is encouraged rather than the unhealthy competition that stunts mental development.

Another pedagogical implication of the study is that peer assessment and portfolios should go hand in hand with routine activities and self-evaluation. Peer assessment can lessen students’ fear of receiving negative feedback in addition to giving them feedback that can help them improve (Cepik & Yastibas, 2013). The more feedback students receive from their peers, the more accustomed they become to managing their anxieties and emotions. As Guo et al. (2018) noted, the majority of pupils lack social interaction while learning a language. This lack of social engagement may lead students to assume that their peers will judge their performance badly because they do not have close relationships with them.

This study also has implications for administrators. The heads of language institutes and the administrators of public schools might urge their academic staff to apply the study’s findings in the classroom to foster autonomy and improve students’ reading comprehension, vocabulary acquisition, and grammatical accuracy.

This study also has implications for those who create educational materials. Curriculum developers and/or syllabus designers can thoroughly examine the results of this analysis and incorporate the findings into their upcoming materials. It is recommended that syllabus designers provide a range of assessment types in their materials. The results could also help task and activity designers produce a range of tasks and activities specifically tailored to EFL students’ reading comprehension, vocabulary growth, and grammatical improvement.

Conclusion

This investigation was carried out to fill a knowledge gap and provide language instructors with pedagogical implications regarding the comparative effects of portfolio assessment, self-assessment, and scaffolded peer assessment, with the moderating role of WM, on reading comprehension, vocabulary learning, and grammatical accuracy. The results showed that all three types of assessment facilitate reading comprehension, lexical gain, and structural accuracy. The results further showed that learners with high WM gain more in terms of reading comprehension, vocabulary development, and grammatical structures. Although the findings revealed an advantage for the scaffolded peer assessment group over the other experimental conditions, the difference was non-significant, pointing to the efficacy of all three types of assessment targeted in this study and giving rise to the theoretical and pedagogical implications discussed above in full detail.

This study pointed to the efficacy of different types of alternative assessment in learning different aspects of a foreign language in a foreign context. Based on the results, it is possible to argue that we should move away from teacher-fronted classrooms and traditional testing toward the new approaches and possibilities that have emerged in recent times. By applying the principles of alternative assessment, not least peer assessment, which is based on Vygotskian thinking, learners’ ZPD can be broadened and learning a new language facilitated.

Although this study is innovative, it is not without limitations. First of all, the study did not use randomization, which is a necessary condition of experimental designs (Mackey & Gass, 2022). It is suggested that prospective researchers randomly select their participants to improve the internal validity of their studies (Ary et al., 2019). Furthermore, this study was conducted in a single setting; future studies should target several geographical areas to see whether the results hold. Another limitation is that the research did not take into account the long-term effects of these types of assessment, so it is not clear whether the effects are long-lasting. For this reason, we suggest that future studies add a delayed post-test to their instruments to check whether the effects persist.

Availability of data and materials

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Abbreviations

EFL:

English as a foreign language

OQPT:

Oxford Quick Placement Test

WM:

Working memory

MANOVA:

Multivariate analysis of variance

ZPD:

Zone of proximal development

SLA:

Second language acquisition

ANCOVA:

Analysis of covariance

References

  • Alam, M. (2019). Assessment challenges & impact of formative portfolio assessment (FPA) on EFL learners’ writing performance: A case study on the preparatory English language course. English Language Teaching,12(7), 161–172.

  • Alawajee, O. A., & Almutairi, H. A. (2022). Level of readiness for in-class teaching among teachers of students with special educational needs: Post-COVID-19. Eurasian Journal of Educational Research, 98(98), 1–20.

  • Alek, A., Marzuki, A. G., Farkhan, M., & Deni, R. (2020). Self-assessment in exploring EFL students’ speaking skill. Al-Ta Lim Journal,27(2), 208–214.

  • Al-Mamoory, S., & Abathar Witwit, M. (2021). Critical discourse analysis of oppression in “To Kill a Mockingbird.” Journal of Social Science and Humanities Research,9(2), 11–24.

  • Amani, F., & Salehi, H. (2017). Impacts of portfolio assessment on Iranian EFL students’ reading comprehension ability based on junior high school English textbook (PROSPECT 2). Journal of English Language and Literature-JOELL,4(4), 69–84.

  • Ary, D., Jacobs, L. C., Sorensen, C. K., & Walker, D. (2019). Introduction to research in education (10th ed.). Cengage Learning.

  • Azizi, Z., & Namaziandost, E. (2023). Implementing peer-dynamic assessment to cultivate Iranian EFL learners’ inter-language pragmatic competence: A mixed-methods approach. International Journal of Language Testing,13(1), 18–43. https://doi.org/10.22034/ijlt.2022.345372.1171.

  • Bachman, L. F., & Palmer, A. S. (2010). Language assessment in practice: Developing language assessments and justifying their use in the real world. Oxford University Press.

  • Baddeley, A. D. (2017). Modularity, working memory and language acquisition. Second Language Research,33, 299–311.

  • Barrot, J. S. (2016). Using Facebook-based e-portfolio in ESL writing classrooms: Impact and challenges. Language, Culture and Curriculum,29(3), 286–301.

  • Barrot, J. S. (2021). Effects of Facebook-based e-portfolio on ESL learners’ writing performance. Language, Culture and Curriculum, 34(1), 95–111. https://doi.org/10.1080/07908318.2020.1745822

  • Behbahani, S. M. K., Pourdana, N., Maleki, M., & Javanbakht, Z. (2011). EFL task-induced involvement and incidental vocabulary learning: Succeeded or surrounded. In International Conference on Languages, Literature and Linguistics, IPEDR Proceedings, 26, 323–325.

  • Benson, S., & DeKeyser, R. (2019). Effects of written corrective feedback and language aptitude on verb tense accuracy. Language Teaching Research,23(6), 702–726.

  • Birjandi, P., & Hadidi Tamjid, N. (2012). The role of self-, peer and teacher assessment in promoting Iranian EFL learners’ writing performance. Assessment & Evaluation in Higher Education, 37(5), 513–533. https://doi.org/10.1080/02602938.2010.549204

  • Bolukbas, F., Keskin, F., & Polat, M. (2011). The effectiveness of cooperative learning on the reading comprehension skills in Turkish as a foreign language. Turkish Online Journal of Educational Technology-TOJET,10(4), 330–335.

  • Carless, D., & Boud, D. (2018). The development of student feedback literacy: Enabling uptake of feedback. Assessment and Evaluation in Higher Education,43(8), 1315–1325.

  • Castles, A., Rastle, K., & Nation, K. (2018). Ending the reading wars: Reading acquisition from novice to expert. Psychological Science in the Public Interest, 19(1), 5–51. https://doi.org/10.1177/1529100618772271

  • Cepik, S., & Yastibas, A. E. (2013). The use of e-portfolio to improve English speaking skill of Turkish EFL learners. Anthropologist,16(1–2), 307–317.

  • Chappuis, J. (2014). Seven strategies of assessment for learning (2nd ed.). Pearson.

  • Chow, B. W. Y., Mo, J., & Dong, Y. (2021). Roles of reading anxiety and working memory in reading comprehension in English as a second language. Learning and Individual Differences,92, 102092. https://doi.org/10.1016/j.lindif.2021.102092.

  • Chung, H. Q., Chen, V., & Olson, C. B. (2021). The impact of self-assessment, planning and goal setting, and reflection before and after revision on student self-efficacy and writing performance. Reading and Writing,34, 1885–1913.

  • Cowan, N. (2017). The many faces of working memory and short-term storage. Psychonomic Bulletin & Review,24, 1158–1170.

  • Daneman, M., & Carpenter, P. (1980). Individual differences in working memory and reading. Journal of Verbal Learning and Verbal Behavior,19, 450–466.

  • Davoudi, M., & Heydarnejad, T. (2020). The interplay between reflective thinking and language achievement: A case of Iranian EFL learners. Language Teaching Research Quarterly,18, 70–82.

  • Double, K. S., McGrane, J. A., & Hopfenbeck, T. N. (2020). The impact of peer assessment on academic performance: A meta-analysis of control group studies. Educational Psychology Review,32(2), 481–509.

  • Diaz-Rico, L., & Weed, K. (2002). The crosscultural, language, and academic development handbook (3rd ed.). Pearson.

  • Duff, P. (2010). Language socialization into academic discourse communities. Annual Review of Applied Linguistics,30, 169–192.

  • Ebadi, S., & Rahimi, M. (2019). Mediating EFL learners’ academic writing skills in online dynamic assessment using Google Docs. Computer Assisted Language Learning,32(5–6), 527–555.

  • Ellis, R. (1997). Second Language Acquisition. Oxford University Press.

  • Esteve, O., Trenchs, J. T., Pujola, J., Arumi, M., & Birello, M. (2012). The ELP as a mediating tool for the development of self-regulation in foreign language learning university contexts: An ethnographic study. In B. Kuhn & M. Perez Cavana (Eds.), Perspectives from the European Language Portfolio: Learner autonomy and self-assessment (pp. 73–99). Routledge.

  • Eysenck, H. J. (1990). Biological dimensions of personality. In L. A. Pervin (Ed.), Handbook of personality: Theory and research (pp. 244–276). Guilford Press.

  • Gan, L., & Lam, R. (2020). Understanding university English instructors’ assessment training needs in the Chinese context. Language Testing in Asia, 10(11). https://doi.org/10.1186/s40468-020-00109-y.

  • Gielen, S., Peeters, E., Dochy, F., Onghena, P., & Struyven, K. (2010). Improving the effectiveness of peer feedback for learning. Learning and Instruction,20(4), 304–315.

    Article  Google Scholar 

  • Gregory, K., Cameron, C., & Davies, A. (2001). Knowing what counts: Conferencing and reporting. Connections Publishing.

    Google Scholar 

  • Guo, Y., Xu, J., & Liu, X. (2018). English language learners’ use of self-regulatory strategies for foreign language anxiety in China. System,76, 49–61. https://doi.org/10.1016/j.system.2018.05.001.

    Article  Google Scholar 

  • Hargreaves, A., Earl, L., & Schmidt, M. (2002). Perspectives on alternative assessment reform. American Educational Research Journal,30(1), 69–95.

    Article  Google Scholar 

  • Harmer, J. (2001). The practice of English language teaching. Pearson. http://www.scribd.com/Jeremy-Harmer-The-Practice-of-English-Language-Teaching-New-Edition1/d/15602107.

  • Harrington, M., & Sawyer, M. (1992). L2 working memory capacity and L2 reading skill. Studies in Second Language Acquisition,14, 25–38.

    Article  Google Scholar 

  • Harrison, M. (2009). Oxford-living grammar: Pre-intermediate student’s book pack. Oxford University Press.

    Google Scholar 

  • Homayouni, M. (2022). Peer assessment in group-oriented classroom contexts: On the effectiveness of peer assessment coupled with scaffolding and group work on speaking skills and vocabulary learning. Language Testing in Asia,12(1), 61. https://doi.org/10.1186/s40468-022-00211-3.

    Article  Google Scholar 

  • Hsia, L.-H., Huang, I., & Hwang, G.-J. (2016). Effects of different online peer-feedback approaches on students’ performance skills, motivation, and self-efficacy in a dance course. Computers & Education,96, 55–71.

    Article  Google Scholar 

  • Hu, N. (2022). Investigating Chinese EFL learners’ writing strategies and emotional aspects. LEARN Journal: Language Education and Acquisition Research Network,15(1), 440–468.

    Google Scholar 

  • Hyland, F. (2000). ESL writers and feedback: Giving more autonomy to students. Language Teaching Research,4(1), 33–54.

    Article  Google Scholar 

  • Hyland, K., & Hyland, F. (Eds.). (2019). Feedback in second language writing: Contexts and issues (pp. 1–22). Cambridge University Press.

    Google Scholar 

  • Iwai, Y. (2011). The effects of metacognitive reading strategies: Pedagogical implications for EFL/ESL teachers. The Reading Matrix,11(2), 150–157.

    Google Scholar 

  • Jahara, S. F., Hussain, M., Kumar, T., Goodarzi, A., & Assefa, Y. (2022). The core of self-assessment and academic stress among EFL learners: The mediating role of coping styles. Language Testing in Asia,12, 21. https://doi.org/10.1186/s40468-022-00170-9.

    Article  Google Scholar 

  • Jiang, P., Namaziandost, E., Azizi, Z., & Razmi, M. H. (2022). Exploring the effects of online learning on EFL learners’ motivation, anxiety, and attitudes during the COVID-19 pandemic: a focus on Iran. Current Psychology, 42. https://doi.org/10.1007/s12144-022-04013-x.

  • Jiangmei, Y. U. A. N. (2023). Guidelines for preparing for, designing, and implementing peer assessment in online courses. TOJET: The Turkish Online Journal of Educational Technology, 22(1). 22111.pdf (tojet.net).

  • Johnson, D. W., Johnson, R. T., & Smith, K. A. (2014). Cooperative learning: Improving university instruction by basing practice on validated theory. Journal on Excellence in University Teaching,25(4), 1–26.

    Google Scholar 

  • Johnson, D. W., & Johnson, R. T. (1987). Learning together and alone: cooperative, competitive, and individualistic learning. Prentice-Hall, Inc.

  • Joordens, S., Pare, D. E., & Pruesse, K. (2009). PeerScholar: an evidence-based online peer assessment tool supporting critical thinking and clear communication. Proceedings of the 2009 International Conference on e-Learning (pp. 236–240).

    Google Scholar 

  • Kargar Behbahani, H., & Kooti, M. S. (2022). Long-term Effects of Pictorial Cues, Spaced Retrieval, and Output-based Activities on Vocabulary Learning: The Case of Iranian Learners. Glob Acad J Linguist Lit, 4(3), 49–55.

    Article  Google Scholar 

  • Khajavy, G. H. (2021). Modeling the relations between foreign language engagement, emotions, grit and reading achievement. Student Engagement in the Language Classroom. In P. Hiver, A. Al-Hoorie, & S. Mercer (Eds.), Multilingual Matters (pp. 241–259).

    Google Scholar 

  • Khajavy, G. H., MacIntyre, P. D., & Hariri, J. (2020). A closer look at grit and language mindset as predictors of foreign language achievement. Studies in Second Language Acquisition,43(2), 379–402.

    Article  Google Scholar 

  • Kidd, E., Donnelly, S., & Christiansen, M. H. (2018). Individual differences in language acquisition and processing. Trends in Cognitive Sciences,22, 154–161.

    Article  Google Scholar 

  • Kirkpatrick, R., & Gyem, K. (2012). Washback effects of the new English assessment system on secondary schools in Bhutan. Language Testing in Asia, 29(5). https://doi.org/10.1186/2229-0443-2-4-5.

  • Kusuma, I., Mahayanti, N. W. S., Adnyani, L. D. S., & Budiarta, L. G. R. (2021). Incorporating E-portfolio with flipped classrooms: An in-depth analysis of students’ speaking performance and learning engagement. JALT CALL Journal,17(2), 93–111.

    Article  Google Scholar 

  • Lam, R. (2017). Taking stock of portfolio assessment scholarship: From research to practice. Assessing Writing,31, 84–97.

    Article  Google Scholar 

  • Lam, R. (2018). Promoting self-reflection in writing: A showcase portfolio approach. In A. Burns & J. Siegel (Eds.), International perspectives on teaching skills in ELT (pp. 219–231). Palgrave MacMillan.

    Chapter  Google Scholar 

  • Lam, R. (2019). Writing portfolio assessment in practice: individual, institutional, and systemic issues. Pedagogies: An International Journal,15(3), 169–182.

    Article  Google Scholar 

  • Lam, R. (2020). Writing portfolio assessment in practice: individual, institutional, and systemic issues. Pedagogies: An International Journal,15(3), 169–182.

    Article  Google Scholar 

  • Lee, I. (2017). Portfolios in classroom L2 writing assessment. In I. Lee (Ed.), Classroom writing assessment and feedback in L2 school contexts (pp. 105–122). Springer.

    Chapter  Google Scholar 

  • Lee, I., & Coniam, D. (2013). Introducing assessment for learning for EFL writing in an assessment of learning examination-driven system in Hong Kong. Journal of Second Language Writing,22(1), 34–50.

    Article  Google Scholar 

  • Li, S. (2017). Cognitive differences and ISLA. In S. Loewen & M. Sato (Eds.), The Routledge handbook of instructed second language acquisition (pp. 396–417). Routledge.

    Chapter  Google Scholar 

  • Li, H., Xiong, Y., Zang, X., Kornhaber, M. L., Lyu, Y., Chung, K. S., & Suen, H. K. (2016). Peer assessment in the digital age: A meta-analysis comparing peer and teacher ratings. Assessment & Evaluation in Higher Education,41(2), 245–264.

    Article  Google Scholar 

  • Li, H., Xiong, Y., Hunter, C. V., Guo, X., & Tywoniw, R. (2020). Does peer assessment promote student learning? A meta-analysis. Assessment & Evaluation in Higher Education,45(2), 193–211.

    Article  Google Scholar 

  • Li, S. (2023). Working memory and second language learning: a critical and synthetic review. The Routledge handbook of second language acquisition and psycholinguistics (pp. 348–360). https://doi.org/10.4324/9781003018872-32.

    Chapter  Google Scholar 

  • Lipnevich, A. A., & Smith, J. K. (2022). Student – feedback interaction model: revised. Studies in Educational Evaluation,75, 101208. https://doi.org/10.1016/j.stueduc.2022.101208.

    Article  Google Scholar 

  • Lynch, M. M. (2001). Effective Student Preparation for Online Learning. http://ts.mivu.org/default.asp?show=article&id=901.

  • Locke, E. A., McClear, K., & Knight, D. (1996). Self-esteem and work. International Review of Industrial/organizational Psychology,11, 1–32.

    Google Scholar 

  • Mackey, A., & Gass, S. (Eds.). (2022). Second language research: Methodology and design (3rd ed.). Routledge.

    Google Scholar 

  • Mackey, A., Philp, J., Fujii, A., Egi, T., & Tatsumi, T. (2002). Individual differences in working memory, noticing of interactional feedback and L2 development. In P. Robinson (Ed.), Individual differences and instructed language learning (pp. 181–208). Benjamins.

    Chapter  Google Scholar 

  • Malecka, B., Boud, D., & Carless, D. (2020). Eliciting, processing and enacting feedback: mechanisms for embedding student feedback literacy within the curriculum. Teaching in Higher Education, 1–15. https://doi.org/10.1080/13562517.2020.1754784

  • Masruria, W. W., & Anam, S. (2021). Exploring self-assessment of speaking skill by EFL high school students. Linguistic, English Education and Art (LEEA) Journal,4(2), 387–400.

    Article  Google Scholar 

  • Mathur, M., & Mahapatra, S. (2022). Impact of ePortfolio assessment as an instructional strategy on students’ academic speaking skills: An experimental study. Computer Assisted Language Learning,23(3), 1–23.

    Google Scholar 

  • Mediha, N, & Enisa, M. (2014). A comparative study on the effectiveness of using traditional and contextualized methods for enhancing learners’ vocabulary knowledge in an EFL classroom. 5th World Conference on Educational Sciences, (116), 3443–3448. https://doi.org/10.1016/j.sbspro.2014.01.780.

  • Miyake, A., & Shah, P. (Eds.). (1999). Models of working memory. Cambridge University Press.

  • Moghimi, A. (2022). On the comparative impact of self-assessment and peer assessment on Iranian male and female EFL learners’ accuracy in speech. Contemporary Educational Research Journal,12(4), 204–213.

    Article  Google Scholar 

  • Mostert, M., & Snowball, J. D. (2013). Where angels fear to tread: Online peer-assessment in a large first-year class. Assessment & Evaluation in Higher Education,38(6), 674–686.

    Article  Google Scholar 

  • Mphahlele, R. S. (2022). Digital assessment literacy in online courses (formative/summative): Rethinking assessment strategies in the open distance and e-learning institutions. Handbook of research on managing and designing online courses in synchronous and asynchronous environments (pp. 404–417). IGI Global.

    Chapter  Google Scholar 

  • Namaziandost, E., & Çakmak, F. (2020). An account of EFL learners’ self-efficacy and gender in the Flipped Classroom Model. Education and Information Technologies,25, 4041–4055.

    Article  Google Scholar 

  • Namaziandost, E., Heydarnejad, T., & Rezai, A. (2022). Iranian EFL teachers’ reflective teaching, emotion regulation, and immunity: examining possible relationships. Current Psychology, 42. https://doi.org/10.1007/s12144-022-03786-5.

  • Naserpour, A., & Zarei, A. A. (2021). Visually-mediated instruction of lexical collocations: The role of involvement load and task orientation. Iranian Journal of Learning & Memory,3(12), 39–50.

    Google Scholar 

  • Nicol, D. (2020). The power of internal feedback: Exploiting natural comparison processes. Assessment and Evaluation in Higher Education,46(5), 756–778.

    Article  Google Scholar 

  • Noroozi, O., Biemans, H., & Mulder, M. (2016). Relations between scripted online peer feedback processes and quality of written argumentative essay. The Internet and Higher Education,31, 20–31.

    Article  Google Scholar 

  • Nourdad, N., & Banagozar, M. A. (2022). The effect of e-portfolio assessment on EFL vocabulary learning and retention. Indonesian Journal of Applied Linguistics,12(2), 466–475.

    Article  Google Scholar 

  • Nunan, D. (2004). Task-based language teaching. Cambridge University Press.

    Book  Google Scholar 

  • Pallant, J. (2020). SPSS survival manual: a step by step guide to data analysis using IBM SPSS. McGraw-Hill Education (UK).

  • Pallathadka, H., Xie, S., Alikulov, S., Al-Qubbanchi, H. S., Alshahrani, S. H., Yunting, Z., & Behbahani, H. K. (2022). Word recognition and fluency activities’ effects on reading comprehension: an Iranian EFL learners’ experience. Education Research International, 2022. https://doi.org/10.1155/2022/4870251.

  • Panadero, E., & Romero, M. (2014). To rubric or not to rubric? The effects of self-assessment on self-regulation, performance, and self-efficacy. Assessment in Education Principles Policy and Practice,21(2), 133–148.

    Article  Google Scholar 

  • Patra, I., Suwondo, T., Mohammed, A., Alghazali, T., Mohameed, D. A. A. H., Hula, I. R. N., & Behbahani, H. K. (2022). The effects of processing instruction and output-based activities on grammar learning: the mediating role of working memory. Education Research International, 2022. https://doi.org/10.1155/2022/3704876.

  • Paulson, L. F., Paulson, P. R., & Meyer, C. A. (1991). What makes a portfolio a portfolio? Educational Leadership,48(5), 60–63.

    Google Scholar 

  • Pawlak, M. (2017). Overview of learner individual differences and their mediating effects on the process and outcome of L2 interaction. In L. Gurzynski-Weiss (Ed.), Expanding individual difference research in the interaction approach (pp. 19–40). John Benjamins.

    Google Scholar 

  • Putro, H. P. N., Hadi, S., Rajiani, I., Abbas, E. W., & Mutiani,. (2022). Adoption of e-learning in Indonesian higher education: Innovation or irritation? Educational Sciences: Theory and Practice,22(1), 36–45.

    Google Scholar 

  • Rezai, A., Namaziandost, E., & Rahimi, S. (2022a). Developmental potential of self-assessment reports for high school students’ writing skills: A qualitative study. Teaching English as a Second Language Quarterly (Formerly Journal of Teaching Language Skills),41(2), 163–203.

    Google Scholar 

  • Rezai, A., Namaziandost, E., & Amraei, A. (2023). Exploring the effects of dynamic assessment on improving Iranian Quran learners’ recitation performance. Critical Literary Studies,5(1), 159–176. https://doi.org/10.34785/J014.2023.010.

    Article  Google Scholar 

  • Rezai, A., Rahul, D. R., Asif, M., Omar, A., & Reshad Jamalyar, A. (2022b). Contributions of E-portfolios assessment to developing EFL learners’ vocabulary, motivation, and attitudes. Education Research International. https://doi.org/10.1155/2022/5713278.

  • Rezai, M. J. (2015). ABC of SPSS for students of applied linguistics. Yazd University Press.

  • Ritonga, M., Tazik, K., Omar, A., & Saberi Dehkordi, E. (2022). Assessment and language improvement: The effect of peer assessment (PA) on reading comprehension, reading motivation, and vocabulary learning among EFL learners. Language Testing in Asia,12(1), 1–17.

    Article  Google Scholar 

  • Roschelle, J., & Teasley, S. D. (1995). The construction of shared knowledge in collaborative problem solving. Computer supported collaborative learning (pp. 69–97). Springer.

    Chapter  Google Scholar 

  • Ruiz, S., Tagarelli, K. M., & Rebuschat, P. (2018). Simultaneous acquisition of words and syntax: Effects of exposure condition and declarative memory. Frontiers in Psychology,9, 1168.

    Article  Google Scholar 

  • Sawyer, R. K. (2006). Educating for innovation. Thinking Skills and Creativity,1(1), 41–48.

    Article  Google Scholar 

  • Schmitt, D., Schmitt, N., & Mann, D. (2011). Focus on vocabulary 1: Bridging vocabulary. Longman.

    Google Scholar 

  • Shahnazari, M. (2013). The development of a Persian reading span test for the measure of L1 Persian EFL learners’ working memory capacity. Applied Research on English Language, 2(2), 107–116. https://doi.org/10.22108/are.2013.15473

    Article  Google Scholar 

  • Shih, C. M. (2010). The washback of the general English proficiency test on university policies: A Taiwan case study. Language Assessment Quarterly,7(3), 234–254.

    Article  Google Scholar 

  • Slavin, R. E. (2011). Student team learning: a practical guide to cooperative learning, (3rd ed.,). National Education Association.

  • Song, B., & August, B. (2002). Using portfolios to assess the writing of ESL students: A powerful alternative? Journal of Second Language Writing,11(1), 49–72.

    Article  Google Scholar 

  • Steen-Utheim, A., & Hopfenbeck, T. N. (2018). To do or not to do with feedback: A study of undergraduate students’ engagement and use of feedback within a portfolio assessment design. Assessment & Evaluation in Higher Education,44(1), 80–96.

    Article  Google Scholar 

  • Sultana, F., Lim, C. P., & Liang, M. (2020). E-portfolios and the development of students’ reflective thinking at a Hong Kong University. Journal of Computers in Education,7, 277–294.

    Article  Google Scholar 

  • Tagarelli, K. M., Ruiz, S., Moreno-Vega, J. L., & Rebuschat, P. (2016). Variability in second language learning: The roles of individual differences, learning conditions, and linguistic complexity. Studies in Second Language Acquisition,38, 293–316.

    Article  Google Scholar 

  • Takarrouchtt, K. (2021). The effect of self-assessment on the development of EFL reading comprehension skills. Journal of English Education and Teaching,5(2), 231–247.

    Article  Google Scholar 

  • Tarricone, P. (2011). The taxonomy of metacognition. Psychology Press.

    Book  Google Scholar 

  • Teng, M. F., & Zhang, D. (2021). The associations between working memory and the effects of multimedia input on L2 vocabulary learning. International Review of Applied Linguistics in Language Teaching. https://doi.org/10.1515/iral-2021-0130.

    Article  Google Scholar 

  • Torkabad, M. G., & Fazilatfar, A. M. (2014). Textual Enhancement and Input Processing Effects on the Intake of Present and Past Simple Tenses. Procedia - Social and Behavioral Sciences, 98, 562–571. https://doi.org/10.1016/j.sbspro.2014.03.452

    Article  Google Scholar 

  • Verenikina, I. (2008). Scaffolding and learning: its role in nurturing new learners. https://ro.uow.edu.au/edupapers/43.

    Google Scholar 

  • Vygotsky, L. S. (1987). Thinking and speech. In R. W. Rieber & A. S. Carton (Eds.), The collected works of L. S. Vygotsky: Vol. 1: Problems of general psychology (pp. 39–285). Plenum.

    Google Scholar 

  • Webb, N. M. (2008). Learning in small groups. In T. L. Good (Ed.), 21st century education: A reference handbook (pp. 203–211). Sage.

    Chapter  Google Scholar 

  • Wei, X. (2020). Assessing the metacognitive awareness relevant to L1-to-L2 rhetorical transfer in L2 writing: the cases of Chinese EFL writers across proficiency levels. Assessing Writing,44, 100452. https://doi.org/10.1016/j.asw.2020.100452.

    Article  Google Scholar 

  • Wiliam, D. (2018). How can assessment support learning? A response to Wilson and Shepard, Penuel, and Pellegrino. Educational Measurement: Issues and Practice,37(1), 42–44.

    Article  Google Scholar 

  • Wilkins, D. A. (1972). Linguistics in Language Teaching. Cambridge: MIT Press.

    Google Scholar 

  • Woolley, G. (2011). Reading comprehension. Assisting children with learning difficulties. Queensland: Springer.

    Book  Google Scholar 

  • Xu, Y., & Brown, G. T. L. (2016). Teacher assessment literacy in practice: A reconceptualization. Teaching and Teacher Education,58, 149–162.

    Article  Google Scholar 

  • Yang, M., Badger, R., & Yu, Z. (2006). A comparative study of peer and teacher feedback in a Chinese EFL writing class. Journal of Second Language Writing,15(3), 179–200.

    Article  Google Scholar 

  • Zhang, Y. M. (2022). The research on critical thinking teaching strategies in college English classroom. Creative Education,13, 1469–1485.

    Article  Google Scholar 

  • Zhao, H. (2010). Investigating learners’ use and understanding of peer and teacher feedback on writing: A comparative study in a Chinese English writing classroom. Assessing Writing,15(1), 3–17.

    Article  Google Scholar 

Acknowledgements

Not applicable.

Funding

This study was supported by funding from Prince Sattam Bin Abdulaziz University, project number PSAU 2023/R/1444.

Author information

Contributions

All authors contributed equally to this work. All authors read and approved the final manuscript.

Authors’ information

Anwar Hammad Al-Rashidi received her PhD in Counseling Psychology from Imam Mohamed bin Saud Islamic University. She is an associate professor in the Psychology Department of the College of Education, Prince Sattam Bin Abdulaziz University, Al-Kharj, Saudi Arabia, and has published several studies in international journals.

Dr. Balachandran Vadivel completed his Bachelor of Arts degree in English Literature from Bharathidasan University, India, in 2003 and went on to secure his Master of Arts degree in English Literature from the same university in 2006. He also holds a Bachelor of Education degree from the University of Madras, which he completed in 2004. In 2008, he obtained his Master of Philosophy degree from Bharathidasan University and started his career as a lecturer at H.H. The Rajah’s College, India, in the same year. In 2010, he was appointed as an Assistant Professor of English at Mount Zion College of Engineering and Technology, which is affiliated with Anna University, India. In 2018, he earned his Ph.D. in English Language Teaching (ELT) from Bharathidasan University. His research interests lie in the areas of ELT, Second Language Acquisition, and Applied Linguistics, and he has published several research papers in these fields. Currently, he is serving as an Assistant Professor at Cihan University-Duhok, located in the Kurdistan Region of Iraq.

Nawroz Ramadan Khalil earned her B.A. in Language and Literature from Aleppo University, Syria in 1997. She went on to complete her M.A. in Literature, Drama, and Poetry from the same university in 2002, with a thesis titled “The Influence of Western Literature on Eastern Literature.” In 2006, she obtained her Ph.D. in Literature, Drama, and Poetry from Aleppo University with a dissertation titled “The Influence of Literature on University Courses.” From 1997 to 2010, Nawroz worked at Aleppo University in various positions, including heading international research missions, serving as a member of the university committee, leading the English department, and supervising student research papers. She then worked at Duhok University’s College of Basic Education/Department of English from 2010 to 2014. Currently, she is the head of the English Department at Cihan University. Nawroz has published numerous research articles and has participated in various workshops and conferences related to language and literature in Iraq (Kurdistan region), Syria, and Lebanon. She has also worked for international organizations such as ACTED as a supervisor for awareness campaigns aimed at refugees who have been traumatized by war.

Nirvana Basim graduated from the Department of Literature and Linguistics, Kandahar University, Kandahar, Afghanistan.

Corresponding author

Correspondence to Anwar Hammad Al-Rashidi.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Hammad Al-Rashidi, A., Vadivel, B., Ramadan Khalil, N. et al. The comparative impacts of portfolio-based assessment, self-assessment, and scaffolded peer assessment on reading comprehension, vocabulary learning, and grammatical accuracy: insights from working memory capacity. Lang Test Asia 13, 24 (2023). https://doi.org/10.1186/s40468-023-00237-1

Keywords