Hong Kong has a largely ethnically homogeneous population (92% ethnic Chinese) (Bacon-Shone et al., 2015). This ethnic composition, together with an absence of urban-rural differences, has been conducive to an effectively centralized education system. For three decades, Hong Kong’s education system closely followed Britain’s 6-5-2-3 model: 6 years of primary education, 5 years of secondary education followed by the Certificate of Education Examination, 2 years of pre-university secondary education followed by the Advanced Level Examination, and 3 years of university education (Marsh & Lee, 2014). The British colonial government implemented the 9-year compulsory education policy in 1978, guaranteeing free primary and junior secondary education. The Government of Hong Kong extended compulsory education to 12 years in 2007 to cover the entire period of secondary education, timed to coincide with the revised 6-3-3-4 education structure. In 2009, the New Senior Secondary (NSS) curriculum was implemented with a significant overhaul of the test structure. The NSS curriculum is anchored by four core subjects (including the English language) and one major summative test, the Hong Kong Diploma of Secondary Education (HKDSE) Examination, which tests the corresponding subjects at the end of Form 6.
Hong Kong’s education system is also characterized by its systematic approach, at least in rhetoric, to measuring students’ learning outcomes in both summative and formative ways (see Curriculum Development Council, and Hong Kong Examinations and Assessment Authority, 2007). The Hong Kong Examinations and Assessment Authority (HKEAA) (hereafter referred to as the “testing authority”), a statutory body of the Government of Hong Kong, administers the tests. It is worth noting that the Territory-wide System Assessment (TSA), administered at the end of each key stage (Primary 3, Primary 6, and Secondary 3), is a formative, low-stakes test. This clarification highlights, by contrast, the potentially substantial washback effect of the HKDSE, a major high-stakes, summative test. For the approximately 550,000 primary and secondary school students (Hong Kong Special Administrative Region Government, 2019) enrolled annually in the system, this university entrance test is the capstone of their years of compulsory education.
Assessment of English reading literacy in Hong Kong
Reading literacy, which the Organisation for Economic Co-operation and Development (2019b) affirms as a fundamental ability for learners to succeed in other subjects—and more importantly, in life—is a major subject on the PISA test. While PISA primarily captures reading literacy in learners’ first language, this does not mean that second language reading literacy is of lesser importance. The development of reading literacy often stems from social and cultural needs (Grabe, 2008), and as previously explained, Hongkongers are inevitably subject to such needs. The same may be observed in other East Asian countries and regions, hence their respective efforts to standardize English language education.
Hong Kong is one of the few regions in Asia that publish official documents on assessment frameworks. Still, documents from both the Education Bureau and the HKEAA are vague, making informed instruction and test design difficult. This vagueness stands in contrast with the testing authority’s enthusiasm, since 2001, for “Reading to Learn,” a strategy that promotes in-depth reading for cognitive development (see Curriculum Development Council, and Hong Kong Examinations and Assessment Authority, 2007). Part of the problem is that the construct of reading literacy is arguably ill-defined, and the test designers have failed to include a matrix of descriptors in the test specifications. Notably, the testing authority does not specify the distribution of, or weight given to, each learning objective outlined in the Assessment Guide (see Curriculum Development Council, and Hong Kong Examinations and Assessment Authority, 2007); the document analysis in this research paper therefore seeks to uncover this distribution.
Psycholinguistically oriented research on reading
It seems impossible to define reading adequately as a construct. Indeed, even Alderson (2010) struggles to give a clear-cut definition of the ability in his much-referenced book, beyond noting that it entails understanding a text through both cognitive and metacognitive processes. Without drawing a firm dichotomy, one might acknowledge that reading involves both bottom-up and top-down cognitive processes in meaning making, the former entailing the use of smaller linguistic units to build up an understanding of the whole and the latter the activation of the reader’s schemata in interpreting the text (Goodman, 1969, 1982, as cited in Alderson, 2010; Khalifa & Weir, 2009). While there is ample evidence supporting either the bottom-up or the top-down orientation, the interactive approaches to reading outlined in Grabe (1991) encapsulate the intricate interaction of the two as one engages in reading.
Whether reading is an indivisible whole or operates on dissectible components, however, remains unresolved. Goodman (1967, 1969, as cited in Koda, 2005), for example, contends that reading is naturally embedded in communication and is therefore unlikely to be understood through discrete measures of components. Similarly, Rosenshine (1980, as cited in Khalifa & Weir, 2009) concludes that the evidence for the distinctiveness of reading skills is inconsistent. On the other hand, the potential divisibility of the construct is probably more welcome where testing is concerned, for the components can be measured post hoc with, for instance, correlational and factorial research (Khalifa & Weir, 2009) in which reading is treated more as a product than a process. But even these quantitative analyses are subject to sampling issues and variation in the method of analysis (Alderson, 2010; Khalifa & Weir, 2009).
It follows that second language (L2) reading can only be more complicated in its operations. Researchers have largely steered clear of treating it as a unitary construct, given the involvement of two languages, and have instead taken the componential view (Koda, 2007; Weir, 2005). Still, measuring L2 reading skills entails assessing both reading ability and language proficiency, which are not entirely distinct from each other. Alderson (2010) contends, based on research evidence, that a test assessing L2 reading skills requires the reader to cross a “linguistic threshold” (p. 121), which varies according to task demands.
Assessment of reading literacy in national and international contexts
Two bodies of literature help situate the present study’s aim: the first on cognitive processes in assessments of reading literacy, and the second on the manipulation of test difficulty. This literature review aims to demonstrate that while researchers have long lauded the merits of incorporating a comprehensive range of cognitive processes in language assessment and advocated for doing so, they have thus far made few attempts to suggest corresponding measures to realign test difficulty so as to accommodate learners with varying attributes.
Levels of cognitive processes in reading literacy assessment
Bloom (1956) defines cognitive processes as those of recalling knowledge and developing intellectual abilities and skills. While the classification of cognitive processes in Bloom’s (1956) taxonomy primarily serves the purpose of formulating educational objectives in curricula rather than in assessments, the taxonomy emphasizes the interrelation among the full range of such objectives and the importance of exploring educational activities for higher order problem solving. In other words, cognitive processes concern both breadth and depth of knowledge. Since assessment is integral to and should be aligned with curricular objectives, the development of scales and measurements of cognitive processes for the analysis of assessment materials has been embraced by a host of disciplines. It is of particular interest to non-language subjects such as mathematics and science, both of which are “highly formal” and “practical” (Tamir, 1980, as cited in Edwards & Dall’Alba, 1981, p. 159) and therefore readily captured by the taxonomy.
Numerous studies have used variations of frameworks, mostly based on Bloom’s (1956) taxonomy, to analyze the cognitive processes in instructional materials and assessment tasks, though most have focused on mathematics and the sciences (Berger et al., 2010; Momsen et al., 2017). Few studies have explored the design of assessment tasks at different levels of cognitive processes in the context of language subjects or the humanities in general. This may be attributed to a correspondingly scarce amount of research on the development of scales and measurements for language assessment analysis. Hess (2004) first addresses this paucity by introducing a descriptive framework for assessing reading literacy based on Webb’s (1999) Depth of Knowledge model. Johnson and Mehta (2011) later build on the Complexity, Resources, Abstractness, and Strategy (CRAS) framework developed by Edwards and Dall’Alba (1981) and bring it to the attention of language assessors, taking advantage of the scale’s applicability to an extensive range of subjects. It is therefore feasible to adopt one or more frameworks to analyze language assessment tasks, although prior analyses have been empirical and undertaken mostly at the classroom or school level, as opposed to the systemic level.
Unfortunately, at the systemic level, the significance of incorporating a range of cognitive processes in reading assessments has been overlooked in Hong Kong. As the city consistently claims one of the top spots in international testing, local authorities have downplayed the fact that students take the PISA in their native language, Chinese (Organisation for Economic Co-operation and Development, 2019a). As such, the PISA results have little explanatory or predictive relevance to the English reading literacy test in the HKDSE. The transfer of word-decoding abilities from one’s first language to a second language is far from automatic; this is especially true when the orthographic and phonological processes required by the first and second languages are vastly different (Verhoeven, 2011), as is the case with English and Chinese. The PISA results therefore do not justify complacency; rather, they are a warning to the local testing authority that a critical missing piece of the HKDSE must be filled: in contrast to PISA, the HKDSE does not currently guarantee the assessment of a full range of cognitive processes, even though the curriculum documents suggest as much (see Curriculum Development Council, and Hong Kong Examinations and Assessment Authority, 2007). Researchers who wish to examine whether the range of cognitive processes in the test is skewed need to analyze each question with reference to a cognitive process model.
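To illustrate what such an item-by-item analysis might look like in practice, the following minimal sketch (in Python; the codings and years are hypothetical, not actual HKDSE data) tallies items coded against Bloom’s (1956) levels and reports the distribution per test year:

```python
from collections import Counter

# Bloom's (1956) levels, ordered from lower to higher order processes.
BLOOM_LEVELS = ["knowledge", "comprehension", "application",
                "analysis", "synthesis", "evaluation"]

def level_distribution(coded_items):
    """Return the percentage of items coded at each Bloom level."""
    counts = Counter(coded_items)
    total = len(coded_items)
    return {level: 100 * counts[level] / total for level in BLOOM_LEVELS}

# Hypothetical codings for two test years (illustrative only).
papers = {
    2012: ["knowledge", "comprehension", "comprehension", "analysis",
           "knowledge", "comprehension", "application", "comprehension"],
    2013: ["comprehension", "knowledge", "analysis", "comprehension",
           "evaluation", "knowledge", "comprehension", "comprehension"],
}

for year, items in sorted(papers.items()):
    dist = level_distribution(items)
    print(year, {level: f"{pct:.0f}%" for level, pct in dist.items()})
```

A persistent concentration of percentages in the first two columns of such a table, year after year, would be the kind of skew the present study looks for.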
Manipulation of test difficulty
It is essential that the level of cognitive processes and test difficulty be clearly distinguished in the context of assessment, even though the two terms may be used interchangeably in everyday speech. Scholars have successfully clarified the difference by considering the level of cognitive processes as one of many factors that might affect test difficulty (Bensoussan et al., 1984; Gan, 2011). Other factors, such as the number of steps required of the learner in each question, the sequencing of the tasks, and the number of parties involved (applicable to group work), can be manipulated to balance the difficulty introduced by higher order cognitive processes (K. Hess, personal communication, January 27, 2020). It is therefore plausible that, with a well-established hierarchy of test difficulty, students of varying abilities could complete tasks according to their ability without being restricted by the levels of cognitive processes the tasks require. In other words, there can be difficult tests without higher order cognitive processes, and vice versa.
Researchers have also looked at the interplay among factors contributing to reading literacy (Olmez, 2016; Skehan, 2001). This body of research contributes to the literature on second language test difficulty; Skehan (2001), in particular, proposes five psycholinguistic categories that affect test difficulty: interactivity, familiar information, degree of structure, complex and numerous operations, and communicative stress. These categories largely coincide with the text factor outlined in the PISA reading literacy assessment framework. The text factor is one of three factors the OECD identifies in its PISA methodology, the other two being the reader factor and the task factor (Organisation for Economic Co-operation and Development, 2019b). While the inherent reader factor (e.g., learners’ motivation or prior knowledge) is independent of the test, and thus less amenable to intervention, the text factor and the task factor can be manipulated by assessors and used to interact with the reader factor by activating the learners’ schemata (Snow, 2002). The level of cognitive processes, while deemed highly relevant to test difficulty, essentially represents the task factor. Assessors may additionally turn to the text factor to manipulate test difficulty by varying four domains of the text: source (whether the texts come from a single source or multiple sources), organization and navigation (whether the texts are dynamically or linearly organized), format (whether the texts are presented continuously in paragraphs or not), and text type (for example, literary or factual) (Snow, 2002). These could all be presented as possible measures in appealing to policymakers to revise the assessment design of a test.
In conclusion, with test difficulty taken into consideration, a revision of assessment design that employs a full range of cognitive processes can be realized. While scales and measurements of cognitive processes for the analysis of assessment have been designed primarily for non-language assessments, models have since been developed to fit the language assessment context, though without evidence from document analysis. This study attempts to fill that gap by shedding light on how the range of cognitive processes has varied over the first 8 years of the test’s administration relative to the accuracy rate of the items.
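One way to operationalize the comparison between cognitive processes and accuracy rates is to group coded items by level and compare mean accuracy rates per level. A minimal sketch follows; the figures and pairings are invented for illustration, not actual HKDSE statistics:

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (Bloom level, accuracy rate) pairs, where the accuracy
# rate is the percentage of test-takers who answered the item correctly.
items = [
    ("knowledge", 78.0), ("comprehension", 65.5), ("comprehension", 61.0),
    ("application", 55.4), ("analysis", 42.3), ("evaluation", 35.8),
    ("knowledge", 81.2), ("analysis", 47.6),
]

# Group accuracy rates by cognitive level and compare the means.
by_level = defaultdict(list)
for level, rate in items:
    by_level[level].append(rate)

for level, rates in by_level.items():
    print(f"{level}: mean accuracy = {mean(rates):.1f}% (n = {len(rates)})")
```

A pattern of sharply lower means for higher order levels would suggest the correspondence probed in the second research question below.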
Conceptual framework
This study is guided by Bloom’s (1956) taxonomy of educational objectives, which establishes the significance of the cognitive domain, and Stanovich’s (1980) interactive compensatory model. The former is foundational to virtually any work on this topic; the latter posits that a deficit in one cognitive process may be compensated for by greater reliance on another, thus explaining the surprising empirical finding that poor readers use higher order contextual processing to compensate for a deficit in word recognition. The aim is to determine whether a full range of cognitive processes is assessed in the HKDSE test, hence the emphasis on the task factor, identified in the previous section as one of the three key factors in reading assessment methodology (the other two being the reader factor and the text factor). I also explore whether items assessing higher order cognitive processes correspond to extremely low accuracy rates, the accuracy rate being defined earlier as the percentage of test-takers who answered an item correctly. An affirmative answer would identify a critical need for revision of the assessment design.
Taxonomy of educational objectives
The widely applied Bloom’s (1956) taxonomy is a model for classifying and analyzing cognitive educational objectives, devised primarily for teachers and curriculum makers. The cognitive objectives are categorized into six subdivisions according to the complexity of learner behavior in the outcomes: knowledge, comprehension, application, analysis, synthesis, and evaluation, with evaluation ranking highest in the taxonomy. While the taxonomy is designed for formulating activities in curricula rather than objectives in assessments, the fact that assessment is integral to curricular objectives means it has significant implications for test design. It should be noted that the hierarchy is dynamic in that the highest order cognitive process in the taxonomy (i.e., evaluation) can herald the acquisition of new knowledge, which in turn leads to a new cycle of cognitive learning. Anderson et al. (2001) later revise the taxonomy with refined definitions of the cognitive processes in each subdivision and an increased emphasis on higher order cognitive processes, which enhance not only retention but also transfer of knowledge. Admittedly, the taxonomy is limited in its descriptive capacity, as there have been disputes over whether all educational objectives can be cleanly classified; this limitation can be mitigated by establishing inter-rater reliability when coding the assessment items. Additionally, the use of a hierarchy might create the illusion that attaining the skills and abilities at one level is a prerequisite for attaining those at the next. This pitfall, nonetheless, is addressed by Stanovich’s (1980) interactive compensatory model.
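Since the taxonomy’s classificatory fuzziness is to be mitigated by inter-rater reliability, one standard way to quantify rater agreement is Cohen’s kappa, which corrects observed agreement for agreement expected by chance. The sketch below (hypothetical codings, not the study’s data) computes it for two raters assigning one Bloom level per item:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning one label per item."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    # Observed agreement: proportion of items with identical codes.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected chance agreement, from each rater's marginal distribution.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    labels = set(rater_a) | set(rater_b)
    p_e = sum((freq_a[lab] / n) * (freq_b[lab] / n) for lab in labels)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical codings of ten items into Bloom (1956) levels.
a = ["knowledge", "comprehension", "comprehension", "application", "analysis",
     "knowledge", "comprehension", "evaluation", "comprehension", "knowledge"]
b = ["knowledge", "comprehension", "application", "application", "analysis",
     "knowledge", "comprehension", "synthesis", "comprehension", "knowledge"]
print(f"kappa = {cohens_kappa(a, b):.2f}")  # ≈ 0.74 with these codings
```

Disagreements surfaced this way (e.g., items 3 and 8 above) would be resolved through discussion before the final coding is analyzed.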
Interactive compensatory model
The interactive compensatory model debunks the myth that higher order cognitive processes are implicated only in the reading of good readers. Empirical evidence suggests poor readers might use more contextual reasoning, a higher order cognitive process, to aid their reading than good readers do. Instead of calling the evidence paradoxical, Stanovich (1980) explains it with the model’s assumption of compensatory processing. He posits that higher order cognitive processes are neither exclusive to good readers nor a prerequisite for entering a given level of the cognitive process hierarchy. Poor readers may “compensate” for their poor word recognition, which sits at the lower end of the cognitive processes, with heavier reliance on higher order knowledge sources, such as the context of the text, to aid understanding. Good readers, by contrast, do not need to rely on contextual processing: they can manage word recognition without a context provided. Higher order cognitive processes are therefore not intended only for good readers, nor should they be. It should be noted, however, that while the theory focuses on reading fluency, it does not specify the second language learning context. Furthermore, although it relies heavily on the interplay between the reader factor and the text factor, it says little about the cognitive processes implicated in the task factor.
My theoretical perspective combines Bloom’s (1956) taxonomy of educational objectives and Stanovich’s (1980) interactive compensatory model to analyze the range of cognitive processes assessed in the reading literacy test. This paper argues that a revision of the assessment would be warranted in the case of a consistent skew towards items assessing lower order cognitive processes. Bloom’s (1956) taxonomy emphasizes the importance of exploring educational activities for higher order problem solving, which in turn influences assessment activities. Based on this theory, this study explores whether the testing authority places outsized emphasis on lower order cognitive processes. Stanovich’s (1980) interactive compensatory model asserts that reading performance is a “synthesized” (p. 35) process in which readers draw on multiple knowledge sources, meaning even poor readers can tap high-level knowledge sources such as semantic and contextual knowledge. On this basis, the study hypothesizes that items assessing higher order cognitive processes could be incorporated in the test while maintaining reasonable test difficulty. Through the combined use of Bloom’s (1956) taxonomy and Stanovich’s (1980) interactive compensatory model, I explore a potential need to broaden the range of cognitive processes in Hong Kong’s English language reading literacy assessment and present the following research questions:
1. How has the range of cognitive processes in the HKDSE English reading literacy test varied over the first 8 years of its administration? Is there a pattern to the cognitive processes assessed?
2. If there is a pattern, how does it correspond with the accuracy rate of the items?