From: Designing and validating a scale for evaluating the sources of unreliability of a high-stakes test
Item | Statement | Mean | SD |
---|---|---|---|
22 | Some items were designed based on the extra-curricular sources | 3.45 | .95 |
26 | The difficulty of items in the general and specific sections was not balanced and was disproportional | 3.39 | 1.00 |
21 | The format of some options was not familiar to me | 3.37 | 1.02 |
23 | The items were ambiguous and unclear | 3.25 | .99 |
20 | The items were of low quality (e.g., some items had two or more correct options) | 3.20 | 1.11 |
24 | The items were irrelevant to their domains of knowledge, or they had overlap to a large extent | 3.07 | .97 |
25 | The length of the test was too short to test the test-takers’ ability in different domains | 2.99 | 1.07 |
19 | The items were designed so that mathematical errors could occur in calculating the scores | 2.93 | .94 |
18 | The items were biased against test-takers with physical disabilities, like color blindness | 2.43 | 1.07 |
17 | The items were biased against males or females | 2.21 | 1.01 |
27 | Some test-takers benefitted more from educational services because of the more related courses they passed in their previous level of education | 3.39 | 1.04 |
28 | Some majors were gender-specific and gender-biased which was a source of frustration for talented candidates | 2.98 | 1.10 |
29 | The security protocols were not completely followed in preparing the exams, so some candidates could have access to the test | 2.71 | 1.15 |