An analysis of the differences among L2 listening comprehension test formats

Mihara, Kei

doi:10.1186/s40468-015-0021-5

Research
Open access
Published: 26 August 2015

Language Testing in Asia volume 5, Article number: 12 (2015)

Abstract

Background

The present study aims to investigate which variables affect English as a foreign language (EFL) students’ listening comprehension test performance. It examines two types of variables: (1) test formats and (2) test materials.

Methods

First, three test formats are investigated. In the first, questions are not written but are given orally only once, in English and in the students’ first language (L1), after they listen to the spoken text. In the second, questions are not written but are given orally, in English and in the students’ L1, both before and after they listen to the spoken text. The third format serves as a control: questions are written and are also given orally in English after the students listen to the spoken text. The first format is similar to the Test of English as a Foreign Language Paper-Based Testing (TOEFL PBT), and the second to the General Tests of English Language Proficiency (G-TELP). The third, the control, is the format of the Test of English for International Communication (TOEIC). Second, this study examines whether there are any differences between dialogues and monologues in terms of students’ performance.

Results

The results show that test formats do not make a statistically significant difference to students’ test performance.

Conclusions

Repeating questions after listening to the spoken text does not help students perform better, even if they listen to the questions not only in English but also in their L1. As for differences in test materials, the results are not decisive. It is not possible to determine whether there are any differences between dialogues and monologues.

Background

The present study investigates whether test takers’ performance is affected by factors other than their English proficiency. Its purpose is twofold. It first explores whether differences in test format affect test results. It also tries to determine whether differences in test materials influence students’ test performance.

Three test formats were investigated here. They were all multiple-choice formats, but they differed in the mode of presentation of the questions, e.g., whether test takers had a chance to listen to the questions once or twice. Previous studies have examined the effects of test format on test takers’ performance; however, their results were mixed, indicating the need for further research. As for differences in test materials, less research has been conducted so far. Thus, the present study examined whether there are any differences by comparing conversations between two people with short talks given by a single speaker.

Literature review

Previous research shows that it is easier for EFL students to take a multiple-choice listening comprehension test if questions and options are both written on paper. Yanagawa and Green (2008) investigated three formats: (1) previewing both the question and options, (2) previewing the question, and (3) previewing the options. They used Part 3 of the Test of English for International Communication (TOEIC) listening section. The spoken texts were short conversations of 25 to 59 words. Japanese test takers at TOEIC test sites in and around Tokyo were asked to participate in their study, and a total of 279 people volunteered. They found that question preview helps test takers produce more correct answers, while option preview does not lead to high scores. They suggested that in the case of an option preview format test, teachers should advise their students “not to give too much attention to answer options prior to listening, given the lack of any significant benefit from option preview” (Yanagawa and Green 2008, p. 120).

Iimura (2010a) compared four formats: (1) previewing both the question and options, (2) previewing the question, (3) previewing the options, and (4) previewing neither the question nor the options. A total of 40 Japanese university students participated in his study. He used materials taken from the Grade 2 EIKEN Test [Footnote 1]. The experiment was conducted using a computer. In Format 1, for example, students saw both the question and options on the screen, and 10 seconds later they listened to a dialogue via headphones. Then they chose the answer by clicking on the button. The results showed that only the full preview and non-preview formats produced a statistically significant difference. He therefore mentioned that “format difference did not considerably affect listening performance” (p. 31).

Further study was conducted by Chang and Read (2013), who compared two formats: (1) the written mode, which allowed the participants to preview both questions and options, and (2) the oral mode, which presented both items orally. The participants were 87 university students in Taiwan, who were divided into two groups. The materials were three types of spoken texts: dialogues with 6–9 utterances, conversations with 2–3 utterances, and short talks of about 20–30 seconds. One group took a test of which the first half was presented in oral mode and the rest in written mode. The other group took a test in which the order of the modes was reversed. The results showed that “students performed slightly better with the written mode” (Chang and Read 2013, p. 580), corresponding to grades of 66% in the oral mode as opposed to 68% in the written mode. They also mentioned that “the majority of participants considered test items in the written mode easier than in the oral mode” (Chang and Read 2013, p. 582) since 78% of them answered that they preferred the written mode.

However, some studies indicate that a test is not easier simply because the questions are written. Filipi (2012) investigated whether questions should be offered in the target language (French, German, Italian, Japanese, Chinese, or Indonesian) or the test takers’ first language (English) in listening comprehension tests. The study used the test known as the Assessment of Language Competence (ALC), which was developed to examine the listening and reading skills of students mainly at the secondary school level in Australia, New Zealand, and the Asia-Pacific region. A total of 348 students participated in trial tests, and about 25,000 students took the final test. The results indicated that questions written in the target language were more difficult and challenging, and that “some students may be disadvantaged if questions appear in the target language because they might understand the stimulus but not the questions or options for the answers” (p. 525). Filipi (2012) also conducted a questionnaire study, finding that a large proportion of participants believed the test items were likely to be more difficult when the question was written in the target language. From Filipi’s (2012) findings, we can presume that L1 support might help students understand the questions and options correctly. With L1 support, students could understand test questions correctly even if the questions were not written.

As for oral repetition, less research has been conducted on the effects of repeating questions, although previous research has shown the effects of repeating a spoken text (Chang and Read 2006; Sakai 2009). Iimura (2010b) examined whether repeating the question orally affects test takers’ performance in a multiple-choice listening test. He examined a new multiple-choice format where all three components (question, text, and options) were given orally. He used conversations taken from the Grade 3, Pre-2, 2, and Pre-1 EIKEN Test. The spoken texts were approximately 50 words in length. He compared two formats: (1) items were presented in the order question, text, options, and (2) items were presented in the order question, text, question, options. A total of 58 Japanese university students participated in his study. The results showed that when the questions and options were both given orally, “there was no significant difference between the mean scores in the two formats and repeating questions did not enhance listening performance” (p. 52). A possible reason might be that students did not understand the questions in English. The questions might have contained some difficult vocabulary items. If students do not comprehend the questions accurately, repeating them is not likely to boost performance. Therefore, on this point, too, we can postulate that L1 support might facilitate students’ understanding when questions are repeated.

Regarding spoken texts, Yanagawa and Green (2008) and Iimura (2010a, b) examined dialogues or conversations between two people. However, monologues are also worth investigating, since both forms of spoken texts are used in popular external tests such as the Test of English for International Communication (TOEIC), the Test of English as a Foreign Language Internet-Based Testing (TOEFL iBT), and the International English Language Testing System (IELTS). Chang and Read (2013) used both dialogues and monologues, but they did not focus on spoken texts. Papageorgiou, Stevens, and Goodwin (2012) examined whether differences in the type of spoken text can lead to differences in test takers’ performance on a multiple-choice listening test. They took data from 494 examinees, whose first language was Spanish, during a routine administration of the Michigan English Test. Papageorgiou et al. (2012) created three pairs of long dialogue and monologue stimuli with identical content and vocabulary in order to investigate which type of input is more difficult for test takers. Their results were rather inconclusive. In one case, a dialogic input was easier because information was presented in direct speech while it was delivered in reported speech in the monologue. In another, however, a monologic input was “more structured and detailed” (p. 388) and was therefore easier than a dialogic version. They said, “The study of the relative difficulty of dialogic and monologic input is a complex issue due to the numerous, well-documented variables that affect listening comprehension” (p. 391). Therefore, the present study investigates both dialogues and monologues in order to determine whether there are any differences between them. This study also examines whether L1 support helps students perform better when listening to the question twice.

Methods

Participants

The present study involved 60 first-year university students who were enrolled in three general English classes in Japan. They were all Japanese students from the Faculty of Science and Engineering. Their ages ranged from 18 to 20. They had been learning English as a foreign language for six years or more. There were 19 males and 1 female from each of the three classes. Each class had 24 to 26 students, and all of them were invited to take part in the research. However, some students decided not to take part in the experiment after listening to my explanation. In addition, there was only one female student in one of the classes. Therefore, I asked those who were willing to cooperate to participate in this research, and I selected 19 male students and 1 female student from each class to equalize the groups. The classes were held twice a week for 90 minutes and were compulsory for all first-year students. The experiment was conducted in class, using the first 15 minutes of a 90-minute class. The students were asked to take a multiple-choice listening comprehension test twice a week, eight times altogether. For ethical reasons, I promised the students that the results would not count as part of their grades. However, I told them that the experiment would be good practice for them. I explained to the students in Classes 1 and 3 that the TOEFL and the TOEIC are high-stakes tests, and that they would have an advantage if they achieved a high score on these tests. I told those in Class 2 that they had to take the G-TELP as a term-end test. The university required all first-year students to take the G-TELP twice a year: at the beginning of the academic year and at the end of the second semester.

Given the study’s design, the three classes (Classes 1, 2, and 3) had to show no statistically significant difference in English proficiency. In order to establish their comparability, an analysis of variance (ANOVA) was performed using the raw scores of a proficiency test. The test administered was G-TELP Level 3. There are five levels in the General Tests of English Language Proficiency (G-TELP). Level 3 consists of grammar, listening, and reading and vocabulary sections, and is equivalent to TOEIC 400 to 600. The G-TELP is provided by the International Testing Services Center (ITSC) in San Diego, California in the USA. Similar to the TOEIC, it is especially popular in South Korea and Japan. The G-TELP was chosen to measure the students’ English proficiency because all of the participants in the present study had taken this test at the beginning of the academic year. The university paid the examination fees and asked all of the first-year students to take the G-TELP so that they could measure their achievement. Therefore, all of the participants’ G-TELP data were available. Since the present study focused on listening, the descriptive statistics (number of participants, means, and standard deviations) of the listening section as well as the total score are shown in Tables 1 and 2. The results of the ANOVA presented in Table 3 confirmed that there were no significant differences among the three classes. Thus, they were considered equivalent in their English proficiency.
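As an illustration of the comparability check described above, the following minimal sketch computes a one-way ANOVA F statistic for three groups. The scores are hypothetical placeholders, not the G-TELP data of this study, and the function name `one_way_anova_f` is an assumption introduced for illustration only; this section does not state which software was used for the actual analysis.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    `groups` is a list of lists of scores, one inner list per class.
    """
    k = len(groups)                              # number of groups
    n_total = sum(len(g) for g in groups)        # total number of scores
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical listening scores for three classes (NOT the study's data)
class_1 = [5, 6, 7, 8, 6]
class_2 = [6, 7, 5, 8, 7]
class_3 = [7, 6, 6, 8, 5]

f_stat, df_b, df_w = one_way_anova_f([class_1, class_2, class_3])
print(f"F({df_b}, {df_w}) = {f_stat:.3f}")
```

With three classes of 20 students each, as in the present study, the degrees of freedom would be 2 and 57. An F value below the critical value at the chosen significance level would indicate no significant difference among the classes.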

Table 1 Descriptive statistics of the listening section of the proficiency test

Means on Factor B (full score = 12)
Level	1	2	3	4
Mean	6.417	6.950	7.617	6.800
n	60	60	60	60

Means on Factor B (full score = 6)
Level	1	2	3	4
Mean	2.700	3.567	4.083	3.100
n	60	60	60	60
