Argument-based validation of a high-stakes listening test in Vietnam
More than a decade ago, the Vietnamese Government announced an educational reform to enhance the quality of English language education in the country. An important aspect of this reform is the introduction of a localized test of English proficiency that covers four language skills: listening, speaking, reading, and writing. This high-stakes test is developed and administered by only a limited number of institutions in Vietnam. Although the validity of the test is a considerable concern for test-takers and test score users, it remains under-researched. This study partly addresses the issue by validating a listening test developed by one of the authorized institutions. In this thesis, the test is referred to as the Locally Created Listening Test (LCLT). Using the argument-based approach to validation (Kane, 1992, 2013; Chapelle, 2008), this research aims to develop a validity argument for the evaluation, generalization, and explanation inferences of the LCLT. Three studies were carried out to elicit evidence to support these inferences. The first study investigated the statistical characteristics of LCLT scores, focusing on the evaluation and generalization inferences. The second study examined the extent to which test items engaged the target construct. The third study examined whether test-takers’ scores on the LCLT correlated well with their scores on an international English test that measured a similar construct. Both the second and third studies were carried out to support the explanation inference. Together, the three studies did not provide enough evidence to support the validity argument for the LCLT. The test was found to have major flaws that affected the validity of score interpretations. In light of these findings, suggestions are given for improving future LCLTs.
At the same time, this research helped to uncover the impact of certain text- and task-related factors on test-takers’ performance. These insights have practical implications for the assessment of second language listening in general. The results also contribute to the theory and practice of test localization, a relatively new paradigm in language testing and assessment.