A Study of Issues and Techniques for Creating Core Vocabulary Lists for English as an International Language
Core vocabulary lists have long been a tool used by language learners and instructors seeking to facilitate the initial stages of foreign language learning (Fries & Traver, 1960: 2). In the past, these lists were typically based on the intuitions of experienced educators. Even before the advent of computer technology in the mid-twentieth century, attempts were made to create such lists using objective methodologies. These efforts regularly fell short, however, and ultimately had to be adjusted subjectively. In the twenty-first century, this is unfortunately still true, at least for those lists whose methodologies have been published. Given the present availability of sizable English-language corpora from around the world and affordable personal computers, this thesis seeks to fill this methodological gap by answering the research question: How can valid core vocabulary lists for English as an International Language be created? A practical taxonomy of text types is proposed based on Biber’s (1988, 1995) multi-dimensional analysis of English texts. This taxonomy is based on correlated linguistic features and reasonably covers representative spoken and written texts in English. The four-part main study assesses the variance in vocabulary data within each of the four key text types: interactive (face-to-face conversation), academic exposition, imaginative narrative, and general reported exposition. The variation in word types found at progressive intervals in corpora of various sizes is measured using the Dice coefficient, a coefficient originally used to measure species variation in different biotic regions (Dice, 1945). The second study compares the most frequent vocabulary types in each of the four text types using an equal-sized collection of each text type. Of special interest is the difference between spoken and written texts.
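As an illustration of the measure named above, the Dice coefficient for two sets of word types A and B is commonly written as 2|A ∩ B| / (|A| + |B|). The following minimal sketch assumes that each corpus sample is reduced to its set of distinct word types; the function name and the toy samples are hypothetical and are not drawn from the thesis data or procedure.

```python
def dice_coefficient(types_a, types_b):
    """Return the Dice coefficient of two collections of word types.

    Assumption: each sample is treated as a set of distinct types,
    so token frequencies are ignored.
    """
    a, b = set(types_a), set(types_b)
    if not a and not b:
        # Two empty samples are treated as identical by convention here.
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


# Toy samples (hypothetical), tokenized by whitespace.
sample_a = "the cat sat on the mat".split()
sample_b = "the dog sat on the rug".split()

# Each sample has 5 distinct types and they share 3: 2 * 3 / (5 + 5).
print(dice_coefficient(sample_a, sample_b))
```

A value of 1 indicates identical sets of types and a value of 0 indicates no shared types, so values computed between successive corpus samples give one way to gauge how stable the vocabulary is.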
Though types are arguably the proper unit to investigate when comparing vocabulary variation, few learners would want to approach vocabulary learning one word type at a time (Nation & Meara, 2002; Bauer & Nation, 1993). The third study thus examines the effect that ordering words as families, rather than as types, has on core vocabulary lists. An analysis is made of the major differences resulting from grouping the members of each word family under a single headword and summing their individual frequencies. Methods for constructing core vocabulary lists of various sizes are then discussed based on the findings of these three studies. Recommendations are made, according to the learning objectives, regarding the size and composition of the source corpus and the methodology for extracting and constructing the core list.
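The grouping step described above, in which family members are collected under a single headword and their frequencies summed, can be sketched as follows. This is only an illustrative sketch: the family mapping, the counts, and the function name are hypothetical, and a real study would rely on an established word-family scheme rather than a hand-written dictionary.

```python
from collections import Counter


def family_frequencies(type_counts, family_of):
    """Sum type frequencies under each family headword.

    type_counts: mapping from word type to frequency.
    family_of:   mapping from word type to headword (hypothetical);
                 a type with no entry is treated as its own headword.
    """
    totals = Counter()
    for word_type, count in type_counts.items():
        headword = family_of.get(word_type, word_type)
        totals[headword] += count
    return totals


# Hypothetical frequencies for five word types.
type_counts = Counter({"the": 120, "go": 40, "use": 30, "used": 25, "useful": 10})
family_of = {"used": "use", "useful": "use"}

by_type = type_counts.most_common()
by_family = family_frequencies(type_counts, family_of).most_common()
print(by_type)
print(by_family)
```

In this toy data the headword "use" ranks below "go" when types are counted separately but above it once the family's frequencies are summed, which is the kind of reordering the third study examines.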