A Study of Issues and Techniques for Creating Core Vocabulary Lists for English as an International Language

Sorell, C. Joseph

doi:10.26686/wgtn.17005807.v1

thesis

posted on 2021-11-13, 21:23 authored by Sorell, C. Joseph

Core vocabulary lists have long been a tool used by language learners and instructors seeking to facilitate the initial stages of foreign language learning (Fries & Traver, 1960: 2). In the past, these lists were typically based on the intuitions of experienced educators. Even before the advent of computer technology in the mid-twentieth century, attempts were made to create such lists using objective methodologies. These efforts regularly fell short, however, and in the end had to be tweaked subjectively. Now, in the 21st century, this is unfortunately still true, at least for those lists whose methodologies have been published. Given the present availability of sizable English-language corpora from around the world and affordable personal computers, this thesis seeks to fill this methodological gap by answering the research question: How can valid core vocabulary lists for English as an International Language be created?
A practical taxonomy is proposed based on Biber’s (1988, 1995) multi-dimensional analysis of English texts. This taxonomy is based on correlated linguistic features and reasonably covers representative spoken and written texts in English. The first study, the four-part main study, assesses the variance in vocabulary data within each of the four key text types: interactive (face-to-face conversation), academic exposition, imaginative narrative, and general reported exposition. The variation in word types found at progressive intervals in corpora of various sizes is measured using the Dice coefficient, a coefficient originally used to measure species variation in different biotic regions (Dice, 1945). The second study proceeds to compare the most frequent vocabulary types in each of the four text types using an equal-sized collection of each text type. Of special interest is the difference between spoken and written texts.
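As an illustration of the measure named above, the following is a minimal sketch assuming the standard set-based form of the Dice coefficient, 2|A ∩ B| / (|A| + |B|), applied to the most frequent word types of two samples. The abstract does not state how samples or intervals were drawn, so the function names (`top_types`, `dice_coefficient`), the cut-off of three types, and the toy token lists are all assumptions made for this example and not the thesis's procedure.

```python
from collections import Counter


def top_types(tokens, n):
    """Return the set of the n most frequent word types in a token list."""
    counts = Counter(tokens)
    # Sort by descending frequency, then alphabetically, so ties break deterministically.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return {word for word, _ in ranked[:n]}


def dice_coefficient(types_a, types_b):
    """Dice coefficient of two sets: 2|A ∩ B| / (|A| + |B|)."""
    a, b = set(types_a), set(types_b)
    if not a and not b:
        return 1.0  # assumption: two empty sets are treated as identical
    return 2 * len(a & b) / (len(a) + len(b))


# Toy token lists standing in for two corpus samples (invented for illustration).
sample_a = "the cat sat on the mat".split()
sample_b = "the dog sat on the log".split()

overlap = dice_coefficient(top_types(sample_a, 3), top_types(sample_b, 3))
print(f"Dice overlap of the top-3 types: {overlap:.3f}")
```

A value of 1 means the two lists contain exactly the same types and 0 means they share none, so values closer to 1 across samples suggest a more stable list.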
Though types are arguably the proper unit to investigate when comparing vocabulary variation, few learners would want to approach vocabulary learning one word type at a time (Nation & Meara, 2002; Bauer & Nation, 1993). The third study thus examines how grouping words into families, rather than treating them as individual types, affects core vocabulary lists. An analysis is made of the major differences resulting from grouping the members of each word family under a single headword and summing their individual frequencies. Methods for constructing core vocabulary lists of various sizes are then discussed, based on the findings of these three studies. Recommendations are made regarding the size and composition of the source corpus, and regarding the methodology for extracting and constructing the core list, according to the learning objectives.
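The grouping step described above, placing family members under one headword and summing their frequencies, can be sketched as follows. This is only an illustration: the `family_of` mapping, the function name `family_frequencies`, and the counts are hypothetical, and the abstract does not say which word-family definition or level was applied.

```python
from collections import Counter


def family_frequencies(type_counts, family_of):
    """Sum type frequencies under each word-family headword.

    Types missing from the mapping are treated as their own headword
    (an assumption made for this sketch).
    """
    totals = Counter()
    for word, freq in type_counts.items():
        totals[family_of.get(word, word)] += freq
    return totals


# Invented type frequencies and a hypothetical type-to-headword mapping.
type_counts = {
    "the": 100,
    "development": 12,
    "develop": 10,
    "went": 9,
    "go": 8,
    "developed": 7,
}
family_of = {"developed": "develop", "development": "develop", "went": "go"}

by_family = family_frequencies(type_counts, family_of)
for headword, freq in by_family.most_common():
    print(headword, freq)
```

In this toy data, "development" outranks "went" as a type, and both "develop" and "go" rise once their family members are summed, which is the kind of reordering the third study compares.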

History

Copyright Date

2013-01-01

Date of Award

2013-01-01

Publisher

Te Herenga Waka—Victoria University of Wellington

Rights License

Author Retains Copyright

Degree Discipline

Applied Linguistics

Degree Grantor

Te Herenga Waka—Victoria University of Wellington

Degree Level

Doctoral

Degree Name

Doctor of Philosophy

ANZSRC Type Of Activity code

950201 Communication Across Languages and Culture

Victoria University of Wellington Item Type

Awarded Doctoral Thesis

Language

en_NZ

Victoria University of Wellington School

School of Linguistics and Applied Language Studies

Advisors

Nation, Paul; Macalister, John

Keywords

Core vocabulary; Corpus linguistics; Zipf's law; School: School of Linguistics and Applied Language Studies; 200401 Applied Linguistics and Educational Linguistics; 950201 Communication Across Languages and Culture; Degree Discipline: Applied Linguistics; Degree Level: Doctoral; Degree Name: Doctor of Philosophy; Applied Linguistics and Educational Linguistics
