In What Order Should Learners Learn Japanese Vocabulary? A Corpus-based Approach
This thesis attempts to answer two main research questions: 1) In what order should learners of Japanese as a second language learn words and characters in order to be able to read Japanese? 2) How does this order vary according to the purpose of learning? To answer these questions, a Vocabulary Database for Reading Japanese (VDRJ) and a Character Database of Japanese (CDJ) were first developed from the 2009 monitor version of the Balanced Corpus of Contemporary Written Japanese (BCCWJ) (NINJAL, 2009). This corpus contains book texts and internet-forum texts totalling 33 million running words. The databases include word and character rankings for international students, for non-academic learners and for general written Japanese. Each ranking was shown to be valid for its purpose, because it provided higher text coverage for its target texts than for other texts. An analysis of how vocabulary and characters are used in Japanese then yielded three groups of domain-specific words: common academic words, limited-academic-domain words and literary words. To estimate how efficiently each group could be learned, an index called Text Covering Efficiency (TCE) was proposed and calculated for different types of texts. The TCE represents the expected return, per unit of text length, from learning a group of words. Accordingly, the TCE scores in a target text domain should indicate the order in which words for that domain are most efficiently learned. Indeed, the extracted common academic words and limited-academic-domain words showed significantly higher text coverage and TCE scores in academic texts than in other texts. Literary words also provided high text coverage and high TCE scores in literary texts, although their efficiency was lower than that of academic vocabulary in academic texts. At the intermediate level, learning domain-specific words is expected to be much more efficient than learning other words.
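Text coverage, the measure used above to validate the rankings and compare word groups across domains, is conventionally the proportion of running words (tokens) in a text that belong to a given word list. The sketch below shows that calculation only. It does not reproduce the TCE formula, which the abstract does not spell out. It assumes the texts are already segmented into word tokens, for example by a morphological analyser, and it ignores lemmatisation. The word group, the texts and the names `text_coverage`, `academic_words`, `academic_text` and `novel_text` are all hypothetical toy illustrations and do not come from the BCCWJ.

```python
def text_coverage(tokens, known_words):
    """Return the share of running words in `tokens` found in `known_words`.

    Assumes `tokens` is a pre-segmented list of word tokens (hypothetical input;
    real corpora need morphological analysis and lemmatisation first).
    """
    if not tokens:
        return 0.0
    covered = sum(1 for t in tokens if t in known_words)
    return covered / len(tokens)


# Hypothetical toy word group and texts, for illustration only.
academic_words = {"研究", "結果", "分析", "データ"}
academic_text = ["本", "研究", "の", "結果", "を", "分析", "する", "。"]
novel_text = ["彼女", "は", "研究", "室", "で", "泣い", "た", "。"]

for name, text in [("academic", academic_text), ("novel", novel_text)]:
    # The same word group covers a larger share of the target-domain text.
    print(name, text_coverage(text, academic_words))
```

In this toy case the academic word group covers 3 of 8 tokens in the academic text but only 1 of 8 in the novel text. This is the kind of contrast between target and non-target texts that the abstract uses as evidence of a ranking's validity.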
At the advanced level or above, learning domain-specific words will be even more efficient in some domains, such as the natural sciences. In sum, the TCE was shown to provide useful information for deciding the learning order of various groups of words. Other findings from the analyses of the databases and word lists concern the properties of several dispersion and adjusted-frequency indices, the lexical features of different media and genres, the indexicality of the distributions of word origins and parts of speech, and the discrepancy between the learning orders of words and of Kanji. Finally, a Lexical Learning Possibility Index for a Reading Text (LEPIX) was proposed for simplifying a text so that it can serve as a vocabulary learning resource.
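The abstract mentions indices for dispersion and adjusted frequency but does not name them. One widely used example is Juilland's D combined with the usage coefficient U = F × D. It is shown here only as an illustration of this family of indices and is not necessarily the variant the thesis uses. D is computed from a word's frequencies in n equally sized corpus parts as D = 1 − V / √(n − 1), where V is the coefficient of variation (population standard deviation divided by mean) of those sub-frequencies. The function names below are hypothetical.

```python
import math
from statistics import mean, pstdev


def juilland_d(subfreqs):
    """Juilland's D for frequencies in n equally sized corpus parts.

    Returns 1.0 for a perfectly even spread and 0.0 when every
    occurrence falls in a single part.
    """
    n = len(subfreqs)
    if n < 2:
        raise ValueError("need at least two equally sized corpus parts")
    m = mean(subfreqs)
    if m == 0:
        return 0.0  # word absent everywhere: treat as undispersed
    v = pstdev(subfreqs) / m  # coefficient of variation (population SD)
    return 1 - v / math.sqrt(n - 1)


def juilland_u(subfreqs):
    """Usage coefficient U = total frequency F multiplied by D (an adjusted frequency)."""
    return sum(subfreqs) * juilland_d(subfreqs)


# Three hypothetical words, each with total frequency 20 across four parts.
for parts in ([5, 5, 5, 5], [8, 4, 4, 4], [20, 0, 0, 0]):
    print(parts, round(juilland_d(parts), 3), round(juilland_u(parts), 3))
```

The three words have the same raw frequency (20), but their adjusted frequencies fall from 20 to 16 to 0 as their occurrences become concentrated in fewer parts. Adjusted frequency of this kind is one way to rank words that occur frequently across the corpus above words that are frequent in only a few texts.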