Open Access Te Herenga Waka-Victoria University of Wellington

Quantifying Substitutability

posted on 2021-11-15, 02:27 authored by Wang, David X.

In this thesis, we tackle the problem of how keyphrase extraction systems can be evaluated so as to reveal their true efficacy. The aim is to develop a new, semantically oriented approximate string matching criterion, one that is comparable to human judgements but without the cost and effort of manual evaluation. This matching criterion can also be adapted to any information retrieval (IR) system whose evaluation involves comparing candidate strings (produced by the IR system) against a gold standard (created by humans). Our contributions are threefold. First, we define a new semantic relationship called substitutability (how suitable a phrase is when used in place of another), and then design a generic system that measures, or quantifies, this relationship by exploiting the interlinking structure of external knowledge sources. Second, we develop two concrete substitutability systems based on our generic design: WordSub, which is backed by WordNet, and WikiSub, which is backed by Wikipedia. Third, with the help of human volunteers, we construct a dataset that isolates the task of measuring substitutability. We then use this dataset to evaluate our substitutability systems, along with existing approximate string matching techniques, by comparing them with a set of agreement metrics. Our results clearly demonstrate that WordSub and WikiSub comfortably outperform current approaches to approximate string matching, including both lexical methods, such as R-precision, and semantically oriented techniques, such as METEOR. In fact, WikiSub's performance comes close to that of an average human volunteer when measured against the optimistic (best-case) inter-human agreement.
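The evaluation setting described in the abstract, in which candidate keyphrases are scored against a human-made gold standard under some matching criterion, can be sketched in Python. This is a minimal illustration under stated assumptions, not the thesis's implementation: the function names `evaluate`, `exact_match` and `substitutability_match`, the greedy one-to-one matching, the 0.5 threshold and the toy score table are all hypothetical. A real system such as WordSub or WikiSub would compute its scores from WordNet or Wikipedia rather than read them from a hand-written table.

```python
from typing import Callable, Dict, List, Tuple

# A matching criterion decides whether a candidate phrase counts as a hit for a gold phrase.
MatchFn = Callable[[str, str], bool]


def normalise(phrase: str) -> str:
    """Lower-case and collapse internal whitespace."""
    return " ".join(phrase.lower().split())


def exact_match(candidate: str, gold: str) -> bool:
    """Lexical baseline: string equality after normalisation."""
    return normalise(candidate) == normalise(gold)


# HYPOTHETICAL substitutability scores in [0, 1], keyed by (candidate, gold).
# A real substitutability system would derive these from a knowledge source;
# this table only stands in for one so the example runs on its own.
TOY_SCORES: Dict[Tuple[str, str], float] = {
    ("automobile", "car"): 0.9,
    ("vehicle", "car"): 0.4,
}


def substitutability_match(threshold: float = 0.5) -> MatchFn:
    """Build a criterion that accepts a candidate whose score reaches `threshold`."""
    def match(candidate: str, gold: str) -> bool:
        if exact_match(candidate, gold):
            return True
        key = (normalise(candidate), normalise(gold))
        return TOY_SCORES.get(key, 0.0) >= threshold
    return match


def evaluate(candidates: List[str], gold: List[str], match: MatchFn) -> Tuple[float, float, float]:
    """Precision, recall and F1 of candidate keyphrases against a gold standard.

    Matching is greedy and one-to-one: each gold phrase can be claimed by at
    most one candidate, with candidates scanned in the order given."""
    remaining = list(gold)
    hits = 0
    for cand in candidates:
        for i, g in enumerate(remaining):
            if match(cand, g):
                hits += 1
                del remaining[i]  # safe: we break out of the inner loop immediately
                break
    precision = hits / len(candidates) if candidates else 0.0
    recall = hits / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    cands = ["automobile", "engine"]
    gold = ["car", "fuel economy"]
    print(evaluate(cands, gold, exact_match))               # (0.0, 0.0, 0.0)
    print(evaluate(cands, gold, substitutability_match()))  # (0.5, 0.5, 0.5)
```

Under exact matching the semantically apt candidate "automobile" earns no credit; the score-based criterion accepts it, while "vehicle" (toy score 0.4) would still be rejected at the default threshold. The point of the sketch is only that the matching criterion is a pluggable component of the evaluation, which is the slot a substitutability system is meant to fill.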


Copyright Date


Date of Award



Te Herenga Waka—Victoria University of Wellington

Rights License

Author Retains Copyright

Degree Discipline

Computer Science

Degree Grantor

Te Herenga Waka—Victoria University of Wellington

Degree Level


Degree Name

Master of Science

ANZSRC Type Of Activity code

970108 Expanding Knowledge in the Information and Computing Sciences

Victoria University of Wellington Item Type

Awarded Research Masters Thesis



Victoria University of Wellington School

School of Engineering and Computer Science


Gao, Xiaoyang; Andreae, Peter