Applications of the Bures-Wasserstein Distance in Linear Metric Learning
Metric Learning has proved valuable in information retrieval and classification problems, with many algorithms exploiting statistical formulations to their advantage. Optimal Transport, a powerful framework for computing distances between probability distributions, has seen significant uptake in Machine Learning and some applications in Metric Learning. The Bures-Wasserstein distance, a particular formulation of Optimal Transport, has seen limited use in Metric Learning despite being well suited to the field. In this thesis, Linear Metric Learning algorithms incorporating the Bures-Wasserstein distance are developed, exploiting its distinctive properties. The first set of contributed algorithms learns linear metrics with the Bures-Wasserstein distance via Cyclic Projections, Riemannian barycenters and gradient-based methods. The convergence properties and classification performance of these algorithms are then compared to benchmark Linear Metric Learning algorithms. Our algorithms achieve competitive classification performance on feature and image datasets and converge reliably.
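For orientation, the Bures-Wasserstein distance between two symmetric positive semi-definite matrices $\Sigma_1$ and $\Sigma_2$ (equivalently, the 2-Wasserstein distance between centred Gaussians with those covariances) takes the standard form below; the symbols $\Sigma_1$, $\Sigma_2$, $M$, $x$ and $y$ are generic placeholders rather than the thesis's notation.
\[
d_{\mathrm{BW}}(\Sigma_1, \Sigma_2) = \Big[ \operatorname{tr}(\Sigma_1) + \operatorname{tr}(\Sigma_2) - 2 \operatorname{tr}\big( (\Sigma_1^{1/2} \Sigma_2 \Sigma_1^{1/2})^{1/2} \big) \Big]^{1/2},
\qquad
d_M(x, y) = \sqrt{(x - y)^\top M (x - y)},
\]
where the right-hand expression is the usual linear (Mahalanobis) metric on feature vectors, parametrised by a learned positive semi-definite matrix $M$.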
The second set of contributions extends Linear Metric Learning with the Bures-Wasserstein distance to a low-rank setting to address issues of scalability. An extension of the earlier Cyclic Projections algorithm and a Riemannian optimisation method are developed. Convergence and classification experiments analogous to the full-rank case are conducted to examine how the reduced rank affects both measures of performance. Both algorithms converge reliably under a variety of synthetic conditions. Reducing the rank generally decreased training time until effects caused by the size of the training data dominated. Reducing the rank either left classification error unchanged or increased it. Compared with other low-rank Linear Metric Learning algorithms, our algorithms again achieved competitive classification error.
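As a sketch of the low-rank setting (a standard parametrisation in the Linear Metric Learning literature, not necessarily the exact one developed here), the metric matrix can be factored as $M = L L^\top$ with $L \in \mathbb{R}^{d \times k}$ and $k < d$, so that
\[
d_M(x, y)^2 = (x - y)^\top L L^\top (x - y) = \lVert L^\top (x - y) \rVert_2^2,
\]
reducing storage from $O(d^2)$ to $O(dk)$ and turning metric evaluation into a Euclidean distance after the linear map $x \mapsto L^\top x$.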
The last set of contributions considers Linear Metric Learning on Symmetric Positive Definite (SPD) matrices. Some datasets are expressed as, or are well suited to, an SPD representation, and Metric Learning algorithms that respect the structure of these matrices have found success when adapted to such data. Another extension of the Cyclic Projections algorithm is developed to use the Generalised Bures-Wasserstein distance on SPD matrices. Two further methods, based on a ground metric formulation of the Bures-Wasserstein distance, are introduced for learning linear metrics on SPD data. One of these uses an approximation inspired by the Sinkhorn distance to improve scalability. Convergence experiments on synthetic problems adapted to SPD matrices establish the convergence properties of the ground metric algorithms. Classification experiments are also conducted on several types of SPD data. While the Cyclic Projections method is competitive, the ground metric algorithms perform poorly. Despite its poor classification performance, the approximation method evaluates faster than the Generalised Bures-Wasserstein distance, offering potential scalability improvements.
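For context on the approximation mentioned above, the Sinkhorn distance arises from entropically regularising the discrete Optimal Transport problem; its standard form (not the specific approximation developed in this thesis) is
\[
\mathrm{OT}_\varepsilon(a, b) = \min_{P \in U(a, b)} \langle P, C \rangle - \varepsilon H(P),
\qquad
U(a, b) = \{ P \ge 0 : P \mathbf{1} = a, \; P^\top \mathbf{1} = b \},
\]
where $C$ is a ground cost matrix, $H(P) = -\sum_{ij} P_{ij} (\log P_{ij} - 1)$ is an entropic term, and the regularisation permits fast matrix-scaling (Sinkhorn) iterations whose per-iteration cost grows more gently than that of exact Optimal Transport solvers.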