Meta-Learning Loss Functions for Deep Neural Networks
Humans can often quickly and efficiently solve complex new learning tasks given only a small set of examples. In contrast, modern artificially intelligent systems often require thousands or millions of observations to solve even the most basic tasks. Meta-learning aims to resolve this issue by leveraging past experiences from similar learning tasks to embed the appropriate inductive biases into the learning system. Historically, methods for meta-learning components of the learning system, such as the optimizer and the parameter initialization, have led to significant performance gains. This thesis explores meta-learning through an often-overlooked component of the learning system: the loss function. The loss function is a vital component of a learning system, as it represents the primary learning objective; success is determined and quantified by the system's ability to optimize that objective.

In this thesis, we develop methods for meta-learning the loss functions of deep neural networks. In particular, we first introduce a method for meta-learning symbolic model-agnostic loss functions, called Evolved Model-Agnostic Loss (EvoMAL). This method consolidates recent advancements in loss function learning and enables the discovery of interpretable loss functions on commodity hardware. Through empirical and theoretical analysis, we uncover patterns in the learned loss functions, which subsequently inspire the development of Sparse Label Smoothing Regularization (SparseLSR), a significantly faster and more memory-efficient way to perform label smoothing regularization. Second, we challenge the conventional notion that a loss function must be a static function by developing Adaptive Loss Function Learning (AdaLFL), a method for meta-learning adaptive loss functions. Lastly, we develop Neural Procedural Bias Meta-Learning (NPBML), a task-adaptive few-shot learning method that simultaneously meta-learns the parameter initialization, optimizer, and loss function.
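
For reference, the following is a minimal PyTorch sketch of standard label smoothing regularization (Szegedy et al., 2016), the technique that SparseLSR accelerates. The function name and implementation here are purely illustrative assumptions for exposition; this is the conventional dense formulation, not the thesis's SparseLSR method.

    import torch
    import torch.nn.functional as F

    def label_smoothing_cross_entropy(logits, targets, alpha=0.1):
        # Smooth each one-hot target toward the uniform distribution:
        # y_smooth = (1 - alpha) * y_onehot + alpha / K  (Szegedy et al., 2016).
        num_classes = logits.size(-1)
        log_probs = F.log_softmax(logits, dim=-1)
        # Materializing the dense smoothed targets costs O(K) memory per
        # example -- the kind of overhead a sparse formulation can avoid.
        smooth = torch.full_like(log_probs, alpha / num_classes)
        smooth.scatter_(-1, targets.unsqueeze(-1), 1.0 - alpha + alpha / num_classes)
        # Cross-entropy between the smoothed targets and the predictions.
        return -(smooth * log_probs).sum(dim=-1).mean()

    # Example usage on a random batch of 4 examples over 10 classes.
    logits = torch.randn(4, 10)
    targets = torch.randint(0, 10, (4,))
    loss = label_smoothing_cross_entropy(logits, targets)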