Open Access Te Herenga Waka-Victoria University of Wellington

Backpropagation-free learning with an information surrogate

posted on 2024-01-08, 02:03 authored by Wan-Duo Ma

This thesis explores modern deep neural networks from an information-theoretical point of view. The main contribution of the thesis is a new and potentially more efficient training framework alternative to conventional end-to-end backpropagation training. The thesis develops and analyzes several architectures using this framework, including a new and more biologically plausible learning architecture.

To address the computational difficulties of dealing with information-theoretical quantities, this thesis turns to an existing statistical technique, the Hilbert-Schmidt independence criterion (HSIC). HSIC is a non-parametric kernel method for characterizing the (in)dependence of random variables. In this thesis, HSIC is used to computationally formulate and explore the information bottleneck principle. The information bottleneck can be seen as a trade-off in the hidden representation between the information needed for predicting the task-specific target and the information retained about the input. The thesis explores the information bottleneck and its HSIC formulation through the following ideas:
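The standard biased empirical HSIC estimator can be sketched in a few lines of NumPy; the Gaussian-kernel bandwidth `sigma` below is an illustrative choice, not a value taken from the thesis:

```python
import numpy as np

def rbf_kernel(X, sigma=1.0):
    """Gaussian (RBF) kernel matrix over the rows of X."""
    sq = np.sum(X ** 2, axis=1)
    dists = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-dists / (2.0 * sigma ** 2))

def hsic(X, Y, sigma=1.0):
    """Biased empirical HSIC estimate: tr(K H L H) / (m - 1)^2."""
    m = X.shape[0]
    K = rbf_kernel(X, sigma)
    L = rbf_kernel(Y, sigma)
    H = np.eye(m) - np.ones((m, m)) / m  # centering matrix
    return float(np.trace(K @ H @ L @ H)) / (m - 1) ** 2
```

Dependent variables yield a larger estimate than independent ones, which is what allows HSIC to serve as a computable surrogate for mutual information.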

Blind facial basis discovery. The use of HSIC as an approximate measure of independence is explored through a small problem resembling Independent Component Analysis (ICA) or blind basis separation. Three-dimensional computer avatars are often implemented as a parametric model, where the parameters that control the facial expression are defined as part of a so-called blendshape system. However, high-quality avatar models are constructed by laborious manual digital sculpting. The proposed method uses HSIC as an ICA-like criterion to discover a distinct facial basis from a given facial animation. The results show that an ICA criterion can be simply and effectively implemented using HSIC regularization, and the visual results confirm that the proposed method generates a distinct facial basis. The use of HSIC in this chapter is then adopted in the remaining chapters as the mutual-information surrogate.

HSIC-bottleneck. The HSIC-bottleneck is an alternative to conventional cross-entropy loss and backpropagation that has a number of distinct advantages. The network is trained with fully localized bottleneck objectives that break the need for end-to-end training. Additionally, the network's output representation can be used directly for classification regardless of its dimensionality. This thesis shows that a very deep neural network without skip connections is learnable with fully localized objectives in the HSIC-bottleneck framework, avoiding the learning difficulties often seen when backpropagation is applied to very deep networks without skip connections. The HSIC-bottleneck is the backbone concept of this thesis; it is extended and applied to several research problems in the following chapters.
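The localized objective can be sketched as follows: each hidden representation Z is trained to minimize its dependence on the input X while maximizing its dependence on the target Y, with no gradient flowing between layers. The balance weight `beta` and the kernel bandwidth here are illustrative assumptions, not values from the thesis:

```python
import numpy as np

def _hsic(X, Y, sigma=1.0):
    """Biased empirical HSIC with Gaussian kernels."""
    def gram(A):
        sq = np.sum(A ** 2, axis=1)
        d2 = sq[:, None] + sq[None, :] - 2.0 * A @ A.T
        return np.exp(-d2 / (2.0 * sigma ** 2))
    m = X.shape[0]
    H = np.eye(m) - np.ones((m, m)) / m  # centering matrix
    return float(np.trace(gram(X) @ H @ gram(Y) @ H)) / (m - 1) ** 2

def layer_bottleneck_loss(Z, X, Y, beta=2.0):
    """Local objective for one hidden layer: retain little information
    about the input X and much about the target Y. Each layer minimizes
    this independently, so no end-to-end backpropagation is needed."""
    return _hsic(Z, X) - beta * _hsic(Z, Y)
```

Because each layer has its own scalar objective, the per-layer updates can in principle be computed in isolation, which is what removes the requirement of propagating gradients through the whole network.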

HSIC-subsampling. HSIC-subsampling provides an efficient sampling methodology to accelerate HSIC computation. By taking advantage of stochastic minibatch gradient descent learning, HSIC-subsampling is capable of approximating the entire HSIC computation after a few training iterations. It can be directly applied to various objectives that involve HSIC computation such as HSIC-bottleneck.
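The idea can be sketched as estimating HSIC on a random subset of each minibatch, so the quadratic kernel cost applies only to the subset; over many stochastic training iterations the subsampled estimates average toward the full computation. The subset size `n_sub` is an illustrative parameter, not one specified in the abstract:

```python
import numpy as np

def hsic_full(X, Y, sigma=1.0):
    """Biased empirical HSIC with Gaussian kernels (cost O(m^2))."""
    def gram(A):
        sq = np.sum(A ** 2, axis=1)
        d2 = sq[:, None] + sq[None, :] - 2.0 * A @ A.T
        return np.exp(-d2 / (2.0 * sigma ** 2))
    m = X.shape[0]
    H = np.eye(m) - np.ones((m, m)) / m
    return float(np.trace(gram(X) @ H @ gram(Y) @ H)) / (m - 1) ** 2

def hsic_subsampled(X, Y, n_sub=64, sigma=1.0, rng=None):
    """Estimate HSIC on a random subset of rows, reducing the kernel
    cost from O(m^2) to O(n_sub^2) per training iteration."""
    rng = np.random.default_rng() if rng is None else rng
    m = X.shape[0]
    idx = rng.choice(m, size=min(n_sub, m), replace=False)
    return hsic_full(X[idx], Y[idx], sigma)
```

A drop-in replacement like this can serve any HSIC-based objective, including the HSIC-bottleneck loss, since only the dependence estimate itself changes.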

Predictive bottleneck. The predictive bottleneck is an extension of the HSIC-bottleneck idea. Rather than using an accurate (and potentially somewhat expensive) dependency measure such as HSIC in the objective, the predictive bottleneck uses lightweight auxiliary networks to approximate the localized information bottleneck objective. Furthermore, the predictive bottleneck demonstrates the ability to extract relevant information from the training input that cannot be easily discovered by traditional learning. Two prominent results are demonstrated empirically. First, a gray-scale dataset is manipulated by embedding the most significant bits into the least significant bits and replacing the most significant bits with Gaussian noise; the predictive bottleneck recognizes the embedded information without memorizing the noise. Second, the predictive bottleneck reduces the network's sensitivity to noise: the experiments show that, under the predictive bottleneck, noise has less impact on network performance.

Biological information bottleneck (BioiB). BioiB rethinks supervision in deep neural network learning. It demonstrates that it is sufficient to expose relevant task information at an intermediate layer rather than at the output layer, as in traditional supervised learning. Given suitable input features, local InfoMax and biological Barlow-like principles are sufficient for the unsupervised emergence of disentangled concepts suitable for classification, without output supervision. BioiB shows improved learning speed and accuracy compared to existing biologically motivated methods on benchmark datasets. Practically, the experiments show that existing neural models such as VGG16, VGG19, and ResNet32 can be adapted to the BioiB framework, with performance comparable to or exceeding that of backpropagation and existing biologically plausible alternatives.

In summary, this thesis not only introduces a computationally attractive class of approaches to information-theoretic learning in deep networks, but also demonstrates the performance of these methods on well-known deep architectures and public datasets. The work presented here should therefore be of interest both to the research community and to industrial production applications.

History

Copyright Date

2022-10-14

Date of Award

2022-10-14

Publisher

Te Herenga Waka—Victoria University of Wellington

Rights License

Author Retains Copyright

Degree Discipline

Computer Science

Degree Grantor

Te Herenga Waka—Victoria University of Wellington

Degree Level

Doctoral

Degree Name

Doctor of Philosophy

ANZSRC Type Of Activity code

2 Strategic basic research

Victoria University of Wellington Item Type

Awarded Doctoral Thesis

Language

en_NZ

Alternative Language

en

Victoria University of Wellington School

School of Engineering and Computer Science

Advisors

Lewis, John; Kleijn, Bastiaan