Sample-efficient Optimization Using Neural Networks
The solution to many science and engineering problems includes identifying the minimum or maximum of an unknown continuous function whose evaluation inflicts non-negligible costs in terms of resources such as money, time, human attention or computational processing. In such a case, the choice of new points to evaluate is critical. A successful approach has been to choose these points by considering a distribution over plausible surfaces, conditioned on all previous points and their evaluations. In this sequential bi-step strategy, also known as Bayesian Optimization, first a prior is defined over possible functions and updated to a posterior in the light of available observations. Then using this posterior, namely the surrogate model, an infill criterion is formed and utilized to find the next location to sample from. By far the most common prior distribution and infill criterion are Gaussian Process and Expected Improvement, respectively. The popularity of Gaussian Processes in Bayesian optimization is partially due to their ability to represent the posterior in closed form. Nevertheless, the Gaussian Process is afflicted with several shortcomings that directly affect its performance. For example, inference scales poorly with the amount of data, numerical stability degrades with the number of data points, and strong assumptions about the observation model are required, which might not be consistent with reality. These drawbacks encourage us to seek better alternatives. This thesis studies the application of Neural Networks to enhance Bayesian Optimization. It proposes several Bayesian optimization methods that use neural networks either as their surrogates or in the infill criterion. This thesis introduces a novel Bayesian Optimization method in which Bayesian Neural Networks are used as a surrogate. This has reduced the computational complexity of inference in surrogate from cubic (on the number of observation) in GP to linear. Different variations of Bayesian Neural Networks (BNN) are put into practice and inferred using a Monte Carlo sampling. The results show that Monte Carlo Bayesian Neural Network surrogate could performed better than, or at least comparably to the Gaussian Process-based Bayesian optimization methods on a set of benchmark problems. This work develops a fast Bayesian Optimization method with an efficient surrogate building process. This new Bayesian Optimization algorithm utilizes Bayesian Random-Vector Functional Link Networks as surrogate. In this family of models the inference is only performed on a small subset of the entire model parameters and the rest are randomly drawn from a prior. The proposed methods are tested on a set of benchmark continuous functions and hyperparameter optimization problems and the results show the proposed methods are competitive with state-of-the-art Bayesian Optimization methods. This study proposes a novel Neural network-based infill criterion. In this method locations to sample from are found by minimizing the joint conditional likelihood of the new point and parameters of a neural network. The results show that in Bayesian Optimization methods with Bayesian Neural Network surrogates, this new infill criterion outperforms the expected improvement. Finally, this thesis presents order-preserving generative models and uses it in a variational Bayesian context to infer Implicit Variational Bayesian Neural Network (IVBNN) surrogates for a new Bayesian Optimization. This new inference mechanism is more efficient and scalable than Monte Carlo sampling. The results show that IVBNN could outperform Monte Carlo BNN in Bayesian optimization of hyperparameters of machine learning models.