Evolutionary Computation for Designing Deep Recurrent Neural Networks
Recurrent Neural Networks (RNNs) are a major class of Artificial Neural Networks (ANNs). A suitable network architecture is vital for RNNs to achieve high performance. However, designing the architecture of deep RNNs (DRNNs) is a complicated and time-consuming process. Each layer of a DRNN contains numerous hyper-parameters, and it is challenging to optimize the hyper-parameters of all layers jointly to achieve the best possible learning performance. Trial-and-error methods for designing DRNNs have proven laborious and costly in practice. Therefore, an efficient automatic architecture search technique is needed to design DRNNs.
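To make the scale of the problem concrete, the following Python sketch (all candidate hyper-parameter values are hypothetical) shows how the joint search space of a DRNN grows exponentially with depth, even with only a handful of choices per layer.

```python
# A toy illustration (candidate values are hypothetical) of how the joint
# hyper-parameter space of a DRNN grows with depth.

# Assumed candidate values for a few common per-layer hyper-parameters.
LAYER_CHOICES = {
    "hidden_units": [32, 64, 128, 256],
    "activation": ["tanh", "sigmoid", "relu"],
    "dropout": [0.0, 0.2, 0.5],
}

def configs_per_layer(choices):
    """Number of distinct configurations for a single layer."""
    n = 1
    for values in choices.values():
        n *= len(values)
    return n

per_layer = configs_per_layer(LAYER_CHOICES)  # 4 * 3 * 3 = 36
for depth in (1, 3, 5):
    # The joint space grows exponentially: per_layer ** depth candidates.
    print(f"depth {depth}: {per_layer ** depth} candidate architectures")
```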
The Genetic Algorithm (GA) is a popular evolutionary computation approach, and existing research has already explored the use of GAs in designing ANNs. However, most existing approaches focus on the design of fixed-depth ANNs. It is difficult to fix the depth of a DRNN in advance, since different problems require DRNNs of different depths. Hence, the potential of GAs to evolve DRNN architectures of varying depths remains to be explored; a variable-length genome, as sketched below, is one natural encoding for this.
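The following Python sketch is a minimal illustration, not any algorithm proposed in this thesis: it shows how a GA can encode architectures as variable-length genomes so that depth itself is subject to evolution. The fitness function is a random placeholder standing in for trained-network validation performance.

```python
# Minimal variable-depth GA sketch; fitness is a stand-in for the expensive
# train-and-validate step of real neural architecture search.
import random

UNITS = [32, 64, 128, 256]
ACTS = ["tanh", "sigmoid", "relu"]

def random_genome(max_depth=6):
    """A genome is a variable-length list of (units, activation) layers."""
    depth = random.randint(1, max_depth)
    return [(random.choice(UNITS), random.choice(ACTS)) for _ in range(depth)]

def mutate(genome):
    """Grow, shrink, or tweak the genome, so depth itself evolves."""
    g = list(genome)
    op = random.choice(["grow", "shrink", "tweak"])
    if op == "grow":
        g.insert(random.randrange(len(g) + 1),
                 (random.choice(UNITS), random.choice(ACTS)))
    elif op == "shrink" and len(g) > 1:
        g.pop(random.randrange(len(g)))
    else:
        i = random.randrange(len(g))
        g[i] = (random.choice(UNITS), random.choice(ACTS))
    return g

def crossover(a, b):
    """One-point crossover that works on genomes of different lengths."""
    ca, cb = random.randint(1, len(a)), random.randint(1, len(b))
    return a[:ca] + b[cb:]

def fitness(genome):
    # Placeholder: a real system would train the network and return
    # validation performance.
    return random.random()

population = [random_genome() for _ in range(20)]
for _ in range(10):
    population.sort(key=fitness, reverse=True)
    parents = population[:10]
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(10)]
    population = parents + children
```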
The primary goal of this thesis is to develop advanced GA approaches for effectively and efficiently designing Long Short-Term Memory (LSTM) based DRNN architectures of varying depths.
Firstly, this thesis proposes a GA-based algorithm called the Two-Stage Surrogate-Assisted GA (TS-SA-GA), which uses a progressive incremental strategy to design LSTM network architectures of varying depths. The progressive approach effectively extends well-designed shallow networks into high-performing deep networks. Moreover, the algorithm adopts newly designed knowledge-driven crossover and mutation operators to identify and repair LSTM network designs affected by inappropriate use of activation functions, significantly reducing the chance that the GA evolves hard-to-train LSTM architectures. Furthermore, this thesis proposes a two-stage surrogate method that predicts the trainability and fitness of LSTM networks, improving both the efficiency and the effectiveness of the GA in evolving LSTM networks.
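The sketch below illustrates, in simplified form and with entirely hypothetical operators and surrogates, the two ideas combined in TS-SA-GA: a cheap first-stage filter rejects architectures predicted to be untrainable before any second-stage fitness estimate is spent on them, and the best networks evolved at depth d are extended by one layer to seed the search at depth d + 1.

```python
# Illustrative progressive + two-stage-surrogate loop; not the TS-SA-GA
# implementation. Both surrogate stages below are toy stand-ins.
import random

UNITS = [32, 64, 128, 256]
ACTS = ["tanh", "sigmoid", "relu"]

def random_layer():
    return (random.choice(UNITS), random.choice(ACTS))

def cheap_surrogate(genome):
    # Stage 1 (hypothetical rule): reject architectures predicted to be
    # untrainable, here a toy penalty on stacks of saturating activations.
    saturating = sum(1 for _, act in genome if act in ("tanh", "sigmoid"))
    return saturating < len(genome)

def fitness(genome):
    # Stage 2: stand-in for a surrogate prediction of validation performance.
    return random.random()

def evolve_at_depth(seeds, generations=5, pop_size=12):
    """Evolve fixed-depth genomes, filtering with the cheap surrogate first."""
    population = list(seeds)
    while len(population) < pop_size:
        population.append([random_layer() for _ in range(len(seeds[0]))])
    for _ in range(generations):
        survivors = [g for g in population if cheap_surrogate(g)] or population
        survivors.sort(key=fitness, reverse=True)
        population = survivors[: pop_size // 2]
        # Children: keep the prefix, resample the last layer.
        population += [g[:-1] + [random_layer()] for g in population]
    return population

# Progressive deepening: depth-d results seed the search at depth d + 1.
population = [[random_layer()] for _ in range(12)]
for depth in range(1, 4):
    population = evolve_at_depth(population)
    population = [g + [random_layer()] for g in population]
```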
Secondly, this thesis proposes a new algorithm called Evolutionary Design of LSTM Ensembles (ED-LSTM-Ensemble), based on multi-objective optimization techniques, which evolves LSTM networks and ensembles simultaneously so that ensemble performance directly drives the evolution of the base LSTM networks. Additionally, this thesis proposes a connection weight inheritance strategy to evolve LSTM networks and ensembles efficiently and effectively.
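As a simplified illustration (not the ED-LSTM-Ensemble algorithm itself, whose operators and objectives are defined in the thesis), the Python sketch below assigns each base network two objectives: its own accuracy and the best accuracy of any small ensemble containing it, so that selection pressure reflects ensemble quality as well as individual quality.

```python
# Toy two-objective scoring of base networks by ensemble membership.
import random
from itertools import combinations

def ensemble_accuracy(members):
    # Placeholder: a real ensemble would combine member predictions; here a
    # toy diversity bonus rewards mixing different activation functions.
    accs = [m["acc"] for m in members]
    diversity = len({m["act"] for m in members}) / len(members)
    return sum(accs) / len(accs) + 0.05 * diversity

# Toy population: each "network" carries a stand-in accuracy and one
# architectural trait (its activation function).
population = [{"acc": random.random(), "act": random.choice(["tanh", "relu"])}
              for _ in range(8)]

# Two objectives per network: individual accuracy, and the best accuracy of
# any 3-member ensemble that includes it.
for net in population:
    best_ensemble = max(
        ensemble_accuracy(trio)
        for trio in combinations(population, 3)
        if any(member is net for member in trio)
    )
    net["objectives"] = (net["acc"], best_ensemble)

print(max(population, key=lambda n: n["objectives"]))
```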
Thirdly, this thesis proposes a new algorithm called Skip Connections through Evolutionary and Differential Architecture search (SCEDA) to automatically design LSTM networks together with appropriate skip connections, which is a highly challenging task. This thesis hybridizes gradient-based differential architecture search (DAS) and evolutionary architecture search to optimize LSTM architectures with suitable skip connections in a single evolutionary process, improving the efficiency and effectiveness of the search.
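The sketch below (illustrative only; the actual SCEDA formulation is given in the thesis) shows the core hybrid idea: each candidate skip connection carries a continuous architecture weight that a gradient-style inner step updates, after which the relaxed weights are discretized into a genome the evolutionary outer loop can manipulate. The gradient step here is a random stand-in for backpropagation through the relaxed network.

```python
# Toy relax-then-discretize step for skip-connection search.
import math
import random

DEPTH = 4
# Candidate skip connections i -> j that bypass at least one layer.
CANDIDATES = [(i, j) for i in range(DEPTH) for j in range(i + 2, DEPTH + 1)]

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

# Continuous architecture weight (alpha) per candidate connection.
alphas = {c: random.gauss(0.0, 0.1) for c in CANDIDATES}

for step in range(50):
    # Stand-in for a gradient step on validation loss w.r.t. alphas; a real
    # system would backpropagate through the softmax-relaxed network.
    grads = {c: random.gauss(0.0, 1.0) for c in CANDIDATES}
    for c in CANDIDATES:
        alphas[c] -= 0.05 * grads[c]

# Discretize: keep the skip connections with the highest relaxed weights and
# hand them to the evolutionary population as a genome.
probs = softmax([alphas[c] for c in CANDIDATES])
ranked = sorted(zip(CANDIDATES, probs), key=lambda t: t[1], reverse=True)
genome = [c for c, _ in ranked[:2]]
print("selected skip connections:", genome)
```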
Finally, this thesis conducts an in-depth empirical analysis of the impact of activation functions on the trainability of LSTM networks, based on the concept of the Edge of Chaos. The analysis reveals a strong interrelation between the activation functions used across the layers of an LSTM network and the trainability of the network, and highlights the importance of controlling activation functions to avoid generating untrainable LSTM networks. Based on this analysis, the thesis proposes a machine learning model to guide the use of activation functions in LSTM networks.
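The sketch below is a toy stand-in for such a model (the features and labels are synthetic; the thesis derives its features from the Edge of Chaos analysis): a tiny perceptron is trained to predict whether an assignment of activation functions across layers yields a trainable network, and could then veto bad assignments during architecture search.

```python
# Toy trainability classifier over activation-function assignments.
import random

ACTS = ["tanh", "sigmoid", "relu"]

def features(acts):
    # Hypothetical features: fraction of each activation plus scaled depth.
    n = len(acts)
    return [acts.count(a) / n for a in ACTS] + [n / 10.0]

def synthetic_label(acts):
    # Toy labelling rule standing in for measured trainability: deep stacks
    # dominated by saturating activations are marked untrainable.
    sat = sum(1 for a in acts if a in ("tanh", "sigmoid"))
    return 1 if sat / len(acts) < 0.8 or len(acts) <= 2 else 0

# Train a small perceptron on synthetic (features, trainable?) pairs.
data = [[random.choice(ACTS) for _ in range(random.randint(1, 8))]
        for _ in range(500)]
w = [0.0] * 4
b = 0.0
for _ in range(20):
    for acts in data:
        x, y = features(acts), synthetic_label(acts)
        pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
        err = y - pred
        w = [wi + 0.1 * err * xi for wi, xi in zip(w, x)]
        b += 0.1 * err

# Query the model for an all-tanh six-layer stack.
score = sum(wi * xi for wi, xi in zip(w, features(["tanh"] * 6))) + b
print("predicted trainable:", score > 0)
```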