Evolutionary Computation for Designing Deep Recurrent Neural Networks
Recurrent Neural Networks (RNNs) are a major class of Artificial Neural Networks (ANNs). A suitable network architecture is vital for RNNs to achieve high performance. However, designing the architecture of deep RNNs (DRNNs) is a complicated and time-consuming process. Each layer of a DRNN contains numerous hyper-parameters, and it is challenging to optimize the hyper-parameters of all layers jointly to achieve the best possible learning performance. Trial-and-error methods for designing DRNNs have proven laborious and costly in practice. Therefore, an efficient automatic architecture search technique is needed to design DRNNs.
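To make the scale of the problem concrete, the following Python sketch (all candidate hyper-parameter values are hypothetical) shows how the joint search space of a DRNN grows exponentially with depth, even with only a handful of choices per layer.

```python
# A toy illustration (candidate values are hypothetical) of how the joint
# hyper-parameter space of a DRNN grows with depth.

# Assumed candidate values for a few common per-layer hyper-parameters.
LAYER_CHOICES = {
    "hidden_units": [32, 64, 128, 256],
    "activation": ["tanh", "sigmoid", "relu"],
    "dropout": [0.0, 0.2, 0.5],
}

def configs_per_layer(choices):
    """Number of distinct configurations for a single layer."""
    n = 1
    for values in choices.values():
        n *= len(values)
    return n

per_layer = configs_per_layer(LAYER_CHOICES)  # 4 * 3 * 3 = 36
for depth in (1, 3, 5):
    # The joint space grows exponentially: per_layer ** depth candidates.
    print(f"depth {depth}: {per_layer ** depth} candidate architectures")
```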
The Genetic Algorithm (GA) is a popular evolutionary computation approach, and existing research has already explored the use of GAs in designing ANNs. However, most existing approaches focus on the design of fixed-depth ANNs. It is difficult to fix the depth of a DRNN in advance, since different problems require DRNNs of different depths. Hence, the potential of GAs to evolve DRNN architectures of varying depths remains to be explored; a variable-length genome, as sketched below, is one natural encoding for this.
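The following Python sketch is a minimal illustration, not any algorithm proposed in this thesis: it shows how a GA can encode architectures as variable-length genomes so that depth itself is subject to evolution. The fitness function is a random placeholder standing in for trained-network validation performance.

```python
# Minimal variable-depth GA sketch; fitness is a stand-in for the expensive
# train-and-validate step of real neural architecture search.
import random

UNITS = [32, 64, 128, 256]
ACTS = ["tanh", "sigmoid", "relu"]

def random_genome(max_depth=6):
    """A genome is a variable-length list of (units, activation) layers."""
    depth = random.randint(1, max_depth)
    return [(random.choice(UNITS), random.choice(ACTS)) for _ in range(depth)]

def mutate(genome):
    """Grow, shrink, or tweak the genome, so depth itself evolves."""
    g = list(genome)
    op = random.choice(["grow", "shrink", "tweak"])
    if op == "grow":
        g.insert(random.randrange(len(g) + 1),
                 (random.choice(UNITS), random.choice(ACTS)))
    elif op == "shrink" and len(g) > 1:
        g.pop(random.randrange(len(g)))
    else:
        i = random.randrange(len(g))
        g[i] = (random.choice(UNITS), random.choice(ACTS))
    return g

def crossover(a, b):
    """One-point crossover that works on genomes of different lengths."""
    ca, cb = random.randint(1, len(a)), random.randint(1, len(b))
    return a[:ca] + b[cb:]

def fitness(genome):
    # Placeholder: a real system would train the network and return
    # validation performance.
    return random.random()

population = [random_genome() for _ in range(20)]
for _ in range(10):
    population.sort(key=fitness, reverse=True)
    parents = population[:10]
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(10)]
    population = parents + children
```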
The primary goal of this thesis is to develop advanced GA approaches for effectively and efficiently designing Long Short-Term Memory (LSTM) based DRNN architectures of varying depths.
Firstly, this thesis proposes a GA-based algorithm called the Two-Stage Surrogate-Assisted GA (TS-SA-GA), which uses a progressive incremental strategy to design LSTM network architectures of varying depths. The progressive approach effectively extends well-designed shallow networks into high-performing deep networks. Moreover, the algorithm adopts newly designed knowledge-driven crossover and mutation operators to identify and repair LSTM network designs affected by inappropriate use of activation functions, significantly reducing the chance that the GA evolves hard-to-train LSTM architectures. Furthermore, this thesis proposes a two-stage surrogate method that predicts the trainability and fitness of LSTM networks, improving both the efficiency and the effectiveness of the GA in evolving LSTM networks.
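The sketch below illustrates, in simplified form and with entirely hypothetical operators and surrogates, the two ideas combined in TS-SA-GA: a cheap first-stage filter rejects architectures predicted to be untrainable before any second-stage fitness estimate is spent on them, and the best networks evolved at depth d are extended by one layer to seed the search at depth d + 1.

```python
# Illustrative progressive + two-stage-surrogate loop; not the TS-SA-GA
# implementation. Both surrogate stages below are toy stand-ins.
import random

UNITS = [32, 64, 128, 256]
ACTS = ["tanh", "sigmoid", "relu"]

def random_layer():
    return (random.choice(UNITS), random.choice(ACTS))

def cheap_surrogate(genome):
    # Stage 1 (hypothetical rule): reject architectures predicted to be
    # untrainable, here a toy penalty on stacks of saturating activations.
    saturating = sum(1 for _, act in genome if act in ("tanh", "sigmoid"))
    return saturating < len(genome)

def fitness(genome):
    # Stage 2: stand-in for a surrogate prediction of validation performance.
    return random.random()

def evolve_at_depth(seeds, generations=5, pop_size=12):
    """Evolve fixed-depth genomes, filtering with the cheap surrogate first."""
    population = list(seeds)
    while len(population) < pop_size:
        population.append([random_layer() for _ in range(len(seeds[0]))])
    for _ in range(generations):
        survivors = [g for g in population if cheap_surrogate(g)] or population
        survivors.sort(key=fitness, reverse=True)
        population = survivors[: pop_size // 2]
        # Children: keep the prefix, resample the last layer.
        population += [g[:-1] + [random_layer()] for g in population]
    return population

# Progressive deepening: depth-d results seed the search at depth d + 1.
population = [[random_layer()] for _ in range(12)]
for depth in range(1, 4):
    population = evolve_at_depth(population)
    population = [g + [random_layer()] for g in population]
```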
Secondly, this thesis proposes a new algorithm called Evolutionary Design of LSTM Ensembles (ED-LSTM-Ensemble), based on multi-objective optimization techniques, which evolves LSTM networks and ensembles simultaneously so that ensemble performance directly drives the evolution of the base LSTM networks. Additionally, this thesis proposes a connection weight inheritance strategy to evolve LSTM networks and ensembles efficiently and effectively.
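As a simplified illustration (not the ED-LSTM-Ensemble algorithm itself, whose operators and objectives are defined in the thesis), the Python sketch below assigns each base network two objectives: its own accuracy and the best accuracy of any small ensemble containing it, so that selection pressure reflects ensemble quality as well as individual quality.

```python
# Toy two-objective scoring of base networks by ensemble membership.
import random
from itertools import combinations

def ensemble_accuracy(members):
    # Placeholder: a real ensemble would combine member predictions; here a
    # toy diversity bonus rewards mixing different activation functions.
    accs = [m["acc"] for m in members]
    diversity = len({m["act"] for m in members}) / len(members)
    return sum(accs) / len(accs) + 0.05 * diversity

# Toy population: each "network" carries a stand-in accuracy and one
# architectural trait (its activation function).
population = [{"acc": random.random(), "act": random.choice(["tanh", "relu"])}
              for _ in range(8)]

# Two objectives per network: individual accuracy, and the best accuracy of
# any 3-member ensemble that includes it.
for net in population:
    best_ensemble = max(
        ensemble_accuracy(trio)
        for trio in combinations(population, 3)
        if any(member is net for member in trio)
    )
    net["objectives"] = (net["acc"], best_ensemble)

print(max(population, key=lambda n: n["objectives"]))
```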
Thirdly, this thesis proposes a new algorithm called Skip Connections through Evolutionary and Differential Architecture search (SCEDA) to automatically design LSTM networks together with appropriate skip connections, which is a highly challenging task. This thesis hybridizes gradient-based differential architecture search (DAS) and evolutionary architecture search to optimize LSTM architectures with suitable skip connections in a single evolutionary process, improving the efficiency and effectiveness of the search.
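The sketch below (illustrative only; the actual SCEDA formulation is given in the thesis) shows the core hybrid idea: each candidate skip connection carries a continuous architecture weight that a gradient-style inner step updates, after which the relaxed weights are discretized into a genome the evolutionary outer loop can manipulate. The gradient step here is a random stand-in for backpropagation through the relaxed network.

```python
# Toy relax-then-discretize step for skip-connection search.
import math
import random

DEPTH = 4
# Candidate skip connections i -> j that bypass at least one layer.
CANDIDATES = [(i, j) for i in range(DEPTH) for j in range(i + 2, DEPTH + 1)]

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

# Continuous architecture weight (alpha) per candidate connection.
alphas = {c: random.gauss(0.0, 0.1) for c in CANDIDATES}

for step in range(50):
    # Stand-in for a gradient step on validation loss w.r.t. alphas; a real
    # system would backpropagate through the softmax-relaxed network.
    grads = {c: random.gauss(0.0, 1.0) for c in CANDIDATES}
    for c in CANDIDATES:
        alphas[c] -= 0.05 * grads[c]

# Discretize: keep the skip connections with the highest relaxed weights and
# hand them to the evolutionary population as a genome.
probs = softmax([alphas[c] for c in CANDIDATES])
ranked = sorted(zip(CANDIDATES, probs), key=lambda t: t[1], reverse=True)
genome = [c for c, _ in ranked[:2]]
print("selected skip connections:", genome)
```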
Finally, this thesis conducts an in-depth empirical analysis of the impact of activation functions on the trainability of LSTM networks, based on the concept of the Edge of Chaos. The analysis reveals a strong interrelation between the activation functions used across the layers of an LSTM network and the trainability of the network, and highlights the importance of controlling activation functions to avoid generating untrainable LSTM networks. Based on this analysis, the thesis proposes a machine learning model to guide the use of activation functions in LSTM networks.
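The sketch below is a toy stand-in for such a model (the features and labels are synthetic; the thesis derives its features from the Edge of Chaos analysis): a tiny perceptron is trained to predict whether an assignment of activation functions across layers yields a trainable network, and could then veto bad assignments during architecture search.

```python
# Toy trainability classifier over activation-function assignments.
import random

ACTS = ["tanh", "sigmoid", "relu"]

def features(acts):
    # Hypothetical features: fraction of each activation plus scaled depth.
    n = len(acts)
    return [acts.count(a) / n for a in ACTS] + [n / 10.0]

def synthetic_label(acts):
    # Toy labelling rule standing in for measured trainability: deep stacks
    # dominated by saturating activations are marked untrainable.
    sat = sum(1 for a in acts if a in ("tanh", "sigmoid"))
    return 1 if sat / len(acts) < 0.8 or len(acts) <= 2 else 0

# Train a small perceptron on synthetic (features, trainable?) pairs.
data = [[random.choice(ACTS) for _ in range(random.randint(1, 8))]
        for _ in range(500)]
w = [0.0] * 4
b = 0.0
for _ in range(20):
    for acts in data:
        x, y = features(acts), synthetic_label(acts)
        pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
        err = y - pred
        w = [wi + 0.1 * err * xi for wi, xi in zip(w, x)]
        b += 0.1 * err

# Query the model for an all-tanh six-layer stack.
score = sum(wi * xi for wi, xi in zip(w, features(["tanh"] * 6))) + b
print("predicted trainable:", score > 0)
```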