An Analysis of Selection in Genetic Programming
This thesis presents an analysis of the selection process in tree-based Genetic Programming (GP), covering the optimisation of both parent and offspring selection, and provides a detailed understanding of selection and guidance on how to improve GP search effectively and efficiently. The first part of the thesis providesmodels and visualisations to analyse selection behaviour in standard tournament selection, clarifies several issues in standard tournament selection, and presents a novel solution to automatically and dynamically optimise parent selection pressure. The fitness evaluation cost of parent selection is then addressed and some cost-saving algorithms introduced. In addition, the feasibility of using good predecessor programs to increase parent selection efficiency is analysed. The second part of the thesis analyses the impact of offspring selection pressure on the overall GP search performance. The fitness evaluation cost of offspring selection is then addressed, with investigation of some heuristics to efficiently locate good offspring by constraining crossover point selection structurally through the analysis of the characteristics of good crossover events. The main outcomes of the thesis are three new algorithms and four observations: 1) a clustering tournament selection method is developed to automatically and dynamically tune parent selection pressure; 2) a passive evaluation algorithm is introduced for reducing parent fitness evaluation cost for standard tournament selection using small tournament sizes; 3) a heuristic population clustering algorithm is developed to reduce parent fitness evaluation cost while taking advantage of clustering tournament selection and avoiding the tournament size limitation; 4) population size has little impact on parent selection pressure thus the tournament size configuration is independent of population size; and different sampling replacement strategies have little impact on the selection behaviour in standard tournament selection; 5) premature convergence occurs more often when stochastic elements are removed from both parent and offspring selection processes; 6) good crossover events have a strong preference for whole program trees, and (less strongly) single-node or small subtrees that are at the bottom of parent program trees; 7) the ability of standard GP crossover to generate good offspring is far below what was expected.