Removing Redundancy and Reducing Fitness Evaluation Costs in Genetic Programming
One of the major issues in Genetic Programming (GP) is the computational effort required to run the evolution and discover a good solution. Phenomena such as program bloat (where genetic programs rapidly grow in size) can quickly exhaust available memory and slow down the evolutionary process, while the heavy cost of fitness evaluation can make problems with large amounts of data very slow to solve. These issues may limit the tasks to which GP can appropriately be applied, and inhibit its use in time- or space-sensitive environments. In this thesis, we develop solutions to some of these computational cost issues in GP. First, we develop an algebraic program simplification method based on simple rules and hashing techniques, and use this method in conjunction with standard GP on a variety of tasks. Our results suggest that program simplification can lead to a significant reduction in program size without significantly changing the effectiveness of the systems in finding solution programs. Secondly, we analyse the effects of program simplification on the internal GP "building blocks" to investigate whether simplification is a destructive or constructive force. Using two models for building blocks (numerical nodes and the more complex fixed-depth subtrees), we track building blocks through GP runs on a symbolic regression problem, both with and without simplification. We find that the program simplification process can both disrupt and construct building blocks in the GP populations. However, GP systems using simplification appear to retain important building blocks, and the simplification process appears to lead to an increase in genetic diversity. These findings may help explain why using simplification does not reduce the effectiveness of GP systems in solving tasks. Lastly, we develop two methods of reducing the cost of fitness evaluation by reducing the number of node evaluations performed. 
The first method is elitism avoidance, which avoids re-evaluating programs that have been placed in the population through elitism reproduction. This method reduces the CPU time for evolving solutions on six different GP tasks. The second method is a subtree caching mechanism that stores fitness evaluations for subtrees in a cache so that they can be fetched when these subtrees are encountered in future fitness evaluations. Results suggest that using this mechanism can significantly reduce both the number of node evaluations and the overall CPU time used in evolving solutions, without reducing the fitness of the solutions produced.
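
To illustrate the subtree caching idea, the following is a minimal sketch and not the implementation developed in the thesis. It assumes programs are represented as nested tuples such as `("+", left, right)`, that the only variable is `"x"`, and that the cache is an unbounded dictionary keyed by the subtree itself; the names `CachedEvaluator`, `outputs`, `fitness`, and `node_evals` are hypothetical. One node evaluation here means evaluating one node over all fitness cases.

```python
import operator

# Hypothetical function set for a symbolic regression task.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


class CachedEvaluator:
    """Evaluates tuple-encoded programs, reusing cached subtree outputs."""

    def __init__(self, cases):
        self.cases = tuple(cases)  # input values of the fitness cases
        self.cache = {}            # subtree -> outputs on every fitness case
        self.node_evals = 0        # nodes actually evaluated (cache misses)

    def outputs(self, tree):
        # A previously seen subtree is fetched instead of re-evaluated.
        if tree in self.cache:
            return self.cache[tree]
        self.node_evals += 1
        if tree == "x":
            result = self.cases
        elif isinstance(tree, (int, float)):
            result = tuple(tree for _ in self.cases)
        else:
            op, left, right = tree
            fn = OPS[op]
            result = tuple(
                fn(a, b)
                for a, b in zip(self.outputs(left), self.outputs(right))
            )
        self.cache[tree] = result
        return result

    def fitness(self, tree, targets):
        # Sum of absolute errors over the fitness cases (lower is better).
        return sum(abs(o - t) for o, t in zip(self.outputs(tree), targets))
```

In this sketch, evaluating `("+", ("*", "x", "x"), "x")` and then `("-", ("*", "x", "x"), 1)` with the same evaluator reuses the shared subtree `("*", "x", "x")`, so fewer nodes are evaluated than the ten nodes the two programs contain. A practical system would likely bound the cache size, which this sketch does not attempt.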