Empirical Analysis of Schemata in Genetic Programming
Schemata and buiding blocks have been used in Genetic Programming (GP) in several contexts including subroutines, theoretical analysis and for empirical analysis. Of these three the least explored is empirical analysis. This thesis presents a powerful GP empirical analysis technique for analysis of all schemata of a given form occurring in any program of a given population at scales not previously possible for the kinds of global analysis performed. There are many competing GP forms of schema and, rather than choosing one for analysis, the thesis defines the match-tree meta-form of schema as a general language expressing forms of schema for use by the analysis system. This language can express most forms of schema previously used in tree-based GP. The new method can perform wide-ranging analyses on the prohibitively large set of all schemata in the programs by introducing the concepts of maximal schema, maximal program subset, representative set of schemata, and representative program subset. These structures are used to optimize the analysis, shrinking its complexity to a manageable size without sacrificing the result. Characterization experiments analyze GP populations of up to 501 60- node programs, using 11 forms of schema including rooted-hyperschemata and non-rooted fragments. The new method has close to quadratic complexity on population size, and quartic complexity on program size. Efficacy experiments present example analyses using the new method. The experiments offer interesting insights into the dynamics of GP runs including fine-grained analysis of convergence and the visualization of schemata during a GP evolution. Future work will apply the many possible extensions of this new method to understanding how GP operates, including studies of convergence, building blocks and schema fitness. This method provides a much finer-resolution microscope into the inner workings of GP and will be used to provide accessable visualizations of the evolutionary process.