Learning Classifier Systems for Understanding Patterns in Data
In the field of data mining, symbolic techniques have produced optimal solutions that are expected to contain informative patterns. Visualizing these patterns can improve the understanding of the ground truth of the explored domain. However, symbolic algorithms have so far struggled to produce optimal solutions for domains with overlapping distributions of feature patterns, and the majority of real-world problems have such overlapping distributions. Thus, novel techniques are needed to improve symbolic techniques' capacity to address overlapping domains, so that visualizing the underlying patterns becomes practicable.
Michigan-style Learning Classifier Systems (LCSs) are rule-based symbolic learning systems that utilize evolutionary computation to construct a population of rules that captures the patterns and knowledge of the explored domain. LCSs have been applied to many data-mining tasks and have achieved good performance. Recently, employing visualization methods to improve the understanding of models has become increasingly popular in the data-mining community. In the LCS field, however, visualization techniques oriented towards exploring feature patterns have not been developed. Investigating LCS models is commonly based on reading rules or viewing their distribution, but such models may contain hundreds or even thousands of unduplicated rules, which makes identifying patterns challenging.
Previously, Butz defined LCSs' optimal solutions as [O] sets, which are expected to utilize a minimal number of non-overlapping rules to represent an explored domain completely and correctly. In the last two decades, rule compaction algorithms have been designed to search for [O]s by compacting LCSs' models, where rules that violate [O]'s definition are considered redundant. However, in many problems an ideal [O] does not exist, and even when such a ruleset exists, redundant rules are often still discovered in the compacted models. Furthermore, compaction often reduces prediction performance, which the LCS community has long regarded as an unavoidable and acceptable price for producing a compact model. It is observed that across multiple LCS-produced populations for the same problem, the irrelevant/redundant rules vary, while the useful/accurate rules are consistent. Based on this observation, this thesis collects the common accurate rules and finds that, for an arbitrary clean dataset, these common rules form a determinate and unique solution, i.e. the proposed natural solution. A natural solution is composed of all consistent and unsubsumable rules under the global search space, and it can correctly and completely represent the explored clean dataset. Furthermore, searching for natural solutions produces concise, correct solutions without reducing performance.
To visualize the knowledge in the solutions, three visualization methods are developed: the Feature Importance Map (FIM), the Action-based Feature Importance Map (AFIM), and the Action-based Feature Value Map (AFVM).
FIM can trace how LCSs form patterns during the training process, while AFIM and AFVM precisely present the patterns in LCSs' optimal solutions in terms of attribute importance and the importance of specific attribute values, respectively.
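One plausible way to derive such a feature-importance summary from a rule population is sketched below. This is a hedged illustration, not necessarily the FIM computation used in the thesis: it measures how often each feature is specified (non-'#') across the population, weighted by rule numerosity.

```python
# Hypothetical feature-importance summary for a ternary rule population.
# The numerosity weighting is an assumption for illustration purposes.

def feature_importance(rules):
    """rules: list of (condition, numerosity) pairs with ternary
    conditions; returns one importance score per feature in [0, 1]."""
    n_features = len(rules[0][0])
    totals = [0.0] * n_features
    weight_sum = 0.0
    for condition, numerosity in rules:
        weight_sum += numerosity
        for i, c in enumerate(condition):
            if c != '#':          # feature is specified by this rule
                totals[i] += numerosity
    return [t / weight_sum for t in totals]

pop = [('1#0', 4), ('0#1', 2), ('##1', 2)]
print(feature_importance(pop))  # → [0.75, 0.0, 1.0]
```

A score near 1 indicates a feature the population almost always constrains, i.e. one that is important for discriminating the domain's classes.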
To produce natural solutions efficiently, a new type of compaction algorithm is introduced, termed Razor Cluster Razor (RCR). RCR is the first compaction algorithm for Michigan-style LCSs to incorporate a Pittsburgh-style concept, i.e. compaction based on multiple models. RCR was first designed for Boolean domains and has since been extended to real-valued LCS models. The conducted experiments demonstrate that natural solutions are producible for both Boolean and real-valued domains.
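The multi-model idea behind RCR can be sketched in a few lines. The details below are assumptions for illustration, not RCR itself: accurate rules that recur across independently trained populations are kept, while run-specific redundant rules are discarded.

```python
# Hedged sketch of compaction across multiple models: keep only the
# rules common to every independently trained population.
from functools import reduce

def common_rules(populations):
    """populations: list of rule sets from independent LCS runs."""
    return reduce(set.intersection, map(set, populations))

runs = [{'1#0', '0#1', '10#'},   # run-specific rule '10#'
        {'1#0', '0#1', '#11'},   # run-specific rule '#11'
        {'1#0', '0#1'}]
print(sorted(common_rules(runs)))  # → ['0#1', '1#0']
```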
The experiments on producing natural solutions lead this thesis to discover that LCSs suffer from an over-general issue when addressing domains with overlapping distributions, i.e. where the encodings of optimal rules naturally share overlapping niches. The over-general issue causes LCSs to fail to maintain prediction performance: because optimal rules and over-general rules are repeatedly introduced and removed, training performance can never reach 100% accuracy. This newly discovered issue inspires the development of the Absumption method, a complement to the existing Subsumption method that continuously seeks to produce optimal rules by correcting over-general rules. Absumption not only improves LCSs' prediction performance in overlapping domains but also enables LCSs to employ hundreds of cooperative rules to precisely represent an explored domain.
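As an illustration of the direction Absumption works in (the exact procedure in the thesis differs; the mechanics below are assumptions), an over-general rule can be corrected by specializing one of its wildcards so that it no longer covers an instance it misclassified, i.e. the inverse of subsumption's generalization pressure:

```python
# Assumed sketch of specializing an over-general ternary rule: when the
# rule matches an instance but predicts the wrong class, replace one
# wildcard with the opposite bit, excluding that instance from its niche.
import random

def specialize(condition, mismatched_instance):
    wildcard_positions = [i for i, c in enumerate(condition) if c == '#']
    if not wildcard_positions:
        return condition  # fully specific; nothing left to specialize
    i = random.choice(wildcard_positions)
    flipped = '1' if mismatched_instance[i] == '0' else '0'
    return condition[:i] + flipped + condition[i + 1:]

# '1##' wrongly covers '101'; the corrected rule no longer matches it.
print(specialize('1##', '101'))  # → '11#' or '1#0'
```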
The success of Absumption demonstrates LCSs' capacity to address overlapping domains. However, introducing Absumption does increase the computational cost of training, which creates a need for more efficient exploration. Furthermore, the search techniques employed by LCSs tend to evolve maximally general rules rather than rules that do not overlap with the existing rules in the population; thus, [O]s do not fit LCSs' fundamental search techniques. As a result, LCSs' accurate models often do not contain an optimal solution, which leaves the produced models poorly interpretable and hampers LCSs as a data-mining technique. In this thesis, the Absumption and Subsumption based Learning Classifier System (ASCS) is developed. ASCS takes the natural solution as its search objective and promotes the Absumption and Subsumption mechanisms to the primary search strategies, which makes it possible to remove the traditional evolutionary search operators, i.e. crossover, mutation, roulette wheel deletion, and competition selection. Experiments demonstrate that ASCS can use thousands of cooperative rules to represent an explored domain, enabling pattern visualization that was previously not possible.