Open Access Te Herenga Waka-Victoria University of Wellington

Learning Classifier Systems for Multi-Objective Reinforcement Learning Problems

thesis
posted on 2022-11-17, 19:41 authored by Xiu Cheng

The real world is full of problems with multiple conflicting objectives. However, Reinforcement Learning (RL) traditionally deals with only a single learning objective. Recently, several Multi-Objective Reinforcement Learning (MORL) algorithms have been proposed. Nevertheless, many of these algorithms rely on tabular representations of the value function, which are only suitable for solving small-scale problems. In addition, although some existing MORL techniques can learn the Pareto optimal solutions, they can only be applied to simple multi-step problems in a discrete Markov environment. However, many real-world problems involve partially observable continuous environments, where the continuous state space is often discretised before the system can learn the best policies from a starting state to a goal state. Therefore, effective MORL techniques are needed to address these problems.
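To make the notion of Pareto optimality used throughout this abstract concrete, the following is a minimal sketch (an illustration only, not taken from the thesis): in MORL each policy yields a vector return, and a policy is Pareto optimal if no other policy dominates it, i.e. is at least as good on every objective and strictly better on at least one (assuming all objectives are maximised).

```python
def dominates(a, b):
    """True if vector a Pareto-dominates vector b (maximisation)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(returns):
    """Keep only the non-dominated vector returns."""
    return [r for r in returns
            if not any(dominates(o, r) for o in returns if o != r)]

# Four candidate vector returns for a bi-objective problem.
returns = [(5, 1), (3, 4), (4, 2), (2, 2)]
print(pareto_front(returns))  # (2, 2) is dominated by (3, 4); the rest survive
```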

Learning Classifier Systems (LCSs) have been widely used to tackle RL problems as they have a good generalization ability and provide a simple, understandable rule-based solution. The rule-based solution scales well to large problems. In addition, the rule-based solution is explainable: the user is not only able to know what the solution is, but also to understand why the solution is suited to the task and when to trust it. Moreover, the accuracy-based LCS, XCS, has been adopted successfully for single-objective RL problems. Thus, LCSs, especially XCS, have huge potential to solve MORL problems.

In this thesis, LCSs are enabled to learn the Pareto optimal policies for discrete, partially observable, and continuous MORL problems. The objectives are to develop XCS-based algorithms to learn the Pareto optimal policies for MORL problems, improve the generalization ability of the multi-objective XCS-based algorithms where possible, evaluate the generalization ability of the developed multi-objective XCS-based algorithms for solving MORL problems, and apply multi-objective XCS-based algorithms to solve large-scale partially observable MORL problems. The following tasks have been completed in this thesis.

A new multi-objective LCS-based algorithm, MO-XCS, has been developed based on XCS for multi-objective learning. This algorithm is designed to learn a group of Pareto optimal solutions through a single learning process. For this purpose, four technical issues in XCS have been identified and addressed. Experimental studies on three bi-objective maze problems further demonstrate the effectiveness of MO-XCS.

A new multi-objective LCS-based algorithm, MOXCS, distinct from MO-XCS, has been developed based on XCS and MOEA/D. It employs a decomposition strategy based on MOEA/D in XCS to approximate complex Pareto fronts. The experimental results show that, on complex bi-objective maze problems, MOXCS can learn a group of Pareto optimal solutions for MORL problems without requiring huge storage. Analysis of the learned policies shows successful trade-offs between the distance to the reward and the amount of reward itself. With integer inputs, MOXCS can address the partially observable Markov problem Deep Sea Treasure. Lastly, the generalization ability of MOXCS is tested in the Multi-Maze and Multi-Maze Connection environments. However, it is shown that the Multi-Maze Connection domain is not a proper benchmark for testing the generalization ability of a MORL algorithm.
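The decomposition idea that MOEA/D-style algorithms use can be sketched as follows (an illustration of the general technique, not the thesis's MOXCS implementation): the multi-objective problem is split into scalar subproblems, one per weight vector, and each subproblem ranks candidates by a scalarizing function such as the Tchebycheff distance to an ideal point.

```python
def weighted_sum(values, weights):
    """Simple linear scalarization (larger is better for maximisation)."""
    return sum(w * v for w, v in zip(weights, values))

def tchebycheff(values, weights, ideal):
    """Weighted distance to the ideal point; smaller is better."""
    return max(w * abs(z - v) for w, v, z in zip(weights, values, ideal))

# Evenly spread weight vectors for a bi-objective problem.
n = 5
weights = [(i / (n - 1), 1 - i / (n - 1)) for i in range(n)]

candidates = [(5.0, 1.0), (3.0, 4.0), (4.0, 2.0)]
ideal = (5.0, 4.0)  # best value seen on each objective
for w in weights:
    best = min(candidates, key=lambda v: tchebycheff(v, w, ideal))
    print(w, "->", best)
```

Each weight vector selects the candidate that best serves its trade-off, so sweeping the weights recovers solutions across the Pareto front.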

CoinRun is a continuous domain, unlike the previous Maze and Deep Sea Treasure domains, which are discrete. Two bi-objective CoinRun environments are developed in this thesis for testing the generalization ability of the MORL algorithm. Several changes have been made in the developed bi-objective CoinRun environments. First, they have been changed from continuous to discretized environments by discretizing the continuous inputs. Second, the environments have been changed from non-Markov to Markov environments by adding extra characters. Third, to enable the agent to sense the environment, a sub-actions technique has been developed: when the agent takes an action, several subsequent actions are taken automatically. With these changes and techniques, the developed bi-objective CoinRun problems are solved by MOXCS. The generalization ability of MOXCS is evaluated by swapping the two bi-objective CoinRun environments between the training and testing roles. The evidence shows that, given proper training in a similar environment, MOXCS can generalize to solve the MORL problem in an unseen environment.
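The input-discretization step described above can be sketched as follows (an assumed, generic binning scheme; the thesis's exact discretization may differ): each continuous feature is mapped to an integer bin index so that a rule-based learner such as XCS can match it with discrete conditions.

```python
def discretize(obs, lows, highs, bins):
    """Map each continuous feature to a bin index in [0, bins-1]."""
    idx = []
    for x, lo, hi in zip(obs, lows, highs):
        x = min(max(x, lo), hi)               # clamp to the valid range
        i = int((x - lo) / (hi - lo) * bins)  # proportional binning
        idx.append(min(i, bins - 1))          # top edge falls in the last bin
    return tuple(idx)

# A two-feature observation in [-1, 1] x [-1, 1], split into 10 bins each.
print(discretize((0.2, -0.7), lows=(-1.0, -1.0), highs=(1.0, 1.0), bins=10))  # (6, 1)
```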

The single-objective Mountain Car is extended into a bi-objective MORL problem by adding a second objective that rewards the mountain car for taking more of one action, `zero-throttle'. MOXCS is applied to solve the multi-objective mountain car problem. The result shows that MOXCS has good potential to solve this large-scale Markov problem.
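One way the second objective could be attached to Mountain Car, following the description above, is sketched below; the action index and reward values are illustrative assumptions, not the thesis's exact formulation.

```python
ZERO_THROTTLE = 1  # assumed action index for "do nothing"

def vector_reward(action, reached_goal):
    """Return a two-element reward vector: (time objective, idle objective)."""
    time_reward = 0.0 if reached_goal else -1.0        # classic per-step penalty
    idle_reward = 1.0 if action == ZERO_THROTTLE else 0.0  # reward doing nothing
    return (time_reward, idle_reward)

print(vector_reward(ZERO_THROTTLE, reached_goal=False))  # (-1.0, 1.0)
```

The two components pull in opposite directions: reaching the goal quickly favours throttling, while the idle objective favours coasting, which is what makes the extended problem genuinely multi-objective.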

This research work has shown that LCS-based algorithms can learn Pareto optimal policies for discrete, partially observable, and adapted continuous MORL problems, and have the potential to scale to complex, large-scale MORL problems.

History

Copyright Date

2022-11-17

Date of Award

2022-11-17

Publisher

Te Herenga Waka—Victoria University of Wellington

Rights License

CC BY-NC-ND 4.0

Degree Discipline

Computer Science

Degree Grantor

Te Herenga Waka—Victoria University of Wellington

Degree Level

Doctoral

Degree Name

Doctor of Philosophy

ANZSRC Type Of Activity code

1 Pure basic research

Victoria University of Wellington Item Type

Awarded Doctoral Thesis

Language

en_NZ

Victoria University of Wellington School

School of Engineering and Computer Science

Advisors

Zhang, Mengjie; Browne, Will Neil