Population-based Ensemble Learning with Tree Structures for Classification

Evans, Benjamin

doi:10.26686/wgtn.17136296.v1

thesis

posted on 2021-12-07, 14:50 authored by Evans, Benjamin

Ensemble learning is one of the most powerful extensions for improving upon individual machine learning models. Rather than a single model being used, several models are trained and their predictions combined to make a more informed decision. Such combinations will ideally overcome the shortcomings of any individual member of the ensemble. Most machine learning competition winners feature an ensemble of some sort, and there is also sound theoretical proof of the performance of certain ensembling schemes. The benefits of ensembling are clear in both theory and practice.

Despite the great performance, ensemble learning is not a trivial task. One of the main difficulties is designing appropriate ensembles. For example, how large should an ensemble be? What members should be included in an ensemble? How should these members be weighted? Our first contribution addresses these concerns using a strongly-typed population-based search (genetic programming) to construct well-performing ensembles, where the entire ensemble (members, hyperparameters, structure) is automatically learnt. The proposed method was found, in general, to be significantly better than all base members and commonly used comparison methods trialled.

Automatically designed ensembles have a range of applications, such as competition entries, forecasting and state-of-the-art predictions. However, these applications often also require additional preprocessing of the input data. The ensemble method above considers only the original training data; however, in many machine learning scenarios a pipeline is required (for example, performing feature selection before classification). For the second contribution, a novel automated machine learning method is proposed based on ensemble learning. This method uses a random population-based search of appropriate tree structures, and as such is embarrassingly parallel, an important consideration for automated machine learning. The proposed method is able to achieve equivalent or improved results over the current state-of-the-art methods and does so in a fraction of the time (six times as fast).

Finally, while complex ensembles offer great performance, one major limitation is the interpretability of such ensembles. For example, why does a forest of 500 trees predict a particular class for a given instance? In an effort to explain the behaviour of complex models (such as ensembles), several methods have been proposed. However, these approaches tend to suffer from at least one of the following limitations: overly complex in the representation, local in their application, limited to particular feature types (e.g. categorical only), or limited to particular algorithms. For our third contribution, a novel model-agnostic method for interpreting complex black-box machine learning models is proposed. The method is based on strongly-typed genetic programming and overcomes the aforementioned limitations. Multi-objective optimisation is used to generate a Pareto frontier of simple and explainable models which approximate the behaviour of much more complex methods. We found that the resulting representations are far simpler than existing approaches (an important consideration for interpretability) while providing equivalent reconstruction performance.

Overall, this thesis addresses two of the major limitations of existing ensemble learning, i.e. the complex construction process and the black-box models that are often difficult to interpret. A novel application of ensemble learning in the field of automated machine learning is also proposed. All three methods have shown equivalent or improved performance compared with existing methods.
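As a rough illustration of the Pareto frontier mentioned in the third contribution, the sketch below filters a set of candidate surrogate models down to those that are not dominated on two objectives, complexity and reconstruction error (both minimised). It is not the thesis's implementation: the candidate names, the two objective definitions and all numeric values are hypothetical and chosen only to show the idea.

```python
from typing import List, Tuple

# Hypothetical candidate: (name, complexity, reconstruction_error).
# Both objectives are minimised. All values are illustrative only.
Candidate = Tuple[str, int, float]


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a[1] <= b[1] and a[2] <= b[2]
    strictly_better = a[1] < b[1] or a[2] < b[2]
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep only the candidates that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


candidates: List[Candidate] = [
    ("tree_a", 3, 0.20),
    ("tree_b", 5, 0.12),
    ("tree_c", 5, 0.15),   # dominated by tree_b (same size, higher error)
    ("tree_d", 9, 0.10),
    ("tree_e", 12, 0.10),  # dominated by tree_d (same error, larger)
]

front = pareto_front(candidates)
front_names = [name for name, _, _ in front]
print(front_names)
```

Each model left on the front represents a different trade-off: a reader who values simplicity can pick the smallest one, while a reader who needs a closer approximation of the black-box model can pick a larger one.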

History

Copyright Date

2019-01-01

Date of Award

2019-01-01

Publisher

Te Herenga Waka—Victoria University of Wellington

Rights License

Author Retains Copyright

Degree Discipline

Computer Science

Degree Grantor

Te Herenga Waka—Victoria University of Wellington

Degree Level

Masters

Degree Name

Master of Science

ANZSRC Type Of Activity code

3 APPLIED RESEARCH

Victoria University of Wellington Item Type

Awarded Research Masters Thesis

Language

en_NZ

Victoria University of Wellington School

School of Engineering and Computer Science

Advisors

Zhang, Mengjie; Xue, Bing

Keywords

Machine learning; Ensemble learning; Interpretable machine learning; Genetic programming; Automated machine learning; 080108 Neural, Evolutionary and Fuzzy Computation
