Automatically evolving difficult benchmark feature selection datasets with genetic programming

Lensen, Andrew; Xue, Bing; Zhang, Mengjie

doi:10.26686/wgtn.12493808.v1

conference contribution

posted on 2020-06-16, 22:19 authored by Andrew Lensen, Bing Xue, Mengjie Zhang

© 2018 Copyright held by the owner/author(s).

There has been a wealth of feature selection algorithms proposed in recent years, each of which claims superior performance in turn. A wide range of datasets have been used to compare these algorithms, each with different characteristics and quantities of redundant and noisy features. Hence, it is very difficult to comprehensively and fairly compare these feature selection methods in order to find which are most robust and effective. In this work, we examine using Genetic Programming to automatically synthesise redundant features for augmenting existing datasets in order to more scientifically test feature selection performance. We develop a method for producing complex multi-variate redundancies, and present a novel and intuitive approach to ensuring a range of redundancy relationships are automatically created. The application of these augmented datasets to well-established feature selection algorithms shows a number of interesting and useful results and suggests promising directions for future research in this area.

Preferred citation

Lensen, A., Xue, B. & Zhang, M. (2018, July). Automatically evolving difficult benchmark feature selection datasets with genetic programming. In GECCO 2018 - Proceedings of the 2018 Genetic and Evolutionary Computation Conference (pp. 458-465). ACM. https://doi.org/10.1145/3205455.3205552

Publisher DOI

https://doi.org/10.1145/3205455.3205552

Conference name

GECCO '18: Genetic and Evolutionary Computation Conference

Title of proceedings

GECCO 2018 - Proceedings of the 2018 Genetic and Evolutionary Computation Conference

Publication or Presentation Year

2018-07-02

Pagination

458-465

Publisher

ACM

Publication status

Published

Licence

CC BY-NC-ND 4.0
