Automatically evolving difficult benchmark feature selection datasets with genetic programming

2020-06-16T22:19:51Z (GMT) by Andrew Lensen, Bing Xue, Mengjie Zhang
© 2018 Copyright held by the owner/author(s).

There has been a wealth of feature selection algorithms proposed in recent years, each of which claims superior performance in turn. A wide range of datasets have been used to compare these algorithms, each with different characteristics and quantities of redundant and noisy features. Hence, it is very difficult to comprehensively and fairly compare these feature selection methods in order to find which are most robust and effective. In this work, we examine using Genetic Programming to automatically synthesise redundant features for augmenting existing datasets in order to more scientifically test feature selection performance. We develop a method for producing complex multi-variate redundancies, and present a novel and intuitive approach to ensuring a range of redundancy relationships are automatically created. The application of these augmented datasets to well-established feature selection algorithms shows a number of interesting and useful results and suggests promising directions for future research in this area.

License

CC BY-NC-ND 4.0