
Genetic programming for feature construction and selection in classification on high-dimensional data

journal contribution
posted on 25.03.2021, 21:03 by Binh Tran, Bing Xue, Mengjie Zhang
Classification on high-dimensional data with thousands to tens of thousands of dimensions is challenging because of both the large number of features and their quality. The problem can be addressed by feature selection, which chooses only informative features, or by feature construction, which creates new high-level features. Genetic programming (GP) with a tree-based representation can perform both feature construction and implicit feature selection. This work presents a comprehensive study of GP for feature construction and selection on high-dimensional classification problems. Different combinations of the constructed and/or selected features are tested and compared on seven high-dimensional gene expression problems, and several classification algorithms are used to evaluate their performance. The results show that the constructed and/or selected feature sets can significantly reduce dimensionality while maintaining or even improving classification accuracy in most cases. Cases in which overfitting occurred are analysed through the distribution of features. Further analysis explains why the constructed feature can achieve promising classification performance.

This is a post-peer-review, pre-copyedit version of an article published in 'Memetic Computing'. The final authenticated version is available online at: https://doi.org/10.1007/s12293-015-0173-y. The following terms of use apply: https://www.springer.com/gp/open-access/publication-policies/aam-terms-of-use.
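The abstract notes that a single tree-based GP individual serves two roles: evaluating the tree yields a constructed feature, and the set of original features appearing as terminals is an implicit feature selection. The sketch below illustrates only that representation, under stated assumptions. The function set (+, -, *, protected division), the helper names, and the three feature-set labels are hypothetical choices for illustration and are not taken from the paper. The evolutionary search itself (fitness evaluation, selection, crossover, mutation) is omitted.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Set, Union


@dataclass(frozen=True)
class Terminal:
    """Leaf node: refers to one original feature by column index."""
    index: int


@dataclass(frozen=True)
class Func:
    """Internal node: a binary operator applied to two subtrees."""
    op: str
    left: Node
    right: Node


Node = Union[Terminal, Func]


def protected_div(a: float, b: float) -> float:
    # Common GP convention: return 0 instead of raising on division by zero.
    return a / b if b != 0 else 0.0


# Hypothetical function set, chosen for illustration only.
OPS: Dict[str, Callable[[float, float], float]] = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": protected_div,
}


def evaluate(node: Node, row: Sequence[float]) -> float:
    """Evaluate the tree on one instance, giving one constructed feature value."""
    if isinstance(node, Terminal):
        return row[node.index]
    return OPS[node.op](evaluate(node.left, row), evaluate(node.right, row))


def selected_features(node: Node) -> Set[int]:
    """Implicit feature selection: indices of original features used as terminals."""
    if isinstance(node, Terminal):
        return {node.index}
    return selected_features(node.left) | selected_features(node.right)


def build_feature_sets(tree: Node, X: List[List[float]]) -> Dict[str, List[List[float]]]:
    """Combine the constructed and/or selected features into reduced datasets."""
    cols = sorted(selected_features(tree))
    constructed = [evaluate(tree, row) for row in X]
    selected = [[row[i] for i in cols] for row in X]
    return {
        "constructed": [[c] for c in constructed],
        "selected": selected,
        "constructed+selected": [[c] + s for c, s in zip(constructed, selected)],
    }
```

For example, the tree `f0 * f3 - f0 / f2` on a 4-feature dataset yields one constructed feature and selects features {0, 2, 3}. Any of the three resulting datasets could then be passed to an ordinary classifier, which mirrors how the study compares feature-set combinations across classification algorithms.

```python
tree = Func("-", Func("*", Terminal(0), Terminal(3)), Func("/", Terminal(0), Terminal(2)))
X = [[1.0, 5.0, 2.0, 4.0], [3.0, 5.0, 0.0, 2.0]]
sets = build_feature_sets(tree, X)
print(sets["constructed"])  # [[3.5], [6.0]]  (second row: 3*2 - protected 3/0 = 6.0)
```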


Preferred citation

Tran, B., Xue, B. & Zhang, M. (2016). Genetic programming for feature construction and selection in classification on high-dimensional data. Memetic Computing, 8(1), 3-15. https://doi.org/10.1007/s12293-015-0173-y

Journal title

Memetic Computing

Volume

8

Issue

1

Publication date

01/03/2016

Pagination

3-15

Publisher

Springer Science and Business Media LLC

Publication status

Published

Contribution type

Article

Online publication date

19/12/2015

ISSN

1865-9284

eISSN

1865-9292

Article number

1

Language

en
