New representations in genetic programming for feature construction in k-means clustering

conference contribution
posted on 06.10.2020 by Andrew Lensen, Bing Xue, Mengjie Zhang
© Springer International Publishing AG 2017.

k-means is one of the most fundamental and well-known algorithms in data mining. It has been widely used in clustering tasks, but suffers from a number of limitations on large or complex datasets. Genetic Programming (GP) has been used to improve the performance of data mining algorithms by performing feature construction: the process of combining multiple attributes (features) of a dataset to produce more powerful constructed features. In this paper, we propose novel representations for using GP to perform feature construction to improve the clustering performance of the k-means algorithm. Our experiments show significant performance improvement compared to k-means across a variety of difficult datasets. Several GP programs are also analysed to provide insight into how feature construction is able to improve clustering performance.

History

Preferred citation

Lensen, A., Xue, B. & Zhang, M. (2017, January). New representations in genetic programming for feature construction in k-means clustering. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10593 LNCS, pp. 543-555). Springer International Publishing. https://doi.org/10.1007/978-3-319-68759-9_44

Title of proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Volume

10593 LNCS

Publication or Presentation Year

01/01/2017

Pagination

543-555

Publisher

Springer International Publishing

Publication status

Published

ISSN

0302-9743

eISSN

1611-3349
