New representations in genetic programming for feature construction in k-means clustering
conference contribution
posted on 2020-10-06, 22:08 authored by Andrew Lensen, Bing Xue, Mengjie Zhang
© Springer International Publishing AG 2017.
k-means is one of the fundamental and most well-known algorithms in data mining. It has been widely used in clustering tasks, but suffers from a number of limitations on large or complex datasets. Genetic Programming (GP) has been used to improve performance of data mining algorithms by performing feature construction, the process of combining multiple attributes (features) of a dataset together to produce more powerful constructed features. In this paper, we propose novel representations for using GP to perform feature construction to improve the clustering performance of the k-means algorithm. Our experiments show significant performance improvement compared to k-means across a variety of difficult datasets. Several GP programs are also analysed to provide insight into how feature construction is able to improve clustering performance.
Preferred citation
Lensen, A., Xue, B. & Zhang, M. (2017, January). New representations in genetic programming for feature construction in k-means clustering. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (10593 LNCS pp. 543-555). Springer International Publishing. https://doi.org/10.1007/978-3-319-68759-9_44
Publisher DOI
https://doi.org/10.1007/978-3-319-68759-9_44
Title of proceedings
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume
10593 LNCS
Publication or Presentation Year
2017-01-01
Pagination
543-555
Publisher
Springer International Publishing
Publication status
Published
ISSN
0302-9743
eISSN
1611-3349