Open Access Te Herenga Waka-Victoria University of Wellington

A new imputation method based on genetic programming and weighted KNN for symbolic regression with incomplete data

journal contribution
posted on 2022-09-12, 01:23 authored by Baligh Al-Helali, Qi Chen, Bing Xue, Mengjie Zhang

Incompleteness is one of the problematic data quality challenges in real-world machine learning tasks. Many studies have addressed this challenge, but most of them focus on classification, and only a few address symbolic regression with missing values. In this work, a new imputation method for symbolic regression with incomplete data is proposed. The method aims to improve both the effectiveness and the efficiency of imputing missing values for symbolic regression. It is based on genetic programming (GP) and weighted K-nearest neighbors (KNN): it constructs GP-based models that use the other available features to predict the missing values of incomplete features, and the instances used for constructing these models are selected using weighted KNN. Experimental results on real-world data sets show that the proposed method outperforms a number of state-of-the-art methods with respect to imputation accuracy, symbolic regression performance, and imputation time.
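The two-stage idea in the abstract (select training instances with weighted KNN, then fit a model on them that predicts the missing feature from the available ones) can be illustrated with a minimal sketch. This is not the authors' implementation. Two things are assumptions made only for illustration: the feature weights are taken to be absolute Pearson correlations with the incomplete feature, and an ordinary least-squares fit stands in for the GP-evolved model. The function names `weighted_knn_select` and `impute_value` are hypothetical.

```python
import numpy as np


def weighted_knn_select(X_complete, x_query, target_col, k):
    """Return indices of the k complete rows nearest to x_query.

    Distances use only the predictor columns (every column except
    target_col). ASSUMPTION: each predictor is weighted by its absolute
    Pearson correlation with the target column; the paper's actual
    weighting scheme may differ.
    """
    predictors = [c for c in range(X_complete.shape[1]) if c != target_col]
    target = X_complete[:, target_col]
    weights = np.array(
        [abs(np.corrcoef(X_complete[:, c], target)[0, 1]) for c in predictors]
    )
    diffs = X_complete[:, predictors] - x_query[predictors]
    dists = np.sqrt((weights * diffs ** 2).sum(axis=1))
    return np.argsort(dists)[:k], predictors


def impute_value(X_complete, x_query, target_col, k=5):
    """Predict the missing entry x_query[target_col].

    STAND-IN: a least-squares linear model replaces the GP-based model
    described in the abstract; only the selected neighbours are used
    to fit it.
    """
    idx, predictors = weighted_knn_select(X_complete, x_query, target_col, k)
    # Design matrix: neighbour predictor values plus an intercept column.
    A = np.column_stack([X_complete[idx][:, predictors], np.ones(len(idx))])
    coef, *_ = np.linalg.lstsq(A, X_complete[idx, target_col], rcond=None)
    return float(np.append(x_query[predictors], 1.0) @ coef)
```

Fitting on a small neighbourhood rather than on all complete instances is what keeps the per-feature model cheap to build, which is consistent with the efficiency goal stated in the abstract; the choice of `k` and of the weighting is left open here.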


  

This version of the article has been accepted for publication, after peer review (when applicable) and is subject to Springer Nature’s AM terms of use, but is not the Version of Record and does not reflect post-acceptance improvements, or any corrections. The Version of Record is available online at: http://dx.doi.org/10.1007/s00500-021-05590-y

Preferred citation

Al-Helali, B., Chen, Q., Xue, B. & Zhang, M. (2021). A new imputation method based on genetic programming and weighted KNN for symbolic regression with incomplete data. Soft Computing, 25(8), 5993-6012. https://doi.org/10.1007/s00500-021-05590-y

Journal title

Soft Computing

Volume

25

Issue

8

Publication date

2021-04-01

Pagination

5993-6012

Publisher

Springer Science and Business Media LLC

Publication status

Published

Online publication date

2021-02-07

ISSN

1432-7643

eISSN

1433-7479

Language

en