Image classification is an important and fundamental task in computer vision and machine learning. The task is to classify images into one of some pre-defined groups based on the content in the images. However, image classification is a challenging task due to high variations across images, such as illumination, viewpoint, scale variations, deformation, and occlusion. To effectively solve image classification, it is necessary to extract or learn a set of meaningful features from raw pixels or images. The effectiveness of these features significantly affects classification performance. Feature learning aims to automatically learn effective features from images for classification. However, feature learning is difficult due to the high variations of images and the large search space. Genetic Programming (GP) as an Evolutionary Computation (EC) technique is known for its powerful global search ability and high interpretability of the evolved solutions. Compared with other EC methods, GP has a flexible representation of variable length and can search the solution space without any assumptions on the solution structure. The potential of GP in feature learning for image classification has not been comprehensively investigated due to the use of simple representations, e.g., functions and program structures. The overall goal of this thesis is to further investigate and explore the potential of GP for image classification by developing a new GP-based approach with a new representation to automatically learning effective features for different types of image classification tasks. Firstly, this thesis proposes a new GP-based approach with image descriptors to learning global and/or local features for image classification by developing a new program structure, a new function set, a new terminal set, and a new fitness function. These new designs allow GP to detect small regions from the relatively large input image, extract features using image descriptors from the detected regions or the input image, and combine the extracted features for classification. The results show that the new approach significantly outperforms five GP-based methods, eight traditional methods, and three convolutional neural network methods in almost all the comparisons on eight different datasets. Secondly, this thesis proposes a new GP-based approach with a flexible program structure and image-related operators for feature learning in image classification. The new approach learns effective features transformed by multiple layers, i.e., a filtering layer, a pooling layer, a feature extraction layer, and a feature concatenation layer, in a flexible way. The results show that the new approach achieves better performance than a large number of effective methods on 12 benchmark datasets. The solutions and features learned by the new approach provide high interpretability. Thirdly, this thesis proposes the first GP-based approach to automatically and simultaneously learning features and evolving ensembles for image classification. The new approach can learn high-level features through multiple transformations, select effective classification algorithms and optimise the parameters for these classification algorithms to build effective ensembles. The new approach outperforms a large number of benchmark methods on 12 different image classification datasets. Finally, this thesis proposes a multi-population GP-based approach with knowledge transfer and ensembles to improving both the generalisation performance and computational efficiency of GP-based feature learning algorithms for image classification. The new approach can achieve better generalisation performance and computational efficiency than baseline GP-based feature learning method. The new approach can achieve better performance on 11 datasets than a large number of benchmark methods, including many neural network-based methods.