Open Access Te Herenga Waka-Victoria University of Wellington

Deep Learning-based Image Analysis for High-content Screening

posted on 2021-10-27, 23:42 authored by Zeng, Dylon

High-content screening is an empirical strategy in drug discovery to identify substances capable of altering cellular phenotype (the set of observable characteristics of a cell) in a desired way. Over the past two decades, high-content screening has attracted significant attention from academia and the pharmaceutical industry. However, image analysis remains a considerable hindrance to the widespread application of high-content screening. Standard image analysis relies on feature engineering and suffers from inherent drawbacks such as the dependence on annotated inputs. There is an urgent need for reliable and more efficient methods to cope with the increasingly large amounts of data produced.

This thesis centres around the design and implementation of a deep learning-based image analysis pipeline for high-content screening. The end goal is to identify and cluster hit compounds that significantly alter the phenotype of a cell. The proposed pipeline replaces feature engineering with a k-nearest neighbour-based similarity analysis. In addition, feature extraction using convolutional autoencoders is applied to reduce the negative effects of noise on hit selection. As a result, the feature engineering process is circumvented. A novel similarity measure is developed to facilitate similarity analysis. Moreover, we combine deep learning with statistical modelling to achieve optimal results. Preliminary explorations suggest that the choice of hyperparameters has a direct impact on neural network performance. Generalised estimating equation models are used to predict the most suitable neural network architecture for the input data.
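
As a rough illustration of what a k-nearest neighbour-based similarity analysis can look like, the sketch below scores each sample by the fraction of its nearest neighbours that are untreated controls. It is not the similarity measure developed in the thesis, which the abstract does not specify. The function name `knn_control_fraction`, the Euclidean distance, the value of `k`, and the synthetic feature vectors (standing in for autoencoder embeddings) are all assumptions made for this example.

```python
import numpy as np


def knn_control_fraction(features, is_control, k=5):
    """Hypothetical score: for each sample, the fraction of its k nearest
    neighbours (Euclidean distance, the sample itself excluded) that are
    control samples. A low value suggests a phenotype unlike the controls."""
    features = np.asarray(features, dtype=float)
    is_control = np.asarray(is_control, dtype=bool)

    # Pairwise squared Euclidean distances between all samples.
    diff = features[:, None, :] - features[None, :, :]
    dist = np.einsum("ijk,ijk->ij", diff, diff)

    # A sample must not count as its own neighbour.
    np.fill_diagonal(dist, np.inf)

    # Indices of the k closest samples for every row.
    nn_idx = np.argsort(dist, axis=1)[:, :k]
    return is_control[nn_idx].mean(axis=1)


# Synthetic stand-ins for learned image features (assumed, not thesis data).
rng = np.random.default_rng(0)
controls = rng.normal(0.0, 1.0, size=(40, 8))   # untreated cells
inactive = rng.normal(0.0, 1.0, size=(10, 8))   # compound with no effect
active = rng.normal(6.0, 1.0, size=(10, 8))     # compound shifting the phenotype

features = np.vstack([controls, inactive, active])
is_control = np.array([True] * 40 + [False] * 20)

scores = knn_control_fraction(features, is_control, k=5)
inactive_score = scores[40:50].mean()
active_score = scores[50:60].mean()
print(f"inactive: {inactive_score:.2f}, active: {active_score:.2f}")
```

In this toy setting the samples from the phenotype-shifting compound sit far from the controls, so their score is lower than that of the inactive compound. A real pipeline would apply a threshold or statistical test to such scores before shortlisting hits.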

Using the proposed pipeline, we analyse an extensive set of images acquired from a series of cell-based assays examining the effect of 282 FDA-approved drugs. The analysis of this data set produces a shortlist of drugs that can significantly alter a cell’s phenotype, then further identifies five clusters of the shortlisted drugs. The clustering results present groups of existing drugs that have the potential to be repurposed for new therapeutic uses. Furthermore, our findings align with published studies. Compared with other neural networks, the image analysis pipeline proposed in this thesis provides reliable and better results in a shorter time frame.


Copyright Date


Date of Award



Te Herenga Waka—Victoria University of Wellington

Rights License

Author Retains Copyright

Degree Discipline

Statistics and Operations Research

Degree Grantor

Te Herenga Waka—Victoria University of Wellington

Degree Level


Degree Name

Master of Science

ANZSRC Type Of Activity code


Victoria University of Wellington Item Type

Awarded Research Masters Thesis



Victoria University of Wellington School

School of Mathematics and Statistics


Nguyen, Binh