Improving Polygenic Risk Score Accuracy Through Integration of Epistatic Gene-Gene and Gene-Gene-Environment Interactions for Type 2 Diabetes and Celiac Disease
posted on 2025-09-10, 00:06, authored by Keri Multerer
<p dir="ltr">Polygenic diseases, such as the two studied in this thesis, Type 2 Diabetes (T2D) and Celiac disease (CD), result from numerous genetic variants with small effect sizes, whose cumulative effects can be captured by <i>polygenic risk scores (PRS)</i>. While PRS can help identify individuals at higher risk before disease onset, they currently explain only a small portion of heritability, limiting their accuracy and clinical utility. Current PRS methods seek to improve accuracy by increasing the number of variants analysed, at the expense of interpretability: the resulting scores contain thousands to hundreds of thousands of disease-associated SNPs. In this thesis, such high-dimensional data were used to develop a novel PRS pipeline to improve risk predictions for T2D and CD, using training populations of 235,986 and 235,987 genotyped people in the UK Biobank, respectively. This PRS pipeline was then applied to T2D and CD validation sets for improved PRS predictions.</p><p dir="ltr">I addressed the high-dimensionality issue with feature reduction methods that use machine learning (ML) and importance algorithms to prioritise features for inclusion in association studies, the output of which is the effect sizes (beta coefficients) used in PRS calculations. Currently published PRS methodology is limited to the marginal effect sizes of single SNPs (G) calculated with genome-wide association studies (GWAS). This prompted me to improve accuracy by integrating the contributions of epistatic SNP-SNP interactions (GxG) and SNP-SNP-environment interactions (GxGxE) into risk calculations.</p><p dir="ltr">To enable these enhancements, I developed methods to include GxG and GxGxE features in four PRS calculations in the validation set, namely PRS<sub>G</sub>, PRS<sub>GxG</sub>, PRS<sub>G+(GxG)</sub>, and PRS<sub>GxGxE</sub>, and integrated them into a combined PRS<sub>cr-mult</sub> calculation. PRS<sub>cr-mult</sub> significantly enhances stratified risk predictions for T2D and CD across clinical quintile and genetic decile risk thresholds, providing a more personalised risk assessment. The construction and utility of these scores are described in this thesis.</p><p dir="ltr">Aim 1 (Chapter 3) of my thesis was to produce, validate and benchmark a workflow pipeline that accommodates the genetic and environmental factors contributing to PRS. This pipeline included higher-order interactions from an exhaustive epistatic interaction search, followed by novel feature selection and GxGxE discovery methods. Feature weights were calculated with a penalised regression model and used to develop the four PRS calculations. Aim 2 (Chapter 4) was to use the pipeline to improve accuracy and explained heritability for T2D, and Aim 3 (Chapter 5) was to achieve the same outcome for CD.</p><p dir="ltr">Overall, my novel approach improved PRS risk predictions, identifying an additional 27% and 10% of at-risk cases in the validation set for T2D and CD, respectively, compared to the traditional PRS<sub>G</sub>. Finally, I used ML techniques to combine the PRS calculations into an improved PRS, which I termed “PRS<sub>cr-mult</sub>”, and to identify the unique genetic features driving risk within each PRS calculation.</p>
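<p dir="ltr">For readers unfamiliar with how effect sizes become a score, the following is a minimal sketch of an additive score in the spirit of PRS<sub>G</sub> and of a score that also adds pairwise interaction terms, in the spirit of PRS<sub>G+(GxG)</sub>. It is not the thesis pipeline. The function names, the toy dosages and beta values, and the encoding of a GxG feature as the product of two allele dosages are all assumptions made for illustration; the abstract does not specify how interaction features are encoded.</p>

```python
import numpy as np


def prs_additive(dosages, betas):
    """Additive score: for each person, sum over SNPs of beta_j * dosage_ij."""
    return dosages @ betas


def prs_with_gxg(dosages, betas, pairs, pair_betas):
    """Additive score plus pairwise SNP-SNP interaction terms.

    Assumption: each GxG feature is encoded as the product of the two
    allele dosages. The thesis may use a different encoding.
    """
    score = prs_additive(dosages, betas)
    for (j, k), beta_jk in zip(pairs, pair_betas):
        score = score + beta_jk * dosages[:, j] * dosages[:, k]
    return score


# Hypothetical toy data: 3 people x 3 SNPs, dosages coded as 0, 1 or 2.
dosages = np.array([[0, 1, 2],
                    [2, 0, 1],
                    [1, 1, 0]], dtype=float)
betas = np.array([0.10, -0.05, 0.20])  # made-up marginal effect sizes
pairs = [(0, 2)]                       # one made-up interacting SNP pair
pair_betas = [0.30]                    # made-up interaction effect size

prs_g = prs_additive(dosages, betas)
prs_g_gxg = prs_with_gxg(dosages, betas, pairs, pair_betas)
print(prs_g)
print(prs_g_gxg)
```

<p dir="ltr">In this toy example only the second person carries risk alleles at both SNPs of the interacting pair, so only that person's score changes when the interaction term is added.</p>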
History
Copyright Date
2025-09-09
Date of Award
2025-09-09
Publisher
Te Herenga Waka—Victoria University of Wellington
Rights License
CC BY-NC-ND 4.0
Degree Discipline
Biomedical Science
Degree Grantor
Te Herenga Waka—Victoria University of Wellington
Degree Level
Doctoral
Degree Name
Doctor of Philosophy
ANZSRC Socio-Economic Outcome code
200104 Prevention of human diseases and conditions