Integrating epistasis in genome-wide association studies to improve understanding of health and disease
Genome-wide association studies (GWAS) are typically used to examine monogenic disorders caused by single nucleotide polymorphisms (SNPs). The genetic architecture underlying a polygenic disease phenotype, in which SNPs act in combination, is more complex. Epistasis occurs when two or more SNPs interact, even though the marginal effect of each SNP alone may be too weak to reach significance in a GWAS. Adding an epistasis detection algorithm to a GWAS addresses the omission of such potentially relevant SNPs by identifying variants whose statistical significance increases once combinatorial effects are taken into account. In this thesis, epistasis analysis is carried out in the context of Alzheimer’s disease and type 2 diabetes.
The aim of this thesis is to show statistically that including an epistasis algorithm in a typical GWAS significantly lowers the p-values of phenotypically associated SNPs. It further aims to show that SNPs that do not meet significance cut-offs in a GWAS produce a larger effect when interacting epistatically. Data were sourced from the UK Biobank repository, which comprises 502,000 individuals. The data were processed and filtered using bioinformatic methods and then passed to PLINK for epistasis analysis. The top twenty epistatic interactions were ranked by chi-squared test statistic, and an exhaustive search of the literature and gene databases was used to annotate the function of the genes related to the epistatic SNPs.
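The ranking step described above can be illustrated with a minimal sketch. It assumes the epistasis results are available as a whitespace-delimited table whose header includes a STAT column holding the chi-squared statistic (the column layout below is an assumption about the PLINK report, not taken from the thesis). The function name top_interactions, the SNP identifiers, and all numeric values are hypothetical placeholders.

```python
import io
import math

# Made-up sample rows in the assumed layout (CHR1 SNP1 CHR2 SNP2 OR_INT STAT P).
# SNP names and values are placeholders, not results from the thesis.
SAMPLE = """\
CHR1 SNP1 CHR2 SNP2 OR_INT STAT P
1 rsA 2 rsB 1.50 12.3 4.5e-04
3 rsC 4 rsD 2.10 30.1 4.1e-08
5 rsE 6 rsF 0.90 NA NA
7 rsG 8 rsH 1.20 5.0 2.5e-02
"""


def top_interactions(handle, n=20):
    """Return the n SNP pairs with the largest chi-squared statistic."""
    header = handle.readline().split()
    rows = []
    for line in handle:
        fields = line.split()
        if len(fields) != len(header):
            continue  # skip blank or malformed lines
        record = dict(zip(header, fields))
        try:
            stat = float(record["STAT"])
        except ValueError:
            continue  # skip pairs whose statistic is missing (e.g. "NA")
        if math.isnan(stat):
            continue
        record["STAT"] = stat
        rows.append(record)
    # Largest test statistic first, i.e. strongest interaction signal first.
    rows.sort(key=lambda r: r["STAT"], reverse=True)
    return rows[:n]


if __name__ == "__main__":
    for r in top_interactions(io.StringIO(SAMPLE), n=20):
        print(r["SNP1"], r["SNP2"], r["STAT"])
```

In practice the handle would be an opened results file rather than the in-memory sample, and n=20 matches the top twenty interactions reported here.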
My results show a large difference in the significance of associations with phenotype, for both main-effect and non-main-effect genes, when an epistasis analysis algorithm is used compared with traditional GWAS. Whilst detecting SNPs that fail to meet GWAS significance thresholds is important, SNPs that a GWAS would detect anyway, such as APOE4 in Alzheimer’s disease, also showed much greater significance under an epistasis detection algorithm. Current literature examines epistasis and GWAS separately; by proposing that the two be combined, this thesis addresses a gap in the literature. Improving the significance of detected genes associated with a disease phenotype is a step towards identifying potential drug targets for precision medicine.