Exploring a Bioinformatics Clustering Algorithm

Matti, Mukhlis

doi:10.26686/wgtn.16915627.v1

thesis

posted on 2021-11-01, 22:18 authored by Matti, Mukhlis

This thesis explores and evaluates MAXCCLUS, a bioinformatics clustering algorithm designed to cluster genes from microarray experimental data. MAXCCLUS clusters genes according to the textual data that describe them. It creates candidate clusters, keeps only those that pass a statistical significance test, and then attempts to generalise the retained clusters using a simple greedy generalisation algorithm. We explore the behaviour of MAXCCLUS by running several clustering experiments that investigate various modifications to MAXCCLUS and its data. The thesis shows that (a) the simple generalisation algorithm of MAXCCLUS gives better results than an exhaustive search algorithm for generalisation, (b) the significance test that MAXCCLUS uses needs to be modified to take into account the functional dependency of some genes on other genes, (c) it is advantageous to delete the textual data that are not domain-relevant, but disadvantageous to add more textual data describing the genes, and (d) MAXCCLUS behaves poorly when it attempts to cluster genes that have adjacent categories rather than only two distinct categories.

History

Copyright Date

2004-01-01

Date of Award

2004-01-01

Publisher

Te Herenga Waka—Victoria University of Wellington

Rights License

Author Retains Copyright

Degree Discipline

Computer Science

Degree Grantor

Te Herenga Waka—Victoria University of Wellington

Degree Level

Masters

Degree Name

Master of Science

Victoria University of Wellington Item Type

Awarded Research Masters Thesis

Language

en_NZ

Victoria University of Wellington School

School of Mathematics, Statistics and Computer Science

Advisors

Andreae, Peter

Keywords

Bioinformatics; Cluster analysis; Gene expression; Statistical methods; Data mining; Algorithms; School: School of Mathematics, Statistics and Computer Science; 089999 Information and Computing Sciences not elsewhere classified; Marsden: 280401 Analysis of Algorithms and Complexity; Degree Discipline: Computer Science; Degree Level: Masters; Degree Name: Master of Science

Licence

Author Retains Copyright
