Open Access Te Herenga Waka-Victoria University of Wellington

Semi-supervised Model-Based Clustering via Finite-Mixtures using Proportional Odds Models for Ordinal Data

thesis
posted on 2025-10-20, 07:54 authored by Ying Cui
This thesis introduces a semi-supervised learning approach, based on finite mixtures, for model-based clustering of ordinal data. Our research focuses on applying this technique to ordered categorical data in matrix format, such as data obtained from surveys with Likert-scale responses. These data matrices have subjects as rows and a set of ordinal variables (e.g., survey questions) as columns. We employ the proportional odds model, a widely used approach for analyzing such data, as our basic model structure. We propose an approach for analyzing datasets that contain both labeled and unlabeled observations from multiple clusters, in which the observations with unknown cluster memberships are assumed to come from the components of a finite mixture. Model fitting is performed using the expectation–maximization algorithm, which incorporates the observations with labeled cluster memberships to infer the cluster memberships of the unlabeled data.

To evaluate the performance of the proposed method, we conducted a simulation study across six scenarios, each with a different proportion of known and unknown cluster memberships. The fitted models accurately estimate the parameters in most of the designed scenarios, indicating that the technique is effective in clustering partially labeled data with ordered categorical response variables. This simulation study also shows that the standard errors of the model parameters can be estimated using the asymptotic method.

This thesis also presents a simulation study that compares the clustering results with the true cluster memberships across the six scenarios using three measures: the Adjusted Rand Index, the Normalized Variation of Information, and the Normalized Information Distance. The final simulation study evaluates the performance of eight common information criteria used for model selection.

Finally, this thesis illustrates the approach with real-world Chinook salmon trial data collected by the Cawthron Institute, one of New Zealand's largest independent science organizations in the aquaculture sector. The clustering analysis provides a possible way of using cheaper or non-destructive biomarkers to distinguish fish health status, by incorporating known health labels derived from fish data with more expensive or destructive biomarkers.
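As an illustration of the idea summarized in the abstract, the following is a minimal sketch of a semi-supervised E-step for a row-clustering proportional odds mixture. It is not taken from the thesis: the parametrization `P(Y <= k) = logistic(mu_k - (alpha_r + beta_j))`, the function names `category_probs` and `e_step`, and the use of `None` to mark an unlabeled row are all assumptions made for this example. The thesis may use a different model structure or labeling convention.

```python
import math

def category_probs(mu, eta):
    """Category probabilities under a proportional odds model (assumed form).

    mu  : increasing cut-points mu_1 < ... < mu_{q-1}
    eta : linear predictor; P(Y <= k) = logistic(mu_k - eta)
    Returns a list of q probabilities, one per ordinal category.
    """
    cum = [0.0] + [1.0 / (1.0 + math.exp(-(m - eta))) for m in mu] + [1.0]
    return [cum[k + 1] - cum[k] for k in range(len(cum) - 1)]

def e_step(Y, labels, pi, mu, alpha, beta):
    """Posterior cluster probabilities for each row of Y.

    Y      : list of rows, each a list of categories coded 0..q-1
    labels : known cluster index per row, or None if unlabeled (hypothetical coding)
    pi     : mixing proportions; alpha : cluster effects; beta : column effects
    Labeled rows are fixed to their known cluster; unlabeled rows get
    the usual finite-mixture posterior probabilities.
    """
    R = len(pi)
    Z = []
    for row, label in zip(Y, labels):
        if label is not None:
            # Known membership: responsibility is an indicator vector.
            Z.append([1.0 if r == label else 0.0 for r in range(R)])
            continue
        log_post = []
        for r in range(R):
            ll = math.log(pi[r])
            for j, y in enumerate(row):
                ll += math.log(category_probs(mu, alpha[r] + beta[j])[y])
            log_post.append(ll)
        # Normalize on the log scale for numerical stability.
        m = max(log_post)
        w = [math.exp(v - m) for v in log_post]
        total = sum(w)
        Z.append([v / total for v in w])
    return Z

# Toy usage: one labeled row and one unlabeled row, two clusters, three categories.
Z = e_step(Y=[[0, 0], [2, 2]], labels=[0, None], pi=[0.5, 0.5],
           mu=[-1.0, 1.0], alpha=[-2.0, 2.0], beta=[0.0, 0.0])
```

In a full EM fit, these posterior probabilities would weight the log-likelihood maximized in the M-step; that step is omitted here.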

History

Copyright Date

2025-10-20

Date of Award

2025-10-20

Publisher

Te Herenga Waka—Victoria University of Wellington

Rights License

Author Retains Copyright

Degree Discipline

Statistics and Operations Research

Degree Grantor

Te Herenga Waka—Victoria University of Wellington

Degree Level

Doctoral

Degree Name

Doctor of Philosophy

ANZSRC Type Of Activity code

1 Pure basic research

Victoria University of Wellington Item Type

Awarded Doctoral Thesis

Language

en_NZ

Victoria University of Wellington School

School of Mathematics and Statistics

Advisors

Liu, Ivy; McMillan, Louise