Open Access Te Herenga Waka-Victoria University of Wellington

Analysis and Diagnostics of Categorical Variables with Multiple Outcomes

posted on 2021-11-08, 23:23 authored by Suesse, Thomas Falk

Surveys often contain qualitative variables for which respondents may select any number of the outcome categories. For instance, for the question "What type of contraceptive have you used?" with possible responses (oral, condom, lubricated condom, spermicide, and diaphragm), respondents would be instructed to select as many of the J = 5 outcomes as apply. Such a variable is known as a multiple response variable, and its outcomes are referred to as items. This thesis discusses several approaches to analysing such data. For stratified multiple response data, we consider three ways of defining the common odds ratio, a summary measure of the conditional association between a row variable and the multiple response variable given a stratification variable. Within each stratum, the odds ratio is defined in terms of 1 item and 2 rows, 2 items and 2 rows, or 2 items and 1 row. We then consider two approaches to estimating the common odds ratio, together with its (co)variance estimators, for each of these types of odds ratio. The model-based approach treats the J items as a J-dimensional binary response and fits logit models directly to the marginal distribution of each item using the generalised estimating equations (GEE) method (Liang and Zeger 1986). The non-model-based approach uses Mantel-Haenszel (MH) type estimators. The model-based (or marginal model) approach remains applicable when there are more than two explanatory variables. Preisser and Qaqish (1996) proposed regression diagnostics for GEE. Another model-fitting approach is the homogeneous linear predictor (HLP) model based on maximum likelihood (ML), introduced by Lang (2005). We investigate deletion diagnostics, such as Cook's distance and DBETA, for multiple response data using HLP models (Lang 2005), which have not been considered before, and propose a simple "delete = replace" method as an alternative approach to deletion. These methods are compared with the GEE approach.
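As an illustration of the non-model-based approach, the sketch below computes the classic Mantel-Haenszel common odds ratio for the simplest of the three types (1 item and 2 rows), building each stratum's 2x2 table from J-dimensional binary response vectors. It is a minimal sketch under stated assumptions: the data layout, the function names and the toy data are hypothetical, and it returns only the point estimate of the textbook estimator sum_k(a_k d_k / n_k) / sum_k(b_k c_k / n_k). The thesis's (co)variance estimators, which must account for the same respondents answering several items, are not reproduced here.

```python
from typing import List, Tuple

# Hypothetical data layout (not from the thesis): a stratum is a list of
# (row, responses) pairs, where row is 0 or 1 and responses is a J-dimensional
# 0/1 tuple with 1 meaning the respondent selected that item.
Stratum = List[Tuple[int, Tuple[int, ...]]]


def item_table(stratum: Stratum, item: int) -> Tuple[int, int, int, int]:
    """Return the 2x2 counts (a, b, c, d) for one item within one stratum.

    a, b: row 0 respondents who did / did not select the item
    c, d: row 1 respondents who did / did not select the item
    """
    a = b = c = d = 0
    for row, responses in stratum:
        selected = responses[item] == 1
        if row == 0:
            if selected:
                a += 1
            else:
                b += 1
        else:
            if selected:
                c += 1
            else:
                d += 1
    return a, b, c, d


def mh_common_odds_ratio(strata: List[Stratum], item: int) -> float:
    """Classic Mantel-Haenszel estimate sum_k(a*d/n) / sum_k(b*c/n)."""
    numerator = 0.0
    denominator = 0.0
    for stratum in strata:
        a, b, c, d = item_table(stratum, item)
        n = a + b + c + d
        if n == 0:
            continue  # an empty stratum contributes nothing
        numerator += a * d / n
        denominator += b * c / n
    # Raises ZeroDivisionError if every stratum has b*c == 0.
    return numerator / denominator


# Made-up toy data with J = 2 items and two strata.
strata = [
    [(0, (1, 0)), (0, (1, 1)), (0, (0, 1)),
     (1, (0, 0)), (1, (1, 0)), (1, (0, 1))],
    [(0, (1, 1)), (0, (1, 0)),
     (1, (0, 1)), (1, (0, 0)), (1, (1, 1))],
]
print(mh_common_odds_ratio(strata, item=0))  # approximately 8.8
```

For a single item each respondent falls into exactly one cell, so the classic estimator applies directly; dependence between items only becomes relevant when tables for several items are pooled or compared, which this sketch does not attempt.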
We also discuss the modelling of a repeated multiple response variable, that is, a categorical variable for which subjects can select any number of categories on repeated occasions. Multiple responses have been considered in the literature by various authors; however, repeated multiple responses have not been considered before. Approaches include the marginal model approach using the GEE and HLP methods, and generalised linear mixed models (GLMM). For the GEE method, we also consider possible correlation structures and propose a groupwise correlation estimation method, which yields more efficient parameter estimates when the correlation structure does indeed differ between groups; a simulation study confirms this. Ordered categorical variables occur in many applications and can be seen as a special case of multiple responses. The proportional odds model, which uses logits of cumulative probabilities, is currently the most popular model for such data. We consider two approaches to checking for mis-specification of a covariate. The binary approach treats the proportional odds model as J-1 logistic regression models and applies the cumulative residual process introduced by Arbogast and Lin (2005) for logistic regression. The multivariate approach views the proportional odds model as a member of the class of multivariate generalised linear models (MGLM), in which the response variable is a vector of indicator responses.
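To make the binary view of the proportional odds model concrete, the sketch below assumes the common parameterisation logit P(Y <= j | x) = alpha_j - beta * x for j = 1, ..., J-1 (sign conventions differ between texts). It computes the cumulative and category probabilities and the J-1 binary indicators 1{Y <= j}; each indicator then follows an ordinary logistic regression with its own intercept alpha_j and a shared slope -beta. The function names and parameter values are hypothetical, and the cumulative residual process of Arbogast and Lin (2005) itself is not implemented.

```python
import math
from typing import List


def expit(z: float) -> float:
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))


def cumulative_probs(alphas: List[float], beta: float, x: float) -> List[float]:
    """P(Y <= j | x) = expit(alpha_j - beta * x) for j = 1, ..., J-1."""
    if any(later <= earlier for earlier, later in zip(alphas, alphas[1:])):
        raise ValueError("cut-points alpha_j must be strictly increasing")
    return [expit(alpha - beta * x) for alpha in alphas]


def category_probs(alphas: List[float], beta: float, x: float) -> List[float]:
    """P(Y = j | x) for j = 1, ..., J, as differences of cumulative probabilities."""
    cum = cumulative_probs(alphas, beta, x) + [1.0]  # P(Y <= J) = 1
    return [cum[0]] + [cum[j] - cum[j - 1] for j in range(1, len(cum))]


def binary_indicators(y: int, n_categories: int) -> List[int]:
    """The J-1 binary responses 1{Y <= j}, j = 1, ..., J-1, for an outcome y in 1..J."""
    return [1 if y <= j else 0 for j in range(1, n_categories)]


# Hypothetical parameters for J = 4 ordered categories.
alphas, beta = [-1.0, 0.5, 2.0], 0.8
probs = category_probs(alphas, beta, x=1.3)
print([round(p, 3) for p in probs], round(sum(probs), 6))
print(binary_indicators(2, 4))  # [0, 1, 1]
```

Requiring strictly increasing cut-points guarantees that every category probability is positive; the shared slope across the J-1 binary models is exactly the proportional odds assumption that a covariate check would probe.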


Te Herenga Waka—Victoria University of Wellington

Rights License

Author Retains Copyright

Degree Discipline

Statistics and Operations Research

Degree Grantor

Te Herenga Waka—Victoria University of Wellington


Degree Name

Doctor of Philosophy

Victoria University of Wellington Item Type

Awarded Doctoral Thesis



Victoria University of Wellington School

School of Mathematics, Statistics and Operations Research


Liu, Ivy; Wang, Dong