On the Resolution of Compositional Datasets into Convex Combinations of Extreme Vectors
Large compositional datasets of the kind assembled in the geosciences often have remarkably low approximate rank. That is, within a tolerable error, the data points represented by the rows of such an array lie close to a subspace of the row space of relatively small dimension. A physical mixing process that would account for this phenomenon implies that each observation vector in the array can be approximated by a convex combination of a small number of fixed source, or 'endmember', vectors. In practice, neither the compositions of the endmembers nor the coefficients of the convex combinations are known. Traditional methods for estimating some or all of these quantities have included Q-mode 'factor' analysis and linear programming. In general, neither method is successful. This thesis examines some of the more important mathematical properties of a convex representation of compositional data, together with the background to the development of algorithms for assessing the number of endmembers statistically, locating endmembers, and partitioning geological samples into specified endmembers.
Keywords and Phrases: Compositional data, convex sets, endmembers, partitioning by least squares, iteration, logratios.
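
As an illustration of the convex mixing model described above, the following minimal sketch partitions a single observation into known endmembers by least squares, with the coefficients constrained to be non-negative and to sum to one. It is not the algorithm developed in the thesis: the projected-gradient scheme, the function names `project_to_simplex` and `partition`, and the three-part compositions are all assumptions chosen for illustration.

```python
# Illustrative sketch only. Names and data are hypothetical, not from the thesis.

def project_to_simplex(v):
    """Euclidean projection of v onto {a : a_i >= 0, sum(a) = 1}."""
    u = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for i, ui in enumerate(u, start=1):
        cumulative += ui
        t = (cumulative - 1.0) / i
        if ui - t > 0:
            theta = t  # keep the threshold from the last index that passes
    return [max(vi - theta, 0.0) for vi in v]


def partition(x, endmembers, iterations=2000):
    """Estimate convex coefficients a minimising ||x - sum_k a_k e_k||^2."""
    k = len(endmembers)
    p = len(x)
    # Gram matrix G[i][j] = e_i . e_j and vector b[i] = e_i . x
    gram = [[sum(ei[j] * ej[j] for j in range(p)) for ej in endmembers]
            for ei in endmembers]
    b = [sum(e[j] * x[j] for j in range(p)) for e in endmembers]
    # The trace of G bounds its largest eigenvalue, so this step size is safe.
    step = 1.0 / sum(gram[i][i] for i in range(k))
    a = [1.0 / k] * k  # start at the centre of the simplex
    for _ in range(iterations):
        grad = [sum(gram[i][j] * a[j] for j in range(k)) - b[i]
                for i in range(k)]
        a = project_to_simplex([a[i] - step * grad[i] for i in range(k)])
    return a


# Hypothetical endmember compositions (each row sums to 1).
endmembers = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.2, 0.6],
]
true_coefficients = [0.5, 0.3, 0.2]

# Build an exact mixture, then recover its coefficients.
observed = [
    sum(true_coefficients[k] * endmembers[k][j] for k in range(3))
    for j in range(3)
]
estimated = partition(observed, endmembers)
print([round(value, 4) for value in estimated])
```

Here the endmembers are taken as known, which, as the abstract notes, is not the case in practice. The sketch covers only the partitioning step, not the estimation of the number or location of endmembers.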