Inference for High-Dimensional Doubly Multivariate Data under General Conditions

John Zachary Hossler, The University of Montana


With technological, research, and theoretical advancements, the amount of data being generated for analysis is growing rapidly. In many cases, the number of subjects may be small, but the number of measurements taken on each subject may be very large. Consider, for example, two groups of patients: the subjects in one group are diseased, and the subjects in the other are not. Over 9,000 relative fluorescent unit (RFU) signals, measures of the presence and abundance of proteins, are collected in a microarray or protoarray from each subject. Typically, these kinds of data show marked skewness (a departure from normality), which invalidates standard multivariate normal-based theory. What is more, due to the cost involved, only a limited number of subjects can be included in the study. Therefore, standard large-sample asymptotic theory cannot be applied. It is of interest to determine whether there are any differences in RFU signals between the two groups and, more importantly, whether there are any RFU signal by group interaction effects. If such an interaction is detected, further research is warranted to identify these biological signals, commonly known as biomarkers.

To address these types of phenomena, we present inferential procedures in two-factor repeated measures multivariate analysis of variance (RM-MANOVA) models where the covariance structure is unknown and the number of measurements per subject tends to infinity. In both the univariate case, in which there is a single response variable, and the multivariate case, in which there are several response variables, new sums of squares and cross-product matrices are proposed to compensate for the unknown structure of the covariance matrix and for unbalanced group sizes. Based on the new matrices, we present several multivariate test statistics and derive their asymptotic distributions under fairly general conditions. We then use simulation results to assess the performance of the tests, and we analyze a real data set to demonstrate their applicability.


© Copyright 2012 John Zachary Hossler