“Inference for High-Dimensional Doubly Multivariate Data under General Conditions”

Document Type

Presentation Abstract

Presentation Date

5-1-2012

Abstract

With technological, research, and theoretical advancements, the amount of data being generated for analysis is growing rapidly. In many cases, the number of subjects may be small, but the number of measurements taken on each subject may be very large. Consider, for example, two groups of patients: the subjects in one group are diseased and those in the other are not. Over 9,000 relative fluorescent unit (RFU) signals, measures of the presence and abundance of proteins, are collected in a microarray or protoarray from each subject. Typically these kinds of data show marked skewness (a departure from normality), which invalidates multivariate normal-based theory. What is more, due to the cost involved, only a limited number of subjects can be included in the study. Therefore, the standard large-sample asymptotic theory cannot be applied. It is of interest to determine whether there are any differences in RFU signals between the two groups and, more importantly, whether there are any RFU signal effects that depend on the presence or absence of the disease. If such an interaction is detected, further research is warranted to identify the biological signals responsible, commonly known as biomarkers.

To address these types of phenomena, we present inferential procedures in two-factor repeated measures multivariate analysis of variance (RM-MANOVA) models where the covariance structure is unknown and the number of measurements per subject tends to infinity. In both the univariate case, in which there is a single response variable, and the multivariate case, in which there are several response variables, different sums of squares and cross product matrices are proposed to compensate for the unknown structure of the covariance matrix and unbalanced group sizes. Based on the new matrices, we present some multivariate test statistics and derive their asymptotic distributions under fairly general conditions. We then use simulation results to assess the performance of the tests, and we analyze a real data set to demonstrate their applicability.
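
For orientation, the two-group example can be written in a generic textbook form for the univariate case. This is only a sketch under assumed notation: the symbols $X_{ijk}$, $\mu_{ik}$, $n_i$, $p$, and $\Sigma_i$ are introduced here for illustration and are not taken from the dissertation, whose exact model and hypotheses may differ.

```latex
% Assumed notation (not from the source): i indexes group, j subject, k measurement.
\[
  X_{ijk} = \mu_{ik} + \varepsilon_{ijk}, \qquad
  i = 1, 2, \quad j = 1, \dots, n_i, \quad k = 1, \dots, p,
\]
\[
  \mathrm{E}(\varepsilon_{ijk}) = 0, \qquad
  \mathrm{Cov}\bigl(\varepsilon_{ij1}, \dots, \varepsilon_{ijp}\bigr) = \Sigma_i
  \quad \text{(unknown, no normality assumed)}.
\]
% Group effect: the two groups have the same mean averaged over measurements.
\[
  H_0^{(G)} : \bar{\mu}_{1\cdot} = \bar{\mu}_{2\cdot}, \qquad
  \bar{\mu}_{i\cdot} = \frac{1}{p} \sum_{k=1}^{p} \mu_{ik}.
\]
% Interaction: the group difference is the same for every measurement
% (the two mean profiles are parallel).
\[
  H_0^{(GM)} : \mu_{1k} - \mu_{2k} = \bar{\mu}_{1\cdot} - \bar{\mu}_{2\cdot}
  \quad \text{for all } k = 1, \dots, p.
\]
% Asymptotic regime described in the abstract: p grows while the n_i stay small.
\[
  p \to \infty .
\]
```

In this notation, rejecting the interaction hypothesis corresponds to finding signals whose group difference departs from the average difference, which is the biomarker question raised in the example.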

Additional Details

Doctoral Dissertation Defense. Link to the presenter's dissertation.

Dissertation Committee:
Solomon W. Harrar, Chair (Mathematical Sciences),
Johnathan M. Bardsley (Mathematical Sciences),
Jon Graham (Mathematical Sciences),
Jesse V. Johnson (Computer Science),
David Patterson (Mathematical Sciences)

Tuesday, May 1, 2012
4:10 pm in Math 103

This document is currently not available here.
