Year of Award
2019
Document Type
Dissertation
Degree Type
Doctor of Philosophy (PhD)
Degree Name
Mathematics
Department or School/College
Department of Mathematical Sciences
Committee Chair
Ekaterina Smirnova
Committee Members
Leonid Kalachev, Jonathan Graham, Johnathan Bardsley, Nathan Insel
Keywords
Accelerometry, High Dimensional, Microbiome Filtering, Permutation Test, Physical Activity, Quality Control
Abstract
Modern studies in medicine, epidemiology, pharmacy, and other fields generate high-dimensional data. We developed statistical analysis methods for two types of such data: physical activity data and microbiome data. For activity data, the analysis used measures of the frequency, duration, and intensity of physical activity recorded by wearable technology. Accelerometry-derived measures of physical activity were compared with established predictors of 5-year all-cause mortality in older adults aged 50 to 85 years from the 2003-2006 National Health and Nutrition Examination Survey, in terms of individual, relative, and combined predictive performance. A total of 33 predictors of 5-year all-cause mortality, including 20 objective measures of physical activity, were compared using single-predictor and multiple logistic regression. The results show that objective accelerometry-derived physical activity measures outperform traditional predictors of 5-year mortality in single-predictor models and offer some improvement in multiple-predictor models beyond what age and other traditional predictors provide. This highlights the importance of wearable technology for providing reproducible, unbiased, and prognostic biomarkers of health.
For microbiome data, we concentrated on pre-processing steps that account for both the sparsity of counts and the large number of observed taxa. The current approach, known as filtering, is to remove taxa that appear in small counts in a few samples. We present the package PERFect, which implements a permutation filtering approach designed to address two problems in microbiome data processing: (1) defining and quantifying the loss due to filtering by implementing thresholds; and (2) introducing and evaluating a permutation test for filtering loss to provide a measure of excessive filtering. The package employs an unbalanced binary search algorithm that greatly reduces the computational time for these permutations. The effectiveness of the proposed approach for downstream microbiome data analysis is illustrated on two microbiome quality control datasets. Our filtering method reduces (1) the magnitude of differences in alpha diversity for samples containing the same bacteria processed at different labs and (2) the dissimilarity (beta diversity) between samples that contain the same microbiome, potentially alleviating technical variability.
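As a rough illustration of the conventional filtering rule the abstract describes (removing taxa that appear in small counts in a few samples), the sketch below keeps a taxon only if it reaches a minimum count in a minimum number of samples. This is not the PERFect permutation method; the function name `filter_rare_taxa`, the thresholds `min_count` and `min_samples`, and the toy count table are all assumptions made for the example.

```python
def filter_rare_taxa(counts, min_count=5, min_samples=2):
    """Rule-of-thumb taxa filter (hypothetical helper, not part of PERFect).

    counts: list of samples, each a list of per-taxon read counts.
    A taxon (column) is kept if at least `min_samples` samples have
    `min_count` or more reads for it. Both thresholds are illustrative.
    Returns the kept column indices and the filtered count table.
    """
    n_taxa = len(counts[0])
    keep = [
        j for j in range(n_taxa)
        if sum(1 for row in counts if row[j] >= min_count) >= min_samples
    ]
    filtered = [[row[j] for j in keep] for row in counts]
    return keep, filtered


# Toy table: 3 samples (rows) by 4 taxa (columns), invented for illustration.
counts = [
    [120, 0, 3, 40],
    [95, 1, 0, 38],
    [110, 0, 0, 0],
]
keep, filtered = filter_rare_taxa(counts)
print(keep)      # indices of retained taxa
print(filtered)  # count table restricted to retained taxa
```

A fixed-threshold rule like this gives no measure of how much information is discarded, which is the gap the abstract says the permutation test for filtering loss is meant to address.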
Recommended Citation
Cao, Quy Xuan, "Methods for Analyzing High Dimensional Data with Applications to the Wearable and Microbiome Data Analysis" (2019). Graduate Student Theses, Dissertations, & Professional Papers. 11507.
https://scholarworks.umt.edu/etd/11507
© Copyright 2019 Quy Xuan Cao