Year of Award
2019
Document Type
Professional Paper
Degree Type
Master of Science (MS)
Other Degree Name/Area of Focus
Data Science
Department or School/College
Mathematical Sciences
Committee Chair
Brian Steele
Committee Members
Brian Steele, Emily Stone, Javier Perez Alvaro
Keywords
Outlier, Computational complexity, High dimensional dataset
Publisher
University of Montana
Subject Categories
Applied Statistics | Probability | Theory and Algorithms
Abstract
In statistics and data science, outliers are data points that differ greatly from the other observations in a data set. They are important attributes of the data because they can dramatically influence the patterns and relationships exhibited by the non-outliers. It is therefore important to detect outliers and deal with them adequately. Recently, a novel algorithm, the ROMA algorithm, has been proposed [11]. In this paper, we propose a modification of the ROMA algorithm that reduces its computational complexity from $O(n^2 m)$ to $O((n/(2^m-o(1)))^2 m)$, where $n$ is the number of data points and $m$ is the dimension of the space. As a consequence, if $\log(n) < 2^m$, the improved complexity is $O((n/\log(n))^2 m)$.
Recommended Citation
Khormali, Omid, "High Dimensional Outlier Detection" (2019). Graduate Student Theses, Dissertations, & Professional Papers. 11377.
https://scholarworks.umt.edu/etd/11377
© Copyright 2019 Omid Khormali