Year of Award
2019
Document Type
Professional Paper
Degree Type
Master of Science (MS)
Other Degree Name/Area of Focus
Data Science
Department or School/College
Mathematical Sciences
Committee Chair
Brian Steele
Committee Members
Brian Steele, Emily Stone, Javier Perez Alvaro
Keywords
Outlier, Computational complexity, High dimensional dataset
Subject Categories
Applied Statistics | Probability | Theory and Algorithms
Abstract
In statistics and data science, outliers are data points that differ greatly from the other observations in a data set. They are important attributes of the data because they can dramatically influence the patterns and relationships exhibited by the non-outliers. It is therefore important to detect outliers and handle them appropriately. A novel algorithm, the ROMA algorithm, was recently proposed [11]. In this paper, we propose a modification of the ROMA algorithm that reduces its computational complexity from $O(n^2 m)$ to $O((n/(2^m-o(1)))^2 m)$, where $n$ is the number of data points and $m$ is the dimension of the space. As a consequence, if $\log(n) < 2^m$, the improved complexity is $O((n/\log(n))^2 m)$.
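As a rough numerical illustration of the complexity claim (not an implementation of ROMA or of the proposed modification), the sketch below evaluates the three cost expressions from the abstract with constant factors and the $o(1)$ term dropped. The function names, the sample values of $n$ and $m$, and the use of the natural logarithm are assumptions made only for this example. Under these simplifications the original-to-modified cost ratio is $4^m$, and when $\log(n) < 2^m$ the modified cost stays below the $(n/\log(n))^2 m$ bound.

```python
import math

def roma_cost(n: int, m: int) -> float:
    """Original cost n^2 * m (constant factors dropped); hypothetical helper."""
    return n**2 * m

def modified_cost(n: int, m: int) -> float:
    """Modified cost (n / 2^m)^2 * m, ignoring the o(1) term; hypothetical helper."""
    return (n / 2**m) ** 2 * m

def log_bound_cost(n: int, m: int) -> float:
    """Bound (n / log n)^2 * m, natural log assumed; meaningful when log(n) < 2^m."""
    return (n / math.log(n)) ** 2 * m

n, m = 1_000_000, 10  # illustrative sizes only; here log(n) ~ 13.8 < 2^10
print(roma_cost(n, m) / modified_cost(n, m))  # 1048576.0, i.e. 4**m
print(modified_cost(n, m) <= log_bound_cost(n, m))  # the bound holds here
print(roma_cost(n, m) / log_bound_cost(n, m))  # speedup implied by the bound: (log n)^2
```

With the $o(1)$ term ignored, the ratio $4^m$ is exact, whereas the $\log(n)$ form is a looser bound that trades sharpness for a statement depending only on $n$.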
Recommended Citation
Khormali, Omid, "High Dimensional Outlier Detection" (2019). Graduate Student Theses, Dissertations, & Professional Papers. 11377.
https://scholarworks.umt.edu/etd/11377
© Copyright 2019 Omid Khormali