Year of Award

2019

Document Type

Professional Paper

Degree Type

Master of Science (MS)

Other Degree Name/Area of Focus

Data Science

Department or School/College

Mathematical Sciences

Committee Chair

Brian Steele

Committee Members

Brian Steele, Emily Stone, Javier Perez Alvaro

Keywords

Outlier, Computational complexity, High dimensional dataset

Publisher

University of Montana

Subject Categories

Applied Statistics | Probability | Theory and Algorithms

Abstract

In statistics and data science, outliers are data points that differ greatly from the other observations in a data set. They are important attributes of the data because they can dramatically influence the patterns and relationships exhibited by the non-outliers. It is therefore important to detect outliers and handle them appropriately. Recently, a novel algorithm, the ROMA algorithm, was proposed [11]. In this paper, we propose a modification of the ROMA algorithm that reduces its computational complexity from $O(n^2 m)$ to $O((n/(2^m-o(1)))^2 m)$, where $n$ is the number of data points and $m$ is the dimension of the space. Consequently, if $\log(n) < 2^m$, the improved complexity is $O((n/\log(n))^2 m)$.
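
The final step of the abstract can be checked with a short calculation. The following is only a sketch of one way to read it, under two assumptions not stated in the abstract: that $m \ge 1$, and that $n$ is large enough for the $o(1)$ term to be at most $1$ in absolute value. Under those assumptions $2^m - o(1) \ge 2^m - 1 \ge 2^m/2$, so, whenever $\log(n) < 2^m$,

$$
\left(\frac{n}{2^m - o(1)}\right)^{2} m
\;\le\; \left(\frac{2n}{2^m}\right)^{2} m
\;<\; 4\left(\frac{n}{\log(n)}\right)^{2} m
\;=\; O\!\left(\left(\frac{n}{\log(n)}\right)^{2} m\right).
$$

The constant $4$ is an artifact of the crude bound on the $o(1)$ term and is absorbed by the $O(\cdot)$ notation.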


© Copyright 2019 Omid Khormali