Year of Award

2019

Document Type

Professional Paper

Degree Type

Master of Science (MS)

Other Degree Name/Area of Focus

Data Science

Department or School/College

Mathematical Sciences

Committee Chair

Brian Steele

Committee Members

Brian Steele, Emily Stone, Javier Perez Alvaro

Keywords

Outlier, Computational complexity, High-dimensional dataset

Subject Categories

Applied Statistics | Probability | Theory and Algorithms

Abstract

In statistics and data science, outliers are data points that differ greatly from the other observations in a data set. They matter because they can dramatically influence the patterns and relationships exhibited by the non-outliers, so it is important to detect them and deal with them adequately. Recently, a novel algorithm, the ROMA algorithm, has been proposed [11]. In this paper, we propose a modification of the ROMA algorithm that reduces its computational complexity from $O(n^2 m)$ to $O((n/(2^m-o(1)))^2 m)$, where $n$ is the number of data points and $m$ is the dimension of the space. Consequently, if $\log(n) < 2^m$, the improved complexity is $O((n/\log(n))^2 m)$.

© Copyright 2019 Omid Khormali