#### Presentation Title

High Dimensional Outlier Detection

#### Presentation Type

Oral Presentation

#### Abstract/Artist Statement

In statistics and data science, outliers are data points that differ greatly from the other values in a data set. They matter when analyzing large data sets because they can distort how the data as a whole is perceived. It is therefore important to detect outliers and deal with them adequately. Recently, in [V. Menon and S. Kalyani, Structured and Unstructured Outlier Identification for Robust PCA: A Non iterative, Parameter free Algorithm, arXiv:1809.04445v1], a novel algorithm for detecting outliers was presented that a) does not require knowledge of the outlier fraction, b) does not require knowledge of the dimension of the underlying subspace, c) is computationally simple and fast, and d) can handle both structured and unstructured outliers. In this research, we improved this algorithm by reducing its complexity from O(n^2 m) to O((n/log(n))^2 m), where n is the number of data points and m is the dimension of the space.
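As a rough illustration of what this reduction means, the ratio of the two complexity bounds can be worked out directly. This is only a sketch: it assumes the constants hidden by the O-notation are comparable, and it takes the logarithm to be natural, which the abstract does not specify (a different base changes the result only by a constant factor).

$$
\frac{n^{2} m}{\left(\dfrac{n}{\log n}\right)^{2} m} = (\log n)^{2},
\qquad \text{e.g. } n = 10^{4}: \ (\ln 10^{4})^{2} \approx (9.21)^{2} \approx 85 .
$$

Under these assumptions the saving grows with the number of data points n and does not depend on the dimension m.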

#### Mentor Name

Brian Steele


UC 333
