High Dimensional Outlier Detection
Presentation Type
Oral Presentation
Abstract/Artist Statement
In statistics and data science, outliers are data points that differ greatly from the other values in a data set. They matter when analyzing large data sets because they can distort our picture of the data as a whole, so it is important to detect outliers and handle them appropriately. Recently, [V. Menon and S. Kalyani, Structured and Unstructured Outlier Identification for Robust PCA: A Non iterative, Parameter free Algorithm, arXiv:1809.04445v1] presented a novel outlier detection algorithm that a) does not require knowledge of the outlier fraction, b) does not require knowledge of the dimension of the underlying subspace, c) is computationally simple and fast, and d) can handle both structured and unstructured outliers. In this research, we improved this algorithm by reducing its complexity from O(n²m) to O((n/log n)²m), where n is the number of data points and m is the dimension of the space.
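The abstract does not spell out the Menon and Kalyani procedure, so the sketch below is not that algorithm. It is a generic coherence-based outlier score. It is included only to illustrate one common source of an O(n²m) cost: forming all n² pairwise inner products between m-dimensional points. Points lying near a shared low-dimensional subspace have large inner products with many other points, and outliers do not. The function names `coherence_scores` and `speedup_factor` are hypothetical. The data layout (one point per column of an m × n matrix) is an assumption. `speedup_factor` also assumes a natural logarithm because the abstract does not state the base, which only changes a constant factor. It shows that the claimed reduction amounts to a factor of n²m / ((n/log n)²m) = (log n)².

```python
import numpy as np

def coherence_scores(X):
    """Generic coherence-based outlier score (illustrative assumption, not the
    Menon-Kalyani method). X is m x n with one data point per column.

    Forming the n x n Gram matrix of normalized points costs O(n^2 m).
    Each point's score is the sum of its squared inner products with all
    other points, so inliers score high and outliers score low.
    """
    norms = np.linalg.norm(X, axis=0)
    U = X / np.where(norms == 0, 1.0, norms)  # unit-norm columns (guard zero columns)
    G = U.T @ U                                # all pairwise inner products: O(n^2 m)
    np.fill_diagonal(G, 0.0)                   # ignore each point's self-similarity
    return (G ** 2).sum(axis=0)

def speedup_factor(n):
    """n^2 m / ((n / log n)^2 m) = (log n)^2, assuming the natural log."""
    return np.log(n) ** 2

# Hypothetical example: 200 inliers in a 2-D subspace of R^20 plus 10 outliers.
rng = np.random.default_rng(0)
m, d = 20, 2
basis = np.linalg.qr(rng.standard_normal((m, d)))[0]  # m x d orthonormal basis
inliers = basis @ rng.standard_normal((d, 200))        # columns 0..199
outliers = rng.standard_normal((m, 10))                # columns 200..209
X = np.hstack([inliers, outliers])

scores = coherence_scores(X)
flagged = sorted(np.argsort(scores)[:10].tolist())     # ten lowest-scoring points
print(flagged)
print(speedup_factor(10_000))                          # theoretical (ln n)^2 ratio
```

In this synthetic setup the ten lowest scores should correspond to the appended outlier columns. The example only illustrates the cost structure. It does not reproduce the authors' algorithm or their improved O((n/log n)²m) variant.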
Mentor Name
Brian Steele
UC 333