Open Access Repository

Outlier detection algorithms over fuzzy data with weighted least squares

Nikolova, N ORCID: 0000-0001-6160-6282, Rodriguez, RM, Symes, M ORCID: 0000-0003-2241-4995, Toneva, D, Kolev, K and Tenekedjiev, K ORCID: 0000-0003-3549-0671 2021, 'Outlier detection algorithms over fuzzy data with weighted least squares', International Journal of Fuzzy Systems, vol. 23, no. 5, pp. 1234–1256, doi: 10.1007/s40815-020-01049-8.

Full text not available from this repository.

Abstract

In the classical leave-one-out procedure for outlier detection in regression analysis, we exclude an observation and then construct a model on the remaining data. If the difference between the predicted and observed values is large, we declare the observation an outlier. As a rule, such procedures use single comparison testing. The problem becomes much harder when each observation is associated with a degree of membership to an underlying population, so that outlier detection must be generalized to operate over fuzzy data. We present a new approach for outlier detection that operates over fuzzy data using two inter-related algorithms. Because of the way outliers enter the observation sample, they may be of various orders of magnitude. To account for this, we divide the outlier detection procedure into cycles, each consisting of two phases. In Phase 1, we apply a leave-one-out procedure to each non-outlier in the dataset. In Phase 2, all previously declared outliers are subjected to the Benjamini–Hochberg step-up multiple testing procedure, which controls the false-discovery rate, and the non-confirmed outliers can return to the dataset. Finally, we construct a regression model over the resulting set of non-outliers. In this way, Phase 1 yields a reliable, high-quality regression model, because the leave-one-out procedure purges dubious observations comparatively easily owing to single comparison testing. At the same time, confirming outlier status against the newly obtained high-quality regression model is much harder because of the multiple testing procedure, so only the true outliers remain outside the data sample. The two phases in each cycle are a good trade-off between the desire to construct a high-quality model (i.e., over informative data points) and the desire to use as many data points as possible (thus leaving as many observations as possible in the data sample). The number of cycles is user-defined, but the procedure can terminate early if a cycle detects no new outliers. We offer one illustrative example and two practical case studies (from real-life thrombosis studies) that demonstrate the application and strengths of our algorithms. In the concluding section, we discuss several limitations of our approach and offer directions for future research.
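
The two building blocks named in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names `loo_wls_residuals` and `benjamini_hochberg` are hypothetical, the degrees of membership are assumed to act directly as weights in the weighted least squares fit, and the abstract does not say how residuals are converted into p-values, so that step is left out.

```python
import numpy as np


def loo_wls_residuals(X, y, w):
    """Leave-one-out prediction residuals from a weighted least squares fit.

    X: (n, p) design matrix (include a column of ones for an intercept).
    y: (n,) observed responses.
    w: (n,) non-negative weights; assumed here to be the degrees of membership.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    n = y.size
    residuals = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i          # drop observation i
        sw = np.sqrt(w[keep])             # WLS as OLS on sqrt-weighted rows
        beta, *_ = np.linalg.lstsq(X[keep] * sw[:, None], y[keep] * sw, rcond=None)
        residuals[i] = y[i] - X[i] @ beta  # observed minus predicted
    return residuals


def benjamini_hochberg(p_values, q=0.05):
    """Benjamini-Hochberg step-up procedure at false-discovery rate q.

    Returns a boolean array, True where the null hypothesis is rejected.
    """
    p = np.asarray(p_values, dtype=float)
    m = p.size
    reject = np.zeros(m, dtype=bool)
    if m == 0:
        return reject
    order = np.argsort(p)                        # indices of ascending p-values
    thresholds = q * np.arange(1, m + 1) / m     # k/m * q for rank k = 1..m
    below = p[order] <= thresholds
    if below.any():
        k = np.max(np.nonzero(below)[0])         # largest rank meeting its threshold
        reject[order[: k + 1]] = True            # step-up: reject every rank up to k
    return reject


# Toy data: points on y = 1 + 2x, with one shifted observation at index 3.
x = np.arange(6, dtype=float)
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x
y[3] += 10.0
w = np.ones_like(x)                              # crisp data: all memberships equal 1

res = loo_wls_residuals(X, y, w)
rejected = benjamini_hochberg([0.01, 0.04, 0.03, 0.045], q=0.05)
```

In the toy data, leaving out the shifted point leaves the other five exactly on the line, so its leave-one-out residual equals the shift of 10. The p-value list shows the step-up behaviour: the second and third smallest p-values miss their own thresholds, but the largest one meets its threshold, so all four hypotheses are rejected.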

Item Type: Article
Authors/Creators: Nikolova, N and Rodriguez, RM and Symes, M and Toneva, D and Kolev, K and Tenekedjiev, K
Keywords: regression analysis, leave-one-out method, degree of membership, multiple testing, Benjamini–Hochberg step-up multiple testing, false-discovery rate
Journal or Publication Title: International Journal of Fuzzy Systems
Publisher: Springer
ISSN: 1562-2479
DOI / ID Number: 10.1007/s40815-020-01049-8
Copyright Information: Copyright Taiwan Fuzzy Systems Association 2021
