Recent Advances in Computer Science and Communications

ISSN (Print): 2666-2558
ISSN (Online): 2666-2566

Research Article

Investigating Outlier Detection Techniques Based on Kernel Rough Clustering

Author(s): Wang Meng*, Cao Wenhang and Dui Hongyan

Volume 17, Issue 1, 2024

Published on: 12 September 2023

Article ID: e120923220978 Pages: 11

DOI: 10.2174/2666255816666230912153541

Abstract

Background: Data quality is crucial to the success of big data analytics. However, outliers degrade data quality and distort data analysis. Using effective outlier detection techniques to eliminate dirty data can improve data quality and yield more accurate analytical insights. Data uncertainty poses a significant challenge for outlier detection, and existing methods warrant further refinement in the era of big data.

Objective: Unsupervised outlier detection that integrates clustering with an outlier scoring scheme is a current research hotspot. However, hard clustering fails when abnormal patterns exhibit uncertain and unexpected behavior. Rough boundaries help identify more accurate cluster structures. Therefore, this article extends clustering with soft clustering based on rough set theory, which models uncertainty, and designs appropriate scoring schemes to capture abnormal instances. This approach addresses outlier detection in uncertain, nonlinear, complex data.
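
To make the rough-boundary idea concrete, the following is a minimal sketch of a ratio-threshold variant of the rough k-means scheme introduced by Lingras and West, in which each cluster has a lower approximation (points that certainly belong) and a boundary region (points that may belong to several clusters). It is an illustration under stated assumptions, not the authors' kernel algorithm: distances are plain Euclidean, and the threshold `epsilon`, the lower-approximation weight `w_lower`, and all function names are hypothetical choices.

```python
import math


def rough_assign(points, centroids, epsilon=1.2):
    """Split points into lower approximations and boundary regions.

    Assumed rule: a point joins the boundary of every cluster whose centroid
    distance is within epsilon times its nearest-centroid distance; otherwise
    it joins the lower approximation of its nearest cluster.
    """
    k = len(centroids)
    lower = [[] for _ in range(k)]
    boundary = [[] for _ in range(k)]
    for i, x in enumerate(points):
        d = [math.dist(x, c) for c in centroids]
        h = min(range(k), key=d.__getitem__)  # index of the nearest centroid
        near = [j for j in range(k) if j != h and d[j] <= epsilon * d[h]]
        if near:
            for j in [h] + near:
                boundary[j].append(i)
        else:
            lower[h].append(i)
    return lower, boundary


def _mean(points, idx):
    """Coordinate-wise mean of the points whose indices are in idx."""
    dim = len(points[idx[0]])
    return tuple(sum(points[i][t] for i in idx) / len(idx) for t in range(dim))


def update_centroids(points, centroids, lower, boundary, w_lower=0.7):
    """New centroid = weighted mix of lower-approximation mean and boundary mean."""
    new = []
    for c, low, bnd in zip(centroids, lower, boundary):
        if low and bnd:
            ml, mb = _mean(points, low), _mean(points, bnd)
            new.append(tuple(w_lower * a + (1 - w_lower) * b for a, b in zip(ml, mb)))
        elif low:
            new.append(_mean(points, low))
        elif bnd:
            new.append(_mean(points, bnd))
        else:
            new.append(c)  # empty cluster: keep the previous centroid
    return new


def rough_kmeans(points, centroids, epsilon=1.2, w_lower=0.7, max_iter=50):
    """Alternate rough assignment and centroid update until centroids stop moving."""
    for _ in range(max_iter):
        lower, boundary = rough_assign(points, centroids, epsilon)
        new = update_centroids(points, centroids, lower, boundary, w_lower)
        if new == centroids:
            break
        centroids = new
    return centroids, lower, boundary


pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (5, 5)]
centroids, lower, boundary = rough_kmeans(pts, [(0, 0), (10, 10)])
print(lower)     # [[0, 1, 2], [3, 4, 5]]
print(boundary)  # [[6], [6]]
```

The point (5, 5), equidistant from both groups, is not forced into either cluster; it lands in both boundary regions, which is the behavior hard clustering cannot express.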

Methods: This paper proposes an outlier detection algorithm based on Kernel Rough Clustering and compares its detection accuracy with that of five popular existing methods on synthetic and real-world datasets. The results show that the proposed method achieves higher detection accuracy.
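
The abstract does not specify the kernel or the scoring scheme, so the flow below is only a hedged sketch. It assumes a Gaussian (RBF) kernel and scores each point by its squared kernel-induced distance to the nearest feature-space centroid, where each centroid mixes the lower-approximation mean and the boundary mean. Writing a centroid as coefficients `alpha` over the data lets the distance be computed from kernel values alone: ‖φ(x_i) − Σ_j α_j φ(x_j)‖² = K_ii − 2 Σ_j α_j K_ij + Σ_a Σ_b α_a α_b K_ab. The cluster memberships are supplied by hand as a hypothetical output of the clustering step, and `sigma`, `w_lower`, and all function names are illustrative assumptions.

```python
import math


def rbf(x, y, sigma=2.0):
    """Gaussian (RBF) kernel; sigma is an assumed bandwidth."""
    return math.exp(-math.dist(x, y) ** 2 / (2 * sigma ** 2))


def centroid_coeffs(n, lower, boundary, w_lower=0.7):
    """Weights alpha so that the feature-space centroid is sum_j alpha[j] * phi(x_j)."""
    if lower and boundary:
        parts = [(lower, w_lower), (boundary, 1 - w_lower)]
    elif lower:
        parts = [(lower, 1.0)]
    else:
        parts = [(boundary, 1.0)]
    alpha = [0.0] * n
    for idx, weight in parts:
        for i in idx:
            alpha[i] += weight / len(idx)
    return alpha


def kernel_dist2(i, alpha, K):
    """Squared distance from phi(x_i) to the centroid, using kernel values only."""
    n = len(K)
    cross = sum(alpha[j] * K[i][j] for j in range(n))
    self_term = sum(alpha[a] * alpha[b] * K[a][b]
                    for a in range(n) for b in range(n))
    return K[i][i] - 2 * cross + self_term


def outlier_scores(points, clusters, sigma=2.0, w_lower=0.7):
    """Hypothetical score: squared kernel distance to the nearest rough centroid."""
    n = len(points)
    K = [[rbf(p, q, sigma) for q in points] for p in points]
    alphas = [centroid_coeffs(n, lower, boundary, w_lower)
              for lower, boundary in clusters]
    return [min(kernel_dist2(i, a, K) for a in alphas) for i in range(n)]


def top_n_outliers(scores, n_out):
    """Indices of the n_out highest-scoring points."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:n_out]


# Hand-made rough clusters: (lower approximation, boundary region) per cluster.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (30, 30)]
clusters = [([0, 1, 2], []), ([3, 4, 5], [6])]
scores = outlier_scores(pts, clusters)
print(top_n_outliers(scores, 1))  # [6]
```

With these toy values, the isolated point (30, 30) receives the highest score even though it sits in a cluster's boundary region, because it shares almost no kernel similarity with the points that dominate that centroid.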

Results: The proposed method improved detection precision and recall. Its detection accuracy is superior to that of the popular methods compared, indicating that it is effective at identifying outliers.
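
The abstract reports precision, recall, and detection accuracy without giving formulas. The sketch below assumes the standard confusion-matrix definitions, comparing a set of flagged point indices (for example, the top n by outlier score) against ground-truth outlier labels; the function name `detection_metrics` is hypothetical.

```python
def detection_metrics(flagged, true_outliers, n):
    """Precision, recall and accuracy of flagged indices among n points."""
    flagged, truth = set(flagged), set(true_outliers)
    tp = len(flagged & truth)   # outliers correctly flagged
    fp = len(flagged - truth)   # inliers wrongly flagged
    fn = len(truth - flagged)   # outliers missed
    tn = n - tp - fp - fn       # inliers correctly left unflagged
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / n
    return precision, recall, accuracy


print(detection_metrics(flagged=[6, 2], true_outliers=[6, 7], n=10))
# (0.5, 0.5, 0.8)
```

Because outliers are usually rare, accuracy alone can look high even when most outliers are missed, which is why precision and recall are reported alongside it.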

Conclusion: Compared with popular methods, the proposed method has a slight advantage in detection accuracy and is an effective algorithm that can be chosen for outlier detection.

© 2024 Bentham Science Publishers