Submitted: 22 May 2023
Posted: 23 May 2023
Abstract
Keywords:
1. Introduction
2. Related Works
3. Problem Definitions
4. Proposed Algorithm
Algorithm 1. Subspace Generation.
Input: (U, A), the information system consisting of n objects, where the attribute set A is divided into C (conditional attributes) and D (decision attributes).
Output: Subspace of (U, A).
Step 1. Generate a dominance relation on U corresponding to C and X ⊆ U.
Step 2. Generate the nano topology and its basis.
Step 3. For each x ∈ C, find … and …
Step 4. If (…)
Step 5. then drop x from C;
Step 6. else form the criterion reduction.
Step 7. End for.
Step 8. Generate CORE(C) = ∩{criterion reductions}.
Step 9. Generate the subspace of the given information system.
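Steps 1 to 9 can be illustrated with a small Python sketch. The expressions in Steps 3 and 4 did not survive extraction, so the sketch assumes the usual nano-topological reduction test: an attribute x is dropped when the basis computed from C - {x} equals the basis computed from C. It further assumes that "larger is better" on every attribute, so the dominance class of an object is the set of objects at least as large on every attribute in C; that the lower approximation L(X) and upper approximation of X are built from these classes in the rough-set manner; and that the basis is {U, L(X), B(X)}, with U the universe and B(X) the boundary. All function names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Sequence, Set

Row = Dict[str, float]  # one object: attribute name -> ordinal/numeric value


def dominating_class(objs: Sequence[Row], i: int, attrs: Sequence[str]) -> FrozenSet[int]:
    """Objects that dominate object i, i.e. are >= on every attribute (assumed direction)."""
    return frozenset(
        j for j, o in enumerate(objs) if all(o[a] >= objs[i][a] for a in attrs)
    )


def nano_basis(objs: Sequence[Row], attrs: Sequence[str], target: Set[int]) -> FrozenSet[FrozenSet[int]]:
    """Basis {U, L(X), B(X)} of the nano topology induced by the dominance relation."""
    universe = frozenset(range(len(objs)))
    classes = [dominating_class(objs, i, attrs) for i in range(len(objs))]
    lower = frozenset(i for i in universe if classes[i] <= target)  # class inside X
    upper = frozenset().union(*(c for c in classes if c & target))  # classes meeting X
    boundary = upper - lower
    return frozenset({universe, lower, boundary})


def core_attributes(objs: Sequence[Row], conditional: Sequence[str], target: Set[int]) -> List[str]:
    """Steps 3-8: keep x only if dropping it changes the basis (assumed test)."""
    full = nano_basis(objs, conditional, target)
    core = []
    for x in conditional:
        reduced = [a for a in conditional if a != x]
        if nano_basis(objs, reduced, target) != full:
            core.append(x)  # x is a criterion and cannot be dropped
    return core


def subspace(objs: Sequence[Row], core: Sequence[str]) -> List[Row]:
    """Step 9: project every object onto the core attributes."""
    return [{a: o[a] for a in core} for o in objs]
```

For example, with three objects whose attribute c is constant, removing c leaves the basis unchanged while removing a does not, so only a survives in the core.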
Algorithm 2. Dynamic k-means clustering algorithm.
Input: E, an information system consisting of n objects with attribute set CORE(A) ⊆ A; tmax, the maximum time gap between consecutive time-stamps; tmin, the minimum length of a life-span.
Output: A set of clusters, each associated with a sequence of time intervals as its life-spans.
Step 1. Given the d1-dimensional dataset over CORE(A).
Step 2. Select C[i] = {x[i], tp[i]}, i = 1, 2, …, k, where x[i] is a data instance or a cluster mean, and tp[i] points to the list of time intervals maintained for each cluster; the list contains the time-stamps (start-time) of x[i], and initially start-time = last-time.
Step 3. For each incoming data instance x with current time-stamp current-time
Step 3. { if d(x, Cj) ≤ d(x, Ci) for all i ≠ j, i = 1, 2, …, k
Step 4. { Add x to Cj
Step 5. Update mean(Cj)
Step 6. if (|current-time - last-time[j]| ≤ tmax)
Step 7. { if (last-time[j] ≤ current-time)
Step 8. extend life-span(Cj) by setting last-time[j] = current-time
Step 9. else go to Step 3
Step 10. }
Step 11. else if (|last-time[j] - start-time[j]| ≥ tmin)
Step 12. { Add [start-time[j], last-time[j]] to tp[j]
Step 13. set last-time[j] = start-time[j] = current-time
Step 14. }
Step 15. }
Step 16. }
Step 17. if (no assignment occurs) go to Step 19
Step 18. else go to Step 3
Step 19. Output the cluster set.
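A rough Python sketch of Algorithm 2 follows. It assumes Euclidean distance for d(·,·), numeric time-stamps, and a finite stream, so the loop simply ends when no more instances arrive (standing in for Steps 17 and 18). It also fills two cases the pseudocode leaves open, both labelled as assumptions: when the gap exceeds tmax but the open span is shorter than tmin, the short span is discarded and a new one starts; and any open span of sufficient length is closed when the stream ends. The names Cluster and dynamic_kmeans are hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import Iterable, List, Sequence, Tuple

Point = Sequence[float]


@dataclass
class Cluster:
    mean: List[float]
    count: int
    start: float  # start-time[j] of the currently open life-span
    last: float   # last-time[j] of the currently open life-span
    lifespans: List[Tuple[float, float]] = field(default_factory=list)  # tp[j]


def dynamic_kmeans(seeds: Iterable[Tuple[Point, float]],
                   stream: Iterable[Tuple[Point, float]],
                   t_max: float, t_min: float) -> List[Cluster]:
    # Step 2: one cluster per seed; start-time = last-time = the seed's time-stamp.
    clusters = [Cluster(list(x), 1, t, t) for x, t in seeds]
    for x, now in stream:  # Step 3
        # Nearest cluster mean by Euclidean distance (ties go to the lowest index).
        j = min(range(len(clusters)), key=lambda i: math.dist(x, clusters[i].mean))
        c = clusters[j]
        # Steps 4-5: add x and update the mean incrementally.
        c.count += 1
        c.mean = [m + (xi - m) / c.count for m, xi in zip(c.mean, x)]
        if abs(now - c.last) <= t_max:  # Step 6
            if c.last <= now:  # Steps 7-8: extend the open life-span
                c.last = now
        else:
            if abs(c.last - c.start) >= t_min:  # Steps 11-12: record a long enough span
                c.lifespans.append((c.start, c.last))
            # Step 13; assumption: a span shorter than t_min is discarded here too.
            c.start = c.last = now
    # Assumption: close any open life-span of sufficient length at end of stream.
    for c in clusters:
        if c.last - c.start >= t_min:
            c.lifespans.append((c.start, c.last))
    return clusters
```

With seeds at 0 and 10 (time-stamp 0), tmax = 2, tmin = 1, and a stream whose first cluster is silent between time-stamps 2 and 10, the first cluster ends up with the life-spans (0, 2) and (10, 11).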
Algorithm 3. Finding periodic (fully/partially) and fuzzy periodic clusters.
Step 1. For each cluster c with list of life-spans L:
Step 2. Initially Lc = null // Lc is the list of superimposed intervals
Step 3. lt = L.get() // lt points to the first time interval (life-span) in L
Step 4. Lc.append(lt)
Step 5. m = 1 // m = number of intervals superimposed
Step 6. while ((lt = L.get()) != null)
Step 7. { flag = 0
Step 8. while ((lct = Lc.get()) != null) // lct points to an interval already in Lc
Step 9. if (compsuperimp(lt, lct))
Step 10. flag = 1
Step 11. if (flag == 0)
Step 12. Lc.append(lt) }
Step 13. }
Step 14. }
Step 15. compsuperimp(lt, lct):
Step 16. if (intersect(lct, lt) != null)
Step 17. { superimp(lct, lt)
Step 18. m++
Step 19. return 1
Step 20. }
Step 21. return 0
Step 22. match ratio = m/n // n = number of periods in the whole dataset
Step 23. if (match ratio == 1)
Step 24. the cluster c is fully periodic
Step 25. else it is partially periodic
Step 26. Generate fuzzy intervals from the superimposed intervals to obtain fuzzy periodic clusters.
Step 27. End.
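The superimposition and classification steps can be sketched in Python under several stated assumptions: each life-span has already been mapped onto a single period (for example, hours within a day) so that spans from different periods are comparable; each entry of Lc is represented by the group of life-spans superimposed on it, and a new life-span intersects the entry if it meets the entry's covering interval; a life-span is superimposed onto the first matching entry only; and the fuzzy interval of an entry is read off in the manner of the method of superimposition, so the membership of a time point t is the fraction of the entry's life-spans that contain t. All function names are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[float, float]


def intersects(a: Interval, b: Interval) -> bool:
    """True if the closed intervals a and b share at least one point."""
    return max(a[0], b[0]) <= min(a[1], b[1])


def hull(group: List[Interval]) -> Interval:
    """Smallest interval covering every life-span superimposed on this entry."""
    return min(s for s, _ in group), max(e for _, e in group)


def superimpose_lifespans(spans: List[Interval]) -> Tuple[List[List[Interval]], int]:
    """Steps 2-21: group intersecting life-spans; m counts superimposed intervals."""
    if not spans:
        return [], 0
    groups: List[List[Interval]] = [[spans[0]]]  # Lc starts with the first life-span
    m = 1
    for lt in spans[1:]:
        for g in groups:
            if intersects(hull(g), lt):  # compsuperimp succeeds
                g.append(lt)
                m += 1
                break  # assumption: superimpose onto the first matching entry only
        else:
            groups.append([lt])  # no match: lt becomes a new entry of Lc
    return groups, m


def periodicity(m: int, n: int) -> str:
    """Steps 22-25: the match ratio m/n equals 1 exactly when m == n."""
    return "fully periodic" if m == n else "partially periodic"


def membership(group: List[Interval], t: float) -> float:
    """Step 26 (assumed form): fraction of superimposed life-spans containing t."""
    return sum(1 for s, e in group if s <= t <= e) / len(group)
```

With life-spans (8, 10), (9, 11), (8.5, 10.5) and (20, 21), the first three are superimposed (m = 3); the point 9.5 then has membership 1 in the resulting fuzzy interval, while 8.2 has membership 1/3.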
5. Complexity Analysis
6. Experimental Analysis and Results
7. Conclusions, Limitations and Lines for Future Work
7.1. Conclusions
7.2. Limitations and Future Directions of Work
Funding
Data Availability Statement
Conflicts of Interest
References
- Xu, L.D.; He, W.; Li, S. Internet of Things in Industries: A Survey. IEEE Trans. Ind. Inform. 2014, 10, 2233–2243.
- Sisinni, E.; Saifullah, A.; Han, S.; Jennehag, U.; Gidlund, M. Industrial Internet of Things: Challenges, Opportunities, and Directions. IEEE Trans. Ind. Inform. 2018, 14, 4724–4734.
- Sethi, P.; Sarangi, S.R. Internet of Things: Architectures, Protocols, and Applications. J. Electr. Comput. Eng. 2017, 2017, 1–25.
- Papaioannou, M.; Karageorgou, M.; Mantas, G.; Sucasas, V.; Essop, I.; Rodriguez, J.; Lymberopoulos, D. A Survey on Security Threats and Countermeasures in Internet of Medical Things (IoMT). Trans. Emerg. Telecommun. Technol. 2020, 33.
- Mantas, G.; Komninos, N.; Rodriguez, J.; Logota, E.; Marques, H. Security for 5G Communications. In Fundamentals of 5G Mobile Networks; Wiley, 2015; pp. 207–220.
- Zarpelão, B.B.; Miani, R.S.; Kawakani, C.T.; de Alvarenga, S.C. A survey of intrusion detection in Internet of Things. J. Netw. Comput. Appl. 2017, 84, 25–37.
- Makhdoom, I.; Abolhasan, M.; Lipman, J.; Liu, R.P.; Ni, W. Anatomy of Threats to the Internet of Things. IEEE Commun. Surv. Tutorials 2018, 21, 1636–1675.
- Zachos, G.; Essop, I.; Mantas, G.; Porfyrakis, K.; Ribeiro, J.C.; Rodriguez, J. Generating IoT Edge Network Datasets based on the TON_IoT Telemetry Dataset. In 2021 IEEE 26th International Workshop on Computer Aided Modeling and Design of Communication Links and Networks (CAMAD), Portugal; pp. 1–6.
- Mazarbhuiya, F.A.; Shenify, M. A Mixed Clustering Approach for Real-Time Anomaly Detection. Appl. Sci. 2023, 13, 4151.
- Mazarbhuiya, F.A.; AlZahrani, M.Y.; Mahanta, A.K. Detecting Anomaly Using Partitioning Clustering with Merging. ICIC Express Lett. 2020, 14, 951–960.
- Mazarbhuiya, F.A.; AlZahrani, M.Y.; Georgieva, L. Anomaly Detection Using Agglomerative Hierarchical Clustering Algorithm. In ICISA 2018; Lecture Notes in Electrical Engineering (LNEE); Springer: Hong Kong, 2019; Volume 514, pp. 475–484.
- Mazarbhuiya, F.A. Detecting Anomaly using Neighborhood Rough Set based Classification Approach. ICIC Express Lett. 2023, 17, 73–80.
- Al Mamun, S.M.A.; Valmaki, J. Anomaly Detection and Classification in Cellular Networks Using Automatic Labeling Technique for Applying Supervised Learning. Procedia Comput. Sci. 2018, 140, 186–195.
- Liu, Y.; Wang, H.; Zheng, X.; Tian, L. An Efficient Framework for Unsupervised Anomaly Detection over Edge-Assisted Internet of Things. ACM Trans. Sens. Netw. 2023.
- Mozaffari, M.; Doshi, K.; Yilmaz, Y. Self-Supervised Learning for Online Anomaly Detection in High-Dimensional Data Streams. Electronics 2023, 12, 1971.
- Angiulli, F.; Fassetti, F.; Serrao, C. Anomaly detection with correlation laws. Data Knowl. Eng. 2023, 145.
- Zhou, F.; Wang, G.; Zhang, K.; Liu, S.; Zhong, T. Semi-Supervised Anomaly Detection Via Neural Process. IEEE Trans. Knowl. Data Eng. 2023, 35, 10423–10435.
- Rettig, L.; Khayati, M.; Cudré-Mauroux, P.; Piórkowski, M. Online anomaly detection over Big Data streams. In Proceedings of the 2015 IEEE International Conference on Big Data, Santa Clara, CA, USA, 29 October–1 November 2015.
- Hartigan, J.A. Clustering Algorithms; John Wiley & Sons, 1975.
- Cheng, Y.-M.; Jia, H. A Unified Metric for Categorical and Numeric Attributes in Data Clustering. Hong Kong University Technical Report, 2011. Available online: https://www.comp.hkbu.edu.hk/tech-report.
- Mazarbhuiya, F.A.; Abulaish, M. Clustering Periodic Patterns using Fuzzy Statistical Parameters. Int. J. Innov. Comput. Inf. Control 2012, pp. 2113–2124.
- Gil-Garcia, R.; Badia-Contelles, J.M.; Pons-Porrata, A. Dynamic Hierarchical Compact Clustering Algorithm. In Progress in Pattern Recognition, Image Analysis and Applications; Sanfeliu, A., Cortés, M.L., Eds.; CIARP 2005, LNCS 3775; Springer: Berlin/Heidelberg; pp. 302–310.
- Hammouda, K.; Kamel, M. Efficient phrase-based document indexing for Web document clustering. IEEE Trans. Knowl. Data Eng. 2004, 16, 1279–1296.
- Erfani, S.M.; Rajasegarar, S.; Karunasekera, S.; Leckie, C. High-dimensional and large-scale anomaly detection using a linear one-class SVM with deep learning. Pattern Recognit. 2016, 58, 121–134.
- Hodge, V.; Austin, J. A survey of outlier detection methodologies. Artif. Intell. Rev. 2004, 22, 85–126.
- Kaya, M.-F.; Schoop, M. Analytical Comparison of Clustering Techniques for the Recognition of Communication Patterns. Group Decis. Negot. 2021, 31, 555–589.
- Aggarwal, C.C.; Yu, P.S. An effective and efficient algorithm for high-dimensional outlier detection. VLDB J. 2005, 14, 211–221.
- Ramchandran, A.; Sangaiah, A.K. Chapter 11 - Unsupervised Anomaly Detection for High Dimensional Data—an Exploratory Analysis. In Computational Intelligence for Multimedia Big Data on the Cloud with Engineering Applications; Intelligent Data-Centric Systems; 2018; pp. 233–251.
- Rettig, L.; Khayati, M.; Cudré-Mauroux, P.; Piórkowski, M. Online anomaly detection over Big Data streams. In Proceedings of the 2015 IEEE International Conference on Big Data, Santa Clara, CA, USA, 29 October–1 November 2015.
- Alguliyev, R.; Aliguliyev, R.; Sukhostat, L. Anomaly Detection in Big Data based on Clustering. Stat. Optim. Inf. Comput. 2017, 5, 325–340.
- Hahsler, M.; Piekenbrock, M.; Doran, D. dbscan: Fast Density-Based Clustering with R. J. Stat. Softw. 2019, 91, 1–30.
- Song, H.; Jiang, Z.; Men, A.; Yang, B. A Hybrid Semi-Supervised Anomaly Detection Model for High-Dimensional Data. Comput. Intell. Neurosci. 2017, 2017, 1–9.
- Mazarbhuiya, F.A.
- Ahmad, S.; Lavin, A.; Purdy, S.; Agha, Z. Unsupervised real-time anomaly detection for streaming data. Neurocomputing 2017, 262, 134–147.
- Pawlak, Z. Rough sets. Int. J. Comput. Inf. Sci. 1982, 11, 341–356.
- Thivagar, M.L.; Richard, C. On nano forms of weakly open sets. Int. J. Math. Stat. Invent. 2013, 1, 31–37.
- Thivagar, M.L.; Priyalatha, S. Medical diagnosis in an indiscernibility matrix based on nano topology. Cogent Math. 2017, 4.
- Kim, B.; Alawami, M.A.; Kim, E.; Oh, S.; Park, J.; Kim, H. A Comparative Study of Time Series Anomaly Detection Models for Industrial Control Systems. Sensors 2023, 23, 1310.
- Alghawli, A.S. Complex methods detect anomalies in real time based on time series analysis. Alex. Eng. J. 2021, 61, 549–561.
- Younas, M.Z. Anomaly Detection using Data Mining Techniques: A Review. Int. J. Res. Appl. Sci. Eng. Technol. 2020, 8, 568–574.
- Thudumu, S.; Branch, P.; Jin, J.; Singh, J. A comprehensive survey of anomaly detection techniques for high dimensional big data. J. Big Data 2020, 7, 1–30.
- Habeeb, R.A.A.; Nasaruddin, F.; Gani, A.; Hashem, I.A.T.; Ahmed, E.; Imran, M. Real-time big data processing for anomaly detection: A Survey. Int. J. Inf. Manag. 2018, 45, 289–307.
- Wang, B.; Hua, Q.; Zhang, H.; Tan, X.; Nan, Y.; Chen, R.; Shu, X. Research on anomaly detection and real-time reliability evaluation with the log of cloud platform. Alex. Eng. J. 2022, 61, 7183–7193.
- Halstead, B.; Koh, Y.S.; Riddle, P.; Pechenizkiy, M.; Bifet, A. Combining Diverse Meta-Features to Accurately Identify Recurring Concept Drift in Data Streams. ACM Trans. Knowl. Discov. Data 2023, 17, 1–36.
- Zhao, Z.; Birke, R.; Han, R.; Robu, B.; Buchenak, S.; Ben Mokhtar, S.; Chen, L.Y. RAD: On-line Anomaly Detection for Highly Unreliable Data. arXiv 2019, arXiv:1911.04383. Available online: https://arxiv.org/abs/1911.04383.
- Chenaghlou, M.; Moshtaghi, M.; Leckie, C.; Salehi, M. Online Clustering for Evolving Data Streams with Online Anomaly Detection. In Advances in Knowledge Discovery and Data Mining, Proceedings of the 22nd Pacific-Asia Conference, PAKDD 2018, Melbourne, VIC, Australia, 3–6 June 2018; pp. 508–521.
- Firoozjaei, M.D.; Mahmoudyar, N.; Baseri, Y.; Ghorbani, A.A. An evaluation framework for industrial control system cyber incidents. Int. J. Crit. Infrastruct. Prot. 2021, 36, 100487.
- Chen, Q.; Zhou, M.; Cai, Z.; Su, S. Compliance Checking Based Detection of Insider Threat in Industrial Control System of Power Utilities. In 2022 7th Asia Conference on Power and Electrical Engineering (ACPEE), China; pp. 1142–1147.
- Zhao, Z.; Mehrotra, K.G.; Mohan, C.K. Online Anomaly Detection Using Random Forest. In Recent Trends and Future Technology in Applied Intelligence; Mouhoub, M., Sadaoui, S., Ait Mohamed, O., Ali, M., Eds.; IEA/AIE 2018; Lecture Notes in Computer Science; Springer: Cham, Switzerland.
- Izakian, H.; Pedrycz, W. Anomaly detection in time series data using fuzzy c-means clustering. In Proceedings of the 2013 Joint IFSA World Congress and NAFIPS Annual Meeting, Edmonton, AB, Canada, 24–28 June 2013.
- Decker, L.; Leite, D.; Giommi, L.; Bonacorsi, D. Real-Time Anomaly Detection in Data Centers for Log-based Predictive Maintenance using an Evolving Fuzzy-Rule-Based Approach. arXiv 2020, arXiv:2004.13527v1.
- Masdari, M.; Khezri, H. Towards fuzzy anomaly detection-based security: a comprehensive review. Fuzzy Optim. Decis. Mak. 2020, 20, 1–49.
- de Campos Souza, P.V.; Guimarães, A.J.; Rezende, T.S.; Silva Araujo, V.J.; Araujo, V.S. Detection of Anomalies in Large-Scale Cyberattacks Using Fuzzy Neural Networks. AI 2020, 1, 92–116. Available online: https://www.mdpi.com/2673-2688/1/1/5.
- Habeeb, R.A.A.; Nasaruddin, F.; Gani, A.; Hashem, I.A.T.; Amanullah, A.M.E.; Imran, M. Clustering-based real-time anomaly detection—A breakthrough in big data technologies. Spec. Issue: ‘Context Aware Mobil. Internet Things Enabling Technol. Appl. Chall. ‘Intell. Resour. Manag. Cloud Comput. Netw. 2022, 33, e3647.
- Mahanta, A.K.; Mazarbhuiya, F.A.; Baruah, H.K. Finding calendar-based periodic patterns. Pattern Recognit. Lett. 2008, 29, 1274–1284.
- Mazarbhuiya, F.A.; Mahanta, A.K.; Baruah, H.K. Solution of the Fuzzy Equation A + X = B Using the Method of Superimposition. Appl. Math. 2011, 2, 1039–1045.
- Zadeh, L. Fuzzy sets as a basis for a theory of possibility. Fuzzy Sets Syst. 1978, 1, 3–28.
- Loeve, M. Probability Theory; Springer-Verlag: New York, NY, USA, 1977.
- Klir, G.J.; Yuan, B. Fuzzy Sets and Fuzzy Logic: Theory and Applications; Prentice Hall: New Jersey, USA, 2002.
- Qian, Y.; Dang, C.; Liang, J.; Tang, D. Set-valued ordered information systems. Inf. Sci. 2009, 179, 2809–2832.
- Stripling, E.; Baesens, B.; Chizi, B.; Broucke, S.V. Isolation-based conditional anomaly detection on mixed-attribute data to uncover workers' compensation fraud. Decis. Support Syst. 2018, 111, 13–26.
- Ding, Z.; Fei, M. An Anomaly Detection Approach Based on Isolation Forest Algorithm for Streaming Data using Sliding Window. IFAC Proc. Vol. 2013, 46, 12–17.
- Abdullah, J.; Chandran, N. Hierarchical Density-based Clustering of Malware Behaviour. J. Telecommun. Electron. Comput. Eng. 2017, 9, 159–164.
- Kitsune Network Attack Dataset. Available online: https://github.com/ymirsky/Kitsune-py.
- UCI KDD Archive, KDD Cup 1999 Data. Available online: http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html (accessed in October 2007).













| No. | Algorithm | Accuracy: KDDCUP’99 (41 attributes) | Accuracy: Kitsune (115 attributes) | Execution time: KDDCUP’99 (41 attributes) | Execution time: Kitsune (115 attributes) | Periodic clusters obtained |
|---|---|---|---|---|---|---|
| 1 | k-means | 95% | 86% | 28 | 95 | × |
| 2 | IF model | 84% | 74% | 19 | 64.5 | × |
| 3 | SC | 61.1% | 65.3% | 44 | 149.5 | × |
| 4 | HDBSCAN | 24.1% | 38.5% | 95 | 150 | × |
| 5 | ACA | 82% | 72% | 16 | 54.4 | × |
| 6 | LOF | 94.7% | 90.2% | 14 | 47.6 | × |
| 7 | SSWLOFCC | 95.6% | 93.9% | 12 | 40 | × |
| 8 | PCM | 86% | 76% | 26 | 88 | × |
| 9 | OnCAD | 97% | 84% | 30 | 102 | × |
| 10 | MICA | 98% | 98% | 28 | 68 | × |
| 11 | Proposed Approach (RFPSCA) | 98% | 98.3% | 58 | 88.5 | √ |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).