Submitted:
01 March 2023
Posted:
02 March 2023
Abstract
Keywords:
1. Introduction
2. Related Works
3. Problem Definitions
- Definition 3.1 Distance in Categorical Attributes.
- Definition 3.2 Calculating the Weight of an Attribute.
- Definition 3.3 Distance in Numeric Attributes.
- Definition 3.4 Distance in Mixed Attributes.
- Definition 3.5 Similarity of the Cluster Pair.
- Definition 3.6 Fuzzy Set.
- Definition 3.7 Convex Normal Fuzzy Set.
- Definition 3.8 Fuzzy Number.
- Definition 3.9 Fuzzy Interval.
- Definition 3.10 Support and Core of a Fuzzy Set.
- Definition 3.11 Set Superimposition.
- Definition 3.12 Superimposition of Superimposed Intervals.
- Definition 3.13 Merge Function.
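For orientation, the support and core named in Definition 3.10 are usually written as below in the fuzzy-set literature. This is the standard textbook form, not a quotation from this paper; the symbols $A$ (a fuzzy set on a universe $X$) and $\mu_A$ (its membership function) are assumed notation.

```latex
\operatorname{supp}(A) = \{\, x \in X : \mu_A(x) > 0 \,\}, \qquad
\operatorname{core}(A) = \{\, x \in X : \mu_A(x) = 1 \,\}
```

Under this reading, the core of a lifetime is the time span with full membership, and the support is the whole span over which the lifetime has any membership at all.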
4. Proposed Algorithm
| Algorithm 1 |
| Step 1: Take as input an online d-dimensional dataset with both categorical and numeric attributes. |
| Step 2: Select k, the number of clusters. |
| Step 3: Take the first k data instances as the initial k cluster centroids, and record the time-stamp of each as the start of the lifetime of its cluster. |
| Step 4: Assign each incoming data instance to the closest centroid, using equal weights for the categorical attributes. |
| Step 5: Update the two auxiliary matrices maintained for the cluster: one stores the frequency of each categorical value occurring in the cluster, the other stores the mean vector of the numeric parts of all data instances belonging to the cluster. |
| Step 6: Extend or update the lifetime of the cluster using the time-stamp of the data instance being inserted into it. |
| Step 7: Compute the weights of the categorical attributes. |
| Step 8: If no assignment occurs, go to Step 9; otherwise re-assign each data instance to its new closest centroid and go to Step 5. |
| Step 9: For each possible pair of clusters (Ci, Cj), with lifetimes given as superimposed intervals S[ti] and S[tj] respectively: if core(S[ti]) ∩ core(S[tj]) is empty, break; else if sim(Ci, Cj) <= sigma, merge(Ci, Cj) and superimpose the lifetimes; continue. |
| Step 10: Output the clusters. |
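The steps of Algorithm 1 can be sketched in Python as follows. This is a minimal, single-pass illustration under stated assumptions, not the authors' implementation. The formulas of Definitions 3.1 to 3.5 are not reproduced here, so the distance is a hypothetical stand-in (weighted mismatch against the cluster modes plus Euclidean distance to the mean), and sim(Ci, Cj) is treated as that same distance between cluster summaries. Weights stay equal throughout, so Steps 7 and 8 (weight computation and re-assignment) are omitted. A pair whose lifetime cores do not intersect is simply skipped. The names `Cluster`, `distance`, `insert`, `cores_overlap`, `merge` and `cluster_stream` are illustrative.

```python
from collections import Counter
from dataclasses import dataclass
from math import dist


@dataclass
class Cluster:
    freq: list    # one Counter per categorical attribute (value frequencies)
    mean: list    # running mean vector of the numeric attributes
    size: int
    starts: list  # start times of the superimposed lifetime intervals
    ends: list    # end times of the superimposed lifetime intervals

    def mode(self):
        # Most frequent value of each categorical attribute.
        return [c.most_common(1)[0][0] for c in self.freq]

    def core(self):
        # ASSUMPTION: the core is the span shared by all superimposed intervals.
        return max(self.starts), min(self.ends)


def distance(cat, num, cluster, weights):
    # ASSUMPTION: stand-in for the paper's mixed-attribute distance.
    mismatch = sum(w for w, a, b in zip(weights, cat, cluster.mode()) if a != b)
    return mismatch + dist(num, cluster.mean)


def insert(cluster, cat, num, t):
    # Step 5: update the frequency table and the mean vector.
    for counter, value in zip(cluster.freq, cat):
        counter[value] += 1
    cluster.size += 1
    cluster.mean = [m + (x - m) / cluster.size for m, x in zip(cluster.mean, num)]
    # Step 6: extend the lifetime to the current time-stamp.
    cluster.ends[-1] = max(cluster.ends[-1], t)


def cores_overlap(a, b):
    (lo_a, hi_a), (lo_b, hi_b) = a.core(), b.core()
    return max(lo_a, lo_b) <= min(hi_a, hi_b)


def merge(a, b):
    size = a.size + b.size
    mean = [(a.size * x + b.size * y) / size for x, y in zip(a.mean, b.mean)]
    freq = [fa + fb for fa, fb in zip(a.freq, b.freq)]
    # Superimpose the lifetimes by keeping every constituent interval.
    return Cluster(freq, mean, size, a.starts + b.starts, a.ends + b.ends)


def cluster_stream(stream, k, sigma):
    """stream yields (time-stamp, categorical tuple, numeric tuple)."""
    clusters = []
    for t, cat, num in stream:
        if len(clusters) < k:  # Step 3: the first k instances seed the clusters
            clusters.append(Cluster([Counter([v]) for v in cat], list(num), 1, [t], [t]))
            continue
        weights = [1.0] * len(cat)  # Step 4: equal categorical weights
        nearest = min(clusters, key=lambda c: distance(cat, num, c, weights))
        insert(nearest, cat, num, t)
    # Step 9: merge pairs whose lifetime cores intersect and that are close enough.
    i = 0
    while i < len(clusters):
        j = i + 1
        while j < len(clusters):
            a, b = clusters[i], clusters[j]
            weights = [1.0] * len(a.freq)
            if cores_overlap(a, b) and distance(b.mode(), b.mean, a, weights) <= sigma:
                clusters[i] = merge(a, b)
                del clusters[j]
                j = i + 1  # re-check the remaining clusters against the merged one
            else:
                j += 1
        i += 1
    return clusters  # Step 10
```

In this sketch a merged cluster keeps all constituent intervals, so its core shrinks to the span common to every merged lifetime while its support grows to cover all of them.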
5. Time Complexity
6. Experimental Settings and Discussions
7. Conclusions
Availability of Data and Material
DECLARATION
Ethical statement
Author’s Profile
FOKRUL ALOM MAZARBHUIYA received his B.Sc. from Assam University, India, and his M.Sc. from Aligarh Muslim University, India. He then obtained his Ph.D. in Computer Science from Gauhati University, India. He worked as an Assistant Professor in the College of Computer Science and Information Systems, King Khalid University, Saudi Arabia, from 2008 to 2011, and as an Assistant Professor of Information Technology in the College of Computer Science and IT, AlBaha University, Saudi Arabia, from 2011 to 2018. Since 2019, he has been serving as a faculty member in the School of Fundamental & Applied Sciences, Assam Don Bosco University, India. He has published more than 60 research articles in various international and national journals. His research interests include Data Mining, Information Security, Fuzzy Mathematics and Fuzzy Logic.
MOHAMED SHENIFY received his B.Sc. in computer science from Indiana State University, Terre Haute, Indiana, USA, in May 1990; his M.Sc. in computer science from Ball State University, Muncie, Indiana, USA, in December 1991; and his Ph.D. in computer science from Illinois Institute of Technology, Chicago, Illinois, USA, in May 1998. He is currently working as an Associate Professor at the College of Computer Science and Information Technology (CSIT), Albaha University, Saudi Arabia. His research interests include natural language processing, information retrieval, health care systems, fuzzy logic, and computing education.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).



