Submitted: 13 August 2024
Posted: 14 August 2024
Abstract
Keywords:
I. Introduction





7. Quantile Transformation











II. Literature Survey
III. Proposed Algorithm
IV. Results
V. Conclusions
Author Contributions
Funding
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Ethical considerations
References
- M. D. V. Prasad and T. Srikanth, “A Survey on Clustering Algorithms and their Constraints,” Intelligent Systems and Applications in Engineering, vol. 11, no. 6s.
- Y. Singh and A. Mohan, “A Survey on Unsupervised Clustering Algorithm based on K-Means Clustering,” International Journal of Computer Applications, vol. 156, no. 8, pp. 6–9, Dec. 2016, doi: https://doi.org/10.5120/ijca2016912481. [CrossRef]
- D. Cao and B. Yang, “An improved k-medoids clustering algorithm,” Apr. 2010, doi: https://doi.org/10.1109/iccae.2010.5452085. [CrossRef]
- M. Roux, “A Comparative Study of Divisive and Agglomerative Hierarchical Clustering Algorithms,” Journal of Classification, vol. 35, no. 2, pp. 345–366, Jul. 2018, doi: https://doi.org/10.1007/s00357-018-9259-9. [CrossRef]
- D. Deng, “DBSCAN Clustering Algorithm Based on Density,” IEEE Xplore, Sep. 01, 2020.
- Patwary, D. Palsetia, A. Agrawal, W. Liao, F. Manne, and A. Choudhary, “Scalable parallel OPTICS data clustering using graph algorithmic techniques,” IEEE International Conference on High Performance Computing, Data, and Analytics, Nov. 2013, doi: https://doi.org/10.1145/2503210.2503255. [CrossRef]
- Y. Zhang et al., “Gaussian Mixture Model Clustering with Incomplete Data,” ACM Transactions on Multimedia Computing, Communications, and Applications, vol. 17, no. 1s, pp. 1–14, Jan. 2021, doi: https://doi.org/10.1145/3408318. [CrossRef]
- X. Wei and W. B. Croft, “LDA-based document models for ad-hoc retrieval,” Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR ’06, 2006, doi: https://doi.org/10.1145/1148170.1148204. [CrossRef]
- W. Wang, J. Yang, and R. R. Muntz, “STING: A Statistical Information Grid Approach to Spatial Data Mining,” Very Large Data Bases, pp. 186–195, Aug. 1997.
- M. Rysz, Foad Mahdavi Pajouh, and E. L. Pasiliao, “Finding clique clusters with the highest betweenness centrality,” vol. 271, no. 1, pp. 155–164, Nov. 2018, doi: https://doi.org/10.1016/j.ejor.2018.05.006. [CrossRef]
- H. KUANG and J. LUO, “Text clustering based on genetic fuzzy C-means algorithm,” Journal of Computer Applications, vol. 29, no. 2, pp. 558–560, Apr. 2009, doi: https://doi.org/10.3724/sp.j.1087.2009.00558. [CrossRef]
- Y. Ng, M. I. Jordan, and Y. Weiss, “On Spectral Clustering: Analysis and an algorithm,” Neural Information Processing Systems, vol. 14, pp. 849–856, Jan. 2001.
- V. V. Mazur, K. A. Barmuta, S. S. Demin, E. A. Tikhomirov, and M. A. Bykovskiy, “Innovation Clusters: Advantages and Disadvantages,” DOAJ (DOAJ: Directory of Open Access Journals), Mar. 2016.
- M. D. V. Prasad and T. Srikanth, “Buddy System Based Alpha Numeric Weight Based Clustering Algorithm with User Threshold,” Intelligent Systems and Applications in Engineering, vol. 12, no. 8s.
- X. Zou, Y. Hu, Z. Tian and K. Shen, "Logistic Regression Model Optimization and Case Analysis," 2019 IEEE 7th International Conference on Computer Science and Network Technology (ICCSNT), Dalian, China, 2019, pp. 135-139, doi: 10.1109/ICCSNT47585.2019.8962457. [CrossRef]
- E. I. G. Nassara, E. Grall-Maës and M. Kharouf, "Linear Discriminant Analysis for Large-Scale Data: Application on Text and Image Data," 2016 15th IEEE International Conference on Machine Learning and Applications (ICMLA), Anaheim, CA, USA, 2016, pp. 961-964, doi: 10.1109/ICMLA.2016.0173. [CrossRef]
- Navada, A. N. Ansari, S. Patil and B. A. Sonkamble, "Overview of use of decision tree algorithms in machine learning," 2011 IEEE Control and System Graduate Research Colloquium, Shah Alam, Malaysia, 2011, pp. 37-42, doi: 10.1109/ICSGRC.2011.5991826. [CrossRef]
- J. K. Jaiswal and R. Samikannu, "Application of Random Forest Algorithm on Feature Subset Selection and Classification and Regression," 2017 World Congress on Computing and Communication Technologies (WCCCT), Tiruchirappalli, India, 2017, pp. 65-68, doi: 10.1109/WCCCT.2016.25. [CrossRef]
- N. Aziz, E. A. P. Akhir, I. A. Aziz, J. Jaafar, M. H. Hasan and A. N. C. Abas, "A Study on Gradient Boosting Algorithms for Development of AI Monitoring and Prediction Systems," 2020 International Conference on Computational Intelligence (ICCI), Bandar Seri Iskandar, Malaysia, 2020, pp. 11-16, doi: 10.1109/ICCI51257.2020.9247843. [CrossRef]
- M. A. Hearst, S. T. Dumais, E. Osuna, J. Platt and B. Scholkopf, "Support vector machines," in IEEE Intelligent Systems and their Applications, vol. 13, no. 4, pp. 18-28, July-Aug. 1998, doi: 10.1109/5254.708428. [CrossRef]
- E. Wilson and D. W. Tufts, "Multilayer perceptron design algorithm," Proceedings of IEEE Workshop on Neural Networks for Signal Processing, Ermioni, Greece, 1994, pp. 61-68, doi: 10.1109/NNSP.1994.366063. [CrossRef]
- R. Chauhan, K. K. Ghanshala and R. C. Joshi, "Convolutional Neural Network (CNN) for Image Detection and Recognition," 2018 First International Conference on Secure Cyber Computing and Communication (ICSCCC), Jalandhar, India, 2018, pp. 278-282, doi: 10.1109/ICSCCC.2018.8703316. [CrossRef]
- S. Zhang, F. Yang, D. Zhou and X. Zeng, "Bayesian Methods for the Yield Optimization of Analog and SRAM Circuits," 2020 25th Asia and South Pacific Design Automation Conference (ASP-DAC), Beijing, China, 2020, pp. 440-445, doi: 10.1109/ASP-DAC47756.2020.9045614. [CrossRef]
- K. Taunk, S. De, S. Verma and A. Swetapadma, "A Brief Review of Nearest Neighbor Algorithm for Learning and Classification," 2019 International Conference on Intelligent Computing and Control Systems (ICCS), Madurai, India, 2019, pp. 1255-1260, doi: 10.1109/ICCS45141.2019.9065747. [CrossRef]
- T. R. N and R. Gupta, "Feature Selection Techniques and its Importance in Machine Learning: A Survey," 2020 IEEE International Students' Conference on Electrical, Electronics and Computer Science (SCEECS), Bhopal, India, 2020, pp. 1-6, doi: 10.1109/SCEECS48394.2020.189. [CrossRef]
- J. Heaton, "An empirical analysis of feature engineering for predictive modeling," SoutheastCon 2016, Norfolk, VA, USA, 2016, pp. 1-6, doi: 10.1109/SECON.2016.7506650. [CrossRef]
- D. U. Ozsahin, M. Taiwo Mustapha, A. S. Mubarak, Z. Said Ameen and B. Uzun, "Impact of feature scaling on machine learning models for the diagnosis of diabetes," 2022 International Conference on Artificial Intelligence in Everything (AIE), Lefkosa, Cyprus, 2022, pp. 87-94, doi: 10.1109/AIE57029.2022.00024. [CrossRef]
- K. S. Prathyusha and B. E. Reddy, "Normalization Methods for Multiple Sources of Data," 2021 5th International Conference on Intelligent Computing and Control Systems (ICICCS), Madurai, India, 2021, pp. 1013-1019, doi: 10.1109/ICICCS51141.2021.9432142. [CrossRef]
- S. Naveen, A. Omkar, J. Goyal and R. Gaikwad, "Analysis of Principal Component Analysis Algorithm for Various Datasets," 2022 International Conference on Futuristic Technologies (INCOFT), Belgaum, India, 2022, pp. 1-7, doi: 10.1109/INCOFT55651.2022.10094448. [CrossRef]
- P. Hajibabaee, F. Pourkamali-Anaraki and M. A. Hariri-Ardebili, "An Empirical Evaluation of the t-SNE Algorithm for Data Visualization in Structural Engineering," 2021 20th IEEE International Conference on Machine Learning and Applications (ICMLA), Pasadena, CA, USA, 2021, pp. 1674-1680, doi: 10.1109/ICMLA52953.2021.00267. [CrossRef]
- E. I. G. Nassara, E. Grall-Maës and M. Kharouf, "Linear Discriminant Analysis for Large-Scale Data: Application on Text and Image Data," 2016 15th IEEE International Conference on Machine Learning and Applications (ICMLA), Anaheim, CA, USA, 2016, pp. 961-964, doi: 10.1109/ICMLA.2016.0173. [CrossRef]
- M. Mahin, M. J. Islam, A. Khatun and B. C. Debnath, "A Comparative Study of Distance Metric Learning to Find Sub-categories of Minority Class from Imbalance Data," 2018 International Conference on Innovation in Engineering and Technology (ICIET), Dhaka, Bangladesh, 2018, pp. 1-6, doi: 10.1109/CIET.2018.8660777. [CrossRef]
- G. Guo, L. Chen, Y. Ye and Q. Jiang, "Cluster Validation Method for Determining the Number of Clusters in Categorical Sequences," in IEEE Transactions on Neural Networks and Learning Systems, vol. 28, no. 12, pp. 2936-2948, Dec. 2017, doi: 10.1109/TNNLS.2016.2608354. [CrossRef]
- B. Çatalbaş, B. Çatalbaş and Ö. Morgül, "A new initialization method for artificial neural networks: Laplacian," 2018 26th Signal Processing and Communications Applications Conference (SIU), Izmir, Turkey, 2018, pp. 1-4, doi: 10.1109/SIU.2018.8404491. [CrossRef]
- D. Mienye and Y. Sun, "A Survey of Ensemble Learning: Concepts, Algorithms, Applications, and Prospects," in IEEE Access, vol. 10, pp. 99129-99149, 2022, doi: 10.1109/ACCESS.2022.3207287. [CrossRef]
- D. Yazdani, S. Golyari and M. R. Meybodi, "A new hybrid approach for data clustering," 2010 5th International Symposium on Telecommunications, Tehran, Iran, 2010, pp. 914-919, doi: 10.1109/ISTEL.2010.5734153. [CrossRef]
- Y. Guan, Y. Han and S. Liu, "Deep Learning Approaches for Image Classification Techniques," 2022 IEEE International Conference on Electrical Engineering, Big Data and Algorithms (EEBDA), Changchun, China, 2022, pp. 1132-1136, doi: 10.1109/EEBDA53927.2022.9744739. [CrossRef]
- Tian, L. Zhu, S. Zhang and L. Liu, "Improvement and parallelism of k-means clustering algorithm," in Tsinghua Science and Technology, vol. 10, no. 3, pp. 277-281, June 2005, doi: 10.1016/S1007-0214(05)70069-9. [CrossRef]
- S. Yaram, "Machine learning algorithms for document clustering and fraud detection," 2016 International Conference on Data Science and Engineering (ICDSE), Cochin, India, 2016, pp. 1-6, doi: 10.1109/ICDSE.2016.7823950. [CrossRef]
- M. Cherrington, F. Thabtah, J. Lu and Q. Xu, "Feature Selection: Filter Methods Performance Challenges," 2019 International Conference on Computer and Information Sciences (ICCIS), Sakaka, Saudi Arabia, 2019, pp. 1-4, doi: 10.1109/ICCISci.2019.8716478. [CrossRef]
- N. El Aboudi and L. Benhlima, "Review on wrapper feature selection approaches," 2016 International Conference on Engineering & MIS (ICEMIS), Agadir, Morocco, 2016, pp. 1-5, doi: 10.1109/ICEMIS.2016.7745366. [CrossRef]
- P. Gawade and S. Joshi, "Feature Selection for Embedded Media in the Context of Personification," 2020 Second International Conference on Inventive Research in Computing Applications (ICIRCA), Coimbatore, India, 2020, pp. 568-572, doi: 10.1109/ICIRCA48905.2020.9183293. [CrossRef]
- Kaur, K. Guleria and N. Kumar Trivedi, "Feature Selection in Machine Learning: Methods and Comparison," 2021 International Conference on Advance Computing and Innovative Technologies in Engineering (ICACITE), Greater Noida, India, 2021, pp. 789-795, doi: 10.1109/ICACITE51222.2021.9404623. [CrossRef]
| | Predicted Positive | Predicted Negative |
|---|---|---|
| Actual Positive | TP | FN |
| Actual Negative | FP | TN |

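The four cells of the confusion matrix above are enough to derive the usual evaluation scores. The following is a minimal sketch using the standard definitions of accuracy, precision, recall, and F1; the function name `confusion_metrics` and the example counts are hypothetical and do not come from the paper's experiments.

```python
def confusion_metrics(tp, fn, fp, tn):
    """Derive common scores from the four confusion-matrix cells."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts, chosen only to illustrate the arithmetic.
metrics = confusion_metrics(tp=40, fn=10, fp=5, tn=45)
print(metrics)
```

Accuracy treats both error types equally, so precision and recall are usually reported alongside it when the classes are imbalanced.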
| Feature Selection Method | Sub-method | Sub-sub-method |
|---|---|---|
| Filter Method [40] | Statistical Tests | |
| | Chi-Square Test | |
| | ANOVA (Analysis of Variance) | |
| | Mutual Information | |
| | Pearson Correlation Coefficient | |
| | Spearman’s Rank Correlation | |
| | Variance Threshold | |
| | Correlation Analysis | |
| Wrapper Method [41] | Recursive Feature Elimination (RFE) | |
| | RFE with Cross-Validation | |
| | Forward Selection | |
| | Backward Elimination | |
| | Bidirectional Elimination | |
| Embedded Method [42] | Regularization Techniques | L1 Regularization (Lasso) |
| | | L2 Regularization (Ridge) |
| | | Elastic Net |
| | Tree-Based Methods | Decision Trees |
| | | Random Forests |
| | | Gradient Boosting Machines (GBM) |
| | Embedded Feature Selection Methods | LASSO Regression |
| | | Embedded Feature Selection in Neural Networks |
| | Feature Weighting in Linear Model | Feature Weighting in SVMs |
| | | Linear Discriminant Analysis (LDA) |
| | | Autoencoder-Based Feature Extraction |
| | Feature Importance Analysis | Permutation Importance |
| | | SHAP (SHapley Additive exPlanations) |

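As an illustration of the filter family in the table above, the sketch below chains two of the listed techniques: a variance threshold followed by ranking on the absolute Pearson correlation with a target. It is a toy, standard-library version under assumed inputs; the feature names `f1` to `f3`, the target values, and the helper names are hypothetical and are not taken from the paper.

```python
from statistics import mean, pvariance


def variance_threshold(columns, threshold=0.0):
    """Keep feature names whose population variance exceeds the threshold."""
    return [name for name, values in columns.items()
            if pvariance(values) > threshold]


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 for a constant input."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


# Hypothetical toy data: f2 is constant and carries no information.
columns = {
    "f1": [1.0, 2.0, 3.0, 4.0],
    "f2": [5.0, 5.0, 5.0, 5.0],
    "f3": [2.0, 1.0, 4.0, 3.0],
}
target = [1.1, 2.0, 2.9, 4.2]

kept = variance_threshold(columns)  # drops the constant feature
ranked = sorted(kept, key=lambda n: abs(pearson(columns[n], target)),
                reverse=True)
print(kept, ranked)
```

Filter methods like these score each feature without training a model, which is why they are cheap compared with the wrapper and embedded families.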
| Method of Feature Engineering | Sub-method |
|---|---|
| Feature Engineering [43] | Domain Knowledge |
| | Interaction Features |
| | Binning |
| | Dimensionality Reduction |

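Two of the feature engineering sub-methods listed above, binning and interaction features, are simple enough to sketch directly. The version below assumes equal-width bins and pairwise products, which is only one common choice; the helper names and the sample values are hypothetical.

```python
def equal_width_bin(values, n_bins):
    """Map each value to a bin index in [0, n_bins - 1] using equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        return [0] * len(values)
    # The maximum value would land on the upper edge, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def interaction(a, b):
    """Element-wise product of two feature columns."""
    return [x * y for x, y in zip(a, b)]


# Hypothetical feature column.
ages = [18, 25, 40, 62, 80]
bins = equal_width_bin(ages, 3)
product = interaction([1, 2, 3], [4, 5, 6])
print(bins, product)
```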
| Feature1 | Feature2 | Feature3 | ... | Featuren | Global mean nearest value |
|---|---|---|---|---|---|
| Feature1_object1 | Feature2_object1 | Feature3_object1 | ... | Featuren_object1 | Nearest_value1 |
| Feature1_object2 | Feature2_object2 | Feature3_object2 | ... | Featuren_object2 | Nearest_value2 |
| Feature1_object3 | Feature2_object3 | Feature3_object3 | ... | Featuren_object3 | Nearest_value3 |
| Feature1_objectn | Feature2_objectn | Feature3_objectn | ... | Featuren_objectn | Nearest_valuen |

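The table above shows only the layout of the augmented dataset, with one new column per object. One possible reading, sketched below, is that the global mean is taken over every feature value of every object, and that each object contributes whichever of its own feature values lies closest to that mean. This interpretation is an assumption, not the authors' stated procedure, and the function name `add_global_mean_nearest` and the sample rows are hypothetical.

```python
def add_global_mean_nearest(rows):
    """Append to each row its feature value nearest to the global mean.

    Assumption: the global mean is the mean of all values in all rows.
    """
    all_values = [v for row in rows for v in row]
    global_mean = sum(all_values) / len(all_values)
    augmented = [
        row + [min(row, key=lambda v: abs(v - global_mean))]
        for row in rows
    ]
    return global_mean, augmented


# Hypothetical objects with three features each.
rows = [
    [1.0, 4.0, 9.0],
    [2.0, 6.0, 7.0],
    [5.0, 8.0, 3.0],
]
global_mean, augmented = add_global_mean_nearest(rows)
print(global_mean)
for row in augmented:
    print(row)
```

If the features are on very different scales, the global mean would be dominated by the largest one, so scaling beforehand would probably be needed under this reading.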
| Clustering Algorithm | Accuracy |
|---|---|
| K-means without a fixed seed | Varies between runs |
| K-means with best seed | 0.96 |
| Global Mean Based Nearest Feature Object Selection and Creation Method, with K-means (best seed) | 0.9733333333333334 |

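The dependence on the seed in the table above comes from K-means starting at randomly chosen centers. The sketch below is a small standard-library K-means with a seeded initialisation, plus a search over candidate seeds scored by accuracy after mapping each cluster to its majority true label. How the paper selects its "best seed" and computes accuracy is not stated, so both the seed range and the majority-label mapping are assumptions, and the toy points are hypothetical.

```python
import random
from collections import Counter


def kmeans(points, k, seed, max_iter=100):
    """Lloyd's algorithm; initial centers are k points sampled with the seed."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster empties
                centers[c] = tuple(sum(dim) / len(members)
                                   for dim in zip(*members))
    return labels


def clustering_accuracy(labels, truth):
    """Map each cluster to its majority true label, then score matches."""
    correct = 0
    for c in set(labels):
        counts = Counter(t for lab, t in zip(labels, truth) if lab == c)
        correct += counts.most_common(1)[0][1]
    return correct / len(truth)


# Hypothetical, well-separated toy data with known labels.
points = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0),
          (10.0, 10.0), (11.0, 11.0), (12.0, 12.0), (13.0, 13.0)]
truth = [0, 0, 0, 0, 1, 1, 1, 1]

scores = {seed: clustering_accuracy(kmeans(points, 2, seed), truth)
          for seed in range(10)}
best_seed = max(scores, key=scores.get)
print(best_seed, scores[best_seed])
```

On data this cleanly separated the seed makes little difference; on overlapping clusters different seeds can converge to different partitions, which is the variation the first table row refers to. Fixing the seed makes a run reproducible but does not by itself make it the best one.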
Author Details:
Dr. Srikanth Thota received his Ph.D. in Computer Science Engineering from J.N.T.U., Kakinada, for his research on collaborative-filtering-based recommender systems. He received his M.Tech. degree in Computer Science and Technology from Andhra University. He is presently an Associate Professor in the Department of Computer Science and Engineering, School of Technology, GITAM University, Visakhapatnam, Andhra Pradesh, India. His areas of interest include machine learning, artificial intelligence, data mining, recommender systems, and soft computing.
Mr. Maradana Durga Venkata Prasad received his B.Tech. (Computer Science and Information Technology) in 2008 from JNTU, Hyderabad, and his M.Tech. (Software Engineering) in 2010 from Jawaharlal Nehru Technological University, Kakinada. He is a Research Scholar (Regd. No. 1260316406) in the Department of Computer Science and Engineering, Gandhi Institute of Technology and Management (GITAM), Visakhapatnam, Andhra Pradesh, India. His research interests include clustering in data mining, big data analytics, and artificial intelligence. He is currently working as an Assistant Professor in the Department of Computer Science Engineering, CMR Institute of Technology, Ranga Reddy, India.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

