Submitted: 29 April 2025
Posted: 30 April 2025
Abstract
Keywords:
1. Introduction


2. Literature Review
3. Dataset Description
3.1. Understanding the Data Types of All Attributes

3.2. Finding Missing Values of the Attributes

| Attribute | Missing Values |
| --- | --- |
| Date | 0 |
| Date Type | 0 |
| Age | 2 |
| Sex | 9 |
| Race | 57 |
| Ethnicity | 9416 |
| Residence City | 596 |
| Residence County | 1260 |
| Residence State | 1988 |
| Injury City | 178 |
| Injury County | 3334 |
| Injury State | 3029 |
| Injury Place | 358 |
| Description of Injury | 807 |
| Death City | 2784 |
| Death County | 3891 |
| Death State | 5108 |
| Location | 1349 |
| Location if Other | 10787 |
| Cause of Death | 0 |
| Manner of Death | 9 |
| Other Significant Conditions | 10782 |
| Heroin | 8403 |
| Heroin death certificate (DC) | 11241 |
| Cocaine | 7403 |
| Fentanyl | 3932 |
| Fentanyl Analogue | 11007 |
| Oxycodone | 10965 |
| Oxymorphone | 11819 |
| Ethanol | 8780 |
| Hydrocodone | 11812 |
| Benzodiazepine | 9264 |
| Methadone | 10903 |
| Meth/Amphetamine | 11854 |
| Amphet | 11550 |
| Tramad | 11679 |
| Hydromorphone | 11904 |
| Morphine (Not Heroin) | 11922 |
| Xylazine | 10903 |
| Gabapentin | 11512 |
| Opiate NOS | 11854 |
| Heroin/Morph/Codeine | 9779 |
| Other Opioid | 11759 |
| Any Opioid | 3034 |
| Other | 11195 |
| ResidenceCityGeo | 167 |
| InjuryCityGeo | 257 |
| DeathCityGeo | 1 |
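
A per-attribute count like the one tabulated above can be reproduced with a short script. The following is a minimal sketch in plain Python, not the authors' code: the function name `count_missing`, the set of markers treated as missing, and the sample records are all assumptions made for illustration.

```python
# Values treated as missing; this set is an assumption, not taken from the source.
MISSING_MARKERS = {None, "", "NA", "NaN"}

def count_missing(rows, columns):
    """Return {column: number of rows in which the value is missing}."""
    counts = {col: 0 for col in columns}
    for row in rows:
        for col in columns:
            value = row.get(col)
            if isinstance(value, str):
                value = value.strip()  # whitespace-only cells count as missing
            if value in MISSING_MARKERS:
                counts[col] += 1
    return counts

# Hypothetical toy records using three attribute names from the table.
sample_rows = [
    {"Age": "34", "Sex": "Male", "Ethnicity": ""},
    {"Age": "", "Sex": "Female", "Ethnicity": None},
    {"Age": "51", "Sex": None, "Ethnicity": "Hispanic"},
]
missing = count_missing(sample_rows, ["Age", "Sex", "Ethnicity"])
print(missing)
```

For the substance columns, a blank may mean "not detected" rather than "unknown". The source does not say which, so large counts in those rows should be read with that caveat in mind.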
4. Methodology
4.1. Data Mining Technique
4.2. Outlier Detection
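
The outlier detection method is not specified here. As one common possibility, the sketch below applies the interquartile range (IQR) rule to a numeric attribute such as Age. The choice of the IQR rule, the multiplier `k=1.5`, the function name `iqr_outliers`, and the sample ages are assumptions for illustration only.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    # quantiles(n=4) returns the three quartile cut points Q1, Q2, Q3.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Hypothetical ages; 140 stands in for a data-entry error.
ages = [25, 31, 34, 38, 42, 45, 47, 52, 58, 140]
outliers = iqr_outliers(ages)
print(outliers)
```

Whether a flagged value is removed, capped, or kept is a separate decision that depends on whether it is a recording error or a genuine extreme.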

4.3. Association Rule Mining
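
Association rules are usually scored by support and confidence. The sketch below computes both for a rule of the form "substance A present implies substance B present". It assumes each record has been reduced to the set of substances detected. The function names and the toy transactions are hypothetical, and the substance names are borrowed from the attribute table purely as labels.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent | antecedent) from the transactions."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0  # rule never applies; avoid division by zero
    joint = support(transactions, set(antecedent) | set(consequent))
    return joint / base

# Hypothetical transactions: the set of substances detected per case.
transactions = [
    {"Heroin", "Fentanyl"},
    {"Fentanyl", "Cocaine"},
    {"Heroin", "Fentanyl", "Cocaine"},
    {"Ethanol"},
]
rule_support = support(transactions, {"Heroin", "Fentanyl"})
rule_confidence = confidence(transactions, {"Heroin"}, {"Fentanyl"})
print(rule_support, rule_confidence)
```

A full miner such as Apriori would enumerate candidate itemsets and keep only those above chosen support and confidence thresholds. This sketch covers the scoring step only.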
4.4. Clustering


5. Implementation of Data Preprocessing
5.1. Data Cleaning

Handling missing values:
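
The imputation strategy is not stated here, so the following is only one plausible sketch. It fills a numeric attribute with the median of the known values and fills categorical attributes with a placeholder label. The function name `impute`, the placeholder `"Unknown"`, and the sample rows are assumptions.

```python
import statistics

def impute(rows, numeric_col, categorical_cols, placeholder="Unknown"):
    """Return copies of rows with missing values filled in."""
    known = [r[numeric_col] for r in rows if r[numeric_col] is not None]
    median = statistics.median(known)
    cleaned = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left untouched
        if new_row[numeric_col] is None:
            new_row[numeric_col] = median
        for col in categorical_cols:
            if not new_row.get(col):  # None or empty string
                new_row[col] = placeholder
        cleaned.append(new_row)
    return cleaned

# Hypothetical rows with one missing age and one missing sex.
rows = [
    {"Age": 30, "Sex": "Male"},
    {"Age": None, "Sex": ""},
    {"Age": 50, "Sex": "Female"},
]
cleaned = impute(rows, "Age", ["Sex"])
print(cleaned[1])
```

Median imputation suits attributes with few gaps. For attributes that are mostly empty, dropping the column or treating the blank as its own category may be more defensible.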

5.2. Handling Duplicates
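
A minimal sketch of exact-duplicate removal follows, assuming each record is a dictionary of hashable values. It keeps the first occurrence of every record and preserves the original order. The function name and sample rows are hypothetical.

```python
def drop_duplicates(rows):
    """Return rows with exact duplicates removed, keeping first occurrences."""
    seen = set()
    unique = []
    for row in rows:
        # Sorting by column name makes the key independent of column order.
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

# Hypothetical rows; the first and third are identical.
rows = [
    {"Date": "2020-01-05", "Age": 41, "Sex": "Male"},
    {"Date": "2020-01-06", "Age": 29, "Sex": "Female"},
    {"Date": "2020-01-05", "Age": 41, "Sex": "Male"},
]
deduplicated = drop_duplicates(rows)
print(len(deduplicated))
```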

5.3. Removing Whitespaces
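
Stray whitespace can make otherwise equal labels compare as different. One simple way to normalize text fields is sketched below, assuming that leading, trailing, and repeated internal whitespace are all unwanted. The function name and sample values are hypothetical.

```python
def clean_text(value):
    """Strip outer whitespace and collapse internal runs to one space."""
    if not isinstance(value, str):
        return value  # leave numbers and None unchanged
    return " ".join(value.split())

# Hypothetical record with untidy text fields.
record = {"Residence City": "  NEW   HAVEN ", "Age": 37, "Sex": "Male\t"}
cleaned_record = {key: clean_text(val) for key, val in record.items()}
print(cleaned_record)
```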
5.4. Detecting Outliers
5.5. Data Mapping
5.6. Data Integration
5.7. Redundancy Management Using Correlation Heatmap
5.8. Data Transformation
5.9. Standardize “Age” Column
5.10. Data Visualization and Interpretation

5.11. Identifying the Youngest and Oldest Victims of Overdose Cases
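
Locating the youngest and oldest cases reduces to a minimum and maximum over the Age attribute once rows with no recorded age are set aside. The sketch below shows one way to do this. The function name `age_extremes` and the sample records are assumptions, and the values are illustrative rather than taken from the dataset.

```python
def age_extremes(records):
    """Return (youngest, oldest) records, skipping rows with no recorded age."""
    with_age = [r for r in records if r.get("Age") is not None]
    if not with_age:
        return None, None
    youngest = min(with_age, key=lambda r: r["Age"])
    oldest = max(with_age, key=lambda r: r["Age"])
    return youngest, oldest

# Hypothetical records; one has a missing age.
records = [
    {"Age": 45, "Sex": "Male"},
    {"Age": None, "Sex": "Female"},
    {"Age": 19, "Sex": "Female"},
    {"Age": 72, "Sex": "Male"},
]
youngest, oldest = age_extremes(records)
print(youngest, oldest)
```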



6. Ethical Considerations in Data Mining
7. Conclusion
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
