Preprint Article, Version 1. Preserved in Portico. This version is not peer-reviewed.

A Novel Statistical Approach to Investigate Wisconsin Breast Cancer Data by Generalized Linear Mixed Model Approach in Cancer Epidemiology in Medicine

Version 1 : Received: 9 October 2023 / Approved: 10 October 2023 / Online: 11 October 2023 (09:24:40 CEST)

How to cite: İyit, N.; Akdam, N. A Novel Statistical Approach to Investigate Wisconsin Breast Cancer Data by Generalized Linear Mixed Model Approach in Cancer Epidemiology in Medicine. Preprints 2023, 2023100677. https://doi.org/10.20944/preprints202310.0677.v1

Abstract

The main aim of this study is to predict whether a breast cancer (BC) is “benign” or “malignant” using the classical generalized linear model (GLM) approach and its extension, the generalized linear mixed model (GLMM) approach, for a binomially distributed response variable with binary link functions. An advanced statistical modeling approach based on the GLMM is proposed as an alternative to the traditional GLM-based approach, under various binary link functions, to investigate the relationships between the “malignant or benign diagnosis of the BC in patients” and “nine attributes” of 699 patients diagnosed with BC. The study also focuses on the statistical significance of accurately classifying these patients as “benign” or “malignant” based on the Wisconsin breast cancer (WBC) dataset. The superiority of the GLMM approach over the GLM approach for the binary response variable, particularly for the WBC dataset, is emphasized in the field of cancer diagnosis in medicine. The importance and power of the information criteria (IC) and performance metrics as goodness-of-fit statistics are also emphasized for drawing accurate statistical inferences from the “best” fitted model. According to the main findings, the best fitted model among the GLM and GLMM approaches for the binary response variable is the GLMM under the “logit” link function with an “id” random effect. Under this model, the most statistically significant odds of the BC being “malignant” rather than “benign” are 7.9104, 5.6888, 5.6643, 4.9842, 4.1212, 2.0679, 1.8755, and 1.3970 times higher for every one-unit increase in “clump thickness”, “bland chromatin”, “mitoses”, “bare nuclei”, “cell shape”, “marginal adhesion”, “epithelial cell size”, and “cell size”, respectively.
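
To make the odds interpretation above concrete, the following minimal Python sketch evaluates the standard inverse forms of the four link functions named in the keywords and shows how a logit-scale coefficient corresponds to an odds ratio. It is an illustration, not the authors' code: the intercept, the random-intercept value, and the attribute values are hypothetical, and only the odds ratio 7.9104 for “clump thickness” is taken from the abstract. The odds ratio shown is conditional on a fixed value of the random effect.

```python
import math

# Standard inverse link functions: map a linear predictor eta to P(malignant).
def inv_logit(eta):
    return 1.0 / (1.0 + math.exp(-eta))

def inv_probit(eta):
    # Standard normal CDF expressed through the error function.
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))

def inv_cloglog(eta):
    return 1.0 - math.exp(-math.exp(eta))

def inv_cauchit(eta):
    # Standard Cauchy CDF.
    return 0.5 + math.atan(eta) / math.pi

def odds(p):
    return p / (1.0 - p)

# Odds ratio reported in the abstract for "clump thickness";
# on the logit scale the coefficient is its natural logarithm.
REPORTED_OR = 7.9104
beta = math.log(REPORTED_OR)

def conditional_odds_ratio(intercept, random_intercept, x):
    """Odds ratio for a one-unit increase in x, holding the random effect fixed.

    intercept, random_intercept, and x are hypothetical illustration values.
    """
    eta_low = intercept + random_intercept + beta * x
    eta_high = intercept + random_intercept + beta * (x + 1)
    return odds(inv_logit(eta_high)) / odds(inv_logit(eta_low))

# Hypothetical values; the ratio does not depend on them under the logit link.
print(conditional_odds_ratio(intercept=-3.0, random_intercept=0.5, x=1.0))
print(conditional_odds_ratio(intercept=-1.0, random_intercept=-0.7, x=2.0))
```

Under the logit link the ratio is the same whatever intercept or random-intercept value is chosen, which is why a single number per attribute can summarize the effect. The same is not true of the probit, cloglog, or cauchit links, whose coefficients do not translate into a constant odds ratio.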

Keywords

Generalized linear mixed model; Wisconsin breast cancer data; logit; probit; cloglog; cauchit link functions; random effect

Subject

Public Health and Healthcare, Public, Environmental and Occupational Health
