1. Introduction
Ordinal regression, or ordinal logistic regression, is a predictive statistical method for modeling an ordinal dependent variable from one or more independent variables. It extends both multiple linear regression and binary logistic regression [1].
It is particularly useful as an alternative to linear regression when the dependent variable is ordinal, that is, composed of ordered categories that may not be equally spaced. Examples include opinions (e.g., from “strongly disagree” to “strongly agree”) and levels of some characteristic (e.g., “low”, “medium”, “high”) [2, 3].
Besides modeling ordinal data, ordinal regression makes it possible to estimate the probability that the dependent variable falls in each of its categories, given certain values of the independent variables. It allows scholars to see how various factors affect the probability of an observation being found in each category of the ordered scale [4]. In addition, ordinal regression coefficients are interpreted differently from linear regression coefficients: they express the odds of being in a higher versus a lower category [5, 6].
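One common way to write the model behind this interpretation is the cumulative logit (proportional odds) form. The notation below is introduced here only for illustration and is not taken from the cited sources: $J$ ordered categories, thresholds $\theta_j$, a coefficient vector $\boldsymbol{\beta}$, and the sign convention in which the linear predictor is subtracted from the threshold.

```latex
\[
\log\frac{P(Y \le j \mid \mathbf{x})}{P(Y > j \mid \mathbf{x})}
  = \theta_j - \boldsymbol{\beta}^{\top}\mathbf{x},
  \qquad j = 1, \dots, J-1,
\]
\[
P(Y = j \mid \mathbf{x})
  = P(Y \le j \mid \mathbf{x}) - P(Y \le j-1 \mid \mathbf{x}),
  \qquad P(Y \le 0 \mid \mathbf{x}) = 0,\quad P(Y \le J \mid \mathbf{x}) = 1 .
\]
```

Under this convention, a positive coefficient raises the odds of falling in a higher category and a negative coefficient raises the odds of falling in a lower one; the same coefficients apply at every threshold, which is the proportional odds assumption.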
Ordinal regression is used across disciplines, including the social sciences, health research, marketing, and education. It can be applied, for instance, to measure the severity of symptoms or conditions in health care or to forecast customer satisfaction in marketing [7]. Thus, ordinal regression is important both for theory and for decision support [8].
Long, J. S. 1997 Regression Models for Categorical and Limited Dependent Variables. This book offers broad coverage of specialized regression techniques, such as ordinal regression, that deal with categorical and/or limited dependent variables [9]. It describes how to apply these models within social science and economic studies [10].
McLachlan, G. J., & Peel, D. 2000 Finite Mixture Models. Although this book focuses on mixture models, the authors mention ordinal regression among the analytical tools for heterogeneous data. It explains how ordinal logistic regression can be adapted to a model that accommodates the presence of multiple subgroups [11, 12, 13].
Fahrmeir, L., & Tutz, G. 2001 Multivariate Statistical Modeling Based on Generalized Linear Models. This book covers multivariate statistical models, including ordinal regression, and how to use them in the social sciences, economics, medicine, and other fields. It provides in-depth coverage of generalized linear models and their extensions to ordinal and multinomial regression [14].
Junker, B. H., & Sijtsma, K. 2001 Ordinal Regression Models for Item Response Data [15, 16]. In this paper, the authors explore the use of ordinal regression in modeling item response data in psychological testing. They discuss various techniques for parameter estimation and how to interpret the results using ordinal regression models [17].
Baker, F. B., & Kim, H. 2004 Item Response Theory: Parameter Estimation Techniques. While this work focuses on Item Response Theory (IRT) in psychometrics, it also discusses the use of ordinal regression models for analyzing data in the context of psychological and educational testing. It explains how ordinal regression models can be used to analyze data collected from multiple-choice tests [18].
Davidson, R., & MacKinnon, J. G. 2004 Econometric Theory and Methods. This textbook provides an in-depth discussion of econometric models, including ordinal logistic regression. It explains the theoretical foundations of ordinal regression and its applications in econometric analysis, with a focus on modeling economic outcomes that are ordinal [19].
Williams, R. 2006 Generalized Ordered Logit/Partial Proportional Odds Models for Ordinal Dependent Variables [20]. In this paper, Richard Williams extends ordinal regression models by introducing the Generalized Ordered Logit model, which allows more flexibility in handling the proportional odds assumption. The study explores how this model can be modified to better fit data that do not fully adhere to the proportional odds assumption [21].
McCullagh, P. 2008 Regression Models for Ordinal Data. In this seminal paper, Paul McCullagh introduced the ordinal regression model (ordinal logistic regression) as an effective method for analyzing ordinal data. McCullagh discusses the use of the logistic link function and explores its applications in various fields such as medicine and the social sciences [22].
Pawitan, Y., & Kim, H. 2008 Ordinal Logistic Regression in Biomedical Research: Application and Case Studies. In this paper, the authors explore the application of ordinal logistic regression in biomedical research, particularly for analyzing the progression of diseases and the severity of symptoms. Case studies are used to demonstrate how ordinal regression helps model complex relationships in medical data [23].
Hastie, T., Tibshirani, R., & Friedman, J. 2009 The Elements of Statistical Learning. While this book mainly focuses on machine learning, it provides a foundation for understanding ordinal regression in the context of predictive modeling. The book explores various statistical techniques, including ordinal regression, and demonstrates how they can be applied to different types of data [24].
Agresti, A. 2010. This book serves as a comprehensive reference for understanding the statistical analysis of ordinal categorical data. Agresti covers various techniques such as ordinal regression and the assumptions associated with it, including the proportional odds assumption, and discusses how to apply these methods to analyze ordinal data [25].
Starkweather, J., & Moske, R. 2011 An Introduction to Ordinal Logistic Regression [26]. This paper serves as a beginner’s guide to ordinal logistic regression and describes its use with real data. It also addresses the distinctions between ordinal regression and models such as linear regression and binary logistic regression [27].
Yamamoto, T., & Kato, S. 2012 Ordinal Regression Modeling of Student Performance in High School Exit Exams. In this study, students are classified by their performance on high school exit exams into categories such as ‘fail’, ‘pass’, and ‘excellent’. The study focuses on the use of ordinal regression to make predictions in educational contexts [28, 29].
Li, Q., & Chen, J. 2013 Ordinal Regression in Modeling Consumer Preferences for E-commerce Websites. This study explores the application of ordinal regression in modeling consumer preferences for e-commerce websites [30]. The analysis is based on customer ratings of e-commerce sites for user experience, product variety, and delivery services, demonstrating the model’s relevance to e-commerce [31].
Choi, S. J., & Kim, J. H. 2015 A Study on Ordinal Logistic Regression for Classifying the Severity of Traffic Accidents [32]. The authors use ordinal logistic regression to estimate the severity of traffic accidents, analyzing weather, road type, and driver characteristics, among other factors. This study shows the applicability of ordinal regression to traffic safety analysis [33].
Chen, L., & Zhang, X. 2016 Predicting Consumer Purchase Behavior Based on Product Reviews: An Ordinal Regression Approach. This paper investigates the use of ordinal logistic regression to predict consumer purchasing decisions from online product reviews [34]. The research uses ordinal regression to sort customers by level of satisfaction, as determined by review ratings, and uses these levels to anticipate future purchase tendencies [35].
Carter, W. L., & Kennedy, P. 2017 Using Ordinal Logistic Regression to Predict Job Satisfaction in the Service Industry. This paper applies ordinal logistic regression to the prediction of job satisfaction in the service industry. It examines the role of workload, pay, work environment, and other work-related factors in job satisfaction ratings [36].
Liu, J., & Kim, K. H. 2018 Applications of Ordinal Logistic Regression in Marketing and Consumer Behavior [37]. This paper discusses the applications of ordinal logistic regression in consumer behavior analysis, e.g., predicting ordinal levels of satisfaction with products or services. It emphasizes the applicability of the model to predicting customer preferences from survey data with ordered categorical variables [38].
4. Applications
As an example application, ordinal regression is used in SPSS to predict the level of property tax from age and gender. The ordinal regression is run in SPSS with the level of property tax as the dependent variable and demographic variables such as age and gender as the independent variables [52].
A property tax is a tax levied on the ownership of real property, whether residential, commercial, or industrial. Its amount depends on the value of the property itself, as determined by a real estate appraisal under local laws and procedures. It is meant to pay for public goods and services such as infrastructure and urban development. The tax rate is set using several variables: location, property size, property use, and, at times, property value [53, 54].
Property tax is a key component of government revenue in a significant number of countries. It is levied on individuals and corporations that own property, and the tax rate varies across regions according to local legislation. These taxes can also include fees paid on deeds or property development [55]. In certain instances, eligible taxpayers are granted tax exemptions or rebates because of their income level or the use of the property, such as residential use. The tax helps attain the goals of social justice in that it distributes the economic burden across the population according to the true worth of each owner’s property [56].
Table 1. Real estate tax data.
| Tax level | Age | Gender | Tax level | Age | Gender | Tax level | Age | Gender | Tax level | Age | Gender |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | 52 | 1 | 1 | 70 | 1 | 3 | 30 | 2 | 3 | 41 | 2 |
| 1 | 70 | 1 | 1 | 50 | 1 | 3 | 43 | 2 | 2 | 53 | 2 |
| 3 | 30 | 2 | 1 | 78 | 1 | 3 | 25 | 2 | 1 | 78 | 1 |
| 1 | 60 | 1 | 2 | 50 | 1 | 3 | 44 | 2 | 1 | 70 | 1 |
| 2 | 42 | 1 | 1 | 83 | 1 | 2 | 67 | 2 | 1 | 65 | 1 |
| 2 | 51 | 1 | 3 | 28 | 2 | 1 | 87 | 2 | 2 | 50 | 1 |
| 2 | 54 | 1 | 2 | 44 | 2 | 1 | 71 | 1 | 2 | 45 | 1 |
| 3 | 24 | 2 | 1 | 75 | 2 | 3 | 35 | 1 | 1 | 68 | 1 |
| 1 | 75 | 1 | 1 | 112 | 1 | 2 | 50 | 1 | 3 | 27 | 2 |
| 1 | 60 | 1 | 1 | 81 | 1 | 2 | 57 | 1 | 2 | 47 | 1 |
The ordinal model was fitted in several steps, yielding the following results:
Table 2. Model Fitting Information.
| Model | -2 Log Likelihood | Chi-Square | df | Sig. |
|---|---|---|---|---|
| Intercept Only | 81.882 | | | |
| Final | 12.100 | 69.782 | 2 | 0.000 |
In the Model Fitting Information table, the p-value (reported as 0.000, i.e., p < 0.001) is below the 0.05 significance level, which shows that the final model fits the data significantly better than the intercept-only model [57].
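The chi-square in Table 2 is the drop in -2 log-likelihood between the intercept-only and final models. A minimal Python sketch of that arithmetic follows; the function names are illustrative, and the p-value uses the closed form exp(-x/2), which is exact only for 2 degrees of freedom, as in this table.

```python
import math


def lr_chi_square(neg2ll_intercept_only, neg2ll_final):
    """Likelihood-ratio chi-square: the drop in -2 log-likelihood."""
    return neg2ll_intercept_only - neg2ll_final


def chi2_sf_df2(x):
    """Upper-tail probability of a chi-square variable with exactly 2 df."""
    return math.exp(-x / 2.0)


# Values copied from Table 2 (logit link).
stat = lr_chi_square(81.882, 12.100)
p_value = chi2_sf_df2(stat)

print(f"chi-square = {stat:.3f}, p = {p_value:.3g}")
```

The difference reproduces the 69.782 shown in the table, and the resulting p-value is far below 0.001, which is why SPSS prints it as 0.000.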
Table 3. Goodness of Fit.
| | Chi-Square | df | Sig. |
|---|---|---|---|
| Pearson | 8.285 | 58 | 1.000 |
| Deviance | 8.987 | 58 | 1.000 |

Link function: Logit.
For the Goodness of Fit table, the null hypothesis that the model is a good fit is not rejected, because both test statistics (Pearson and deviance) are lower than the tabulated chi-square value at the 0.05 significance level with 58 degrees of freedom (approximately 76.78). This is confirmed by the p-values, which are greater than the 0.05 significance level.
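The p-values behind this goodness-of-fit decision can be checked by hand. The sketch below is an illustration rather than the SPSS routine: it uses a closed-form upper-tail probability of the chi-square distribution that is valid only for an even number of degrees of freedom (58 qualifies), and the function name is hypothetical.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability for even df.

    For df = 2k: P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive, even df")
    half = x / 2.0
    term = 1.0   # current term (x/2)^i / i!, starting at i = 0
    total = 1.0  # running sum of the terms
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


# Statistics copied from the goodness-of-fit table (logit link, df = 58).
pearson_p = chi2_sf_even_df(8.285, 58)
deviance_p = chi2_sf_even_df(8.987, 58)

print(f"Pearson p = {pearson_p:.3f}, Deviance p = {deviance_p:.3f}")
```

Both p-values round to 1.000, matching the Sig. column, so neither test gives evidence of lack of fit.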
Table 4 reports the pseudo R-squared values. The Cox and Snell, Nagelkerke, and McFadden values were 82.5%, 93.4%, and 81.1%, respectively, as measures of how much of the change in the dependent variable is explained by the independent variables [58].
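These three pseudo R-squared values can be approximately recovered from quantities already reported. The following is a sketch under stated assumptions, not the SPSS implementation: the category counts (17, 13, 10) are tallied from Table 1 on the reading that the first column of each triple is the tax level, and the intercept-only -2 log-likelihood is recomputed from those counts rather than taken from the Model Fitting Information table, whose printed values are assumed here to omit a constant term. Function names are illustrative.

```python
import math


def neg2ll_intercept_only(counts):
    """Full -2 log-likelihood of an intercept-only model from category counts."""
    n = sum(counts)
    return -2.0 * sum(c * math.log(c / n) for c in counts if c > 0)


def pseudo_r2(lr_chi_square, neg2ll_null, n):
    """Return (Cox and Snell, Nagelkerke, McFadden) pseudo R-squared values."""
    cox_snell = 1.0 - math.exp(-lr_chi_square / n)
    max_cox_snell = 1.0 - math.exp(-neg2ll_null / n)  # upper bound of Cox and Snell
    nagelkerke = cox_snell / max_cox_snell
    mcfadden = lr_chi_square / neg2ll_null
    return cox_snell, nagelkerke, mcfadden


COUNTS = (17, 13, 10)  # assumption: observations at tax levels 1, 2, 3 in Table 1
null_dev = neg2ll_intercept_only(COUNTS)

# 69.782 is the logit model chi-square from the Model Fitting Information table.
cs, nk, mf = pseudo_r2(69.782, null_dev, sum(COUNTS))
print(f"Cox and Snell = {cs:.3f}, Nagelkerke = {nk:.3f}, McFadden = {mf:.3f}")
```

Under these assumptions the three values agree with the reported 82.5%, 93.4%, and 81.1% to three decimal places.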
The null hypothesis states that the location parameters (slope coefficients) are the same across response categories [59].
Table 5 (Test of Parallel Lines) tests the proportional odds assumption, which is the main assumption of ordinal regression; for the assumption to hold, the p-value must be greater than 0.05 [60]. Here the p-value is 0.002, which is below the 0.05 significance level, so a link function other than logit, such as probit, may be more suitable for these data [61].
Table 6 of estimated parameters shows, in the Location rows, that the age variable has a significant effect on the tax level, because its p-value (0.000) is below the 0.05 significance level. The gender variable also has a significant effect on the tax level, because its p-value (0.032) is below the 0.05 significance level. The age coefficient is negative (-0.116), which indicates that as age increases, the tax level code is more likely to decrease (high tax). The coefficient for gender category (1), i.e., males, is negative (-1.268), which indicates that this category of respondents is more likely than females to be at the higher tax level [62].
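To make the direction of these effects concrete, the sketch below turns the two reported coefficients into category probabilities. Several pieces are assumptions made only for illustration: the parameterization logit P(Y <= j) = theta_j - (b_age * age + b_male * male), the 0/1 coding of the male indicator, and above all the threshold values in `THETAS`, which are hypothetical placeholders because the threshold estimates are not reproduced in the text.

```python
import math

B_AGE = -0.116    # age coefficient reported in the text
B_MALE = -1.268   # coefficient for gender category 1 (males) reported in the text
THETAS = (-7.0, -4.5)  # hypothetical thresholds, NOT taken from the paper


def logistic(z):
    """Inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-z))


def linear_predictor(age, male):
    """Location part of the model; male is 1 for males, 0 for females."""
    return B_AGE * age + B_MALE * male


def category_probs(thetas, eta):
    """Probabilities of categories 1..J from cumulative logits theta_j - eta."""
    cumulative = [logistic(t - eta) for t in thetas] + [1.0]
    probs, previous = [], 0.0
    for c in cumulative:
        probs.append(c - previous)
        previous = c
    return probs


young = category_probs(THETAS, linear_predictor(30, 0))
old = category_probs(THETAS, linear_predictor(70, 0))

print("age 30:", [round(p, 3) for p in young])
print("age 70:", [round(p, 3) for p in old])
print("odds ratio per year of age:", round(math.exp(B_AGE), 3))
```

Whatever thresholds are used, the negative coefficients shift probability toward the lower category codes as age increases and for males, which is the pattern described above; the probabilities themselves depend on the placeholder thresholds and should not be read as results.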
As for the estimated response probabilities, the results of the probit method are as follows:
Table 7. Model Fitting Information.
| Model | -2 Log Likelihood | Chi-Square | df | Sig. |
|---|---|---|---|---|
| Intercept Only | 81.882 | | | |
| Final | 0.000 | 81.882 | 2 | 0.000 |

Link function: Probit.
In the Model Fitting Information table, the p-value (reported as 0.000, i.e., p < 0.001) is below the 0.05 significance level, which shows that the final model fits the data significantly better than the intercept-only model. The fit is better than that obtained with the logit method [63].
Table 8. Goodness-of-Fit.
| | Chi-Square | df | Sig. |
|---|---|---|---|
| Pearson | 8.911 | 58 | 1.000 |
| Deviance | 14.270 | 58 | 1.000 |

Link function: Probit.
For the Goodness-of-Fit table, the null hypothesis that the model is a good fit is not rejected, because both test statistics (Pearson and deviance) are lower than the tabulated chi-square value at the 0.05 significance level with 58 degrees of freedom (approximately 76.78). This is confirmed by the p-values, which are greater than the 0.05 significance level.
Table 9 reports the pseudo R-squared values. The Cox and Snell, Nagelkerke, and McFadden values were 87.1%, 98.6%, and 95.2%, respectively, as measures of how much of the change in the dependent variable is explained by the independent variables; these are larger than in the previous logit-based analysis [64].
The null hypothesis states that the location parameters (slope coefficients) are the same across response categories.
Link function: Probit [65].
The log-likelihood value is practically zero. There may be a complete separation in the data. The maximum likelihood estimates do not exist [66].
Table 10 (Test of Parallel Lines) tests the proportional odds assumption, which is the main assumption of ordinal regression; for the assumption to hold, the p-value must be greater than 0.05. Here the p-value is 1.000, which is greater than the 0.05 significance level, so the probit link function is more suitable for these data than the logit link [67].
Table 11 of estimated parameters is quite similar to that of the logit method,