Spatial Analysis of Road Traffic Accident Hotspots in Jega, Nigeria: A Comparative Study of Kernel Density Estimation and Geographically Weighted Regression

Muddassiru Abubakar; Umar Usman

doi:10.20944/preprints202603.2086.v1

Submitted: 24 March 2026

Posted: 26 March 2026

Abstract

Road traffic accidents remain a critical public safety challenge in rapidly urbanizing regions of sub-Saharan Africa, where heterogeneous road infrastructure and high population density exacerbate risk. This study applies Kernel Density Estimation (KDE) and Geographically Weighted Regression (GWR) to analyze spatial patterns of road traffic accidents across Jega Local Government Area, Kebbi State, Nigeria, using fifty georeferenced primary data points collected through Global Positioning System surveys and manual traffic counts. The KDE analysis identified an optimal bandwidth of 175 meters with a Prediction Accuracy Index (PAI) of 3.50 at the 85th percentile threshold, indicating strong spatial clustering of accidents. Spatial autocorrelation analysis revealed significant clustering (Moran's I = 0.312, p < 0.05). The GWR model demonstrated strong explanatory power with a global R² of 0.72 and an AICc of 420.35. Local R² values exhibited substantial spatial variation (range: 0.20–0.95), highlighting the importance of localized analysis. Cross-validation results (RMSE = 3.45, MAE = 2.12, R² = 0.65) confirmed predictive robustness. The integrated geospatial framework identified distinct high-risk corridors, with Gada (8 accidents), Garkar Ando (5 accidents), and Gobirawa (5 accidents) emerging as critical hotspots requiring immediate intervention. This research provides a validated geostatistical framework for micro-scale road safety planning in Nigerian cities.

Keywords: road traffic accidents; Kernel Density Estimation; Geographically Weighted Regression; spatial analysis; accident hotspots; Jega; geostatistics

Subject: Computer Science and Mathematics - Probability and Statistics

1. Introduction

Road traffic accidents (RTAs) represent a persistent and multifaceted global public health crisis, characterized by significant mortality, enduring economic burdens, and profound social disruption (World Health Organization, 2021, 2023). Globally, an estimated 1.19 million deaths occur annually due to road traffic accidents, with low- and middle-income countries bearing a disproportionate burden. Although these nations possess only about 60% of the world's vehicles, they account for over 90% of all traffic-related fatalities (WHO, 2023). The socioeconomic consequences include loss of productivity, increased healthcare costs, and strain on public resources (Peden et al., 2004).

In the Nigerian context, this crisis is exacerbated by heterogeneous road networks, rapid urbanization, informal transport systems, and inadequate traffic management (Eke et al., 2021). The Federal Road Safety Corps has documented alarming statistics, reporting 1,300 road accidents and 51,251 injured persons in Nigeria over a three-year period (Abubakar & Umar, 2022). Road traffic environments in Nigeria are characterized by a combination of largely inexperienced drivers, poorly maintained vehicles, inadequate road infrastructure, and weak traffic law enforcement (Odeleye, 2003).

Understanding the spatial distribution of accident risk is critical for developing targeted interventions that improve road safety. Spatial statistical models provide a means to quantify and map accident risk, enabling planners to identify hotspots and prioritize interventions (Anderson, 2009). Traditional regression models capture global trends but fail to account for local spatial autocorrelation, leading to biased estimates (Anselin, 1988; Lord & Mannering, 2010). Conversely, geostatistical approaches such as Kriging effectively model spatial dependence but ignore deterministic trends driven by infrastructural factors (Goovaerts, 1997; Cressie, 1993).

The methodological evolution of spatial traffic safety analysis has been marked by a critical departure from global, aspatial models toward techniques that explicitly acknowledge and model spatial dependency and heterogeneity. Foundational global regression approaches, such as Ordinary Least Squares (OLS), are fundamentally constrained by the assumption of spatial stationarity, an ontological flaw that yields biased parameter estimates and masks local effects when applied to inherently spatial phenomena like accident clustering (Anselin, 1988; Lord & Mannering, 2010). This limitation catalyzed the development of local statistical frameworks, most prominently Geographically Weighted Regression (GWR), which conceptualizes geographic space as a continuous field of varying parameter estimates (Brunsdon et al., 1996; Fotheringham et al., 2002). GWR operationalizes Tobler's First Law of Geography through distance-decay weighting kernels (Tobler, 1970).

In Nigeria, spatial accident modelling remains underutilized, with most studies relying on descriptive GIS mapping (Oni, 2011; Olawole, 2012). However, recent applications of spatial statistics have focused on environmental hazards (Usman & Abubakar, 2020) and public health risks (Onyeka et al., 2018), suggesting potential for similar methods in traffic safety. Notably, Abubakar and Umar (2022) applied Universal Kriging to analyze road traffic accidents in Jega LGA, Kebbi State, identifying spatial autocorrelation patterns and highlighting southern parts of the study area as higher-risk zones. Their findings demonstrated the feasibility of applying variogram-based modelling for localized accident prediction.

Building upon this foundation, Abubakar and Salmanu (2025) employed a Regression Kriging (RK) framework in Jega, quantitatively isolating directional risk gradients and demonstrating that approximately 76% of accident variance is spatially structured within a 330-meter range. Their work exemplifies the transition from descriptive hazard mapping to predictive, hyper-local risk modeling. Similarly, Abubakar et al. (2025) applied Geographically Weighted Regression to analyze accident patterns in Jega, capturing spatially varying relationships between accident occurrence and geographic location, with a global R² of 0.72.

The analytical superiority of GWR lies in its capacity to uncover latent, place-specific risk mechanisms that global models erroneously aggregate or omit. By allowing coefficients to vary locally, GWR can reveal how relationships change across space (Xu & Huang, 2015). This granularity directly enhances model fit, with comparative studies consistently reporting higher explanatory power for GWR over OLS in traffic safety contexts.

This study aims to: (1) apply Kernel Density Estimation to identify accident hotspots in Jega, Nigeria, using the Prediction Accuracy Index methodology; (2) implement Geographically Weighted Regression to model spatially varying relationships between accident occurrence and geographic location; and (3) generate spatially explicit risk surfaces to identify critical intervention zones for evidence-based road safety policy.

2. Materials and Methods

2.1. Study Area

The area under investigation comprises fifty (50) sample points located in Jega Local Government Area, Kebbi State, northern Nigeria. The study area falls between latitude 11°55'0" to 12°18'0" N and longitude 4°17'0" to 4°32'0" E. The area is characterized as one of the centers of commerce in the State, with significant agricultural and trading activities (Abubakar & Umar, 2022).

Figure 1. (a) Geographic illustration of the study location; (b) distribution of the 50 accident data points across Jega LGA.

2.2. Data Description

The study relies solely on primary data collected in the field during 2020. Geographic coordinates of selected traffic corridors were obtained with the Global Positioning System (GPS), and manual traffic counts were conducted along these corridors. Field observations were also used to gather contextual information on traffic flow and road usage patterns. The input variables for the analysis are the geographic coordinates (latitude/longitude and projected UTM coordinates X, Y), the accident count at each location, and the location name. Accident counts at the fifty sample points range from 1 to 8 incidents per location, with a total of 122 accidents across all locations (Abubakar & Salmanu, 2025; Abubakar et al., 2025).

2.3. Spatial Autocorrelation Analysis: Moran's I

Moran's I was used to evaluate whether the spatial pattern of accident counts is clustered, dispersed, or random. It is one of the oldest and most widely used measures of spatial autocorrelation (Haining, 2003; Moran, 1950). Moran's I can be computed using Equation 1:

I = \frac{N \sum_{i} \sum_{j} w_{i j} (x_{i} - \bar{x}) (x_{j} - \bar{x})}{W \sum_{i} (x_{i} - \bar{x})^{2}}

(1)

where N is the number of cases, x_{i} is the variable value at a particular location, x_{j} is the variable value at another location, \bar{x} is the mean of the variable, w_{i j} is the weight applied to the comparison between location i and location j, and W = \sum_{i} \sum_{j} w_{i j} is the sum of all weights. The weights w_{i j} form a distance-based weight matrix, with each weight taken as the inverse distance between locations i and j (1 / d_{i j}).
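Equation 1 can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function names `morans_i` and `expected_i` are hypothetical, coordinates are assumed to be projected (metric), and coincident points are simply given zero weight.

```python
import numpy as np

def morans_i(coords, values):
    """Global Moran's I with inverse-distance weights w_ij = 1 / d_ij."""
    coords = np.asarray(coords, dtype=float)
    x = np.asarray(values, dtype=float)
    n = x.size
    # Pairwise Euclidean distances between all locations.
    diff = coords[:, None, :] - coords[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=2))
    # Inverse-distance weights; pairs at zero distance (including i == j)
    # receive zero weight, which is an assumption of this sketch.
    w = np.zeros_like(d)
    positive = d > 0
    w[positive] = 1.0 / d[positive]
    z = x - x.mean()
    numerator = n * (w * np.outer(z, z)).sum()
    denominator = w.sum() * (z ** 2).sum()  # W times the sum of squared deviations
    return numerator / denominator

def expected_i(n):
    """Expected value of Moran's I under spatial randomness, -1 / (N - 1)."""
    return -1.0 / (n - 1)
```

For N = 50 locations, `expected_i(50)` evaluates to about -0.020, the reference value against which an observed Moran's I is compared.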

2.4. Kernel Density Estimation (KDE)

KDE is one of the most widely used methods for analyzing the properties of a point event distribution (Silverman, 1986; Bailey & Gatrell, 1995). It has been applied widely to the detection and analysis of traffic accident hotspots (Anderson, 2009; Xie & Yan, 2008). The density at a particular location can be computed using Equation 2:

λ (s) = \sum_{i = 1}^{n} \frac{1}{π r^{2}} k (\frac{d_{i s}}{r})

(2)

where λ (s) is the density at location s, r is the search radius (bandwidth) of the KDE, and k is the weight of a point i at distance d_{i s} to location s.
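As an illustration of Equation 2, the sketch below evaluates the density at a single location. The kernel function k is not specified in the text, so a quartic kernel is assumed here; the optional per-point weights (for example, accident counts) and the function names are likewise assumptions of this sketch.

```python
import numpy as np

def quartic_kernel(u):
    """Quartic kernel k(u) = 3 (1 - u^2)^2 for u < 1, else 0 (assumed choice)."""
    u = np.asarray(u, dtype=float)
    return np.where(u < 1.0, 3.0 * (1.0 - u ** 2) ** 2, 0.0)

def kde_density(s, points, r, weights=None):
    """Density lambda(s) at location s for search radius r.

    s, points and r must share the same length unit (e.g. metres in UTM).
    """
    s = np.asarray(s, dtype=float)
    points = np.asarray(points, dtype=float)
    # Distance d_is from every event point i to the evaluation location s.
    d = np.sqrt(((points - s) ** 2).sum(axis=1))
    if weights is None:
        weights = np.ones(len(points))
    weights = np.asarray(weights, dtype=float)
    return float((weights * quartic_kernel(d / r)).sum() / (np.pi * r ** 2))
```

With this kernel, points farther than r from s contribute nothing, so the bandwidth directly controls how local the resulting density surface is.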

2.5. Prediction Accuracy Index (PAI)

The Prediction Accuracy Index (PAI) developed by Chainey et al. (2008) was used to evaluate hotspot prediction performance. PAI is the ratio of the percentage of accidents that fall within hotspots to the percentage of the study area that the hotspots occupy (see Equation 3). All PAI values were computed on an area basis. The higher the PAI, the better the method's performance.

PAI = \frac{\frac{n}{N} \times 100}{\frac{m}{M} \times 100}

(3)

where n is the number of accidents in hotspots, N is the total number of accidents, m is the area involved in accident hotspots, and M is the total area of the study region.
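Equation 3 reduces to a two-line computation. The sketch below uses a hypothetical helper `pai` and, as a worked example, the 85th percentile figures reported in this study (64 of 122 accidents inside hotspots covering 15% of the area).

```python
def pai(n_hot, n_total, area_hot, area_total):
    """Prediction Accuracy Index: hit rate (%) divided by hotspot area share (%)."""
    hit_rate_pct = 100.0 * n_hot / n_total
    area_pct = 100.0 * area_hot / area_total
    return hit_rate_pct / area_pct

# 85th percentile threshold: 64 of 122 accidents in 15% of the study area.
pai_85 = pai(64, 122, 15.0, 100.0)  # approximately 3.50
```

Because the area share sits in the denominator, shrinking the hotspot area raises PAI even when fewer accidents are captured, which is why PAI alone does not determine the most useful threshold.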

2.6. Geographically Weighted Regression (GWR)

Following the methodology of Abubakar et al. (2025), Geographically Weighted Regression was implemented to capture spatially varying relationships. The GWR model extends the traditional regression framework by allowing local rather than global parameters to be estimated (Brunsdon et al., 1996; Fotheringham et al., 2002). The model is specified as:

Y_{i} = β_{0} (u_{i}, v_{i}) + \sum_{k = 1}^{p} β_{k} (u_{i}, v_{i}) X_{i k} + ϵ_{i}

(4)

where (u_{i}, v_{i}) denotes the coordinates of location i, β_{k} (u_{i}, v_{i}) is the local regression coefficient for predictor k at location i, and ϵ_{i} is the random error term.
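The local estimation behind Equation 4 amounts to one weighted least squares fit per location. The following is a minimal sketch assuming a fixed Gaussian distance-decay kernel; the kernel choice and the function name `gwr_fit` are illustrative assumptions, not the configuration used in this study.

```python
import numpy as np

def gwr_fit(coords, X, y, bandwidth):
    """Local coefficients beta(u_i, v_i) at every observation location.

    Returns an (n, p + 1) array; column 0 holds the local intercepts.
    """
    coords = np.asarray(coords, dtype=float)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    design = np.column_stack([np.ones(n), X])  # add intercept column
    betas = np.empty((n, design.shape[1]))
    for i in range(n):
        # Distance from the regression point i to every observation.
        d = np.sqrt(((coords - coords[i]) ** 2).sum(axis=1))
        # Gaussian distance-decay weights (assumed kernel).
        w = np.exp(-0.5 * (d / bandwidth) ** 2)
        xtw = design.T * w
        # Weighted least squares: solve (X' W X) beta = X' W y.
        betas[i] = np.linalg.solve(xtw @ design, xtw @ y)
    return betas
```

When the predictors are raw UTM eastings and northings, centering them before fitting tends to improve the numerical conditioning of the local normal equations without changing the slope estimates.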

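Bandwidth selection was optimized by minimizing the corrected Akaike Information Criterion (AICc) (Hurvich et al., 1998):

AICc = 2 n \ln (\hat{σ}) + n \ln (2 π) + n \left( \frac{n + \mathrm{tr} (S)}{n - 2 - \mathrm{tr} (S)} \right)

(5)

where n is the number of observations, \hat{σ} is the estimated standard deviation of the error term, and \mathrm{tr} (S) is the trace of the hat matrix S.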
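Equation 5 can be evaluated directly from the model residuals and the trace of the hat matrix. In this sketch the error standard deviation is assumed to be the maximum likelihood estimate, the square root of RSS / n; the function name `aicc` is hypothetical.

```python
import numpy as np

def aicc(residuals, trace_s):
    """Corrected AIC from residuals and the trace of the hat matrix S."""
    e = np.asarray(residuals, dtype=float)
    n = e.size
    # Assumed estimator: sigma_hat = sqrt(RSS / n).
    sigma_hat = np.sqrt((e ** 2).sum() / n)
    return (2.0 * n * np.log(sigma_hat)
            + n * np.log(2.0 * np.pi)
            + n * (n + trace_s) / (n - 2.0 - trace_s))
```

In a bandwidth search, this value would be computed for each candidate bandwidth and the bandwidth with the smallest AICc retained.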

2.7. Model Validation

Cross-validation procedures were employed to assess predictive accuracy (Isaaks & Srivastava, 1989). The prediction error is the difference between the observed and predicted values at a cross-validation point:

e (s_{i}) = Z (s_{i}) - \hat{Z} (s_{i})

(6)

The following cross-validation measures were calculated (Abubakar & Salmanu, 2025):

Mean Error (ME):

ME = \frac{1}{n} \sum_{i = 1}^{n} e (s_{i})

(7)

Root Mean Squared Error (RMSE):

RMSE = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} e^{2} (s_{i})}

(8)

Mean Absolute Error (MAE):

MAE = \frac{1}{n} \sum_{i = 1}^{n} | e (s_{i}) |

(9)

R-Squared:

R^{2} = 1 - \frac{SS_{res}}{SS_{tot}}

(10)
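Equations 6 to 10 can be computed together from paired observed and predicted values. The sketch below uses a hypothetical helper `validation_metrics` and takes SS_res as the sum of squared errors and SS_tot as the sum of squared deviations of the observations from their mean.

```python
import numpy as np

def validation_metrics(observed, predicted):
    """ME, RMSE, MAE and R-squared from observed and predicted values."""
    z = np.asarray(observed, dtype=float)
    z_hat = np.asarray(predicted, dtype=float)
    e = z - z_hat                                # prediction errors, Equation 6
    ss_res = (e ** 2).sum()
    ss_tot = ((z - z.mean()) ** 2).sum()
    return {
        "ME": float(e.mean()),
        "RMSE": float(np.sqrt((e ** 2).mean())),
        "MAE": float(np.abs(e).mean()),
        "R2": float(1.0 - ss_res / ss_tot),
    }
```

Note that RMSE and the R-squared of Equation 10 are linked through the variance of the observations, so the two are best read together rather than in isolation.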

3. Results

3.1. Descriptive Statistics

The accident data from fifty locations in Jega Local Government Area revealed a total of 122 accidents, with an average of 2.44 accidents per location. The maximum accident count at a single location was 8 (recorded at Gada), while the minimum was 1. The distribution showed that locations with higher accident frequencies were concentrated in commercial areas and major intersections, consistent with findings from previous studies in the region (Abubakar & Umar, 2022; Abubakar & Salmanu, 2025).

Table 1. Summary Statistics of Accident Data.

Statistic	Value
Total Accidents	122
Number of Locations	50
Mean Accidents per Location	2.44
Standard Deviation	1.71
Maximum Accidents	8 (Gada)
Minimum Accidents	1
Skewness	1.34
Kurtosis	2.18

Figure 2. Distribution of accident counts across the 50 sampled locations in Jega, showing frequency of locations by accident count categories.

3.2. Spatial Autocorrelation Results

Moran's I analysis revealed significant spatial clustering of accident incidents in the study area. The calculated Moran's I value of 0.312 (p = 0.0012, z-score = 4.23) indicates positive spatial autocorrelation, meaning that locations with high accident counts tend to be clustered together.

Figure 3. Moran's I scatterplot showing spatial autocorrelation of accident counts in Jega, with accident count plotted against its spatial lag.

Table 2. Spatial Autocorrelation Results.

Statistic	Value
Moran's I	0.312
Expected I	-0.020
P-value	0.0012
Z-score	4.23
Interpretation	Significant clustering

3.3. KDE Bandwidth Optimization

Bandwidth optimization using a maximum likelihood criterion identified an optimal bandwidth of 175 meters for the KDE analysis. This bandwidth represents the spatial scale at which accident patterns are most coherently represented, balancing oversmoothing against sensitivity to random noise. The log-likelihood profile peaked at 175 m (-3.98), with values decreasing at both smaller and larger bandwidths (Table 3).

Figure 4. Bandwidth optimization for Kernel Density Estimation showing (a) log-likelihood profile across candidate bandwidths with optimal bandwidth identified at 175 meters, and (b) distribution of accident counts used for KDE weighting.

Table 3. Bandwidth Optimization Results.

Bandwidth (m)	Log-Likelihood
50	-4.82
75	-4.51
100	-4.28
125	-4.12
150	-4.03
175	-3.98
200	-4.02
225	-4.09
250	-4.18
275	-4.29
300	-4.41

Optimal Bandwidth: 175 meters.
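A likelihood-based bandwidth search of this kind can be sketched as follows. The exact criterion used in the study is not detailed, so this example assumes a leave-one-out log-likelihood with an isotropic two-dimensional Gaussian kernel; the function names and the candidate grid are illustrative assumptions.

```python
import numpy as np

def loo_log_likelihood(points, h):
    """Mean leave-one-out log-likelihood of a 2-D Gaussian KDE with bandwidth h."""
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    diff = pts[:, None, :] - pts[None, :, :]
    d2 = (diff ** 2).sum(axis=2)
    k = np.exp(-0.5 * d2 / h ** 2) / (2.0 * np.pi * h ** 2)
    np.fill_diagonal(k, 0.0)            # leave each point out of its own estimate
    dens = k.sum(axis=1) / (n - 1)
    # Floor the density to avoid log(0) for isolated points at tiny bandwidths.
    return float(np.log(np.maximum(dens, 1e-300)).mean())

def select_bandwidth(points, candidates):
    """Return the candidate bandwidth with the highest score, plus all scores."""
    scores = [loo_log_likelihood(points, h) for h in candidates]
    return candidates[int(np.argmax(scores))], scores
```

Called with projected coordinates and a grid such as `list(range(50, 325, 25))`, `select_bandwidth` returns the bandwidth at the peak of the profile, mirroring the way the optimum is read from Table 3.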

3.4. Hotspot Identification and PAI Analysis

Five percentile thresholds (75th, 80th, 85th, 90th, and 95th) were evaluated to identify optimal hotspot delineation. The Prediction Accuracy Index (PAI) was calculated for each threshold to determine the best-performing model.

Figure 5. Kernel Density Estimation results showing (a) continuous accident density surface across Jega, and (b) identified hotspots at the optimal 85th percentile threshold (PAI = 3.50).

Table 4. PAI Analysis for Different Percentile Thresholds.

Percentile	Threshold Density	Hotspot Area (%)	Accidents in Hotspots	Accidents in Hotspots (%)	PAI
75th	2.34e-6	25.0%	89	73.0%	2.92
80th	3.12e-6	20.0%	78	63.9%	3.20
85th	4.28e-6	15.0%	64	52.5%	3.50
90th	6.53e-6	10.0%	47	38.5%	3.85
95th	9.87e-6	5.0%	28	23.0%	4.60

The 85th percentile threshold (PAI = 3.50) was selected as the optimal balance between hotspot specificity and practical intervention area. At this threshold, hotspots cover 15% of the study area but contain 52.5% of all accidents, demonstrating strong predictive capability.
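The thresholding step can be illustrated on a gridded density surface. This is a sketch under assumptions: the surface is taken to be a regular grid of equal-area cells, accidents are assumed to be already counted per cell, and the function names are hypothetical.

```python
import numpy as np

def hotspot_mask(density, percentile):
    """Boolean mask of cells whose density exceeds the given percentile."""
    density = np.asarray(density, dtype=float)
    threshold = np.percentile(density, percentile)
    return density > threshold, threshold

def pai_from_grid(mask, counts):
    """PAI from a hotspot mask and per-cell accident counts (equal-area cells)."""
    counts = np.asarray(counts, dtype=float)
    hit_pct = 100.0 * counts[mask].sum() / counts.sum()
    area_pct = 100.0 * mask.mean()      # share of cells flagged as hotspot
    return hit_pct / area_pct
```

Thresholding at the 85th percentile flags roughly 15% of the cells by construction, so the hotspot area column of Table 4 follows directly from the percentile, while the share of accidents captured depends on the data.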

3.5. GWR Model Performance

Following the methodology of Abubakar et al. (2025), Geographically Weighted Regression was implemented to model spatially varying relationships. The GWR model demonstrated strong explanatory power with a global R² of 0.72 and adjusted R² of 0.68.

Table 5. GWR Model Summary.

Statistic	Value
Global R²	0.72
Adjusted R²	0.68
AICc	420.35
Bandwidth	850.50 m

The AICc value of 420.35 is the minimum obtained during bandwidth selection. Because AICc is a relative measure, it supports the model's superiority over traditional regression approaches only when set against the AICc of a global model fitted to the same data. The optimal bandwidth of 850.50 meters reveals the meaningful spatial scale at which accident influences operate. In comparison, Abubakar and Salmanu (2025) reported a shorter spatial range (330.12 m) using Regression Kriging, highlighting that GWR captures broader-scale varying relationships.

3.6. Regression Coefficients

The regression coefficients quantify the spatial trends in accident risk across Jega. The intercept (6.823, p < 0.001) represents the baseline accident count at the coordinate origin. The significant negative coefficient for easting (X: -0.00012, p < 0.001) indicates decreasing risk moving eastward. Conversely, the positive northing coefficient (Y: 0.00008, p < 0.001) confirms increasing accident frequency toward northern urban centers, consistent with known high-risk zones like BLB Junction (Abubakar & Salmanu, 2025).

Table 6. Regression Coefficients for Spatial Accident Prediction.

Variable	Coefficient	Std. Error	t-value	p-value	95% CI
Intercept	6.823	±0.451	15.12	<0.001***	(5.94, 7.71)
X (Easting)	-0.00012	±0.00003	-4.00	<0.001***	(-0.00018, -0.00006)
Y (Northing)	0.00008	±0.00002	4.00	<0.001***	(0.00004, 0.00012)

*** Significant at α = 0.001.
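The t-values and confidence intervals in Table 6 can be reproduced from the coefficients and standard errors. The sketch below assumes the normal-approximation critical value of 1.96, which appears consistent with the reported intervals; the helper name `wald_summary` is hypothetical.

```python
def wald_summary(coef, se, z_crit=1.96):
    """t-value and approximate 95% confidence interval for one coefficient."""
    t_value = coef / se
    interval = (coef - z_crit * se, coef + z_crit * se)
    return t_value, interval

# Values taken from Table 6.
t_intercept, ci_intercept = wald_summary(6.823, 0.451)
t_easting, ci_easting = wald_summary(-0.00012, 0.00003)
t_northing, ci_northing = wald_summary(0.00008, 0.00002)
```

An interval based on the t distribution with the model's residual degrees of freedom would be slightly wider than the one computed here.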

3.7. Spatial Variability in Model Performance

The distribution of local R² values reveals significant spatial variation in the model's explanatory power across the study area. While the median local R² of 0.72 matches the global R², the wide range from 0.20 to 0.95 highlights important geographic differences (Abubakar et al., 2025).

Figure 6. Spatial distribution of local R² values from Geographically Weighted Regression, showing variation in model explanatory power across the study area (range: 0.20 to 0.95).

Figure 7. Spatially varying coefficient surfaces for (a) easting (X) and (b) northing (Y) coordinates from Geographically Weighted Regression.

Table 7. Local R² Distribution.

Percentile	Value
Minimum	0.20
25%	0.55
Median	0.72
75%	0.85
Maximum	0.95

The lower quartile value of 0.55 suggests that in 25% of locations, the model explains just over half of the variation in accident counts, potentially indicating areas where additional explanatory variables may be needed. The upper quartile value of 0.85 and maximum of 0.95 demonstrate that in many locations, particularly those with higher accident frequencies, the model performs exceptionally well.

3.8. Cross-Validation Results

Leave-one-out cross-validation (LOOCV) was performed to assess the GWR model's predictive performance on unseen data. The results demonstrate robust predictive capability.
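The leave-one-out procedure can be sketched as follows: each location is withheld in turn, a local model is fitted to the remaining points, and the withheld value is predicted. The Gaussian kernel, the fixed bandwidth, and the function names are assumptions of this sketch rather than details confirmed by the study.

```python
import numpy as np

def gwr_predict(coords, X, y, target_coord, target_x, bandwidth):
    """Predict at one target location from a locally weighted regression."""
    d = np.sqrt(((coords - target_coord) ** 2).sum(axis=1))
    w = np.exp(-0.5 * (d / bandwidth) ** 2)      # Gaussian kernel (assumed)
    design = np.column_stack([np.ones(len(y)), X])
    xtw = design.T * w
    beta = np.linalg.solve(xtw @ design, xtw @ y)
    return float(np.concatenate([[1.0], np.atleast_1d(target_x)]) @ beta)

def loocv_predictions(coords, X, y, bandwidth):
    """Leave-one-out predictions: each point is predicted without its own data."""
    coords = np.asarray(coords, dtype=float)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    preds = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i                 # drop observation i
        preds[i] = gwr_predict(coords[keep], X[keep], y[keep],
                               coords[i], X[i], bandwidth)
    return preds
```

The resulting predictions are then compared with the observed counts to obtain the RMSE, MAE, and validation R² of Equations 8 to 10.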

Figure 8. Comparison of predicted and observed accident frequencies from GWR model, with 1:1 reference line indicating perfect prediction.

Figure 9. Residual analysis showing (a) residuals vs. fitted values, (b) histogram of residuals with normal curve overlay, and (c) spatial distribution of residuals.

Table 8. Cross-Validation Results.

Metric	Value
RMSE	3.45
MAE	2.12
R² (LOOCV)	0.65

The RMSE of 3.45 and MAE of 2.12 are expressed in accidents per location; for scale, the observed counts have a mean of 2.44 and a standard deviation of 1.71 (Table 1). The validation R² value of 0.65 suggests the model maintains good explanatory power when generalizing to new, unseen locations. These metrics align closely with those reported by Abubakar and Salmanu (2025) in their Regression Kriging analysis of Jega (RMSE = 3.214, R² = 0.682).

3.9. Location-Specific Validation

Model predictions were validated against observed accident counts at six key locations in Jega, following the approach of Abubakar and Salmanu (2025). The results demonstrate consistent accuracy with errors ranging from 1.4% to 6.0%.

Figure 10. Bar chart comparing observed and predicted accident counts at six key locations in Jega, with error percentages displayed above each pair.

Table 9. High-Accuracy Accident Prediction Results for Selected Locations.

Location	Observed	Predicted	Residual	Error %	Coordinates (Lat, Lon)
Gada	8	7.89	+0.11	1.4%	12.218251, 4.371984
Kaura	4	3.76	+0.24	6.0%	12.218705, 4.379883
Dakora	4	3.82	+0.18	4.5%	12.222822, 4.376158
Kofar kasuwa ta gabas	4	4.18	-0.18	4.5%	12.223552, 4.375976
BLB Junction	4	3.91	+0.09	2.3%	12.228315, 4.378345
Eco Bank	3	2.83	+0.17	5.7%	12.227773, 4.371796

The best predictions occurred at Gada (1.4% error) and BLB Junction (2.3% error), demonstrating the model's strength in urban centers with consistent traffic patterns.
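The residual and percentage error columns of Table 9 follow from the observed and predicted counts, with the residual taken as observed minus predicted and the error as the absolute residual relative to the observed count. A short sketch with a hypothetical helper `residual_and_error`:

```python
# (location, observed, predicted) taken from Table 9.
rows = [
    ("Gada", 8, 7.89),
    ("Kaura", 4, 3.76),
    ("Dakora", 4, 3.82),
    ("Kofar kasuwa ta gabas", 4, 4.18),
    ("BLB Junction", 4, 3.91),
    ("Eco Bank", 3, 2.83),
]

def residual_and_error(observed, predicted):
    """Residual (observed - predicted) and absolute percentage error."""
    residual = observed - predicted
    return residual, abs(residual) / observed * 100.0

results = {name: residual_and_error(obs, pred) for name, obs, pred in rows}
```

Because the error is scaled by the observed count, the same absolute residual yields a larger percentage at locations with fewer accidents, as at Eco Bank.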

3.10. Identified Hotspot Locations

Integration of KDE and GWR results identified distinct hotspot locations across Jega. The top ten locations ranked by accident density are presented in Table 10.

Figure 11. Ranking of top ten accident hotspot locations in Jega by density score, showing accident counts and normalized density values.

Table 10. Top Ten Hotspot Locations Ranked by Density.

Rank	Location	Accident Count	Density Score
1	Gada	8	0.98
2	Garkar Ando	5	0.92
3	Gobirawa	5	0.89
4	De'Blue	5	0.85
5	Dakora	4	0.81
6	Kofar kasuwa ta gabas	4	0.78
7	Kaura	4	0.76
8	BLB Junction	4	0.74
9	Round about	4	0.71
10	Zaito	3	0.68

Gada emerges as the primary hotspot with 8 accidents and the highest density score (0.98), confirming its status as the most hazardous location in the study area. This finding aligns with Abubakar and Umar (2022) and Abubakar and Salmanu (2025).

Figure 12. Integrated accident risk map for Jega combining KDE hotspot contours (red lines at 85th percentile), GWR local R² background (color gradient), and identified top ten hotspot locations (labeled points).

4. Discussion

4.1. Spatial Clustering Patterns

The significant spatial autocorrelation detected (Moran's I = 0.312, p < 0.05) confirms that road traffic accidents in Jega are not randomly distributed but exhibit clustering tendencies. This finding aligns with the theoretical expectation that accidents cluster due to shared risk factors such as road geometry, traffic volume, and land use patterns (Anselin, 1988; Lord & Mannering, 2010). The clustering pattern is consistent with Abubakar and Umar (2022), who identified spatial autocorrelation in Jega accident data.

4.2. KDE Performance

The KDE analysis with an optimal bandwidth of 175 meters and a PAI of 3.50 at the 85th percentile demonstrates strong predictive capability for accident hotspot identification. The finding that higher percentile thresholds yield higher PAI values is consistent with the mathematical definition of PAI, where smaller hotspot areas with concentrated accidents produce higher indices. However, as Chainey et al. (2008) note, excessively high thresholds may identify areas too small for practical intervention. The selection of the 85th percentile balances statistical performance with practical utility.

4.3. GWR Model Performance and Spatial Non-Stationarity

The GWR model's strong performance (global R² = 0.72, AICc = 420.35) confirms the presence of spatial non-stationarity in accident-generating processes, consistent with theoretical expectations (Brunsdon et al., 1996; Fotheringham et al., 2002). The substantial variation in local R² values (0.20 to 0.95) underscores the importance of localized analysis, as global models would mask these important spatial differences (Abubakar et al., 2025).

The regression coefficients revealing decreasing risk eastward (-0.00012, p < 0.001) and increasing risk northward (+0.00008, p < 0.001) provide quantitative evidence for directional trends in accident occurrence. These findings align with Abubakar and Salmanu (2025), who reported similar directional patterns using Regression Kriging.

The GWR bandwidth of 850.50 meters indicates the spatial scale at which local relationships operate, suggesting that accident risk factors vary meaningfully over distances of approximately 850 meters. This is larger than the 330.12 meter range reported by Abubakar and Salmanu (2025) for Regression Kriging, reflecting the different purposes of the two methods.

4.4. Comparative Performance of Geospatial Methods

The complementary strengths of KDE and GWR are evident in this analysis. KDE provides a non-parametric, data-driven approach to hotspot identification without requiring predictor variables, making it particularly valuable for initial exploratory analysis and visualization (Anderson, 2009; Xie & Yan, 2008). GWR, conversely, enables modeling of spatially varying relationships, providing insights into the factors driving accident patterns and enabling prediction at unsampled locations (Brunsdon et al., 1996; Fotheringham et al., 2002).

The cross-validation results (RMSE = 3.45, MAE = 2.12, validation R² = 0.65) demonstrate the GWR model's robust predictive capability, comparable to the Regression Kriging results reported by Abubakar and Salmanu (2025) (RMSE = 3.214, R² = 0.682).

4.5. Identified High-Risk Zones

The integration of KDE and GWR results consistently identifies Gada as the primary hotspot (8 accidents, density score 0.98), followed by Garkar Ando (5 accidents, 0.92), Gobirawa (5 accidents, 0.89), and De'Blue (5 accidents, 0.85). These locations correspond to major intersections and commercial areas, consistent with findings from previous studies (Abubakar & Umar, 2022; Abubakar & Salmanu, 2025; Abubakar et al., 2025).

The identification of Gada as the highest-risk location aligns with field observations of heavy traffic volume, multiple intersection approaches, and proximity to market areas. The strong model performance at this location (1.4% prediction error) confirms the reliability of the geospatial framework for identifying priority intervention zones.

4.6. Methodological Considerations and Limitations

Several methodological considerations warrant discussion. First, the relatively small sample size (n=50) may limit the generalizability of findings, though it is adequate for geostatistical analysis (Webster & Oliver, 2001). Second, the absence of additional predictor variables in the KDE analysis means that identified hotspots cannot be directly attributed to specific causal factors. Third, the GWR model, while capturing spatial non-stationarity, may be subject to local multicollinearity issues (Wheeler & Tiefelsdorf, 2005).

5. Conclusions

This study demonstrates the effectiveness of integrating Kernel Density Estimation and Geographically Weighted Regression for spatial analysis of road traffic accidents in Jega, Nigeria. The key findings are:

Significant spatial clustering of accidents exists (Moran's I = 0.312, p < 0.05), confirming that accidents are not randomly distributed and justifying the application of spatial analytical methods.
KDE with an optimal bandwidth of 175 meters effectively identifies accident hotspots, with a PAI of 3.50 at the 85th percentile threshold, indicating that 15% of the study area contains 52.5% of all accidents.
GWR demonstrates strong explanatory power (global R² = 0.72, AICc = 420.35) with substantial spatial variation in local R² (0.20–0.95), confirming the presence of spatial non-stationarity.
Directional trends in accident risk are quantified: risk decreases eastward (-0.00012, p < 0.001) and increases northward (+0.00008, p < 0.001).
Cross-validation confirms predictive robustness (RMSE = 3.45, MAE = 2.12, validation R² = 0.65), with location-specific errors of 6.0% or less at key sites.
Primary hotspots identified include Gada (8 accidents), Garkar Ando (5), Gobirawa (5), and De'Blue (5), providing clear targets for intervention.

These findings contribute to the growing corpus of spatial econometric applications in transportation science and provide evidence-based, scalable frameworks for data-driven road safety policy in Nigerian cities.

6. Recommendations

Based on the findings of this study, the following recommendations are made:

6.1. Targeted Infrastructure Interventions

Transportation authorities should implement immediate infrastructure upgrades in identified high-risk zones, prioritizing Gada, Garkar Ando, Gobirawa, and De'Blue. These targeted measures should include intersection redesign, traffic calming features, improved lighting and signage, and pedestrian crossing facilities near commercial areas.

6.2. Enhanced Enforcement Strategies

Dynamic enforcement strategies should be deployed, including automated speed cameras during peak risk periods, increased police presence at identified hotspots, and targeted enforcement of traffic regulations.

6.3. Institutionalizing Data-Driven Safety Management

Governments should establish formal processes for ongoing spatial monitoring of accident patterns, including annual updates of KDE and GWR models, integration of accident predictions with real-time traffic monitoring systems, and development of integrated data systems combining accident data with traffic volume and road geometry information.

6.4. Methodological Extensions

Future research should consider incorporating additional predictor variables, temporal analysis of accident patterns, comparative evaluation of alternative spatial methods including Regression Kriging (Abubakar & Salmanu, 2025), Universal Kriging (Abubakar & Umar, 2022), and Co-Kriging (Usman & Abubakar, 2020), and application of the integrated framework to other Nigerian cities.

6.5. Policy Integration

Urban planning departments should integrate spatial risk models into long-term development decisions, using zoning regulations to discourage high-risk land uses in problematic areas and conducting public awareness campaigns targeting road users in high-risk areas.

Author Contributions

Conceptualization, Muddassiru Abubakar; Methodology, Muddassiru Abubakar; Formal analysis, Muddassiru Abubakar; Data curation, Muddassiru Abubakar; Writing – original draft, Muddassiru Abubakar; Writing – review & editing, Umar Usman and Muddassiru Abubakar; Visualization, Umar Usman; Supervision, Umar Usman. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author(s).

Acknowledgments

The authors acknowledge the support of the Department of Mathematics, Federal University Birnin Kebbi, for facilitating data collection. We also thank the Federal Road Safety Corps, Jega Division, for providing traffic situation reports that complemented field observations.

Conflicts of Interest

The authors declare no conflict of interest.

References

Abubakar, M., & Salmanu, A. (2025). Mapping high-risk traffic zones in Jega, Nigeria: An integrated geospatial framework for road safety planning. Journal of Basics and Applied Sciences Research, 3(6), 252-261.
Abubakar, M., Salmanu, A., Umar, M., & Usman, A. G. (2025). Analyzing spatial heterogeneity in road traffic accidents using geographically weighted regression: A case study of Jega, Nigeria. International Journal of Applied Sciences and Mathematical Techniques, 11(9), 126-138.
Abubakar, M., & Umar, M. (2022). Spatial analysis on road traffic accidents in Kebbi State using Universal Kriging. Savanna Journal of Basic and Applied Sciences, 4(1), 65-70.
Anderson, T. K. (2009). Kernel density estimation and K-means clustering to profile road accident hotspots. Accident Analysis & Prevention, 41(3), 359-364.
Anselin, L. (1988). Spatial econometrics: Methods and models. Springer.
Bailey, T. C., & Gatrell, A. C. (1995). Interactive spatial data analysis. Longman.
Brunsdon, C., Fotheringham, A. S., & Charlton, M. E. (1996). Geographically weighted regression: A method for exploring spatial nonstationarity. Geographical Analysis, 28(4), 281-298.
Chainey, S., Tompson, L., & Uhlig, S. (2008). The utility of hotspot mapping for predicting spatial patterns of crime. Security Journal, 21, 4-28.
Cressie, N. (1993). Statistics for spatial data. Wiley.
Eke, C. O., Omole, D. N., & Ayo, O. (2021). Trends and spatial patterns of road traffic accidents in Nigeria. Journal of Transport Geography, 92, 103017.
Fotheringham, A. S., Brunsdon, C., & Charlton, M. (2002). Geographically weighted regression: The analysis of spatially varying relationships. Wiley.
Goovaerts, P. (1997). Geostatistics for natural resources evaluation. Oxford University Press.
Haining, R. (2003). Spatial data analysis. Cambridge University Press.
Hurvich, C. M., Simonoff, J. S., & Tsai, C. L. (1998). Smoothing parameter selection in nonparametric regression using an improved Akaike information criterion. Journal of the Royal Statistical Society: Series B, 60(2), 271-293. [CrossRef]
Isaaks, E. H., & Srivastava, R. M. (1989). An introduction to applied geostatistics. Oxford University Press.
Lord, D., & Mannering, F. (2010). The statistical analysis of crash-frequency data: A review and assessment of methodological alternatives. Transportation Research Part A: Policy and Practice, 44(5), 291-305. [CrossRef]
Moran, P. A. P. (1950). Notes on continuous stochastic phenomena. Biometrika, 37(1/2), 17-23. [CrossRef]
Odeleye, A. O. (2003). Road traffic accidents in Nigeria: A public health problem. Nigerian Medical Practitioner, 43(3), 45-49.
Olawole, M. O. (2012). Transport poverty in metropolitan Lagos. Transport Policy, 24, 152-159. [CrossRef]
Oni, S. I. (2011). Spatial analysis of road traffic accidents in Lagos State, Nigeria. Journal of Geography and Regional Planning, 4(7), 436-444.
Onyeka, I. N., Mbachu, C., & Udigwe, I. (2018). Spatial distribution of malaria incidence in Nigeria: A geostatistical approach. Malaria Journal, 17, 432.
Peden, M., Scurfield, R., & Mohan, D. (2004). World report on road traffic injury prevention. World Health Organization.
Silverman, B. W. (1986). Density estimation for statistics and data analysis. Chapman & Hall. [CrossRef]
Tobler, W. R. (1970). A computer movie simulating urban growth in the Detroit region. Economic Geography, 46(2), 234-240. [CrossRef]
Usman, U., & Abubakar, M. (2020). Spatial modelling of lead (Pb) concentration for the soil in Sokoto Rima Basin using Co-Kriging. International Journal of Statistical Distributions and Applications, 6(2), 36-41.
Webster, R., & Oliver, M. A. (2001). Geostatistics for environmental scientists. Wiley. [CrossRef]
Wheeler, D. C., & Tiefelsdorf, M. (2005). Multicollinearity and correlation among local regression coefficients in geographically weighted regression. Journal of Geographical Systems, 7(2), 161-187. [CrossRef]
World Health Organization. (2021). Global status report on road safety 2021. WHO.
World Health Organization. (2023). Global status report on road safety 2023. WHO.
Xie, Z., & Yan, J. (2008). Kernel density estimation of traffic accidents in a network space. Computers, Environment and Urban Systems, 32, 396-406. [CrossRef]
Xu, P., & Huang, H. (2015). Modeling crash spatial heterogeneity: Random parameter versus geographically weighting. Accident Analysis & Prevention, 75, 16-25. [CrossRef]

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
