This study evaluated the practicality and usefulness of Benford’s Law and outlier detection tests (Grubbs’, Hampel’s, and T-tests) for identifying irregularities in healthcare billing data. Benford’s Law demonstrated significant utility in flagging deviations from expected digit distributions, signalling potential anomalies and casting doubt on data credibility and authenticity. The outlier detection tests complemented this analysis by pinpointing specific providers with anomalous billing patterns.
The accessibility of these methods makes them practical tools for healthcare systems with limited resources, enabling rapid identification of irregularities in large datasets without complex analytical infrastructure. Benford’s Law served as a first-line screening tool, while the outlier detection tests additionally provided a clear focus for further investigation by isolating extreme values that may indicate fraudulent or irregular practices.
This systematic approach highlights the complementary strengths of these methods and enables early detection of anomalies. The simplicity and practicality of these tools make them especially valuable for resource-constrained settings, where advanced analytical expertise may not be readily available. By combining Benford’s Law with outlier detection tests, this study offers a cost-effective strategy for enhancing transparency in healthcare systems and informing targeted approaches to mitigate financial losses.
4.1. Strategic Use of Historical Data for Evaluating Fraud Detection Methods in Healthcare Billing
The study used historical billing data from 2014 to evaluate the effectiveness of Benford’s Law and the Grubbs’, Hampel’s, and T-tests in identifying anomalies within healthcare billing practices. The decision to use data from 2014 was deliberate, as it provided a controlled environment for testing these statistical methods without posing reputational risks to current healthcare providers. By focusing on historical real-world data, the study ensured that its primary objective was methodological evaluation rather than the identification of present-day fraudsters.
Evaluating historical data may be particularly relevant in Slovenia, where healthcare financing relies on a combination of fee-for-service, capitation, and diagnosis-related group (DRG) reimbursement models, all of which are influenced by historical expenditure patterns [15,16,17]. Additionally, historical datasets often feature suitable sample sizes and fewer biases, such as nonresponse or incomplete records, which can compromise the validity of analyses [18,19,20,21]. These characteristics made the 2014 dataset an ideal candidate for evaluating the applicability and reliability of Benford’s Law and outlier detection tests in healthcare billing systems [15,16].
4.2. Assessing Data Credibility and Authenticity with Benford’s Law: Practicality and Usefulness in Healthcare Fraud Detection
Benford’s Law demonstrated its utility as a preliminary screening tool for checking the credibility and authenticity of the data. It detected anomalies in most of the study variables, including “number of services charged,” “total number of points charged,” “number of points per examination,” and “average examination values (€)”. Deviations from the expected digit distributions in these variables signalled potential irregularities, such as manipulation or overbilling, consistent with prior studies applying Benford’s Law to fraud detection. Conversely, compliance with Benford’s Law in categories like “number of first examinations” and “total number of examinations” suggests that these data are likely authentic, supporting the method’s reliability for distinguishing between genuine and anomalous patterns. Its application has already proven effective in identifying billing anomalies and manipulated data elsewhere [1,2,3,5,6,7,8].
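As an illustration of the first-digit screening described above, the following is a minimal Python sketch. It assumes a chi-square goodness-of-fit test against the Benford proportions log10(1 + 1/d); this section does not state which conformity test the study used, so the choice of statistic, the function names, and the example values are assumptions made for illustration only.

```python
import math
from collections import Counter

# Chi-square critical value for 8 degrees of freedom at alpha = 0.05.
CHI2_CRIT_DF8_ALPHA05 = 15.507


def first_digit(value):
    """Return the leading non-zero digit of a number, or None if there is none."""
    for ch in repr(value):
        if ch in "123456789":
            return int(ch)
        if ch in "eE":  # reached the exponent: the mantissa had no non-zero digit
            break
    return None


def benford_chi_square(values):
    """Chi-square statistic of observed first digits against Benford's Law.

    Returns (statistic, number_of_usable_values).
    """
    digits = [d for d in map(first_digit, values) if d is not None]
    n = len(digits)
    if n == 0:
        raise ValueError("no values with a non-zero leading digit")
    counts = Counter(digits)
    stat = 0.0
    for d in range(1, 10):
        expected = n * math.log10(1 + 1 / d)  # Benford expected count for digit d
        stat += (counts.get(d, 0) - expected) ** 2 / expected
    return stat, n


# Hypothetical data: powers of two are known to follow Benford's Law closely,
# while values confined to 900..999 all share the leading digit 9.
conforming = [2 ** k for k in range(100)]
suspicious = [900 + i for i in range(100)]

for name, data in (("conforming", conforming), ("suspicious", suspicious)):
    stat, n = benford_chi_square(data)
    flagged = stat > CHI2_CRIT_DF8_ALPHA05
    print(f"{name}: n={n}, chi2={stat:.2f}, flagged={flagged}")
```

A statistic above the critical value only signals that the digit distribution departs from the Benford expectation; as the discussion notes, such a flag is a prompt for further investigation rather than evidence of fraud.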
However, the limitations of Benford’s Law become apparent when it is applied to smaller datasets or those with constrained ranges, such as the “number of first examinations” and “total number of first and follow-up examinations.” In such cases, compliance with Benford’s Law does not necessarily confirm data authenticity, as smaller sample sizes reduce the reliability of Benford conformity testing [22]. This aligns with findings elsewhere, which emphasize that Benford’s Law is most effective when applied to datasets with more than 100 data points spanning multiple orders of magnitude [6,22]. While its simplicity and low implementation cost make it an attractive tool for fraud detection, it is probably best used as a first-step analysis method to flag anomalies for further investigation using more robust techniques [5,6,22,23]. These findings underscore the value of combining Benford’s Law with complementary methods, such as outlier detection tests, to enhance fraud detection accuracy and reliability in healthcare datasets.
The easy accessibility of Benford’s Law allows healthcare systems lacking advanced analytical infrastructure to identify potential anomalies quickly [9,22,23]. However, Benford’s Law is best combined with complementary methods, such as the outlier detection tests used in this study (Grubbs’, Hampel’s, and T-tests). Combined use enhances fraud detection accuracy and reliability by addressing the limitations of Benford’s Law and providing a more robust framework for anomaly detection [5,6,7,8]. Such integration aligns with recent advances in fraud detection methodologies, where hybrid models incorporating machine learning and simple statistical techniques have shown improved sensitivity and precision [5,6,7].
4.3. Pinpointing Anomalies with Outlier Detection Tests: Practicality and Usefulness of Grubbs’, Hampel’s, and T-Tests in Healthcare Billing Analysis
The findings of this study confirm the practicality and usefulness of Grubbs’, Hampel’s, and T-tests as robust tools for identifying irregularities in healthcare billing data. These methods demonstrated complementary strengths to Benford’s Law in detecting outliers across diverse billing categories, highlighting their adaptability to different data distributions and institutional contexts. Grubbs’, Hampel’s, and T-tests proved straightforward to implement, offering a practical and cost-effective means of preliminary screening.
Grubbs’ test proved particularly effective for normally distributed data categories, such as those evaluated in this study, where it identified prominent outliers with high statistical significance. For example, a specialized dermatology center was consistently flagged for unusually high “points per examination” and “average examination values,” aligning with previous applications of Grubbs’ test in healthcare fraud detection [8,10]. The method’s reliance on the mean and standard deviation makes it well suited to initial screening in small to moderate-sized datasets, though its sensitivity to normality assumptions limits its utility in skewed distributions. Recent advancements, such as data transformation techniques to enhance Grubbs’ outlier detection power in sequential data, further validate its adaptability to complex billing patterns [10].
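The mean-and-standard-deviation logic of Grubbs’ test can be sketched in Python as follows. This is a minimal illustration, not the study’s Excel implementation: the function names and the provider values are hypothetical, and the critical values are the standard two-sided 5% table entries for n = 8 to 10, which would need to be extended from a published table for other sample sizes.

```python
import statistics

# Two-sided Grubbs critical values at alpha = 0.05, keyed by sample size n.
# Only n = 8..10 are listed here; extend from a published table as needed.
GRUBBS_CRIT_TWO_SIDED_05 = {8: 2.126, 9: 2.215, 10: 2.290}


def grubbs_statistic(values):
    """Return (G, suspect): the largest |x - mean| / sd and the value producing it."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    suspect = max(values, key=lambda v: abs(v - mean))
    return abs(suspect - mean) / sd, suspect


def grubbs_outliers(values, critical_values):
    """Apply Grubbs' test repeatedly, removing one outlier per pass.

    Stops when no outlier is found, when the spread is zero, or when the
    table has no critical value for the current sample size.
    """
    remaining = list(values)
    outliers = []
    while len(remaining) >= 3:
        crit = critical_values.get(len(remaining))
        if crit is None or statistics.stdev(remaining) == 0:
            break
        g, suspect = grubbs_statistic(remaining)
        if g <= crit:
            break
        outliers.append(suspect)
        remaining.remove(suspect)
    return outliers


# Hypothetical "points per examination" for ten providers; the last value
# stands in for a provider with unusually high billing.
points_per_exam = [21.0, 22.5, 20.8, 23.1, 21.9, 22.2, 20.5, 23.4, 21.3, 48.0]
print(grubbs_outliers(points_per_exam, GRUBBS_CRIT_TWO_SIDED_05))
```

Because the statistic depends on the mean and standard deviation, a single extreme value inflates both, which is one reason the test is usually applied to data that are approximately normal apart from the suspect value.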
Hampel’s test addressed non-normal distributions, such as “number of points per examination,” where median-based thresholds (k = 4.5) robustly identified outliers without distortion from extreme values. This method flagged the same specialized dermatology center as Grubbs’ test, corroborating results across methodologies. Hampel’s reliance on the median absolute deviation (MAD) ensures resilience against skewed datasets, a feature critical in healthcare billing analyses where fee structures or patient demographics may inherently distort means [8,24]. For instance, its application in detecting geopolitical shocks in agri-food sector revenue anomalies demonstrates its utility in distinguishing contextual irregularities from inherent variability, a principle transferable to healthcare billing [25].
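A minimal Python sketch of the median-based rule is given below. It assumes the common formulation in which a value is flagged when its absolute deviation from the median exceeds k times the unscaled MAD, with k = 4.5 as reported above; the function name and the example values are hypothetical.

```python
import statistics


def hampel_outliers(values, k=4.5):
    """Flag values whose distance from the median exceeds k * MAD.

    MAD is the median of the absolute deviations from the median (unscaled).
    Returns an empty list when MAD is zero, since the threshold is then
    undefined (this happens when more than half the values are identical).
    """
    med = statistics.median(values)
    abs_dev = [abs(v - med) for v in values]
    mad = statistics.median(abs_dev)
    if mad == 0:
        return []
    return [v for v, d in zip(values, abs_dev) if d > k * mad]


# Hypothetical "points per examination" for ten providers.
points_per_exam = [21.0, 22.5, 20.8, 23.1, 21.9, 22.2, 20.5, 23.4, 21.3, 48.0]
print(hampel_outliers(points_per_exam))
```

Unlike the mean and standard deviation, the median and MAD barely move when one extreme value is added, so the threshold is not pulled toward the outlier it is meant to detect.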
The T-test provided a critical validation layer by comparing trimmed and full dataset means, confirming outliers such as the large university health center’s unusually high “number of services charged.” This method’s ability to quantify an outlier’s impact on central tendency ensures targeted scrutiny of high-risk providers. Its simplicity aligns with findings from proficiency testing studies, where T-tests efficiently identified outliers in interlaboratory comparisons of clinical measurements [8,12,13,14].
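The comparison of trimmed and full dataset means can be sketched as follows. The exact statistic used in the study is not given in this section, so this sketch assumes one common formulation: the suspect value is removed, and its distance from the trimmed mean is expressed in units of the trimmed standard deviation and compared with a two-sided Student t critical value at n - 2 degrees of freedom. The function name and the example data are hypothetical.

```python
import statistics

# Two-sided Student t critical value at alpha = 0.05 for 8 degrees of freedom
# (n = 10 observations, so n - 2 = 8). Other sample sizes need other values.
T_CRIT_DF8_ALPHA05 = 2.306


def t_test_outlier(values, suspect):
    """Return (t_statistic, mean_shift) for one suspect value.

    t_statistic: |suspect - trimmed mean| / trimmed standard deviation.
    mean_shift:  full mean minus trimmed mean, i.e. how far the suspect
                 value alone moves the average.
    """
    trimmed = list(values)
    trimmed.remove(suspect)  # drop a single occurrence of the suspect value
    trimmed_mean = statistics.mean(trimmed)
    trimmed_sd = statistics.stdev(trimmed)
    t_stat = abs(suspect - trimmed_mean) / trimmed_sd
    mean_shift = statistics.mean(values) - trimmed_mean
    return t_stat, mean_shift


# Hypothetical "number of services charged" for ten providers; the last value
# stands in for a large provider with unusually high volume.
services = [1200, 1350, 1280, 1420, 1310, 1260, 1390, 1330, 1240, 5200]
t_stat, shift = t_test_outlier(services, 5200)
print(f"t = {t_stat:.1f}, mean shift = {shift:.1f}, "
      f"outlier = {t_stat > T_CRIT_DF8_ALPHA05}")
```

The mean shift is reported alongside the test statistic because it expresses the outlier’s effect on central tendency in the original units, which is often easier to communicate to auditors than the statistic itself.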
A key advantage of these tests lies in their computational efficiency and interpretability. They were implemented using Excel 2010, underscoring their accessibility for healthcare systems lacking advanced analytical infrastructure [9]. This aligns with microcontroller-based sensor studies, where Grubbs’ test improved measurement accuracy in resource-constrained environments [25]. Similarly, hybrid frameworks combining outlier detection with machine learning, as seen in photovoltaic fault detection [26], suggest future potential for integrating these tests into automated healthcare fraud detection pipelines without sacrificing transparency.
However, limitations must be acknowledged. Grubbs’ test requires iterative application for multiple outliers, while Hampel’s test may overlook subtle anomalies in small samples. The T-test’s dependence on normality assumptions limits its standalone use. Despite these constraints, their combined application in this study, consistent with methodologies using Benford’s Law in COVID-19 test fraud detection [23,27], enhanced detection accuracy by triangulating results across methods. For example, the specialized dermatology center’s outlier status across all three tests reduced false-positive risks, a critical consideration in fraud investigations.
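The triangulation step described above amounts to keeping only the providers flagged by every test. A minimal Python sketch follows; the function name, the provider labels, and the flag sets are hypothetical and do not reproduce the study’s results.

```python
def consensus_flags(flags_by_test):
    """Return the providers flagged by every test in the mapping.

    flags_by_test maps a test name to the collection of providers it flagged.
    """
    flag_sets = [set(providers) for providers in flags_by_test.values()]
    if not flag_sets:
        return set()
    return set.intersection(*flag_sets)


# Hypothetical flags from the three outlier tests for one billing variable.
flags = {
    "grubbs": {"dermatology_center", "clinic_b"},
    "hampel": {"dermatology_center"},
    "t_test": {"dermatology_center", "clinic_c"},
}
print(consensus_flags(flags))
```

Requiring agreement across all tests lowers the false-positive risk at the cost of sensitivity; a looser rule, such as agreement between any two tests, would trade in the other direction.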
4.4. Future Research Directions
Future research should continue validating the methods tested in this study, using contemporary datasets to assess their adaptability to evolving billing practices [1,5]. Automated systems integrating Benford’s Law with outlier detection tests could enhance real-time monitoring of healthcare payment data, enabling faster anomaly detection and reducing reliance on manual audits [4,6].
Hybrid approaches combining simple statistical methods with advanced technologies like machine learning and artificial intelligence hold promise for improving sensitivity and precision in fraud detection [4,5,6,7]. For instance, ensemble learning techniques or neural network-based models could complement Benford’s Law by identifying complex patterns of manipulation in healthcare data [28]. Integrating these tools with clinical guidelines could also align billing practices with evidence-based care standards, enhancing both financial integrity and patient outcomes.
The findings in this study align with broader research on fraud detection and data integrity, emphasizing the importance of accessible tools for resource-constrained healthcare systems. Expanding the application of these methods to other healthcare systems and reimbursement models, such as capitation or DRG schemes, would provide valuable insights into their generalizability across diverse contexts [15,16,17]. By leveraging historical data alongside innovative computational techniques, future studies can refine fraud detection frameworks and contribute to optimizing resource allocation in healthcare systems.