1. Introduction
Acute appendicitis (AA) remains the leading cause of emergency abdominal surgery worldwide [
1]. Although its overall mortality rate is low, diagnostic delays significantly increase the risk of complications and morbidity [
2,
3].
Acute appendicitis (AA) diagnosis is primarily based on physical examination, a focused clinical history, and basic laboratory tests, including complete blood count and acute-phase reactants. The current recommended approach emphasizes risk stratification to guide clinical decision-making, employing multivariable scoring systems such as the AIR score, the RIPASA score, the PAS score, and the BIDIAP score—the latter two specifically developed for pediatric populations—to identify patients who require additional imaging and/or hospital admission, and to reduce the incidence of negative surgical explorations [
4,
5,
6]. Although these scoring systems have proven highly effective for the initial triage of patients with suspected AA, imaging techniques—primarily ultrasound (US) and computed tomography (CT)—remain essential for confirming or ruling out the diagnosis and for differentiating between complicated (CAA) and uncomplicated (NCAA) appendicitis [
7,
8].
Computed tomography (CT) is a widely recognized diagnostic tool for acute appendicitis (AA), particularly in cases with a high clinical suspicion and inconclusive ultrasound (US) findings, having shown superior diagnostic performance compared to US in recent meta-analyses [
7]. However, using CT involves considerable expenditure of human and economic resources. In addition, despite ongoing advances in low-dose imaging protocols, CT remains a significant source of ionizing radiation, which limits its unrestricted use, particularly in vulnerable populations such as children and pregnant women [
1,
4,
7]. Despite the available evidence, recent studies continue to demonstrate overuse of CT imaging in pediatric populations [
9].
Other imaging modalities with higher specificity, such as magnetic resonance imaging (MRI), have also demonstrated excellent diagnostic performance in acute appendicitis (AA) [
10,
11]. However, their clinical implementation remains challenging and costly in current practice. For instance, pediatric patients often require sedation to undergo MRI examinations, adding complexity to its routine use.
Ultrasound (US) has demonstrated excellent diagnostic performance in the evaluation of appendicitis, both when performed by specialized radiologists [
7] and when conducted by clinicians using point-of-care ultrasound (POCUS) [
12,
13]. Nevertheless, US remains a highly operator-dependent modality, and considerable rates of non-visualization of the cecal appendix are reported in recent literature [
14]. Non-visualization may be attributed to several factors, including patient obesity, the anatomical location of the appendix, poor acoustic windows due to interposed bowel loops, and the operator's experience level. The adoption of standardized protocols, such as the graded compression technique described by Puylaert in 1986 [
15], the three-step positioning algorithm [
16], and structured coaching strategies [
17], has significantly improved appendiceal visualization rates.
Quillin et al. first reported using Doppler US as an additional diagnostic tool in evaluating acute appendicitis (AA) in 1992 [
18]. Based on the pathophysiological premise that inflammation of the cecal appendix leads to increased blood flow that can be detected and quantified using Doppler techniques, numerous studies have evaluated the potential diagnostic performance of Doppler ultrasound—including color Doppler (CD), power Doppler (PD), contrast-enhanced power Doppler (CEPD), and more recently, spectral Doppler (SD)—in acute appendicitis (AA), as well as its ability to discriminate between complicated (CAA) and uncomplicated (NCAA) forms [
19,
20,
21,
22,
23,
24,
25,
26,
27,
28,
29,
30,
31,
32,
33,
34,
35,
36,
37,
38]. This systematic review aims to synthesize the existing evidence on this topic.
2. Methods
2.1. Literature search and selection
We followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses in Diagnostic Test Accuracy Studies (PRISMA-DTA) guidance [
39]. Supplementary File 1 shows the PRISMA-DTA Checklist. We specifically designed and implemented a review protocol registered in the International Prospective Register of Systematic Reviews (PROSPERO ID CRD42025641841).
Eligible studies were identified by searching the primary existing medical bibliography databases (PubMed, Web of Science, Scopus, and OVID MEDLINE). Supplementary File 2 shows the detailed search strategy for each bibliographic database. The search was last executed on 22.04.2025.
Two authors (JAM and MRJ) selected articles using the Covidence® tool. The search results were imported into the platform, and both authors screened the articles independently. Disagreements were resolved by consensus. Supplementary File 3 shows the inclusion and exclusion criteria.
2.2. Quality assessment
The QUADAS-2 (Quality Assessment of Diagnostic Accuracy Studies 2) tool was used to evaluate each selected article's methodological quality and risk of bias [
40]. Each article was assessed across four domains: patient selection, index test, reference standard, and flow and timing. Applicability concerns regarding patient selection, index tests, and reference standards were also assessed.
2.3. Data extraction and synthesis
The target condition was defined as acute appendicitis (AA) confirmed either by histopathological examination or intraoperative findings. The index test was Doppler ultrasound (all modes). The reference standard was the histopathological examination of the resected cecal appendix. Two independent reviewers (JAM, MRJ) extracted the relevant data from the selected articles following a standardized procedure. Extracted data included author, country where the study was conducted, year of publication, study design, study population (sample size, age range, and sex distribution), AA group and control group (CG) definitions, reference standard used in AA group, mean or median and standard deviation or range or interquartile range for peak systolic velocity (PSV) and resistive index (RI) determinations, statistical p-value for the between-group comparison, PSV and RI cut-off value (if established), and its associated sensitivity and specificity. There were no disagreements between the reviewers after collating the extracted data. The metrics used in each study were reviewed, and it was determined that a standardization of units was not required. Means (ranges) were converted to means (standard deviations) following a standardized procedure [
41,
42] in two cases [
33,
38]. True positives (TP), false positives (FP), true negatives (TN), and false negatives (FN) were obtained either directly from the included studies or estimated, when not explicitly reported, based on available sensitivity, specificity, and the number of patients with and without the target condition, using standardized statistical formulae [
43]. Reported sensitivities, specificities, sample sizes per group, and predictive values were used to cross-validate the calculations.
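The back-calculation of contingency table cells described above can be illustrated with a minimal sketch. The function names and the example figures below are hypothetical and are not taken from any included study; the sketch only assumes that sensitivity, specificity, and the two group sizes are reported.

```python
def reconstruct_2x2(sensitivity, specificity, n_diseased, n_controls):
    """Back-calculate TP, FP, TN, FN from reported accuracy and group sizes.

    Counts are rounded to the nearest integer because reported percentages
    are usually rounded themselves.
    """
    tp = round(sensitivity * n_diseased)   # diseased patients with a positive test
    fn = n_diseased - tp                   # diseased patients with a negative test
    tn = round(specificity * n_controls)   # controls with a negative test
    fp = n_controls - tn                   # controls with a positive test
    return tp, fp, tn, fn


def predictive_values(tp, fp, tn, fn):
    """PPV and NPV, usable to cross-check against values reported in a paper."""
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    npv = tn / (tn + fn) if (tn + fn) else float("nan")
    return ppv, npv


# Hypothetical study: sensitivity 90%, specificity 80%, 50 AA patients, 40 controls.
tp, fp, tn, fn = reconstruct_2x2(0.90, 0.80, 50, 40)
ppv, npv = predictive_values(tp, fp, tn, fn)
print(tp, fp, tn, fn, round(ppv, 3), round(npv, 3))
```

If the reconstructed predictive values differ materially from those reported, the reported group sizes or accuracy figures are likely inconsistent and the reconstruction should be revisited.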
The diagnostic odds ratio (DOR) was calculated for each study as (TP × TN) / (FP × FN). A continuity correction of 0.5 was applied to all zero-valued contingency table cells to avoid division by zero. Subsequently, DOR was log-transformed to stabilize variances and allow for linear approximation. The standard error (SE) of the log-transformed DOR was calculated using standard formulae based on the corrected contingency table counts. The 95% confidence intervals (CIs) for the log(DOR) were obtained by applying the normal approximation method (log(DOR) ± 1.96 × SE) and were then exponentiated to derive the 95% CIs on the original DOR scale.
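The DOR computation described in this paragraph can be sketched as follows. This is an illustrative sketch, not the code used for the review: it assumes that the 0.5 correction is added only to cells equal to zero (one reading of the text above) and that the standard error of log(DOR) is the usual square root of the sum of reciprocal cell counts. The function name and example counts are hypothetical.

```python
import math


def diagnostic_odds_ratio(tp, fp, tn, fn, correction=0.5, z=1.96):
    """Return DOR, SE of log(DOR), and the 95% CI on the DOR scale."""
    # Assumption: the continuity correction is added to zero-valued cells only.
    tp, fp, tn, fn = (c + correction if c == 0 else c for c in (tp, fp, tn, fn))
    dor = (tp * tn) / (fp * fn)
    log_dor = math.log(dor)
    # Standard large-sample SE of a log odds ratio.
    se = math.sqrt(1 / tp + 1 / fp + 1 / tn + 1 / fn)
    ci = (math.exp(log_dor - z * se), math.exp(log_dor + z * se))
    return dor, se, ci


# Hypothetical 2x2 table: TP=45, FP=8, TN=32, FN=5.
dor, se, (lower, upper) = diagnostic_odds_ratio(45, 8, 32, 5)
print(f"DOR = {dor:.1f}, 95% CI {lower:.1f} to {upper:.1f}")
```

Because the interval is built on the log scale and then exponentiated, it is asymmetric around the DOR, with a wider upper bound.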
2.4. Meta-analysis
Two random-effects meta-analyses (MA) for Doppler ultrasound were performed using the restricted maximum likelihood (REML) method: (1) a meta-analysis of resistive index (RI) values comparing acute appendicitis (AA) and control group (CG) patients, and (2) a meta-analysis of PSV values (measured in cm/s) comparing AA and CG patients. All studies with available data were included. Results were expressed as mean differences with corresponding 95% confidence intervals (CIs) and were depicted using forest plots. Between-study heterogeneity was assessed using the I² statistic. Two leave-one-out sensitivity analyses were conducted (one for each REML meta-analysis); due to the limited number of studies reporting PSV and RI data, no additional sensitivity analyses were performed.
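The pooling logic can be illustrated with a short sketch. Note that it deliberately uses the closed-form DerSimonian-Laird estimator of between-study variance rather than REML, which requires iterative optimization; the two estimators can give different results, so this is not a reproduction of the analysis reported here. All function names and input values are hypothetical.

```python
import math


def mean_difference(mean_aa, sd_aa, n_aa, mean_cg, sd_cg, n_cg):
    """Raw mean difference (AA minus CG) and its sampling variance."""
    md = mean_aa - mean_cg
    var = sd_aa ** 2 / n_aa + sd_cg ** 2 / n_cg
    return md, var


def pool_random_effects(effects, variances, z=1.96):
    """DerSimonian-Laird random-effects pooling (not REML) with I-squared."""
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    df = len(effects) - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return {"pooled": pooled, "ci": (pooled - z * se, pooled + z * se),
            "tau2": tau2, "i2": i2}


def leave_one_out(effects, variances):
    """Pooled estimate after omitting each study in turn."""
    effects, variances = list(effects), list(variances)
    return [
        pool_random_effects(effects[:i] + effects[i + 1:],
                            variances[:i] + variances[i + 1:])["pooled"]
        for i in range(len(effects))
    ]


# Hypothetical PSV summaries (cm/s): mean, SD, n for AA, then for CG.
studies = [
    (18.0, 5.0, 40, 10.0, 4.0, 40),
    (16.5, 6.0, 30, 9.5, 3.5, 30),
    (15.0, 4.5, 35, 9.0, 4.0, 35),
]
effects, variances = zip(*(mean_difference(*s) for s in studies))
print(pool_random_effects(effects, variances))
print(leave_one_out(effects, variances))
```

Comparing each leave-one-out estimate with the full pooled estimate shows which single study shifts the result most.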
2.5. Diagnostic Test Accuracy Meta-analysis
Three main diagnostic test accuracy (DTA) meta-analytical models were conducted: (1) overall diagnostic performance of Doppler ultrasound (AA vs. CG), (2) diagnostic performance of color Doppler ultrasound (AA vs. CG), and (3) diagnostic performance of spectral Doppler ultrasound (AA vs. CG). Pooled sensitivity, specificity, and area under the curve (AUC) estimates were reported for each model. Results were presented as forest plots of sensitivity and specificity and hierarchical summary receiver operating characteristic (HSROC) curves. Meta-regression analyses were performed to assess the impact of study design (prospective vs. retrospective) and population characteristics (pediatric vs. mixed/adult) on diagnostic performance. To perform the meta-regression analyses, sensitivity and specificity were logit-transformed, and univariate models were fitted separately for each outcome, incorporating standard errors derived from contingency table data. The Knapp-Hartung method was applied to adjust standard errors, and the proportion of variance explained by each covariate was estimated using the adjusted R² statistic. The
metadta,
midas, and
metandi modules in STATA were used to conduct the DTA meta-analyses [
44,
45,
46]. The
mada module in R was used to conduct the meta-regression DTA analyses [
47].
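The logit transformation used as input to the univariate meta-regression models can be sketched as below. This is a minimal illustration under stated assumptions: the standard error uses the usual large-sample approximation for a logit-transformed proportion, and a 0.5 correction is applied to both counts when either is zero. It does not reproduce the internals of the mada module, and the function names are hypothetical.

```python
import math


def logit_proportion(events, total, correction=0.5):
    """Logit of a proportion and its approximate standard error from counts.

    For sensitivity: events = TP, total = TP + FN.
    For specificity: events = TN, total = TN + FP.
    """
    non_events = total - events
    if events == 0 or non_events == 0:
        # Assumption: correct both counts when the proportion is 0 or 1.
        events += correction
        non_events += correction
    logit = math.log(events / non_events)
    se = math.sqrt(1.0 / events + 1.0 / non_events)
    return logit, se


def inv_logit(x):
    """Back-transform a logit to the probability scale."""
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical study with TP=45 and FN=5 (sensitivity 90%).
logit_sens, se_sens = logit_proportion(45, 50)
print(round(logit_sens, 3), round(se_sens, 3), round(inv_logit(logit_sens), 3))
```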
2.6. Publication Bias and Small-Study Effects Assessment
Concerning the REML meta-analytical models, Egger's and Begg's tests and funnel plots (not shown) were used to assess the risk of publication bias. When evidence of publication bias was identified, the trim-and-fill method was applied to estimate its potential impact on the results [
48]. For the DTA meta-analytical models, Deeks' asymmetry test was performed when more than 10 studies were included in the analysis to evaluate the presence of publication bias [
49]. A weighted linear regression of log(DOR) against the inverse square root of the sample size was performed. The p-value of the slope coefficient was used to determine the presence of asymmetry, with a p-value <0.10 considered suggestive of publication bias, in line with established guidelines for Deeks' test in diagnostic test accuracy meta-analyses.
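The regression underlying this asymmetry test can be sketched as follows. The sketch assumes the formulation in which log(DOR) is regressed on the inverse square root of the effective sample size (ESS = 4·n1·n2/(n1 + n2)), weighted by ESS; the use of ESS rather than the raw sample size is an assumption of this example. It returns the slope, its standard error, and the degrees of freedom; the p-value would then be read from a t distribution with those degrees of freedom, which is not computed here. All names and counts are hypothetical.

```python
import math


def weighted_linear_fit(xs, ys, ws):
    """Weighted least-squares line y = intercept + slope * x."""
    sw = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / sw
    y_bar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    rss = sum(w * (y - intercept - slope * x) ** 2
              for w, x, y in zip(ws, xs, ys))
    df = len(xs) - 2
    se_slope = math.sqrt(rss / df / sxx) if df > 0 else float("nan")
    return slope, intercept, se_slope, df


def deeks_inputs(tp, fp, tn, fn):
    """Predictor, outcome, and weight for one study."""
    n_aa, n_cg = tp + fn, fp + tn
    ess = 4.0 * n_aa * n_cg / (n_aa + n_cg)  # effective sample size
    # Assumption: 0.5 replaces zero-valued cells before computing the DOR.
    a, b, c, d = (v if v > 0 else 0.5 for v in (tp, fp, tn, fn))
    log_dor = math.log((a * c) / (b * d))
    return 1.0 / math.sqrt(ess), log_dor, ess


# Hypothetical 2x2 tables (TP, FP, TN, FN) for four studies.
tables = [(45, 8, 32, 5), (20, 3, 27, 4), (60, 10, 70, 9), (15, 1, 19, 2)]
xs, ys, ws = zip(*(deeks_inputs(*t) for t in tables))
slope, intercept, se_slope, df = weighted_linear_fit(xs, ys, ws)
t_stat = slope / se_slope if se_slope > 0 else float("nan")
print(slope, se_slope, t_stat, df)
```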
Statistical analyses were conducted using Review Manager (RevMan) version 5.4 (The Cochrane Collaboration, 2020), Stata version 19.0 (StataCorp LLC, College Station, TX, USA) with the metandi, midas and metadta modules, and R version 4.3.2 (R Foundation for Statistical Computing, Vienna, Austria) with the mada module (version 0.5.12).
3. Results
The search returned 405 articles (Scopus n=92; PubMed n=91; Web of Science n=192; Ovid MEDLINE n=30). A total of 117 duplicates were removed. Among the remaining 288 articles, we excluded 267 (inclusion and exclusion criteria, n=267; reports not retrieved, n=0). This review finally included 21 studies with data from 2,774 participants (946 males, 1,061 females, 767 without gender specification), including 1,112 patients with a confirmed diagnosis of AA and 1,145 controls (CG) [
18,
19,
20,
21,
22,
23,
24,
25,
26,
27,
28,
29,
30,
31,
32,
33,
34,
35,
36,
37,
38]. Discrepancies were identified between the number of patients included per group (AA and CG), the number of patients per gender (male/female), and the total number of patients reported in the review, attributable to the lack of explicit reporting in some studies. The flowchart of the search and selection process is shown in
Figure 1.
The risk of bias concerning the selection of patients was considered low in three of the studies [
18,
19,
20], unclear in sixteen of them [
21,
22,
23,
25,
26,
27,
28,
29,
31,
32,
33,
34,
35,
36,
37,
38], and high in two of them [
24,
30]. The risk of bias concerning the index test was considered low in fifteen studies [
18,
21,
22,
23,
24,
25,
26,
27,
28,
30,
32,
35,
36,
37,
38] and unclear in seven [
19,
20,
22,
29,
31,
33,
34]. The risk of bias concerning the reference standard was considered low in four studies [
20,
31,
32,
34], unclear in sixteen [
19,
21,
22,
23,
24,
25,
26,
27,
28,
29,
30,
33,
35,
36,
37,
38], and high in one of them [
18]. The risk of bias concerning flow and timing was considered low in fifteen studies [
18,
19,
21,
22,
23,
24,
25,
26,
27,
28,
32,
35,
36,
37,
38], unclear in six studies [
20,
22,
29,
31,
33,
34], and high in one of them [
30]. Regarding patient selection applicability concerns, the risk was considered low in three of the studies [
18,
19,
20], unclear in sixteen of them [
21,
22,
23,
25,
26,
27,
28,
29,
31,
32,
33,
34,
35,
36,
37,
38], and high in two of them [
24,
30]. Regarding the index test applicability concern, the risk was considered low in seventeen studies [
18,
20,
21,
22,
23,
24,
25,
26,
27,
28,
29,
30,
32,
35,
36,
37,
38] and unclear in four studies [
19,
31,
33,
34]. Concerning reference standard applicability concerns, the risk was considered low in twenty studies [
19,
20,
21,
22,
23,
24,
25,
26,
27,
28,
29,
30,
31,
32,
33,
34,
35,
36,
37,
38] and high in one [
18]. In the case of Lim et al. [
22], prospective and retrospective cohorts were analyzed separately. For this reason, the reference may be considered as both low risk and unclear risk in certain categories (such as index test or flow and timing). The QUADAS-2 results are depicted in
Figure 2.
4. Doppler Ultrasound in Acute Appendicitis
Sociodemographic Characteristics
Table 1 summarizes the data extracted from the twenty-one studies that evaluated Doppler ultrasound. All studies were published between 1992 and 2025 [
18,
19,
20,
21,
22,
23,
24,
25,
26,
27,
28,
29,
30,
31,
32,
33,
34,
35,
36,
37,
38]. Seven were from the United States [
18,
19,
20,
24,
29,
31,
33], three were from India [
30,
37,
38], three were from Turkey [
26,
32,
34], one was from Canada [
21], one was from South Korea [
22], one was from Italy [
23], one was from France [
25], one was from Brazil [
27], one was from Israel [
28], one was from Iran [
35], and one was from Egypt [
36]. Thirteen studies were prospective [
18,
19,
20,
21,
23,
24,
25,
26,
27,
32,
36,
37,
38], and five were retrospective [
28,
29,
31,
33,
34]. One study reported two cohorts, one prospective and one retrospective [
22]. One study did not explicitly report its design; after reviewing its methods, we classified it as retrospective [
30]. One study was reported as cross-sectional, and after reviewing its design, we classified it as retrospective [
35]. Four studies involved exclusively pediatric populations [
18,
19,
20,
27].
Fifteen studies included patients with clinical suspicion of AA as their study population [
18,
19,
20,
23,
25,
26,
27,
28,
29,
30,
33,
35,
36,
37,
38]. One study included a selective group of patients presenting with atypical manifestations of AA [
24]. In two cases, populations with histopathologically confirmed AA and various types of control groups were included separately [
21,
22]. Three studies included only patients who underwent surgical intervention for suspected AA [
31,
32,
34].
Twenty studies consistently defined 'case' as the histopathological confirmation of AA in the surgical specimen [
19,
20,
21,
22,
23,
24,
25,
26,
27,
28,
29,
30,
31,
32,
33,
34,
35,
36,
37,
38]. AA was based on surgical findings in one study, and a histopathological study was not explicitly reported [
18]. Twelve studies stratified the AA group into NCAA and CAA [
18,
19,
20,
21,
23,
24,
25,
26,
27,
28,
31,
32].
In contrast, the definition of 'control' varied across studies. Controls comprised patients seen at the Emergency Department in whom the diagnosis of AA was finally excluded (also known as non-surgical abdominal pain or NSAP) [
18,
22,
23,
24,
25,
26,
27,
28,
29,
30,
33,
34], patients with suspected AA who were ultimately found to have other surgical pathology [
18,
24,
28], negative appendectomies (NA) [
19,
23,
24,
26,
27,
28,
29,
30,
32] or specifically lymphoid hyperplasia as a form of NA [
34], healthy controls who underwent ultrasound for other reasons (i.e., urological pathology) [
21], or patients with suspected irritable bowel syndrome who underwent a barium enema [
22].
In twelve studies, the authors restricted their analyses to the subgroup of patients in whom the cecal appendix was identified using grayscale ultrasound (US) [
21,
22,
23,
25,
27,
28,
29,
30,
33,
35,
36,
38]. Additionally, in some of these cases, the inclusion criteria were even more restrictive. For instance, in the study by Daga et al., only the 85 patients with appendiceal identification on US and sonographic criteria for acute appendicitis were included [
30], while in the study by Anuj et al., only patients with an appendix visible on grayscale US and spectral Doppler waveforms on appendiceal US were considered [
38]. In three of these cases, only US examinations with borderline features were included [
22,
29,
35].
Table 1 shows the main characteristics of the studies included in this review, including the Doppler modalities assessed and the technical parameters of the sonographic examinations.
5. Overall Doppler Ultrasound Diagnostic Performance in Acute Appendicitis
6. Spectral Doppler
6.1. Spectral Doppler Measurement Units
Nine studies reported the use of SD for diagnosing acute appendicitis (AA) [
21,
22,
29,
32,
33,
35,
36,
37,
38]; among them, seven provided numerical values and/or specific diagnostic performance data [
21,
32,
33,
35,
36,
37,
38]. The authors who reported SD numerical values and/or diagnostic performance data assessed it using three continuous quantitative parameters: PSV, RI, and pulsatility index (PI). PSV was consistently reported across all studies in centimeters per second (cm/s), whereas RI is a dimensionless parameter. In two instances, where studies reported means and ranges instead of standard deviations, the missing standard deviations were estimated using the method described by Wan et al. [
41] to enable meta-analytic pooling. Although the Wan et al. method was originally developed to estimate means and standard deviations from medians and ranges (or interquartile ranges), the studies in question reported means (not medians). To minimize potential inaccuracies, results were compared with estimates obtained using the method proposed by Hozo et al. [
42]. Nevertheless, this approach constitutes a methodological limitation, as combining means and ranges alone does not reliably allow accurate estimation of standard deviation and may not accurately reflect the underlying data distribution.
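The range-based estimation of standard deviations can be illustrated with a short sketch. It assumes the range-only scenario of Wan et al., in which SD ≈ (max − min) / (2·Φ⁻¹((n − 0.375)/(n + 0.25))), and the simple rules of thumb of Hozo et al. for larger samples (range/4 for 15 < n ≤ 70, range/6 for n > 70); the small-sample Hozo formula is left out of this example. Function names and input values are hypothetical.

```python
from statistics import NormalDist


def sd_from_range_wan(minimum, maximum, n):
    """SD estimate from the range and sample size (Wan et al., range-only case)."""
    if n < 2:
        raise ValueError("n must be at least 2")
    z = NormalDist().inv_cdf((n - 0.375) / (n + 0.25))
    return (maximum - minimum) / (2.0 * z)


def sd_from_range_hozo(minimum, maximum, n):
    """SD estimate from the range using the Hozo et al. rules of thumb."""
    if n <= 15:
        # The small-sample formula also needs a central value; not covered here.
        raise ValueError("this sketch only covers n > 15")
    divisor = 4.0 if n <= 70 else 6.0
    return (maximum - minimum) / divisor


# Hypothetical study arm: RI values ranging from 0.40 to 0.80 in 25 patients.
print(round(sd_from_range_wan(0.40, 0.80, 25), 4))
print(round(sd_from_range_hozo(0.40, 0.80, 25), 4))
```

Agreement between the two estimates does not validate either one; both rest on the assumption of approximately normally distributed data.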
6.2. Diagnostic Performance of Peak Systolic Velocity and Resistive Index (AA Vs. CG)
Seven studies reported quantitative values of the RI [
21,
26,
32,
33,
36,
37,
38]. Five studies [
33,
35,
36,
37,
38] reported a cut-off value for RI, ranging from 0.495 [
35] to 0.65 [
33,
38]. Sensitivities and specificities for RI ranged from 63.9% [
33] to 90.5% [
35] and from 58.3% [
37] to 96.5% [
33], respectively.
Four studies reported quantitative values in cm/s for PSV [
33,
36,
37,
38]. They all reported a PSV cut-off, ranging from 8.6 cm/s [
36] to 11.8 cm/s [
37]. Sensitivities and specificities for PSV ranged from 85.3% [
38] to 98.3% [
36] and from 54.2% [
37] to 94.7% [
33,
35], respectively. One study reported PI values as means [
32]. In five studies, true positive, false positive, true negative, and false negative values could be calculated for both PSV and RI [
33,
35,
36,
37,
38].
Four studies provided p-values for the comparison of PSV and RI values between the AA group and the CG; in three of them, the comparisons were statistically significant (p<0.001) [
33,
36,
38]. In the study by Saini et al. [
37], the p-value for the comparison of PSV between groups was statistically significant (p<0.009). In contrast, the p-value for the comparison of RI only reached marginal significance (p=0.056). The reported sensitivity and specificity for each study are shown in Table 1.
6.3. Random-Effects Meta-Analysis for Spectral Doppler (AA Vs. CG)
The random-effects meta-analysis of PSV (AA vs. CG) included four articles (139 AA and 139 controls) and resulted in a significant mean difference [95% CI] of 7.43 [5.37, 9.48] cm/s (p<0.01). The I² value was 59.6%. The forest plot of this meta-analysis is shown in
Figure 4. A leave-one-out analysis was performed by refitting the model iteratively, excluding one of the included studies in each iteration (forest plot not shown). The study whose inclusion most reduced the pooled estimate was El-Aleem et al. [
36]. Its exclusion from the model resulted in a mean difference [95% CI] of 8.34 [6.42, 10.27] cm/s (p<0.001). Egger's test yielded a p-value of 0.78 and Begg's test a p-value of 0.73; because neither suggested publication bias, a trim-and-fill analysis was not performed.
Concerning the random-effects meta-analysis of RI, Patriquin et al. [
21] reported CG RI values as a range without a measure of central tendency; thus, the study could not be included in the meta-analytical models. Incesu et al. [
26] did not provide a dispersion measure for RI, and Uzunosmanoğlu et al. (2017) [
32] likewise reported RI values without dispersion data; consequently, both studies were also excluded from the meta-analyses. The random-effects meta-analysis of RI (AA vs. CG) included four articles (139 AA and 139 controls) and resulted in a significant mean difference [95% CI] of 0.14 [0.10, 0.19] (p<0.01). The I² value was 52%. The forest plot of this meta-analysis is shown in
Figure 4. A leave-one-out analysis was performed by refitting the model iteratively, excluding one of the included studies in each iteration (forest plot not shown). The study whose inclusion most reduced the pooled estimate was Saini et al. [
37]. Its exclusion from the model resulted in a mean difference [95% CI] of 0.17 [0.13, 0.20] (p<0.001). Egger's test yielded a p-value of 0.31 and Begg's test a p-value of 0.31; because neither suggested publication bias, a trim-and-fill analysis was not performed.
6.4. Diagnostic Test Accuracy Meta-Analysis for Spectral Doppler (AA Vs. CG)
The DTA meta-analysis for SD (AA vs. CG) included 10 observations and yielded a pooled sensitivity and specificity [95% CI] of 88% [80,93] and 87% [77,93].
Figure 5 and
Figure 6 show the forest plot and the HSROC curve resulting from this meta-analysis.
Separate DTA models were performed for PSV and RI (figures not shown). The model for PSV (AA vs. CG) included five observations and yielded a pooled sensitivity and specificity [95% CI] of 94% [89,97] and 87% [71,95], respectively. The model for RI (AA vs. CG) included five observations and yielded a pooled sensitivity and specificity [95% CI] of 81% [68,89] and 88% [73,95].
7. Color Doppler
Fifteen studies evaluated CD as a diagnostic tool in acute appendicitis (AA) [
18,
19,
20,
21,
22,
23,
24,
25,
27,
28,
29,
30,
31,
32,
34]. Of these studies, two exclusively assessed the ability of CD to discriminate between NCAA and CAA [
20,
31], while the rest evaluated the ability of CD to diagnose AA in comparison to the control group [
18,
19,
21,
22,
23,
24,
25,
27,
28,
29,
30,
32,
34]. Considerable heterogeneity was identified in the reported definitions of positivity (pathological findings) for CD imaging in acute appendicitis (AA). While some authors considered any detection of CD flow in the cecal appendix as positive, others considered the test positive only when hyperemia or increased appendiceal flow was observed. Some authors, such as Patriquin et al. [
21], used a multicategory scale based on the number of CD signals detected in the appendiceal wall (0 = none, 1–2 = few, 3–4 = moderate, >4 = abundant). This scale was later replicated by other authors, such as Gaitini et al. [
28]. Some studies reported different diagnostic performance estimates depending on the cut-off point selected for the proposed scale; for example, Xu et al. [
29] reported varying results depending on whether elevated flow or "type 2 flow" was considered diagnostic of AA, or whether the absence of flow was deemed sufficient to exclude AA. Other authors, such as Daga et al. [
30], also reported different diagnostic outcomes depending on whether any detected appendiceal Doppler flow was considered diagnostic, or only cases showing hyperemia.
8. Power Doppler
Three authors reported evaluating PD to diagnose acute appendicitis (AA) [
23,
26,
34]. Pinto et al. reported a higher diagnostic performance of PD over CD [
23]. Incesu et al. compared PD with CEPD, demonstrating the latter's superiority over standalone PD [
26]. Aydin et al. reported diagnostic performance data that combined results from both CD and PD modalities without distinction [
34].
Contingency table data (TP, FP, TN, FN) for the power Doppler modality were available from only three studies. Therefore, a DTA meta-analytical model could not be performed, as at least four studies are required to fit such models reliably.
9. Doppler Ultrasound (Complicated Appendicitis vs. Non-Complicated Appendicitis)
Four studies provided Doppler data and/or comparisons for CAA and NCAA groups [
20,
21,
31,
32].
Diagnostic Performance of Doppler Ultrasound (NCAA vs. CAA)
Four studies reported the sensitivity and specificity of CD for discriminating NCAA and CAA: Quillin et al. [
20] (77.8% and 60%), Patriquin et al. [
21] (100% for both), Uzunosmanoğlu et al. [
32] (93% and 85%), and Xu et al. [
31] (25% and 72.4%). Two studies also provided SD measurements for the CAA and NCAA groups, using RI values [
21] or PI values [
32].
10. Discussion
The present systematic review and meta-analysis evaluated the role of all Doppler US modalities in diagnosing AA, consistently demonstrating excellent diagnostic yield.
Concerning the biological plausibility and the pathophysiological rationale for using Doppler ultrasound to diagnose acute appendicitis (AA), inflammation of the cecal appendix is associated with a localized increase in vascular perfusion secondary to the release of inflammatory mediators. These changes are potentially detectable through Doppler imaging techniques. However, it is essential to note that this phenomenon is not specific to AA and may occur in any infectious or inflammatory process. Consequently, conditions such as colitis or ileitis may also present with increased Doppler signal on ultrasound evaluation. Moreover, based on this same pathophysiological premise, appendiceal tissue ischemia in the context of gangrenous acute appendicitis (GAA) or complicated acute appendicitis (CAA) may be associated with a reduction or absence of Doppler flow within the appendix. This phenomenon has been previously reported by authors such as Quillin et al. [
20], who observed that appendiceal hyperemia was more frequent in non-perforated AA compared to perforated cases, and Patriquin et al. [
21], who described the absence of Doppler signal at the appendiceal tip in a high proportion of CAA cases.
Regarding the different Doppler modalities, CD, PD, and SD have been primarily evaluated. CD was the first modality used for diagnosing AA and remains the most extensively characterized in the medical literature, demonstrating excellent diagnostic performance. PD has also shown excellent, and in some cases superior, performance; however, the limited number of published studies and the inability to conduct meta-analytical models to assess its diagnostic accuracy quantitatively prevent definitive conclusions from being drawn. CEPD, although promising, was only evaluated in one article. On the other hand, recent literature has focused on using SD, mainly through analyzing PSV and RI. In this regard, SD offers a significant advantage over CD and PD, namely the ability to obtain objective quantitative measurements, which could potentially reduce interobserver variability inherent to ultrasound examinations, particularly when using CD or PD modes. Regarding the discriminative capacity of Doppler ultrasound to distinguish non-complicated acute appendicitis (NCAA) from complicated acute appendicitis (CAA), the available evidence is limited and currently markedly inferior to that reported for the diagnosis of acute appendicitis (AA) versus a control group (CG). This is a significant limitation, given that the potential presence of selection bias must be assumed in all cases.
Another relevant aspect is the lack of experience with the normal Doppler imaging appearance of the cecal appendix. This represents a significant limitation, as distinguishing between normal and pathological findings is critical for accurately characterizing the diagnostic performance of Doppler ultrasound in acute appendicitis (AA). It should also be considered that, because the equipment used in the earlier studies was technologically more primitive and therefore less sensitive, it was reasonable at that time to interpret a positive Doppler signal as pathological. However, this concept likely requires re-evaluation given the greater sensitivity of current US machines.
Regarding study design, most studies were prospective with consecutive patient recruitment. However, a significant number of studies exhibited a potential risk of selection bias, as many included only patients in whom the appendix was visualized on grayscale US, and, in several cases, specifically those with borderline sonographic findings for acute appendicitis (AA) diagnosis (e.g., non-compressible appendices or those measuring 6–8 mm). On the one hand, this represents an advantage, as the overall diagnostic performance of the tool is assessed in a population where diagnostic uncertainty is frequent, such as in cases of borderline visualized appendices. On the other hand, it must be noted that the diagnostic performance data provided in these studies may not reflect the general population of patients undergoing primary US for suspected AA, as cases without appendiceal visualization were systematically excluded in some studies. Considering the significant rate of non-visualization of the appendix reported in recent series, we believe that (1) the overall diagnostic performance of Doppler ultrasound in AA is likely overestimated in these studies, but on the other hand (2) this tool demonstrates potential diagnostic utility specifically in cases where the appendix is positively visualized, including those with borderline sonographic criteria.
Many retrospective studies relied on the review of static images or videotapes of examinations originally performed by other radiologists. We believe this represents a significant limitation and should be considered when evaluating the diagnostic performance reported in these studies. Additionally, the retrospective nature of these studies introduces important limitations, such as the lack of an accurate epidemiological representation of the prevalence of acute appendicitis (AA) and its distribution by age and sex (for example, several studies report a disproportionately higher number of female patients, despite AA being a condition with a slight male predominance) [
24]. Concerning the geographic distribution of the studies included, it is sufficiently broad not to limit the extrapolation of the results of this work.
Diagnostic odds ratios (DORs) across individual studies (AA vs. CG) showed considerable variability, ranging from 2.26 to 1848 (Supplementary File 4). Most studies reported high DORs, suggesting that Doppler ultrasound performs very well in distinguishing acute appendicitis (AA) from control groups (CG). Several studies reached extremely high DOR values, pointing to near-perfect diagnostic performance. However, the broad confidence intervals around these estimates indicate some imprecision, probably related to smaller sample sizes or low event rates. The asymmetry observed in the confidence intervals, with much wider upper bounds, is expected due to the log transformation applied during analysis and reflects the natural variability common to diagnostic accuracy studies [
50].It should also be noted that the clinical interpretation and translation of the DOR is less intuitive than that of sensitivity and specificity, and therefore it is not commonly reported or used in clinical practice.
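As an illustration of why DOR confidence intervals are asymmetric, the following minimal Python sketch computes a DOR and a Wald-type 95% confidence interval on the log scale from a single 2×2 table. The counts are hypothetical and are not taken from any included study. The function name `diagnostic_odds_ratio` and the 0.5 continuity correction for zero cells are assumptions made for this example; they do not reproduce the hierarchical models used in this review.

```python
import math


def diagnostic_odds_ratio(tp, fp, fn, tn, z=1.96):
    """Return (DOR, lower, upper) using a Wald interval built on the log scale.

    Illustrative helper, not the meta-analytic model used in the review.
    """
    cells = [tp, fp, fn, tn]
    # Assumed continuity correction: add 0.5 to every cell if any cell is zero,
    # so that the ratio and its standard error remain defined.
    if any(c == 0 for c in cells):
        tp, fp, fn, tn = (c + 0.5 for c in cells)

    dor = (tp * tn) / (fp * fn)
    # Standard error of ln(DOR): square root of the sum of reciprocal cell counts.
    se_log = math.sqrt(1 / tp + 1 / fp + 1 / fn + 1 / tn)
    log_dor = math.log(dor)
    # The interval is symmetric around ln(DOR); exponentiating makes it
    # asymmetric around the DOR itself, with a longer upper arm.
    lower = math.exp(log_dor - z * se_log)
    upper = math.exp(log_dor + z * se_log)
    return dor, lower, upper


# Hypothetical 2x2 counts chosen only for illustration.
dor, lo, hi = diagnostic_odds_ratio(tp=45, fp=3, fn=5, tn=47)
print(f"DOR = {dor:.1f}, 95% CI {lo:.1f} to {hi:.1f}")
print(f"upper arm = {hi - dor:.1f}, lower arm = {dor - lo:.1f}")
```

Because the interval is symmetric only on the logarithmic scale, the distance from the point estimate to the upper bound exceeds the distance to the lower bound, and small cell counts (here the false positives and false negatives) dominate the standard error.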
The present study has important strengths, such as its robust methodology based on the PRISMA-DTA guidelines and the DTA meta-analytical models used. However, it also has significant limitations: (1) the potential selection (spectrum) bias in most articles, (2) the limitations inherent to the inferential statistical procedures used, (3) the small sample size and retrospective nature of some of the included studies, (4) the high heterogeneity observed in some of the DTA meta-analytic models, and (5) the high heterogeneity in the definition of the control group.
Given its noninvasive nature and excellent diagnostic accuracy, Doppler ultrasound holds promise as an essential diagnostic tool for acute appendicitis. However, it needs further validation through large, well-designed multicenter studies.
Supplementary Materials
The following supporting information can be downloaded at the website of this paper posted on Preprints.org.
CRediT Authorship Contribution Statement
JAM: Conceptualization and study design; literature search and selection; data curation and extraction; formal analysis; investigation; methodology; project administration; resources; validation; visualization; writing – original draft; writing – review and editing. MRJ: Literature search and selection; data curation and extraction; project administration; resources; validation; visualization; writing – review and editing.
Conflicts of Interest
The authors declare that they have no conflict of interest.
Financial Statement/Funding
This review did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors, and none of the authors has external funding to declare.
Ethical Approval
This study did not involve the participation of human or animal subjects, and therefore, IRB approval was not sought.
Statement of Availability of the Data Used during the Systematic Review
All data used for the meta-analytical models are available in the accompanying supplementary dataset file.
Registration
PROSPERO (CRD42025641841).
References
- Lotfollahzadeh S, Lopez RA, Deppen JG. Appendicitis. 2024 Feb 12. In: StatPearls [Internet]. Treasure Island (FL): StatPearls Publishing; 2024 Jan–. [PubMed]
- Tang G, Zhang L, Xia L, Zhang J, Chen R, Zhou R. Preoperative in-hospital delay increases postoperative morbidity and mortality in patients with acute appendicitis: a meta-analysis. Int J Surg. 2025 Jan 1;111(1):1275-1284. [CrossRef] [PubMed]
- Bolmers MDM, de Jonge J, Bom WJ, van Rossem CC, van Geloven AAW, Bemelman WA; Snapshot Appendicitis Collaborative Study group. In-hospital Delay of Appendectomy in Acute, Complicated Appendicitis. J Gastrointest Surg. 2022 May;26(5):1063-1069. Epub 2022 Jan 20. [CrossRef] [PubMed]
- Di Saverio S, Podda M, De Simone B, Ceresoli M, Augustin G, Gori A, Boermeester M, Sartelli M, Coccolini F, Tarasconi A, De' Angelis N, Weber DG, Tolonen M, Birindelli A, Biffl W, Moore EE, Kelly M, Soreide K, Kashuk J, Ten Broek R, Gomes CA, Sugrue M, Davies RJ, Damaskos D, Leppäniemi A, Kirkpatrick A, Peitzman AB, Fraga GP, Maier RV, Coimbra R, Chiarugi M, Sganga G, Pisanu A, De' Angelis GL, Tan E, Van Goor H, Pata F, Di Carlo I, Chiara O, Litvin A, Campanile FC, Sakakushev B, Tomadze G, Demetrashvili Z, Latifi R, Abu-Zidan F, Romeo O, Segovia-Lohse H, Baiocchi G, Costa D, Rizoli S, Balogh ZJ, Bendinelli C, Scalea T, Ivatury R, Velmahos G, Andersson R, Kluger Y, Ansaloni L, Catena F. Diagnosis and treatment of acute appendicitis: 2020 update of the WSES Jerusalem guidelines. World J Emerg Surg. 2020 Apr 15;15(1):27. [CrossRef] [PubMed]
- Andersson RE, Stark J. Diagnostic value of the appendicitis inflammatory response (AIR) score. A systematic review and meta-analysis. World J Emerg Surg. 2025 Feb 8;20(1):12. [CrossRef] [PubMed]
- Arredondo Montero J, Bardají Pascual C, Antona G, Ros Briones R, López-Andrés N, Martín-Calvo N. The BIDIAP index: a clinical, analytical and ultrasonographic score for the diagnosis of acute appendicitis in children. Pediatr Surg Int. 2023 Apr 10;39(1):175. [CrossRef] [PubMed]
- Arruzza E, Milanese S, Li LSK, Dizon J. Diagnostic accuracy of computed tomography and ultrasound for the diagnosis of acute appendicitis: A systematic review and meta-analysis. Radiography (Lond). 2022 Nov;28(4):1127-1141. Epub 2022 Sep 18. [CrossRef] [PubMed]
- Bom WJ, Bolmers MD, Gans SL, van Rossem CC, van Geloven AAW, Bossuyt PMM, Stoker J, Boermeester MA. Discriminating complicated from uncomplicated appendicitis by ultrasound imaging, computed tomography or magnetic resonance imaging: systematic review and meta-analysis of diagnostic accuracy. BJS Open. 2021 Mar 5;5(2):zraa030. [CrossRef] [PubMed]
- Chidiac C, Issa O, Garcia AV, Rhee DS, Slidell MB. Failure to Significantly Reduce Radiation Exposure in Children with Suspected Appendicitis in the United States. J Pediatr Surg. 2024 Aug 22:161701. Epub ahead of print. [CrossRef] [PubMed]
- D'Souza N, Hicks G, Beable R, Higginson A, Rud B. Magnetic resonance imaging (MRI) for diagnosis of acute appendicitis. Cochrane Database Syst Rev. 2021 Dec 14;12(12):CD012028. [CrossRef] [PubMed]
- Kim D, Woodham BL, Chen K, Kuganathan V, Edye MB. Rapid MRI Abdomen for Assessment of Clinically Suspected Acute Appendicitis in the General Adult Population: a Systematic Review. J Gastrointest Surg. 2023 Jul;27(7):1473-1485. Epub 2023 Apr 20. PMCID: PMC10366263. [CrossRef] [PubMed]
- Matthew Fields J, Davis J, Alsup C, Bates A, Au A, Adhikari S, Farrell I. Accuracy of Point-of-care Ultrasonography for Diagnosing Acute Appendicitis: A Systematic Review and Meta-analysis. Acad Emerg Med. 2017 Sep;24(9):1124-1136. Epub 2017 Aug 21. [CrossRef] [PubMed]
- Cho SU, Oh SK. Accuracy of ultrasound for the diagnosis of acute appendicitis in the emergency department: A systematic review. Medicine (Baltimore). 2023 Mar 31;102(13):e33397. [CrossRef] [PubMed]
- Harel S, Mallon M, Langston J, Blutstein R, Kassutto Z, Gaughan J. Factors Contributing to Nonvisualization of the Appendix on Ultrasound in Children With Suspected Appendicitis. Pediatr Emerg Care. 2022 Feb 1;38(2):e678-e682. [CrossRef] [PubMed]
- Puylaert JB. Acute appendicitis: US evaluation using graded compression. Radiology. 1986 Feb;158(2):355-60. [CrossRef] [PubMed]
- Chang ST, Jeffrey RB, Olcott EW. Three-step sequential positioning algorithm during sonographic evaluation for appendicitis increases appendiceal visualization rate and reduces CT use. AJR Am J Roentgenol. 2014;203(5):1006-1012.
- Pfeifer CM, Carrejo B, Lewis S, Hutchinson K, Gokli A, Kwon J. Structured coaching as a means to improve sonographic visualization of the appendix: a quality improvement initiative. Emerg Radiol. 2023 Apr;30(2):161-166. Epub 2023 Jan 4. [CrossRef] [PubMed]
- Quillin SP, Siegel MJ. Appendicitis in children: color Doppler sonography. Radiology. 1992 Sep;184(3):745-7. [CrossRef] [PubMed]
- Quillin SP, Siegel MJ. Appendicitis: efficacy of color Doppler sonography. Radiology. 1994 May;191(2):557-60. [CrossRef] [PubMed]
- Quillin SP, Siegel MJ. Diagnosis of appendiceal abscess in children with acute appendicitis: value of color Doppler sonography. AJR Am J Roentgenol. 1995 May;164(5):1251-4. [CrossRef] [PubMed]
- Patriquin HB, Garcier JM, Lafortune M, Yazbeck S, Russo P, Jequier S, Ouimet A, Filiatrault D. Appendicitis in children and young adults: Doppler sonographic-pathologic correlation. AJR Am J Roentgenol. 1996 Mar;166(3):629-33. [CrossRef] [PubMed]
- Lim HK, Lee WJ, Kim TH, Namgung S, Lee SJ, Lim JH. Appendicitis: usefulness of color Doppler US. Radiology. 1996 Oct;201(1):221-5. [CrossRef] [PubMed]
- Pinto F, Lencioni R, Falleni A, et al. Assessment of hyperemia in acute appendicitis: Comparison between power Doppler and color Doppler sonography. Emerg Radiol. 1998;5:92-96. [CrossRef]
- Gutierrez CJ, Mariano MC, Faddis DM, Sullivan RR, Wong RS, Lourie DJ, Stain SC. Doppler ultrasound accurately screens patients with appendicitis. Am Surg. 1999 Nov;65(11):1015-7. [PubMed]
- Kessler N, Cyteval C, Gallix B, Lesnik A, Blayac PM, Pujol J, Bruel JM, Taourel P. Appendicitis: evaluation of sensitivity, specificity, and predictive values of US, Doppler US, and laboratory findings. Radiology. 2004 Feb;230(2):472-8. Epub 2003 Dec 19. [CrossRef] [PubMed]
- Incesu L, Yazicioglu AK, Selcuk MB, Ozen N. Contrast-enhanced power Doppler US in the diagnosis of acute appendicitis. Eur J Radiol. 2004 May;50(2):201-9. [CrossRef] [PubMed]
- Baldisserotto M, Peletti AB. Is colour Doppler sonography a good method to differentiate normal and abnormal appendices in children? Clin Radiol. 2007 Apr;62(4):365-9. Epub 2007 Jan 30. [CrossRef] [PubMed]
- Gaitini D, Beck-Razi N, Mor-Yosef D, Fischer D, Ben Itzhak O, Krausz MM, Engel A. Diagnosing acute appendicitis in adults: accuracy of color Doppler sonography and MDCT compared with surgery and clinical follow-up. AJR Am J Roentgenol. 2008 May;190(5):1300-6. [CrossRef] [PubMed]
- Xu Y, Jeffrey RB, Shin LK, DiMaio MA, Olcott EW. Color Doppler Imaging of the Appendix: Criteria to Improve Specificity for Appendicitis in the Borderline-Size Appendix. J Ultrasound Med. 2016 Oct;35(10):2129-38. Epub 2016 Aug 25. [CrossRef] [PubMed]
- Daga S, Kachewar S, Lakhkar DL, Jethlia K, Itai A. Sonographic evaluation of acute appendicitis and its complications. West African Journal of Radiology. 2017 Jul–Dec;24(2):152-156. [CrossRef]
- Xu Y, Jeffrey RB, Chang ST, DiMaio MA, Olcott EW. Sonographic Differentiation of Complicated From Uncomplicated Appendicitis: Implications for Antibiotics-First Therapy. J Ultrasound Med. 2017 Feb;36(2):269-277. Epub 2016 Dec 31. [CrossRef] [PubMed]
- Uzunosmanoğlu H, Çevik Y, Çorbacıoğlu ŞK, Akıncı E, Buluş H, Ağladıoğlu K. Diagnostic value of appendicular Doppler ultrasonography in acute appendicitis. Ulus Travma Acil Cerrahi Derg. 2017 May;23(3):188-192. [CrossRef] [PubMed]
- Shin LK, Jeffrey RB, Berry GJ, Olcott EW. Spectral Doppler Waveforms for Diagnosis of Appendicitis: Potential Utility of Point Peak Systolic Velocity and Resistive Index Values. Radiology. 2017 Dec;285(3):990-998. Epub 2017 Jun 5. [CrossRef] [PubMed]
- Aydin S, Tek C, Ergun E, Kazci O, Kosar PN. Acute Appendicitis or Lymphoid Hyperplasia: How to Distinguish More Safely? Can Assoc Radiol J. 2019 Nov;70(4):354-360. Epub 2019 Sep 6. [CrossRef] [PubMed]
- Bakhshandeh T, Maleknejad A, Sargolzaie N, Mashhadi A, Zadehmir M. The utility of spectral Doppler evaluation of acute appendicitis. Emerg Radiol. 2022 Apr;29(2):371-375. Epub 2022 Jan 11. [CrossRef] [PubMed]
- El-Aleem RA, Abd Allah AA, Shehata MR, Seifeldein GS, Hassanein SM. Diagnostic performance of spectral Doppler in acute appendicitis with an equivocal Alvarado score. Emerg Radiol. 2024 Apr;31(2):141-149. Epub 2024 Jan 24. [CrossRef] [PubMed]
- Saini S, Mittal MK, Kanaujia R, et al. Exploring the role of spectral Doppler in acute appendicitis. Egypt J Radiol Nucl Med. 2024;55:218.
- Anuj, G., S., R.R., Ashok, Y. et al. Diagnostic Utility of Spectral Doppler Ultrasound in Acute Appendicitis: a Prospective Study. Indian J Surg (2025). [CrossRef]
- McInnes MDF, Moher D, Thombs BD, McGrath TA, Bossuyt PM, and the PRISMA-DTA Group. Preferred Reporting Items for a Systematic Review and Meta-analysis of Diagnostic Test Accuracy Studies: The PRISMA-DTA Statement. JAMA. 2018;319(4):388–396. [CrossRef]
- Whiting PF, Rutjes AW, Westwood ME, Mallett S, Deeks JJ, Reitsma JB, Leeflang MM, Sterne JA, Bossuyt PM; QUADAS-2 Group. QUADAS-2: a revised tool for the quality assessment of diagnostic accuracy studies. Ann Intern Med. 2011 Oct 18;155(8):529-36. [CrossRef] [PubMed]
- Wan X, Wang W, Liu J, Tong T. Estimating the sample mean and standard deviation from the sample size, median, range and/or interquartile range. BMC Med Res Methodol. 2014 Dec 19;14:135. [CrossRef] [PubMed]
- Hozo SP, Djulbegovic B, Hozo I. Estimating the mean and variance from the median, range, and the size of a sample. BMC Med Res Methodol. 2005;5(1):13. [CrossRef]
- Šimundić AM. Measures of Diagnostic Accuracy: Basic Definitions. EJIFCC. 2009 Jan 20;19(4):203-11. PMCID: PMC4975285. [PubMed]
- Nyaga VN, Arbyn M. Metadta: a Stata command for meta-analysis and meta-regression of diagnostic test accuracy data – a tutorial. Arch Public Health. 2022;80:95. [CrossRef]
- Harbord RM, Whiting P. metandi: Meta-analysis of diagnostic accuracy using hierarchical logistic regression. Stata J. 2009 Jun;9(2):211-229.
- Dwamena BA. MIDAS: Stata module for meta-analytical integration of diagnostic test accuracy studies. Statistical Software Components S456880, Boston College Department of Economics, revised 13 Dec 2009.
- Doebler P, Holling H, Rojas-Garcia A, Hillebrand T (2023). mada: Meta-Analysis of Diagnostic Accuracy. R package version 0.5.12. Available from: https://CRAN.R-project.org/package=mada.
- Shi L, Lin L. The trim-and-fill method for publication bias: practical guidelines and recommendations based on a large database of meta-analyses. Medicine (Baltimore). 2019 Jun;98(23):e15987. [CrossRef] [PubMed]
- Deeks JJ, Macaskill P, Irwig L. The performance of tests of publication bias and other sample size effects in systematic reviews of diagnostic test accuracy was assessed. J Clin Epidemiol. 2005 Sep;58(9):882-93. [CrossRef] [PubMed]
- Glas AS, Lijmer JG, Prins MH, Bonsel GJ, Bossuyt PM. The diagnostic odds ratio: a single indicator of test performance. J Clin Epidemiol. 2003 Nov;56(11):1129-35. [CrossRef] [PubMed]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).