Preprint
Article

This version is not peer-reviewed.

DECIDE-Lab: A Value-of-Information and POMDP Framework for Diagnostic Laboratory Test Selection

Submitted: 10 June 2026
Posted: 12 June 2026


Abstract
Diagnostic laboratory tests are often judged by analytic validity, clinical validity, sensitivity, specificity, predictive values, and likelihood ratios. These quantities describe test performance, but they do not determine whether a result improves care. A laboratory test has clinical value when it changes a decision in a way that improves expected outcomes after accounting for uncertainty, downstream management, time, cost, patient burden, and treatment harm. This article presents DECIDE-Lab (Decision-centered Evaluation of Clinical Information and Dynamic Evidence for Laboratory Testing), a decision-theoretic framework for diagnostic laboratory test selection. DECIDE-Lab combines value-of-information analysis with partially observable Markov decision processes to evaluate single tests, reflex testing, serial monitoring, and stopping rules. The framework links sensitivity and specificity to posterior belief updating, action thresholds, expected utility, utility elicitation, clinical utility frontiers, diagnostic dominance, implementation pathways, and subgroup-specific value. An illustrative acute coronary syndrome application shows that the value of serial troponin testing concentrates in intermediate-risk patients for whom an additional result can change disposition or treatment. A worked sepsis pathway demonstrates the POMDP mechanics step by step, including an initial belief state, Test 1, posterior updating, conditional value for Test 2, and the resulting stop-or-continue decision. Five disease-state applications (acute coronary syndrome, sepsis, prostate cancer, autoimmune disease, and tuberculosis) demonstrate how the framework can support diagnostic stewardship, adaptive ordering, and value-based laboratory evaluation. The central implication is that diagnostic value should be measured by action-changing utility rather than accuracy alone, while explicitly examining how conclusions change when utilities, implementation constraints, and equity goals vary.

1. Introduction

Diagnostic testing is a core element of medical care because it converts uncertainty into clinical action. A laboratory result may support treatment, discharge, admission, referral, isolation, surveillance, or no further workup. The same result may also create harm if it triggers unnecessary treatment, avoidable procedures, diagnostic delay, anxiety, or cascades of low-value follow-up testing. For this reason, diagnostic evaluation should not end with the question of whether a test is accurate. The more important clinical question is whether the test changes a decision that matters to the patient.
The dominant language of diagnostic evaluation emphasizes analytic validity, clinical validity, sensitivity, specificity, likelihood ratios, and predictive values. These measures remain indispensable. They tell clinicians how a test behaves under specified conditions and how a result changes disease probability [2,15,16,17]. Yet they do not answer the ordering question directly. A highly sensitive and specific test may add little value if the patient already lies below a discharge threshold or above a treatment threshold. A faster or less expensive test with lower standalone accuracy may add substantial value if it moves the patient across the threshold at which management changes. Diagnostic value is therefore contextual, not intrinsic.
This distinction has practical consequences for laboratory medicine. Many laboratory stewardship programs target duplicate orders, low-yield panels, or test utilization outliers. These efforts can reduce waste, but they often lack a general theory of when a result is worth obtaining. A decision-centered approach begins with the downstream action. It asks: What decision is pending? What is the clinician’s current belief about disease? How would each possible result update that belief? Would the updated belief cross a management threshold? Do the expected benefits of that action change exceed the cost, burden, and delay of the test?
Medical decision analysis provides the tools to answer these questions. Treatment-threshold models show when acting, testing, or withholding treatment is rational under uncertainty [1,3,4,10]. Value-of-information (VOI) theory quantifies the expected improvement in decision quality gained by collecting information before acting [6,7,8,9]. Decision curve analysis and cost-effectiveness methods further emphasize that prediction and diagnosis should be evaluated by their consequences for decisions and outcomes [5,18]. These traditions are highly relevant to laboratory testing, but they are rarely presented as a unified framework for test selection, reflex testing, and serial monitoring.
Laboratory diagnosis also unfolds over time. Providers often order an initial test, observe an imperfect result, decide whether to treat or wait, then order a confirmatory or repeat test. Acute coronary syndrome evaluation, sepsis management, autoimmune workups, prostate cancer evaluation, and tuberculosis diagnosis all follow this sequential pattern. Partially observable Markov decision processes (POMDPs) are well suited to these problems because the true disease state is hidden, observations are imperfect, actions affect future states, and the value of testing depends on accumulated evidence [11,12,13,14].
This article develops DECIDE-Lab (Decision-centered Evaluation of Clinical Information and Dynamic Evidence for Laboratory Testing), a framework that combines VOI and POMDP methods for diagnostic laboratory test selection. The framework is designed for clinicians, laboratory leaders, diagnostic developers, health-system decision makers, informaticians, and decision scientists. It does not replace clinical judgment. It makes the structure of diagnostic judgment explicit.
The article makes six contributions. First, it connects sensitivity and specificity to clinical utility through posterior belief updating and action thresholds. Second, it distinguishes test accuracy, single-test value, conditional test value, and sequential test value. Third, it introduces a clinical utility frontier that identifies dominated tests and clarifies why the most accurate test is not always the most valuable test. Fourth, it extends diagnostic value to preference-sensitive and equity-aware evaluation by allowing priors, utilities, burdens, follow-up access, and test performance to vary across patients and subgroups. Fifth, it addresses three practical barriers to adoption: eliciting utilities when harms and benefits are contested, embedding the model into electronic health record clinical decision support, and using subgroup-specific thresholds to avoid equity-neutral rules that reproduce unequal outcomes. Sixth, it translates the theory into five clinical applications that show how the framework can guide test selection in common diagnostic pathways.

2. Materials and Methods: The DECIDE-Lab Framework

DECIDE-Lab treats laboratory ordering as a decision problem rather than a test-performance problem. The framework has five steps (Figure 1). The provider first defines the pending clinical action. The provider then estimates the current belief state, maps available tests to observation probabilities, compares expected utility with and without testing, and either orders a test, acts, waits, or stops testing.

2.1. Disease States, Observations, and Actions

Let $D \in \{0, 1\}$ denote the latent disease state, where $D = 1$ means disease is present and $D = 0$ means disease is absent. Let $p = P(D = 1)$ denote the provider’s pretest probability. A binary laboratory test $T$ produces observation $o \in \{+, -\}$. Sensitivity and specificity are
$$\mathrm{Se} = P(T = + \mid D = 1), \qquad \mathrm{Sp} = P(T = - \mid D = 0).$$
Bayes’ rule gives the posterior probability after a positive result:
$$P(D = 1 \mid T = +) = \frac{\mathrm{Se} \, p}{\mathrm{Se} \, p + (1 - \mathrm{Sp})(1 - p)},$$
and after a negative result:
$$P(D = 1 \mid T = -) = \frac{(1 - \mathrm{Se}) \, p}{(1 - \mathrm{Se}) \, p + \mathrm{Sp} \, (1 - p)}.$$
Let $a \in A$ denote a feasible clinical action. In the simplest case, $A = \{\text{Treat}, \text{NoTreat}\}$. In practice, $A$ may include discharge, observe, admit, isolate, refer, order an imaging test, repeat a laboratory test, start treatment, stop treatment, or monitor. Let $U(a, d)$ denote the utility of action $a$ when the true disease state is $d$. Utility may represent net health benefit, net monetary benefit, quality-adjusted survival, avoided harm, a patient-centered value score, or another decision-relevant measure [3,18].
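The two Bayes updates in this subsection can be sketched in a few lines of Python. This is a minimal illustration rather than part of the framework itself: the function name `posterior` and the example inputs (pretest probability 0.10, sensitivity 0.90, specificity 0.80) are assumptions chosen for readability.

```python
def posterior(p, se, sp, positive):
    """Return P(D=1 | result) for a binary test using Bayes' rule.

    p: pretest probability, se: sensitivity, sp: specificity,
    positive: True for a positive result, False for a negative one.
    """
    if positive:
        numerator = se * p
        denominator = se * p + (1.0 - sp) * (1.0 - p)
    else:
        numerator = (1.0 - se) * p
        denominator = (1.0 - se) * p + sp * (1.0 - p)
    # If the result has zero probability, leave the belief unchanged.
    return numerator / denominator if denominator > 0 else p


# Hypothetical inputs, for illustration only.
post_pos = posterior(0.10, 0.90, 0.80, positive=True)
post_neg = posterior(0.10, 0.90, 0.80, positive=False)
print(round(post_pos, 3), round(post_neg, 3))  # 0.333 0.014
```

With these assumed inputs, a positive result raises the probability of disease from 0.10 to about 0.33, while a negative result lowers it to about 0.01.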

2.2. Expected Value of Diagnostic Information

Without additional testing, the provider selects the action with the highest expected utility:
$$\mathrm{EU}_0(p) = \max_{a \in A} \{ p \, U(a, 1) + (1 - p) \, U(a, 0) \}.$$
With testing, the provider observes the result before acting:
$$\mathrm{EU}_T(p) = -c_T + \sum_{o \in \{+, -\}} P(o \mid p, T) \max_{a \in A} E[U(a, D) \mid o],$$
where $c_T$ includes monetary cost, turnaround time, phlebotomy burden, follow-up burden, opportunity cost, and operational cost. The expected value of diagnostic information is
$$\mathrm{EVDI}(T; p) = \mathrm{EU}_T(p) - \mathrm{EU}_0(p).$$
This expression shows why accuracy alone is insufficient. Test performance affects posterior probabilities, but utility depends on whether those posterior probabilities change the chosen action.
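A short Python sketch may make the EVDI calculation concrete. It assumes a two-action problem (Treat, NoTreat) and a binary test. The utility table `U`, the test characteristics, and the test cost of 0.01 are hypothetical values on an arbitrary 0 to 1 scale, not estimates from the article.

```python
# Hypothetical utilities U[action][disease_state]; not taken from the article.
U = {
    "Treat":   {1: 0.8, 0: 0.9},
    "NoTreat": {1: 0.0, 0: 1.0},
}


def best_expected_utility(p, utilities):
    """Expected utility of the best action at disease probability p."""
    return max(p * u[1] + (1.0 - p) * u[0] for u in utilities.values())


def evdi(p, se, sp, utilities, test_cost):
    """Net expected value of diagnostic information for a binary test."""
    eu_no_test = best_expected_utility(p, utilities)
    p_pos = se * p + (1.0 - sp) * (1.0 - p)   # P(positive result)
    p_neg = 1.0 - p_pos                        # P(negative result)
    post_pos = se * p / p_pos if p_pos > 0 else p
    post_neg = (1.0 - se) * p / p_neg if p_neg > 0 else p
    eu_test = (
        -test_cost
        + p_pos * best_expected_utility(post_pos, utilities)
        + p_neg * best_expected_utility(post_neg, utilities)
    )
    return eu_test - eu_no_test


net_value = evdi(p=0.10, se=0.90, sp=0.80, utilities=U, test_cost=0.01)
print(round(net_value, 3))  # 0.044
```

Under these assumptions the test is worth ordering at a pretest probability of 0.10, because a positive result moves the best action from NoTreat to Treat and the resulting gain exceeds the assumed cost.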

2.3. Treatment Thresholds and Action-Changing Information

For a treat/no-treat decision, define
$$U_{TP} = U(\text{Treat}, 1), \quad U_{FP} = U(\text{Treat}, 0), \quad U_{FN} = U(\text{NoTreat}, 1), \quad U_{TN} = U(\text{NoTreat}, 0).$$
Treatment is preferred when
$$p \, U_{TP} + (1 - p) \, U_{FP} \ge p \, U_{FN} + (1 - p) \, U_{TN}.$$
Solving gives the treatment threshold
$$p^{*} = \frac{U_{TN} - U_{FP}}{(U_{TP} - U_{FN}) + (U_{TN} - U_{FP})},$$
provided the denominator is positive. This threshold expresses the clinical tradeoff between false-positive and false-negative decisions [1,4].
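The threshold formula can be checked numerically with a small Python helper. The function name `treatment_threshold` and the four utility values are illustrative assumptions.

```python
def treatment_threshold(u_tp, u_fp, u_fn, u_tn):
    """Disease probability at which Treat and NoTreat have equal expected utility."""
    benefit = u_tp - u_fn   # gain from treating a patient who has the disease
    harm = u_tn - u_fp      # loss from treating a patient who does not
    denominator = benefit + harm
    if denominator <= 0:
        raise ValueError("threshold is undefined unless benefit + harm > 0")
    return harm / denominator


# Hypothetical utilities on an arbitrary 0 to 1 scale.
p_star = treatment_threshold(u_tp=0.8, u_fp=0.9, u_fn=0.0, u_tn=1.0)
print(round(p_star, 3))  # 0.111
```

In this example the harm of unnecessary treatment is small relative to the benefit of treating true disease, so the threshold is low: treatment is preferred once the disease probability exceeds roughly 11%.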
Proposition 1. (Action-changing condition). A diagnostic test has positive gross value only if at least one possible result changes the action selected under the prior belief.
Proof. 
Suppose that the same action a * maximizes expected utility before testing and after every possible test result. Then the expected utility after observing the test, before subtracting test cost, equals the expected utility of taking a * under the prior by the law of total expectation. Thus the gross value of testing is zero. If testing has positive cost or burden, net value is negative. A positive gross value therefore requires at least one result that changes the optimal action. □
The clinical interpretation is direct. A test does not create value simply because it moves a probability. It creates value when that movement changes what the provider should do.
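Proposition 1 can be checked numerically on a grid of pretest probabilities. The sketch below uses hypothetical utilities and test characteristics (sensitivity 0.90, specificity 0.80), and the helper names are illustrative. It records, for each prior, the gross value of testing and whether any result changes the preferred action.

```python
# Hypothetical utilities U[action][disease_state]; the implied threshold is 1/9.
U = {"Treat": {1: 0.8, 0: 0.9}, "NoTreat": {1: 0.0, 0: 1.0}}


def expected_utility(p, action):
    return p * U[action][1] + (1.0 - p) * U[action][0]


def best_action(p):
    return max(U, key=lambda action: expected_utility(p, action))


def gross_value(p, se, sp):
    """Return (gross value of testing, whether any result changes the action)."""
    prior_action = best_action(p)
    p_pos = se * p + (1.0 - sp) * (1.0 - p)
    p_neg = 1.0 - p_pos
    posteriors = {"+": se * p / p_pos, "-": (1.0 - se) * p / p_neg}
    weights = {"+": p_pos, "-": p_neg}
    actions = {o: best_action(b) for o, b in posteriors.items()}
    eu_after = sum(
        weights[o] * expected_utility(posteriors[o], actions[o]) for o in posteriors
    )
    gross = eu_after - expected_utility(p, prior_action)
    changed = any(a != prior_action for a in actions.values())
    return gross, changed


results = [(i / 100, *gross_value(i / 100, 0.90, 0.80)) for i in range(1, 100)]
value_region = [p for p, gross, _ in results if gross > 1e-12]
print(value_region[0], value_region[-1])  # 0.03 0.49
```

On this grid, gross value is positive only for priors between 0.03 and 0.49. Outside that band no result moves the posterior across the threshold, so the test cannot change the action and its gross value is zero.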

2.4. Sensitivity, Specificity, and the Region of Value

Sensitivity and specificity influence value through their effect on posterior beliefs. However, their marginal value is threshold-dependent. A gain in sensitivity has greatest value when false negatives are harmful and a negative or positive result can change treatment. A gain in specificity has greatest value when false positives are harmful and a result can prevent unnecessary treatment, referral, isolation, or additional testing.
Proposition 2. (Threshold-dependent marginal value). Improving sensitivity or specificity increases expected utility only through result patterns that can change the optimal action or reduce the expected harm of the action selected.
Proof. 
Expected utility with testing is a weighted average of posterior optimal utilities across possible observations. Changing sensitivity or specificity changes both observation probabilities and posterior probabilities. If no posterior belief crosses a decision boundary and the same action remains optimal for every observation, the maximized action term remains unchanged except for probability-weighting of the same decision. The expected gain is therefore zero before cost. Positive marginal value arises when the performance change increases the probability of a result that supports a better action or reduces the probability of a result that supports a harmful action. □
This proposition explains why the same test can be high value in one patient and low value in another. The key determinant is the patient’s location relative to clinical decision thresholds.

3. Handling Utility Elicitation in Clinical Practice

DECIDE-Lab depends on utilities because laboratory tests create value only through the actions that follow them. This requirement is a strength of the framework, but it also creates a practical challenge. Clinicians, patients, laboratories, payers, and health systems may disagree about the relative harm of missed disease, unnecessary treatment, delayed diagnosis, false-positive cascades, procedural complications, cost, and anxiety. A model that hides these judgments appears more objective than it is. DECIDE-Lab instead makes the judgments visible.
Utility values can be handled in several practical ways. First, investigators can use a threshold analysis rather than a single best estimate. The analyst varies the harm of a false negative, the harm of a false positive, the cost of testing, and the burden of follow-up across plausible ranges and identifies where the preferred action changes [1,3,10]. Second, analysts can elicit utilities from multiple stakeholder perspectives. A patient-centered analysis may weight anxiety, procedural burden, and time differently from a hospital stewardship analysis. Third, the model can use net monetary benefit or net health benefit when cost-effectiveness assumptions are available [18]. Fourth, the analyst can report robust decision regions: areas in which the preferred test remains unchanged across a broad range of utilities.
A sepsis biomarker example illustrates the point. In early suspected sepsis, the false-negative consequence may be severe because delayed antibiotics can worsen outcomes. When the utility loss assigned to a missed sepsis case increases, the value of a high-sensitivity test rises relative to a more balanced test, even if the high-sensitivity test produces more false positives. Figure 2 shows this pattern using stylized parameters. The figure is not intended to set a clinical policy. It demonstrates how DECIDE-Lab can reveal whether a recommendation depends on a fragile utility assumption or remains stable across plausible values.
This approach changes how the framework should be interpreted. DECIDE-Lab does not require one universally correct utility vector. It requires transparent reporting of the utility ranges under which a testing strategy is preferred. That transparency is especially important when diagnostic recommendations affect different stakeholders differently.
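A threshold analysis of this kind can be sketched as a simple sweep over the utility loss assigned to a missed case. Every number below is a stylized assumption: two hypothetical biomarkers with equal cost, a pretest probability of 0.20, a false-positive loss fixed at 1, and a false-negative loss varied from 1 to 20. The sketch reports where the preferred test switches.

```python
# Hypothetical tests as (sensitivity, specificity); not taken from the article.
TESTS = {
    "high_sensitivity": (0.95, 0.60),
    "balanced": (0.80, 0.85),
}


def expected_utility_with_test(p, se, sp, loss_fn, loss_fp):
    """Expected utility when the best action is chosen after each result.

    Utilities are expressed as losses: treating a non-septic patient costs
    loss_fp, not treating a septic patient costs loss_fn, and correct
    decisions cost nothing.
    """
    total = 0.0
    for sick_mass, well_mass in (
        (se * p, (1 - sp) * (1 - p)),    # joint probabilities, positive result
        ((1 - se) * p, sp * (1 - p)),    # joint probabilities, negative result
    ):
        treat = -loss_fp * well_mass
        no_treat = -loss_fn * sick_mass
        total += max(treat, no_treat)
    return total


def preferred_test(p, loss_fn, loss_fp=1.0):
    scores = {
        name: expected_utility_with_test(p, se, sp, loss_fn, loss_fp)
        for name, (se, sp) in TESTS.items()
    }
    return max(scores, key=scores.get)


preferred = {loss_fn: preferred_test(0.20, loss_fn) for loss_fn in range(1, 21)}
switch_point = min(k for k, name in preferred.items() if name == "high_sensitivity")
print(switch_point)  # 7
```

Under these assumptions the balanced test is preferred while a missed case is weighted up to six times a false positive, and the high-sensitivity test is preferred from seven times onward. Reporting that switch point, rather than a single recommendation, is the kind of transparency the section describes.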

4. Sequential Testing and POMDP Model

Many laboratory pathways involve more than one test. Let $b_t = P(D = 1 \mid H_t)$ denote the belief state after history $H_t$, where the history includes symptoms, risk factors, previous laboratory results, imaging results, treatment response, and elapsed time. If the provider chooses testing action $a_t$ and observes $o_{t+1}$, the belief update is
$$b_{t+1} = \frac{P(o_{t+1} \mid D = 1, a_t) \, b_t}{P(o_{t+1} \mid b_t, a_t)}.$$
A diagnostic pathway can be represented as a POMDP
$$M = (S, A, O, T, Z, R, \gamma),$$
where $S$ is the latent disease-state space, $A$ is the action space, $O$ is the observation space, $T(s' \mid s, a)$ is the disease transition model, $Z(o \mid s, a)$ is the observation model, $R(s, a)$ is the reward function, and $\gamma$ is the discount factor. For a laboratory test, sensitivity and specificity parameterize $Z(o \mid s, a)$.
The optimal value function over belief states satisfies
$$V^{*}(b) = \max_{a \in A} \Big[ R(b, a) + \gamma \sum_{o} P(o \mid b, a) \, V^{*}(\tau(b, a, o)) \Big],$$
where $\tau(b, a, o)$ is the Bayesian belief update after action $a$ and observation $o$.
Proposition 3. (Optimal stopping). Additional testing is optimal only when its expected marginal value exceeds its total cost, including monetary cost, burden, delay, and downstream consequences.
Proof. 
At a belief state b t , the provider can stop and choose the best non-testing action or continue testing. Bellman optimality selects the action with the highest expected value. Continuing is preferred only if the expected future value after the next observation, minus the total cost of obtaining that observation, exceeds the value of stopping. Otherwise, stopping dominates. □
This result gives a formal basis for diagnostic stewardship. Stopping is not a failure to gather information. Stopping is optimal when further information is unlikely to change action enough to justify its cost.
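The stop-or-continue logic can be illustrated with a small finite-horizon Bellman recursion over belief states. The sketch makes several simplifying assumptions that are not in the article: the disease state does not change between tests, repeat results are conditionally independent, there is no discounting ($\gamma = 1$), the number of remaining tests is capped, and the utilities, test characteristics, and test cost are hypothetical.

```python
# Hypothetical parameters, for illustration only.
SE, SP, TEST_COST = 0.85, 0.70, 0.02
U_TP, U_FP, U_FN, U_TN = 1.0, 0.0, 0.0, 1.0   # implies a treatment threshold of 0.5


def stop_value(b):
    """Expected utility of the best non-testing action at belief b."""
    treat = b * U_TP + (1 - b) * U_FP
    no_treat = b * U_FN + (1 - b) * U_TN
    return max(treat, no_treat)


def value(b, tests_left):
    """Return (optimal value, 'test' or 'stop') at belief b."""
    v_stop = stop_value(b)
    if tests_left == 0:
        return v_stop, "stop"
    p_pos = SE * b + (1 - SP) * (1 - b)
    p_neg = 1 - p_pos
    b_pos = SE * b / p_pos if p_pos > 0 else b
    b_neg = (1 - SE) * b / p_neg if p_neg > 0 else b
    # Bellman backup: pay the test cost, then act optimally on the new belief.
    v_continue = (
        -TEST_COST
        + p_pos * value(b_pos, tests_left - 1)[0]
        + p_neg * value(b_neg, tests_left - 1)[0]
    )
    return (v_continue, "test") if v_continue > v_stop else (v_stop, "stop")


for belief in (0.02, 0.45, 0.98):
    print(belief, value(belief, tests_left=2)[1])
```

With these assumed values the recursion recommends testing at a belief of 0.45 and stopping at 0.02 and 0.98, where no sequence of two results can move the belief across the threshold.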

4.1. Worked Sequential POMDP Walkthrough: Suspected Sepsis

A concrete example helps clarify how the POMDP component operates in practice. Consider an adult patient in the emergency department with suspected infection but without definitive evidence of sepsis at presentation. The pending decision is whether to start broad-spectrum antibiotics immediately, continue short-interval observation, or obtain additional biomarker information. The example is stylized and is intended to illustrate the calculation rather than define a clinical guideline.
Let the latent state be $S \in \{\text{NoSepsis}, \text{Sepsis}\}$. Let the initial belief be $b_0 = P(\text{Sepsis}) = 0.25$ after history, vital signs, physical examination, and routine laboratory information. The available actions are Treat, Observe, Test 1, and Test 2. Test 1 is a rapid screening biomarker with sensitivity $\mathrm{Se}_1 = 0.85$ and specificity $\mathrm{Sp}_1 = 0.70$. Test 2 is a more specific follow-up biomarker or confirmatory assessment with sensitivity $\mathrm{Se}_2 = 0.75$ and specificity $\mathrm{Sp}_2 = 0.90$. The model uses a treatment threshold of $0.50$: above this value, immediate treatment has higher expected utility than observation; below this value, observation is preferred unless additional information has positive conditional value.
The first decision is to order Test 1 because the initial belief lies in the action-changing region. A positive result updates the belief to
$$P(\text{Sepsis} \mid \text{Test 1}^{+}) = \frac{0.85(0.25)}{0.85(0.25) + (1 - 0.70)(0.75)} = 0.486.$$
This posterior sits just below the 0.50 treatment threshold, so a positive Test 1 alone does not yet justify treatment. The patient is neither clearly low risk nor clearly above the threshold for immediate action. Therefore, the conditional value of Test 2 is evaluated.
If Test 2 is ordered after a positive Test 1, the probability of a positive Test 2 is
$$P(\text{Test 2}^{+} \mid b = 0.486) = 0.75(0.486) + (1 - 0.90)(1 - 0.486) = 0.416.$$
A positive Test 2 increases the posterior to
$$P(\text{Sepsis} \mid \text{Test 1}^{+}, \text{Test 2}^{+}) = \frac{0.75(0.486)}{0.75(0.486) + 0.10(0.514)} = 0.876,$$
which favors treatment. A negative Test 2 lowers the posterior to
$$P(\text{Sepsis} \mid \text{Test 1}^{+}, \text{Test 2}^{-}) = \frac{0.25(0.486)}{0.25(0.486) + 0.90(0.514)} = 0.208,$$
which favors observation or an alternative diagnostic pathway. Thus, Test 2 has conditional value after a positive Test 1 because its two possible results lead to different preferred actions: treatment after a positive result and observation after a negative one.
A negative Test 1 leads to a different conclusion. The posterior becomes
$$P(\text{Sepsis} \mid \text{Test 1}^{-}) = \frac{(1 - 0.85)(0.25)}{(1 - 0.85)(0.25) + 0.70(0.75)} = 0.067.$$
At this belief state, the patient is well below the treatment threshold. Even a positive Test 2 would raise the belief only to about 0.35, which is still below 0.50. Unless the clinician assigns an extremely high utility loss to a missed case or observes new clinical deterioration, the conditional value of Test 2 is low. The POMDP therefore recommends stopping the biomarker sequence and choosing observation, reassessment, or evaluation for alternative diagnoses.
Table 1 summarizes the belief trajectory and stop-or-continue logic. Figure 3 provides the same logic as a pathway diagram. The important lesson is that the POMDP does not recommend serial testing automatically. It recommends the next test only when the current belief state makes the next observation likely to change the clinical action.
This walkthrough demystifies the POMDP formulation. The hidden disease state is not observed directly. Tests generate observations with known or estimated sensitivity and specificity. The belief state evolves after each result. The action set changes because the provider can treat, observe, or continue testing. The reward function enters through the treatment threshold and the cost of additional testing. The optimal policy is therefore a mapping from belief states to actions: test when information is likely to change management, treat when disease probability is sufficiently high, and stop testing when additional information has low marginal value.
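The arithmetic of the sepsis walkthrough can be reproduced with a short script. The parameter values are the stylized ones stated in this section; the helper names `update` and `can_change_action` are illustrative. The stop-or-continue check is a simplification that only asks whether the two possible Test 2 results fall on opposite sides of the 0.50 threshold, ignoring test cost.

```python
B0 = 0.25                 # initial belief P(Sepsis)
SE1, SP1 = 0.85, 0.70     # Test 1
SE2, SP2 = 0.75, 0.90     # Test 2
THRESHOLD = 0.50          # treatment threshold


def update(b, se, sp, positive):
    """Bayesian belief update after a positive or negative result."""
    if positive:
        return se * b / (se * b + (1 - sp) * (1 - b))
    return (1 - se) * b / ((1 - se) * b + sp * (1 - b))


def can_change_action(b, se, sp, threshold):
    """True if the two possible results land on opposite sides of the threshold."""
    return update(b, se, sp, True) >= threshold > update(b, se, sp, False)


b_pos = update(B0, SE1, SP1, positive=True)        # after Test 1 positive
b_neg = update(B0, SE1, SP1, positive=False)       # after Test 1 negative
p_t2_pos = SE2 * b_pos + (1 - SP2) * (1 - b_pos)   # P(Test 2 positive | Test 1 positive)
b_pos_pos = update(b_pos, SE2, SP2, positive=True)
b_pos_neg = update(b_pos, SE2, SP2, positive=False)

print([round(x, 3) for x in (b_pos, p_t2_pos, b_pos_pos, b_pos_neg, b_neg)])
# [0.486, 0.416, 0.876, 0.208, 0.067]
print(can_change_action(b_pos, SE2, SP2, THRESHOLD))  # True: continue to Test 2
print(can_change_action(b_neg, SE2, SP2, THRESHOLD))  # False: stop testing
```

The script carries the unrounded posterior forward rather than the rounded 0.486, and the results still agree with the values reported in the text to three decimal places.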

5. Clinical Utility Frontier and Diagnostic Dominance

A laboratory test can be dominated even when it has favorable test-performance characteristics. Define the expected utility of test $T_i$ at belief state $b$ as $\mathrm{EU}(T_i; b)$ and its total cost as $C(T_i; b)$. Test $T_i$ is weakly dominated by $T_j$ if
$$\mathrm{EU}(T_j; b) \ge \mathrm{EU}(T_i; b), \qquad C(T_j; b) \le C(T_i; b),$$
with at least one strict inequality. The clinical utility frontier is the set of non-dominated testing strategies across relevant belief states.
Proposition 4. (Diagnostic dominance). A dominated test should not be selected by a rational provider unless it has benefits not captured in the base utility function, such as availability, equity, patient preference, feasibility, or operational resilience.
Proof. 
If one test produces at least as much expected utility at no greater cost, and one inequality is strict, then the dominated test cannot maximize expected net value under the specified utility function. Selection of the dominated test can only be justified by adding omitted benefits or constraints to the model. □
The frontier concept is useful for diagnostic development and laboratory stewardship. It identifies redundant tests, tests that add value only after particular prior results, and rapid tests that dominate slower alternatives when delay carries clinical risk.
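The dominance definition translates directly into a filter over candidate strategies at a given belief state. In the sketch below, the strategy names, expected utilities, and costs are hypothetical placeholders, and the `Strategy` type and function names are illustrative.

```python
from typing import NamedTuple


class Strategy(NamedTuple):
    name: str
    expected_utility: float   # EU(T; b) at the belief state of interest
    total_cost: float         # C(T; b), in the same units for every strategy


def dominates(a, b):
    """True if a is at least as good as b on both criteria and strictly better on one."""
    no_worse = a.expected_utility >= b.expected_utility and a.total_cost <= b.total_cost
    strictly_better = a.expected_utility > b.expected_utility or a.total_cost < b.total_cost
    return no_worse and strictly_better


def utility_frontier(strategies):
    """Return the strategies that no other strategy dominates."""
    return [s for s in strategies if not any(dominates(o, s) for o in strategies)]


# Hypothetical candidates, for illustration only.
candidates = [
    Strategy("no_test", 0.85, 0.00),
    Strategy("rapid_assay", 0.90, 0.05),
    Strategy("slow_assay", 0.88, 0.08),
    Strategy("reflex_panel", 0.93, 0.10),
]
frontier = utility_frontier(candidates)
print([s.name for s in frontier])  # ['no_test', 'rapid_assay', 'reflex_panel']
```

Here the slow assay drops out because the rapid assay yields more expected utility at lower cost. In line with Proposition 4, choosing the slow assay would need to be justified by a benefit that is missing from the utility and cost columns.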

6. Preference-Sensitive and Equity-Aware Diagnostic Value

Diagnostic value can vary across patients and populations. Patients may differ in risk tolerance, ability to return for follow-up, out-of-pocket cost, transportation burden, fear of procedures, treatment preferences, and willingness to accept false-positive or false-negative risk. Subgroups may also differ in disease prevalence, disease spectrum, access to confirmatory care, and test performance [15,17]. A uniform testing threshold can therefore appear fair while producing unequal value.
Let g index a patient subgroup or preference profile. Subgroup-specific diagnostic value is
$$\mathrm{EVDI}_g(T; b) = \mathrm{EU}_g(T; b) - \mathrm{EU}_g(\text{NoTest}; b).$$
This expression allows priors, utilities, observation probabilities, follow-up access, costs, and burdens to vary by subgroup. A subgroup-specific threshold is not automatically equitable. It becomes equity-relevant when it is used to examine whether a uniform policy systematically underdiagnoses one group, overdiagnoses another group, or imposes downstream burdens unequally.
Prostate cancer screening and diagnostic follow-up illustrate the issue. A uniform PSA threshold for MRI referral or biopsy may treat equal PSA values as equal clinical situations. Yet the decision value of follow-up can differ when baseline risk, probability of clinically significant cancer, MRI availability, biopsy access, complication risk, and loss to follow-up vary across groups [23,24]. If one subgroup has a higher prior probability of clinically significant disease and worse access to timely MRI or biopsy, a uniform threshold may delay definitive evaluation for patients who face both higher disease risk and greater follow-up barriers. Conversely, if another subgroup has lower prior risk and reliable access to follow-up, the same threshold may expose more patients to false-positive cascades without comparable benefit.
DECIDE-Lab can evaluate this problem explicitly. The analyst defines subgroup-specific priors $p_g$, test characteristics $\mathrm{Se}_g$ and $\mathrm{Sp}_g$ when evidence supports variation, follow-up probability $F_g$, and utility functions $U_g(a, d)$ that include cancer detection, biopsy harm, overdiagnosis, anxiety, and the burden of additional visits. The framework then compares a uniform rule with subgroup-specific strategies that maximize expected utility subject to an equity constraint. One possible constraint is that the probability of missed clinically significant cancer should not exceed a prespecified level in any subgroup. Another is that expected net benefit should not fall below a minimum acceptable level for disadvantaged groups.
Figure 4 shows a stylized example. The uniform threshold treats the groups identically. The equity-adjusted threshold differs because the model accounts for subgroup-specific prior risk and follow-up constraints. This does not imply that demographic group membership should be used uncritically in clinical algorithms. It shows how a decision model can expose the consequences of ignoring heterogeneity. The ethical question is not whether thresholds differ, but whether the resulting policy improves fair outcomes without reinforcing bias or denying beneficial care.
The equity contribution of DECIDE-Lab is therefore diagnostic rather than prescriptive. It identifies where equal rules produce unequal expected consequences. It also separates empirical questions from normative questions. Empirical questions concern prevalence, performance, follow-up, and outcomes. Normative questions concern acceptable tradeoffs among utility maximization, equal opportunity for diagnosis, avoidance of harm, and allocation of scarce resources. Publication of both sets of assumptions would make diagnostic stewardship more transparent.
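A stylized calculation shows how subgroup-specific priors and follow-up access can change both diagnostic value and missed-disease probability under the same test and the same rule. Everything in the sketch is an assumption made for illustration: the utilities, the two subgroups, the 0.03 cap on missed disease, and in particular the modeling choice that an intended Treat action is carried out only with probability `follow_up`, and otherwise defaults to NoTreat.

```python
# Hypothetical utilities U[action][disease_state], shared by both subgroups.
U = {"Treat": {1: 0.8, 0: 0.9}, "NoTreat": {1: 0.0, 0: 1.0}}


def joint_eu(mass_d, mass_h, action):
    """Utility contribution of an action given diseased and healthy probability mass."""
    return mass_d * U[action][1] + mass_h * U[action][0]


def realized(mass_d, mass_h, follow_up):
    """Return (expected utility, missed-disease mass) after the best intended action."""
    eu_treat = joint_eu(mass_d, mass_h, "Treat")
    eu_wait = joint_eu(mass_d, mass_h, "NoTreat")
    if eu_treat > eu_wait:
        # Assumption: treatment happens only if follow-up is completed.
        eu = follow_up * eu_treat + (1 - follow_up) * eu_wait
        missed = (1 - follow_up) * mass_d
    else:
        eu, missed = eu_wait, mass_d
    return eu, missed


def subgroup_value(p, se, sp, follow_up, cost):
    """Return (EVDI_g, probability of missed disease under test-guided care)."""
    eu_none, _ = realized(p, 1 - p, follow_up)
    eu_pos, miss_pos = realized(se * p, (1 - sp) * (1 - p), follow_up)
    eu_neg, miss_neg = realized((1 - se) * p, sp * (1 - p), follow_up)
    return eu_pos + eu_neg - cost - eu_none, miss_pos + miss_neg


# Hypothetical subgroups: (prior probability, follow-up probability).
SUBGROUPS = {"A": (0.15, 0.60), "B": (0.05, 0.95)}
MAX_MISSED = 0.03   # hypothetical equity constraint on missed disease

report = {
    g: subgroup_value(p, se=0.90, sp=0.80, follow_up=f, cost=0.01)
    for g, (p, f) in SUBGROUPS.items()
}
for g, (evdi_g, missed_g) in report.items():
    print(g, round(evdi_g, 4), round(missed_g, 4), missed_g <= MAX_MISSED)
```

In this example the test has positive value in both subgroups, yet subgroup A, with a higher prior and weaker follow-up access, exceeds the assumed cap on missed disease while subgroup B does not. The sketch only exposes that gap. Whether and how to adjust the rule remains the normative question discussed above.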

7. Results: Illustrative Quantitative Application

The illustrative application considers suspected acute coronary syndrome and serial troponin testing. The example is stylized and intended to demonstrate the decision logic rather than estimate a definitive clinical policy. Acute coronary syndrome is useful because the clinical decision is time-sensitive, the disease state is partially observed, and repeat testing is common [19,20,21].
The provider begins with a pretest probability of myocardial infarction. The available actions are discharge, observe with repeat troponin, admit, or treat. The harms include missed myocardial infarction, unnecessary admission, procedure burden, testing cost, and delay. The model compares no further testing, single troponin-guided action, and sequential testing. Figure 5 shows how diagnostic value varies with the prior probability of disease.
The main result is qualitative but important. Testing has low value when the pretest probability is very low because discharge or no acute treatment is already preferred. Testing also has low value when the pretest probability is very high because treatment or admission is already preferred. Testing has the greatest value in the intermediate range, where either a positive or negative result can change disposition or treatment. This is the action-changing region predicted by Proposition 1.
Figure 6 illustrates how value changes jointly with prevalence and test performance. Improvements in sensitivity and specificity matter most when they alter action around a clinically relevant threshold. This pattern supports the claim that test performance should be interpreted through decision value rather than accuracy alone.

8. Clinical Applications: How the Framework Can Be Used

The five applications below show how DECIDE-Lab can guide real diagnostic pathways. Each application follows the same structure: identify the pending clinical decision, define the belief state, map laboratory observations to posterior beliefs, compare actions by expected utility, and stop testing when the next result is unlikely to change management.

8.1. Acute Coronary Syndrome and Serial Troponin Testing

In suspected acute coronary syndrome, the immediate decision is not simply whether myocardial infarction is present. The provider must decide whether to discharge, observe, repeat troponin, admit, consult cardiology, start therapy, or pursue additional imaging. DECIDE-Lab begins by estimating the current belief state from symptoms, electrocardiographic findings, risk factors, time from symptom onset, and initial troponin.
A single troponin result has value when it moves the posterior probability across a discharge, observation, admission, or treatment threshold. A repeat troponin has value only if the patient remains in an intermediate belief state after the first result. Thus, the framework can compare 0/1-hour, 0/2-hour, and 0/3-hour strategies by asking which patients remain in the action-changing region after each observation [19,20,21].
The framework can also clarify why faster tests may be valuable even when their sensitivity and specificity are similar to slower alternatives. If a rapid result safely shortens observation time, reduces crowding, and avoids unnecessary admission, speed enters the utility function. In that setting, the rapid test may lie on the clinical utility frontier even if another strategy has slightly better analytic performance.

8.2. Sepsis, Procalcitonin, Lactate, and Antibiotic Decisions

Sepsis illustrates dynamic diagnostic value because the latent state can worsen quickly and the harms of delay are asymmetric. The provider must decide whether to start antibiotics, obtain cultures, order lactate or procalcitonin, give fluids, admit to intensive care, narrow therapy, or stop therapy. The belief state includes vital signs, suspected source, immune status, organ dysfunction, prior antibiotics, local epidemiology, and prior laboratory results.
DECIDE-Lab separates two uses of biomarkers. Early in evaluation, lactate or procalcitonin may help decide whether to escalate care or start antibiotics. Later, repeated procalcitonin may help decide whether to discontinue antibiotics and reduce avoidable exposure [22]. These decisions have different thresholds. The same test can therefore have low value for initiating antibiotics in a high-risk patient but high value for stopping antibiotics after clinical stabilization.
A POMDP representation is particularly useful because sepsis is a transition process. States may include no bacterial infection, localized infection, early sepsis, septic shock, and recovery. Actions affect future outcomes, and observations arrive over time. The model can evaluate when new biomarker information improves decisions enough to justify delay, cost, and downstream consequences.

8.3. Prostate Cancer, PSA, MRI, and Biopsy Selection

Prostate cancer evaluation involves repeated decisions under uncertainty. The provider must decide whether to repeat PSA, order reflex biomarkers, obtain MRI, refer to urology, perform biopsy, continue surveillance, or treat. The relevant disease state is often not any prostate cancer, but clinically significant prostate cancer. The belief state may include age, family history, race, PSA level, PSA velocity, digital rectal examination, prior biopsy, and patient preferences [23,24].
The framework identifies where each test is most useful. PSA has screening value but limited specificity. MRI may have high decision value in intermediate-risk patients because it can reduce unnecessary biopsy while preserving detection of clinically significant disease. Biopsy may dominate when the posterior probability already exceeds the biopsy threshold. Conversely, additional imaging may have little value in very low-risk patients because surveillance remains preferred regardless of result.
The POMDP extension supports longitudinal surveillance. Repeated PSA values, MRI findings, biopsy results, and clinical progression update the belief state. Utilities should include cancer detection, complications of biopsy, overdiagnosis, anxiety, downstream treatment harms, and patient preferences. This application shows why targeted diagnostic sequencing can be more valuable than uniform testing.

8.4. Autoimmune Disease, ANA, ENA, anti-dsDNA, and Complement Testing

Autoimmune disease evaluation often creates cascades of laboratory testing. A positive antinuclear antibody test can trigger extractable nuclear antigen panels, anti-dsDNA testing, complement levels, inflammatory markers, urinalysis, and referral. DECIDE-Lab asks whether each step changes referral, monitoring, or treatment decisions enough to justify its cost and downstream effects.
The prior belief should reflect clinical features, organ involvement, medication exposure, family history, baseline laboratory abnormalities, and symptom evolution. ANA testing may have value in patients with compatible symptoms and intermediate pretest probability. It may have low value in patients with nonspecific symptoms and very low pretest probability because false-positive follow-up may dominate benefit. Specific antibody panels may have high conditional value after a positive ANA in a clinically compatible patient, but low value after an isolated low-titer ANA in an otherwise low-risk patient [25].
A sequential model helps prevent indiscriminate reflex testing. The next test should be selected only if it can change the decision to refer, monitor renal involvement, initiate therapy, or stop diagnostic evaluation. The framework therefore supports more transparent autoimmune test stewardship.

8.5. Tuberculosis: TST, IGRA, Nucleic Acid Testing, and Culture

Tuberculosis diagnosis differs by clinical setting, prevalence, exposure history, immune status, and public health consequences. The provider may need to decide whether to treat latent infection, isolate a patient, start empiric therapy, order nucleic acid amplification testing, send culture, or begin contact investigation. The belief state is informed by symptoms, exposure risk, country of origin, immune suppression, radiographic findings, and local prevalence [26,27].
DECIDE-Lab clarifies why test value changes across settings. In a low-prevalence outpatient setting, specificity and avoidance of unnecessary treatment may dominate. In a high-risk inpatient setting with possible active tuberculosis, sensitivity, speed, and transmission prevention may dominate. A rapid molecular test may create high value by shortening time to isolation or treatment even when culture remains necessary for confirmation and susceptibility testing.
A POMDP can represent latent infection, active drug-susceptible disease, active drug-resistant disease, treated disease, and no infection. Rewards include patient outcomes, transmission prevention, drug toxicity, isolation burden, false-positive treatment, and public health cost. This application shows that diagnostic value can include both individual and population-level consequences.

8.6. Cross-case Workflow

Across all five disease states, DECIDE-Lab provides a practical workflow. First, name the decision that might change. Second, estimate the current belief state. Third, define the relevant action thresholds. Fourth, specify the observation model using sensitivity, specificity, timing, and feasibility. Fifth, compare expected utility with and without the test. Sixth, if uncertainty remains, evaluate the conditional value of the next test. Seventh, stop testing when additional information is unlikely to change action enough to justify its cost, burden, or delay.
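The core of this workflow can be sketched for the simplest case of one binary disease state, two actions (treat or observe), and one binary test. The snippet compares expected utility with and without the test and subtracts a test cost. The utility values, the test characteristics, and the function names are hypothetical and serve only to illustrate the calculation; a real application would elicit utilities and estimate test performance locally.

```python
def bayes_posterior(prior, sens, spec, positive):
    """Posterior probability of disease after a positive or negative result."""
    if positive:
        num = sens * prior
        den = num + (1 - spec) * (1 - prior)
    else:
        num = (1 - sens) * prior
        den = num + spec * (1 - prior)
    return num / den

def best_expected_utility(p, u):
    """Expected utility of the better action at disease probability p."""
    treat = p * u["treat_disease"] + (1 - p) * u["treat_healthy"]
    observe = p * u["observe_disease"] + (1 - p) * u["observe_healthy"]
    return max(treat, observe)

def net_value_of_test(prior, sens, spec, u, test_cost=0.0):
    """Expected utility with the test, minus without the test, minus its cost."""
    p_pos = sens * prior + (1 - spec) * (1 - prior)
    with_test = (
        p_pos * best_expected_utility(bayes_posterior(prior, sens, spec, True), u)
        + (1 - p_pos) * best_expected_utility(bayes_posterior(prior, sens, spec, False), u)
    )
    return with_test - best_expected_utility(prior, u) - test_cost

# Hypothetical utilities on a 0-1 scale (assumptions, not from the article).
UTILITIES = {
    "treat_disease": 0.90,
    "treat_healthy": 0.95,
    "observe_disease": 0.50,
    "observe_healthy": 1.00,
}

# Hypothetical test with sensitivity 0.85 and specificity 0.70.
for prior in (0.01, 0.25, 0.90):
    value = net_value_of_test(prior, 0.85, 0.70, UTILITIES)
    print(f"prior={prior:.2f}  value of information={value:.4f}")
```

Under these assumptions the value of information is positive only for the intermediate prior, because at very low or very high priors the preferred action is the same after either result. A positive `test_cost` turns the same function into the stopping rule of the final step: testing stops when the net value is no longer positive.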

9. Discussion

DECIDE-Lab reframes diagnostic laboratory testing as decision-centered information acquisition. This reframing is the main contribution of the article. It explains why laboratory tests should not be evaluated only by sensitivity, specificity, predictive values, or likelihood ratios. Those quantities are essential inputs, but the clinical endpoint is improved action.
The framework also helps interpret apparent conflicts between accuracy and value. A test with excellent discrimination may have low value if the provider’s action would not change. A test with modest discrimination may have high value if it moves an intermediate-risk patient across a decision threshold. A rapid test may dominate a slower test when delay carries clinical harm. A confirmatory test may have high value only after a particular first result. These distinctions are difficult to express with accuracy metrics alone.
For laboratories, DECIDE-Lab provides a formal basis for reflex testing and stewardship rules. A reflex test should be ordered when its conditional value remains positive after the preceding result. It should be suppressed when the preceding result already places the patient so far below or above an action threshold that no result of the reflex test could move the posterior across it. For clinicians, the framework provides a language for deciding when to stop testing. For health systems and payers, it distinguishes tests that improve expected utility from tests that add cost without changing management.
The clinical utility frontier adds a useful comparison tool. A test may be dominated if another strategy delivers equal or greater expected utility at equal or lower cost. This frontier can be used to compare individual tests, panels, reflex algorithms, and serial monitoring strategies. It can also support diagnostic development by identifying settings where a new assay would need to improve speed, sensitivity, specificity, cost, or downstream action to create incremental value.
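The dominance rule can be sketched in a few lines of Python. The strategy names, costs, and expected utilities below are hypothetical placeholders, and the strictness condition (a dominating strategy must be strictly better on cost or utility) is an added assumption so that two identical strategies do not eliminate each other.

```python
from typing import NamedTuple

class Strategy(NamedTuple):
    name: str
    cost: float
    expected_utility: float

def is_dominated(candidate, strategies):
    """True if another strategy is no worse on both axes and better on one."""
    return any(
        other.cost <= candidate.cost
        and other.expected_utility >= candidate.expected_utility
        and (other.cost < candidate.cost
             or other.expected_utility > candidate.expected_utility)
        for other in strategies
    )

def utility_frontier(strategies):
    """Non-dominated strategies, ordered by increasing cost."""
    frontier = [s for s in strategies if not is_dominated(s, strategies)]
    return sorted(frontier, key=lambda s: s.cost)

# Hypothetical strategies (illustrative numbers only).
STRATEGIES = [
    Strategy("no_test", 0.0, 0.9375),
    Strategy("single_test", 20.0, 0.9488),
    Strategy("serial_testing", 45.0, 0.9550),
    Strategy("broad_panel", 60.0, 0.9500),
]

for s in utility_frontier(STRATEGIES):
    print(f"{s.name}: cost={s.cost:.0f}, expected utility={s.expected_utility:.4f}")
```

In this illustration the broad panel is dropped because serial testing delivers higher expected utility at lower cost. The remaining strategies trade cost against utility, and choosing among them requires a willingness-to-pay judgment that the frontier itself does not supply.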
The preference-sensitive and equity-aware extension broadens the framework. Diagnostic value depends not only on biology and accuracy but also on patient preferences, follow-up access, cost-sharing, feasibility, and subgroup-specific performance. A single testing policy may not create equal value for all patients. Formal subgroup-specific VOI can help reveal when policies improve equity and when they reproduce inequity. The prostate cancer example illustrates why this point matters: a uniform PSA-related threshold can look consistent while producing different downstream access, harm, and missed-disease consequences across groups.

9.1. Path to Implementation in Clinical Decision Support

The practical path to implementation begins with narrow use cases rather than a comprehensive diagnostic optimizer. A health system could first deploy DECIDE-Lab for a single pathway with frequent testing, clear actions, and measurable outcomes, such as serial troponin testing, procalcitonin-guided antibiotic discontinuation, or autoimmune reflex testing. The model would sit behind an electronic health record clinical decision support rule. At the time of ordering, the system would retrieve structured predictors, recent laboratory results, timing information, and the pending order. It would then estimate the current belief state, calculate whether the proposed test is likely to change management, and return a concise recommendation.
The recommendation should be designed for clinical workflow. A useful display would not show the full POMDP. It would show the current risk category, the likely action-changing range, the expected value of the next test, and the reason for the recommendation. For example, the system might state that repeat testing is low value because the prior result already places the patient below the discharge threshold, or that repeat testing remains useful because the patient is still in the intermediate-risk region. This explanation is important for clinician trust and for avoiding alert fatigue.
Computational burden depends on the size of the state and action spaces. Small binary or few-state models can be evaluated in real time with precomputed threshold tables. Many laboratory stewardship applications do not require online solution of a large POMDP. A practical implementation can pre-solve the model offline across a grid of priors, test results, time intervals, and utility values. The EHR tool then performs a table lookup or a small Bayesian update at the bedside. Larger models can use approximate dynamic programming, finite-horizon truncation, sparse action spaces, or policy rules learned from simulation [12,13,14].
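One way to build such a threshold table for a binary disease state with a single action threshold is to invert Bayes' rule in odds form. Offline, the code computes the band of prior probabilities within which a positive and a negative result would lead to different actions; at the bedside, the check reduces to a lookup and a comparison. The test names, their sensitivity and specificity, and the action threshold are hypothetical, and the sketch ignores test cost and delay, which would narrow the band.

```python
def odds(p):
    return p / (1 - p)

def prob(o):
    return o / (1 + o)

def action_changing_band(threshold, sens, spec):
    """Priors for which a positive result lands above the threshold and a
    negative result lands below it (assumes LR+ > 1 > LR-)."""
    lr_pos = sens / (1 - spec)
    lr_neg = (1 - sens) / spec
    lower = prob(odds(threshold) / lr_pos)  # positive posterior equals threshold
    upper = prob(odds(threshold) / lr_neg)  # negative posterior equals threshold
    return lower, upper

# Hypothetical tests: name -> (sensitivity, specificity).
TESTS = {"test_a": (0.85, 0.70), "test_b": (0.75, 0.90)}
ACTION_THRESHOLD = 0.10  # hypothetical treat-versus-observe threshold

# Offline step: precompute one band per test.
BAND_TABLE = {
    name: action_changing_band(ACTION_THRESHOLD, sens, spec)
    for name, (sens, spec) in TESTS.items()
}

def recommend(test_name, prior):
    """Bedside step: table lookup plus an interval check."""
    lower, upper = BAND_TABLE[test_name]
    if lower < prior < upper:
        return "result may change action"
    return "low value: neither result changes action"

for p in (0.01, 0.25, 0.90):
    print(p, recommend("test_a", p))
```

A fuller table would add dimensions for time since the last result and for alternative utility vectors, as the paragraph above describes; the lookup logic would stay the same.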
Implementation also requires governance; laboratory leaders and clinicians should review local prevalence estimates, test-performance inputs, and recommended thresholds. After deployment, the system should be monitored for ordering behavior, missed diagnoses, false-positive cascades, length of stay, time to treatment, follow-up completion, subgroup performance, and override rates. DECIDE-Lab is therefore not only a mathematical model. It is a structure for accountable diagnostic decision support.
This study has limitations. The quantitative application is illustrative and uses stylized assumptions rather than patient-level calibration. Utility estimates can be difficult to elicit, and stakeholders may disagree about how to value outcomes, burdens, and costs; for that reason, DECIDE-Lab should report threshold and sensitivity analyses rather than rely only on a single preferred utility vector. Test sensitivity and specificity may vary by disease spectrum, specimen timing, platform, preanalytic conditions, and patient subgroup. POMDPs can become computationally demanding when state and action spaces grow. These limitations identify the next empirical steps rather than undermine the framework.
Future work should calibrate DECIDE-Lab using electronic health records, laboratory information systems, claims data, prospective diagnostic studies, and pragmatic trials. Priority settings include serial troponin protocols, sepsis biomarker stewardship, autoimmune reflex testing, tuberculosis pathways, and prostate cancer risk stratification. Future work should also evaluate whether VOI-guided clinical decision support improves outcomes, reduces low-value testing, and avoids alert fatigue.

10. Conclusions

Laboratory tests create value when they improve decisions. DECIDE-Lab provides a rigorous and clinically interpretable framework for evaluating that value. By combining VOI with POMDPs, the framework links test characteristics to posterior beliefs, action thresholds, sequential testing, clinical utility frontiers, and equity-aware value. The approach supports a shift from asking whether a test is accurate to asking whether the test changes care in a way that benefits patients.

Author Contributions

Conceptualization, C.A.; methodology, C.A.; formal analysis, C.A.; writing–original draft preparation, C.A.; writing–review and editing, C.A. The author has read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable. This manuscript presents a methodological framework and illustrative simulation and does not analyze human subjects data.

Data Availability Statement

All data used in the illustrative analysis are generated from the parameter values reported in the supplemental code. No patient-level data were used.

Conflicts of Interest

The author declares no conflict of interest.

References

  1. Pauker, S.G.; Kassirer, J.P. The threshold approach to clinical decision making. N. Engl. J. Med. 1980, 302, 1109–1117.
  2. Jaeschke, R.; Guyatt, G.H.; Sackett, D.L. Users’ guides to the medical literature. III. How to use an article about a diagnostic test. B. What are the results and will they help me in caring for my patients? JAMA 1994, 271, 703–707.
  3. Hunink, M.G.M.; Weinstein, M.C.; Wittenberg, E.; Drummond, M.F.; Pliskin, J.S.; Wong, J.B.; Glasziou, P.P. Decision Making in Health and Medicine: Integrating Evidence and Values; Cambridge University Press: Cambridge, UK, 2014.
  4. Sox, H.C.; Higgins, M.C.; Owens, D.K. Medical Decision Making, 2nd ed.; Wiley-Blackwell: Hoboken, NJ, USA, 2013.
  5. Vickers, A.J.; Elkin, E.B. Decision curve analysis: A novel method for evaluating prediction models. Med. Decis. Mak. 2006, 26, 565–574.
  6. Howard, R.A. Information value theory. IEEE Trans. Syst. Sci. Cybern. 1966, 2, 22–26.
  7. Raiffa, H.; Schlaifer, R. Applied Statistical Decision Theory; Harvard University Press: Cambridge, MA, USA, 1961.
  8. Berger, J.O. Statistical Decision Theory and Bayesian Analysis, 2nd ed.; Springer: New York, NY, USA, 1985.
  9. Parmigiani, G.; Inoue, L. Decision Theory: Principles and Approaches; Wiley: Chichester, UK, 2009.
  10. Weinstein, M.C.; Fineberg, H.V. Clinical Decision Analysis; W.B. Saunders: Philadelphia, PA, USA, 1980.
  11. Smallwood, R.D.; Sondik, E.J. The optimal control of partially observable Markov processes over a finite horizon. Oper. Res. 1973, 21, 1071–1088.
  12. Kaelbling, L.P.; Littman, M.L.; Cassandra, A.R. Planning and acting in partially observable stochastic domains. Artif. Intell. 1998, 101, 99–134.
  13. Puterman, M.L. Markov Decision Processes: Discrete Stochastic Dynamic Programming; Wiley: New York, NY, USA, 1994.
  14. Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction, 2nd ed.; MIT Press: Cambridge, MA, USA, 2018.
  15. Bossuyt, P.M.; Reitsma, J.B.; Bruns, D.E.; Gatsonis, C.A.; Glasziou, P.P.; Irwig, L.; Lijmer, J.G.; Moher, D.; Rennie, D.; de Vet, H.C.W.; et al. STARD 2015: An updated list of essential items for reporting diagnostic accuracy studies. BMJ 2015, 351, h5527.
  16. Grimes, D.A.; Schulz, K.F. Refining clinical diagnosis with likelihood ratios. Lancet 2005, 365, 1500–1505.
  17. Ransohoff, D.F.; Feinstein, A.R. Problems of spectrum and bias in evaluating the efficacy of diagnostic tests. N. Engl. J. Med. 1978, 299, 926–930.
  18. Gold, M.R.; Siegel, J.E.; Russell, L.B.; Weinstein, M.C. (Eds.) Cost-Effectiveness in Health and Medicine; Oxford University Press: New York, NY, USA, 1996.
  19. Thygesen, K.; Alpert, J.S.; Jaffe, A.S.; Chaitman, B.R.; Bax, J.J.; Morrow, D.A.; White, H.D. Fourth universal definition of myocardial infarction. Circulation 2018, 138, e618–e651.
  20. Than, M.; Cullen, L.; Reid, C.M.; Lim, S.H.; Aldous, S.; Ardagh, M.W.; Peacock, W.F.; Parsonage, W.A.; Ho, H.F.; Greenslade, J.; et al. A 2-h diagnostic protocol to assess patients with chest pain symptoms in the Asia-Pacific region. JAMA Intern. Med. 2011, 171, 1831–1838.
  21. Chew, D.P.; Lambrakis, K.; Blyth, A.; Seshadri, A.; Edmonds, M.J.R.; Briffa, T.; Cullen, L.; Quinn, S.; Karnon, J.; Chuang, A.; et al. A randomized trial of a 1-hour troponin T protocol in suspected acute coronary syndromes. Circulation 2019, 140, 1543–1556.
  22. Schuetz, P.; Wirz, Y.; Sager, R.; Christ-Crain, M.; Stolz, D.; Tamm, M.; Bouadma, L.; Luyt, C.E.; Wolff, M.; Chastre, J.; et al. Procalcitonin to initiate or discontinue antibiotics in acute respiratory tract infections. Cochrane Database Syst. Rev. 2017, CD007498.
  23. Moyer, V.A. Screening for prostate cancer: U.S. Preventive Services Task Force recommendation statement. Ann. Intern. Med. 2012, 157, 120–134.
  24. Ahmed, H.U.; El-Shater Bosaily, A.; Brown, L.C.; Gabe, R.; Kaplan, R.; Parmar, M.K.; Collaco-Moraes, Y.; Ward, K.; Hindley, R.G.; Freeman, A.; et al. Diagnostic accuracy of multi-parametric MRI and TRUS biopsy in prostate cancer: The PROMIS study. Lancet 2017, 389, 815–822.
  25. Shiboski, C.H.; Shiboski, S.C.; Seror, R.; Criswell, L.A.; Labetoulle, M.; Lietman, T.M.; Rasmussen, A.; Scofield, H.; Vitali, C.; Bowman, S.J.; et al. 2016 ACR/EULAR classification criteria for primary Sjögren’s syndrome. Ann. Rheum. Dis. 2017, 76, 9–16.
  26. Lewinsohn, D.M.; Leonard, M.K.; LoBue, P.A.; Cohn, D.L.; Daley, C.L.; Desmond, E.; Keane, J.; Lewinsohn, D.A.; Loeffler, A.M.; Mazurek, G.H.; et al. Official ATS/IDSA/CDC clinical practice guidelines: Diagnosis of tuberculosis in adults and children. Clin. Infect. Dis. 2017, 64, e1–e33.
  27. World Health Organization. WHO Consolidated Guidelines on Tuberculosis. Module 3: Diagnosis; World Health Organization: Geneva, Switzerland, 2023.
Figure 1. DECIDE-Lab workflow. The framework begins with the clinical action that might change, not with the laboratory test itself. Test performance enters through the observation model, but clinical value depends on posterior belief updating, decision thresholds, and expected utility.
Figure 2. Illustrative utility sensitivity analysis for sepsis biomarker testing. As the utility loss assigned to a missed sepsis case increases, the value of a high-sensitivity testing strategy rises. The purpose is to show how DECIDE-Lab can test robustness when utilities are uncertain or contested. Values are stylized and generated from the supplemental code.
Figure 3. Worked POMDP walkthrough for suspected sepsis. The pathway begins with an initial belief state of 0.25. A positive first test leaves the patient near the treatment threshold, so the conditional value of a second test is high. A negative first test lowers the belief state enough that the model stops the biomarker sequence. Values are stylized and generated from the supplemental code.
Figure 4. Illustrative equity-aware prostate cancer example. A uniform PSA/MRI threshold may not produce equal decision value when subgroup prior risk and access to follow-up differ. DECIDE-Lab can compare uniform and subgroup-specific strategies under explicit fairness constraints. Values are stylized and intended to demonstrate the modeling logic.
Figure 5. Illustrative relationship between pretest probability and expected value of diagnostic information. Test value concentrates in the intermediate-risk region, where results are most likely to change action. Values are stylized and generated from the supplemental code.
Figure 6. Illustrative value surface for diagnostic testing as a function of prior probability and test performance. The surface highlights that high accuracy does not guarantee high value when posterior beliefs do not cross action thresholds. Values are stylized and generated from the supplemental code.
Table 1. Step-by-step POMDP walkthrough for a stylized suspected sepsis pathway.

| Step | Belief state | Action | Result and posterior | Decision implication |
|---|---|---|---|---|
| Initial assessment | $b_0 = 0.25$ | Order Test 1 | Not yet observed | Prior is intermediate; information can change action. |
| After Test 1+ | $b_1 = 0.486$ | Evaluate Test 2 | Near treatment threshold | Continue testing because Test 2 can separate treat from observe. |
| After Test 1+, Test 2+ | $b_2 = 0.876$ | Stop testing | Posterior high | Treat or escalate; more biomarker testing has low marginal value. |
| After Test 1+, Test 2− | $b_2 = 0.208$ | Stop testing | Posterior lower | Observe, reassess, or pursue alternative diagnoses. |
| After Test 1− | $b_1 = 0.067$ | Stop biomarker sequence | Posterior low | Observation or alternative workup preferred unless clinical status changes. |
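The belief states in Table 1 follow from repeated application of Bayes' rule, with each posterior serving as the prior for the next test. The sketch below reproduces them in Python. This section does not state the sensitivity and specificity of the two tests, so the values used here (0.85/0.70 for Test 1 and 0.75/0.90 for Test 2) are assumptions chosen because they match the tabulated posteriors to three decimals; the supplemental code may use different parameters.

```python
def update(prior, sens, spec, positive):
    """Posterior probability of sepsis after one binary test result."""
    if positive:
        num = sens * prior
        den = num + (1 - spec) * (1 - prior)
    else:
        num = (1 - sens) * prior
        den = num + spec * (1 - prior)
    return num / den

# Assumed test characteristics (inferred to match Table 1, not reported here).
TEST_1 = {"sens": 0.85, "spec": 0.70}
TEST_2 = {"sens": 0.75, "spec": 0.90}

b0 = 0.25                                          # initial belief from Table 1
b1_pos = update(b0, **TEST_1, positive=True)       # after Test 1 positive
b1_neg = update(b0, **TEST_1, positive=False)      # after Test 1 negative
b2_pos = update(b1_pos, **TEST_2, positive=True)   # Test 1 positive, Test 2 positive
b2_neg = update(b1_pos, **TEST_2, positive=False)  # Test 1 positive, Test 2 negative

for label, belief in [
    ("b1 after Test 1+", b1_pos),
    ("b1 after Test 1-", b1_neg),
    ("b2 after Test 1+, Test 2+", b2_pos),
    ("b2 after Test 1+, Test 2-", b2_neg),
]:
    print(f"{label}: {belief:.3f}")
```

Test 2 is evaluated only on the Test 1 positive branch, which mirrors the stop decision in the table: after a negative Test 1 the belief is already low enough that the biomarker sequence ends.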
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.

Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.
