Submitted: 07 August 2025
Posted: 08 August 2025
Abstract
Keywords:
1. Introduction
- the number of experts that can be involved and the number of values that can be elicited at one time.
- the use of group discussions and the handling of group dynamics
- the calibration and/or the weighting of the responses from the experts
- the handling of cognitive biases
- the methods used to aggregate and summarize individual responses from the group.
2. The Sheffield Elicitation Framework
2.1. Structured Framework
2.2. A Group-Based Approach
2.3. Training Experts in Quantifying Their Uncertainty
2.4. Eliciting Probability Distributions
2.5. Facilitator Role
2.6. Steps in the SHELF Method:
2.6.1. Preparation and Briefing
2.6.2. Training
2.6.3. Eliciting Individual Judgements
2.6.4. Group Discussion
2.6.5. Consensus Building
2.6.6. Fitting a Distribution
2.6.7. Documentation and Transparency
3. IDEA Protocol
3.1. A Group-Based Approach
3.2. Key Steps in the IDEA Method:
3.2.1. Investigate
- Question:
- 'What will be the average batch nonconformance rate (X) to produce items ‘A’ by manufacturer ‘B’ given that the process is being transferred to a new factory in location C in 2026?'
- 4-step elicitation:
- Realistically, what do you think the lowest plausible value for the nonconformance rate, X, will be?
- Realistically, what do you think the highest plausible value for nonconformance rate, X, will be?
- Realistically, what is your best guess for nonconformance rate, X?
- How confident are you that your interval, from lowest to highest, could capture the true value of the nonconformance rate, X? Please enter a number between 50% and 100%.
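Because each expert may answer the fourth question with a different confidence level, the IDEA guidance (Hemming et al., 2018) standardizes the elicited intervals to a common credible level by linear extrapolation about the best guess before aggregation. A minimal sketch of that rule (the 90% target level and the example numbers are illustrative assumptions):

```python
def standardize_interval(lowest, highest, best, confidence, target=0.90):
    """Linearly extrapolate an expert's interval to a common credible level.

    `confidence` is the expert's stated chance (0.5 to 1.0) that
    [lowest, highest] contains the true value; the interval is widened
    (or narrowed) about the best guess so that all experts' intervals
    are comparable at the `target` level.
    """
    scale = target / confidence
    lo = best - (best - lowest) * scale
    hi = best + (highest - best) * scale
    return max(lo, 0.0), hi  # a nonconformance rate cannot be negative

# e.g. an expert gives 1%..10% with best guess 3% at 80% confidence
lo, hi = standardize_interval(0.01, 0.10, 0.03, 0.80)
```

An interval stated at exactly the target level is returned unchanged, which makes the adjusted intervals directly comparable across experts.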
3.2.2. Discuss
3.2.3. Estimate
3.2.4. Aggregate
3.3. Documentation and Transparency
4. The Classical Method
4.1. Principles of the Classical Method:
4.1.1. Calibration
4.1.2. Information Score
4.1.3. Scoring Rules
4.2. Key Steps in the Classical Method:
4.2.1. Preparation
4.2.2. Elicitation
4.2.3. Aggregation
4.3. Documentation and Transparency
5. Elicitation Methods – A Summary of the Differences
| | SHELF Protocol | IDEA Protocol | Classical Method |
|---|---|---|---|
| Elicitation Process | Facilitated group discussion aiming for consensus | Independent estimates before and after discussion | Individual estimates with no group discussion |
| Group size | Small (1 to 6) | Medium (6 to 12) | Large (> 6) |
| Aggregation Method | Behavioural (consensus-building) | Unweighted mathematical (e.g., mean, median) | Performance-weighted based on calibration |
| Role of Facilitator | Active, driving consensus | Minimal, process-focused | Minimal, mathematically focused |
| Focus | Consensus-building | Independent judgements | Individual expert's performance |
| Discussion and Group Dynamics | Managed to reach consensus | Group discussion to share reasoning, but not to force consensus | No group discussion; focuses on individual estimates |
| Bias Mitigation | Facilitator-driven, to reduce anchoring and overconfidence | Process-driven, to reduce anchoring and overconfidence | Bias handled mathematically via calibration and performance weighting |
| Type of Judgements | Shared probability distributions | Individual probability estimates | Individual probability estimates |
| Application Context | Regulatory decision-making, policy assessment | Risk analysis, forecasting | Safety, engineering, environmental risk |
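The aggregation row can be made concrete. Both the IDEA protocol and the Classical Method combine the individual distributions mathematically with a linear opinion pool, differing only in the weights: equal weights for IDEA's unweighted aggregation, calibration-based weights for the Classical Method. A sketch (the expert distributions and weights below are invented for illustration, not taken from the text):

```python
import numpy as np
from scipy.stats import norm

def linear_pool(expert_densities, weights, x):
    """Evaluate a weighted mixture (linear opinion pool) of expert densities at x."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                                   # normalize weights to sum to 1
    return w @ np.array([f(x) for f in expert_densities])

# Three hypothetical experts' distributions for a nonconformance proportion
experts = [norm(0.02, 0.005).pdf, norm(0.03, 0.010).pdf, norm(0.05, 0.020).pdf]
x = np.linspace(0.0, 0.1, 201)

equal = linear_pool(experts, [1, 1, 1], x)               # IDEA-style unweighted pool
weighted = linear_pool(experts, [0.70, 0.25, 0.05], x)   # Classical-style performance weights
```

SHELF deliberately avoids this mathematical step, replacing it with the behavioural consensus described in Section 2.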
6. Discussion
6.1. Weighting and Aggregation
6.2. Bayesian Aggregation
Appendix A. An Example of an Elicitation Using the SHELF Methodology
- Elicitation Training
- Before the elicitation, the facilitator asks the experts to review the available training material. This is to ensure that each expert has a consistent understanding of subjective probability and frequency and has had some practice in eliciting parameter values using a number of different methods, e.g. Monte Carlo and bisection methods. On the day of the elicitation the facilitator may again offer an elicitation training session that includes an example elicitation.
- In our example the analyst reviews the freely available Probabilistic judgements e-learning course (New & O'Hagan, 2018).
- Review Relevant Evidence:
- Also before the elicitation, and to reduce availability bias, the facilitator ensures that all the pertinent evidence about the product, perhaps including data from other similar manufacturing processes or similar products, has been collated. To do this, the facilitator should approach each expert, asking them to search for and submit any relevant information they have access to. All of the gathered information is then shared amongst the experts before the elicitation.
- As the product in our example is new, with only limited data from pre-production processing, this prompt encourages the analyst to investigate a related process for a different product that the manufacturer already produces.
- Define Credible Bounds:
- On the day of the elicitation, and having agreed a suitable method to use, the experts start by specifying their upper (U) and lower (L) credible bounds. Although the data was shared amongst all the experts, they are initially asked to work individually and not to share their estimates with others in the group at this stage; this reduces possible authority bias. The facilitator should also explain that the bounds are not absolute theoretical limits but form a credible interval, such that there might still be some small probability of it not containing the true value. The facilitator may challenge the experts’ estimates by asking how surprised they would be if the true value were later found to lie outside these limits. The experts should, if necessary, adjust their bounds until this scenario appears surprising and unlikely.
- The credible interval is elicited first to ensure all possible values are considered and to reduce the anchoring bias that can affect the subsequent elicitation of the quartiles when an elicitation starts with estimating the median.
- In our example, from reviewing the related data and the engineer’s information concerning the increased complexity of this process, and after some challenges from the facilitator, the analyst estimates that in batches of 1000 manufactured items the lower bound (L) could still be as low as 1, with 0 considered unlikely, and that the upper bound (U) could be as high as 100, with any greater value considered implausible.
- Specify the Median:
- The experts estimate their median values, where the true number of nonconforming items in a lot is equally likely to be above or below this value. Although this equal-probability judgement appears straightforward, it is still difficult for experts to make. The facilitator should challenge their estimates by asking each expert to consider which side of their median they would bet contained the true value. The expert should adjust their estimate until they have no preference for either side in such a bet.
- In our example the analyst estimates that the median value (M) is 13 nonconforming items in a lot of 1000 items.
- Determine the Quartiles:
- The experts specify their upper quartiles (Q3) such that the intervals [M, Q3] and [Q3, U] are equally likely, and the lower quartiles (Q1) such that the intervals [L, Q1] and [Q1, M] are also equally likely. This process should ensure, given effective challenges from the facilitator, that all four intervals [L, Q1], [Q1, M], [M, Q3], and [Q3, U] are considered equally probable, each with a 25% probability.
- In our example the analyst estimates Q1 and Q3 to be 5 and 27 respectively.
- Fit the Distribution:
- The facilitator fits a distribution to the elicited values (L, Q1, M, Q3, U), ensuring small probabilities outside the credible bounds and equal probabilities across the four intervals. The facilitator then presents the fitted distribution to the expert, offering implied values (e.g., the 5th or 95th percentile) for validation. If the expert disagrees, earlier steps are revisited to refine either the expert's judgements or the fitted distribution until an acceptable result is achieved.
- In our example, the elicitation is being carried out to generate a prior distribution for use in deriving a suitable sampling plan for the quality assurance of the manufacturing process. For the convenience of the analyst and given their previous experience the elicited values were the number of nonconforming items in a manufactured lot of 1000 items. However, as this particular proposal is to be based on sampling by attributes the prior information needs to be in the form of proportions rather than counts. Each is therefore divided by the lot size used in the elicitation to give the values (0.001, 0.005, 0.013, 0.027, 0.100) before fitting a distribution using the SHELF software (Oakley & O'Hagan, The Sheffield Elicitation Framework, 2019). The best fit distribution is given by SHELF as a scaled Beta (a = 0.80, b = 3.65, A = 0, B =1).
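The fitting step can be approximated outside the SHELF software by least-squares matching of a Beta distribution's quantiles to the elicited values. The sketch below is an illustration only: the probability levels assigned to L and U (here the 1st and 99th percentiles, since the bounds are credible rather than absolute) and the fitting criterion are assumptions, so the result will generally differ from the Beta(a = 0.80, b = 3.65) reported by SHELF in the example.

```python
import numpy as np
from scipy.stats import beta
from scipy.optimize import minimize

# Elicited values as proportions (L, Q1, M, Q3, U) from the example
elicited = np.array([0.001, 0.005, 0.013, 0.027, 0.100])
# Assumed probability levels for the five points; SHELF's exact
# convention for placing the credible bounds may differ
probs = np.array([0.01, 0.25, 0.50, 0.75, 0.99])

def loss(params):
    a, b = np.exp(params)  # optimize on the log scale to keep a, b > 0
    return np.sum((beta.ppf(probs, a, b) - elicited) ** 2)

res = minimize(loss, x0=np.log([1.0, 30.0]), method="Nelder-Mead")
a_hat, b_hat = np.exp(res.x)
```

Presenting implied percentiles of the fitted distribution back to the expert, as described above, is then a matter of evaluating `beta.ppf` at, say, 0.05 and 0.95.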
- Aggregation
- It is only at this stage that individual estimates should be shared amongst all the experts in the group. A discussion is now encouraged so that all can understand any differences and highlight the material referenced in deriving those different estimates. Under the SHELF protocol the individual fitted distributions are not combined mathematically; instead, behavioural aggregation is used: having discussed the differences in their estimates, the facilitator asks the group as a whole to agree new consensus values for L, Q1, M, Q3 and U, challenging them to include or exclude previous estimates or information depending on the outcomes of the discussion phase. The facilitator then presents the consensus fitted distribution to the group, again offering implied values (e.g., the 5th or 95th percentile) for validation. If the group disagrees, earlier steps are revisited to refine either the consensus judgements or the fitted distribution until an acceptable result is achieved.
- In our example, as only one expert was involved in the elicitation, this step is not required.
- Sample size calculation
- The sample size for the first sample taken from a continuous sequence of lots can be calculated using the elicited prior information concerning the lot conformity, and the producer’s and consumer’s acceptable risk values.
- Setting the producer’s risk (PR) ≤ 5%, with the conforming proportion pc = 0.02 and the elicited values a = 0.8 and b = 3.65, the maximum sample size that meets the producer’s risk requirement for an accept-zero plan (Ac = 0) is np = 65, giving PR = 0.05.
- With the consumer’s risk (CR) ≤ 10%, and the same pc, a and b, the minimum sample size that meets the consumer’s risk requirement for an accept-zero plan (Ac = 0) is nc = 23, giving CR = 0.096.
- The consumer’s plan (Ac = 0, n = 23) also meets the producer’s risk requirement (PR = 0.022 ≤ 0.05), and the producer’s plan (Ac = 0, n = 65) also meets the consumer’s risk requirement (CR = 0.017 ≤ 0.1). If they did not, np and nc would be recalculated for Ac = 1, 2, 3, … until np ≥ nc, at which point any n with nc ≤ n ≤ np meets both risk requirements. Here, taking n at the mid-point between nc and np gives n = 44, with PR = 0.038 and CR = 0.037.
- Subsequent sample sizes can be calculated by updating the elicited prior with n, the sample size, and y, the number of nonconforming items in the sample, and repeating the above calculations.
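These calculations can be reproduced numerically. With a Beta(a, b) prior and an accept-zero plan, the probability that a sample of n contains zero nonconforming items is B(a, b + n)/B(a, b), and the quoted figures are consistent with reading the producer’s and consumer’s risks as the joint probabilities of rejecting a conforming lot (p ≤ pc) and of accepting a nonconforming lot (p > pc); the text does not state its exact definitions, so that reading is an assumption here. A sketch:

```python
import numpy as np
from scipy.stats import beta
from scipy.special import betaln

A, B0, PC = 0.8, 3.65, 0.02   # elicited Beta prior and conforming proportion pc

def risks(n, a=A, b=B0, pc=PC):
    """Producer's and consumer's risk for an accept-zero (Ac = 0) plan of size n,
    computed as joint probabilities under the Beta(a, b) prior."""
    p_accept = np.exp(betaln(a, b + n) - betaln(a, b))  # P(0 nonconforming in n) = B(a, b+n)/B(a, b)
    p_conforming = beta.cdf(pc, a, b)                   # P(p <= pc) under the prior
    p_accept_and_conf = p_accept * beta.cdf(pc, a, b + n)
    pr = p_conforming - p_accept_and_conf               # P(reject AND lot conforming)
    cr = p_accept - p_accept_and_conf                   # P(accept AND lot nonconforming)
    return pr, cr

def update(a, b, n, y):
    """Conjugate update after observing y nonconforming items in a sample of n."""
    return a + y, b + n - y

pr65, cr65 = risks(65)   # producer's plan from the text
pr23, cr23 = risks(23)   # consumer's plan from the text
pr44, cr44 = risks(44)   # mid-point plan
```

The `update` function implements the Beta-binomial conjugate step used for subsequent sample sizes: the elicited prior Beta(a, b) becomes Beta(a + y, b + n − y), and the risk calculations are then repeated with the updated parameters.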
References
- Colson, A. R., & Cooke, R. M. (2018). Expert elicitation: Using the classical model to validate experts’ judgments. Review of Environmental Economics and Policy, 12(1), 113-132. [CrossRef]
- Hartley, D., & French, S. (2021). A bayesian method for calibration and aggregation of expert judgement. International Journal of Approximate Reasoning, 130, 192-225. [CrossRef]
- Hemming, V., Burgman, M. A., Hanea, A. M., McBride, M. F., & Wintle, B. C. (2018). A practical guide to structured expert elicitation using the IDEA protocol. Methods in Ecology and Evolution, 9(1), 169–180. [CrossRef]
- New, L., & O'Hagan, T. (2018). Probabilistic judgements e-learning. Retrieved January 21, 2025, from Sheffield Elicitation Framework e-learning course in probabilistic judgements: https://shelf.sites.sheffield.ac.uk/e-learning-course.
- Oakley, J. E. (2010). Eliciting univariate probability distributions. In K. Böcker (Ed.), Rethinking risk measurement and reporting (Vol. 1). London: Risk Books.
- Oakley, J. E., & O'Hagan, A. (2019). The Sheffield Elicitation Framework. Retrieved from https://shelf.sites.sheffield.ac.uk/software.
- O'Hagan, A. (2019). Expert knowledge elicitation: Subjective but scientific. The American Statistician, 73, 69-81. [CrossRef]
- Tversky, A., & Kahneman, D. (1974). Judgment under Uncertainty: Heuristics and Biases. Science, 185(4157), 1124–1131. [CrossRef]
- Williams, C. J., Wilson, K. J., & Wilson, N. (2021). A Comparison of Prior Elicitation Aggregation Using the Classical Method and SHELF. Journal of the Royal Statistical Society: Series A, 184(3), 920-940. [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).