Submitted: 28 April 2026
Posted: 30 April 2026
Abstract
Keywords:
1. Introduction

2. PMFs and Measurement
2.1. Probability Mass Functions
- Probabilities. The probabilities can either be derived from actual frequencies (occupancy per bin) or be estimated from prior knowledge or even judgment when scores are set. Figure 2 shows several Poisson PMFs (Section 4.2) as an example. When setting response scores as probabilities P, choices have to be made about how to define the scale-end scores of 0 and 100. Inferentially stable measurement also requires that the score varies monotonically across the scale in a manner allowing summarization which exhausts the data of all available information in a minimally sufficient statistic [16,17,18]. (These requirements [19] are subsequently checked in Section 5.5.)
- Ranges
  - intrinsically discrete ranges (Equation (1)), such as when counting dots (Section 3.3);
  - discrete ranges chosen for convenience (such as in sensory panel responses), where the observed variable is on a continuous scale but the large uncertainties make it practical to round off the score, say to the nearest integer;
  - ranges can be as short as two, such as for the binomial distribution from binary Bernoulli trials (Section 4.1), or long, as for polytomous ranges spanning several categories (Section 4.3.2).
- Scales. In different application areas, the scales associated with both X (abscissa axis) and P (ordinate axis) of a PMF can be:
  - either fully quantitative (Section 3.3),
  - or more qualitative (Section 5),

  ranging, respectively, from ratio and interval scales to ordinal and nominal scales.
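As an illustration of the first option listed above (probabilities derived from actual frequencies), the following minimal Python sketch estimates a PMF as the occupancy per bin over a fixed discrete range. The function name `empirical_pmf` and the response data are hypothetical, chosen only for illustration, and are not taken from the study.

```python
from collections import Counter


def empirical_pmf(observations, support):
    """Estimate a PMF as relative occupancy per bin over a discrete support."""
    support = list(support)
    counts = Counter(observations)
    # Only observations that fall inside the chosen range are counted.
    total = sum(counts[x] for x in support)
    if total == 0:
        raise ValueError("no observations fall inside the support")
    return {x: counts[x] / total for x in support}


# Hypothetical responses of one counter shown the same object on 10 occasions.
responses = [5, 5, 4, 5, 6, 5, 5, 4, 5, 6]
pmf = empirical_pmf(responses, support=range(1, 11))

for x, p in pmf.items():
    print(x, round(p, 2))
```

Bins that are never occupied receive probability zero here; whether that is appropriate, or whether prior knowledge should be used instead, is one of the choices discussed above.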
2.2. Accuracy of Classification
- Analytical accuracy. In an ‘analytical’ scenario 1, the ‘accuracy’ with which the ‘measurand’ is estimated, that is, the number of discrete objects in each set (with the number of dots increasing from 1 to 10 in the present counting case), can be expressed in terms of:
  - (i) trueness: estimated as the difference between the perceived and true dot count;
  - (ii) precision: estimated in terms of dispersion, expressed in dots (Figure 1).
- Clinical accuracy. In a ‘clinical’ scenario 2, distances between different categories of classification on the abscissa of a PMF (needed, for example, when measuring the difference before and after an intervention, or when estimating dispersion measures) may not be fully known or even meaningful, as on nominal scales (Section 4.1). Appropriate methods for dealing with such scales include log-odds ratio transformations, as used in GLMM and in Rasch psychometric theory, with which ‘measurands’ such as counter ability and task difficulty can be identified, as dealt with later in the paper in our clinical counting example (Section 5).
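For the analytical scenario, trueness and precision can be read directly off a PMF of perceived counts. The sketch below is a minimal illustration, assuming that trueness is taken as the mean perceived count minus the true count and that precision is taken as the standard deviation of the PMF; the PMF values and the function name are hypothetical and do not reproduce Figure 1.

```python
import math


def trueness_and_precision(pmf, true_count):
    """Return (bias, standard deviation), in dots, for a PMF of perceived counts."""
    mean = sum(x * p for x, p in pmf.items())
    variance = sum(p * (x - mean) ** 2 for x, p in pmf.items())
    return mean - true_count, math.sqrt(variance)


# Hypothetical PMF of perceived counts when the object holds 5 dots.
pmf_5_dots = {4: 0.2, 5: 0.6, 6: 0.2}
bias, spread = trueness_and_precision(pmf_5_dots, true_count=5)
print(f"trueness (bias): {bias:.3f} dots, precision (std. dev.): {spread:.3f} dots")
```

Both quantities presuppose that distances along the abscissa are meaningful, which is exactly what cannot be assumed in the clinical scenario.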
3. Analytical and Clinical Performance Metrics
3.1. PMF and Measurement System Analysis (MSA)
3.2. Analytical and Clinical Performance Criteria
1. ‘Analytical’ performance criteria determine, e.g., how much (quality characteristic: concentration) of a particular analyte (MSA object) is present in a sampled object (sampling by "variable"). They include, first, analytical method accuracy (trueness and precision [21]; Section 2.2) and, second, sensitivity, such as the instrument limit of detection (MSA measurement instrument), as exemplified in the analytical interpretation of the elementary counting case of the present study (Section 3.3). According to [26], "…Analytical performance focuses on the gathering of evidence that the measurement instrument in question reliably, accurately and consistently measures and/or detects an analyte". This is closely related to terminology in acceptance sampling standards ([28], §3.1), where inspection by variables is inspection by measuring the magnitude(s) of a characteristic(s) of an item.
2. ‘Clinical’ performance, according to [26], "aims to demonstrate that the measurement instrument can achieve clinically relevant outputs through predictable and reliable use by the intended users". This is closely related to the definition in §3.1.3 of [29], where inspection by attributes is "inspection whereby either the item is classified simply as conforming or nonconforming with respect to a specified requirement or set of specified requirements, or the number of nonconformities in the item is counted". Commonly used clinical performance metrics of measurement systems include sensitivity, Equation (22), and specificity, Equation (23), which are plotted against each other on receiver operating characteristic (ROC) curves [30] when sampling by ‘attribute’. A psychometric treatment of these clinical performance metrics [31,32] yields quantitative estimates for quality characteristics such as task difficulty (MSA object) and agent ability (MSA measurement instrument) as attributes of the different elements of the measurement system illustrated in Figure 3, corresponding to the top-right entry in Table 1.
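The clinical metrics plotted on a ROC curve can be sketched as follows, using the standard definitions of the true-positive rate (sensitivity) and true-negative rate (specificity). These may differ in notation from Equations (22) and (23), which are not reproduced here, and the classification counts are hypothetical.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Standard definitions from a 2x2 table of classification outcomes."""
    sensitivity = tp / (tp + fn)  # true-positive rate
    specificity = tn / (tn + fp)  # true-negative rate
    return sensitivity, specificity


# Hypothetical outcome of inspecting 100 items by attribute.
sens, spec = sensitivity_specificity(tp=45, fn=5, tn=40, fp=10)

# One point on a ROC curve: (false-positive rate, true-positive rate).
roc_point = (1.0 - spec, sens)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}, ROC point = {roc_point}")
```

Each decision threshold yields one such point; sweeping the threshold traces out the curve.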
3.3. PMFs for Counting and Related Tasks. ’Analytical’ and ’Sampling by Variable’
- (i) in fact known exactly
- (ii) conceptually simple
4. Case Studies of PMFs: Quantitative Statistical Process Control
4.1. Binomial Distribution and Dichotomous Bernoulli Trials in SPC
4.1.1. Dichotomous Decision-Making with Uncertainty
4.1.2. Distances on Categorical Scales: Counted Fractions
4.1.3. Compensating for Counted-Fraction Scale Non-Linearity
- the intercept at
- the slope at
4.2. Poisson Distributions in SPC
4.3. Rasch Psychometric Model. Principle of Specific Objectivity
- ’agent’ ability,
- ’task’ difficulty,
4.3.1. Agnostic Rasch Models
4.3.2. Polytomous Measurement Models
4.3.3. Poisson PMFs for Counting
- In contrast to Rasch’s [40] psychometric model, Poisson [73] placed important restrictions on the applicability of his model, in particular that the misclassification events were to be "very rare", as should apply to the PMFs shown in Figure 2. Although the literature appears to give no exact threshold at which the Poisson approximation breaks down, a ‘rule of thumb’ range, in which the governing parameter is moderate, is quoted for good accuracy ([33], Figure 2-24). Our psychometric simulations (Section 5) of task difficulty and counter ability do not suffer from the same restrictions and allow the response level to be chosen over a complete range (Figure 8): from the most mis-classifications (a low-ability counter attempting a difficult counting task) to the fewest (a high-ability counter performing an easy counting task), with the Poisson approximation expected to be valid only in the latter case. The Poisson rule-of-thumb range can be seen to be comparable with the mis-classification rates shown in Figure 2. However, a full comparison between the classic ‘rule of thumb’ limit to the Poisson distribution and the present psychometric simulations requires a proper account of the applicability of the ergodic principle: that is, to what extent the classic group-statistic Poisson approach [33,73] corresponds to the individual statistics of Rasch [51] psychometric modelling. For instance, the Poisson rate has in some way to be interpreted in psychometric cases where there is just one counting agent (MSA: Instrument) while, at the same time, the overall number of degrees of freedom needs to be sufficiently large to achieve adequate reliability and validity (Section 5.5).
- The Poisson PMF of the number of mis-classifications shown in Figure 2 carries no information about which numbers are perceived correctly or not, but should in principle correspond, for each item (j), to the occupancies of the analytical PMFs shown in Figure 1. Similarly, the Poisson PMF of the number of mis-classifications has no obvious relation to the corresponding ‘clinical’ metrics, that is, the counting task difficulty and the counter ability. (In principle, one can estimate the expected count by inverting Equation (21) since, in the present case of an elementary construct, the relation between task difficulty and the number of dots is known, Equation (26).)
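How the Poisson approximation degrades as mis-classifications become less rare can be explored numerically. The sketch below is an illustration only, not the simulation used in this study: it assumes n independent trials, each with the same mis-classification probability p, and compares the exact binomial PMF with a Poisson PMF of rate n·p using the total variation distance. The function names and the values of n and p are hypothetical.

```python
import math


def binomial_pmf(k, n, p):
    """Probability of k mis-classifications in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, rate):
    """Poisson probability of k events for a given rate."""
    return math.exp(-rate) * rate**k / math.factorial(k)


def total_variation(n, p):
    """Total variation distance between Binomial(n, p) and Poisson(n * p)."""
    rate = n * p
    inside = sum(
        abs(binomial_pmf(k, n, p) - poisson_pmf(k, rate)) for k in range(n + 1)
    )
    # The Poisson PMF also puts mass on k > n, where the binomial PMF is zero.
    tail = max(0.0, 1.0 - sum(poisson_pmf(k, rate) for k in range(n + 1)))
    return 0.5 * (inside + tail)


# Assumed example: 10 trials, from rare to frequent mis-classification.
for p in (0.01, 0.1, 0.5):
    print(f"p = {p}: total variation distance = {total_variation(10, p):.4f}")
```

The distance grows with p, consistent with the requirement that the events be rare, but it gives no sharp threshold, in line with the remark above.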
5. ’Clinical’ Psychometric Study: Counting Dots
5.1. Dichotomous and Polytomous Mis-Classification Probabilities. CTT and Rasch Measurement Theory
- False Acceptance (or positive, FPR) Probability
- False Rejection (or negative, FNR) Probability
- (i) compensates for counted-fraction scale non-linearity (Section 4.1.3)
- (ii) exchanges the analytical measurands (such as the number of dots, Section 3.3) for the clinical measurands: counting task difficulty and counter ability.
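The two effects listed above can be illustrated with the standard dichotomous Rasch model, in which the log-odds of a correct classification equals the difference between ability and difficulty. The sketch assumes this textbook form, which may differ in notation from the equations of this paper; the function names and the numerical values are hypothetical.

```python
import math


def rasch_success_probability(ability, difficulty):
    """Dichotomous Rasch model: P(success) = 1 / (1 + exp(-(ability - difficulty)))."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def logit(p):
    """Log-odds transformation of a counted fraction p, with 0 < p < 1."""
    return math.log(p / (1.0 - p))


# Equal steps in (ability - difficulty) give unequal steps in probability,
# which is the counted-fraction non-linearity; the logit restores linearity.
for difference in (-2.0, -1.0, 0.0, 1.0, 2.0):
    p = rasch_success_probability(difference, 0.0)
    print(f"difference = {difference:+.1f}  P = {p:.3f}  logit(P) = {logit(p):+.3f}")
```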
5.2. Counted-Fraction Scale Non-Linearity When Counting Dots. ICC
5.3. Communication of Measurement Information Throughout the Measurement Process. Amount of Entropy
5.4. Simulated PMFs for the Elementary Counting Case
5.4.1. Counting Task Difficulty Simulated. Object (Task) Entropy
5.4.2. Counting Classifier Ability Simulated
- Excel: returns a vector of random numbers having the Normal distribution of counter abilities (the measurand) across the cohort, shown as the blue histograms in Figure 8.
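An equivalent of the spreadsheet step can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' procedure: counter abilities are drawn from a Normal distribution and dichotomous responses are then generated from the standard dichotomous Rasch model. The function name, cohort size, task difficulties and distribution parameters are all hypothetical.

```python
import math
import random


def simulate_responses(n_counters, difficulties, mean_ability=0.0, sd_ability=1.0, seed=0):
    """Draw Normal abilities, then simulate 0/1 responses under a Rasch model."""
    rng = random.Random(seed)  # fixed seed so that the run is reproducible
    abilities = [rng.gauss(mean_ability, sd_ability) for _ in range(n_counters)]
    responses = []
    for theta in abilities:
        row = []
        for delta in difficulties:
            p_success = 1.0 / (1.0 + math.exp(-(theta - delta)))
            # A response is scored 1 (correct) with probability p_success.
            row.append(1 if rng.random() < p_success else 0)
        responses.append(row)
    return abilities, responses


# Assumed example: 20 counters attempting 5 tasks of increasing difficulty (logits).
abilities, responses = simulate_responses(20, [-2.0, -1.0, 0.0, 1.0, 2.0])
raw_scores = [sum(row) for row in responses]
print(raw_scores)
```

The raw score per counter is the kind of summary statistic from which abilities would subsequently be estimated.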
5.5. Reliability and Validity
5.6. Validation of Simulated LOGISTIC regressions



6. Discussion and Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| MDPI | Multidisciplinary Digital Publishing Institute |
| DOAJ | Directory of open access journals |
| CTT | Classical Test Theory |
| EMPIR | European Metrology Programme for Innovation and Research |
| EPM | European Programme for Metrology |
| FAP | False acceptance probability |
| FRP | False rejection probability |
| GLMM | Generalised Linear Measurement Model |
| GUM | Guide to the expression of uncertainty of measurement |
| ICC | Item characteristic curve |
| IRT | Item Response Theory |
| JCGM | Joint committee for guides in metrology |
| K-L | Kullback-Leibler |
| MSA | Measurement System Analysis |
| PMF | Probability Mass Function |
| PDF | Probability Density Function |
| RISE | Research institutes of Sweden |
| RMT | Rasch Measurement Theory |
| SE | Standard Error |
| VIM | International Vocabulary of Metrology |
References
1. Bureau International des Poids et Mesures. The International System of Units (SI), 9th ed.; SI Brochure; BIPM: Sèvres, France, 2019.
2. Pendrill, L.R. Chapter 2. In Quality Assured Measurement – Unification across Social and Physical Sciences; Springer, 2019.
3. Models, Measurement, and Metrology Extending the SI: Trust and Quality Assured Knowledge Infrastructures; De Gruyter Series in Measurement Sciences; Fisher Jr., W.P., Pendrill, L.R., Eds.; De Gruyter Oldenbourg: Berlin/Boston, 2024.
4. Explanatory Models, Unit Standards, and Personalized Learning in Educational Measurement: Selected Papers by A. Jackson Stenner; Fisher Jr., W.P., Massengill, P.J., Eds.; Springer: Singapore, 2023.
5. Fisher Jr., W.P. Measure and manage: Intangible assets metric standards for sustainability. In Business Administration Education: Changes in Management and Leadership Strategies; Marques, J., Dhiman, S., Holt, S., Eds.; Palgrave Macmillan: New York, NY, USA, 2012; pp. 43–63.
6. Barney, M.; Barney, F. Transdisciplinary Measurement through AI: Hybrid Metrology and Psychometrics Powered by Large Language Models. In Models, Measurement, and Metrology Extending the SI: Trust and Quality Assured Knowledge Infrastructures; De Gruyter Series in Measurement Sciences; Fisher Jr., W.P., Pendrill, L.R., Eds.; De Gruyter Oldenbourg: Berlin/Boston, 2024; Chapter 3, pp. 103–132.
7. Dehaene, S.; Izard, V.; Spelke, E.; Pica, P. Log or Linear? Distinct Intuitions of the Number Scale in Western and Amazonian Indigene Cultures. Science 2008, 320, 1217–1220.
8. Pendrill, L.R.; Fisher Jr., W.P. Counting and Quantification: Comparing Psychometric and Metrological Perspectives on Visual Perceptions of Number. Measurement 2015, 71, 46–55.
9. Fisher Jr., W.P. Bateson and Wright on number and quantity: How to not separate thinking from its relational context. Symmetry 2021, 13.
10. Mallinson, T. Extending the justice-oriented, anti-racist framework for validity testing to the application of measurement theory in re(developing) rehabilitation assessments. In Models, Measurement, and Metrology Extending the SI; Fisher Jr., W.P., Pendrill, L.R., Eds.; De Gruyter, 2024; pp. 401–428.
11. Sul, D.; Dominguez, D.G. Culturally responsive evaluation with Latinx communities through culturally specific assessment: Building the Latinx immigration trauma assessment. New Dir. Eval. 2024, 103–112.
12. Sul, D. Situating culturally specific assessment development within the disjuncture-response dialectic. In Models, Measurement, and Metrology Extending the SI; Fisher Jr., W.P., Pendrill, L.R., Eds.; De Gruyter, 2024; pp. 475–500.
13. Sul, D.; Blackmon, A.T. Enacting culturally specific assessment by constructing a STEM leadership assessment framework. J. Educ. Meas. 2026.
14. JCGM. Evaluation of measurement data – The role of measurement uncertainty in conformity assessment; Technical Report JCGM 106:2012; Joint Committee for Guides in Metrology: Sèvres, France, 2012.
15. Encyclopedia of Science, Technology, and Ethics; Mitcham, C., Ed.; Macmillan Reference: New York, 2005.
16. Andersen, E.B. Sufficient statistics and latent trait models. Psychometrika 1977, 42, 69–81.
17. Andrich, D. Sufficiency and conditional estimation of person parameters in the polytomous Rasch model. Psychometrika 2010, 75, 292–308.
18. Fischer, G.H. On the existence and uniqueness of maximum-likelihood estimates in the Rasch model. Psychometrika 1981, 46, 59–77.
19. Andrich, D. Distinctions between assumptions and requirements in measurement in the social sciences. In Mathematical and Theoretical Systems: Proceedings of the 24th International Congress of Psychology of the International Union of Psychological Science; Keats, J.A., Taft, R., Heath, R.A., Lovibond, S.H., Eds.; Elsevier Science Publishers, 1989; Vol. 4, pp. 7–16.
20. Bhatt, U.; Antorán, J.; Zhang, Y.; Liao, Q.V.; Sattigeri, P.; Fogliato, R.; Melançon, G.G.; Krishnan, R.; Stanley, J.; Tickoo, O.; et al. Uncertainty as a Form of Transparency: Measuring, Communicating, and Using Uncertainty. In Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society (AIES ’21); ACM, 2021; pp. 401–413.
21. ISO. ISO 5725-1:2023(en) Accuracy (trueness and precision) of measurement methods and results – Part 1: General principles and definitions; International Standard. 2023.
22. Brown, R.J.C.; Güttler, B.; Neyezhmakov, P.; Stock, M.; Wielgosz, R.I.; Kück, S.; Vasilatou, K. Report of the CCU/CCQM Workshop on ‘The Metrology of Quantities Which Can Be Counted’. Metrology 2023, 3, 309–324.
23. Rossi, G.B. Measurement and Probability: A Probabilistic Theory of Measurement with Applications; Springer: Dordrecht, The Netherlands, 2014.
24. JCGM. Guide to the expression of uncertainty in measurement – Part 6: Developing and Using Measurement Models; JCGM GUM-6:2020; Joint Committee for Guides in Metrology: Sèvres, France, 2020.
25. Hofmann, B. “My Biomarkers Are Fine, Thank You”: On the Biomarkerization of Modern Medicine. J. Gen. Intern. Med. 2025, 40, 453–457.
26. European Commission. Factsheet for Manufacturers of In Vitro Diagnostic Medical Devices. 2020.
27. European Union. Regulation (EU) 2017/746 on In Vitro Diagnostic Medical Devices. 2017.
28. ISO. ISO 3951-1:2022 Sampling procedures for inspection by variables; International Standard. 2022.
29. ISO. ISO 2859-1:2026 Sampling schemes indexed by acceptance quality limit (AQL) for lot-by-lot inspection; International Standard. 2026.
30. Birdsall, T.G. The Theory of Signal Detectability: ROC Curves and Their Character; University of Michigan Library: Ann Arbor, MI, USA, 1973.
31. Linacre, J.M. Bernoulli Trials, Fisher Information, Shannon Information and Rasch. Rasch Meas. Trans. 2006, 20, 1062–1063.
32. Pendrill, L.R.; Melin, J.; Stavelin, A.; Nordin, G. Modernising Receiver Operating Characteristic (ROC) Curves. Algorithms 2023, 16, 253.
33. Montgomery, D.C. Introduction to Statistical Quality Control, 3rd ed.; Wiley: New York, 1996.
34. NIST. Uncertainty machine. Available online: https://uncertainty.nist.gov/.
35. Bernoulli, J. Ars Conjectandi; Opus posthumum; Thurneysen Brothers: Basel, 1713; OCLC 7073795.
36. Tukey, J.W. Data Analysis and Behavioural Science. In The Collected Works of John W. Tukey, Volume III; Jones, L.V., Ed.; Chapman and Hall, 1984.
37. Pearson, K. Mathematical contributions to the theory of evolution: on a form of spurious correlation which may arise when indices are used in the measurements of organs. Proc. Roy. Soc. 1897, 60, 489–498.
38. Filzmoser, P.; Hron, K.; Reimann, C. Principal Component Analysis for Compositional Data with Outliers. Environmetrics 2009, 20, 621–632.
39. Wright, B.D. Thinking with Raw Scores. Rasch Meas. Trans. 1993, 7, 299–300.
40. Rasch, G. Probabilistic Models for Some Intelligence and Attainment Tests; Danmarks Paedagogiske Institut, 1960.
41. Andrich, D. Rating Scales and Rasch Measurement. Expert Rev. Pharmacoeconomics Outcomes Res. 2011, 11, 571–585.
42. Aitchison, J. The Statistical Analysis of Compositional Data. J. R. Stat. Soc. 1982, 44, 139–177.
43. Bland, J.M.; Altman, D.G. The Odds Ratio. Br. Med. J. 2000, 320, 1468.
44. McCullagh, P. Regression Models for Ordinal Data. J. R. Stat. Soc. 1980, 42, 109–142.
45. Wright, B.D.; Stone, M.H. Best Test Design: Rasch Measurement; MESA Press: Chicago, 1979.
46. Student, S.R.; Briggs, D.C.; Davis, L. Growth Across Grades and Common Item Grade Alignment in Vertical Scaling Using the Rasch Model. Educ. Meas. Issues Pract. 2025, 44, 84–95.
47. Montgomery, D.C.; Runger, G.C. Applied Statistics and Probability for Engineers, 5th ed.; John Wiley & Sons: Hoboken, NJ, 2011.
48. Meredith, W.M. The Poisson Distribution and Poisson Process in Psychometric Theory. In ETS Research Bulletin Series; Educational Testing Service: Princeton, NJ, 1968; RB-68-42.
49. Fisher Jr., W.P. Invariance and traceability for measures of human, social, and natural capital: Theory and application. Measurement 2009, 42, 1278–1287.
50. Pendrill, L.R. Man as a Measurement Instrument. NCSLI Meas. 2014, 9, 24–35.
51. Rasch, G. On General Laws and the Meaning of Measurement in Psychology. In Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability; Neyman, J., Ed.; Berkeley, CA, 1961; Volume 4, pp. 321–333.
52. Bashkansky, E.; Turetsky, V. Ability Evaluation by Binary Tests: Problems, Challenges and Recent Advances. J. Phys. Conf. Ser. 2016, 772, 012012.
53. Liu, R.; Liu, H.; Shi, D.; Jiang, Z. Poisson Diagnostic Classification Models: A Framework and an Exploratory Example. Educ. Psychol. Meas. 2022, 82, 506–516.
54. Rasch, G. Retirement Lecture of 9 March 1972: Objectivity in Social Sciences: A Method Problem. Rasch Meas. Trans. 2010, 24, 1252–1272. Originally presented in 1972; Kreiner, C., Translator.
55. Stone, M.; Stenner, J. From Ordinality to Quantity. Rasch Meas. Trans. 2014, 27, 1435–1437.
56. Joint Committee for Guides in Metrology (JCGM). International Vocabulary of Metrology – Basic and General Concepts and Associated Terms (VIM), 3rd ed.; JCGM 200:2008 with minor corrections. 2012.
57. Pendrill, L.R.; et al. Reducing Search Times and Entropy in Hospital Emergency Departments with Real-Time Location Systems. IISE Trans. Healthc. Syst. Eng. 2021.
58. Pendrill, L.R. Category-based interlaboratory comparisons: Psychometric Rasch analyses defining reference values and statistical weighting in a clinical example. Educ. Methods Psychom. (SAMC 2024 Spec. Issue) 2026, 4, 26.
59. Massof, R.W.; Fisher Jr., W.P. Psychophysics and the Measurement of Sensory Magnitudes. Measurement. Manuscript in review. 2026.
60. Rice, S.; Pendrill, L.R.; Petersson, N.; Nordlinder, J.; Farbrot, A. Rationale and Design of a Novel Method to Assess the Usability of Body-Worn Absorbent Incontinence Care Products by Caregivers. J. Wound Ostomy Cont. Nurs. 2018, 45, 456–464.
61. Adams, R.J.; Wilson, M.; Wu, M. Multilevel item response models: An approach to errors in variables regression. J. Educ. Behav. Stat. 1997, 22, 47–76.
62. Beretvas, S.N.; Kamata, A. Part II. Multi-level Measurement Rasch Models. In Rasch Measurement: Advanced and Specialized Applications; Smith, V., Everett, J., Smith, R.M., Eds.; JAM Press, 2007; pp. 291–470.
63. Briggs, D.C.; Wilson, M. An introduction to multidimensional measurement using Rasch models. J. Appl. Meas. 2003, 4, 87–100.
64. Linacre, J.M. A User’s Guide to FACETS Rasch-Model Computer Program, Version 4.4.5. Winsteps.com (accessed on 2026-04-09).
65. Linacre, J.M.; Engelhard, G.; Tatum, D.S.; Myford, C.M. Measurement with judges: Many-faceted conjoint measurement. Int. J. Educ. Res. (Spec. Issue Appl. Probabilistic Conjoint Meas.) 1994, 21, 569–577.
66. von Davier, M.; Carstensen, C.H. Multivariate and Mixture Distribution Rasch Models: Extensions and Applications; Springer, 2007.
67. Masters, G.N. A Rasch model for partial credit scoring. Psychometrika 1982, 47, 149–174.
68. Masters, G.N.; Wright, B.D. The Partial Credit Model. In Handbook of Modern Item Response Theory; van der Linden, W.J., Hambleton, R.K., Eds.; Springer: New York, 1996; pp. 101–121.
69. Andrich, D. A Rating Formulation for Ordered Response Categories. Psychometrika 1978, 43, 561–573.
70. Rasch, G. On specific objectivity: An attempt at formalizing the request for generality and validity of scientific statements. Dan. Yearb. Philos. 1977, 14, 58–94.
71. Andrich, D. Models for measurement: Precision and the non-dichotomization of graded responses. Psychometrika 1995, 60, 7–26.
72. Pendrill, L.R. Quantities and units: order amongst complexity. In Models, Measurement, and Metrology: Extending the SI – Trust and Quality Assured Knowledge Infrastructures; Fisher Jr., W.P., Pendrill, L.R., Eds.; De Gruyter, 2024; Chapter 2.
73. Poisson, S.D. Recherches sur la probabilité des jugements en matière criminelle et en matière civile; Bachelier: Paris, 1837. Original work in French; foundational in probability theory.
74. Akkerhuis, T. Measurement system analysis for binary tests. PhD thesis, University of Groningen, 2016.
75. Akkerhuis, T.; de Mast, J.; Erdmann, T. The statistical evaluation of binary test without gold standard: Robustness of latent variable approaches. Measurement 2017, 95, 473–479.
76. Linacre, J.M. Evaluating a Screening Test. Rasch Meas. Trans. 1994, 7, 317–318.
77. Cipriani, D.; Fox, C.; Khuder, S.; Boudreau, N. Comparing Rasch analyses probability estimates to sensitivity, specificity and likelihood ratios when examining the utility of medical diagnostic tests. J. Appl. Meas. 2005, 6, 180–201.
78. Fisher Jr., W.P.; Burton, E. Embedding measurement within existing computerized data systems: Scaling clinical laboratory and medical records heart failure data to predict ICU admission. J. Appl. Meas. 2010, 11, 271–287.
79. Linacre, J.M. Expected Score ICC, IRF (Rasch-half-point thresholds). n.d. (accessed on 2026-03-27).
80. Baker, F.B.; Kim, S.H. Item Response Theory: Parameter Estimation Techniques, 2nd ed.; CRC Press: Boca Raton, 2004.
81. Linacre, J.M. How to Simulate Rasch Data. Rasch Meas. Trans. 2007, 21, 1125.
82. Weaver, W.; Shannon, C.E. The Mathematical Theory of Communication; University of Illinois Press: Champaign, 1963.
83. Benish, W.A. A Review of the Application of Information Theory to Clinical Diagnostic Testing. Entropy 2020, 22, 97.
84. Pele, O.; Werman, M. The Quadratic-Chi Histogram Distance Family. In Proceedings of the European Conference on Computer Vision (ECCV), 2010; pp. 749–762.
85. Melin, J.; et al. NeuroMET Memory Metric: Traceability and Comparability through Crosswalks. Sci. Rep. 2023, 13, 5179.
86. Melin, J.; Cano, S.J.; Flöel, A.; Göschel, L.; Pendrill, L.R. The Role of Entropy in Construct Specification Equations (CSE) to Improve the Validity of Memory Tests. Entropy 2022, 24, 934.
87. Stenner, A.J.; Smith, M. Testing Construct Theories. Percept. Mot. Ski. 1982, 55, 415–426.
88. Stenner, A.J.; Smith, M.I.; Burdick, D.S. Toward a Theory of Construct Definition. J. Educ. Meas. 1983, 20, 305–316.
89. Fisher Jr., W.P.; Stenner, A.J. Theory-based metrological traceability in education: A reading measurement network. Measurement 2016, 92, 489–496.
90. Fisher Jr., W.P.; Stenner, A.J. Theory-based metrological traceability in education: A reading measurement network. In Explanatory Models, Unit Standards, and Personalized Learning in Educational Measurement: Selected Papers by A. Jackson Stenner; Reprint of Fisher and Stenner (2016); Fisher Jr., W.P., Massengill, P.J., Eds.; Springer, 2023; pp. 275–293.
91. Klir, G.J.; Folger, T.A. Fuzzy Sets, Uncertainty, and Information; Prentice Hall: New Jersey, 1988.
92. Melin, J.; Pendrill, L.R.; et al. Construct Specification Equations: ‘Recipes’ for Certified Reference Materials in Cognitive Measurement. Meas. Sens. 2021, 18, 100290.
93. Shannon, C.E. A Mathematical Theory of Communication. Bell Syst. Tech. J. 1948, 27, 379–423.
94. Brillouin, L. Science and Information Theory, 2nd ed.; Academic Press, 1962.
95. Explanatory Item Response Models: A Generalized Linear and Nonlinear Approach; Statistics for Social and Behavioral Sciences; De Boeck, P., Wilson, M., Eds.; Springer-Verlag, 2004.
96. Measuring Psychological Constructs: Advances in Model-Based Approaches; Embretson, S.E., Ed.; American Psychological Association, 2010.
97. Ru, D.; et al. RAGChecker: A Fine-grained Framework for Diagnosing Retrieval-Augmented Generation. arXiv 2024, arXiv:cs, version 2.
98. Pendrill, L.R. Using Measurement Uncertainty in Decision-Making & Conformity Assessment. Metrologia 2014, 51, S206.
99. Thompson, F.N.; et al. Trustworthy Artificial Intelligence. arXiv 2024, arXiv:cs.
100. Letizia, F.N.; et al. Copula Density Neural Estimation. IEEE Transactions on Neural Networks and Learning Systems, 2025.
101. Decruyenaere, A.; et al. Debiasing Synthetic Data Generated by Deep Generative Models. arXiv 2024, arXiv:stat, version 1.
102. Glorot, X.; Bordes, A.; Bengio, Y. Deep Sparse Rectifier Neural Networks. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics (AISTATS); Gordon, G., Dunson, D., Dudík, M., Eds.; Fort Lauderdale, FL, USA, 2011; Proceedings of Machine Learning Research, Vol. 15, pp. 315–323.
103. Sklar, M. Fonctions de répartition à N dimensions et leurs marges. Ann. de l’ISUP 1959, 8, 229–231.








| | Discrete | Continuous |
|---|---|---|
| Qualitative | Instrument response: per category/class | Clinical performance (point 2): instrument ability and its uncertainty; task difficulty and its uncertainty |
| Quantitative | Analytical counting (Figure 1): How many dots in object? Counting errors; limit of detection | Analytical measure (point 1): How much of a quantity in object? Measurement errors and uncertainties; trueness & precision |
| x | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| | 33.72 | 10.77 | 2.29 | 0.37 | 0.05 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| | 3.09 | 7.89 | 13.43 | 17.15 | 17.53 | 14.92 | 10.89 | 6.95 | 3.95 | 2.02 |
| | 1.83 | 5.27 | 10.10 | 14.51 | 16.68 | 15.97 | 13.12 | 9.42 | 6.02 | 3.46 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).