Preprint
Article

This version is not peer-reviewed.

Stratification Criteria for Machine Learning Pattern Discovery in Particle Physics - Preparing for the AlphaFold Moment

Submitted: 19 February 2026

Posted: 19 February 2026


Abstract
Machine learning capabilities are expanding into scientific domains at an accelerating pace. When applied to pattern discovery in high-energy physics, they will generate candidates faster than traditional evaluation can absorb. ML finds patterns in past data, so it is inherently post hoc: whether those patterns reflect structure or coincidence is unknowable at discovery time. This limitation applies equally to human and computational pattern finding. What differs is scale. ML candidate generation is effectively unbounded, while human evaluation capacity remains fixed. When the generation rate exceeds evaluation bandwidth, binary accept-or-reject filtering degenerates to random sampling. In information-theoretic terms, the only response that preserves ranking under a finite evaluation budget is stratification. With stratification rather than binary filtering, rules can be adjusted retroactively, thresholds can be tuned as results accumulate, and evaluation bandwidth can be focused on top-ranked candidates. This paper attempts to codify the criteria for such stratification, proposing seven computationally evaluable standards for ranking ML-generated patterns. The goal is not to deliver verdicts but to prioritize which candidates merit preregistration and longitudinal tracking. The framework preserves the essential paradigm: pattern plus theory equals potentially real physics. Patterns alone, however striking, remain candidates until theoretical understanding arrives. Making these criteria explicit enables prefiltering at scale while creating a collaborative resource rather than a competitive one. ML capabilities extend what physicists can search while preserving how physicists evaluate. We offer this provisional framework for community calibration, with the goal of developing validation infrastructure before the capability fully arrives.
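
The contrast between binary filtering and stratification under a fixed evaluation budget can be illustrated with a small sketch. It is not the paper's method: the single scalar `scores` value per candidate, the threshold of 0.5, and the helper names `binary_filter` and `stratified_top` are all assumptions made for illustration, and the seven criteria proposed in the paper are not modeled here.

```python
import random


def binary_filter(scores, threshold, budget, rng):
    """Accept or reject by threshold; if more candidates pass than the
    budget allows, the reviewer can only pick among passers at random."""
    passers = [i for i, s in enumerate(scores) if s >= threshold]
    if len(passers) <= budget:
        return passers
    return rng.sample(passers, budget)


def stratified_top(scores, budget):
    """Rank every candidate by score and review only the top `budget`."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:budget]


def mean_score(scores, picked):
    """Average score of the candidates that actually get reviewed."""
    return sum(scores[i] for i in picked) / len(picked)


rng = random.Random(0)
# Hypothetical composite scores for 10,000 ML-generated candidates.
scores = [rng.random() for _ in range(10_000)]
budget = 20  # assumed number of candidates humans can evaluate

picked_binary = binary_filter(scores, 0.5, budget, rng)
picked_ranked = stratified_top(scores, budget)

print("binary filter, mean reviewed score:", mean_score(scores, picked_binary))
print("stratified,    mean reviewed score:", mean_score(scores, picked_ranked))
```

Under these assumptions, roughly half of the candidates pass the threshold, so the binary filter's reviewed set is a random draw from thousands of passers, while the ranked selection always spends the budget on the highest-scoring candidates. Changing the threshold or the scoring rule later only requires re-sorting the stored scores, which is the retroactive adjustment the abstract describes.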
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.

Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.
