1. Introduction
The P vs NP question is usually framed as a global boundary in computational feasibility. This paper approaches it as a constructive systems question: can one build a conditional, auditable decision pipeline that remains comparable under admissible transformations and closes within bounded resources?
Reductions and representation changes are central to NP-completeness, but they also create a transport problem: invariants that appear decisive in one representation may fail to remain stable, checkable, or comparable after transformation.
In [1], we argued that barriers to P = NP can be reinterpreted as multitime barriers: different clocks govern transformation, verification, comparability, and repair, and “reasons” may fail to travel efficiently.
The present paper extends that diagnosis into an executable compilation program: Multitime Transport Compilation converts admissible transformations into receipt-gated comparability, enabling conditional candidate deciders under explicitly declared regimes rather than informal claims.
This work is oriented toward discovering evidence consistent with a constructive route to P = NP, but it does not claim P = NP. The purpose is to make “progress” legible as a sequence of auditable, falsifiable steps inside an explicit operational envelope: a declared set of admissible transformations and verification budgets, with outcomes recorded as receipts and promotion blocked unless mechanical gates are satisfied [1].
This stance is compatible with the standard NP-completeness framing (where reductions define what must be preserved) [4,5], and with the broader complexity-theory norm that claims must be conditional on clearly stated assumptions and models [7,8,9]. It also acknowledges that major barriers and non-monotonicities can arise when one attempts to turn informal “structure” into algorithms, motivating the paper’s insistence on gates, rollback, and bounded growth rather than narrative confidence [10,11,23,24].
Empirical statements are therefore restricted to what is receipt-backed by the runs reported here, and all promotion decisions are gated mechanically: overlap and scale constraints serve as comparability controls, while key_hit, abstain, and tie closure serve as closure controls [1]. The explicit inclusion of abstain follows the reject-option and selective classification tradition, where refusal is a legitimate outcome but must be measured and bounded rather than hidden inside error bars or anecdotal exceptions [21,22].
Finally, the verification posture is informed by the SAT community’s emphasis on making solver behavior testable and checkable, including proof-oriented workflows and standards for inspectable artifacts, even though full external proof-log integration remains a limitation in the receipts shown here [12,13,14,15,16,17].
3. Method: Receipt-Gated Miner Protocol
3.1. Receipts and Artifacts
Each run produces receipt-like artifacts that declare configuration and report outcomes under bounded budgets:
RCR (RunConfigReceipt): configuration, budgets, and regime declaration.
DSR (Discovery Summary Receipt): discovery coverage, abstain rate, overlap, and no_overlap indices.
LR (Library Receipt): lib_keys, key_hit rate, candidate statistics, canonicalization time.
TBR (Tie-Break Receipt): tie_rate, tiebreak_success, unresolved count and samples (when enabled).
UR (Uplift Receipt): run-level verdict and a printed rationale string.
This paper uses only what appears in these logs and treats receipts as the boundary of what is claimable for each run [1].
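To make the receipt boundary concrete, the sketch below shows one way a RunConfigReceipt could bind a configuration to a digest. It is an illustrative assumption, not the miner's implementation: the field names, the canonical-JSON serialization, and the use of SHA-256 are hypothetical choices, motivated only by the facts that the RCR values reported in Results are 64 hexadecimal digits and that each reported ConfigHash12 equals the first 12 digits of its RCR. The digests this sketch produces will not reproduce the reported RCR values.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Tuple


@dataclass(frozen=True)
class RunConfig:
    # Hypothetical field names mirroring the configuration printed for the runs.
    skeletons: int
    variants: int
    split: str
    discovery_budget: int
    verify_budget: int
    wl_rounds: int
    cap_per_key: int
    max_candidates: int
    regimes: Tuple[str, ...]


def run_config_receipt(cfg: RunConfig) -> str:
    """Bind a configuration to a SHA-256 digest of its canonical JSON form (assumed scheme)."""
    payload = json.dumps(asdict(cfg), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def config_hash12(rcr: str) -> str:
    # In the reported ledger, ConfigHash12 equals the first 12 hex digits of the RCR.
    return rcr[:12]


g0008 = RunConfig(12, 10, "5/5", 280000, 400000, 12, 10, 18,
                  ("ple∘bce", "dup∘taut-heavy∘lit-dup", "sub∘dup"))
g0009 = RunConfig(12, 10, "5/5", 280000, 400000, 16, 10, 18, g0008.regimes)
```

Any change to a declared knob, such as WL rounds 12 → 16 between g0008 and g0009, yields a different digest, which is what makes a post-hoc configuration edit detectable.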
3.2. Candidate-Decider Compilation Pipeline
A “candidate polynomial decider” in this paper is conditional and receipt-gated: canonicalize → key → library lookup → bounded verification, with abstain allowed but mechanically gated. The point is not to assert a decider, but to make the path to one mechanically checkable: when gates fail, the paper records which failure mode blocks promotion under declared regimes and budgets [1,4,5,6,7,8,9].
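The compilation path can be sketched as a single Python function. This is a minimal sketch under stated assumptions: the canonicalizer, key function, verifier, and tie-break policy are passed in as hypothetical callables, and the abstain reasons (KEY_MISS, VERIFY_FAIL, UNRESOLVED_TIE) are plain strings chosen for illustration rather than taken from the miner.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional


@dataclass(frozen=True)
class Outcome:
    decision: Optional[Any]  # promoted answer, or None when the pipeline abstains
    reason: str              # "PROMOTED" or an (illustrative) abstain reason


def decide(instance: Any,
           canonicalize: Callable[[Any], Any],
           key_of: Callable[[Any], str],
           library: Dict[str, List[Any]],
           verify: Callable[[Any, Any], bool],
           max_candidates: int,
           tiebreak: Optional[Callable[[List[Any]], Optional[Any]]] = None) -> Outcome:
    canon = canonicalize(instance)
    key = key_of(canon)
    candidates = library.get(key, [])[:max_candidates]  # recall/truncation knob
    if not candidates:
        return Outcome(None, "KEY_MISS")
    verified = [c for c in candidates if verify(canon, c)]
    if not verified:
        return Outcome(None, "VERIFY_FAIL")
    if len(verified) == 1:
        return Outcome(verified[0], "PROMOTED")
    # Several verified candidates compete: a tie that only a declared policy may close.
    chosen = tiebreak(verified) if tiebreak is not None else None
    if chosen is None:
        return Outcome(None, "UNRESOLVED_TIE")
    return Outcome(chosen, "PROMOTED")
```

Returning a reason for every refusal is what lets abstain be measured rather than hidden: a missing key, a failed verification, and an unresolved tie stay distinguishable in the receipts.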
3.3. Operational Definitions (Overlap, Key_Hit, Abstain, Ties, Growth)
To reduce ambiguity, the paper uses the following operational meanings as reflected in receipts:
overlap: measured agreement of the transport signature between the compared sets under the run’s declared canonicalizer+key semantics. overlap_min summarizes worst-case overlap; no_overlap lists concrete indices where overlap fails. (This is an instrumentation-dependent proxy, not a universal semantic invariant.) [1]
key_hit: fraction of test instances whose derived key is found in the library index produced under the same declared semantics; key_hit_min summarizes worst-case key-lookup closure.
abstain: fraction of instances where the pipeline refuses to promote a decision under declared budgets and constraints (e.g., missing key, insufficient verification closure, or unresolved tie conditions); abstain_max summarizes worst-case refusal.
ties: tie_rate measures how often multiple candidates compete; tiebreak_success and unresolved report whether the declared tiebreak policy closes contestability. Persistent unresolved ties count as non-closure [21,22].
library growth: lib_keys tracks library size; the scale guard constrains growth across sizes to resist memorization-by-fragmentation within the envelope [1].
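The operational meanings above can be illustrated with small metric functions. This sketch assumes a specific overlap proxy that the receipts do not spell out: a skeleton counts as overlapping when the keys derived from its N_A and N_B variants share at least one key. Under that assumption, overlap can reach 100% while key_hit stays below 100%, which is the situation Lemma 3.5.2 describes. The worst-case summaries (overlap_min, key_hit_min, abstain_max) would then be the minimum or maximum of these per-stage values across S1–S5.

```python
from typing import Dict, List, Sequence, Set, Tuple


def overlap_metrics(keys_a: Dict[int, List[str]],
                    keys_b: Dict[int, List[str]]) -> Tuple[float, List[int]]:
    """Assumed proxy: skeleton s overlaps when its N_A and N_B key sets intersect."""
    no_overlap = sorted(s for s in keys_b
                        if not set(keys_a.get(s, [])) & set(keys_b[s]))
    return 1.0 - len(no_overlap) / len(keys_b), no_overlap


def key_hit(keys_b: Dict[int, List[str]], library_keys: Set[str]) -> float:
    """Fraction of test (N_B) instances whose derived key is in the library index."""
    test_keys = [k for ks in keys_b.values() for k in ks]
    return sum(k in library_keys for k in test_keys) / len(test_keys)


def abstain_rate(reasons: Sequence[str]) -> float:
    """Fraction of instances the pipeline refused to promote."""
    return sum(r != "PROMOTED" for r in reasons) / len(reasons)


# Toy data: both skeletons overlap, yet one test key ("k9") is missing from the library.
keys_a = {0: ["k1", "k1"], 1: ["k2", "k3"]}
keys_b = {0: ["k1", "k9"], 1: ["k3", "k3"]}
print(overlap_metrics(keys_a, keys_b))      # (1.0, [])
print(key_hit(keys_b, {"k1", "k2", "k3"}))  # 0.75
```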
3.4. Mechanical Gates and Paper Taxonomy (Decoupled from Printed Rationales)
The miner prints mechanical gate verdicts and a printed rationale label. For auditability, the paper separates:
Mechanical gate failure: determined by thresholds on overlap, key_hit, abstain, ties, lib_keys, and the scale guard.
Failure taxonomy label (paper): assigned from receipts and must be consistent with metrics. In particular, if overlap_min = 100% but gate failure arises from key_hit/abstain/ties, the correct paper label is closure deficit under stable measured overlap (e.g., KEY_MISS and/or TIES_BROKEN), not KEY_DRIFT [1].
Taxonomy labels used (paper):
3.5. Lemmas and Theorem-Style Operational Claims (Conditional, Receipt-Scoped)
The statements in this subsection are not complexity-theoretic theorems. They are protocol-level claims about what the instrumentation and gates allow the paper to infer, conditional on the declared semantics and the correctness of the receipts [1,4,5,6,7,8,9].
Lemma 3.5.1 (Mechanical promotion only).
A run is “promoted” by this paper if and only if it satisfies all declared gates. No qualitative interpretation can override a failing gate [1].
Lemma 3.5.2 (Transport–closure separation).
A run may have overlap_min = 100% under the declared canonicalizer+key semantics and still fail promotion because closure does not meet gates — e.g., key_hit remains below threshold, abstain exceeds the allowed bound, and/or ties remain unresolved.
Therefore, overlap stability is evidence of invariance under the current measurement proxy, not evidence of a decider (and not a P = NP claim). In the paper’s multitime framing, this corresponds to the transport clock remaining stable while closure clocks (lookup recall, refusal rate, contestability resolution) remain open [1].
Lemma 3.5.3 (Audit-consistent classification rule).
If the receipts report overlap_min = 100% (worst-case across the relevant stages) and no_overlap is empty, then the transport proxy is stable under the declared canonicalizer+key semantics. In that situation, any gate failure must be classified as closure-limited (e.g., insufficient key_hit, excessive abstain, failed tie closure, bounded-growth violations), even if a printed runtime label says KEY_DRIFT. The point is not to “rename” failures, but to keep the paper’s interpretations consistent with its own instrumentation: when the transport clock (overlap) is receipt-stable, the bottleneck is located on closure clocks, and knob selection should target closure variables rather than drift mitigation. This separation is a core auditability rule: it prevents post-hoc narrative drift and makes the next admissible experiment mechanically determined by what failed [1].
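The classification rule can be written as a function of receipt fields alone. In this hypothetical sketch the miner's printed rationale is deliberately not an input, so it cannot influence the result. The thresholds (key_hit ≥ 99%, abstain ≤ 1%) are those reported for the runs in Results, and the coarse class names (CLOSURE_LIMITED, DRIFT_MEASURED, ALL_GATES_PASS) are illustrative rather than the paper's taxonomy labels.

```python
from typing import List, Tuple


def audit_class(overlap_min: float, no_overlap: List[int],
                key_hit_min: float, abstain_max: float,
                ties_closed: bool, scale_guard_pass: bool) -> Tuple[str, List[str]]:
    """Receipt-only classification; the miner's printed rationale is not an input."""
    transport_stable = overlap_min >= 100.0 and not no_overlap
    failed = [name for name, ok in (
        ("key_hit", key_hit_min >= 99.0),
        ("abstain", abstain_max <= 1.0),
        ("ties", ties_closed),
        ("scale_guard", scale_guard_pass),
    ) if not ok]
    if not transport_stable:
        return "DRIFT_MEASURED", ["overlap"] + failed
    if failed:
        return "CLOSURE_LIMITED", failed  # never relabeled as drift
    return "ALL_GATES_PASS", []


# Worst-case values reported for Run g0008.
print(audit_class(100.0, [], 88.3, 11.7, ties_closed=False, scale_guard_pass=True))
# ('CLOSURE_LIMITED', ['key_hit', 'abstain', 'ties'])
```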
Lemma 3.5.4 (Tie closure blocks promotion).
Under the declared tie gate (tie_rate≤1% OR (tiebreak_success≥99% and unresolved=0)), persistent contestability is treated as non-closure: if tie_rate remains high while tiebreak_success stays near zero and unresolved ties persist, the run cannot be promoted, even when overlap is stable and other closure metrics improve. The reason is operational and audit-driven: unresolved ties mean the pipeline encountered competing candidates that it could not mechanically adjudicate within its declared policy and budgets, so any “decision” would be either arbitrary or hidden work. In this protocol, tie handling is therefore part of the decider story (not a cosmetic post-processing step): until ties are closed by a validated, receipt-visible policy, the system has not demonstrated governable closure at scale. This aligns with reject-option/selective-decision principles: abstention/deferral can be admissible, but only when it is explicitly accounted for, mechanically bounded, and progressively reduced rather than used to mask difficulty [1,21,22].
Lemma 3.5.5 (Bounded growth is necessary under library reuse).
If the library is the vehicle for reuse (canonicalize → key → lookup → verify), then bounding lib_keys and enforcing a scale guard are necessary to resist memorization-by-fragmentation within the tested envelope. These guards do not prove generalization, but without them improved key_hit can be achieved by uncontrolled key fragmentation [1].
Proposition 3.5.6 (Receipt-backed negative results isolate bottlenecks).
If two runs differ by a single declared knob while preserving canonicalizer+key semantics and regimes, and worst-case closure metrics do not improve, then the receipts support the narrow conclusion that this knob is not the primary bottleneck within the tested envelope. This is not a global statement about SAT or NP-completeness; it is a scoped bottleneck isolation statement [1,4,5,6,7,8,9].
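Proposition 3.5.6 can be checked mechanically. The sketch below (function names hypothetical) first confirms that two configurations differ in exactly one knob and then tests whether worst-case closure improved. The example uses the knobs and worst-case metrics reported for Runs g0012 and g0013; for brevity it includes only a subset of the declared knobs, all of which (including canonicalizer parameters and regimes) are identical in the two runs apart from max_candidates.

```python
from typing import Dict, Optional


def single_knob(cfg_a: Dict[str, object], cfg_b: Dict[str, object]) -> Optional[str]:
    """Return the one knob that differs, or None if zero or several knobs differ."""
    diffs = [k for k in cfg_a.keys() | cfg_b.keys() if cfg_a.get(k) != cfg_b.get(k)]
    return diffs[0] if len(diffs) == 1 else None


def knob_not_primary_bottleneck(cfg_a: Dict[str, object], cfg_b: Dict[str, object],
                                wc_a: Dict[str, float], wc_b: Dict[str, float]) -> bool:
    """Proposition 3.5.6 as a check: one knob changed and worst-case closure did not improve."""
    if single_knob(cfg_a, cfg_b) is None:
        return False  # not an admissible single-knob comparison
    improved = (wc_b["key_hit_min"] > wc_a["key_hit_min"]
                or wc_b["abstain_max"] < wc_a["abstain_max"])
    return not improved


g0012 = {"wl_rounds": 12, "cap_per_key": 20, "max_candidates": 18,
         "discovery": 560000, "verify": 400000}
g0013 = {**g0012, "max_candidates": 60}
worst = {"key_hit_min": 88.3, "abstain_max": 11.7}  # identical in both runs' receipts
print(single_knob(g0012, g0013))                               # max_candidates
print(knob_not_primary_bottleneck(g0012, g0013, worst, worst))  # True
```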
Proposition 3.5.7 (Constructive route means governed iteration).
A “constructive route” in this paper means: define comparability by canonicalizer+key semantics; compile transport into a bounded library; verify under bounded policy; gate promotion mechanically; iterate admissible knobs while preserving receipts and rollback discipline. The route is constructive as a testable compilation program, not as a completed proof of P = NP [1].
Theorem 3.5.8 (Reporting discipline soundness, conditional).
Assuming receipts are faithfully produced by the declared miner implementation, the protocol ensures:
(a) comparability across runs is mechanical (same semantics, same gates, same reporting form),
(b) empirical statements are bounded to the declared operational envelope, and
(c) failure modes are separable into transport stability (overlap proxy), closure deficits (key_hit/abstain), contestability deficits (ties), and bounded-growth constraints (lib_keys and scale guard), without requiring narrative interpretation [1].
3.6. Why These Are Not Solver Benchmarks
A fair critique is that these results resemble solver logs. The intended contribution is different: these are transport and closure receipts under a governance kernel. The program’s key objects are comparability stability (overlap), closure sufficiency (key_hit, abstain), contestability (ties), and bounded growth (lib_keys and scale guard). This instrumentation constrains what can be claimed and makes progress auditable, rather than optimizing performance narratives [1,7,8,9,12,13,14,15,16,17].
3.7. What “Progress” Means in This Paper (Mechanical Promotion Only)
This work is oriented toward discovering evidence consistent with a constructive route to P = NP, but it does not claim P = NP. Empirical statements are restricted to what is receipt-backed by the runs reported here, and all promotion decisions are gated mechanically [1,4,5,6,7,8,9].
Concretely, “progress” is defined as movement along a disciplined ladder of auditable conditions. First, runs must be comparable: the run configuration (stages, regimes, canonicalizer parameters, budgets, and gates) is bound by receipts so that outcomes can be interpreted without post-hoc narrative changes [1]. Second, the protocol must make failure modes classifiable: when overlap_min = 100% under the current proxy, failures are treated as closure-limited (key_hit/abstain/ties) rather than drift-limited, because this distinction determines which knobs are admissible to turn next [1]. Third, progress requires that closure be demonstrated mechanically: the candidate decider is not “improved” by rhetoric, only by meeting explicit thresholds on key recall, abstention, tie closure, and bounded library growth, including scale guards that resist memorization-by-fragmentation [1].
This definition makes abstention and reject behavior first-class: the miner is allowed to refuse when it cannot close safely under its declared policy, and such refusals are recorded rather than hidden. This aligns with selective decision systems where abstention can be rational but must be accounted for and mechanically bounded if the goal is eventual promotion to a total decider [21,22].
3.8. Remark (W = I ^ C as an Audit Philosophy, Non-Operational)
As a non-operational framing, the protocol aligns with the archetype W = I ^ C, where W is wisdom, I is intelligence, and C is consciousness (intelligence elevated by awareness, contestability, and self-binding).
In this reading, C is not “constraint” itself; rather, C is the capacity to notice when the system is not justified to act, to keep reasons comparable, and to refuse promotion when closure is not mechanically demonstrated. The miner’s receipts, budgets, gates, and rollback discipline are therefore best understood as a computable proxy for C: a consciousness-like operator that forces the system to remain accountable to its own declared limits rather than to narrative confidence [3].
This “computable C” is implementable as a runtime layer that sits above the inference engine. Concretely: (i) every run declares a regime and produces receipts (RCR/DSR/LR/TBR/UR) that bind what the system is allowed to claim; (ii) promotion is permitted only when explicit gates are satisfied (closure thresholds, tie closure, bounded growth, scale guard), otherwise the system must abstain; (iii) rollback is admissible when a knob degrades overlap stability or violates scale behavior, preserving audit continuity across multitime clocks (transport stability versus closure capacity). In this sense, consciousness is operationalized as “self-binding under receipts”: the system continuously checks whether it is inside an admissible corridor and refuses to act when it cannot justify closure within its declared budgets [1].
This remark is motivational: it does not alter metrics or gates. It clarifies why the paper treats abstention, tie closure, and bounded growth as first-class requirements: they are the computational face of “awareness” in a system that would otherwise optimize for output without accountability [21,22].
4. Multitime Transport Compilation Protocol
Multitime Transport Compilation is an operational protocol for turning “solver behavior” into auditable, comparable evidence about whether a reusable transport signature (canonicalization → key → lookup → verify) can be made to close under bounded resources.
The protocol treats distinct “clocks” as first-class: (i) a transport-stability clock (measured overlap under fixed semantics), (ii) closure clocks (key_hit, abstain, tie closure), and (iii) a scale/bounded-growth clock (library growth and cross-size guards). Empirical statements in this paper are restricted to what is receipt-backed by declared runs [1], with standard complexity context in [4,5,6,7,8,9].
4.1. Miner Stages S1–S5
A run is a fixed staged pipeline. Each stage is a declared test point defined by (i) a size parameter n (e.g., 128/256/512/1024 in the reported runs), (ii) a split into paired sets (denoted in logs as N_A / N_B under a fixed split policy), and (iii) a regime — a declared composition of admissible transformations applied inside the miner.
Operationally, each stage executes a stack of the form:
N0_robust ∘ WL (canonicalization + Weisfeiler–Leman-style refinement), optionally followed by
WL_tiebreak (a tie-resolution module, when enabled), and then
library lookup + bounded verification under the stage’s budgets.
Stages S1–S5 are not “tuning passes”; they are predefined measurement checkpoints meant to expose different failure modes under controlled envelopes. The exact stage→regime schedule is declared by the run configuration and must not be altered post hoc if a run is to remain comparable [1].
The WL component is motivated as a practical canonical-form heuristic (in the orbit of graph canonicalization practice), not as a proof of canonical minimality [18,19,20].
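For readers unfamiliar with WL refinement, the following sketch computes a Weisfeiler–Leman (1-WL, color refinement) signature for a CNF formula and hashes it into a key. It is an assumption-laden illustration, not N0_robust: the literal–clause incidence graph with complement edges, the SHA-256 color hashing, and the histogram-based key are all choices made here. The resulting key is invariant under variable renaming and clause or literal reordering but, as with any WL signature, distinct non-isomorphic formulas can collide.

```python
import hashlib
from collections import Counter
from typing import Dict, List, Tuple

Clause = Tuple[int, ...]  # DIMACS-style literals: v or -v
Node = Tuple[str, int]


def wl_key(cnf: List[Clause], rounds: int) -> str:
    """Hash a 1-WL color histogram of the literal-clause incidence graph (illustrative)."""
    adj: Dict[Node, List[Node]] = {}

    def link(a: Node, b: Node) -> None:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)

    for i, clause in enumerate(cnf):
        adj.setdefault(("cls", i), [])
        for lit in clause:
            link(("cls", i), ("lit", lit))
    for node in list(adj):
        kind, lit = node
        if kind == "lit" and lit > 0 and ("lit", -lit) in adj:
            link(node, ("lit", -lit))  # complement edge ties v to -v

    # Initial colors depend only on node kind, never on variable names or clause order.
    colors = {node: node[0] for node in adj}
    for _ in range(rounds):
        colors = {
            node: hashlib.sha256(
                repr((colors[node], sorted(colors[n] for n in adj[node]))).encode()
            ).hexdigest()
            for node in adj
        }
    histogram = sorted(Counter(colors.values()).items())
    return hashlib.sha256(repr(histogram).encode()).hexdigest()
```

Increasing `rounds` sharpens the signature but, as Run g0009 illustrates for the miner's own WL stack, a deeper refinement is not automatically a better key.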
4.2. Canonicalization and Key Semantics
Canonicalization maps each input instance into a representation intended to be stable under declared admissible transformations. In the reported runs, this pipeline includes (as declared by regime) components such as:
PLE (pure literal elimination),
BCE (blocked clause elimination),
dup / lit-dup (controlled duplication normalization),
taut-heavy (tautology-heavy simplification),
sub (bounded subsumption-style reductions),
together with WL rounds to compute a refinement signature [18,19,20]. These are deliberately budgeted: the canonicalizer parameters (e.g., subsumption budgets, iteration caps, node caps) define what the protocol considers an admissible amount of work per instance, which is essential to keep the experiment interpretable as “bounded compilation” rather than unbounded solving [1].
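As an example of a budgeted canonicalizer component, pure literal elimination (PLE) can be written with an explicit pass cap. This is a generic textbook version assumed for illustration; reading ple_max_iters as a cap on elimination passes is an interpretation, not something the receipts confirm. PLE preserves satisfiability because a pure literal can always be set true without falsifying any clause.

```python
from typing import List, Tuple

Clause = Tuple[int, ...]  # DIMACS-style literals: v or -v


def pure_literal_elimination(cnf: List[Clause], max_iters: int) -> Tuple[List[Clause], int]:
    """Drop clauses satisfied by pure literals; stop at a fixpoint or after max_iters passes."""
    for passes in range(max_iters):
        lits = {lit for clause in cnf for lit in clause}
        pure = {lit for lit in lits if -lit not in lits}
        if not pure:
            return cnf, passes  # fixpoint: no pure literal left
        cnf = [clause for clause in cnf if not any(lit in pure for lit in clause)]
    return cnf, max_iters  # budget exhausted; result may still contain pure literals


print(pure_literal_elimination([(1, 2), (-2, 3), (-3, 2)], max_iters=60))
# ([(-2, 3), (-3, 2)], 1)
```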
Key semantics. The miner then forms a key from the canonicalized representation (conceptually: a hash of the canonical form / signature under the declared semantics). This enables:
key_hit: fraction of test instances whose keys appear in the library built from the paired set under the same semantics (recall of the key index).
overlap: fraction of instances whose keys are consistent across the compared sets under the same semantics (a measured invariance of the current key definition, not a universal semantic notion).
abstain: fraction of instances the pipeline refuses to promote because closure cannot be achieved within declared policies (e.g., key absent, verification failure, budget exhaustion, or unresolved tie).
These definitions are intentionally operational: the “candidate decider” in this paper is precisely the composition canonicalize → key → lookup → verify → promote/refuse, not an unconstrained SAT solver [1,12,17].
4.3. Library Dynamics, Scale Guard, Bounded Growth
The library is the compilation substrate: it stores bounded information indexed by keys, so that future instances sharing a key can reuse previously compiled structure rather than re-derive it. This is the mechanism by which the protocol probes a constructive route (reusable transport) instead of a purely per-instance search [1].
Because a key-indexed library can degenerate into memorization, the protocol treats bounded growth as a hard constraint:
lib_keys tracks the number of distinct keys held by the library (an index-size proxy).
cap_per_key bounds how many candidates may be stored or retained per key.
max_candidates bounds how many candidates may be considered (a recall/truncation knob).
To resist “memorization by fragmentation,” the protocol also imposes a scale guard: library growth at larger sizes must remain close to library growth at smaller sizes (as declared in the gate). This forces the route, if it exists, to look like reusable indexing rather than an expanding lookup table [1]. This framing is consistent with why disciplined reuse and bounded growth matter in any attempt to turn NP search into polynomial-time reuse under fixed semantics [4,5,6,7,8,9].
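The bounded-growth knobs can be illustrated with a small key-indexed library. In this hypothetical sketch, cap_per_key limits how many candidates a key retains, max_candidates truncates what a lookup returns, and lib_keys counts distinct keys. The scale guard is modeled as lib_keys(large) ≤ lib_keys(small) + 2, an assumption read off the “+2 constraint” cited for Run g0009 in Results.

```python
from collections import defaultdict
from typing import Any, Dict, List


class KeyLibrary:
    """Hypothetical key-indexed library with per-key capacity and truncated lookup."""

    def __init__(self, cap_per_key: int, max_candidates: int) -> None:
        self.cap_per_key = cap_per_key
        self.max_candidates = max_candidates
        self._index: Dict[str, List[Any]] = defaultdict(list)

    def add(self, key: str, candidate: Any) -> bool:
        if len(self._index.get(key, [])) >= self.cap_per_key:
            return False  # per-key capacity reached; the candidate is not retained
        self._index[key].append(candidate)
        return True

    def lookup(self, key: str) -> List[Any]:
        # .get avoids creating empty entries, so lookups never inflate lib_keys.
        return self._index.get(key, [])[: self.max_candidates]

    @property
    def lib_keys(self) -> int:
        return len(self._index)


def scale_guard(lib_keys_by_size: Dict[int, int], small: int, large: int,
                slack: int = 2) -> bool:
    """Assumed guard: growth from the smaller to the larger size stays within `slack` keys."""
    return lib_keys_by_size[large] <= lib_keys_by_size[small] + slack


print(scale_guard({512: 15, 1024: 23}, small=512, large=1024))  # False, as in Run g0009
```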
4.4. Verification Policy and Artifacts
Verification is treated as a bounded runtime step (the run declares a verify budget). Verification checks are internal to the miner’s policy: they determine whether a candidate can be promoted as “closed” under the declared regime and budgets.
Importantly, the receipts shown in this paper are not yet accompanied by externally checkable proof artifacts in standard SAT proof-log formats (e.g., DRAT/LRAT/FRAT-style ecosystems). As a result, auditability is currently strongest at the protocol level (gates, budgets, receipts, and reproducibility of the pipeline), rather than at the level of third-party replayable proof objects. Integrating standardized, independently checkable proof logs would strengthen reproducibility and align the methodology more closely with modern proof-checking practice in SAT workflows [15], with broader proof-complexity context in [23,24].
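As an illustration of a bounded verification step, the sketch below checks a candidate satisfying assignment against a CNF formula while counting literal evaluations against a budget. It is a hypothetical stand-in for the miner's verifier: it can only certify satisfiable instances via an assignment, whereas unsatisfiability claims would need proof objects such as the DRAT/LRAT/FRAT-style logs discussed above.

```python
from typing import Dict, List, Tuple

Clause = Tuple[int, ...]  # DIMACS-style literals: v or -v


def bounded_verify(cnf: List[Clause], assignment: Dict[int, bool], budget: int) -> str:
    """Check a candidate assignment clause by clause under a literal-evaluation budget."""
    steps = 0
    for clause in cnf:
        satisfied = False
        for lit in clause:
            steps += 1
            if steps > budget:
                return "BUDGET_EXHAUSTED"  # refuse rather than spend undeclared work
            # An unassigned variable (None) never satisfies a literal.
            if assignment.get(abs(lit)) == (lit > 0):
                satisfied = True
                break
        if not satisfied:
            return "FAILED"
    return "VERIFIED"


print(bounded_verify([(1, -2), (2, 3)], {1: True, 2: False, 3: True}, budget=3))  # VERIFIED
```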
4.5. Mechanical Gates (Promotion Rules)
Promotion is mechanical: the paper defines explicit thresholds that must be met to treat a run as supporting a stronger claim. Gates are chosen to separate three distinct outcomes:
1. Transport stability (measured by overlap_min under fixed semantics),
2. Closure (high key_hit, low abstain, tie closure),
3. Bounded reuse at scale (bounded lib_keys and scale guard).
Tie handling is a first-class gate because unresolved ties represent contestability failures: multiple competing candidates exist and cannot be mechanically resolved within policy. If tie_rate remains high, tiebreak_success remains near zero, and unresolved cases persist, promotion is blocked, even when overlap is stable, because the protocol cannot honestly claim closure at scale under its own rules [1]. This is aligned with selective decision systems: abstention can be rational, but only if bounded and policy-consistent [21,22].
Finally, the paper’s failure taxonomy is defined to be audit-consistent: when overlap receipts show overlap_min = 100%, failures are classified as closure-limited (e.g., KEY_MISS / abstain / ties) rather than “drift,” even if a printed rationale string says otherwise. This is required for interpretability: the transport clock may be stable while closure clocks are not [1].
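The promotion rule can be expressed as a pure function of worst-case receipt fields. The thresholds below are those listed for Run g0012 in Results (overlap = 100%, key_hit ≥ 99%, abstain ≤ 1%, lib_keys ≤ 24, tie closure, scale guard), with tie closure taken from Lemma 3.5.4. Using unresolved_any as the stand-in for unresolved = 0 is an assumption of this sketch, as are all field and function names.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class WorstCase:
    overlap_min: float           # percent
    key_hit_min: float           # percent
    abstain_max: float           # percent
    lib_keys_max: int
    tie_rate_max: float          # percent
    tiebreak_success_min: float  # percent
    unresolved_any: bool         # assumed stand-in: False means "unresolved = 0"
    scale_guard_pass: bool


def evaluate_gates(wc: WorstCase) -> Dict[str, bool]:
    # Tie gate: tie_rate <= 1% OR (tiebreak_success >= 99% and unresolved = 0).
    ties_closed = wc.tie_rate_max <= 1.0 or (
        wc.tiebreak_success_min >= 99.0 and not wc.unresolved_any)
    return {
        "overlap": wc.overlap_min >= 100.0,
        "key_hit": wc.key_hit_min >= 99.0,
        "abstain": wc.abstain_max <= 1.0,
        "lib_keys": wc.lib_keys_max <= 24,
        "ties": ties_closed,
        "scale_guard": wc.scale_guard_pass,
    }


def is_promoted(wc: WorstCase) -> bool:
    # Lemma 3.5.1: promotion iff every gate passes; no rationale string can override this.
    return all(evaluate_gates(wc).values())


g0012 = WorstCase(100.0, 88.3, 11.7, 15, 50.0, 0.0, True, True)
print([gate for gate, ok in evaluate_gates(g0012).items() if not ok])
# ['key_hit', 'abstain', 'ties']
```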
4.6. Reporting Discipline (Run Ledger)
Runs are reported under a strict ledger discipline:
Each run is identified by a RunConfigReceipt (RCR) binding stages, regimes, budgets, canonicalizer parameters, and gate thresholds.
Each stage emits receipts (including stage summaries) so that “what happened” is anchored to declared settings rather than post-hoc narrative.
Each run is appended as its own subsection in Results (“Run gXXXX: supported by receipts”), and no aggregate claim is made unless multiple independent receipts support it under comparable configurations [1].
This reporting discipline is the core methodological contribution: it turns the search for constructive routes into a governed process where progress and failure are visible, comparable, and mechanically classifiable, without pretending that any single notebook-scale experiment settles the P vs NP question [1,4,5,6,7,8,9].
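A minimal sketch of the ledger discipline, assuming an in-memory store and hypothetical field names: each run is appended once under its RCR, a second append for the same RCR is rejected rather than overwritten, and the ledger can be exported as JSON Lines. The example entry uses the RCR and worst-case values reported for Run g0008.

```python
import json
from typing import Dict, List


class RunLedger:
    """Hypothetical append-only run ledger keyed by RCR; closed entries are never reopened."""

    def __init__(self) -> None:
        self._entries: List[Dict[str, object]] = []

    def append(self, rcr: str, worst_case: Dict[str, float], taxonomy: List[str]) -> None:
        if any(entry["rcr"] == rcr for entry in self._entries):
            raise ValueError(f"run {rcr[:12]} is already closed; entries are not reopened")
        self._entries.append({"rcr": rcr,
                              "worst_case": dict(worst_case),
                              "taxonomy": list(taxonomy)})

    def to_jsonl(self) -> str:
        """Export one JSON object per line, in append order."""
        return "".join(json.dumps(entry, sort_keys=True) + "\n" for entry in self._entries)


G0008_RCR = "ed782b6b16643ba386113e428e0d8759a4a14c483546a0b7e10185abca03d57e"
ledger = RunLedger()
ledger.append(G0008_RCR,
              {"overlap_min": 100.0, "key_hit_min": 88.3, "abstain_max": 11.7},
              ["KEY_MISS", "TIES_BROKEN"])
```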
5. Results
This section reports results under a strict receipt-backed discipline. Each run is treated as a single, auditable episode whose claims are limited to what is explicitly present in its receipts (RunConfigReceipt, Stage Summaries, Gates, and UR Verdicts).
The purpose is not to “tell a success story,” but to make progress and failure mechanically comparable across runs, with knob changes interpreted as controlled interventions rather than post-hoc narrative.
Accordingly, each Run gXXXX subsection is evidence about a specific operational envelope: the staged miner design (S1–S5), declared regimes (R1–R3), canonicalizer+key semantics, and stated budgets. These runs are not presented as general SAT solver benchmarks; they are transport-and-closure receipts that localize bottlenecks under the paper’s candidate-decider pipeline (canonicalize → key → lookup → verify) and its mechanical promotion gates.
A crucial audit rule is enforced throughout: when overlap receipts show overlap_min=100.0%, the paper treats transport stability (under the current metric) as provisionally satisfied, and classifies the dominant failure as closure-limited (e.g., key_hit, abstain, tie closure, bounded growth), even if the miner prints a generic rationale such as KEY_DRIFT.
Conversely, when overlap receipts fall below 100.0% or show explicit no_overlap indices, the run is classified as measured drift under the current instrumentation. This separation implements the multitime ledger idea: different “clocks” can fail independently, so transport can remain stable while closure clocks do not.
In sum, “supported by receipts” means something concrete: every numeric claim in the run subsections (overlap_min, key_hit_min, abstain_max, lib_keys_max, scale-guard status, and tie metrics where reported) is directly taken from that run’s receipts, and the subsection’s failure taxonomy is required to be consistent with those measurements.
Any aggregate statement is deferred unless it is explicitly backed by multiple receipts reported in this paper.
5.0. Run Ledger (Receipt-Backed Reporting Discipline)
Each accepted run is recorded as its own receipt-backed subsection because the protocol treats empirical claims as audit objects, not narrative summaries: a claim is admissible only if it is traceable to a specific RunConfigReceipt and the associated stage receipts (DSR/LR/TBR/UR).
This is the same governance logic introduced in the multitime kernel framing of [1]: comparability is produced by declared regimes, mechanical gates, and no-reopen discipline, so the “unit of evidence” is the closed run, not an averaged storyline.
Accordingly, the paper makes no aggregate claim (e.g., “we improved closure” or “WL helps/hurts”) unless multiple independent run receipts explicitly support it under comparable conditions. This prevents over-interpreting variance that is known to arise in SAT-style pipelines due to heuristic interactions, instance distribution shift, and budget sensitivity, phenomena that have long been emphasized in SAT solver engineering and evaluation practice [12,13,17].
It also mirrors the broader complexity-theory norm that general conclusions require careful control of assumptions and distributions, rather than extrapolation from a small set of experiments [7,8,9].
5.1. Run g0008: Stable Measured Overlap with Closure Deficit and Unresolved Ties
RCR=ed782b6b16643ba386113e428e0d8759a4a14c483546a0b7e10185abca03d57e.
Config: S=12 skeletons, V=10 variants (fixed split 5/5), budgets discovery=280000 and verify=400000, WL rounds=12, cap_per_key=10, max_candidates=18. Canonicalizer params: sub_max_k=6, sub_budget_pairs=15000, bce_budget=8000, ple_max_iters=60, big_node_cap=50000. Regimes: R1=ple∘bce, R2=dup∘taut-heavy∘lit-dup, R3=sub∘dup.
Worst-case stage summaries:
overlap_min=100.0% (S1–S5)
key_hit_min=88.3% (S2–S4)
abstain_max=11.7% (S2–S4)
lib_keys_max=15 (S1)
ties: tie_rate_max=50.0%, tiebreak_success_min=0.0%, unresolved_any=True
scale guard: PASS
Mechanical gates fail under thresholds key_hit≥99, abstain≤1, and ties closure. Printed gate rationale is “KEY_DRIFT,” but since overlap_min=100.0%, the paper classifies the dominant issue as closure deficit under stable measured overlap, plus tie blockage.
Failure taxonomy (paper): KEY_MISS (primary), TIES_BROKEN (blocking).
What this enables: This run isolates a stable-overlap envelope while demonstrating insufficient closure. It supports focusing subsequent tests on closure knobs (coverage/recall/canonicalizer budgets) and tie resolution rather than drift mitigation.
5.2. Run g0009: WL-Depth Increase Correlates with Measured Drift and Scale-Guard Failure
RCR=dfae2ddc04dbc4d0a29341b021f561c7c9940c6e19a303c1f091e015ed0cbea7. Same as g0008 except WL rounds=16.
Worst-case stage summaries:
overlap_min=83.3% (S1–S3) and overlap_min=75.0% (S5), with no_overlap indices present
key_hit_min=68.3% (S5)
abstain_max=31.7% (S5)
lib_keys_max=23
ties: tie_rate_max=50.0%, tiebreak_success_min=0.0%, unresolved_any=True
scale guard: FAIL (lib_keys(1024)=23 while lib_keys(512)=15, violating the +2 constraint)
Here overlap receipts support measured drift under the current metric, and scale behavior degrades.
Failure taxonomy (paper): KEY_DRIFT (primary) plus DISCOVERY_COVERAGE_LIMIT and/or RECALL_TRUNC symptoms (secondary), and TIES_BROKEN (blocking).
What this enables: This run provides a receipt-backed counterexample to monotone refinement: deeper WL (as configured) can reduce overlap and violate scale guards. It justifies rollback as an admissible governance move.
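A short arithmetic check helps read the worst-case percentages. Assuming (this is not stated in the receipts) that key_hit and abstain are computed over 60 test instances per stage (12 skeletons × 5 test variants under the 5/5 split) and overlap over the 12 skeletons, every value reported for g0008 and g0009 corresponds to a whole number of instances. In each run key_hit_min and abstain_max also sum to 100%, which is consistent with abstention in these stages coming entirely from key misses.

```python
TEST_INSTANCES = 12 * 5  # assumption: S=12 skeletons x 5 test variants (split 5/5)
SKELETONS = 12           # assumption: overlap is measured per skeleton


def pct(count: int, total: int) -> float:
    """Percentage rounded to one decimal place, as printed in the receipts."""
    return round(100 * count / total, 1)


# (reported value, candidate count, assumed denominator)
reported = {
    "g0008 key_hit_min": (88.3, 53, TEST_INSTANCES),
    "g0008 abstain_max": (11.7, 7, TEST_INSTANCES),
    "g0009 key_hit_min": (68.3, 41, TEST_INSTANCES),
    "g0009 abstain_max": (31.7, 19, TEST_INSTANCES),
    "g0009 overlap_min (S1-S3)": (83.3, 10, SKELETONS),
    "g0009 overlap_min (S5)": (75.0, 9, SKELETONS),
}
for name, (value, count, total) in reported.items():
    assert pct(count, total) == value, name

# Key hits plus abstentions account for every assumed test instance in both runs.
assert 53 + 7 == 41 + 19 == TEST_INSTANCES
```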
5.3. Run g0010: Rollback to WL=12 Restores Stable Overlap and Scale Guard; Closure Deficit Remains
RCR=f9ab65fd10c3b71f91820d2973d63d9ac6db5ec87c2b3cd7e35d80eb80374875. Same as g0008 (WL rounds=12).
Worst-case stage summaries:
overlap_min=100.0% (S1–S5)
key_hit_min=88.3% (S2–S4)
abstain_max=11.7% (S2–S4)
lib_keys_max=15
ties: tie_rate_max=50.0%, tiebreak_success_min=0.0%, unresolved_any=True
scale guard: PASS
Mechanical gates fail for closure and ties. Printed gate rationale is “KEY_DRIFT,” but overlap receipts indicate stable measured overlap; the paper classifies the issue as closure deficit under stable measured overlap, plus tie blockage.
Failure taxonomy (paper): KEY_MISS (primary), TIES_BROKEN (blocking).
What this enables: Together with g0009, this supports treating WL=16 (in this configuration) as destabilizing measured overlap and scale behavior, while WL=12 preserves stable overlap. Remaining obstacles are closure and tie resolution.
5.4. Run g0011: Recall Ablation via Cap-Per-Key Increase Shows No Closure Uplift Under Stable Overlap
RCR=7e4a0539558f0c0475e502ec5d3358e4fad1557a63213be2c928a3861cfc59a0 (ConfigHash12=7e4a0539558f). Config: S=12 skeletons, V=10 variants (fixed split 5/5), budgets discovery=280000 and verify=400000, WL rounds=12, cap_per_key=20, max_candidates=18. Canonicalizer params: sub_max_k=6, sub_budget_pairs=15000, bce_budget=8000, ple_max_iters=60, big_node_cap=50000. Regimes: R1=ple∘bce, R2=dup∘taut-heavy∘lit-dup, R3=sub∘dup.
Worst-case stage summaries:
overlap_min=100.0% (S1–S5)
key_hit_min=88.3% (S2–S4)
abstain_max=11.7% (S2–S4)
lib_keys_max=15 (S1)
ties: tie_rate_max=50.0%, tiebreak_success_min=0.0%, unresolved_any=True
scale guard: PASS
Mechanical gates fail under thresholds key_hit≥99, abstain≤1, and ties closure. Printed gate rationale is “KEY_DRIFT,” but since overlap_min=100.0%, the paper classifies the dominant issue as closure deficit under stable measured overlap, plus tie blockage. Increasing cap_per_key increases candidate load (avg_cand and p90_cand rise, with p90_cand saturating max_candidates=18), yet worst-case closure metrics (key_hit_min, abstain_max) do not improve.
Failure taxonomy (paper): DISCOVERY_COVERAGE_LIMIT (primary), TIES_BROKEN (blocking).
What this enables: This run provides a negative ablation: increasing per-key recall capacity does not move worst-case closure under the current configuration. It motivates shifting the next test knob toward discovery coverage (discovery_steps) or canonicalizer strength, while treating tie closure as a separate promotion blocker.
5.5. Run g0012: Doubled Discovery Budget, Stable Overlap, No Closure Uplift
RCR=ccc10332c0607211ae8903fa1681c8c20839d319d65a8380a4e2ae3c9b6e502e (ConfigHash12=ccc10332c060). Configuration highlights: S=12 skeletons, V=10 variants (fixed split 5/5). Budgets: discovery=560000, verify=400000. WL rounds=12. cap_per_key=20, max_candidates=18. Canonicalizer parameters: sub_max_k=6, sub_budget_pairs=15000, bce_budget=8000, ple_max_iters=60, big_node_cap=50000. Regimes: R1=ple∘bce, R2=dup∘taut-heavy∘lit-dup, R3=sub∘dup.
Worst-case stage summaries (receipts):
overlap_min=100.0% (S1–S5)
key_hit_min=88.3% (S2–S4)
abstain_max=11.7% (S2–S4)
lib_keys_max=15 (S1)
ties: tie_rate_max=50.0%, tiebreak_success_min=0.0%, unresolved_any=True (S1–S5, where reported)
scale guard (1024 vs 512): PASS
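Each worst-case summary reduces the per-stage metrics to one extreme value per metric. It also records the stages that attain that value, which is what annotations such as “(S2–S4)” report. The following is a minimal sketch. The per-stage dictionaries and their field names are hypothetical, because per-stage values are not reported in this section.

```python
from typing import Dict, List, Mapping

# summary field -> (per-stage metric name, worst-case selector); names are hypothetical
RULES = {
    "overlap_min": ("overlap", min),
    "key_hit_min": ("key_hit", min),
    "abstain_max": ("abstain", max),
    "lib_keys_max": ("lib_keys", max),
    "tie_rate_max": ("tie_rate", max),
    "tiebreak_success_min": ("tiebreak_success", min),
}

def worst_case(stages: Mapping[str, Mapping[str, float]]) -> Dict[str, object]:
    """Return each worst value with the stages attaining it, e.g. (88.3, ['S2', 'S3', 'S4'])."""
    summary: Dict[str, object] = {}
    for field, (metric, pick) in RULES.items():
        value = pick(s[metric] for s in stages.values())
        attained: List[str] = [name for name, s in stages.items() if s[metric] == value]
        summary[field] = (value, attained)
    summary["unresolved_any"] = any(s["unresolved"] for s in stages.values())
    return summary
```

In every run reported in this section, key_hit_min + abstain_max = 100.0% at the same stages (S2–S4). This is consistent with abstain being the complement of key_hit, although the section does not state that explicitly.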
Mechanical gates (thresholds: overlap=100%, key_hit≥99%, abstain≤1%, lib_keys≤24, tie closure, scale guard) fail at all stages. The miner prints the rationale “KEY_DRIFT” (UR-lite receipt=a6c042a9b81e; UR-full receipt=933b6846d326), but because overlap_min=100.0% throughout, this run is classified in-paper as closure-limited under stable measured overlap: key_hit and abstain do not satisfy their gates, and tie closure remains blocked (tie_rate_max=50.0% with tiebreak_success_min=0.0% and unresolved cases).
What this enables: This run tests whether increasing discovery budget alone improves closure under an otherwise stable-overlap envelope; it does not, within the reported worst-case metrics. It motivates the next experiments to target closure knobs (recall/candidate truncation and tie closure) rather than transport stability, since the transport proxy remains stable while closure gates remain unsatisfied.
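The gate list above can be encoded as predicates over a worst-case summary, with promotion blocked unless every predicate holds. This is a sketch under assumptions. The field names are hypothetical, and “tie closure” is interpreted as “no unresolved ties,” which may be weaker or stronger than the miner's own rule.

```python
from typing import Callable, Dict, List, Tuple

# Thresholds as stated in the text; field names and the tie rule are assumptions.
GATES: Dict[str, Callable[[dict], bool]] = {
    "overlap": lambda s: s["overlap_min"] >= 100.0,
    "key_hit": lambda s: s["key_hit_min"] >= 99.0,
    "abstain": lambda s: s["abstain_max"] <= 1.0,
    "lib_keys": lambda s: s["lib_keys_max"] <= 24,
    "ties": lambda s: not s["unresolved_any"],  # assumed tie-closure rule
    "scale_guard": lambda s: bool(s["scale_guard_pass"]),
}

def evaluate(summary: dict) -> Tuple[str, List[str]]:
    """Return ("PROMOTE", []) only if every gate passes, else ("BLOCK", failed gates)."""
    failed = [name for name, ok in GATES.items() if not ok(summary)]
    return ("PROMOTE", []) if not failed else ("BLOCK", failed)

# g0012 worst-case values as reported in this section.
G0012 = {"overlap_min": 100.0, "key_hit_min": 88.3, "abstain_max": 11.7,
         "lib_keys_max": 15, "unresolved_any": True, "scale_guard_pass": True}
print(evaluate(G0012))  # ('BLOCK', ['key_hit', 'abstain', 'ties'])
```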
5.6. Run g0013: Raised Candidate Cap, No Closure Uplift
RCR = 507a1c3ebee44d30ab9d840df25e251820011ac5a6cd096d7deafe6b03e266fe. Same as g0012 except max_candidates=60 (WL=12, cap_per_key=20, discovery=560000, verify=400000).
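A statement such as “same as g0012 except max_candidates=60” is a single-knob claim, and it can be checked mechanically by diffing the two configurations. The sketch below uses hypothetical key names. The values are the ones reported for g0011 to g0014 in this section.

```python
# g0012 as reported in this section; the key names are hypothetical spellings.
BASE = {
    "S": 12, "V": 10, "discovery": 560000, "verify": 400000, "wl_rounds": 12,
    "cap_per_key": 20, "max_candidates": 18, "sub_max_k": 6,
    "sub_budget_pairs": 15000, "bce_budget": 8000, "ple_max_iters": 60,
    "big_node_cap": 50000,
}

def config_diff(a: dict, b: dict) -> dict:
    """Map each key whose value differs to (value_in_a, value_in_b)."""
    return {k: (a.get(k), b.get(k)) for k in sorted(a.keys() | b.keys())
            if a.get(k) != b.get(k)}

def is_single_knob(a: dict, b: dict) -> bool:
    """True when exactly one configuration value changed."""
    return len(config_diff(a, b)) == 1

G0012 = BASE
G0011 = {**BASE, "discovery": 280000}
G0013 = {**BASE, "max_candidates": 60}
G0014 = {**G0011, "sub_budget_pairs": 60000}
```

Here `config_diff(G0012, G0013)` returns `{'max_candidates': (18, 60)}`. The g0011→g0012 and g0011→g0014 pairs are also single-knob changes, in discovery and sub_budget_pairs respectively.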
Worst-case stage summaries:
overlap_min=100.0% (S1–S5)
key_hit_min=88.3% (S2–S4)
abstain_max=11.7% (S2–S4)
lib_keys_max=15
ties: tie_rate_max=50.0%, tiebreak_success_min=0.0%, unresolved_any=True
scale guard: PASS
Gate outcome: increasing max_candidates (a recall/truncation knob) does not improve worst-case closure metrics under the current regime/canonicalizer. Since overlap_min=100.0%, the paper classifies this as closure-limited rather than drift, even if a printed rationale reports KEY_DRIFT. This is an explicit receipt-backed negative result: the closure deficit observed in g0008/g0010/g0011/g0012 is not primarily explained by max_candidates truncation (within this envelope).
Failure taxonomy (paper): KEY_MISS (primary), TIES_BROKEN (blocking).
What this enables: rules out a plausible “easy” explanation (recall truncation) and tightens the paper’s bottleneck diagnosis: remaining gains likely require (i) canonicalizer+key semantics changes, and/or (ii) a validated tie-closure policy.
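The two recall knobs act at different points of a library lookup. cap_per_key bounds how many candidates are stored under one key, and max_candidates truncates what a query returns. The following is a minimal sketch that assumes a dict-of-lists library (`KeyLibrary` is a hypothetical name). It also shows why raising max_candidates cannot reduce abstention caused by key misses: an absent key yields no candidates at any cap.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List

class KeyLibrary:
    """Hypothetical key -> candidates store with a per-key capacity."""

    def __init__(self, cap_per_key: int) -> None:
        self.cap_per_key = cap_per_key
        self._store: Dict[Hashable, List[Hashable]] = defaultdict(list)

    def add(self, key: Hashable, candidate: Hashable) -> bool:
        """Store a candidate unless the key's bucket is already full."""
        bucket = self._store[key]
        if len(bucket) >= self.cap_per_key:
            return False  # recall capacity exhausted for this key
        bucket.append(candidate)
        return True

    def lookup(self, keys: Iterable[Hashable], max_candidates: int) -> List[Hashable]:
        """Deduplicated union of the query keys' buckets, truncated to max_candidates.

        An empty result is a key miss, which a pipeline would turn into an abstain.
        """
        if max_candidates < 1:
            raise ValueError("max_candidates must be positive")
        out: List[Hashable] = []
        seen = set()
        for key in keys:
            for cand in self._store.get(key, ()):  # .get avoids inserting missing keys
                if cand in seen:
                    continue
                seen.add(cand)
                out.append(cand)
                if len(out) == max_candidates:
                    return out  # truncation: later candidates are dropped
        return out
```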
5.7. Run g0014: Larger Subsumption Budget, No Closure Uplift
RCR = f29e8b32588ed38fe7fef18c20951079e9cf5eaed93195f95a6b974607573ac4. Same as g0011 (WL=12, cap_per_key=20, discovery=280000, verify=400000) except canonicalizer parameter sub_budget_pairs=60000 (increased from 15000). All other canonicalizer params unchanged (sub_max_k=6, bce_budget=8000, ple_max_iters=60, big_node_cap=50000). Regimes: R1=ple∘bce, R2=dup∘taut-heavy∘lit-dup, R3=sub∘dup.
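The canonicalizer parameters act as iteration or work budgets on simplification passes. The sketch below is an illustration only. It assumes that ple denotes pure literal elimination and that a regime such as R1=ple∘bce is read right to left (bce first). It implements a budgeted pure-literal pass over DIMACS-style integer clauses, together with a composition helper. bce and sub are not implemented, and the miner's own passes may differ.

```python
from typing import Callable, FrozenSet, Iterable, List

Clause = FrozenSet[int]

def pure_literal_elimination(clauses: Iterable[Iterable[int]], max_iters: int = 60) -> List[Clause]:
    """Drop clauses containing a pure literal, repeating for at most max_iters rounds.

    A literal is pure if its negation occurs nowhere in the formula; the clauses
    containing it can be satisfied by it, so removing them preserves satisfiability.
    (Assumed meaning of "ple"; the budget mirrors ple_max_iters.)
    """
    current = [frozenset(c) for c in clauses]
    for _ in range(max_iters):
        literals = {lit for clause in current for lit in clause}
        pure = {lit for lit in literals if -lit not in literals}
        if not pure:
            break  # fixpoint reached before the budget ran out
        current = [c for c in current if not (c & pure)]
    return current

def compose(*passes: Callable) -> Callable:
    """compose(f, g)(x) == f(g(x)): the rightmost pass runs first (assumed reading of ∘)."""
    def run(x):
        for p in reversed(passes):
            x = p(x)
        return x
    return run
```

With a budget of one round, `pure_literal_elimination([[1, 2], [-1, 3], [-3, 2]], max_iters=1)` stops at `[frozenset({-1, 3})]`. With the default budget it reaches the empty formula, which is how an iteration cap can leave a canonical form incomplete.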
Worst-case stage summaries
overlap_min=100.0% (S1–S5)
key_hit_min=88.3% (S2–S4)
abstain_max=11.7% (S2–S4)
lib_keys_max=15
ties: tie_rate_max=50.0%, tiebreak_success_min=0.0%, unresolved_any=True
scale guard: PASS
Gate outcome (what changed, what did not)
Increasing sub_budget_pairs (a canonicalizer-strength knob intended to collapse more near-equivalences and improve key reuse) does not improve worst-case closure metrics under the current regime/key semantics. Overlap remains maximally stable, but closure remains limited: key_hit stays below the ≥99% threshold, abstain remains above the ≤1% threshold, and tie closure remains blocking.
This is a receipt-backed negative result: within this operational envelope, the closure deficit observed in g0008/g0010/g0011/g0012 is not primarily explained by subsumption-pair budget being too small (at least up to 60000 under the current sub_max_k and key design).
Failure taxonomy (paper)
KEY_MISS (primary), TIES_BROKEN (blocking).
What this enables (why it’s relevant)
Together with g0013 (which rules out max_candidates as the dominant “easy” recall explanation), g0014 rules out a second plausible “easy” explanation (canonicalizer budget). That tightens the paper’s bottleneck diagnosis: further progress likely requires a change in key semantics / canonical form definition and/or a validated tie-closure policy (not merely more recall or more canonicalizer budget inside the current semantics).
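The tie metrics can be made concrete with a hypothetical definition. A query is tied when two or more candidates share the top score. A tiebreak succeeds when it selects a verified candidate, and any tied query it fails on counts as unresolved. Under that assumption, a tie-closure policy is any `tiebreak` function that drives success to 100%. The sketch scores a given policy; it does not propose one.

```python
from typing import Callable, Dict, Hashable, List, Optional, Sequence, Set, Tuple

# (candidate scores, set of verified candidates); the structure is hypothetical
Query = Tuple[Dict[Hashable, float], Set[Hashable]]

def tie_stats(queries: Sequence[Query],
              tiebreak: Callable[[List[Hashable]], Optional[Hashable]]) -> Tuple[float, float, bool]:
    """Return (tie_rate %, tiebreak_success %, unresolved_any) for one stage."""
    ties = resolved = 0
    for scores, verified in queries:
        if not scores:
            continue  # key miss: counted in the denominator, cannot tie
        best = max(scores.values())
        tied = sorted(c for c, s in scores.items() if s == best)  # order-stable tie list
        if len(tied) < 2:
            continue
        ties += 1
        choice = tiebreak(tied)
        if choice is not None and choice in verified:
            resolved += 1
    tie_rate = 100.0 * ties / len(queries) if queries else 0.0
    success = 100.0 * resolved / ties if ties else 100.0  # vacuously 100% when no ties
    return tie_rate, success, resolved < ties
```

A policy that refuses to break ties (`lambda tied: None`) produces 0% success whenever any tie occurs. Treating a stage with no ties as 100% success is a design choice here, so that such a stage does not block on ties.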
9. Conclusion
This paper advances the program of multitime barriers in [1] from diagnosis to an operational compilation protocol: Multitime Transport Compilation. Rather than treating reductions and invariants as purely conceptual objects, we treat them as governed runtime transformations whose comparability must be made mechanical and auditable.
The miner implements a conditional candidate-decider pipeline — canonicalize → key → library lookup → bounded verification — under a Temporal State Machine discipline (admissibility corridors, receipts, no-reopen, abstain gating, and recovery-to-okay feasibility).
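The four-stage pipeline can be sketched as a single function that either returns a verified verdict with a receipt or abstains. This is a minimal sketch under assumptions. Candidates are stored satisfying assignments keyed by a toy canonical CNF, and verification is a budgeted clause check. A failed candidate triggers abstention rather than a negative verdict, because a failed witness does not prove unsatisfiability. All names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Clause = Tuple[int, ...]
CNF = Tuple[Clause, ...]

def canonicalize(cnf) -> CNF:
    """Toy canonical form: sort literals within clauses, then sort clauses."""
    return tuple(sorted(tuple(sorted(c)) for c in cnf))

def verify(canon: CNF, assignment: Dict[int, bool], budget: int) -> Optional[bool]:
    """True if the assignment satisfies every clause; None if over budget or failing."""
    if len(canon) > budget:
        return None  # bounded verification: refuse rather than overrun the budget
    for clause in canon:
        if not any(assignment.get(abs(lit)) == (lit > 0) for lit in clause):
            return None  # this witness fails; that alone proves nothing
    return True

def decide(cnf, library: Dict[CNF, List[Dict[int, bool]]],
           verify_budget: int, max_candidates: int) -> Tuple[Optional[bool], dict]:
    """canonicalize -> key -> library lookup -> bounded verification, else abstain (None)."""
    canon = canonicalize(cnf)
    key = canon  # toy key: the canonical form itself
    candidates = library.get(key, [])[:max_candidates]
    receipt = {"n_candidates": len(candidates)}
    if not candidates:
        receipt["reason"] = "KEY_MISS"
        return None, receipt  # abstain
    for cand in candidates:
        if verify(canon, cand, verify_budget):
            receipt.update(reason="VERIFIED", witness=cand)
            return True, receipt
    receipt["reason"] = "NO_VERIFIED_CANDIDATE"
    return None, receipt  # abstain
```

In this toy, lowering max_candidates can make the pipeline abstain on an instance it otherwise decides. That is the kind of truncation effect the max_candidates ablation probes.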
This operational stance is motivated by the standard NP-completeness and reduction framework initiated by Cook and Karp [4,5] and consolidated in core complexity references [7,8,9], while explicitly acknowledging known barrier patterns such as natural proofs and proof-complexity constraints [10,11,23,24]. The objective is not to claim a theorem, but to publish a protocol whose claims are constrained by receipts and therefore remain contestable and reproducible.
Receipt-backed runs reported here establish two concrete, falsifiable separations. First, transport stability under the current metric can hold while closure still fails. With WL rounds set to 12, overlap is measured as stable (overlap_min=100%), yet key_hit remains well below its mechanical gate (key_hit_min=88.3% against ≥99%), abstain remains well above its gate (abstain_max=11.7% against ≤1%), and unresolved ties persist. This shows that the “transport clock” (as proxied by overlap under the chosen canonicalizer+key semantics) can be OK while the “closure clocks” are not, turning the problem into a closure deficit rather than measured drift. Second, refinement is not monotone: increasing WL depth (WL=16 in the reported configuration) correlates with measured overlap loss and a scale-guard violation, providing a receipt-backed counterexample to the assumption that stronger canonicalization necessarily improves comparability. This supports rollback as a governance operator rather than an ad hoc retreat, consistent with the kernel logic developed in [1].
The paper does not claim P = NP. Instead, it proposes a constructive and auditable research route: make comparability mechanical (overlap and scale guards), classify failure modes using receipt-backed metrics rather than narrative labels, and iterate one admissible knob at a time toward closure (higher key_hit, lower abstain, resolved ties) under declared regimes.
The broader intellectual posture is aligned with the selective classification and reject-option literature, in which abstention is allowed but must be measured and gated [21,22], and with the SAT engineering tradition, in which solver behavior must be made inspectable and checkable via artifacts and proof-oriented tooling [12,13,14,15,16,17]. The multitime epistemic ledger framing P(τ)+NP(τ)=1 is used only as a governance identity for credit assignment under a declared window τ, extending the co-evolution ledger perspective in [2] into a multitime setting while remaining consistent with the closure archetype emphasized by the Circle of Equivalence motif in [3].
Finally, the paper’s canonicalization/key strategy is conceptually adjacent to graph canonicalization and refinement heuristics (including WL-style refinement), while not claiming equivalence to graph isomorphism results; the references provide context for why canonical forms and refinement can be powerful yet nontrivial [18,19,20].
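To make the WL connection concrete, the sketch below runs 1-WL colour refinement on a clause/variable incidence graph whose edges carry literal polarity. It returns the sorted multiset of final colours as a key. The graph encoding and the name `wl_key` are hypothetical and are not the miner's key semantics. Colours are hashed signatures rather than per-instance ranks, so keys from different instances are directly comparable.

```python
import hashlib
from typing import Dict, Iterable, List, Tuple

Node = Tuple[str, int]

def _colour(signature) -> str:
    # Deterministic, instance-independent colour name for a signature.
    return hashlib.sha256(repr(signature).encode("utf-8")).hexdigest()[:16]

def wl_key(clauses: Iterable[Iterable[int]], rounds: int = 12) -> Tuple[str, ...]:
    """1-WL colour refinement on a (hypothetical) clause/variable incidence graph.

    The key is invariant under variable renaming and clause reordering; equal
    keys do not imply isomorphism, because 1-WL is incomplete.
    """
    adj: Dict[Node, List[Tuple[Node, bool]]] = {}
    for i, clause in enumerate(clauses):
        c = ("c", i)
        adj.setdefault(c, [])
        for lit in clause:
            v = ("v", abs(lit))
            adj.setdefault(v, []).append((c, lit > 0))  # edge label: polarity
            adj[c].append((v, lit > 0))
    colour = {n: n[0] for n in adj}  # round 0: node type only ("c" or "v")
    for _ in range(rounds):
        colour = {
            n: _colour((colour[n], sorted((colour[m], pol) for m, pol in adj[n])))
            for n in adj
        }
    return tuple(sorted(colour.values()))
```

In this sketch, each new colour encodes the previous one, so extra rounds can only split colour classes. Two variants whose keys agree at depth r may therefore disagree at depth r+1, but never the reverse (up to hash collisions). That is one possible mechanism by which deeper WL refinement could lower overlap between variants that are equivalent but not isomorphic. It is not established here as the mechanism behind the WL=16 result.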