Early Detection of Re-Identification Risk in Multi-Turn Dialogues via Entity-Aware Evidence Accumulation

Yeongseop Lee; Seungun Park; Yunsik Son

doi:10.20944/preprints202603.0209.v1

Submitted: 02 March 2026

Posted: 03 March 2026


Abstract

In multi-turn conversational AI, individually innocuous fragments of personally identifiable information (PII) disclosed across successive turns can accumulate into a re-identification risk that no single utterance reveals on its own. Existing PII detectors operate on isolated utterances and therefore cannot track this cross-turn evidence build-up. We propose a stateful middleware guardrail whose core design principle is speaker-attributed entity isolation: every extracted PII fragment is attributed to the conversational participant it describes (the first-person USER vs. incidentally mentioned third parties), and evidence is accumulated in entity-isolated subgraphs that structurally prevent cross-entity contamination. A three-tier extraction pipeline (Tier-0 deterministic regex; Tier-1 Presidio/spaCy NER independently verified by zero-shot NER; Tier-2 independent zero-shot NER; plus rule-based post-processing) refines noisy NER candidates. An evidence-gated Commit Gate then writes only corroborated cues to entity state and fires a re-identification onset signal t_pred at the earliest turn where combination-based onset rules, grounded in the re-identification uniqueness literature, are satisfied. On a 184-record template-synthetic evaluation corpus, the system achieves OW@5 = 70.7% with MAE = 2.442 turns, reducing the MAE of naïve accumulation by 56% (BL2 MAE = 5.522). We confirm structural robustness on a 300-record mutation stress set and sanity-check RULE_B generalization on the ABCD external corpus (OW@0 = 97.1%, MAE = 0.011). The pipeline requires no modification to the underlying conversational model and serves as a drop-in runtime guardrail for existing dialogue systems.
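The accumulation idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Cue` fields, the `MIN_CONFIDENCE` threshold, the entity labels, and the single onset rule are all hypothetical placeholders, and the extraction tiers are assumed to have already produced the cues. The sketch only shows how per-entity state plus a commit gate delays the onset signal until the USER's own committed cues satisfy a combination rule.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical combination rule: PII types that, once all committed for the
# same entity, are treated as a re-identification onset. Placeholder only.
ONSET_RULES = [frozenset({"zip_code", "birth_date", "gender"})]
MIN_CONFIDENCE = 0.8  # assumed gate threshold, not taken from the paper


@dataclass(frozen=True)
class Cue:
    entity: str         # "USER" or a third-party label (hypothetical naming)
    pii_type: str       # e.g. "zip_code"
    confidence: float   # extractor confidence in [0, 1]
    corroborated: bool  # True if a second extractor agreed


def first_onset(dialogue, entity="USER"):
    """Return the 1-based turn at which `entity` first satisfies an onset
    rule, or None if no rule is ever satisfied."""
    state = defaultdict(set)  # entity -> committed PII types (isolated per entity)
    for turn, cues in enumerate(dialogue, start=1):
        for cue in cues:
            # Commit gate: only corroborated, confident cues reach entity state.
            if not cue.corroborated or cue.confidence < MIN_CONFIDENCE:
                continue
            state[cue.entity].add(cue.pii_type)
        if any(rule <= state[entity] for rule in ONSET_RULES):
            return turn
    return None


dialogue = [
    [Cue("USER", "zip_code", 0.95, True)],
    [Cue("SISTER", "birth_date", 0.90, True), Cue("SISTER", "gender", 0.90, True)],
    [Cue("USER", "birth_date", 0.50, False)],  # rejected by the commit gate
    [Cue("USER", "birth_date", 0.92, True)],
    [Cue("USER", "gender", 0.97, True)],
]
print(first_onset(dialogue))
```

In this toy dialogue the call prints 5. An accumulator that pooled every speaker's cues into one set would satisfy the rule at turn 2, because the third party's birth date and gender would be merged with the USER's zip code; keeping state per entity avoids that false early alarm.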

Keywords: privacy; quasi-identifier; conversational AI; incremental disclosure; entity tracking; confidence gating; provenance; traceability; selective prediction; runtime guardrails

Subject: Computer Science and Mathematics - Security Systems

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
