The widespread use of textual data sanitization techniques, such as identifier removal and synthetic data generation, has raised questions about their effectiveness in preserving individual privacy. This study introduces a comprehensive evaluation framework designed to measure privacy leakage in sanitized datasets at a semantic level. The framework operates in two stages: linking auxiliary information to sanitized records using sparse retrieval, and evaluating semantic similarity between original and matched records using a language model. Experiments were conducted on two real-world datasets, MedQA and WildChat, to assess the privacy-utility trade-off across various sanitization methods. Results show that traditional PII removal methods retain significant private information, with over 90% of original claims still inferable. Synthetic data generation demonstrates improved privacy performance, especially when enhanced with differential privacy, though often at the cost of downstream task utility. The evaluation also reveals that text coherence and the nature of auxiliary knowledge significantly influence re-identification risks. These findings underscore the limitations of current surface-level sanitization practices and highlight the need for robust, context-aware privacy mechanisms that balance utility and protection in sensitive textual data releases.
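To make the two-stage pipeline concrete, the following is a minimal sketch in Python, assuming BM25 (via the rank_bm25 package) for the sparse-retrieval linkage stage and a sentence-embedding model as a stand-in for the paper's language-model-based semantic similarity scoring. The function name, the model choice, and the assumption that each auxiliary query is paired with one original record are all illustrative, not the authors' implementation.

```python
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer, util

def link_and_score(aux_queries, sanitized_records, original_records):
    """Stage 1: link each auxiliary query to its best-matching sanitized record.
    Stage 2: score semantic similarity between the original record and the match."""
    # Build a BM25 index over whitespace-tokenized sanitized records.
    bm25 = BM25Okapi([doc.split() for doc in sanitized_records])
    # Assumed similarity scorer; the paper uses a language model for this step.
    encoder = SentenceTransformer("all-MiniLM-L6-v2")

    results = []
    # Illustrative assumption: aux_queries[i] targets original_records[i].
    for query, original in zip(aux_queries, original_records):
        # Stage 1: sparse retrieval -- pick the highest-scoring sanitized record.
        scores = bm25.get_scores(query.split())
        best_idx = max(range(len(scores)), key=scores.__getitem__)

        # Stage 2: semantic similarity between the original record and the match;
        # a high score suggests private content survived sanitization.
        emb = encoder.encode([original, sanitized_records[best_idx]])
        sim = util.cos_sim(emb[0], emb[1]).item()
        results.append((best_idx, sim))
    return results
```

Aggregating the returned similarity scores over a dataset would yield a leakage estimate analogous to the claim-inferability rates reported above, though the paper's actual metric is defined at the level of individual claims rather than whole-record embeddings.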