Submitted: 02 May 2026
Posted: 05 May 2026
Abstract
Copy-move forgery in biomedical research images threatens scientific integrity, yet automated pixel-level localisation remains challenging due to low-contrast textures and small duplicated regions. We propose a segmentation pipeline that pairs a frozen DINOv2-base vision transformer (86M parameters) with a lightweight 3.4M-parameter convolutional decoder. Training proceeds in two stages: decoder warmup at learning rate $10^{-5}$ with the backbone fully frozen, followed by joint fine-tuning of the last twelve transformer blocks at $5 \times 10^{-7}$, yielding a $20\times$ learning rate ratio that preserves pretrained features while adapting to biomedical imagery. At inference, flip-based test-time augmentation, gradient-enhanced adaptive thresholding ($\alpha = 0.45$), and grid-searched area/probability gating ($A_{\min} \in [200,400]$, $p_{\min} \in [0.20,0.30]$) convert probability maps into binary masks. Evaluated on the Recod.ai/LUC benchmark derived from over 2{,}000 retracted papers, the method achieves a validation F1 of 0.563 on a 1{,}027-image held-out split. Comparisons with representative existing approaches, spanning rule-based, CNN-based, and hybrid multi-component strategies, show that the proposed pipeline offers a stronger balance of localisation accuracy, architectural simplicity, and reproducibility than these alternatives for biomedical copy-move forgery detection.
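The following is a minimal sketch of the inference-time post-processing summarised above: converting a per-pixel probability map into a binary mask via a gradient-enhanced adaptive threshold and area/probability gating. Only $\alpha = 0.45$, $A_{\min}$, and $p_{\min}$ come from the abstract; the specific blend used for the adaptive threshold, the default gate values, and the function names are assumptions for illustration, not the authors' exact implementation.

```python
# Hypothetical post-processing sketch; the threshold formula is an assumed form.
import numpy as np
from scipy import ndimage


def postprocess(prob: np.ndarray,
                alpha: float = 0.45,   # gradient-enhancement weight (from the abstract)
                a_min: int = 300,      # area gate, grid-searched in [200, 400]
                p_min: float = 0.25    # probability gate, grid-searched in [0.20, 0.30]
                ) -> np.ndarray:
    """Convert a forgery-probability map (H x W, values in [0, 1]) to a binary mask.

    `prob` is assumed to already be the average over flip-based test-time
    augmentation (original + horizontally/vertically flipped passes).
    """
    # Gradient-enhanced adaptive threshold (assumed form): raise the base
    # threshold of 0.5 where the probability map has strong local gradients.
    gy, gx = np.gradient(prob)
    grad_norm = np.hypot(gx, gy)
    grad_norm = grad_norm / (grad_norm.max() + 1e-8)
    thresh = 0.5 * (1.0 - alpha) + alpha * grad_norm
    mask = prob > thresh

    # Area / probability gating: keep only connected components that are large
    # enough and whose mean probability clears p_min.
    labels, n_components = ndimage.label(mask)
    out = np.zeros_like(mask)
    for comp in range(1, n_components + 1):
        region = labels == comp
        if region.sum() >= a_min and prob[region].mean() >= p_min:
            out |= region
    return out.astype(np.uint8)
```

Under this assumed formulation, the gating step acts as a simple false-positive filter: small or low-confidence blobs that survive thresholding are discarded, while the grid-searched ranges for $A_{\min}$ and $p_{\min}$ bound how aggressive that filtering can be.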