Preprint
Article

This version is not peer-reviewed.

Graph-Based Phishing Domain Detection via Certificate–DNS Heterogeneous Networks

Submitted: 29 December 2025

Posted: 30 December 2025

Abstract
Individual phishing URLs are often short-lived, but the underlying infrastructure (domains, IP addresses, and certificates) exhibits recurring patterns. We propose a graph-based detection framework that models a heterogeneous network of domains, IP addresses, TLS certificates, and registrars. Node embeddings are learned with a relational graph convolutional network (R-GCN) trained on 3.1 million domains, of which 210,000 are labeled as phishing-related. The model also incorporates structural features such as shared-IP communities, certificate reuse, and registrar clusters. The detector can flag suspicious domains before they are widely used in attacks: in a retrospective study, it identifies 73% of phishing domains at least 24 hours before they first appear in blacklists. Compared with domain-lexical baselines, the method improves precision at 90% recall by 15.6 percentage points. These findings indicate that infrastructure-level graph modeling provides signals complementary to content-based phishing detection and can strengthen proactive defense.
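To make the idea of infrastructure-level structural features concrete, the following is a minimal sketch, not the authors' implementation. It builds a small heterogeneous graph from (domain, relation, node) triples and counts, per relation type, how many other domains share an IP address, certificate, or registrar with a given domain. The relation names (`resolves_to`, `uses_cert`, `registered_with`), the sample domains, and the helper functions `build_graph` and `structural_features` are all assumptions introduced for illustration.

```python
from collections import defaultdict

# Hypothetical observations: (domain, relation, infrastructure node).
# Relation names and sample values are illustrative, not taken from the paper.
EDGES = [
    ("login-bank.example", "resolves_to", "ip:203.0.113.7"),
    ("secure-pay.example", "resolves_to", "ip:203.0.113.7"),
    ("login-bank.example", "uses_cert", "cert:ab12"),
    ("secure-pay.example", "uses_cert", "cert:ab12"),
    ("news-site.example", "resolves_to", "ip:198.51.100.4"),
    ("news-site.example", "uses_cert", "cert:ff90"),
    ("login-bank.example", "registered_with", "registrar:R1"),
    ("secure-pay.example", "registered_with", "registrar:R1"),
    ("news-site.example", "registered_with", "registrar:R2"),
]


def build_graph(edges):
    """Index typed edges in both directions, one adjacency map per relation."""
    forward = defaultdict(lambda: defaultdict(set))  # relation -> domain -> nodes
    reverse = defaultdict(lambda: defaultdict(set))  # relation -> node -> domains
    for domain, relation, node in edges:
        forward[relation][domain].add(node)
        reverse[relation][node].add(domain)
    return forward, reverse


def structural_features(domain, forward, reverse):
    """For each relation, count the other domains that share a node with `domain`."""
    features = {}
    for relation in reverse:
        peers = set()
        for node in forward[relation].get(domain, ()):
            peers |= reverse[relation][node]
        peers.discard(domain)  # a domain is not its own peer
        features[relation] = len(peers)
    return features


forward, reverse = build_graph(EDGES)
for name in ("login-bank.example", "news-site.example"):
    print(name, structural_features(name, forward, reverse))
```

In this toy data the two look-alike domains share an IP address, a certificate, and a registrar, while the third domain shares nothing. Counts of this kind could plausibly serve as input features alongside learned R-GCN embeddings, although the abstract does not specify how the features are encoded.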
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.

© 2025 MDPI (Basel, Switzerland) unless otherwise stated