Submitted: 21 May 2025
Posted: 22 May 2025
Abstract
Keywords:
1. Introduction

2. Related Work
2.1. Overview of Targeted Cyber Attacks
3. Insights and Practical Implications of the (H–DIR)² Framework
3.1. Simulation Pipeline: Formal (H-DIR)² Workflow

- O1. RDF conversion $f_{\mathrm{RDF}} : \Omega \to T_0$ serialises raw frames into W3C RDF 1.1 triples, enabling formally verifiable reasoning [17].
- O2. Spark SQL windowing $W_{\Delta t} \circ Q : T_0 \to S(\Delta t)$ performs streaming selection with sub-second latency on multi-terabyte traces [29].
- O3. Vectorisation $\varphi : S(\Delta t) \to x_t$ applies one-hot or embedding schemes that match best practice in neural intrusion detectors [27].
- O4. ARNN core $A : x_t \to (a_{t+1}, W_{t+1})$ updates the probabilistic weight matrix using the learning rule in [6].
- O5. Semantic graph injection $\Psi_{\mathrm{neu}\to\mathrm{sym}} : W_{t+1} \to \Delta T_t$ reifies risk scores as triples, e.g. :host_A :hasRiskScore "0.87"^^xsd:float, thus supporting region-based, policy-driven enforcement [1].
- O6. Dynamic update loop $U : T_t \cup \Delta T_t \to T_{t+1}$ closes the observation–prediction cycle and realises the runtime coordination that current 5G/IoT surveys still find missing [19].
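Steps O5 and O6 can be illustrated with a minimal sketch in plain Python. It assumes the graph is held as an in-memory set of string triples and that risk scores arrive as a host-to-score mapping; the function names `inject_risk` and `update_graph` are hypothetical and are not taken from the paper's repository.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def inject_risk(scores: Dict[str, float]) -> Set[Triple]:
    """Sketch of the O5 operator: reify each risk score as one typed-literal triple."""
    return {
        (f":{host}", ":hasRiskScore", f'"{score:.2f}"^^xsd:float')
        for host, score in scores.items()
    }


def update_graph(graph: Set[Triple], delta: Set[Triple]) -> Set[Triple]:
    """Sketch of the O6 operator: T_{t+1} is the plain union of T_t and Delta T_t."""
    return graph | delta


# Usage: one observation-prediction-update step on a toy graph.
graph_t = {(":host_A", "rdf:type", ":Host")}
graph_next = update_graph(graph_t, inject_risk({"host_A": 0.87}))
```

Because the update is a plain union, repeated cycles accumulate one score triple per distinct value; a real deployment would presumably retire stale scores, which this sketch does not attempt.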
- Feature schema. Vectoriser $\varphi$ loads a protocol-specific dictionary $D_\Pi$ (e.g. TCP flags vs. RPL control codes);
- Loss re-weighting. Hyper-parameters $(\alpha, \beta)$ are tuned per $\Lambda$ to prioritise node classification or edge prediction¹;
- Graph semantics. Risk-injection operator $\Psi_{\mathrm{neu}\to\mathrm{sym}}$ appends triples in a namespace :Π so that SPARQL rules remain protocol-consistent.
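The protocol-specific feature schema can be sketched as a dictionary lookup followed by one-hot encoding. The symbol lists below are illustrative assumptions (common TCP flag combinations and the RPL control messages), and `DICTIONARIES` and `one_hot` are hypothetical names, not the paper's implementation.

```python
from typing import Dict, List

# Hypothetical protocol dictionaries; the symbol lists are illustrative only.
DICTIONARIES: Dict[str, List[str]] = {
    "tcp": ["SYN", "SYN-ACK", "ACK", "FIN", "RST"],
    "rpl": ["DIS", "DIO", "DAO", "DAO-ACK"],
}


def one_hot(protocol: str, symbol: str) -> List[int]:
    """Encode one symbol against the dictionary of the given protocol."""
    vocab = DICTIONARIES[protocol]
    vec = [0] * len(vocab)
    vec[vocab.index(symbol)] = 1  # raises ValueError for an unknown symbol
    return vec


# Usage: the same vectoriser serves both protocols, only the dictionary changes.
tcp_vec = one_hot("tcp", "SYN")
rpl_vec = one_hot("rpl", "DAO")
```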
- 1. RDF Conversion Level²
with timestamps $\tau_i$, an injective map
- 2. Spark SQL / Streaming Selection Level
materialise the structured table
- 3. Vectorisation Level
- 4. ARNN Core Level
- 5. Semantic Graph (SPARQL): Dual Level Coupling
- 6. Dynamic Update Loop
3.2. Entropy-Based Detection and Adaptive Defense with (H-DIR)²
3.3. Dual Scalability of the (H-DIR)² Architecture
- Vertical (Quantitative) Scalability. Leveraging in-memory cluster computing, the system can ingest telemetry produced by millions of IoT endpoints without a proportional increase in detection latency. Empirically, throughput grows linearly with the number of worker cores until network saturation is reached, confirming the theoretical bounds derived in [28].
- Horizontal (Qualitative) Scalability. By sharding feature vectors across Resilient Distributed Datasets (RDDs) and using a micro-batch streaming model, (H-DIR)² sustains multi-terabyte traffic volumes while preserving sub-second windowing semantics. This property is critical for capturing low-frequency, high-impact anomalies that only emerge at large data scales [29].
3.4. Integration with Apache Spark and RDF Graphs
- Nodes represent entities such as IP addresses or ports;
- Edges encode typed interactions (packet type, temporal correlation).
3.5. ARNN: Adaptive Neural Modelling for Attack Propagation
3.6. Semantic–Neural Coupling and Dynamic Update
- Semantic layer: an ontology of protocol rules and expert heuristics that prunes forbidden state transitions;
- Neural layer: an Adaptive Recurrent Neural Network (ARNN) that learns temporal correlations directly from telemetry streams.
, with γ calibrated via ROC analysis.
3.7. Dynamic Update of the Semantic Graph

4. Experimental Results and Evaluation
4.1. SYN-Flood Case Study
- Shannon entropy $H(X)$ on the flag distribution $X = \{\mathrm{SYN}, \mathrm{SYN\text{-}ACK}, \mathrm{ACK}\}$. An alarm is raised if $\Delta H = H_t - H_{\mathrm{baseline}} < -\theta_H$, with $\theta_H = 0.50$ bits [?].
- Imbalance ratio $r = \#\mathrm{SYN} / \#\mathrm{SYN\text{-}ACK}$ (continuous feature).
- ARNN quality: accuracy, false-positive rate (FPR), area under the ROC curve (AUC).
- Detection latency $\tau_{\mathrm{det}}$ from the first spoofed SYN to the alarm.
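The entropy alarm and the imbalance ratio listed above can be sketched as follows. The threshold of 0.50 bits comes from the text; the function names and the convention of treating a window as a flat list of flag labels are assumptions made for illustration.

```python
import math
from collections import Counter
from typing import Iterable, List

THETA_H = 0.50  # bits, threshold quoted in the text


def shannon_entropy(flags: Iterable[str]) -> float:
    """Shannon entropy (in bits) of the empirical flag distribution."""
    counts = Counter(flags)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def imbalance_ratio(flags: List[str]) -> float:
    """#SYN / #SYN-ACK; the denominator is clamped to 1 to avoid division by zero."""
    counts = Counter(flags)
    return counts["SYN"] / max(counts["SYN-ACK"], 1)


def entropy_alarm(window: List[str], baseline: List[str]) -> bool:
    """Alarm when H_t - H_baseline drops below -THETA_H."""
    return shannon_entropy(window) - shannon_entropy(baseline) < -THETA_H


# Usage: a balanced handshake mix versus a SYN-dominated window.
baseline = ["SYN", "SYN-ACK", "ACK"] * 100
attack = ["SYN"] * 95 + ["SYN-ACK"] * 3 + ["ACK"] * 2
alarm = entropy_alarm(attack, baseline)
```

During a flood the distribution collapses onto SYN, so the entropy falls well below the baseline of about 1.58 bits and the negative-threshold test fires.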

| Indicator | Value | 95% CI |
| --- | --- | --- |
| Accuracy | 94.1% | [93.7, 94.5] |
| FPR | 4.7% | [4.3, 5.1] |
| AUC | 0.978 | ±0.004 |
| $\tilde{\tau}_{\mathrm{det}}$ | 247 ms | [221, 273] ms |
| $\Delta H^{*}$ (peak) | −1.15 bits | n/a |
| $r_{\mathrm{attack}}$ | 27.4 ± 3.5 | n/a |
4.2. DAO–DIO Routing Manipulation Case Study
- (O1) RDF serialisation into the IoT–RPL–OWL ontology, yielding $T_0$.
- (O2) Streaming windowing with $\Delta t = 5$ s and Spark SQL filtering.
- (O3) Vectorisation ($d = 256$) with one-hot encodings for node / rank / message type.
- (O4) ARNN core: attentive RNN, $n_h = 128$, $\eta = 10^{-3}$, loss weights $(\alpha, \beta) = (0.3, 0.7)$.
- (O5) Risk scoring $R_i = \sigma(a_i)$; nodes with $R_i > 0.6$ are flagged.
- (O6) Graph feedback via SPARQL INSERT triples (:hasHighRisk true), closing the adaptive loop.
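Steps (O5) and (O6) of this case study can be sketched as a sigmoid threshold followed by the construction of a SPARQL update string. The 0.6 threshold and the :hasHighRisk predicate come from the list above; the helper names are hypothetical, and the sketch assumes the caller prepends a PREFIX declaration for the empty prefix before sending the update to a triple store.

```python
import math
from typing import Dict, List

RISK_THRESHOLD = 0.6  # from step (O5)


def sigmoid(a: float) -> float:
    """Numerically stable logistic function."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)


def flag_nodes(activations: Dict[str, float]) -> List[str]:
    """Return the nodes whose risk R_i = sigmoid(a_i) exceeds the threshold."""
    return sorted(n for n, a in activations.items() if sigmoid(a) > RISK_THRESHOLD)


def sparql_insert(nodes: List[str]) -> str:
    """Build an INSERT DATA update; the ':' prefix must be declared by the caller."""
    triples = " ".join(f":{n} :hasHighRisk true ." for n in nodes)
    return f"INSERT DATA {{ {triples} }}"


# Usage: only n1 crosses the threshold (sigmoid(1.0) is about 0.73).
update = sparql_insert(flag_nodes({"n1": 1.0, "n2": 0.0, "n3": -2.0}))
```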
- Routing loops: number of closed rank cycles.
- Maximum incoming risk $\max_i \sum_j w_{ji}$ in the learned graph.
- Packet-delivery ratio (PDR).
- Average loop duration in seconds.
- $\Delta H$ entropy over the DAO/DIO message mix; alarm if $\Delta H > \theta_H = 1.2$ bits [?].
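Three of these metrics lend themselves to a short sketch. It assumes that the routing state is available as a preferred-parent map (so a routing loop is a cycle of parent pointers) and that the learned weights form a dense matrix with `w[j][i]` holding the weight of edge j → i; both representations and all function names are illustrative assumptions.

```python
from typing import Dict, List, Optional


def count_routing_loops(parent: Dict[str, Optional[str]]) -> int:
    """Count cycles in a preferred-parent map (each node has at most one parent)."""
    state: Dict[str, int] = {}  # 0 = on the current walk, 1 = fully processed
    loops = 0
    for start in parent:
        path = []
        node: Optional[str] = start
        while node is not None and node not in state:
            state[node] = 0
            path.append(node)
            node = parent.get(node)
        if node is not None and state[node] == 0:
            loops += 1  # the walk re-entered itself, so a new cycle was found
        for visited in path:
            state[visited] = 1
    return loops


def max_incoming_risk(w: List[List[float]]) -> float:
    """max over i of sum over j of w[j][i], i.e. the largest column sum."""
    n = len(w)
    return max(sum(w[j][i] for j in range(n)) for i in range(n))


def packet_delivery_ratio(received: int, sent: int) -> float:
    """Fraction of sent packets that were delivered; 0.0 when nothing was sent."""
    return received / sent if sent else 0.0


# Usage: b -> c -> d -> b forms one loop; e merely feeds into it.
topology = {"root": None, "a": "root", "b": "c", "c": "d", "d": "b", "e": "b"}
n_loops = count_routing_loops(topology)
```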

4.3. NTP Amplification Case Study
- (O1) RDF serialisation into the IoT–UDP–OWL schema.
- (O2) Windowing with $\Delta t = 1$ s; Spark SQL computes per-IP entropy.
- (O3) Vectorisation ($d = 128$) on srcIP, dstIP, UDP ports, NTP_cmd.
- (O4) ARNN core: LSTM variant, 3 layers, 64 cells, $\eta = 2 \times 10^{-3}$.
- (O5) Risk scoring threshold $R_i > 0.55$.
- (O6) Graph feedback injects :underMitigation true.
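The windowed per-IP entropy of step (O2) can be approximated in plain Python as a stand-in for the Spark SQL query, which is not reproduced here. The sketch takes one plausible reading of "per-IP entropy": for every 1 s tumbling window and destination address, the entropy of the source-address distribution. The packet tuple layout and the function name are assumptions.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

Packet = Tuple[float, str, str]  # (timestamp in seconds, src_ip, dst_ip)


def per_ip_entropy(
    packets: Iterable[Packet], delta_t: float = 1.0
) -> Dict[Tuple[int, str], float]:
    """Entropy (bits) of source addresses per (window index, destination IP)."""
    buckets: Dict[Tuple[int, str], Counter] = defaultdict(Counter)
    for ts, src, dst in packets:
        buckets[(int(ts // delta_t), dst)][src] += 1
    result = {}
    for key, counts in buckets.items():
        total = sum(counts.values())
        result[key] = -sum(
            (c / total) * math.log2(c / total) for c in counts.values()
        )
    return result


# Usage: two sources hit victim "v" evenly in window 0, one source in window 1.
trace = [(0.1, "a", "v"), (0.2, "b", "v"), (0.9, "a", "v"),
         (0.95, "b", "v"), (1.5, "a", "v")]
entropies = per_ip_entropy(trace)
```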
- Edge caching ($C = 0.9$) to absorb duplicate replies.
- Anycast load distribution over $S = 5$ edge nodes.
- Entropy filter: alarm if $\Delta H \ge \theta_H = 1.5$ bits.
- ARNN early predictor (validation ACC = 0.90) drives proactive throttling.
- Peak load at the victim (Gb/s).
- Mitigation latency $\tau_{\mathrm{mit}}$ (s) after the $\Delta H$ trigger.
- Back-end traffic reduction (ratio).
- ARNN early-stage prediction accuracy.
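As a rough illustration of how the caching and anycast parameters relate to the load metrics, the following back-of-the-envelope sketch may help. It rests on two assumptions that the surviving text does not confirm: that C = 0.9 is a cache hit ratio, and that anycast spreads attack traffic evenly across the S = 5 edge nodes. The function name and the 10 Gb/s input are hypothetical.

```python
from typing import Tuple


def mitigated_load(
    attack_gbps: float, cache_hit: float = 0.9, edge_nodes: int = 5
) -> Tuple[float, float]:
    """Return (per-edge-node load, back-end load) in Gb/s under the stated assumptions."""
    per_node = attack_gbps / edge_nodes          # even anycast split (assumption)
    backend = attack_gbps * (1.0 - cache_hit)    # cache misses reach the back end
    return per_node, backend


# Usage: a hypothetical 10 Gb/s reflection burst.
per_node, backend = mitigated_load(10.0)
```

Under these assumptions the back-end traffic reduction ratio equals the cache hit ratio itself; the measured values in the study may differ.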



4.4. Comparative Summary Across Scenarios



4.5. Extended Comparison with State-of-the-Art Methods

4.6. Dynamic Integration Between Semantics and Prediction in (H–DIR)²
- On the left, the semantic layer (RDF + SPARQL) identifies suspicious patterns in network traffic flows.
- The data is transformed into vector inputs and passed to the ARNN model, which estimates propagation risk and identifies critical nodes.
- On the right, the predictions (e.g., probability of attack) feed into the weight graph $w_{ij}$.
- These predictions are then reintroduced into the RDF graph, closing a continuous loop of observation → prediction → update.
5. Conclusions
A Mathematical Proofs
A.1 Closed-Form Backlog Threshold


Reproducibility
Acknowledgements
References
- Arakelian, A., et al.: Region-based security and policy enforcement for Internet-of-Things architectures. In: Proc. IEEE International Conference on Internet of Things (iThings). pp. 1–8 (2018).
- Buyya, R., Vahid Dastjerdi, A.: Internet of Things: Principles and Paradigms. Elsevier, 2nd edn. (2023).
- Conti, M., Dehghantanha, A., Franke, K., Watson, S.: Internet of Things security and forensics: Challenges and opportunities. Future Generation Computer Systems 78, 544–546 (2018).
- Canadian Institute for Cybersecurity: CIC-DDoS2019 dataset (2019), https://www.unb.ca/cic/datasets/ddos-2019.html.
- Ferrag, M.A., Maglaras, L., Moschoyiannis, S., Janicke, H.: Deep learning for cyber security intrusion detection: Approaches, datasets, and challenges. Journal of Information Security and Applications 50, 102419 (2020).
- Gelenbe, E.: A diffusion model for packet travel time in a random neural network. IEEE Systems Journal 6(2), 308–316 (2012).
- Gu, Y., Li, K., Guo, Z., Wang, Y.: A deep learning and entropy-based approach for network anomaly detection in IoT environments. IEEE Access 7, 169296–169308 (2019).
- Gupta, B., Badotra, S., Quamara, M., Choudhary, S.: Distributed denial of service attacks detection techniques in cloud computing and IoT: Challenges and future directions. Computer Communications 178, 283–300 (2021).
- Hamza, A., Gharakheili, H.H., Benson, T.A., Sivaraman, V.: Detecting volumetric attacks on IoT devices via SDN-based monitoring of MUD activity. In: Proceedings of the 2019 ACM Symposium on SDN Research (SOSR) (2019), https://www.andrew.cmu.edu/user/theophib/papers/SoSR19.pdf.
- Kaynar, B., Sivrikaya, F.: Distributed attack graph generation with deep learning for network security. In: 2019 IEEE Symposium on Computers and Communications (ISCC). pp. 1–6. IEEE (2019).
- Koroniotis, N., Moustafa, N., Sitnikova, E., Turnbull, B.: Towards the development of realistic botnet dataset in the Internet of Things for network forensic analytics: Bot-IoT dataset. Future Generation Computer Systems 100, 779–796 (2019).
- Marcov, L., et al.: DAO–DIO routing manipulation dataset (2023).
- Mirsky, Y., et al.: Kitsune network attack dataset: NTP amplification subset (2023), https://www.kitsune-dataset.example.org/ntp (accessed on 5 May 2025).
- Mirsky, Y., Doitshman, T., Elovici, Y., Shabtai, A.: Kitsune: An ensemble of autoencoders for online network intrusion detection. In: Proceedings of the Network and Distributed System Security Symposium (NDSS). The Internet Society (2018), https://www.ndss-symposium.org/ndss2018/kitsune-ensemble-autoencoders-online-network-intrusion-detection/.
- Moustafa, N., Turnbull, B., Choo, K.K.R.: An ensemble intrusion detection framework for IoT networks using deep learning and feature selection. IEEE Transactions on Industrial Informatics 18(6), 4022–4031 (2022).
- Paxson, V., Allman, M., Chu, J., Sargent, M.: Computing TCP's Retransmission Timer. RFC 6298 (2011).
- Pérez, J., Arenas, M., Gutierrez, C.: Semantics and complexity of SPARQL. ACM Transactions on Database Systems 34(3), 16:1–16:45 (2009).
- Raoof, A., Matrawy, A., Lung, C.H.: Routing attacks and mitigation in IoT networks: RPL-based approach. IEEE Internet of Things Journal 7(8), 7368–7381 (2020).
- Sicari, P., Rizzardi, A., Coen-Porisini, A.: 5G in the Internet of Things era: An overview on security and privacy challenges. Computer Networks 179, 107345 (2020).
- Sicari, P., Rizzardi, A., Coen-Porisini, A.: 5G in the Internet of Things era: An overview on security and privacy challenges. Computer Networks 179, 107345 (2020).
- Sicari, P., Rizzardi, A., Grieco, L., Coen-Porisini, A.: Security, privacy and trust in Internet of Things: The road ahead. Computer Networks 76, 146–164 (2015).
- Singh, P., Kumar, R., Gupta, S.: Entropy-based cyber threat detection in cloud-IoT systems: A review and future directions. Journal of Network and Computer Applications 224, 103845 (2024).
- Suryotrisongko, H.W., Akbar, M.: Anomaly detection in Internet of Things using isolation forest algorithm. In: 2018 International Electronics Symposium (IES). pp. 449–454. IEEE (2018).
- Tosi, D., Pazzi, R.: Design and experimentation of a distributed information retrieval-hybrid architecture in cloud IoT data centers. In: IFIP International Internet of Things Conference. pp. 12–21. Springer (2024).
- Wang, P., Chen, Q., Peng, S.: Graph-based security analysis for industrial control systems. IEEE Transactions on Industrial Informatics 14(5), 1890–1900 (2018).
- Wilson, E.: Probable inference, the law of succession, and statistical inference. Journal of the American Statistical Association 22(158), 209–212 (1927).
- Yin, C., Zhu, Y., Fei, J., He, X.: A deep learning approach for intrusion detection using recurrent neural networks. IEEE Access 5, 21954–21961 (2017).
- Zaharia, M., Chowdhury, M., Das, T., et al.: Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing. In: USENIX NSDI. pp. 15–28 (2012).
- Zaharia, M., Chowdhury, M., Franklin, M.J., Shenker, S., Stoica, I.: Apache Spark: A unified engine for big data processing. Communications of the ACM 59(11), 56–65 (2016).
| 1 | For SYN Flood, α ≫ β emphasises rapid node compromise detection; for DAO–DIO, β dominates to reveal routing loops. |
| 2 | All symbols match the notation of Section 3. |
| 3 | |
| 4 | Experiments were run on Python 3.11.4, Spark 3.5.0, and PyTorch 2.1 (see the Reproducibility section). Full environment files are included in the repository https://github.com/RobUninsubria/HDIR2-paper.git. |
| 5 |





| Attack | Protocol Layer | Key Entropy Signal | Mitigation Module |
| --- | --- | --- | --- |
| TCP SYN-Flood | Transport | $\Delta H_{\mathrm{flags}}$ spike | Adaptive Rate Limiter (Section 4.1) |
| DAO–DIO (RPL) | IoT Network | $\Delta H_{\mathrm{path}}$ drift | Route Sanitiser (Section 4.2) |
| NTP Amplification | Application / UDP | $\Delta H_{\mathrm{size}}$ bimodality | Amplification Throttler (Section 4.3) |

| Method | Latency (ms) | AUC | Entropy-based Explainability | Real-time |
| --- | --- | --- | --- | --- |
| Spark IDS | 950 | 0.91 | No | Yes |
| Kitsune-AE | 670 | 0.93 | No | Yes |
| Isolation Forest | 720 | 0.89 | Partial | No |
| (H-DIR)² | 247 | 0.978 | Yes | Yes |
Disclaimer/Publisher's Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).