Submitted:
11 March 2024
Posted:
12 March 2024
Abstract
Keywords:
1. Introduction
- Proposing a new architectural solution, called BC-DRLzSC, that leverages BC, DRL, and the principles of ZT architecture.
- Employing BC, through smart contracts, for system identity registration, authentication, and resource access control; two proposed smart contracts manage these aspects within a public BC based on Verifiable Byzantine Fault Tolerance (VBFT).
- Developing an IDS that leverages a DRL algorithm integrated with a ZT architecture for proactive attack detection. Our methodology employs a decentralized Q-learning algorithm to monitor and predict unusual behaviors exhibited by SC devices.
- Evaluating the effectiveness and reliability of the proposed IDS through an extensive evaluation focusing on key performance metrics such as accuracy, F1-score, precision, and detection rate. We conducted comprehensive experiments on the NSL-KDD dataset to rigorously evaluate the system’s ability to detect various threats, including widely recognized attack scenarios.
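The metrics named in the last contribution have standard definitions in terms of the confusion matrix of a binary (attack vs. normal) classifier. The sketch below computes them, assuming detection rate is taken as recall, TP / (TP + FN), as is common in IDS work; the paper's exact definitions may differ, and the counts in the example are made-up illustration values, not results from the NSL-KDD experiments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Binary confusion matrix, treating 'attack' as the positive class."""
    tp: int  # attacks correctly flagged
    fp: int  # normal records wrongly flagged as attacks
    tn: int  # normal records correctly passed
    fn: int  # attacks missed


def ids_metrics(c: Confusion) -> dict:
    """Accuracy, precision, detection rate (assumed to equal recall) and F1."""
    total = c.tp + c.fp + c.tn + c.fn
    accuracy = (c.tp + c.tn) / total
    precision = c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0
    detection_rate = c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0
    denom = precision + detection_rate
    f1 = 2 * precision * detection_rate / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "detection_rate": detection_rate,
        "f1": f1,
    }


if __name__ == "__main__":
    # Made-up counts for illustration only.
    print(ids_metrics(Confusion(tp=90, fp=10, tn=80, fn=20)))
```

With these illustrative counts, accuracy is 170/200 = 0.85, precision is 90/100 = 0.9, and F1 equals 2TP / (2TP + FP + FN) = 180/210, roughly 0.857.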
2. Preliminaries
2.1. Zero Trust
- Identity: All users and devices must undergo rigorous authentication and authorization processes in a ZT environment. Trust is never assumed, regardless of the entity’s location.
- Data: Protecting sensitive data is paramount in the ZT framework. This involves employing encryption, data classification, and strict data access controls to prevent unauthorized access and mitigate data breaches.
- Devices and Workloads: ZT extends its security measures to encompass devices and workloads operating within the network. Continuous monitoring and validation of these endpoints are essential to ensure security and compliance with established security policies.
- Analytics and Visibility: ZT relies heavily on advanced analytics and visibility tools. Real-time network and user behavior monitoring allows for the early detection of vulnerabilities and potential security threats.
- Automation and Orchestration: Automation and orchestration are critical ZT components. Security processes are automated to respond rapidly to security incidents. Orchestrating security actions across the network enhances adaptability in the face of evolving threats.
- Network and Endpoint Security: ZT imposes stringent security controls on network traffic and individual endpoints, including network segmentation, micro-segmentation, and robust endpoint security measures to minimize attack surfaces and reduce the impact of security breaches.
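Taken together, these pillars amount to a per-request decision in which nothing carried over from earlier requests is trusted. The following is a minimal illustration, not the framework's policy engine: the hypothetical `AccessRequest` fields, the three sensitivity levels, and the 0.7 anomaly threshold are all assumptions chosen to mirror the Identity, Devices and Workloads, Data, and Analytics pillars.

```python
from dataclasses import dataclass
from enum import IntEnum


class Sensitivity(IntEnum):
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2


@dataclass(frozen=True)
class AccessRequest:
    identity_verified: bool            # Identity: authenticated for this request
    device_compliant: bool             # Devices: posture check passed (assumed signal)
    clearance: Sensitivity             # highest data class the subject may touch
    resource_sensitivity: Sensitivity  # Data: classification of the requested resource
    anomaly_score: float               # Analytics: 0.0 = normal, 1.0 = highly anomalous


def evaluate(req: AccessRequest, anomaly_threshold: float = 0.7) -> bool:
    """Grant access only if every check passes; the default is deny."""
    if not req.identity_verified or not req.device_compliant:
        return False
    if req.resource_sensitivity > req.clearance:
        return False  # least privilege: no access above the subject's clearance
    return req.anomaly_score < anomaly_threshold


if __name__ == "__main__":
    req = AccessRequest(True, True, Sensitivity.INTERNAL, Sensitivity.INTERNAL, 0.2)
    print(evaluate(req))  # True
    print(evaluate(AccessRequest(True, False, Sensitivity.INTERNAL, Sensitivity.INTERNAL, 0.2)))  # False
```

Ordering the checks so that any single failure denies access reflects the "never trust, always verify" stance: a request must earn access on every dimension, every time.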

2.2. Blockchain
- Data Acquisition Technologies: IoT devices, sensors, and other data sources are essential for collecting data from various points in the SC; this data is then cryptographically secured on the BC. In a ZT environment, all data is treated as untrusted until verified.
- Internet of Things (IoT) Infrastructure: IoT infrastructure is integral to BC-based SC implementations, since it provides real-time track-and-trace data for products as they move through the SC.
- Data Management Platforms: Data management platforms can securely store and manage SC information in a distributed ledger. This aligns with the ZT principle, since access to the data is controlled and monitored and no party is automatically granted trust.
- Big Data Analytics: Big Data analytics can be applied to the data stored in the BC. Data analysis can be used to detect threats or security breaches, aligning with ZT’s continuous monitoring and verification principles.
- Traceability Plans: BC enables end-to-end traceability, allowing stakeholders to track a product’s journey from source to destination, ensuring transparency, product authenticity, and quality.
- Monitoring Mechanisms: BC facilitates real-time SC activity monitoring, allowing for an immediate response to security threats. This constant monitoring aligns with the ZT principle of trusting no entity by default and verifying all actions.
- Key Performance Indicators (KPIs): KPIs include security-related metrics that gauge the effectiveness of SC security measures. These KPIs help ensure that security is continuously assessed and improved in a ZT environment.
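The integrity and traceability properties attributed to BC above rest on hash chaining: each record commits to the hash of its predecessor, so altering any earlier record becomes detectable. The sketch below shows only that property. It is a single-node, in-memory structure with no consensus (the framework's BC uses VBFT), no signatures, and no networking; the `Ledger` class and the event fields `product` and `actor` are hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass, field

GENESIS = "0" * 64  # placeholder predecessor hash for the first block


@dataclass
class Ledger:
    blocks: list = field(default_factory=list)

    @staticmethod
    def _digest(body: dict) -> str:
        # Canonical JSON (sorted keys) so identical content always hashes identically.
        return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

    def append(self, event: dict) -> str:
        prev = self.blocks[-1]["hash"] if self.blocks else GENESIS
        body = {"index": len(self.blocks), "prev_hash": prev, "event": event}
        block = {**body, "hash": self._digest(body)}
        self.blocks.append(block)
        return block["hash"]

    def verify(self) -> bool:
        """Recompute every hash and check each link to the previous block."""
        prev = GENESIS
        for i, block in enumerate(self.blocks):
            body = {"index": block["index"], "prev_hash": block["prev_hash"],
                    "event": block["event"]}
            if (block["index"] != i or block["prev_hash"] != prev
                    or block["hash"] != self._digest(body)):
                return False
            prev = block["hash"]
        return True

    def trace(self, product_id: str) -> list:
        """End-to-end history: the actors that handled a given product, in order."""
        return [b["event"]["actor"] for b in self.blocks
                if b["event"]["product"] == product_id]
```

For example, appending `{"product": "P1", "actor": "Origin"}` and then `{"product": "P1", "actor": "Manufacturer"}` lets `trace("P1")` return both actors in order, and editing either stored event afterwards makes `verify()` return `False`.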

2.3. Deep Reinforcement Learning
3. Related Work
4. Identity-Based Cyber-attacks in Supply Chain
- Identity Spoofing: the attacker attempts to impersonate a legitimate SC actor and generate fake transactions on its behalf to gain unauthorized access. Attackers may use stolen credentials or manipulate headers to appear as authorized entities.
- Counterfeit Identity: the attacker creates a fake identity to infiltrate the SC network and gain access to sensitive information or systems. Attackers might pose as authorized personnel to place orders, alter specifications, or manipulate logistics.
- Insider Attacks: Insiders are usually current or former actors or business associates who have privileges to access sensitive information or privileged accounts in the SC. Insiders might abuse their credentials to steal sensitive information, manipulate orders, or cause disruptions.
- Brute Force Attack: the attacker systematically tries all possible username and password combinations until the correct one is found, gaining unauthorized access to an enterprise resource. Weak identity credentials can be easily exploited using automated tools.
- Account Hijacking: the attacker takes control of SC-related accounts, such as shipping or inventory management systems, to perform actions on their behalf, such as diverting shipments or altering inventory records, which can result in shipment delays, inventory inaccuracies, and financial losses.
- Phishing: the attacker impersonates a legitimate entity, such as a bank or service provider, sending fraudulent emails or messages to trick network participants into revealing sensitive information, including user credentials and credit card details.
- Man-in-the-Middle (MITM): the attacker intercepts the communication between two SC parties, often without their knowledge, allowing the attacker to eavesdrop on sensitive information, modify data, or inject malicious content into the communication.
- Malicious Insertion: the attacker targets hardware or software components within the SC and inserts malicious code or firmware. These compromised components can lead to security vulnerabilities, data breaches, or operational disruptions.
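The brute-force item rests on simple arithmetic: a credential of length n over an alphabet of k symbols has k^n candidates, so the worst-case enumeration time is k^n divided by the attacker's guessing rate. The helper below makes that concrete. The rate of 1,000 guesses per second is an assumed figure; real rates differ by orders of magnitude between rate-limited online logins and offline hash cracking.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def keyspace(alphabet_size: int, length: int) -> int:
    """Number of candidate credentials of exactly `length` symbols."""
    return alphabet_size ** length


def worst_case_seconds(alphabet_size: int, length: int, guesses_per_second: float) -> float:
    """Time to enumerate the whole keyspace at a fixed guessing rate."""
    return keyspace(alphabet_size, length) / guesses_per_second


if __name__ == "__main__":
    # Assumed attacker rate: 1,000 guesses per second.
    examples = [("6-digit PIN", 10, 6), ("8 lowercase letters", 26, 8),
                ("12 printable ASCII chars", 94, 12)]
    for name, k, n in examples:
        secs = worst_case_seconds(k, n, 1_000)
        print(f"{name}: {keyspace(k, n):.3e} candidates, "
              f"{secs / SECONDS_PER_YEAR:.3g} years worst case")
```

At the assumed rate, a 6-digit PIN (10^6 candidates) is exhausted in 1,000 seconds, about 17 minutes, while every additional symbol multiplies the effort by the alphabet size. This is why weak credentials fall quickly to automated tools.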
5. Proposed BC-DRLzSC Hybrid Security Framework
5.1. SC Layer

- Origin: the source of raw materials where the product is cultivated, harvested, processed, and packed.
- Supplier: also involved in processing and packaging products for distribution.
- Manufacturer: performs actions from simple packaging to complex manufacturing processes and sets product quality specifications.
- Distributor (or Wholesaler): acquires products from various manufacturers in one place, called a distribution center, and assembles or packages them.
- Retailer: sells the products, monitors and analyzes product conditions, and provides APIs for end-consumers.
- Consumers: usually have fewer rights than other actors; their rights include viewing the product’s origin and history and verifying product authenticity.
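The actor descriptions above imply a role-based permission set in which consumers hold read-only rights. The mapping below is a hypothetical encoding of that idea for illustration: the `Action` names and the exact permissions assigned to each role are assumptions drawn loosely from the descriptions, not a specification from the framework. Unknown roles receive no permissions, in keeping with a default-deny ZT stance.

```python
from enum import Enum, auto


class Action(Enum):
    VIEW_HISTORY = auto()
    VERIFY_AUTHENTICITY = auto()
    RECORD_PROCESSING = auto()   # processing / packaging / assembling steps
    SET_QUALITY_SPEC = auto()
    RECORD_SHIPMENT = auto()
    RECORD_SALE = auto()


READ_ONLY = frozenset({Action.VIEW_HISTORY, Action.VERIFY_AUTHENTICITY})

# Hypothetical role-to-permission mapping for the SC actors listed above.
PERMISSIONS = {
    "origin": READ_ONLY | {Action.RECORD_PROCESSING},
    "supplier": READ_ONLY | {Action.RECORD_PROCESSING},
    "manufacturer": READ_ONLY | {Action.RECORD_PROCESSING, Action.SET_QUALITY_SPEC},
    "distributor": READ_ONLY | {Action.RECORD_PROCESSING, Action.RECORD_SHIPMENT},
    "retailer": READ_ONLY | {Action.RECORD_SALE},
    "consumer": READ_ONLY,
}


def is_allowed(role: str, action: Action) -> bool:
    """Default deny: roles not in the table get an empty permission set."""
    return action in PERMISSIONS.get(role, frozenset())
```

For instance, `is_allowed("consumer", Action.VERIFY_AUTHENTICITY)` is `True`, whereas `is_allowed("consumer", Action.RECORD_SALE)` is `False`.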

5.2. IoT Layer
5.3. ZT Layer
- BC, which ensures node registration and authentication as well as transaction validation and verification.
- Secure Intelligent Access Control, where ZT enforces the principle of least privilege: users, devices, and applications are granted the minimum access required to perform their tasks. This limits the potential damage caused by a security breach. Access controls are based on factors such as user identity, device security posture, location, and the sensitivity of the data or resource being accessed. Every access request is scrutinized, and a user or device is given access only if it meets the specific criteria set by the access policies.
- DRL detection module, which utilizes RL, a subset of ML that operates within the Markov Decision Process (MDP) framework. This equips the module to continuously learn device behavior and adapt to emerging threat patterns. RL uses the MDP formalism to model the environment in situations where rewards or transition probabilities are not known in advance. The central objective of an RL agent is to learn an optimal policy, a mapping from states to actions that enables it to make informed choices based on its present state. The RL algorithm uses an iterative process to enhance the agent’s decision-making proficiency over time: the agent refines its policy by selecting actions and receiving feedback in the form of rewards. This iterative interaction allows the agent to gradually discern actions that yield high rewards, eventually converging to an optimal policy that maximizes its expected cumulative reward.
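The iterative learning loop in the DRL item can be illustrated with tabular Q-learning, the algorithm family named in the contributions, using the standard update Q(s,a) ← Q(s,a) + α[r + γ max_a' Q(s',a') − Q(s,a)]. This is a minimal sketch under stated assumptions, not the paper's DRL module: a table stands in for a neural approximator, a single agent stands in for the decentralized setup, and the environment is a toy with five hypothetical anomaly levels, of which levels 3 and 4 are treated as malicious. Rewards are +1 for a correct allow/flag decision and −1 otherwise, and the next observation does not depend on the action.

```python
import random

ACTIONS = ("allow", "flag")
N_STATES = 5          # hypothetical: discretized anomaly level of a device's behavior
MALICIOUS = {3, 4}    # hypothetical ground truth, used only to generate rewards


def reward(state: int, action: str) -> float:
    correct = "flag" if state in MALICIOUS else "allow"
    return 1.0 if action == correct else -1.0


def train(steps: int = 20_000, alpha: float = 0.1, gamma: float = 0.9,
          epsilon: float = 0.1, seed: int = 0) -> dict:
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    state = rng.randrange(N_STATES)
    for _ in range(steps):
        # Epsilon-greedy: explore with probability epsilon, otherwise exploit.
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q[(state, a)])
        r = reward(state, action)
        next_state = rng.randrange(N_STATES)  # toy: next observation independent of action
        best_next = max(q[(next_state, a)] for a in ACTIONS)
        # Q-learning update toward the bootstrapped target r + gamma * max_a' Q(s', a').
        q[(state, action)] += alpha * (r + gamma * best_next - q[(state, action)])
        state = next_state
    return q


def greedy_policy(q: dict) -> dict:
    """Map each state to the action with the highest learned Q-value."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES)}


if __name__ == "__main__":
    print(greedy_policy(train()))
```

With the default settings, the greedy policy is expected to flag levels 3 and 4 and allow levels 0 to 2. In a deep variant, the Q-table would be replaced by a network mapping observed device features to action values.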
5.4. BC Layer
6. Smart Contracts
6.1. Implementation
- Register_entity(): creating and initializing an ID card for each unique identity.
- Auth_entity(): matches the entity’s credentials against the registered login credentials.
- Info_entity(): queries a specific entity’s information.
- Revoke_entity(): removes the entity’s identity by invalidating its credentials.
- Create_resource(): creates a new resource (e.g., data, application).
- Access_resource(): requests access to a specific resource.
- Update_resource(): updates a resource’s details.
- Info_resource(): a view function that queries a specific resource’s information.
- Exit_resource(): returns true if a resource exits safely.
- Total_resources(): queries all system resources.
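The contract functions above are specified only at the interface level. As a rough illustration of the bookkeeping they imply, the following Python model (not Solidity, and not the authors' contract) implements a subset: registration, authentication, query, revocation, resource creation, access, and counting. Its assumptions are that only a SHA-256 hash of the credential is stored, that access is granted to the resource owner plus an explicit allow-list, and that every access request re-authenticates the caller. Names such as `IdentityAccessModel` are hypothetical, and on-chain concerns (`msg.sender` checks, gas, VBFT consensus) are not modeled.

```python
import hashlib
from dataclasses import dataclass, field


def _digest(credential: str) -> str:
    # Assumption: only a hash of the credential is stored, never the raw value.
    return hashlib.sha256(credential.encode()).hexdigest()


@dataclass
class Entity:
    entity_id: int
    credential_hash: str
    org_id: int
    active: bool = True


@dataclass
class Resource:
    resource_id: int
    owner_id: int
    allowed: set = field(default_factory=set)  # entity IDs granted access


class IdentityAccessModel:
    """Hypothetical in-memory model of the identity and resource contracts."""

    def __init__(self) -> None:
        self.entities = {}
        self.resources = {}

    # Identity contract (cf. Register_entity, Auth_entity, Info_entity, Revoke_entity)
    def register_entity(self, entity_id: int, credential: str, org_id: int) -> None:
        if entity_id in self.entities:
            raise ValueError(f"entity {entity_id} already registered")
        self.entities[entity_id] = Entity(entity_id, _digest(credential), org_id)

    def auth_entity(self, entity_id: int, credential: str) -> bool:
        e = self.entities.get(entity_id)
        return e is not None and e.active and e.credential_hash == _digest(credential)

    def info_entity(self, entity_id: int) -> Entity:
        return self.entities[entity_id]

    def revoke_entity(self, entity_id: int) -> None:
        self.entities[entity_id].active = False  # credentials no longer authenticate

    # Resource contract (cf. Create_resource, Access_resource, Total_resources)
    def create_resource(self, resource_id: int, owner_id: int,
                        owner_credential: str, allowed=()) -> None:
        if not self.auth_entity(owner_id, owner_credential):
            raise PermissionError("owner failed authentication")
        if resource_id in self.resources:
            raise ValueError(f"resource {resource_id} already exists")
        self.resources[resource_id] = Resource(resource_id, owner_id, set(allowed))

    def access_resource(self, entity_id: int, credential: str, resource_id: int) -> bool:
        # Zero trust: authenticate on every request, then check the grant.
        if not self.auth_entity(entity_id, credential):
            return False
        r = self.resources.get(resource_id)
        return r is not None and (entity_id == r.owner_id or entity_id in r.allowed)

    def total_resources(self) -> int:
        return len(self.resources)
```

In a quick usage check, registering entity 102 with credential "SfowXz" in organization 1 (values that simply mirror the decoded input in the evaluation log), creating a resource that also lists a second entity, and then revoking that second entity leaves the owner with access while the revoked entity is refused.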
6.2. Evaluation
- from: 0x5B3...eddC4
- to: IdentityManagement.Register_entity(uint256,string,uint256,uint256) 0xd91...39138
- value: 0 wei
- data: 0xc39...00000
- logs: 0
- hash: 0xd6f...25ada
- status 0x1 Transaction mined and execution succeed
- transaction hash 0xd6f...25ada
- block hash 0x8d8...fe4e5
- block number 2
- from 0x5B38Da6a701c568545dCfcB03FcB875f56beddC4
- to IdentityManagement.Register_entity(uint256,string,uint256,uint256) 0xd9145CCE52D386f254917e481eB44e9943F39138
- gas 226421 gas
- transaction cost 183589 gas
- execution cost 161613 gas
- input 0xc39...00000
- decoded input { "uint256 entity_id": "102", "string PK": "SfowXz", "uint256 O_id": "1", "uint256 entity_time": "1" }



7. Deep Reinforcement Learning IDS
7.1. Implementation
7.2. Evaluation



8. Conclusion
References
1. Ohm, M.; Plate, H.; Sykosch, A.; Meier, M. Backstabber’s knife collection: A review of open source software supply chain attacks. Detection of Intrusions and Malware, and Vulnerability Assessment: 17th Int. Conf., DIMVA 2020, Lisbon, Portugal, June 24–26, 2020, Proc. 17. Springer, 2020, pp. 23–43.
2. Ismail, S.; Reza, H. Security Challenges of Blockchain-Based Supply Chain Systems. 2022 IEEE 13th Annual Ubiquitous Comput., Electron. & Mobile Commun. Conf. (UEMCON), 2022, pp. 1–6.
3. Melnyk, S.A.; Schoenherr, T.; Speier-Pero, C.; Peters, C.; Chang, J.F.; Friday, D. New challenges in supply chain management: cybersecurity across the supply chain. Int. J. of Production Research 2022, 60, 162–183.
4. Li, D.; Zhang, E.; Lei, M.; Song, C. Zero trust in edge computing environment: A blockchain based practical scheme. Mathematical Biosciences and Engineering 2022, 19, 4196–4216.
5. Collier, Z.A.; Sarkis, J. The zero trust supply chain: Managing supply chain risk in the absence of trust. International Journal of Production Research 2021, 59, 3430–3445.
6. Ismail, S.; Reza, H.; Salameh, K.; Kashani Zadeh, H.; Vasefi, F. Toward an Intelligent Blockchain IoT-Enabled Fish Supply Chain: A Review and Conceptual Framework. Sensors 2023, 23.
7. Buck, C.; Olenberger, C.; Schweizer, A.; Völter, F.; Eymann, T. Never trust, always verify: A multivocal literature review on current knowledge and research gaps of zero-trust. Computers and Security 2021, 110, 102436.
8. Campbell, M. Beyond Zero Trust: Trust Is a Vulnerability. Computer 2020, 53, 110–113.
9. Sultana, M.; Hossain, A.; Laila, F.; Taher, K.A.; Islam, M.N. Towards developing a secure medical image sharing system based on zero trust principles and blockchain technology. BMC Medical Informatics and Decision Making 2020, 20, 1–10.
10. Ismail, S.; Dawoud, D.; Reza, H. Towards A Lightweight Identity Management and Secure Authentication for IoT Using Blockchain. 2022 IEEE World AI IoT Congress (AIIoT), 2022, pp. 77–83.
11. Moudoud, H.; Cherkaoui, S.; Khoukhi, L. An IoT blockchain architecture using oracles and smart contracts: the use-case of a food supply chain. 2019 IEEE 30th Annual Int. Symp. on Personal, Indoor and Mobile Radio Commun. (PIMRC). IEEE, 2019, pp. 1–6.
12. Tsolakis, N.; Niedenzu, D.; Simonetto, M.; Dora, M.; Kumar, M. Supply network design to address United Nations Sustainable Development Goals: A case study of blockchain implementation in Thai fish industry. Journal of Business Research 2021, 131, 495–519.
13. Azzi, R.; Chamoun, R.K.; Sokhn, M. The power of a blockchain-based supply chain. Computers & Industrial Engineering 2019, 135, 582–592.
14. Abeyratne, S.A.; Monfared, R.P. Blockchain ready manufacturing supply chain using distributed ledger. Int. J. of Research in Eng. and Technol. 2016, 05, 1–10.
15. Powell, W.; Foth, M.; Cao, S.; Natanelov, V. Garbage in garbage out: The precarious link between IoT and blockchain in food supply chains. J. of Ind. Inf. Integr. 2022, 25, 100261.
16. Dutta, P.; Choi, T.M.; Somani, S.; Butala, R. Blockchain technology in supply chain operations: Applications, challenges and research opportunities. Transportation Research Part E: Logistics and Transportation Review 2020, 142, 102067.
17. Tsoukas, V.; Gkogkidis, A.; Kampa, A.; Spathoulas, G.; Kakarountas, A. Enhancing Food Supply Chain Security through the Use of Blockchain and TinyML. Information 2022, 13.
18. Al-Farsi, S.; Rathore, M.M.; Bakiras, S. Security of Blockchain-Based Supply Chain Management Systems: Challenges and Opportunities. Applied Sciences 2021, 11.
19. Gai, K.; She, Y.; Zhu, L.; Choo, K.K.R.; Wan, Z. A blockchain-based access control scheme for zero trust cross-organizational data sharing. ACM Trans. on Internet Technol. (TOIT) 2022.
20. François-Lavet, V.; Henderson, P.; Islam, R.; Bellemare, M.G.; Pineau, J.; et al. An introduction to deep reinforcement learning. Foundations and Trends® in Machine Learning 2018, 11, 219–354.
21. Nobi, M.N.; Krishnan, R.; Huang, Y.; Shakarami, M.; Sandhu, R. Toward deep learning based access control. Proc. of the Twelfth ACM Conf. on Data and Appl. Security and Privacy, 2022, pp. 143–154.
22. Jin, Q.; Wang, L. Zero-Trust Based Distributed Collaborative Dynamic Access Control Scheme with Deep Multi-Agent Reinforcement Learning. EAI Endorsed Trans. on Security and Safety 2020, 8.
23. Kegenbekov, Z.; Jackson, I. Adaptive supply chain: Demand–supply synchronization using deep reinforcement learning. Algorithms 2021, 14, 240.
24. Hachaïchi, Y.; Chemingui, Y.; Affes, M. A policy gradient based reinforcement learning method for supply chain management. 2020 4th Int. Conf. on Advanced Systems and Emergent Technologies (IC_ASET). IEEE, 2020, pp. 135–140.
25. Alves, J.C.; Mateus, G.R. Deep reinforcement learning and optimization approach for multi-echelon supply chain with uncertain demands. Int. Conf. on Computational Logistics. Springer, 2020, pp. 584–599.
26. Peng, Z.; Zhang, Y.; Feng, Y.; Zhang, T.; Wu, Z.; Su, H. Deep reinforcement learning approach for capacitated supply chain optimization under demand uncertainty. 2019 Chinese Automation Congress (CAC). IEEE, 2019, pp. 3512–3517.
27. Powell, W.; Cao, S.; Foth, M.; He, S.; Turner-Morris, C.; Li, M. Revisiting Trust in Supply Chains: How Does Blockchain Redefine Trust? In Blockchain Driven Supply Chains and Enterprise Information Systems; Bouras, A., Khalil, I., Aouni, B., Eds.; Springer International Publishing: Cham, 2023; pp. 21–42.
28. Gonczol, P.; Katsikouli, P.; Herskind, L.; Dragoni, N. Blockchain Implementations and Use Cases for Supply Chains-A Survey. IEEE Access 2020, 8, 11856–11871.
29. Malik, S.; Dedeoglu, V.; Kanhere, S.S.; Jurdak, R. Trustchain: Trust management in blockchain and iot supported supply chains. 2019 IEEE Int. Conf. on Blockchain (Blockchain). IEEE, 2019, pp. 184–193.
30. Moudoud, H.; Cherkaoui, S. Empowering Security and Trust in 5G and Beyond: A Deep Reinforcement Learning Approach. IEEE Open J. of the Commun. Soc. 2023.
| Year | Ref. | DRL | BC | ZT | Application | Insights |
|---|---|---|---|---|---|---|
| 2021 | [23] | PPO | | | Inbound & Outbound Flow | PPO-based DRL agent that can synchronize inbound and outbound flows in an SC and support business continuity in a stochastic and non-stationary environment. |
| 2020 | [22] | MADDPG | | ✓ | Traffic Allocation | MADDPG-based optimization of the traffic allocation policy for adaptive and automatic collaborative management, considering network security, network environment, and user requirements. |
| 2020 | [24] | PPO | | | Order Placement | Development of a reinforcement learning agent for optimal order placement and inventory replenishment in SC management. |
| 2020 | [25] | PPO2 | | | Operating Cost | A DRL agent is employed to find an optimal policy for operating the entire SC and minimizing total operating costs. |
| 2019 | [26] | PPO2 | | | Inventory Management | A DRL method that aims to learn optimal policies that can adapt to changing demand conditions and make effective decisions regarding inventory management and capacity utilization in the SC. |
| 2022 | [17] | TinyML | ✓ | | Security | Proposed a model to ensure the integrity of collected data and a self-sovereign identity approach to minimize single points of failure. It also incorporates the nascent TinyML technology to monitor devices and mitigate malicious behavior from actors in the SC. |
| 2022 | [19] | | ✓ | ✓ | Cross-Organizational Data Sharing | An RBAC model using a multi-signature protocol and smart contract methods to facilitate lightweight data sharing among different organizations. |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).