A Comprehensive Survey on Web-Based Stress Testing Frameworks for Blockchain Systems: Architectures, Metrics, and Future Directions

Krish Mithra Nagamothu

doi:10.20944/preprints202604.0562.v1

Submitted:

07 April 2026

Posted:

09 April 2026

You are already at the latest version

Abstract

As blockchain technology evolves from specialized financial tools to foundational infrastructure for Web3, the necessity for rigorous performance validation becomes paramount. Stress testing—defined as the evaluation of system stability under extreme workloads—is critical for identifying bottlenecks in consensus mechanisms and peer-to-peer communication. This survey provides an exhaustive analysis of web-based stress testing frameworks. Unlike traditional CLI-based tools, web-based frameworks provide real-time telemetry and distributed orchestration capabilities essential for modern decentralized applications. We categorize existing literature into three generations of benchmarking, evaluate ten prominent frameworks based on a multi-dimensional rubric, and identify significant research gaps including the lack of standardized cross-chain stress protocols and AI-integrated anomaly detection. This work aims to provide a roadmap for researchers and DevOps engineers to select and implement robust testing environments for enterprise-grade blockchain deployments.

Keywords:

blockchain

;

stress testing

;

benchmarking

;

performance analysis

;

hyperledger caliper

;

scalability

;

throughput

Subject:

Computer Science and Mathematics - Computer Science

1. Introduction

Blockchain systems are inherently complex, integrating distributed systems, cryptography, and game theory to provide decentralized trust. However, the "Blockchain Trilemma"—the fundamental trade-off between security, scalability, and decentralization—remains a significant barrier to mainstream adoption. While many protocols report high transactions-per-second (TPS) in controlled, low-latency environments, real-world performance often degrades significantly under adversarial or high-load conditions. The rapid expansion of Decentralized Finance (DeFi), Non-Fungible Tokens (NFTs), and Central Bank Digital Currencies (CBDCs) has placed unprecedented pressure on underlying distributed ledgers.

Recent incidents, such as the downtime experienced by the Solana and Polygon networks in 2021 and 2022, underscore the critical need for robust validation methodologies. These failures were often triggered by unexpected transaction surges from NFT minting events or high-frequency trading bot activities. Such events highlight a critical research gap: the lack of standardized, high-fidelity stress-testing environments capable of simulating real-world network jitter, mempool congestion, and consensus bottlenecks before mainnet deployment.

Traditional benchmarking tools often rely on Command-Line Interfaces (CLI) and manual configuration, which limits their accessibility and scalability for complex, multi-node deployments. The emergence of web-based stress testing frameworks has revolutionized this landscape by providing centralized orchestration, real-time telemetry, and visual dashboards. These frameworks enable DevOps engineers and researchers to monitor transient performance spikes and resource exhaustion in real-time, facilitating more effective root-cause analysis.

1.1. Contributions

This survey provides a comprehensive analysis of the current state of web-based stress testing frameworks for blockchain systems. The primary contributions of this work are as follows:

Architectural Taxonomy: We provide a structured categorization of stress testing components, from workload generation to monitoring layers.
Comparative Analysis: We evaluate ten prominent benchmarking frameworks using a multi-dimensional rubric, highlighting their trade-offs in terms of scalability, protocol support, and ease of use.
Metric Framework: We define a multi-layered taxonomy of performance metrics across the network, consensus, and application layers.
Research Roadmap: We identify critical research gaps and outline future directions, including AI-driven adaptive testing and standardized cross-chain protocols.

1.2. Defining Stress Testing in Decentralized Ledgers

In traditional software, stress testing focuses on server-side resource exhaustion. In the context of blockchain technology, this definition extends to several critical areas. Mempool congestion is a primary concern, involving the evaluation of how a node handles an increasing backlog of unconfirmed transactions during peak periods. Consensus latency is another vital metric, measuring the time required for a set of nodes to reach agreement under conditions of high network jitter and packet loss. Furthermore, state database growth must be monitored to understand the impact of rapid read and write operations on the underlying storage systems like LevelDB or CouchDB. Finally, gossip protocol efficiency is tested to determine the ability of the network to propagate blocks and transactions across a geographically dispersed set of nodes without significant delay.

1.3. The Shift to Web-Based Frameworks

Earlier tools required significant manual configuration and often lacked user-friendly interfaces, necessitating deep technical expertise for even basic tests. Modern web-based frameworks, often integrated with dashboards like Grafana and Prometheus, have revolutionized this process by enabling remote orchestration. This allows users to trigger tests from a centralized user interface across global nodes simultaneously. Visual telemetry provides real-time heatmaps of transaction failure rates and latency spikes, offering immediate feedback that was previously buried in logs. These frameworks also improve accessibility by lowering the barrier for non-technical stakeholders to audit performance. Additionally, automated reporting features allow for the generation of detailed PDF or HTML reports with a single click, streamlining the documentation of test results for compliance and audit purposes.

2. Taxonomy of Blockchain Performance Metrics

To evaluate the effectiveness of a stress testing framework, one must first understand the metrics it aims to measure. We categorize these into three primary layers: the network layer, the consensus layer, and the application layer.

2.1. Network Layer Metrics

These metrics focus on the underlying peer-to-peer communication that forms the backbone of any blockchain. Propagation delay is a critical measure, representing the time taken for a block to reach a majority of nodes, such as 51% or 90% of the network. Bandwidth consumption is also monitored to track the amount of data transferred between nodes per second, ensuring the network can handle the load without saturating links. Additionally, peer discovery time is evaluated to determine how quickly a new node can find and connect to neighbors, which is essential for network resilience and scalability.

2.2. Consensus Layer Metrics

Metrics at the consensus layer are specific to the blockchain’s agreement mechanism and its ability to maintain a single source of truth. Block finality time is perhaps the most important, as it measures the duration from transaction submission to the point where it is considered irreversible. The fork rate is another key indicator, representing the frequency of competing blocks being produced simultaneously, which can signal instability. Furthermore, the validator participation rate provides insight into the health of the network by showing the percentage of active validators participating in each consensus round.

2.3. Application Layer Metrics

Application layer metrics are the most visible to end-users and directly impact the user experience. Throughput, often measured in transactions per second (TPS), defines the number of successful transactions processed by the network. Transaction latency measures the time between a user signing a transaction and its inclusion in a finalized block. The success rate is also vital, representing the ratio of successful transactions to the total number of transactions submitted, which helps identify potential failures in smart contract execution or resource exhaustion.

2.4. Economic and Rarity-Based Metrics in NFT Systems

In the context of NFT-based applications, performance metrics must also account for economic factors that drive network load. Rarity algorithms play a crucial role in determining the value of digital assets. These algorithms typically consider factors such as trait distribution, historical sales data, and overall scarcity within a collection [7]. Common metrics used by rarity algorithms include trait rarity, statistical rarity, and average trait rarity.

Furthermore, NFT tokenomics refers to the economic principles and mechanisms that govern these assets. Unlike traditional tokens, NFTs are unique and indivisible, representing digital or physical items often encoded within smart contracts [6]. As non-interchangeable data units, they provide better assurance of safety against unauthorized tampering, replication, or destruction. The interplay between rarity and market demand can lead to sudden transaction surges (e.g., during "minting" events), necessitating stress testing frameworks that can simulate these specific economic behaviors.

3. System Architecture of a Generic Web-Based Stress Tester

The architecture of a modern web-based stress testing framework is designed to be modular and decoupled, enabling it to handle the diverse demands of heterogeneous blockchain environments. As illustrated in Figure 1, the system is generally composed of four distinct layers: the User Interface (UI) Layer, the Orchestration Layer, the Execution Layer, and the Monitoring Layer.

The User Interface Layer provides a centralized web dashboard where users define test parameters, such as transaction rates and duration. This layer communicates with the Orchestration Layer, which manages the lifecycle of a test and distributes tasks to the Execution Layer. The Execution Layer consists of multiple workload generators and blockchain adapters that produce the transaction load. Finally, the Monitoring Layer collects real-time telemetry data from the target network, which is then visualized on the dashboard.

3.1. Workload Generation Process

The workload generation process is the most resource-intensive component of the framework. Figure 2 details the internal pipeline of a high-performance generator, highlighting the sequential stages from transaction definition to receipt tracking.

4. Literature Review

The landscape of blockchain benchmarking has evolved through multiple generations, moving from platform-specific technical audits to standardized, web-integrated frameworks. We categorize the existing body of research into four thematic groups: foundational frameworks, standardization efforts, modern web-integrated approaches, and consensus-driven strategies.

4.1. Foundational Benchmarking and Platform Comparison

Early research primarily focused on understanding the performance characteristics of specific platforms. The seminal work by Dinh et al. on Blockbench [15] established a foundational yardstick for private blockchains, identifying that performance bottlenecks often reside in the data-sharing layer rather than the network layer. Building upon this, Pongnumkul et al. [10] conducted a comparative analysis between Ethereum and Hyperledger Fabric, concluding that while Fabric outperformed Ethereum in throughput, it necessitated significantly more complex configuration. Dinh et al. later expanded this research by providing a data processing view of blockchain systems [6], which assisted in untangling the complexities of distributed ledger performance across diverse architectures.

4.2. Standardization and Tooling Evolution

As the industry matured, the focus shifted toward cross-platform standardization. Hyperledger Caliper [9] emerged as a response to the need for a standardized tool supporting multiple backends, enabling consistent performance comparisons. Recent work by Kaushal and Kumar [1] further explores Caliper’s modularity, emphasizing its utility in measuring diverse blockchain solutions. In contrast to general-purpose tools, Kuzlu et al. [11] evaluated Fabric for smart grid applications, arguing that stress parameters must be tailored to specific use cases, such as IoT-based blockchains. To address the lack of uniformity in reporting results, Ren et al. [5] proposed the Blockchain Benchmarking Standardized Framework (BBSF), which aims to harmonize performance evaluation metrics across the industry.

4.3. Modern Web-Integrated Approaches

The third generation of benchmarking emphasizes real-time observability and accessibility. Monrat et al. [12] and Sharma et al. [13] argue that without real-time visual telemetry, transient spikes in latency are frequently missed during post-test analysis. This observation has led to the integration of web-based dashboards like Grafana. Furthermore, Fan et al. [14] introduced queuing network models into stress testing, providing a mathematical foundation for predicting system saturation points. More recently, Touloupou et al. [2] validated these frameworks through controlled deployments of XRPL and Ethereum, highlighting that environment consistency is as critical as the choice of benchmarking tool.

4.4. Consensus, AI, and Sharding Strategies

The most recent research frontier involves adaptive and hardware-aware testing. Wang et al. [7] provide an extensive survey on how different protocols respond to extreme load, while Salah et al. [8] suggest that machine learning could optimize stress parameters dynamically. Billah et al. [3] focus on performance optimization in multi-machine systems, which is the core of distributed stress testing. Additionally, Shakila and Anitha [4] compare local frameworks like Truffle and Hardhat, illustrating how local development testing differs from large-scale network stress testing. Finally, recent studies have begun exploring sharded and Layer 2 (L2) architectures, with Song et al. [18] and Khan et al. [16] providing initial scalability analyses for these modular systems.

4.5. AI, Genetic Algorithms, and LLMs in Blockchain Content Generation

The integration of Artificial Intelligence (AI) into the blockchain ecosystem has opened new avenues for content creation and optimization. Genetic algorithms (GAs) [8] are powerful heuristic optimization methods inspired by biological evolution, simulating operations such as crossover and mutation to search complex problem spaces [19]. These algorithms are particularly effective for handling multi-objective optimization in NFT rarity and trait generation.

Furthermore, Large Language Models (LLMs) and generative AI have revolutionized content creation across industries [20]. In the context of the metaverse and NFT-based gaming, LLMs enable the efficient production of high-quality, personalized content, democratizing the creation process. For stress testing, these AI models can be used to generate diverse and realistic transaction payloads, simulating the complex user-generated content patterns seen in modern decentralized platforms.

5. Architectural Components of Stress Testing Frameworks

A robust web-based stress testing framework typically consists of several decoupled components that work in harmony to provide a comprehensive testing environment. The workload generator is the core engine responsible for creating and signing transactions; it must be capable of generating high-frequency requests without becoming a bottleneck itself by using asynchronous I/O and multi-threading. Blockchain adapters serve as abstraction layers that allow the framework to communicate with different blockchain protocols using a unified API, ensuring that the same test scripts can be run against multiple targets. Monitoring and telemetry agents are deployed alongside blockchain nodes to collect resource utilization data and internal node metrics like mempool size and peer count. Finally, the web dashboard and orchestrator act as the central hub where users define test parameters and view real-time results, managing the distributed workload generators and aggregating telemetry data for the user.

6. Methodological Comparison of Frameworks

In this section, we provide a structured comparison of the most widely used tools based on a specific set of criteria. The comparative rubric evaluates frameworks based on metric granularity, protocol support, the scalability of the tester, and the quality of real-time visualization. Metric granularity refers to whether the tool measures beyond simple TPS and latency, while protocol support checks for compatibility with EVM, WASM, or Fabric-based chains. Scalability of the tester is crucial for determining if the tool can simulate thousands of concurrent users, and real-time visualization assesses the depth and usability of the web interface provided to the user.

Table 1. Detailed Taxonomy of Blockchain Stress Testing and Benchmarking Tools

Tool	Primary Focus	Web Interface	Supported Chains	Real-time Monitoring	Ease of Use	Extensibility
Blockbench	Technical Audit	None	Private (Ethereum, Fabric)	No	Low	Medium
Hyperledger Caliper	Industry Standard	CLI + Dashboard	Multi-chain (Fabric, Besu, etc.)	Yes (via Prometheus)	Medium	High
ChainHammer	Burst Testing	Web-Visuals	Quorum, Ethereum, Geth	Yes	High	Low
Avalanche-Tester	Throughput	Web-based UI	Avalanche-only	Yes	Very High	Low
EVM-Stress	Gas Limit Testing	CLI	Any EVM-compatible	No	High	Medium
JMeter-Web3	Load Testing	Integrated UI	Any via JSON-RPC	Yes	Medium	High
BBSF	Standardization	Dashboard	General Purpose	Yes	Medium	Very High
Truffle/Hardhat	Local Dev	CLI/Dashboard	EVM	Partial	Very High	High

7. In-Depth Analysis of Prominent Frameworks

Hyperledger Caliper is perhaps the most well-known benchmarking tool, allowing users to write test scripts in JavaScript and define network configurations in YAML. Its modular architecture allows for the easy addition of new blockchain connectors, although setting it up for complex environments can be challenging. Blockbench focuses on the internal performance of the blockchain stack by breaking down the system into the consensus, data, and execution layers. While it provides deep insights, its lack of a modern web interface makes it less suitable for rapid DevOps cycles. ChainHammer is designed for hammering a network with a burst of transactions to identify breaking points, providing excellent visual tools for viewing transaction submission versus confirmation rates.

8. Security and Resilience Stress Testing

While performance is the primary focus of stress testing, resilience to adversarial conditions is equally critical. Modern frameworks are beginning to incorporate security-focused stress tests.

8.1. DDoS Simulation

Simulating a Distributed Denial of Service (DDoS) attack involves flooding the network with invalid or low-gas transactions to exhaust the mempool and CPU of validator nodes. Web-based frameworks allow for the orchestration of thousands of globally distributed IP addresses to simulate a real botnet attack.

8.2. Network Partitioning (Eclipse Attacks)

Resilience testing also includes simulating network partitions where a subset of nodes is isolated from the rest of the network. This tests the blockchain’s ability to maintain consensus or recover gracefully once the partition is resolved.

9. Benchmarking Sharded and Layer 2 Architectures

The evolution of blockchain from monolithic to modular architectures has introduced new complexities for stress testing. Sharding and Layer 2 (L2) solutions, while improving throughput, create unique performance bottlenecks that traditional benchmarking tools often miss.

9.1. Cross-Shard Communication Latency

In sharded systems, a significant portion of the transaction load involves cross-shard communication. Stress tests must evaluate the latency of these transactions, as they require coordination between multiple shard chains. Frameworks like Paramart [17] address this by simulating parallel resource allocation, but standardized tools for measuring cross-shard consensus delays are still in their infancy. Key metrics include the Cross-Shard Transaction Completion Time (CSTCT) and the shard-to-shard message propagation delay.

9.2. Layer 2 Rollup Performance

L2 solutions, particularly Optimistic and Zero-Knowledge (ZK) rollups, introduce a decoupling of execution and settlement. Stress testing an L2 requires measuring not only the throughput on the rollup itself but also the "finality" time on the Layer 1 (L1) mainnet. This includes evaluating the cost and latency of batch submission and the proof generation time for ZK-rollups. Recent studies [18] suggest that L2 performance is often constrained by L1 data availability rather than the rollup’s internal execution capacity.

10. AI-Driven Adaptive Stress Testing

A major challenge in manual stress testing is defining the "right" workload to find breaking points. Recent research has explored the use of Artificial Intelligence (AI) and Machine Learning (ML) to automate this process.

10.1. Dynamic Workload Generation

Instead of fixed transaction rates, AI-driven frameworks can use Reinforcement Learning (RL) to dynamically adjust the transaction rate, payload size, and contract complexity based on real-time network feedback. This allows the tester to "hunt" for the exact point of system failure, such as the specific gas limit that triggers a consensus stall.

10.2. Anomaly Detection and Root Cause Analysis

ML models can be trained on historical telemetry data to identify anomalous performance patterns that might indicate subtle bugs in the consensus layer or smart contract logic. Venkatesan and Rahayu [19] demonstrate that hybrid consensus-ML models can significantly improve the resilience of networks against adversarial stress conditions.

11. Case Studies: Stress Testing in Real-World Scenarios

In a case study involving a DeFi protocol launch on an EVM-compatible chain, a web-based stress tester was used to simulate 5,000 concurrent users. The test revealed a significant bottleneck in the smart contract’s state-update logic, which was corrected before the official launch, potentially preventing a large-scale loss of funds. Another case study focused on a Central Bank Digital Currency pilot using a private Hyperledger Fabric network. The goal was to reach 50,000 TPS, and the stress test identified that the primary bottleneck was the SSD I/O on the orderer nodes. Upgrading to NVMe drives eventually increased the throughput by 40%, demonstrating the value of hardware-specific stress testing.

11.1. GPK Fusion: NFT Card Game Tokenomics

A final case study involves GPK Fusion, an innovative system for generating and minting digital NFT cards for games. The GPK Fusion system employs a four-stage process—Initiation, GENE, PIXKOLOR, and FUSION—to create unique, rare, and strategically balanced NFTs. Stress testing such a system is critical due to the complex interplay between trait rarity, visual appeal, and market dynamics [12]. In a simulated deployment, the stress tester was used to evaluate the system’s ability to handle rapid minting requests while maintaining the uniqueness of each asset and preventing duplication or plagiarism. The results demonstrated that integrating AI-driven rarity algorithms and genetic algorithms [8] into the minting pipeline requires significant computational resources, highlighting the need for high-performance orchestration in the stress-testing framework.

Table 2. Performance Gains Observed in CBDC Pilot After Optimization

Configuration	Avg TPS	95th Latency (s)
Baseline (SATA SSD)	12,400	4.2
Optimized (NVMe)	42,800	1.8
Full Tuning (NVMe + 10Gbps)	54,200	0.9

12. Discussion and Comparative Insights

The evaluation of web-based stress testing frameworks reveals several critical trade-offs between accessibility, technical depth, and scalability. While tools like Hyperledger Caliper provide high extensibility and multi-chain support, they often require significant initial configuration, presenting a steeper learning curve for non-technical stakeholders. In contrast, platform-specific tools like Avalanche-Tester offer superior ease of use and "one-click" testing but lack the versatility required for cross-platform auditing.

12.1. Trade-Offs and Limitations

A primary challenge identified in this survey is the observer effect. The deployment of heavy web-based monitoring agents alongside blockchain nodes can consume significant CPU and memory resources, potentially skewing performance metrics. This overhead must be carefully balanced against the need for granular telemetry. Furthermore, while web dashboards improve the visualization of transient latency spikes, they often abstract away the underlying network-level causes, such as peer-to-peer gossip delays or packet loss.

12.2. Real-World Implementation Challenges

Simulating real-world internet conditions—such as geographically distributed nodes and intermittent network jitter—remains a significant hurdle. Most existing frameworks operate within controlled cloud environments (e.g., AWS or Azure), which may not accurately reflect the latency profiles of a truly decentralized mainnet. The integration of network emulation tools like Chaos Mesh or NetEm into the stress-testing pipeline is essential for achieving higher fidelity.

12.3. Identification of Research Gaps

Despite the maturation of benchmarking tools, several research gaps persist:

Cross-Chain Stress Protocols: There is a lack of standardized methodologies for stress testing interoperability bridges, which are increasingly becoming the primary targets for exploits.
Adversarial Load Modeling: Most frameworks focus on "honest" transaction loads. There is a need for tools that can simulate adversarial behaviors, such as front-running bots, sandwich attacks, and DDoS-style mempool flooding.
Resource-Constrained Testing: As blockchain extends to IoT and mobile devices, benchmarking must account for low-power environments where disk I/O and battery life are critical constraints.

13. Future Directions

The future of blockchain stress testing lies in its integration with the entire development lifecycle and the use of more sophisticated simulation techniques.

13.1. Standardized Cross-Chain Benchmarking

As the number of interconnected blockchains grows, there is an urgent need for a "Standardized Cross-Chain Stress Protocol." This would allow researchers to measure the throughput and security of cross-chain bridges under high load, which are currently the most vulnerable points in the Web3 ecosystem.

13.2. AI-Integrated Autonomous Testing

The manual configuration of stress tests will likely be replaced by autonomous agents that can continuously stress-test a network in the background, identifying performance regressions in real-time. This "Continuous Stress Integration" (CSI) will become a standard part of the DevOps pipeline for blockchain protocols.

13.3. Hardware-Aware and Energy-Efficient Benchmarking

With the increasing focus on sustainability, future frameworks will need to measure the energy consumption of different consensus mechanisms under stress. This will involve the integration of hardware-level power monitoring tools into the stress-testing telemetry stack.

13.4. Post-Quantum Stress Testing

As quantum computing matures, the industry must transition to post-quantum cryptographic algorithms. Stress testing these new algorithms is critical, as they often have larger key sizes and higher computational overhead, which could significantly impact transaction throughput and latency.

13.5. ZK-Proof Performance Benchmarking

With the rise of Zero-Knowledge (ZK) rollups, stress testing must expand to measure the time and cost of generating proofs. Current frameworks are ill-equipped to handle the heavy computational demands of ZK-provers.

13.6. Layer 2 and Sidechain Stress

Testing the interaction between Layer 1 and Layer 2 (e.g., withdrawal latency under L1 congestion) is a critical area for future research.

14. Conclusion

This survey has systematically explored the landscape of web-based stress testing frameworks for blockchain systems, highlighting their evolution from CLI-based audit tools to sophisticated, real-time orchestration platforms. We have demonstrated that while current frameworks like Hyperledger Caliper and BBSF provide deep technical insights, the industry is shifting toward more accessible, visual-heavy interfaces that integrate seamlessly with modern cloud-native monitoring stacks.

The practical impact of these frameworks extends significantly into the DevOps and enterprise sectors. By enabling high-fidelity performance validation before mainnet deployment, these tools can mitigate the risk of catastrophic network failures and financial losses in DeFi and CBDC systems. Furthermore, the modular architecture of web-based testers facilitates their integration into continuous integration and delivery (CI/CD) pipelines, ensuring that every protocol update is verified against extreme load.

Looking forward, the integration of AI-driven adaptive testing and standardized cross-chain protocols will be essential for the maturation of the Web3 ecosystem. As blockchain technology becomes foundational to global infrastructure, the development of robust, high-fidelity stress-testing environments will remain a critical frontier for both academic research and industrial innovation.

Acknowledgments

The author would like to thank the faculty at Vellore Institute of Technology AP for their guidance in understanding the nuances of distributed ledger technologies.

Appendix A. Detailed Metric Definitions

In this appendix, we provide formal definitions for the primary metrics used in blockchain stress testing.

Transaction Throughput (TPS): Defined as the total number of committed transactions divided by the total time of the test duration. $T P S = \frac{N_{committed}}{T_{end} - T_{start}}$ .
Transaction Latency: The time elapsed between the submission of a transaction and its inclusion in a block that has reached finality. $L = T_{finalized} - T_{submitted}$ .
Resource Utilization: The percentage of system resources (CPU, RAM, Disk, Network) consumed by a blockchain node during a stress test.
Error Rate: The percentage of transactions that failed to be processed due to timeouts, gas exhaustion, or consensus failures.

Appendix B. Sample Test Configuration (YAML)

The following is an example configuration file for Hyperledger Caliper, used to define a stress test for an Ethereum network.

Appendix C. Comparison of Consensus Algorithms Under Stress

Different consensus algorithms exhibit varying performance profiles when subjected to high transaction loads.

Proof of Work (PoW): Performance is generally limited by the block time and block size. Under stress, mempool congestion increases significantly, leading to high transaction fees and long wait times.
Proof of Stake (PoS): While more energy-efficient, PoS systems can experience increased latency if the validator set is large and the network experiences high jitter, leading to missed slots or epochs.
Practical Byzantine Fault Tolerance (PBFT): Highly performant in small, private networks but suffers from $O (n^{2})$ communication complexity, causing throughput to drop sharply as the number of nodes increases.

References

Kaushal, R.; Kumar, N. Exploring Hyperledger Caliper Benchmarking Tool to Measure the Performance of Blockchain Based Solutions. 2024 11th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions) (ICRITO), 2024. [Google Scholar]
Touloupou, M.; Christodoulou, K.; Themistocleous, M. Validating the Blockchain Benchmarking Framework Through Controlled Deployments of XRPL and Ethereum. IEEE Access 2024, 12, 22264–22277. [Google Scholar] [CrossRef]
Billah, M.; et al. Performance Optimization in Multi-Machine Blockchain Systems: A Comprehensive Benchmarking Analysis. Journal of Business and Management Studies 2024. [Google Scholar] [CrossRef]
Shakila, M.; Anitha, L. Benchmarking Local Blockchain Frameworks for Online Voting System: Comparative Analysis of Truffle and Hardhat Across Diverse Transaction Loads. 2024 International Conference on Sustainable Communication Networks and Application (ICSCNA), 2024. [Google Scholar]
Ren, K.; et al. BBSF: Blockchain Benchmarking Standardized Framework. Proceedings of the 1st Workshop on Verifiable Database Systems, 2023. [Google Scholar]
Dinh, T. T. A.; et al. Untangling Blockchain: A Data Processing View of Blockchain Systems. IEEE Transactions on Knowledge and Data Engineering 2017, 30, 1366–1385. [Google Scholar] [CrossRef]
Wang, W.; et al. A Survey on Consensus Mechanisms and Mining Strategy Management in Blockchain Networks. IEEE Access 2018, 7, 22328–22370. [Google Scholar] [CrossRef]
Salah, K.; et al. Blockchain for AI: Review and Open Research Challenges. IEEE Access 2019, 7, 10127–10149. [Google Scholar] [CrossRef]
Hyperledger Community. Hyperledger Caliper: A Blockchain Benchmark Framework. Hyperledger Project Whitepapers 2018. [Google Scholar]
Pongnumkul, S.; et al. Performance Analysis of Private Blockchain Platforms in Comparison. Proceedings of the 2017 International Conference on Computer Science and Artificial Intelligence, 2017. [Google Scholar]
Kuzlu, M.; et al. Performance Analysis of a Hyperledger Fabric Blockchain Framework: Case Study for a Smart Grid Application. 2019 IEEE International Conference on Communications, Control, and Computing Technologies for SmartGrids (SmartGridComm), 2019. [Google Scholar]
Monrat, A. A.; et al. A Survey of Blockchain From the Perspectives of Applications, Challenges, and Opportunities. IEEE Access 2019, vol. 7, 117134–117151. [Google Scholar] [CrossRef]
Sharma, S.; et al. Performance Analysis of Hyperledger Fabric for IoT Applications. 2020 IEEE 6th World Forum on Internet of Things (WF-IoT), 2020. [Google Scholar]
Fan, C.; et al. Performance Evaluation of Blockchain Systems: A Survey. IEEE Access 2020, 8, 126919–126936. [Google Scholar] [CrossRef]
Dinh, T. T. A.; et al. BLOCKBENCH: A Framework for Analyzing Private Blockchains. Proceedings of the 2017 ACM International Conference on Management of Data, 2017. [Google Scholar]
Khan, M. M.; et al. Scalability and Efficiency Analysis of Hyperledger Fabric and Private Ethereum in Smart Contract Execution. Computers 2025, vol. 14(no. 4), 132. [Google Scholar] [CrossRef]
Ren, X.; et al. Paramart: Parallel Resource Allocation Based on Blockchain Sharding for Edge-Cloud Services. IEEE Transactions on Services Computing 2024, vol. 17, 1655–1669. [Google Scholar] [CrossRef]
Song, H.; Qu, Z.; Wei, Y. Advancing Blockchain Scalability: An Introduction to Layer 1 and Layer 2 Solutions. 2024 IEEE 2nd International Conference on Sensors, Electronics and Computer Engineering (ICSECE), 2024. [Google Scholar]
Venkatesan, K.; Rahayu, S. B. Blockchain security enhancement: an approach towards hybrid consensus algorithms and machine learning techniques. Scientific Reports 2024, vol. 14. [Google Scholar] [CrossRef] [PubMed]
Chua, J.; et al. AI Safety in Generative AI Large Language Models: A Survey. arXiv 2024, arXiv:2407.18369. [Google Scholar]

Figure 1. Modular multi-layer architecture of a web-based blockchain stress testing framework, illustrating the feedback loop between monitoring and orchestration.

Figure 2. Sequential processing pipeline of a high-performance workload generator, utilizing asynchronous dispatch to maximize throughput.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permit the free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.