1. Introduction
In large-scale entertainment venues such as stadiums or concert arenas, audiences increasingly wish to share their real-time experiences through live video streams captured on their smartphones. Conventional LTE networks, however, were not originally designed to support simultaneous uplink video transmission from hundreds of user devices within the same cell. When multiple users attempt to upload videos concurrently, uplink bandwidth contention and scheduling competition at the base station cause severe degradation in transmission quality, resulting in frame drops, high latency, and unstable video delivery.
Figure 1.
System Overview.
To realize interactive and participatory viewing experiences in such high-density environments, it is essential to design a mechanism that dynamically regulates the number of simultaneous video transmitters while maintaining fairness and responsiveness. This paper proposes an adaptive user-generated video distribution system that addresses this challenge through a combination of clustered Selective Forwarding Unit (SFU) architecture and a priority-based transmission control mechanism.
The system enables spectators to send their user-generated videos over standard LTE networks. A central operator selects one of the incoming streams for real-time display on a large screen in the arena. To prevent network congestion, each device periodically reports its network statistics—such as transmission rate and stability metrics—which are used to compute a statistical weight. Devices are then prioritized according to these weights, and a PING–PONG round-trip time (RTT) probing procedure dynamically selects the devices with the lowest latency for transmission permission. This ensures that only a limited number of terminals are allowed to upload video at any given time, achieving stable operation under constrained wireless conditions.
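The two-stage selection described above can be sketched as follows. This is a minimal illustration, not the system's actual implementation: the `DeviceReport` fields, the weight formula, the 1500 kbps target, and the shortlist and grant sizes are all assumptions introduced for the example.

```python
from dataclasses import dataclass


@dataclass
class DeviceReport:
    """One periodic report from a device (all field names are hypothetical)."""
    device_id: str
    tx_rate_kbps: float  # reported uplink transmission rate
    stability: float     # assumed scale: 0.0 (unstable) to 1.0 (stable)
    rtt_ms: float        # latest PING-PONG round-trip time


def statistical_weight(r: DeviceReport, target_kbps: float = 1500.0) -> float:
    """Assumed weight: rate relative to a target (capped at 1), scaled by stability."""
    rate_score = min(r.tx_rate_kbps / target_kbps, 1.0)
    return rate_score * r.stability


def select_transmitters(reports, shortlist_size, max_transmitters):
    """Shortlist devices by weight, then grant permission to the lowest-RTT ones."""
    shortlist = sorted(reports, key=statistical_weight, reverse=True)[:shortlist_size]
    granted = sorted(shortlist, key=lambda r: r.rtt_ms)[:max_transmitters]
    return [r.device_id for r in granted]


reports = [
    DeviceReport("a", 1800.0, 0.90, 48.0),
    DeviceReport("b", 900.0, 0.80, 35.0),
    DeviceReport("c", 1600.0, 0.50, 41.0),
    DeviceReport("d", 1500.0, 0.95, 39.0),
]
permitted = select_transmitters(reports, shortlist_size=3, max_transmitters=2)
print(permitted)  # ['d', 'c']
```

In this example, device "b" has the lowest RTT but is excluded at the weighting stage, so the RTT probe only ranks devices that already report adequate rate and stability.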
The proposed system is built upon the WebRTC framework and its DataChannel mechanism [1, 2], which provide lightweight and low-latency peer-to-peer communication capabilities. By combining adaptive device selection with clustered SFU-based media routing, the system maintains reliable user-generated video streaming even in congested LTE environments. This work contributes to the broader field of large-scale real-time multimedia systems by demonstrating a practical approach for scalable uplink control and fair media distribution across user devices.
2. System Characteristics and Network Environment
The key characteristics of the proposed user-generated video system can be summarized as follows:
2.1. Spiking Access Behavior in Dense Environments
The system exhibits a spiking access pattern (an instantaneous surge of simultaneous connections), particularly in enclosed environments such as arenas where a large number of user devices are geographically concentrated. As shown in Figure 2, multiple Radio Units (RUs) deployed by different carriers (e.g., Docomo, au, SoftBank) must serve clusters of users within the same venue. When numerous users initiate video transmission at once, the limited uplink resources of LTE channels become highly congested, leading to severe contention and instability in video streaming.
2.2. Carrier-Level Independence and Load Imbalance
Each RU and its associated mobile core network are operated independently by different mobile network operators. Consequently, the distribution of users across carriers significantly influences the end-to-end video quality. As shown in Figure 3, the Japanese mobile market is dominated by a few major operators, with user distribution varying by region and demographic. During traffic spikes, this uneven distribution leads to carrier-specific bottlenecks, where one operator’s LTE segment may become saturated while others remain underutilized. Thus, the proposed system must adapt to heterogeneous and independently managed network segments.
2.3. Device- and Browser-Level Variability
Since the system operates entirely on user devices through standard web browsers, variations in hardware performance, operating systems, and browser implementations directly affect transmission stability and visual output quality. These variations must be considered when designing adaptive mechanisms for real-time uplink control and quality assurance.
3. Baseline Network Latency Measurement
Before analyzing the spiking behavior during large-scale events, we first measured the steady-state round-trip time (RTT) between the operation site (Shibuya office) and multiple cloud regions. All measurements were conducted over WebRTC DataChannel (Reliability = ON) under non-congested conditions, using the same session establishment procedure as in the proposed system.
Eight representative network configurations were evaluated, including both wired (LAN) and wireless (Wi-Fi, LTE via MNO) environments.
Figure 4 illustrates the average ingress and egress latency per zone, while Table 1 summarizes the observed RTTs.
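A PING–PONG RTT measurement of this kind can be sketched as below. The class and method names are hypothetical, and timestamps are passed in explicitly so the example runs without a network; a real probe would take them from a monotonic clock when a message is sent or received on the DataChannel.

```python
import statistics


class RttProbe:
    """Tracks PING send times and records an RTT when the matching PONG arrives."""

    def __init__(self):
        self._sent = {}        # sequence number -> send time in seconds
        self.samples_ms = []   # completed RTT samples in milliseconds

    def on_ping_sent(self, seq: int, now_s: float) -> None:
        self._sent[seq] = now_s

    def on_pong_received(self, seq: int, now_s: float) -> None:
        sent_at = self._sent.pop(seq, None)
        if sent_at is None:
            return  # unknown or duplicate PONG: ignore it
        self.samples_ms.append((now_s - sent_at) * 1000.0)

    def average_ms(self) -> float:
        return statistics.fmean(self.samples_ms)


probe = RttProbe()
probe.on_ping_sent(1, 0.000)
probe.on_pong_received(1, 0.040)   # about 40 ms
probe.on_ping_sent(2, 1.000)
probe.on_pong_received(2, 1.044)   # about 44 ms
probe.on_pong_received(2, 1.050)   # duplicate PONG, ignored
average = probe.average_ms()
```

Matching PONGs to PINGs by sequence number keeps the average unaffected by duplicated or unsolicited replies.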
Table 1. Average RTT under stable (non-spiking) conditions.
3.1. Observations
Within the Tokyo region (asia-northeast1), RTTs remained within the range of 4–7 ms for LAN/Wi-Fi and 40–43 ms for LTE networks, confirming stable local connectivity.
Between Tokyo and Mumbai (asia-south1), RTTs exceeded 130 ms on wired links and 170 ms on LTE, consistent with inter-regional propagation delay.
Across MNOs (Docomo, SoftBank, au), the intra-region latency differences were minimal (within ±3 ms), indicating that radio access and carrier core routing contribute comparably to RTT under stable conditions.
These baseline results demonstrate that, in the absence of uplink contention, the system’s WebRTC layer maintains predictable latency across heterogeneous network environments.
4. System Design Overview
The proposed user-generated video system integrates several distinctive design features that enable adaptive, low-latency operation in large-scale LTE environments. As illustrated in Figure 5, the system employs a lightweight, unreliable DataChannel for real-time messaging between peers and services, avoiding conventional pub/sub or WebSocket architectures. This approach minimizes latency and resource overhead while ensuring rapid responsiveness even under constrained network conditions.
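Because an unreliable DataChannel may drop, duplicate, or reorder messages, the receiving side has to tolerate gaps. One common way to do this, shown here as an assumption rather than as the system's documented protocol, is to number each status message and keep only the newest one per sender. The JSON message layout (`sender`, `seq`, `payload`) is hypothetical.

```python
import json


class LatestStateReceiver:
    """Keeps only the newest state per sender; stale or duplicate messages are dropped."""

    def __init__(self):
        self._last_seq = {}  # sender id -> highest sequence number seen
        self.state = {}      # sender id -> latest payload

    def on_message(self, raw: str) -> bool:
        msg = json.loads(raw)
        sender, seq = msg["sender"], msg["seq"]
        if seq <= self._last_seq.get(sender, -1):
            return False  # arrived late or duplicated: safe to ignore
        self._last_seq[sender] = seq
        self.state[sender] = msg["payload"]
        return True


rx = LatestStateReceiver()
# Messages 1 and 3 arrive in order; message 2 arrives late and is discarded.
accepted = [
    rx.on_message(json.dumps({"sender": "dev1", "seq": s, "payload": {"rtt_ms": r}}))
    for s, r in [(1, 40), (3, 55), (2, 47)]
]
```

With this pattern no retransmission is needed: a lost status report is simply superseded by the next one.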
In addition, the system adopts a modular management layer with a clean boot architecture (Figure 6), in which each component can independently initialize and operate without centralized dependencies. This simplifies orchestration, enhances resilience during scaling or recovery, and allows modules to be restarted or replicated seamlessly within a distributed environment.
Scalability is further realized through the clustered SFU architecture (Figure 7), which allows horizontal expansion of forwarding units according to the number of connected terminals and available bandwidth. This design provides flexibility to adapt to local wireless network constraints in large arenas while maintaining efficient routing of user-generated video streams.
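Horizontal expansion of this kind can be illustrated with a simple sizing and placement sketch. The per-SFU capacity of 100 terminals and the least-loaded placement rule are assumptions chosen for the example; the source does not specify how terminals are mapped to forwarding units.

```python
import math


def sfus_needed(terminals: int, per_sfu_capacity: int) -> int:
    """Number of SFU instances required for the given terminal count (at least one)."""
    return max(1, math.ceil(terminals / per_sfu_capacity))


def assign_least_loaded(terminal_ids, sfu_count):
    """Place each terminal on the SFU that currently has the fewest terminals."""
    loads = [0] * sfu_count
    assignment = {}
    for tid in terminal_ids:
        target = min(range(sfu_count), key=lambda i: loads[i])
        assignment[tid] = target
        loads[target] += 1
    return assignment, loads


count = sfus_needed(250, per_sfu_capacity=100)
assignment, loads = assign_least_loaded([f"t{i}" for i in range(250)], count)
print(count, loads)  # 3 [84, 83, 83]
```

A production cluster would likely also weigh available bandwidth per node, as the text notes, rather than terminal count alone.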
Finally, to achieve stable and high-quality video selection, the system introduces a temporal weighting–based channel recommendation mechanism (Figure 8). This mechanism estimates session reliability by analyzing ICE/DTLS channel status, latency variations, bitrate stability, and user engagement signals such as face detection values, thereby selecting the most responsive and reliable uplink sessions in real time.
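One plausible reading of temporal weighting is an exponentially decayed average in which recent observations count more than older ones. The sketch below makes that assumption explicit; the 5-second half-life and the single pre-combined `quality` value per sample (standing in for channel status, latency, bitrate stability, and face detection signals) are illustrative choices, not values taken from the system.

```python
def temporal_weight(age_s: float, half_life_s: float = 5.0) -> float:
    """Exponential decay: a sample one half-life old counts half as much as a fresh one."""
    return 0.5 ** (age_s / half_life_s)


def session_score(samples, now_s: float, half_life_s: float = 5.0) -> float:
    """Decay-weighted mean of (timestamp_s, quality) samples, with quality in [0, 1]."""
    num = den = 0.0
    for ts, quality in samples:
        w = temporal_weight(now_s - ts, half_life_s)
        num += w * quality
        den += w
    return num / den if den > 0 else 0.0


# Session A was good 10 s ago but is poor now; session B shows the opposite trend.
score_a = session_score([(0.0, 1.0), (10.0, 0.2)], now_s=10.0)
score_b = session_score([(0.0, 0.2), (10.0, 1.0)], now_s=10.0)
recommended = "B" if score_b > score_a else "A"
```

Both sessions have the same unweighted mean quality, so only the temporal weighting separates them: the session that is improving is recommended over the one that is degrading.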
Together, these architectural components form a cohesive, scalable, and adaptive system optimized for real-time user-generated video distribution over LTE networks.
5. Related Work
Recent studies have investigated the quality of WebRTC-based communication over emerging 5G networks. Among them, Nakazato et al. [3] conducted an extensive field measurement study, “WebRTC over 5G: A Study of Remote Collaboration QoS in Mobile Environment”, which provided comprehensive empirical data on the Quality of Service (QoS) characteristics of WebRTC across various radio environments, including Sub6, mmWave, and repurposed 5G. Their work established a valuable foundation for understanding end-to-end latency, jitter, and handoff behavior in real-world 5G deployments, offering “baseline data” indispensable for subsequent system-level research.
In contrast, this Technical Note focuses on the upper operational layer built upon those network-level findings. Rather than evaluating radio performance itself, the proposed system addresses the practical realization of real-time operation, introducing clustered Selective Forwarding Unit (SFU) control and RTT-based uplink regulation mechanisms. These functions operate above the transport layer, managing adaptive video transmission among numerous terminals over LTE networks. Therefore, while Nakazato’s work serves as a quantitative foundation for understanding 5G QoS, this study extends the discussion to the control and orchestration layer that sustains end-to-end QoS across the “last one hop”, linking real-world user devices and clustered SFUs during live operation.
5.1. SDN-Based Scalable SFU Architectures
Recent advances in data-plane programmability have enabled hardware-assisted designs for large-scale video conferencing infrastructures. Michel et al. [4] proposed Scallop, an SDN-inspired Selective Forwarding Unit (SFU) that splits media-stream forwarding into a high-speed hardware data plane and a software control plane. By implementing packet replication, selective forwarding, and rate adaptation on an Intel Tofino-based programmable switch, Scallop achieved up to 210× higher scalability compared with software-based SFUs running on 32-core commodity servers, while reducing per-packet forwarding latency by a factor of 26. Although the Scallop architecture focuses primarily on offloading media processing to hardware for throughput and latency gains, its control model still assumes stable, high-bandwidth environments typical of data-center interconnects. In contrast, our system targets edge-side operation over unpredictable wireless networks (e.g., LTE/5G uplinks) and emphasizes adaptive cluster-level coordination and RTT-based transmission control to maintain end-to-end Quality of Service (QoS) in real-time user-generated video scenarios. Therefore, while Scallop and the proposed clustered SFU framework share the common goal of improving scalability, our approach operates at a different layer, closer to the last one hop, where uplink contention and round-trip-time variations directly affect session stability. We thus regard Scallop’s work as complementary, representing a lower-level data-plane acceleration, whereas the present study contributes to the upper operational layer of SFU orchestration and adaptive uplink control.
5.2. Comparison with MEC Resource Allocation Surveys
A related perspective on network efficiency and resource management can be found in “Resource Allocation in Multi-access Edge Computing for 5G Networks: A Survey” by Sarah et al. [5] (Computer Networks, 2023). Their work systematically classifies and analyzes optimization methods for computational and radio resource allocation in MEC (Multi-access Edge Computing) systems. The survey focuses on how computing, storage, and network bandwidth are modeled as optimization variables under formal mathematical frameworks (such as mixed-integer programming, Lyapunov optimization, and reinforcement learning) to achieve system-wide efficiency in 5G URLLC, eMBB, and mMTC scenarios.
From a conceptual standpoint, both studies share the goal of maintaining service stability under limited network resources, but they differ fundamentally in scope and layer of abstraction:
5.2.1. Optimization Domain
The MEC resource-allocation survey addresses continuous optimization of shared computational and wireless resources among virtualized network entities (MEC hosts, VNFs, slices). Its methods assume predictable system parameters and rely on optimization solvers or learning-based schedulers for sustained efficiency. In contrast, our system operates at the application layer, above the network abstraction, where the available uplink bandwidth is not directly controllable. Instead of continuous allocation, our mechanism selects which devices should transmit at each moment, reflecting a discrete and spiking-access control problem rather than a convex optimization one.
5.2.2. Temporal Characteristics
The MEC optimization framework targets steady-state or slowly varying demand scenarios with known resource capacities. Conversely, the proposed Adaptive Clustered SFU System explicitly handles non-stationary, burst-type traffic—a condition typical of audience-driven live streaming over LTE. The system’s “who-can-transmit-now” decision logic is designed to stabilize short-term uplink volatility that conventional resource allocation cannot efficiently absorb.
5.2.3. Control Granularity
While MEC resource optimization focuses on allocating radio and compute resources among infrastructure nodes (e.g., gNB, MEH, NFV orchestrators), our architecture introduces a cross-layer selection mechanism at the application level. It leverages RTT-based measurement and statistical weighting to regulate video transmission across end-user terminals, independent of carrier-level scheduling.
5.2.4. System Objective
MEC frameworks aim for mathematical optimality in average resource utilization. Our SFU-based approach, on the other hand, seeks operational resilience under unpredictable contention—stabilizing service quality when optimal radio allocation is infeasible due to temporal spikes or opaque MNO control layers.
In summary, Sarah et al. address “how to allocate resources efficiently among networked nodes”, while the present study focuses on “which user should utilize limited uplink capacity at a given instant”. The two approaches are thus complementary: the former operates on infrastructure-level optimization of resource quantities, and the latter extends to application-level selection and behavioral adaptation in the face of stochastic, user-driven transmission surges.
5.3. Network-Centric vs Application-Centric QoE Optimization
Recent studies such as “Evaluation of Uplink Video Streaming QoE in 4G and 5G Cellular Networks Using Real-World Measurements” [6] primarily address the network operator (MNO) perspective of Quality of Experience (QoE) optimization. Their approach focuses on measuring and predicting user-perceived QoE through large-scale real-world uplink experiments in 4G and 5G environments. The goal is to correlate QoE indicators (e.g., bitrate stability, frame loss, latency) with radio conditions and to derive models that enable network-side optimization. In other words, the study assumes that QoE can be improved by managing wireless resource allocation, scheduling, and congestion control within the operator’s infrastructure.
In contrast, our work approaches QoE from an application-centric perspective, where the end-to-end behavior of the streaming system is directly controlled by the sending endpoint rather than the network. Instead of predicting QoE post hoc from network conditions, our method actively preserves QoE by dynamically adjusting application-layer transmission parameters (e.g., bitrate adaptation, encoder control, congestion-aware frame skipping) in real time. This approach targets environments where instantaneous bandwidth spikes and non-stationary uplink conditions are inherent to the system, conditions that cannot be fully compensated by wireless resource optimization alone.
While the MNO-centric approach is effective for long-term and large-scale optimization across users, the proposed system focuses on maintaining stable perceived quality for each individual session under unpredictable wireless fluctuations. Thus, the two studies are complementary: one enhances QoE through network-level prediction and resource management, whereas the other maintains QoE through application-level self-regulation resilient to transient and bursty uplink dynamics.
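Application-level self-regulation of this kind can be sketched with two small decision functions, one for bitrate adaptation and one for congestion-aware frame skipping. The thresholds (1.5×, 2×, and 3× the baseline RTT), the 0.7 back-off factor, the 50 kbps probe step, and the bitrate bounds are all assumptions made for illustration; the source does not state the actual control parameters.

```python
def adapt_bitrate(current_kbps, rtt_ms, baseline_rtt_ms,
                  min_kbps=300.0, max_kbps=2500.0):
    """AIMD-style step: back off multiplicatively when RTT inflates, else probe upward."""
    if rtt_ms > 1.5 * baseline_rtt_ms:
        new_kbps = current_kbps * 0.7
    else:
        new_kbps = current_kbps + 50.0
    return max(min_kbps, min(max_kbps, new_kbps))


def should_skip_frame(frame_index, rtt_ms, baseline_rtt_ms):
    """Keep every frame when healthy, every 2nd above 2x baseline RTT, every 3rd above 3x."""
    ratio = rtt_ms / baseline_rtt_ms
    keep_every = 1 if ratio <= 2.0 else (2 if ratio <= 3.0 else 3)
    return frame_index % keep_every != 0


congested = adapt_bitrate(1000.0, rtt_ms=100.0, baseline_rtt_ms=40.0)  # backs off
healthy = adapt_bitrate(1000.0, rtt_ms=45.0, baseline_rtt_ms=40.0)     # probes upward
skipped = [should_skip_frame(i, 100.0, 40.0) for i in range(4)]
```

Both functions use only measurements available at the sending endpoint, which matches the application-centric position described above: no cooperation from the operator's scheduler is assumed.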
6. Conclusion
This paper presented a lightweight, adaptive user-generated video distribution system designed for large-scale arenas using standard LTE networks. To address the inherent challenge of limited wireless resources and contention among hundreds of user devices, a statistically weighted, priority-based transmission control mechanism was introduced, allowing only a subset of terminals to transmit video streams concurrently while maintaining network stability.
The system architecture leverages WebRTC’s unreliable DataChannel for low-latency messaging without additional cloud resources, combined with a clean boot and loosely coupled module design to simplify deployment and enable flexible clustered configurations. Through this modular approach, scalability up to 256× terminal capacity can be achieved via clustered SFUs, allowing the system to dynamically adapt to local network conditions at each venue.
Furthermore, this study introduces a machine learning–based channel recommendation framework that extracts time-windowed features from ICE/DTLS channel status, bitrate variation, and face detection data. Preliminary evaluations suggest that these features can be utilized to infer latency, responsiveness, stability, and user engagement, enabling more precise and seamless channel selection during live operation.
Overall, the proposed system demonstrates that combining clustered SFU architecture, lightweight real-time communication, and learning-based adaptive control enables scalable, responsive, and user-aware live video experiences in large public environments.
Author Contributions
Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Resources, Data Curation, Writing-Original Draft, Writing-Review and Editing, Visualization, Supervision, Project administration: D. Sugisawa.
Funding
This research received no external funding.
Conflicts of Interest
The author declares no conflicts of interest.
Use of Artificial Intelligence
The author used a large language model (ChatGPT, OpenAI) to assist in proofreading, grammar checking, and improving the clarity of expressions. The author reviewed and is responsible for the final content.
References
- R. Jesup, S. Loreto, and M. Tüxen, “WebRTC Data Channels,” RFC 8831, IETF, Jan. 2021.
- R. Jesup, S. Loreto, and M. Tüxen, “WebRTC Data Channel Establishment Protocol,” RFC 8832, IETF, Jan. 2021.
- J. Nakazato, M. Tsukada, K. Nakagawa, K. Ito, R. Fontugne, … “WebRTC over 5G: A Study of Remote Collaboration QoS in Mobile Environment,” preprint, Research Square, 2023.
- O. Michel, S. Sengupta, H. Kim, R. Netravali, and J. … arXiv:2503.11649, 2025.
- S. Balasubramanian, M. Kist, M. Bennis, and A. Brunstrom, “Resource Allocation in Multi-access Edge Computing for 5G Networks: A Survey,” Computer Networks, vol. 230, pp. 109–128, 2023.
- D. Chmieliauskas and Š. Paulikas, “Evaluation of Uplink Video Streaming QoE in 4G and 5G Cellular Networks Using Real-World Measurements,” IEEE Access, vol. 13, pp. 53996–54018, 2025.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).