Adaptive Clustered SFU Architecture for User-Generated Video Transmission over LTE Networks

Daisuke Sugisawa

doi:10.20944/preprints202510.1414.v1

Submitted: 16 October 2025

Posted: 17 October 2025


Abstract

This study presents an adaptive user-generated video distribution system designed for large-scale arenas, where audiences can transmit live videos from smartphones over standard LTE networks. When hundreds of users attempt simultaneous video transmission, the limited wireless bandwidth causes severe uplink contention, resulting in unstable and delayed streams. To address this issue, the proposed system employs a clustered Selective Forwarding Unit (SFU) architecture combined with a statistically weighted, priority-based transmission control mechanism. Each participating device reports statistical metrics such as transmission performance and network stability, after which a priority-sorted selection process identifies candidate transmitters. Subsequently, a PING–PONG round-trip time (RTT) probing procedure selects the devices with the lowest latency for video transmission. This approach maintains stable and efficient operation of the user-generated video system under congested LTE network conditions, enabling real-time display of selected videos on large venue screens.

Keywords: WebRTC; SFU; LTE; adaptive streaming; resource contention; QoS

Subject: Computer Science and Mathematics - Computer Networks and Communications

1. Introduction

In large-scale entertainment venues such as stadiums or concert arenas, audiences increasingly wish to share their real-time experiences through live video streams captured on their smartphones. Conventional LTE networks, however, were not originally designed to support simultaneous uplink video transmission from hundreds of user devices within the same cell. When multiple users attempt to upload videos concurrently, uplink bandwidth contention and scheduling competition at the base station cause severe degradation in transmission quality, resulting in frame drops, high latency, and unstable video delivery.

Figure 1. System Overview.

To realize interactive and participatory viewing experiences in such high-density environments, it is essential to design a mechanism that dynamically regulates the number of simultaneous video transmitters while maintaining fairness and responsiveness. This paper proposes an adaptive user-generated video distribution system that addresses this challenge through a combination of clustered Selective Forwarding Unit (SFU) architecture and a priority-based transmission control mechanism.

The system enables spectators to send their user-generated videos over standard LTE networks. A central operator selects one of the incoming streams for real-time display on a large screen in the arena. To prevent network congestion, each device periodically reports its network statistics—such as transmission rate and stability metrics—which are used to compute a statistical weight. Devices are then prioritized according to these weights, and a PING–PONG round-trip time (RTT) probing procedure dynamically selects the devices with the lowest latency for transmission permission. This ensures that only a limited number of terminals are allowed to upload video at any given time, achieving stable operation under constrained wireless conditions.
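The two-stage selection described above (priority sorting by statistical weight, followed by RTT probing) can be sketched as follows. This is a minimal illustration rather than the system's implementation: the report fields (`tx_rate_kbps`, `stability`), the weight formula, the rate cap, and the `probe_rtt` callback are all assumptions introduced here, since the text does not specify them.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class DeviceReport:
    """Periodic statistics report from one device (field names are hypothetical)."""
    device_id: str
    tx_rate_kbps: float  # recently achieved uplink rate
    stability: float     # 0.0 (unstable) .. 1.0 (stable)


def statistical_weight(report: DeviceReport, rate_cap_kbps: float = 2000.0) -> float:
    """Assumed weight: capped, normalized rate scaled by stability."""
    rate_score = min(report.tx_rate_kbps, rate_cap_kbps) / rate_cap_kbps
    return rate_score * report.stability


def select_transmitters(
    reports: List[DeviceReport],
    probe_rtt: Callable[[str], float],
    n_candidates: int,
    n_transmitters: int,
) -> List[str]:
    """Return the device IDs that are granted transmission permission."""
    # Stage 1: sort by statistical weight and keep the top candidates.
    ranked = sorted(reports, key=statistical_weight, reverse=True)
    candidates = ranked[:n_candidates]
    # Stage 2: probe RTT (PING-PONG) for the candidates only.
    rtts = {c.device_id: probe_rtt(c.device_id) for c in candidates}
    # Grant permission to the candidates with the lowest RTT.
    return sorted(rtts, key=rtts.get)[:n_transmitters]


if __name__ == "__main__":
    reports = [
        DeviceReport("a", 1800, 0.9),
        DeviceReport("b", 600, 0.5),
        DeviceReport("c", 1500, 0.8),
        DeviceReport("d", 2000, 0.95),
    ]
    simulated_rtt_ms = {"a": 48.0, "b": 41.0, "c": 43.0, "d": 95.0}
    print(select_transmitters(reports, simulated_rtt_ms.__getitem__, 3, 2))
```

In this example device `b` has the lowest simulated RTT but is never probed, because its weight excludes it in the first stage; probing only the shortlisted candidates keeps the probe traffic itself from adding to uplink contention.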

The proposed system is built upon the WebRTC framework and its DataChannel mechanism [1,2], which provide lightweight and low-latency peer-to-peer communication capabilities. By combining adaptive device selection with clustered SFU-based media routing, the system maintains reliable user-generated video streaming even in congested LTE environments. This work contributes to the broader field of large-scale real-time multimedia systems by demonstrating a practical approach for scalable uplink control and fair media distribution across user devices.

2. System Characteristics and Network Environment

The key characteristics of the proposed user-generated video system can be summarized as follows:

2.1. Spiking Access Behavior in Dense Environments

The system exhibits a spiking access pattern—an instantaneous surge of simultaneous connections—particularly in enclosed environments such as arenas where a large number of user devices are geographically concentrated. As shown in Figure 2, multiple Radio Units (RUs) deployed by different carriers (e.g., Docomo, au, SoftBank) must serve clusters of users within the same venue. When numerous users initiate video transmission at once, the limited uplink resources of LTE channels become highly congested, leading to severe contention and instability in video streaming.

2.2. Carrier-Level Independence and Load Imbalance

Each RU and its associated mobile core network are operated independently by different mobile network operators. Consequently, the distribution of users across carriers significantly influences the end-to-end video quality. As shown in Figure 3, the Japanese mobile market is dominated by a few major operators, with user distribution varying by region and demographic. During traffic spikes, this uneven distribution leads to carrier-specific bottlenecks, where one operator’s LTE segment may become saturated while others remain underutilized. Thus, the proposed system must adapt to heterogeneous and independently managed network segments.

2.3. Device- and Browser-Level Variability

Since the system operates entirely on user devices through standard web browsers, variations in hardware performance, operating systems, and browser implementations directly affect transmission stability and visual output quality. These variations must be considered when designing adaptive mechanisms for real-time uplink control and quality assurance.

3. Baseline Network Latency Measurement

Before analyzing the spiking behavior during large-scale events, we first measured the steady-state round-trip time (RTT) between the operation site (Shibuya office) and multiple cloud regions. All measurements were conducted over WebRTC DataChannel (Reliability = ON) under non-congested conditions, using the same session establishment procedure as in the proposed system.

Eight representative network configurations were evaluated, including both wired (LAN) and wireless (Wi-Fi, LTE via MNO) environments. Figure 4 illustrates the average ingress and egress latency per zone, while Table 1 summarizes the observed RTTs.


3.1. Observations

Within the Tokyo region (asia-northeast1), RTTs remained within the range of 4–7 ms for LAN/Wi-Fi and 40–43 ms for LTE networks, confirming stable local connectivity.
Between Tokyo and Mumbai (asia-south1), RTTs exceeded 130 ms on wired links and 170 ms on LTE, consistent with inter-regional propagation delay.
Across MNOs (Docomo, SoftBank, au), the intra-region latency differences were minimal (within ±3 ms), indicating that radio access and carrier core routing contribute comparably to RTT under stable conditions.

These baseline results demonstrate that, in the absence of uplink contention, the system’s WebRTC layer maintains predictable latency across heterogeneous network environments.
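A rough decomposition of the Table 1 averages illustrates these observations. The sketch below assumes, as a simplification that is not part of the measurements themselves, that RTT components add linearly, so that subtracting averages approximates the inter-region and radio-access contributions. The function names are illustrative.

```python
# Average RTTs in milliseconds, copied from Table 1.
RTT_MS = {
    ("LAN", "Local office", "asia-northeast1"): 4,
    ("LAN", "Local office", "asia-south1"): 134,
    ("Wi-Fi", "Local office", "asia-northeast1"): 7,
    ("Wi-Fi", "Local office", "asia-south1"): 138,
    ("LTE", "Docomo", "asia-northeast1"): 43,
    ("LTE", "SoftBank", "asia-northeast1"): 40,
    ("LTE", "SoftBank", "asia-south1"): 172,
    ("LTE", "au (KDDI)", "asia-northeast1"): 40,
    ("LTE", "au (KDDI)", "asia-south1"): 170,
}


def inter_region_penalty(conn: str, network: str) -> int:
    """Extra RTT when terminating in Mumbai instead of Tokyo, same access network."""
    return (RTT_MS[(conn, network, "asia-south1")]
            - RTT_MS[(conn, network, "asia-northeast1")])


def access_overhead(conn: str, network: str, region: str) -> int:
    """Extra RTT relative to the wired LAN baseline for the same cloud region."""
    return RTT_MS[(conn, network, region)] - RTT_MS[("LAN", "Local office", region)]


if __name__ == "__main__":
    for conn, network in [("LAN", "Local office"), ("Wi-Fi", "Local office"),
                          ("LTE", "SoftBank"), ("LTE", "au (KDDI)")]:
        print(conn, network, "inter-region penalty:",
              inter_region_penalty(conn, network), "ms")
    for network in ["Docomo", "SoftBank", "au (KDDI)"]:
        print("LTE", network, "access overhead (Tokyo):",
              access_overhead("LTE", network, "asia-northeast1"), "ms")
```

Under this additive assumption, the Tokyo-to-Mumbai penalty is 130 to 132 ms regardless of the access network, while LTE access adds 36 to 39 ms over the wired baseline within the Tokyo region.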

4. System Design Overview

The proposed user-generated video system integrates several distinctive design features that enable adaptive, low-latency operation in large-scale LTE environments. As illustrated in Figure 5, the system employs a lightweight, unreliable DataChannel for real-time messaging between peers and services, avoiding conventional pub/sub or WebSocket architectures. This approach minimizes latency and resource overhead while ensuring rapid responsiveness even under constrained network conditions.
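Because an unreliable DataChannel may drop, duplicate, or reorder messages, the receiving side has to tolerate gaps. One common way to do this for status-type messages is to tag each message with a sequence number and keep only the newest one per sender. The sketch below illustrates that idea only; the JSON layout (`sender`, `seq`, `payload`) and the class name are hypothetical and are not taken from the system described here.

```python
import json
from typing import Any, Dict


class LatestStateReceiver:
    """Keeps only the newest message per sender on a lossy, unordered channel."""

    def __init__(self) -> None:
        self._last_seq: Dict[str, int] = {}
        self.state: Dict[str, Any] = {}

    def on_message(self, raw: str) -> bool:
        """Apply a message; return False if it is stale or a duplicate."""
        msg = json.loads(raw)
        sender, seq = msg["sender"], msg["seq"]
        if seq <= self._last_seq.get(sender, -1):
            return False  # an equal or newer message was already applied
        self._last_seq[sender] = seq
        self.state[sender] = msg["payload"]
        return True


if __name__ == "__main__":
    rx = LatestStateReceiver()
    # Message 2 arrives before message 1; message 1 is then ignored as stale.
    rx.on_message(json.dumps({"sender": "dev-1", "seq": 2, "payload": {"rtt_ms": 41}}))
    rx.on_message(json.dumps({"sender": "dev-1", "seq": 1, "payload": {"rtt_ms": 55}}))
    print(rx.state)
```

With this pattern a lost message needs no retransmission: the next periodic report simply supersedes it, which suits frequently refreshed statistics better than ordered, reliable delivery.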

In addition, the system adopts a modular management layer with a clean boot architecture (Figure 6), in which each component can independently initialize and operate without centralized dependencies. This simplifies orchestration, enhances resilience during scaling or recovery, and allows modules to be restarted or replicated seamlessly within a distributed environment.

Scalability is further realized through the clustered SFU architecture (Figure 7), which allows horizontal expansion of forwarding units according to the number of connected terminals and available bandwidth. This design provides flexibility to adapt to local wireless network constraints in large arenas while maintaining efficient routing of user-generated video streams.
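The horizontal expansion described above amounts to simple capacity arithmetic: the number of forwarding units grows with the number of connected terminals. The sketch below shows one possible sizing and assignment scheme. The per-node capacity of 64 terminals and the round-robin policy are assumptions chosen for illustration; the text does not state the actual capacity or placement strategy.

```python
import math
from typing import Dict, List


def sfu_nodes_needed(n_terminals: int, per_node_capacity: int) -> int:
    """Smallest number of SFU nodes that can hold all terminals."""
    if per_node_capacity <= 0:
        raise ValueError("per_node_capacity must be positive")
    return math.ceil(n_terminals / per_node_capacity)


def assign_terminals(terminal_ids: List[str], per_node_capacity: int) -> Dict[int, List[str]]:
    """Spread terminals round-robin so that node loads differ by at most one."""
    n_nodes = sfu_nodes_needed(len(terminal_ids), per_node_capacity)
    clusters: Dict[int, List[str]] = {i: [] for i in range(n_nodes)}
    for idx, terminal_id in enumerate(terminal_ids):
        clusters[idx % n_nodes].append(terminal_id)
    return clusters


if __name__ == "__main__":
    terminals = [f"t{i}" for i in range(500)]
    clusters = assign_terminals(terminals, per_node_capacity=64)  # 64 is hypothetical
    print(len(clusters), "nodes; loads:", sorted(len(v) for v in clusters.values()))
```

For 500 terminals and an assumed capacity of 64 per node, this yields 8 nodes carrying 62 or 63 terminals each, leaving a little headroom on every node rather than filling seven nodes completely.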

Finally, to achieve stable and high-quality video selection, the system introduces a temporal weighting–based channel recommendation mechanism (Figure 8). This mechanism estimates session reliability by analyzing ICE/DTLS channel status, latency variations, bitrate stability, and user engagement signals such as face detection values, thereby selecting the most responsive and reliable uplink sessions in real time.
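One way to read "temporal weighting" is that recent observation windows count more than older ones when a session is scored. The sketch below implements that reading with an exponential half-life decay. It is an assumption-laden illustration: the per-window score formula, the equal weighting of the three terms, the 10 ms jitter scale, and the 10 s half-life are all hypothetical choices, not values from the system.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WindowSample:
    """Features observed for one session over one time window (names are hypothetical)."""
    age_s: float          # seconds since the window ended
    channel_up: bool      # ICE/DTLS channel healthy during the window
    rtt_jitter_ms: float  # latency variation within the window
    bitrate_cv: float     # coefficient of variation of bitrate (0 = perfectly stable)
    face_score: float     # face detection value in [0, 1]


def window_score(s: WindowSample) -> float:
    """Assumed per-window quality in [0, 1]; a broken channel scores zero."""
    if not s.channel_up:
        return 0.0
    latency_term = 1.0 / (1.0 + s.rtt_jitter_ms / 10.0)
    bitrate_term = 1.0 / (1.0 + s.bitrate_cv)
    return (latency_term + bitrate_term + s.face_score) / 3.0


def temporal_score(samples: List[WindowSample], half_life_s: float = 10.0) -> float:
    """Decay-weighted mean of window scores; a window's weight halves every half_life_s."""
    if not samples:
        return 0.0
    weights = [0.5 ** (s.age_s / half_life_s) for s in samples]
    weighted = sum(w * window_score(s) for w, s in zip(weights, samples))
    return weighted / sum(weights)


def recommend(sessions: dict) -> str:
    """Return the session ID with the highest temporal score."""
    return max(sessions, key=lambda sid: temporal_score(sessions[sid]))
```

Under this scheme, a session that was healthy ten seconds ago but has just lost its channel ranks below one that has just recovered, which matches the goal of selecting the sessions that are most responsive at the moment of selection.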

Together, these architectural components form a cohesive, scalable, and adaptive system optimized for real-time user-generated video distribution over LTE networks.

5. Related Work

Recent studies have investigated the quality of WebRTC-based communication over emerging 5G networks. Among them, Nakazato et al. [3] conducted an extensive field measurement study, “WebRTC over 5G: A Study of Remote Collaboration QoS in Mobile Environment,” which provides comprehensive empirical data on the Quality of Service (QoS) characteristics of WebRTC across various radio environments, including Sub6, mmWave, and repurposed 5G. Their work established a valuable foundation for understanding end-to-end latency, jitter, and handoff behavior in real-world 5G deployments, offering “baseline data” indispensable for subsequent system-level research.

In contrast, this Technical Note focuses on the upper operational layer built upon those network-level findings. Rather than evaluating radio performance itself, the proposed system addresses the practical realization of real-time operation, introducing clustered Selective Forwarding Unit (SFU) control and RTT-based uplink regulation mechanisms. These functions operate above the transport layer, managing adaptive video transmission among numerous terminals over LTE networks. Therefore, while the work of Nakazato et al. serves as a quantitative foundation for understanding 5G QoS, this study extends the discussion to the control and orchestration layer that sustains end-to-end QoS across the “last one hop”, linking real-world user devices and clustered SFUs during live operation.

5.1. SDN-Based Scalable SFU Architectures

Recent advances in data-plane programmability have enabled hardware-assisted designs for large-scale video conferencing infrastructures. Michel et al. [4] proposed Scallop, an SDN-inspired Selective Forwarding Unit (SFU) that decouples the forwarding of media streams into a high-speed hardware data plane and a software control plane. By implementing packet replication, selective forwarding, and rate adaptation on an Intel Tofino-based programmable switch, Scallop achieved up to 210× higher scalability compared with software-based SFUs running on 32-core commodity servers, while reducing per-packet forwarding latency by a factor of 26.

Although the Scallop architecture focuses primarily on offloading media processing to hardware for throughput and latency gains, its control model still assumes stable, high-bandwidth environments typical of data-center interconnects. In contrast, our system targets edge-side operation over unpredictable wireless networks (e.g., LTE/5G uplinks) and emphasizes adaptive cluster-level coordination and RTT-based transmission control to maintain end-to-end Quality of Service (QoS) in real-time user-generated video scenarios. Therefore, while Scallop and the proposed clustered SFU framework share the common goal of improving scalability, our approach operates at a different layer, closer to the last one hop, where uplink contention and round-trip-time variations directly affect session stability. We thus regard Scallop as complementary: it represents lower-level data-plane acceleration, whereas the present study contributes to the upper operational layer of SFU orchestration and adaptive uplink control.

5.2. Comparison with MEC Resource Allocation Surveys

A related perspective on network efficiency and resource management can be found in “Resource Allocation in Multi-access Edge Computing for 5G Networks: A Survey” by Sarah et al. [5] (Computer Networks, 2023). Their work systematically classifies and analyzes optimization methods for computational and radio resource allocation in Multi-access Edge Computing (MEC) systems. The survey focuses on how computing, storage, and network bandwidth are modeled as optimization variables under formal mathematical frameworks, such as mixed-integer programming, Lyapunov optimization, and reinforcement learning, to achieve system-wide efficiency in 5G URLLC, eMBB, and mMTC scenarios.

From a conceptual standpoint, both studies share the goal of maintaining service stability under limited network resources, but they differ fundamentally in scope and layer of abstraction:

5.2.1. Optimization Domain

The MEC resource-allocation survey addresses continuous optimization of shared computational and wireless resources among virtualized network entities (MEC hosts, VNFs, slices). Its methods assume predictable system parameters and rely on optimization solvers or learning-based schedulers for sustained efficiency. In contrast, our system operates at the application layer, above the network abstraction, where the available uplink bandwidth is not directly controllable. Instead of continuous allocation, our mechanism selects which devices should transmit at each moment, reflecting a discrete and spiking-access control problem rather than a convex optimization one.

5.2.2. Temporal Characteristics

The MEC optimization framework targets steady-state or slowly varying demand scenarios with known resource capacities. Conversely, the proposed Adaptive Clustered SFU System explicitly handles non-stationary, burst-type traffic—a condition typical of audience-driven live streaming over LTE. The system’s “who-can-transmit-now” decision logic is designed to stabilize short-term uplink volatility that conventional resource allocation cannot efficiently absorb.

5.2.3. Control Granularity

While MEC resource optimization focuses on allocating radio and compute resources among infrastructure nodes (e.g., gNB, MEH, NFV orchestrators), our architecture introduces a cross-layer selection mechanism at the application level. It leverages RTT-based measurement and statistical weighting to regulate video transmission across end-user terminals, independent of carrier-level scheduling.

5.2.4. System Objective

MEC frameworks aim for mathematical optimality in average resource utilization. Our SFU-based approach, on the other hand, seeks operational resilience under unpredictable contention—stabilizing service quality when optimal radio allocation is infeasible due to temporal spikes or opaque MNO control layers.

In summary, Sarah et al. address “how to allocate resources efficiently among networked nodes”, while the present study focuses on “which user should utilize limited uplink capacity at a given instant”. The two approaches are thus complementary: the former operates on infrastructure-level optimization of resource quantities, and the latter extends to application-level selection and behavioral adaptation in the face of stochastic, user-driven transmission surges.

5.3. Network-Centric vs Application-Centric QoE Optimization

Recent studies such as “Evaluation of Uplink Video Streaming QoE in 4G and 5G Cellular Networks Using Real-World Measurements” [6] primarily address the network operator (MNO) perspective of Quality of Experience (QoE) optimization. Their approach focuses on measuring and predicting user-perceived QoE through large-scale real-world uplink experiments in 4G and 5G environments. The goal is to correlate QoE indicators (e.g., bitrate stability, frame loss, latency) with radio conditions and to derive models that enable network-side optimization. In other words, the study assumes that QoE can be improved by managing wireless resource allocation, scheduling, and congestion control within the operator’s infrastructure.

In contrast, our work approaches QoE from an application-centric perspective, where the end-to-end behavior of the streaming system is directly controlled by the sending endpoint rather than the network. Instead of predicting QoE post hoc from network conditions, our method actively preserves QoE by dynamically adjusting application-layer transmission parameters (e.g., bitrate adaptation, encoder control, congestion-aware frame skipping) in real time. This approach targets environments where instantaneous bandwidth spikes and non-stationary uplink conditions are inherent to the system, conditions that cannot be fully compensated by wireless resource optimization alone.
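As a concrete picture of such endpoint-side adjustment, the sketch below shows a simple additive-increase/multiplicative-decrease (AIMD) bitrate controller with a queue-based frame-skipping check. AIMD is a generic, well-known control pattern substituted here for illustration; the text does not state which adaptation algorithm the system uses, and every threshold, step size, and name below is a hypothetical value.

```python
from dataclasses import dataclass


@dataclass
class UplinkController:
    """Sender-side bitrate controller (AIMD); all defaults are illustrative."""
    bitrate_kbps: float = 1000.0
    min_kbps: float = 150.0
    max_kbps: float = 2500.0
    rtt_limit_ms: float = 150.0   # RTT above this is treated as congestion
    loss_limit: float = 0.05      # loss ratio above this is treated as congestion
    increase_kbps: float = 50.0   # additive step when the uplink looks healthy
    decrease_factor: float = 0.5  # multiplicative back-off on congestion

    def update(self, rtt_ms: float, loss_ratio: float) -> float:
        """Feed one measurement; return the new target encoder bitrate."""
        congested = rtt_ms > self.rtt_limit_ms or loss_ratio > self.loss_limit
        if congested:
            self.bitrate_kbps = max(self.min_kbps,
                                    self.bitrate_kbps * self.decrease_factor)
        else:
            self.bitrate_kbps = min(self.max_kbps,
                                    self.bitrate_kbps + self.increase_kbps)
        return self.bitrate_kbps

    def should_skip_frame(self, queued_ms: float, max_queue_ms: float = 200.0) -> bool:
        """Skip encoding a frame when the send queue already holds too much media."""
        return queued_ms > max_queue_ms


if __name__ == "__main__":
    ctrl = UplinkController()
    for rtt_ms, loss in [(45, 0.0), (48, 0.0), (320, 0.02), (60, 0.0)]:
        print(rtt_ms, loss, "->", ctrl.update(rtt_ms, loss), "kbps")
```

The asymmetric response, backing off sharply and recovering slowly, reflects the bursty uplink conditions discussed above: a sender that overshoots during a spike worsens contention for every other terminal in the same cell.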

While the MNO-centric approach is effective for long-term and large-scale optimization across users, the proposed system focuses on maintaining stable perceived quality for each individual session under unpredictable wireless fluctuations. Thus, the two studies are complementary: one enhances QoE through network-level prediction and resource management, whereas the other maintains QoE through application-level self-regulation resilient to transient and bursty uplink dynamics.

6. Conclusion

This paper presented a lightweight, adaptive user-generated video distribution system designed for large-scale arenas using standard LTE networks. To address the inherent challenge of limited wireless resources and contention among hundreds of user devices, a statistically weighted, priority-based transmission control mechanism was introduced, allowing only a subset of terminals to transmit video streams concurrently while maintaining network stability.

Figure 9. Module Blocks.

The system architecture leverages WebRTC’s unreliable DataChannel for low-latency messaging without additional cloud resources, combined with a clean boot and loosely coupled module design to simplify deployment and enable flexible clustered configurations. Through this modular approach, scalability up to 256× terminal capacity can be achieved via clustered SFUs, allowing the system to dynamically adapt to local network conditions at each venue.

Furthermore, this study introduces a machine learning–based channel recommendation framework that extracts time-windowed features from ICE/DTLS channel status, bitrate variation, and face detection data. Preliminary evaluations suggest that these features can be utilized to infer latency, responsiveness, stability, and user engagement, enabling more precise and seamless channel selection during live operation.

Overall, the proposed system demonstrates that combining clustered SFU architecture, lightweight real-time communication, and learning-based adaptive control enables scalable, responsive, and user-aware live video experiences in large public environments.

Author Contributions

Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Resources, Data Curation, Writing-Original Draft, Writing-Review and Editing, Visualization, Supervision, Project administration: D. Sugisawa.

Funding

This research received no external funding.

Conflicts of Interest

The author declares no conflicts of interest.

Use of Artificial Intelligence

The author used a large language model (ChatGPT, OpenAI) to assist in proofreading, grammar checking, and improving the clarity of expressions. The author reviewed and is responsible for the final content.

References

1. R. Jesup, S. Loreto, and M. Tüxen, “WebRTC Data Channels,” RFC 8831, IETF, Jan. 2021.
2. R. Jesup, S. Loreto, and M. Tüxen, “WebRTC Data Channel Establishment Protocol,” RFC 8832, IETF, Jan. 2021.
3. Jin Nakazato, Manabu Tsukada, Kousuke Nakagawa, Koki Ito, Romain Fontugne, … “WebRTC over 5G: A Study of Remote Collaboration QoS in Mobile Environment,” preprint, Research Square, 2023.
4. O. Michel, S. Sengupta, H. Kim, R. Netravali, and J. …, arXiv preprint arXiv:2503.11649, 2025.
5. S. Balasubramanian, M. Kist, M. Bennis, and A. Brunstrom, “Resource Allocation in Multi-access Edge Computing for 5G Networks: A Survey,” Computer Networks, vol. 230, pp. 109–128, 2023.
6. D. Chmieliauskas and Š. Paulikas, “Evaluation of Uplink Video Streaming QoE in 4G and 5G Cellular Networks Using Real-World Measurements,” IEEE Access, vol. 13, pp. 53996–54018, 2025.

Figure 2. Radio Unit Distribution and User Clustering in the Arena.

Figure 3. Mobile Network Operator Share in Japan (including MVNOs), October 2024.

Figure 4. Ingress/Egress Latency by Zone.

Figure 5. Real Time Messaging Overview.

Figure 6. Clean Boot.

Figure 7. Clustered.

Figure 8. Temporal Weighting.

Table 1. Average RTT under stable (non-spiking) conditions.

Connection Type	Network	Cloud Region	Avg. RTT [ms]
LAN	Local office	asia-northeast1	4
LAN	Local office	asia-south1	134
Wi-Fi	Local office	asia-northeast1	7
Wi-Fi	Local office	asia-south1	138
LTE	Docomo	asia-northeast1	43
LTE	SoftBank	asia-northeast1	40
LTE	SoftBank	asia-south1	172
LTE	au (KDDI)	asia-northeast1	40
LTE	au (KDDI)	asia-south1	170

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.