Preprint
Communication

This version is not peer-reviewed.

StrataServe: Hierarchical HBM–DRAM–SSD Parameter Serving for Distributed AI

Submitted: 15 January 2026

Posted: 15 January 2026


Abstract
This paper presents a distributed AI training system that pools GPU high-bandwidth memory (HBM), host DRAM, and SSD into a coordinated parameter-serving hierarchy, supporting multi-terabyte, sparsity-dominated deep models without sharing raw features across machines. The design shards the parameters and caches only the working set in GPU memory via multi-GPU hash tables, communicates intra-node over NVLink, and synchronizes across nodes with RDMA-backed collective updates to preserve convergence under data parallelism. A four-stage pipeline overlaps network transfers, SSD I/O, CPU partitioning, and GPU compute, while file-level compaction mitigates I/O amplification, yielding high throughput without inflating latency at scale. On industrial click-through-rate workloads with multi-terabyte embeddings, the system outperforms a large in-memory CPU cluster while maintaining production-grade accuracy, improving both training speed and price-performance. Overall, the architecture offers a pragmatic blueprint for scaling distributed learning through memory-hierarchy co-design and communication-aware parameter serving rather than brute-force cluster expansion.
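To make the tiered parameter-serving idea concrete, the sketch below models a hash-table cache hierarchy in plain Python: a small "HBM" tier backed by a larger "DRAM" tier, with an "SSD" store as the backing tier, where misses promote entries upward and evictions spill downward. This is a minimal, CPU-only illustration under assumed semantics (LRU eviction, write-back on spill); the class and method names (TieredParameterStore, lookup) are hypothetical and do not come from the paper, and the real system's multi-GPU hash tables, NVLink/RDMA communication, and pipelined I/O are not modeled here.

```python
# Illustrative sketch only: a simplified two-level cache over a backing store,
# standing in for the HBM -> DRAM -> SSD parameter hierarchy described above.
from collections import OrderedDict


class TieredParameterStore:
    def __init__(self, hbm_slots, dram_slots):
        self.hbm = OrderedDict()   # fast tier: stand-in for a GPU-HBM hash table
        self.dram = OrderedDict()  # middle tier: stand-in for a host-DRAM cache
        self.ssd = {}              # slow tier: stand-in for SSD-resident parameters
        self.hbm_slots = hbm_slots
        self.dram_slots = dram_slots

    def _promote(self, cache, key, value, capacity):
        # Insert as most-recently-used; evict the least-recently-used entry if full.
        cache[key] = value
        cache.move_to_end(key)
        if len(cache) > capacity:
            return cache.popitem(last=False)
        return None

    def lookup(self, key):
        # Serve from the HBM tier if the embedding row is already cached there.
        if key in self.hbm:
            self.hbm.move_to_end(key)
            return self.hbm[key]
        # Otherwise fall back to DRAM, then SSD, promoting the row on the way up.
        if key in self.dram:
            value = self.dram.pop(key)
        else:
            value = self.ssd.setdefault(key, [0.0] * 8)  # cold read / lazy init
        evicted = self._promote(self.hbm, key, value, self.hbm_slots)
        if evicted is not None:
            ek, ev = evicted
            spilled = self._promote(self.dram, ek, ev, self.dram_slots)
            if spilled is not None:
                sk, sv = spilled
                self.ssd[sk] = sv  # write the coldest row back to the SSD tier
        return value


if __name__ == "__main__":
    store = TieredParameterStore(hbm_slots=2, dram_slots=4)
    for feature_id in [1, 2, 3, 1, 4, 2]:
        store.lookup(feature_id)
    print(sorted(store.hbm), sorted(store.dram), sorted(store.ssd))
```

In this toy form, only the most recently touched rows occupy the scarce fast tier, which mirrors the paper's premise that the hot working set of a sparse embedding table is far smaller than the full multi-terabyte model.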
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.