Dynamic4D: Enhancing Self-Supervised Learning for Robust and Fine-Grained 4D Point Cloud Video Understanding

Mingxuan Du; Yutian Zeng

doi:10.20944/preprints202603.1381.v1

Submitted: 16 March 2026

Posted: 17 March 2026

Abstract

The proliferation of 4D point cloud videos highlights their potential, but the high cost of obtaining large-scale annotated data severely limits supervised methods. Consequently, self-supervised learning (SSL) is vital for learning generalizable representations from unlabeled 4D data. While existing SSL frameworks, such as Uni4D, have made progress, they often struggle with three things: fine-grained motion understanding in extremely dynamic scenes, robustness under severe occlusion, and explicit predictive capabilities. To address these limitations, we propose Dynamic4D, a novel and robust self-supervised framework tailored for dynamic 4D point cloud understanding. Dynamic4D introduces an Adaptive Causal Temporal Attention (ACTA) mechanism in the encoder for explicit causal temporal modeling and dynamic region-focused learning. Its decoder employs Motion Prediction Tokens (MPT) to directly infer motion vectors for masked regions. A novel adaptive motion-sensitive masking strategy further enhances robustness by prioritizing highly dynamic regions when selecting what to mask. Our multi-objective pre-training strategy integrates a new Dynamic Perception Loss alongside geometric reconstruction and latent-space alignment. Extensive experiments on diverse challenging benchmarks demonstrate that Dynamic4D consistently achieves state-of-the-art performance and substantially outperforms prior methods, validating its capacity to learn robust, generalizable, and motion-aware representations for complex dynamic 4D point cloud scenes.
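The abstract does not say how the adaptive motion-sensitive masking is computed. Purely as an illustration of the general idea (masking biased toward high-motion tokens), the sketch below samples mask indices with probability weighted by a per-token motion score. The function name `motion_sensitive_mask`, the `motion` scores, the `mask_ratio` and `eps` parameters, and the use of Efraimidis-Spirakis weighted sampling are all assumptions made for this example, not details taken from Dynamic4D.

```python
import random


def motion_sensitive_mask(motion, mask_ratio=0.75, eps=1e-6, seed=0):
    """Return sorted indices of tokens to mask, favouring high-motion tokens.

    motion: list of non-negative per-token motion scores (hypothetical input,
            e.g. mean point displacement between consecutive frames).
    """
    if not 0.0 <= mask_ratio <= 1.0:
        raise ValueError("mask_ratio must be in [0, 1]")
    rng = random.Random(seed)
    k = round(mask_ratio * len(motion))  # number of tokens to mask

    # Efraimidis-Spirakis weighted sampling without replacement:
    # draw a key u ** (1 / w) per token and keep the k largest keys.
    keys = []
    for i, m in enumerate(motion):
        w = m + eps  # eps keeps static tokens selectable with tiny weight
        u = rng.random()
        keys.append((u ** (1.0 / w), i))
    keys.sort(reverse=True)
    return sorted(i for _, i in keys[:k])


# Two static tokens and two fast-moving tokens; mask half of them.
print(motion_sensitive_mask([0.0, 0.0, 5.0, 5.0], mask_ratio=0.5))  # [2, 3]
```

In this toy setting the two moving tokens are the ones masked. A real pipeline would still need some way to estimate motion scores from unlabeled point cloud frames, which this sketch leaves open.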

Keywords: 4D point clouds; self-supervised learning; dynamic; robustness; motion prediction

Subject: Computer Science and Mathematics - Computer Vision and Graphics

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
