Achieving seamless viewing of high-fidelity volumetric video, comparable to 2D video experiences, remains an open challenge. Existing volumetric video compression methods either lack the flexibility to adjust quality and bitrate within a single model for efficient streaming across diverse networks and devices, or struggle with real-time decoding and rendering on lightweight mobile platforms. To address these challenges, we introduce 4DGCPro, a novel hierarchical 4D Gaussian compression framework that facilitates real-time mobile decoding and high-quality rendering via progressive volumetric video streaming in a single bitstream. Specifically, we propose a perceptually-weighted and compression-friendly hierarchical 4D Gaussian representation with motion-aware adaptive grouping to reduce temporal redundancy, preserve coherence, and enable scalable multi-level detail streaming. Furthermore, we present an end-to-end entropy-optimized training scheme that incorporates layer-wise rate-distortion (RD) supervision and attribute-specific entropy modeling for efficient bitstream generation. Extensive experiments show that 4DGCPro supports flexible quality and multiple bitrates within a single model, achieving real-time decoding and rendering on mobile devices while outperforming existing methods in RD performance across multiple datasets.
Left: Our method enables progressive streaming of hierarchical 4D Gaussians within a single bitstream, where incremental enhancement layers (e.g., +0.10 MB) gradually improve visual quality (e.g., from 30.04 dB to 31.18 dB) with minimal bitrate overhead. Right: The streamed content is adaptively decoded and rendered in real time on various devices (e.g., 44 FPS on tablets, 58 FPS on desktops) by dynamically selecting layers (L1-L6) based on available bandwidth and compute.
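The layer selection described in this caption can be illustrated with a minimal sketch. It assumes the client knows the per-frame size of each layer and simply keeps the longest prefix of layers that fits the bandwidth budget. The function name `select_layer`, the layer sizes, and the bandwidth figures are hypothetical and are not taken from the paper; a real client would also account for decoding and rendering cost.

```python
from itertools import accumulate


def select_layer(layer_sizes_mb, bandwidth_mb_per_s, frames_per_s):
    """Return how many layers (0 = none) fit the per-frame bandwidth budget.

    layer_sizes_mb[0] is the base layer (L1); later entries are the
    incremental enhancement layers. All values here are illustrative.
    """
    budget_mb = bandwidth_mb_per_s / frames_per_s
    chosen = 0
    # Layers are progressive, so layer k is only usable if layers 1..k all fit.
    for level, cumulative_mb in enumerate(accumulate(layer_sizes_mb), start=1):
        if cumulative_mb > budget_mb:
            break
        chosen = level
    return chosen


# Hypothetical per-frame sizes for L1-L6 (MB): a base layer plus five increments.
LAYER_SIZES_MB = [0.20, 0.10, 0.10, 0.10, 0.10, 0.10]

if __name__ == "__main__":
    for bandwidth in (3.0, 13.5, 30.0):
        layers = select_layer(LAYER_SIZES_MB, bandwidth, frames_per_s=30)
        print(f"{bandwidth} MB/s -> {layers} layer(s)")
```

Because each enhancement layer depends on the layers below it, the sketch stops at the first layer that exceeds the budget instead of searching for a best-fitting subset.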
Our 4DGCPro framework. (a) A perceptually-weighted hierarchical 4D Gaussian representation models keyframes at multiple levels of detail for progressive reconstruction. (b) Hierarchical motion modeling decomposes dynamic scenes into rigid transformations and residual deformations relative to the previous frame, while (c) motion-aware adaptive grouping dynamically adjusts to topological changes to enhance temporal consistency and reduce error accumulation. The entire pipeline is optimized end-to-end with layer-wise RD supervision and attribute-specific entropy modeling.
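One common way to write a layer-wise RD objective is shown below as an illustrative sketch. The exact loss used by 4DGCPro is not given in this excerpt, so every symbol is an assumption: $L$ is the number of layers, $D_l$ is the distortion of the scene rendered from the first $l$ layers, $R_l$ is the estimated bitrate of those layers, and $\lambda_l$ is a per-layer trade-off weight.

```latex
\mathcal{L}_{\mathrm{RD}} = \sum_{l=1}^{L} \left( D_l + \lambda_l R_l \right)
```

Supervising every prefix of layers, instead of only the full model, is what lets each truncation of the bitstream remain decodable at a reasonable quality.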
Analysis of group size and attribute distributions. (a) Large groups suffer from error accumulation, while small groups exhibit data redundancy. (b) Keyframe Gaussian attributes display irregular spatial distributions, whereas (c) residual attributes follow Gaussian distributions.
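The observation in (c) suggests why a Gaussian entropy model suits residual attributes. The sketch below uses the standard learned-compression estimate: the probability of a quantized value is the Gaussian mass falling in its quantization bin, and its cost is the negative base-2 logarithm of that mass. It is not confirmed to be the paper's exact entropy model. The function names, the quantization step, and the sample residuals are hypothetical.

```python
import math


def gaussian_cdf(x, mu, sigma):
    """Cumulative distribution function of a normal distribution N(mu, sigma^2)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def estimated_bits(residuals, mu, sigma, step=1.0, eps=1e-9):
    """Estimate the bits needed to code residuals quantized to multiples of step.

    Each quantized value q is assigned the Gaussian probability mass of the
    bin [q - step/2, q + step/2]; its cost is -log2 of that mass.
    """
    total_bits = 0.0
    for r in residuals:
        q = round(r / step) * step  # uniform scalar quantization
        mass = gaussian_cdf(q + step / 2, mu, sigma) - gaussian_cdf(q - step / 2, mu, sigma)
        total_bits += -math.log2(max(mass, eps))  # eps guards against log(0)
    return total_bits


if __name__ == "__main__":
    sample = [0.0, 0.2, -0.3, 1.1, -0.9]  # illustrative residual values
    print("matched model (sigma=0.5):", estimated_bits(sample, mu=0.0, sigma=0.5))
    print("loose model   (sigma=5.0):", estimated_bits(sample, mu=0.0, sigma=5.0))
```

A model whose spread matches the residuals assigns them higher probability and therefore a lower estimated bit cost than an overly wide one. This is the quantity an entropy-optimized training scheme would push down.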
Qualitative comparison with ReRF, HPC, 3DGStream, and VideoGS on our 4DGCPro and HiFi4G datasets.
Gallery of our results. 4DGCPro delivers real-time high-fidelity rendering of scenes across challenging motions, such as “playing instruments”, “dancing” and “playing sports”.
Multi-bitrate results of our method from a single bitstream.