3D Gaussian Splatting (3DGS) has substantial potential for enabling photorealistic Free-Viewpoint Video (FVV) experiences. However, the vast number of Gaussians and their associated attributes poses significant challenges for storage and transmission. Existing methods typically handle dynamic 3DGS representation and compression separately, neglecting motion information and the rate-distortion (RD) trade-off during training, leading to performance degradation and increased model redundancy. To address this gap, we propose 4DGC, a novel rate-aware 4D Gaussian compression framework that significantly reduces storage size while maintaining superior RD performance for FVV. Specifically, 4DGC introduces a motion-aware dynamic Gaussian representation that utilizes a compact motion grid combined with sparse compensated Gaussians to exploit inter-frame similarities. This representation effectively handles large motions, preserving quality and reducing temporal redundancy. Furthermore, we present an end-to-end compression scheme that employs differentiable quantization and a tiny implicit entropy model to compress the motion grid and compensated Gaussians efficiently. The entire framework is jointly optimized using a rate-distortion trade-off. Extensive experiments demonstrate that 4DGC supports variable bitrates and consistently outperforms existing methods in RD performance across multiple datasets.
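To make the joint rate-distortion objective concrete, the sketch below shows one way differentiable (simulated) quantization and a tiny implicit entropy model can be combined into a D + λR training loss. This is a minimal illustration under our own assumptions, not the released 4DGC implementation: the additive-uniform-noise quantizer, the Laplace-based bit estimate, and all names (TinyEntropyModel, simulated_quantize, rd_loss, render_fn) are hypothetical placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyEntropyModel(nn.Module):
    """Hypothetical implicit entropy model: a small MLP mapping each element's
    coordinate to Laplace (mean, log-scale) parameters of its quantized value."""
    def __init__(self, in_dim=3, hidden=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 2))

    def bits(self, x_hat, coords):
        # x_hat: (N,) quantized values, coords: (N, in_dim)
        mu, log_b = self.net(coords).unbind(-1)
        b = log_b.exp()

        def laplace_cdf(z):
            return 0.5 + 0.5 * torch.sign(z) * (1 - torch.exp(-z.abs() / b))

        # Probability mass of a width-1 quantization bin centred at x_hat.
        p = (laplace_cdf(x_hat + 0.5 - mu) - laplace_cdf(x_hat - 0.5 - mu)).clamp_min(1e-9)
        return -torch.log2(p).sum()

def simulated_quantize(x):
    """Differentiable stand-in for rounding: additive uniform noise in [-0.5, 0.5)."""
    return x + (torch.rand_like(x) - 0.5)

def rd_loss(params, coords, render_fn, target, entropy_model, lam=1e-4):
    """D + lambda * R: render with simulated-quantized parameters and add their
    estimated bitrate (normalized per pixel here)."""
    params_hat = simulated_quantize(params)
    distortion = F.mse_loss(render_fn(params_hat), target)
    rate = entropy_model.bits(params_hat, coords) / target.numel()
    return distortion + lam * rate
```

At inference, the noise-based quantizer would be replaced by actual rounding and the entropy model would drive an entropy coder; during training it only supplies a differentiable rate estimate.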
Left: 4DGC results, showcasing flexible quality levels across various bitrates. Middle: Comparison of visual quality and bitrate with state-of-the-art methods. Right: The RD performance of our approach surpasses that of prior work (e.g., 3DGStream, ReRF, TeTriRF).
Illustration of the 4DGC Framework. The reconstructed Gaussians from the previous frame are retrieved from the reference buffer and combined with the input images of the current frame to facilitate learning of the motion grid and the compensated Gaussians through a two-stage training process. In the first stage, the motion grid and its associated entropy model are optimized. In the second stage, the compensated Gaussians are refined along with their corresponding entropy model. Both stages are supervised by a rate-distortion trade-off, employing simulated quantization and an entropy model to jointly optimize representation and compression.
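The two-stage schedule described in the caption can be summarized with the following training-loop sketch. It is a schematic reading of the description above rather than the authors' code: frame_views, warp_fn, render_fn, rd_loss_fn, and the module interfaces are hypothetical placeholders, and hyperparameters such as step counts and learning rates are assumptions.

```python
import torch

def optimize_frame(prev_gaussians, frame_views, motion_grid, comp_gaussians,
                   em_motion, em_comp, warp_fn, render_fn, rd_loss_fn,
                   steps_stage1=1000, steps_stage2=300, lr=2e-3):
    """Two-stage per-frame optimization (sketch).
    warp_fn(prev, grid)         -> motion-compensated Gaussians
    render_fn(sets, camera)     -> image rendered from a list of Gaussian sets
    rd_loss_fn(img, gt, m, em)  -> distortion + lambda * estimated rate of module m
    """
    # Stage 1: fit the motion grid and its entropy model under the RD loss.
    opt = torch.optim.Adam(list(motion_grid.parameters()) +
                           list(em_motion.parameters()), lr=lr)
    for _ in range(steps_stage1):
        view = frame_views.sample()                      # one training camera + image
        moved = warp_fn(prev_gaussians, motion_grid)     # warp previous-frame Gaussians
        loss = rd_loss_fn(render_fn([moved], view.camera), view.image,
                          motion_grid, em_motion)
        opt.zero_grad(); loss.backward(); opt.step()

    # Stage 2: keep the motion grid fixed and refine the sparse compensated
    # Gaussians together with their own entropy model.
    opt = torch.optim.Adam(list(comp_gaussians.parameters()) +
                           list(em_comp.parameters()), lr=lr)
    for _ in range(steps_stage2):
        view = frame_views.sample()
        with torch.no_grad():                            # motion grid is frozen here
            moved = warp_fn(prev_gaussians, motion_grid)
        loss = rd_loss_fn(render_fn([moved, comp_gaussians()], view.camera),
                          view.image, comp_gaussians, em_comp)
        opt.zero_grad(); loss.backward(); opt.step()

    return moved, comp_gaussians
```

The reconstructed Gaussians returned here would then be written back to the reference buffer for the next frame, mirroring the inter-frame loop shown in the figure.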
Illustration of our motion-aware dynamic Gaussian modeling that utilizes a multi-resolution motion grid with sparse compensated Gaussians to exploit inter-frame similarities.
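As one possible concretization of this modeling, the snippet below queries a multi-resolution motion grid at the previous frame's Gaussian centers via trilinear interpolation and decodes the summed features into per-Gaussian offsets. The grid resolutions, feature width, translation-only decoder, and all names are our assumptions for illustration, not the paper's exact parameterization.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiResMotionGrid(nn.Module):
    def __init__(self, resolutions=(16, 32, 64), feat_dim=8):
        super().__init__()
        # One learnable (1, C, D, H, W) feature volume per resolution level.
        self.levels = nn.ParameterList(
            [nn.Parameter(0.01 * torch.randn(1, feat_dim, r, r, r)) for r in resolutions])
        self.decoder = nn.Sequential(nn.Linear(feat_dim, 32), nn.ReLU(),
                                     nn.Linear(32, 3))     # 3D offset per Gaussian

    def forward(self, xyz):
        # xyz: (N, 3) Gaussian centres, normalized to [-1, 1]^3 for grid_sample.
        grid = xyz.view(1, -1, 1, 1, 3)                     # grid_sample's sampling layout
        feat = sum(
            F.grid_sample(level, grid, mode='bilinear', align_corners=True)
             .view(level.shape[1], -1).t()                  # -> (N, C) per level
            for level in self.levels)
        return self.decoder(feat)                           # (N, 3) motion offsets

# Usage: warp the previous frame's Gaussian centres into the current frame.
# prev_xyz = ...  # (N, 3) positions from the reference buffer, normalized
# new_xyz = prev_xyz + MultiResMotionGrid()(prev_xyz)
```

Sparse compensated Gaussians would then be added on top of the warped set to handle regions that the estimated motion alone cannot explain, as described above.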
We present a qualitative comparison with ReRF, TeTriRF, and 3DGStream on the coffee martini sequence from the N3DV dataset and the trimming sequence from the MeetRoom dataset, as shown in the figure. Our approach matches the reconstruction quality of 3DGStream at a substantially lower bitrate, corresponding to a compression ratio exceeding 16×. Compared with ReRF and TeTriRF, our 4DGC preserves fine details far more effectively, such as the head, window, bottles, and books in coffee martini, and the face, hand, plant, and scissors in trimming, all of which are lost in the reconstructions of these two methods. This demonstrates that 4DGC accurately captures dynamic scene elements and maintains high-quality detail on intricate objects while remaining highly compact.