Reconstructing dynamic 3D scenes from sparse multi-view videos is highly ill-posed, often leading to geometric collapse, trajectory drift, and floating artifacts. Recent attempts introduce generative priors to hallucinate missing content, yet naive integration frequently causes structural drift and temporal inconsistency due to the mismatch between stochastic 2D generation and deterministic 3D geometry. In this paper, we propose GeoRect4D, a novel unified framework for sparse-view dynamic reconstruction that couples explicit 3D consistency with generative refinement via a closed-loop optimization process. Specifically, GeoRect4D introduces a degradation-aware feedback mechanism that couples a robust anchor-based dynamic 3DGS substrate with a single-step diffusion rectifier to hallucinate high-fidelity details. The rectifier employs a structural locking mechanism and spatiotemporal coordinated attention, preserving physical plausibility while restoring missing content. Furthermore, we present a progressive optimization strategy that employs stochastic geometric purification to eliminate floaters and generative distillation to infuse texture details into the explicit representation. Extensive experiments demonstrate that GeoRect4D achieves state-of-the-art performance in reconstruction fidelity, perceptual quality, and spatiotemporal consistency across multiple datasets.
Overview of GeoRect4D. Left: Sparse inputs are adaptively decomposed into a static base and an anchor-controlled dynamic field to construct the base dynamic 3DGS model. Top Right: A degradation-aware, single-step diffusion prior synthesizes high-fidelity rectified views from coarse renderings. Bottom Right: A two-stage progressive optimization framework first applies geometric purification to stabilize the substrate, then performs generative distillation to seamlessly integrate hallucinated details with strict physical fidelity.
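The two-stage loop described above (rectify coarse renderings, purify the substrate, then distill rectified details back in) can be sketched in miniature. This is an illustrative toy, not the paper's implementation: the real method operates on dynamic 3D Gaussian Splatting primitives and a diffusion model, whereas here each "Gaussian" is a dict with a scalar opacity and color, and the rectifier is a stand-in callable; all function names are hypothetical.

```python
# Toy sketch of GeoRect4D's closed-loop optimization (illustrative only).
# Each primitive is a dict {"opacity": float, "color": float}; the real
# method uses dynamic 3DGS and a single-step diffusion rectifier.

def purify(gaussians, opacity_thresh=0.05):
    """Stage 1 (geometric purification): drop low-opacity 'floaters'."""
    return [g for g in gaussians if g["opacity"] >= opacity_thresh]

def distill(gaussians, rectified_target, lr=0.5, steps=10):
    """Stage 2 (generative distillation): pull each primitive's color
    toward the rectified view, a stand-in for a photometric loss."""
    for _ in range(steps):
        for g in gaussians:
            g["color"] += lr * (rectified_target - g["color"])
    return gaussians

def closed_loop(gaussians, rectifier):
    """One feedback pass: render coarsely, rectify, purify, distill."""
    coarse = sum(g["color"] for g in gaussians) / len(gaussians)
    target = rectifier(coarse)          # diffusion prior stand-in
    gaussians = purify(gaussians)       # stabilize the substrate first
    return distill(gaussians, target)   # then integrate rectified detail
```

For example, with a solid primitive and a near-transparent floater, `closed_loop(gs, lambda c: 1.0)` removes the floater and drives the survivor's color toward the rectified value.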
Qualitative Comparison 1: Visual comparison with baseline methods. GeoRect4D effectively suppresses erroneous floaters and recovers sharp structural boundaries and high-frequency details.
Qualitative Comparison 2: Additional qualitative comparisons on more sequences of the N3DV dataset.
Qualitative Comparison 3: Additional qualitative comparisons on more sequences of the MeetRoom dataset.
@misc{wu2026georect4dgeometrycompatiblegenerativerectification,
  title={GeoRect4D: Geometry-Compatible Generative Rectification for Dynamic Sparse-View 3D Reconstruction},
  author={Zhenlong Wu and Zihan Zheng and Xuanxuan Wang and Qianhe Wang and Hua Yang and Xiaoyun Zhang and Qiang Hu and Wenjun Zhang},
  year={2026},
  eprint={2604.20784},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2604.20784},
}