FB-4D: Spatial-Temporal Coherent Dynamic 3D Content Generation with Feature Banks
Abstract
With the rapid advancement of diffusion models and 3D generation techniques, dynamic 3D content generation has become a crucial research area. However, achieving high-fidelity 4D (dynamic 3D) generation with strong spatial-temporal consistency remains challenging. Inspired by recent findings that pretrained diffusion features capture rich correspondences, we propose FB-4D, a novel 4D generation framework that integrates a Feature Bank mechanism to enhance both the spatial and the temporal consistency of generated frames. In FB-4D, we store features extracted from previous frames and fuse them into the generation of subsequent frames, which keeps characteristics consistent across both time and multiple views. To keep the representation compact, the Feature Bank is updated through a dynamic merging mechanism that we propose. Leveraging this Feature Bank, we demonstrate for the first time that generating additional reference sequences through multiple autoregressive iterations can continuously improve generation performance. Experimental results show that FB-4D significantly outperforms existing methods in rendering quality, spatial-temporal consistency, and robustness. It surpasses all tuning-free multi-view generation approaches by a large margin and achieves performance on par with training-based methods.
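To make the store, fuse, and merge cycle described above concrete, the following is a minimal sketch and not the FB-4D implementation. Everything in it is an assumption made for illustration: the class name `FeatureBank`, the fixed `capacity`, the use of cosine similarity to pick which features to merge, averaging as the merge rule, and plain concatenation as the fusion step. The abstract does not specify any of these details.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class FeatureBank:
    """Hypothetical bank of feature vectors with a fixed capacity."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.tokens = []

    def update(self, new_tokens):
        """Store features from a newly generated frame, then merge down to capacity."""
        self.tokens.extend(list(t) for t in new_tokens)
        while len(self.tokens) > self.capacity:
            self._merge_most_similar_pair()

    def _merge_most_similar_pair(self):
        # Assumed merge rule: average the two most similar stored vectors.
        best, pair = -2.0, (0, 1)
        for i in range(len(self.tokens)):
            for j in range(i + 1, len(self.tokens)):
                s = cosine(self.tokens[i], self.tokens[j])
                if s > best:
                    best, pair = s, (i, j)
        i, j = pair
        self.tokens[i] = [(x + y) / 2 for x, y in zip(self.tokens[i], self.tokens[j])]
        del self.tokens[j]

    def fuse(self, frame_tokens):
        """Assumed fusion: the next frame sees its own tokens plus the banked ones."""
        return [list(t) for t in frame_tokens] + [list(t) for t in self.tokens]


bank = FeatureBank(capacity=2)
bank.update([[1.0, 0.0], [0.0, 1.0]])    # features from an earlier frame
bank.update([[1.0, 0.1]])                # near-duplicate feature gets merged
context = bank.fuse([[0.5, 0.5]])        # context for generating the next frame
```

In this toy run the bank stays at two entries because the near-duplicate vector is averaged into its closest neighbour, while the dissimilar vector is kept unchanged. In a diffusion model the fused list would plausibly serve as extra keys and values for an attention layer, but that wiring is also an assumption here.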