Feed-Forward Bullet-Time Reconstruction of Dynamic Scenes from Monocular Videos

arXiv

Jiawei Ren, Ziwei Liu, Huan Ling, Zan Gojcic, Jiahui Huang

Abstract

Static feed-forward scene reconstruction has recently made significant progress in high-quality novel view synthesis. However, these models often struggle to generalize across diverse environments and fail to handle dynamic content effectively. We present BTimer (short for BulletTimer), the first motion-aware feed-forward model for real-time reconstruction and novel view synthesis of dynamic scenes. Our approach reconstructs the full scene in a 3D Gaussian Splatting representation at a given target ("bullet") timestamp by aggregating information from all the context frames. This formulation allows BTimer to gain scalability and generalization by leveraging both static and dynamic scene datasets. Given a casual monocular dynamic video, BTimer reconstructs a bullet-time scene within 150 ms while reaching state-of-the-art performance on both static and dynamic scene datasets, even compared with optimization-based approaches.