VideoHandles: Editing 3D Object Compositions in Videos Using Video Generative Priors

arXiv

Juil Koo, Paul Guerrero, Chun-Hao Paul Huang, Duygu Ceylan, Minhyuk Sung

Abstract

Generative methods for image and video editing use generative models as priors to perform edits despite incomplete information, such as changing the composition of 3D objects shown in a single image. Recent methods have shown promising composition editing results in the image setting. In the video setting, however, editing methods have focused on object appearance and motion or on camera motion, so methods for editing object composition in videos are still missing. We propose VideoHandles, a method for editing 3D object compositions in videos of static scenes with camera motion. Our approach allows editing the 3D position of an object across all frames of a video in a temporally consistent manner. This is achieved by lifting intermediate features of a generative model to a 3D reconstruction that is shared between all frames, editing the reconstruction, and projecting the features on the edited reconstruction back to each frame. To the best of our knowledge, this is the first generative approach to edit object compositions in videos. Our approach is simple and training-free, while outperforming state-of-the-art image editing baselines.
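
The lift, edit, and project steps described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation. It assumes a pinhole camera that only translates (no rotation), a known per-pixel depth map, an object mask, and nearest-pixel splatting with a z-buffer. All names (`Camera`, `lift`, `edit_points`, `splat`) are hypothetical, and the "features" are arbitrary per-pixel values standing in for the intermediate features of a generative model.

```python
from dataclasses import dataclass


@dataclass
class Camera:
    f: float    # focal length in pixels (assumed known)
    cx: float   # principal point, x
    cy: float   # principal point, y
    t: tuple    # camera position in world space; this toy camera never rotates


def unproject(cam, u, v, depth):
    """Map pixel (u, v) with the given depth to a world-space 3D point."""
    x = (u - cam.cx) * depth / cam.f + cam.t[0]
    y = (v - cam.cy) * depth / cam.f + cam.t[1]
    z = depth + cam.t[2]
    return (x, y, z)


def project(cam, p):
    """Map a world-space point to (u, v, depth), or None if behind the camera."""
    z = p[2] - cam.t[2]
    if z <= 0:
        return None
    u = cam.f * (p[0] - cam.t[0]) / z + cam.cx
    v = cam.f * (p[1] - cam.t[1]) / z + cam.cy
    return (u, v, z)


def lift(cam, feat, depth, mask):
    """Lift the masked pixels' features into a list of (3D point, feature) pairs."""
    points = []
    for v, row in enumerate(feat):
        for u, value in enumerate(row):
            if mask[v][u]:
                points.append((unproject(cam, u, v, depth[v][u]), value))
    return points


def edit_points(points, offset):
    """Edit the shared reconstruction: translate every lifted point by `offset`."""
    dx, dy, dz = offset
    return [((p[0] + dx, p[1] + dy, p[2] + dz), value) for p, value in points]


def splat(cam, points, height, width, background=None):
    """Project features back into one frame; the nearest point wins per pixel."""
    out = [[background] * width for _ in range(height)]
    zbuf = [[float("inf")] * width for _ in range(height)]
    for p, value in points:
        proj = project(cam, p)
        if proj is None:
            continue
        u, v, z = proj
        ui, vi = round(u), round(v)
        if 0 <= ui < width and 0 <= vi < height and z < zbuf[vi][ui]:
            zbuf[vi][ui] = z
            out[vi][ui] = value
    return out


# Usage: lift one object pixel from frame 0, move it in 3D, re-render two frames.
cam0 = Camera(f=2.0, cx=1.0, cy=1.0, t=(0.0, 0.0, 0.0))
cam1 = Camera(f=2.0, cx=1.0, cy=1.0, t=(1.0, 0.0, 0.0))
feat = [[None, None, None], [None, "obj", None], [None, None, None]]
depth = [[2.0] * 3 for _ in range(3)]
mask = [[False, False, False], [False, True, False], [False, False, False]]

shared = edit_points(lift(cam0, feat, depth, mask), offset=(1.0, 0.0, 0.0))
frame0 = splat(cam0, shared, height=3, width=3)
frame1 = splat(cam1, shared, height=3, width=3)
```

Because both frames are rendered from the same edited point set, the moved object lands at geometrically consistent pixels in each view. That is the intuition behind editing a reconstruction shared between all frames rather than editing each frame independently.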