Matrix3D: Large Photogrammetry Model All-in-One

Abstract

We present Matrix3D, a single unified model that performs several photogrammetry subtasks, including pose estimation, depth prediction, and novel view synthesis. Matrix3D uses a multi-modal diffusion transformer (DiT) to integrate transformations across several modalities, such as images, camera parameters, and depth maps. The key to Matrix3D's large-scale multi-modal training is a mask learning strategy. It enables full-modality model training even with partially complete data, such as bi-modality image-pose and image-depth pairs, and thus significantly increases the pool of available training data. Matrix3D demonstrates state-of-the-art performance in pose estimation and novel view synthesis. Additionally, it offers fine-grained control through multi-round interactions, making it an innovative tool for 3D content creation. Project page: https://nju-3dv.github.io/projects/matrix3d.
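
The mask learning idea can be illustrated with a small sketch. This is not the authors' implementation: the function names `make_training_masks` and `masked_loss`, the three-way split into conditions, targets, and missing modalities, and the uniform random choice of targets are all assumptions made for illustration. The sketch shows only how a sample that lacks one modality (here, an image-pose pair without depth) can still contribute to training, because the loss is averaged over the modalities chosen as targets and never touches the missing one.

```python
import random

# Modality names taken from the abstract; the tuple itself is an assumption.
MODALITIES = ("image", "pose", "depth")


def make_training_masks(sample, rng):
    """Split one sample's modalities into conditions, targets, and missing.

    `sample` maps each modality name to its data, or to None when that
    modality is unavailable (e.g. an image-pose pair has depth=None).
    """
    available = [m for m in MODALITIES if sample.get(m) is not None]
    if not available:
        raise ValueError("sample has no usable modality")
    # Hypothetical policy: pick at least one available modality to predict.
    n_targets = rng.randint(1, len(available))
    targets = set(rng.sample(available, n_targets))
    conditions = set(available) - targets  # fed to the model as input
    missing = set(MODALITIES) - set(available)  # masked out entirely
    return conditions, targets, missing


def masked_loss(per_modality_error, targets):
    """Average the error over target modalities only."""
    return sum(per_modality_error[m] for m in targets) / len(targets)


if __name__ == "__main__":
    rng = random.Random(0)
    pair = {"image": [0.1, 0.2], "pose": [1.0, 0.0, 0.0], "depth": None}
    conditions, targets, missing = make_training_masks(pair, rng)
    print("conditions:", sorted(conditions))
    print("targets:", sorted(targets))
    print("missing:", sorted(missing))
```

Under this scheme, image-pose and image-depth datasets can be mixed in one training run, since each sample supervises only the modalities it actually contains.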