Matrix3D: Large Photogrammetry Model All-in-One

arXiv

Yuanxun Lu, Jingyang Zhang, Tian Fang, Jean-Daniel Nahmias, Yanghai Tsin, Long Quan, Xun Cao, Yao Yao, Shiwei Li


Abstract

We present Matrix3D, a unified model that performs several photogrammetry subtasks, including pose estimation, depth prediction, and novel view synthesis, with a single model. Matrix3D uses a multi-modal diffusion transformer (DiT) to integrate transformations across several modalities, such as images, camera parameters, and depth maps. The key to Matrix3D's large-scale multi-modal training is a mask learning strategy, which enables full-modality training even with partially complete data, such as bi-modality image-pose and image-depth pairs, and thereby significantly increases the pool of available training data. Matrix3D demonstrates state-of-the-art performance in pose estimation and novel view synthesis. Additionally, it offers fine-grained control through multi-round interactions, making it an innovative tool for 3D content creation. Project page: https://nju-3dv.github.io/projects/matrix3d
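To make the mask learning idea concrete, the following is a minimal sketch of how a training step might treat partially complete samples. It is an illustration under stated assumptions, not the authors' implementation: the role names (`cond`, `target`, `missing`), the functions `sample_masks` and `masked_loss`, and the `p_target` parameter are all hypothetical. The point it illustrates is that a sample lacking one modality (for example, an image-pose pair with no depth) can still contribute to training, because the loss is taken only over modalities that are present and selected as prediction targets.

```python
import random

# Modalities named in the abstract; the role scheme below is an assumption.
MODALITIES = ("image", "pose", "depth")


def sample_masks(sample, rng, p_target=0.5):
    """Assign each modality a role for one training step.

    'missing' : the modality is absent from this sample and is ignored.
    'target'  : the modality is masked out and must be predicted.
    'cond'    : the modality is visible and used as conditioning.
    """
    roles = {}
    for name in MODALITIES:
        if sample.get(name) is None:
            roles[name] = "missing"
        elif rng.random() < p_target:
            roles[name] = "target"
        else:
            roles[name] = "cond"

    # Make sure at least one available modality is a prediction target.
    available = [m for m in MODALITIES if roles[m] != "missing"]
    if available and all(roles[m] != "target" for m in available):
        roles[rng.choice(available)] = "target"
    return roles


def masked_loss(per_modality_error, roles):
    """Average the error over target modalities only.

    Missing and conditioning modalities contribute nothing to the loss.
    """
    targets = [m for m, role in roles.items() if role == "target"]
    if not targets:
        return 0.0
    return sum(per_modality_error[m] for m in targets) / len(targets)


# A bi-modality sample: image and pose are present, depth is absent.
sample = {"image": [0.1, 0.2], "pose": [0.0, 1.0], "depth": None}
roles = sample_masks(sample, random.Random(0))
```

With this scheme, image-pose and image-depth datasets can be mixed in one training run: each sample simply marks its absent modality as `missing`, and the same model is still trained across all three modalities over the whole dataset.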