

arXiv

EditRoom: LLM-parameterized Graph Diffusion for Composable 3D Room Layout Editing

Kaizhi Zheng, Xiaotong Chen, Xuehai He, Jing Gu, Linjie Li, Kevin Lin, Lijuan Wang

Abstract


Because professional 3D software has a steep learning curve and managing large 3D assets is time-consuming, language-guided 3D scene editing has significant potential in fields such as virtual reality, augmented reality, and gaming. However, existing approaches to language-guided 3D scene editing either require manual intervention or are limited to appearance modifications, and do not support comprehensive changes to scene layout. In response, we propose EditRoom, a unified framework that executes a variety of layout edits from natural language commands without manual intervention. Specifically, EditRoom uses Large Language Models (LLMs) for command planning and generates target scenes with a diffusion-based method, supporting six types of edits: rotate, translate, scale, replace, add, and remove. To address the lack of data for language-guided 3D scene editing, we develop an automatic pipeline that augments existing 3D scene synthesis datasets, and introduce EditRoom-DB, a large-scale dataset of 83k editing pairs for training and evaluation. Our experiments show that our approach consistently outperforms other baselines across all metrics, indicating higher accuracy and coherence in language-guided scene layout editing.
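The abstract names six atomic edit types and describes training data as editing pairs, but it does not specify how scenes or edits are represented. The sketch below is purely illustrative and assumes a hypothetical scene format: each object is a category label plus a box pose (centre, extents, and a yaw angle about the vertical axis). All names (`SceneObject`, `EditCommand`, `apply_edit`, `describe`, `make_pair`) and the command templates are hypothetical. It applies each edit deterministically, which is not how EditRoom produces target scenes (EditRoom plans with an LLM and generates the edited layout with a diffusion model). It only makes the six-type vocabulary and the idea of a (source scene, command, target scene) pair concrete.

```python
from __future__ import annotations

import copy
import math
from dataclasses import dataclass

# The six edit types listed in the EditRoom abstract.
EDIT_TYPES = ("rotate", "translate", "scale", "replace", "add", "remove")


@dataclass
class SceneObject:
    """Hypothetical object record: a category label plus a box pose."""
    category: str
    position: tuple[float, float, float]  # box centre (x, y, z), metres
    size: tuple[float, float, float]      # box extents (w, h, d), metres
    yaw: float = 0.0                      # rotation about the vertical axis, radians


@dataclass
class EditCommand:
    """Hypothetical atomic edit, e.g. one step of a planned command sequence."""
    op: str                    # one of EDIT_TYPES
    target: int | None = None  # index of the edited object (unused for "add")
    value: object = None       # op-specific argument


def apply_edit(scene: list[SceneObject], cmd: EditCommand) -> list[SceneObject]:
    """Apply one edit deterministically and return a new scene; the input is not mutated."""
    if cmd.op not in EDIT_TYPES:
        raise ValueError(f"unknown edit type: {cmd.op!r}")
    out = copy.deepcopy(scene)
    if cmd.op == "add":                   # value: a SceneObject to insert
        out.append(copy.deepcopy(cmd.value))
        return out
    obj = out[cmd.target]
    if cmd.op == "rotate":                # value: angle in radians
        obj.yaw = (obj.yaw + cmd.value) % (2 * math.pi)
    elif cmd.op == "translate":           # value: (dx, dy, dz) offset in metres
        obj.position = tuple(p + d for p, d in zip(obj.position, cmd.value))
    elif cmd.op == "scale":               # value: uniform scale factor (an assumption)
        obj.size = tuple(s * cmd.value for s in obj.size)
    elif cmd.op == "replace":             # value: new category label
        obj.category = cmd.value
    else:                                 # "remove"
        del out[cmd.target]
    return out


def describe(scene: list[SceneObject], cmd: EditCommand) -> str:
    """Hypothetical template that turns an edit into a command sentence."""
    if cmd.op == "add":
        return f"add a {cmd.value.category}"
    name = scene[cmd.target].category
    if cmd.op == "rotate":
        return f"rotate the {name} by {math.degrees(cmd.value):.0f} degrees"
    if cmd.op == "translate":
        return f"move the {name} by {cmd.value} metres"
    if cmd.op == "scale":
        return f"scale the {name} by a factor of {cmd.value}"
    if cmd.op == "replace":
        return f"replace the {name} with a {cmd.value}"
    return f"remove the {name}"


def make_pair(scene: list[SceneObject], cmd: EditCommand):
    """Build one (source scene, command text, target scene) editing triple."""
    target = apply_edit(scene, cmd)  # validates the edit type first
    return scene, describe(scene, cmd), target


if __name__ == "__main__":
    room = [
        SceneObject("bed", (0.0, 0.0, 0.0), (2.0, 0.5, 1.6)),
        SceneObject("nightstand", (1.2, 0.0, 0.0), (0.5, 0.6, 0.4)),
    ]
    _, text, edited = make_pair(room, EditCommand("rotate", 0, math.pi / 2))
    print(text)  # rotate the bed by 90 degrees
```

Returning a fresh copy instead of editing in place keeps the source scene intact, so each call yields a self-contained before/after pair of the kind an augmentation pipeline could sample at scale.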