EditRoom: LLM-parameterized Graph Diffusion for Composable 3D Room Layout Editing

Abstract

Given the steep learning curve of professional 3D software and the time required to manage large 3D assets, language-guided 3D scene editing has significant potential in fields such as virtual reality, augmented reality, and gaming. However, recent approaches to language-guided 3D scene editing either require manual intervention or focus only on appearance modifications, without supporting comprehensive scene layout changes. In response, we propose EditRoom, a unified framework that executes a variety of layout edits from natural language commands without manual intervention. Specifically, EditRoom uses Large Language Models (LLMs) for command planning and generates target scenes with a diffusion-based method, enabling six types of edits: rotate, translate, scale, replace, add, and remove. To address the lack of data for language-guided 3D scene editing, we develop an automatic pipeline that augments existing 3D scene synthesis datasets and introduce EditRoom-DB, a large-scale dataset of 83k editing pairs for training and evaluation. Our experiments show that our approach consistently outperforms the baselines across all metrics, indicating higher accuracy and coherence in language-guided scene layout editing.
