ChatGarment: Garment Estimation, Generation and Editing via Large Language Models
Abstract
We introduce ChatGarment, a novel approach that leverages large vision-language models (VLMs) to automate the estimation, generation, and editing of 3D garments from images or text descriptions. Unlike previous methods, which struggle in real-world scenarios or lack interactive editing capabilities, ChatGarment can estimate sewing patterns from in-the-wild images or sketches, generate them from text descriptions, and edit garments based on user instructions, all within an interactive dialogue. The resulting sewing patterns can then be draped on a 3D body and animated. This is achieved by fine-tuning a VLM to directly generate a JSON file that contains both textual descriptions of garment types and styles and continuous numerical attributes. This JSON file is then used to create sewing patterns through a programmatic parametric model. To support this, we refine the existing programmatic model, GarmentCode, by expanding its garment type coverage and simplifying its structure for efficient VLM fine-tuning. Additionally, we construct a large-scale dataset of image-to-sewing-pattern and text-to-sewing-pattern pairs through an automated data pipeline. Extensive evaluations demonstrate ChatGarment's ability to accurately reconstruct, generate, and edit garments from multimodal inputs, highlighting its potential to simplify workflows in fashion and gaming applications. Code and data are available at https://chatgarment.github.io/.
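To make the JSON-to-pattern step more concrete, the following is a minimal sketch, assuming a hypothetical JSON layout in which the VLM emits a textual `garment_type` and `style` plus a `numeric` object of continuous attributes. The field names, the parameter ranges in `PARAM_RANGES`, and the helpers `parse_spec` and `build_skirt_panel` are illustrative assumptions only. They are not ChatGarment's actual output format or GarmentCode's API, and the trapezoid panel is a toy stand-in for a real programmatic parametric pattern model.

```python
import json
from dataclasses import dataclass

# Hypothetical allowed ranges for continuous attributes (centimetres).
# These names and bounds are illustrative assumptions, not GarmentCode's schema.
PARAM_RANGES = {
    "waist_width": (20.0, 80.0),
    "hem_width": (20.0, 200.0),
    "length": (20.0, 120.0),
}


@dataclass
class GarmentSpec:
    garment_type: str  # textual category, e.g. "skirt"
    style: str         # free-form textual style description
    params: dict       # continuous numerical attributes


def parse_spec(text: str) -> GarmentSpec:
    """Parse a (hypothetical) VLM-produced JSON string and clamp numeric attributes."""
    data = json.loads(text)
    params = {}
    for name, (lo, hi) in PARAM_RANGES.items():
        value = float(data["numeric"][name])
        params[name] = min(max(value, lo), hi)  # clamp out-of-range predictions
    return GarmentSpec(data["garment_type"], data.get("style", ""), params)


def build_skirt_panel(spec: GarmentSpec) -> list:
    """Return 2D vertices (x, y) of a toy trapezoidal front skirt panel.

    The panel is centred on x = 0, with the waist at y = 0 and the hem at y = -length.
    """
    if spec.garment_type != "skirt":
        raise ValueError(f"unsupported garment type: {spec.garment_type}")
    w = spec.params["waist_width"] / 2
    h = spec.params["hem_width"] / 2
    length = spec.params["length"]
    return [(-w, 0.0), (w, 0.0), (h, -length), (-h, -length)]


# Usage: the out-of-range hem width (250) is clamped to 200 before building the panel.
vlm_output = (
    '{"garment_type": "skirt", "style": "A-line, knee length", '
    '"numeric": {"waist_width": 36, "hem_width": 250, "length": 60}}'
)
spec = parse_spec(vlm_output)
print(build_skirt_panel(spec))
# [(-18.0, 0.0), (18.0, 0.0), (100.0, -60.0), (-100.0, -60.0)]
```

Separating a discrete textual category from continuous numbers, as the abstract describes, lets the text part select which parametric template to instantiate while the numbers only adjust that template; clamping is one simple way to keep a model's numeric predictions inside a template's valid range.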