LLM-enabled Instance Model Generation

arXiv

Long Wen

Abstract

In model-based engineering, models are essential components that enable system design and analysis. Traditionally, creating these models has been a manual process that requires not only deep modeling expertise but also substantial domain knowledge of the target system. With the rapid advancement of generative artificial intelligence, large language models (LLMs) show potential for automating model generation. This work explores the generation of instance models using LLMs, focusing specifically on producing XMI-based instance models from Ecore metamodels and natural language specifications. We observe that current LLMs struggle to generate valid XMI models directly. To address this, we propose a two-step approach: first, an LLM produces a simplified structured output containing all necessary instance model information, namely a conceptual instance model; second, this intermediate representation is compiled into a valid XMI file. The conceptual instance model is format-independent, so different compilers can transform it into various modeling formats. We demonstrate the feasibility of the proposed method with several LLMs, including GPT-4o, o1-preview, and Llama 3.1 (8B and 70B). Results show that the proposed method significantly improves the usability of LLMs for instance model generation tasks. Notably, the smaller open-source model Llama 3.1 70B achieved performance comparable to proprietary GPT models within the proposed framework.
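
The second step of the approach, compiling a conceptual instance model into XMI, can be illustrated with a minimal sketch. The abstract does not specify the format of the conceptual instance model, so the dictionary layout below (`metamodel`, `root`, `class`, `attributes`, `children`), the `library` metamodel, and its namespace URI are all hypothetical, chosen only for illustration. The sketch also assumes that every contained object has exactly the declared type of its containment reference, so no `xsi:type` attributes are emitted, and it ignores cross-references.

```python
import xml.etree.ElementTree as ET

XMI_NS = "http://www.omg.org/XMI"

# Hypothetical conceptual instance model, standing in for the structured
# output an LLM would produce in step one. The schema is an assumption.
conceptual_model = {
    "metamodel": {"prefix": "library", "ns_uri": "http://example.org/library"},
    "root": {
        "class": "Library",
        "attributes": {"name": "City Library"},
        "children": {
            "books": [
                {"class": "Book", "attributes": {"title": "Dune", "pages": 412}},
                {"class": "Book", "attributes": {"title": "Emma", "pages": 474}},
            ]
        },
    },
}


def build_element(obj, tag):
    """Turn one conceptual object into an XML element with the given tag."""
    elem = ET.Element(tag)
    # Attribute values are serialized as XML attributes (always strings).
    for name, value in obj.get("attributes", {}).items():
        elem.set(name, str(value))
    # Contained objects become child elements named after the reference.
    for ref_name, children in obj.get("children", {}).items():
        for child in children:
            elem.append(build_element(child, ref_name))
    return elem


def compile_to_xmi(model):
    """Compile the conceptual instance model into an XMI string."""
    meta = model["metamodel"]
    # Register prefixes so the serializer writes xmi: and the metamodel prefix.
    ET.register_namespace("xmi", XMI_NS)
    ET.register_namespace(meta["prefix"], meta["ns_uri"])
    # The root element is qualified by the metamodel namespace.
    root_tag = "{%s}%s" % (meta["ns_uri"], model["root"]["class"])
    root = build_element(model["root"], root_tag)
    root.set("{%s}version" % XMI_NS, "2.0")
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(compile_to_xmi(conceptual_model))
```

Because the compiler, not the LLM, is responsible for namespaces, element nesting, and attribute serialization, the LLM only has to produce the simpler structure. A different compiler over the same dictionary could target another modeling format, which is the sense in which the intermediate representation is format-independent.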