arXiv

Open, Small, Rigmarole -- Evaluating Llama 3.2 3B's Feedback for Programming Exercises

Imen Azaiz

Abstract

Large Language Models (LLMs) have been researched extensively in the past few years, particularly regarding their potential to generate formative programming feedback for novice learners at university. In contrast to Generative AI (GenAI) tools based on LLMs, such as GPT, smaller and open models have received much less attention. Yet they offer several benefits: educators can run them on a virtual machine or personal computer, which can help circumvent some major concerns that apply to other GenAI tools and LLMs (e.g., data protection, lack of control over changes, privacy). Therefore, this study explores the feedback characteristics of the open, lightweight LLM Llama 3.2 (3B). In particular, we investigate the model's responses to authentic student solutions to introductory programming exercises written in Java. The generated output is qualitatively analyzed to evaluate the feedback's quality, content, structure, and other features. The results provide a comprehensive overview of the feedback capabilities and serious shortcomings of this open, small LLM. We further discuss the findings in the context of previous research on LLMs and contribute to benchmarking recently available GenAI tools and their feedback for novice learners of programming. This work thereby has implications for educators, learners, and tool developers attempting to use any variant of LLMs (including open and small models) to generate formative feedback and support learning.