Interactive Sketchpad: A Multimodal Tutoring System for Collaborative, Visual Problem-Solving
Abstract
Humans have long relied on visual aids such as sketches and diagrams to support reasoning and problem-solving. Visual tools, such as auxiliary lines in geometry or graphs in calculus, are essential for understanding complex ideas. However, many tutoring systems remain text-based and provide feedback only through natural language. Leveraging recent advances in Large Multimodal Models (LMMs), this paper introduces Interactive Sketchpad, a tutoring system that combines language-based explanations with interactive visualizations to enhance learning. Built on a pre-trained LMM, Interactive Sketchpad is fine-tuned to provide step-by-step guidance in both text and visuals, enabling natural multimodal interaction with the student. It generates accurate and robust diagrams by incorporating code execution into the reasoning process. User studies on math problems in geometry, calculus, and trigonometry show that Interactive Sketchpad improves task comprehension, problem-solving accuracy, and engagement, highlighting its potential for transforming educational technologies. All code is available at: https://stevenshinechen.github.io/interactivesketchpad/.
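The abstract does not specify how code execution is wired into the reasoning process, so the following is only a minimal sketch of the general idea under stated assumptions. It supposes that the model emits a short drawing program, that the tutor executes it, and that the tutor then either displays the resulting diagram or returns the error text so the model can revise its code. The names `GENERATED_CODE` and `run_diagram_code`, the `svg` variable convention, and the choice of SVG output are all hypothetical and are not taken from the Interactive Sketchpad codebase.

```python
# Hypothetical stand-in for code an LMM might emit when asked to draw a
# triangle with an altitude (an auxiliary line) from vertex C to side AB.
GENERATED_CODE = '''
A, B, C = (20, 180), (220, 180), (150, 40)
foot = (C[0], A[1])  # AB is horizontal, so the altitude from C is vertical
def line(p, q, style=""):
    return f'<line x1="{p[0]}" y1="{p[1]}" x2="{q[0]}" y2="{q[1]}" stroke="black" {style}/>'
parts = [line(A, B), line(B, C), line(C, A), line(C, foot, 'stroke-dasharray="4"')]
svg = '<svg xmlns="http://www.w3.org/2000/svg" width="240" height="200">' + "".join(parts) + "</svg>"
'''


def run_diagram_code(code: str) -> dict:
    """Execute diagram code; return the SVG on success or an error message."""
    namespace = {}
    try:
        # Assumption: a real tutoring system would run this in a sandbox,
        # not in the host interpreter.
        exec(code, namespace)
    except Exception as exc:
        # The error text could be fed back to the model for another attempt.
        return {"ok": False, "error": f"{type(exc).__name__}: {exc}"}
    svg = namespace.get("svg")
    if not isinstance(svg, str):
        return {"ok": False, "error": "code did not define an 'svg' string"}
    return {"ok": True, "svg": svg}


result = run_diagram_code(GENERATED_CODE)   # well-formed drawing program
broken = run_diagram_code("svg = 1 / 0")    # faulty program, reported as an error
```

Because the diagram comes from executed code rather than from direct image generation, its geometry is determined by explicit coordinates, and a failed run surfaces as an error message instead of a silently incorrect picture.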